python-pdf2txt

python-pdf2txt is a Dockerized Python application that converts PDF files into editable Word documents. It uses Flask to expose a web service that accepts PDF uploads over HTTP, extracts their text with OCR, and returns the result as a DOCX file.

Features

  • PDF to Word Conversion: Converts PDF documents to DOCX format using OCR.
  • Dockerized Application: Runs in a container, which simplifies deployment and keeps behavior consistent across environments.
  • REST API: An HTTP endpoint accepts a PDF upload and returns the converted DOCX, which keeps integration straightforward.

Getting Started

Step 1: Clone the Repository

Clone the repository to your local machine to get started:

git clone https://github.com/your-username/python-pdf2txt.git
cd python-pdf2txt

Step 2: Build and Run the Docker Container

Use Docker Compose to build and run your container:

docker-compose up --build -d

The --build flag rebuilds the image before starting the container, and -d runs it in detached (background) mode. Once the container is up, the service is available at localhost on port 4000.
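Because -d returns immediately, the container may still be starting when you send the first request. The Python sketch below (standard library only; wait_for_port is a hypothetical helper, not part of this repository) polls a host and port until a TCP connection succeeds. An open port only shows that something is listening: with Docker's port publishing the port can accept connections before Flask inside the container is ready, so treat this as a coarse readiness check.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 4000,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 4000, timeout=60) returns True as soon as the port accepts connections, or False after 60 seconds.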

Step 3: Convert a PDF to Word

Convert a PDF to a Word document by executing the following curl command:

curl -X POST -F "file=@path_to_your_pdf_file.pdf" http://localhost:4000/upload-pdf --output converted.docx

Replace path_to_your_pdf_file.pdf with the path to the PDF you want to convert. The converted document is saved as converted.docx in the current directory.
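To call the service from Python instead of curl, the sketch below mirrors the command above using only the standard library: it sends the PDF as a multipart/form-data field named file to /upload-pdf and writes the response body to converted.docx. The helper names are hypothetical, and the sketch assumes the endpoint returns the DOCX bytes directly on success. Since curl --output saves whatever the server returns, an error response can end up under a .docx name; the sketch raises on HTTP error statuses and additionally checks that the saved file is a ZIP archive containing word/document.xml, which every DOCX has.

```python
import urllib.request
import uuid
import zipfile
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body, like curl -F "field=@file"."""
    boundary = uuid.uuid4().hex  # random boundary unlikely to occur in the PDF bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def is_docx(path: Path) -> bool:
    """A DOCX file is a ZIP archive that contains word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return "word/document.xml" in zf.namelist()


def convert_pdf(pdf_path: str, out_path: str = "converted.docx",
                url: str = "http://localhost:4000/upload-pdf") -> Path:
    """Upload a PDF (assumed field name: file) and save the returned DOCX."""
    pdf = Path(pdf_path)
    body, ctype = build_multipart("file", pdf.name, pdf.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=300) as resp:
        out = Path(out_path)
        out.write_bytes(resp.read())
    if not is_docx(out):
        raise ValueError(f"{out} is not a valid DOCX; open it to see what the server returned")
    return out
```

Usage: convert_pdf("path_to_your_pdf_file.pdf") returns the path of the saved converted.docx.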

Step 4: View Application Logs

To follow the application's activity in real time, tail the log files:

tail -f ./logs/*

tail -f prints new lines as they are appended to the files in ./logs, giving a live view of the application's operational logs.
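On systems without tail (for example, a plain Windows shell), a small Python poller can play the same role. This is a hypothetical stand-in, not part of the project: it rereads files matching ./logs/* once per second, prints only complete lines added since the previous poll, and starts over if a file shrinks (truncation or rotation). Unlike tail -f, it prints each file from the beginning rather than only the last ten lines.

```python
import glob
import os
import time


def poll_logs(pattern: str, offsets: dict[str, int]) -> list[tuple[str, str]]:
    """Return (filename, line) pairs for complete lines added since the last poll.

    offsets maps each file name to the byte position already consumed and is
    updated in place; files seen for the first time are read from the start.
    """
    new_lines = []
    for name in sorted(glob.glob(pattern)):
        if not os.path.isfile(name):
            continue  # skip subdirectories matched by the glob
        pos = offsets.get(name, 0)
        if os.path.getsize(name) < pos:
            pos = 0  # file was truncated or rotated: start over
        with open(name, "rb") as fh:
            fh.seek(pos)
            chunk = fh.read()
        # Consume only up to the last newline; a partial line waits for the next poll.
        end = chunk.rfind(b"\n") + 1
        for raw in chunk[:end].splitlines():
            new_lines.append((name, raw.decode("utf-8", errors="replace")))
        offsets[name] = pos + end
    return new_lines


def follow(pattern: str = "./logs/*", interval: float = 1.0) -> None:
    """Print new log lines until interrupted with Ctrl+C."""
    offsets: dict[str, int] = {}
    while True:
        for name, line in poll_logs(pattern, offsets):
            print(f"{name}: {line}")
        time.sleep(interval)
```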

Additional Configuration

The application reads additional settings from environment variables, such as the Tesseract data prefix. You can adjust these in docker-compose.yml to suit your setup.
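If OCR fails because Tesseract cannot load a language, a quick check is whether the language model exists where the data prefix points. The sketch below assumes the variable is Tesseract's standard TESSDATA_PREFIX and follows the Tesseract 4.x-and-later convention that it names the directory holding the *.traineddata files; the function name and the ./tessdata fallback are hypothetical.

```python
import os
from pathlib import Path


def find_traineddata(lang: str = "eng", default_dir: str = "./tessdata") -> Path:
    """Return the path to <lang>.traineddata, or raise if it is missing.

    Assumes TESSDATA_PREFIX points directly at the tessdata directory
    (Tesseract 4.x+ behavior); default_dir is a hypothetical fallback.
    """
    tessdata = Path(os.environ.get("TESSDATA_PREFIX", default_dir))
    model = tessdata / f"{lang}.traineddata"
    if not model.is_file():
        raise FileNotFoundError(
            f"{model} not found; check the Tesseract data prefix in docker-compose.yml")
    return model
```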