docs: Add detailed documentation for the Python application that converts PDF files to Word documents using Flask and OCR technology

main
Mam Thenebo 10 months ago
parent
commit
ac86ca5b76
  1. 44
      README.md

@@ -1,2 +1,46 @@
# python-pdf2txt
`python-pdf2txt` is a Dockerized Python application that converts PDF files to editable Word documents. It uses Flask to create a web service that accepts PDF uploads via HTTP requests and returns the converted DOCX files, leveraging OCR technology for text extraction.
## Features
- **PDF to Word Conversion**: Converts PDF documents into Word (DOCX) format, using OCR to extract the text.
- **Dockerized Application**: Easy deployment and consistency across different environments.
- **REST API Interface**: Simple and accessible REST API for uploading PDFs and receiving Word documents.
## Getting Started
### Step 1: Clone the Repository
Start by cloning this repository to your local machine:
```bash
git clone https://github.com/your-username/python-pdf2txt.git
cd python-pdf2txt
```
### Step 2: Build and Run the Docker Container
Build and deploy the application using Docker Compose:
```bash
docker-compose up --build -d
```
This command builds the Docker image and runs the container in detached mode. Once the container is up, the application is reachable at `localhost:4000`.
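Because the container starts in detached mode, the command returns before the service is necessarily ready to accept requests. A minimal sketch of a readiness check follows; it only assumes that the service listens on TCP port 4000 as stated above. The function name `wait_for_port` and its defaults are illustrative and not part of this project.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 4000,
                  timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    This checks only that the port accepts connections, not that the
    application behind it is healthy.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port()` before the first upload avoids a connection error while the container is still starting.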
### Step 3: Convert a PDF to Word
To convert a PDF file to a Word document, execute the following curl command:
```bash
curl -X POST -F "file=@path_to_your_pdf_file.pdf" http://localhost:4000/upload-pdf --output converted.docx
```
Replace `path_to_your_pdf_file.pdf` with the actual path to the PDF file you want to convert. The resulting Word document is saved as `converted.docx` in the current directory.
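The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl command: a multipart form upload with the field name `file`, posted to `http://localhost:4000/upload-pdf`. The helper names `build_multipart` and `convert_pdf` are hypothetical, and the code assumes the service returns the DOCX bytes directly in the response body, as the `--output` flag in the curl example suggests.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path: str, out_path: str = "converted.docx",
                url: str = "http://localhost:4000/upload-pdf") -> None:
    """Upload a PDF and write the response body to out_path.

    Assumes the endpoint answers with the converted document itself.
    urlopen raises urllib.error.HTTPError on a non-2xx status.
    """
    pdf = Path(pdf_path)
    body, content_type = build_multipart("file", pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

For example, `convert_pdf("report.pdf")` would write `converted.docx` next to the script, where `report.pdf` stands in for your own file.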
### Step 4: View Application Logs
Follow the application's activity in real time by tailing its log files:
```bash
tail -f ./logs/*
```
This command prints new log lines as they are written, so you can watch each conversion request being processed.
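If you prefer to inspect the logs from a script, the following sketch collects the last few lines of every file in the log directory. It assumes the logs are plain-text files under `./logs`, as the `tail` command implies; the function name `last_lines` is illustrative.

```python
from collections import deque
from pathlib import Path


def last_lines(log_dir: str = "./logs", count: int = 20) -> dict[str, list[str]]:
    """Return the last `count` lines of each regular file in log_dir."""
    result: dict[str, list[str]] = {}
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            # A bounded deque keeps only the final `count` lines in memory.
            tail = deque(handle, maxlen=count)
        result[path.name] = [line.rstrip("\n") for line in tail]
    return result
```

Unlike `tail -f`, this takes a one-time snapshot and does not keep following the files.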