# python-pdf2txt

`python-pdf2txt` is a Dockerized Python application that converts PDF files to editable Word documents. It runs a Flask web service that accepts PDF uploads over HTTP, extracts the text with OCR, and returns the converted DOCX file.

## Features

- **PDF to Word Conversion**: Converts PDF documents into Word format using OCR.
- **Dockerized Application**: Easy deployment and consistent behavior across environments.
- **REST API Interface**: A simple REST API for uploading PDFs and receiving Word documents.

## Getting Started

### Step 1: Clone the Repository

Start by cloning this repository to your local machine:

```bash
git clone https://github.com/your-username/python-pdf2txt.git
cd python-pdf2txt
```

### Step 2: Build and Run the Docker Container

Build and start the application using Docker Compose:

```bash
docker-compose up --build -d
```

This command builds the Docker image and runs the container in detached mode, making the application accessible at `localhost:4000`.
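
Because the container runs in detached mode, the command can return before the service is ready to accept requests. The following is a minimal sketch of a readiness check, assuming the service listens on `localhost:4000` as described above. The helper name `wait_for_port` and the timeout values are illustrative and not part of this project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 4000)` before sending the first request avoids a connection error while the container is still starting. An open port does not guarantee that the application has finished initializing, so the first upload may still need a retry.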

### Step 3: Convert a PDF to Word

To convert a PDF file to a Word document, run the following `curl` command:

```bash
curl -X POST -F "file=@path_to_your_pdf_file.pdf" http://localhost:4000/upload-pdf --output converted.docx
```

Replace `path_to_your_pdf_file.pdf` with the actual path to the PDF file you want to convert. The resulting Word document is saved as `converted.docx`.
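
The same upload can be scripted from Python. The sketch below uses only the standard library and mirrors the `curl` call: it sends the PDF as a multipart form field named `file` to `http://localhost:4000/upload-pdf` and writes the response body to disk. The function names `build_multipart` and `convert_pdf` are hypothetical, and the sketch assumes the service returns the DOCX bytes directly in the response body.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path: str, out_path: str = "converted.docx",
                url: str = "http://localhost:4000/upload-pdf") -> None:
    """POST the PDF to the service and write the response body to out_path."""
    pdf = Path(pdf_path)
    # "file" matches the form field name used in the curl example.
    body, content_type = build_multipart("file", pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

With the container running, `convert_pdf("path_to_your_pdf_file.pdf")` should produce `converted.docx` in the current directory, just as the `curl` command does.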

### Step 4: View Application Logs

Monitor the application in real time by following the logs:

```bash
tail -f ./logs/*
```

This command streams new log output as it is written, showing what the application is processing.
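
Where `tail` is not available, for example on Windows, a rough equivalent can be written in Python. The sketch below polls every file in `./logs` and prints newly appended lines. The names `read_new_lines` and `follow_logs` are hypothetical, the log files are assumed to be UTF-8 text, and log rotation is not handled.

```python
import time
from pathlib import Path


def read_new_lines(path: Path, offset: int) -> tuple[list[str], int]:
    """Return lines appended to path since byte offset, plus the new offset."""
    with path.open("rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Only consume complete lines so a half-written line is picked up next time.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow_logs(log_dir: str = "./logs", interval: float = 1.0) -> None:
    """Print new lines from every file in log_dir until interrupted (Ctrl+C)."""
    offsets: dict[Path, int] = {}
    while True:
        for path in sorted(Path(log_dir).glob("*")):
            if not path.is_file():
                continue
            # Unlike tail -f, a file seen for the first time is printed from its start.
            lines, offsets[path] = read_new_lines(path, offsets.get(path, 0))
            for line in lines:
                print(f"{path.name}: {line}")
        time.sleep(interval)
```

Calling `follow_logs()` from the repository root runs until interrupted. Polling once per second keeps the sketch simple at the cost of a short delay before new lines appear.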