# python-pdf2txt

`python-pdf2txt` is a Dockerized Python application that converts PDF files to editable Word documents. It uses Flask to create a web service that accepts PDF uploads via HTTP requests and returns the converted DOCX files, leveraging OCR technology for text extraction.

## Features

- **PDF to Word Conversion**: Efficiently converts PDF documents into Word format using OCR.
- **Dockerized Application**: Easy deployment and consistency across different environments.
- **REST API Interface**: Simple and accessible REST API for uploading PDFs and receiving Word documents.
## Getting Started

### Step 1: Clone the Repository

Start by cloning this repository to your local machine:

```bash
git clone https://github.com/your-username/python-pdf2txt.git
cd python-pdf2txt
```
### Step 2: Build and Run the Docker Container

Build and deploy the application using Docker Compose:

```bash
docker-compose up --build -d
```

This command builds the Docker image and runs the container in detached mode, making the application accessible at `localhost:4000`.
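
Because the container starts in detached mode, the service may need a moment before it accepts connections. The following is a minimal sketch of a readiness check using only the Python standard library. The helper name `wait_for_port` and its default timeout are illustrative assumptions, not part of this repository; the host and port are taken from the step above.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 4000,
                  timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only checks that something is
    listening on the port, not that the application is fully initialized.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port()` after `docker-compose up --build -d` blocks until the port is reachable or 30 seconds have passed, which can be handy in scripts that upload a PDF right after startup.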
### Step 3: Convert a PDF to Word

To convert a PDF file to a Word document, execute the following curl command:

```bash
curl -X POST -F "file=@path_to_your_pdf_file.pdf" http://localhost:4000/upload-pdf --output converted.docx
```

Replace `path_to_your_pdf_file.pdf` with the actual path to the PDF file you wish to convert. The resulting Word document will be saved as `converted.docx`.
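
The same upload can be done from Python. The sketch below mirrors the curl command using only the standard library: it builds a `multipart/form-data` body with a single `file` field and posts it to `/upload-pdf`. The function names `build_multipart` and `convert_pdf` are hypothetical, and the sketch assumes the service returns the DOCX bytes directly in the response body, as the `--output` flag above suggests.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type). Hypothetical helper for illustration.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(pdf_path: str, out_path: str = "converted.docx",
                url: str = "http://localhost:4000/upload-pdf") -> None:
    """POST the PDF to the service and write the response body to out_path."""
    pdf = Path(pdf_path)
    # The form field is named "file", matching -F "file=@..." in the curl call.
    body, content_type = build_multipart("file", pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

With the container running, `convert_pdf("path_to_your_pdf_file.pdf")` would write `converted.docx` to the current directory.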
### Step 4: View Application Logs

Monitor the application's operations in real time by viewing the logs:

```bash
tail -f ./logs/*
```

This command streams new log output as it is written, showing what the application is processing.
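
On systems without `tail`, a single log file can be followed from Python instead. This is a rough sketch that polls one file for appended bytes. The helper names are hypothetical, the log file name is whatever the application writes under `./logs/`, and for simplicity it does not handle log rotation or lines that are only partially written at the moment of a read.

```python
import time
from pathlib import Path


def read_new_lines(path: str, offset: int = 0) -> tuple[list[str], int]:
    """Return the lines appended to path since byte offset, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    text = chunk.decode("utf-8", errors="replace")
    return text.splitlines(), offset + len(chunk)


def follow(path: str, poll_seconds: float = 1.0) -> None:
    """Print lines as they are appended to path; runs until interrupted."""
    # Start at the current end of the file so only new output is shown.
    offset = Path(path).stat().st_size
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(poll_seconds)
```

Call `follow()` with the path of one file in `./logs/` and stop it with Ctrl+C.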