diff --git a/README.md b/README.md
index 64c8d49..c58ee3b 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,18 @@
 # python-pdf2txt
 
-`python-pdf2txt` is a Dockerized Python application that converts PDF files to editable Word documents. It uses Flask to create a web service that accepts PDF uploads via HTTP requests and returns the converted DOCX files, leveraging OCR technology for text extraction.
+`python-pdf2txt` is a Dockerized Python application designed to convert PDF files into editable Word documents. The application utilizes Flask to expose a web service that handles PDF file uploads through HTTP requests and returns the converted DOCX files using OCR technology.
 
 ## Features
 
-- **PDF to Word Conversion**: Efficiently converts PDF documents into Word format using OCR.
-- **Dockerized Application**: Easy deployment and consistency across different environments.
-- **REST API Interface**: Simple and accessible REST API for uploading PDFs and receiving Word documents.
+- **PDF to Word Conversion**: Transforms PDF documents into DOCX format using advanced OCR capabilities.
+- **Dockerized Application**: Facilitates easy deployment and consistent performance across various environments.
+- **REST API**: Simple API for straightforward integration, supporting PDF uploads and DOCX retrievals.
 
 ## Getting Started
 
 ### Step 1: Clone the Repository
 
-Start by cloning this repository to your local machine:
+Clone the repository to your local machine to get started:
 
 ```bash
 git clone https://github.com/your-username/python-pdf2txt.git
@@ -21,26 +21,34 @@ cd python-pdf2txt
 
 ### Step 2: Build and Run the Docker Container
 
-Build and deploy the application using Docker Compose:
+Use Docker Compose to build and run your container:
 
 ```bash
 docker-compose up --build -d
 ```
-This command will build the Docker image and run the container in detached mode, making the application accessible on localhost:4000.
-Step 3: Convert a PDF to Word
-To convert a PDF file to a Word document, execute the following curl command:
+This command constructs the Docker image if it hasn't been built previously and runs the container in detached mode. The service will be available at localhost on port 4000.
+
+### Step 3: Convert a PDF to Word
+
+Convert a PDF to a Word document by executing the following curl command:
 
 ```bash
 curl -X POST -F "file=@path_to_your_pdf_file.pdf" http://localhost:4000/upload-pdf --output converted.docx
 ```
-Ensure you replace path_to_your_pdf_file.pdf with the actual path to the PDF file you wish to convert. The resulting Word document will be saved as converted.docx.
+
+Make sure to replace `path_to_your_pdf_file.pdf` with the actual path to the PDF you intend to convert. The output will be saved as `converted.docx`.
 
 ### Step 4: View Application Logs
 
-Monitor the application's operations in real-time by viewing the logs:
+To track the application's processes in real-time, you can view the logs:
 
 ```bash
 tail -f ./logs/*
 ```
-This command will display ongoing log outputs, providing insights into the application's processing activities.
+
+This command tails the log files, offering a live view into the application’s operational logs.
+
+## Additional Configuration
+
+The application uses environment variables for additional configurations, such as the Tesseract data prefix, which can be adjusted in the `docker-compose.yml` file to suit your setup needs.
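
For callers who would rather script the Step 3 upload than shell out to curl, the same request could be sketched in Python using only the standard library. This is a minimal sketch, not part of the repository: the helper names `build_multipart` and `convert_pdf` are hypothetical, the `application/pdf` part type is an assumption, and it assumes the service answers a successful upload with the DOCX bytes in the response body, as the curl `--output` usage suggests. The form field name `file`, the URL, and the default output name are taken from the curl command in the README.

```python
import uuid
import urllib.request
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"  # assumed part type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(
    pdf_path: str,
    out_path: str = "converted.docx",
    url: str = "http://localhost:4000/upload-pdf",
) -> None:
    """POST a PDF to the service and save the response body to out_path."""
    pdf = Path(pdf_path)
    # "file" mirrors the form field used in the README's curl command.
    body, content_type = build_multipart("file", pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
```

With the container from Step 2 running, calling `convert_pdf("path_to_your_pdf_file.pdf")` would be the rough equivalent of the curl command, writing the result to `converted.docx`.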