Earn 3,600 ($36.00)
due 2 years ago
Canceled
A script to convert PDFs to text
cpdwk8kh2w
Bounty Description
Problem Description
I have a directory containing hundreds of PDF files. I want to convert them to text files using Google's Cloud Vision API. I would like to have a script that does the following:
- Takes two directory names as input: input-dir and output-dir.
- Crawls input-dir recursively to find the PDF files.
- Converts each PDF to a text file, reproducing the input directory structure under output-dir.
- Authenticates with a Google JSON key through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable.
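The crawl-and-mirror part of these requirements could be sketched roughly as follows. This is only an illustration, not a finished solution: the names parse_args, convert_tree, and extract_text are hypothetical, and the text extraction itself is passed in as a callable so that the directory handling stays independent of any particular OCR backend.

```python
import argparse
from pathlib import Path
from typing import Callable


def parse_args(argv=None):
    """Read the two positional directory arguments."""
    parser = argparse.ArgumentParser(description="Convert a tree of PDFs to text files.")
    parser.add_argument("input_dir", type=Path)
    parser.add_argument("output_dir", type=Path)
    return parser.parse_args(argv)


def convert_tree(input_dir, output_dir, extract_text: Callable[[bytes], str]):
    """Write one .txt per PDF, mirroring the layout of input_dir under output_dir.

    extract_text is a hypothetical hook: it receives the raw PDF bytes and
    returns the recognized text.
    """
    in_root, out_root = Path(input_dir), Path(output_dir)
    written = []
    for pdf in sorted(in_root.rglob("*")):
        # Match the extension case-insensitively so "scan.PDF" is also picked up.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        target = out_root / pdf.relative_to(in_root).with_suffix(".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(extract_text(pdf.read_bytes()), encoding="utf-8")
        written.append(target)
    return written
```

With this shape, input-dir/reports/2021/a.pdf would end up as output-dir/reports/2021/a.txt, and files that are not PDFs are skipped.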
Acceptance Criteria
The script uses Python 3.10+.
Any libraries must be specified in a requirements.txt file.
It should create a one-to-one mapping between the pdf file and the text file using Google's Cloud Vision API.
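For illustration, the Cloud Vision call for a single PDF might look something like the sketch below. It assumes the google-cloud-vision package (which would then be listed in requirements.txt) and its synchronous batch_annotate_files method; as far as I know that method only processes the first five pages of a file when no pages are specified, so longer PDFs would need the asynchronous, Cloud Storage based variant instead. The helper names require_credentials and ocr_pdf are hypothetical.

```python
import os


def require_credentials(env=None) -> str:
    """Return the key-file path from GOOGLE_APPLICATION_CREDENTIALS or fail early."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    return path


def ocr_pdf(pdf_bytes: bytes) -> str:
    """Send one PDF to Cloud Vision and return the text of the pages it processed.

    Assumption: with no explicit page list, the synchronous endpoint handles
    only the first five pages of the file.
    """
    # Imported lazily so the rest of the script can load without the library.
    from google.cloud import vision

    require_credentials()
    client = vision.ImageAnnotatorClient()  # picks up the credentials from the environment
    request = vision.AnnotateFileRequest(
        input_config=vision.InputConfig(content=pdf_bytes, mime_type="application/pdf"),
        features=[vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)],
    )
    response = client.batch_annotate_files(requests=[request])
    # One AnnotateFileResponse per input file, holding one image response per page.
    pages = response.responses[0].responses
    return "\n".join(page.full_text_annotation.text for page in pages)
```

A function of this kind takes the bytes of one PDF and returns one string, which fits the one-to-one PDF-to-text mapping asked for here.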