    Earn 3,600 ($36.00)

    Time Remaining: due 2 years ago
    Canceled

    A script to convert pdfs to text

    cpdwk8kh2w
    Posted 2 years ago

    Bounty Description

    Problem Description

    I have a directory containing hundreds of PDF files. I want to convert them to text files using Google's Cloud Vision API. I would like to have a script that does the following:

    1. Takes two directory names as input: input-dir and output-dir.
    2. Crawls input-dir recursively to find the PDF files.
    3. Converts each PDF file to a text file, reproducing the directory structure of input-dir under output-dir.
    4. Authenticates with a Google JSON key file through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable.
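
    The crawl-and-mirror part of the steps above could be sketched roughly as follows. This is only an illustration, not a finished solution: the names iter_pdf_jobs, convert_tree and pdf_to_text are hypothetical, and the actual OCR call is passed in as a function so that the directory handling can be shown without any Cloud Vision dependency.

```python
from pathlib import Path
from typing import Callable, Iterator


def iter_pdf_jobs(input_dir: Path, output_dir: Path) -> Iterator[tuple[Path, Path]]:
    """Yield (pdf_path, txt_path) pairs, mirroring input_dir under output_dir."""
    for pdf_path in sorted(input_dir.rglob("*")):
        # Match the extension case-insensitively so "scan.PDF" is picked up too.
        if pdf_path.is_file() and pdf_path.suffix.lower() == ".pdf":
            relative = pdf_path.relative_to(input_dir)
            yield pdf_path, (output_dir / relative).with_suffix(".txt")


def convert_tree(
    input_dir: Path,
    output_dir: Path,
    pdf_to_text: Callable[[Path], str],  # hypothetical hook for the OCR step
) -> int:
    """Write one .txt file per PDF and return the number of files converted."""
    converted = 0
    for pdf_path, txt_path in iter_pdf_jobs(input_dir, output_dir):
        # Create any missing sub-directories before writing the text file.
        txt_path.parent.mkdir(parents=True, exist_ok=True)
        txt_path.write_text(pdf_to_text(pdf_path), encoding="utf-8")
        converted += 1
    return converted
```

    With this layout, input-dir/reports/q1.pdf would end up as output-dir/reports/q1.txt. A command-line wrapper would read the two directory names (for example with argparse) and call convert_tree with a function that sends each file to the Cloud Vision API.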

    Acceptance Criteria

    The script uses Python 3.10+.
    All libraries must be specified in a requirements.txt file.
    It should create a one-to-one mapping between each PDF file and its text file, with the text produced by Google's Cloud Vision API.
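
    One possible shape for the OCR step itself is sketched below, assuming the google-cloud-vision package (which would then be the entry in requirements.txt) and its synchronous batch_annotate_files call. The helper names page_batches and ocr_pdf are hypothetical, and the five-page limit per request is an assumption about the synchronous endpoint that should be checked against the current API documentation. The client picks up credentials from GOOGLE_APPLICATION_CREDENTIALS through Application Default Credentials, so no key handling appears in the code.

```python
from typing import Iterator

MAX_PAGES_PER_REQUEST = 5  # assumed limit of the synchronous file endpoint


def page_batches(
    total_pages: int, first_page: int = 1, size: int = MAX_PAGES_PER_REQUEST
) -> Iterator[list[int]]:
    """Yield 1-based page numbers in groups of at most `size` pages."""
    for start in range(first_page, total_pages + 1, size):
        yield list(range(start, min(start + size, total_pages + 1)))


def ocr_pdf(pdf_bytes: bytes) -> str:
    """Return the text of a PDF, one Cloud Vision request per page batch."""
    # Imported lazily so page_batches can be used without the library installed.
    from google.cloud import vision

    # Credentials come from GOOGLE_APPLICATION_CREDENTIALS.
    client = vision.ImageAnnotatorClient()
    input_config = vision.InputConfig(content=pdf_bytes, mime_type="application/pdf")
    features = [vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)]

    def annotate(pages: list[int]):
        request = vision.AnnotateFileRequest(
            input_config=input_config, features=features, pages=pages
        )
        return client.batch_annotate_files(requests=[request]).responses[0]

    # An empty page list asks the service for its default first pages and
    # also reports how many pages the file has in total.
    first = annotate([])
    texts = [page.full_text_annotation.text for page in first.responses]

    # Fetch any remaining pages in further batches.
    next_page = len(first.responses) + 1
    for batch in page_batches(first.total_pages, first_page=next_page):
        file_response = annotate(batch)
        texts.extend(page.full_text_annotation.text for page in file_response.responses)

    return "\n".join(texts)
```

    For very large files, the asynchronous variant of the API, which reads from and writes to Cloud Storage, may be a better fit. The sketch also leaves out error handling and retries.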