
    Earn 324,000 ($3,240.00)

    Due 9 months ago
    Completed

    Bulk File Upload, Data Extraction, and Automated Form Population

    aaron229
    Posted 10 months ago
    This Bounty has been completed!

    Bounty Description

    Problem Description

    Build a file upload system in Jupyter Notebook, Google Colab, or Visual Studio that can accept multiple file types and extract data from them. The system should process PDF, XLSX, and CAD files from Microsoft Dynamics, extract the relevant information, and store it in a SQL database. The extracted raw data will then be used to populate new forms, at least one of which must be passed back to CAD/Microsoft Dynamics. This system will serve as the foundation for an automated quoting and document generation process.

    Acceptance Criteria

    File Upload and Processing:

    The system can accept uploads of PDF, XLSX, and CAD files from Microsoft Dynamics.
    Implement functionality to handle bulk uploads of multiple files simultaneously.
    Provide feedback on upload status and any errors encountered.
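    A minimal sketch of the bulk-upload check in Python, assuming uploads arrive as a mapping of filename to raw bytes (as a notebook upload widget might provide). The CAD extensions `.dwg` and `.dxf` are assumptions, since the bounty does not name a CAD format, and the 50 MB ceiling is taken from the Performance criteria. Each file gets its own status record, so one rejected file does not block the rest of the batch.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed whitelist: the bounty names PDF, XLSX and "CAD" files but not the CAD format.
ALLOWED_EXTENSIONS = {".pdf", ".xlsx", ".dwg", ".dxf"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB limit from the Performance criteria


@dataclass
class UploadStatus:
    filename: str
    accepted: bool
    message: str


def check_upload(filename: str, data: bytes) -> UploadStatus:
    """Validate a single uploaded file and describe the outcome."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return UploadStatus(filename, False, f"unsupported file type '{ext}'")
    if not data:
        return UploadStatus(filename, False, "file is empty")
    if len(data) > MAX_BYTES:
        return UploadStatus(filename, False, "file exceeds the 50 MB limit")
    return UploadStatus(filename, True, "accepted")


def check_batch(files: dict[str, bytes]) -> list[UploadStatus]:
    """Validate every file in a bulk upload, keeping one status per file."""
    return [check_upload(name, data) for name, data in files.items()]


if __name__ == "__main__":
    report = check_batch({"quote.pdf": b"%PDF-1.7", "notes.txt": b"hello"})
    for status in report:
        print(f"{status.filename}: {status.message}")
```

    The per-file messages can be shown directly in the notebook as the upload feedback the criteria ask for.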

    Data Extraction:

    Successfully extract relevant information from each file type:

    PDF: Extract text content and structured data (e.g., tables, forms).
    XLSX: Extract data from specified sheets and columns.
    CAD files: Extract metadata and key measurements.

    Implement error handling for cases where data extraction fails or is incomplete.
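    One way to meet the error-handling requirement is to wrap every per-type extractor (the PDF, XLSX, or CAD parser) in a common function that classifies the outcome as complete, partial, or failed. This sketch assumes each extractor is a callable that takes raw bytes and returns a dictionary of field names to values; the required field names are placeholders to be defined per form.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed extractor contract: raw file bytes in, {field_name: value} out.
Extractor = Callable[[bytes], dict]


@dataclass
class ExtractionResult:
    filename: str
    status: str  # "complete", "partial" or "failed"
    fields: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def run_extraction(filename: str, data: bytes, extractor: Extractor,
                   required_fields: list[str]) -> ExtractionResult:
    """Run one extractor and classify the outcome instead of crashing the batch."""
    try:
        fields = extractor(data)
    except Exception as exc:  # damaged or unreadable input
        return ExtractionResult(filename, "failed",
                                errors=[f"{type(exc).__name__}: {exc}"])
    # A required field that is absent or blank makes the result partial.
    missing = [name for name in required_fields if fields.get(name) in (None, "")]
    return ExtractionResult(
        filename,
        "partial" if missing else "complete",
        fields,
        [f"missing field: {name}" for name in missing],
    )
```

    Keeping partial results, rather than discarding them, lets a user fill the gaps by hand before the data is stored.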

    Database Storage:

    Design and implement a SQL database schema suitable for storing the extracted data.
    Successfully store all extracted data in the SQL database.
    Implement data validation to ensure integrity of stored information.
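    A minimal sketch of a storage schema using Python's built-in `sqlite3` module, chosen here only because it needs no server; the DDL ports to other SQL databases with minor changes. The table and column names are hypothetical. Validation is pushed into the schema itself: `NOT NULL`, `CHECK`, and `UNIQUE` constraints reject malformed or duplicate rows, and a single transaction ensures a file is never stored without its fields.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per uploaded file, one row per extracted field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_file (
    id          INTEGER PRIMARY KEY,
    filename    TEXT NOT NULL,
    file_type   TEXT NOT NULL CHECK (file_type IN ('pdf', 'xlsx', 'cad')),
    sha256      TEXT NOT NULL UNIQUE,          -- rejects duplicate uploads
    uploaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS extracted_field (
    id          INTEGER PRIMARY KEY,
    file_id     INTEGER NOT NULL REFERENCES source_file(id) ON DELETE CASCADE,
    field_name  TEXT NOT NULL,
    field_value TEXT,
    UNIQUE (file_id, field_name)
);
"""


def open_db(path: str = "extraction.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def store_extraction(conn: sqlite3.Connection, filename: str, file_type: str,
                     data: bytes, fields: dict) -> int:
    """Insert a file and its fields in one transaction; a constraint failure rolls back both."""
    with conn:
        cur = conn.execute(
            "INSERT INTO source_file (filename, file_type, sha256) VALUES (?, ?, ?)",
            (filename, file_type, hashlib.sha256(data).hexdigest()),
        )
        file_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO extracted_field (file_id, field_name, field_value) VALUES (?, ?, ?)",
            [(file_id, name, None if value is None else str(value))
             for name, value in fields.items()],
        )
    return file_id
```

    The key-value layout of `extracted_field` is one design choice among several; a system with stable form layouts might prefer typed columns per form.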

    Form Repopulation:

    Create functionality to generate at least three new forms using the extracted data.
    Demonstrate accurate population of these forms with the extracted information.
    Implement error handling for cases where required data is missing.
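    A minimal sketch of form population with missing-data handling, using Python's standard `string.Template`. It assumes each form is a plain-text template whose `$field` placeholders are named after the stored fields; real quote forms in PDF or XLSX would need a format-specific writer, but the missing-field check carries over unchanged.

```python
from string import Template


class MissingFieldsError(ValueError):
    """Raised when a form cannot be filled because required data is absent."""


def placeholders(template: Template) -> set[str]:
    """Collect the $name and ${name} placeholders used in a template."""
    names = set()
    for match in template.pattern.finditer(template.template):
        name = match.group("named") or match.group("braced")
        if name:  # skips escaped "$$" and invalid placeholders
            names.add(name)
    return names


def populate_form(template_text: str, data: dict) -> str:
    template = Template(template_text)
    missing = sorted(n for n in placeholders(template) if data.get(n) in (None, ""))
    if missing:
        # Report every gap at once rather than failing on the first one.
        raise MissingFieldsError("missing required fields: " + ", ".join(missing))
    return template.substitute({key: str(value) for key, value in data.items()})
```

    Checking all placeholders up front gives the user a complete list of what to supply, instead of the single `KeyError` that `substitute` would raise.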

    Integration with CAD/Microsoft Dynamics:

    Develop functionality to generate at least one form that can be passed back to CAD/Microsoft Dynamics.
    Ensure the generated form is compatible with CAD/Microsoft Dynamics file formats.
    Implement a method to transfer this form back to the CAD/Microsoft Dynamics system.
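    The bounty does not say how forms should reach CAD or Microsoft Dynamics. One low-risk route, assuming the target Dynamics environment accepts CSV files through its data-import feature, is to serialise each generated form as a CSV row whose headers match the import mapping. The column names below are hypothetical; if file import is unavailable, a direct API integration (for example the Dynamics 365 Web API) would replace this step.

```python
import csv
import io

# Hypothetical mapping from internal field names to the column headers the
# target Dynamics import expects; real headers depend on the entity and import map.
DYNAMICS_COLUMNS = {
    "customer": "Account Name",
    "part_no": "Product ID",
    "qty": "Quantity",
}


def to_dynamics_csv(records: list[dict]) -> str:
    """Serialise form records to CSV, failing fast if any mapped field is missing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(DYNAMICS_COLUMNS.values()))
    writer.writeheader()
    for i, record in enumerate(records):
        missing = [f for f in DYNAMICS_COLUMNS if record.get(f) in (None, "")]
        if missing:
            raise ValueError(f"record {i} is missing {', '.join(missing)}")
        writer.writerow({col: record[f] for f, col in DYNAMICS_COLUMNS.items()})
    return buffer.getvalue()
```

    The `csv` module handles quoting, so values such as `Acme, Inc.` survive the round trip without breaking the column layout.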

    User Interface:

    Create a user-friendly interface in Jupyter Notebook, Google Colab, or Visual Studio.
    Provide clear instructions and prompts for file upload and processing.
    Display results of data extraction and form generation clearly to the user.

    Documentation:

    Provide comprehensive documentation on how to use the system.
    Include technical documentation explaining the data extraction methods and database schema.

    Testing:

    Include a set of unit tests covering critical functions of the system.
    Provide a sample set of test files (one for each supported file type) and expected results.
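    A sketch of a table-driven unit test in the standard `unittest` framework, pairing one sample filename per supported type with its expected result. The helper under test, `file_category`, and its extension mapping are hypothetical; the same pattern applies to the real extraction functions, with sample files and expected field values in place of filenames and categories.

```python
import unittest
from pathlib import Path

# Hypothetical helper under test: maps an uploaded filename to a pipeline category.
CATEGORY_BY_EXTENSION = {".pdf": "pdf", ".xlsx": "xlsx", ".dwg": "cad", ".dxf": "cad"}


def file_category(filename: str) -> str:
    ext = Path(filename).suffix.lower()
    if ext not in CATEGORY_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {filename}")
    return CATEGORY_BY_EXTENSION[ext]


class FileCategoryTests(unittest.TestCase):
    # One sample per supported type, paired with its expected result.
    EXPECTED = {"sample.pdf": "pdf", "sample.xlsx": "xlsx", "SAMPLE.DWG": "cad"}

    def test_supported_samples(self):
        for filename, expected in self.EXPECTED.items():
            with self.subTest(filename=filename):
                self.assertEqual(file_category(filename), expected)

    def test_unsupported_type_is_rejected(self):
        with self.assertRaises(ValueError):
            file_category("notes.txt")


if __name__ == "__main__":
    # argv and exit=False let this run inside a Jupyter or Colab cell as well.
    unittest.main(argv=["ignored"], exit=False)
```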

    Performance:

    The system should process and store data from at least 10 files within 5 minutes.
    Handle files up to 50 MB in size without crashing or significant performance degradation.
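    A sketch of how the throughput and file-size targets might be approached, assuming per-file work is largely I/O-bound: files are processed concurrently with `concurrent.futures.ThreadPoolExecutor` and read in fixed-size chunks, so a 50 MB file never has to sit in memory whole. Hashing stands in for the real extraction step here; a CPU-heavy parser would be better served by `ProcessPoolExecutor`. The worker count is an arbitrary starting point.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB reads keep memory flat even for 50 MB files
TIME_BUDGET_S = 5 * 60    # "at least 10 files within 5 minutes"


def process_file(path: Path) -> str:
    """Stand-in for the real per-file work: stream the file and hash it."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_batch(paths: list[Path], workers: int = 4) -> dict:
    """Process files concurrently, collect per-file errors, and time the batch."""
    start = time.perf_counter()
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_file, p): p for p in paths}
        for future in as_completed(futures):
            key = str(futures[future])  # full path, so same-named files don't collide
            try:
                results[key] = future.result()
            except OSError as exc:  # missing or unreadable file
                errors[key] = str(exc)
    elapsed = time.perf_counter() - start
    return {"results": results, "errors": errors,
            "elapsed_s": elapsed, "within_budget": elapsed <= TIME_BUDGET_S}
```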

    Security:

    Implement basic security measures to protect uploaded files and extracted data.
    Ensure that any integration with external systems (e.g., CAD/Microsoft Dynamics) is secure.
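    For the basic security measures, one common precaution is to never trust the client-supplied filename when saving uploads. The sketch below is an assumption about how files might be stored, not a requirement stated in the bounty: it strips directory components and unusual characters, adds a random prefix, and confirms the result stays inside the upload directory. It does not cover encryption at rest or access control, which the extracted data would still need.

```python
import re
import secrets
from pathlib import Path

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def safe_storage_path(upload_dir: Path, original_name: str) -> Path:
    """Build a storage path for an upload that cannot escape upload_dir."""
    base = Path(original_name).name  # drop directory parts such as "../../"
    cleaned = _UNSAFE.sub("_", base).strip("._")
    if not cleaned:
        raise ValueError(f"unusable filename: {original_name!r}")
    root = upload_dir.resolve()
    # A random prefix prevents overwriting earlier uploads and guessing stored names.
    target = root / f"{secrets.token_hex(8)}_{cleaned[:100]}"
    if target.resolve().parent != root:  # defence in depth against traversal
        raise ValueError(f"path escapes upload directory: {original_name!r}")
    return target
```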

    Bonus Points:

    Implement a feature to handle corrupted or partially damaged files.
    Create a dashboard to visualize the extracted data and form generation status.
    Develop a method to automatically identify and categorize different types of forms or data structures within the uploaded files.
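    A sketch for the first and third bonus items, assuming detection by content rather than by filename extension: the leading "magic" bytes identify the container (PDF files begin with `%PDF-`, XLSX files are ZIP archives, and DWG files begin with a version tag such as `AC1032`), and a few cheap structural checks flag likely damage. These checks are heuristics; a file that passes them can still fail to parse, and DXF detection and deeper CAD validation are not attempted here.

```python
import io
import zipfile
import zlib
from typing import Optional


def inspect_file(data: bytes) -> tuple[str, Optional[bool]]:
    """Return (detected_type, looks_intact); None means integrity was not checked."""
    if data.startswith(b"%PDF-"):
        # A complete PDF normally ends with an %%EOF marker near the end of the file.
        return "pdf", b"%%EOF" in data[-1024:]
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
                intact = zf.testzip() is None  # CRC-checks every member
        except (zipfile.BadZipFile, zlib.error, EOFError):
            return "zip", False  # ZIP signature present but the archive is damaged
        # XLSX workbooks normally keep their main part at xl/workbook.xml.
        return ("xlsx" if "xl/workbook.xml" in names else "zip"), intact
    if data.startswith(b"AC10"):
        return "dwg", None  # version tag only; contents not validated
    return "unknown", None
```

    Files flagged as damaged could be routed to a manual-review queue instead of the extraction step, and the detected type can drive which extractor runs.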