How to Build a Legal Contract Database and AI Analyzer
What you'll build: A contract database, where you can upload a contract PDF and an LLM creates a sheet of high level terms (name, term, size) and allows for additional fields of info to be added in the future.
Introduction #
This guide will walk you through the process of creating a Python-based web application that analyzes legal contracts to extract key information, such as names, termination dates, and contract size.
Tuning the prompt to the OpenAI model can extract other fields from the uploaded contracts, it can be customized to fit your particular needs.
This guide presumes you have an OpenAI API Key available. If you don’t, you can sign up for an account and get one here: https://platform.openai.com/api-keys
Getting Started #
Step 1: Initializing the Flask Application
In a new Repl, open main.py and start by importing the necessary libraries and setting up the Flask application. Flask is a lightweight web server that will serve as the backbone of our web application.
We've also imported the required modules for handling file uploads and PDF processing, which we’ll need for later.
Step 2: Defining the OpenAI Role and Prompt
Next, we define the role and prompt for the OpenAI API. This role instructs the AI to act as a legal expert and to extract specific information from contracts:
This step also sets the expectation of the output format, if you need to make tweaks.
Step 3: Configuring Flask Settings
In this step, we configure Flask to handle file uploads and set a maximum file size limit.
Step 4: Setting Up the OpenAI API Key
To interact with the OpenAI API, you'll need an API key. You can use the “Secrets” tool inside your Repl to make it available to your application.
The secret should be called OPENAI_API_KEY
Step 5: Creating the Index Route
We define a route for the home page where users can upload their PDF contracts.
You’ll also need some HTML to put into that index.html:
You’ll notice that we’re using {% if gpt_response %} to interpolate our responses from ChatGPT, since this file gets evaluated using the jinja2 templating engine: https://jinja.palletsprojects.com/en/3.1.x/
Step 6: Handling File Uploads
The HTML we’ve entered so far also expects to be able to upload a file to the backend for processing. We’ll need another route for that, so let’s add it here:
This step ensures that only valid PDF files are processed.
Step 7: Extracting Text from the PDF
With the PDF saved, we use PyPDF2 to text from the file. Add this to the end of the upload_file function:
Once we’ve got the text, we can use the openai python library to process it
Step 8: Preparing the OpenAI API Request
Since we set the ROLE and PROMPT variables at the top of main.py, we can use them here to set the intent and desired response from ChatGPT:
Step 9: Calling the OpenAI API
Once we’ve prepared the messages we’re sending off, we can fire off a request to OpenAI, destructure the result, and display it back to the user:
That’s the end of the upload_file() function!
Step 10: Running the Application
Finally, need to set our code up to run on a particular port. Add this at the end of main.py:
Once your main.py is finished, simply hit the "Run" button in your Repl and watch your web server come to life!
Conclusion #
- By following these steps, you now have a fully functional web application that can extract critical information from legal contracts. This tool can be invaluable for quickly identifying key details in lengthy documents.
- Consider modifying the above template to extract additional details about your contract and save to a database or spreadsheet for future access and review.
- Give it a try and see how it can streamline your legal document analysis process!