How to Enrich Target Accounts with AI

Introduction #
Revenue Operations and Marketing teams spend tens of thousands of dollars a year on data enrichment, and the process is slow. This guide will show you how to launch an app that can enrich data on any list of target accounts and companies, 100x faster and 1,000x cheaper.
Using Replit and OpenAI’s API, you’ll create a tool that allows users to upload a CSV file containing company website domains and enrich it with custom information extracted from those companies’ websites, using the power of large language models.
Example use cases:
- Check the company's website to see if they have a blog
- Infer what industry the company is in based on their website
- Understand what primary product or service is offered by the company
- Analyze language used by competitors
The web app can be invaluable for market research, lead generation, and competitive analysis tasks. The possibilities are endless!
By the end of this guide, you'll have a web application that can:
- Accept CSV file uploads containing company domains
- Scrape the websites of these companies
- Use OpenAI to generate enriched information based on the scraped content
- Add the enriched data back into your CSV!
Note: You will need a Replit Core or Teams account to deploy the web application.
Getting Started #
To access the code for this project, fork the template by heading to this link and clicking the green Fork button to bring the starter code into your account.
Set Up Your OpenAI API Account
Before we begin with the Replit project, you'll need to set up your OpenAI API account. You can follow the steps here: OpenAI Developer Quickstart. To summarize:
Create an OpenAI Account:
- Visit the OpenAI website and sign up for an account if you don't already have one.
Sign In to Your Account:
- Once you have an account, log in using your credentials.
Navigate to API Keys:
- After logging in, go to the OpenAI Platform.
- Click on the dashboard button on the top bar.
- Then select “API keys” from the options on the left side of the screen.
Create a New API Key:
- On the API Keys page, you'll see an option to create a new API key. Click on "Create new secret key."
- The system will generate a new API key for you. Be sure to copy and securely store this key, as it will only be shown to you once.

Secure Your API Key:
- Treat your API key like a password. Do not share it publicly or hard-code it into your applications. You’ll use Replit’s Secrets tool to securely manage it.
- Copy the secret key and add it to Replit's Secrets tab as OPENAI_API_KEY.
Set Up Your Replit Project
- Fork the template linked at the start of this guide, or by following this link
- In your forked Repl, keep your OpenAI API key safe by adding it to the Secrets tool. In the bottom-left corner of the Replit workspace, there is a section called "Tools." Select "Secrets" within the Tools pane, and add your OpenAI API key as a secret labeled OPENAI_API_KEY (a quick way to verify this is shown below).
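Before running the app, you can quickly verify that the secret is wired up. Here’s a minimal sketch you could run from the Shell or a scratch file in your Repl; it assumes the OpenAI Python SDK that this template already uses:

```python
import os

from openai import AsyncOpenAI

# Replit exposes Secrets as environment variables, so the key added in the
# Secrets tool should be visible here.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it in the Secrets tool.")

# The OpenAI client picks up OPENAI_API_KEY from the environment automatically
# when no api_key argument is passed, which is exactly how the template uses it.
client = AsyncOpenAI()
print("OPENAI_API_KEY found; OpenAI client initialized.")
```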

Running Your Application
- Click the "Run" button in your Repl to start the Flask application.
- You should see a "Serving Flask app 'main'" message and other debugging logs in the Console.
- Click on “New tab” within the Webview tab to open the application in a new browser window.

Deploying to Production #
Setting Up Replit Deployment
In order to use your company enrichment tool going forward, you’ll want to deploy it on Replit (as a reminder, deploying an application requires a Replit Core or Replit Teams subscription).
Open a new tab in the Replit workspace and search for “Deployments” (or find “Deployments” in the Tools section near the bottom of the left-hand pane).
In the Deployments tab, choose “Autoscale” as your deployment type.

Keep the default deployment settings. They should provide more than enough resources for now, and you can always come back and edit them later.
Click the blue Deploy button to start your Autoscale deployment.
Let it run until it is complete and you get a new production URL.

Your company enrichment web app is now deployed and ready to handle your prompts!
Using the Company Enrichment CSV Analyzer #
To use your deployed company enrichment CSV analyzer:
- Open the application URL in a web browser.

- Upload a CSV file containing a list of company domains (if you need a test file, see the sketch at the end of this list).
- Enter the name of the column header in your CSV that contains the company domains.
- Enter the name that you want the new column with enriched data to be called.
- Enter a prompt for the information you want to extract about each company, for example:
- “Fill in the primary product or service this company provides”
- “Infer what industry this company is in”
- “If available, fill in the state the company is headquartered in. Format the state as its 2-letter code, for example use ‘OR’ for Oregon.”
- Click the "Analyze Company Data" button.
- Wait for the processing to complete. The enriched results will be displayed on the page.
- To download your CSV with the new enriched data column, click the green “Download results” button.
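If you don’t yet have a list of domains to try, here’s a small, purely illustrative script you can run in your Repl to generate a test file. The “Website” header matches the app’s default domain column name, and the companies listed are just placeholders:

```python
import csv

# Purely illustrative sample data; swap in your own target accounts.
rows = [
    {"Company": "Replit", "Website": "replit.com"},
    {"Company": "OpenAI", "Website": "openai.com"},
]

with open("sample_accounts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Company", "Website"])
    writer.writeheader()
    writer.writerows(rows)

print("Wrote sample_accounts.csv")
```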
Customizing the Application (Optional) #
By now you should have a working web application. If you’d like to understand what’s happening in the code and potential ways to extend or remix this template, the following sections cover that. Feel free to skip to the Conclusion if you are happy with the application as is.
Understanding The Code
The main components of the code are:
- main.py: The main Flask application file
- templates/index.html: The HTML template for the web interface
Let's break down the key parts of main.py.
Importing Dependencies
```python
import asyncio
import csv
import io
import os

import aiohttp
import markdown
from bs4 import BeautifulSoup
from flask import Flask, Response, flash, jsonify, render_template, request, session
from openai import AsyncOpenAI
from werkzeug.utils import secure_filename
from flask_session import Session
```
This section imports all the necessary libraries, including Flask for the web application, BeautifulSoup for web scraping, aiohttp for making asynchronous HTTP requests, and the async OpenAI client.
Flask App Configuration
```python
client = AsyncOpenAI()

app = Flask(__name__)
# Limit upload size to 16MB
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024
app.config["SESSION_TYPE"] = "filesystem"
app.secret_key = os.urandom(24)
Session(app)
```
Here, we configure the Flask app: we cap uploads at 16MB, enable filesystem-backed sessions, and set a random secret key for session management. We also initialize the async OpenAI client.
Web Scraping Function
```python
async def scrape_all_domains(domains):

    async def scrape_domain(domain, session):

        async def get_page_content(url):
            try:
                if not (url.startswith("https://")
                        or url.startswith("http://")):
                    url = "https://" + url
                async with session.get(url, timeout=5) as response:
                    if response.status == 200:
                        text = await response.text()
                        soup = BeautifulSoup(text, "html.parser")
                        for script in soup(["script", "style"]):
                            script.decompose()
                        text = soup.get_text()
                        lines = (line.strip() for line in text.splitlines())
                        chunks = (phrase.strip() for line in lines
                                  for phrase in line.split(" "))
                        text = " ".join(chunk for chunk in chunks if chunk)
                        return text[:1000]
                    else:
                        return f"Failed to access the website. Status code: {response.status}"
            except Exception as e:
                return f"Error occurred while scraping the page: {str(e)}"

        home_content = await get_page_content(f"https://{domain}")
        about_content = await get_page_content(f"https://{domain}/about")
        # Check if about page failed
        if about_content.startswith("Failed to access"):
            return f"Home page content:\n{home_content}"
        else:
            return f"Home page content:\n{home_content}\n\nAbout page content:\n{about_content}"

    async with aiohttp.ClientSession() as session:
        tasks = [scrape_domain(domain, session) for domain in domains]
        return await asyncio.gather(*tasks)
```
This function takes a list of domains, scrapes each company's homepage and /about page, and returns the text content from those pages. We use asyncio so multiple sites can be scraped concurrently, which significantly reduces the time spent scraping.
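If you want to see exactly what text the model will receive, you can exercise the scraper on its own. Here’s a quick, hypothetical check that assumes you run it next to main.py in the same Repl; the domains are placeholders:

```python
import asyncio

from main import scrape_all_domains  # assumes this script sits next to main.py

domains = ["replit.com", "openai.com"]


async def preview():
    pages = await scrape_all_domains(domains)
    # Results come back in the same order as the input list.
    for domain, content in zip(domains, pages):
        print(f"{domain}: {content[:120]}...")


asyncio.run(preview())
```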
CSV Processing and OpenAI API Integration:
```python
async def process_csv_with_llm(file_content, prompt, domain_column,
                               response_column):
    results = []
    output = io.StringIO()
    csv_writer = csv.writer(output)
    domains = []
    rows = []

    # Read CSV and collect domains
    csv_file = io.StringIO(file_content)
    csv_reader = csv.reader(csv_file)
    csv_headers = next(csv_reader)
    csv_writer.writerow(csv_headers + [response_column])
    index = csv_headers.index(domain_column)
    for row in csv_reader:
        domain = row[index]
        domains.append(domain)
        rows.append(row)

    # Scrape all domains in parallel
    scraped_data = await scrape_all_domains(domains)

    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": "Bearer " + os.environ.get("OPENAI_API_KEY", ""),
    }

    # Make GPT calls for each domain asynchronously
    async def process_gpt(row, domain, scraped_content):
        result = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "system",
                "content": "Be concise. Give 1 sentence responses in plain text. Respond with factual information from the content given. Do not use first-person pronouns."
            }, {
                "role": "user",
                "content": f"Here is information about {domain}:\n{scraped_content}\n\nRespond to the following query in 1 sentence: {prompt}"
            }])
        result = result.choices[0].message.content.strip()
        return {"row": row, "domain": domain, "result": result}

    # Process all domains concurrently
    tasks = [
        process_gpt(row, domain, content)
        for row, domain, content in zip(rows, domains, scraped_data)
    ]
    results = await asyncio.gather(*tasks)

    # Now, write to CSV synchronously
    output = io.StringIO()
    csv_writer = csv.writer(output)
    csv_writer.writerow(csv_headers + [response_column])
    formatted_results = []
    for item in results:
        row = item['row'] + [item['result']]
        csv_writer.writerow(row)
        formatted_results.append({
            "domain": item['domain'],
            "result": markdown.markdown(item['result'])
        })
    return formatted_results, output.getvalue()
```
This function reads the CSV file, scrapes the website content for each domain, and sends it to the OpenAI API along with your prompt. It returns the enriched result for each domain, along with a new CSV that includes the response column.
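You can also call this function directly and skip the web UI entirely. The sketch below assumes OPENAI_API_KEY is set and that the script sits next to main.py; the CSV contents and column names are just examples:

```python
import asyncio

from main import process_csv_with_llm  # assumes this script sits next to main.py

csv_text = "Company,Website\nReplit,replit.com\nOpenAI,openai.com\n"

results, enriched_csv = asyncio.run(
    process_csv_with_llm(
        file_content=csv_text,
        prompt="Infer what industry this company is in",
        domain_column="Website",
        response_column="Industry",
    ))

print(results)       # list of {"domain": ..., "result": ...} dicts shown on the page
print(enriched_csv)  # the original CSV with the new "Industry" column appended
```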
File Upload and Processing Route:
```python
@app.route("/upload", methods=["POST"])
async def upload_file():
    session.clear()

    # Check if the post request has the file part
    if "csv_file" not in request.files:
        return jsonify({"error": "No file part in the request."}), 400
    file = request.files["csv_file"]

    # Check if a file was selected
    if not file.filename:
        return jsonify({"error": "No file selected for uploading."}), 400

    # Check if the file is a CSV file
    if not file.filename.endswith(".csv"):
        return jsonify(
            {"error": "Invalid file type. Please upload a CSV file."}), 400

    # Check that a prompt was provided
    prompt = request.form.get("prompt", "")
    if not prompt:
        return jsonify({"error": "No prompt provided."}), 400
    domain_column = request.form.get("column_name", "Website")
    response_column = request.form.get("response_column", "LLM Response")

    try:
        file_content = file.read().decode('utf-8')
        session['uploaded_file'] = file_content
        session['file_name'] = secure_filename(file.filename)

        # Process the CSV file with AI
        results, csv_content = await process_csv_with_llm(
            file_content, prompt, domain_column, response_column)

        # Store the CSV content in the session
        session["csv_content"] = csv_content
        return jsonify(results)
    except Exception as e:
        return jsonify({"error": str(e)}), 500
```
This route handles file uploads, validates the CSV file, processes it using the OpenAI API, and returns the enriched results.
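For a quick end-to-end test of the route, you can hit a locally running instance (or your deployment URL) from a script. This is a hedged sketch using the requests library; the base URL and the sample file from earlier are assumptions you should adjust:

```python
import requests  # assumes the requests package is installed

BASE_URL = "http://localhost:5000"  # or your deployment URL

# Use a Session so the Flask session cookie is kept; /download_csv relies on it.
http = requests.Session()

with open("sample_accounts.csv", "rb") as f:
    resp = http.post(
        f"{BASE_URL}/upload",
        files={"csv_file": ("sample_accounts.csv", f, "text/csv")},
        data={
            "prompt": "Infer what industry this company is in",
            "column_name": "Website",
            "response_column": "Industry",
        },
    )
print(resp.status_code, resp.json())

# Fetch the enriched CSV produced by the upload above.
csv_resp = http.get(f"{BASE_URL}/download_csv")
with open("results.csv", "wb") as out:
    out.write(csv_resp.content)
```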
File Download Route:
```python
@app.route("/download_csv", methods=["GET"])
def download_csv():
    csv_content = session.get("csv_content", "")
    if not csv_content:
        return jsonify({"error": "No CSV content available"}), 400
    output = io.StringIO(csv_content)
    return Response(
        output.getvalue(),
        mimetype="text/csv",
        headers={"Content-disposition": "attachment; filename=results.csv"},
    )
```
This route handles downloading the modified CSV file. It pulls the new CSV content from the session (so it relies on the same browser session that performed the upload) and serves it as an attachment named results.csv.
Main Application Entry Point:
```python
if __name__ == "__main__":

    @app.errorhandler(413)
    def request_entity_too_large(_error):
        flash("File too large. Please upload a file smaller than 16MB.",
              "error")
        return render_template("index.html"), 413

    app.run(host="0.0.0.0", port=5000, debug=False)
```
This section registers an error handler for uploads that exceed the 16MB limit and starts the Flask development server on port 5000, listening on all network interfaces.
Understanding these components will help you navigate and customize the application to suit your specific needs. For example, you could modify the scrape_all_domains() function to target different pages or extract specific information from the websites. Similarly, you could adjust the OpenAI API prompt in process_csv_with_llm() to get different types of enriched information about the companies.
Remixing or Extending the Application
Using a different LLM
The web app could just as easily use a different LLM, such as Anthropic’s Claude 3.5 Sonnet or Google’s Gemini. While full instructions for converting the app to those APIs are beyond the scope of this guide, the change is relatively straightforward, and a rough sketch of what the swap might look like follows the hint below.
HINT: use Replit’s built-in AI Chat feature to ask the AI how to convert the app to work with your preferred LLM.
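As a starting point, here is a rough sketch of what the model call inside process_gpt might look like with Anthropic’s Python SDK instead; the anthropic package, the ANTHROPIC_API_KEY secret, and the model alias are assumptions, so check Anthropic’s docs for current details:

```python
import asyncio

from anthropic import AsyncAnthropic  # assumes the anthropic package is installed

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


async def enrich(domain: str, scraped_content: str, prompt: str) -> str:
    """Hypothetical drop-in replacement for the OpenAI call in process_gpt."""
    message = await client.messages.create(
        model="claude-3-5-sonnet-latest",  # model alias; adjust as needed
        max_tokens=200,
        system="Be concise. Give 1 sentence responses in plain text. "
               "Respond with factual information from the content given.",
        messages=[{
            "role": "user",
            "content": f"Here is information about {domain}:\n{scraped_content}\n\n"
                       f"Respond to the following query in 1 sentence: {prompt}",
        }],
    )
    return message.content[0].text.strip()


print(asyncio.run(enrich("replit.com", "Replit is a cloud coding platform.",
                         "Infer what industry this company is in")))
```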
Adding additional functionality
The app could easily be extended to add additional functionality. Some ideas here might include:
- Adding a column in which the AI expresses its confidence in each result. For example, if you prompted the AI to find the state each company is headquartered in, you could add a column next to the state column showing, on a percentage basis, how confident the model is that the state it filled in is accurate (see the sketch after this list).
- Convert the app to analyze people (leads or contacts) instead of companies
- A more advanced user could create an integration between this app and a CRM like Salesforce or HubSpot, and have the analysis trigger in near real-time upon company/account creation
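For the confidence-score idea above, one hedged approach (not part of the template) is to ask the model for structured JSON and split the response into two columns:

```python
import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def enrich_with_confidence(domain: str, scraped_content: str, prompt: str):
    """Hypothetical variant of process_gpt that also returns a confidence score."""
    result = await client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # ask for JSON-only output
        messages=[{
            "role": "system",
            "content": 'Respond only with JSON of the form '
                       '{"answer": "<one sentence>", "confidence": <0-100>}.',
        }, {
            "role": "user",
            "content": f"Here is information about {domain}:\n{scraped_content}\n\n{prompt}",
        }],
    )
    data = json.loads(result.choices[0].message.content)
    # These two values would populate two new CSV columns instead of one.
    return data["answer"], data["confidence"]


answer, confidence = asyncio.run(
    enrich_with_confidence("replit.com", "Replit is a cloud coding platform.",
                           "Infer what industry this company is in"))
print(answer, confidence)
```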
We’re sure you’ll have 1,000 more ideas of your own! Feel free to take our template and run with it!
Conclusion #
You now have a powerful target account enrichment tool that can automatically gather and analyze information about multiple companies using their website content. This tool can significantly speed up research and analysis tasks for various business applications.
Some potential use cases for this app include:
- Generating company profiles for lead qualification
- Analyzing competitors' product offerings
- Identifying industry trends across multiple company websites
Feel free to further customize and expand the functionality of your app to suit your specific needs. Happy enriching!