Find Duplicate HubSpot Deals with AI

Introduction #
Identifying duplicate deals in your HubSpot CRM is crucial for maintaining clean data and accurate reporting. This template fetches deals via the HubSpot API, analyzes them for potential duplicates using OpenAI, and exports the results to a CSV file.
Using our template, you can have a HubSpot Deal duplicate finder running and exporting data to a CSV file in less than 5 minutes.
Getting started #
To get started, head to the template below and click "Use template":
Create a Private app in HubSpot #
Log in to your HubSpot account and navigate to Settings > Integrations > Private Apps.

Click "Create a private app" and set a name for your app.

Set the necessary scopes (you'll need access to deals, contacts, companies, and owners).

Once your scopes are set, click "Create app". You'll receive a private app access token.
To keep this secret safe, add it to the Secrets tab in your Repl. In the bottom-left corner of the Replit workspace, there is a section called "Tools."
Select "Secrets" within the Tools pane, and add your Private App access token to the Secret labeled HUBSPOT_ACCESS_TOKEN.
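Replit exposes Secrets to your program as environment variables, so any script in your Repl can read the token through `os.environ`. A minimal sketch (the `get_hubspot_token` helper is illustrative, not part of the template):

```python
import os

def get_hubspot_token(env=os.environ):
    """Return the HubSpot private app token; Replit injects Secrets as env vars."""
    token = env.get("HUBSPOT_ACCESS_TOKEN")
    if token is None:
        raise RuntimeError("HUBSPOT_ACCESS_TOKEN is not set - add it in the Secrets tab")
    return token
```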

Set up OpenAI API access #
Log in to your OpenAI developer portal and navigate to API keys. Click the "Create new secret key" button in the top-right corner, then click it again in the modal after naming your key.

Copy the secret key to your clipboard. Again, to keep this secret safe, let’s also add it to the Secrets tab in your Repl. Head back to the bottom-left corner of the Replit workspace, to "Tools."
Select "Secrets" within the Tools pane, and add your API key to the Secret labeled OPENAI_API_KEY.
Your Secrets tab should look like this after adding both your HubSpot private app access token and your OpenAI API key.
Running your duplicate finder #
Once you have your application configured and tokens set, you can hit Run (⌘ + Enter) in Replit to execute main.py. The script will:
1. Prompt you to enter the number of deals to fetch from HubSpot
2. Fetch and process the deals
3. Identify potential duplicates using OpenAI's GPT model
4. Export the results to a CSV file
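Tied together, the four steps amount to roughly this flow. This is a hypothetical sketch of the orchestration; the template's actual main.py wires its own modules together directly rather than taking functions as parameters:

```python
import asyncio

async def run_duplicate_finder(limit, fetch, compare, export):
    # 1-2. Fetch and process up to `limit` deals
    deals = await fetch(limit)
    # 3. Compare each deal against every other deal and keep likely duplicates
    for i, deal in enumerate(deals):
        results = [await compare(deal, other)
                   for j, other in enumerate(deals) if j != i]
        deal["potential_duplicates"] = [r for r in results if r["percentage"] > 50]
    # 4. Export the annotated deals to a CSV file
    export(deals)
    return deals
```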
Here's a breakdown of the main components:
Fetching deals from HubSpot
The fetch_deals() function in hubspot_client.py uses the HubSpot API to retrieve deals:
```python
async def fetch_deals(limit):
    all_deals = []
    after = None
    while len(all_deals) < limit:
        try:
            batch_limit = min(100, limit - len(all_deals))
            search_request = {
                "sorts": [
                    {
                        "propertyName": "createdate",
                        "direction": "DESCENDING"
                    }
                ],
                "properties": ["dealname", "amount", "closedate", "createdate", "dealstage", "hubspot_owner_id"],
                "limit": batch_limit
            }
            if after:
                search_request["after"] = after
            deals_page = await api_call(
                hubspot_client.crm.deals.search_api.do_search,
                public_object_search_request=search_request
            )
            all_deals.extend(deals_page.results)
            if not deals_page.paging:
                break
            after = deals_page.paging.next.after
        except ApiException as e:
            print(f"Exception when fetching deals: {e}\n")
            break
    return all_deals[:limit]
```
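The cursor-based pagination in `fetch_deals()` can be exercised on its own. Below is a self-contained sketch of the same loop against a fake paged API; the `fetch_all` helper and the dict-shaped pages are illustrative, not the HubSpot SDK's objects:

```python
async def fetch_all(limit, search, page_size=100):
    """Collect up to `limit` results from a cursor-paginated `search` callable."""
    results, after = [], None
    while len(results) < limit:
        page = await search(after=after, limit=min(page_size, limit - len(results)))
        results.extend(page["results"])
        # Advance the cursor; stop when the API reports no further pages
        after = page.get("paging", {}).get("next", {}).get("after")
        if not after:
            break
    return results[:limit]
```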
Identifying potential duplicates
The compare_deals() function in duplicate_finder.py uses OpenAI's GPT model to compare pairs of deals:
```python
async def compare_deals(deal, other_deal):
    company_name = get_company_name(deal)
    other_company_name = get_company_name(other_deal)

    # Pre-check to avoid unnecessary API calls
    if company_name != other_company_name:
        return {
            "deal": other_deal['name'],
            "company": other_company_name,
            "percentage": 0,
            "explanation": "Different companies"
        }
    if deal['close_date'] != other_deal['close_date'] and deal['amount'] != other_deal['amount']:
        return {
            "deal": other_deal['name'],
            "company": other_company_name,
            "percentage": 0,
            "explanation": "Neither close date nor amount match"
        }

    prompt = f"""
Deal 1:
- Name: {deal['name']}
- Company: {company_name}
- Close Date: {deal['close_date']}
- Amount: {deal['amount']}
- Owner: {deal['owner_email']}

Deal 2:
- Name: {other_deal['name']}
- Company: {other_company_name}
- Close Date: {other_deal['close_date']}
- Amount: {other_deal['amount']}
- Owner: {other_deal['owner_email']}

Are these deals potential duplicates?
A deal is considered a duplicate if and only if ALL of these conditions are true:
1. The company names are exactly the same AND
2. EITHER the close dates are exactly the same OR the amounts are exactly the same
If these conditions are not ALL met, the deals are not duplicates.

Respond ONLY in the following format - no other format is a valid response from you:
Percentage: [100 if all conditions are met, 0 if not]
Explanation: [Brief explanation of why it is or isn't a duplicate based on the specific criteria]
"""

    max_retries = 5
    for attempt in range(max_retries):
        try:
            await rate_limit()
            completion = await client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {
                        "role": "system",
                        "content": """
You are a strict duplicate deal identifier. You MUST follow the given criteria exactly.
Always respond in the exact format specified.
"""
                    },
                    {
                        "role": "user",
                        "content": prompt
                    }
                ]
            )
            result = completion.choices[0].message.content.strip()
            percentage, explanation = parse_response(result)
            if percentage > 50:
                print('Comparing', deal['name'], 'from', company_name, 'and', '\n',
                      other_deal['name'], 'from', other_company_name, '\n', result, '\n')
            return {
                "deal": other_deal['name'],
                "company": other_company_name,
                "percentage": percentage,
                "explanation": explanation
            }
        except Exception as e:
            if hasattr(e, 'status_code') and e.status_code == 429:
                if attempt < max_retries - 1:
                    wait_time = (2 ** attempt) + 1  # exponential backoff
                    print(f"Rate limit error. Retrying in {wait_time} seconds...")
                    await asyncio.sleep(wait_time)
                else:
                    print("Max retries reached. Skipping this comparison.")
                    return {
                        "deal": other_deal['name'],
                        "company": other_company_name,
                        "percentage": 0,
                        "explanation": "Error: Unable to compare due to rate limiting"
                    }
            else:
                print(f"Error comparing deals: {e}")
                return {
                    "deal": other_deal['name'],
                    "company": other_company_name,
                    "percentage": 0,
                    "explanation": f"Error: {str(e)}"
                }

def get_company_name(deal):
    if deal['associatedCompanies']:
        return deal['associatedCompanies'][0]['company_name']
    return "Unknown"

def parse_response(response):
    percentage_match = re.search(r'Percentage:\s*(\d+)', response)
    explanation_match = re.search(r'Explanation:\s*(.+)', response)
    percentage = int(percentage_match.group(1)) if percentage_match else 0
    explanation = explanation_match.group(1) if explanation_match else "No explanation provided"
    return percentage, explanation
```
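The retry loop above is a standard exponential-backoff pattern. Stripped of the OpenAI specifics, it can be sketched generically; the `with_backoff` helper and its `scale` parameter are illustrative, not part of the template:

```python
import asyncio

async def with_backoff(call, max_retries=5, scale=1.0):
    """Retry an async callable on HTTP 429 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception as e:
            is_rate_limit = getattr(e, "status_code", None) == 429
            if is_rate_limit and attempt < max_retries - 1:
                wait_time = ((2 ** attempt) + 1) * scale  # 2, 3, 5, 9, ... seconds
                await asyncio.sleep(wait_time)
            else:
                raise
```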
Note: This part of the script calls the OpenAI API, which will incur a cost on your OpenAI account. To increase the accuracy of duplicate detection, swap the `model="gpt-3.5-turbo"` line for a more powerful OpenAI model, or edit the prompt to be more specific to your goals.
Exporting results to CSV
The export_to_csv function in csv_exporter.py writes the results to a CSV file:
```python
def export_to_csv(deals, filename="deal_duplicates.csv"):
    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ['ID', 'Name', 'Company', 'Close Date', 'Amount', 'Stage', 'Owner',
                      'Potential %', 'Top Duplicate Deal', 'Top Duplicate Deal - Company',
                      'Potential Duplicates']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for deal in deals:
            top_duplicate = max(deal['potential_duplicates'], key=lambda x: x['percentage']) if deal['potential_duplicates'] else None
            writer.writerow({
                'ID': deal['id'],
                'Name': deal['name'],
                'Company': get_company_name(deal),
                'Close Date': deal['close_date'],
                'Amount': deal['amount'],
                'Stage': deal.get('stage', 'Unknown'),
                'Owner': deal['owner_email'],
                'Potential %': deal['duplicate_percentage'],
                'Top Duplicate Deal': top_duplicate['deal'] if top_duplicate else '',
                'Top Duplicate Deal - Company': top_duplicate['company'] if top_duplicate else '',
                'Potential Duplicates': format_potential_duplicates(deal['potential_duplicates'])
            })
```
This script will generate a CSV with the following columns:
- ID
- Name
- Company
- Close Date
- Amount
- Stage
- Owner
- Potential %
- Top Duplicate Deal
- Top Duplicate Deal - Company
- Potential Duplicates
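Once the CSV exists, you can slice it further with a few lines of Python, for example to keep only rows with a duplicate score above a threshold. A small sketch using the column names above (the `likely_duplicates` helper is illustrative, not part of the template):

```python
import csv

def likely_duplicates(path, threshold=50):
    """Return rows from the exported CSV whose 'Potential %' meets the threshold."""
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows if float(r['Potential %'] or 0) >= threshold]
```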
Tip: If you want to edit your script or the final columns that will be written to your CSV file, ask Replit AI to make some edits or suggestions.
What's next #
Now that you have a working HubSpot Deal duplicate finder, you could consider the following enhancements:
- Add a web interface to trigger the duplicate finding process and display results.
- Implement more sophisticated duplicate detection algorithms.
- Add functionality to automatically merge or resolve duplicate deals in HubSpot.
- Set up scheduled runs to regularly check for duplicates.
- Integrate with other CRM systems or data sources for more comprehensive duplicate detection.
- Use the exported CSV data to create visualizations or reports on duplicate deals.
If you’d like to speak with the Replit team about how you can make the best use of Replit, feel free to schedule some time here. We can walk through your use case and point out the best ways to implement it.
Happy coding!