Find Duplicate HubSpot Deals with AI

Introduction #
Identifying duplicate deals in your HubSpot CRM is crucial for maintaining clean data and accurate reporting. This template fetches deals via the HubSpot API, analyzes them for potential duplicates using OpenAI, and exports the results to a CSV file.
Using our template, you can have a HubSpot Deal duplicate finder running and exporting data to a CSV file in less than 5 minutes.
Getting started #
To get started, head to the template below and click "Use template":
Create a Private app in HubSpot #
Log in to your HubSpot account and navigate to Settings > Integrations > Private Apps.

Click "Create a private app" and set a name for your app.

Set the necessary scopes (you'll need access to deals, contacts, companies, and owners).

Once your scopes are set, click "Create app". You'll receive a private app access token.
To keep this secret safe, add it to the Secrets tab in your Repl. In the bottom-left corner of the Replit workspace, there is a section called "Tools."
Select "Secrets" within the Tools pane, and add your Private App access token to the Secret labeled HUBSPOT_ACCESS_TOKEN.
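Replit exposes Secrets to your program as environment variables, so any script in your Repl can read the token through `os.environ`. A minimal sketch (the `get_hubspot_token` helper is illustrative, not part of the template):

```python
import os

def get_hubspot_token(env=os.environ):
    """Return the HubSpot private app token; Replit injects Secrets as env vars."""
    token = env.get("HUBSPOT_ACCESS_TOKEN")
    if token is None:
        raise RuntimeError("HUBSPOT_ACCESS_TOKEN is not set - add it in the Secrets tab")
    return token
```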

Set up OpenAI API access #
Log in to your OpenAI developer portal and navigate to API keys. Click the "Create new secret key" button in the top-right corner, then click it again in the modal after naming your key.

Copy the secret key to your clipboard. Again, to keep this secret safe, let’s also add it to the Secrets tab in your Repl. Head back to the bottom-left corner of the Replit workspace, to "Tools."
Select "Secrets" within the Tools pane, and add your API key to the Secret labeled OPENAI_API_KEY.
Your Secrets tab should look like this after adding both your HubSpot private app access token and your OpenAI API key.
Running your duplicate finder #
Once you have your application configured and tokens set, you can hit Run (⌘ + Enter) in Replit to execute main.py. The script will:
1. Prompt you to enter the number of deals to fetch from HubSpot
2. Fetch and process the deals
3. Identify potential duplicates using OpenAI's GPT model
4. Export the results to a CSV file
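Tied together, the four steps amount to roughly this flow. This is a hypothetical sketch of the orchestration; the template's actual main.py wires its own modules together directly rather than taking functions as parameters:

```python
import asyncio

async def run_duplicate_finder(limit, fetch, compare, export):
    # 1-2. Fetch and process up to `limit` deals
    deals = await fetch(limit)
    # 3. Compare each deal against every other deal and keep likely duplicates
    for i, deal in enumerate(deals):
        results = [await compare(deal, other)
                   for j, other in enumerate(deals) if j != i]
        deal["potential_duplicates"] = [r for r in results if r["percentage"] > 50]
    # 4. Export the annotated deals to a CSV file
    export(deals)
    return deals
```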
Here's a breakdown of the main components:
Fetching deals from HubSpot
The fetch_deals() function in hubspot_client.py uses the HubSpot API to retrieve deals:
```python
async def fetch_deals(limit):
    all_deals = []
    after = None
    while len(all_deals) < limit:
        try:
            batch_limit = min(100, limit - len(all_deals))
            search_request = {
                "sorts": [
                    {
                        "propertyName": "createdate",
                        "direction": "DESCENDING"
                    }
                ],
                "properties": ["dealname", "amount", "closedate", "createdate", "dealstage", "hubspot_owner_id"],
                "limit": batch_limit
            }
            if after:
                search_request["after"] = after
            deals_page = await api_call(
                hubspot_client.crm.deals.search_api.do_search,
                public_object_search_request=search_request
            )
            all_deals.extend(deals_page.results)
            if not deals_page.paging:
                break
            after = deals_page.paging.next.after
        except ApiException as e:
            print(f"Exception when fetching deals: {e}\n")
            break
    return all_deals[:limit]
```
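The cursor-based pagination in `fetch_deals()` can be exercised on its own. Below is a self-contained sketch of the same loop against a fake paged API; the `fetch_all` helper and the dict-shaped pages are illustrative, not the HubSpot SDK's objects:

```python
async def fetch_all(limit, search, page_size=100):
    """Collect up to `limit` results from a cursor-paginated `search` callable."""
    results, after = [], None
    while len(results) < limit:
        page = await search(after=after, limit=min(page_size, limit - len(results)))
        results.extend(page["results"])
        # Advance the cursor; stop when the API reports no further pages
        after = page.get("paging", {}).get("next", {}).get("after")
        if not after:
            break
    return results[:limit]
```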
Identifying potential duplicates
The compare_deals() function in duplicate_finder.py uses OpenAI's GPT model to compare pairs of deals:
```python
async def compare_deals(deal, other_deal):
    company_name = get_company_name(deal)
    other_company_name = get_company_name(other_deal)

    # Pre-check to avoid unnecessary API calls
    if company_name != other_company_name:
        return {
            "deal": other_deal['name'],
            "company": other_company_name,
            "percentage": 0,
            "explanation": "Different companies"
        }
    if deal['close_date'] != other_deal['close_date'] and deal['amount'] != other_deal['amount']:
        return {
            "deal": other_deal['name'],
            "company": other_company_name,
            "percentage": 0,
            "explanation": "Neither close date nor amount match"
        }

    prompt = f"""
Deal 1:
- Name: {deal['name']}
- Company: {company_name}
- Close Date: {deal['close_date']}
- Amount: {deal['amount']}
- Owner: {deal['owner_email']}

Deal 2:
- Name: {other_deal['name']}
- Company: {other_company_name}
- Close Date: {other_deal['close_date']}
- Amount: {other_deal['amount']}
- Owner: {other_deal['owner_email']}

Are these deals potential duplicates?
A deal is considered a duplicate if and only if ALL of these conditions are true:
1. The company names are exactly the same AND
2. EITHER the close dates are exactly the same OR the amounts are exactly the same
If these conditions are not ALL met, the deals are not duplicates.

Respond ONLY in the following format - no other format is a valid response from you:
Percentage: [100 if all conditions are met, 0 if not]
Explanation: [Brief explanation of why it is or isn't a duplicate based on the specific criteria]
"""

    max_retries = 5
    for attempt in range(max_retries):
        try:
            await rate_limit()
            completion = await client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {
                        "role": "system",
                        "content": """
You are a strict duplicate deal identifier. You MUST follow the given criteria exactly.
Always respond in the exact format specified.
"""
                    },
                    {
                        "role": "user",
                        "content": prompt
                    }
                ]
            )
            result = completion.choices[0].message.content.strip()
            percentage, explanation = parse_response(result)
            if percentage > 50:
                print('Comparing', deal['name'], 'from', company_name, 'and', '\n',
                      other_deal['name'], 'from', other_company_name, '\n', result, '\n')
            return {
                "deal": other_deal['name'],
                "company": other_company_name,
                "percentage": percentage,
                "explanation": explanation
            }
        except Exception as e:
            if hasattr(e, 'status_code') and e.status_code == 429:
                if attempt < max_retries - 1:
                    wait_time = (2 ** attempt) + 1  # exponential backoff
                    print(f"Rate limit error. Retrying in {wait_time} seconds...")
                    await asyncio.sleep(wait_time)
                else:
                    print("Max retries reached. Skipping this comparison.")
                    return {
                        "deal": other_deal['name'],
                        "company": other_company_name,
                        "percentage": 0,
                        "explanation": "Error: Unable to compare due to rate limiting"
                    }
            else:
                print(f"Error comparing deals: {e}")
                return {
                    "deal": other_deal['name'],
                    "company": other_company_name,
                    "percentage": 0,
                    "explanation": f"Error: {str(e)}"
                }

def get_company_name(deal):
    if deal['associatedCompanies']:
        return deal['associatedCompanies'][0]['company_name']
    return "Unknown"

def parse_response(response):
    percentage_match = re.search(r'Percentage:\s*(\d+)', response)
    explanation_match = re.search(r'Explanation:\s*(.+)', response)
    percentage = int(percentage_match.group(1)) if percentage_match else 0
    explanation = explanation_match.group(1) if explanation_match else "No explanation provided"
    return percentage, explanation
```
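The retry loop above is a standard exponential-backoff pattern. Stripped of the OpenAI specifics, it can be sketched generically; the `with_backoff` helper and its `scale` parameter are illustrative, not part of the template:

```python
import asyncio

async def with_backoff(call, max_retries=5, scale=1.0):
    """Retry an async callable on HTTP 429 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception as e:
            is_rate_limit = getattr(e, "status_code", None) == 429
            if is_rate_limit and attempt < max_retries - 1:
                wait_time = ((2 ** attempt) + 1) * scale  # 2, 3, 5, 9, ... seconds
                await asyncio.sleep(wait_time)
            else:
                raise
```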
Note: This part of the script calls the OpenAI API, which will incur a cost on your OpenAI account. To increase the accuracy of duplicate detection, swap the `model="gpt-3.5-turbo"` line for a more powerful OpenAI model, or edit the prompt to be more specific to your goals.
Exporting results to CSV
The export_to_csv function in csv_exporter.py writes the results to a CSV file:
```python
def export_to_csv(deals, filename="deal_duplicates.csv"):
    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ['ID', 'Name', 'Company', 'Close Date', 'Amount', 'Stage', 'Owner',
                      'Potential %', 'Top Duplicate Deal', 'Top Duplicate Deal - Company',
                      'Potential Duplicates']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for deal in deals:
            top_duplicate = max(deal['potential_duplicates'], key=lambda x: x['percentage']) if deal['potential_duplicates'] else None
            writer.writerow({
                'ID': deal['id'],
                'Name': deal['name'],
                'Company': get_company_name(deal),
                'Close Date': deal['close_date'],
                'Amount': deal['amount'],
                'Stage': deal.get('stage', 'Unknown'),
                'Owner': deal['owner_email'],
                'Potential %': deal['duplicate_percentage'],
                'Top Duplicate Deal': top_duplicate['deal'] if top_duplicate else '',
                'Top Duplicate Deal - Company': top_duplicate['company'] if top_duplicate else '',
                'Potential Duplicates': format_potential_duplicates(deal['potential_duplicates'])
            })
```
This script will generate a CSV with the following columns:
- ID
- Name
- Company
- Close Date
- Amount
- Stage
- Owner
- Potential %
- Top Duplicate Deal
- Top Duplicate Deal - Company
- Potential Duplicates
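Once the CSV exists, you can slice it further with a few lines of Python, for example to keep only rows with a duplicate score above a threshold. A small sketch using the column names above (the `likely_duplicates` helper is illustrative, not part of the template):

```python
import csv

def likely_duplicates(path, threshold=50):
    """Return rows from the exported CSV whose 'Potential %' meets the threshold."""
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows if float(r['Potential %'] or 0) >= threshold]
```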
Tip: If you want to edit your script or the final columns that will be written to your CSV file, ask Replit AI to make some edits or suggestions.
What's next #
Now that you have a working HubSpot Deal duplicate finder, you could consider the following enhancements:
- Add a web interface to trigger the duplicate finding process and display results.
- Implement more sophisticated duplicate detection algorithms.
- Add functionality to automatically merge or resolve duplicate deals in HubSpot.
- Set up scheduled runs to regularly check for duplicates.
- Integrate with other CRM systems or data sources for more comprehensive duplicate detection.
- Use the exported CSV data to create visualizations or reports on duplicate deals.
If you’d like to speak with the Replit team about how you can make the best use of Replit, feel free to schedule some time here. We can walk through your use case and point out the best ways to implement it.
Happy coding!