Earn 54,000 ($540.00)

due 2 years ago

Completed

HTML + LLM Question and Answer Extractor

AlexReibman

Posted 2 years ago

Bounty Description

This is a summarized view. Please see our Notion page for a complete overview.

We are developing a Chrome extension that uses LLMs to answer questions in HTML forms by drawing on a database of questions and answers. The goal is a form-agnostic solution that works on any human-readable HTML form.

Overview

The product should extract the HTML document from a live website, filter out unnecessary elements, and send the filtered DOM elements to a backend. There, a prompted LLM should find all fields and their corresponding questions and answers, and the backend should return a list of DOM element IDs paired with their questions.

Deliverables

Create a JavaScript function for a Chrome extension content script that:
  1. Captures a snapshot of the current webpage's HTML and sends it to a backend service.
Create a backend function or service (JavaScript or Python) that:
  1. Filters out unnecessary elements/tokens from the HTML snapshot.
  2. Sends the pruned HTML to an LLM prompted to create a list/JSON/YAML of all question+answer pairs.
  3. Matches the extracted question+answer pairs against a bank of pre-answered questions and answers.
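
The last backend step, matching extracted questions against the answer bank, could be sketched as below. This is only an illustration under assumptions: the bounty does not specify a matching method, so fuzzy string similarity from Python's standard `difflib` stands in for whatever the real solution uses (embeddings or a second LLM call would be alternatives). The names `normalize`, `match_questions`, the record shape, and the `threshold` value are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and replace punctuation with spaces so near-identical questions compare equal."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())


def match_questions(extracted, bank, threshold=0.6):
    """Pair each extracted question with the closest bank answer.

    extracted: list of {"id": <DOM element id>, "question": <text>} (assumed shape)
    bank:      dict mapping a known question to its stored answer (assumed shape)
    threshold: minimum similarity ratio to accept a match (assumed value)
    """
    results = []
    for item in extracted:
        best_question, best_score = None, 0.0
        for bank_question in bank:
            score = SequenceMatcher(
                None, normalize(item["question"]), normalize(bank_question)
            ).ratio()
            if score > best_score:
                best_question, best_score = bank_question, score
        # Leave the answer empty rather than guess when nothing is close enough.
        matched = best_question is not None and best_score >= threshold
        results.append(
            {
                "id": item["id"],
                "question": item["question"],
                "answer": bank[best_question] if matched else None,
            }
        )
    return results
```

Returning `None` for unmatched questions keeps the DOM element ID in the output, so the extension can still flag the field for manual entry.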

Requirements & Constraints

Return a list of unique DOM element identifiers for question and answer elements.
Ensure functionality runs error-free on all web browsers and websites.
Must stay within the 4k-token context limit.
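
One way to respect the token limit is to split the pruned HTML into chunks that each fit a budget and send them to the LLM separately. The sketch below assumes a rough estimate of about 4 characters per token, which is only a heuristic; a real implementation would count tokens with the target model's tokenizer. The function names and the default budget of 3000 (leaving room for the prompt and the response) are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic, not a real tokenizer)."""
    return (len(text) + 3) // 4


def chunk_lines(lines, max_tokens=3000):
    """Group lines into chunks whose estimated token cost stays within max_tokens.

    A single line that is larger than the budget is still emitted as its own
    chunk; the caller would need to truncate or split it further.
    """
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line) + 1  # +1 as a margin for the joining newline
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on line boundaries keeps each element's tag and attributes together, so an element ID is never cut in half between two LLM calls.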

Assumptions & Dependencies

The solution must be agnostic to different website styles for hosting fillable fields.
Prefer YAML formatting for LLM inputs/outputs to save tokens.
Access to GPT-4 and Claude v1.3 API keys will be provided upon request.
Test websites: Google Forms and Shipping Site (click on "Enter Document Details").
Suggested approaches: DOM minification, ARIA screen reader labels, and Cheerio for filtering out unnecessary tags.
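
The DOM minification idea can be illustrated with a small pruning pass. The posting suggests Cheerio (a JavaScript library); the sketch below swaps in Python's standard `html.parser` to show the same idea on a Python backend. It drops scripts and styles, keeps only form-related tags, and retains identifying attributes such as `id` and `aria-label`. The tag and attribute allowlists and the names `FormPruner` and `prune_html` are assumptions, not part of the bounty.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlists; tune these per target site.
KEEP_TAGS = {"form", "label", "input", "textarea", "select", "option", "button"}
KEEP_ATTRS = {"id", "name", "type", "for", "aria-label", "aria-labelledby", "placeholder", "role"}
SKIP_TAGS = {"script", "style", "svg", "noscript"}  # dropped together with their content
VOID_TAGS = {"input"}  # kept tags that never have a closing tag


class FormPruner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # wrapper tags such as div/span are dropped; their text is kept
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in KEEP_ATTRS and value
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in KEEP_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth:
            self.out.append(escape(text, quote=False))


def prune_html(document: str) -> str:
    """Return a minified, one-item-per-line view of the form-relevant parts of a page."""
    parser = FormPruner()
    parser.feed(document)
    parser.close()
    return "\n".join(parser.out)
```

Visible text outside the kept tags is preserved on purpose, since some form builders render question titles in plain `div` elements rather than `label` elements.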