
    Earn 1,980 ($19.80)

    Due 2 years ago
    Canceled

    Web Scraper for Datasets

    succoallapera104
    Posted 2 years ago

    Bounty Description

    Problem Description

    Make a tool in Python for scraping entire websites (and their subpages), then organizing the scraped content into CSV files.
    Each data file must have 4 columns (max rows: 3,000):

    • one for Context
    • one for Question
    • one for Answer (answers.text)
    • one for Answer Start Index (answers.answer_start).
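    The four columns above match the SQuAD-style QA format. A minimal sketch of writing rows in that schema with the standard library (the helper name `write_rows` and the sample row are my own, not part of the spec):

    ```python
    import csv

    # The four columns required by the bounty (SQuAD-style fields).
    FIELDNAMES = ["Context", "Question", "Answer", "Answer Start Index"]

    def write_rows(path, rows):
        """Write QA rows to a CSV file with the four required columns."""
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerows(rows)

    # Example row: the answer start index is the offset of the answer
    # inside the context string, as in answers.answer_start.
    row = {
        "Context": "Python was created by Guido van Rossum.",
        "Question": "Who created Python?",
        "Answer": "Guido van Rossum",
        "Answer Start Index": 22,
    }
    write_rows("sample.csv", [row])
    ```

    Note that the start index must point into the context string, so `Context[22:]` begins with the answer text.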

    Use gpt4free and OpenAssistant to generate the context, questions, answers, and answer start indexes from the scraped content.
    If the generated content is more than 3,000 rows, it should be split across different files.
    Each CSV file has to be saved in a directory named after the website URL, and that directory has to be saved inside another directory called Datasets.
    Also, each CSV file's name has to be formatted like this: "WEBSITE_NAME"_"NUMBER_OF_THE_FILE", for example:
    google_1, google_2, google_3, etc.
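    The splitting and naming rules above can be sketched as follows. This assumes the generated rows already exist (gpt4free/OpenAssistant calls are out of scope here), and the helpers `site_name` and `save_chunks` are my own names, not from the spec:

    ```python
    import csv
    from pathlib import Path
    from urllib.parse import urlparse

    FIELDNAMES = ["Context", "Question", "Answer", "Answer Start Index"]

    def site_name(url):
        """Derive a short site name from a URL, e.g. https://www.google.com -> google."""
        host = urlparse(url).netloc or url
        return host.removeprefix("www.").split(".")[0]

    def save_chunks(url, rows, max_rows=3000):
        """Split rows into files of at most max_rows each and save them as
        Datasets/<site>/<site>_<n>.csv, per the bounty's layout."""
        name = site_name(url)
        out_dir = Path("Datasets") / name
        out_dir.mkdir(parents=True, exist_ok=True)
        paths = []
        for i in range(0, len(rows), max_rows):
            n = i // max_rows + 1  # file numbering starts at 1
            path = out_dir / f"{name}_{n}.csv"
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
                writer.writeheader()
                writer.writerows(rows[i:i + max_rows])
            paths.append(path)
        return paths
    ```

    With 7,000 generated rows this would produce google_1.csv (3,000 rows), google_2.csv (3,000 rows), and google_3.csv (1,000 rows) inside Datasets/google/.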
    If you have any questions contact me on discord: succo104#5166

    Acceptance Criteria

    1. The web scraper has to be entirely in Python.
    2. Shell-based GUI for inputs:
    • First input: Model (GPT or OpenAssistant)
    • Second input: Website
    3. Hosted on a Repl
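    The shell-based input step could look like this minimal sketch (the function name `prompt_inputs` is my own; the spec only fixes the two inputs and the two model choices):

    ```python
    def prompt_inputs(input_fn=input):
        """Prompt for the two required inputs on the shell.

        Re-asks until the model choice matches one of the two backends
        named in the spec, then asks for the target website.
        """
        model = ""
        while model not in ("GPT", "OpenAssistant"):
            model = input_fn("Model (GPT or OpenAssistant): ").strip()
        website = input_fn("Website: ").strip()
        return model, website
    ```

    Passing `input_fn` as a parameter keeps the prompt testable without a real terminal.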