Earn 13,500 ($135.00)
Scrape specific website using ScrapingBee API
Bounty Description
Problem Description
I am trying to use the ScrapingBee API (https://www.scrapingbee.com/documentation/) to scrape Quizlet decks.
The page I am trying to scrape is https://quizlet.com/_4yn8wd?x=1jqt&i=3wahxy.
On this page, when it initially loads, there are 100 TermText elements in the SetPageTerms-termsList element.
Below the SetPageTerms-termsList element, there is a button with id siycb3m and the text See More.
When the button is clicked, the whole set loads and the SetPageTerms-termsList then contains 288 elements. The ScrapingBee API, however, always returns only 100.
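To make the 100 versus 288 comparison reproducible, the count can be taken directly from the HTML that ScrapingBee returns. The following is only a minimal sketch using Python's standard-library parser; `ClassCounter` and `count_terms` are hypothetical names, and it assumes the returned markup is reasonably well-formed (every non-void element has a closing tag).

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count elements with a target class nested inside a container class."""

    # Void elements never get a closing tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, container_class, target_class):
        super().__init__()
        self.container_class = container_class
        self.target_class = target_class
        self.depth = 0  # nesting depth inside the container; 0 means outside
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            if self.target_class in classes:
                self.count += 1
            if tag not in self.VOID:
                self.depth += 1
        elif self.container_class in classes and tag not in self.VOID:
            self.depth = 1  # entered the container element

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1


def count_terms(html):
    """Return the number of TermText elements inside SetPageTerms-termsList."""
    counter = ClassCounter("SetPageTerms-termsList", "TermText")
    counter.feed(html)
    return counter.count
```

Calling `count_terms(response.text)` on a scraped response would then show whether the result contains 100 or 288 terms.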
My script to call scrapingbee passes the following params to the scrapingbee client:
block_resources=False,
wait_for='.SetPageTerms-termsList',
premium_proxy=False,
stealth_proxy=False,
render_js=True,
js_scenario={"instructions": [
    {"wait_for": ".SetPageTerms-termsList"},
    {"infinite_scroll": {  # Scroll the page until the end
        "max_count": 0,  # Maximum number of scrolls, 0 for infinite
        "delay": 1000,  # Delay between each scroll, in ms
    }},
    # {"wait_for_and_click": "#AssemblyButtonBase AssemblyPrimaryButton--default AssemblyButtonBase--medium AssemblyButtonBase--padding AssemblyButtonBase--fullWidth"},
    {"wait": 2000},
    {"evaluate": QUIZLET_SCRAPING_SCRIPT},
]},
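For anyone who wants to reproduce the request, here is a rough sketch of how these arguments might be assembled and passed to the official `scrapingbee` Python client. `build_params` and `fetch` are hypothetical helper names, the API key is a placeholder supplied by the caller, and the sketch assumes the client accepts `js_scenario` as a plain dict in `params`.

```python
QUIZLET_URL = "https://quizlet.com/_4yn8wd?x=1jqt&i=3wahxy"


def build_params(scraping_script):
    """Assemble the request parameters listed above as a plain dict."""
    return {
        "block_resources": False,
        "wait_for": ".SetPageTerms-termsList",
        "premium_proxy": False,
        "stealth_proxy": False,
        "render_js": True,
        "js_scenario": {
            "instructions": [
                {"wait_for": ".SetPageTerms-termsList"},
                # max_count 0 means scroll until the end of the page
                {"infinite_scroll": {"max_count": 0, "delay": 1000}},
                {"wait": 2000},
                {"evaluate": scraping_script},
            ]
        },
    }


def fetch(api_key, scraping_script):
    """Send the request; needs the scrapingbee package and a valid key."""
    # Imported lazily so build_params works without the package installed.
    from scrapingbee import ScrapingBeeClient

    client = ScrapingBeeClient(api_key=api_key)
    return client.get(QUIZLET_URL, params=build_params(scraping_script))
```

Keeping the parameter builder separate from the network call makes it easy to swap individual arguments when testing candidate configurations.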
Where the QUIZLET_SCRAPING_SCRIPT is:
let allButtons = document.querySelectorAll("button");
let buttonClicked = false;
for (let button of allButtons) {
    // Check for the aria-label
    if (button.getAttribute("aria-label") === "See More") {
        button.click();
        buttonClicked = true;
    }
    // Check for the id
    if (button.id === "siycb3m") {
        button.click();
        buttonClicked = true;
    }
}
// Lastly, we find the section and check for a button immediately after it
let section = document.querySelector(".SetPageTerms-termsList");
if (section) {
    // The '+' CSS selector is used to select the element immediately following another element
    let buttonAfterSection = section.parentElement.querySelector(".SetPageTerms-termsList + button");
    if (buttonAfterSection) {
        buttonAfterSection.click();
        buttonClicked = true;
    }
}
buttonClicked;
let waitForAllTerms = async () => {
    let checkInterval = 1000; // Check every second
    let maxWaitTime = 30000; // Wait up to 30 seconds
    let waitedTime = 0;
    while (waitedTime < maxWaitTime) {
        let terms = document.querySelectorAll(".SetPageTerms-termsList .TermText");
        if (terms.length >= 288) {
            return true;
        }
        await new Promise((resolve) => setTimeout(resolve, checkInterval));
        waitedTime += checkInterval;
    }
    // If we've waited maxWaitTime and the terms haven't loaded, return false
    return false;
};
waitForAllTerms();
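One thing I noticed while writing this up: the commented-out wait_for_and_click instruction in the scenario uses a selector that begins with an id selector followed by bare names, which CSS reads as descendant type selectors rather than as a list of classes. A compound class selector joins the class names with dots instead. The sketch below shows that conversion and a scenario that uses the built-in click instruction in place of the evaluated script. `classes_to_selector` and `click_scenario` are hypothetical helpers, and I have not verified that this variant loads all 288 terms.

```python
def classes_to_selector(class_attr, tag="button"):
    """Turn a space-separated class attribute into a compound CSS selector."""
    return tag + "".join("." + name for name in class_attr.split())


def click_scenario(button_selector, list_selector=".SetPageTerms-termsList"):
    """Scenario that waits for the list, clicks the button, then pauses."""
    return {
        "instructions": [
            {"wait_for": list_selector},
            {"wait_for_and_click": button_selector},
            {"wait": 2000},  # give the remaining terms time to render
        ]
    }


# Class names copied from the commented-out instruction in the scenario.
SEE_MORE_CLASSES = (
    "AssemblyButtonBase AssemblyPrimaryButton--default "
    "AssemblyButtonBase--medium AssemblyButtonBase--padding "
    "AssemblyButtonBase--fullWidth"
)
scenario = click_scenario(classes_to_selector(SEE_MORE_CLASSES))
```

The id from the page, `#siycb3m`, could be passed to `click_scenario` as well, though it looks generated and may change between page loads.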
Acceptance Criteria
I specifically want a list of ScrapingBee configurations that return a scraped result containing 288 TermText elements. The acceptance criteria are:
- A list of arguments to the scrapingbee client
- A demonstration of the scraped output
- If I drop the arguments into my own code and it works, I'll consider the bounty completed