
    Earn 13,500 ($135.00)

    Time Remaining: due 2 years ago
    Completed

    Scrape specific website using ScrapingBee API

    RobinJack
    Posted 2 years ago
    This Bounty has been completed!
    @RobinJack's review of @0aky
    Average Rating: 5.0
    Communication 5/5, Quality 5/5, Timeliness 5/5

    Bounty Description

    Problem Description

    I am trying to use the ScrapingBee API (https://www.scrapingbee.com/documentation/) to scrape Quizlet decks.

    Specifically, I am trying to scrape https://quizlet.com/_4yn8wd?x=1jqt&i=3wahxy.
    When this page initially loads, there are 100 TermText elements in the SetPageTerms-termsList element.
    Below the SetPageTerms-termsList element, there is a button with id siycb3m and the text See More.
    When the button is clicked, the whole set loads and SetPageTerms-termsList then contains 288 elements. However, the HTML returned by the ScrapingBee API always contains only 100.
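The 100 versus 288 comparison above can be checked against any returned HTML with a small counter. The following is a minimal sketch using only the Python standard library. It assumes that TermText and SetPageTerms-termsList are CSS class names and that the returned markup closes its non-void tags; `TermCounter` and `count_term_texts` are hypothetical helper names, not part of ScrapingBee or Quizlet.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TermCounter(HTMLParser):
    """Count elements with class TermText nested inside .SetPageTerms-termsList."""

    def __init__(self):
        super().__init__()
        self.depth_in_list = 0  # greater than 0 while inside the terms list
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        inside = self.depth_in_list > 0
        if inside and "TermText" in classes:
            self.count += 1
        if tag in VOID_TAGS:
            return
        # Start tracking depth at the list element, then follow every nested tag.
        if inside or "SetPageTerms-termsList" in classes:
            self.depth_in_list += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth_in_list > 0:
            self.depth_in_list -= 1


def count_term_texts(html: str) -> int:
    """Return the number of TermText elements found inside the terms list."""
    parser = TermCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Calling `count_term_texts(response_html)` on the body returned by the API should then give 100 for the partially loaded page and 288 once the full set has been loaded.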

    My script passes the following params to the ScrapingBee client:

    block_resources=False,
    wait_for='.SetPageTerms-termsList',
    premium_proxy=False,
    stealth_proxy=False,
    render_js=True,
    js_scenario={"instructions": [
        {"wait_for": ".SetPageTerms-termsList"},
        {"infinite_scroll": {  # Scroll the page until the end
            "max_count": 0,  # Maximum number of scrolls, 0 for infinite
            "delay": 1000,  # Delay between each scroll, in ms
        }},
        # {"wait_for_and_click": "#AssemblyButtonBase AssemblyPrimaryButton--default AssemblyButtonBase--medium AssemblyButtonBase--padding AssemblyButtonBase--fullWidth"},
        {"wait": 2000},
        {"evaluate": QUIZLET_SCRAPING_SCRIPT},
    ]},
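For reference, here is a sketch of how the parameters above might be assembled into a single request against the ScrapingBee HTTP API. The endpoint URL, the lowercase string form of the booleans, and the JSON encoding of js_scenario are my understanding of the HTTP API and should be checked against the ScrapingBee documentation. `build_params` and `build_request_url` are hypothetical helper names. The sketch only builds the URL and sends nothing.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the ScrapingBee HTTP API; verify against the documentation.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_params(api_key: str, target_url: str, scraping_script: str) -> dict:
    """Mirror the parameters listed above as query-string values."""
    js_scenario = {"instructions": [
        {"wait_for": ".SetPageTerms-termsList"},
        {"infinite_scroll": {"max_count": 0, "delay": 1000}},
        {"wait": 2000},
        {"evaluate": scraping_script},
    ]}
    return {
        "api_key": api_key,
        "url": target_url,
        "render_js": "true",
        "block_resources": "false",
        "premium_proxy": "false",
        "stealth_proxy": "false",
        "wait_for": ".SetPageTerms-termsList",
        # The scenario travels as a JSON string inside the query string.
        "js_scenario": json.dumps(js_scenario),
    }


def build_request_url(params: dict) -> str:
    """Percent-encode the parameters and append them to the endpoint."""
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

Printing `build_request_url(build_params(key, url, QUIZLET_SCRAPING_SCRIPT))` makes it easy to see exactly which scenario is being sent when comparing configurations.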

    where QUIZLET_SCRAPING_SCRIPT is:
    let allButtons = document.querySelectorAll("button");
    let buttonClicked = false;

    for (let button of allButtons) {
        // Check for the aria-label
        if (button.getAttribute("aria-label") === "See More") {
            button.click();
            buttonClicked = true;
        }

        // Check for the id
        if (button.id === "siycb3m") {
            button.click();
            buttonClicked = true;
        }
    }

    // Lastly, we find the section and check for a button immediately after it
    let section = document.querySelector(".SetPageTerms-termsList");
    if (section) {
        // The '+' CSS selector is used to select the element immediately following another element
        let buttonAfterSection = section.parentElement.querySelector(".SetPageTerms-termsList + button");
        if (buttonAfterSection) {
            buttonAfterSection.click();
            buttonClicked = true;
        }
    }

    buttonClicked;

    let waitForAllTerms = async () => {
        let checkInterval = 1000; // Check every second
        let maxWaitTime = 30000; // Wait up to 30 seconds
        let waitedTime = 0;

        while (waitedTime < maxWaitTime) {
            let terms = document.querySelectorAll(".SetPageTerms-termsList .TermText");
            if (terms.length >= 288) {
                return true;
            }

            await new Promise((resolve) => setTimeout(resolve, checkInterval));
            waitedTime += checkInterval;
        }

        // If we've waited maxWaitTime and the terms haven't loaded, return false
        return false;
    };

    waitForAllTerms();

    Acceptance Criteria

    I specifically want a list of ScrapingBee configurations that return a scraped result containing 288 TermText elements. The acceptance criteria are:

    1. A list of arguments to the scrapingbee client
    2. A demonstration of the scraped output
    3. If I drop the arguments into my own code and it works, I'll consider the bounty completed
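One way to check candidate configurations against these criteria is a small harness that runs each one and keeps those reaching the expected count. This is only a sketch: `find_passing_configs` is a hypothetical name, and both `fetch` (which performs the ScrapingBee request for one configuration and returns HTML) and `count_terms` (which counts TermText elements in that HTML) are supplied by the caller.

```python
from typing import Callable, Iterable, List


def find_passing_configs(
    configs: Iterable[dict],
    fetch: Callable[[dict], str],
    count_terms: Callable[[str], int],
    expected: int = 288,
) -> List[dict]:
    """Return the configurations whose scraped HTML has at least `expected` terms."""
    passing = []
    for config in configs:
        html = fetch(config)  # caller-supplied: one scrape per configuration
        if count_terms(html) >= expected:
            passing.append(config)
    return passing
```

Keeping the network call behind `fetch` means the harness can be tried with canned HTML first, so that API credits are spent only on the real run.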