Earn 4,500 ($45.00)

due 2 years ago

In Progress

Scrape Odd Lots podcast

alecfwilson

Posted 2 years ago

Bounty Description

Problem Description

I want to scrape the transcripts of the Odd Lots podcast. Some of the episodes can be found at https://www.bloomberg.com/oddlots, in the links whose text begins with "Transcript: ". I tried this myself, but ran into a CAPTCHA and could not figure out how to get Selenium working within a Repl. I haven't yet figured out how to find the complete archive of the podcast, but I would be OK with a solution that scrapes only the episodes available at the provided link.
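To make the "links that begin with Transcript:" step concrete, here is a minimal sketch of filtering those links out of page HTML that has already been fetched. It assumes each transcript is an ordinary `<a>` element whose visible text starts with "Transcript: ", which I have not verified against the live page, and it does not address the CAPTCHA at all. The names `TranscriptLinkParser` and `find_transcript_links` are hypothetical, and the sketch uses only the standard library so it runs without extra installs; the same filter could be written with bs4.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.bloomberg.com/oddlots"
PREFIX = "Transcript: "  # link-text prefix described in the bounty


class TranscriptLinkParser(HTMLParser):
    """Collect (title, url) pairs for <a> tags whose text starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so nested tags and line breaks do not matter.
            title = " ".join("".join(self._text).split())
            if title.startswith(PREFIX):
                # Resolve relative hrefs against the page URL.
                self.links.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def find_transcript_links(html):
    """Return a list of (title, absolute_url) for transcript links in html."""
    parser = TranscriptLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Made-up HTML fragment for illustration; the real page markup may differ.
    sample_html = (
        '<a href="/news/articles/example"><span>Transcript: Example</span></a>'
        '<a href="/other">Something else</a>'
    )
    for title, url in find_transcript_links(sample_html):
        print(title, url)
```

Keeping the link filter separate from the fetching code means whatever ends up getting past the CAPTCHA only has to hand over an HTML string.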

Acceptance Criteria

A web scraper written in Python. I have some experience with bs4 but am open to using other scraping libraries.
The web scraper runs successfully against https://www.bloomberg.com/oddlots, avoids the CAPTCHA, retrieves the transcripts of the episodes linked on that page, and saves them to a CSV file.
The CSV schema should be: Speaker | Time | Text. For example, for the transcript found at https://www.bloomberg.com/news/articles/2023-02-01/transcript-viktor-shvets-declares-victory-for-team-transitory-and-the-soft-landing, the first row should read "Tracy | 0:10 | Hello and welcome to another episode of the Odd Lots podcast. I'm Tracy Alloway."
I am open to suggestions for making the output more useful.
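As a rough illustration of the Speaker | Time | Text schema above, the sketch below turns already-extracted transcript text into rows and writes them with Python's `csv` module. It assumes each speaker turn starts with a header line of the form "Tracy (0:10):" followed by one or more lines of speech; that layout is a guess and the pattern would need adjusting to the real article markup. The function names and the second sample turn are made up for illustration, and the columns are written comma-separated rather than with literal pipes.

```python
import csv
import io
import re

# Assumed header line for each speaker turn, e.g. "Tracy (0:10):".
TURN_HEADER = re.compile(
    r"^(?P<speaker>[^()]+?)\s*\((?P<time>\d{1,2}(?::\d{2}){1,2})\):?\s*$"
)

# The first turn is the example row from the bounty; the second is invented.
SAMPLE = """Tracy (0:10):
Hello and welcome to another episode of the Odd Lots podcast. I'm Tracy Alloway.

Joe (0:15):
Placeholder text for a second,
made-up turn.
"""


def parse_transcript(text):
    """Split transcript text into (speaker, time, text) tuples."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_HEADER.match(line)
        if match:
            # A header line starts a new turn.
            current = [match["speaker"], match["time"], ""]
            rows.append(current)
        elif current is not None:
            # Any other line is appended to the current turn's text.
            current[2] = (current[2] + " " + line).strip()
    return [tuple(row) for row in rows]


def write_csv(rows, fileobj):
    """Write a header row followed by the transcript rows."""
    writer = csv.writer(fileobj)
    writer.writerow(["Speaker", "Time", "Text"])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(parse_transcript(SAMPLE), buffer)
    print(buffer.getvalue())
```

One possible improvement to the output, in the spirit of the last criterion, would be extra columns for the episode title and URL so rows from several transcripts can share one file.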