Earn 4,500 ($45.00)
due 2 years ago
In Progress
Scrape Odd Lots podcast
alecfwilson
Bounty Description
Problem Description
I want to scrape the transcripts of the Odd Lots podcast. Some of the episodes are available at https://www.bloomberg.com/oddlots, in the links whose text begins with "Transcript: ". I tried this myself but ran into a CAPTCHA, and I could not get Selenium working within a Repl. I have not yet found a complete archive of the podcast, but I would be fine with a solution that only scrapes the episodes available at the provided link.
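To illustrate the link-selection step described above, here is a minimal sketch that picks out anchors whose text begins with "Transcript: ". It uses only the Python standard library, and it runs against an invented HTML snippet: the `sample_html` markup, the class name `TranscriptLinkParser`, and the function name `find_transcript_links` are assumptions for illustration, and the real page markup may differ. It does not fetch the page or address the CAPTCHA.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.bloomberg.com/oddlots"
PREFIX = "Transcript: "


class TranscriptLinkParser(HTMLParser):
    """Collect (title, url) pairs for anchors whose text starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title.startswith(PREFIX):
                # Resolve relative hrefs against the page URL.
                self.links.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def find_transcript_links(html):
    parser = TranscriptLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample markup, not copied from the real page.
sample_html = """
<a href="/news/articles/2023-02-01/transcript-example">Transcript: Example Episode</a>
<a href="/news/articles/2023-02-01/other-story">Some Other Story</a>
"""
print(find_transcript_links(sample_html))
```

The same filter could be written with bs4 by selecting `a` tags and checking that their text starts with the prefix; the standard-library parser is used here only to keep the sketch dependency-free.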
Acceptance Criteria
- A web scraper written in Python. I have some experience with bs4 but am open to using other scraping libraries.
- The web scraper runs successfully on the URL https://www.bloomberg.com/oddlots, avoids the CAPTCHA, returns the transcripts of the episodes linked on that page, and saves them to a CSV file.
- The CSV schema should be: Speaker | Time | Text. For example, for the transcript found at https://www.bloomberg.com/news/articles/2023-02-01/transcript-viktor-shvets-declares-victory-for-team-transitory-and-the-soft-landing, the first row should read "Tracy | 0:10 | Hello and welcome to another episode of the Odd Lots podcast. I'm Tracy Alloway."
- I am open to suggestions for improving the returned output to make it more useful.
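As a rough sketch of the Speaker | Time | Text schema above, the following turns already-extracted transcript text into CSV rows. It assumes each speaker turn starts with a line shaped like `Speaker (M:SS): text`; that layout is a guess, not confirmed from the real transcript pages, so the pattern `TURN_RE` would need adjusting to the actual markup. The function names `parse_transcript` and `write_csv` are hypothetical, and the second line of the sample is invented placeholder text.

```python
import csv
import io
import re

# Assumed turn layout: "Speaker (M:SS): text" or "Speaker (H:MM:SS): text".
TURN_RE = re.compile(
    r"^(?P<speaker>[^():]+?)\s*\((?P<time>\d{1,2}:\d{2}(?::\d{2})?)\):?\s*(?P<text>.*)$"
)


def parse_transcript(raw):
    """Return a list of [speaker, time, text] rows from raw transcript text."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            rows.append([match["speaker"], match["time"], match["text"]])
        elif rows:
            # A line without a speaker/time header continues the previous turn.
            rows[-1][2] = (rows[-1][2] + " " + line).strip()
    return rows


def write_csv(rows, handle):
    """Write rows under the Speaker, Time, Text header to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["Speaker", "Time", "Text"])
    writer.writerows(rows)


# First line is the example row from the bounty; the second is invented.
sample = """Tracy (0:10): Hello and welcome to another episode of the Odd Lots podcast. I'm Tracy Alloway.
Guest (1:05): Placeholder reply text.
"""

buffer = io.StringIO()
write_csv(parse_transcript(sample), buffer)
print(buffer.getvalue())
```

To write a real file instead of printing, `write_csv` can be given a handle opened with `open("transcripts.csv", "w", newline="")`. One possible improvement to the output would be extra columns for the episode title and URL, so rows from different episodes can be told apart in a single file.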