A simple web scraper using the newspaper3k Python module.
    import sys

    import newspaper

    url = input("Enter URL of news website to scrape: ")
    print("Loading...")
    paper = newspaper.build(url)
This first asks the user for the URL of a news website, then builds a newspaper source object for that site.
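A URL typed without a scheme (for example a bare domain) may not load correctly. The following is a minimal sketch of a hypothetical helper, `normalize_url`, that could be applied to the user's input before it is passed to `newspaper.build`. The helper is not part of newspaper3k, and defaulting to `https://` is an assumption.

```python
def normalize_url(raw):
    """Return raw with surrounding whitespace removed and a scheme added if missing."""
    url = raw.strip()
    if "://" not in url:
        # Assumption: default to HTTPS when the user typed a bare domain.
        url = "https://" + url
    return url
```

It could be used as `paper = newspaper.build(normalize_url(url))`.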
    for i, article in enumerate(paper.articles):
        print(f"{i} {article.title}")
Next, it prints the title of every article it found, each prefixed with its index.
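Some entries may have an empty title at this stage, which would leave a bare number in the listing. Here is a small sketch of a hypothetical helper, `format_listing`, that falls back to the article's URL in that case. It only assumes that each article object has `title` and `url` attributes.

```python
def format_listing(articles):
    """Build one display line per article: its index, then its title or URL."""
    lines = []
    for i, article in enumerate(articles):
        # Fall back to the URL when the title is missing or empty.
        label = article.title or article.url
        lines.append(f"{i} {label}")
    return lines
```

The result could be printed with `print("\n".join(format_listing(paper.articles)))`.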
    def getArticleNum():
        # Ask again whenever the input is not a valid integer.
        try:
            articleNum = int(input("Please enter the number corresponding to the article you want to view: "))
            return articleNum
        except ValueError:
            print("Please enter a number.")
            return getArticleNum()
Then, we define a function that prompts the user for an article number and asks again if the input is not a number.
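As an alternative, the prompt can be written as a loop that also checks the range, so an out-of-range number leads to another prompt instead of an error later on. This is only a sketch: `ask_article_index` and its `read` parameter are hypothetical names, and `read` defaults to the built-in `input` so the function can be tried out with canned answers.

```python
def ask_article_index(count, read=input):
    """Prompt until the user enters an integer in range(count)."""
    while True:
        raw = read(f"Article number (0 to {count - 1}): ")
        try:
            index = int(raw)
        except ValueError:
            print("Please enter a number.")
            continue
        if 0 <= index < count:
            return index
        print("That number is out of range.")
```

With this version, the lookup would become `article = paper.articles[ask_article_index(len(paper.articles))]`.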
    try:
        article = paper.articles[getArticleNum()]
    except IndexError:
        # The number does not match any listed article.
        print("Sorry, an error occurred while processing your request.")
        sys.exit(1)
Here, we look up the article the user chose, and exit with an error message if the number does not match any article.
    article.download()
    article.parse()
    print(article.text)
Then we download the article, parse it, and print out the contents.
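Downloading or parsing can fail, for example when the network is down or the page is unavailable. Below is a minimal sketch of a hypothetical wrapper, `fetch_text`, that reports the failure instead of crashing. It deliberately catches the broad `Exception` so the sketch does not depend on newspaper3k's own exception types; a real script would probably narrow this.

```python
def fetch_text(article):
    """Download and parse article, returning its text or None on failure."""
    try:
        article.download()
        article.parse()
    except Exception as exc:
        # Broad catch: keeps this sketch independent of library-specific exceptions.
        print(f"Could not load article: {exc}")
        return None
    return article.text
```

The last step would then be `text = fetch_text(article)`, followed by printing `text` only if it is not `None`.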
NOTE: This is very glitchy.