This is a simple web scraper: the user inputs the URL of a website and then chooses which element to pull from the page. The elements you can search for are:
- Searches for all of the links on that page.
- Searches for all of the
elements on the page.
- Searches by id or class.
- Searches for all of the
NOTE: if you input h2, you can search for any class/id on that page, even if it doesn't belong to the element you chose. I only realized this once I was done, and I don't know how to fix it yet.
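One possible way around the issue in the note, offered as a sketch rather than as the code from this project: BeautifulSoup's find() accepts a tag name as its first argument, so passing the chosen element together with the id or class should limit the match to that kind of tag. The sample HTML and the find_in_element helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: the id "intro" sits on a <p>, not on an <h2>.
SAMPLE_HTML = """
<h2 class="title">Heading</h2>
<p id="intro" class="title">First paragraph</p>
"""


def find_in_element(soup, element, id_class, name):
    """Look up an id or class, but only on tags named `element` (hypothetical helper)."""
    if id_class == "id":
        return soup.find(element, id=name)
    if id_class == "class":
        return soup.find(element, class_=name)
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Unrestricted search: finds the <p> even if the user picked h2.
print(soup.find(id="intro"))
# Restricted search: no <h2> has that id, so the result is None.
print(find_in_element(soup, "h2", "id", "intro"))
# The class "title" is on both tags; the tag name decides which one comes back.
print(find_in_element(soup, "p", "class", "title"))
```

In the scraper itself this would mean passing the element variable as the first argument to soup.find().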
First, the imports:
- import requests
- from bs4 import BeautifulSoup
- And of course I imported Fore, Back, Style from colorama for the coloring.
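For anyone who hasn't used colorama: Fore, Back, and Style hold ANSI escape codes that you concatenate with the text you print. A minimal sketch of that idea follows; the colorize helper is hypothetical and not part of the scraper.

```python
from colorama import Back, Fore, Style


def colorize(text, color=Fore.WHITE):
    """Wrap text in a color code and reset the style afterwards (hypothetical helper)."""
    return f"{color}{text}{Style.RESET_ALL}"


print(colorize("Would you like to search by id or class?", Fore.BLUE))
print(colorize("Not an option", Fore.RED))
# Back sets the background color; Style.RESET_ALL returns to the defaults.
print(Back.YELLOW + Fore.BLACK + "highlighted" + Style.RESET_ALL)
```

Resetting after each message keeps a color from leaking into whatever is printed next. On some Windows consoles you may also need to call colorama's init() before printing.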
When you input the URL, it is saved to a variable named url. Then I made another variable named response that equals requests.get(url), which honestly I can't explain well: all I know is that you need it, otherwise the scraper won't work.
Then I turned the response into text and saved it to a variable.
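For what it's worth, requests.get(url) sends an HTTP GET request to the URL (the same kind of request a browser makes when it loads a page) and returns a Response object; its .text attribute holds the body decoded as a string. Here is a small sketch of that step. The fetch_html name, the 10-second timeout, and the status check are additions for illustration, not part of the original script.

```python
import requests


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Send an HTTP GET request; the timeout keeps it from hanging forever.
    response = requests.get(url, timeout=10)
    # Raise requests.HTTPError if the server answered with a 4xx/5xx status.
    response.raise_for_status()
    # .text is the response body decoded to a str.
    return response.text
```

Calling fetch_html(url) with the URL the user typed would then give you the page text that the parser needs.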
Next it asks which element you want to search for and saves the answer to the variable element for later use.
After this I set up the HTML parser and saved it to the variable soup.
Next I set it to find all of the chosen elements and saved them to the variable ELEMENTS.
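The parsing and find-all steps can be sketched like this. The html string is a stand-in for the downloaded page text, and element is hard-coded here instead of coming from input() so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page text.
html = "<h2>Title</h2><p>One</p><p>Two</p><a href='/about'>About</a>"
element = "p"  # what the user would type at the prompt

# "html.parser" is the parser that ships with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every tag with the given name, in document order.
ELEMENTS = soup.find_all(element)

for tag in ELEMENTS:
    print(tag.get_text())
```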
Then I set up an if else statement; it's pretty simple, and I use the element variable again as one side of the comparison.
    if element == "a":
        for link in ELEMENTS:
            print(Fore.WHITE + link.get("href"))
    elif element == "p":
        idClass = input(Fore.BLUE + "Would you like to search by id or class?\n")
        if idClass == "id":
            ID = input("Name of id: ")
            searchIdP = soup.find(id=ID)
            print(Fore.WHITE + searchIdP.prettify())
        elif idClass == "class":
            CLASS = input("Name of class: ")
            searchClassPs = soup.find(class_=CLASS)
            print(Fore.WHITE + searchClassPs.prettify())
        else:
            print(Fore.RED + "Not an option")
The rest (h2) are the same as p.
For the link (element == "a") it's much better to just see the links, so the whole for link in ELEMENTS ... link.get("href") loop pulls out the links themselves, which are found in the href attribute (you can probably understand that entire section with this knowledge).
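One thing to watch for, shown here as a sketch with made-up HTML: not every a tag has an href, and link.get("href") returns None for those, which would break the string concatenation with Fore.WHITE. Passing href=True to find_all keeps only the tags that actually carry the attribute.

```python
from bs4 import BeautifulSoup

# Made-up HTML: the middle <a> has no href attribute.
html = """
<a href="https://example.com/">Example</a>
<a name="top">Anchor without a link</a>
<a href="/contact">Contact</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that have an href attribute at all.
links = [link["href"] for link in soup.find_all("a", href=True)]

for href in links:
    print(href)
```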
For p and h2 it asks for class or id; the user's input is saved to the variable idClass. Then there's another if else statement.
- If the user says id, it will ask for the name of the id, which is saved to ID. Next it searches for the element with that id and saves the result to searchIdP. Afterwards it prints the element to the console. You may have noticed that I call .prettify() on the result inside print(); this makes the output easier to read by spacing it out nicely.
- This is basically repeated for the class option, except that the id names are swapped for their class counterparts (CLASS and searchClassPs, with class_= in place of id=).
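A caveat about the id/class branch, sketched with made-up HTML: soup.find() returns None when nothing matches, so calling .prettify() straight on the result would raise an AttributeError for a mistyped id or class. The describe helper below is hypothetical and simply checks for that case first.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration.
html = '<div><p id="intro" class="lead">Hello <b>there</b></p></div>'
soup = BeautifulSoup(html, "html.parser")


def describe(tag):
    """Return prettified markup, or a message when nothing matched (hypothetical helper)."""
    if tag is None:
        return "No matching element found"
    # prettify() puts each tag and text node on its own indented line.
    return tag.prettify()


print(describe(soup.find(id="intro")))
print(describe(soup.find(class_="lead")))
print(describe(soup.find(id="missing")))
```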
And that's all there is to tell you about how it works. If you have any questions, leave a comment below and I will answer as best I can; just note that I'm a noob to Python, so don't expect a great answer to a complex question.