In Part 1 of this series, I walked through the object structure that BeautifulSoup generates and the techniques for searching and extracting data from the HTML tree. In this post, I will demonstrate how to scrape data from live websites. The Python requests library can be used to fetch a page, which is then handed to BeautifulSoup for parsing. The code for this part is shown below.

```python
from bs4 import BeautifulSoup
import requests

URL = 'https://www.huffpost.com/'

# Fetch the page; req.content holds the raw response body as bytes
req = requests.get(URL)

# Hand the HTML to BeautifulSoup, using Python's built-in parser
bs = BeautifulSoup(req.content, 'html.parser')
```

Huffington Post

Let's fetch the latest news from Huffington Post. To do that, let's first study the HTML structure this site uses. The "Latest News" section sits inside a div with id="zone-a", which has two child elements:

- zone title (a section element)
- zone content (a section element): this section holds cards, each containing one news item

The code to parse the cards and display their text is as follows: ...
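As an illustration only, here is a minimal sketch of how that card-parsing step could be written, following the structure described above: find the div with id="zone-a", take its second section (the zone content), and collect the text of each card. The helper names fetch_page and latest_news, the 10-second timeout, the "card" class name and the sample HTML are all hypothetical assumptions, not taken from Huffington Post's actual markup, which changes over time.

```python
from __future__ import annotations

from bs4 import BeautifulSoup
import requests


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes (timeout value is an assumption)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.content


def latest_news(html: bytes | str) -> list[str]:
    """Return the text of each card in div#zone-a's zone-content section.

    Assumes div#zone-a has two direct <section> children (zone title,
    then zone content). The 'card' class name is a hypothetical placeholder.
    """
    soup = BeautifulSoup(html, 'html.parser')
    zone = soup.find('div', id='zone-a')
    if zone is None:
        return []
    # Only direct children, so nested sections inside cards are not counted
    sections = zone.find_all('section', recursive=False)
    if len(sections) < 2:
        return []
    content = sections[1]  # second section = zone content
    cards = content.find_all(class_='card')  # hypothetical class name
    return [card.get_text(' ', strip=True) for card in cards]


# Hypothetical offline sample mimicking the described layout
SAMPLE = """
<div id="zone-a">
  <section><h2>Latest News</h2></section>
  <section>
    <div class="card"><a href="/a">First headline</a></div>
    <div class="card"><a href="/b">Second headline</a></div>
  </section>
</div>
"""

for headline in latest_news(SAMPLE):
    print(headline)
```

Against the live site, the same function would be called as latest_news(fetch_page(URL)), after checking the real class names of the cards in the browser's developer tools. Passing a timeout and calling raise_for_status() keeps the scraper from hanging indefinitely or silently parsing an error page.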