BeautifulSoup is a Python library for parsing web pages, and it is pretty easy to use. First of all, you need to install the library using pip (or the older easy_install) and import it.
# pip install beautifulsoup4
# easy_install beautifulsoup4
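Note that the package is installed as beautifulsoup4 but imported as bs4. A minimal sketch of the import and a first parse could look like this. The HTML string is made up for illustration, and I use the built-in "html.parser" so no extra parser has to be installed:

```python
from bs4 import BeautifulSoup

# Made-up HTML, only for illustration.
html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"

# "html.parser" ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# soup.title is the first <title> tag; .string is its text content.
title_text = soup.title.string
print(title_text)
```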
Here is a good example from the CodingEntrepreneurs YouTube channel:
Here is the listing of that program with some improvements of my own:
Let me clarify some points of this script. First of all, the script asks us to type a city; then, with the help of the requests library, it fetches the page at the URL. After that it creates a new instance of the BeautifulSoup class and assigns it to the soup variable. g_data is a list of all div class="info" elements on the page.
To explore the page structure, it is best to use the Inspect Element tool in your browser.
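The fetch-and-parse steps can be sketched roughly like this. It is not the original listing: the function names fetch_html and find_info_blocks are my own, the URL is whatever page you want to parse, and the sample HTML is made up so the parsing part can be tried without a network connection.

```python
from bs4 import BeautifulSoup


def fetch_html(url, params=None):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Imported here so the parsing part below also works without requests.
    import requests

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def find_info_blocks(html):
    """Return every <div> whose class list contains "info"."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("div", class_="info")


# Made-up HTML standing in for a downloaded page.
sample = (
    '<div class="info">A</div>'
    '<div class="other">B</div>'
    '<div class="info extra">C</div>'
)
g_data = find_info_blocks(sample)
print(len(g_data))
```

With a real page you would call find_info_blocks(fetch_html(url)) instead of passing the sample string. A div with several classes is matched too, as long as one of them is "info".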
I trim the first and last elements of g_data because very often they are ads. Then I loop through the remaining elements of g_data and print the particular fields I need from each one: city, ZIP code and so on.
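A small sketch of the trimming and the loop is below. The HTML and the span class names (name, city, phone) are assumptions for illustration only, the real page uses its own markup, and text_or_none is a hypothetical helper of mine.

```python
from bs4 import BeautifulSoup

# Made-up markup: the class names are assumptions, not the real page's.
SAMPLE_HTML = """
<div class="info"><span class="name">Sponsored listing</span></div>
<div class="info">
  <span class="name">Coffee Co.</span>
  <span class="city">Los Angeles</span>
  <span class="phone">(310) 645-7315</span>
</div>
<div class="info">
  <span class="name">Wood Cafe</span>
  <span class="city">Los Angeles</span>
</div>
<div class="info"><span class="name">Another ad</span></div>
"""


def text_or_none(block, css_class):
    """Return the stripped text of the first matching span, or None."""
    tag = block.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
g_data = soup.find_all("div", class_="info")

# Drop the first and last blocks, which are assumed to be ads.
listings = g_data[1:-1]

records = []
for item in listings:
    record = {
        "Cafe": text_or_none(item, "name"),
        "City": text_or_none(item, "city"),
        "Phone": text_or_none(item, "phone"),
    }
    records.append(record)
    print("---")
    for label, value in record.items():
        # Skip fields the listing does not have instead of crashing.
        if value is not None:
            print(f"{label}: {value}")
```

Checking for a missing tag before reading its text matters here, because not every listing has every field.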
And here is how the script works:
$ python yp_parser.py
Please enter city: Los Angeles
----
Cafe: Coffee Co.
City: Los Angeles
Street: 8751 La Tijera Blvd
ZIP code: CA 90045
Phone: (310) 645-7315
...
...
Cafe: Groundwork Coffee Co
City: Los Angeles
Street: 1501 N Cahuenga Blvd
ZIP code: CA 90028
Phone: (323) 871-0143
---
Cafe: Wood Cafe
City: Los Angeles
Street: 12000 W Washington Blvd
ZIP code: CA 90066
Phone: (310) 915-9663
---
My own example is a script that parses articles from Engadget:
It works the same way as the Yellow Pages parser.
$ python engadget_parser.py
--------------------------------------------------------------------------------
Watch Jony Ive and Elon Musk talk design and sci-fi transportation
--------------------------------------------------------------------------------
Tired of hearing little more than soundbites from tech luminaries such as Apple's Jony Ive and Tesla's Elon Musk? Today's your lucky day. Vanity Fair has posted its full video interviews with both Ive and Musk, giving you an insight into how the two executives work. Not surprisingly, Ive's chat focuses on his design philosophies and processes, including what he thinks of Xiaomi's eerily familiar-looking products (spoiler: he doesn't see them as "flattery"). Musk, meanwhile, drops both hints about Tesla's semi-automated Model S P85D and discusses the motivations behind the science fiction-inspired transport from SpaceX and Tesla, including why it's important for humanity to go to Mars. The two discussions are lengthy at about half an hour each, but they're definitely worthwhile if you want to see what makes key industry figures tick.
--------------------------------------------------------------------------------
Links:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
https://www.youtube.com/watch?v=3xQTJi2tqgk
http://www.pythonforbeginners.com/scripts/imdb-crawler/

