Viewing posts by christian
In my previous post I gave a short script for scraping a particular Wikipedia page for some string-based data in one table. Then the internet had some advice for me.
Use pandas.read_html
they said. It will be easy, they said; everything will be handled for you, they said. Just clean, analyse and report.
The Beautiful Soup Python library is an excellent way to scrape web pages for their content. I recently wanted a reasonably accurate list of the official (ISO 3166-1) two-letter country codes, but didn't want to pay CHF 38 for the official ISO document. The Wikipedia page on ISO 3166-1 alpha-2 contains this information in an HTML table, which can be scraped quite easily as follows.
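A minimal sketch of the idea, not necessarily the script from the post itself: it parses a small, made-up HTML snippet rather than fetching the live page. `SAMPLE_HTML`, `parse_country_codes` and the assumed column order (code in the first cell, country name in the second) are all illustrative; the real page's table layout may differ and should be checked before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Code</th><th>Country name</th></tr>
  <tr><td>CH</td><td>Switzerland</td></tr>
  <tr><td>DE</td><td>Germany</td></tr>
  <tr><td>FR</td><td>France</td></tr>
</table>
"""


def parse_country_codes(html):
    """Return a dict mapping two-letter codes to country names.

    Assumes (for illustration) that each data row holds the code in its
    first <td> cell and the country name in its second.
    """
    soup = BeautifulSoup(html, "html.parser")
    codes = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            # Header rows use <th>, so they have no <td> cells: skip them.
            continue
        code = cells[0].get_text(strip=True)
        name = cells[1].get_text(strip=True)
        # Keep only rows whose first cell looks like a two-letter code.
        if len(code) == 2 and code.isalpha() and code.isupper():
            codes[code] = name
    return codes


if __name__ == "__main__":
    for code, name in sorted(parse_country_codes(SAMPLE_HTML).items()):
        print(code, name)
```

To run this against the real page, the HTML string would first have to be downloaded (for example with `urllib.request` or `requests`) and passed to the same function.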
Most of the world's currently operational nuclear reactors were built in the 1970s and 80s and, with an expected lifespan of 30–40 years, are coming to the end of their predicted lifetime. Many will need to be decommissioned in the not-too-distant future.
Bertrand's Paradox illustrates that the mechanism for picking something "at random" must be defined carefully before the associated probability is well-defined.
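The classic form of the paradox asks for the probability that a "random" chord of a circle is longer than the side of the inscribed equilateral triangle. The following is a rough Monte Carlo sketch, assuming a unit circle and three common ways of choosing the chord; the function names are my own for illustration. The three estimates should come out near 1/3, 1/2 and 1/4 respectively, which is the whole point: the answer depends on the mechanism.

```python
import math
import random

# Side length of the equilateral triangle inscribed in a unit circle.
SIDE = math.sqrt(3)


def chord_random_endpoints(rng):
    """Chord joining two points chosen uniformly on the circumference."""
    a = rng.uniform(0, 2 * math.pi)
    b = rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))


def chord_random_radial_point(rng):
    """Chord perpendicular to a radius, through a point chosen uniformly along it."""
    d = rng.random()  # distance of the chord's midpoint from the centre
    return 2 * math.sqrt(1 - d * d)


def chord_random_midpoint(rng):
    """Chord whose midpoint is chosen uniformly over the area of the disc."""
    # For a uniform point in the disc, the squared distance from the
    # centre is itself uniform on [0, 1).
    d_squared = rng.random()
    return 2 * math.sqrt(1 - d_squared)


def estimate(method, trials=100_000, seed=0):
    """Estimate the fraction of chords longer than the triangle's side."""
    rng = random.Random(seed)
    hits = sum(method(rng) > SIDE for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for method in (chord_random_endpoints,
                   chord_random_radial_point,
                   chord_random_midpoint):
        print(method.__name__, estimate(method))
```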