At the time of writing, the first table on the Wikipedia page https://en.wikipedia.org/wiki/List_of_wine-producing_regions contains columns of the rank, country name and wine production for the principal wine-producing countries in the world. To parse it with pandas:
In [x]: import pandas as pd

In [x]: dfs = pd.read_html(
   ...:     'https://en.wikipedia.org/wiki/List_of_wine-producing_regions',
   ...:     index_col=1, match="Wine production by country")

In [x]: dfs[0].head()
Out[x]:
                                    Rank  Production(tonnes)
Country(with link to wine article)
Italy                                  1             4796900
France                                 2             4607850
Spain                                  3             4293466
United States                          4             3300000
China                                  5             1700000
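In this case, the table is identified by the match argument, a string or regular expression that must match text somewhere inside the table; here it matches the text of the <caption> element of the first <table> on the page. pd.read_html always returns a list of DataFrames, even when only one table matches, so dfs is a list containing a single item, the DataFrame parsed from the matching table, which is accessed as dfs[0].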
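
Because the live Wikipedia page may change, the same behaviour can be checked offline. The following is a minimal sketch, assuming an HTML parser that pandas can use (for example lxml) is installed. The HTML string is a hypothetical stand-in for the real page: the second table reuses the caption and the first three rows shown above, and the first table is invented only to show that match filters it out.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page (not the real Wikipedia markup).
# Only the second table contains text matching the caption we search for.
html = """
<table>
  <caption>Some unrelated table</caption>
  <thead><tr><th>A</th><th>B</th></tr></thead>
  <tbody><tr><td>1</td><td>2</td></tr></tbody>
</table>
<table>
  <caption>Wine production by country</caption>
  <thead>
    <tr><th>Rank</th><th>Country</th><th>Production (tonnes)</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Italy</td><td>4796900</td></tr>
    <tr><td>2</td><td>France</td><td>4607850</td></tr>
    <tr><td>3</td><td>Spain</td><td>4293466</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per table whose text matches.
# index_col=1 makes the second column (Country) the index, as in the text.
dfs = pd.read_html(StringIO(html), match="Wine production by country",
                   index_col=1)

df = dfs[0]                        # the single matching table
n_tables = len(dfs)                # number of tables that matched
france_rank = df.loc["France", "Rank"]

print(n_tables)
print(df)
```

Wrapping the string in StringIO rather than passing it directly avoids the deprecation warning that recent pandas versions issue for literal HTML strings.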