At the time of writing, the first table on the Wikipedia page https://en.wikipedia.org/wiki/List_of_wine-producing_regions contains columns of the rank, country name and wine production for the principal wine-producing countries in the world. To parse it with pandas:
In [x]: import pandas as pd
In [x]: dfs = pd.read_html(
            'https://en.wikipedia.org/wiki/List_of_wine-producing_regions',
            index_col=1, match="Wine production by country")
In [x]: dfs[0].head()
Out[x]:
                                    Rank  Production(tonnes)
Country(with link to wine article)
Italy                                  1             4796900
France                                 2             4607850
Spain                                  3             4293466
United States                          4             3300000
China                                  5             1700000
In this case, the table is identified by a match to the text inside the <caption> element of the first <table> on the page. dfs is a list containing a single item, the DataFrame parsed from the matching table.
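Because the live Wikipedia page may change, the same behaviour can be sketched offline. The example below is only an illustration: the HTML string is a made-up miniature page (not the real article), with two tables of which only the second carries the caption of interest, and it assumes that one of the HTML parsers pandas relies on (lxml, or BeautifulSoup with html5lib) is installed. Note that match is interpreted as a regular expression and is tested against the text of each table, so it succeeds here because the caption is part of that text.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a web page: two tables, only the second of which
# has the caption we want. The two data rows reuse figures from the output above.
html = """
<table>
  <caption>Some other table</caption>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
<table>
  <caption>Wine production by country</caption>
  <tr><th>Rank</th><th>Country</th><th>Production (tonnes)</th></tr>
  <tr><td>1</td><td>Italy</td><td>4796900</td></tr>
  <tr><td>2</td><td>France</td><td>4607850</td></tr>
</table>
"""

# Wrap the string in StringIO so that read_html treats it as a file-like object.
# Only tables whose text matches the pattern are returned; index_col=1 makes
# the second column (Country) the index of the resulting DataFrame.
dfs = pd.read_html(StringIO(html), match="Wine production by country",
                   index_col=1)
df = dfs[0]

print(len(dfs))   # number of tables that matched
print(df)
```

Since read_html always returns a list, even a single matching table has to be retrieved by indexing, as with dfs[0] here.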