The following text file, available here as vitamins.txt, contains data concerning 13 vitamins important for human health.
List of vitamins, their solubility (in fat or water) and recommended dietary
allowances for men / women.
Data from the US Food and Nutrition Board, Institute of Medicine, National
Academies
Vitamin A Fat 900ug/700ug
Vitamin B1 Water 1.2mg/1.1mg
Vitamin B2 Water 1.3mg/1.1mg
Vitamin B3 Water 16mg/14mg
Vitamin B5 Water 5mg
Vitamin B6 Water 1.5mg/1.4mg
Vitamin B7 Water 30ug
Vitamin B9 Water 400ug
Vitamin B12 Water 2.4ug
Vitamin C Water 90mg/75mg
Vitamin D Fat 15ug
Vitamin E Fat 15mg
Vitamin K Fat 110ug/120ug
--- Data for guidance only, consult your physician ---
The recommended (daily) dietary allowances in the final column are given in one of two units, µg (written ug in the file) or mg; sometimes the values differ for men and women. To parse this column into a single value in µg, averaged over the values for men and women where both are given, we can pass a converter function to pd.read_csv, as in the following code.
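import pandas as pd

def average_rda_in_micrograms(col):
    """Return the mean of the '/'-separated RDA values in col, in µg."""
    def ensure_micrograms(s):
        # Strip the two-character unit suffix, converting mg to µg if needed.
        if s.endswith('ug'):
            return float(s[:-2])
        elif s.endswith('mg'):
            return float(s[:-2]) * 1000
        raise ValueError(f'Unrecognised units in {s}')

    fields = col.split('/')
    return sum(ensure_micrograms(s) for s in fields) / len(fields)

# sep=r'\s+' splits fields on runs of whitespace (delim_whitespace=True is
# deprecated in recent pandas); skipfooter needs the Python parsing engine.
df = pd.read_csv('vitamins.txt', sep=r'\s+', skiprows=4,
                 skipfooter=1, header=None, usecols=(1, 2, 3),
                 converters={'RDA': average_rda_in_micrograms},
                 names=['Vitamin', 'Solubility', 'RDA'],
                 index_col=0, engine='python'
                )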
In this code, the four header rows and one footer row are skipped (blank lines are skipped automatically); the Index is set to the first used column (index_col=0, identifying the vitamin). The converter function averages the numerical values encountered (after conversion to µg), where multiple values are assumed to be separated by a solidus character (/).
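The converter's behaviour can be checked in isolation before it is handed to pandas. The following is a minimal sketch that repeats the function from the code above so that it runs on its own, and applies it to two entries taken from the file. The string '5g' is a made-up input, not present in vitamins.txt, used only to show what happens with an unrecognised unit.

```python
def average_rda_in_micrograms(col):
    """Return the mean of the '/'-separated RDA values in col, in micrograms."""
    def ensure_micrograms(s):
        # Strip the two-character unit suffix, converting mg to ug if needed.
        if s.endswith('ug'):
            return float(s[:-2])
        elif s.endswith('mg'):
            return float(s[:-2]) * 1000
        raise ValueError(f'Unrecognised units in {s}')

    fields = col.split('/')
    return sum(ensure_micrograms(s) for s in fields) / len(fields)

# Two values (men / women) in micrograms are averaged.
print(average_rda_in_micrograms('900ug/700ug'))   # 800.0

# A single value in milligrams is converted to micrograms.
print(average_rda_in_micrograms('5mg'))           # 5000.0

# Hypothetical input with a unit the converter does not know about.
try:
    average_rda_in_micrograms('5g')
except ValueError as err:
    print(err)                                    # Unrecognised units in 5g
```

Raising an exception for an unknown unit, rather than returning a default value, means that a malformed entry in the file stops the read instead of silently putting a wrong number in the RDA column.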