Using converter functions to read data files into pandas DataFrames

The following text file, available here as vitamins.txt, contains data concerning 13 vitamins important for human health.

List of vitamins, their solubility (in fat or water) and recommended dietary
allowances for men / women.
Data from the US Food and Nutrition Board, Institute of Medicine, National
Academies

Vitamin A   Fat     900ug/700ug

Vitamin B1  Water   1.2mg/1.1mg
Vitamin B2  Water   1.3mg/1.1mg
Vitamin B3  Water   16mg/14mg
Vitamin B5  Water   5mg
Vitamin B6  Water   1.5mg/1.4mg
Vitamin B7  Water   30ug
Vitamin B9  Water   400ug
Vitamin B12 Water   2.4ug

Vitamin C   Water   90mg/75mg
Vitamin D   Fat     15ug
Vitamin E   Fat     15mg
Vitamin K   Fat     110ug/120ug
--- Data for guidance only, consult your physician ---

The recommended (daily) dietary allowances in the final column are given in either µg (written ug in the file) or mg; where the allowance differs for men and women, both values are listed, separated by a solidus (/). To parse this column into a single average value in µg, we can pass a converter function to pd.read_csv, as in the following code.

import pandas as pd

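def average_rda_in_micrograms(col):
    """Return the mean of the '/'-separated quantities in col, in micrograms."""
    def ensure_micrograms(s):
        # Strip the two-character unit suffix and scale mg to ug.
        if s.endswith('ug'):
            return float(s[:-2])
        elif s.endswith('mg'):
            return float(s[:-2]) * 1000
        raise ValueError(f'Unrecognised units in {s}')
    # A single value (no '/') gives a one-element list, so the mean is that value.
    fields = col.split('/')
    return sum(ensure_micrograms(s) for s in fields) / len(fields)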

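# sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2;
# skipfooter is not supported by the C parser, so the Python engine is requested.
df = pd.read_csv('vitamins.txt', sep=r'\s+', engine='python', skiprows=4,
                 skipfooter=1, header=None, usecols=(1, 2, 3),
                 converters={'RDA': average_rda_in_micrograms},
                 names=['Vitamin', 'Solubility', 'RDA'],
                 index_col=0
                )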

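In this code, the four header rows and one footer row are skipped (blank lines are skipped automatically). Splitting on whitespace breaks each data line into four fields, the first of which is always the word "Vitamin", so usecols=(1, 2, 3) keeps only the vitamin's letter code, its solubility and its RDA. The index is set to the first of these used columns (index_col=0), which identifies the vitamin. The converter function converts each value it encounters to µg and averages them, assuming that multiple values are separated by a solidus character (/); an entry with a single value is simply returned in µg.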
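As a quick check of the behaviour described above, the following sketch applies the same converter to a few individual strings and then reads an abbreviated copy of the data from an in-memory buffer instead of from vitamins.txt. The SAMPLE string and the use of io.StringIO are assumptions made purely for illustration. The call also uses sep=r'\s+' in place of delim_whitespace=True (deprecated in recent pandas releases) and names engine='python' explicitly, since skipfooter is not supported by the C parser.

```python
import io

import pandas as pd

# Abbreviated stand-in for vitamins.txt (assumption: same layout as the real file).
SAMPLE = """\
List of vitamins, their solubility (in fat or water) and recommended dietary
allowances for men / women.
Data from the US Food and Nutrition Board, Institute of Medicine, National
Academies

Vitamin A   Fat     900ug/700ug

Vitamin B1  Water   1.2mg/1.1mg
Vitamin B5  Water   5mg
Vitamin B12 Water   2.4ug
Vitamin K   Fat     110ug/120ug
--- Data for guidance only, consult your physician ---
"""


def average_rda_in_micrograms(col):
    """Return the mean of the '/'-separated quantities in col, in micrograms."""
    def ensure_micrograms(s):
        if s.endswith('ug'):
            return float(s[:-2])
        elif s.endswith('mg'):
            return float(s[:-2]) * 1000
        raise ValueError(f'Unrecognised units in {s}')
    fields = col.split('/')
    return sum(ensure_micrograms(s) for s in fields) / len(fields)


# The converter can be tested on its own, without pandas.
print(average_rda_in_micrograms('900ug/700ug'))   # 800.0
print(average_rda_in_micrograms('5mg'))           # 5000.0

# Same arguments as for the file on disk, but reading from the string buffer.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+', engine='python',
                 skiprows=4, skipfooter=1, header=None, usecols=(1, 2, 3),
                 converters={'RDA': average_rda_in_micrograms},
                 names=['Vitamin', 'Solubility', 'RDA'],
                 index_col=0)
print(df)
```

Because the converter raises ValueError for any unit other than ug or mg, a malformed entry in the final column stops the read instead of silently producing a wrong number.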