A structured array of whale species data

Question P6.1.1

Turn the following data concerning various species of cetacean into a NumPy structured array and order it by (a) mass and (b) population. Determine in each case the index at which Bryde's whale (population: 100000, mass: 25 tonnes) should be inserted to keep the array ordered.

Name                          Population   Mass /tonnes
Bowhead whale                       9000   60
Blue whale                         20000   120
Fin whale                         100000   70
Humpback whale                     80000   30
Gray whale                         26000   35
Atlantic white-sided dolphin      250000   0.235
Pacific white-sided dolphin      1000000   0.15
Killer whale                      100000   4.5
Narwhal                            25000   1.5
Beluga                            100000   1.5
Sperm whale                      2000000   50
Baiji                                 13   0.13
North Atlantic right whale           300   75
North Pacific right whale            200   80
Southern right whale                7000   70

A text file containing these data is available as whale-data.txt.


Solution P6.1.1

First define a suitable data type:

In [x]: import numpy as np
In [x]: dt = np.dtype([('Common Name', 'S32'), ('Population', 'i4'),
  ....:                ('Mass', 'f8')])
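It may help to inspect what this data type describes. The following is a minimal sketch, reusing the same field names and type codes; the sample record `rec` is only for illustration. Each record packs a 32-byte string, a 4-byte integer and an 8-byte float, and because 'S32' holds raw bytes (NumPy's Unicode alternative is 'U32'), names come back as Python `bytes` objects.

```python
import numpy as np

dt = np.dtype([('Common Name', 'S32'), ('Population', 'i4'), ('Mass', 'f8')])

# Field names, in declaration order, and the packed size of one record.
print(dt.names)       # ('Common Name', 'Population', 'Mass')
print(dt.itemsize)    # 44  (32 + 4 + 8 bytes, no alignment padding)
print(dt['Mass'])     # float64

# Illustrative record: the 'S32' field stores bytes, so decode for a str.
rec = np.array(('Narwhal', 25000, 1.5), dtype=dt)
print(rec['Common Name'].item())           # b'Narwhal'
print(rec['Common Name'].item().decode())  # Narwhal
```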

and populate a structured array in the arbitrary order given:

In [x]: cetaceans = np.array(
  ....: [
  ....: ('Bowhead whale', 9000, 60),
  ....: ('Blue whale', 20000, 120),
  ....: ('Fin whale', 100000, 70),
  ....: ('Humpback whale', 80000, 30),
  ....: ('Gray whale', 26000, 35),
  ....: ('Atlantic white-sided dolphin', 250000, 0.235),
  ....: ('Pacific white-sided dolphin', 1000000, 0.15),
  ....: ('Killer whale', 100000, 4.5),
  ....: ('Narwhal', 25000, 1.5),
  ....: ('Beluga', 100000, 1.5),
  ....: ('Sperm whale', 2000000, 50),
  ....: ('Baiji', 13, 0.13),
  ....: ('North Atlantic right whale', 300, 75),
  ....: ('North Pacific right whale', 200, 80),
  ....: ('Southern right whale', 7000, 70)
  ....: ], dtype=dt)

(a) Sort the array in place by mass and find the index at which Bryde's whale should be inserted:

In [x]: cetaceans.sort(order='Mass')
In [x]: new_whale = np.array(("Bryde's whale", 100000, 25), dtype=dt)
In [x]: np.searchsorted(cetaceans['Mass'], new_whale['Mass'])
Out[x]: 6
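`ndarray.sort` reorders `cetaceans` in place. If you would rather keep the original order, one possible approach (a sketch, not necessarily the intended solution) is to call `np.argsort` with the same `order` keyword to get a sorting permutation, and then check the answer by actually inserting the new record with `np.insert`. The names `by_mass` and `extended` are illustrative only.

```python
import numpy as np

dt = np.dtype([('Common Name', 'S32'), ('Population', 'i4'), ('Mass', 'f8')])
cetaceans = np.array([
    ('Bowhead whale', 9000, 60), ('Blue whale', 20000, 120),
    ('Fin whale', 100000, 70), ('Humpback whale', 80000, 30),
    ('Gray whale', 26000, 35),
    ('Atlantic white-sided dolphin', 250000, 0.235),
    ('Pacific white-sided dolphin', 1000000, 0.15),
    ('Killer whale', 100000, 4.5), ('Narwhal', 25000, 1.5),
    ('Beluga', 100000, 1.5), ('Sperm whale', 2000000, 50),
    ('Baiji', 13, 0.13), ('North Atlantic right whale', 300, 75),
    ('North Pacific right whale', 200, 80),
    ('Southern right whale', 7000, 70)], dtype=dt)
new_whale = np.array(("Bryde's whale", 100000, 25), dtype=dt)

# argsort returns the permutation that sorts by mass; indexing with it
# makes a sorted copy and leaves the original array untouched.
idx = np.argsort(cetaceans, order='Mass')
by_mass = cetaceans[idx]

i = np.searchsorted(by_mass['Mass'], new_whale['Mass'])

# Insert the new record at position i and confirm that the masses are
# still in non-decreasing order.
extended = np.insert(by_mass, i, new_whale)
assert np.all(np.diff(extended['Mass']) >= 0)
print(i, extended['Common Name'][i])
```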

(b) Similarly, re-order by population and search again:

In [x]: cetaceans.sort(order='Population')
In [x]: np.searchsorted(cetaceans['Population'], new_whale['Population'])
Out[x]: 9
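The population answer is not unique. Three species already listed (Fin whale, Killer whale and Beluga) have a population of 100000, and `np.searchsorted` defaults to `side='left'`, which returns the leftmost valid position. The sketch below, which reuses the data above, compares `side='left'` with `side='right'` and checks that every index in between also keeps the array ordered.

```python
import numpy as np

dt = np.dtype([('Common Name', 'S32'), ('Population', 'i4'), ('Mass', 'f8')])
cetaceans = np.array([
    ('Bowhead whale', 9000, 60), ('Blue whale', 20000, 120),
    ('Fin whale', 100000, 70), ('Humpback whale', 80000, 30),
    ('Gray whale', 26000, 35),
    ('Atlantic white-sided dolphin', 250000, 0.235),
    ('Pacific white-sided dolphin', 1000000, 0.15),
    ('Killer whale', 100000, 4.5), ('Narwhal', 25000, 1.5),
    ('Beluga', 100000, 1.5), ('Sperm whale', 2000000, 50),
    ('Baiji', 13, 0.13), ('North Atlantic right whale', 300, 75),
    ('North Pacific right whale', 200, 80),
    ('Southern right whale', 7000, 70)], dtype=dt)
new_whale = np.array(("Bryde's whale", 100000, 25), dtype=dt)

cetaceans.sort(order='Population')      # in-place, as in the session above
pops = cetaceans['Population']

# Leftmost and rightmost positions at which 100000 can go.
left = np.searchsorted(pops, new_whale['Population'], side='left')
right = np.searchsorted(pops, new_whale['Population'], side='right')

# Every index from left to right inclusive keeps the populations sorted.
for k in range(left, right + 1):
    trial = np.insert(cetaceans, k, new_whale)
    assert np.all(np.diff(trial['Population']) >= 0)

print(left, right)
```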