Benford's Law

Question P2.4.7

Benford's Law is an observation about the frequencies of the first (leading) digits of the numbers in many different data sets. These digits are frequently found not to be uniformly distributed, but to follow the logarithmic distribution $$ P(d) = \log_{10}\left( \frac{d+1}{d} \right), \quad d = 1, 2, \ldots, 9. $$ That is, numbers starting with 1 are more common than those starting with 2, and so on, with those starting with 9 the least common. The probabilities, to three decimal places, are given below:

Digit   Probability
1       0.301
2       0.176
3       0.125
4       0.097
5       0.079
6       0.067
7       0.058
8       0.051
9       0.046
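
The probabilities listed above can be reproduced directly from the formula. The following is a minimal sketch; the names benford and total are illustrative choices and do not come from the original text.

```python
import math

# Benford's Law: P(d) = log10((d + 1) / d) for leading digits d = 1, ..., 9.
# The dictionary name "benford" is an illustrative choice.
benford = {d: math.log10((d + 1) / d) for d in range(1, 10)}

print("Digit   Probability")
for d, p in benford.items():
    print(f"{d}       {p:.3f}")

# The nine terms telescope to log10(10), so the probabilities sum to 1
# (up to floating-point rounding).
total = sum(benford.values())
print(f"Sum of probabilities: {total:.6f}")
```

Because the terms telescope, $\sum_{d=1}^{9} \log_{10}\left(\frac{d+1}{d}\right) = \log_{10} 10 = 1$, which confirms that the formula defines a valid probability distribution over the nine possible leading digits.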

Benford's Law is most accurate for data sets which span several orders of magnitude, and can be proved to be exact for some infinite sequences of numbers.

(a) Demonstrate that the first digits of the first 500 Fibonacci numbers (see this Example) follow Benford's Law quite closely.

(b) The lengths of the amino acid sequences of 500 randomly chosen proteins are provided in the file protein_lengths.py. This file contains a list, naa, which can be imported at the start of your program with

from protein_lengths import naa

To what extent does the distribution of protein lengths obey Benford's Law?


Solution P2.4.7
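
One possible approach is sketched below; it is not necessarily the author's solution. It tallies the leading digit of each number and prints the observed frequencies next to the Benford probabilities. The function names (benford_probabilities, first_digit_frequencies, fibonacci, compare_with_benford) are hypothetical, and the Fibonacci sequence is assumed to start 1, 1, 2, 3, ..., which may differ from the convention in the referenced Example.

```python
import math
from collections import Counter


def benford_probabilities():
    """Return {digit: P(digit)} according to Benford's Law."""
    return {d: math.log10((d + 1) / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Return {digit: relative frequency} of the leading digits.

    Assumes values is a non-empty sequence of positive integers.
    """
    counts = Counter(int(str(n)[0]) for n in values)
    total = len(values)
    return {d: counts[d] / total for d in range(1, 10)}


def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the start 1, 1."""
    fibs = []
    a, b = 1, 1
    for _ in range(n):
        fibs.append(a)
        a, b = b, a + b
    return fibs


def compare_with_benford(values, label):
    """Print observed and Benford frequencies; return the observed ones."""
    observed = first_digit_frequencies(values)
    expected = benford_probabilities()
    print(label)
    print("Digit  Observed  Benford")
    for d in range(1, 10):
        print(f"{d:>5}  {observed[d]:8.3f}  {expected[d]:7.3f}")
    return observed


# Part (a): leading digits of the first 500 Fibonacci numbers.
fib_freqs = compare_with_benford(fibonacci(500), "First 500 Fibonacci numbers")

# Part (b): with protein_lengths.py available on the import path, the same
# comparison can be applied to the list naa:
# from protein_lengths import naa
# compare_with_benford(naa, "Protein sequence lengths")
```

Converting each integer to a string and taking its first character avoids floating-point logarithms, which matters here because the later Fibonacci numbers are far too large to be represented exactly as floats. When interpreting part (b), recall from the problem statement that Benford's Law is most accurate for data spanning several orders of magnitude.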