A certain lottery involves players selecting six numbers without replacement from the range [1,49]. The jackpot is shared among the players who match, in any order, all six numbers ("balls") selected at random in the same way in a twice-weekly draw. If no player matches every drawn number, the jackpot "rolls over" and is added to the following draw's jackpot.
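To put the rules above in numbers, the following minimal sketch counts the distinct six-number selections and the chance that a single ticket matches all six balls. It assumes only the 6-from-49 format described above; the variable names are illustrative.

```python
from math import comb

# Number of distinct six-number selections from 49 (order does not matter).
n_combinations = comb(49, 6)

# Probability that one ticket matches all six drawn balls.
p_jackpot = 1 / n_combinations

print(n_combinations)        # 13983816
print(f"{p_jackpot:.3e}")
```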
Although the lottery is fair in the sense that every combination of drawn numbers is equally likely, it has been observed that many players prefer certain numbers, such as those which can represent dates (that is, more of their numbers are chosen from [1,31] than would be expected if they chose at random). To reduce the chance of sharing the jackpot, and so to maximise one's expected winnings, it would therefore be reasonable to avoid these numbers.
Test this hypothesis by establishing whether there is any correlation between the number of balls with values less than 13 (representing a month) and the jackpot winnings per person. Ignore draws immediately following a rollover, since their jackpots are inflated by the amount carried over. The necessary data can be downloaded here.
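As background for this test, it may help to know how many "low" balls a fair draw is expected to produce. The sketch below assumes that each draw is a uniform random selection of 6 balls from 49, of which 12 (the values 1 to 12) count as low, so that the count follows a hypergeometric distribution. The names `p_low`, `probs` and `mean_low` are illustrative and are not part of the solution code.

```python
from math import comb

N_BALLS, N_DRAWN, N_LOW = 49, 6, 12   # balls 1..12 count as "low" (< 13)

def p_low(k):
    """Probability that exactly k of the six drawn balls are below 13."""
    return (comb(N_LOW, k) * comb(N_BALLS - N_LOW, N_DRAWN - k)
            / comb(N_BALLS, N_DRAWN))

# Distribution of the number of low balls in a single draw.
probs = [p_low(k) for k in range(N_DRAWN + 1)]

# Mean of the distribution; analytically this is 6 * 12 / 49.
mean_low = sum(k * p for k, p in enumerate(probs))

for k, p in enumerate(probs):
    print(f"{k} low balls: {p:.4f}")
print(f"expected number of low balls: {mean_low:.3f}")
```

Draws with many low balls are therefore uncommon, so any correlation estimated from them rests on relatively few data points.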
The following code reports a weak but suggestive anti-correlation (r ≈ -0.2) between the number of "low" (< 13) balls drawn and the jackpot winnings per person in non-rollover draws. This is consistent with the jackpot being shared among more players when low numbers are drawn, that is, with players being more likely to share the jackpot if they pick low numbers.
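import numpy as np

def parse_line(fi):
    """Yield (nlow, jackpot_share) for each usable draw in the open file fi."""
    # Skip the two header rows.
    fi.readline()
    fi.readline()
    rollover = False
    for line in fi:
        fields = line.split()
        nwinners = int(fields[6])
        if nwinners == 0:
            # No jackpot winner: the prize rolls over into the next draw.
            rollover = True
            continue
        if rollover:
            # This draw's jackpot includes rolled-over money, so ignore it.
            rollover = False
            continue
        balls = np.array([int(v) for v in fields[:6]])
        jackpot_share = float(fields[7])
        # Count the balls that could represent a month (1-12).
        nlow = np.sum(balls < 13)
        yield nlow, jackpot_share

with open('lottery-draws.txt') as fi:
    data = list(parse_line(fi))
# Columns: number of low balls, jackpot winnings per person.
data = np.array(data)
# rowvar=False: each column is a variable and each row an observation.
print(np.corrcoef(data, rowvar=False))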
Output:
[[ 1.         -0.19909853]
 [-0.19909853  1.        ]]
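A correlation coefficient of this size says little on its own about whether the effect could have arisen by chance. One way to probe this, which is not part of the solution above, is a permutation test: shuffle the winnings many times and see how often a correlation at least as strong appears. The sketch below is a minimal version under stated assumptions; `permutation_pvalue` is a hypothetical helper, and the demonstration arrays are synthetic stand-ins, not the lottery results.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Shuffling y breaks any real association with x; recompute r each time.
    r_perm = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                       for _ in range(n_perm)])
    # Fraction of shuffles at least as extreme as the observed correlation
    # (the +1 terms count the observed arrangement itself).
    p = (np.sum(np.abs(r_perm) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p

# Synthetic stand-in data (NOT the lottery results), built with a
# negative trend purely to exercise the function.
rng = np.random.default_rng(1)
nlow_demo = rng.integers(0, 5, size=200)
share_demo = 5.0 - 0.5 * nlow_demo + rng.normal(0.0, 2.0, size=200)

r, p = permutation_pvalue(nlow_demo, share_demo, n_perm=2000)
print(f"r = {r:.3f}, p = {p:.4f}")
```

With the real data, the two columns of the `data` array built above (`data[:, 0]` and `data[:, 1]`) would be passed in place of the synthetic arrays.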