Learning Scientific Programming with Python (2nd edition)

E6.19: Simulating coin-tosses

In a famous experiment, a group of volunteers is asked to toss a fair coin 100 times and note down the result of each toss (heads, H, or tails, T). It is generally easy to spot the participants who fake the results by writing down what they think is a random sequence of Hs and Ts instead of actually tossing the coin, because they tend not to include as many "streaks" of repeated results as would be expected by chance.
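
To get a feel for how long a streak "would be expected by chance", the experiment can be repeated many times and the longest run recorded each time. The following is only a minimal sketch: the helper name longest_run, the seed and the number of trials are arbitrary choices made for this illustration and are not part of the original example.

```python
import numpy as np

def longest_run(tosses):
    """Return the length of the longest run of identical values in a 1-D sequence."""
    tosses = np.asarray(tosses)
    # Indices at which the outcome differs from the previous toss.
    change_idx = np.flatnonzero(tosses[1:] != tosses[:-1]) + 1
    # Run boundaries: start of the sequence, each change point, end of the sequence.
    boundaries = np.concatenate(([0], change_idx, [len(tosses)]))
    return int(np.diff(boundaries).max())

# Seed chosen arbitrarily so that the sketch is reproducible.
rng = np.random.default_rng(seed=42)
n_tosses, n_trials = 100, 10_000
# Longest streak (of either heads or tails) in each simulated experiment.
longest = np.array(
    [longest_run(rng.integers(2, size=n_tosses)) for _ in range(n_trials)]
)
print(f"Mean longest streak: {longest.mean():.2f}")
print(f"Fraction of trials with a streak of 5 or more: {(longest >= 5).mean():.3f}")
```

A hand-written sequence whose longest streak falls well below the typical value printed here is a candidate for having been faked.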

If they had access to a Python interpreter, here's how they could generate a more plausibly random set of results:

In [x]: import numpy as np
In [x]: res = ['H', 'T']
In [x]: tosses = ''.join([res[i] for i in np.random.randint(2, size=100)])
In [x]: tosses
Out[x]: 'TTHHTHHTTHHHTHTTHHHTHHTHTTHHTHHTTTTHHHHHHHHTTTHTTHHHHHHHTHHHTHHHH
THTTTHTTHHHHTHTTTTHTTTHTHHTTHHHHHHH'

This virtual experiment features a run of 8 heads in a row, and two runs of 7 heads in a row:

  TAILS    | i |   HEADS   
---------------------------
           | 8 | *         
           | 7 | **        
           | 6 |           
           | 5 |           
        ** | 4 | **        
       *** | 3 | ***       
   ******* | 2 | ******    
********** | 1 | ********

(This figure was produced by the following code.)

import numpy as np

toss_results = ["H", "T"]
# Simulate N coin tosses.
N = 100
tosses = "".join([toss_results[i] for i in np.random.randint(2, size=N)])
# Extract the sequences of consecutive heads and tails.
head_seq = [len(s) for s in tosses.split("T") if s]
tail_seq = [len(s) for s in tosses.split("H") if s]
# How long is the maximum run-length for either heads or tails?
max_streak_len = max(max(head_seq), max(tail_seq))
# Count the numbers of each streak-length for both heads and tails.
head_seq_counts = [head_seq.count(i) for i in range(1, max_streak_len + 1)]
tail_seq_counts = [tail_seq.count(i) for i in range(1, max_streak_len + 1)]
# What is the maximum count across all streak lengths?
max_streak_count = max(max(head_seq_counts), max(tail_seq_counts))

# Print a header
print(
    "{t:^{flen}} | i | {h:^{flen}}".format(
        t="TAILS", h="HEADS", flen=max_streak_count
    )
)
print("-" * (max_streak_count * 2 + 7))
# Summarize the streak lengths for both heads and tails.
for i in range(max_streak_len, 0, -1):
    print(
        "{tstreak:>{flen}s} |{i:^3d}| {hstreak:<{flen}s}".format(
            tstreak="*" * tail_seq_counts[i - 1],
            hstreak="*" * head_seq_counts[i - 1],
            flen=max_streak_count,
            i=i,
        )
    )
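
The split-based extraction above relies on there being exactly two outcomes, and max() would raise a ValueError in the (very unlikely) event that one of them never appears. As an alternative, here is a minimal sketch that counts streaks with itertools.groupby and collections.Counter; the function name streak_counts and the short test string are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby

def streak_counts(tosses):
    """Map each outcome to a Counter of {streak length: number of streaks}."""
    counts = {}
    # groupby yields one (outcome, run) pair per block of consecutive equal items.
    for outcome, run in groupby(tosses):
        counts.setdefault(outcome, Counter())[len(list(run))] += 1
    return counts

# Runs in this string: TT, HH, T, HHH, T
counts = streak_counts("TTHHTHHHT")
print("H:", dict(sorted(counts["H"].items())))  # H: {2: 1, 3: 1}
print("T:", dict(sorted(counts["T"].items())))  # T: {1: 2, 2: 1}
```

Because a Counter returns 0 for missing keys, counts["H"][i] can be used directly for every streak length i when printing the table, without building the intermediate count lists.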