Learning Scientific Programming with Python (2nd edition)
E6.19: Simulating coin-tosses
In a famous experiment, a group of volunteers are asked to toss a fair coin 100 times and note down the result of each toss (heads, H, or tails, T). It is generally easy to spot the participants who fake their results by writing down what they think is a random sequence of Hs and Ts instead of actually tossing the coin: they tend not to include as many "streaks" of repeated results as would be expected by chance.
If they had access to a Python interpreter, here's how they could generate a more plausibly random set of results:
In [x]: import numpy as np
In [x]: res = ['H', 'T']
In [x]: tosses = ''.join([res[i] for i in np.random.randint(2, size=100)])
In [x]: tosses
Out[x]: 'TTHHTHHTTHHHTHTTHHHTHHTHTTHHTHHTTTTHHHHHHHHTTTHTTHHHHHHHTHHHTHHHHTHTTTHTTHHHHTHTTTTHTTTHTHHTTHHHHHHH'
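This virtual experiment features one run of 8 heads in a row and two runs of 7 heads in a row. The number of runs of each length, i, is tallied below for tails (left) and heads (right), with one * per run: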
  TAILS    | i |   HEADS
---------------------------
           | 8 | *
           | 7 | **
           | 6 |
           | 5 |
        ** | 4 | **
       *** | 3 | ***
   ******* | 2 | ******
********** | 1 | ********
(This figure was produced by the following code.)
import numpy as np
toss_results = ["H", "T"]
# Simulate N tosses of a fair coin.
N = 100
tosses = "".join([toss_results[i] for i in np.random.randint(2, size=N)])
# Extract the lengths of the runs of consecutive heads and tails.
head_seq = [len(s) for s in tosses.split("T") if s]
tail_seq = [len(s) for s in tosses.split("H") if s]
# What is the longest run of either heads or tails?
# (default=0 guards against a sequence containing no heads or no tails.)
max_streak_len = max(max(head_seq, default=0), max(tail_seq, default=0))
# Count the number of runs of each length, for both heads and tails.
head_seq_counts = [head_seq.count(i) for i in range(1, max_streak_len + 1)]
tail_seq_counts = [tail_seq.count(i) for i in range(1, max_streak_len + 1)]
# What is the maximum count across all run lengths? This sets the column width.
max_streak_count = max(max(head_seq_counts), max(tail_seq_counts))
# Print a header with each label centered in a column max_streak_count wide.
print(
    "{t:^{flen}} | i | {h:^{flen}}".format(
        t="TAILS", h="HEADS", flen=max_streak_count
    )
)
print("-" * (max_streak_count * 2 + 7))
# Print one row per run length, longest first: tails right-aligned on the
# left, heads left-aligned on the right, one * per run.
for i in range(max_streak_len, 0, -1):
    print(
        "{tstreak:>{flen}s} |{i:^3d}| {hstreak:<{flen}s}".format(
            tstreak="*" * tail_seq_counts[i - 1],
            hstreak="*" * head_seq_counts[i - 1],
            flen=max_streak_count,
            i=i,
        )
    )
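As an independent cross-check of the split-based counting above, run lengths can also be tallied with itertools.groupby, which yields one group per maximal run of equal characters. This is a sketch, not the book's method: the helper run_length_counts is a hypothetical name, and the sequence is generated with NumPy's Generator.choice rather than by indexing with randint.

```python
from collections import Counter
from itertools import groupby

import numpy as np


def run_length_counts(tosses):
    """Tally run lengths separately for heads ('H') and tails ('T').

    Hypothetical helper written for this sketch; returns a dict mapping
    'H' and 'T' to Counters of {run length: number of runs}.
    """
    counts = {"H": Counter(), "T": Counter()}
    # groupby yields one (face, group) pair per maximal run of equal characters.
    for face, run in groupby(tosses):
        counts[face][len(list(run))] += 1
    return counts


# Assumption: Generator.choice is used here instead of indexing with randint.
rng = np.random.default_rng()
tosses = "".join(rng.choice(["H", "T"], size=100))
counts = run_length_counts(tosses)

print(tosses)
for face in "TH":
    print(face, sorted(counts[face].items()))
```

Applied to the 100-toss sequence from the IPython session, it reproduces the tallies in the table: for example, one run of 8 heads, two runs of 7 heads and ten runs of a single tail.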