In an experiment to investigate the Stroop effect, a group of students was timed reading out 25 randomly ordered colour names, first in black ink and then in a colour other than the one they name (e.g. the word "red" in blue ink). The results are presented in the text file below, which can be downloaded as stroop.txt. Missing data are indicated by the character X.
Subject Number, Gender, Time (words in black), Time (words in colour)
1,F,18.72,31.11
2,F,21.14,52.47
3,F,19.38,33.92
4,M,22.03,50.57
5,M,21.41,29.63
6,M,15.18,24.86
7,F,14.13,33.63
8,F,19.91,42.39
9,F,X,43.60
10,F,26.56,42.31
11,F,19.73,49.36
12,M,18.47,31.67
13,M,21.38,47.28
14,M,26.05,45.07
15,F,X,X
16,F,15.77,38.36
17,F,15.38,33.07
18,M,17.06,37.94
19,M,19.53,X
20,M,23.29,49.60
21,M,21.30,45.56
22,M,17.12,42.99
23,F,21.85,51.40
24,M,18.15,36.95
25,M,33.21,61.59
We can read in this data with np.genfromtxt and summarize the results with the code below.
import numpy as np
# Read in the data from stroop.txt, identifying missing values and
# replacing them with NaN
data = np.genfromtxt('stroop.txt', skip_header=1,
                     dtype=[('student','u8'), ('gender','S1'),
                            ('black','f8'), ('colour','f8')],
                     delimiter=',',
                     missing_values='X')
nwords = 25
# Remove invalid rows from data set
filtered_data = data[np.isfinite(data['black']) & np.isfinite(data['colour'])]
# Extract rows by gender (M/F) and word colour (black/colour) and normalize
# to time taken per word
fb = filtered_data['black'][filtered_data['gender']==b'F'] / nwords
mb = filtered_data['black'][filtered_data['gender']==b'M'] / nwords
fc = filtered_data['colour'][filtered_data['gender']==b'F'] / nwords
mc = filtered_data['colour'][filtered_data['gender']==b'M'] / nwords
# Produce statistics: mean and standard deviation by gender and word colour
mu_fb, sig_fb = np.mean(fb), np.std(fb)
mu_fc, sig_fc = np.mean(fc), np.std(fc)
mu_mb, sig_mb = np.mean(mb), np.std(mb)
mu_mc, sig_mc = np.mean(mc), np.std(mc)
print('Mean and (standard deviation) times per word (sec)')
print('gender | black | colour | difference')
print(' F | {:4.3f} ({:4.3f}) | {:4.3f} ({:4.3f}) | {:4.3f}'
      .format(mu_fb, sig_fb, mu_fc, sig_fc, mu_fc - mu_fb))
print(' M | {:4.3f} ({:4.3f}) | {:4.3f} ({:4.3f}) | {:4.3f}'
      .format(mu_mb, sig_mb, mu_mc, sig_mc, mu_mc - mu_mb))
In the absence of any provided filling_values, np.genfromtxt replaces the missing entries in the floating-point fields with np.nan, which is why np.isfinite can be used to filter out the incomplete rows.
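The missing-value handling can be checked in isolation on a small sample. The sketch below is illustrative only: it reads three rows copied from stroop.txt (subjects 8 to 10) from an in-memory io.StringIO buffer rather than from the file itself, and the names sample and valid are introduced here purely for the demonstration.

```python
import io
import numpy as np

# Three rows copied from stroop.txt (subjects 8-10), preceded by the header.
# Subject 9 has a missing black-ink time, marked 'X'.
sample = io.StringIO(
    "Subject Number, Gender, Time (words in black), Time (words in colour)\n"
    "8,F,19.91,42.39\n"
    "9,F,X,43.60\n"
    "10,F,26.56,42.31\n"
)

data = np.genfromtxt(sample, skip_header=1,
                     dtype=[('student', 'u8'), ('gender', 'S1'),
                            ('black', 'f8'), ('colour', 'f8')],
                     delimiter=',',
                     missing_values='X')

# The 'X' entry has become NaN in the floating-point 'black' field.
print(data['black'])

# Boolean mask that is True only for rows with both times present.
valid = np.isfinite(data['black']) & np.isfinite(data['colour'])
print(data['student'][valid])
```

Only subjects 8 and 10 survive the mask, since subject 9 lacks one of the two times.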
The output of the full program shows a markedly longer time per word for the false-coloured words than for the words in black, roughly double for both genders:
Mean and (standard deviation) times per word (sec)
gender | black | colour | difference
F | 0.770 (0.137) | 1.632 (0.306) | 0.862
M | 0.849 (0.186) | 1.679 (0.394) | 0.830
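As a quick check on the size of the effect, the ratio of the mean per-word times can be computed directly from the table. This is only a sketch using the rounded means printed above (the dictionary names means and ratios are introduced here for illustration), and it is not a test of statistical significance.

```python
# Mean times per word (s), copied from the output table:
# gender -> (black, colour)
means = {'F': (0.770, 1.632), 'M': (0.849, 1.679)}

ratios = {}
for gender, (black, colour) in means.items():
    # How many times longer a false-coloured word takes than a black one
    ratios[gender] = colour / black
    print('{}: colour/black = {:.2f}'.format(gender, ratios[gender]))
```

Both ratios come out close to 2: reading a false-coloured word takes about twice as long as reading the same word in black.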