Using the database of the UK land registry's record of house sales, we can extract the total number of sales for each day as follows:
import psycopg2 conn = psycopg2.connect(database='house_prices', user='<USERNAME>', password='<PASSWORD>') cursor = conn.cursor() query = ("SELECT date_of_transfer, SUM(1) FROM pp" " GROUP BY 1 ORDER BY 1;") cursor.execute(query) with open('sales_by_day.txt', 'w') as fo: for row in cursor.fetchall(): print(*row, file=fo)
The PostGreSQL statement here sums and orders the sales according to their dates of transfer.
sales_by_day.txt now looks like:
$ head sales_by_day.txt 1995-01-01 56 1995-01-02 65 1995-01-03 981 1995-01-04 1098 1995-01-05 1292 1995-01-06 5054 1995-01-07 30 1995-01-08 74 1995-01-09 1279 1995-01-10 952
We can use Pandas to analyse these data in different ways. For example, to determine the total number of house sales by day of the month:
import numpy as np import pandas as pd import matplotlib.pyplot as plt # Read in the number of house sales each day df = pd.read_table('sales_by_day.txt', sep=' ', names=('date_of_transfer', 'nsales'), parse_dates=) # Aggregate and sum by day of month df['day'] = df.set_index('date_of_transfer').index.day nsales_by_day = df.groupby('day').sum() fig = plt.figure() ax = fig.add_subplot(111) bars = ax.bar(np.arange(31)+1, nsales_by_day['nsales']/1000, align='center', color='#cc88cc') # Highlight the 13th in yellow bars.set_facecolor('#ffff00') ax.set_xlabel('Day of month') ax.set_ylabel('Total number of sales \'000s') ax.set_xlim(0,32) ax.tick_params(length=5, width=2, direction='out',color='#cccccc') ax.yaxis.grid(color='#cccccc', linestyle='-', lw=2) for pos in ax.spines: ax.spines[pos].set_visible(False) ax.set_title('UK house sales by day of month for completion') ax.set_axisbelow(True) plt.show()
By default, the Pandas
read_table function identifies fields as separated by tabs so we need to pass
sep=' ' explicitly. We also need to tell Python to treat the first column as a date (
We can extract the day number and add it as an extra column with
df['day'] = df.set_index('date_of_transfer').index.day
And then take the sum of the sales, grouped by day number with
nsales_by_day = df.groupby('day').sum()
The plotted data show a significant minimum on the 13th. Presumably house-buyers are superstitious.
Share on Twitter Share on Facebook