Design and implement a class, Experiment
, to read in and store a simple series of $(x,y)$ data as NumPy arrays from a text file. Include in your class methods for transforming the data series by some simple function (for example, $x' = \ln x$, $y' = 1/y$) and to perform a linear least-squares regression on the transformed data (returning the gradient and intercept of the best-fit line, $y_\mathrm{fit}' = mx'+c$). NumPy provides methods for performing linear regression (see Section 6.5.3 of the book), but for this exercise the following equations can be implemented directly:
\begin{align*}
m &= \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}\\
c &= \bar{y} - m\bar{x}
\end{align*}
where the bar notation, $\bar{\cdot}$, denotes the arithmetic mean of the quantity under it. (Hint: use np.mean(arr)
to return the mean of array arr
).
Chloroacetic acid is an important compound in the synthetic production of phamaceuticals, pesticides and fuels. At high concentration under strong alkaline conditions its hydrolysis may be considered as the following reaction:
$$
\mathrm{ClCH_2COO^-} + \mathrm{OH^-} \rightleftharpoons \mathrm{HOCH_2COO^-} + \mathrm{Cl^-}.
$$
Data giving the concentration of $\mathrm{ClCH_2COO^-}$, $c$ (in M), as a function of time, $t$ (in s), are provided for this reaction carried out in excess alkalai at five different temperatures in the data files caa-T.txt
(T
= 40
, 50
, 60
, 70
, 80
in $\mathrm{^\circ C}$): these may be obtained from this zip archive. The reaction is known to be second order and so obeys the integrated rate law
$$
\frac{1}{c} = \frac{1}{c_0} + kt
$$
where $k$ is the effective rate constant and $c_0$ the initial ($t=0$) concentration of chloroacetic acid.
Use your Experiment
class to interpret these data by linear regression of $1/c$ against $t$, determining $m(\equiv k)$ for each temperature. Then, for each value of $k$ determine the activation energy of the reaction through a second linear regression of $\ln k$ against $1/T$ in accordance with the Arrhenius law:
$$
k = Ae^{-E_\mathrm{a}/RT} \quad \Rightarrow \quad \ln k = \ln A - \frac{E_\mathrm{a}}{RT},
$$
where $R = 8.314\;\mathrm{J\,K^{-1}\,mol^{-1}}$ is the gas constant.
The Experiment
class is defined in the following program, experiment.py
:
import sys
import numpy as np
import matplotlib.pyplot as plt
class Experiment:
def __init__(self, description):
self.description = description
# The data
self.x, self.y = np.array([]), np.array([])
# The coefficients of the best-fit straight line, y = mx + c
self.m, self.c = None, None
def load_data(self, filename):
""" Load x, y data from two-column text file with header, filename. """
try:
with open(filename, 'r') as fi:
fi.readline() # Skip header
x, y = [], []
for line in fi.readlines():
fields = line.split()
try:
x.append(float(fields[0]))
y.append(float(fields[1]))
except (IndexError, ValueError):
# e.g. blank line, or field does not evaluate to float
print('Corrupt or missing data line:', line)
sys.exit(1)
except FileNotFoundError:
print('File not found:', filename)
sys.exit(1)
# Store the data as arrays to allow vectorization
self.x = np.array(x)
self.y = np.array(y)
def set_data(self, x, y):
self.x = np.array(x)
self.y = np.array(y)
def best_fit_line(self, funcx=lambda x: x, funcy=lambda y: y):
"""
Determine the equation of the line of best fit (which minimises the
root mean squared residual) and return it as m and c where y = mx + c.
If x or y are to be transformed in some way to create a straight line
plot, functions to effect these transformations are provided as funcx
and funcy.
"""
# Apply the transformation functions
self.tx, self.ty = funcx(self.x), funcy(self.y)
# Quantities needed for the least squares regression
txbar, tybar = np.mean(self.tx), np.mean(self.ty)
tx2bar, txtybar = np.mean(self.tx**2), np.mean(self.tx *self.ty)
# gradient and intercept of best fit line
self.m = (txtybar - txbar*tybar) / (tx2bar - txbar**2)
self.c = tybar - self.m * txbar
# Also calculate the coefficient of determination, r2
ty2bar = np.mean(self.ty**2)
r2 = (txtybar - txbar*tybar)**2/(tx2bar - txbar**2)/(ty2bar - tybar**2)
return self.m, self.c, r2
def plot_data(self, best_fit=False, fig_filename=None):
"""
Plot the data on a scatter plot, along with the best-fit straight
line, if requested. Save the figure to fig_filename if not None,
otherwise show the plot interactively.
"""
if best_fit:
# Plot the transformed data and best fit line
if self.c is None or self.m is None:
self.best_fit_line()
plt.scatter(self.tx, self.ty)
xfit = np.array([min(self.tx), max(self.tx)])
yfit = self.m * xfit + self.c
plt.plot(xfit, yfit, 'k', lw=2)
else:
# Just plot the untransformed data
plt.scatter(self.x, self.y)
if fig_filename:
plt.savefig(fig_filename)
else:
plt.show()
To test it with the provided data, run experiment_test.py
:
import numpy as np
import matplotlib as plt
from experiment import Experiment
R = 8.314 # Gas constant, J.K-1.mol-1
def description(T):
return 'Rate of Chloroacetic acid hydrolysis at {} degC'.format(T)
temperatures = (40, 50, 60, 70, 80)
expts = []
for T in temperatures:
filename = 'caa-{}.txt'.format(T)
expt = Experiment(description(T))
expt.load_data(filename)
expt.best_fit_line(funcy=lambda c: 1/c)
expts.append(expt)
print('k = {:9.3e} M-1.s-1'.format(expt.m))
#expts[4].plot_data(True)
TK = np.array(temperatures) + 273.15
k = np.array([expt.m for expt in expts])
arrhenius = Experiment('Arrhenius plot for Chloroacetic acid hydrolysis')
arrhenius.set_data(TK, k)
print(arrhenius.x)
print(arrhenius.y)
m, c, r2 = arrhenius.best_fit_line(funcx=lambda T: 1/T,
funcy=lambda k: np.log(k))
A, Ea = np.exp(c), -m * R
print('A = {:10.3e} M-1.s-1, Ea = {:5.1f} kJ.mol-1'.format(A, Ea/1000))
arrhenius.plot_data(True)