# The probability of cleaving DNA with EcoRI

The endonuclease EcoRI is a restriction enzyme that cuts DNA at the nucleotide sequence GAATTC. Suppose a given DNA molecule contains 12000 base pairs and has a G+C content of 50%. The Poisson distribution can be used to predict the probability that EcoRI will fail to cleave this molecule, as follows.

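The recognition site, GAATTC, is six base pairs long. A G+C content of 50% means that, if the bases occur independently, each of A, C, G and T appears at any given position with probability 1/4. The probability that a given six-base stretch is GAATTC is therefore $1/4^6 = 1/4096$. Because there are many possible positions and each has only a small chance of being a site, the number of sites is approximately Poisson distributed with mean $\lambda = 12000/4096 \approx 2.93$. The probability that the endonuclease fails to cleave the molecule is the probability that it contains no sites at all: $$P(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-\lambda} \approx 0.053,$$ or about 5.3%. To check this stochastically, draw a large number of samples from a Poisson distribution with mean $\lambda$ and count the fraction that are zero: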

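```python
In [x]: import numpy as np
In [x]: lam = 12000 / 4**6    # expected number of GAATTC sites
In [x]: N = 100000            # number of simulated molecules
In [x]: np.sum(np.random.poisson(lam, N) == 0) / N   # fraction with no sites
Out[x]: 0.053699999999999998
```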
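The NumPy session above samples from the Poisson distribution, so it assumes the Poisson model rather than testing it. The minimal sketch below offers a cross-check: it evaluates $P(0)$ directly and also simulates the underlying process, generating random 12000-base sequences and counting how many contain no GAATTC site. It assumes, as the text does, that the four bases are independent and equally likely. The helper names `poisson_pmf` and `fraction_uncut`, the seed and the number of simulated molecules are illustrative choices, not part of the original. Note that a 12000-base sequence actually has 11995 six-base windows, so $\lambda = 12000/4096$ slightly overestimates the expected count. The difference is negligible here.

```python
import math
import random

# Assumption: bases are independent and equally likely (A, C, G, T each
# with probability 1/4), the simplest model consistent with 50% G+C.
SITE = "GAATTC"
LENGTH = 12000  # base pairs in the molecule


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def fraction_uncut(n_molecules, length=LENGTH, site=SITE, seed=42):
    """Fraction of random sequences of `length` bases containing no `site`.

    GAATTC is its own reverse complement, so scanning one strand finds
    every EcoRI site on either strand.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    uncut = 0
    for _ in range(n_molecules):
        seq = "".join(rng.choices("ACGT", k=length))
        if site not in seq:
            uncut += 1
    return uncut / n_molecules


lam = LENGTH / 4 ** len(SITE)  # expected number of sites, as in the text
p0 = poisson_pmf(0, lam)       # Poisson probability of no cleavage
sim = fraction_uncut(2000)     # direct simulation of random sequences

print(f"lambda             = {lam:.4f}")
print(f"Poisson P(0)       = {p0:.4f}")
print(f"simulated fraction = {sim:.4f}")
```

This prints $\lambda \approx 2.9297$ and $P(0) \approx 0.0534$. The simulated fraction depends on the seed and the number of molecules, but it should fall close to 5%. Generating whole sequences is much slower than drawing Poisson samples. Its advantage is that it tests the modelling assumption directly.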