RNA encodes the amino acids of a peptide as a sequence of codons, with each codon consisting of three nucleotides chosen from the 'alphabet': U (uracil), C (cytosine), A (adenine) and G (guanine).
The Python script codon_lookup.py creates a dictionary, codon_table, mapping codons to amino acids, where each amino acid is identified by its one-letter abbreviation (for example, R = arginine). The stop codons, signalling termination of RNA translation, are identified with the single asterisk character, *. The codon AUG signals the start of translation within a nucleotide sequence as well as coding for the amino acid methionine.
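The contents of codon_lookup.py are not reproduced here. As a minimal sketch of how such a dictionary could be built, assuming the standard genetic code and the base ordering U, C, A, G (the names bases, codons and amino_acids are illustrative and may differ from those in the actual script):

```python
# Sketch only: one possible way to build a codon -> amino acid dictionary.
# Assumes the standard genetic code; '*' marks the three stop codons.
bases = ['U', 'C', 'A', 'G']

# All 64 codons, with the first base varying slowest and the third fastest.
codons = [a + b + c for a in bases for b in bases for c in bases]

# One-letter amino acid codes, listed in the same order as the codons above.
amino_acids = ('FFLLSSSSYY**CC*W'    # codons starting with U
               'LLLLPPPPHHQQRRRR'    # codons starting with C
               'IIIMTTTTNNKKSSRR'    # codons starting with A
               'VVVVAAAADDEEGGGG')   # codons starting with G

codon_table = dict(zip(codons, amino_acids))
```

With this construction, codon_table['AUG'] is 'M' and the stop codons UAA, UAG and UGA all map to '*'.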
This script can be executed within IPython with %run codon_lookup.py (or loaded and then executed with %load codon_lookup.py followed by pressing Enter):
In [x]: %run codon_lookup.py
In [x]: codon_table
Out[x]:
{'GCG': 'A',
'UAA': '*',
'GGU': 'G',
'UCU': 'S',
...
'ACA': 'T',
'ACC': 'T'}
Let's define a function to translate an RNA sequence. Type %edit and enter the following code in the editor that appears.
def translate_rna(seq):
    start = seq.find('AUG')
    peptide = []
    i = start
    while i < len(seq)-2:
        codon = seq[i:i+3]
        a = codon_table[codon]
        if a == '*':
            break
        i += 3
        peptide.append(a)
    return ''.join(peptide)
When you exit the editor, the code will be executed, defining the function translate_rna:
IPython will make a temporary file named: /var/folders/fj/yv29fhm91v7_6g7sqsy1z2940000gp/T/ipython_edit_thunq9/ipython_edit_dltv_i.py
Editing... done. Executing edited code...
Out[x]: "def translate_rna(seq):\n    start = seq.find('AUG')\n    peptide = []\n    i = start\n    while i < len(seq)-2:\n        codon = seq[i:i+3]\n        a = codon_table[codon]\n        if a == '*':\n            break\n        i += 3\n        peptide.append(a)\n    return ''.join(peptide)\n"
Now feed the function an RNA sequence to translate:
In [x]: seq = 'CAGCAGCUCAUACAGCAGGUAAUGUCUGGUCUCGUCCCCGGAUGUCGCUACCCACGAGACCCGUAUCCUACUUUCUGGGGAGCCUUUACACGGCGGUCCACGUUUUUCGCUACCGUCGUUUUCCCGGUGCCAUAGAUGAAUGUU'
In [x]: translate_rna(seq)
Out[x]: 'MSGLVPGCRYPRDPYPTFWGAFTRRSTFFATVVFPVP'
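Note that translate_rna, as written, assumes the sequence contains a start codon and only valid codons: if 'AUG' is absent, seq.find returns -1 and the dictionary lookup can raise a KeyError. A more defensive variant might look like the sketch below. The name translate_rna_safe and the use of 'X' for unrecognized codons are assumptions made for illustration, and the table is rebuilt from the standard genetic code so that the example is self-contained.

```python
# Sketch only: a defensive variant of translate_rna (hypothetical name).
# The codon table is rebuilt here, assuming the standard genetic code.
bases = 'UCAG'
amino_acids = ('FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG')
codon_table = dict(zip((a + b + c for a in bases for b in bases for c in bases),
                       amino_acids))

def translate_rna_safe(seq):
    """Translate seq from its first AUG up to the first stop codon."""
    start = seq.find('AUG')
    if start == -1:
        # No start codon: nothing to translate.
        return ''
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(seq) - 2, 3):
        # Assumption: unrecognized codons are reported as 'X'.
        a = codon_table.get(seq[i:i+3], 'X')
        if a == '*':
            break
        peptide.append(a)
    return ''.join(peptide)
```

For a sequence that contains a start codon and only the bases U, C, A and G, this returns the same peptide as translate_rna.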
To read in a list of RNA sequences (one per line) from a text file, seqs.txt, and translate them, one could use %sx with the system command cat (or, on Windows, the command type):
In [x]: seqs = %sx cat seqs.txt
In [x]: for seq in seqs:
   ...:     print(translate_rna(seq))
   ...:
MHMLDENLYDLGMKACHEGTNVLDKWRNMARVCSCDYQFK
MQGSDGQQESYCTLPFEVSGMP
MPVEWRTMQFQRLERASCVKDSTFKNTGSFIKDRKVSGISQDEWAYAMSHQMQPAAHYA
MIVVTMCQ
MGQCMRFAPGMHGMYSSFHPQHKEITPGIDYASMNEVETAETIRPI
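The %sx approach depends on a system command that differs between platforms. A portable alternative is to read the file with Python itself. The following is a minimal sketch, assuming one sequence per line; read_sequences is a hypothetical helper name, not part of IPython or of codon_lookup.py.

```python
from pathlib import Path

def read_sequences(filename):
    """Return the non-empty lines of filename with surrounding whitespace removed."""
    # Sketch only: assumes a plain-text file with one RNA sequence per line.
    text = Path(filename).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The returned list could then be used in place of the %sx output, for example: for seq in read_sequences('seqs.txt'): print(translate_rna(seq)).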