RNA codons

RNA encodes the amino acids of a peptide as a sequence of codons, with each codon consisting of three nucleotides chosen from the 'alphabet': U (uracil), C (cytosine), A (adenine) and G (guanine).
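
Because a codon is a triplet drawn from four letters, there are 4 x 4 x 4 = 64 possible codons, and a sequence is read in non-overlapping groups of three. The following is a minimal sketch of both ideas; the names NUCLEOTIDES, all_codons and split_codons are illustrative and are not part of codon_lookup.py.

```python
from itertools import product

NUCLEOTIDES = 'UCAG'

# Every ordered triplet of nucleotides: 4**3 = 64 codons in total.
all_codons = [''.join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]
print(len(all_codons))

def split_codons(seq):
    """Split seq into consecutive, non-overlapping codons.

    Any trailing one or two nucleotides that do not form a full codon
    are dropped.
    """
    return [seq[i:i+3] for i in range(0, len(seq) - 2, 3)]

print(split_codons('AUGUCUGGU'))
```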

The Python script codon_lookup.py creates a dictionary, codon_table, mapping codons to amino acids, where each amino acid is identified by its one-letter abbreviation (for example, R = arginine). The stop codons, which signal termination of RNA translation, are identified with the single asterisk character, *. The codon AUG signals the start of translation within a nucleotide sequence and also codes for the amino acid methionine.

This script can be executed within IPython with %run codon_lookup.py (or loaded and then executed with %load codon_lookup.py followed by pressing Enter):

In [x]: %run codon_lookup.py
In [x]: codon_table
Out[x]:
{'GCG': 'A',
 'UAA': '*',
 'GGU': 'G',
 'UCU': 'S',
     ...
 'ACA': 'T',
 'ACC': 'T'}
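
The contents of codon_lookup.py are not reproduced here. As a rough sketch of one way such a table could be built (not necessarily how the script does it), the standard genetic code can be written as a 64-character string of one-letter amino acid codes, ordered so that each of the three codon positions cycles through U, C, A, G. The names BASES and AMINO_ACIDS are assumptions made for this illustration.

```python
from itertools import product

BASES = 'UCAG'
# Standard genetic code with codons in the order UUU, UUC, UUA, UUG, UCU, ...
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'   # first base U
               'LLLLPPPPHHQQRRRR'   # first base C
               'IIIMTTTTNNKKSSRR'   # first base A
               'VVVVAAAADDEEGGGG')  # first base G

# Pair each codon with its amino acid; product() yields the codons in the
# same order as the string above.
codon_table = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(len(codon_table))
print(codon_table['AUG'], codon_table['UAA'])
```

The entries of a dictionary built this way may be displayed in a different order from the output shown above.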

Let's define a function to translate an RNA sequence. Type %edit and enter the following code in the editor that appears.

def translate_rna(seq):
    start = seq.find('AUG')
    peptide = []
    i = start
    while i < len(seq)-2:
        codon = seq[i:i+3]
        a = codon_table[codon]
        if a == '*':
            break
        i += 3
        peptide.append(a)
    return ''.join(peptide)

When you exit the editor, the code you entered is executed, defining the function translate_rna:

IPython will make a temporary file named: /var/folders/fj/yv29fhm91v7_6g7sqsy1z2940000gp/T/ipython_edit_thunq9/ipython_edit_dltv_i.py
Editing... done. Executing edited code...
Out[x]: "def translate_rna(seq):\n    start = seq.find('AUG')\n    peptide = []\n    i = start\n    while i < len(seq)-2:\n        codon = seq[i:i+3]\n        a = codon_table[codon]\n        if a == '*':\n            break\n        i += 3\n        peptide.append(a)\n    return ''.join(peptide)\n"

Now feed the function an RNA sequence to translate:

In [x]: seq = 'CAGCAGCUCAUACAGCAGGUAAUGUCUGGUCUCGUCCCCGGAUGUCGCUACCCACGAGACCCGUAUCCUACUUUCUGGGGAGCCUUUACACGGCGGUCCACGUUUUUCGCUACCGUCGUUUUCCCGGUGCCAUAGAUGAAUGUU'
In [x]: translate_rna(seq)
Out[x]: 'MSGLVPGCRYPRDPYPTFWGAFTRRSTFFATVVFPVP'
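
Note that translate_rna assumes the sequence contains a start codon and only valid codons. If AUG is absent, seq.find returns -1, and for most inputs the lookup in codon_table then fails with a KeyError. A more defensive variant might look like the sketch below. The name translate_rna_safe and the use of 'X' for an unrecognised codon are assumptions made for this illustration, and the table is rebuilt inline only so that the example is self-contained.

```python
from itertools import product

# Standard genetic code, rebuilt here so the example runs on its own; in the
# session above, codon_table comes from codon_lookup.py instead.
BASES = 'UCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
codon_table = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_rna_safe(seq):
    """Translate seq from its first AUG up to the first stop codon.

    Returns an empty string if there is no start codon. Codons not found
    in codon_table are reported as 'X' (an assumption of this sketch).
    """
    start = seq.find('AUG')
    if start == -1:
        # No start codon, so there is nothing to translate.
        return ''
    peptide = []
    # Visit complete codons only; a trailing one or two bases are ignored.
    for i in range(start, len(seq) - 2, 3):
        a = codon_table.get(seq[i:i+3], 'X')
        if a == '*':
            break
        peptide.append(a)
    return ''.join(peptide)

print(translate_rna_safe('GGAUGUCUUAA'))
print(repr(translate_rna_safe('CCCC')))
```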

To read in a list of RNA sequences (one per line) from a text file, seqs.txt, and translate them, one could use %sx with the system command cat (or, on Windows, the command type):

In [x]: seqs = %sx cat seqs.txt
In [x]: for seq in seqs:
   ...:     print(translate_rna(seq))
   ...:
MHMLDENLYDLGMKACHEGTNVLDKWRNMARVCSCDYQFK
MQGSDGQQESYCTLPFEVSGMP
MPVEWRTMQFQRLERASCVKDSTFKNTGSFIKDRKVSGISQDEWAYAMSHQMQPAAHYA
MIVVTMCQ
MGQCMRFAPGMHGMYSSFHPQHKEITPGIDYASMNEVETAETIRPI
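
The %sx approach relies on IPython and on a system command. Outside IPython, the same lines could be read with ordinary Python file handling. The following is a minimal sketch, in which read_sequences is a hypothetical helper name and blank lines are skipped by choice.

```python
from pathlib import Path

def read_sequences(path):
    """Return the non-blank lines of a text file, one sequence per line.

    Leading and trailing whitespace (including the newline) is stripped
    from each sequence.
    """
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With translate_rna defined as above, looping over read_sequences('seqs.txt') and printing translate_rna(seq) for each entry should behave like the %sx loop.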