Martin Bergen, Sven Findeiß, Nick Goldman, Ivo Hofacker, Stefan Kalkhof, Stephan Müller, Peter Stadler, Stefan Washietl
Paper #: 11-04-012
Evolutionary analysis has become a powerful tool for the functional annotation of genomes due to the availability of massive amounts of comparative sequence data. The assessment of coding potential in conserved regions and the discrimination of coding from noncoding transcripts have emerged as core analysis tasks. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current gene finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box", without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in E. coli and to analyze the coding potential of RNAs previously annotated as "noncoding". RNAcode is open source software and is available for all major platforms at http://wash.github.com/rnacode.