Long non-coding RNAs as a source of new peptides

This post is by M.Mar Albà on her preprint (with co-authors), available from arXiv: Long non-coding RNAs as a source of new peptides.

Several recent studies based on deep sequencing of ribosome-protected fragments have reported that many long non-coding RNAs (lncRNAs) associate with ribosomes (see, for example, Everything old is new again: (linc)RNAs make proteins!, a comment by Stephen M Cohen). We have analyzed the original data from experiments performed in six different eukaryotic species and confirmed that this is a widespread phenomenon. This is paradoxical because lncRNAs appear to have very little coding capacity: they contain only short open reading frames (ORFs) that show no sequence similarity to known proteins.
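
To make "short open reading frames" concrete, here is a minimal sketch that scans the forward strand of a transcript for ORFs starting at ATG and ending at the first in-frame stop codon, and reports their coordinates. The function name `find_orfs` and the `min_codons` cutoff are illustrative assumptions; this is not the method used in the preprint.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. In each frame only the
    first ATG after the previous stop opens an ORF, so nested ORFs that share
    the same stop are not reported separately.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        # step codon by codon; the last full codon starts at len(seq) - 3
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCAUGAAAUAGGG"))  # [(2, 11)]: ATG AAA TAG, two codons plus stop
```

An ORF that is never closed by a stop codon is ignored here, which is one of several simplifications a real annotation pipeline would handle differently.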

In contrast to typical mRNAs, many lncRNAs are lineage-specific. Therefore, if they are translated, they should be similar to recently evolved protein-coding genes. This is exactly what we have found: transcripts encoding young proteins show properties very similar to those of lncRNAs, namely short, non-conserved ORFs, low coding potential, and relatively weak selective constraints.

Evidence has accumulated in recent years that new protein-coding genes are continuously evolving (The continuing evolution of genes by Carl Zimmer). The birth of a new functional protein is a process of trial and error that most likely requires the expression of many transcripts that will not survive the test of time. LncRNAs seem to fit the bill for this role.

This post was published in Haldane’s Sieve.

It was among the most viewed posts in Haldane’s Sieve in May 2014.

Filed under Papers

Video on the evolution of orphan genes

The video explains how new genes originate. Whereas many genes are formed by duplication of preexisting genes, others arise directly from genomic regions by mechanisms that are still poorly understood. The latter are called orphan genes and represent radically new evolutionary inventions. The animation was produced by the participants of the 2013 edition of the Workshop Inform.animation at the University of Alghero, funded by the EU Erasmus Lifelong Learning Programme. It was inspired by the work of the Evolutionary Genomics Group (IMIM-UPF), and of other groups around the world, on the evolution of orphan genes.

Video

Filed under Video

Origin of orphan genes

Many new genes arise by gene duplication, but others arise de novo from previously non-transcribed regions. Genes of the latter type are called “orphans” because sequence similarity searches cannot associate them with any other gene. The mechanisms by which orphan genes originate are still poorly understood.

We use comparative genomics and transcriptomics to study orphan genes in different mammalian species. We are presenting the results of our research at three meetings this month:

* I Bioinformatics and Computational Biology Meeting from IEC
* GDRE-RA Comparative Genomics Meeting
* 4th Meeting of the Spanish Society of Evolutionary Biology

Update:

Exciting new research on this subject is explained in The Scientist.

Filed under Meetings

Our latest paper on gene duplication and molecular adaptation has just been published in MBE

Accelerated evolution after gene duplication: a time-dependent process affecting just one copy

In rodent gene duplicates, the evolutionary rate accelerates after gene duplication (A, B) but returns to its initial value after the speciation of mouse and rat (A). These changes affect only the daughter copy.

Filed under Papers

PALO paper published in GBE: Improving genome-wide scans of positive selection by using protein isoforms of similar length

José Luis Villanueva-Cañas, Steve Laurie and M.Mar Albà

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene-centered, a single protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of non-homologous regions, such as those derived from species-specific exons, which in turn increases the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, PALO (Protein Alignment Optimizer), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both non-synonymous and synonymous substitution rates when compared to PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives.
We show that PALO can eliminate the majority of such false positives and is thus a more appropriate approach than Longest for large-scale analyses. A web server has been set up to facilitate the use of PALO with a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
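
The contrast between Longest and selecting isoforms of similar length can be sketched with a toy exhaustive search. Here the "most similar in length" criterion is assumed to mean minimizing the gap between the longest and shortest chosen isoform, with ties broken toward longer combinations; the function names and the isoform lengths are hypothetical, and the published PALO heuristic may use a different criterion and search strategy.

```python
from itertools import product


def select_longest(family: dict[str, list[int]]) -> dict[str, int]:
    """'Longest' baseline: index of the longest isoform for each gene."""
    return {g: max(range(len(lens)), key=lens.__getitem__) for g, lens in family.items()}


def select_similar_length(family: dict[str, list[int]]) -> dict[str, int]:
    """Pick one isoform per gene so the chosen lengths are as close as possible.

    Assumed objective: minimize max(length) - min(length); ties go to the
    combination with the larger total length. Exhaustive search, so the cost
    grows exponentially with the number of genes in the family.
    """
    if not family:
        return {}
    genes = list(family)
    best_key, best_combo = None, None
    for combo in product(*(range(len(family[g])) for g in genes)):
        lengths = [family[g][i] for g, i in zip(genes, combo)]
        key = (max(lengths) - min(lengths), -sum(lengths))
        if best_key is None or key < best_key:
            best_key, best_combo = key, combo
    return dict(zip(genes, best_combo))


# Hypothetical isoform lengths (amino acids) for one gene family.
family = {"human": [520, 480], "mouse": [610, 478], "rat": [482]}
print(select_longest(family))         # {'human': 0, 'mouse': 0, 'rat': 0}, spread 128
print(select_similar_length(family))  # {'human': 1, 'mouse': 1, 'rat': 0}, spread 4
```

In this toy family, Longest picks a 610-residue mouse isoform whose extra region has no counterpart in the other species. That is the kind of non-homologous sequence the abstract links to indels and misaligned positions.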

The advance access version of the paper is available for download.

Filed under Papers

Article in New Scientist on orphan genes

ALL ALONE
Genes from nowhere: Orphans with a surprising story

16 January 2013 by Helen Pilcher

NOT having any family is tough. Often unappreciated and uncomfortably different, orphans have to fight to fit in and battle against the odds to realise their potential. Those who succeed, from Aristotle to Steve Jobs, sometimes change the world.

Who would have thought that our DNA plays host to a similar cast of foundlings? When biologists began sequencing genomes, they discovered that up to a third of genes in each species seemed to have no parents or family of any kind. Nevertheless, some of these “orphan genes” are high achievers, and a few even seem to have played a part in the evolution of the human brain.

But where do they come from? With no obvious ancestry, it was as if these genes had appeared from nowhere, but that couldn’t be true. Everyone assumed that as we learned more, we would discover what had happened to their families. ..

Some other researchers, however, are starting to think it may be surprisingly common. A study of 270 primate orphan genes, led by M. Mar Albà and Macarena Toll-Riera of the Municipal Foundation Institute for Medical Research in Barcelona, Spain, found that only a quarter could be explained by rapid evolution after duplication (Molecular Biology and Evolution, vol 26, p 603). Instead, around 60 per cent appeared to be new. “De novo evolution is clearly a strong force – constantly generating new genes over time,” says Tautz. “It seems possible that most orphan genes have evolved through de novo evolution.”

Filed under Papers

New paper on the evolution of low-complexity regions in vertebrate proteins

A paper entitled “Dissecting the role of low-complexity sequences in the evolution of vertebrate proteins” by Núria Radó-Trilla and M.Mar Albà has been accepted for publication in BMC Evolutionary Biology.
Abstract
Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand over relatively short periods of time through replication slippage, they can contribute greatly to increasing protein sequence space and generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution. We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and proteins with other regulatory functions are overrepresented among proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This dichotomy results in a net gain of protein sequence over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented in the sets of recently emerged LCRs from all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important effect of purifying selection in their maintenance. In conclusion, we have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
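
A common way to flag tracts "highly enriched in one or a few amino acids" is to slide a window along the protein and mark windows whose residue composition has low Shannon entropy, in the spirit of compositional-complexity filters such as SEG (this is not SEG itself). The sketch below is a simplified illustration, not the procedure used in the paper; the 12-residue window, the 2.2-bit threshold and the test sequence are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the amino acid composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def find_lcrs(protein: str, window: int = 12, max_entropy: float = 2.2) -> list[tuple[int, int]]:
    """Return merged (start, end) intervals (end exclusive) of low-entropy windows."""
    intervals: list[tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        if shannon_entropy(protein[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if intervals and start <= intervals[-1][1]:
                # overlaps or touches the previous hit: extend that interval
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals


# Hypothetical protein: a 15-residue poly-Q tract (positions 12-26) between
# two stretches of 12 distinct amino acids.
flank = "MKTLYDEWPRSV"
protein = flank + "Q" * 15 + flank
print(find_lcrs(protein))  # [(7, 32)]
```

The reported interval is wider than the poly-Q tract because any window with at least seven glutamines falls below the threshold. Dedicated tools refine boundaries after this initial scan, which this sketch does not attempt.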

Filed under Papers

SMBE meeting 2012

Macarena, Steve, Núria, José Luis, Cinta, Magda and Mar will be presenting their work at the meeting of the Society for Molecular Biology and Evolution, to be held in Dublin on June 23-26, 2012.

link to the meeting: http://smbe2012.org/

Filed under Meetings

The paper “Structure and age jointly influence rates of protein evolution”, a collaboration with the J. Plotkin lab, has been accepted in PLoS Computational Biology.

ABSTRACT: Which factors determine a protein’s rate of evolution is still under debate. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins with mapped PDB structures. We report that previously observed structural correlations also hold within each age group, including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: it modulates the relationship between solvent accessibility and rate, and younger proteins, despite being less designable, are evolving faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable in their native structure, and also more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.

Filed under Papers

New paper in Journal of Virology

Medya’s work on the UL1 protein from human cytomegalovirus has been published in the Journal of Virology.

Filed under Papers