Translation of neutrally evolving peptides provides a basis for de novo gene evolution: a short history

Our new paper “Translation of neutrally evolving peptides provides a basis for de novo gene evolution” was published in Nature Ecology and Evolution on March 19, 2018.

During the course of evolution, some genes are gained and others are lost. A well-established mechanism for the emergence of new genes is gene duplication. However, there is increasing evidence that some genes have not originated by gene duplication but de novo from previously non-coding regions of the genome. 

The two processes can be distinguished using sequence comparisons of closely related species. In gene duplication, the new gene retains sequence similarity to the other gene copy. In contrast, genes evolved de novo show no sequence similarity to other genes. In both cases, new genes initially appear by accident. A fraction of these genes will turn out to be beneficial and be subsequently maintained by natural selection.

My interest in new genes started more than fifteen years ago. At that time, I was building a database of herpesvirus protein families at University College London. When I tried to cluster the proteins into families, some would just not cluster. These proteins had unique sequences: they did not resemble any other viral or host protein, yet they performed essential functions. Improbable as it seemed, they had to have originated from DNA sequences other than genes.

Back in Barcelona, I teamed up with Jose Castresana to study gene evolution in mammals. In a paper published in 2005 we described many human and mouse proteins that lacked homologues in non-mammalian species. Following the thinking at the time, we proposed that many of them could have been generated by very rapid evolution after gene duplication. However, we also argued that some of them might have evolved de novo. The reason was that the coding sequences of the young genes were unusually small, which is something one expects for randomly occurring open reading frames but not for functional gene duplicates. Then Macarena Toll-Riera joined the lab as a PhD student and we decided to revisit this question. With more genomes at hand, the hypothesis of de novo gene birth gained strength. The results were published in 2009 in a paper entitled “Origin of primate orphan genes: a comparative genomics approach”.

Things became exciting again when Nicholas Ingolia and co-workers reported, in 2011, widespread translation of the mouse transcriptome, including many transcripts previously believed to be non-coding. Jorge Ruiz-Orera, a new PhD student in the lab, examined ribosome profiling data from different species and found clear support for the pervasive translation of the transcriptome.

In the present study we have found that an important fraction of the translated peptides show no evolutionary conservation and evolve under no constraints. These peptides can be “tested” for new functions and eventually become new functional proteins, providing a basis for de novo gene evolution. More details of this study can be found here and in the Nature Ecology and Evolution community blog.

Mar Albà

Science at the Youth Mobile Festival 2018

The Youth Mobile (YoMo) festival is organized by GSMA Mobile World Congress to promote STEAM areas (Science, Technology, Engineering, Art and Mathematics) among young people. The different centers of the PRBB could be found at the BioJuniors stand. Several members of our group were there on March 1st; they explained how automatic classification algorithms can distinguish between different dog breeds.

How dwarf lemurs survive the dry season in Madagascar

When one thinks about hibernation, images of grizzly bears waking up after a long period of inaction come to mind. However, this physiological adaptation is more common than usually thought, and many mammals hibernate, including diverse species of rats, bats, squirrels and hedgehogs. Some years ago, hibernation was also discovered in lemurs, the closest relatives of humans known to hibernate. The lemurs that hibernate are fat-tailed dwarf lemurs, which use this adaptation to endure the dry season in Madagascar.

Just before hibernation, in the so-called fattening period, dwarf lemurs accumulate fat in their tails, which become very thick. This is the fuel that will allow them to survive during the several months they will spend buried in holes.

We wanted to know more about the molecular changes that take place during hibernation in lemurs. Sheena Faherty and collaborators from Duke University visited Madagascar and collected small amounts of fat tissue from the tails of the lemurs before, during and after the hibernation period. We reconstructed the complete transcriptome from the RNA in the samples and investigated changes in gene expression during hibernation. We could detect a switch from fat storage to fat degradation, as well as inhibition of mitochondrial functions and increased protection against oxidative stress. Until recently, we knew very little about dwarf lemurs at the molecular level. Challenging as it was, we enjoyed diving into a completely new world.

Mar Albà

The results of this study have been published in Faherty SL*, Villanueva-Cañas JL*, Blanco MB, Albà MM, Yoder AD. Transcriptomics in the wild: Hibernation physiology in free-ranging dwarf lemurs. Molecular Ecology, 29 January 2018.
link to Article

New genes and functional innovation in mammals

Many human genes have counterparts in distant species such as plants or bacteria. This is because they share a common origin: they were invented a long time ago in a primitive cell. However, there are some genes that do not have counterparts in other species, or only in a few of them. These genes were born much more recently. Although they may have appeared by accident, some have acquired useful functions and been preserved by natural selection. We have recently compiled thousands of mammalian-specific gene families and asked which functions they perform. We have found an enrichment in proteins from the immune system, milk, skin and the germ cells. The most recent genes, however, are rarely functionally characterized. The results of this work provide new insights into how new genes originate and what they are selected for.
Read our paper at bioRxiv and tell us what you think!
See the final paper publication in Genome Biology and Evolution. News at IMIM here.

Gene families restricted to mammals

The numbers in the nodes of the tree indicate the number of gene families identified.

Our group portrayed at El.lipse

Nov 2016

Pervasive translation of lncRNAs

Ribosome profiling is a sequencing technique that detects regions in mRNAs that are being translated. Using this technique, researchers have observed mysterious patterns of translation in many transcripts believed to be non-coding (lncRNAs, or long non-coding RNAs). The patterns are very similar to those observed in protein-coding genes, but the translated proteins are generally smaller. Aside from their sequence, we know nothing about these peptides. Are they functional? Do they reflect some background noise of the translation machinery?

In a recent study published in bioRxiv we have investigated the signatures of selection in proteins translated from lncRNAs, using phylogenetic conservation and single nucleotide polymorphism (SNP) data. We have found that hundreds of mouse lncRNAs produce short functional proteins and thus should be considered protein-coding genes. However, most translated lncRNAs appear to produce non-functional peptides. We conclude that translation, like transcription, is pervasive. Due to this activity, many peptides can be tested for new functions, facilitating the birth of new genes de novo.

This preprint was selected by the NODE (July 2016). It has also appeared at redcedar PRBB blog. The work was presented at the XXI Evolution and Population Genetics Seminar, Oct 3-5 2016, Sitges (Barcelona).

Gene regulation in a hibernating primate

We have published the first study on the molecular processes underlying primate hibernation. The study is the result of a collaboration between researchers at IMIM (Hospital del Mar Medical Research Institute, Barcelona) and at Duke University and Duke Lemur Center (Durham, USA). The work is based on the fat-tailed dwarf lemur (Cheirogaleus medius), an extraordinary primate that is capable of enduring torpor (hibernation) for several months, subsisting only on the lipids stored in its tail. The project has used high-throughput RNA sequencing (RNAseq) data to learn about the changes in gene expression in white adipose tissue during hibernation.

Reference: Faherty, S., Villanueva-Cañas, J.L. et al. Genome Biology and Evolution 2016

Related links:
IMIM press release
Duke Lemur Center
Sheena’s web page
Scientific American
El Periódico

Our group at Saló de l’Ensenyament (Education Fair)

How can we analyze genomes? What is junk DNA? Why is bioinformatics useful? Today, members of our group have been trying to answer these questions for the visitors of the Education Fair. The stand included a very realistic piece of “recycled” DNA and 3D printed protein structures.

IMIM-Bioinformatics at GRIB
Saló de L’Ensenyament, Barcelona

When we fail to detect homologues in other species, is it because they are too divergent or because they do not exist?

The increasing number of genomes available has made it possible to compare genes across species and determine in which branch of the phylogenetic tree they are likely to have originated. This has led to the identification of many genes that are species- or lineage-specific. As they have no homologues in other species, they must have originated from previously non-genic parts of the genome, that is, de novo. However, some researchers have claimed that errors in the detection of homologues by sequence similarity search methods, such as BLAST, may largely explain this apparent lack of homologues. One way to assess how many genes are missed in these searches is to perform sequence evolution simulations along a phylogenetic tree and then use BLAST to recover the homologues (Albà and Castresana, 2007). If we fail to detect them, we can say we have a sensitivity problem. This will result in a percentage of the genes being misclassified into younger age classes.
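The bookkeeping behind this percentage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the cited studies: the species list, the function names and the example hits are all hypothetical, and the BLAST searches themselves are assumed to have been run already, with their outcome recorded as one true/false value per species.

```python
# Hypothetical species, ordered from the closest to the most distant
# relative of the focal species. The names are only an example.
SPECIES_BY_DISTANCE = ["macaque", "mouse", "chicken", "yeast"]


def inferred_age(hits):
    """Index of the most distant species with a detected homologue (-1 if none)."""
    age = -1
    for index, species in enumerate(SPECIES_BY_DISTANCE):
        if hits.get(species, False):
            age = index
    return age


def misclassification_rate(simulated_genes):
    """Fraction of simulated genes that are assigned a younger age than the true one.

    Each entry is (true_age, hits): true_age is the index of the most distant
    species in which the gene was simulated, and hits maps species names to
    whether the similarity search recovered the simulated homologue.
    """
    too_young = sum(
        1 for true_age, hits in simulated_genes if inferred_age(hits) < true_age
    )
    return too_young / len(simulated_genes)


# Made-up example: four simulated genes, one of which loses its yeast hit.
example = [
    (3, {"macaque": True, "mouse": True, "chicken": True, "yeast": True}),
    (3, {"macaque": True, "mouse": True, "chicken": True, "yeast": False}),
    (1, {"macaque": True, "mouse": True}),
    (3, {"macaque": True, "mouse": True, "chicken": True, "yeast": True}),
]
print(misclassification_rate(example))
```

Because every simulated gene has a true homologue in each species by construction, any gene whose inferred age is younger than its true age counts as a sensitivity failure.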

The simulations performed to date have all indicated that the percentage of error for proteins is relatively small (4.7% to 13.85%) even at long distances (from mammals to fungi or plants). As expected, the problem is worse for distant comparisons than for closer ones. For example, human and macaque, which diverged some 24 million years ago, differ by only 6 substitutions per 100 nucleotides. Lack of BLAST sensitivity is not going to be a problem for these species, even when comparing neutrally evolving sequences. For more distant comparisons, it depends on whether the sequence is under selection or not. Proteins tend to contain highly conserved motifs, and for this reason BLAST works reasonably well even at long distances. The results of the simulations support the idea that many genes are likely to have originated recently. For example, only 14 S. cerevisiae proteins would fail to find homologues in S. paradoxus or S. mikatae due to BLAST errors (Moyers and Zhang, 2016). Although the authors of the paper interpret this as problematic, the strong contrast with the observed data (445 genes restricted to these species in Carvunis et al., 2012) supports the notion that new genes are continuously emerging.
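To get a feeling for what 6 substitutions per 100 nucleotides means, the distance can be translated into an expected sequence identity. The sketch below assumes the Jukes-Cantor model (all nucleotide changes equally likely), which is our choice for illustration and not necessarily the model used in the cited simulations. The function names and the sequence length are also just illustrative.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def expected_identity(d):
    """Expected fraction of identical sites at distance d (substitutions per site),
    under the Jukes-Cantor model."""
    return 1.0 - 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def simulate_neutral_divergence(length, d, rng):
    """Evolve a random sequence with no selection and return its identity to the original.

    Substitutions arrive as a Poisson process with an expected total of
    d * length events; each event hits a random site and replaces the
    nucleotide with one of the three other nucleotides.
    """
    ancestor = [rng.choice(NUCLEOTIDES) for _ in range(length)]
    descendant = list(ancestor)
    expected_events = d * length
    elapsed = rng.expovariate(1.0)
    while elapsed < expected_events:
        site = rng.randrange(length)
        others = [n for n in NUCLEOTIDES if n != descendant[site]]
        descendant[site] = rng.choice(others)
        elapsed += rng.expovariate(1.0)
    identical = sum(a == b for a, b in zip(ancestor, descendant))
    return identical / length


rng = random.Random(1)
print("expected identity:", round(expected_identity(0.06), 4))
print("simulated identity:", simulate_neutral_divergence(20000, 0.06, rng))
```

At this distance the expected identity is about 94%, even without any selection, which is why similarity searches have no trouble at this evolutionary scale.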

Mar Albà
Update: A reply to Moyers & Zhang has been published in bioRxiv: “No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution”.

“Origins of de novo genes in human and chimpanzee” published in PLOS Genetics

Novel genes are continuously emerging during evolution, but what drives this process? We have published a study in PLOS Genetics in which we find that the fortuitous appearance of certain combinations of elements in the genome can lead to the generation of new genes. The work, Origins of de novo genes in human and chimpanzee, is very similar to the one we published in arXiv some months ago. It includes some improvements resulting from the peer-review process and from having had more time to think about the paper.

In every genome, there are sets of genes that are unique to that particular species. In this study, we first identified thousands of genes that were specific to human or chimpanzee. Then, we searched the macaque genome and discovered that this species had significantly fewer element motifs in the corresponding genomic sequences. These motifs are recognized by proteins that activate gene expression, a necessary step in the formation of a new gene.

The formation of genes de novo from previously non-active parts of the genome was, until recently, considered highly improbable. This study has shown that the mutations that occur normally in our genetic material may be sufficient to explain how this happens. Once expressed, the genes can act as a substrate for the evolution of new molecular functions. This study identified several candidate human proteins that bear no resemblance to any other known protein but which contain signatures of purifying selection.


Jorge Ruiz-Orera, Jessica Hernandez-Rodriguez, Cristina Chiva, Eduard Sabidó, Ivanela Kondova, Ronald Bontrop, Tomàs Marqués-Bonet, M.Mar Albà. Origins of De Novo Genes in Human and Chimpanzee. PLOS Genetics, 2015; 11 (12): e1005721.

