Cova Vara and Chris Papadopoulos presented their research at this week’s Symposium on de novo genes, held at Texas A&M University (College Station, Nov 6-9 2023). The research aims to explore the evolution of new genes in populations and is funded by an ERC Advanced Grant (NovoGenePop 2022-2027). If you are interested in the topic or in joining the group, do not hesitate to contact us!
Completely new protein sequences in genomes can arise by gene duplication or de novo. How does the mechanism of origination influence the fate of the proteins? Do duplicated proteins tend to be retained at higher rates than de novo proteins? And, more generally, in which ways are these two types of proteins similar to each other (or different)? We investigate these questions in a new paper published in Molecular Biology and Evolution.
Using data from proteomes of yeasts and flies we infer that both types of new proteins are particularly abundant at the species-specific level, with numbers rapidly going down when we look at branches connecting several species. This implies that many new proteins probably operate during a relatively short period of time. Consequently, the phylogenetically conserved proteome probably represents only a small part of the complete set of proteins existing at any given time.
We also find that newly arisen proteins show low sequence constraints, and that this applies to proteins born by either of the two mechanisms. Proteins with a likely de novo origin, however, tend to be much smaller and, initially, they are often positively charged. The latter trait tends to fade away over time, as substitutions to negatively charged amino acids accumulate.
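For readers who want to get a feel for the charge trend, here is a minimal sketch of how the net charge of a protein could be approximated. It is an illustration under simple assumptions, not the method used in the paper: it counts lysine and arginine as +1, aspartate and glutamate as -1, and ignores histidine and the protein termini. The function names `net_charge` and `charge_per_residue` are ours.

```python
def net_charge(protein: str) -> int:
    """Approximate net charge at neutral pH as (K + R) - (D + E).

    Assumption: histidine and the N-/C-termini are ignored.
    """
    seq = protein.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def charge_per_residue(protein: str) -> float:
    """Net charge divided by length, so short and long proteins can be compared."""
    return net_charge(protein) / len(protein) if protein else 0.0


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    for name, seq in [("toy_basic", "MKKRLA"), ("toy_acidic", "MDEEKA")]:
        print(name, net_charge(seq), round(charge_per_residue(seq), 2))
```

Dividing by length matters here because de novo proteins tend to be much smaller, so raw charge counts are not directly comparable between the two classes.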
Link to advance access manuscript.
Proteins restricted to a given species or lineage are mysterious. Many of them have emerged de novo from ancestral non-coding genomic regions rather than from pre-existing genes. A new study by Vakirlis et al. shows that a large portion of the human de novo originated proteins are associated with phenotypic effects, advancing our understanding of the functional importance of this novel class of proteins.
See the full commentary here.
Papadopoulos C, Albà MM. Newly evolved genes in the human lineage are functional. Trends Genet. 2023 Apr;39(4):235-236.
Our ERC Advanced Grant NovoGenePop has just started! This means we can already recruit scientists and start to gather data. The project will investigate how new genes arise in closely related species and populations. This will involve the development of computational methods to integrate large amounts of transcriptomics and translatomics data. We will be asking questions such as: what is the rate of formation of new genes from non-coding genomic regions (i.e. de novo)? How can we assess the influence of selection in the emergence of new genes (noise versus functional)? What drives the translation of initially silent open reading frames? Etc. (your questions here).
In our earliest works (2005!) we started comparing the genomes of different eukaryotic species, asking ourselves why there were so many species- and lineage-specific genes (check this Nature News Feature). Over the years, and thanks to the research done by many groups around the world, we have obtained plenty of evidence for the formation of new genes de novo. It is less clear though which mutational processes drive the emergence of these genes and how they may impact differences in fitness across individuals. We expect this project will help advance these questions and perhaps others that will come along the way.
Fascinated by the potential of long read technologies to make a difference in our knowledge of the transcriptome, two years ago we started generating Nanopore dRNA data for different yeast species, including the yeast S. pombe. This organism is ideal to study alternative splicing because a large proportion of the genes have introns, yet the number of introns and their size are quite small, meaning that we can recover many full length transcripts.
The study has resulted in the detection of hundreds of alternative splice isoforms, some of them at unexpectedly high frequencies. We have also found that lowly expressed transcripts, and a subset of the intron retention isoforms, tend to have much longer than average poly(A) lengths. In addition, some of the splice isoforms can potentially encode new proteins according to Ribo-seq data. Lots of fun and a new collaboration with our pombe neighbors at the PRBB!
The paper is now published in Genome Research; you can access it here!
We participate in a new world-wide initiative for the large-scale annotation of small ORF translation events detected by ribosome profiling in the human genome. The initiative, led by researchers at Ensembl, Max Delbrück Center and Broad Institute, among others, provides a first list of 7,264 new translated ORFs, including many ORFs in long non-coding RNAs, as well as upstream ORFs (uORFs) in coding transcripts.
You can read the complete post here.
Preprint: Mudge, Ruiz-Orera, Prensner et al. A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq. bioRxiv June 10, 2021.
And the paper in Nature Biotechnology!
If interested, read also this comment in Science.
Will Blevins explains in an article in Ellipse his experience doing a PhD in the lab. The first challenge was culturing different yeast species, and isolating the RNA, in Lucas Carey’s lab. Then Will had to build a de novo transcript assembly pipeline that would allow us to recover novel transcripts in a reliable manner; the use of spike-ins (a set of RNAs of known concentration) was key for this. Also important was to be able to do ribosome profiling experiments in the same conditions as the RNA-Seq, thanks to a collaboration with Juana Díez’s lab. This was followed by a multitude of analyses to make sense of the data and finally... the paper in Nature Communications!
Article in Ellipse: Uncovering de novo gene birth in yeast using deep transcriptomics.
The current view of an mRNA is that of a central coding sequence (CDS) flanked by 5′ and 3′ untranslated regions (UTRs). But often UTRs contain open reading frames which, as revealed by ribosome profiling, can also be translated. The effect of these upstream and downstream ORFs (uORFs and dORFs) on the translation of the CDS, or on the production of micropeptides, is still largely unknown.
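To make the idea of a uORF concrete, here is a minimal sketch that scans a 5′ UTR sequence for ATG-initiated ORFs with an in-frame stop codon. It is only an illustration, not the annotation procedure used in our studies. It assumes a DNA-alphabet input, canonical ATG starts only, and ORFs fully contained in the UTR (uORFs that run into the CDS are skipped). The name `find_uorfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of ATG...stop ORFs inside a 5' UTR.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    min_codons counts sense codons (start codon included, stop excluded).
    """
    seq = utr5.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, in frame with this ATG, until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


if __name__ == "__main__":
    utr = "CCATGAAATAGCC"  # toy sequence, made up for illustration
    for start, end in find_uorfs(utr):
        print(start, end, utr[start:end])
```

A sequence-based scan like this only lists candidate uORFs; whether they are actually translated has to come from ribosome profiling data.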
We have investigated uORF translation in yeast using ribosome profiling data from three different studies in which oxidative stress or starvation conditions were induced. During stress there is a general arrest of CDS translation. But surprisingly, we observe that uORF translation is much less affected, with the vast majority of genes showing an increase in the uORF to CDS translation ratio. Only in a specific subset of mRNAs does this go in the other direction; such regulatory uORFs decrease their translation during stress, permitting the efficient translation of the downstream CDS. The question remains as to the consequences of the increased translation of uORFs during stress, potentially generating hundreds of yet uncharacterized micropeptides. Are these small proteins of functional significance? And if so, how do they protect the cells from stress? New questions that will stimulate more research.
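As a rough illustration of what a change in the uORF to CDS translation ratio means, the sketch below computes the ratio from Ribo-seq read counts in two conditions and reports the log2 change. This is a simplified sketch, not the exact procedure of the paper: it assumes length-normalized read densities and a pseudocount of 1 to avoid division by zero, and all names and numbers are hypothetical. Because both counts come from the same sample, the library size cancels out in the ratio.

```python
import math


def uorf_cds_ratio(uorf_reads: int, cds_reads: int,
                   uorf_len: int, cds_len: int,
                   pseudocount: float = 1.0) -> float:
    """Ratio of uORF to CDS ribosome density (reads per nucleotide)."""
    uorf_density = (uorf_reads + pseudocount) / uorf_len
    cds_density = (cds_reads + pseudocount) / cds_len
    return uorf_density / cds_density


def log2_ratio_change(normal: dict, stress: dict) -> float:
    """log2 of (ratio in stress / ratio in normal conditions).

    Positive values mean the uORF is relatively more translated under stress.
    """
    return math.log2(uorf_cds_ratio(**stress) / uorf_cds_ratio(**normal))


if __name__ == "__main__":
    # Made-up counts for one gene: CDS reads drop under stress, uORF reads do not.
    normal = dict(uorf_reads=9, cds_reads=999, uorf_len=30, cds_len=1000)
    stress = dict(uorf_reads=9, cds_reads=249, uorf_len=30, cds_len=1000)
    print(round(log2_ratio_change(normal, stress), 2))
```

In this toy case the uORF signal is unchanged while the CDS signal drops, so the ratio goes up; a regulatory uORF of the kind described above would instead give a negative value.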
The article has been published in BMC Molecular Cell Biology:
Simone G. Moro, Cedric Hermans, Jorge Ruiz-Orera, M.Mar Albà. Impact of uORFs in mediating regulation of translation in stress conditions. BMC Mol Cell Biol 22, Article number: 29 (2021).