New protein sequences in genomes can arise by gene duplication or de novo. How does the mechanism of origin influence the fate of these proteins? Are duplicated proteins retained at higher rates than de novo proteins? And, more generally, in which ways are these two types of proteins similar to each other, and in which ways do they differ? We investigate these questions in a new paper published in Molecular Biology and Evolution.
Using proteome data from yeasts and flies, we infer that both types of new proteins are particularly abundant at the species-specific level, and that their numbers drop rapidly on branches shared by several species. This suggests that many new proteins probably persist for only a relatively short period of evolutionary time. Consequently, the phylogenetically conserved proteome probably represents only a small part of the complete set of proteins that exists at any given time.
We also find that newly arisen proteins show weak sequence constraints, and that this applies to proteins born by either mechanism. Proteins with a likely de novo origin, however, tend to be much smaller and are often positively charged at first. The latter trait tends to fade over time, as substitutions toward negatively charged amino acids accumulate.
Link to the advance access manuscript.