This month, a paper investigating the power of BLAST sequence similarity searches to classify genes into different age classes (phylostratigraphy), "Phylostratigraphic bias creates spurious patterns of genome evolution" (Moyers and Zhang, University of Michigan), states that the method substantially underestimates gene age for a considerable fraction of genes and creates spurious and unpredictable patterns. Hmm... how does this affect previous studies?
I am not new to this. The study is very similar to one we conducted in 2007, "On homology searches by protein Blast and the characterization of the age of genes". We found that the lack of sensitivity of BLAST affected only a small percentage of proteins (4.7%) and that it did not invalidate the previously reported finding that recently emerged genes evolve more rapidly than older ones (Alba and Castresana, 2005).
So? Are the results of this study different from those back then? Well, not much really. The authors of the present paper find that in 13.85% of cases a homolog of the (Drosophila) protein was not detected in the most distant taxon (Bacteria). Since in our study the most distant taxa considered were not Bacteria but Eukaryota (Fungi, Plants), the equivalent figure here is about 9% (from Figure 5). The underestimation of age mainly affects distant comparisons (>500 Mya). And again, the patterns obtained with the simulated data do not recapitulate the observations made with real data.
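For readers unfamiliar with how such a percentage is obtained, here is a minimal sketch of the idea, not the pipeline used in either paper. It assumes each gene has a best BLAST E-value per taxonomic level, assigns the gene to the oldest level with a significant hit, and then counts how often that inferred age is younger than the known (simulated) one. The strata names, the `1e-3` cutoff, the function names and the toy data are all hypothetical choices made for illustration.

```python
from typing import Dict

# Hypothetical phylostrata for a Drosophila gene, ordered oldest to youngest.
STRATA = ["Bacteria", "Eukaryota", "Metazoa", "Arthropoda", "Drosophila"]
RANK = {name: i for i, name in enumerate(STRATA)}


def assign_stratum(hits: Dict[str, float], e_cutoff: float = 1e-3) -> str:
    """Return the oldest stratum with a BLAST hit at or below the E-value cutoff.

    `hits` maps a stratum name to the best E-value found in that stratum.
    A gene with no significant hit anywhere is assigned to the youngest
    stratum (the focal species itself).
    """
    for stratum in STRATA:  # walk from oldest to youngest
        e_value = hits.get(stratum)
        if e_value is not None and e_value <= e_cutoff:
            return stratum
    return STRATA[-1]


def underestimation_rate(true_age: Dict[str, str],
                         inferred_age: Dict[str, str]) -> float:
    """Fraction of genes whose inferred stratum is younger than the true one."""
    too_young = sum(
        1 for gene, truth in true_age.items()
        if RANK[inferred_age[gene]] > RANK[truth]
    )
    return too_young / len(true_age)


# Toy data (invented): four simulated genes that all truly date back to Bacteria.
toy_hits = {
    "geneA": {"Bacteria": 1e-20, "Eukaryota": 1e-40},
    "geneB": {"Bacteria": 0.5, "Eukaryota": 1e-5},  # bacterial hit too weak
    "geneC": {"Bacteria": 1e-4},
    "geneD": {"Metazoa": 1e-10},                    # no deeper hit at all
}
toy_truth = {gene: "Bacteria" for gene in toy_hits}
toy_inferred = {gene: assign_stratum(h) for gene, h in toy_hits.items()}

print(toy_inferred)
print(underestimation_rate(toy_truth, toy_inferred))
```

In this toy case two of the four genes are placed in a stratum younger than their true one, so the rate is 0.5. With simulated proteins of known age, the same kind of count gives figures like the percentages quoted above.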
After this publication it is even clearer that the large number of recently originated genes being detected in many species cannot be explained by limitations of BLAST; it is a genuine pattern. What is the role of these genes in the generation of intra-specific variability and the evolution of new biological traits? Back to work.