2017
Tomislav Domazet-Lošo, Anne-Ruxandra Carvunis, M. Mar Albà, Martin Sebastijan Šestak, Robert Bakarić, Rafik Neme, Diethard Tautz. No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution. Molecular Biology and Evolution, doi: 10.1093/molbev/msw284, 2017.
@article{Domazet-Lošo2017,
  title = {No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution},
  author = {Tomislav Domazet-Lošo and Anne-Ruxandra Carvunis and M. Mar Albà and Martin Sebastijan Šestak and Robert Bakarić and Rafik Neme and Diethard Tautz},
  url = {http://mbe.oxfordjournals.org/content/early/2017/01/10/molbev.msw284.abstract},
  doi = {10.1093/molbev/msw284},
  year = {2017},
  date = {2017-01-12},
  journal = {Molecular Biology and Evolution},
  abstract = {Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate between 11-15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.},
  keywords = {BLAST, de novo gene, Homology, Sequence Analysis}
}
2007
Albà, M Mar; Castresana, Jose. On homology searches by protein Blast and the characterization of the age of genes. (Article) BMC Evolutionary Biology, 7, pp. 53, 2007, ISSN: 1471-2148.
@article{Alba2007,
  title = {On homology searches by protein Blast and the characterization of the age of genes.},
  author = {Albà, M Mar and Castresana, Jose},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1855329&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2148},
  year = {2007},
  date = {2007-01-01},
  journal = {BMC Evolutionary Biology},
  volume = {7},
  pages = {53},
  abstract = {It has been shown in a variety of organisms, including mammals, that genes that appeared recently in evolution, for example orphan genes, evolve faster than older genes. Low functional constraints at the time of origin of novel genes may explain these results. However, this observation has been recently attributed to an artifact caused by the inability of Blast to detect the fastest genes in different eukaryotic genomes. Distinguishing between these two possible explanations would be of great importance for any studies dealing with the taxon distribution of proteins and the origin of novel genes.},
  keywords = {Amino Acid, Animals, Computational Biology, Databases, Evolution, Genes, Humans, Molecular, Phylogeny, Protein, Sequence Analysis, Sequence Homology}
}
Bellora, Nicolás; Farré, Domènec; Albà, M Mar. PEAKS: identification of regulatory motifs by their position in DNA sequences. (Article) Bioinformatics (Oxford, England), 23 (2), pp. 243–4, 2007, ISSN: 1367-4811.
@article{Bellora2007a,
  title = {PEAKS: identification of regulatory motifs by their position in DNA sequences.},
  author = {Bellora, Nicolás and Farré, Domènec and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17098773},
  issn = {1367-4811},
  year = {2007},
  date = {2007-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {23},
  number = {2},
  pages = {243--4},
  abstract = {Many DNA functional motifs tend to accumulate or cluster at specific gene locations. These locations can be detected, in a group of gene sequences, as high frequency 'peaks' with respect to a reference position, such as the transcription start site (TSS). We have developed a web tool for the identification of regions containing significant motif peaks. We show, by using different yeast gene datasets, that peak regions are strongly enriched in experimentally-validated motifs and contain potentially important novel motifs. AVAILABILITY: http://genomics.imim.es/peaks},
  keywords = {Algorithms, Automated, Automated: methods, Base Sequence, Chromosome Mapping, Chromosome Mapping: methods, DNA, DNA: genetics, DNA: methods, Molecular Sequence Data, Nucleic Acid, Nucleic Acid: genetics, Pattern Recognition, Regulatory Sequences, Sequence Alignment, Sequence Alignment: methods, Sequence Analysis, Software, Transcriptional Activation, Transcriptional Activation: genetics}
}
2006
Mularoni, Loris; Guigó, Roderic; Albà, M Mar. Mutation patterns of amino acid tandem repeats in the human proteome. (Article) Genome Biology, 7 (4), pp. R33, 2006, ISSN: 1465-6914.
@article{Mularoni2006,
  title = {Mutation patterns of amino acid tandem repeats in the human proteome.},
  author = {Mularoni, Loris and Guigó, Roderic and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1557989&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2006},
  date = {2006-01-01},
  journal = {Genome Biology},
  volume = {7},
  number = {4},
  pages = {R33},
  abstract = {Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs).},
  keywords = {Amino Acid, Amino Acid Substitution, Amino Acid: genetics, Codon, Expressed Sequence Tags, Genetic, Humans, Mutation, Polymorphism, Protein, Proteome, Proteome: genetics, Repetitive Sequences, Sequence Analysis}
}
2005
Albà, M Mar; Castresana, Jose. Inverse relationship between evolutionary rate and age of mammalian genes. (Article) Molecular Biology and Evolution, 22 (3), pp. 598–606, 2005, ISSN: 0737-4038.
@article{Alba2005,
  title = {Inverse relationship between evolutionary rate and age of mammalian genes.},
  author = {Albà, M Mar and Castresana, Jose},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15537804},
  issn = {0737-4038},
  year = {2005},
  date = {2005-01-01},
  journal = {Molecular Biology and Evolution},
  volume = {22},
  number = {3},
  pages = {598--606},
  abstract = {A large number of genes is shared by all living organisms, whereas many others are unique to some specific lineages, indicating their different times of origin. The availability of a growing number of eukaryotic genomes allows us to estimate which mammalian genes are novel genes and, approximately, when they arose. In this article, we classify human genes into four different age groups and estimate evolutionary rates in human and mouse orthologs. We show that older genes tend to evolve more slowly than newer ones; that is, proteins that arose earlier in evolution currently have a larger proportion of sites subjected to negative selection. Interestingly, this property is maintained when a fraction of the fastest-evolving genes is excluded or when only genes belonging to a given functional class are considered. One way to explain this relationship is by assuming that genes maintain their functional constraints along all their evolutionary history, but the nature of more recent evolutionary innovations is such that the functional constraints operating on them are increasingly weaker. Alternatively, our results would also be consistent with a scenario in which the functional constraints acting on a gene would not need to be constant through evolution. Instead, starting from weak functional constraints near the time of origin of a gene (as supported by mechanisms proposed for the origin of orphan genes), there would be a gradual increase in selective pressures with time, resulting in fewer accepted mutations in older versus more novel genes.},
  keywords = {Animals, DNA, Evolution, Genome, human, Humans, Mice, Molecular, Sequence Analysis}
}
2004
Gibbs, Richard A; et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. (Article) Nature, 428 (6982), pp. 493–521, 2004, ISSN: 1476-4687.
@article{Gibbs2004,
  title = {Genome sequence of the Brown Norway rat yields insights into mammalian evolution.},
  author = {Gibbs, Richard A and others},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15057822},
  issn = {1476-4687},
  year = {2004},
  date = {2004-01-01},
  journal = {Nature},
  volume = {428},
  number = {6982},
  pages = {493--521},
  abstract = {The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.},
  keywords = {Animals, Base Composition, Centromere, Centromere: genetics, Chromosomes, CpG Islands, CpG Islands: genetics, DNA, DNA Transposable Elements, DNA Transposable Elements: genetics, Evolution, Gene Duplication, Genome, Genomics, Humans, Inbred BN, Inbred BN: genetics, Introns, Introns: genetics, Male, Mammalian, Mammalian: genetics, Mice, Mitochondrial, Mitochondrial: genetics, Models, Molecular, Mutagenesis, Nucleic Acid, Nucleic Acid: genetics, Polymorphism, Rats, Regulatory Sequences, Retroelements, Retroelements: genetics, RNA, RNA Splice Sites, RNA Splice Sites: genetics, Sequence Analysis, Single Nucleotide, Single Nucleotide: genetics, Telomere, Telomere: genetics, Untranslated, Untranslated: genetics}
}
2002
Albà, M Mar; Laskowski, Roman A; Hancock, John M. Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article) Bioinformatics (Oxford, England), 18 (5), pp. 672–8, 2002, ISSN: 1367-4803.
@article{Alba2002,
  title = {Detecting cryptically simple protein sequences using the SIMPLE algorithm.},
  author = {Albà, M Mar and Laskowski, Roman A and Hancock, John M},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12050063},
  issn = {1367-4803},
  year = {2002},
  date = {2002-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {18},
  number = {5},
  pages = {672--8},
  abstract = {Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.},
  keywords = {Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical}
}