2011

Toll-Riera, Macarena, Laurie, Steve, Albà, M Mar
Lineage-specific variation in intensity of natural selection in mammals. (Article) Molecular biology and evolution, 28 (1), pp. 383–98, 2011, ISSN: 1537-1719.

@article{Toll-Riera2011a,
  title = {Lineage-specific variation in intensity of natural selection in mammals.},
  author = {Toll-Riera, Macarena and Laurie, Steve and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/20688808},
  issn = {1537-1719},
  year = {2011},
  date = {2011-01-01},
  journal = {Molecular biology and evolution},
  volume = {28},
  number = {1},
  pages = {383--98},
  abstract = {The molecular clock hypothesis states that protein-coding genes evolve at an approximately constant rate. However, this is only expected to be true as long as the function and the tertiary structure of the molecule remain unaltered. An important implication of this statement is that significant deviations in the rate of evolution of a gene with respect to the species clock are likely to reflect functional and/or structural alterations. Here, we present a method to identify such deviations and apply it to a data set of 2,929 high-quality coding sequence alignments corresponding to one-to-one orthologous genes from six mammalian species--human, macaque, mouse, rat, cow, and dog. Deviated branches are defined as those that present significant alterations in both the rate of nonsynonymous substitutions (dN) and the selective pressure (dN/dS). Strikingly, we find that as many as 24.5% of the genes show branch-specific deviations in dN and dN/dS, though this is a relatively well-conserved set of genes. Around half of these genes show branch-specific acceleration of evolutionary rates. Positive selection (PS) tests based on divergence data only identify 17.7% of the accelerated branches. Failure to identify PS in accelerated branches with an excess of radical amino acid replacements suggests that these tests are conservative. Interestingly, genes with accelerated branches are significantly enriched in neural proteins, indicating that this type of protein might play a more important role than previously thought in species diversification, although they are generally not detected by PS tests. We discuss in detail several examples of genes that show lineage-specific evolutionary rate acceleration and are involved in synaptic transmission, chemosensory perception, and ubiquitination.},
  keywords = {Amino Acid Sequence, Amino Acid Substitution, Animals, Evolution, F-Box Proteins, F-Box Proteins: genetics, G-Protein-Coupled, G-Protein-Coupled: genetics, Genetic, Genetic Variation, Humans, Mammals, Mammals: genetics, Molecular, Molecular Sequence Data, N-Methyl-D-Aspartate, N-Methyl-D-Aspartate: genetics, Odorant, Odorant: genetics, Receptors, Selection, Sequence Alignment}
}
2010

Mularoni, Loris, Ledda, Alice, Toll-Riera, Macarena, Albà, M Mar
Natural selection drives the accumulation of amino acid tandem repeats in human proteins. (Article) Genome research, 20 (6), pp. 745–54, 2010, ISSN: 1549-5469.

@article{Mularoni2010,
  title = {Natural selection drives the accumulation of amino acid tandem repeats in human proteins.},
  author = {Mularoni, Loris and Ledda, Alice and Toll-Riera, Macarena and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2877571&tool=pmcentrez&rendertype=abstract},
  issn = {1549-5469},
  year = {2010},
  date = {2010-01-01},
  journal = {Genome research},
  volume = {20},
  number = {6},
  pages = {745--54},
  abstract = {Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.},
  keywords = {Amino Acid, Amino Acid Sequence, Amino Acids, Amino Acids: chemistry, Amino Acids: genetics, Animals, Genetic, Humans, Molecular Sequence Data, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Sequence Homology}
}
2009

Salichs, Eulàlia, Ledda, Alice, Mularoni, Loris, Albà, M Mar, de la Luna, Susana
Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment. (Article) PLoS genetics, 5 (3), pp. e1000397, 2009, ISSN: 1553-7404.

@article{Salichs2009,
  title = {Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment.},
  author = {Salichs, Eulàlia and Ledda, Alice and Mularoni, Loris and Albà, M Mar and de la Luna, Susana},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2644819&tool=pmcentrez&rendertype=abstract},
  issn = {1553-7404},
  year = {2009},
  date = {2009-01-01},
  journal = {PLoS genetics},
  volume = {5},
  number = {3},
  pages = {e1000397},
  abstract = {Single amino acid repeats are prevalent in eukaryote organisms, although the role of many such sequences is still poorly understood. We have performed a comprehensive analysis of the proteins containing homopolymeric histidine tracts in the human genome and identified 86 human proteins that contain stretches of five or more histidines. Most of them are endowed with DNA- and RNA-related functions, and, in addition, there is an overrepresentation of proteins expressed in the brain and/or nervous system development. An analysis of their subcellular localization shows that 15 of the 22 nuclear proteins identified accumulate in the nuclear subcompartment known as nuclear speckles. This localization is lost when the histidine repeat is deleted, and significantly, closely related paralogous proteins without histidine repeats also fail to localize to nuclear speckles. Hence, the histidine tract appears to be directly involved in targeting proteins to this compartment. The removal of DNA-binding domains or treatment with RNA polymerase II inhibitors induces the re-localization of several polyhistidine-containing proteins from the nucleoplasm to nuclear speckles. These findings highlight the dynamic relationship between sites of transcription and nuclear speckles. Therefore, we define the histidine repeats as a novel targeting signal for nuclear speckles, and we suggest that these repeats are a way of generating evolutionary diversification in gene duplicates. These data contribute to our better understanding of the physiological role of single amino acid repeats in proteins.},
  keywords = {Amino Acids, Cell Line, Cell Nucleus, Cell Nucleus: chemistry, Cell Nucleus: genetics, Cell Nucleus: metabolism, Genome, Histidine, Histidine: chemistry, Histidine: genetics, Histidine: metabolism, human, Humans, Molecular Sequence Data, Nuclear Localization Signals, Nuclear Proteins, Nuclear Proteins: chemistry, Nuclear Proteins: genetics, Nuclear Proteins: metabolism, Protein Transport, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Sequence Alignment, Tandem Repeat Sequences}
}
2007

Farré, Domènec, Bellora, Nicolás, Mularoni, Loris, Messeguer, Xavier, Albà, M Mar
Housekeeping genes tend to show reduced upstream sequence conservation. (Article) Genome biology, 8 (7), pp. R140, 2007, ISSN: 1465-6914.

@article{Farre2007,
  title = {Housekeeping genes tend to show reduced upstream sequence conservation.},
  author = {Farré, Domènec and Bellora, Nicolás and Mularoni, Loris and Messeguer, Xavier and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2323216&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2007},
  date = {2007-01-01},
  journal = {Genome biology},
  volume = {8},
  number = {7},
  pages = {R140},
  abstract = {Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation.},
  keywords = {Animals, Base Sequence, Conserved Sequence, CpG Islands, Evolution, Gene Expression, Genetic, Genetic Variation, Humans, Mice, Molecular, Molecular Sequence Data, Promoter Regions}
}

Bellora, Nicolás, Farré, Domènec, Albà, M Mar
PEAKS: identification of regulatory motifs by their position in DNA sequences. (Article) Bioinformatics (Oxford, England), 23 (2), pp. 243–4, 2007, ISSN: 1367-4811.

@article{Bellora2007a,
  title = {PEAKS: identification of regulatory motifs by their position in DNA sequences.},
  author = {Bellora, Nicolás and Farré, Domènec and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17098773},
  issn = {1367-4811},
  year = {2007},
  date = {2007-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {23},
  number = {2},
  pages = {243--4},
  abstract = {Many DNA functional motifs tend to accumulate or cluster at specific gene locations. These locations can be detected, in a group of gene sequences, as high frequency 'peaks' with respect to a reference position, such as the transcription start site (TSS). We have developed a web tool for the identification of regions containing significant motif peaks. We show, by using different yeast gene datasets, that peak regions are strongly enriched in experimentally-validated motifs and contain potentially important novel motifs. AVAILABILITY: http://genomics.imim.es/peaks},
  keywords = {Algorithms, Automated, Automated: methods, Base Sequence, Chromosome Mapping, Chromosome Mapping: methods, DNA, DNA: genetics, DNA: methods, Molecular Sequence Data, Nucleic Acid, Nucleic Acid: genetics, Pattern Recognition, Regulatory Sequences, Sequence Alignment, Sequence Alignment: methods, Sequence Analysis, Software, Transcriptional Activation, Transcriptional Activation: genetics}
}
2002

Albà, M Mar, Laskowski, Roman A, Hancock, John M
Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article) Bioinformatics (Oxford, England), 18 (5), pp. 672–8, 2002, ISSN: 1367-4803.

@article{Alba2002,
  title = {Detecting cryptically simple protein sequences using the SIMPLE algorithm.},
  author = {Albà, M Mar and Laskowski, Roman A and Hancock, John M},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12050063},
  issn = {1367-4803},
  year = {2002},
  date = {2002-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {18},
  number = {5},
  pages = {672--8},
  abstract = {Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.},
  keywords = {Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical}
}