2011
Toll-Riera, Macarena, Laurie, Steve, Albà, M Mar
Lineage-specific variation in intensity of natural selection in mammals. (Article)
Molecular biology and evolution, 28 (1), pp. 383–98, 2011, ISSN: 1537-1719.
@article{Toll-Riera2011a,
  title = {Lineage-specific variation in intensity of natural selection in mammals.},
  author = {Toll-Riera, Macarena and Laurie, Steve and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/20688808},
  issn = {1537-1719},
  year = {2011},
  date = {2011-01-01},
  journal = {Molecular biology and evolution},
  volume = {28},
  number = {1},
  pages = {383--98},
  abstract = {The molecular clock hypothesis states that protein-coding genes evolve at an approximately constant rate. However, this is only expected to be true as long as the function and the tertiary structure of the molecule remain unaltered. An important implication of this statement is that significant deviations in the rate of evolution of a gene with respect to the species clock are likely to reflect functional and/or structural alterations. Here, we present a method to identify such deviations and apply it to a data set of 2,929 high-quality coding sequence alignments corresponding to one-to-one orthologous genes from six mammalian species--human, macaque, mouse, rat, cow, and dog. Deviated branches are defined as those that present significant alterations in both the rate of nonsynonymous substitutions (dN) and the selective pressure (dN/dS). Strikingly, we find that as many as 24.5% of the genes show branch-specific deviations in dN and dN/dS, though this is a relatively well-conserved set of genes. Around half of these genes show branch-specific acceleration of evolutionary rates. Positive selection (PS) tests based on divergence data only identify 17.7% of the accelerated branches. Failure to identify PS in accelerated branches with an excess of radical amino acid replacements suggests that these tests are conservative. Interestingly, genes with accelerated branches are significantly enriched in neural proteins, indicating that this type of protein might play a more important role than previously thought in species diversification, although they are generally not detected by PS tests. We discuss in detail several examples of genes that show lineage-specific evolutionary rate acceleration and are involved in synaptic transmission, chemosensory perception, and ubiquitination.},
  keywords = {Amino Acid Sequence, Amino Acid Substitution, Animals, Evolution, F-Box Proteins, F-Box Proteins: genetics, G-Protein-Coupled, G-Protein-Coupled: genetics, Genetic, Genetic Variation, Humans, Mammals, Mammals: genetics, Molecular, Molecular Sequence Data, N-Methyl-D-Aspartate, N-Methyl-D-Aspartate: genetics, Odorant, Odorant: genetics, Receptors, Selection, Sequence Alignment}
}
2010
Mularoni, Loris, Ledda, Alice, Toll-Riera, Macarena, Albà, M Mar
Natural selection drives the accumulation of amino acid tandem repeats in human proteins. (Article)
Genome research, 20 (6), pp. 745–54, 2010, ISSN: 1549-5469.
@article{Mularoni2010,
  title = {Natural selection drives the accumulation of amino acid tandem repeats in human proteins.},
  author = {Mularoni, Loris and Ledda, Alice and Toll-Riera, Macarena and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2877571&tool=pmcentrez&rendertype=abstract},
  issn = {1549-5469},
  year = {2010},
  date = {2010-01-01},
  journal = {Genome research},
  volume = {20},
  number = {6},
  pages = {745--54},
  abstract = {Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.},
  keywords = {Amino Acid, Amino Acid Sequence, Amino Acids, Amino Acids: chemistry, Amino Acids: genetics, Animals, Genetic, Humans, Molecular Sequence Data, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Sequence Homology}
}
Farré, Domènec, Albà, M Mar
Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. (Article)
Molecular biology and evolution, 27 (2), pp. 325–35, 2010, ISSN: 1537-1719.
@article{Farre2010,
  title = {Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates.},
  author = {Farré, Domènec and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/19822635},
  issn = {1537-1719},
  year = {2010},
  date = {2010-01-01},
  journal = {Molecular biology and evolution},
  volume = {27},
  number = {2},
  pages = {325--35},
  abstract = {Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies. This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.},
  keywords = {Animals, Evolution, Gene Duplication, Genetic, Humans, Mammals, Mammals: genetics, Models, Molecular}
}
2009
Rodilla, Verónica, Villanueva, Alberto, Obrador-Hevia, Antonia, Robert-Moreno, Alex, Fernández-Majada, Vanessa, Grilli, Andrea, López-Bigas, Nuria, Bellora, Nicolás, Albà, M Mar, Torres, Ferran, Duñach, Mireia, Sanjuan, Xavier, Gonzalez, Sara, Gridley, Thomas, Capella, Gabriel, Bigas, Anna, Espinosa, Lluís
Jagged1 is the pathological link between Wnt and Notch pathways in colorectal cancer. (Article)
Proceedings of the National Academy of Sciences of the United States of America, 106 (15), pp. 6315–20, 2009, ISSN: 1091-6490.
@article{Rodilla2009,
  title = {Jagged1 is the pathological link between Wnt and Notch pathways in colorectal cancer.},
  author = {Rodilla, Verónica and Villanueva, Alberto and Obrador-Hevia, Antonia and Robert-Moreno, Alex and Fernández-Majada, Vanessa and Grilli, Andrea and López-Bigas, Nuria and Bellora, Nicolás and Albà, M Mar and Torres, Ferran and Duñach, Mireia and Sanjuan, Xavier and Gonzalez, Sara and Gridley, Thomas and Capella, Gabriel and Bigas, Anna and Espinosa, Lluís},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2669348&tool=pmcentrez&rendertype=abstract},
  issn = {1091-6490},
  year = {2009},
  date = {2009-01-01},
  journal = {Proceedings of the National Academy of Sciences of the United States of America},
  volume = {106},
  number = {15},
  pages = {6315--20},
  abstract = {Notch has been linked to beta-catenin-dependent tumorigenesis; however, the mechanisms leading to Notch activation and the contribution of the Notch pathway to colorectal cancer is not yet understood. By microarray analysis, we have identified a group of genes downstream of Wnt/beta-catenin (down-regulated when blocking Wnt/beta-catenin) that are directly regulated by Notch (repressed by gamma-secretase inhibitors and up-regulated by active Notch1 in the absence of beta-catenin signaling). We demonstrate that Notch is downstream of Wnt in colorectal cancer cells through beta-catenin-mediated transcriptional activation of the Notch-ligand Jagged1. Consistently, expression of activated Notch1 partially reverts the effects of blocking Wnt/beta-catenin pathway in tumors implanted s.c. in nude mice. Crossing APC(Min/+) with Jagged1(+/Delta) mice is sufficient to significantly reduce the size of the polyps arising in the APC mutant background indicating that Notch is an essential modulator of tumorigenesis induced by nuclear beta-catenin. We show that this mechanism is operating in human tumors from Familial Adenomatous Polyposis patients. We conclude that Notch activation, accomplished by beta-catenin-mediated up-regulation of Jagged1, is required for tumorigenesis in the intestine. The Notch-specific genetic signature is sufficient to block differentiation and promote vasculogenesis in tumors whereas proliferation depends on both pathways.},
  keywords = {Alleles, Animals, beta Catenin, beta Catenin: metabolism, Calcium-Binding Proteins, Calcium-Binding Proteins: genetics, Calcium-Binding Proteins: metabolism, Cell Line, Cell Nucleus, Cell Nucleus: metabolism, Colorectal Neoplasms, Colorectal Neoplasms: blood supply, Colorectal Neoplasms: genetics, Colorectal Neoplasms: metabolism, Colorectal Neoplasms: pathology, Gene Expression Profiling, Gene Expression Regulation, Genetic, Genetic: genetics, Humans, Intercellular Signaling Peptides and Proteins, Intercellular Signaling Peptides and Proteins: gen, Intercellular Signaling Peptides and Proteins: met, Membrane Proteins, Membrane Proteins: genetics, Membrane Proteins: metabolism, Mice, Neoplastic, Notch, Notch: metabolism, Receptors, Signal Transduction, TCF Transcription Factors, TCF Transcription Factors: metabolism, Transcription, Transgenic, Wnt Proteins, Wnt Proteins: metabolism}
}
2007
Mularoni, Loris, Veitia, Reiner A, Albà, M Mar
Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. (Article)
Genomics, 89 (3), pp. 316–25, 2007, ISSN: 0888-7543.
@article{Mularoni2007,
  title = {Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats.},
  author = {Mularoni, Loris and Veitia, Reiner A and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17196365},
  issn = {0888-7543},
  year = {2007},
  date = {2007-01-01},
  journal = {Genomics},
  volume = {89},
  number = {3},
  pages = {316--25},
  abstract = {Single-amino-acid tandem repeats are very common in mammalian proteins but their function and evolution are still poorly understood. Here we investigate how the variability and prevalence of amino acid repeats are related to the evolutionary constraints operating on the proteins. We find a significant positive correlation between repeat size difference and protein nonsynonymous substitution rate in human and mouse orthologous genes. This association is observed for all the common amino acid repeat types and indicates that rapid diversification of repeat structures, involving both trinucleotide slippage and nucleotide substitutions, preferentially occurs in proteins subject to low selective constraints. However, strikingly, we also observe a significant negative correlation between the number of repeats in a protein and the gene nonsynonymous substitution rate, particularly for glutamine, glycine, and alanine repeats. This implies that proteins subject to strong selective constraints tend to contain an unexpectedly high number of repeats, which tend to be well conserved between the two species. This is consistent with a role for selection in the maintenance of a significant number of repeats. Analysis of the codon structure of the sequences encoding the repeats shows that codon purity is associated with high repeat size interspecific variability. Interestingly, polyalanine and polyglutamine repeats associated with disease show very distinctive features regarding the degree of repeat conservation and the protein sequence selective constraints.},
  keywords = {Amino Acid, Amino Acid Sequence, Animals, Complementary, Conserved Sequence, DNA, Evolution, Genetic, Humans, Mice, Molecular, Point Mutation, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Trinucleotide Repeats}
}
Farré, Domènec, Bellora, Nicolás, Mularoni, Loris, Messeguer, Xavier, Albà, M Mar
Housekeeping genes tend to show reduced upstream sequence conservation. (Article)
Genome biology, 8 (7), pp. R140, 2007, ISSN: 1465-6914.
@article{Farre2007,
  title = {Housekeeping genes tend to show reduced upstream sequence conservation.},
  author = {Farré, Domènec and Bellora, Nicolás and Mularoni, Loris and Messeguer, Xavier and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2323216&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2007},
  date = {2007-01-01},
  journal = {Genome biology},
  volume = {8},
  number = {7},
  pages = {R140},
  abstract = {Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation.},
  keywords = {Animals, Base Sequence, Conserved Sequence, CpG Islands, Evolution, Gene Expression, Genetic, Genetic Variation, Humans, Mice, Molecular, Molecular Sequence Data, Promoter Regions}
}
Bellora, Nicolás, Farré, Domènec, Albà, M Mar
Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters. (Article)
BMC genomics, 8, pp. 459, 2007, ISSN: 1471-2164.
@article{Bellora2007,
  title = {Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters.},
  author = {Bellora, Nicolás and Farré, Domènec and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2249607&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2164},
  year = {2007},
  date = {2007-01-01},
  journal = {BMC genomics},
  volume = {8},
  pages = {459},
  abstract = {The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes.},
  keywords = {Animals, Databases, Gene Expression Regulation, Gene Expression Regulation: genetics, Genetic, Genetic: genetics, Mice, Nucleic Acid, Organ Specificity, Organ Specificity: genetics, Promoter Regions, Software, Transcription Factors, Transcription Factors: metabolism}
}
2006
Furney, Simon J, Albà, M Mar, López-Bigas, Núria
Differences in the evolutionary history of disease genes affected by dominant or recessive mutations. (Article)
BMC genomics, 7, pp. 165, 2006, ISSN: 1471-2164.
@article{Furney2006,
  title = {Differences in the evolutionary history of disease genes affected by dominant or recessive mutations.},
  author = {Furney, Simon J and Albà, M Mar and López-Bigas, Núria},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1534034&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2164},
  year = {2006},
  date = {2006-01-01},
  journal = {BMC genomics},
  volume = {7},
  pages = {165},
  abstract = {Global analyses of human disease genes by computational methods have yielded important advances in the understanding of human diseases. Generally these studies have treated the group of disease genes uniformly, thus ignoring the type of disease-causing mutations (dominant or recessive). In this report we present a comprehensive study of the evolutionary history of autosomal disease genes separated by mode of inheritance.},
  keywords = {Amino Acid, Animals, Caenorhabditis elegans, Caenorhabditis elegans: genetics, Computational Biology, Conserved Sequence, Dominant, Essential, Evolution, Genes, Genetic, Genetic Diseases, Genetic Structures, Humans, Inborn, Inborn: classification, Inborn: genetics, Mice, Molecular, Mutation, Pan troglodytes, Pan troglodytes: genetics, Recessive, Selection, Sequence Homology}
}
Mularoni, Loris, Guigó, Roderic, Albà, M Mar
Mutation patterns of amino acid tandem repeats in the human proteome. (Article)
Genome biology, 7 (4), pp. R33, 2006, ISSN: 1465-6914.
@article{Mularoni2006,
  title = {Mutation patterns of amino acid tandem repeats in the human proteome.},
  author = {Mularoni, Loris and Guigó, Roderic and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1557989&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2006},
  date = {2006-01-01},
  journal = {Genome biology},
  volume = {7},
  number = {4},
  pages = {R33},
  abstract = {Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs).},
  keywords = {Amino Acid, Amino Acid Substitution, Amino Acid: genetics, Codon, Expressed Sequence Tags, Genetic, Humans, Mutation, Polymorphism, Protein, Proteome, Proteome: genetics, Repetitive Sequences, Sequence Analysis}
}
Blanco, Enrique, Farré, Domènec, Albà, M Mar, Messeguer, Xavier, Guigó, Roderic ABS: a database of Annotated regulatory Binding Sites from orthologous promoters. (Article) Nucleic acids research, 34 (Database issue), pp. D63–7, 2006, ISSN: 1362-4962.
@article{Blanco2006,
  title = {ABS: a database of Annotated regulatory Binding Sites from orthologous promoters.},
  author = {Blanco, Enrique and Farré, Domènec and Albà, M Mar and Messeguer, Xavier and Guigó, Roderic},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1347478&tool=pmcentrez&rendertype=abstract},
  issn = {1362-4962},
  year = {2006},
  date = {2006-01-01},
  journal = {Nucleic acids research},
  volume = {34},
  number = {Database issue},
  pages = {D63--7},
  abstract = {Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from bibliography. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval allowing different views of the information. In addition, the release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid during the training and the assessment of motif-finding programs.},
  keywords = {Animals, Binding Sites, Chickens, Chickens: genetics, Databases, Genetic, Genomics, Humans, Internet, Mice, Nucleic Acid, Promoter Regions, Rats, Transcription Factors, Transcription Factors: metabolism, User-Computer Interface}
} |
2004 |
Huang, Hui, Winter, Eitan E, Wang, Huajun, Weinstock, Keith G, Xing, Heming, Goodstadt, Leo, Stenson, Peter D, Cooper, David N, Smith, Douglas, Albà, M Mar, Ponting, Chris P, Fechtel, Kim Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. (Article) Genome biology, 5 (7), pp. R47, 2004, ISSN: 1465-6914.
@article{Huang2004,
  title = {Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes.},
  author = {Huang, Hui and Winter, Eitan E and Wang, Huajun and Weinstock, Keith G and Xing, Heming and Goodstadt, Leo and Stenson, Peter D and Cooper, David N and Smith, Douglas and Albà, M Mar and Ponting, Chris P and Fechtel, Kim},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=463309&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2004},
  date = {2004-01-01},
  journal = {Genome biology},
  volume = {5},
  number = {7},
  pages = {R47},
  abstract = {Model organisms have contributed substantially to our understanding of the etiology of human disease as well as having assisted with the development of new treatment modalities. The availability of the human, mouse and, most recently, the rat genome sequences now permit the comprehensive investigation of the rodent orthologs of genes associated with human disease. Here, we investigate whether human disease genes differ significantly from their rodent orthologs with respect to their overall levels of conservation and their rates of evolutionary change.},
  keywords = {Amino Acid, Amino Acid: genetics, Animal, Animals, Chromosome Mapping, Chromosome Mapping: methods, Conserved Sequence, Conserved Sequence: genetics, Disease Models, Evolution, Fishes, Fishes: genetics, Fungal, Fungal: genetics, Genes, Genes: genetics, Genes: physiology, Genetic, Genetic Diseases, Genome, Helminth, Helminth: genetics, human, Humans, Inborn, Inborn: genetics, Inborn: physiopathology, Insect, Insect: genetics, Mice, Molecular, Mutagenesis, Mutagenesis: genetics, Nucleic Acid, Nucleotides, Nucleotides: genetics, Point Mutation, Point Mutation: genetics, Rats, Repetitive Sequences, Selection, Sequence Homology, Trinucleotide Repeat Expansion, Trinucleotide Repeat Expansion: genetics}
} |
Castresana, Jose, Guigó, Roderic, Albà, M Mar Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. (Article) Journal of molecular evolution, 59 (1), pp. 72–9, 2004, ISSN: 0022-2844.
@article{Castresana2004,
  title = {Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome.},
  author = {Castresana, Jose and Guigó, Roderic and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15383909},
  issn = {0022-2844},
  year = {2004},
  date = {2004-01-01},
  journal = {Journal of molecular evolution},
  volume = {59},
  number = {1},
  pages = {72--9},
  abstract = {Comparison of the human and mouse genomes has revealed that significant variations in evolutionary rates exist among genomic regions and that a large part of this variation is interchromosomal. We confirm in this work, using a large collection of introns, that human chromosome 19 is the one that shows the highest divergence with respect to mouse. To search for other differences among chromosomes, we examine the distribution of gene functions in human and mouse chromosomes using the Gene Ontology definitions. We found by correspondence analysis that among the strongest clusterings of gene functions in human chromosomes is a group of genes coding for DNA binding proteins in chromosome 19. Interestingly, chromosome 19 also has a very high GC content, a feature that has been proposed to promote an opening of the chromatin, thereby facilitating binding of proteins to the DNA helix. In the mouse genome, however, a similar aggregation of genes coding for DNA binding proteins and high GC content cannot be found. This suggests that the distribution of genes coding for DNA binding proteins and the variations of the chromatin accessibility to these proteins are different in the human and mouse genomes. It is likely that the overall high synonymous and intron rates in chromosome 19 are a by-product of the high GC content of this chromosome.},
  keywords = {Base Composition, Base Composition: genetics, Chromatin, Chromatin: metabolism, Chromosomes, Computational Biology, Databases, DNA-Binding Proteins, DNA-Binding Proteins: genetics, DNA-Binding Proteins: metabolism, Evolution, Genetic, Genome, human, Humans, Introns, Introns: genetics, Models, Molecular, Multigene Family, Multigene Family: genetics, Pair 19, Pair 19: genetics, Phylogeny, Zinc Fingers, Zinc Fingers: genetics}
} |
2002 |
Albà, M Mar, Laskowski, Roman A, Hancock, John M Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article) Bioinformatics (Oxford, England), 18 (5), pp. 672–8, 2002, ISSN: 1367-4803.
@article{Alba2002,
  title = {Detecting cryptically simple protein sequences using the SIMPLE algorithm.},
  author = {Albà, M Mar and Laskowski, Roman A and Hancock, John M},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12050063},
  issn = {1367-4803},
  year = {2002},
  date = {2002-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {18},
  number = {5},
  pages = {672--8},
  abstract = {Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.},
  keywords = {Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical}
} |