2017 |
M.Mar Albà Zinc-finger domains in metazoans: evolution gone wild (Article) Genome Biology, 18 pp. 168, 2017. @article{Albà2017b, title = {Zinc-finger domains in metazoans: evolution gone wild}, author = {M.Mar Albà}, url = {http://evolutionarygenomics.imim.es/group/wp-content/uploads/2017/10/Alba20172.pdf}, year = {2017}, date = {2017-09-06}, journal = {Genome Biology}, volume = {18}, pages = {168}, abstract = {A new study uncovers a potential mechanism that may allow zinc-finger domains in metazoans to recognize and bind virtually any DNA sequence.}, keywords = {Evolution, Zinc Fingers} } |
2016 |
José Luis Villanueva-Cañas, Jorge Ruiz-Orera, Isabel Agea, Maria Gallo, David Andreu, M.Mar Albà New genes and functional innovation in mammals (Article) bioRxiv, 2016. @article{Alba2016, title = {New genes and functional innovation in mammals}, author = {José Luis Villanueva-Cañas and Jorge Ruiz-Orera and Isabel Agea and Maria Gallo and David Andreu and M.Mar Albà}, url = {http://dx.doi.org/10.1101/090860}, year = {2016}, date = {2016-12-02}, journal = {bioRxiv}, abstract = {The birth of genes that encode new proteins is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. Here we address this question by generating a comprehensive list of mammalian-specific gene families originated at different times during mammalian evolution. We combine gene annotations and de novo transcript assemblies from 30 mammalian species, obtaining about 6,000 families with different species composition. We show that the families which arose early in mammalian evolution (basal) are relatively well-characterized. They are enriched in secreted proteins and include milk and skin polypeptides, immune response components, and proteins involved in spermatogenesis. In contrast, there is a severe lack of knowledge about the functions of proteins which have a more recent origin in certain mammalian groups (young), despite the fact that they have extensive proteomics support. Interestingly, we find that both young and basal mammalian-specific gene families show similar gene expression biases, with a marked enrichment in testis. Proteins from both groups tend to be short and depleted in aromatic and negatively charged residues. This indicates shared mechanisms of formation and suggests that the youngest proteins may have been retained for similar kinds of functions as the oldest ones.
We identify several previously described cases of genes originated de novo from non-coding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes. The catalogue of gene families generated here provides a unique resource for studies on the role of new genes in mammalian-specific adaptations.}, keywords = {adaptation, de novo genes, Evolution, gene family, mammal} } |
Vartia S, Villanueva-Cañas JL, Finarelli J, Farrell ED, Collins PC, Hughes GM, Carlsson JE, Gauthier DT, McGinnity P, Cross TF, FitzGerald RD, Mirimin L, Crispie F, Cotter PD, Carlsson J. A novel method of microsatellite genotyping-by-sequencing using individual combinatorial barcoding (Article) R Soc Open Sci, 3 (1), pp. 150565, 2016, DOI: 10.1098/rsos.150565. @article{S2016, title = {A novel method of microsatellite genotyping-by-sequencing using individual combinatorial barcoding}, author = {Vartia, S and Villanueva-Cañas, JL and Finarelli, J and Farrell, ED and Collins, PC and Hughes, GM and Carlsson, JE and Gauthier, DT and McGinnity, P and Cross, TF and FitzGerald, RD and Mirimin, L and Crispie, F and Cotter, PD and Carlsson, J}, url = {http://www.ncbi.nlm.nih.gov/pubmed/26909185}, doi = {10.1098/rsos.150565}, year = {2016}, date = {2016-01-20}, journal = {R Soc Open Sci}, volume = {3}, number = {1}, pages = {150565}, abstract = {This study examines the potential of next-generation sequencing based 'genotyping-by-sequencing' (GBS) of microsatellite loci for rapid and cost-effective genotyping in large-scale population genetic studies. The recovery of individual genotypes from large sequence pools was achieved by PCR-incorporated combinatorial barcoding using universal primers. Three experimental conditions were employed to explore the possibility of using this approach with existing and novel multiplex marker panels and weighted amplicon mixture. The GBS approach was validated against microsatellite data generated by capillary electrophoresis. GBS allows access to the underlying nucleotide sequences that can reveal homoplasy, even in large datasets and facilitates cross laboratory transfer. GBS of microsatellites, using individual combinatorial barcoding, is potentially faster and cheaper than current microsatellite approaches and offers better and more data.}, keywords = {barcoding, Evolution, microsatellite, sequencing} } |
2015 |
Ruiz-Orera, Jorge, Hernandez-Rodriguez, Jessica, Chiva, Cristina, Sabidó, Eduard, Kondova, Ivanela, Bontrop, Ronald, Marqués-Bonet, Tomàs, Albà, M.Mar Origins of de novo genes in human and chimpanzee (Article) PLOS Genetics, 11 (12), pp. e1005721, 2015. @article{Ruiz-Orera2015b, title = {Origins of de novo genes in human and chimpanzee}, author = {Ruiz-Orera, Jorge and Hernandez-Rodriguez, Jessica and Chiva, Cristina and Sabidó, Eduard and Kondova, Ivanela and Bontrop, Ronald and Marqués-Bonet, Tomàs and Albà, M.Mar}, url = {http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005721}, year = {2015}, date = {2015-12-31}, journal = {PLOS Genetics}, volume = {11}, number = {12}, pages = {e1005721}, keywords = {chimpanzee, de novo gene, Evolution, Humans, lncRNA, Promoter, proteomics, ribosome profiling, RNA-Seq, transcription factor binding site, transcriptomics} } |
Subirana, Juan A., Albà, M. Mar, Messeguer, Xavier High evolutionary turnover of satellite families in Caenorhabditis (Article) BMC Evolutionary Biology, 15 (1), pp. 218, 2015, ISSN: 1471-2148. @article{Subirana2015, title = {High evolutionary turnover of satellite families in Caenorhabditis}, author = {Subirana, Juan A. and Albà, M. Mar and Messeguer, Xavier}, url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4595182&tool=pmcentrez&rendertype=abstract}, issn = {1471-2148}, year = {2015}, date = {2015-01-01}, journal = {BMC Evolutionary Biology}, volume = {15}, number = {1}, pages = {218}, abstract = {BACKGROUND: The high density of tandem repeat sequences (satellites) in nematode genomes and the availability of genome sequences from several species in the group offer a unique opportunity to better understand the evolutionary dynamics and the functional role of these sequences. We take advantage of the previously developed SATFIND program to study the satellites in four Caenorhabditis species and investigate these questions. METHODS: The identification and comparison of satellites is carried out in three steps. First we find all the satellites present in each species with the SATFIND program. Each satellite is defined by its length, number of repeats, and repeat sequence. Only satellites with at least ten repeats are considered. In the second step we build satellite families with a newly developed alignment program. Satellite families are defined by a consensus sequence and the number of satellites in the family. Finally we compare the consensus sequence of satellite families in different species. RESULTS: We give a catalog of individual satellites in each species. We have also identified satellite families with a related sequence and compare them in different species. We analyze the turnover of satellites: they increased in size through duplications of fragments of 100-300 bases.
It appears that in many cases they have undergone an explosive expansion. In C. elegans we have identified a subset of large satellites that have strong affinity for the centromere protein CENP-A. We have also compared our results with those obtained from other species, including one nematode and three mammals. CONCLUSIONS: Most satellite families found in Caenorhabditis are species-specific; in particular those with long repeats. A subset of these satellites may facilitate the formation of kinetochores in mitosis. Other satellite families in C. elegans are either related to Helitron transposons or to meiotic pairing centers.}, keywords = {Evolution, Repeats, satellite} } |
Radó-Trilla, Núria, Arató, Krisztina, Pegueroles, Cinta, Raya, Alicia, de la Luna, Susana, Albà, M Mar Key role of amino acid repeat expansions in the functional diversification of duplicated transcription factors. (Article) Molecular biology and evolution, 2015, ISSN: 1537-1719. @article{Rado-Trilla2015, title = {Key role of amino acid repeat expansions in the functional diversification of duplicated transcription factors.}, author = {Radó-Trilla, Núria and Arató, Krisztina and Pegueroles, Cinta and Raya, Alicia and de la Luna, Susana and Albà, M Mar}, url = {http://www.ncbi.nlm.nih.gov/pubmed/25931513}, issn = {1537-1719}, year = {2015}, date = {2015-01-01}, journal = {Molecular biology and evolution}, abstract = {The high regulatory complexity of vertebrates has been related to two closely spaced whole genome duplications (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9.
In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.}, keywords = {amino acid tandem repeat, Evolution, Gene Duplication, polyalanine, transcription factor, vertebrates} } |
2012 |
Toll-Riera, Macarena, Bostick, David, Albà, M Mar, Plotkin, Joshua B Structure and age jointly influence rates of protein evolution. (Article) PLoS computational biology, 8 (5), pp. e1002542, 2012, ISSN: 1553-7358. @article{Toll-Riera2012a, title = {Structure and age jointly influence rates of protein evolution.}, author = {Toll-Riera, Macarena and Bostick, David and Albà, M Mar and Plotkin, Joshua B}, url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3364943&tool=pmcentrez&rendertype=abstract}, issn = {1553-7358}, year = {2012}, date = {2012-01-01}, journal = {PLoS computational biology}, volume = {8}, number = {5}, pages = {e1002542}, abstract = {What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group - including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups.
Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.}, keywords = {Animals, Binding Sites, Computational Biology, Eukaryota, Evolution, Humans, Mice, Molecular, Protein Conformation, Protein Stability, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Solvents} } |
2011 |
Toll-Riera, Macarena, Laurie, Steve, Albà, M Mar Lineage-specific variation in intensity of natural selection in mammals. (Article) Molecular biology and evolution, 28 (1), pp. 383–98, 2011, ISSN: 1537-1719. @article{Toll-Riera2011a, title = {Lineage-specific variation in intensity of natural selection in mammals.}, author = {Toll-Riera, Macarena and Laurie, Steve and Albà, M Mar}, url = {http://www.ncbi.nlm.nih.gov/pubmed/20688808}, issn = {1537-1719}, year = {2011}, date = {2011-01-01}, journal = {Molecular biology and evolution}, volume = {28}, number = {1}, pages = {383--98}, abstract = {The molecular clock hypothesis states that protein-coding genes evolve at an approximately constant rate. However, this is only expected to be true as long as the function and the tertiary structure of the molecule remain unaltered. An important implication of this statement is that significant deviations in the rate of evolution of a gene with respect to the species clock are likely to reflect functional and/or structural alterations. Here, we present a method to identify such deviations and apply it to a data set of 2,929 high-quality coding sequence alignments corresponding to one-to-one orthologous genes from six mammalian species--human, macaque, mouse, rat, cow, and dog. Deviated branches are defined as those that present significant alterations in both the rate of nonsynonymous substitutions (dN) and the selective pressure (dN/dS).
Strikingly, we find that as many as 24.5% of the genes show branch-specific deviations in dN and dN/dS, though this is a relatively well-conserved set of genes. Around half of these genes show branch-specific acceleration of evolutionary rates. Positive selection (PS) tests based on divergence data only identify 17.7% of the accelerated branches. Failure to identify PS in accelerated branches with an excess of radical amino acid replacements suggests that these tests are conservative. Interestingly, genes with accelerated branches are significantly enriched in neural proteins, indicating that this type of protein might play a more important role than previously thought in species diversification, although they are generally not detected by PS tests. We discuss in detail several examples of genes that show lineage-specific evolutionary rate acceleration and are involved in synaptic transmission, chemosensory perception, and ubiquitination.}, keywords = {Amino Acid Sequence, Amino Acid Substitution, Animals, Evolution, F-Box Proteins, F-Box Proteins: genetics, G-Protein-Coupled, G-Protein-Coupled: genetics, Genetic, Genetic Variation, Humans, Mammals, Mammals: genetics, Molecular, Molecular Sequence Data, N-Methyl-D-Aspartate, N-Methyl-D-Aspartate: genetics, Odorant, Odorant: genetics, Receptors, Selection, Sequence Alignment} } |
2010 |
Farré, Domènec, Albà, M Mar Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. (Article) Molecular biology and evolution, 27 (2), pp. 325–35, 2010, ISSN: 1537-1719. @article{Farre2010, title = {Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates.}, author = {Farré, Domènec and Albà, M Mar}, url = {http://www.ncbi.nlm.nih.gov/pubmed/19822635}, issn = {1537-1719}, year = {2010}, date = {2010-01-01}, journal = {Molecular biology and evolution}, volume = {27}, number = {2}, pages = {325--35}, abstract = {Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies.
This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.}, keywords = {Animals, Evolution, Gene Duplication, Genetic, Humans, Mammals, Mammals: genetics, Models, Molecular} } |
2009 |
Toll-Riera, Macarena, Castelo, Robert, Bellora, Nicolás, Albà, M Mar Evolution of primate orphan proteins. (Article) Biochemical Society transactions, 37 (Pt 4), pp. 778–82, 2009, ISSN: 1470-8752. @article{Toll-Riera2009, title = {Evolution of primate orphan proteins.}, author = {Toll-Riera, Macarena and Castelo, Robert and Bellora, Nicolás and Albà, M Mar}, url = {http://www.ncbi.nlm.nih.gov/pubmed/19614593}, issn = {1470-8752}, year = {2009}, date = {2009-01-01}, journal = {Biochemical Society transactions}, volume = {37}, number = {Pt 4}, pages = {778--82}, abstract = {Genomes contain a large number of genes that do not have recognizable homologues in other species. These genes, found in only one or a few closely related species, are known as orphan genes. Their limited distribution implies that many of them are probably involved in lineage-specific adaptive processes. One important question that has remained elusive to date is how orphan genes originate. It has been proposed that they might have arisen by gene duplication followed by a period of very rapid sequence divergence, which would have erased any traces of similarity to other evolutionarily related genes. However, this explanation does not seem plausible for genes lacking homologues in very closely related species. In the present article, we review recent efforts to identify the mechanisms of formation of primate orphan genes. These studies reveal an unexpected important role of transposable elements in the formation of novel protein-coding genes in the genomes of primates.}, keywords = {Animals, Evolution, Gene Duplication, Genome, Genome: genetics, Molecular, Primates, Primates: genetics, Proteins, Proteins: genetics} } |
2007 |
Albà, M Mar, Castresana, Jose On homology searches by protein Blast and the characterization of the age of genes. (Article) BMC evolutionary biology, 7 pp. 53, 2007, ISSN: 1471-2148.
@article{Alba2007,
  title = {On homology searches by protein Blast and the characterization of the age of genes.},
  author = {Albà, M Mar and Castresana, Jose},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1855329&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2148},
  year = {2007},
  date = {2007-01-01},
  journal = {BMC evolutionary biology},
  volume = {7},
  pages = {53},
  abstract = {It has been shown in a variety of organisms, including mammals, that genes that appeared recently in evolution, for example orphan genes, evolve faster than older genes. Low functional constraints at the time of origin of novel genes may explain these results. However, this observation has been recently attributed to an artifact caused by the inability of Blast to detect the fastest genes in different eukaryotic genomes. Distinguishing between these two possible explanations would be of great importance for any studies dealing with the taxon distribution of proteins and the origin of novel genes.},
  keywords = {Amino Acid, Animals, Computational Biology, Databases, Evolution, Genes, Humans, Molecular, Phylogeny, Protein, Sequence Analysis, Sequence Homology}
}
Mularoni, Loris, Veitia, Reiner A, Albà, M Mar Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. (Article) Genomics, 89 (3), pp. 316–25, 2007, ISSN: 0888-7543.
@article{Mularoni2007,
  title = {Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats.},
  author = {Mularoni, Loris and Veitia, Reiner A and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17196365},
  issn = {0888-7543},
  year = {2007},
  date = {2007-01-01},
  journal = {Genomics},
  volume = {89},
  number = {3},
  pages = {316--25},
  abstract = {Single-amino-acid tandem repeats are very common in mammalian proteins but their function and evolution are still poorly understood. Here we investigate how the variability and prevalence of amino acid repeats are related to the evolutionary constraints operating on the proteins. We find a significant positive correlation between repeat size difference and protein nonsynonymous substitution rate in human and mouse orthologous genes. This association is observed for all the common amino acid repeat types and indicates that rapid diversification of repeat structures, involving both trinucleotide slippage and nucleotide substitutions, preferentially occurs in proteins subject to low selective constraints. However, strikingly, we also observe a significant negative correlation between the number of repeats in a protein and the gene nonsynonymous substitution rate, particularly for glutamine, glycine, and alanine repeats. This implies that proteins subject to strong selective constraints tend to contain an unexpectedly high number of repeats, which tend to be well conserved between the two species. This is consistent with a role for selection in the maintenance of a significant number of repeats. Analysis of the codon structure of the sequences encoding the repeats shows that codon purity is associated with high repeat size interspecific variability. Interestingly, polyalanine and polyglutamine repeats associated with disease show very distinctive features regarding the degree of repeat conservation and the protein sequence selective constraints.},
  keywords = {Amino Acid, Amino Acid Sequence, Animals, Complementary, Conserved Sequence, DNA, Evolution, Genetic, Humans, Mice, Molecular, Point Mutation, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Trinucleotide Repeats}
}
Albà, M M, Tompa, P, Veitia, R A Amino acid repeats and the structure and evolution of proteins. (Article) Genome dynamics, 3 pp. 119–30, 2007, ISSN: 1660-9263.
@article{Alba2007a,
  title = {Amino acid repeats and the structure and evolution of proteins.},
  author = {Albà, M M and Tompa, P and Veitia, R A},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/18753788},
  issn = {1660-9263},
  year = {2007},
  date = {2007-01-01},
  journal = {Genome dynamics},
  volume = {3},
  pages = {119--30},
  abstract = {Many proteins have repeats or runs of single amino acids. The pathogenicity of some repeat expansions has fueled proteomic, genomic and structural explorations of homopolymeric runs not only in human but in a wide variety of other organisms. Other types of amino acid repetitive structures exhibit more complex patterns than homopeptides. Irrespective of their precise organization, repetitive sequences are defined as low complexity or simple sequences, as one or a few residues are particularly abundant. Prokaryotes show a relatively low frequency of simple sequences compared to eukaryotes. In the latter the percentage of proteins containing homopolymeric runs varies greatly from one group to another. For instance, within vertebrates, amino acid repeat frequency is much higher in mammals than in amphibians, birds or fishes. For some repeats, this is correlated with the GC-richness of the regions containing the corresponding genes. Homopeptides tend to occur in disordered regions of transcription factors or developmental proteins. They can trigger the formation of protein aggregates, particularly in 'disease' proteins. Simple sequences seem to evolve more rapidly than the rest of the protein/gene and may have a functional impact. Therefore, they are good candidates to promote rapid evolutionary changes. All these diverse facets of homopolymeric runs are explored in this review.},
  keywords = {Amino Acid, Animals, Base Composition, Evolution, Humans, Molecular, Open Reading Frames, Open Reading Frames: genetics, Peptides, Peptides: chemistry, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences}
}
Farré, Domènec, Bellora, Nicolás, Mularoni, Loris, Messeguer, Xavier, Albà, M Mar Housekeeping genes tend to show reduced upstream sequence conservation. (Article) Genome biology, 8 (7), pp. R140, 2007, ISSN: 1465-6914.
@article{Farre2007,
  title = {Housekeeping genes tend to show reduced upstream sequence conservation.},
  author = {Farré, Domènec and Bellora, Nicolás and Mularoni, Loris and Messeguer, Xavier and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2323216&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2007},
  date = {2007-01-01},
  journal = {Genome biology},
  volume = {8},
  number = {7},
  pages = {R140},
  abstract = {Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation.},
  keywords = {Animals, Base Sequence, Conserved Sequence, CpG Islands, Evolution, Gene Expression, Genetic, Genetic Variation, Humans, Mice, Molecular, Molecular Sequence Data, Promoter Regions}
}
2006 |
Furney, Simon J, Albà, M Mar, López-Bigas, Núria Differences in the evolutionary history of disease genes affected by dominant or recessive mutations. (Article) BMC genomics, 7 pp. 165, 2006, ISSN: 1471-2164.
@article{Furney2006,
  title = {Differences in the evolutionary history of disease genes affected by dominant or recessive mutations.},
  author = {Furney, Simon J and Albà, M Mar and López-Bigas, Núria},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1534034&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2164},
  year = {2006},
  date = {2006-01-01},
  journal = {BMC genomics},
  volume = {7},
  pages = {165},
  abstract = {Global analyses of human disease genes by computational methods have yielded important advances in the understanding of human diseases. Generally these studies have treated the group of disease genes uniformly, thus ignoring the type of disease-causing mutations (dominant or recessive). In this report we present a comprehensive study of the evolutionary history of autosomal disease genes separated by mode of inheritance.},
  keywords = {Amino Acid, Animals, Caenorhabditis elegans, Caenorhabditis elegans: genetics, Computational Biology, Conserved Sequence, Dominant, Essential, Evolution, Genes, Genetic, Genetic Diseases, Genetic Structures, Humans, Inborn, Inborn: classification, Inborn: genetics, Mice, Molecular, Mutation, Pan troglodytes, Pan troglodytes: genetics, Recessive, Selection, Sequence Homology}
}
2005 |
Albà, M Mar, Castresana, Jose Inverse relationship between evolutionary rate and age of mammalian genes. (Article) Molecular biology and evolution, 22 (3), pp. 598–606, 2005, ISSN: 0737-4038.
@article{Alba2005,
  title = {Inverse relationship between evolutionary rate and age of mammalian genes.},
  author = {Albà, M Mar and Castresana, Jose},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15537804},
  issn = {0737-4038},
  year = {2005},
  date = {2005-01-01},
  journal = {Molecular biology and evolution},
  volume = {22},
  number = {3},
  pages = {598--606},
  abstract = {A large number of genes is shared by all living organisms, whereas many others are unique to some specific lineages, indicating their different times of origin. The availability of a growing number of eukaryotic genomes allows us to estimate which mammalian genes are novel genes and, approximately, when they arose. In this article, we classify human genes into four different age groups and estimate evolutionary rates in human and mouse orthologs. We show that older genes tend to evolve more slowly than newer ones; that is, proteins that arose earlier in evolution currently have a larger proportion of sites subjected to negative selection. Interestingly, this property is maintained when a fraction of the fastest-evolving genes is excluded or when only genes belonging to a given functional class are considered. One way to explain this relationship is by assuming that genes maintain their functional constraints along all their evolutionary history, but the nature of more recent evolutionary innovations is such that the functional constraints operating on them are increasingly weaker. Alternatively, our results would also be consistent with a scenario in which the functional constraints acting on a gene would not need to be constant through evolution. Instead, starting from weak functional constraints near the time of origin of a gene (as supported by mechanisms proposed for the origin of orphan genes), there would be a gradual increase in selective pressures with time, resulting in fewer accepted mutations in older versus more novel genes.},
  keywords = {Animals, DNA, Evolution, Genome, human, Humans, Mice, Molecular, Sequence Analysis}
}
2004 |
Castresana, Jose, Guigó, Roderic, Albà, M Mar Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. (Article) Journal of molecular evolution, 59 (1), pp. 72–9, 2004, ISSN: 0022-2844.
@article{Castresana2004,
  title = {Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome.},
  author = {Castresana, Jose and Guigó, Roderic and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15383909},
  issn = {0022-2844},
  year = {2004},
  date = {2004-01-01},
  journal = {Journal of molecular evolution},
  volume = {59},
  number = {1},
  pages = {72--9},
  abstract = {Comparison of the human and mouse genomes has revealed that significant variations in evolutionary rates exist among genomic regions and that a large part of this variation is interchromosomal. We confirm in this work, using a large collection of introns, that human chromosome 19 is the one that shows the highest divergence with respect to mouse. To search for other differences among chromosomes, we examine the distribution of gene functions in human and mouse chromosomes using the Gene Ontology definitions. We found by correspondence analysis that among the strongest clusterings of gene functions in human chromosomes is a group of genes coding for DNA binding proteins in chromosome 19. Interestingly, chromosome 19 also has a very high GC content, a feature that has been proposed to promote an opening of the chromatin, thereby facilitating binding of proteins to the DNA helix. In the mouse genome, however, a similar aggregation of genes coding for DNA binding proteins and high GC content cannot be found. This suggests that the distribution of genes coding for DNA binding proteins and the variations of the chromatin accessibility to these proteins are different in the human and mouse genomes. It is likely that the overall high synonymous and intron rates in chromosome 19 are a by-product of the high GC content of this chromosome.},
  keywords = {Base Composition, Base Composition: genetics, Chromatin, Chromatin: metabolism, Chromosomes, Computational Biology, Databases, DNA-Binding Proteins, DNA-Binding Proteins: genetics, DNA-Binding Proteins: metabolism, Evolution, Genetic, Genome, human, Humans, Introns, Introns: genetics, Models, Molecular, Multigene Family, Multigene Family: genetics, Pair 19, Pair 19: genetics, Phylogeny, Zinc Fingers, Zinc Fingers: genetics}
}
Huang, Hui, Winter, Eitan E, Wang, Huajun, Weinstock, Keith G, Xing, Heming, Goodstadt, Leo, Stenson, Peter D, Cooper, David N, Smith, Douglas, Albà, M Mar, Ponting, Chris P, Fechtel, Kim Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. (Article) Genome biology, 5 (7), pp. R47, 2004, ISSN: 1465-6914.
@article{Huang2004,
  title = {Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes.},
  author = {Huang, Hui and Winter, Eitan E and Wang, Huajun and Weinstock, Keith G and Xing, Heming and Goodstadt, Leo and Stenson, Peter D and Cooper, David N and Smith, Douglas and Albà, M Mar and Ponting, Chris P and Fechtel, Kim},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=463309&tool=pmcentrez&rendertype=abstract},
  issn = {1465-6914},
  year = {2004},
  date = {2004-01-01},
  journal = {Genome biology},
  volume = {5},
  number = {7},
  pages = {R47},
  abstract = {Model organisms have contributed substantially to our understanding of the etiology of human disease as well as having assisted with the development of new treatment modalities. The availability of the human, mouse and, most recently, the rat genome sequences now permit the comprehensive investigation of the rodent orthologs of genes associated with human disease. Here, we investigate whether human disease genes differ significantly from their rodent orthologs with respect to their overall levels of conservation and their rates of evolutionary change.},
  keywords = {Amino Acid, Amino Acid: genetics, Animal, Animals, Chromosome Mapping, Chromosome Mapping: methods, Conserved Sequence, Conserved Sequence: genetics, Disease Models, Evolution, Fishes, Fishes: genetics, Fungal, Fungal: genetics, Genes, Genes: genetics, Genes: physiology, Genetic, Genetic Diseases, Genome, Helminth, Helminth: genetics, human, Humans, Inborn, Inborn: genetics, Inborn: physiopathology, Insect, Insect: genetics, Mice, Molecular, Mutagenesis, Mutagenesis: genetics, Nucleic Acid, Nucleotides, Nucleotides: genetics, Point Mutation, Point Mutation: genetics, Rats, Repetitive Sequences, Selection, Sequence Homology, Trinucleotide Repeat Expansion, Trinucleotide Repeat Expansion: genetics}
}
Gibbs, Richard A, et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. (Article) Nature, 428 (6982), pp. 493–521, 2004, ISSN: 1476-4687.
@article{Gibbs2004,
  title = {Genome sequence of the Brown Norway rat yields insights into mammalian evolution.},
  author = {Gibbs, Richard A and others},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15057822},
  issn = {1476-4687},
  year = {2004},
  date = {2004-01-01},
  journal = {Nature},
  volume = {428},
  number = {6982},
  pages = {493--521},
  abstract = {The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.},
  keywords = {Animals, Base Composition, Centromere, Centromere: genetics, Chromosomes, CpG Islands, CpG Islands: genetics, DNA, DNA Transposable Elements, DNA Transposable Elements: genetics, Evolution, Gene Duplication, Genome, Genomics, Humans, Inbred BN, Inbred BN: genetics, Introns, Introns: genetics, Male, Mammalian, Mammalian: genetics, Mice, Mitochondrial, Mitochondrial: genetics, Models, Molecular, Mutagenesis, Nucleic Acid, Nucleic Acid: genetics, Polymorphism, Rats, Regulatory Sequences, Retroelements, Retroelements: genetics, RNA, RNA Splice Sites, RNA Splice Sites: genetics, Sequence Analysis, Single Nucleotide, Single Nucleotide: genetics, Telomere, Telomere: genetics, Untranslated, Untranslated: genetics}
}