2012
Toll-Riera, Macarena; Bostick, David; Albà, M Mar; Plotkin, Joshua B: Structure and age jointly influence rates of protein evolution. (Article) PLoS Computational Biology, 8 (5), pp. e1002542, 2012, ISSN: 1553-7358.
@article{Toll-Riera2012a,
  title = {Structure and age jointly influence rates of protein evolution.},
  author = {Toll-Riera, Macarena and Bostick, David and Albà, M Mar and Plotkin, Joshua B},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3364943&tool=pmcentrez&rendertype=abstract},
  issn = {1553-7358},
  year = {2012},
  date = {2012-01-01},
  journal = {PLoS Computational Biology},
  volume = {8},
  number = {5},
  pages = {e1002542},
  abstract = {What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group - including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.},
  keywords = {Animals, Binding Sites, Computational Biology, Eukaryota, Evolution, Humans, Mice, Molecular, Protein Conformation, Protein Stability, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Solvents}
}
2010
Mularoni, Loris; Ledda, Alice; Toll-Riera, Macarena; Albà, M Mar: Natural selection drives the accumulation of amino acid tandem repeats in human proteins. (Article) Genome Research, 20 (6), pp. 745–54, 2010, ISSN: 1549-5469.
@article{Mularoni2010,
  title = {Natural selection drives the accumulation of amino acid tandem repeats in human proteins.},
  author = {Mularoni, Loris and Ledda, Alice and Toll-Riera, Macarena and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2877571&tool=pmcentrez&rendertype=abstract},
  issn = {1549-5469},
  year = {2010},
  date = {2010-01-01},
  journal = {Genome Research},
  volume = {20},
  number = {6},
  pages = {745--54},
  abstract = {Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.},
  keywords = {Amino Acid, Amino Acid Sequence, Amino Acids, Amino Acids: chemistry, Amino Acids: genetics, Animals, Genetic, Humans, Molecular Sequence Data, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Sequence Homology}
}
2009
Salichs, Eulàlia; Ledda, Alice; Mularoni, Loris; Albà, M Mar; de la Luna, Susana: Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment. (Article) PLoS Genetics, 5 (3), pp. e1000397, 2009, ISSN: 1553-7404.
@article{Salichs2009,
  title = {Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment.},
  author = {Salichs, Eulàlia and Ledda, Alice and Mularoni, Loris and Albà, M Mar and de la Luna, Susana},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2644819&tool=pmcentrez&rendertype=abstract},
  issn = {1553-7404},
  year = {2009},
  date = {2009-01-01},
  journal = {PLoS Genetics},
  volume = {5},
  number = {3},
  pages = {e1000397},
  abstract = {Single amino acid repeats are prevalent in eukaryote organisms, although the role of many such sequences is still poorly understood. We have performed a comprehensive analysis of the proteins containing homopolymeric histidine tracts in the human genome and identified 86 human proteins that contain stretches of five or more histidines. Most of them are endowed with DNA- and RNA-related functions, and, in addition, there is an overrepresentation of proteins expressed in the brain and/or nervous system development. An analysis of their subcellular localization shows that 15 of the 22 nuclear proteins identified accumulate in the nuclear subcompartment known as nuclear speckles. This localization is lost when the histidine repeat is deleted, and significantly, closely related paralogous proteins without histidine repeats also fail to localize to nuclear speckles. Hence, the histidine tract appears to be directly involved in targeting proteins to this compartment. The removal of DNA-binding domains or treatment with RNA polymerase II inhibitors induces the re-localization of several polyhistidine-containing proteins from the nucleoplasm to nuclear speckles. These findings highlight the dynamic relationship between sites of transcription and nuclear speckles. Therefore, we define the histidine repeats as a novel targeting signal for nuclear speckles, and we suggest that these repeats are a way of generating evolutionary diversification in gene duplicates. These data contribute to our better understanding of the physiological role of single amino acid repeats in proteins.},
  keywords = {Amino Acids, Cell Line, Cell Nucleus, Cell Nucleus: chemistry, Cell Nucleus: genetics, Cell Nucleus: metabolism, Genome, Histidine, Histidine: chemistry, Histidine: genetics, Histidine: metabolism, human, Humans, Molecular Sequence Data, Nuclear Localization Signals, Nuclear Proteins, Nuclear Proteins: chemistry, Nuclear Proteins: genetics, Nuclear Proteins: metabolism, Protein Transport, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Sequence Alignment, Tandem Repeat Sequences}
}
Toll-Riera, Macarena; Castelo, Robert; Bellora, Nicolás; Albà, M Mar: Evolution of primate orphan proteins. (Article) Biochemical Society Transactions, 37 (Pt 4), pp. 778–82, 2009, ISSN: 1470-8752.
@article{Toll-Riera2009,
  title = {Evolution of primate orphan proteins.},
  author = {Toll-Riera, Macarena and Castelo, Robert and Bellora, Nicolás and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/19614593},
  issn = {1470-8752},
  year = {2009},
  date = {2009-01-01},
  journal = {Biochemical Society Transactions},
  volume = {37},
  number = {Pt 4},
  pages = {778--82},
  abstract = {Genomes contain a large number of genes that do not have recognizable homologues in other species. These genes, found in only one or a few closely related species, are known as orphan genes. Their limited distribution implies that many of them are probably involved in lineage-specific adaptive processes. One important question that has remained elusive to date is how orphan genes originate. It has been proposed that they might have arisen by gene duplication followed by a period of very rapid sequence divergence, which would have erased any traces of similarity to other evolutionarily related genes. However, this explanation does not seem plausible for genes lacking homologues in very closely related species. In the present article, we review recent efforts to identify the mechanisms of formation of primate orphan genes. These studies reveal an unexpected important role of transposable elements in the formation of novel protein-coding genes in the genomes of primates.},
  keywords = {Animals, Evolution, Gene Duplication, Genome, Genome: genetics, Molecular, Primates, Primates: genetics, Proteins, Proteins: genetics}
}
2007
Mularoni, Loris; Veitia, Reiner A; Albà, M Mar: Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. (Article) Genomics, 89 (3), pp. 316–25, 2007, ISSN: 0888-7543.
@article{Mularoni2007,
  title = {Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats.},
  author = {Mularoni, Loris and Veitia, Reiner A and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17196365},
  issn = {0888-7543},
  year = {2007},
  date = {2007-01-01},
  journal = {Genomics},
  volume = {89},
  number = {3},
  pages = {316--25},
  abstract = {Single-amino-acid tandem repeats are very common in mammalian proteins but their function and evolution are still poorly understood. Here we investigate how the variability and prevalence of amino acid repeats are related to the evolutionary constraints operating on the proteins. We find a significant positive correlation between repeat size difference and protein nonsynonymous substitution rate in human and mouse orthologous genes. This association is observed for all the common amino acid repeat types and indicates that rapid diversification of repeat structures, involving both trinucleotide slippage and nucleotide substitutions, preferentially occurs in proteins subject to low selective constraints. However, strikingly, we also observe a significant negative correlation between the number of repeats in a protein and the gene nonsynonymous substitution rate, particularly for glutamine, glycine, and alanine repeats. This implies that proteins subject to strong selective constraints tend to contain an unexpectedly high number of repeats, which tend to be well conserved between the two species. This is consistent with a role for selection in the maintenance of a significant number of repeats. Analysis of the codon structure of the sequences encoding the repeats shows that codon purity is associated with high repeat size interspecific variability. Interestingly, polyalanine and polyglutamine repeats associated with disease show very distinctive features regarding the degree of repeat conservation and the protein sequence selective constraints.},
  keywords = {Amino Acid, Amino Acid Sequence, Animals, Complementary, Conserved Sequence, DNA, Evolution, Genetic, Humans, Mice, Molecular, Point Mutation, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Trinucleotide Repeats}
}
Albà, M M; Tompa, P; Veitia, R A: Amino acid repeats and the structure and evolution of proteins. (Article) Genome Dynamics, 3, pp. 119–30, 2007, ISSN: 1660-9263.
@article{Alba2007a,
  title = {Amino acid repeats and the structure and evolution of proteins.},
  author = {Albà, M M and Tompa, P and Veitia, R A},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/18753788},
  issn = {1660-9263},
  year = {2007},
  date = {2007-01-01},
  journal = {Genome Dynamics},
  volume = {3},
  pages = {119--30},
  abstract = {Many proteins have repeats or runs of single amino acids. The pathogenicity of some repeat expansions has fueled proteomic, genomic and structural explorations of homopolymeric runs not only in human but in a wide variety of other organisms. Other types of amino acid repetitive structures exhibit more complex patterns than homopeptides. Irrespective of their precise organization, repetitive sequences are defined as low complexity or simple sequences, as one or a few residues are particularly abundant. Prokaryotes show a relatively low frequency of simple sequences compared to eukaryotes. In the latter the percentage of proteins containing homopolymeric runs varies greatly from one group to another. For instance, within vertebrates, amino acid repeat frequency is much higher in mammals than in amphibians, birds or fishes. For some repeats, this is correlated with the GC-richness of the regions containing the corresponding genes. Homopeptides tend to occur in disordered regions of transcription factors or developmental proteins. They can trigger the formation of protein aggregates, particularly in 'disease' proteins. Simple sequences seem to evolve more rapidly than the rest of the protein/gene and may have a functional impact. Therefore, they are good candidates to promote rapid evolutionary changes. All these diverse facets of homopolymeric runs are explored in this review.},
  keywords = {Amino Acid, Animals, Base Composition, Evolution, Humans, Molecular, Open Reading Frames, Open Reading Frames: genetics, Peptides, Peptides: chemistry, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences}
}
2004
Albà, M Mar; Guigó, Roderic: Comparative analysis of amino acid repeats in rodents and humans. (Article) Genome Research, 14 (4), pp. 549–54, 2004, ISSN: 1088-9051.
@article{Alba2004,
  title = {Comparative analysis of amino acid repeats in rodents and humans.},
  author = {Albà, M Mar and Guigó, Roderic},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=383298&tool=pmcentrez&rendertype=abstract},
  issn = {1088-9051},
  year = {2004},
  date = {2004-01-01},
  journal = {Genome Research},
  volume = {14},
  number = {4},
  pages = {549--54},
  abstract = {Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also more abundant in human coding sequences. Using the human species as an outgroup, we were able to address differences in repeat loss and repeat gain in the rat and mouse lineages. In this data set, mouse proteins contain substantially more repeats than rat proteins, which can be at least partly attributed to a higher repeat loss in the rat lineage. The data are consistent with a role for trinucleotide slippage in the generation of novel amino acid repeats. We confirm the previously observed functional bias of proteins with repeats, with overrepresentation of transcription factors and DNA-binding proteins. We show that genes encoding amino acid repeats tend to have an unusually high GC content, and that differences in coding GC content among orthologs are directly related to the presence/absence of repeats. We propose that the different GC content isochore structure in rodents and humans may result in an increased amino acid repeat prevalence in the human lineage.},
  keywords = {Amino Acid, Amino Acid: genetics, Amino Acid: physiology, Animals, Chromosome Mapping, Chromosome Mapping: methods, Chromosome Mapping: statistics & numerical data, Computational Biology, Computational Biology: methods, Computational Biology: statistics & numerical data, GC Rich Sequence, GC Rich Sequence: genetics, Humans, Mice, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: physiology, Rats, Repetitive Sequences, Trinucleotide Repeats, Trinucleotide Repeats: genetics}
}
2002
Albà, M Mar; Laskowski, Roman A; Hancock, John M: Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article) Bioinformatics (Oxford, England), 18 (5), pp. 672–8, 2002, ISSN: 1367-4803.
@article{Alba2002,
  title = {Detecting cryptically simple protein sequences using the SIMPLE algorithm.},
  author = {Albà, M Mar and Laskowski, Roman A and Hancock, John M},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12050063},
  issn = {1367-4803},
  year = {2002},
  date = {2002-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {18},
  number = {5},
  pages = {672--8},
  abstract = {Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.},
  keywords = {Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical}
}