Publication List

2012

Toll-Riera, Macarena, Bostick, David, Albà, M Mar, Plotkin, Joshua B

Structure and age jointly influence rates of protein evolution. (Article)

PLoS computational biology, 8 (5), pp. e1002542, 2012, ISSN: 1553-7358.

(Tags: Animals, Binding Sites, Computational Biology, Eukaryota, Evolution, Humans, Mice, Molecular, Protein Conformation, Protein Stability, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Solvents)

@article{Toll-Riera2012a,
title = {Structure and age jointly influence rates of protein evolution},
author = {Toll-Riera, Macarena and Bostick, David and Albà, M Mar and Plotkin, Joshua B},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3364943&tool=pmcentrez&rendertype=abstract},
issn = {1553-7358},
year = {2012},
date = {2012-01-01},
journal = {PLoS computational biology},
volume = {8},
number = {5},
pages = {e1002542},
abstract = {What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group - including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.},
keywords = {Animals, Binding Sites, Computational Biology, Eukaryota, Evolution, Humans, Mice, Molecular, Protein Conformation, Protein Stability, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Solvents}
}

2010

Mularoni, Loris, Ledda, Alice, Toll-Riera, Macarena, Albà, M Mar

Natural selection drives the accumulation of amino acid tandem repeats in human proteins. (Article)

Genome research, 20 (6), pp. 745–54, 2010, ISSN: 1549-5469.

(Tags: Amino Acid, Amino Acid Sequence, Amino Acids, Amino Acids: chemistry, Amino Acids: genetics, Animals, Genetic, Humans, Molecular Sequence Data, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Sequence Homology)

@article{Mularoni2010,
title = {Natural selection drives the accumulation of amino acid tandem repeats in human proteins},
author = {Mularoni, Loris and Ledda, Alice and Toll-Riera, Macarena and Albà, M Mar},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2877571&tool=pmcentrez&rendertype=abstract},
issn = {1549-5469},
year = {2010},
date = {2010-01-01},
journal = {Genome research},
volume = {20},
number = {6},
pages = {745--54},
abstract = {Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.},
keywords = {Amino Acid, Amino Acid Sequence, Amino Acids, Amino Acids: chemistry, Amino Acids: genetics, Animals, Genetic, Humans, Molecular Sequence Data, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Sequence Homology}
}

2009

Salichs, Eulàlia, Ledda, Alice, Mularoni, Loris, Albà, M Mar, de la Luna, Susana

Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment. (Article)

PLoS genetics, 5 (3), pp. e1000397, 2009, ISSN: 1553-7404.

(Tags: Amino Acids, Cell Line, Cell Nucleus, Cell Nucleus: chemistry, Cell Nucleus: genetics, Cell Nucleus: metabolism, Genome, Histidine, Histidine: chemistry, Histidine: genetics, Histidine: metabolism, human, Humans, Molecular Sequence Data, Nuclear Localization Signals, Nuclear Proteins, Nuclear Proteins: chemistry, Nuclear Proteins: genetics, Nuclear Proteins: metabolism, Protein Transport, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Sequence Alignment, Tandem Repeat Sequences)

@article{Salichs2009,
title = {Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment},
author = {Salichs, Eulàlia and Ledda, Alice and Mularoni, Loris and Albà, M Mar and de la Luna, Susana},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2644819&tool=pmcentrez&rendertype=abstract},
issn = {1553-7404},
year = {2009},
date = {2009-01-01},
journal = {PLoS genetics},
volume = {5},
number = {3},
pages = {e1000397},
abstract = {Single amino acid repeats are prevalent in eukaryote organisms, although the role of many such sequences is still poorly understood. We have performed a comprehensive analysis of the proteins containing homopolymeric histidine tracts in the human genome and identified 86 human proteins that contain stretches of five or more histidines. Most of them are endowed with DNA- and RNA-related functions, and, in addition, there is an overrepresentation of proteins expressed in the brain and/or nervous system development. An analysis of their subcellular localization shows that 15 of the 22 nuclear proteins identified accumulate in the nuclear subcompartment known as nuclear speckles. This localization is lost when the histidine repeat is deleted, and significantly, closely related paralogous proteins without histidine repeats also fail to localize to nuclear speckles. Hence, the histidine tract appears to be directly involved in targeting proteins to this compartment. The removal of DNA-binding domains or treatment with RNA polymerase II inhibitors induces the re-localization of several polyhistidine-containing proteins from the nucleoplasm to nuclear speckles. These findings highlight the dynamic relationship between sites of transcription and nuclear speckles. Therefore, we define the histidine repeats as a novel targeting signal for nuclear speckles, and we suggest that these repeats are a way of generating evolutionary diversification in gene duplicates. These data contribute to our better understanding of the physiological role of single amino acid repeats in proteins.},
keywords = {Amino Acids, Cell Line, Cell Nucleus, Cell Nucleus: chemistry, Cell Nucleus: genetics, Cell Nucleus: metabolism, Genome, Histidine, Histidine: chemistry, Histidine: genetics, Histidine: metabolism, human, Humans, Molecular Sequence Data, Nuclear Localization Signals, Nuclear Proteins, Nuclear Proteins: chemistry, Nuclear Proteins: genetics, Nuclear Proteins: metabolism, Protein Transport, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: metabolism, Sequence Alignment, Tandem Repeat Sequences}
}

2007

Mularoni, Loris, Veitia, Reiner A, Albà, M Mar

Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. (Article)

Genomics, 89 (3), pp. 316–25, 2007, ISSN: 0888-7543.

(Tags: Amino Acid, Amino Acid Sequence, Animals, Complementary, Conserved Sequence, DNA, Evolution, Genetic, Humans, Mice, Molecular, Point Mutation, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Trinucleotide Repeats)

@article{Mularoni2007,
title = {Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats},
author = {Mularoni, Loris and Veitia, Reiner A and Albà, M Mar},
url = {http://www.ncbi.nlm.nih.gov/pubmed/17196365},
issn = {0888-7543},
year = {2007},
date = {2007-01-01},
journal = {Genomics},
volume = {89},
number = {3},
pages = {316--25},
abstract = {Single-amino-acid tandem repeats are very common in mammalian proteins but their function and evolution are still poorly understood. Here we investigate how the variability and prevalence of amino acid repeats are related to the evolutionary constraints operating on the proteins. We find a significant positive correlation between repeat size difference and protein nonsynonymous substitution rate in human and mouse orthologous genes. This association is observed for all the common amino acid repeat types and indicates that rapid diversification of repeat structures, involving both trinucleotide slippage and nucleotide substitutions, preferentially occurs in proteins subject to low selective constraints. However, strikingly, we also observe a significant negative correlation between the number of repeats in a protein and the gene nonsynonymous substitution rate, particularly for glutamine, glycine, and alanine repeats. This implies that proteins subject to strong selective constraints tend to contain an unexpectedly high number of repeats, which tend to be well conserved between the two species. This is consistent with a role for selection in the maintenance of a significant number of repeats. Analysis of the codon structure of the sequences encoding the repeats shows that codon purity is associated with high repeat size interspecific variability. Interestingly, polyalanine and polyglutamine repeats associated with disease show very distinctive features regarding the degree of repeat conservation and the protein sequence selective constraints.},
keywords = {Amino Acid, Amino Acid Sequence, Animals, Complementary, Conserved Sequence, DNA, Evolution, Genetic, Humans, Mice, Molecular, Point Mutation, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences, Selection, Trinucleotide Repeats}
}

Albà, M M, Tompa, P, Veitia, R A

Amino acid repeats and the structure and evolution of proteins. (Article)

Genome dynamics, 3, pp. 119–30, 2007, ISSN: 1660-9263.

(Tags: Amino Acid, Animals, Base Composition, Evolution, Humans, Molecular, Open Reading Frames, Open Reading Frames: genetics, Peptides, Peptides: chemistry, Proteins, Proteins: chemistry, Proteins: genetics, Repetitive Sequences)

2004

Albà, M Mar, Guigó, Roderic

Comparative analysis of amino acid repeats in rodents and humans. (Article)

Genome research, 14 (4), pp. 549–54, 2004, ISSN: 1088-9051.

(Tags: Amino Acid, Amino Acid: genetics, Amino Acid: physiology, Animals, Chromosome Mapping, Chromosome Mapping: methods, Chromosome Mapping: statistics & numerical data, Computational Biology, Computational Biology: methods, Computational Biology: statistics & numerical data, GC Rich Sequence, GC Rich Sequence: genetics, Humans, Mice, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: physiology, Rats, Repetitive Sequences, Trinucleotide Repeats, Trinucleotide Repeats: genetics)

@article{Alba2004,
title = {Comparative analysis of amino acid repeats in rodents and humans},
author = {Albà, M Mar and Guigó, Roderic},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=383298&tool=pmcentrez&rendertype=abstract},
issn = {1088-9051},
year = {2004},
date = {2004-01-01},
journal = {Genome research},
volume = {14},
number = {4},
pages = {549--54},
abstract = {Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also more abundant in human coding sequences. Using the human species as an outgroup, we were able to address differences in repeat loss and repeat gain in the rat and mouse lineages. In this data set, mouse proteins contain substantially more repeats than rat proteins, which can be at least partly attributed to a higher repeat loss in the rat lineage. The data are consistent with a role for trinucleotide slippage in the generation of novel amino acid repeats. We confirm the previously observed functional bias of proteins with repeats, with overrepresentation of transcription factors and DNA-binding proteins. We show that genes encoding amino acid repeats tend to have an unusually high GC content, and that differences in coding GC content among orthologs are directly related to the presence/absence of repeats. We propose that the different GC content isochore structure in rodents and humans may result in an increased amino acid repeat prevalence in the human lineage.},
keywords = {Amino Acid, Amino Acid: genetics, Amino Acid: physiology, Animals, Chromosome Mapping, Chromosome Mapping: methods, Chromosome Mapping: statistics & numerical data, Computational Biology, Computational Biology: methods, Computational Biology: statistics & numerical data, GC Rich Sequence, GC Rich Sequence: genetics, Humans, Mice, Proteins, Proteins: chemistry, Proteins: genetics, Proteins: physiology, Rats, Repetitive Sequences, Trinucleotide Repeats, Trinucleotide Repeats: genetics}
}

2002

Albà, M Mar, Laskowski, Roman A, Hancock, John M

Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article)

Bioinformatics (Oxford, England), 18 (5), pp. 672–8, 2002, ISSN: 1367-4803.

(Tags: Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical)