2007
Bellora, Nicolás; Farré, Domènec; Albà, M Mar: Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters. (Article) BMC Genomics, 8, p. 459, 2007, ISSN: 1471-2164.

@article{Bellora2007,
  title = {Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters.},
  author = {Bellora, Nicolás and Farré, Domènec and Albà, M Mar},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2249607&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2164},
  year = {2007},
  date = {2007-01-01},
  journal = {BMC Genomics},
  volume = {8},
  pages = {459},
  abstract = {The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes.},
  keywords = {Animals, Databases, Gene Expression Regulation, Gene Expression Regulation: genetics, Genetic, Genetic: genetics, Mice, Nucleic Acid, Organ Specificity, Organ Specificity: genetics, Promoter Regions, Software, Transcription Factors, Transcription Factors: metabolism}
}
Albà, M Mar; Castresana, Jose: On homology searches by protein Blast and the characterization of the age of genes. (Article) BMC Evolutionary Biology, 7, p. 53, 2007, ISSN: 1471-2148.

@article{Alba2007,
  title = {On homology searches by protein Blast and the characterization of the age of genes.},
  author = {Albà, M Mar and Castresana, Jose},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1855329&tool=pmcentrez&rendertype=abstract},
  issn = {1471-2148},
  year = {2007},
  date = {2007-01-01},
  journal = {BMC Evolutionary Biology},
  volume = {7},
  pages = {53},
  abstract = {It has been shown in a variety of organisms, including mammals, that genes that appeared recently in evolution, for example orphan genes, evolve faster than older genes. Low functional constraints at the time of origin of novel genes may explain these results. However, this observation has been recently attributed to an artifact caused by the inability of Blast to detect the fastest genes in different eukaryotic genomes. Distinguishing between these two possible explanations would be of great importance for any studies dealing with the taxon distribution of proteins and the origin of novel genes.},
  keywords = {Amino Acid, Animals, Computational Biology, Databases, Evolution, Genes, Humans, Molecular, Phylogeny, Protein, Sequence Analysis, Sequence Homology}
}
2006
Blanco, Enrique; Farré, Domènec; Albà, M Mar; Messeguer, Xavier; Guigó, Roderic: ABS: a database of Annotated regulatory Binding Sites from orthologous promoters. (Article) Nucleic Acids Research, 34 (Database issue), pp. D63–D67, 2006, ISSN: 1362-4962.

@article{Blanco2006,
  title = {ABS: a database of Annotated regulatory Binding Sites from orthologous promoters.},
  author = {Blanco, Enrique and Farré, Domènec and Albà, M Mar and Messeguer, Xavier and Guigó, Roderic},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1347478&tool=pmcentrez&rendertype=abstract},
  issn = {1362-4962},
  year = {2006},
  date = {2006-01-01},
  journal = {Nucleic Acids Research},
  volume = {34},
  number = {Database issue},
  pages = {D63--D67},
  abstract = {Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from bibliography. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval allowing different views of the information. In addition, the release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid during the training and the assessment of motif-finding programs.},
  keywords = {Animals, Binding Sites, Chickens, Chickens: genetics, Databases, Genetic, Genomics, Humans, Internet, Mice, Nucleic Acid, Promoter Regions, Rats, Transcription Factors, Transcription Factors: metabolism, User-Computer Interface}
}
2004
Castresana, Jose; Guigó, Roderic; Albà, M Mar: Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. (Article) Journal of Molecular Evolution, 59 (1), pp. 72–79, 2004, ISSN: 0022-2844.

@article{Castresana2004,
  title = {Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome.},
  author = {Castresana, Jose and Guigó, Roderic and Albà, M Mar},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15383909},
  issn = {0022-2844},
  year = {2004},
  date = {2004-01-01},
  journal = {Journal of Molecular Evolution},
  volume = {59},
  number = {1},
  pages = {72--79},
  abstract = {Comparison of the human and mouse genomes has revealed that significant variations in evolutionary rates exist among genomic regions and that a large part of this variation is interchromosomal. We confirm in this work, using a large collection of introns, that human chromosome 19 is the one that shows the highest divergence with respect to mouse. To search for other differences among chromosomes, we examine the distribution of gene functions in human and mouse chromosomes using the Gene Ontology definitions. We found by correspondence analysis that among the strongest clusterings of gene functions in human chromosomes is a group of genes coding for DNA binding proteins in chromosome 19. Interestingly, chromosome 19 also has a very high GC content, a feature that has been proposed to promote an opening of the chromatin, thereby facilitating binding of proteins to the DNA helix. In the mouse genome, however, a similar aggregation of genes coding for DNA binding proteins and high GC content cannot be found. This suggests that the distribution of genes coding for DNA binding proteins and the variations of the chromatin accessibility to these proteins are different in the human and mouse genomes. It is likely that the overall high synonymous and intron rates in chromosome 19 are a by-product of the high GC content of this chromosome.},
  keywords = {Base Composition, Base Composition: genetics, Chromatin, Chromatin: metabolism, Chromosomes, Computational Biology, Databases, DNA-Binding Proteins, DNA-Binding Proteins: genetics, DNA-Binding Proteins: metabolism, Evolution, Genetic, Genome, human, Humans, Introns, Introns: genetics, Models, Molecular, Multigene Family, Multigene Family: genetics, Pair 19, Pair 19: genetics, Phylogeny, Zinc Fingers, Zinc Fingers: genetics}
}
2002
Albà, M Mar; Laskowski, Roman A; Hancock, John M: Detecting cryptically simple protein sequences using the SIMPLE algorithm. (Article) Bioinformatics (Oxford, England), 18 (5), pp. 672–678, 2002, ISSN: 1367-4803.

@article{Alba2002,
  title = {Detecting cryptically simple protein sequences using the SIMPLE algorithm.},
  author = {Albà, M Mar and Laskowski, Roman A and Hancock, John M},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12050063},
  issn = {1367-4803},
  year = {2002},
  date = {2002-01-01},
  journal = {Bioinformatics (Oxford, England)},
  volume = {18},
  number = {5},
  pages = {672--678},
  abstract = {Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.},
  keywords = {Algorithms, Amino Acid, Amino Acid Sequence, Amino Acid: genetics, Databases, Genetic, Genetic Variation, Internet, Minisatellite Repeats, Minisatellite Repeats: genetics, Models, Molecular Sequence Data, Protein, Protein: methods, Proteins, Proteins: chemistry, Repetitive Sequences, Saccharomyces cerevisiae, Saccharomyces cerevisiae: genetics, Sensitivity and Specificity, Sequence Analysis, Sequence Homology, Software, Statistical}
}