DNA oligomers: All possible DNA words of 3 to 6 nucleotides can be used to discover putative new functional motifs.
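The number (N) of possible DNA words depends on the size and strand selected. With strand=both, a word and its reverse complement are counted as a single word, so N is the number of palindromes (words identical to their own reverse complement) plus the number of non-palindromic reverse-complement pairs.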
SIZE | N STRAND FORWARD | N PALINDROMES | N NON-PALINDROME PAIRS | N STRAND BOTH |
3 | 64 | 0 | 32 | 32 |
4 | 256 | 16 | 120 | 136 |
5 | 1024 | 0 | 512 | 512 |
6 | 4096 | 64 | 2016 | 2080 |
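The counts in the table can be checked by brute-force enumeration. The following is a minimal Python sketch, not the tool's own code: the names `reverse_complement` and `count_words` are illustrative, and it assumes words contain only uppercase A, C, G and T.

```python
from itertools import product

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word (uppercase A/C/G/T only)."""
    return word.translate(COMPLEMENT)[::-1]


def count_words(size):
    """Count DNA words of the given size for strand=forward and strand=both."""
    words = ["".join(p) for p in product("ACGT", repeat=size)]
    # A palindrome here is a word equal to its own reverse complement.
    palindromes = sum(1 for w in words if w == reverse_complement(w))
    # Every other word pairs with a distinct reverse complement,
    # so the pair count is half the number of non-palindromic words.
    pairs = (len(words) - palindromes) // 2
    return {
        "forward": len(words),
        "palindromes": palindromes,
        "non_palindrome_pairs": pairs,
        "both": palindromes + pairs,
    }


if __name__ == "__main__":
    for size in range(3, 7):
        print(size, count_words(size))
```

Odd sizes have no palindromes because the middle nucleotide would have to be its own complement, which is why N for strand=both is exactly half of N for strand=forward at sizes 3 and 5.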
For example, with size=4 and strand=both on a dataset of plant promoters extracted from EPD, 3 of the 136 possible oligomers were significant, with p-values < 1e-20.
TATA, ATAG and ATAA (or, read on the opposite strand, TATA, CTAT and TTAT; TATA is its own reverse complement) show positional overrepresentation between -40 and -20 relative to the transcription start site.
These 3 tetramers correspond to the core binding motif of the TATA-binding protein (TBP).
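As a rough illustration of what a positional profile for one word looks like, the sketch below counts where a word (and, for strand=both, its reverse complement) starts relative to the transcription start site. It is a sketch under stated assumptions, not the method used to obtain the p-values above: the function name `positional_counts`, the `tss_index` parameter, the convention that a reverse-strand match is reported at its forward-strand start position, and the toy sequences are all hypothetical, and no significance test is performed.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word (uppercase A/C/G/T only)."""
    return word.translate(COMPLEMENT)[::-1]


def positional_counts(sequences, word, tss_index, both_strands=True):
    """Count match start positions of `word` relative to the TSS.

    Assumption: `tss_index` is the 0-based index of position +1 in every
    sequence, and there is no position 0 (upstream positions are -1, -2, ...).
    """
    targets = {word}
    if both_strands:
        # A match to the reverse complement counts as a match to the word.
        targets.add(reverse_complement(word))
    k = len(word)
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] in targets:
                offset = i - tss_index
                position = offset if offset < 0 else offset + 1
                counts[position] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences (not real promoters), 60 nt long with the TSS at index 50.
    toy = ["G" * 20 + "TATA" + "G" * 36, "C" * 22 + "CTAT" + "C" * 34]
    profile = positional_counts(toy, "ATAG", tss_index=50)
    in_window = sum(n for pos, n in profile.items() if -40 <= pos <= -20)
    print(sorted(profile.items()), in_window)
```

Summing the profile over a window such as -40 to -20 gives the raw count that an overrepresentation test would then compare against the rest of the promoter.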