Preference of Simple Sequence Repeats in Coding and Non-Coding Regions of Arabidopsis thaliana

LD Zhang, DJ Yuan, SW Yu, ZG Li, YF Cao, ZQ Miao, HM Qian, KX Tang
DOI: https://doi.org/10.1093/bioinformatics/bth043
IF: 5.8
2004-01-01
Bioinformatics
Abstract:
Motivation: Simple sequence repeats or microsatellites have been found abundantly in many genomes. However, the significance of distribution preference has not been completely understood. Completion of the Arabidopsis genome sequencing allows us to better understand and characterize microsatellites.
Results: Microsatellite distribution was more abundant in 5'-flanking regions of genes compared with that expected in the whole genome, with an over-representation of AG and AAG repeats; there were clear differences from distributions in 3'-flanks and coding fractions, where triplet frequencies evidently corresponded to codon usage. We identified 1140 full-length genes that contained at least one locus of AG or AAG repeats in their upstream sequences, and whose functional characteristics were significantly associated with the repeats. This observation indicates that selective pressure markedly differed in the three transcribed regions, with positive selection of AG and AAG repeats in 5'-flanks close to those genes whose products are preferentially involved in transcription.
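As an illustration of what locating "a locus of AG or AAG repeats" in an upstream sequence can look like in practice, the following is a minimal sketch and not the authors' pipeline. The abstract does not state the detection criteria, so the function name `find_ssrs`, the `min_repeats` threshold of 4, and the example sequence are all assumptions made for illustration. The sketch scans one strand only and does not consider reverse-complement motifs.

```python
import re

def find_ssrs(seq, motifs=("AG", "AAG"), min_repeats=4):
    """Return (motif, start, end, repeat_count) for each perfect tandem run.

    min_repeats is a hypothetical threshold; the abstract does not give one.
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # Match the motif repeated at least min_repeats times in tandem.
        pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
        for m in pattern.finditer(seq):
            count = (m.end() - m.start()) // len(motif)
            hits.append((motif, m.start(), m.end(), count))
    # Order loci by their position in the sequence.
    return sorted(hits, key=lambda h: h[1])

# Made-up upstream fragment: five AG units, a spacer, then four AAG units.
upstream = "TTT" + "AG" * 5 + "CCC" + "AAG" * 4 + "T"
for locus in find_ssrs(upstream):
    print(locus)
```

Applied to every gene's 5'-flanking sequence, a scan of this kind would yield the set of genes carrying at least one such locus, which could then be compared against functional annotations.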