[Analysis of Factors Shaping S. pneumoniae Codon Usage].

Zhuo-Cheng Hou, Ning Yang
2002-01-01
Abstract: Streptococcus pneumoniae is a Gram-positive bacterium that causes community-acquired pneumonia, bacteremia, meningitis and otitis media. As a human pathogen, S. pneumoniae is the most common bacterial cause of acute respiratory infection and otitis media and is estimated to cause over 3 million deaths in children worldwide every year. S. pneumoniae has played a pivotal role in the fields of genetics and microbiology. The complete genome of S. pneumoniae was sequenced and published recently. To gain further insight into the evolution of synonymous codon usage and to study codon usage patterns in highly and lowly expressed S. pneumoniae genes, this paper analyzes the factors shaping synonymous codon usage in S. pneumoniae. All genes of at least 300 bp in the complete genome (1709 genes in total) were analyzed. For each gene, the following values were calculated: the codon adaptation index (CAI), used as a measure of gene expression level; relative synonymous codon usage (RSCU); the effective number of codons (Nc); A3s, T3s, G3s and C3s (the frequencies of adenine, thymine, guanine and cytosine, respectively, at the synonymous third position of codons); GC (the frequency of guanine + cytosine in the gene sequence); and GC3s (the frequency of guanine + cytosine at the synonymous third position of codons). Multivariate statistical analysis was also performed. The results show that cytosine (C) usage at synonymous positions is significantly higher in highly expressed genes than in lowly expressed genes, while lowly expressed genes tend to use guanine (G) at synonymous sites. Gene expression level correlates significantly with the first axis of the correspondence analysis (COA; R = 0.86), and a comparison of the codon usage patterns of highly and lowly expressed genes shows that it has a significant effect on codon usage. The G + C content of genes correlates moderately with gene expression level (R = 0.44) and with the first axis of the COA (R = 0.51), and therefore shapes gene expression and codon usage in S. pneumoniae. The dataset was divided into 6 groups by gene length (≥ 3000 bp, 2000-2999 bp, 1500-1999 bp, 1000-1499 bp, 500-999 bp, < 500 bp), and gene expression level, GC3s and Nc values were compared among the groups. CAI, GC3s and Nc values show some differences among the gene length groups. Protein hydrophobicity does not show a significant influence on codon usage pattern. In summary, natural selection on gene expression level and the base composition of genes are the major factors affecting codon usage in S. pneumoniae, while gene length shapes codon usage only in a minor way.
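Two of the per-gene measures named in the abstract, RSCU and GC3s, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, since the abstract does not say which software was used. It assumes the standard genetic code, an in-frame coding sequence, and the common convention that GC3s ignores Met, Trp and stop codons. The function names (count_codons, rscu, gc3s) and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the same codon assignments apply to S. pneumoniae genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Group sense codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def count_codons(seq):
    """Count in-frame codons; trailing bases and ambiguous codons are skipped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU = observed count / count expected under equal synonymous usage."""
    values = {}
    for aa, codons in SYNONYMS.items():
        if len(codons) < 2:  # Met and Trp have no synonymous alternatives
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent from this gene
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def gc3s(counts):
    """Fraction of G or C at the third position of synonymous codons."""
    synonymous = [c for cs in SYNONYMS.values() if len(cs) > 1 for c in cs]
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return float("nan")
    return sum(counts[c] for c in synonymous if c[2] in "GC") / total


# Toy in-frame sequence (hypothetical, not a real S. pneumoniae gene).
toy_gene = "ATGGCTGCTGCCAAATGGTAA"
toy_counts = count_codons(toy_gene)
toy_rscu = rscu(toy_counts)
toy_gc3s = gc3s(toy_counts)
```

An RSCU value above 1 means a codon is used more often than expected under equal usage of its synonymous codons, and a value below 1 means it is used less often. CAI and Nc build on the same codon counts but need a reference set of highly expressed genes and per-family homozygosity estimates respectively, so they are left out of this sketch.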