Abstract: A novel human gene encoding a protein of 208 amino acids is identified and characterized. HGNC has assigned it the symbol C17orf32 and the name chromosome 17 open reading frame 32. The full-length cDNA of 1 679 bp for C17orf32 was cloned through a BLAST search of public databases, following the identification of a 1 119 bp cDNA obtained by fully automated EST assembly with the SiClone software (created by Chen RS and Ling LJ, and to be released on their website) on a ShenWei IV-type supercomputer. Structurally, C17orf32 has one calcitonin/CGRP/IAPP family signature from amino acid 16 to 169, one dihydroorotase signature from amino acid 43 to 117, one tyrosine kinase phosphorylation site from amino acid 68 to 75, and one bipartite nuclear localization signal from amino acid 28 to 45. These motifs imply the potential biological importance of this gene. Genomic organization analyses show that the C17orf32 gene comprises six exons, ranging in size from 43 to 1 101 bp, and five introns, ranging in size from 163 to 1 124 bp, and spans 4.61 kb. All of the exon/intron boundaries are consistent with the GT/AG rule, and consensus sequences surrounding the splice boundaries are found as well. The C17orf32 gene is located on contig NT_010808.7 of human chromosome 17, and is linked only with LOC124919, a hypothetical human gene with an 889 bp mRNA (XM_058865) encoding the hypothetical protein XP_058865 of 260 amino acids. The sequence of LOC124919 has not been verified experimentally. Furthermore, the full-length ORF of 627 bp cDNA, from 31 to 654 bp, was cloned by RT-PCR from single-stranded cDNA of the human gastric adenocarcinoma cell line MGC803 and sequenced; the nucleotide sequence is fully identical to that obtained by in silico cloning. Thus, the in silico cloning of the C17orf32 gene, with GenBank accession numbers AY074907 and TPA: BK000260, which was carried out solely by bioinformatics analyses, is confirmed.
The full-length cDNA sequence of 1 679 bp exhibits very good overall homology to the 899 bp mRNA of LOC123722, with matching percentages of 99% in 78% of the total window and 57% in 57% of the total window over the full-length nucleotide and protein sequences, respectively. However, the base G at position 401 of the LOC123722 cDNA is a redundant insertion, which causes a reading frame shift and the translation of an alternative protein. The inserted G of LOC123722 is not supported by the experimental clone, is fully rejected by human EST alignment, and is shown to be redundant by genomic GT/AG organization analysis. The C17orf32 gene has 9 putative promoters with probabilities of 58% to 97%, two TATA boxes, a stop codon upstream of the ORF, two polyA signals and a polyA tail downstream of the ORF, and accords with the Kozak rule around the translation start of the ORF. Based on the above results, it can be concluded that a complete novel human gene has been obtained. The full-length gene sequence exhibits little overall homology to any known protein at either the nucleotide or the amino acid level. The two related proteins, with 31% (in 29% of the total window) and 18% (in 18% of the total window) identity over the full-length protein, respectively, are the hypothetical Caenorhabditis elegans protein F09E5.11.p of 221 amino acids and polyphosphate kinase [the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120] of 736 amino acids. Taken together, by combining bioinformatics analyses with experimental verification, a novel human gene, C17orf32, is successfully cloned and verified by a series of theoretical and experimental evidence. The strategy will be helpful in discovering more novel human genes, and even in correcting errors that appear in NCBI Genome Annotation Project RefSeqs, such as LOC124919, a model reference sequence predicted from NCBI contig NT_010808 by automated computational analysis using a gene prediction method.
Therefore, computer-annotated coding regions of the human genome should be used with caution.
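Two of the checks mentioned in the abstract, the GT/AG rule at intron boundaries and the reading frame shift caused by a single inserted base, can be illustrated with a minimal Python sketch. The function names and the toy sequences are assumptions made for illustration only; they are not C17orf32 or LOC123722 data and not the authors' SiClone pipeline.

```python
# Toy illustration only: all sequences below are invented, not real gene data.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_codons(cdna: str, start: int = 0) -> int:
    """Count codons from `start` up to (not including) the first in-frame stop.

    Returns -1 when no in-frame stop codon is found.
    """
    for i in range(start, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return -1


def follows_gt_ag(intron: str) -> bool:
    """True when an intron starts with GT and ends with AG (the GT/AG rule)."""
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# A hypothetical 5-codon ORF followed by a stop codon.
toy_orf = "ATGGCTGAACCTTGGTAA"
# The same sequence with one extra G inserted after the second codon.
toy_orf_with_insert = toy_orf[:6] + "G" + toy_orf[6:]

print(orf_length_codons(toy_orf))              # in-frame stop is found
print(orf_length_codons(toy_orf_with_insert))  # frame is shifted, stop is lost
print(follows_gt_ag("GTAAGTCTTTCAG"))          # hypothetical intron obeying the rule
print(follows_gt_ag("GCAAGTCTTTCAG"))          # hypothetical intron violating it

# Arithmetic consistent with the abstract, assuming the stop codon is counted:
# 208 residues x 3 nt + 3 nt for the stop codon = 627 nt.
print(208 * 3 + 3)
```

In the toy case the single inserted base removes the in-frame stop, so the predicted protein changes downstream of the insertion. This is the kind of discrepancy that comparison against ESTs and genomic splice boundaries is meant to expose.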

In Silico Cloning of C17orf32, a Novel Human Gene and Verification of Its Coding Region by RT-PCR

Molecular Cloning and Characterization of a Novel Human C4orf13 Gene, Tentatively a Member of the Sodium Bile Acid Cotransporter Family.

Cloning and Tissue Expressional Characterization of a Full-Length cDNA Encoding Human Neuronal Protein P17.3.

Cloning, expression and genomic structure of a novel human GNB2L1 gene, which encodes a receptor of activated protein kinase C (RACK)

Cloning, Expression and Mapping of the Full-Length cDNA of Human CCTβ Subunit

[Cloning and Identification of a New Telomeric-Associated Zinc Finger Protein cDNA].

Molecular cloning and characterization of a novel human gene containing 4 ankyrin repeat domains.

Cloning and Identification of a Novel cDNA 1 Which May Be Associated with FKBP25

A Novel Human Gene (WDR25) Encoding a 7-WD40-containing Protein Maps on 14q32

Cloning and identification of a novel cDNA which may be associated with FKBP25.

[Correction of Five Different Types of Errors of Model REFSEQs Appeared in NCBI Human Gene Database Only by Using Two Novel Human Genes C17orf32 and ZNF362].

C2H2-171: a novel human cDNA representing a developmentally regulated POZ domain/zinc finger protein preferentially expressed in brain

Molecular Cloning and Characterization of a Novel Human Gene (ANP32E Alias LANPL) from Human Fetal Brain

Cloning and Identification of a cDNA That Encodes a Novel Human Protein with Thrombospondin Type I Repeat Domain, Hpwtsr

Cloning of a new human cytochrome P450 2A6 cDNA

Cloning and Expression of a Novel Human C5orf12 Gene, a Member of the TMS_TDE Family.

Cloning and Identification of a Novel Human RNPC3 Gene That Encodes a Protein with Two RRM Domains and is Expressed in the Cell Nucleus

Cloning of a Human Arsenic Resistance Related Gene (harrg) cDNA and Its Expression in E. coli

Cloning, Tissue Expression Pattern, and Chromosome Location of a Novel Human Gene BRI3BP

Cloning of the Complete cDNA of ZNF191, a Novel Human Zinc Finger Protein Gene

Molecular Cloning and Expression Analysis of a Novel Human cDNA Fragment Encoding a Putative Ser/Thr Protein Kinase