GeMCRIS and Gene Transfer Product Characterization Vocabulary Development

Kathy A. Lesh, R. Jambou, M. O'Reilly, A. Patterson, E. Rosenthal, T. Shih, J. Foss
Abstract: The National Institutes of Health Office of Biotechnology Activities (OBA) is developing a database, GeMCRIS (Genetic Modification Clinical Research Information System), for capturing information on clinical trials involving the transfer of recombinant DNA to humans. This database will provide information for a diverse audience. Some data will be for internal use only, while other information will be shared with the public via the Internet. Certain fields within the database will use controlled vocabularies. Where possible, the controlled vocabularies will coincide with vocabularies recommended by the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). This may include the use of MedDRA (Medical Dictionary for Regulatory Activities). Controlled vocabularies for the characterization of gene transfer products do not currently exist. OBA staff, in coordination with the FDA, are developing these vocabularies. The first step in developing the gene transfer product vocabularies was to identify several subject matter experts within FDA and OBA. Each group was asked to assemble and cluster the terms it uses when characterizing gene transfer products. The FDA's point of reference is the regulatory process. OBA's focus is providing information to researchers and to the public regarding human recombinant DNA research in order to optimize the design and conduct of this research. Ideally, the vocabulary should encompass the needs of both groups. Four basic components were identified: genetic elements, ex vivo cultured cells, vector producer systems, and gene vector systems. The characterization of the last component remains to be qualified because not all transgenes are transferred via a true vector (i.e., a vehicle capable of transferring genetic information, such as a plasmid or a modified virus); some are instead delivered into the human body as a "naked" nucleic acid fragment.
Ex vivo cultured cells have been well described and classified, but the other components have not. Critical to the classification, validation, consistency, and maintenance of these vocabularies is the creation of concept definitions. The subject matter experts have been asked to identify the attributes and behaviors of each concept. This aspect of the vocabulary development process has been particularly challenging for the group. An example of this challenge is the definition of genetic elements. While lists of genetic elements are available, identifying what defines a nucleic acid sequence as a genetic element has proved elusive. Ultimately, a thesaurus will be incorporated into GeMCRIS. The thesaurus will contain synonyms and definitions for all concepts in the gene transfer product vocabularies and will be essential for search and retrieval of information. Another important aspect is how the vocabularies are linked in GeMCRIS. In particular, the genetic elements must be linked with the diseases being targeted. It must be possible to search and retrieve which genetic elements expressing proteins are being used to target specific diseases. There are at least two ways to ensure this capability. One is to classify the genetic elements according to disease target. An alternative method is to create relationships between the disease targets and the genetic elements. The latter method has been selected, and the genetic elements that express proteins will be classified according to the type of protein. The exact protein classification system has not yet been identified, but it has been decided that a structural classification system will not be used. The subject matter experts have been meeting with a vocabulary developer individually and in small groups. The goal is to get the high-level terms identified and defined so the developers can prepare a beta version of GeMCRIS in 2001. Also, focus groups have met to identify their specific needs when using GeMCRIS.
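The relationship-based approach described above can be illustrated with a minimal Python sketch. It is not the GeMCRIS design: the `Vocabulary` class, its method names, and the sample entries (CFTR, Factor IX, and the protein-type labels) are hypothetical and chosen only for illustration. The sketch assumes a thesaurus that maps synonyms to a preferred concept name, a separate protein-type classification for genetic elements, and explicit links from disease targets to genetic elements.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Hypothetical in-memory vocabulary; not the actual GeMCRIS schema."""
    definitions: dict = field(default_factory=dict)    # concept -> definition
    synonyms: dict = field(default_factory=dict)       # lowercased term -> concept
    protein_type: dict = field(default_factory=dict)   # genetic element -> protein type
    targets: dict = field(default_factory=lambda: defaultdict(set))  # disease -> elements

    def add_concept(self, name, definition, synonyms=()):
        """Register a concept with its definition and any synonyms."""
        self.definitions[name] = definition
        for term in (name, *synonyms):
            self.synonyms[term.lower()] = name

    def resolve(self, term):
        """Return the preferred concept name for a term, or None if unknown."""
        return self.synonyms.get(term.lower())

    def classify(self, element_term, protein_type):
        """Classify a genetic element by protein type, not by disease."""
        self.protein_type[self._require(element_term)] = protein_type

    def link(self, element_term, disease_term):
        """Record a relationship between a genetic element and a disease target."""
        self.targets[self._require(disease_term)].add(self._require(element_term))

    def elements_for(self, disease_term):
        """Retrieve the genetic elements linked to a disease, via any synonym."""
        disease = self.resolve(disease_term)
        return sorted(self.targets.get(disease, set()))

    def _require(self, term):
        concept = self.resolve(term)
        if concept is None:
            raise KeyError(f"unknown term: {term!r}")
        return concept


# Illustrative entries only; they are not taken from GeMCRIS.
vocab = Vocabulary()
vocab.add_concept("cystic fibrosis",
                  "Inherited disorder caused by mutations in the CFTR gene.",
                  synonyms=("CF",))
vocab.add_concept("hemophilia B",
                  "Inherited bleeding disorder caused by factor IX deficiency.")
vocab.add_concept("CFTR",
                  "Gene encoding the cystic fibrosis transmembrane conductance regulator.")
vocab.add_concept("Factor IX",
                  "Gene encoding coagulation factor IX.",
                  synonyms=("F9", "FIX"))

vocab.classify("CFTR", "ion channel")
vocab.classify("F9", "coagulation factor")
vocab.link("CFTR", "CF")
vocab.link("FIX", "hemophilia B")

print(vocab.elements_for("CF"))  # ['CFTR']
```

Keeping the disease links separate from the classification means an element can be associated with additional disease targets later without being reclassified, which suits a vocabulary that is expected to change.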
The focus group comments are still being analyzed. Ideally, the needs identified by the focus groups will coincide with the beta version of GeMCRIS. However, the database is being structured to handle changes and additions. This capability is critical because the field is growing and changing frequently. GeMCRIS and the gene transfer product vocabularies are evolving. Just as the field is in an early developmental stage, so are the vocabularies that describe it. Maintaining the database structure and its vocabularies is therefore an incremental, iterative process. The usefulness of the database and its vocabulary depends upon input from all potential users.
1067-5027/01/$5.00 © 2001 AMIA, Inc.