Abstract 5352: Optimization of Library and Enrichment Procedures for RNASeq Using RNA from Formalin Fixed Paraffin Embedded Tissue

Ji Wen, Ying Shao, Ruth Tatevossian, Yongjin Li, David W. Ellison, Gang Wu, Jinghui Zhang, John B. Easton
DOI: https://doi.org/10.1158/1538-7445.am2017-5352
IF: 11.2
2017-01-01
Cancer Research
Abstract: Large repositories of diagnostic formalin-fixed, paraffin-embedded (FFPE) tissue remain underutilized for transcriptome sequence analysis because the fixation process degrades the RNA, which makes the samples unsuitable for traditional mRNA sequencing (mRNASeq). Using random primers to generate cDNA and biotinylated exon baits to enrich for coding regions allows RNASeq data to be generated, but the data are still of variable quality, in part due to differences in FFPE sample processing. We performed a systematic analysis to determine which aspects of library construction and the enrichment process can be optimized to provide the best RNASeq data from FFPE samples. The TruSeq RNA Access Library Prep Kit (Illumina) protocol was selected to evaluate optimization procedures for RNASeq from FFPE material. RNA was isolated from FFPE tissue using the Maxwell automated system (Promega). For a comparison of exon coverage metrics, RNA libraries were also prepared from RNA extracted from frozen (FF) tissue of the matching samples using the TruSeq V2 mRNASeq Library Prep Kit (Illumina). Samples derived from different types of pediatric cancer were selected for evaluation, including 2 low-grade gliomas, 2 ependymomas, 9 melanomas and 1 Ewing sarcoma. All of the FF tumor samples selected contained oncogenic gene fusions previously identified by analysis of mRNASeq data. The parameters investigated were the initial amount of RNA input (10 ng to 400 ng), the number of library amplification PCR cycles (9 to 15 cycles), and the duration of exon probe hybridization (1.5 hours vs 16 hours). Data quality was assessed by overall library complexity and exon coverage.
Library complexity was measured using the duplication rate (DupRate), defined as the rate at which library inserts from different sequence reads map to the same location in the reference genome, while coverage was measured using the percentage of coding bases with coverage >=20x (PCB20x). Using the RNA Access protocol, we found that the PCB20x metrics were comparable between the FFPE samples and matched FF mRNASeq controls under the non-optimized conditions of 20 ng of RNA input, 1.5-hour hybridization, and 15 cycles of PCR for library amplification. However, optimizing the protocol by increasing the RNA input to 400 ng, reducing the PCR to 9 to 11 cycles, and extending the hybridization time to 16 hours increased library diversity, reduced the DupRate from greater than 80% to less than 44%, and increased coding region coverage (PCB20x) by at least 10%. All expected gene fusions identified in the RNASeq data from the FF samples were also detected in the FFPE samples. We have shown that increasing RNA input, reducing PCR cycles and extending the length of hybridization can significantly improve data quality and maximize library diversity, allowing for greater utilization of archival FFPE material for RNASeq.
Citation Format: Ji Wen, Ying Shao, Ruth Tatevossian, Yongjin Li, David W. Ellison, Gang Wu, Jinghui Zhang, John B. Easton. Optimization of library and enrichment procedures for RNASeq using RNA from formalin fixed paraffin embedded tissue [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 5352. doi:10.1158/1538-7445.AM2017-5352
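As an illustration of the two quality metrics named in the abstract, the following is a minimal Python sketch, not the authors' pipeline. It assumes that a read counts as a duplicate when its (chromosome, start, strand) key has already been seen, and that per-base depths over coding regions are already available as a list of integers. The function names `duplication_rate` and `pcb_at_depth` and the toy inputs are hypothetical; production tools apply more detailed duplicate-marking rules.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical read key: (chromosome, alignment start, strand).
ReadKey = Tuple[str, int, str]


def duplication_rate(read_positions: Iterable[ReadKey]) -> float:
    """Fraction of reads whose mapping position was already seen.

    Assumption: two reads are duplicates if they share the same
    (chromosome, start, strand) key.
    """
    seen = set()
    total = 0
    duplicates = 0
    for key in read_positions:
        total += 1
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / total if total else 0.0


def pcb_at_depth(coding_depths: Sequence[int], min_depth: int = 20) -> float:
    """Percentage of coding bases with depth >= min_depth (PCB20x by default)."""
    if not coding_depths:
        return 0.0
    covered = sum(1 for depth in coding_depths if depth >= min_depth)
    return 100.0 * covered / len(coding_depths)


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the study.
    reads = [("chr1", 100, "+"), ("chr1", 100, "+"),
             ("chr1", 200, "+"), ("chr2", 100, "-")]
    depths = [0, 19, 20, 35]
    print(f"DupRate: {duplication_rate(reads):.2f}")
    print(f"PCB20x: {pcb_at_depth(depths):.1f}%")
```

Under these assumptions, a lower `duplication_rate` indicates a more diverse library, and a higher `pcb_at_depth` indicates that more of the coding sequence is covered at 20x or above.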