ecDNA Machine Learning Modeling

Shixiang Wang, Qi Zhao
DOI: https://doi.org/10.5281/zenodo.10212116
2023-01-01
Abstract:

1. ecDNA_cargo_gene_modeling_data.csv.gz
   The dataset contains features from 386 TCGA tumors for modeling ecDNA cargo gene prediction. It was converted from R data format with the code below. NOTE: the columns 'sample' and 'gene_id' are not used for actual modeling, only for identification and sampling purposes.

       library(data.table)
       data = readRDS("~/../Downloads/ecDNA_cargo_gene_modeling_data.rds")
       colnames(data)[3] = "total_cn"
       data.table::fwrite(data, file = "~/../Downloads/ecDNA_cargo_gene_modeling_data.csv.gz", sep = ",")

2. gcap_pcawg_WGS_result.tar.gz
   GCAP analysis results for PCAWG allele-specific copy number profiles derived from WGS.

3. gcap_tcga_snp6_result.tar.gz
   GCAP analysis results for TCGA allele-specific copy number profiles derived from SNP6 array.

4. gcap_Changkang_WES_result.tar.gz
   GCAP analysis results for SYSUCC Changkang allele-specific copy number profiles derived from tumor-normal paired WES.

5. tcga_overlap_gene_wgs.rds, tcga_overlap_gene_snp.rds and tcga_overlap_gene_wes.rds
   These datasets contain TCGA gene-level copy number results in R data format for the samples that overlap across the datasets above. WGS is from PCAWG; SNP array and WES are from the GDC portal.

6. cellline-batch1.zip & cellline-batch1.zip
   GCAP results of cell line batch 1 and batch 2.

7. AA_cellline_wgs.zip
   AA software results for cell line batch 1.

8. Batch2_AA_summary.xlsx
   AA software results for cell line batch 2.

9. FISH-for-supp-file.zip
   Extended raw FISH images from 12 CRC samples.

10. SNU216.zip
    Extended AA and GCAP analysis on SNU216.

11. aa_ffpe.zip and AA_summary_table_of_6_erbb2_ffpe_samples.xlsx
    Extended AA running files (all results) and result summary data for 6 clinical samples with GCAP-predicted ERBB2 amplification.
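The note under item 1 says that 'sample' and 'gene_id' serve only for identification and sampling, so a consumer of the CSV export would normally set them aside before modeling. Below is a minimal sketch of that step in Python, using only the standard library. The function name `split_ids_and_features` is hypothetical and not part of the record, and the sketch assumes only that the file is a gzipped, comma-separated table with a header row containing those two columns. Values are kept as strings; any type conversion is left to the caller.

```python
import csv
import gzip

# Identifier columns named in the record; every other column is passed through.
ID_COLUMNS = ("sample", "gene_id")


def split_ids_and_features(path):
    """Read a gzipped CSV and separate identifier columns from the rest.

    Hypothetical helper for illustration. Returns two parallel lists of
    dicts: one with the identifier fields, one with the remaining columns.
    All values are left as the strings read from the file.
    """
    ids, features = [], []
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            ids.append({name: row[name] for name in ID_COLUMNS})
            features.append(
                {key: value for key, value in row.items() if key not in ID_COLUMNS}
            )
    return ids, features
```

Assuming the file has been downloaded to the working directory, a call such as `split_ids_and_features("ecDNA_cargo_gene_modeling_data.csv.gz")` would return the identifiers and the remaining columns as two parallel lists, which keeps the sample labels available for grouped sampling without feeding them to a model.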