Abstract:

Purpose: Breast cancer is the leading cause of cancer death worldwide in women. The molecular mechanisms underlying human breast cancer remain unclear. Gene microarrays have been widely used in breast cancer research to identify clinically relevant molecular subtypes and to predict prognosis and survival. So far, it is unclear which multigene signatures are valuable in clinical practice, and the biological importance of individual genes is difficult to assess because the described signatures show virtually no overlap. Early prognosis of breast invasive ductal carcinoma (IDC) and breast ductal carcinoma in situ (DCIS) is vital in breast surgery.

Methods: This study reports gene expression profiling in large breast cancer cohorts from the Gene Expression Omnibus, including the GSE29044 (N=138) and GSE10780 (N=185) test series and four independent validation series: GSE21653 (N=266), GSE20685 (N=327), GSE26971 (N=276), and GSE12776 (N=204). Genes significantly differentially expressed in human breast IDC and breast DCIS were detected by transcriptome microarray analysis.

Results: We derived a set of three genes (MAMDC2, TSHZ2, and CLDN11) that were significantly correlated with disease-free survival of breast cancer patients using a univariate Cox regression model (significance level P<0.01) in a meta-analysis. Based on the risk score of the three genes, the test series patients could be separated into low-risk and high-risk groups with significantly different survival times. This signature was validated in the other three cohorts. The prognostic value of this three-gene signature was confirmed in the internal validation series and another four independent breast cancer data sets. The prognostic impact of one of the three genes, CLDN11, was confirmed by immunohistochemistry. CLDN11 was significantly overexpressed in human breast IDC as compared with normal breast tissues and breast DCIS.

Conclusion: Using novel gene expression profiling together with a meta-analysis validation approach, we have identified a three-gene signature with independent prognostic impact. Furthermore, CLDN11 may offer a biomarker to predict prognosis as well as a new target for prognostic and therapeutic intervention for human breast IDC.
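
The risk-score grouping described in the Results can be illustrated with a minimal sketch. The abstract does not give the score formula, the Cox coefficients, or the cutoff, so everything below is an assumption made for illustration: the score is taken as a coefficient-weighted sum of the three genes' expression values, the coefficients and expression values are placeholders, and patients are split at the median score.

```python
from statistics import median

# Genes named in the abstract.
GENES = ("MAMDC2", "TSHZ2", "CLDN11")

# Hypothetical Cox regression coefficients (placeholders, NOT from the study).
EXAMPLE_COEFFICIENTS = {"MAMDC2": -0.4, "TSHZ2": -0.3, "CLDN11": 0.5}


def risk_score(expression, coefficients):
    """Assumed linear risk score: sum of coefficient * expression per gene."""
    return sum(coefficients[gene] * expression[gene] for gene in GENES)


def split_by_median(scores):
    """Label each score 'high' if above the median, else 'low' (assumed cutoff)."""
    cutoff = median(scores)
    labels = ["high" if score > cutoff else "low" for score in scores]
    return labels, cutoff


# Made-up expression values for four hypothetical patients.
patients = [
    {"MAMDC2": 2.0, "TSHZ2": 1.0, "CLDN11": 0.5},
    {"MAMDC2": 0.5, "TSHZ2": 0.5, "CLDN11": 2.0},
    {"MAMDC2": 1.0, "TSHZ2": 2.0, "CLDN11": 1.0},
    {"MAMDC2": 0.2, "TSHZ2": 0.3, "CLDN11": 3.0},
]

scores = [risk_score(p, EXAMPLE_COEFFICIENTS) for p in patients]
groups, cutoff = split_by_median(scores)

for score, group in zip(scores, groups):
    print(f"score={score:.2f} group={group}")
```

In a real analysis the coefficients would come from the fitted Cox models, and the two groups would then be compared with a survival test such as the log-rank test. Neither step is shown here.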


Biomarker Discovery To Improve Prediction Of Breast Cancer Survival: Using Gene Expression Profiling, Meta-Analysis, And Tissue Validation
