Molecular subtyping for clinically defined breast cancer subgroups

Introduction: Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier.

Methods: We propose a subgroup-specific gene-centering method to perform molecular subtyping on a study cohort that has a skewed distribution of clinicopathological characteristics relative to the training cohort. On such a study cohort, we center each gene on a specified percentile, where the percentile is determined from a subgroup of the training cohort with clinicopathological characteristics similar to the study cohort. We demonstrate our method using the PAM50 classifier and its associated University of North Carolina (UNC) training cohort. We considered study cohorts with skewed clinicopathological characteristics, including subgroups composed of a single prototypic subtype of the UNC-PAM50 training cohort (n = 139), an external estrogen receptor (ER)-positive cohort (n = 48) and an external triple-negative cohort (n = 77).

Results: When applied to the prototypic tumor subsets of the PAM50 training cohort, subgroup-specific gene centering improved prediction performance, yielding accuracies between 77% and 100%, compared with accuracies between 17% and 33% from standard gene centering. It reduced classification error rates on the ER-positive (11% versus 28%; P = 0.0389), the ER-negative (5% versus 41%; P < 0.0001) and the triple-negative (11% versus 56%; P = 0.1336) subgroups of the PAM50 training cohort. In addition, it produced higher accuracy for subtyping study cohorts composed of varying proportions of ER-positive versus ER-negative cases. Finally, it increased the percentage of luminal subtype assignments on the external ER-positive cohort and of basal-like subtype assignments on the external triple-negative cohort.

Conclusions: Gene centering is often necessary to accurately apply a molecular subtype classifier. Compared with standard gene centering, our proposed subgroup-specific gene centering produced more accurate molecular subtype assignments in a study cohort with skewed clinicopathological characteristics relative to the training cohort.
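The centering step described in the Methods can be illustrated with a minimal sketch. It rests on one assumed reading of the abstract: for each gene, the percentile is taken to be the position of the full training-cohort median within the matching training subgroup, and the study cohort is then centered on that same percentile of its own expression values. The function names `subgroup_percentiles` and `center_on_percentiles`, and the genes-by-samples matrix layout, are hypothetical choices for illustration and are not taken from the paper or from any PAM50 software.

```python
import numpy as np


def subgroup_percentiles(train_expr, subgroup_mask):
    """Per-gene percentile of the training subgroup at which the
    full training-cohort median falls (assumed reading of the method).

    train_expr    : array of shape (n_genes, n_train_samples)
    subgroup_mask : boolean array of shape (n_train_samples,), True for
                    training samples resembling the study cohort
    """
    # Standard centering value: the median of each gene over the whole cohort.
    centers = np.median(train_expr, axis=1, keepdims=True)
    subgroup = train_expr[:, subgroup_mask]
    # Fraction of subgroup samples at or below that median, as a percentile.
    return 100.0 * np.mean(subgroup <= centers, axis=1)


def center_on_percentiles(study_expr, percentiles):
    """Subtract from each gene its value at the given percentile.

    study_expr  : array of shape (n_genes, n_study_samples)
    percentiles : array of shape (n_genes,), values in [0, 100]
    """
    offsets = np.array(
        [np.percentile(row, q) for row, q in zip(study_expr, percentiles)]
    )
    return study_expr - offsets[:, None]
```

Under this reading, a study cohort drawn entirely from, say, ER-positive cases would pass the ER-positive mask of the training cohort to `subgroup_percentiles` and then apply `center_on_percentiles` to its own expression matrix before running the classifier. A percentile of 50 for every gene reduces to standard median centering.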
