Penalized regression for left‐truncated and right‐censored survival data

High‐dimensional data are becoming increasingly common in the medical field as large volumes of patient information are collected and processed by high‐throughput screening, electronic health records, and comprehensive genomic testing. Statistical models that study the effects of many predictors on survival typically use feature selection or penalized methods to mitigate overfitting. In some cases survival data are also left‐truncated, which can give rise to immortal time bias, but penalized survival methods that adjust for left truncation are not commonly implemented. To address these challenges, we apply a penalized Cox proportional hazards model for left‐truncated and right‐censored survival data and assess the implications of left truncation adjustment for bias and interpretation. We use simulation studies and a high‐dimensional, real‐world clinico‐genomic database to highlight the pitfalls of failing to account for left truncation in survival modeling.
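To make the adjustment concrete, the following is a minimal sketch, not the authors' implementation, of a penalized Cox objective in which the risk set at each event time contains only subjects who have already entered the study and are still under observation. The function names, the elastic-net form of the penalty, the 1/n scaling, the Breslow-style handling of ties, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, entry, time, event):
    """Scaled negative log partial likelihood with delayed entry.

    Subject j is at risk at event time t only if entry[j] < t <= time[j],
    so subjects who have not yet entered the study are excluded.
    Assumes entry[i] < time[i] for every subject.
    """
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = (entry < time[i]) & (time[i] <= time)
        r = eta[at_risk]
        m = r.max()  # log-sum-exp shift for numerical stability
        total += eta[i] - (m + np.log(np.exp(r - m).sum()))
    return -total / len(time)

def penalized_objective(beta, X, entry, time, event, lam, alpha=1.0):
    """Add an elastic-net penalty (alpha=1 is the lasso, alpha=0 is ridge)."""
    penalty = lam * (alpha * np.abs(beta).sum()
                     + 0.5 * (1.0 - alpha) * (beta ** 2).sum())
    return neg_log_partial_likelihood(beta, X, entry, time, event) + penalty

# Hypothetical toy data: four subjects, one covariate.
X = np.array([[0.5], [-1.0], [0.2], [1.5]])
entry = np.array([0.0, 4.0, 0.0, 1.0])   # left-truncation (study entry) times
time = np.array([3.0, 5.0, 4.0, 6.0])    # event or censoring times
event = np.array([1, 1, 0, 1])           # 1 = event observed, 0 = censored

beta = np.array([0.3])
adjusted = penalized_objective(beta, X, entry, time, event, lam=0.1)
# Setting every entry time to zero ignores left truncation.
naive = penalized_objective(beta, X, np.zeros(4), time, event, lam=0.1)
print(adjusted, naive)
```

In this toy example the second subject enters at time 4, so the adjusted objective excludes that subject from the risk set of the event at time 3, while the naive version counts the subject as at risk before entry. That spurious at-risk time is the source of the immortal time bias described above.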
