Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression.
Howard KA., Anderson W., Podichetty JT., Gould R., Boyce D., Dasher P., Evans L., Kao C., Kumar VK., Hamilton C., Mathé E., Guerin PJ., Dodd K., Mehta AK., Ortman C., Patil N., Rhodes J., Robinson M., Stone H., Heavner SF.
Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, research is hindered by challenges such as poor data validity, limited collaboration, and the nature of difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging). It is crucial to prioritize innovative methods that allow data generated during routine clinical care to keep being used for research in an organized, accelerated, and shared manner. This study investigates the potential of CURE ID, an open-source platform designed to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients. Candidate predictors included demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity of illness indicators, oxygen support administration, and comorbidities significantly influenced 28-day all-cause mortality, aligning with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research, and the importance of factor selection in identifying key variables, which helps to streamline future research and drug-repurposing efforts.
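As a rough illustration of how LASSO performs factor selection, the minimal sketch below fits an L1-penalized logistic regression by proximal gradient descent on a tiny synthetic dataset. It is not the study's analysis pipeline: the logistic loss, the solver, the penalty strength `lam`, the step size, the function names, and the toy data are all assumptions made for illustration, and no CURE ID data are used.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; values within [-threshold, threshold] become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=5000):
    """Proximal gradient descent for logistic loss plus lam * sum(|coef|).

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residual = predicted probability minus observed outcome.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - target
            for row, target in zip(X, y)
        ]
        grad_intercept = sum(residuals) / n
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        intercept -= step * grad_intercept
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        coefs = [soft_threshold(c - step * g, step * lam) for c, g in zip(coefs, grad)]
    return intercept, coefs


# Hypothetical toy data: column 0 ("signal") determines the outcome,
# column 1 ("noise") is balanced within every signal level, so it is uninformative.
X = [[s, u] for s in (-2.0, -1.0, 1.0, 2.0) for u in (-1.0, 1.0)]
y = [1.0 if row[0] > 0 else 0.0 for row in X]

intercept, coefs = fit_l1_logistic(X, y, lam=0.1)
selected = [name for name, c in zip(("signal", "noise"), coefs) if c != 0.0]
print("coefficients:", coefs)
print("selected factors:", selected)
```

In this toy setup the coefficient of the uninformative column stays exactly zero while the informative one is retained, which is the selection behavior that makes LASSO useful for narrowing a large set of candidate predictors. In practice the penalty strength would normally be chosen by cross-validation rather than fixed by hand.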