Publication:
Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

dc.contributor.authorHayet Otero, Miren
dc.contributor.authorGarcía García, Fernando
dc.contributor.authorEspaña Yandiola, Pedro Pablo
dc.contributor.authorUrrutia Landa, Isabel
dc.contributor.authorErmecheo, Mónica Nieves
dc.contributor.authorQuintana, José María
dc.contributor.authorMenéndez, Rosario
dc.contributor.authorTorres, Antoni
dc.contributor.authorZalacain Jorge, Rafael
dc.contributor.authorArostegui, Inmaculada
dc.contributor.authorMartínez Minaya, Joaquín
dc.contributor.authorLee, Dae Jin
dc.contributor.rorhttps://ror.org/02jjdwm75
dc.date.accessioned2024-07-08T13:15:05Z
dc.date.available2024-07-08T13:15:05Z
dc.date.issued2023
dc.description.abstractWith the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration of which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 792, 238, and 518 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing 60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem as both an ML classification and a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (≥ 0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities across them. However, they consistently pointed out the importance of certain explanatory variables. Namely: the patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) (and, to a certain extent, also neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. Remarkable agreement was found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives. © 2023 Hayet-Otero et al.
dc.formatapplication/pdf
dc.identifier.citationHayet-Otero, M., García-García, F., Lee, D. J., Martínez-Minaya, J., España Yandiola, P. P., Urrutia Landa, I., ... & with the COVID-19 & Air Pollution Working Group. (2023). Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques. PLoS ONE, 18(4), e0284150. https://doi.org/10.1371/journal.pone.0284150
dc.identifier.doihttps://doi.org/10.1371/journal.pone.0284150
dc.identifier.issn19326203
dc.identifier.officialurlhttps://www.scopus.com/inward/record.uri?eid=2-s2.0-85152593485&doi=10.1371%2fjournal.pone.0284150&partnerID=40&md5=4f5dc7158f63b586bcb5bb9a88a21e17
dc.identifier.urihttps://hdl.handle.net/20.500.14417/3143
dc.issue.number4 April
dc.journal.titlePLoS ONE
dc.language.isoeng
dc.publisherPublic Library of Science
dc.relation.departmentSci Tech (Data Science)
dc.relation.entityIE University
dc.relation.schoolIE School of Science & Technology
dc.rightsAttribution 4.0 International
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subject.otherAlanine aminotransferase
dc.subject.otherAspartate aminotransferase
dc.subject.otherBilirubin
dc.subject.otherBrain natriuretic peptide
dc.subject.otherC reactive protein
dc.subject.otherCreatine kinase
dc.subject.otherFerritin
dc.subject.otherInterleukin 6
dc.subject.otherLactate dehydrogenase
dc.subject.otherNitrogen dioxide
dc.subject.otherOxygen
dc.subject.otherOzone
dc.subject.otherProcalcitonin
dc.subject.otherTroponin
dc.subject.otherAdults
dc.subject.otherAir pollution
dc.subject.otherBreathing rate
dc.subject.otherCohort analysis
dc.subject.otherCOVID-19
dc.subject.otherDisease severity
dc.subject.otherHorowitz index
dc.subject.otherHospitalization
dc.subject.otherHuman
dc.subject.otherMachine learning
dc.subject.otherNeutrophil lymphocyte ratio
dc.subject.otherOxygen saturation
dc.subject.otherParticulate matter 10
dc.subject.otherParticulate matter 2.5
dc.subject.otherPneumonia Severity Index
dc.subject.otherQuality control
dc.subject.otherSocioeconomics
dc.subject.otherTraining
dc.subject.otherPandemic
dc.subject.otherPrognosis
dc.subject.otherSARS-CoV-2
dc.titleExtracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques
dc.typeinfo:eu-repo/semantics/article
dc.version.typeinfo:eu-repo/semantics/publishedVersion
dc.volume.number18
dspace.entity.typePublication
person.identifier.scopus-author-id58184856600
person.identifier.scopus-author-id50861077300
person.identifier.scopus-author-id58065186800
person.identifier.scopus-author-id56692895500
person.identifier.scopus-author-id59158490800
person.identifier.scopus-author-id58189246400
person.identifier.scopus-author-id58189757400
person.identifier.scopus-author-id55547259500
person.identifier.scopus-author-id7102205716
person.identifier.scopus-author-id57205521091
person.identifier.scopus-author-id35221658600
person.identifier.scopus-author-id56059009000
relation.isAuthorOfPublicationc8601ce9-af35-48fa-bdb6-9875f25e6c1f
relation.isAuthorOfPublication.latestForDiscoveryc8601ce9-af35-48fa-bdb6-9875f25e6c1f
Files
Original bundle
Name:
Extracting relevant predictive variables for COVID-19 severity prognosis An exhaustive comparison of feature selection techniques.pdf
Size:
4.11 MB
Format:
Adobe Portable Document Format