Professor & Chair, Rensselaer Polytechnic Institute, United States
Introduction: Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by cognitive decline and impaired learning and memory. Previous research has indicated that changes in the gut microbiome can lead to physiological changes commonly associated with AD, including amyloid-beta deposition. In addition, bile acids (BAs) in the gut are involved in cholesterol metabolism, which has been hypothesized to play a role in the development of AD. Recent studies have shown that changes in BA levels can be associated with the onset of mild cognitive impairment (MCI), and that BA measurements can be combined with other variables to improve machine learning prediction of AD progression. However, a classifier that uses only BA measurements to determine whether a patient has MCI has not yet been developed. Moreover, no robustness analysis has been performed on current models to determine whether they still perform well in the presence of noise. The goal of this study is to develop a classifier based on BA measurements that can accurately and robustly discriminate between cognitively normal (CN) patients and patients with early mild cognitive impairment (EMCI). Such a test could aid in the timely and reliable diagnosis of EMCI.
Materials and Methods: Data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) included the concentrations of 104 bile acid metabolites measured in 4,228 patients, 767 of whom had a prior diagnosis of CN, EMCI, late MCI, or AD. Of these, 193 were CN and 108 were EMCI. Feature selection was performed to find the optimal subset of BA features for correctly classifying CN and EMCI samples. To build this BA panel, Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) was used to select the number of features that achieved the highest accuracy without overfitting. Using the selected BA features, five classifiers trained with different machine learning algorithms were evaluated by leave-one-out cross-validation (LOOCV) to develop a diagnostic classifier for EMCI. The algorithms were Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Random Forest (RF), and Linear Discriminant Analysis (LDA). For each classifier, robustness was evaluated by perturbing progressively larger portions of the data with randomly sampled noise.
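The SVM-RFE step can be sketched as follows. This is a minimal, standard-library-only illustration, not the study's implementation: it assumes a soft-margin linear SVM fit by full-batch subgradient descent and removes the single feature with the smallest squared weight per round, which is the ranking criterion of the original SVM-RFE formulation. The regularization strength, step size, iteration count, the function names `train_linear_svm` and `svm_rfe`, and the toy data are all hypothetical. Because the abstract does not describe how the stopping point was chosen beyond balancing accuracy against overfitting, the number of features to keep is simply passed in as `n_keep`.

```python
import random


def train_linear_svm(X, y, lam=1.0, lr=0.1, iters=300):
    """Soft-margin linear SVM fit by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be +1 / -1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Retrain on the surviving features and drop the one with the smallest
    squared weight, until n_keep features remain. Returns column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(weakest)
    return remaining


# Hypothetical toy data: columns 0 and 1 carry class signal, columns 2-5 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(200)]
X = [[yi * 1.0 + rng.gauss(0, 1), yi * 0.8 + rng.gauss(0, 1)]
     + [rng.gauss(0, 1) for _ in range(4)] for yi in y]
selected = svm_rfe(X, y, n_keep=2)
print("selected feature columns:", selected)
```

Since the squared-weight criterion depends on feature scale, BA concentrations would normally be standardized before elimination, and `n_keep` would be chosen by comparing cross-validated accuracy across candidate panel sizes.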
Results, Conclusions, and Discussion: SVM-RFE identified 16 metabolites from the BA panel that maximized classifier performance without overfitting: L-serine along with L-aspartic, glyceric, glycolic, acetic, succinic, heptanoic, undecanoic, tridecanoic, pentadecanoic, myristic, 3-oxycholic, palmitoleic, hyocholic, glycocholic, and murocholic acids. As shown in Figure 1, the LR classifier yielded the highest LOOCV accuracy (0.927) and recall (0.889) and the second-highest precision (0.906) among the classifiers tested, indicating that it can accurately discriminate EMCI from CN patient data without being biased toward one class. The LR classifier was also the most stable when the data were noisy: it maintained high performance and showed the lowest variation in performance when the data were perturbed by small amounts of random noise, as shown in Figures 2 and 3. This indicates that, in addition to being accurate, the LR diagnostic classifier is robust. Further research should determine whether these metabolites play a mechanistic role in the development of EMCI or are merely correlated with it. Overall, this work developed a robust and accurate classifier for distinguishing EMCI from CN patients using only bile acid measurements.
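The LOOCV evaluation and noise-perturbation analysis behind these results can be sketched, for the logistic-regression case only, as below. This is a minimal standard-library illustration under stated assumptions: logistic regression is fit without regularization by batch gradient descent, label 1 (assumed here to be EMCI) is the positive class for precision and recall, and "perturbing progressively larger portions of the data" is modeled as adding Gaussian noise to a random fraction of samples before rerunning LOOCV. The abstract does not specify the noise distribution, its scale, or whether training data, test data, or both were perturbed; the function names, parameters, and toy data are hypothetical and do not reproduce the reported numbers.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, iters=100):
    """Unregularized logistic regression by batch gradient descent. y is 0/1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(a * c for a, c in zip(w, x)) + b > 0 else 0


def loocv_metrics(X, y):
    """Leave-one-out CV: fit on n-1 samples, predict the held-out sample.
    Label 1 (assumed here to be EMCI) is the positive class."""
    tp = fp = fn = correct = 0
    for i in range(len(X)):
        model = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = predict(model, X[i])
        correct += pred == y[i]
        tp += pred == 1 and y[i] == 1
        fp += pred == 1 and y[i] == 0
        fn += pred == 0 and y[i] == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(X), precision, recall


def robustness_curve(X, y, fractions, noise_sd=1.0, repeats=2, seed=0):
    """For each fraction f, add N(0, noise_sd^2) noise to a random f-share of
    the samples, rerun LOOCV, and record mean and std of the accuracy."""
    rng = random.Random(seed)
    curve = {}
    for f in fractions:
        accs = []
        for _ in range(repeats):
            noisy = [row[:] for row in X]
            for i in rng.sample(range(len(X)), int(f * len(X))):
                noisy[i] = [v + rng.gauss(0, noise_sd) for v in noisy[i]]
            accs.append(loocv_metrics(noisy, y)[0])
        curve[f] = (statistics.mean(accs), statistics.pstdev(accs))
    return curve


# Hypothetical toy data: 40 samples, labels 1 vs 0, one informative column.
rng = random.Random(7)
y = [i % 2 for i in range(40)]
X = [[(2 * yi - 1) * 1.5 + rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
acc, prec, rec = loocv_metrics(X, y)
curve = robustness_curve(X, y, fractions=[0.0, 0.25, 0.5])
print(f"LOOCV accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
for f, (mean_acc, sd_acc) in curve.items():
    print(f"noise on {f:.0%} of samples: accuracy {mean_acc:.3f} +/- {sd_acc:.3f}")
```

In this framing, a flatter mean-accuracy curve with a smaller spread across noise fractions corresponds to the kind of stability reported for the LR classifier; the other four algorithms could be compared by swapping in their own fit and predict functions.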
Acknowledgements: Data collection and sharing for the Alzheimer's Disease Neuroimaging Initiative (ADNI) is funded by the National Institute on Aging (National Institutes of Health Grant U19 AG024904). The grantee organization is the Northern California Institute for Research and Education.
The authors would also like to acknowledge the National Institute on Aging (T32AG078123) for funding this research.