AI Detects Cognitive Impairment by Listening to Doctor Visits

Machine learning models trained on acoustic speech features from routine primary care conversations identified cognitive impairment with meaningful accuracy.

Tuesday, June 16, 2026
Published in JAMA Neurol

Summary

Researchers at Mount Sinai recorded routine primary care visits for nearly 1,000 older adults and used machine learning to analyze acoustic features of patient speech, such as pitch, timing, and vocal variability. Without any dedicated cognitive test, the best model flagged about 68% of the patients who had cognitive impairment (its sensitivity) and correctly cleared about 64% of those who did not (its specificity). Models built on Whisper, a foundation model for speech, performed best and held up in an independent validation group in Chicago. This passive, low-burden approach could one day flag patients who need further evaluation without adding time to already-packed clinic visits. About one in five participants had undetected cognitive impairment, highlighting how common underdiagnosis is in primary care today.

Detailed Summary

Cognitive impairment affects millions of older adults, yet it frequently goes undetected in primary care settings where time is short and standardized cognitive tests are rarely administered. A new diagnostic study published in JAMA Neurology suggests that the conversations already happening in exam rooms may contain the signal needed to catch early impairment — if an AI is listening.

Researchers at the Icahn School of Medicine at Mount Sinai recorded routine primary care visits from 787 English-speaking patients aged 55 and older in New York. A separate validation cohort of 179 patients was enrolled in Chicago. None had a prior diagnosis of mild cognitive impairment or dementia. Acoustic features were extracted from 30-second speech segments in two ways: with foundation AI models (Whisper, HuBERT, and wav2vec 2.0) and with traditional expert-defined measures such as prosody and the eGeMAPS feature set. Cognitive impairment was defined as a Montreal Cognitive Assessment score at least 1 SD below age- and education-adjusted norms.
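
As a rough illustration of the segmentation step, the sketch below cuts a recording into consecutive 30-second windows. The summary does not say how the study's segments were actually produced, so everything here is an assumption: the function name segment_bounds is hypothetical, the 16 kHz sample rate is chosen because Whisper, HuBERT, and wav2vec 2.0 all take 16 kHz audio, and dropping the trailing partial window is just one possible policy.

```python
def segment_bounds(n_samples, sample_rate=16_000, segment_seconds=30.0):
    """Return (start, end) sample indices for consecutive full-length segments.

    Assumptions (not from the study): non-overlapping windows, and any
    trailing audio shorter than one segment is discarded.
    """
    if n_samples < 0 or sample_rate <= 0 or segment_seconds <= 0:
        raise ValueError("n_samples must be >= 0; rate and length must be > 0")
    seg_len = int(round(sample_rate * segment_seconds))
    n_full = n_samples // seg_len
    return [(i * seg_len, (i + 1) * seg_len) for i in range(n_full)]


# Hypothetical 95-second recording at 16 kHz: three full 30-second segments,
# with the last 5 seconds dropped.
bounds = segment_bounds(16_000 * 95)
for start, end in bounds:
    print(f"samples {start}..{end}")
```

Each window would then be passed to a feature extractor, either a foundation model's encoder or a hand-crafted feature set, before classification.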

The Whisper-based model delivered the strongest performance, achieving an AUROC of 0.733 in the primary cohort and 0.727 in external validation (0.5 is chance, 1.0 is perfect discrimination). The near-identical values suggest the approach is reproducible across sites. Sensitivity was 68.2% and specificity 63.6%, with a positive predictive value of 30.4%. Key acoustic predictors included pitch, timing, and vocal variability features. Approximately 21% of participants had cognitive impairment at enrollment, underscoring the scale of underdetection.
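
To see how sensitivity, specificity, and prevalence combine into a positive predictive value, the sketch below applies Bayes' rule to the figures quoted above. The function name predictive_values is hypothetical, and using the roughly 21% enrollment prevalence as the input is an assumption. With those inputs the formula gives a PPV of about 33%, slightly above the reported 30.4%. The summary does not say which evaluation subset the reported metrics come from, so the small gap may simply reflect a different prevalence in that subset.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary screening test via Bayes' rule.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted in the summary; the 0.21 prevalence
# is an assumption taken from the enrollment figure.
ppv, npv = predictive_values(0.682, 0.636, 0.21)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The negative predictive value is not reported in the summary; the value printed here is only what the same formula implies under the stated assumption.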

The clinical implication is significant: this technology could operate passively during existing appointments, requiring no additional clinician time or patient burden, and generate a flag prompting further evaluation for those at risk.

Caveats exist. The positive predictive value is modest at 30.4%, meaning roughly seven in ten flagged patients would not have true impairment. The study was conducted in English-speaking patients at urban academic medical centers, limiting generalizability. The model would need further refinement before it could be deployed as a standalone screening tool.

Key Findings

  • AI analyzing speech acoustics from routine clinic visits detected cognitive impairment with 68.2% sensitivity and 63.6% specificity.
  • Whisper-based models achieved AUROC of 0.733, validated in an independent Chicago cohort at 0.727.
  • 21% of enrolled older adults without a prior diagnosis had undetected cognitive impairment.
  • Pitch, timing, and vocal variability were the strongest acoustic predictors of impairment.
  • Screening required no dedicated test — only passive recording of existing patient-clinician dialogue.

Methodology

This diagnostic study enrolled 966 older adults (≥55 years) without prior cognitive diagnoses across primary care practices in New York and Chicago between 2020 and 2021. Audio recordings were analyzed using multiple AI speech models; cognitive impairment was defined as Montreal Cognitive Assessment scores ≥1 SD below age- and education-adjusted norms. ML classifiers were evaluated by AUROC and F1 score in both holdout and external validation cohorts.
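
The labeling rule and the two evaluation metrics can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names, the norm mean and SD (25.0 and 3.0), the MoCA scores, the model scores, and the 0.5 decision threshold are all made-up values chosen only to exercise the formulas.

```python
def is_impaired(moca_score, norm_mean, norm_sd):
    """Label a score impaired if it is at least 1 SD below the matched norm."""
    z = (moca_score - norm_mean) / norm_sd
    return z <= -1.0


def auroc(labels, scores):
    """AUROC as the chance that a random impaired case gets a higher score
    than a random unimpaired case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the impaired class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: six patients, one shared norm (mean 25, SD 3).
moca = [20, 27, 21, 26, 24, 22]
labels = [is_impaired(s, norm_mean=25.0, norm_sd=3.0) for s in moca]
model_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7]   # made-up classifier outputs
predictions = [s >= 0.5 for s in model_scores]  # assumed threshold

auc = auroc(labels, model_scores)
f1 = f1_score(labels, predictions)
print(f"AUROC = {auc:.3f}, F1 = {f1:.3f}")
```

In practice each patient would be compared against the norm for their own age and education band rather than a single shared mean and SD.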

Study Limitations

This summary is based on the abstract only, as the full text was not available. Positive predictive value is modest (30.4%), meaning most positive flags would be false positives, which remains a barrier to standalone clinical use. The study was limited to English-speaking patients at urban academic medical centers, which may limit generalizability to diverse or rural populations.
