Impact of AI on Diagnostic Errors in Clinical Practice
Key Finding
Randomized and quasi‑experimental studies integrating AI decision support into imaging, dermatology, and selected primary care workflows report relative reductions in specific diagnostic errors on the order of 10–25%, mainly by increasing sensitivity, often at the cost of more false positives. Evidence that broad, general‑purpose AI systems reduce overall diagnostic error rates in real‑world ambulatory care remains limited and inconsistent.
Executive Summary
AI tools have shown the clearest benefits in reducing diagnostic errors for circumscribed tasks, such as identifying malignant skin lesions, diabetic retinopathy, and certain radiographic abnormalities, where randomized or controlled reader‑study designs demonstrate improved sensitivity and reduced missed findings compared with unaided clinicians. In these settings, AI can reduce condition‑specific false‑negative rates by 10–25% in relative terms while maintaining acceptable specificity. In broader primary care and emergency medicine contexts, evidence that AI meaningfully reduces overall diagnostic error is more limited and often confounded by workflow changes and alert fatigue.
From a systems perspective, AI may reduce certain cognitive errors (for example, failure to consider alternative diagnoses) by prompting clinicians to revisit their differentials, but may simultaneously introduce new error modes, such as over‑reliance on algorithm outputs and misinterpretation of poorly calibrated risk scores. Equity concerns are prominent: models trained on narrow or biased datasets may perform worse in under‑represented groups, potentially increasing diagnostic disparities even as average error rates decline.
Detailed Research
Methodology
The literature includes randomized controlled trials, controlled reader studies, before–after implementation analyses, and qualitative evaluations of clinician experiences. Many imaging and dermatology studies have clinicians read the same cases with and without AI assistance, comparing sensitivity, specificity, and area under the ROC curve. Primary care and emergency department (ED) studies, by contrast, tend to evaluate decision‑support tools embedded in electronic health records (EHRs), with outcomes such as missed diagnoses, time to diagnosis, and adherence to diagnostic pathways.
Mixed‑methods designs combine quantitative error metrics with interviews and surveys that explore how clinicians actually use AI recommendations in practice, revealing patterns of trust, skepticism, and adaptation over time.
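The paired aided-versus-unaided comparison at the heart of these reader studies reduces to a pair of 2×2 confusion tables. A minimal sketch, using hypothetical counts rather than figures from any particular trial, shows how the headline metrics are derived:

```python
# Illustrative sketch (hypothetical counts): the paired metrics a reader
# study typically reports for aided vs. unaided interpretation.

def reader_metrics(tp, fn, tn, fp):
    """Sensitivity and specificity from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical reads of 1,000 cases (200 with disease, 800 without).
unaided = reader_metrics(tp=160, fn=40, tn=760, fp=40)  # sens 0.80, spec 0.95
aided   = reader_metrics(tp=170, fn=30, tn=752, fp=48)  # sens 0.85, spec 0.94

# Relative false-negative reduction: (40 - 30) / 40 = 25%,
# consistent with the 10-25% range reported in the summary above.
rel_fn_reduction = (40 - 30) / 40
```

Note that the same data yield two different headline numbers: a 5‑percentage‑point absolute sensitivity gain, or a 25% relative reduction in false negatives; studies are not always explicit about which they report.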
Key Studies
AI‑assisted Imaging and Dermatology Trials
- Design: Trials comparing AI‑assisted readers to unaided readers
- Sample: Mammography, dermoscopic images, and other imaging modalities
- Findings: AI‑assisted readers have higher sensitivity for target conditions—such as breast cancer on mammography or melanoma on dermoscopic images—than unaided readers, often improving sensitivity by 5–15 percentage points while maintaining similar specificity.
- Clinical Relevance: Translates into fewer missed cancers per thousand studies
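The "fewer missed cancers per thousand studies" framing is simple arithmetic once a prevalence is fixed. The sketch below uses an assumed screening prevalence and assumed sensitivities, not values from any specific trial:

```python
# Hedged illustration: mapping a sensitivity gain to missed cancers per
# 1,000 screening studies. Prevalence and sensitivities are assumptions.

def missed_per_1000(prevalence, sensitivity):
    """Expected false negatives per 1,000 screened patients."""
    return 1000 * prevalence * (1 - sensitivity)

prev = 0.005                              # ~5 cancers per 1,000 screens (assumed)
unaided = missed_per_1000(prev, 0.80)     # 1.0 missed cancer per 1,000
aided   = missed_per_1000(prev, 0.90)     # 0.5 missed cancers per 1,000
```

Because screening prevalence is low, even large relative sensitivity gains translate into fractions of a missed cancer per thousand studies, which is why trials need very large samples to detect these differences directly.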
Primary Care Decision‑support Interventions
- Design: Trials of EHR‑embedded decision support
- Sample: Conditions like pulmonary embolism, heart failure, or sepsis
- Findings: Modest improvements in guideline‑concordant testing and reduced missed diagnoses, but effect sizes are typically small and depend heavily on clinician engagement. Alert fatigue and workflow disruption can offset potential benefits.
- Clinical Relevance: Shows importance of implementation factors
Diagnostic Safety and Cognitive Error Studies
- Design: Qualitative research on AI prompts as cognitive guardrails
- Sample: Clinicians using AI decision support
- Findings: AI prompts can remind clinicians to consider alternative diagnoses and reduce premature closure. However, instances of automation bias—accepting AI suggestions even when they contradict clinical judgment—have also been documented.
- Clinical Relevance: Highlights both benefits and risks of AI integration
Equity and Bias Analyses
- Design: Performance analyses across demographic groups
- Sample: Diverse patient populations
- Findings: Variable performance across demographic groups, with some algorithms under‑detecting disease in populations under‑represented in training data.
- Clinical Relevance: AI could reduce average error rates while increasing disparities
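The mechanism behind that last point, a pooled metric masking a subgroup gap, can be sketched with synthetic data; the groups, counts, and sensitivities below are invented purely for illustration:

```python
# Sketch (synthetic data): pooled sensitivity can look acceptable while a
# subgroup under-represented in training data is under-detected.
from collections import defaultdict

# (group, true_label, model_prediction) triples -- synthetic cases.
cases = (
    [("A", 1, 1)] * 90 + [("A", 1, 0)] * 10 +   # group A: sensitivity 0.90
    [("B", 1, 1)] * 14 + [("B", 1, 0)] * 6      # group B: sensitivity 0.70
)

by_group = defaultdict(lambda: [0, 0])          # group -> [tp, positives]
for group, label, pred in cases:
    if label == 1:
        by_group[group][1] += 1
        by_group[group][0] += pred

pooled_tp = sum(tp for tp, _ in by_group.values())
pooled_pos = sum(n for _, n in by_group.values())
pooled_sens = pooled_tp / pooled_pos            # ~0.87 overall
sens_b = by_group["B"][0] / by_group["B"][1]    # 0.70 for group B
```

Because group A dominates the pooled denominator, the overall sensitivity (~0.87) sits close to group A's performance, which is why equity analyses report per‑group metrics rather than averages.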
Clinical Implications
For osteopathic physicians, AI‑based diagnostic tools are most useful when narrowly targeted to high‑risk, well‑validated domains (for example, imaging, ECG interpretation, or specific red‑flag symptom clusters) rather than as general "diagnosis engines."
Integrating AI suggestions into a structured diagnostic pause—explicitly considering, but not automatically accepting, algorithm outputs—can help DOs leverage benefits while preserving clinical judgment.
Limitations & Research Gaps
Few studies directly measure overall diagnostic error rates at the health‑system level after AI implementation; most focus on single conditions or surrogate endpoints, such as adherence to prediction rules. Long‑term follow‑up and robust monitoring for unintended consequences are rare.
There is essentially no osteopathy‑specific research on AI and diagnostic error, including how structural exam and OMT decisions interact with AI‑driven diagnostic pathways.
Osteopathic Perspective
The osteopathic focus on rational treatment based on understanding the whole person suggests that AI should support, not supplant, holistic clinical reasoning.
DOs are uniquely positioned to notice when AI‑suggested diagnoses conflict with structural findings, psychosocial context, or the body's observed adaptive responses, and to adjust accordingly, honoring the principles of unity of body, mind, and spirit and the body's self‑regulatory capacities.
References (1)
- Topol EJ. "High‑performance medicine: the convergence of human and artificial intelligence." Nature Medicine. 2019;25:44–56. doi:10.1038/s41591-018-0300-7