Clinical Support · Mixed Methods · 2023

Impact of AI on Diagnostic Errors in Clinical Practice

Key Finding

Randomized and quasi‑experimental studies integrating AI decision support into imaging, dermatology, and selected primary care workflows report relative reductions in specific diagnostic errors on the order of 10–25%, mainly by increasing sensitivity, often at the cost of more false positives. Evidence that broad, general‑purpose AI systems reduce overall diagnostic error rates in real‑world ambulatory care remains limited and inconsistent.

7 min read · 1 source cited
Tags: primary-care, emergency-medicine, radiology

Executive Summary

AI tools have shown the clearest benefits in reducing diagnostic errors for circumscribed tasks, such as identifying malignant skin lesions, diabetic retinopathy, and certain radiographic abnormalities, where randomized or controlled reader‑study designs demonstrate improved sensitivity and reduced missed findings compared with unaided clinicians. In these settings, AI can reduce specific false‑negative rates by 10–25% while maintaining acceptable specificity. In broader primary care and emergency medicine contexts, evidence that AI meaningfully reduces overall diagnostic error is more limited and often confounded by workflow changes and alert fatigue.

From a systems perspective, AI may reduce certain cognitive errors (for example, failure to consider alternative diagnoses) by prompting clinicians to revisit their differentials, but may simultaneously introduce new error modes, such as over‑reliance on algorithm outputs and misinterpretation of poorly calibrated risk scores. Equity concerns are prominent: models trained on narrow or biased datasets may perform worse in under‑represented groups, potentially increasing diagnostic disparities even as average error rates decline.

Detailed Research

Methodology

The literature includes randomized controlled trials, controlled reader studies, before–after implementation analyses, and qualitative evaluations of clinician experiences. Many imaging and dermatology studies assign clinicians to read cases with and without AI assistance, comparing sensitivity, specificity, and area under the ROC curve, while primary care and ED studies tend to evaluate decision‑support tools embedded in EHRs with outcomes such as missed diagnoses, time to diagnosis, and adherence to diagnostic pathways.
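The reader-study comparison described above can be made concrete with a short sketch. The confusion-matrix counts below are invented for illustration (they are not drawn from any cited trial), but they show how the paired aided-versus-unaided design yields the sensitivity and specificity trade-off the studies report: a gain in sensitivity with a small loss in specificity.

```python
# Illustrative only: hypothetical confusion-matrix counts for one
# reader-study arm. The numbers are invented, not from any cited trial.

def sensitivity(tp, fn):
    """True-positive rate: share of diseased cases correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: share of disease-free cases correctly cleared."""
    return tn / (tn + fp)

# Unaided readers: 85 of 100 cancers caught, 880 of 1,000 normals cleared
unaided_sens = sensitivity(tp=85, fn=15)
unaided_spec = specificity(tn=880, fp=120)

# AI-assisted readers: 92 of 100 cancers caught, 870 of 1,000 normals cleared
aided_sens = sensitivity(tp=92, fn=8)
aided_spec = specificity(tn=870, fp=130)

print(f"sensitivity: {unaided_sens:.2f} -> {aided_sens:.2f}")
print(f"specificity: {unaided_spec:.2f} -> {aided_spec:.2f}")
```

With these assumed counts, sensitivity rises while specificity dips slightly, mirroring the pattern of "improved sensitivity, often at the cost of more false positives" noted in the Key Finding.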

Mixed‑methods designs combine quantitative error metrics with interviews and surveys that explore how clinicians actually use AI recommendations in practice, revealing patterns of trust, skepticism, and adaptation over time.

Key Studies

AI‑assisted Imaging and Dermatology Trials

  • Design: Trials comparing AI‑assisted readers to unaided readers
  • Sample: Mammography, dermoscopic images, and other imaging modalities
  • Findings: AI‑assisted readers have higher sensitivity for target conditions—such as breast cancer on mammography or melanoma on dermoscopic images—than unaided readers, often improving sensitivity by 5–15 percentage points while maintaining similar specificity.
  • Clinical Relevance: Translates into fewer missed cancers per thousand studies
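A back-of-the-envelope calculation shows how a sensitivity gain translates into missed cancers per thousand studies. Both the assumed prevalence (about 5 cancers per 1,000 screens) and the sensitivity values are illustrative, not figures from the cited literature.

```python
# Hypothetical sketch: mapping a sensitivity gain to missed cancers
# per 1,000 screening studies. Prevalence of ~5 per 1,000 is an
# assumption for illustration, not a figure from any cited study.

def missed_per_1000(sens, prevalence_per_1000=5.0):
    """Expected cancers missed per 1,000 studies at a given sensitivity."""
    return prevalence_per_1000 * (1.0 - sens)

unaided = missed_per_1000(0.85)  # ~0.75 missed per 1,000
aided = missed_per_1000(0.92)    # ~0.40 missed per 1,000
print(f"fewer missed cancers per 1,000 studies: {unaided - aided:.2f}")
```

Under these assumptions, a 7-percentage-point sensitivity gain avoids roughly 0.35 missed cancers per 1,000 studies, a small absolute number per screen that accumulates across high-volume screening programs.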

Primary Care Decision‑support Interventions

  • Design: Trials of EHR‑embedded decision support
  • Sample: Conditions like pulmonary embolism, heart failure, or sepsis
  • Findings: Modest improvements in guideline‑concordant testing and reduced missed diagnoses, but effect sizes are typically small and depend heavily on clinician engagement. Alert fatigue and workflow disruption can offset potential benefits.
  • Clinical Relevance: Shows importance of implementation factors

Diagnostic Safety and Cognitive Error Studies

  • Design: Qualitative research on AI prompts as cognitive guardrails
  • Sample: Clinicians using AI decision support
  • Findings: AI prompts can remind clinicians to consider alternative diagnoses and reduce premature closure. However, instances of automation bias—accepting AI suggestions even when they contradict clinical judgment—have also been documented.
  • Clinical Relevance: Highlights both benefits and risks of AI integration

Equity and Bias Analyses

  • Design: Performance analyses across demographic groups
  • Sample: Diverse patient populations
  • Findings: Variable performance across demographic groups, with some algorithms under‑detecting disease in populations under‑represented in training data.
  • Clinical Relevance: AI could reduce average error rates while increasing disparities

Clinical Implications

For osteopathic physicians, AI‑based diagnostic tools are most useful when narrowly targeted to high‑risk, well‑validated domains (for example, imaging, ECG interpretation, or specific red‑flag symptom clusters) rather than as general "diagnosis engines."

Integrating AI suggestions into a structured diagnostic pause—explicitly considering, but not automatically accepting, algorithm outputs—can help DOs leverage benefits while preserving clinical judgment.

Limitations & Research Gaps

Few studies directly measure overall diagnostic error rates at the health‑system level after AI implementation; most focus on single conditions or surrogate endpoints, such as adherence to prediction rules. Long‑term follow‑up and robust monitoring for unintended consequences are rare.

There is essentially no osteopathy‑specific research on AI and diagnostic error, including how structural exam and OMT decisions interact with AI‑driven diagnostic pathways.

Osteopathic Perspective

The osteopathic focus on rational treatment based on understanding the whole person suggests that AI should support, not supplant, holistic clinical reasoning.

DOs are uniquely positioned to notice when AI‑suggested diagnoses conflict with structural findings, psychosocial context, or the body's observed adaptive responses, and to adjust accordingly, honoring the principles of unity of body, mind, and spirit and the body's self‑regulatory capacities.

References (1)

  1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019;25(1):44-56. DOI: 10.1038/s41591-018-0300-7

Related Research

Accuracy of AI Systems in Generating Differential Diagnoses

Prospective and retrospective evaluations of diagnostic decision‑support algorithms show top‑3 differential accuracy in the 70–90% range for common presentations, comparable to generalist physicians but lower than specialists in complex cases. Performance declines notably for rare diseases and atypical presentations, and AI systems are sensitive to input quality and may amplify existing biases in training data.

AI‑Enhanced Drug Interaction Checking and Medication Safety

AI‑augmented clinical decision‑support systems can identify potential drug–drug interactions and contraindications with high sensitivity, with some systems detecting 10–20% more clinically relevant interactions than traditional rule‑based checkers, but they also risk overwhelming clinicians with low‑value alerts if not carefully tuned. Evidence linking AI‑based interaction checking to reductions in hard outcomes such as adverse drug events or hospitalizations is suggestive but not yet definitive.

AI Detection of Rare Diseases from Symptom and Multimodal Patterns

Scoping and narrative reviews report that AI methods—particularly few-shot learning, multimodal models, and AI-augmented symptom checkers—can shorten the diagnostic odyssey for rare diseases, with potential reductions in time to diagnosis from the current 4–5 year average, though quantitative effect sizes are not yet well established. Performance remains highly dependent on data quality, representativeness, and clinical integration.