Clinical Support · Mixed Methods · 2023

Impact of AI on Diagnostic Errors in Clinical Practice

Key Finding

Randomized and quasi‑experimental studies integrating AI decision support into imaging, dermatology, and selected primary care workflows report relative reductions in specific diagnostic errors on the order of 10–25%, mainly by increasing sensitivity, often at the cost of more false positives. Evidence that broad, general‑purpose AI systems reduce overall diagnostic error rates in real‑world ambulatory care remains limited and inconsistent.

7 min read · 1 source cited
Tags: primary-care, emergency-medicine, radiology

Executive Summary

AI tools have shown the clearest benefits in reducing diagnostic errors for circumscribed tasks, such as identifying malignant skin lesions, diabetic retinopathy, and certain radiographic abnormalities, where randomized or controlled reader‑study designs demonstrate improved sensitivity and reduced missed findings compared with unaided clinicians. In these settings, AI can reduce specific false‑negative rates by 10–25% while maintaining acceptable specificity. In broader primary care and emergency medicine contexts, evidence that AI meaningfully reduces overall diagnostic error is more limited and often confounded by workflow changes and alert fatigue.

From a systems perspective, AI may reduce certain cognitive errors (for example, failure to consider alternative diagnoses) by prompting clinicians to revisit their differentials, but may simultaneously introduce new error modes, such as over‑reliance on algorithm outputs and misinterpretation of poorly calibrated risk scores. Equity concerns are prominent: models trained on narrow or biased datasets may perform worse in under‑represented groups, potentially increasing diagnostic disparities even as average error rates decline.

Detailed Research

Methodology

The literature includes randomized controlled trials, controlled reader studies, before–after implementation analyses, and qualitative evaluations of clinician experiences. Many imaging and dermatology studies assign clinicians to read cases with and without AI assistance, comparing sensitivity, specificity, and area under the ROC curve, while primary care and ED studies tend to evaluate decision‑support tools embedded in EHRs with outcomes such as missed diagnoses, time to diagnosis, and adherence to diagnostic pathways.
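The reader-study comparison described above can be made concrete with a short sketch. The confusion-matrix counts below are invented for illustration (they are not drawn from any cited trial), but they show how the paired aided-versus-unaided design yields the sensitivity and specificity trade-off the studies report: a gain in sensitivity with a small loss in specificity.

```python
# Illustrative only: hypothetical confusion-matrix counts for one
# reader-study arm. The numbers are invented, not from any cited trial.

def sensitivity(tp, fn):
    """True-positive rate: share of diseased cases correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: share of disease-free cases correctly cleared."""
    return tn / (tn + fp)

# Unaided readers: 85 of 100 cancers caught, 880 of 1,000 normals cleared
unaided_sens = sensitivity(tp=85, fn=15)
unaided_spec = specificity(tn=880, fp=120)

# AI-assisted readers: 92 of 100 cancers caught, 870 of 1,000 normals cleared
aided_sens = sensitivity(tp=92, fn=8)
aided_spec = specificity(tn=870, fp=130)

print(f"sensitivity: {unaided_sens:.2f} -> {aided_sens:.2f}")
print(f"specificity: {unaided_spec:.2f} -> {aided_spec:.2f}")
```

With these assumed counts, sensitivity rises while specificity dips slightly, mirroring the pattern of "improved sensitivity, often at the cost of more false positives" noted in the Key Finding.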

Mixed‑methods designs combine quantitative error metrics with interviews and surveys that explore how clinicians actually use AI recommendations in practice, revealing patterns of trust, skepticism, and adaptation over time.

Key Studies

AI‑assisted Imaging and Dermatology Trials

  • Design: Trials comparing AI‑assisted readers to unaided readers
  • Sample: Mammography, dermoscopic images, and other imaging modalities
  • Findings: AI‑assisted readers have higher sensitivity for target conditions—such as breast cancer on mammography or melanoma on dermoscopic images—than unaided readers, often improving sensitivity by 5–15 percentage points while maintaining similar specificity.
  • Clinical Relevance: Translates into fewer missed cancers per thousand studies
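A back-of-the-envelope calculation shows how a sensitivity gain translates into missed cancers per thousand studies. Both the assumed prevalence (about 5 cancers per 1,000 screens) and the sensitivity values are illustrative, not figures from the cited literature.

```python
# Hypothetical sketch: mapping a sensitivity gain to missed cancers
# per 1,000 screening studies. Prevalence of ~5 per 1,000 is an
# assumption for illustration, not a figure from any cited study.

def missed_per_1000(sens, prevalence_per_1000=5.0):
    """Expected cancers missed per 1,000 studies at a given sensitivity."""
    return prevalence_per_1000 * (1.0 - sens)

unaided = missed_per_1000(0.85)  # ~0.75 missed per 1,000
aided = missed_per_1000(0.92)    # ~0.40 missed per 1,000
print(f"fewer missed cancers per 1,000 studies: {unaided - aided:.2f}")
```

Under these assumptions, a 7-percentage-point sensitivity gain avoids roughly 0.35 missed cancers per 1,000 studies, a small absolute number per screen that accumulates across high-volume screening programs.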

Primary Care Decision‑support Interventions

  • Design: Trials of EHR‑embedded decision support
  • Sample: Conditions like pulmonary embolism, heart failure, or sepsis
  • Findings: Modest improvements in guideline‑concordant testing and reduced missed diagnoses, but effect sizes are typically small and depend heavily on clinician engagement. Alert fatigue and workflow disruption can offset potential benefits.
  • Clinical Relevance: Shows importance of implementation factors

Diagnostic Safety and Cognitive Error Studies

  • Design: Qualitative research on AI prompts as cognitive guardrails
  • Sample: Clinicians using AI decision support
  • Findings: AI prompts can remind clinicians to consider alternative diagnoses and reduce premature closure. However, instances of automation bias—accepting AI suggestions even when they contradict clinical judgment—have also been documented.
  • Clinical Relevance: Highlights both benefits and risks of AI integration

Equity and Bias Analyses

  • Design: Performance analyses across demographic groups
  • Sample: Diverse patient populations
  • Findings: Variable performance across demographic groups, with some algorithms under‑detecting disease in populations under‑represented in training data.
  • Clinical Relevance: AI could reduce average error rates while increasing disparities

Clinical Implications

For osteopathic physicians, AI‑based diagnostic tools are most useful when narrowly targeted to high‑risk, well‑validated domains (for example, imaging, ECG interpretation, or specific red‑flag symptom clusters) rather than as general "diagnosis engines."

Integrating AI suggestions into a structured diagnostic pause—explicitly considering, but not automatically accepting, algorithm outputs—can help DOs leverage benefits while preserving clinical judgment.

Limitations & Research Gaps

Few studies directly measure overall diagnostic error rates at the health‑system level after AI implementation; most focus on single conditions or surrogate endpoints, such as adherence to prediction rules. Long‑term follow‑up and robust monitoring for unintended consequences are rare.

There is essentially no osteopathy‑specific research on AI and diagnostic error, including how structural exam and OMT decisions interact with AI‑driven diagnostic pathways.

Osteopathic Perspective

The osteopathic focus on rational treatment based on understanding the whole person suggests that AI should support, not supplant, holistic clinical reasoning.

DOs are uniquely positioned to notice when AI‑suggested diagnoses conflict with structural findings, psychosocial context, or the body's observed adaptive responses, and to adjust accordingly, honoring the principles of unity of body, mind, and spirit and the body's self‑regulatory capacities.

References (1)

  1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019;25(1):44-56. DOI: 10.1038/s41591-018-0300-7

Related Research

Accuracy of AI Systems in Generating Differential Diagnoses

Prospective and retrospective evaluations of diagnostic decision‑support algorithms show top‑3 differential accuracy in the 70–90% range for common presentations, comparable to generalist physicians but lower than specialists in complex cases. Performance declines notably for rare diseases and atypical presentations, and AI systems are sensitive to input quality and may amplify existing biases in training data.

AI‑Enhanced Drug Interaction Checking and Medication Safety

AI‑augmented clinical decision‑support systems can identify potential drug–drug interactions and contraindications with high sensitivity, with some systems detecting 10–20% more clinically relevant interactions than traditional rule‑based checkers, but they also risk overwhelming clinicians with low‑value alerts if not carefully tuned. Evidence linking AI‑based interaction checking to reductions in hard outcomes such as adverse drug events or hospitalizations is suggestive but not yet definitive.

AI Detection of Rare Diseases from Symptom and Multimodal Patterns

Scoping and narrative reviews report that AI methods—particularly few-shot learning, multimodal models, and AI-augmented symptom checkers—can shorten the diagnostic odyssey for rare diseases, with potential reductions in time to diagnosis from the current 4–5 year average, though quantitative effect sizes are not yet well established. Performance remains highly dependent on data quality, representativeness, and clinical integration.