Admin Automation | Systematic Review | 2025

AI in Medical Coding Accuracy and Efficiency

Key Finding

A 2025 systematic review of AI-driven automated ICD coding reported that one GPT-2–based system achieved F1 ≈ 0.67 on test data and Cohen's κ ≈ 0.71 agreement with human coders, outperforming traditional models. For selected musculoskeletal CPT codes, simpler NLP approaches reached accuracy of about 97%, which highlights the importance of model-task fit. Hybrid workflows that combine AI-assisted coding with expert review yield the highest accuracy and safety.

8 min read | 2 sources cited

Executive Summary

Medical coding is labor-intensive and error-prone, with implications for reimbursement, quality metrics, and compliance. A 2025 systematic review of AI-based automated ICD coding identified 11 peer-reviewed studies and concluded that AI models, including deep-learning and transformer-based approaches, can achieve high sensitivity, specificity, and AUROC, reducing manual workload and speeding coding. One GPT-2–based system achieved an F1-score of 0.667 on test data and 0.621 on real-hospital data, with Cohen's κ = 0.714 versus human coders, outperforming traditional models.

For common musculoskeletal CPT codes, rule-based and simpler NLP approaches sometimes outperformed more complex models such as BERT, achieving accuracy near 97% and offering better interpretability. A separate review on the impact of accurate coding emphasizes that hybrid models, which pair AI suggestions with human coder oversight and clinician involvement, achieve the best balance of efficiency and accuracy and mitigate AI-related errors.

Detailed Research

Methodology

Evidence includes a 2025 systematic review of AI-driven ICD coding and additional studies on CPT coding and revenue-cycle impacts. Models are evaluated on accuracy, precision, recall, F1-score, AUROC, and agreement statistics such as Cohen's κ versus human coders.
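
As a rough illustration of two of the metrics named above, the following sketch computes an F1-score and Cohen's κ on toy data. It assumes micro-averaged F1 over per-record sets of codes and a single label per record for κ; the cited studies may have used different averaging or agreement definitions, and the function names and placeholder labels here are hypothetical.

```python
from collections import Counter


def micro_f1(true_codes, predicted_codes):
    """Micro-averaged F1 across records, where each record has a set of codes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, predicted_codes):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # codes both the coder and the model assigned
        fp += len(pred - truth)   # codes only the model assigned
        fn += len(truth - pred)   # codes only the coder assigned
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per record."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of records")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data with placeholder labels (not real ICD codes).
human_sets = [{"A", "B"}, {"C"}]
model_sets = [{"A"}, {"C", "D"}]
human_primary = ["A", "A", "B", "B"]
model_primary = ["A", "A", "B", "A"]

f1 = micro_f1(human_sets, model_sets)
kappa = cohens_kappa(human_primary, model_primary)
```

On this toy data the model matches two of three human-assigned codes and adds one extra, so precision and recall are both 2/3. κ corrects the raw 75% agreement for the 50% agreement expected by chance.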

Studies typically use large labeled datasets of clinical notes and claims, with train–test splits or cross-validation; some include real-world deployment data.
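
For readers unfamiliar with cross-validation, a minimal sketch of a k-fold split follows. It is a generic illustration, not the procedure of any cited study; the function name, fold count, and seed are assumptions.

```python
import random


def k_fold_indices(n_records, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and the number of records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, k=5))
```

Each record appears in exactly one test fold, so every note is scored by a model that did not see it during training.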

Key Studies

Systematic Review of AI-Driven Automated ICD Coding (2025)

  • Design: Systematic review
  • Sample: 11 peer-reviewed studies
  • Findings: This review synthesized 11 studies focused on AI for automated ICD coding, concluding that modern models can substantially improve efficiency and accuracy but require careful integration and validation.
  • Clinical Relevance: Establishes evidence base for AI coding

GPT-2–Based ICD Coding System (2025)

  • Design: Model development study
  • Sample: Hospital coding data
  • Findings: A GPT-2–based model demonstrated strong performance with F1 = 0.667 on test data, 0.621 on real-hospital data, and κ = 0.714 with coding specialists, outperforming traditional models and underscoring the potential of transformer-based coding assistants.
  • Clinical Relevance: Transformer models show promise

AI for Musculoskeletal CPT Coding (2025)

  • Design: Comparative analysis
  • Sample: MSK procedure notes
  • Findings: Analyses of MSK CPT codes showed that traditional NLP models could achieve about 97% accuracy, sometimes exceeding more complex neural models while providing greater interpretability for clinical use.
  • Clinical Relevance: Simpler models may suffice for specific tasks
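
To make the interpretability point concrete, here is a minimal sketch of a rule-based suggester of the kind a "simpler NLP approach" might resemble. The patterns and labels are hypothetical placeholders, not real CPT codes and not the rules used in the cited analysis.

```python
import re

# Hypothetical rules: each pattern maps to a placeholder label, not a real CPT code.
RULES = [
    (re.compile(r"\btotal knee (?:arthroplasty|replacement)\b", re.I), "MSK_LABEL_TKA"),
    (re.compile(r"\bcarpal tunnel release\b", re.I), "MSK_LABEL_CTR"),
    (re.compile(r"\brotator cuff repair\b", re.I), "MSK_LABEL_RCR"),
]


def suggest_codes(note):
    """Return (label, matched_phrase) pairs so a reviewer can see why each label fired."""
    suggestions = []
    for pattern, label in RULES:
        found = pattern.search(note)
        if found:
            suggestions.append((label, found.group(0)))
    return suggestions


note = "Patient underwent left total knee replacement without complication."
result = suggest_codes(note)
```

Because each suggestion carries the phrase that triggered it, a coder can verify or reject it at a glance, which is harder to do with a neural model's output.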

Impact of Accurate Coding on Quality and Revenue (2023)

  • Design: Narrative review
  • Sample: Coding accuracy studies
  • Findings: A review of coding accuracy studies found that hybrid AI–human workflows produce the highest accuracy while minimizing new error types introduced by automation.
  • Clinical Relevance: Hybrid workflows recommended

Clinical Implications

For osteopathic practices, AI-assisted coding can reduce administrative workload and improve capture of OMT and MSK services when models are trained on appropriate data and tailored to osteopathic procedure codes.

Hybrid workflows, in which AI generates candidate codes reviewed by coders or clinicians, are likely safest and most effective, supporting both revenue integrity and compliance.
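
One way such a hybrid workflow might be structured is sketched below. It assumes the model emits a confidence score per candidate code; the class name, threshold, and placeholder codes are hypothetical. Every candidate still goes to a human reviewer, and the score only orders and flags the queue.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str          # placeholder label, not a real billing code
    confidence: float  # model score in [0, 1]; assumed to be available


def build_review_queue(candidates, flag_below=0.8):
    """Order candidates lowest-confidence first and flag the uncertain ones."""
    ordered = sorted(candidates, key=lambda c: c.confidence)
    return [(c, c.confidence < flag_below) for c in ordered]


def finalize(candidates, reviewer_decisions):
    """Keep only codes a human reviewer explicitly approved."""
    return [c.code for c in candidates if reviewer_decisions.get(c.code) is True]


candidates = [Candidate("CODE_A", 0.95), Candidate("CODE_B", 0.55)]
queue = build_review_queue(candidates)
final_codes = finalize(candidates, {"CODE_A": True, "CODE_B": False})
```

In this design nothing is billed without an explicit approval, so a missing reviewer decision defaults to exclusion rather than acceptance.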

Limitations & Research Gaps

Most coding models are trained on data from large hospital systems and may not generalize to smaller practices or osteopathic-specific documentation styles.

There is limited research on AI performance for OMT-specific codes and on how AI coding affects audit risk and payer relationships in osteopathic practices.

Osteopathic Perspective

Accurate coding ensures that the structural and manual aspects of osteopathic care are recognized in reimbursement and quality metrics.

DOs should participate in designing and validating AI coding tools to ensure that OMT and osteopathic diagnostic nuances are appropriately represented, preserving both financial and professional recognition of osteopathic services.

References (2)

  1. Sharma A, et al. “Revamping Medical Coding with AI: A Systematic Review of Automated ICD Coding Systems.” Advances in Intelligent Systems, 2025;5:e78. DOI: 10.20517/ais.2024.78
  2. Kim S, et al. “Artificial Intelligence-Based Automated International Classification of Diseases Coding in Real Hospital Data.” BMC Medical Informatics and Decision Making, 2025;25:210. DOI: 10.1186/s12911-025-02010-9