AI in Medical Coding Accuracy and Efficiency
Key Finding
A 2025 systematic review of AI-driven automated ICD coding reported strong performance overall. One GPT-2–based system achieved F1 ≈ 0.67 on test data and Cohen's κ ≈ 0.71 agreement with human coders, outperforming traditional models. For selected musculoskeletal CPT codes, simpler NLP approaches reached accuracy of about 97%, which highlights the importance of matching the model to the task. Hybrid workflows that combine AI-assisted coding with expert review are reported to yield the highest accuracy and safety.
Executive Summary
Medical coding is labor-intensive and error-prone, with implications for reimbursement, quality metrics, and compliance. A 2025 systematic review of AI-based automated ICD coding identified 11 peer-reviewed studies and concluded that AI models, including deep-learning and transformer-based approaches, can achieve high sensitivity, specificity, and AUROC, reducing manual workload and speeding coding. One GPT-2–based system achieved an F1-score of 0.667 on test data and 0.621 on real-hospital data, with Cohen's κ = 0.714 versus human coders, outperforming traditional models.
For common musculoskeletal CPT codes, rule-based and simpler NLP approaches sometimes outperformed more complex models like BERT, achieving accuracy rates near 97% and offering better interpretability. A 2025 review on the impact of accurate coding emphasizes that hybrid models, which pair AI suggestions with human coder oversight and clinician involvement, achieve the best balance of efficiency and accuracy and mitigate AI-related errors.
Detailed Research
Methodology
Evidence includes a 2025 systematic review of AI-driven ICD coding and additional studies on CPT coding and revenue-cycle impacts. Models are evaluated on accuracy, precision, recall, F1-score, AUROC, and agreement statistics such as Cohen's κ versus human coders.
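As an illustration of two of the metrics named above, the following minimal sketch computes per-code precision, recall, and F1, plus Cohen's κ between a model and a human coder. The labels "A", "B", and "C" are placeholder codes and the six-note sample is invented for illustration; none of it comes from the cited studies.

```python
from collections import Counter

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 with one code treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented example: one code per note, human coder vs. model.
human = ["A", "A", "B", "B", "C", "A"]
model = ["A", "B", "B", "B", "C", "A"]

p, r, f1 = precision_recall_f1(human, model, positive="A")
print(f"code A: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # F1=0.800
print(f"kappa={cohen_kappa(human, model):.3f}")                 # kappa=0.739
```

Real ICD coding is usually multi-label (several codes per encounter), so published F1 values are typically micro- or macro-averaged across codes; this single-label sketch only shows the underlying arithmetic.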
Studies typically use large labeled datasets of clinical notes and claims, with train–test splits or cross-validation; some include real-world deployment data.
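The cross-validation setup mentioned above can be sketched as follows. This is a generic k-fold index splitter written for illustration, not the procedure of any cited study; the function name, fold count, and seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Each note index appears in exactly one test fold.
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), "train /", len(test_idx), "test")
```

The sketch splits at the note level. When several notes belong to the same patient or encounter, a grouped split that keeps them in one fold would avoid leakage between training and test data.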
Key Studies
Systematic Review of AI-Driven Automated ICD Coding (2025)
- Design: Systematic review
- Sample: 11 peer-reviewed studies
- Findings: The review concluded that modern AI models can substantially improve coding efficiency and accuracy but require careful integration and validation.
- Clinical Relevance: Establishes evidence base for AI coding
GPT-2–Based ICD Coding System (2025)
- Design: Model development study
- Sample: Hospital coding data
- Findings: A GPT-2–based model demonstrated strong performance with F1 = 0.667 on test data, 0.621 on real-hospital data, and κ = 0.714 with coding specialists, outperforming traditional models and underscoring the potential of transformer-based coding assistants.
- Clinical Relevance: Transformer models show promise
AI for Musculoskeletal CPT Coding (2025)
- Design: Comparative analysis
- Sample: MSK procedure notes
- Findings: Analyses of musculoskeletal (MSK) CPT codes showed that traditional NLP models could achieve 97% accuracy, sometimes exceeding more complex neural models while offering greater interpretability for clinical use.
- Clinical Relevance: Simpler models may suffice for specific tasks
Impact of Accurate Coding on Quality and Revenue (2023)
- Design: Narrative review
- Sample: Coding accuracy studies
- Findings: A review of coding accuracy studies found that hybrid AI–human workflows produce the highest accuracy while minimizing new error types introduced by automation.
- Clinical Relevance: Hybrid workflows recommended
Clinical Implications
For osteopathic practices, AI-assisted coding can reduce administrative workload and improve capture of osteopathic manipulative treatment (OMT) and musculoskeletal (MSK) services when models are trained on appropriate data and tailored to osteopathic procedure codes.
Hybrid workflows, in which AI generates candidate codes reviewed by coders or clinicians, are likely safest and most effective, supporting both revenue integrity and compliance.
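One way such a hybrid workflow might be organized is sketched below: every AI suggestion still reaches a human, and a confidence threshold only decides how much scrutiny it gets. The `Suggestion` type, the `route_for_review` function, the queue names, and the 0.90 threshold are all hypothetical and are not drawn from the cited studies.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    note_id: str
    code: str          # candidate code proposed by the model (placeholder labels)
    confidence: float  # model-reported score in [0, 1]

def route_for_review(suggestions, threshold=0.90):
    """Assign each suggestion to a human review queue; nothing is auto-accepted."""
    queues = {"quick_confirm": [], "full_review": []}
    for s in suggestions:
        key = "quick_confirm" if s.confidence >= threshold else "full_review"
        queues[key].append(s)
    # Show reviewers the least certain suggestions first.
    queues["full_review"].sort(key=lambda s: s.confidence)
    return queues

batch = [
    Suggestion("note-1", "CODE_A", 0.97),
    Suggestion("note-2", "CODE_B", 0.55),
    Suggestion("note-3", "CODE_C", 0.80),
]
queues = route_for_review(batch)
print([s.note_id for s in queues["quick_confirm"]])  # ['note-1']
print([s.note_id for s in queues["full_review"]])    # ['note-2', 'note-3']
```

Any threshold would need to be calibrated against local audit results, since a model's reported confidence does not necessarily track its true accuracy.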
Limitations & Research Gaps
Most coding models are trained in large hospital systems and may not generalize to smaller practices or osteopathic-specific documentation styles.
There is limited research on AI performance for OMT-specific codes and on how AI coding affects audit risk and payer relationships in osteopathic practices.
Osteopathic Perspective
Accurate coding ensures that the structural and manual aspects of osteopathic care are recognized in reimbursement and quality metrics.
Osteopathic physicians (DOs) should participate in designing and validating AI coding tools to ensure that OMT and osteopathic diagnostic nuances are appropriately represented, preserving both financial and professional recognition of osteopathic services.
References
- Sharma A, et al. “Revamping Medical Coding with AI: A Systematic Review of Automated ICD Coding Systems.” Advances in Intelligent Systems, 2025;5:e78. DOI: 10.20517/ais.2024.78
- Kim S, et al. “Artificial Intelligence-Based Automated International Classification of Diseases Coding in Real Hospital Data.” BMC Medical Informatics and Decision Making, 2025;25:210. DOI: 10.1186/s12911-025-02010-9