MEDICAID FRAUD SERIES — POST 3 OF 4

What Billing Patterns Actually Look Like

We have 227 million rows and 1.8 million providers. Before training any model, the question is simpler: what does an "excluded" provider actually look like in the data?

From seven columns of billing data, we can construct dozens of provider features. Total claims, total payments, number of unique beneficiaries, procedure concentration, temporal billing patterns across 84 months. These are the inputs that every published fraud detection study starts with.

The temptation is to skip description and jump straight to machine learning. Nearly every paper in the field does. But looking at the data first matters, because what we see challenges assumptions about what fraud looks like.

Building a Provider Profile

The Medicaid spending file gives us billing by provider, procedure, and month. To construct useful features, we need to aggregate. A provider who bills for 40 different Healthcare Common Procedure Coding System (HCPCS) codes across 84 months might contribute hundreds of rows in the raw data. We roll those up into a single profile.

The features that matter, based on what the literature finds most predictive:

Category      | Examples                                                     | What It Captures
Volume        | Total claims, total paid, unique beneficiaries               | How much and how many
Concentration | Share of billing on top procedure, number of distinct codes  | How specialized
Temporal      | Month-to-month variation, billing spikes, trend              | How billing changes over time
Peer-relative | Deviation from specialty average, percentile rank            | How different from peers
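The roll-up from raw rows to a provider profile is a single grouped aggregation. A minimal sketch in pandas, using hypothetical column names (the real file's headers may differ):

```python
import pandas as pd

# Toy billing rows; column names are illustrative stand-ins for the
# provider / procedure / month / claims / paid fields in the spending file.
rows = pd.DataFrame({
    "npi":    ["A", "A", "A", "B", "B"],
    "hcpcs":  ["99213", "99213", "99214", "T1019", "T1019"],
    "month":  ["2021-01", "2021-02", "2021-02", "2021-01", "2021-03"],
    "claims": [40, 35, 12, 300, 310],
    "paid":   [3200.0, 2800.0, 1100.0, 9000.0, 9300.0],
})

# Volume and breadth: one row per provider
profile = rows.groupby("npi").agg(
    total_claims=("claims", "sum"),
    total_paid=("paid", "sum"),
    n_codes=("hcpcs", "nunique"),
    months_active=("month", "nunique"),
)

# Concentration: share of paid dollars on the provider's single top procedure
by_code = rows.groupby(["npi", "hcpcs"])["paid"].sum()
profile["top_code_share"] = by_code.groupby("npi").apply(lambda s: s.max() / s.sum())

# Temporal intensity: claims per observed month
profile["claims_per_month"] = profile["total_claims"] / profile["months_active"]
```

A provider contributing hundreds of raw rows collapses to one row of features this way; everything downstream operates on the profile, not the raw claims.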

Peer-relative features outperform raw volume for distinguishing fraudulent from legitimate providers [3]. The reason is simple: a cardiologist billing $2 million looks different from a personal care attendant billing $2 million. Without specialty context, volume alone tells us almost nothing.

The public Medicaid file presents a problem here. It lacks provider specialty. To get that, we need to crosswalk each National Provider Identifier (NPI) against the National Plan and Provider Enumeration System (NPPES) registry, a separate federal database. Many of the amateur analyses circulating right now skip this step, comparing providers across specialties as if billing volume were meaningful in isolation.
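The crosswalk itself is a merge on NPI followed by a within-specialty comparison. A sketch with made-up taxonomy labels and column names, showing why the same dollar figure lands very differently against each peer group:

```python
import pandas as pd

# Hypothetical provider profiles built from the spending file
profiles = pd.DataFrame({
    "npi":        [1, 2, 3, 4, 5, 6],
    "total_paid": [2_000_000, 1_600_000, 1_200_000, 2_000_000, 100_000, 120_000],
})

# Hypothetical NPPES extract: NPI -> primary taxonomy (specialty)
nppes = pd.DataFrame({
    "npi":      [1, 2, 3, 4, 5, 6],
    "taxonomy": ["Cardiology"] * 3 + ["Personal Care"] * 3,
})

merged = profiles.merge(nppes, on="npi", how="left")

# Peer-relative feature: billing as a multiple of the specialty median
merged["paid_vs_specialty_median"] = (
    merged["total_paid"]
    / merged.groupby("taxonomy")["total_paid"].transform("median")
)
# The $2M cardiologist sits at 1.25x its specialty median; the $2M
# personal care biller sits at roughly 16.7x its specialty median.
```

Without the taxonomy column from NPPES, both $2 million providers are indistinguishable, which is exactly the failure mode of sorting the raw file by total paid.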

With those features in hand, the first question is descriptive: do excluded providers look different from everyone else?

What “Excluded” Looks Like in the Data

When we match the Office of Inspector General’s (OIG) exclusion list against the Medicaid spending file, we get about 1,240 providers. Adding California’s Suspended and Ineligible (S&I) list and historical LEIE snapshots brings the total to roughly 1,800 matched excluded providers. Out of 1.8 million, that is a 0.1% match rate.

Fraud-excluded providers cluster in the top decile of their specialty peer groups on total spending, while all-type excluded providers skew toward the bottom. The divergence reflects label contamination from license revocations (1128(b)(4)) in the full LEIE.

The descriptive comparison depends on which labels we use and how we measure billing activity.

Using all LEIE exclusion types, excluded providers appear to have lower median total claims and payments than non-excluded providers. Two things drive this. First, the label problem from Post 2: 40% of LEIE entries are providers who lost their licenses for reasons unrelated to fraud. Second, an observation window problem: providers excluded during the 2018-2024 panel had their billing histories truncated at their exclusion date. A provider excluded in early 2020 has roughly 24 months of billing data in the panel; a non-excluded provider has up to 84. Panel totals are mechanically lower for providers with shorter windows, regardless of how intensively they billed.

When we restrict to fraud-specific exclusion codes (1128(a)(1), 1128(a)(3), 1128(b)(7)) and separate providers by when enforcement acted, the picture changes. Providers excluded after the panel ends (2025-2026) have full, uncensored billing histories. Their median total paid is $242,405 — the 65th percentile of the non-excluded distribution. Providers excluded during the panel have a median of $88,164, roughly one-third the uncensored figure. Per-month billing intensity is nearly identical in both groups: 140 and 144 claims per month, respectively. Both bill at more than double the non-excluded rate of 63 claims per month.

Group                                 | N       | Median Total Paid | Median Months Active | Median Claims/Month
Uncensored fraud (excluded 2025-2026) | 97      | $242,405          | 18                   | 144
Censored fraud (excluded 2018-2024)   | 230     | $88,164           | 10                   | 140
Non-excluded                          | 487,703 | $83,851           | 21                   | 63

The signal is in the per-month rates, not the panel totals. Fraud-excluded providers are not small practices. They are above-median billers whose panel totals look low when enforcement truncates their observation window. Procedure concentration scores still skew more extreme for the fraud-excluded group. But the raw numbers — a few hundred providers out of 1.8 million — remain thin.
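The censoring correction is simple arithmetic. A toy sketch echoing the per-month rates above (140 versus 63 claims per month) shows how comparable panel totals conceal very different billing intensities:

```python
import pandas as pd

# Toy panel: one provider excluded mid-panel (10 observed months) and one
# never excluded (21 observed months). Figures mirror the medians in the text.
panel = pd.DataFrame({
    "group":         ["censored_fraud", "non_excluded"],
    "total_claims":  [1400, 1323],
    "months_active": [10, 21],
})

# Panel totals look nearly identical; per-month rates do not.
panel["claims_per_month"] = panel["total_claims"] / panel["months_active"]
```

Comparing the 1,400 total against the 1,323 total suggests two similar practices; dividing by months observed reveals one billing at more than double the rate of the other.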

The Feature That Isn’t: Billing Volume as Fraud Signal

The first thing amateur analysts do when they download the data is sort by total paid, descending. The providers at the top of the list become the suspects. This feels intuitive: more money, more potential fraud.

Fraud-excluded providers differ from both non-excluded providers and non-fraud excluded providers on billing features. Non-fraud excluded providers (primarily license revocations) resemble or fall below the non-excluded distribution.

The problem with this approach has a name. A landmark Science paper demonstrated it in a different context, with an identical mechanism [1]. The study showed that a widely used healthcare algorithm predicted costs rather than illness. Black patients spent $1,800 less annually than White patients at the same level of chronic illness. Access barriers, not health differences, drove that gap. The algorithm systematically under-referred Black patients for care management.

At the 97th percentile of the algorithm’s risk scores, Black patients had 26% more active chronic conditions than White patients with identical scores.

The parallel to fraud detection is direct. If we use billing volume as a proxy for fraud risk, we will systematically flag providers who serve populations with legitimately high healthcare utilization. Safety-net providers, pediatric specialists, and behavioral health practices in underserved areas all bill high because their patients need more care.

The same research team later published an “Algorithmic Bias Playbook,” presented at the Federal Trade Commission’s (FTC) PrivacyCon [2]. They named this mechanism “label choice bias”: a mismatch between what we want to predict (fraud) and what the algorithm actually predicts (high spending). In our literature search, no published paper applies this framework to healthcare fraud detection, even though the mechanism is structurally similar.

The equity risk sharpens in specific service categories, and autism services are the clearest example.

The Autism Provider Problem

Consider what happens when we look at autism services.

DOGE framed its dataset release around the Minnesota autism fraud scheme as proof of concept. Within hours, amateur analysts began flagging clinics “diagnosing hundreds of children with autism per month” at residential addresses. This primes every subsequent analyst to treat high-volume autism billing as inherently suspicious.

The Minnesota autism fraud was real. Federal indictments allege providers billed tens of millions for services never delivered, phantom billing on an industrial scale. The question is what happens next: whether the detection methods that follow from this dataset can distinguish the fraud from the 99%+ of autism providers billing legitimately.

What do legitimate autism services look like in billing data?

Best-practice Applied Behavior Analysis requires 25 to 40 hours of therapy per week, per child, for one to three years. Most of those hours are delivered by Registered Behavior Technicians (RBTs), supervised by a Board Certified Behavior Analyst (BCBA). A BCBA overseeing 12 clients at 30 hours per week generates 360 billable hours weekly across the practice, hours that aggregate under the agency’s or supervising provider’s billing NPI. In the Medicaid spending file, that single billing NPI appears as an extreme statistical outlier to anyone unfamiliar with ABA’s supervised delivery model.

Under the Early and Periodic Screening, Diagnostic, and Treatment (EPSDT) mandate, Medicaid must cover medically necessary services for children at whatever intensity is clinically prescribed. By 2022, all state Medicaid programs covered ABA therapy. The predictable result: states have seen ABA spending growth between 400% and 2,800% in recent years. Indiana’s ABA Medicaid spending went from $21 million to $611 million between 2017 and 2023. North Carolina projects $639 million in ABA spending for FY 2026, a 423% increase [14].

Spending increases of 400% to 2,800% over five to seven years are the kind of growth rates that statistical fraud screens are designed to catch. In any other context, numbers like these would warrant immediate investigation. Here, they reflect coverage expansion meeting rising prevalence.

Autism prevalence itself varies five-fold across Centers for Disease Control and Prevention (CDC) surveillance sites: from 1-in-19 among 8-year-olds in San Diego to 1-in-103 in Laredo, Texas [6]. Researchers attribute this variation mainly to screening infrastructure and diagnostic capacity, not actual prevalence differences. California’s Get SET Early model trained hundreds of pediatricians to screen and refer, catching cases that would have gone undiagnosed elsewhere.

The implication for fraud screening: providers in high-screening states (California, New Jersey, Minnesota) will naturally show higher autism billing volumes. A naive cross-state comparison flags them as outliers. The DOGE dataset provides no clinical context to distinguish 360 hours per week of legitimate therapy from 360 hours of fabricated services.

“Improper” and “Fraudulent” Are Different Things

HHS-OIG audits of ABA Medicaid payments in Indiana, Wisconsin, and Maine found tens of millions in “improper” payments [7]. Indiana alone had $56 million [15]. These numbers will almost certainly appear in crowdsourced analyses as evidence of fraud.

The vast majority of improper ABA payments are documentation failures: unsupported Current Procedural Terminology (CPT) code billing, missing session notes, absent provider signatures. Some reflect billing for individual therapy when records showed group sessions. Roughly 82% of improper Medicaid payments system-wide stem from insufficient documentation rather than fabricated services [16]. The OIG distinguishes between providers who delivered care with poor documentation and providers who fabricated claims entirely.

A related problem comes from Medically Unlikely Edits (MUEs). The Centers for Medicare & Medicaid Services (CMS) caps each code at the maximum units a provider would likely bill per patient per day. All ABA CPT codes carry an adjudication indicator meaning payors should still pay claims above the MUE when medically necessary. Some payors treat MUEs as hard denial limits anyway, forcing providers to reduce treatment intensity. A recent study documents this pattern: statistical thresholds designed for billing error detection get repurposed as tools to deny legitimate care [8]. The crowdsourced fraud detection model follows the same logic.

The patterns visible in the data are already misleading. The patterns invisible in the data present a different problem.

What Doesn’t Appear in the Data

Personal care service (PCS) attendants account for 34% of Medicaid Fraud Control Unit (MFCU) fraud convictions in fiscal year (FY) 2023 and 36% in FY 2024, despite being one of roughly 80 provider types [5]. This is the single largest source of detected Medicaid fraud.

In the public spending data, PCS fraud is nearly invisible.

PCS attendants typically bill through agencies. The agency’s NPI appears in the billing data; the individual attendant does not. A Government Accountability Office (GAO) report found $4.9 billion in PCS claims with no identified service provider [10]. States use over 400 different procedure codes for personal care services, making cross-state comparisons unreliable. In consumer-directed programs, beneficiaries hire their own attendants (often family members), creating billing patterns unlike any other provider type.

The contrast matters. The fraud type investigators catch most often is the one least visible in the public data. The signals most visible in the data, high-volume billing and extreme procedure concentration, disproportionately flag legitimate providers in high-need specialties. Enforcement selection bias runs in both directions.

When Detection Goes Wrong

What happens when automated or crowdsourced fraud detection gets it wrong?

Michigan’s MiDAS unemployment fraud detection system, deployed from 2013 to 2015, made over 40,000 automated fraud accusations. When the state auditor general reviewed 22,427 of those cases, it reversed 93% [11]. The system conflated intentional fraud with innocent mistakes. Consequences included evictions, destroyed credit, bankruptcies, and suicides.

The Dutch childcare benefits scandal (2013-2019) used an algorithm that weighted “foreign sounding names” and “dual nationality” as fraud indicators. The system wrongly accused over 26,000 innocent families. Authorities removed more than 2,000 children from their parents. The government acknowledged institutional racism. The entire cabinet resigned [12].

An analysis of 650,000 Medicaid dental claims found that 5 of the top 17 flagged providers (29%) were false positives [9]. Legitimate provider characteristics, not fraud, explained their billing patterns.

We can already see consequences in Minnesota. Following the autism and childcare fraud indictments, the state Department of Human Services paused payments to providers across 14 Medicaid service categories. Disabled Minnesotans lost Integrated Community Support services with no transition plan. Providers sued DHS over withheld payments. A Minnesota Disability Law Center attorney described it: “There was no real transition; they just stopped providing the services” [13].

The crowdsourced model amplifies these risks. Thousands of amateur analysts, with no clinical expertise and no methodology documentation, flag providers by name and address through NPI cross-referencing. The False Claims Act allows private citizens to sue on behalf of the government and collect up to 30% of recovered funds. That financial incentive rewards volume of accusations, not accuracy.

The Pattern

At least four layers of selection shape what we see in the billing data:

  1. Privacy suppression. Small providers fall below the 12-claim threshold and disappear entirely.

  2. Label contamination. The exclusion lists mix fraud with license revocations, loan defaults, and administrative actions. The “excluded” group produces a contaminated fraud signal.

  3. Enforcement bias. Certain fraud types (PCS ghost patients, controlled substance diversion) are easier to detect and prosecute than others (upcoding, medical necessity fraud). The labels over-represent easy cases.

  4. Proxy variable bias. Billing volume as a fraud indicator systematically flags providers serving high-need populations, exactly the providers whose billing should be high.

Descriptive analysis alone, before any machine learning, exposes these layers. The most common approach is to train a classifier on LEIE labels, rank providers by predicted fraud probability, and investigate the top of the list. That approach will likely disproportionately flag autism providers, pediatric specialists, and safety-net practices, while the PCS fraud that accounts for a third of actual convictions slips through.
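That standard pipeline can be sketched in a few lines with scikit-learn. Everything here is synthetic (the features, the labels, the sample size), and the labels are deliberately volume-correlated to illustrate the label-choice problem, not to model real fraud:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic features for 1,000 providers: [claims_per_month, top_code_share]
X = rng.normal(size=(1000, 2))

# Synthetic "LEIE-style" labels tied to billing volume. This is the label
# problem in miniature: the target encodes "bills a lot", not "commits fraud".
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 1.5).astype(int)

# Train a classifier, rank providers by predicted "fraud" probability
clf = LogisticRegression().fit(X, y)
risk = clf.predict_proba(X)[:, 1]

# "Investigate the top of the list": under volume-correlated labels, the
# highest-risk decile is simply the highest-volume decile.
top_decile = np.argsort(risk)[-100:]
```

Nothing in the pipeline is wrong mechanically; the failure is upstream, in what the label measures. Post 4 runs this honestly on cleaned labels.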

In Post 4, we’ll see whether a classifier trained on cleaned labels can do better than these descriptive patterns suggest, and what the honest answer is when it can’t.


References

  1. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
  2. Obermeyer, Z. et al. (2021). Algorithmic bias playbook. Chicago Booth Center for Applied AI. Presented at FTC PrivacyCon.
  3. Shekhar, S., Leder-Luis, J., & Akoglu, L. (2026). Can machine learning target health care fraud? Evidence from Medicare hospitalizations. Journal of Policy Analysis and Management, 45(1).
  4. Johnson, J.M. & Khoshgoftaar, T.M. (2023). Data-centric AI for healthcare fraud detection. SN Computer Science, 4(4), 389.
  5. OIG OEI-09-24-00200. (2024). Medicaid Fraud Control Units fiscal year 2023 annual report.
  6. CDC. (2025). Autism spectrum disorder prevalence, 2022. MMWR Surveillance Summary, 74(SS-2).
  7. Benesch Law. (2025). OIG finds significant improper Medicaid payments for ABA services.
  8. Serdyuk, A. et al. (2025). Preventing insurance denials of applied behavior analysis based on misuse of medically unlikely edits. Behavior Analysis in Practice, 18, 579-590.
  9. Skaff, L. et al. (2012). A case study in healthcare fraud detection. International Journal of Knowledge-Based Organizations, 2(1), 1-15.
  10. GAO-17-169. (2017). Personal care services: CMS needs to do more to reduce the risk of harm to Medicaid beneficiaries.
  11. Michigan Auditor General. (2016). Michigan Unemployment Insurance Agency fraud detection performance audit.
  12. Amnesty International. (2021). Xenophobic machines: Discrimination through unregulated use of algorithms in the Dutch childcare benefits scandal.
  13. Minnesota Reformer. (2025, December 3). Spooked by fraud, DHS neglects its responsibilities toward disabled Medicaid recipients.
  14. North Carolina Division of Health Benefits. (2025). ABA services utilization report. Cited in KFF Health News, December 2025.
  15. HHS-OIG. (2024). Indiana made at least $56 million in improper fee-for-service Medicaid payments for ABA. Report A-09-22-02002.
  16. GAO-24-107487. (2024). Medicare and Medicaid: Additional actions needed to enhance program integrity and save billions.


Updated February 25, 2026. The original analysis compared panel totals without accounting for observation window length. Providers excluded during the panel had shorter billing histories by construction, deflating their totals. Per-month billing rates — which correct for this — show fraud-excluded providers bill at more than double the non-excluded rate regardless of when enforcement occurred. Revised text reflects this correction.

Suggested Citation

Cholette, V. (February 2026). What Billing Patterns Actually Look Like. Too Early To Say. https://tooearlytosay.com/research/methodology/medicaid-billing-patterns/