From seven columns of billing data, we can construct dozens of provider features. Total claims, total payments, number of unique beneficiaries, procedure concentration, temporal billing patterns across 84 months. These are the inputs every published fraud detection study starts with.
The temptation is to skip description and jump straight to machine learning. Nearly every paper in the field does. But looking at the data first matters, because what we see challenges assumptions about what fraud looks like in billing records.
Building a Provider Profile
The Medicaid spending file gives us billing by provider, procedure, and month. To construct useful features, we need to aggregate. A provider who bills for 40 different Healthcare Common Procedure Coding System (HCPCS) codes across 84 months might contribute hundreds of rows in the raw data. We roll those up into a single profile.
The features that matter, based on what the literature finds most predictive:
| Category | Examples | What It Captures |
|---|---|---|
| Volume | Total claims, total paid, unique beneficiaries | How much and how many |
| Concentration | Share of billing on top procedure, number of distinct codes | How specialized |
| Temporal | Month-to-month variation, billing spikes, trend | How billing changes over time |
| Peer-relative | Deviation from specialty average, percentile rank | How different from peers |
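As a concrete sketch of the roll-up, here is how the volume, concentration, and temporal features might be computed with pandas from toy rows shaped like the public file. The column names (`npi`, `hcpcs`, `month`, `claims`, `paid`) and the values are assumptions for illustration, not the file's actual schema:

```python
import pandas as pd

# Toy rows in the shape of the public file: provider (NPI), procedure code,
# month, claim count, amount paid. Schema and values are illustrative.
raw = pd.DataFrame({
    "npi":    ["1001"] * 4 + ["1002"] * 2,
    "hcpcs":  ["97153", "97153", "97155", "97156", "99213", "99214"],
    "month":  ["2023-01", "2023-02", "2023-01", "2023-03", "2023-01", "2023-02"],
    "claims": [120, 130, 15, 8, 40, 35],
    "paid":   [9600.0, 10400.0, 1800.0, 960.0, 3200.0, 2900.0],
})

# Volume features: one profile row per provider.
profile = raw.groupby("npi").agg(
    total_claims=("claims", "sum"),
    total_paid=("paid", "sum"),
    n_codes=("hcpcs", "nunique"),
    months_active=("month", "nunique"),
)

# Concentration: share of paid dollars on the provider's top procedure.
by_code = raw.groupby(["npi", "hcpcs"])["paid"].sum()
profile["top_code_share"] = by_code.groupby("npi").max() / profile["total_paid"]

# Temporal: per-month billing intensity.
profile["claims_per_month"] = profile["total_claims"] / profile["months_active"]
```

Provider 1001 collapses from four raw rows into one profile row: 273 total claims over 3 active months, with roughly 88% of dollars on its top code.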
Peer-relative features outperform raw volume for distinguishing fraudulent from legitimate providers [3]. The reason is simple: a cardiologist billing $2 million looks different from a personal care attendant billing $2 million. Without specialty context, volume alone tells us almost nothing.
The public Medicaid file presents a problem here. It lacks provider specialty. To get that, we need to crosswalk each National Provider Identifier (NPI) against the National Plan and Provider Enumeration System (NPPES) registry, a separate federal database. Many of the amateur analyses circulating right now skip this step, comparing providers across specialties as if billing volume were meaningful in isolation.
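Assuming profiles are already built and a taxonomy crosswalk has been pulled from NPPES, the peer-relative adjustment reduces to a within-specialty standardization. The frames, column names, and dollar amounts below are all illustrative:

```python
import pandas as pd

# Hypothetical provider profiles and an NPPES-derived NPI -> taxonomy
# crosswalk. Real NPPES records key on primary taxonomy codes; the
# human-readable labels here are stand-ins.
profiles = pd.DataFrame({
    "npi": ["2001", "2002", "2003", "3001", "3002", "3003", "3004"],
    "total_paid": [2_000_000.0, 1_800_000.0, 1_500_000.0,
                   2_000_000.0, 60_000.0, 90_000.0, 75_000.0],
})
nppes = pd.DataFrame({
    "npi": ["2001", "2002", "2003", "3001", "3002", "3003", "3004"],
    "taxonomy": ["cardiology"] * 3 + ["personal_care"] * 4,
})

merged = profiles.merge(nppes, on="npi", how="left")

# Peer-relative feature: deviation from the specialty mean, in specialty
# standard deviations. The same $2M reads differently by specialty:
# unremarkable for the cardiologist, extreme for the personal care attendant.
grp = merged.groupby("taxonomy")["total_paid"]
merged["paid_z"] = (merged["total_paid"] - grp.transform("mean")) / grp.transform("std")
```

With these toy groups the cardiologist's $2M lands near z ≈ 0.93 while the attendant's identical $2M lands near z ≈ 1.50; in real data, with thousands of providers per specialty, the contrast is far starker.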
With those features in hand, the first question is descriptive: do excluded providers look different from everyone else?
What “Excluded” Looks Like in the Data
So what happens when we match the OIG exclusion list against the Medicaid spending file? About 1,240 providers match. Adding California’s Suspended and Ineligible (S&I) list and historical LEIE snapshots brings the total to roughly 1,800 matched excluded providers. Out of 1.8 million providers, that is a 0.1% positive rate.
What happens when we restrict to fraud-specific exclusion codes (1128(a)(1), 1128(a)(3), 1128(b)(7)) and look at per-month billing rates? A gap opens up. Fraud-excluded providers bill at 144 claims per month; non-excluded providers bill at 63. That is more than double the intensity.
Separating providers by when enforcement acted sharpens the picture. Providers excluded after the panel ends (2025-2026, N = 95) have full, uncensored billing histories. Their median total paid is $249,563, the 65th percentile of the non-excluded distribution. Providers excluded during the panel (N = 229) have a median of $88,303, roughly one-third the uncensored figure, because enforcement truncated their billing histories mid-panel. Per-month billing intensity is stable across both groups: 137 and 144 claims per month, respectively. The intensity signal holds regardless of when enforcement occurred; panel totals diverge because truncation compresses them.
Two sources of noise obscure this signal if we are not careful. First, the label problem from Post 2: 40% of LEIE entries are providers who lost their licenses for reasons unrelated to fraud. Using all LEIE types as the positive class mixes fraud with license revocations, diluting any billing signal. Second, observation window length: a provider excluded in 2020 has at most 24 months of billing data in the panel, while a non-excluded provider has up to 84. Panel totals are mechanically lower for providers with shorter windows, regardless of how intensively they billed. Per-month rates correct for this.
| Group | N | Median Total Paid | Median Months Active | Median Claims/Month |
|---|---|---|---|---|
| Uncensored fraud (excluded 2025-2026) | 95 | $249,563 | 18 | 144 |
| Censored fraud (excluded 2018-2024) | 229 | $88,303 | 10 | 137 |
| Non-excluded | 487,703 | $83,232 | 21 | 63 |
The table confirms the intensity finding: 137 and 144 claims per month across the two fraud groups, stable whether or not enforcement has acted, even as truncation compresses panel totals for the during-panel group. Procedure concentration also skews more extreme for the fraud-excluded group. The raw numbers remain thin, a few hundred providers out of 1.8 million, but the pattern is consistent.
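The censoring correction behind the table is a one-line calculation. The claim counts below are chosen to reproduce the median months and per-month rates reported above; they are illustrative, not the underlying data:

```python
# Censoring-robust comparison as a toy calculation. A provider excluded
# mid-panel has fewer observable months, so panel totals are mechanically
# compressed; dividing by months active recovers a comparable rate.
censored   = {"claims": 1370, "months_active": 10}   # excluded during the panel
uncensored = {"claims": 2592, "months_active": 18}   # excluded after the panel

def rate(p):
    return p["claims"] / p["months_active"]

# Totals differ nearly twofold, but per-month intensity is nearly identical.
print(rate(censored), rate(uncensored))   # 137.0 144.0
```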
The Feature That Isn’t: Billing Volume as Fraud Signal
Here is the most common approach to this data: sort by total paid, descending. The providers at the top become the suspects. More money, more potential fraud. It feels intuitive. But does it work?
The problem with this approach has a name. A landmark Science paper demonstrated it in a different context, with an identical mechanism [1]. The study showed that a widely used healthcare algorithm predicted costs rather than illness. Black patients spent $1,800 less annually than White patients at the same level of chronic illness. Access barriers, not health differences, drove that gap. The algorithm systematically under-referred Black patients for care management.
At the 97th percentile of the algorithm’s risk scores, Black patients had 26% more active chronic conditions than White patients with identical scores.
Why does this matter for fraud detection? The mechanism is the same. If we use billing volume as a proxy for fraud risk, we systematically flag providers who serve populations with legitimately high healthcare utilization. Safety-net providers, pediatric specialists, behavioral health practices in underserved areas all bill high because their patients need more care.
The same research team later published an “Algorithmic Bias Playbook” with the Federal Trade Commission (FTC) [2]. They named this mechanism “label choice bias”: a mismatch between what we want to predict (fraud) and what the algorithm predicts (high spending). In our literature search, no published paper applies this framework to healthcare fraud detection, even though the mechanism is identical.
The equity risk sharpens in specific service categories. Autism services are the clearest example.
The Autism Provider Problem
Consider what happens when we look at autism services.
DOGE framed its dataset release around the Minnesota autism fraud scheme as proof of concept. Within hours, amateur analysts began flagging clinics “diagnosing hundreds of children with autism per month” at residential addresses. This primes every subsequent analyst to treat high-volume autism billing as inherently suspicious.
The Minnesota autism fraud was real. Federal indictments allege providers billed tens of millions for services never delivered, phantom billing on an industrial scale. The question is what happens next: whether the detection methods that follow from this dataset can distinguish the fraud from the 99%+ of autism providers billing legitimately.
What do legitimate autism services look like in billing data?
Best-practice Applied Behavior Analysis requires 25 to 40 hours of therapy per week, per child, for one to three years. Most of those hours are delivered by Registered Behavior Technicians (RBTs), supervised by a Board Certified Behavior Analyst (BCBA). A BCBA overseeing 12 clients at 30 hours per week generates 360 billable hours weekly across the practice, hours that aggregate under the agency’s or supervising provider’s billing NPI. In the Medicaid spending file, that single billing NPI appears as an extreme statistical outlier to anyone unfamiliar with ABA’s supervised delivery model.
Under the Early and Periodic Screening, Diagnostic, and Treatment (EPSDT) mandate, Medicaid must cover medically necessary services for children at whatever intensity is clinically prescribed. By 2022, all state Medicaid programs covered ABA therapy. The predictable result: states have seen ABA spending growth between 400% and 2,800% in recent years. Indiana’s ABA Medicaid spending went from $21 million to $611 million between 2017 and 2023. North Carolina projects $639 million in ABA spending for FY 2026, a 423% increase [14].
Spending increases of 400% to 2,800% over five to seven years are the kind of growth rates that statistical fraud screens are designed to catch. In any other context, numbers like these would warrant immediate investigation. Here, they reflect coverage expansion meeting rising prevalence.
Autism prevalence itself varies five-fold across Centers for Disease Control and Prevention (CDC) surveillance sites: from 1-in-19 among 8-year-olds in San Diego to 1-in-103 in Laredo, Texas [6]. Researchers attribute this variation mainly to screening infrastructure and diagnostic capacity, not actual prevalence differences. California’s Get SET Early model trained hundreds of pediatricians to screen and refer, catching cases that would have gone undiagnosed elsewhere.
The implication for fraud screening: providers in high-screening states (California, New Jersey, Minnesota) will naturally show higher autism billing volumes. A naive cross-state comparison flags them as outliers. The DOGE dataset provides no clinical context to distinguish 360 hours per week of legitimate therapy from 360 hours of fabricated services.
“Improper” and “Fraudulent” Are Different Things
HHS-OIG audits of ABA Medicaid payments in Indiana, Wisconsin, and Maine found tens of millions in “improper” payments [7]. Indiana alone had $56 million. These numbers will almost certainly appear in crowdsourced analyses as evidence of fraud.
The vast majority of improper ABA payments are documentation failures: unsupported Current Procedural Terminology (CPT) code billing, missing session notes, absent provider signatures. Some reflect billing for individual therapy when records showed group sessions. Roughly 82% of improper Medicaid payments system-wide stem from insufficient documentation rather than fabricated services [16]. The OIG distinguishes between providers who delivered care with poor documentation and providers who fabricated claims entirely.
A related problem comes from Medically Unlikely Edits (MUEs). The Centers for Medicare & Medicaid Services (CMS) caps each code at the maximum units a provider would likely bill per patient per day. All ABA CPT codes carry an adjudication indicator meaning payors should still pay claims above the MUE when medically necessary. Some payors treat MUEs as hard denial limits anyway, forcing providers to reduce treatment intensity. A recent study documents this pattern: statistical thresholds designed for billing error detection get repurposed as tools to deny legitimate care [8]. The crowdsourced fraud detection model follows the same logic.
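The two readings of an MUE can be made concrete in a few lines. This is a toy adjudication rule, not any payor's actual logic, and the 32-unit limit is an assumed value:

```python
# Toy contrast between the two readings of an MUE described above:
# a cap that yields to documented medical necessity versus a hard denial limit.
MUE_LIMIT = 32  # assumed max units per patient per day for one CPT code

def adjudicate(units, medically_necessary, hard_cap=False):
    """Return 'pay' or 'deny' for one day's units of a single code."""
    if units <= MUE_LIMIT:
        return "pay"
    if hard_cap:
        return "deny"  # MUE misread as an absolute limit
    return "pay" if medically_necessary else "deny"  # indicator honored

# The adjudication indicator says to pay above the MUE when necessary...
assert adjudicate(40, medically_necessary=True) == "pay"
# ...but a payor treating the MUE as a hard cap denies the same claim.
assert adjudicate(40, medically_necessary=True, hard_cap=True) == "deny"
```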
The patterns visible in the data are already misleading. The patterns invisible in the data present a different problem.
What Doesn’t Appear in the Data
Personal care service (PCS) attendants account for 34% of Medicaid Fraud Control Unit (MFCU) fraud convictions in fiscal year (FY) 2023 and 36% in FY 2024, despite being one of roughly 80 provider types [5]. This is the single largest source of detected Medicaid fraud.
In the public spending data, PCS fraud is nearly invisible.
PCS attendants typically bill through agencies. The agency’s NPI appears in the billing data; the individual attendant does not. A Government Accountability Office (GAO) report found $4.9 billion in PCS claims with no identified service provider [10]. States use over 400 different procedure codes for personal care services, making cross-state comparisons unreliable. In consumer-directed programs, beneficiaries hire their own attendants (often family members), creating billing patterns unlike any other provider type.
The contrast matters. The fraud type investigators catch most often is the one least visible in the public data. The fraud types most visible in the data, high-volume billing and extreme procedure concentration, disproportionately flag legitimate providers in high-need specialties. Enforcement selection bias runs in both directions.
When Detection Goes Wrong
What happens when automated or crowdsourced fraud detection gets it wrong?
Michigan’s MiDAS unemployment fraud detection system, deployed from 2013 to 2015, made over 40,000 automated fraud accusations. When the state auditor general reviewed 22,427 of those cases, it reversed 93% [11]. The system conflated intentional fraud with innocent mistakes. Consequences included evictions, destroyed credit, bankruptcies, and suicides.
The Dutch childcare benefits scandal (2013-2019) used an algorithm that weighted “foreign sounding names” and “dual nationality” as fraud indicators. The system wrongly accused over 26,000 innocent families. Authorities removed more than 2,000 children from their parents. The government acknowledged institutional racism. The entire cabinet resigned [12].
An analysis of 650,000 Medicaid dental claims found that 5 of the top 17 flagged providers (29%) were false positives [9]. Legitimate provider characteristics, not fraud, explained their billing patterns.
We can already see consequences in Minnesota. Following the autism and childcare fraud indictments, the state Department of Human Services paused payments to providers across 14 Medicaid service categories. Disabled Minnesotans lost Integrated Community Support services with no transition plan. Providers sued DHS over withheld payments. A Minnesota Disability Law Center attorney described it: “There was no real transition; they just stopped providing the services” [13].
The crowdsourced model amplifies these risks. Thousands of amateur analysts, with no clinical expertise and no methodology documentation, flag providers by name and address through NPI cross-referencing. The False Claims Act allows private citizens to sue on behalf of the government and collect up to 30% of recovered funds. That financial incentive rewards volume of accusations over accuracy.
The Pattern
So what do we actually learn from the billing data, once we account for all this noise? Fraud-excluded providers bill at more than double the per-month rate of non-excluded peers, regardless of when enforcement acted. That intensity signal survives label cleaning and adjustment for observation windows. The four layers below describe the noise around it.
At least four layers of selection shape what we see in the billing data:
- Privacy suppression. Small providers fall below the 12-claim threshold and disappear entirely.
- Label contamination. The exclusion lists mix fraud with license revocations, loan defaults, and administrative actions. The “excluded” group produces a contaminated fraud signal.
- Enforcement bias. Certain fraud types (PCS ghost patients, controlled substance diversion) are easier to detect and prosecute than others (upcoding, medical necessity fraud). The labels over-represent easy cases.
- Proxy variable bias. Billing volume as a fraud indicator systematically flags providers serving high-need populations, exactly the providers whose billing should be high.
Descriptive analysis alone, before any machine learning, exposes these layers. The most common approach is to train a classifier on LEIE labels, rank providers by predicted fraud probability, and investigate the top of the list. That approach will likely disproportionately flag autism providers, pediatric specialists, and safety-net practices, while the PCS fraud that accounts for a third of actual convictions slips through.
In Post 4, we test whether a classifier can formalize this intensity signal into a prospective screening tool, and whether the signal is fraud-specific or an artifact of enforcement timing.
References
1. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
2. Obermeyer, Z., et al. (2020). Algorithmic bias playbook. Presented at FTC PrivacyCon.
3. Shekhar, S., Leder-Luis, J., & Akoglu, L. (2026). Can machine learning target health care fraud? Evidence from Medicare hospitalizations. Journal of Policy Analysis and Management, 45(1).
4. Johnson, J. M., & Khoshgoftaar, T. M. (2023). Data-centric AI for healthcare fraud detection. SN Computer Science, 4(4), 389.
5. OIG OEI-09-24-00200. (2024). Medicaid Fraud Control Units fiscal year 2023 annual report.
6. CDC. (2025). Autism spectrum disorder prevalence, 2022. MMWR Surveillance Summaries, 74(SS-2).
7. Benesch Law. (2025). OIG finds significant improper Medicaid payments for ABA services.
8. Serdyuk, A., et al. (2025). Preventing insurance denials of applied behavior analysis based on misuse of medically unlikely edits. Behavior Analysis in Practice, 18, 579-590.
9. Skaff, L., et al. (2012). A case study in healthcare fraud detection. International Journal of Knowledge-Based Organizations, 2(1), 1-15.
10. GAO-17-169. (2017). Personal care services: CMS needs to do more to reduce the risk of harm to Medicaid beneficiaries.
11. Michigan Auditor General. (2016). Michigan Unemployment Insurance Agency fraud detection performance audit.
12. Amnesty International. (2021). Xenophobic machines: Discrimination through unregulated use of algorithms in the Dutch childcare benefits scandal.
13. Minnesota Reformer. (2025, December 3). Spooked by fraud, DHS neglects its responsibilities toward disabled Medicaid recipients.
14. North Carolina Division of Health Benefits. (2025). ABA services utilization report. Cited in KFF Health News, December 2025.
15. HHS-OIG. (2024). Indiana made at least $56 million in improper fee-for-service Medicaid payments for ABA. Report A-09-22-02002.
16. GAO-24-107487. (2024). Medicare and Medicaid: Additional actions needed to enhance program integrity and save billions.
Updated March 2026 with validated results from Cholette (2026).
This series is based on the peer-reviewed working paper: Cholette, V. (2026). What Do Medicaid Fraud Classifiers Actually Detect? SSRN Working Paper.
Suggested Citation
Cholette, V. (February 2026). What Billing Patterns Actually Look Like. Too Early To Say. https://tooearlytosay.com/research/methodology/medicaid-billing-patterns/