In Post 1, we looked at what the Medicaid spending file contains and what it leaves out. The next question follows naturally: to build a classifier that screens providers for fraud, we need labeled training data. Which providers committed fraud? Which ones stayed clean? Any supervised machine learning (ML) approach starts here.
The obvious candidate for “known fraudulent” labels is the Office of Inspector General’s (OIG) List of Excluded Individuals/Entities, the LEIE. The federal government updates this registry monthly, and it contains about 82,000 entries. Nearly every published study that applies supervised ML to healthcare fraud detection uses it as ground truth.
The problem: the LEIE lists excluded providers, and exclusion covers far more ground than fraud.
What Gets a Provider Excluded
The LEIE operates under two sets of authorities from Section 1128 of the Social Security Act. Mandatory exclusions under Section 1128(a) carry a minimum five-year ban:
| Code | Description |
|---|---|
| 1128(a)(1) | Conviction of a program-related crime |
| 1128(a)(2) | Conviction related to patient abuse or neglect |
| 1128(a)(3) | Felony conviction related to healthcare fraud |
| 1128(a)(4) | Felony conviction related to controlled substances |
These are the exclusions most people think of when they hear “LEIE.” Providers who stole from Medicare. Providers who abused patients. Providers convicted of fraud felonies.
But there’s a second set. Permissive exclusions under Section 1128(b) cover a wider range of conduct:
| Code | Description |
|---|---|
| 1128(b)(4) | License revocation, suspension, or surrender |
| 1128(b)(7) | Fraud, kickbacks, and other prohibited activities |
| 1128(b)(14) | Default on a health education loan |
The single largest category in the LEIE is 1128(b)(4): license revocation or suspension. In the current database, over 32,000 entries carry this code, roughly 40% of all exclusions. These are providers who lost their professional licenses for any reason: negligence, substance abuse, administrative failures, lapsed continuing education. Many of them never billed a fraudulent claim.
When we download the LEIE and label every listed provider as “fraudulent,” we’re including tens of thousands of providers whose exclusion had nothing to do with fraud. We’re training our model on contaminated labels.
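As a concrete sketch, separating fraud-related authorities from the rest is a simple filter. The column name `EXCLTYPE` and its code format mirror the standard LEIE CSV download (an assumption; check your copy), and the tiny frame below stands in for the real file:

```python
import pandas as pd

# One defensible fraud-related set: program-related crimes (a)(1), healthcare
# fraud felonies (a)(3), and fraud/kickbacks (b)(7). License actions (b)(4)
# and loan defaults (b)(14) stay out of the positive class.
FRAUD_CODES = {"1128a1", "1128a3", "1128b7"}

# Toy stand-in for the real LEIE download (column names are an assumption).
leie = pd.DataFrame({
    "NPI": ["1111111111", "2222222222", "3333333333", "4444444444"],
    "EXCLTYPE": ["1128a1", "1128b4", "1128b7", "1128b14"],
})

leie["fraud_label"] = leie["EXCLTYPE"].isin(FRAUD_CODES)
print(leie["fraud_label"].sum(), "of", len(leie), "exclusions are fraud-related")
```

Whether (a)(2) patient-abuse convictions belong in the positive class depends on the research question; the point is that the choice must be made explicitly rather than inherited from the raw download.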
What This Does to a Classifier
To see why this matters, consider what a fraud classifier is trying to learn. It takes billing features as input (how much a provider bills, how many patients they see, how concentrated their procedure codes are) and predicts whether the provider is fraudulent.
If the training labels include thousands of providers who lost licenses for non-fraud reasons, the model stops learning “what does fraud look like in billing data?” Instead, it learns “what does any kind of professional trouble look like in billing data?” Those two questions have different answers.
Providers excluded for license problems tend to be smaller, lower-volume practitioners — and those excluded during the 2018-2024 billing panel have shorter observation windows by construction, which further deflates their panel totals. A solo practitioner who loses a license for substance abuse looks nothing like a provider running a billing mill. But both carry the same label in the LEIE. A model trained on this mixture may learn that lower billing volume predicts exclusion, the opposite of the fraud signal we’re looking for.
A preliminary look at the Medicaid spending data bears this out. Training a classifier on all LEIE exclusion types as the positive class, the top SHAP features (which decompose each prediction into individual feature contributions) all point the same direction: lower total claims, lower total paid, fewer unique beneficiaries push toward the “excluded” prediction. Two mechanisms drive this: label contamination from non-fraud exclusions, and the fact that providers excluded mid-panel have fewer months of observable billing. The model learns to spot small providers with short billing histories, missing the providers who committed fraud.
A useful correction is to compare per-month billing rates rather than panel totals. When we do, fraud-excluded providers bill at roughly 140 claims per month — more than double the 63 claims per month for non-excluded providers. The billing intensity signal is real. Panel totals obscure it because they conflate billing rate with observation window length.
Other researchers have noticed this too. One Medicare fraud study used eight exclusion codes as labels, including 1128(b)(4) [2]. The LEIE captures only “brazen, outlandish billing patterns,” while subtle fraud responsible for the majority of financial losses goes undetected [3]. And 38% of providers with fraud convictions remain in medical practice, with 21% never suspended [4]. Even the fraud-specific codes miss a substantial portion of actual fraud.
The Positive-Unlabeled Problem
Even restricting labels to fraud-specific exclusion codes leaves a deeper problem: what the ML literature calls positive-unlabeled (PU) learning.
In standard binary classification, we have positive examples (fraud) and negative examples (not fraud). In healthcare fraud detection, we have positive examples (providers on the LEIE) and unlabeled examples (everyone else). The difference matters: we don’t know which unlabeled providers are truly non-fraudulent and which simply haven’t been caught yet.
A survey of the PU learning literature identifies a foundational assumption: SCAR, or Selected Completely At Random [1]. The idea is that the selection process chooses labeled positives independently of their features. In fraud detection, SCAR breaks down. Providers who commit blatant, high-volume fraud are more likely to get caught and excluded than providers who commit subtle fraud. Providers in states with well-funded Medicaid Fraud Control Units face more scrutiny than providers in states where enforcement resources run thin.
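To make SCAR concrete: when it holds, a classifier trained on “labeled vs. unlabeled” can be rescaled into a fraud-probability estimate by dividing by the label frequency c = p(labeled | fraud), the Elkan–Noto recipe covered in the survey [1]. A self-contained sketch on synthetic data (nothing here touches the real Medicaid files, and SCAR holds by construction):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic world: 1% of providers are fraudulent; enforcement labels only
# 30% of them, independently of features (SCAR, by design).
n = 20_000
y_true = rng.random(n) < 0.01
x = rng.normal(size=(n, 2)) + 4.0 * y_true[:, None]   # fraud shifts features
s = y_true & (rng.random(n) < 0.3)                    # observed LEIE-style label

# Step 1: fit "labeled vs. unlabeled", ignoring that negatives are missing.
clf = LogisticRegression(max_iter=1000).fit(x, s)

# Step 2: estimate c = p(s=1 | y=1) as the mean score on known positives
# (a simplification of Elkan-Noto; biased low when classes overlap).
c = clf.predict_proba(x[s])[:, 1].mean()

# Step 3: under SCAR, p(y=1 | x) = p(s=1 | x) / c.
p_fraud = np.clip(clf.predict_proba(x)[:, 1] / c, 0.0, 1.0)
print(f"estimated label frequency c ~ {c:.2f} (simulation truth: 0.30)")
```

The catch, as the next paragraph argues, is Step 0: in real enforcement data the 30% is not independent of the features, so the division in Step 3 corrects with the wrong constant.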
The practical consequence: the LEIE over-represents blatant billing schemes, personal care attendant fraud, and controlled substance diversion, while under-representing systematic upcoding, medical necessity fraud, and managed care schemes. A classifier trained on LEIE labels inherits these enforcement biases and amplifies them.
As of this writing, no published paper appears to frame LEIE-based fraud detection explicitly as a PU learning problem, though the concept runs through several recent reviews. For providers who are never reviewed, we simply cannot say whether they “did or did not commit fraud” [5]. Researchers have applied PU learning to financial fraud detection, though not yet to healthcare [6]. That gap raises an open question: how much does enforcement selection bias shape what these models learn?
One way to test whether the federal list tells the whole story is to compare it against state-level exclusion data.
Where State Lists Help (and Where They Don’t)
The LEIE is a federal list, but 45 states maintain their own Medicaid exclusion or suspension lists. California’s Suspended and Ineligible (S&I) list, for example, contains about 22,000 entries. State lists capture a different slice: Medi-Cal billing violations, state license actions, contractual breaches that fall below the threshold for federal exclusion. Like the LEIE, they mix fraud with non-fraud actions, so the label contamination problem persists.
The overlap between state and federal lists turns out to be thin. Only about 6% of names on California’s S&I list appear on the LEIE. According to compliance vendors who track these databases, roughly 50% of state-level exclusions never appear on the federal list. (This figure comes from ExclusionScreening.com, a vendor source, not peer-reviewed.)
What happens when we combine federal and state lists? In the Medicaid spending data, the current LEIE matches about 1,240 providers by National Provider Identifier (NPI). Adding the California S&I list adds roughly 580 more. Adding an older LEIE snapshot (capturing providers later reinstated) adds a handful. The combined total comes to about 1,817 excluded providers matched to billing data, out of 1.8 million total providers. That’s a 0.1% positive rate.
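Mechanically, combining the lists is a set union on NPI. The sketch below uses made-up identifiers; with the real files you would first normalize each source to 10-digit NPI strings and drop rows without a usable NPI, since not every excluded individual ever obtained one:

```python
# Stand-ins for the three sources (real NPIs are 10-digit identifiers).
leie_npis = {"1111111111", "2222222222", "3333333333"}
ca_si_npis = {"3333333333", "4444444444"}      # CA Suspended & Ineligible
old_leie_npis = {"5555555555"}                 # older snapshot: later reinstated

new_from_ca = ca_si_npis - leie_npis           # what the state list adds
excluded = leie_npis | ca_si_npis | old_leie_npis

print(f"LEIE: {len(leie_npis)}, +CA only: {len(new_from_ca)}, "
      f"combined: {len(excluded)}")
```

The thin overlap quoted above shows up here as a small intersection: most state-list names contribute new positives rather than duplicating federal ones.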
Class Imbalance and What It Means
That 0.1% rate creates an extreme class imbalance problem. Published studies report similar rates:
| Study | Dataset | Fraud Rate |
|---|---|---|
| Johnson & Khoshgoftaar (2023) | Medicare Part B | 0.046% |
| Johnson & Khoshgoftaar (2023) | Medicare Part D | 0.065% |
| Bauder & Khoshgoftaar (2018) | Medicare Part B (2012-15) | 0.009% |
| Tajrobehkar et al. (2024) | Medicare ophthalmology | 0.038-0.074% |
At these rates, a classifier that predicts “not fraud” for every single provider achieves 99.9% accuracy. That number is meaningless. What matters is precision (of the providers flagged, how many are truly fraudulent?) and recall (of the truly fraudulent providers, how many did we catch?).
Class imbalance also means that even seemingly precise models generate large numbers of false positives. At 50% precision, every two flags produce one true positive and one false positive. Given what a Medicaid Fraud Control Unit (MFCU) investigation costs, each false positive has concrete consequences: a provider’s billing gets frozen, patients lose access, and reputations suffer.
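The arithmetic, using illustrative round numbers at the post’s scale (1.8 million providers, about 1,800 positives):

```python
# 0.1% positive rate, as in the matched Medicaid data.
n_providers, n_fraud = 1_800_000, 1_800

# Degenerate classifier: never flag anyone.
accuracy = (n_providers - n_fraud) / n_providers
print(f"flag-no-one accuracy: {accuracy:.3%}, recall: 0%")

# A hypothetical model that flags 2,000 providers and is right half the time:
flagged, true_pos = 2_000, 1_000
precision = true_pos / flagged               # 0.50
recall = true_pos / n_fraud                  # ~0.556: just over half caught
false_positives = flagged - true_pos         # 1,000 needless investigations
print(f"precision: {precision:.0%}, recall: {recall:.1%}, "
      f"false positives: {false_positives}")
```

Even this optimistic hypothetical model, far better than anything trained on contaminated labels, generates a thousand wrongly flagged providers.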
Why such extreme imbalance? Because the labels come from enforcement, and enforcement has its own patterns.
Enforcement Patterns Shape the Labels
The MFCU annual report for fiscal year (FY) 2023 gives us a window into what enforcement actually looks like [7]:
| Statistic | Value |
|---|---|
| Total convictions | 1,143 |
| Fraud convictions | 814 (71%) |
| Patient abuse/neglect convictions | 329 (29%) |
| Total recoveries | $1.2 billion |
| Return on investment | $3.35 per $1 spent |
| PCS attendants as % of fraud convictions | 34% |
Personal care service attendants account for 34% of fraud convictions despite being one of roughly 80 provider types. The likely explanation: PCS fraud is easier to detect (ghost patients, fabricated timesheets) and easier to prosecute (clear paper trails, lower legal complexity). So the labels inherit this enforcement pattern: the LEIE reflects what investigators prioritize, which is a narrow slice of actual fraud.
To put this in perspective: the National Health Care Anti-Fraud Association estimates that healthcare fraud consumes 3-10% of total healthcare expenditures, possibly exceeding $300 billion annually [8]. MFCU recoveries of $1.2 billion represent less than half a percent of that estimate. The gap between estimated fraud and detected fraud dwarfs what enforcement can reach, and every undetected fraudulent provider sits in our “unlabeled” class, indistinguishable from legitimate ones.
What This Means for the Series
The label problem constrains everything that follows. In Post 3, when we compare billing patterns of excluded versus non-excluded providers, the comparison is between providers labeled as excluded (a heterogeneous group including fraudsters, substance abusers, and loan defaulters) and providers not labeled as excluded (a group that includes both legitimate providers and undetected fraud).
In Post 4, when we train classifiers, label quality caps what the model can achieve. A perfect classifier on imperfect labels still produces imperfect results.
The more careful approach: treat the LEIE as a noisy, enforcement-biased signal that captures a non-random subset of problematic providers. It remains the best available label. But the published literature and the amateur analyses of the DOGE dataset too often conflate “the best we have” with “ground truth.”
References
- Bekker, J. & Davis, J. (2020). Learning from positive and unlabeled data: A survey. Machine Learning, 109(4), 719-760.
- Johnson, J.M. & Khoshgoftaar, T.M. (2023). Data-centric AI for healthcare fraud detection. SN Computer Science, 4(4), 389.
- Tajrobehkar, M. et al. (2024). Utilization analysis and fraud detection in Medicare via machine learning. medRxiv, 2024.12.30.24319784. [Preprint; not peer-reviewed.]
- du Preez, A. et al. (2025). Fraud detection in healthcare claims using machine learning: A systematic review. Artificial Intelligence in Medicine, 160, 103061.
- Kumaraswamy, N. et al. (2022). Healthcare fraud data mining methods: A look back and look ahead. Perspectives in Health Information Management, 19(1), 1i.
- Vinay, M.S., Yuan, S., & Wu, X. (2022). Fraud detection via contrastive positive unlabeled learning. IEEE International Conference on Big Data.
- OIG OEI-09-24-00200. (2024). Medicaid Fraud Control Units fiscal year 2023 annual report.
- NHCAA. (n.d.). The challenge of health care fraud. https://www.nhcaa.org/tools-insights/about-health-care-fraud/the-challenge-of-health-care-fraud/
Updated February 25, 2026. The original analysis compared panel totals without accounting for observation window length. Providers excluded during the panel had shorter billing histories by construction, deflating their totals. Per-month billing rates — which correct for this — show fraud-excluded providers bill at more than double the non-excluded rate regardless of when enforcement occurred. Revised text reflects this correction.
Suggested Citation
Cholette, V. (February 2026). The Label That Isn't: Why "Excluded" Doesn't Mean "Fraudulent". Too Early To Say. https://tooearlytosay.com/research/methodology/medicaid-fraud-labels/