How it works

Methodology

Our platform processes 2,684,826 SAM.gov entity registrations across all 50 states and DC, groups them by normalized physical address, and cross-references every cluster against 167,681 federal exclusion records to produce systematic state-level risk landscape reports.

Data sources

2,684,826

SAM.gov entity registrations

Active and expired System for Award Management registrations, including legal name, UEI, physical address, NAICS codes, and registration dates, across all 50 states and DC.

$5.08B

USAspending.gov contracts

Federal prime award summaries (FY2021-2026) including contract values, awarding agencies, set-aside designations, and contractor identification.

167,681

SAM exclusion records

Federal exclusion and debarment list including entity name, UEI, excluding agency, exclusion type, action date, and termination date.

Address normalization

Before clustering, all physical addresses are normalized by stripping secondary-unit designators: suite, unit, apartment, room, floor, and building numbers. This ensures that entities at "525 Corporate Dr STE 201" and "525 Corporate Dr STE 203" are correctly grouped as co-located rather than treated as separate addresses. This single processing step increased our cluster count from 17,647 to 70,115 and our confirmed findings from 37 to 129.
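As an illustration of the normalization step, here is a minimal sketch that drops secondary-unit designators token by token. The DESIGNATORS set and the normalize_address function are hypothetical; the page names the designator categories but not the exact rules the platform applies.

```python
import re

# Hypothetical designator keywords; the methodology names the categories
# (suite, unit, apartment, room, floor, building) but not the exact list.
DESIGNATORS = {
    "STE", "SUITE", "UNIT", "APT", "APARTMENT",
    "RM", "ROOM", "FL", "FLR", "FLOOR", "BLDG", "BUILDING",
}


def normalize_address(address: str) -> str:
    """Drop secondary-unit designators so co-located entities share one key."""
    # Upper-case, then turn punctuation (except '#') into spaces.
    cleaned = re.sub(r"[^\w#\s]", " ", address.upper())
    tokens = cleaned.split()
    kept = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in DESIGNATORS or tok == "#":
            i += 2  # skip the keyword (or bare '#') and the identifier after it
            continue
        if tok.startswith("#"):
            i += 1  # "#203": the identifier is attached to the symbol
            continue
        kept.append(tok)
        i += 1
    return " ".join(kept)


print(normalize_address("525 Corporate Dr., Ste. 203"))  # 525 CORPORATE DR
```

This sketch should only be applied to the street line, not to a combined "street, city, state" string, because "FL" doubles as the Florida abbreviation.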

Entity clustering

Entities are grouped by normalized physical address plus city. A cluster is formed when three or more uniquely named entities share the same normalized address and city. Each cluster receives an automated risk score (0-100) based on entity count, coordinated SAM expiration dates, mixed contract status, total contract value, set-aside certification flags, NAICS code concentration, and the number of entities holding active contracts.
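The clustering rule and the listed scoring factors can be sketched as follows. This is a toy model under stated assumptions: the Entity fields are hypothetical, addresses are assumed to arrive already normalized, and every weight and threshold in risk_score is invented for illustration, since the page lists the factors but not how they are weighted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical record shape; norm_address is assumed pre-normalized.
    name: str
    norm_address: str
    city: str
    expiration: str        # SAM expiration date, ISO "YYYY-MM-DD"
    naics: str             # primary NAICS code
    contract_value: float  # total contract value in USD
    has_active_contract: bool
    set_aside: bool        # holds a set-aside certification flag


MIN_UNIQUE_NAMES = 3  # from the methodology: three or more uniquely named entities


def build_clusters(entities):
    """Group by (normalized address, city); keep groups with >= 3 unique names."""
    groups = defaultdict(list)
    for e in entities:
        groups[(e.norm_address, e.city.strip().upper())].append(e)
    return {
        key: members
        for key, members in groups.items()
        if len({m.name.strip().upper() for m in members}) >= MIN_UNIQUE_NAMES
    }


def risk_score(members):
    """Toy 0-100 score. Every weight and cutoff here is an illustrative guess."""
    n = len(members)
    score = min(n, 10) * 3                                   # up to 30: entity count
    top_exp = Counter(m.expiration for m in members).most_common(1)[0][1]
    score += 15 if top_exp >= 3 else 0                       # coordinated expirations
    active = sum(m.has_active_contract for m in members)
    score += 10 if 0 < active < n else 0                     # mixed contract status
    score += min(active, 5) * 2                              # up to 10: active holders
    total = sum(m.contract_value for m in members)
    score += 15 if total >= 1_000_000 else 0                 # total contract value
    score += 10 if any(m.set_aside for m in members) else 0  # set-aside flags
    top_naics = Counter(m.naics for m in members).most_common(1)[0][1]
    score += 10 if top_naics / n >= 0.5 else 0               # NAICS concentration
    return min(score, 100)
```

Counting unique names (after trimming and upper-casing) rather than raw rows keeps duplicate registrations of one firm from forming a cluster on their own.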

Exclusion cross-check

Every entity in every cluster is checked against the 167,681 federal exclusion and debarment records by exact UEI match and by exact firm-name match. A positive match immediately elevates the cluster to "confirmed finding" status. Our database contains exclusion records from all major federal agencies, including DLA, SBA, DOS, DOJ, USAF, Army, Navy, DOL, EPA, HHS, and others.
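A minimal sketch of the two-key cross-check, assuming exclusion records are loaded once into lookup sets so that each entity costs two set lookups. The Exclusion shape, the dict-based entity records, and the trim-and-upper-case step before the "exact" name comparison are assumptions, not the platform's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Exclusion:
    # Hypothetical shape of one SAM exclusion record.
    uei: Optional[str]
    firm_name: str
    agency: str


def _key(name: str) -> str:
    # Assumption: the "exact" name match is applied after trimming and upper-casing.
    return name.strip().upper()


def build_index(exclusions):
    """Pre-compute lookup sets so each entity check is constant time."""
    ueis = {x.uei for x in exclusions if x.uei}
    names = {_key(x.firm_name) for x in exclusions}
    return ueis, names


def check_cluster(cluster, index):
    """Return matched entities; any match makes the cluster a confirmed finding."""
    ueis, names = index
    matches = [
        e for e in cluster
        if (e["uei"] and e["uei"] in ueis) or _key(e["name"]) in names
    ]
    return {"confirmed_finding": bool(matches), "matches": matches}
```

Checking both keys catches an excluded firm that re-registered under a new name (UEI hit) as well as one that obtained a new UEI under its old name (name hit).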

Network expansion

Six-ring network mapping engine

Each confirmed cluster is expanded through six concentric rings of relationship discovery to map the full entity network.

Ring 01

Anchor cluster

All entities at the confirmed address with their SAM registration, contract awards, and exclusion status. This is the starting point of the analysis.

Ring 02

Name-stem expansion

Distinctive name components (e.g., unique brand identifiers) are extracted and searched across all SAM entities nationwide. Results are filtered by address overlap or exclusion-list presence. Common words are blocklisted to prevent false matches.
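Name-stem expansion could look roughly like the sketch below. The short COMMON_WORDS set stands in for the unpublished 130+ word blocklist, MIN_STEM_LEN is an invented cutoff, and "address overlap" is assumed to mean sharing one of the anchor cluster's normalized addresses.

```python
import re

# Hypothetical blocklist; the real list (130+ words) is not published,
# so this short sample is illustrative only.
COMMON_WORDS = {
    "LLC", "INC", "CORP", "CO", "COMPANY", "GROUP", "SERVICES", "SOLUTIONS",
    "GLOBAL", "AMERICAN", "NATIONAL", "INTERNATIONAL", "TECHNOLOGIES",
    "ENTERPRISES", "CONSULTING", "SUPPLY", "THE", "AND", "OF",
}
MIN_STEM_LEN = 4  # assumption: very short tokens are too generic to be a brand


def name_stems(name):
    """Distinctive tokens of a firm name after dropping blocklisted words."""
    tokens = re.findall(r"[A-Z0-9]+", name.upper())
    return {t for t in tokens if len(t) >= MIN_STEM_LEN and t not in COMMON_WORDS}


def expand_by_stem(anchor_names, anchor_addresses, all_entities, excluded_ueis):
    """Find entities nationwide sharing a stem with the anchor cluster, kept
    only if they share an anchor address or appear on the exclusion list."""
    stems = set().union(*(name_stems(n) for n in anchor_names))
    hits = []
    for e in all_entities:
        if e["name"] in anchor_names:
            continue  # already part of the anchor cluster
        if not stems & name_stems(e["name"]):
            continue  # no distinctive stem in common
        if e["address"] in anchor_addresses or e["uei"] in excluded_ueis:
            hits.append(e)
    return hits


print(name_stems("Zentrix Global Solutions LLC"))  # {'ZENTRIX'}
```

The blocklist carries most of the precision here: without it, "Global Solutions Inc" would match every firm with "Global" or "Solutions" in its name.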

Ring 03

Exclusion network

Exclusion records from the same agency as the anchor, dated within 45 days of the anchor's exclusion date, are pulled in. A mandatory name-stem filter prevents unrelated entities from being grouped. Entities in an enforcement wave bypass the stem filter only if that wave contains an anchor entity.
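The Ring 03 rules (45-day same-agency window, 14-day wave grouping, stem filter with an anchor-wave bypass) can be sketched as below. How waves are formed is an assumption: this version starts a new wave whenever the gap between consecutive action dates exceeds 14 days. The record dicts and the ring3 and tokens helpers are hypothetical.

```python
import re
from datetime import date, timedelta

MATCH_WINDOW = timedelta(days=45)  # from the methodology
WAVE_GAP = timedelta(days=14)      # from the methodology; gap-based grouping is assumed


def tokens(name):
    return set(re.findall(r"[A-Z0-9]+", name.upper()))


def ring3(anchor, records, anchor_names, anchor_stems):
    """anchor and records are dicts with 'name', 'agency', and 'date'
    (a datetime.date). Returns the non-anchor records Ring 03 keeps."""
    in_window = sorted(
        (r for r in records
         if r["agency"] == anchor["agency"]
         and abs(r["date"] - anchor["date"]) <= MATCH_WINDOW),
        key=lambda r: r["date"],
    )
    # Split the window into enforcement waves at gaps longer than WAVE_GAP.
    waves, current = [], []
    for r in in_window:
        if current and r["date"] - current[-1]["date"] > WAVE_GAP:
            waves.append(current)
            current = []
        current.append(r)
    if current:
        waves.append(current)

    kept = []
    for wave in waves:
        has_anchor = any(r["name"] in anchor_names for r in wave)
        for r in wave:
            if r["name"] in anchor_names:
                continue  # anchor entities are already in Ring 01
            # Stem filter applies unless the wave contains an anchor entity.
            if has_anchor or tokens(r["name"]) & anchor_stems:
                kept.append(r)
    return kept
```

Read literally, the bypass rule means every record in the anchor's own wave skips the stem filter, while waves elsewhere in the 45-day window must still pass it.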

Ring 04

Hub detection

All discovered entities are grouped by normalized address to identify multi-entity hubs across the network. Each hub is characterized by entity count, exclusion count, and geographic location.
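Hub detection is essentially a group-by over the discovered network. In this sketch the MIN_HUB_SIZE threshold of two entities is an assumption (the page does not state one), and the grouping key includes city and state so identical street lines in different cities stay separate.

```python
from collections import defaultdict

MIN_HUB_SIZE = 2  # assumption: the page does not state a hub threshold


def detect_hubs(network):
    """network: entities from earlier rings, each a dict with 'norm_address',
    'city', 'state', and 'excluded' (bool). Returns hubs, largest first."""
    by_addr = defaultdict(list)
    for e in network:
        by_addr[(e["norm_address"], e["city"], e["state"])].append(e)
    hubs = [
        {
            "address": addr,
            "city": city,
            "state": state,
            "entity_count": len(members),
            "exclusion_count": sum(1 for m in members if m["excluded"]),
        }
        for (addr, city, state), members in by_addr.items()
        if len(members) >= MIN_HUB_SIZE
    ]
    # Rank by size, then by how many entities at the hub are excluded.
    return sorted(hubs, key=lambda h: (-h["entity_count"], -h["exclusion_count"]))
```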

Ring 05

Contract overlay

Federal contracts are pulled for every entity in the network, regardless of state. Cross-state contract patterns and total contract value are then calculated for the complete network.
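A minimal sketch of the contract overlay, assuming award rows have already been fetched and carry a UEI, a state, and a dollar value. Treating "state" as the place-of-performance state is an assumption; the page does not say which state field drives the cross-state view.

```python
from collections import defaultdict


def contract_overlay(network_ueis, contracts):
    """contracts: award rows with 'uei', 'state', and 'value' (USD).
    Assumption: 'state' is the place-of-performance state."""
    by_state = defaultdict(float)
    by_entity = defaultdict(float)
    for c in contracts:
        if c["uei"] in network_ueis:  # no state filter: the whole network counts
            by_state[c["state"]] += c["value"]
            by_entity[c["uei"]] += c["value"]
    return {
        "total_value": sum(by_state.values()),
        "states": sorted(by_state),
        "value_by_state": dict(by_state),
        "value_by_entity": dict(by_entity),
    }
```

Filtering only on UEI membership, never on state, is what lets the overlay surface an entity registered in one state that wins work in several others.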

Ring 06

Principal detection

Individuals are identified by name at hub addresses and in exclusion records that match the network's name patterns. Results are restricted to same-agency enforcement actions to prevent false attribution.

Quality assurance

Accuracy measures

The platform prioritizes accuracy over volume. Every design decision reflects the principle that a single false positive undermines trust in the entire system.

Ring 3 precision controls: a 45-day matching window with a mandatory name-stem filter, 14-day wave grouping, and anchor-wave bypass verification. Together these prevent unrelated enforcement actions from being merged.

Virtual office detection: 13 known registered-agent and virtual-office addresses are filtered from findings, with clear documentation. Legitimate office buildings with real exclusion matches are reviewed individually and kept.

Common stem blocklist: more than 130 common words are excluded from brand-family detection to prevent false name-stem matches on generic terms.

Auto-clear known firms: entities matching more than 130 patterns covering universities, government entities, defense contractors, and utilities are automatically cleared from the review queue.

Pre-sale verification: UEI exclusion matches are verified on SAM.gov for 100% of excluded entities. Anchor addresses are confirmed via Google Street View not to be virtual offices. Enforcement dates are verified against source records.

Independent review: the platform's methodology and findings were independently reviewed four times by external reviewers, with the rating rising from 7/10 to 9.5/10 after systematic corrections.
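The auto-clear step can be pictured as a single compiled regex applied before items reach the review queue. The patterns below are a hypothetical sample covering the four named categories; the platform's 130+ patterns are not published.

```python
import re

# Hypothetical sample of auto-clear patterns; the real 130+ patterns are not
# published, so these only illustrate the four categories the page names.
AUTO_CLEAR_PATTERNS = [
    r"\bUNIVERSITY\b", r"\bCOLLEGE\b",                               # universities
    r"\b(?:CITY|COUNTY|STATE) OF\b", r"\bDEPARTMENT OF\b",           # government entities
    r"\bLOCKHEED MARTIN\b", r"\bRAYTHEON\b",                         # defense contractors
    r"\bELECTRIC (?:COOPERATIVE|COMPANY)\b", r"\bWATER DISTRICT\b",  # utilities
]
_AUTO_CLEAR = re.compile("|".join(AUTO_CLEAR_PATTERNS))


def is_auto_cleared(name: str) -> bool:
    """True if the name matches any known-firm pattern."""
    return _AUTO_CLEAR.search(name.upper()) is not None


def review_queue(entity_names):
    """Drop names matching a known-firm pattern before human review."""
    return [n for n in entity_names if not is_auto_cleared(n)]
```

Word-boundary anchors keep a pattern such as UNIVERSITY from firing on unrelated substrings inside longer tokens.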

See the methodology in action

Browse state intelligence reports to see how our clustering, scoring, and exclusion matching produce actionable geographic intelligence.

Questions? info@convergence-data-analytics.com