Where Does Enrichment Data Come From? A Guide to Third-Party Data Sources

Dennis Hegstad
Founder, sonarID · February 27, 2026

Customer enrichment data comes from a handful of distinct source types: corporate records and business registries, public social network profiles and their APIs, public records and licensing databases, geographic and demographic datasets tied to zip codes, IP and device signals, and aggregated commercial data brokers who license and resell the above. An enrichment provider takes an input you already have, usually an email address and a shipping address from an order, and matches it against these sources to return a fuller picture: the person's likely employer, job title, social presence, neighborhood income band, and the signals that suggest whether they are an executive, an influencer, a journalist, or an affluent buyer.

For a Shopify merchant, the practical answer is that no single magic database exists. Enrichment is a matching and aggregation problem. Each source contributes one slice of identity, and a good provider stitches those slices together, scores the confidence of each match, and discards the noise. This guide walks through every major source category, explains what each one can and cannot tell you, and shows how a tool like SonarID uses a layered approach to keep cost low and accuracy high while staying on the right side of privacy expectations. If you want the conceptual foundation first, start with what identity data actually is in ecommerce, then come back here for the plumbing.

Corporate Records And Business Registries

The richest signal in B2B-flavored enrichment is the corporate domain. When a customer checks out with an email like jane@stripe.com, the domain itself is a public, structured fact: it maps to a company, and that company has a size, an industry, and a funding profile that are documented in business registries, government filings, and commercial company databases. Providers maintain large indexes of domain-to-company mappings, often built by crawling company websites, parsing public filings, and licensing business directories.

This is why email-domain matching is the backbone of free-tier signal layers. It does not require a paid per-lookup call to a third party for the common cases, because the domain-to-company relationship can be precomputed and stored. SonarID treats this as part of its no-cost signal layer alongside spend analysis and affluent-zip matching. If you want the deep mechanics of why a corporate domain is so revealing, and why a Gmail address is not a dead end either, read how email domain matching works. The short version: a corporate domain can identify a customer's employer with high confidence, which is exactly how you spot founders, executives, and investors hiding in your order feed. For the B2B angle specifically, see detecting corporate email domains.

The limitation is obvious. Most consumer orders use free webmail providers like Gmail, Yahoo, or iCloud. Corporate records do nothing for those, which is why enrichment cannot stop at the domain. It is the cheapest and most reliable source, but it only covers a slice of your customers.
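
As a rough illustration of the lookup described above, here is a minimal Python sketch. The two tables are tiny hand-written samples standing in for a provider's precomputed index, and the function names are hypothetical, not part of any real product API.

```python
from typing import Optional

# Hypothetical sample data. A real index would hold far more domains.
FREE_WEBMAIL_DOMAINS = {"gmail.com", "yahoo.com", "icloud.com"}
DOMAIN_TO_COMPANY = {
    "stripe.com": {"company": "Stripe", "industry": "Payments"},
}


def extract_domain(email: str) -> Optional[str]:
    """Return the lowercased domain of an email, or None if it looks malformed."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or "." not in domain:
        return None
    return domain.lower()


def match_email_domain(email: str) -> dict:
    """Classify an email as invalid, free webmail, unknown, or corporate."""
    domain = extract_domain(email)
    if domain is None:
        return {"status": "invalid"}
    if domain in FREE_WEBMAIL_DOMAINS:
        # Corporate records cannot help here; other sources must take over.
        return {"status": "free_webmail", "domain": domain}
    company = DOMAIN_TO_COMPANY.get(domain)
    if company is None:
        return {"status": "unknown_domain", "domain": domain}
    return {"status": "corporate", "domain": domain, **company}
```

Because both tables are stored locally, a call like match_email_domain("jane@stripe.com") needs no paid third-party request, which is the property that makes this layer cheap.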

Public Social Network Profiles And APIs

The second major source is the social graph. Platforms like LinkedIn, Instagram, TikTok, X, and YouTube host enormous amounts of self-published professional and creator data: job titles, follower counts, bios, verified-account status, and content categories. Some of this is available through official platform APIs, some through licensed data partnerships, and some through compliant aggregation of public profile pages.

This is the source that turns a name and email into a creator or press identification. Follower counts and engagement signals tell you whether a customer is a nano-influencer or a public figure with millions of followers. A verified status and a journalist bio tell you that an order came from press before they publish. Job-title fields on professional networks confirm the executive or investor signal that the corporate domain first hinted at. For a deeper look at what each platform actually reveals about a buyer's value, see social profile data and what LinkedIn, TikTok, and Instagram reveal.

Social data is powerful, but it is the most volatile and the most expensive to access responsibly. Profiles change, follower counts inflate, and matching an email to the correct profile carries real risk of false positives. This is where a confidence score matters most. A responsible provider returns a probability, not a certainty, and a responsible merchant treats a low-confidence creator match as a lead to verify rather than a fact to act on.
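
One way to put that rule into practice is to route every social match by its confidence. The sketch below is illustrative only: the SocialMatch fields and both thresholds are assumptions, and a real provider may report confidence in a different shape.

```python
from dataclasses import dataclass


@dataclass
class SocialMatch:
    platform: str
    handle: str
    followers: int
    confidence: float  # assumed: probability in [0, 1] that the profile is the customer


# Hypothetical cutoffs. Tune them against your own verified matches.
ACT_THRESHOLD = 0.9
VERIFY_THRESHOLD = 0.5


def triage(match: SocialMatch) -> str:
    """Return 'act', 'verify', or 'ignore' for a social match."""
    if not 0.0 <= match.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if match.confidence >= ACT_THRESHOLD:
        return "act"
    if match.confidence >= VERIFY_THRESHOLD:
        return "verify"  # a lead to check by hand, not a fact
    return "ignore"
```

The middle band is the important design choice: a borderline creator match goes to a person for review instead of triggering an automated gift or outreach.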

Public Records And Licensing Databases

Beyond companies and social profiles, a large category of enrichment data comes from genuine public records: property records, professional licensing databases, court and regulatory filings, voter and census-adjacent demographic files where legally permitted, and business ownership filings. These sources power signals like home value, profession (a licensed dermatologist, attorney, or financial advisor), and ownership of a registered business.

For specific verticals this is gold. A beauty brand wants to know if a customer is a licensed dermatologist or makeup artist. A B2B-leaning brand wants to know if a buyer owns a registered reseller business. These facts often live in public licensing and registration databases rather than in social or corporate sources. Public records are also the backbone of the affluent-buyer signal when combined with geography, which we cover next.

The responsibility bar here is high. Public-records data is generally legal to use, but it carries strong expectations about purpose and proportionality. Merchants should use it to improve service and prioritization, not to make eligibility decisions about credit, employment, or housing, which are regulated uses. Keeping enrichment in the service-and-marketing lane is what keeps you on the compliant side of that line.

Geographic And Zip-Code Demographic Data

Every order ships somewhere, and that destination is one of the most underrated enrichment sources available. Zip-code and census-tract datasets publish median household income, home values, education levels, and other aggregate demographics. By matching a shipping address to its zip-code profile, an enrichment system can estimate buying power without touching any individual personal record at all.

This is the affluent-zip signal, and it is part of SonarID's free, no-per-lookup layer for a reason: the data is aggregate and public, so it costs nothing per order and raises far fewer privacy concerns than an individual profile lookup. It tells you the neighborhood, not the person, which is precisely why it is both cheap and relatively safe. For the full breakdown of what a residential address reveals about spending capacity, see affluent zip code intelligence.

SonarID weights the shipping address over the billing address because the shipping address is usually where the customer lives. A residence is a far stronger wealth and identity signal than a billing address, which might belong to a corporate card or be a digital-only entry. Address quality matters enormously here, which is why a step like address verification within enrichment sits upstream of the geographic match. A mistyped or undeliverable address poisons the zip signal.
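
A minimal sketch of the zip match, assuming US-style zip codes. The zip codes and income figures below are placeholders, not real census values, and the affluent_floor cutoff is an arbitrary assumption.

```python
import re
from typing import Optional

# Placeholder zip codes and medians for illustration only.
ZIP_MEDIAN_INCOME = {"00001": 250_000, "00002": 62_000}


def normalize_zip(raw: str) -> Optional[str]:
    """Accept '12345' or '12345-6789' and return the 5-digit zip, else None."""
    m = re.fullmatch(r"\s*(\d{5})(?:-\d{4})?\s*", raw)
    return m.group(1) if m else None


def income_band(raw_zip: str, affluent_floor: int = 150_000) -> str:
    """Map a shipping zip to a coarse band using aggregate data only."""
    zip5 = normalize_zip(raw_zip)
    if zip5 is None:
        return "invalid_zip"  # a bad address should not produce a signal
    median = ZIP_MEDIAN_INCOME.get(zip5)
    if median is None:
        return "unknown"
    return "affluent" if median >= affluent_floor else "standard"
```

Returning "invalid_zip" instead of guessing reflects the point about address quality: a malformed zip yields no signal, not a wrong one.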

IP, Device, And Behavioral Signals

A different class of source comes not from external databases but from the session itself. IP address geolocation can corroborate or contradict a shipping address, flag VPN and proxy usage, and hint at corporate networks. Device and behavioral signals, like time-of-day patterns, order frequency, and basket composition, are first-party data you already own and that no third party needs to supply.

These signals are most valuable for two jobs. The first is fraud and chargeback prevention, where a mismatch between IP geography and shipping address is a classic risk flag. The second is reseller and wholesale detection, where repeat high-volume ordering patterns reveal a buyer who is restocking, not personally consuming. Behavioral signals are entirely yours, which makes them the cleanest source from a privacy standpoint and a natural complement to externally sourced identity data. For where this line sits, compare third-party enrichment against first-party data.
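
Both jobs can be sketched with first-party order data alone. In the example below, the Order fields, the country-level comparison, and every threshold are assumptions chosen for illustration. The IP country is assumed to come from a geolocation lookup performed elsewhere.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Order:
    placed_on: date
    units: int
    ip_country: str    # assumed: ISO country code from an IP geolocation step
    ship_country: str  # ISO country code from the shipping address


def ip_shipping_mismatch(order: Order) -> bool:
    """Flag orders whose IP country differs from the shipping country."""
    return order.ip_country.upper() != order.ship_country.upper()


def looks_like_reseller(orders: List[Order], window_days: int = 90,
                        min_orders: int = 3, min_units: int = 20) -> bool:
    """True if enough bulk orders fall inside the most recent window."""
    if not orders:
        return False
    latest = max(o.placed_on for o in orders)
    cutoff = latest - timedelta(days=window_days)
    bulk = [o for o in orders
            if o.placed_on >= cutoff and o.units >= min_units]
    return len(bulk) >= min_orders
```

A mismatch flag is a prompt for review, not proof of fraud: travelers and gift senders trip it too.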

Commercial Data Brokers And Aggregators

Underneath many enrichment APIs sits a layer of commercial data aggregators who license, clean, and resell data drawn from the categories above. They consolidate corporate, social, public-record, and demographic data into queryable indexes, and most enrichment providers blend their own first-party crawling with one or more of these licensed feeds.

This is the layer merchants should ask the hardest questions about. Where did the broker source the data? Do they honor deletion and opt-out requests? Do they pass those obligations downstream? The provenance of broker data is exactly what regulators scrutinize, and it is why a merchant's compliance posture depends on the provider's compliance posture. Before you rely on any enrichment source, read GDPR and CCPA compliance for customer enrichment so you know what your provider must be able to demonstrate.

How A Layered System Uses These Sources Responsibly

The art of enrichment is not finding one perfect source. It is layering cheap, low-risk sources first and reaching for expensive, higher-risk ones only when they add real value. SonarID is built this way on purpose. The free signal layer combines corporate-domain matching, spend and lifetime-value analysis, and affluent-zip matching, none of which carries a per-lookup cost and all of which draw on aggregate or already-owned data. Only when those signals suggest a customer is genuinely worth a closer look does the system spend on a full paid enrichment, priced at five cents each, to pull the social and public-record detail that confirms who the person is. Every plan caps how many of those paid enrichments run, so cost stays predictable rather than scaling without limit.

That ordering is both an economic decision and an ethical one. It keeps cost per identified VIP low, and it minimizes how often individual-level third-party data is touched. A merchant identifying a handful of true VIPs from thousands of orders should not be running a paid identity lookup on every single shopper, and a well-designed system does not. This is the heart of a privacy-first approach to customer intelligence: use the least invasive source that answers the question, and stop there.
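
The gating logic can be sketched in a few lines. This is not SonarID's actual implementation: the scoring rule, the spend_floor, the min_score, and the monthly_cap are all assumptions. Only the five-cent price comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAID_ENRICHMENT_CENTS = 5  # five cents per paid enrichment, as stated above


@dataclass
class FreeSignals:
    corporate_domain: bool
    affluent_zip: bool
    lifetime_spend: float


def free_score(s: FreeSignals, spend_floor: float = 500.0) -> int:
    """Count how many free signals fired (0 to 3). Hypothetical scoring rule."""
    return (int(s.corporate_domain) + int(s.affluent_zip)
            + int(s.lifetime_spend >= spend_floor))


def plan_paid_enrichments(customers: List[FreeSignals], min_score: int = 2,
                          monthly_cap: int = 100) -> Tuple[List[int], int]:
    """Return (indexes chosen for paid enrichment, total cost in cents)."""
    ranked = sorted(range(len(customers)),
                    key=lambda i: free_score(customers[i]), reverse=True)
    qualified = [i for i in ranked if free_score(customers[i]) >= min_score]
    selected = qualified[:monthly_cap]  # the cap bounds the spend
    return selected, len(selected) * PAID_ENRICHMENT_CENTS
```

Under these assumptions the worst-case bill is the cap times five cents, so a cap of 100 can never cost more than 500 cents, however many orders arrive.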

Responsible use also means treating enrichment output as probabilistic. Every source contributes a confidence-weighted signal, and the score that SonarID surfaces reflects that uncertainty. You act with high confidence on a verified press contact or a clear corporate-domain match. You verify before acting on a borderline social match. For the broader picture of how all of this fits into a Shopify data strategy, the customer data enrichment guide for Shopify ties the sources, the scoring, and the workflows together.
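
To make "confidence-weighted" concrete, here is one common way to combine several per-source confidences into a single score, sometimes called a noisy-OR. It is a sketch under a strong assumption, that the sources are independent, and it is not a description of how SonarID computes its score.

```python
from math import prod
from typing import Iterable


def combine_confidences(confidences: Iterable[float]) -> float:
    """Noisy-OR: probability that at least one independent signal is correct."""
    values = list(confidences)
    for c in values:
        if not 0.0 <= c <= 1.0:
            raise ValueError("each confidence must be between 0 and 1")
    # The chance that every signal is wrong is the product of (1 - c).
    return 1.0 - prod(1.0 - c for c in values)
```

Two weak signals at 0.5 each combine to 0.75, which is stronger than either alone but still short of a single verified match at 0.9. Correlated sources, such as two brokers reselling the same feed, would make this estimate too optimistic.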

What This Means For Your Store

You do not need to license a dozen data feeds or build a matching pipeline to benefit from enrichment. That is the entire point of using an app rather than building one. What you should understand as a merchant is the provenance chain: your order gives an email and an address, those inputs match against corporate, social, public-record, geographic, and behavioral sources, and the result is a scored identity you can act on. Knowing where each signal comes from helps you trust the strong ones, question the weak ones, and explain your practices if a customer or regulator ever asks. Enrichment is not surveillance. Done well, it is the disciplined use of public and first-party signals to recognize the customers who were already buying from you.

Frequently asked questions

Where does customer enrichment data actually come from?

It comes from corporate records and business registries, public social network profiles and APIs, public records and licensing databases, zip-code demographic data, IP and behavioral signals, and commercial data aggregators that license and resell those sources.

Which enrichment source is the cheapest and safest to use?

Aggregate sources like email-domain matching, zip-code demographics, and your own first-party spend and behavioral data are the cheapest and lowest-risk because they cost nothing per lookup and do not touch individual third-party profiles. SonarID uses these as a free signal layer before any paid enrichment.

Can enrichment identify a customer's employer from just an email?

Yes, when the email uses a corporate domain. The domain maps to a documented company in business registries, so an email like jane@company.com reliably reveals the employer. Free webmail addresses like Gmail require other sources to identify a workplace.

Is using third-party enrichment data legal and compliant?

Used for service and marketing prioritization, generally yes. The line to avoid is making regulated eligibility decisions about credit, employment, or housing. Compliance depends heavily on your provider's data provenance and their ability to honor deletion and opt-out requests under GDPR and CCPA.

Why does SonarID weight the shipping address over the billing address?

The shipping address is usually a residence, which is a far stronger wealth and identity signal than a billing address that may be a corporate card or digital-only entry. The residence drives the affluent-zip demographic match that estimates buying power.

Does every order trigger a paid enrichment lookup?

No. A layered system runs free signals first and only spends on a full paid enrichment, priced at five cents each, when those signals indicate a customer is worth a closer look. Each plan also caps the number of paid enrichments, which keeps cost per identified VIP low and predictable.
