JSON-LD and Brand Data: Mining Structured Information
What Is JSON-LD?
JSON-LD (JSON for Linked Data) is a W3C standard for encoding linked data as JSON. When embedded in a web page inside a <script type="application/ld+json"> element, it provides machine-readable information about the page's content, most commonly using the vocabulary defined by Schema.org.
For brand intelligence, JSON-LD is a goldmine. It often contains the most accurate and comprehensive information about an organization, including:
- Official company name
- Logo URL
- Contact information
- Social media profiles
- Physical address
- Description
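Before any of this information can be used, the JSON-LD has to be pulled out of the page. The sketch below uses only Python's standard library: it collects the text of every <script type="application/ld+json"> element and parses it, skipping blocks that are not valid JSON. The class and function names are hypothetical, and a production crawler would likely use a more forgiving HTML parser.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # raw JSON strings, one per script element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return parsed JSON-LD objects; blocks that are not valid JSON are skipped."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    parsed = []
    for raw in parser.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # malformed block: other sources will have to fill the gap
    return parsed
```

Ordinary <script> elements are ignored, so inline JavaScript never reaches the JSON parser.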
Why JSON-LD Is Our Top Priority
In our extraction pipeline, we check JSON-LD before any other source. Here is why:
1. Explicit Intent
When a website includes JSON-LD, the site owner is explicitly publishing structured information about the organization. Nothing is inferred or guessed; the data is intentionally provided.
2. Standardized Format
Schema.org provides a consistent vocabulary. Every Organization draws on the same defined set of properties (each site uses only a subset of them), which makes extraction reliable and predictable:
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Fetching Company",
  "url": "https://fetching.company",
  "logo": "https://fetching.company/favicon.svg",
  "email": "hello@fetching.company",
  "sameAs": [
    "https://twitter.com/fetchingcompany",
    "https://github.com/fetchingcompany"
  ]
}
```
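Because the property names are fixed by the vocabulary, mapping an Organization node onto an internal record is mostly a matter of reading known keys. This is a minimal sketch; the BrandProfile record and its field names are hypothetical. It also shows two variations the vocabulary permits: logo given as an ImageObject instead of a URL, and sameAs given as a single string instead of an array.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BrandProfile:
    # Hypothetical record type; fields mirror Schema.org Organization properties.
    name: Optional[str] = None
    url: Optional[str] = None
    logo: Optional[str] = None
    email: Optional[str] = None
    same_as: list = field(default_factory=list)


def profile_from_organization(node: dict) -> BrandProfile:
    """Map a Schema.org Organization node onto a BrandProfile."""
    logo = node.get("logo")
    if isinstance(logo, dict):  # logo may be an ImageObject with a url property
        logo = logo.get("url")
    same_as = node.get("sameAs", [])
    if isinstance(same_as, str):  # a single URL is also valid
        same_as = [same_as]
    return BrandProfile(
        name=node.get("name"),
        url=node.get("url"),
        logo=logo,
        email=node.get("email"),
        same_as=list(same_as),
    )
```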
3. SEO Motivation
Sites add JSON-LD primarily for SEO benefits (rich results, knowledge panels). As a result, the data tends to be well maintained: invalid markup makes a page ineligible for rich results, and misleading markup can draw a manual action from Google.
Schema Types We Extract From
Organization
The most valuable type for brand intelligence. Contains name, logo, description, contact info, and social links.
WebSite
Provides the site name, URL, and sometimes a SearchAction (under potentialAction) describing the site's search functionality.
LocalBusiness
A subtype of Organization (and of Place) that commonly carries properties such as opening hours (openingHours), geo coordinates (geo), and service area (areaServed).
Product
For e-commerce sites, Product schemas reveal pricing and availability (through offers) and brand associations (through brand).
BreadcrumbList
Helps us understand the site's information architecture and hierarchy.
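One way to act on this ranking is to sort extracted nodes by type before reading them. The sketch below assumes a hypothetical priority table that mirrors the order of the list above; note that @type may be a single string or an array. A simple table like this does not recognize LocalBusiness subtypes such as Restaurant, which a fuller implementation would need to map explicitly.

```python
# Hypothetical priority table: lower number = more useful for brand data.
TYPE_PRIORITY = {
    "Organization": 0,
    "LocalBusiness": 0,
    "WebSite": 1,
    "Product": 2,
    "BreadcrumbList": 3,
}
UNRANKED = len(TYPE_PRIORITY)  # types not in the table sort last


def node_types(node: dict) -> list:
    """@type can be a string or a list of strings; always return a list."""
    t = node.get("@type", [])
    return [t] if isinstance(t, str) else list(t)


def node_priority(node: dict) -> int:
    """Best (lowest) rank among the node's declared types."""
    ranks = [TYPE_PRIORITY[t] for t in node_types(node) if t in TYPE_PRIORITY]
    return min(ranks) if ranks else UNRANKED


def sort_by_priority(nodes: list) -> list:
    # sorted() is stable, so nodes with equal rank keep their page order.
    return sorted(nodes, key=node_priority)
```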
Extraction Examples
Social Profiles from sameAs
The sameAs property is the most reliable source for social media links:
```json
{
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://twitter.com/example",
    "https://www.facebook.com/example",
    "https://www.instagram.com/example"
  ]
}
```
We parse these URLs to identify the platform and extract the profile handle or ID.
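A minimal sketch of that parsing step, assuming a small hypothetical host-to-platform table: the hostname identifies the platform, and the first path segment is taken as the handle, except for LinkedIn, whose company and personal pages use /company/<slug> and /in/<slug>. Real profile URLs have many more shapes (mobile subdomains, numeric IDs, channel paths), so treat this as a starting point only.

```python
from urllib.parse import urlparse

# Hypothetical lookup table; extend it for other platforms.
PLATFORMS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "linkedin.com": "linkedin",
    "github.com": "github",
}


def parse_social_url(url: str):
    """Return (platform, handle) for a recognized profile URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    platform = PLATFORMS.get(host)
    if platform is None:
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return None  # bare domain, no profile path
    # LinkedIn nests the slug one level down: /company/<slug> or /in/<slug>.
    if platform == "linkedin":
        if len(segments) >= 2 and segments[0] in ("company", "in"):
            return platform, segments[1]
        return None
    return platform, segments[0].lstrip("@")
```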
Contact Information
JSON-LD can contain multiple contact points with different purposes:
```json
{
  "contactPoint": [
    {
      "@type": "ContactPoint",
      "telephone": "+1-555-123-4567",
      "contactType": "customer service",
      "email": "support@example.com"
    }
  ]
}
```
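Since contactPoint may be a single object or an array, and contactType is free text, a normalization pass helps before choosing a contact to surface. This sketch is illustrative: the output dictionary shape and the preference for "customer service" are assumptions, not part of Schema.org.

```python
def contact_points(node: dict) -> list:
    """Flatten an Organization's contactPoint (object or array) into plain dicts."""
    raw = node.get("contactPoint", [])
    if isinstance(raw, dict):
        raw = [raw]
    points = []
    for cp in raw:
        if not isinstance(cp, dict):
            continue  # skip bare strings or other malformed entries
        points.append({
            # contactType is free text, so normalize case and whitespace.
            "type": (cp.get("contactType") or "").strip().lower() or None,
            "telephone": cp.get("telephone"),
            "email": cp.get("email"),
        })
    return points


def pick_contact(points: list, preferred: str = "customer service"):
    """Prefer the requested contactType, else the first point with any channel."""
    for p in points:
        if p["type"] == preferred:
            return p
    for p in points:
        if p["telephone"] or p["email"]:
            return p
    return None
```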
Logo with Specific Guidelines
Google documents specific guidelines for the Organization logo property:
- Minimum 112x112 pixels
- Image URL must be crawlable
- File format: one supported by Google Images, such as JPG, PNG, or SVG
- Represent the organization, not the website
When a logo meets these criteria, we prioritize it over other sources.
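The checks that can be made without fetching the image are sketched below. The logo value may be a plain URL or an ImageObject with url (or contentUrl), width, and height. The extension list is an assumed subset of the formats Google Images accepts, and missing dimensions are treated as unknown rather than failing, since confirming size and crawlability requires downloading the file.

```python
from urllib.parse import urlparse

MIN_LOGO_PX = 112
# Assumption: a simplified subset of formats supported by Google Images.
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".svg")


def _to_int(value):
    """Parse a width/height value; anything non-numeric counts as unknown."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def logo_candidate(logo):
    """Return (url, meets_guidelines) for a Schema.org logo value, or None."""
    if isinstance(logo, str):
        url, width, height = logo, None, None
    elif isinstance(logo, dict):
        url = logo.get("url") or logo.get("contentUrl")
        width, height = _to_int(logo.get("width")), _to_int(logo.get("height"))
    else:
        return None
    if not url:
        return None
    format_ok = urlparse(url).path.lower().endswith(ALLOWED_EXTENSIONS)
    # Declared dimensions must meet the minimum; undeclared ones need a fetch to verify.
    size_ok = all(d is None or d >= MIN_LOGO_PX for d in (width, height))
    return url, format_ok and size_ok
```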
Handling Multiple JSON-LD Blocks
Many pages contain multiple JSON-LD blocks, and a single block can use @graph to bundle several entities:
```json
{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "WebSite", ... },
    { "@type": "Organization", ... },
    { "@type": "WebPage", ... }
  ]
}
```
We parse all blocks and merge relevant data, prioritizing Organization and LocalBusiness types.
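A sketch of one possible merge strategy, under assumptions of our own: every block (including one that is a top-level array) is flattened into individual nodes, @graph members are unpacked, Organization and LocalBusiness nodes are moved to the front, and each field takes the first non-empty value found. The field list and the first-value-wins rule are illustrative choices, not the only reasonable ones.

```python
BRAND_TYPES = {"Organization", "LocalBusiness"}
# Hypothetical set of fields worth merging for brand data.
MERGE_FIELDS = ("name", "url", "logo", "description", "email", "telephone", "sameAs")


def iter_nodes(blocks):
    """Yield every node from parsed JSON-LD blocks, unpacking @graph."""
    for block in blocks:
        items = block if isinstance(block, list) else [block]  # a block may be an array
        for item in items:
            if not isinstance(item, dict):
                continue
            if "@graph" in item:
                yield from (n for n in item["@graph"] if isinstance(n, dict))
            else:
                yield item


def _types(node):
    t = node.get("@type", [])
    return {t} if isinstance(t, str) else set(t)


def merge_brand_data(blocks):
    """First non-empty value per field, Organization/LocalBusiness nodes first."""
    nodes = list(iter_nodes(blocks))
    nodes.sort(key=lambda n: 0 if _types(n) & BRAND_TYPES else 1)  # stable sort
    merged = {}
    for node in nodes:
        for key in MERGE_FIELDS:
            if key not in merged and node.get(key):
                merged[key] = node[key]
    return merged
```

With this rule a WebSite node only contributes fields (such as url) that no Organization node supplied.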
Validation and Fallbacks
Not all JSON-LD is valid or complete. We validate extracted data against Schema.org specifications and fall back to other sources when JSON-LD is:
- Missing: approximately 40% of websites have no JSON-LD
- Incomplete: some sites only include WebSite type without Organization
- Incorrect: malformed JSON or invalid property values
When JSON-LD is present and valid, it typically provides the most accurate data. When it is missing or incomplete, we fall back to meta tags, Open Graph properties, and DOM analysis.
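The fallback order can be sketched for a single field. Here the brand name comes from JSON-LD if it is a non-empty string, then from the Open Graph og:site_name meta tag, then from the <title> element. The function names and the returned source labels are hypothetical, and a real pipeline would apply similar chains to each field.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta property|name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta.setdefault(key.lower(), a["content"].strip())  # first wins
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def resolve_brand_name(jsonld_org, html):
    """Return (name, source): JSON-LD first, then Open Graph, then <title>."""
    name = (jsonld_org or {}).get("name")
    if isinstance(name, str) and name.strip():  # minimal validity check
        return name.strip(), "json-ld"
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    if collector.meta.get("og:site_name"):
        return collector.meta["og:site_name"], "open-graph"
    if collector.title.strip():
        return collector.title.strip(), "title"
    return None, None
```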
For Site Owners
If you want your brand to be accurately represented in APIs and search engines, adding comprehensive JSON-LD is one of the most impactful things you can do. Use the Organization type with logo, description, contactPoint, and sameAs properties.
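Google's Structured Data Testing Tool has been retired; use Google's Rich Results Test or the Schema Markup Validator (validator.schema.org) to validate your markup.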
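If your pages are rendered server-side, the markup can be generated rather than hand-written. This minimal sketch builds the Organization object recommended above and wraps it in a script tag; the function name and parameters are hypothetical. It escapes "<" as \u003c, which is still valid JSON, so a value containing "</script>" cannot close the tag early.

```python
import json


def organization_jsonld(name, url, logo, description, same_as, contact=None):
    """Build an Organization JSON-LD <script> tag for embedding in a page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "description": description,
        "sameAs": list(same_as),
    }
    if contact:
        # contact is a dict of ContactPoint properties, e.g. telephone, contactType.
        data["contactPoint"] = [{"@type": "ContactPoint", **contact}]
    # "<" only occurs inside JSON strings, so escaping it keeps the JSON valid.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Run the generated output through a validator before deploying, as with hand-written markup.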