JSON-LD and Brand Data: Mining Structured Information

Jasper Koers 7 min read Engineering

What Is JSON-LD?

JSON-LD (JSON for Linked Data) is a W3C standard for encoding structured data as JSON. Web pages embed it in a <script type="application/ld+json"> tag, where it provides machine-readable information about the page's content. On the web, that information most commonly uses the vocabulary defined by Schema.org.
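To show where that data lives, here is a minimal Python sketch that collects every JSON-LD block from a page using only the standard library's html.parser. The names JsonLdCollector and extract_jsonld are hypothetical, and this is an illustration rather than our production code. Blocks that fail to parse are skipped, leaving the gap for other sources to fill.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attribute values can be None, hence `or ""`.
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            if type_attr.strip().lower() == "application/ld+json":
                self._in_jsonld = True
                self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the open block.
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD block that parses; malformed blocks are dropped."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed = []
    for raw in collector.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # malformed JSON: let meta tags and the DOM fill in
    return parsed
```

Plain <script> tags without the JSON-LD type are ignored, so inline JavaScript never reaches the JSON parser.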

For brand intelligence, JSON-LD is a goldmine. It often contains the most accurate and comprehensive information about an organization, including:

  • Official company name
  • Logo URL
  • Contact information
  • Social media profiles
  • Physical address
  • Description

Why JSON-LD Is Our Top Priority

In our extraction pipeline, we check JSON-LD before any other source. Here is why:

1. Explicit Intent

When a website includes JSON-LD, its owner is explicitly stating structured information about the organization. This is not inferred or guessed; the site owner provides it deliberately.

2. Standardized Format

Schema.org provides a consistent vocabulary. Every Organization draws its properties from the same defined set (name, url, logo, email, sameAs, and so on), which makes extraction reliable and predictable:

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Fetching Company",
  "url": "https://fetching.company",
  "logo": "https://fetching.company/favicon.svg",
  "email": "hello@fetching.company",
  "sameAs": [
    "https://twitter.com/fetchingcompany",
    "https://github.com/fetchingcompany"
  ]
}
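Because the property names are fixed, mapping an Organization node onto an internal record is mostly a lookup. The sketch below uses a hypothetical BrandRecord shape of our own invention. The one wrinkle it handles is that Schema.org lets most properties hold either a single value or an array, and lets logo be either a plain URL or an ImageObject.

```python
from dataclasses import dataclass, field
from typing import Optional


def as_list(value) -> list:
    """Schema.org allows most properties to hold a single value or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


@dataclass
class BrandRecord:
    # Hypothetical output shape for this sketch, not a published schema.
    name: Optional[str] = None
    url: Optional[str] = None
    logo: Optional[str] = None
    email: Optional[str] = None
    same_as: list = field(default_factory=list)


def organization_to_record(node: dict) -> BrandRecord:
    logo = node.get("logo")
    if isinstance(logo, dict):  # logo may be an ImageObject instead of a URL string
        logo = logo.get("url") or logo.get("contentUrl")
    return BrandRecord(
        name=node.get("name"),
        url=node.get("url"),
        logo=logo,
        email=node.get("email"),
        # Keep only string URLs; sameAs may be one string or a list of them.
        same_as=[u for u in as_list(node.get("sameAs")) if isinstance(u, str)],
    )
```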

3. SEO Motivation

Sites add JSON-LD primarily for SEO benefits such as rich results and knowledge panels. As a result, the data is typically well maintained and accurate, since invalid or misleading markup can cost a page its rich results.

Schema Types We Extract From

Organization

The most valuable type for brand intelligence. Contains name, logo, description, contact info, and social links.

WebSite

Provides the site name, URL, and sometimes search functionality details.

LocalBusiness

A more specific type of Organization (and of Place) with additional properties such as opening hours, geo coordinates, and service area.

Product

For e-commerce sites, Product schemas reveal pricing, availability, and brand associations.
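As an illustration, the sketch below reads the brand name and the first offer's price from a Product node. It is a simplification under stated assumptions: brand may be text or a Brand object, offers may be a single Offer or an array, and an AggregateOffer (which uses lowPrice and highPrice) would yield no price here. The product_summary helper is hypothetical.

```python
def product_summary(node: dict) -> dict:
    """Pull brand name, price, currency and availability from a Product node."""
    brand = node.get("brand")
    if isinstance(brand, dict):  # brand can be a Brand/Organization object or plain text
        brand = brand.get("name")
    offers = node.get("offers") or []
    if isinstance(offers, dict):  # a single Offer is often not wrapped in an array
        offers = [offers]
    first = offers[0] if offers else {}  # only the first offer is read in this sketch
    return {
        "brand": brand,
        "price": first.get("price"),
        "currency": first.get("priceCurrency"),
        "availability": first.get("availability"),
    }
```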

BreadcrumbList

Helps us understand the site's information architecture and hierarchy.
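A BreadcrumbList stores its trail as ListItem entries, each with a 1-based position. This minimal sketch (breadcrumb_path is a hypothetical helper) orders the entries by position instead of trusting array order, and also checks for a name nested inside the item property, a pattern some sites use.

```python
def breadcrumb_path(node: dict) -> list:
    """Return breadcrumb names ordered by their ListItem `position`."""
    items = node.get("itemListElement") or []
    if isinstance(items, dict):
        items = [items]
    # position may be an int or a numeric string; missing positions sort first.
    ordered = sorted(items, key=lambda item: int(item.get("position") or 0))
    names = []
    for item in ordered:
        name = item.get("name")
        if name is None and isinstance(item.get("item"), dict):
            name = item["item"].get("name")  # name nested inside the linked Thing
        if name:
            names.append(name)
    return names
```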

Extraction Examples

Social Profiles from sameAs

The sameAs property is the most reliable source for social media links:

{
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://twitter.com/example",
    "https://www.facebook.com/example",
    "https://www.instagram.com/example"
  ]
}

We parse these URLs to identify the platform and extract the profile handle or ID.
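A minimal sketch of that parsing step follows. The PLATFORMS table and the parse_profile helper are hypothetical and cover only a few hosts; real profile URLs come in more shapes than this handles.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host-to-platform table; extend it for the platforms you care about.
PLATFORMS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "github.com": "github",
    "linkedin.com": "linkedin",
}


def parse_profile(url: str) -> Optional[tuple]:
    """Map a sameAs URL to (platform, handle), or None if the host is unknown."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORMS.get(host)
    segments = [s for s in parsed.path.split("/") if s]
    if platform is None or not segments:
        return None
    # LinkedIn pages look like /company/<handle> or /in/<handle>; others are /<handle>.
    if platform == "linkedin" and len(segments) >= 2 and segments[0] in ("company", "in"):
        return platform, segments[1]
    return platform, segments[0].lstrip("@")
```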

Contact Information

The contactPoint property can hold several entries, each tagged with a contactType that describes its purpose:

{
  "contactPoint": [
    {
      "@type": "ContactPoint",
      "telephone": "+1-555-123-4567",
      "contactType": "customer service",
      "email": "support@example.com"
    }
  ]
}
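One way to consume contactPoint is to group its values by contactType. The contact_points function below is a hypothetical sketch: it normalizes contactType casing because the property is free text, and it accepts a single ContactPoint object as well as an array.

```python
def contact_points(org: dict) -> dict:
    """Group telephone and email values by normalized contactType."""
    points = org.get("contactPoint") or []
    if isinstance(points, dict):  # a single ContactPoint may not be wrapped in an array
        points = [points]
    grouped: dict = {}
    for point in points:
        # contactType is free text, so lowercase and trim it before grouping.
        kind = str(point.get("contactType", "general")).strip().lower()
        entry = grouped.setdefault(kind, {"telephone": [], "email": []})
        for key in ("telephone", "email"):
            if point.get(key):
                entry[key].append(point[key])
    return grouped
```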

Logo with Specific Guidelines

Google publishes specific guidelines for the logo property:

  • Minimum 112x112 pixels
  • Image URL must be crawlable
  • File format supported by Google Images, such as JPG, PNG, or SVG
  • Represent the organization, not the website

When a logo meets these criteria, we prioritize it over other sources.
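Part of that check can be automated. The sketch below is hypothetical and only approximates the guidelines: it accepts a logo given as a URL or an ImageObject, requires an absolute http(s) URL, uses the file extension as a stand-in for the real format, and treats SVG files as exempt from the pixel minimum. That SVG rule is our assumption, and whether a logo represents the organization cannot be checked in code at all.

```python
from typing import Optional
from urllib.parse import urlparse

MIN_SIDE = 112  # minimum width and height in pixels, per the guidelines above
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".svg")  # hypothetical whitelist


def _pixels(value) -> Optional[int]:
    """Read a dimension given as a number, a numeric string, or {"value": n}."""
    if isinstance(value, dict):  # QuantitativeValue form
        value = value.get("value")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def meets_logo_guidelines(logo) -> bool:
    """Approximate the logo checks for a URL string or an ImageObject dict."""
    if isinstance(logo, dict):
        url = logo.get("url") or logo.get("contentUrl")
        width, height = _pixels(logo.get("width")), _pixels(logo.get("height"))
    else:
        url, width, height = logo, None, None
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # relative or non-web URLs cannot be crawled as-is
    path = parsed.path.lower()
    if not path.endswith(ALLOWED_EXTENSIONS):
        return False  # the extension is only a proxy for the actual file format
    if path.endswith(".svg"):
        return True  # assumption: vector logos are exempt from the pixel minimum
    if width is None or height is None:
        return False  # unknown size; the image would need to be fetched and measured
    return width >= MIN_SIDE and height >= MIN_SIDE
```

A raster logo without declared dimensions fails this sketch; fetching the image to measure it would be the natural next step.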

Handling Multiple JSON-LD Blocks

Many pages contain multiple JSON-LD blocks, either as separate script tags or combined into a single block with @graph:

{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "WebSite", ... },
    { "@type": "Organization", ... },
    { "@type": "WebPage", ... }
  ]
}

We parse all blocks and merge relevant data, prioritizing Organization and LocalBusiness types.
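The sketch below shows one way to flatten @graph and merge nodes by type priority. The TYPE_PRIORITY table and the function names are hypothetical: Organization and LocalBusiness outrank WebSite, so a WebSite value survives only when no Organization-like node supplies the same property. Subtypes such as Corporation would need their own entries in the table.

```python
# Hypothetical priorities: lower number wins. Unlisted types are ignored.
TYPE_PRIORITY = {"Organization": 0, "LocalBusiness": 0, "WebSite": 1}


def flatten(value) -> list:
    """Expand arrays and @graph containers into a flat list of node dicts."""
    if isinstance(value, list):
        return [node for item in value for node in flatten(item)]
    if isinstance(value, dict):
        if "@graph" in value:
            return flatten(value["@graph"])
        return [value]
    return []


def node_types(node: dict) -> list:
    """@type may be a single string or an array of strings."""
    types = node.get("@type", [])
    return types if isinstance(types, list) else [types]


def merge_brand_nodes(blocks: list) -> dict:
    ranked = []
    for node in flatten(blocks):
        ranks = [TYPE_PRIORITY[t] for t in node_types(node) if t in TYPE_PRIORITY]
        if ranks:  # nodes such as WebPage or BreadcrumbList are skipped here
            ranked.append((min(ranks), node))
    merged = {}
    # Apply lowest-priority nodes first so higher-priority values overwrite them;
    # sorted() is stable, so ties keep document order.
    for _, node in sorted(ranked, key=lambda pair: pair[0], reverse=True):
        for key, value in node.items():
            if not key.startswith("@"):  # drop @type, @id, @context
                merged[key] = value
    return merged
```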

Validation and Fallbacks

Not all JSON-LD is valid or complete. We validate extracted data against Schema.org specifications and fall back to other sources when JSON-LD is:

  • Missing: approximately 40% of websites have no JSON-LD
  • Incomplete: some sites only include WebSite type without Organization
  • Incorrect: malformed JSON or invalid property values

When JSON-LD is present and valid, it typically provides the most accurate data. When it is missing or incomplete, we fall back to meta tags, Open Graph properties, and DOM analysis.
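A fallback chain like this can be resolved field by field rather than page by page, so a page whose JSON-LD lacks a description can still contribute its name. The minimal sketch below assumes per-source extraction has already happened; the source labels and resolve_brand are hypothetical.

```python
# Hypothetical source order mirroring the fallback described above.
SOURCE_ORDER = ("json-ld", "open-graph", "meta", "dom")


def resolve_brand(candidates: dict, fields=("name", "description", "logo")) -> dict:
    """Pick each field from the highest-priority source with a non-empty value.

    `candidates` maps a source label to the fields already extracted from it.
    """
    resolved = {}
    for field_name in fields:
        for source in SOURCE_ORDER:
            value = candidates.get(source, {}).get(field_name)
            if value:  # empty strings and None fall through to the next source
                resolved[field_name] = {"value": value, "source": source}
                break
    return resolved
```

Keeping the source label next to each value lets downstream consumers weigh a JSON-LD name more heavily than one scraped from the DOM.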

For Site Owners

If you want your brand to be accurately represented in APIs and search engines, adding comprehensive JSON-LD is the single most impactful thing you can do. Use the Organization type with logo, description, contactPoint, and sameAs properties.

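Google's Rich Results Test and the Schema Markup Validator (validator.schema.org, the successor to Google's retired Structured Data Testing Tool) can validate your markup.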
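If you render pages server-side, the markup can be generated rather than written by hand. This is a hypothetical Python helper, not a required format. It escapes "<" inside the JSON as \u003c, so a value containing </script> cannot close the tag early while the JSON still decodes to the original text.

```python
import json


def organization_jsonld(name, url, logo, description, same_as, contact_points) -> str:
    """Render an Organization JSON-LD script tag (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "description": description,
        "sameAs": list(same_as),
        "contactPoint": [{"@type": "ContactPoint", **cp} for cp in contact_points],
    }
    # "<" only occurs inside JSON strings, so \u003c is a safe, valid escape.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```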

