Author: Jaclyn Harron

  • Brand Perception Atlas: Mapping the Modern Brand, From Social Signal to Core Equity

    A short video walkthrough of the Atlas can be viewed below:

    Video Walkthrough: Exploring the Brand Perception Atlas ⬇️

    Introduction

    A brand is not just what it says it is.

    It is what people repeatedly, collectively, and emotionally decide it is.

    In 2026, those decisions are happening everywhere: on Instagram, TikTok, in reviews, in news cycles, in comment sections, and in the accumulated memory of long-term brand equity.

    For organizations attempting to understand brand perception, this environment presents a fundamental challenge. The volume of available data is unprecedented, yet the signals it produces are often inconsistent and contradictory. Traditional research tools such as surveys and focus groups remain essential for measuring brand equity, but they capture perception only at specific points in time and cannot fully reflect the fast-moving nature of digital conversation.

    At the same time, social media offers a continuous stream of public commentary, revealing how brands are discussed, interpreted, and compared in everyday discourse. However, these signals are noisy and difficult to interpret in isolation.

    The Brand Perception Atlas was built to turn that noise into a map, designed to integrate diverse perception signals into a unified analytical framework. By combining social media data, public knowledge sources, Large Language Model (LLM) summaries, and established brand equity research, the Atlas aims to provide a more comprehensive understanding of how brands are perceived across the digital ecosystem.

    The Brand Perception Atlas

    The Brand Perception Atlas functions as a navigational system for brand strategy, much like a pilot relies on a map to understand the landscape below. From the cockpit, individual roads and buildings disappear. What matters is the wider pattern: landmarks, terrain, and relationships across the whole field.

    Brand leaders face a similar challenge. A single post, review, or survey response reveals very little on its own. But when thousands of signals are mapped together, larger patterns come into view.

    The Brand Perception Atlas turns those scattered signals into a shared perceptual map, showing how brands cluster, where they compete, and which meanings they occupy in the public imagination.

    To achieve this, the Atlas synthesizes perception signals from several sources:

    • Content from official brand accounts on platforms such as TikTok and Instagram
    • Public narratives reflected in sources like Wikipedia
    • AI-generated summaries, produced with Gemini, describing how brands are perceived in LLM-based discourse
    • Survey-based brand equity data from the WPP Brand Asset Valuator® (BAV), which anchors the analysis in long-term brand perception

    BAV holds a special place among brand-perception sensors. Developed by WPP, BAV is one of the world’s largest and longest-running brand equity studies, spanning more than three decades, thousands of brands, and multiple markets. Unlike social and digital signals, which infer perception from public behavior, BAV measures it directly by asking consumers what they believe. That makes it the Atlas’s anchor: not a snapshot of what people are saying today, but a benchmark for what they have come to believe over time.

    BAV captures this through 48 standardized Imagery Attributes, from functional traits such as “Reliable” and “High Quality” to more emotional cues such as “Charming,” “Daring,” and “Sensuous.” In the Atlas, those attributes provide a high-resolution view of brand meaning, making it possible to see exactly which dimensions of perception define a brand’s underlying equity.

    Table 1 presents samples from all Atlas sensors for two of the world’s most recognizable brands: Target and Airbnb. The table illustrates how the same brand can be perceived very differently across sensors, highlighting the need for a tool like the Brand Perception Atlas, which brings these diverse perspectives together into a single, contextualized view.

    Sample Brand Perception Reports

    | Source | Target | Airbnb |
    | --- | --- | --- |
    | **Survey (BAV)** (Long-term Equity) | “Consumers perceive this brand as a highly accessible and dependable choice, offering excellent value for money. It consistently earns praise for its reliable, high-quality, and original offerings… Seen as a dynamic leader that stays up-to-date and trendy.” | “Perceived as a highly original, daring, and progressive leader in the travel space. It is seen as a ‘cool’ and ‘friendly’ brand that offers unique, high-quality experiences. It resonates with intelligent, socially responsible, and trendy individuals looking for an authentic vibe.” |
    | **Wikipedia** (public narrative) | “A major American retail chain… positioned as offering upscale products at below-average costs, appealing to a younger, more educated, and higher-income demographic… noted for its significant market presence.” | “A global platform for homestays and vacation rentals… credited with revolutionizing the ‘sharing economy.’ However, its narrative is often complicated by discussions regarding its impact on local housing markets, regulatory battles, and ‘overtourism’ in major cities.” |
    | **Gemini LLM** (digital discourse) | “Target cultivates an ‘upscale-for-less’ reputation… It’s perceived as a clean, organized, and pleasant one-stop-shop that blends everyday necessities with trendy, affordable finds.” | “Widely viewed as a pioneer of authentic, ‘live like a local’ travel. While praised for design and convenience, digital discourse often centers on the ‘Airbnb-ification’ of cities, rising service fees, and the shift from quirky spare rooms to professionalized rentals.” |
    | **Instagram** (brand official account) | “An accessible and enjoyable retail destination… The playful and organized presentation of shopping reinforces a positive, discovery-driven customer experience, emphasizing Target’s ability to host unique product lines.” | “A visually stunning showcase of ‘bucket-list’ stays and architectural marvels. The brand projects an aspirational yet community-focused vibe, emphasizing human connection, design-led travel, and the ‘Omg!’ category of unique homes.” |
    | **TikTok** (brand official account) | “Widely seen as a trendy, accessible retailer offering stylish, curated products and collaborations that enable a joyful lifestyle… captures a highly visual and energetic side of the brand.” | “Highly energetic and trend-focused, leaning into creator-led tours of unique properties. The vibe is one of discovery and adventure, making global travel feel personal and attainable through viral ‘hidden gem’ content.” |

    Table 1: Sample brand perception reports, for Target and Airbnb, from the Brand Perception Atlas dataset

    Core Brand Identity

    When a focus brand is selected, the Atlas reveals its Core Brand Identity: a breakdown of the top perceptual themes by channel. This provides an at-a-glance view of what each sensor is saying about the brand, and where those signals align or diverge.

    For example, selecting Adobe surfaces a clear identity profile. On BAV, the brand is perceived as Reliable, Innovative, and Intelligent. On Wikipedia, the dominant themes shift to Innovative, Dominant, and Controversial. Each sensor contributes a different angle, but together they reveal the full perceptual picture.

    Figure 1: Screenshot of the Selected Brand’s Core Brand Identity

    Mapping Brand Perception

    At the center of the Atlas is the Perception Map, a visual representation of how brands relate to one another in terms of public perception.

    Each perception signal is converted into a numerical representation of semantic meaning, known as an embedding. A helpful way to think about semantic meaning is as the “essence” of a word, rather than the word itself. For example, although “Luxury” and “High-end” are lexically different, they convey very similar meanings. The Atlas uses Gemini embeddings to capture these relationships and measure the semantic similarity between terms such as “Luxury,” “Premium,” “Prestigious,” and “High-End.” Because their meanings are closely related, the system places brands characterized by these words in the same neighborhood on the map.
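    The intuition behind embedding proximity can be sketched with cosine similarity. The tiny 4-dimensional vectors below are purely illustrative stand-ins (real Gemini embeddings are 768-dimensional), chosen so that “Luxury” and “Premium” point in nearly the same direction while “Budget” does not:

    ```python
    from math import sqrt

    def cosine_similarity(a, b):
        """Cosine of the angle between two vectors: 1.0 means same direction."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sqrt(sum(x * x for x in a))
        norm_b = sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Hypothetical toy embeddings, for illustration only.
    embeddings = {
        "Luxury":  [0.90, 0.80, 0.10, 0.00],
        "Premium": [0.85, 0.75, 0.15, 0.05],
        "Budget":  [0.10, 0.20, 0.90, 0.80],
    }

    print(cosine_similarity(embeddings["Luxury"], embeddings["Premium"]))  # close to 1
    print(cosine_similarity(embeddings["Luxury"], embeddings["Budget"]))   # much lower
    ```

    Brands whose perception reports embed close together under this kind of similarity end up as neighbors on the map.
    
    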

    The resulting map resembles a landscape of brand meaning, allowing brand leaders to identify clusters of brands that share common associations and spot outliers that occupy distinctive perceptual positions.

    Omnichannel Consistency

    One of the most useful insights derived from the Atlas is a metric that we refer to as Omnichannel Consistency. This measure evaluates how closely aligned a brand’s perception is across different information sources. If the signals derived from social media, surveys, and public narratives cluster tightly together, the brand is communicating a consistent identity. Conversely, if these signals are widely dispersed, the brand’s identity is more fragmented across channels.
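    The metric itself is simple: the mean distance from each sensor’s 2D map position to the brand’s centroid, rescaled into a 0–100 percentage. The sketch below uses the scaling formula from the pipeline reference (`100 - mean_dist * 35`, clamped to [0, 100]); the coordinate values are hypothetical:

    ```python
    def omnichannel_consistency(points):
        """Mean distance of each sensor's 2D position to the brand centroid,
        rescaled into a 0-100 score (the 35.0 factor matches the pipeline)."""
        n = len(points)
        cx = sum(p[0] for p in points) / n
        cy = sum(p[1] for p in points) / n
        mean_dist = sum(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in points) / n
        return max(0.0, min(100.0, 100.0 - mean_dist * 35.0))

    # Tightly clustered sensors (a John Deere-like profile) score near 100;
    # widely dispersed sensors (a Marriott-like profile) score much lower.
    tight = [(1.00, 1.00), (1.01, 0.99), (0.99, 1.02), (1.02, 1.01), (1.00, 0.98)]
    spread = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 3.0)]
    print(omnichannel_consistency(tight))   # > 99
    print(omnichannel_consistency(spread))  # far lower
    ```
    
    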

    Analysis of the dataset identified several brands with exceptionally strong consistency across channels, including John Deere, Caterpillar, and Oscar Health, each showing more than 99% omnichannel consistency. John Deere is a clear example: its core perception of rugged reliability remains stable whether measured through a 30-year longitudinal BAV survey or reflected in viral TikTok videos. Figure 2 illustrates this pattern, with all John Deere signals appearing tightly clustered on the map.

    Figure 2: Perception map showing high omnichannel consistency for John Deere on all platforms (US, 2025-26). The BAV (black diamond), Instagram (red square), TikTok (light blue circle), Gemini (blue hex), and Wiki (gray triangle) sensors are all close to each other in the Perception Map.

    As shown in Figure 3, the brand’s digital content illustrates a bridge between historical heritage and modern utility. This alignment is not merely aesthetic but is deeply rooted in a consistent cross-platform narrative.

    • On BAV, the brand is perceived as a “formidable and dependable leader,” successfully blending a “rugged” and “traditional” foundation with a “distinctively cool and stylish” upper-class appeal. It is consistently rated as a “best-in-class” choice that is “worth more,” signaling its status as a premium investment.
    • On Gemini, the narrative reinforces this by projecting the “vibe of a legacy American icon built on reliability.” The discourse centers on the “durability and longevity” of the equipment, where the classic slogan “Nothing runs like a Deere” continues to underpin a reputation for high performance.
    • On social sensors, the brand’s perception is characterized by “practical innovation”. Reports highlight a “strong brand affinity” built through “real-life customer stories centered around dedication and essential work.” While the content celebrates the “versatile and dependable” nature of the machinery for tasks like snow removal, it also reflects a modern tension: the high cost of entry and proprietary technology, which mirrors the BAV finding that the brand is perceived as a “significant investment.”

    Figure 3: Screenshot of a John Deere Instagram video showing snow removal equipment

    In contrast, several other brands demonstrated much greater variation in perception across sources. Marriott provides a compelling example of a multi-faceted brand that occupies distinct perceptual territories depending on the sensor used. As shown in Figure 4, the Atlas reveals a clear divergence in how the brand is categorized.

    • The BAV sensor (bottom right in Figure 4) positions the brand within a territory of reliability, superior quality, and prestigious appeal, where consumers fundamentally view Marriott as an intelligent, high-performing, and traditional leader. BAV reports explicitly describe it as “a beacon of established excellence and prestige… commanding a perception of leadership and trustworthiness firmly rooted in tradition.”
    • The social sensors (Instagram and TikTok, upper right corner in Figure 4), however, pull the brand into a separate region defined by aspirational luxury and curated escapism. In these digital spaces, Marriott’s intended portrayal is human-centric and inclusive, with official reports describing an organization that “champions individuals” and demonstrates “deep cultural understanding.”

    Figure 4: Screenshot of the Brand Perception Atlas showing the dispersed perception points for Marriott

    The variance in the data speaks to a brand successfully managing a legacy reputation while aggressively chasing a modern, inclusive digital identity (see Table 2).

    | Sensor | Key Perceptual Snippet | Core Narrative |
    | --- | --- | --- |
    | BAV | “This brand is primarily perceived as highly reliable… a beacon of established excellence and prestige. It commands a perception of leadership and trustworthiness firmly rooted in tradition.” | The Anchor: Focuses on dependability, classic excellence, and proven heritage. |
    | Instagram | “Marriott International… projects a vibe of inclusivity, reliability, and cultural understanding. Praised for its global reach, diverse workforce, and initiatives focused on social responsibility.” | The Bridge: Humanizes the giant; focuses on “people first” and cultural connectivity. |
    | TikTok | “Cultivates an aspirational and exclusive vibe… promoting global travel and luxury experiences… purveyor of curated escapism and wellness.” | The Future: Targets the frequent traveler seeking sophisticated, unique retreats and “Instagrammable” indulgence. |

    Table 2: Sample brand perception text, for Marriott, from the Brand Perception Atlas dataset

    Interestingly, multiple travel-related brands exhibited high dispersion across the semantic perception space. The high volume of online discussion surrounding travel experiences, ranging from positive stories to customer complaints, may contribute to a more fragmented perception environment for brands in this sector.

    Omnichannel consistency is not inherently good or bad. Some brands benefit from a tightly aligned identity across platforms, while others thrive by expressing different facets of themselves in different contexts. In categories such as entertainment, fashion, and travel, more fragmented perception may reflect adaptability and cultural relevance rather than weakness.

    For this reason, the consistency metric is diagnostic, not prescriptive. It shows where a brand sits on the spectrum between a unified and multi-faceted perception, helping leaders assess whether that pattern aligns with their strategy.

    Shared Equity, Different Vibe (Close on BAV, Far on Socials)

    The Atlas also reveals unexpected relationships between brands in completely different industries. One of the clearest patterns appears when two brands share a similar equity foundation but project very different identities on social media.

    A good example is 3M and Marriott. At first glance, they are not intuitive neighbors. 3M is associated with science, engineering, and industrial innovation, while Marriott is associated with hospitality, travel, and aspirational leisure. On social media, they occupy very different cultural spaces, and in the Atlas’s social sensors they sit far apart.

    | Source | 3M | Marriott |
    | --- | --- | --- |
    | BAV | “Projects a vibe of rugged, energetic reliability combined with a visionary, original spirit… praised for its trustworthiness and distinctively high quality.” | “Perceived as superior quality and high-performing… described as unique, stylish, and authentically simple. Appeals to a sophisticated, aspirational lifestyle.” |
    | Gemini | “Deeply divided: seen as a legacy American innovator of household staples, but its reputation is tarnished by high-profile legal and environmental controversies.” | “A vast corporate giant in hospitality. Reputation has a dual identity: a provider of aspirational luxury vs. an impersonal entity with inconsistent service.” |
    | Instagram | “Science and technology powerhouse, praised for its problem-solving capabilities. Emphasizes STEM education and its role in enabling future technologies.” | “Cultivates a multifaceted image as a global provider known for community, sustainability, and inclusivity. Projects a vibe of reliability and cultural understanding.” |

    Table 3: Sample brand perception reports, for 3M and Marriott, from the Brand Perception Atlas dataset

    Figure 6 illustrates this dynamic for 3M. Content such as its Instagram posts around the Young Scientist Challenge creates a perception centered on innovation, education, and community. The brand appears inspiring, responsible, and forward-looking. This is very different from Marriott, whose social presence is shaped by luxury, travel, and experience.

    Figure 6: Screenshot of 3M Instagram video highlighting the Young Scientist Challenge

    However, the BAV sensor tells a different story. At the level of deeper brand equity, 3M and Marriott emerge as close neighbors because both are anchored by reliability and leadership. In the consumer mind, 3M functions as an innovation backbone, while Marriott functions as a service backbone. Their social expressions differ, but their underlying equity plays a similar emotional role: both are seen as dependable institutions.

    A similar pattern appears with Costco and JetBlue, as shown in Table 4. These brands belong to very different categories, yet on the BAV sensor they appear close together within a shared “consumer champion” territory. Both brands are anchored by associations such as friendliness and reliability. Costco is strongly linked to simplicity, while JetBlue is associated with value. At a foundational level, both occupy a similar emotional space: trusted brands that provide essential services without the friction consumers often expect from their industries.

    | Source | Costco | JetBlue |
    | --- | --- | --- |
    | BAV | “Predominantly perceived as high-value, reliable, and authentic… offers helpful and intelligent solutions while demonstrating a commitment to equality.” | “Largely perceived as fun, cool, and friendly… valued for its distinctiveness and high quality, contributing to a perception of being trendy and energetic.” |
    | Wikipedia | “A highly successful, global membership-only warehouse club known for its value and its strong ‘Kirkland Signature’ private label.” | “A hybrid low-cost carrier that disrupted the airline industry with premium amenities. Recently faced scrutiny over alliances and operational reliability.” |
    | Gemini | “Widely perceived as a members-only ‘treasure hunt,’ fostering a cult-like loyalty. Praised for unbeatable value, but criticized for a chaotic in-store experience.” | “A trendy, modern airline struggling to live up to its reputation. While the in-flight experience is praised, operational reliability is a significant pain point.” |
    | Instagram | “Beloved, value-driven club offering a unique shopping experience. Transformed mundane shopping into a leisure activity and shareable ‘Costco haul’ content.” | “Projects a customer-centric and approachable vibe. Highlights above-average amenities and playful brand interactions compared to typical budget carriers.” |

    Table 4: Sample brand perception reports, for Costco and JetBlue, from the Brand Perception Atlas dataset

    This pattern is diagnostically useful, as it shows that brands can share the same equity backbone while expressing themselves very differently across platforms. Costco doubles down on functional value, while JetBlue leans into aspiration and lifestyle.

    Different categories. Different content. Same emotional job.

    Different Equity, Shared Vibe (Far on BAV, Close on Socials)

    The Atlas also reveals brands that follow the opposite pattern: brands from very different industries that share little underlying equity, yet converge into a similar “vibe” on social media. In these cases, the social layer acts as a cultural blender, pulling very different brands into the same perceptual neighborhood.

    A strong example is General Mills and Smucker’s. At the level of long-term brand equity, these brands occupy distinct positions, as shown in Figure 7. General Mills is perceived as a “highly reliable, high-performance leader” whose vibe blends “traditional prestige” with a “glamorous and daring appeal.” Smucker’s, by contrast, is seen as an “original and unique leader” that fuses a “traditional foundation with a trendy, cool, and dynamic aesthetic,” praised for its “rugged appeal and fun character.”

    According to the BAV sensor, while both are established food brands, they occupy meaningfully different perceptual territories: General Mills anchored in prestige and reliability, Smucker’s in originality and rugged charm.

    Figure 7: Screenshot of the Brand Perception Atlas showing the focus brand, Smucker’s, and its proximity to General Mills

    Yet on social media, the distinction between them fades. On Instagram and TikTok, both brands converge into a shared neighborhood of nostalgic comfort and family-friendly Americana.

    According to these social sensors, General Mills “cultivates a vibe of reliable convenience and family-friendly nostalgia,” engaging audiences with creative recipe ideas rooted in American food culture. Smucker’s similarly “evokes a strong sense of nostalgic comfort and reliable quality,” projecting tradition, community, and corporate social responsibility. On social platforms, both brands occupy the same emotional space: trusted pantry staples working to stay relevant with modern, health-conscious consumers.

    | Source | General Mills | Smucker’s |
    | --- | --- | --- |
    | BAV | “Perceived as a highly reliable, high-performance leader offering exceptional quality. Its vibe is one of traditional prestige blended with a glamorous and daring appeal, reflecting an intelligent and customer-caring image.” | “Widely perceived as an original and unique leader, skillfully blending a traditional foundation with a trendy, cool, and dynamic aesthetic. Praised for its high quality, rugged appeal, and fun character.” |
    | Wikipedia | “A long-standing American multinational food manufacturer with a diverse portfolio (Cheerios, Pillsbury). Focused on expanding health-conscious offerings and sustainability, though it has faced scrutiny over legal terms and health claims.” | “A long-established and highly diversified American food and beverage manufacturer (Folgers, Jif). Known for an aggressive acquisition strategy that has transformed it into a Fortune 500 company with a broad portfolio.” |
    | Gemini LLM | “Widely perceived as a dependable, nostalgic American staple increasingly viewed through a lens of corporate scrutiny. Discussions often focus on high sugar content and the ‘shrinkflation’ of its iconic cereal brands.” | “Quintessential American heritage brand evoking nostalgia and comfort. While loyalty is built on consistent taste, the brand’s reputation has been challenged by major product recalls and concerns over nutritional value.” |
    | Instagram | “Cultivates a vibe of reliable convenience and family-friendly nostalgia. Engages a wide audience with creative recipe ideas while facing criticism regarding the nutritional content of its processed foods.” | “Projects a vibe of a reputable company with an emphasis on tradition and community. Praised for its deep-rooted history and significant corporate social responsibility initiatives, including sustainability advancements.” |
    | TikTok | “Presents a comfortable, family-friendly image rooted in American food culture. Struggles to fully resonate with health-conscious consumers due to a perception of limited innovation in its core processed staples.” | “Evokes a strong sense of nostalgic comfort and reliable quality. Working to engage younger audiences by promoting modern, sustainable practices and creative usage of its familiar pantry staples.” |

    Table 5: Sample brand perception reports, for General Mills and Smucker’s, from the Brand Perception Atlas dataset

    In this digital layer, General Mills’ recipe-driven content and Smucker’s heritage-focused storytelling occupy the same neighborhood. They are no longer distinguished by their different equity profiles; instead, they are unified as Nostalgic American Pantry Brands that use family-friendly content and tradition to build emotional resonance with their audiences.

    Conclusion

    Understanding brand perception has always been a central challenge in marketing and brand strategy. In the digital era, the challenge is no longer a lack of information, but an excess of it. Organizations now face a paradox: more data than ever, yet less clarity about what it actually means.

    In a world where social media amplifies attention without always reshaping perception, the real strategic advantage lies in understanding the gap between visibility and meaning. The Brand Perception Atlas makes that gap visible. It shows where brands cluster, where they drift, and where surface-level conversation either reinforces or obscures deeper brand equity. In doing so, it helps brand leaders understand not just what people are saying today, but how those conversations connect to the deeper beliefs that shape brand meaning over time.

    The real challenge is not simply tracking what a brand did yesterday. It is understanding what that brand means, what territory it occupies in people’s minds, and how difficult that territory is to shift.

  • Brand Perception Atlas Pod


    The Brand Perception Atlas – A Technical Deep Dive

    The Brand Perception Atlas is an interactive decision-support tool that helps brand teams understand, compare, and explain brand perception across platforms. It combines embedding-space visualization (UMAP) with interpretable clusters and cross-platform consistency scoring. The result is a tool that moves teams from “interesting maps” to recommendations that are better documented and easier to explain, because they link back to specific underlying perception signals and clearly show where sources agree or disagree. These outputs can be used in brand reviews, competitor analysis, and campaign planning. Read the Executive Summary for more information and results.

    Goal: a newcomer can reproduce the core outcomes in 2–3 days. For full setup and running instructions, refer to the GitHub README — this walkthrough provides the conceptual map and domain knowledge needed to understand what the code does and why.

    <aside> 🧭

    A note on what made this hard. The Brand Perception Atlas looks deceptively simple — embed text, project it, cluster it, display it. In reality, the single hardest problem was getting five fundamentally different perception sources to coexist in a shared space where distances actually mean something. WPP Brand Asset Valuator® (BAV) data comes from structured survey scores transformed through an LLM (Gemini) into prose. Social data comes from raw video transcripts and post captions, also transformed through an LLM (Gemini). Even though both pass through the same embedding model, the linguistic fingerprint of each source dominates the resulting vectors — the map would split cleanly into “survey-sounding text” vs “social-sounding text” rather than grouping brands by actual perception. Solving this required iterating through prompt normalization, aggregation rebalancing, and ultimately Procrustes alignment — a technique borrowed from shape analysis that rotates one embedding subspace onto another using shared anchor brands. Section 6 tells this story in full.

    </aside>

    1. Architecture Overview

    flowchart LR
        A["BAV Parquet"] --> E["1. Preprocess<br>Merges & Maps Names"]
        B["Wikipedia CSV"] --> E
        C["LLM CSV"] --> E
        D["Social Media CSVs"] --> E
        E --> F["2. Ingest<br>Vertex Embeddings"]
        F --> G["3. Aggregate<br>Procrustes Alignment"]
        G --> H["3b. UMAP Projection"]
        H --> I["4. Cluster<br>HDBSCAN & Batched LLM"]
        I --> J["LanceDB Storage"]
    

    Key constraint: All five perception sources must be projected into a shared embedding space so that distances are comparable across sensors. The pipeline uses Procrustes alignment to rotate BAV vectors into the social subspace via overlapping anchor brands, then UMAP projects everything into 2D. The system maps the ideas behind the words, not just the text — “Luxury,” “Premium,” and “Prestigious” land in the same neighborhood.

    <aside> 💡

    Why this architecture isn’t obvious. A naive approach would be: embed everything → UMAP → cluster → done. The catch is that embedding models encode how something is said as much as what is said. A BAV narrative generated from “Helpful (45.2), Reliable (38.1)…” reads nothing like a TikTok perception report, even when both describe the same brand. Without the Procrustes step between “Ingest” and “UMAP,” the map would split by text style rather than brand perception. The arrows in this diagram look linear, but getting Step 3 (Aggregate) right took more iteration than every other step combined.

    </aside>
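    The core of that Procrustes step can be sketched in a few lines: given the same anchor brands embedded in two subspaces, find the orthogonal rotation that best maps one onto the other, then apply it to every remaining vector. The sketch below solves the classic orthogonal Procrustes problem via SVD on synthetic data (the anchor matrices, dimensions, and noise level are all illustrative, not the pipeline’s actual values):

    ```python
    import numpy as np

    def procrustes_rotation(source, target):
        """Orthogonal matrix R minimizing ||source @ R - target||_F,
        via the SVD solution to the orthogonal Procrustes problem."""
        u, _, vt = np.linalg.svd(source.T @ target)
        return u @ vt

    rng = np.random.default_rng(0)

    # Hypothetical setup: 10 anchor brands in a 5-d "social" subspace, and the
    # same brands in a "BAV" subspace that is a rotated, slightly noisy copy --
    # mimicking a pure domain shift between the two sensors.
    anchors_social = rng.normal(size=(10, 5))
    true_rotation = np.linalg.qr(rng.normal(size=(5, 5)))[0]
    anchors_bav = anchors_social @ true_rotation + rng.normal(scale=0.01, size=(10, 5))

    # Learn the rotation from the shared anchors, then align the BAV vectors.
    R = procrustes_rotation(anchors_bav, anchors_social)
    aligned = anchors_bav @ R
    print(np.abs(aligned - anchors_social).max())  # small residual
    ```

    Because R is learned only from anchor brands, it can then be applied to BAV vectors for brands that have no social-media counterpart at all.
    
    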

    2. Setup & Running

    Full install and deploy instructions are maintained in the README. Below is a summary for orientation.

    Internal (BitBucket — full pipeline with Vertex AI):

    git clone <bitbucket-repo-url>
    cd brand-perception-agent
    uv sync
    
    cd brand_perception/dashboard/atlas_pipeline
    uv run python main.py --step all
    
    uv run pytest
    # ~6 functional / integration tests, <1 minute
    

    Public (GitHub — two modes):

    | Mode | Description | API keys required? |
    | --- | --- | --- |
    | Default | Run locally with the provided toy dataset (20 brands, 681 reports) or your own dataset in the same CSV format | No |
    | Advanced | Plug in your own API keys for LLM clustering and embedding models | Yes (GOOGLE_API_KEY, GOOGLE_CLOUD_PROJECT) |

    Key dependencies: Python 3.12, lancedb, pandas, numpy, google-cloud-aiplatform, google-genai, tqdm, umap-learn, hdbscan, scikit-learn, streamlit

    <aside> ⚠️

    Gotchas for newcomers:

    • **Vertex AI quotas.** The embed step hits gemini-embedding-001 in batches of 50. On a fresh Google Cloud Project (GCP) you may get rate-limited at ~60 requests/min. The pipeline handles retries, but if you see 429 errors, check your Vertex AI quota dashboard and request an increase before re-running.
    • **UMAP is not deterministic.** Runs with the same data can produce slightly different 2D layouts unless you pin random_state. The pipeline does pin it, but if you fork and forget, your clusters will shift between runs. However, relative distances between points do not change, so interpretation will remain the same.
    • **LanceDB lock files.** If a previous run crashed mid-write, LanceDB may leave a .lock file that blocks the next run. Delete *.lock files in the LanceDB directory if the pipeline hangs on startup.
    • **uv sync vs pip install.** The project uses uv for dependency management. If you install via pip instead, hdbscan and umap-learn can pull conflicting numpy versions. Stick with uv sync.

    </aside>

    3. Data Model

    The data model below describes what lives in LanceDB after the pipeline runs. Think of it as the “schema” of the Atlas — each table feeds a different part of the dashboard UI. The key thing to understand: a single brand has multiple rows across these tables (one per sensor, one per audience segment, etc.). The perception map plots one dot per row in umap_projections, not one dot per brand.

    | Table / Entity | Description |
    | --- | --- |
    | brands | Master list of 200+ brands (internal) or 20 brands (toy dataset) with metadata (industry, country) |
    | perception_signals | One row per (brand × sensor) — raw text summaries from each source |
    | embeddings | Semantic vector per perception signal — gemini-embedding-001, 768 dimensions |
    | umap_projections | 2D coordinates per embedding after UMAP reduction |
    | clusters | Cluster ID, 3-word label (e.g. “Hope Innovation Compassion”), and member brands |
    | bav_attributes | 48 BAV imagery attribute scores per brand × 12 audience segments |
    | consistency_scores | Omnichannel consistency % per brand (mean distance to centroid across sensors) |

    Toy dataset format (CSV, used by public GitHub default mode):

    | Column | Description |
    | --- | --- |
    | Brand | Brand name (e.g. Aetherium, Zenith Dynamics) |
    | Industry | Industry category (Technology, Automotive, Food & Beverage, Retail, Healthcare, Finance, Entertainment) |
    | Platform | Source sensor: TikTok (Brand Known), TikTok (Brand Unknown), Instagram (Brand Known), Instagram (Brand Unknown), Wikipedia, LLM, Survey |
    | Survey_Audience | Demographic segment for Survey rows (e.g. Tech Early Adopters, Gen Z – Gamers); N/A for non-survey rows |
    | Brand_Perception_Report | Free-text perception summary |

    4. Pipeline / Workflow

    The table below is the quick-reference version. Commentary after it explains what’s actually happening at each stage and where things can go wrong.

    | Phase | Key numbers |
    | --- | --- |
    | 1. Preprocess | Instant runtime; merges into 1 unified DataFrame |
    | 2. Ingest (Embed) | gemini-embedding-001, 768 dims, batches of 50 via Vertex APIs |
    | 3. Aggregate | Procrustes rotation on anchor brands; UMAP n_neighbors=min(15, len-1), min_dist=0.1; <5 sec |
    | 4. Cluster | HDBSCAN (min_cluster_size=3/5); several minutes via Gemini Batch labelling queue |
    | 5. Consistency | max(0.0, min(100.0, 100.0 - (mean_dist_to_centroid * 35.0))) |
    | 6. BAV join | Procrustes alignment via rotation matrix on overlapping anchor brands |
    | 7. Atlas UI | Streamlit (instant runtime) |

    Phase-by-phase commentary

    Phase 1 — Preprocess. Deceptively simple: merge CSVs, normalize brand names, filter junk. The hidden complexity is name matching. BAV uses official corporate names (“The Procter & Gamble Company”), social media uses colloquial names (“P&G”), and Wikipedia uses yet another variant. preprocess.py maintains a manual alias map for this. If you add a new brand and it doesn’t appear on the map, check the alias map first — it’s almost always a name mismatch.
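    The alias map is the first thing to check when a brand vanishes from the map. Here is a minimal sketch of the idea — the map entries and function name are illustrative, not the actual contents of preprocess.py:

```python
# Illustrative alias map: colloquial and official variants collapse onto
# one canonical brand name before any joining happens.
ALIAS_MAP = {
    "P&G": "The Procter & Gamble Company",
    "Procter & Gamble": "The Procter & Gamble Company",
    "Procter and Gamble": "The Procter & Gamble Company",
}

def normalize_brand(name: str) -> str:
    """Map a raw brand string to its canonical name (identity if unknown)."""
    cleaned = name.strip()
    return ALIAS_MAP.get(cleaned, cleaned)
```

    If a new brand uses a name variant not in the map, it silently fails to join across sources — which is why a missing dot on the map is almost always a missing alias entry.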

    Phase 2 — Ingest (Embed). Each Brand_Perception_Report text gets turned into a 768-dimensional vector via gemini-embedding-001. The critical thing to understand: these vectors encode writing style as much as meaning. A BAV report that says “Helpful (45.2), Reliable (38.1)” and a TikTok report that says “this brand gives cozy reliable vibes” will land in different regions of embedding space even though they describe similar perceptions. This is the root cause of the domain shift problem solved in Phase 3.
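    The batching itself is a generic pattern. A hedged sketch of batched embedding with retry and backoff, where `embed_fn` stands in for the actual Vertex AI call — the wrapper name and retry policy are illustrative, not the pipeline's exact implementation:

```python
import time
from typing import Callable, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 50,   # mirrors the pipeline's batches of 50
    max_retries: int = 3,
) -> list[list[float]]:
    """Embed texts in fixed-size batches, retrying a failed batch with
    exponential backoff (e.g. on a 429 from Vertex AI quotas)."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # back off before retrying
    return vectors
```
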

    Phase 3 — Aggregate (the hard one). This is where most of the iteration happened. Three things occur in sequence:

    1. Social aggregation: Multiple post-level embeddings per (Brand × Platform) are mean-averaged into a single vector, and Gemini generates a summary report. This smoothing pulls social vectors toward a shared centroid.
    2. Procrustes alignment: The BAV vectors are rotated into the social embedding subspace using 202 shared anchor brands (see Section 6 for the full story).
    3. UMAP projection: The combined, aligned vectors are reduced to 2D. Only the 'All Adults' BAV slice is fitted alongside social platforms — this prevents the 12 BAV demographic segments from dominating the topology.
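    Step 1 above — mean-averaging post-level vectors per (Brand × Platform) — can be sketched as follows. Column names follow the toy dataset schema; the function is illustrative, not the code in aggregate.py, and the Gemini summary step is omitted:

```python
import numpy as np
import pandas as pd

def aggregate_social(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse post-level embeddings into one mean vector per
    (Brand, Platform) pair — the smoothing step described above."""
    rows = []
    for (brand, platform), group in df.groupby(["Brand", "Platform"]):
        mean_vec = np.mean(np.stack(group["embedding"].to_list()), axis=0)
        rows.append({"Brand": brand, "Platform": platform,
                     "embedding": mean_vec, "post_count": len(group)})
    return pd.DataFrame(rows)
```

    This averaging is also what creates the aggregation asymmetry discussed in Section 6.2: the BAV/Wikipedia rows never pass through it.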

    <aside> 🔬

    Why “balanced subset fit” matters. BAV has 12 audience segments per brand. Social has ~1–4 data points per brand. Without balancing, UMAP sees 12× more BAV points and builds its neighborhood graph around BAV structure, marginalizing social data. The fix: fit UMAP on the balanced subset (All Adults + social), then transform the remaining 11 BAV segments passively. This was a non-obvious but critical design choice.

    </aside>

    Phase 4 — Cluster. HDBSCAN groups nearby points into perception themes, then Gemini labels each cluster with exactly 3 words. min_cluster_size is set to 3 (toy dataset) or 5 (full dataset). The batch labelling step submits each cluster’s centroid + top 5 most similar reports to Gemini. This can take several minutes because it goes through the Vertex AI Batch queue — don’t assume it hung.
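    Selecting the labelling context — the centroid plus the top 5 reports by cosine similarity — is plain vector math. An illustrative sketch (the function name is hypothetical; the actual batch submission lives in cluster.py):

```python
import numpy as np

def top_reports_for_cluster(vectors: np.ndarray, reports: list[str], k: int = 5):
    """Return the cluster centroid and the k reports whose vectors are
    closest to it by cosine similarity — the context sent to Gemini."""
    centroid = vectors.mean(axis=0)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid)
    sims = vectors @ centroid / np.clip(norms, 1e-12, None)
    order = np.argsort(sims)[::-1][:k]  # highest similarity first
    return centroid, [reports[i] for i in order]
```
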

    Phase 5 — Consistency. A simple but effective metric: for each brand, compute the mean Euclidean distance from each sensor’s point to the brand’s centroid, then invert and scale. Brands where all sensors agree (John Deere, Caterpillar) score 99%+. Brands with platform-dependent perception (Marriott, American Airlines) score much lower. The * 35.0 scaling factor was empirically tuned to spread scores across a useful range.
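    The metric fits in a few lines. A sketch assuming exactly the formula quoted in the phase table (the function name is illustrative):

```python
import numpy as np

def consistency_score(sensor_points: np.ndarray, scale: float = 35.0) -> float:
    """Omnichannel consistency: mean Euclidean distance from each sensor's
    point to the brand centroid, inverted, scaled, and clamped to 0-100."""
    centroid = sensor_points.mean(axis=0)
    mean_dist = float(np.linalg.norm(sensor_points - centroid, axis=1).mean())
    return max(0.0, min(100.0, 100.0 - mean_dist * scale))
```

    Perfect sensor agreement (zero spread) scores 100; a mean distance of 1.0 at the default scale scores 65.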

    Phase 6 — BAV join. Brings in the raw 48-attribute BAV scores per demographic segment. These are the structured numbers (not the Gemini-generated prose) and power the “Survey Audience” filter in the dashboard.

    Phase 7 — Atlas UI. Streamlit renders everything from LanceDB. Instant startup because all computation was done in previous phases.

    5. Atlas Interface (Primary Interactions)

    • Focus Brand selector → perception map + sidebar with cluster label, per-sensor summaries
    • Survey Audience filter (BAV) → 12 demographic segments
    • Number of neighbours slider → controls perceptual neighbors on the map
    • Reference Platform selector → changes cross-modal overlap anchor (Wikipedia, BAV, etc.)
    • Competitor Set toggle → Show unexpected neighbours (out-of-industry brands)

    <aside> 🎯

    What to look for when using the Atlas. The most interesting insights come from disagreements between sensors. If a brand’s BAV dot and TikTok dot are far apart, that’s a signal: the structured survey perception (what people say when asked directly) differs from the organic social perception (what people actually talk about). The Brand vs Content Effect tab adds another layer — when you hide the brand name from social content, does the perception shift? If so, the brand’s reputation is doing heavy lifting independent of the product itself.

    </aside>

    6. Domain-Specific Mechanics

    6.1 Why BAV is “ground truth”

    Survey-based, 48 structured imagery attributes, 30+ years of longitudinal data, 12 demographic segments. BAV captures deep-seated beliefs shielded from daily social flux: social algorithms change by the day, while trait grids collected over decades record stable, structured cognitive associations that current hype cycles barely touch.

    <aside> 📖

    For non-specialists: WPP Brand Asset Valuator® (BAV) is one of the largest brand research databases in the world, maintained by WPP. It works by asking thousands of consumers to rate brands on 48 specific attributes — things like “Helpful,” “Innovative,” “Trustworthy” — scored on a numeric scale. Because the same questions are asked year after year across demographic segments, BAV gives you a stable, structured snapshot of how people think about a brand when prompted. Social media gives you what people spontaneously say. These are fundamentally different signals, and combining them is the core challenge of this project.

    </aside>

    6.2 The BAV Alignment Problem — A Full Account

    This is the single most important section of this walkthrough. It documents the central technical challenge of the Atlas and the iterative process that solved it. If you only read one section, read this one.

    The problem

    When the Atlas was first built, the UMAP perception map split cleanly down the middle: all BAV dots on the left, all social dots on the right — regardless of whether they described the exact same brand. Nike’s BAV point and Nike’s TikTok point would be in completely different regions of the map. This made the entire visualization useless for cross-platform comparison.

    The separation was not evidence that BAV survey data captures genuinely different brand perceptions from social media. It was a methodological artefact caused by two compounding issues in the pipeline.

    Root cause: different text domains fed to the same embedding model

    All platforms are embedded with the same model (gemini-embedding-001), but the text being embedded is fundamentally different in style, vocabulary, and structure:

    | Platform | Brand_Perception_Report content | Source |
    | --- | --- | --- |
    | BAV | LLM-generated narrative from 48 numerical imagery sensors, e.g. “Helpful (45.2), Reliable (38.1)…” → Gemini prompt → prose paragraph | analysis.py, preprocess.py |
    | TikTok / Instagram | LLM-generated perception report from watching a single video/post | preprocess.py |
    | Wikipedia | LLM-generated perception from Wikipedia article text | preprocess.py |
    | LLM (Gemini) | Direct LLM perception (Gemini asked “what do you think of brand X?”) | Same as Wiki source file |

    The BAV text originates from a double LLM transformation: raw survey numbers → generate_semantic_statement() (structured string like “Full BAV Imagery Profile (48 Sensors): Helpful (45.2), Reliable (38.1), …”) → Gemini prompt → narrative paragraph. The social text comes from a single LLM step interpreting raw video/post content directly.

    This means the embedding model sees completely different linguistic distributions for BAV vs social. The BAV narratives share a common templated style (always referencing “imagery sensors,” “quantitative data,” survey language) while social narratives use informal, media-oriented language. Embedding models encode how something is said as much as what is said, so this systematic style difference pushes BAV vectors into a distinct cluster regardless of actual brand perception agreement.

    Compounding factor 1: aggregation asymmetry

    In aggregate.py, social data (TikTok, Instagram) has multiple post-level embeddings that are mean-averaged per (Brand, Platform), and a Gemini summary replaces the report text. This smoothing pulls social vectors toward a shared centroid. BAV / Wikipedia / Gemini data (df_research) passes through as-is with post_count = 1 — no averaging occurs.

    The result: social embeddings are inherently more “central” (mean-regression effect), while BAV embeddings retain their full individual variance. When UMAP runs on this combined set, the social vectors cluster tighter and the un-averaged BAV vectors spread out differently.

    Compounding factor 2: UMAP sees the domain gap

    UMAP is run on the entire combined dataset with n_neighbors=15. Because BAV embeddings share a systematic style signature different from social embeddings, UMAP’s neighborhood graph naturally groups them apart — it finds the text-style cluster, not a genuine perception cluster.

    What we tried (in order)

    <aside> 🔬

    Option A — Normalize the text domain (first attempt). Rewrote the BAV perception report generation prompt in analysis.py to produce output that mimics the style of a social/Wikipedia perception report. Specifically: removed references to “BAV,” “imagery sensors,” “quantitative data” from the prompt. Used the same persona/format instructions as the social platform reports — describing what the brand feels like rather than referencing the data source. The prompt became: “Based on the following consumer perception data, write a concise paragraph describing how this brand is perceived by consumers. Focus on: overall vibe, what people praise, what people criticize, and who the typical customer is. Write as if describing public perception — do not reference the data source or format.”

    Result: Reduced but did not eliminate the BAV/social separation. The templated numerical origin still leaked through in subtle ways.

    </aside>

    <aside> ✅

    Option B — Procrustes alignment (the solution). Use scipy.linalg.orthogonal_procrustes to align the BAV embedding subspace to the social subspace before combining. This preserves within-platform structure while removing the cross-platform domain shift. This is what the pipeline uses today.

    </aside>

    Option C (embed a standardized perception schema across all platforms) remains a potential future improvement but requires significantly more work.

    <aside> ❌

    Option D — Per-platform z-score normalization (ruled out). Apply per-platform z-score normalization to embedding vectors before UMAP, centering each platform’s distribution to zero-mean and unit-variance. This would remove the systematic offset but also mask any genuine platform-level differences — making it a workaround, not a proper fix. Discarded.

    </aside>

    How Procrustes alignment works in the pipeline

    <aside> 📖

    For non-specialists: Procrustes alignment is a mathematical technique from shape analysis, named after a figure in Greek mythology who stretched or cut people to fit his bed. In our context, it “stretches” one cloud of data points to best overlap with another. Critically, it only uses rotation (spinning) and scaling — it doesn’t distort the internal relationships between points within each cloud. So the relative positions of BAV brands among themselves are preserved, but the entire BAV cloud is repositioned to overlap with the social cloud.

    </aside>

    Here’s exactly what the code in aggregate.py does:

    1. Finds the anchors. Identifies every brand that exists in both the BAV data and the social data (e.g., Nike exists in both). In the production pipeline log, this found 202 anchor brands.
    2. Computes the transformation. For each anchor brand, computes the mean social vector across all its social platforms. Centers both the BAV anchor vectors and the social anchor vectors. Then uses scipy.linalg.orthogonal_procrustes to find the optimal rotation matrix R that maps BAV → social.
    3. Applies the rotation. Multiplies all BAV vectors (even brands that didn’t have social data) by this rotation matrix, moving the entire BAV dataset into the social media spatial domain.
    4. Logs quality. Reports the Frobenius residual so alignment quality is traceable.
    5. Safeguard. If fewer than 10 shared brands exist, alignment is skipped with a warning — Procrustes is unreliable with too few anchors.
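    The steps above can be sketched with scipy.linalg.orthogonal_procrustes. This is a minimal illustration, not the actual aggregate.py code — in particular, the anchor-brand matching and the fewer-than-10-anchors safeguard are omitted:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_bav_to_social(bav_anchors: np.ndarray, social_anchors: np.ndarray,
                        bav_all: np.ndarray) -> tuple[np.ndarray, float]:
    """Rotate the full BAV embedding set into the social subspace.
    Rows of bav_anchors and social_anchors correspond to shared brands.
    Returns the rotated vectors and the Frobenius residual."""
    # Center both anchor clouds, as in step 2 above
    bav_c = bav_anchors - bav_anchors.mean(axis=0)
    soc_c = social_anchors - social_anchors.mean(axis=0)
    # Optimal orthogonal rotation R minimizing ||bav_c @ R - soc_c||_F
    R, _ = orthogonal_procrustes(bav_c, soc_c)
    residual = float(np.linalg.norm(bav_c @ R - soc_c))
    # Step 3: apply R to every BAV vector, anchors or not
    rotated = (bav_all - bav_anchors.mean(axis=0)) @ R + social_anchors.mean(axis=0)
    return rotated, residual
```

    Because R is orthogonal, within-BAV distances are untouched; only the cloud's orientation and position change — which is exactly why the alignment preserves internal structure.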

    How this changes interpretation of the dashboard

    This alignment profoundly upgrades what you can conclude from the perception map:

    • True cross-platform comparisons. If a BAV dot and a TikTok dot for “Adidas” sit right next to each other, it now genuinely implies that the core sentiment in the structured survey data closely matches the organic social conversations. Before Procrustes, proximity between BAV and social points was meaningless.
    • Distances have semantic meaning. By forcibly removing the structural domain shift, any remaining distance between two points is entirely due to a difference in meaning and perception. If the BAV point for a brand is far from its Instagram point, you can confidently analyze that gap as a genuine difference in audience perception or marketing strategy — not just an artefact of different text formatting.
    • Unified clustering. When HDBSCAN runs over this aligned space, it can finally cluster BAV reports together with social reports. The LLM-generated theme labels now encompass insights drawn from both quantitative surveys and viral videos simultaneously.

    6.3 UMAP parameter sensitivity

    min_dist=0.1, n_components=2, metric='cosine'. Balanced subset fit: only 'All Adults' BAV slice is fitted alongside social platforms, preventing 12 BAV audiences from overpowering topology. UMAP was chosen over t-SNE because it enables saving the reducer object — the pipeline strictly fits on one balanced subset and passively transforms newly injected demographics.

    <aside> ⚠️

    Watch out: Changing min_dist has outsized effects on the map. Lower values (e.g., 0.01) create tighter, more separated clusters — visually dramatic but can split genuinely related brands. Higher values (e.g., 0.5) spread everything into a uniform blob. The current 0.1 was chosen as a balance after visual inspection across multiple brand sets. If you change it, re-check whether brands with known perceptual similarity (e.g., Coca-Cola and Pepsi) still land in the same neighborhood.

    </aside>
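    The fit-on-subset / transform-the-rest split is what makes the balanced-subset strategy possible. The sketch below uses scikit-learn's PCA as a stand-in, since it exposes the same fit/transform API; with umap-learn installed you would substitute UMAP(n_components=2, min_dist=0.1, metric='cosine', random_state=...) and the rest is unchanged. Shapes and data here are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
balanced_subset = rng.normal(size=(40, 8))   # social + 'All Adults' BAV slice
extra_segments = rng.normal(size=(11, 8))    # remaining 11 BAV demographics

# Fit the reducer ONLY on the balanced subset...
reducer = PCA(n_components=2).fit(balanced_subset)
fitted_2d = reducer.transform(balanced_subset)
# ...then passively project the extra segments without refitting,
# so they cannot reshape the learned topology.
passive_2d = reducer.transform(extra_segments)

assert fitted_2d.shape == (40, 2) and passive_2d.shape == (11, 2)
```

    This is also why UMAP beat t-SNE here (DD-BPA-1): t-SNE has no transform step for new points.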

    6.4 Cluster labelling

    Automated via LLM. Vertex AI Batch submits each cluster’s centroid + top 5 reports (by cosine similarity) to Gemini, which must return exactly 3 words. The 3-word constraint forces abstraction — “Hope Innovation Compassion” rather than a paragraph. If labels feel wrong, the issue is almost always that the cluster itself is incoherent (check HDBSCAN’s min_cluster_size), not that the LLM mislabelled it.

    6.5 Omnichannel consistency

    100.0 - (mean Euclidean distance * 35.0), clamped 0–100%. Tight overlaps hit 99%+ (John Deere, Caterpillar), dispersed shifts drop fast (Marriott, American Airlines). The * 35.0 multiplier is an empirically tuned scaling factor — if you add new sensors or change the embedding model, you may need to recalibrate it so scores distribute meaningfully across 0–100%.

    6.6 Content vs Brand Effect methodology

    Tracks shift_2d (Euclidean magnitude), cos_shift (cosine diff between brand known/brand unknown), and BAV baseline deltas (bav_delta_known vs bav_delta_unk). Parses exact LLM cluster words added/lost due to brand awareness.

    <aside> 💡

    Why this matters. Social media perception reports are generated from video/post content. When the brand name is visible, the LLM’s perception is colored by everything it “knows” about that brand. When the brand name is hidden, the LLM can only react to what it actually sees in the content. The delta between these two tells you how much of a brand’s social perception is driven by brand reputation vs actual content quality. Large shifts indicate the brand name is doing heavy lifting.

    </aside>
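    Assuming cos_shift means 1 minus cosine similarity (the “cosine diff” above), the two headline metrics reduce to a few lines of numpy. Function names mirror the metric names from the section; the implementations are a plausible reading, not the pipeline's exact code:

```python
import numpy as np

def shift_2d(known_xy: np.ndarray, unknown_xy: np.ndarray) -> float:
    """Euclidean magnitude of the move on the 2D perception map."""
    return float(np.linalg.norm(known_xy - unknown_xy))

def cos_shift(known_vec: np.ndarray, unknown_vec: np.ndarray) -> float:
    """1 - cosine similarity between brand-known and brand-unknown
    embeddings; 0 means hiding the name changed nothing."""
    cos = known_vec @ unknown_vec / (
        np.linalg.norm(known_vec) * np.linalg.norm(unknown_vec))
    return float(1.0 - cos)
```
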

    7. Module Map

    The codebase is intentionally small. Every module does one thing. If you’re debugging, start by identifying which phase failed (check the CLI output), then go straight to the corresponding file.

    brand_perception/dashboard/atlas_pipeline/
    ├── main.py              (43 lines)    CLI entry point
    ├── dashboard_v1.py      (~1133 lines) Core Streamlit frontend
    └── src/pipeline/
        ├── preprocess.py    (130 lines)   Sanitizes, normalizes, filters into LanceDB schemas
        ├── ingest.py        (126 lines)   GenAI models → 768-D embeddings
        ├── aggregate.py     (303 lines)   Procrustes, LLM reports, UMAP layouts
        └── cluster.py       (244 lines)   HDBSCAN groups + batch cluster labels
    

    Total: ~1,979 lines across 6 modules.

    <aside> 📖

    Where the complexity lives. Don’t let the line counts fool you. aggregate.py at 303 lines is where 80% of the intellectual difficulty sits — it handles social aggregation, Procrustes alignment, and UMAP projection. dashboard_v1.py at ~1,133 lines is the largest file but is mostly Streamlit layout code. If you’re onboarding, read aggregate.py first; it’s where the science meets the engineering.

    </aside>

    8. Test Coverage

    ~6 functional / integration tests. Runtime <1 minute.

    | Test file | Purpose | Count |
    | --- | --- | --- |
    | scripts/test_bav_pipeline.py | BAV baseline ingestion flow | 1 |
    | dev/test_apify* | Social Scraper (TikTok/Instagram) | 2 |
    | brand_perception/api/test_agent.py | Job Orchestrator backend queues | 2 |
    | research/test_scrape_jh.py | Manual methodology mockups | 1 |

    9. Design Decisions

    These aren’t just “decisions” — they’re the answers to questions that came up during development where the wrong choice would have broken the system or made it useless. Each one has a story.

    | ID | Decision | Rationale | What happens if you reverse it |
    | --- | --- | --- | --- |
    | DD-BPA-1 | UMAP over t-SNE | Better distance proportionality; enables saving reducer to fit on balanced subset and transform new demographics | t-SNE can’t transform new points — you’d have to re-run the entire projection every time a new BAV demographic segment is added, and distances between clusters become meaningless |
    | DD-BPA-2 | BAV as ground-truth anchor | Survey-based trait grids over decades, shielded from daily social hype | Using social as ground truth would anchor the map to volatile, algorithm-dependent signals — the map would shift with every TikTok trend cycle |
    | DD-BPA-3 | Semantic embeddings over keywords | Captures meaning (“Luxury” ≈ “Premium” ≈ “Prestigious”) | Keyword-based approaches treat “Luxury” and “Premium” as unrelated tokens — brands described with different vocabulary but identical perception would never cluster together |
    | DD-BPA-4 | Procrustes alignment | Solves text heterogeneity (surveys vs social) via rotation on anchor brands. Prompt normalization alone (Option A) reduced but did not eliminate the domain gap | Without it, the map splits by text style (BAV left, social right) rather than by brand perception — see Section 6.2 for the full account |
    | DD-BPA-5 | Two-mode public release (default no-keys + advanced) | Lowers barrier to entry; toy dataset enables immediate exploration without infrastructure | Requiring API keys upfront would prevent most people from ever trying the tool — the toy dataset lets someone see the full Atlas UI in under 5 minutes |

    10. Extending the System

    The pipeline was designed to be extended — each of these is a realistic next step, not a hypothetical. They’re listed roughly in order of effort.

    1. Run with custom data — Format your own dataset as CSV matching the toy dataset schema (Brand, Industry, Platform, Survey_Audience, Brand_Perception_Report), drop into data directory, run default mode. This is the zero-effort way to test the Atlas on a new domain.
    2. Add a sensor — Collect as CSV/Parquet, import in preprocess.py under FINAL_COLUMN_ORDER (Super_Platform, Year, Brand, Raw_Text), run main.py. The Procrustes alignment will automatically include the new sensor in its anchor calculation if the new sensor shares brands with existing sources.
    3. Add a market — Reroll BAV datasets in ./paths, override GCS env vars, rerun batches. Note that Procrustes alignment quality depends on having enough shared anchor brands between BAV and social data — if you enter a market where BAV coverage is thin, check the Frobenius residual in the logs.
    4. Temporal trackingLanceDB already stores Year and BAV_Study; add slide toggle in dashboard_v1.py. This would let you see how a brand’s perception drifts over time across sensors — one of the most requested features.

    <aside> ⚠️

    If you add a new sensor: remember that the Procrustes alignment currently rotates BAV into the social subspace specifically. If your new sensor has a similarly distinct text style (e.g., Reddit comments vs TikTok captions), you may see a new domain gap. In that case, consider extending the alignment step to handle multiple source-target pairs, or grouping sensors into “formal” and “informal” categories for alignment.

    </aside>

    11. Results (End-to-End Validation)

    • Internal coverage: 200+ brands, 4,000+ data points across 5 modalities, 12 demographic segments
    • Public toy dataset: 20 brands, 681 perception reports across 7 industries, all 5 sensor types + brand known/brand unknown variants
    • Cross-industry insight validation:
      • Omnichannel consistency: John Deere, Caterpillar, Oscar Health at 99%+; Marriott, Southern Living, American Airlines identified as multi-faceted
      • Shared equity, different vibe: 3M ↔ Marriott (close on BAV, far on socials)
      • Different equity, shared vibe: General Mills ↔ Smuckers (far on BAV, converged on socials)
    • Validation metrics: Procrustes Residuals for subspace overlap + brand known/brand unknown cosine similarity differentials

    <aside> 💡

    How to read these results. The “shared equity, different vibe” and “different equity, shared vibe” patterns are the most commercially interesting findings. They reveal cases where a brand’s formal positioning (BAV) disagrees with its organic social presence — exactly the kind of insight that’s invisible to either data source alone. The Atlas’s value proposition is making these cross-modal disagreements visible and quantifiable.

    </aside>


    ✅ Quality Checklist

    | Gate | Criterion | Status |
    | --- | --- | --- |
    | Pre-execution | Required inputs present and substantive | |
    | | GitHub required? | ✅ (has_github = true) |
    | Post-execution | Part 1 is non-technical and self-contained | |
    | | Part 1 is 1–1.5 pages with before/after table | |
    | | Part 2 is a tutorial with concrete proof points | ✅ (768-D, 6 tests, line counts, UMAP params, consistency formula) |
    | | No repetition between parts | |
    | | 2–3 day reproduction test | ✅ (~1,979 lines, clear CLI, extension recipes, toy dataset for immediate start) |
    | | GitHub is a funnel, not a mirror | ✅ Part 2 references README for install/deploy; walkthrough provides conceptual map |

    🔗 GitHub README Completeness Checklist

    | Requirement | Status |
    | --- | --- |
    | Description matches Part 1 summary | [TODO: verify once README is written] |
    | Install and deploy instructions tested — both modes (default no-keys + advanced with keys) | [TODO: test both paths] |
    | At least one usage example (running with toy dataset) | [TODO: add example] |
    | Toy dataset format documented (Brand, Industry, Platform, Survey_Audience, Brand_Perception_Report) | [TODO: add schema table] |
    | Instructions for using own dataset in same format | [TODO: add section] |
    | Public API or entry point docs (if applicable) | [TODO: if applicable] |
    | License specified | [TODO: choose and add license] |

    🔗 Repositories

    | Repo | Access | Status |
    | --- | --- | --- |
    | BitBucket (internal): bitbucket.org/satalia/brand_perception_agent | Internal (Satalia) | Active |
    | GitHub (public): [TODO: URL] | Public | In development |
  • The Brand Perception Atlas – Mapping the Modern Brand, From Social Signal to Core Equity

    Introduction

    In the digital age, a brand is no longer defined solely by what a company communicates about itself. Instead, it is shaped by what millions of people collectively believe it to be. These beliefs are expressed continuously through social media posts, product reviews, news coverage, and countless other digital interactions.

    For organizations attempting to understand brand perception, this environment presents a fundamental challenge. The volume of available data is unprecedented, yet the signals it produces are often inconsistent and contradictory. Traditional research tools such as surveys and focus groups remain essential for measuring brand equity, but they capture perception only at specific points in time and cannot fully reflect the fast-moving nature of digital conversation.

    At the same time, social media offers a continuous stream of public commentary, revealing how brands are discussed, interpreted, and compared in everyday discourse. However, these signals are noisy and difficult to interpret in isolation.

    This tension motivated the development of a Brand Perception Atlas, designed to integrate diverse perception signals into a unified analytical framework. By combining social media data, public knowledge sources, Large Language Model (LLM) summaries, and established brand equity research, the atlas aims to provide a more comprehensive understanding of how brands are perceived across the digital ecosystem.

    The first iteration of this project analyzed perception signals for more than 200 brands and over 4,000 individual data points, enabling the construction of a visual representation of brand perception that reveals relationships, consistencies, and divergences across multiple sources.

    The Brand Perception Atlas

    The Brand Perception Atlas functions as a navigational system for brand strategy, much as a satellite navigation map helps a pilot understand the terrain below. A pilot does not see individual roads or buildings from the cockpit. Instead, they rely on a map that reveals patterns, landmarks, and relationships across the entire landscape.

    Brand leaders face a similar challenge. Individual data points (social posts, reviews, survey responses) are like isolated coordinates. On their own, they reveal very little. But when thousands of signals are plotted together, patterns begin to emerge.

    The Brand Perception Atlas converts these signals into a shared perceptual map, allowing leaders to see how brands cluster, where they compete, and which meanings they occupy in the public imagination.

    To achieve this, the atlas synthesizes perception signals from several sources:

    • Content from official brand accounts on platforms such as TikTok and Instagram
    • Public narratives reflected in sources like Wikipedia
    • AI-generated summaries, using Gemini, describing how brands are perceived in digital discourse
    • Survey-based brand equity data from the WPP Brand Asset Valuator® (BAV), which anchors the analysis in long-term brand perception

    Figure 1 illustrates the Brand Perception Atlas and the different data sources used. By integrating these sources, the atlas creates a more holistic representation of brand perception than any individual data source can provide alone.

    Figure 1: Data sources flowing into the Brand Perception Atlas

    The actual data from our sensors for two of the world’s most recognizable brands, Target and Amazon, are outlined in Table 1. By presenting the actual reporting generated by each source, we can see how the Brand Perception Atlas bridges the gap between decades of structured consumer research and the lived experience of these brands in digital spaces.

    Sample Brand Perception Reports
    | Source | Target | Amazon |
    | --- | --- | --- |
    | BAV Survey (long-term equity) | “Consumers perceive this brand as a highly accessible and dependable choice, offering excellent value for money. It consistently earns praise for its reliable, high-quality, and original offerings… Seen as a dynamic leader that stays up-to-date and trendy.” | “Broadly perceived as an innovative and reliable leader, offering high quality and exceptional value. Consumers praise its distinctiveness and original thinking… It resonates with intelligent, forward-thinking individuals.” |
    | Wikipedia (public narrative) | “A major American retail chain… positioned as offering upscale products at below-average costs, appealing to a younger, more educated, and higher-income demographic… noted for its significant market presence.” | “Perceived as a global technology behemoth recognized for innovation and market leadership… However, its perception is tempered by persistent discussions regarding labor practices and its influence on traditional retail.” |
    | Gemini LLM (digital discourse) | “Target cultivates an ‘upscale-for-less’ reputation… It’s perceived as a clean, organized, and pleasant one-stop-shop that blends everyday necessities with trendy, affordable finds.” | “A study in contrasts, embodying both indispensable convenience and controversial corporate power. The vibe is one of a modern utility, deeply integrated into daily life, often viewed as a ‘necessary evil’.” |
    | Instagram (brand official account) | “An accessible and enjoyable retail destination… The playful and organized presentation of shopping reinforces a positive, discovery-driven customer experience, emphasizing Target’s ability to host unique product lines.” | “Viewed as an essential, incredibly convenient, and reliable platform offering immense selection with unparalleled speed. Users celebrate its efficiency while simultaneously expressing a desire for greater focus on ethical concerns.” |
    | TikTok (brand official account) | “Widely seen as a trendy, accessible retailer offering stylish, curated products and collaborations that enable a joyful lifestyle… captures a highly visual and energetic side of the brand.” | “Projects a vibe of convenience, accessibility, and an expanding reach into curated wellness and lifestyle products… making niche, high-quality products feel easily attainable.” |

    Table 1: Sample brand perception reports, for Target and Amazon, from the Brand Perception Atlas dataset

    Developed by WPP, BAV is one of the largest and longest-running brand equity research programs in the world, with data collected over more than three decades across thousands of brands and markets. Unlike social media and digital sources, which provide indirect signals by analyzing what people post or what algorithms curate, BAV is survey-based and taps directly into human sentiment. By asking consumers directly, it captures deep-seated beliefs through structured imagery attributes such as innovative, reliable, and trustworthy.

    Within the Brand Perception Atlas, BAV serves as the foundational benchmark. It represents the long-term equity and established identity of the brand, providing a stable baseline against which the more volatile, “everyday” signals of social media can be compared. This distinction is crucial: while social media shows what people are talking about today, BAV reveals what they fundamentally believe about the brand.

    To provide a granular view of brand personality, the BAV system evaluates each brand against a standardized set of 48 Imagery Attributes. These attributes are the “building blocks” of perception, ranging from functional traits like Reliable, High Quality, and Simple to emotional and aspirational markers such as Charming, Daring, and Sensuous. By measuring a brand against this consistent list, the system can pinpoint exactly where its personality “peaks”, whether it is anchored in the “sturdiness” of a traditional leader or the “energy” of a modern disruptor.

    In the Brand Perception Atlas, these 48 attributes act as a high-resolution lens. Rather than simply stating a brand is “liked,” the data reveals the specific nuances of why it is liked. For example, a brand may score highly on Intelligent and Visionary while scoring low on Friendly or Down to Earth. This multidimensional data allows the Atlas to plot brands with extreme precision, identifying “perceptual neighbors” who share the same underlying DNA across these 48 distinct dimensions of human sentiment.
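To make “perceptual neighbors” concrete, here is a minimal sketch of nearest-neighbor lookup over attribute profiles. Everything below is illustrative: the brands are hypothetical, only five attributes are shown rather than the full 48, and the scores are invented rather than real BAV data.

```python
from math import sqrt

# Toy BAV-style imagery scores (0-100). Brands, attribute subset, and
# numbers are all invented for illustration; the real system uses the
# full set of 48 attributes.
ATTRIBUTES = ["Reliable", "Intelligent", "Visionary", "Friendly", "Daring"]

profiles = {
    "BrandA": [90, 85, 80, 40, 30],  # competence-led personality
    "BrandB": [88, 82, 78, 45, 35],  # very similar competence-led profile
    "BrandC": [50, 40, 35, 90, 85],  # warmth/energy-led personality
}

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def perceptual_neighbor(brand):
    """The other brand whose profile is most similar across all attributes."""
    others = [b for b in profiles if b != brand]
    return max(others, key=lambda b: cosine(profiles[brand], profiles[b]))

print(perceptual_neighbor("BrandA"))  # BrandB: closest on the attribute "DNA"
```

Two brands can score well on the same overall favorability yet sit far apart here, because the comparison runs over the whole profile at once.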

    Mapping Brand Perception

    At the center of the Brand Perception Atlas lies the Perception Map, a visual representation of how brands relate to one another in terms of public perception.

    Each perception signal is converted into a numerical representation of semantic meaning, known as an embedding. It is helpful to think of semantic meaning as the “vibe” or “essence” of a word, rather than just the word itself. In a traditional search, if you looked for “Luxury,” a computer might ignore a result for “High-end” because the letters don’t match. However, the Brand Perception Atlas uses semantic embeddings to understand that “Luxury,” “Premium,” and “Prestigious” all mean roughly the same thing. Because their meanings are similar, the system places brands described with these words in the same neighborhood on the map. It maps the ideas behind the words, not just the text.
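A minimal sketch of the idea, using hand-crafted toy vectors that stand in for learned text embeddings (in the real pipeline these would come from an embedding model): cosine similarity places “luxury” near “premium,” even though a literal string match would miss the connection.

```python
from math import sqrt

# Hand-crafted 3-D toy vectors standing in for real, high-dimensional
# text embeddings. The numbers are invented purely to show the geometry.
vectors = {
    "luxury":      [0.90, 0.80, 0.10],
    "premium":     [0.85, 0.75, 0.15],
    "prestigious": [0.80, 0.90, 0.10],
    "budget":      [0.10, 0.20, 0.95],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Semantically close terms score high; unrelated terms score low, so the
# map places "luxury" and "premium" brands in the same neighborhood.
print(round(cosine(vectors["luxury"], vectors["premium"]), 3))
print(round(cosine(vectors["luxury"], vectors["budget"]), 3))
```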

    This semantic translation approach is shared across the Satalia AI Research Lab. The Ad Performance Pod applies a complementary technique, using multimodal LLMs to convert raw ad creatives (images, video, and metadata) into structured text descriptions, and then predicting post-level engagement from those descriptions. Where the Atlas asks “what does this brand mean?”, the Ad Performance Pod asks “what makes this content perform?” Both rely on the principle that rich semantic embeddings can unlock patterns invisible to traditional feature engineering.

    These embeddings are then plotted within a shared space. In this space, the distance between points reflects the similarity of the associations surrounding different brands. Brands with similar perceptions appear close together, while those associated with different ideas occupy distinct regions.

    The resulting map resembles a landscape of brand meaning, allowing brand leaders to observe clusters of brands that share common associations and identify outliers that occupy unique perceptual positions.
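As a toy illustration of clusters and outliers, assume a handful of brands whose embeddings have already been projected to 2-D coordinates (all names and positions below are invented): the point with the largest average distance to the others occupies the most unique perceptual position.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Toy 2-D map coordinates, as if high-dimensional embeddings had already
# been projected onto the plane. Brands and positions are illustrative.
coords = {
    "BrandA":  (0.0, 0.0),
    "BrandB":  (0.2, 0.1),
    "BrandC":  (0.1, 0.3),
    "Outlier": (4.0, 4.0),
}

def mean_distance(name):
    """Average distance from one brand to every other brand on the map."""
    others = [p for n, p in coords.items() if n != name]
    return sum(dist(coords[name], p) for p in others) / len(others)

# BrandA-C form a tight cluster of shared associations; the brand that is
# farthest from everyone else holds a unique perceptual position.
most_isolated = max(coords, key=mean_distance)
print(most_isolated)  # Outlier
```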

    Figure 2: Perception map for Nike across all platforms (US, 2025-26)

    When a brand is selected within the Perception Map interface, as shown in Figure 2, the system reveals a rich, multi-layered profile that visualizes how that brand lives in the public imagination. The interface acts as a command center, surfacing the distinct signals captured by each sensor to provide a 360-degree view of the brand’s identity.

    Once a brand like Nike is selected, the map highlights its specific coordinates across all platforms, while the sidebar populates with synthesized insights including the brand’s industry, country, and a “Cluster Label”, for example “Hope Innovation Compassion”, which summarizes the overarching sentiment. Users can navigate the Perception Signal Summaries to see the unique reports for each platform, from the foundational BAV sentiment to real-time narratives on TikTok and Instagram, allowing for an immediate comparison of how the brand “sounds” in social discourse versus how it is “valued” in long-term surveys.

    Insights from the Unified Perception Map

    Once the perception spaces were aligned, several patterns became visible:

    1. Omnichannel Consistency (Unified Brands) vs Heterogeneous Perception (Diverse Brands): Unified brands have the most tightly clustered perception signals, looking the same regardless of which sensor you use; diverse brands are dispersed, presenting different facets of their identity depending on where you look.
    2. Shared Equity, Different Vibe (Close on BAV, Far on Socials): These brands share deep-seated consumer beliefs but look different on TikTok/Instagram.
    3. Different Equity, Shared Vibe (Far on BAV, Close on Socials): These brands come from different worlds but converge into a single “vibe” in the public imagination.
    4. Invisible Infrastructure Brands: These foundational entities occupy a shared emotional territory of essentiality and high-stakes reliability, despite having radically different levels of public visibility.

    These insights demonstrate the value of examining brand perception across multiple data sources rather than relying on any single indicator.

    Omnichannel Consistency (Unified Brands) vs Heterogeneous Perception (Diverse Brands)

    One of the most useful insights derived from the perception map is a metric we refer to as Omnichannel Consistency. This measure evaluates how closely aligned a brand’s perception is across different information sources. If the signals derived from social media, surveys, and public narratives cluster tightly together, the brand is communicating a coherent identity. Conversely, if these signals are widely dispersed, the brand is presenting a more varied, multifaceted identity across channels.

    Analysis of the dataset revealed several brands with particularly strong consistency across channels, including John Deere, Caterpillar, and Oscar Health, each with over 99% omnichannel consistency.
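The article does not spell out how the consistency score is computed, but one plausible formulation, sketched here with invented 3-D vectors, is the mean pairwise cosine similarity among a brand’s per-sensor embeddings: tightly clustered signals score near 1, dispersed signals much lower.

```python
from itertools import combinations
from math import sqrt

# Toy per-sensor embeddings for two hypothetical brands. Real embeddings
# would be high-dimensional; these 3-D vectors are invented to illustrate.
unified = {      # signals cluster tightly, like John Deere's
    "BAV":       [0.90, 0.10, 0.20],
    "Instagram": [0.88, 0.12, 0.22],
    "TikTok":    [0.91, 0.09, 0.19],
}
dispersed = {    # signals point in different directions
    "BAV":       [0.90, 0.10, 0.20],
    "Instagram": [0.10, 0.90, 0.30],
    "TikTok":    [0.20, 0.30, 0.90],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def consistency(signals):
    """Mean pairwise cosine similarity across a brand's sensor embeddings."""
    pairs = list(combinations(signals.values(), 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

print(round(consistency(unified), 3))    # near 1.0: unified identity
print(round(consistency(dispersed), 3))  # much lower: dispersed identity
```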

    John Deere serves as the definitive benchmark for Omnichannel Consistency, maintaining one of the most stable identities in the Brand Perception Atlas. While many brands act like chameleons, shifting their personality to mirror the fleeting trends of each platform, John Deere operates as a perceptual anchor, where its core essence of “rugged reliability” remains unchanged whether viewed through a 30-year longitudinal BAV survey or a viral TikTok demonstration. Figure 3 demonstrates this pattern, showing all of the John Deere data points close together on the map.

    Figure 3: Perception map showing high omnichannel consistency for John Deere on all platforms (US, 2025-26)

    This “Nothing Runs Like a Deere” factor is so powerful that even complex public debates, such as the “Right to Repair” controversy, appear with remarkable alignment across all five sensors, reinforcing the brand’s status as an essential, high-stakes infrastructure provider for its community. For leadership, this proves that a sufficiently clear perceptual identity pulls every digital channel toward a single, unified truth.

    As shown in Figure 4, the brand’s actual digital content reinforces this stability by bridging the gap between historical heritage and modern utility. The Brand Perception Report for this specific Instagram post highlights a reputation for “practical innovation,” where powerful, reliable machinery transforms arduous tasks like snow removal into efficient, even enjoyable endeavors. The data reveals a narrative centered on problem-solving and the ultimate reward of reclaimed time, positioning the brand as a premium, aspirational solution. While the content emphasizes the ease of machine efficiency over manual labor, it also subtly reflects the “significant investment” required for such equipment, a sentiment that aligns perfectly with the “rugged yet refined” and “worth more” attributes captured by the BAV survey sensors.

    Figure 4: Screenshot of a John Deere Instagram video showing snow removal equipment

    In contrast, several brands demonstrated much greater variation in perception across sources. These included Marriott, Southern Living, and American Airlines, whose signals appeared more widely distributed across the perception map.

    Marriott provides a contrasting example of a multi-faceted brand that successfully occupies distinct perceptual regions depending on the sensor used. The BAV sensor positions the brand within a territory of reliability, superior quality, and prestigious appeal, where consumers fundamentally view Marriott as an intelligent, high-performing, and traditional leader. However, the social sensors (Instagram and TikTok) pull the brand into a separate region defined by aspirational luxury and curated escapism. In these digital spaces, Marriott’s intended portrayal is human-centric and inclusive, with official reports describing an organization that “champions individuals” and demonstrates “deep cultural understanding.” This is brought to life through actual content featuring stories of people, such as the one shown in Figure 5, an Instagram post highlighting Chef Kashif Khan using sign language to communicate at the St. Regis Mumbai. This mapping is diagnostic for leadership, revealing that Marriott is not a monolith; it is a reliable corporate giant that strategically uses social media to humanize its identity through stories of diversity, sustainability, and global connection.

    Figure 5: Screenshot of the Instagram video of Chef Kashif Khan using sign language

    Interestingly, multiple travel-related brands appeared in this lower-consistency group. The high volume of online discussion surrounding travel experiences, ranging from positive stories to customer complaints, may contribute to a more fragmented perception environment for brands in this sector.

    Perhaps the most critical lesson our dashboard taught us is that online buzz doesn’t automatically change what people fundamentally believe about a brand. Social media is a powerful megaphone, but it’s not always a mind-changer. It is easy to be distracted by conversation volume and the sheer amount of noise, visibility, and viral activity a brand generates on platforms like TikTok or Instagram. However, the Atlas looks past this “digital footprint” to uncover a brand’s equity: the deep-seated associations that define how a brand is understood by the public. When we look at these deeper signals, brands that seem worlds apart on social media often turn out to be close neighbors in the mind of the consumer.

    The perfect case study for this is Palo Alto Networks, a titan in the cybersecurity industry.

    If you were to judge them solely by their Instagram presence, you’d see a flawless picture of a market champion. The conversation is filled with words like “Reliable, Innovative, Leader.” It’s a polished, professional, and highly positive image, the brand broadcasting its strengths and amplifying its message of leadership.

    But our BAV data, the “ground truth” of long-term brand equity, built from years of customer experiences and market reputation, tells a more complicated story. The BAV data agrees on their high status, describing them as “Intelligent” and “Prestigious.” However, it adds a critical, and very human, third dimension: “Arrogant.” This single word is the key. The perception of “arrogance” isn’t something a few Instagram posts can erase. It’s a feeling built from countless interactions: sales calls, pricing negotiations, customer support tickets, and industry chatter. It’s a core part of their brand’s DNA, for better or worse.

    What the map shows is that the high volume of positive conversation on Instagram is successfully amplifying the “Leader” and “Innovative” message. However, it hasn’t fundamentally altered the deep-seated perception of arrogance. The social media buzz is happening on the surface, while the core brand equity, the good and the bad, remains firmly in place. It’s a powerful reminder that you can’t just tweet your way to a new reputation.

    In other words, social media can amplify a reputation, but it rarely rewrites it.

    For leadership, this mapping is diagnostic, offering a clear view of whether this dual identity represents a successful multi-segment strategy or an unintended drift from the brand’s core promise.

    It is important to note that omnichannel consistency is not inherently positive or negative. Some brands benefit from a tightly aligned identity across all platforms, while others thrive precisely because they present different facets of themselves in different contexts. For example, entertainment, fashion, and travel brands often generate highly diverse narratives online, reflecting the variety of experiences consumers have with them. In these cases, perceptual fragmentation may actually be a sign of adaptability and cultural relevance.

    The value of the metric is therefore diagnostic rather than prescriptive. It reveals where a brand sits on the spectrum between a unified or multi-faceted perception ecosystem, allowing leaders to determine whether that specific pattern aligns with their strategic intent. For some, a tightly clustered, unified identity is the goal; for others, a more distributed, multi-faceted presence across different platforms is a deliberate strategy to reach diverse audiences.

    Shared Equity, Different Vibe (Close on BAV, Far on Socials)

    The Atlas also reveals unexpected relationships between brands that operate in entirely different industries. The most striking examples of the “Perception Paradox” occur when two brands share an identical equity backbone but present entirely different faces to the digital world.

    Consider 3M (originally the Minnesota Mining and Manufacturing Company), a global diversified technology conglomerate that applies material science to develop over 60,000 products for industrial, worker safety, healthcare, and consumer markets. At its core, 3M functions as a science-driven “innovation engine.” The company is famous for its “15% Rule,” which allows employees to spend a portion of their time on passion projects, a culture that famously led to the invention of the Post-it Note.

    Then consider Marriott, a premier global hospitality company that operates, franchises, and licenses a vast and diverse portfolio of hotels and lodging facilities.

    On social media, these brands live in completely different worlds. 3M’s presence is defined by science and industrial innovation, while Marriott trends through a lens of “aspirational luxury” and global travel. If you judge only by social footprint, you would assume these brands occupy different cultural spaces; indeed, on social sensors, they move to opposite ends of the map, as 3M focuses on the mechanics of research and Marriott on the spirit of leisure.

    Figure 6 illustrates this social-layer distinction for 3M. This video is just one example of the content driving an overall Instagram vibe characterized as Innovative, Responsible, and Sustainable. The brand perception report for this post captures an aspirational narrative of “innovation, education, and community.” By highlighting initiatives like the “3M Young Scientist Challenge,” the brand is perceived as a champion of scientific exploration, fostering a “fun” and “inspiring” platform for young people to solve real-world problems. While this specific content positions 3M as a driver of STEM education, the Atlas reminds us that beneath this vibrant digital layer lies the same sturdy, reliable identity that makes it a perceptual neighbor to a hospitality giant like Marriott.

    However, when we look at the BAV sensor, they appear as surprisingly close neighbors. The reason becomes clear when you look at the deeper associations: both brands are anchored by Reliability and Leadership. In the foundational mind of the consumer, 3M is an “innovation backbone,” while Marriott is a “service backbone.” While their social streams look nothing alike, their underlying brand equity is rooted in being an essential, dependable institution.

    Figure 6: Screenshot of 3M Instagram video highlighting the Young Scientist Challenge

    Now, consider Costco, a warehouse retailer known for bulk purchasing, and JetBlue, a major passenger airline. At first glance, these brands share little in common; however, when we look at the BAV sensor, they emerge as close neighbors within a shared “Consumer Champion” territory.

    On BAV imagery attributes, both brands are anchored by a core identity of being Friendly and Reliable. Costco is fundamentally perceived through a lens of Simplicity, while JetBlue is defined by its Value. In the foundational mind of the consumer, both brands occupy a similar emotional space: they are the “trusted alternatives” that provide high-quality essential services without the exploitative friction often associated with their respective industries.

    As shown in Figure 7, the interface allows us to see how these signals diverge as we move into the “Experience Layer” of social media. On Instagram, Costco remains tightly aligned with its BAV profile, with its digital presence reinforcing a narrative of affordability and value for money. JetBlue, however, shifts into a more aspirational territory on social platforms, emphasizing convenience and the lifestyle of travel.

    Figure 7: Screenshot of the Brand Perception Atlas showing Costco and JetBlue, with Costco selected as the focus brand

    This distinction is diagnostic for brand leaders. It shows that while two brands may share an identical “equity backbone” as consumer champions, their social expressions can vary significantly based on platform context: Costco doubles down on its “no-frills” value, while JetBlue leans into the aspirational allure of the journey.

    The result is that two brands from entirely different industries converge within the same emotional territory: brands that make consumers feel smart, respected, and well served.

    Different Equity, Shared Vibe (Far on BAV, Close on Socials)

    The Brand Perception Atlas also reveals an inverse pattern: brands that come from entirely different industries and share almost no long-term equity, yet converge into a single, unified “vibe” on social media. In these cases, the social layer acts as a cultural blender, pulling disparate corporate giants into the same neighborhood.

    Fox is a major American media conglomerate primarily known for its television broadcasting, news, and sports programming. It operates the Fox Broadcasting Company, Fox News, and Fox Sports. Its identity is built on high-stakes live events, character-driven drama, and real-time news reporting—a business that naturally scores high on “Dynamic” and “Charming” attributes but often generates polarizing sentiment.

    Kimberly-Clark is a global giant in personal care and hygiene products. It manufactures essential household brands such as Huggies diapers, Kleenex tissues, Cottonelle toilet paper, and Kotex feminine hygiene products. Its business is rooted in high-volume manufacturing and daily utility, leading to a long-term reputation for “Reliability” and “Practicality” that is very different from the world of mass media.

    When we look at the BAV sensor, Fox and Kimberly-Clark are worlds apart. Fox is perceived through a lens of high-stakes entertainment and sophistication, scoring highly on attributes like Charming, Sensuous, and Dynamic. Kimberly-Clark, meanwhile, is anchored in the “Practical Utility” of household essentials, defined by Reliability, Quality, and Trustworthiness.

    Figure 8: Screenshot of the Brand Perception Atlas showing Fox and Kimberly-Clark, with Fox selected as the focus brand

    To the average consumer, a polarizing media giant and a dependable tissue manufacturer should share no common ground. However, as shown in Figure 8, the Social Sensors reveal a convergence. On Instagram and TikTok, the distinction between a news broadcast and a household utility fades as both brands pivot toward lifestyle-centric storytelling.

    According to the social reports, Fox is perceived through a lens of “intense, character-driven drama” and high-stakes emotional depth, appealing to those who value mature storytelling. Simultaneously, Kimberly-Clark’s social presence is described as “authentic and engaging,” characterized by a “vibrant” and “charismatic” brand presence that uses humor and social commentary to create a “deeper connection beyond entertainment.” In this digital layer, Fox’s dramatic clips and Kimberly-Clark’s community-focused narratives occupy the same neighborhood. They are no longer judged as “The News” vs. “The Diaper Company”; instead, they are unified as Established Lifestyle Entities that use high-quality content to build emotional resonance with their audiences.

    Invisible Infrastructure Brands

    The Atlas also reveals a unique category of brands that operate as “Invisible Infrastructure.” These are entities that sit at the center of the perception map not because they are famous, but because they are foundational. One striking example is the proximity of Pfizer and Kyndryl.

    Kyndryl is one of the world’s largest IT infrastructure services companies, spun off from IBM in 2021. It manages the mission-critical technology backbone for hospitals, banks, airlines, and governments around the world. Most consumers have never heard its name. Its revenue is enormous; its consumer presence is almost zero. And yet, in our perception map, it sits directly next to Pfizer.

    Figure 9: Screenshot of the Brand Perception Atlas showing Kyndryl and Pfizer, with Kyndryl selected as the focus brand

    On the BAV survey, both brands score highly on imagery such as intelligent, reliable, progressive, and original, placing them far from lifestyle or aspirational brands. The depth of the BAV dataset further reveals that this “Expertise” profile is universally recognized across demographics. For example, both Male and Female audiences consistently identify these brands as Visionary and High-Performance leaders, regardless of their personal familiarity with the company’s day-to-day operations. Despite radically different visibility levels, they share the same underlying promise: critical systems do not fail because of us. The differences lie at the surface: Kyndryl has no social media presence, while post-Covid, Pfizer is perceived as controversial on the Wikipedia and Gemini sensors.

    Kyndryl, a brand name most consumers don’t recognize, sits directly next to one of the world’s most recognized pharmaceutical giants in perception space. Not because they sell similar products, but because they occupy identical emotional territory: the invisible infrastructure of modern life that you only notice when it fails. The data from our sensors provides strong evidence for this “Invisible Infrastructure” connection: despite their radically different levels of public fame, Pfizer and Kyndryl share a distinct perceptual fingerprint. The BAV data makes the reason for this proximity clear. Both brands are anchored by attributes associated with competence and essentiality rather than lifestyle appeal.

    The Gemini sensor perceives Kyndryl as a “Powerful and Reliable” entity that provides the backbone for critical systems, emphasizing its “indispensable” nature to global commerce. Pfizer, while highly recognized, is described through a lens of “polarization,” in which the brand’s image as a “scientific savior” clashes with that of a “profit-driven corporate giant.” The Wikipedia and Gemini reports specifically note that it is a “Big Pharma” entity whose “trust is a highly contested issue.”

    Even though one is a household name and the other is a ghost to the average consumer, the Atlas recognizes them as the same thing: the high-stakes machinery that keeps the modern world running. One is noticed because of its global news cycle; the other is only noticed when the “mission-critical backbone” fails.

    Conclusion

    Understanding brand perception has always been one of the central challenges of marketing and brand strategy. In the digital era, the difficulty lies not in the absence of information but in its overwhelming abundance. Organizations now face a paradox: more data than ever before, yet less clarity about what it means.

    The Brand Perception Atlas represents an effort to bring structure to this complexity. By integrating diverse perception signals, ranging from social media discourse to public narratives and LLM summaries, and anchoring them against the long-term ground truth of BAV brand equity data, the atlas makes it possible to observe how brand meaning evolves across different layers of public perception.

    This unified view reveals something important: the signals surrounding a brand are not random noise. They form patterns. Brands cluster around shared emotional territories, compete in perceptual spaces that often cut across industry boundaries, and reveal whether social conversation reinforces or diverges from long-standing brand equity.

    Rather than replacing traditional research methods, this approach extends them. It allows brand leaders to see not only what people say about brands today, but also how those conversations relate to the deeper beliefs that define brand meaning over time.

    In a world where social media amplifies attention but does not always reshape perception, the real strategic advantage lies in understanding the relationship between visibility and meaning. The Brand Perception Atlas offers a way to navigate that relationship. Not as a replacement for human judgment, but as a new map, one that helps brand leaders see where their brand truly sits in the landscape of consumer perception, and how that landscape is changing.

    Because in the end, the challenge is not simply measuring brand performance, it is understanding the territory of meaning that a brand occupies in people’s minds and learning how to navigate it.

    Learn more about WPP Brand Asset Valuator® (BAV) and how you can access this data for your own analysis.