Author: Andreas Stavrou

    From core interests to rabbit holes – an AI experiment

SocialAgents Pod builds autonomous AI personas that browse social platforms like real users, each driven by demographic and psychological profiles. Early experiments show recommendation algorithms quickly lock onto core interests, then probe adjacent topics – which agents often adopt. The goal is a scalable, auditable framework for studying how algorithmic exposure shapes beliefs.


    Team Members: Andreas Stavrou, Nikos Gkikizas

    Introduction

    What if an AI agent could experience the internet the way a person does – scroll through feeds, react to content, develop tastes, get influenced, and evolve?

That is the question behind the SocialAgents research pod. We are building an autonomous agent that browses social media platforms the way a real human would: it sees content, forms opinions based on its personality and background, decides whether to engage, and over time develops new interests shaped by what the algorithm chooses to show it. This blog post documents the first phase of that effort, focused on social platforms as the initial source of information.

    The work tries to answer the following question:

    How do online platforms shape what different users see, engage with, and eventually come to think or believe?

    Content-recommendation algorithms do not just match interests – they introduce new content, test engagement, and then reinforce it. By running controlled agents with known starting profiles, we can track how exposure differs across user types, how it changes over time, and what drives those shifts. This gives us a precise way to study algorithmic influence that is impossible with real users.

    Each agent is defined by a rich profile (including age, occupation, cultural background, content affinities, aversions etc.) and interacts with platform content through the same actions available to any user: scrolling, liking, saving, commenting, following and sharing. Engagement decisions are made by a multimodal AI model that reasons over the agent’s personality and the content it encounters. Every session is designed to remain undetected, with timing patterns and warm-up progressions that mirror how a genuine new user explores a platform.

    The sections that follow detail the methodology, early experimental results and the infrastructure required to run these simulations at scale. The early findings show that within a single session, the algorithm accurately identified each agent’s interests and then began expanding them into adjacent territories.


    The mechanics of human navigation of social media

When you scroll through a feed, your brain runs a rapid filtering process, forming relevance judgments in under 50 milliseconds, and pauses when it detects something novel, emotionally charged, or personally relevant. Surprise, humor, curiosity and outrage are the strongest scroll-stoppers, because they trigger emotional circuits faster than conscious thought.

What keeps you scrolling isn’t satisfaction but anticipation: infinite scroll removes natural stopping points, feeding a dopamine loop in which the next post might be the rewarding one – a variable-ratio reinforcement pattern, the most compulsive reward schedule in behavioral psychology.

    Figure 1 – The TikTok feed – each creative takes up the whole vertical space in the “For You” section

    TikTok and Instagram exploit this differently. On TikTok, attention is measured in watch time and rewatches. The algorithm auto-serves content and hooks you within the first second of a video, so seconds of hovering are captured passively and automatically. Instagram earns attention more deliberately: you lean in, judge a visual aesthetically and decide to tap. Key signals are saves and swipes rather than raw watch time. This is why TikTok’s average engagement rate (~4.64%) is much larger than Instagram’s (~0.43%) [TikTok vs. Instagram: A Deep Dive into Engagement Rates and Content Performance].

    Figure 2 – The Instagram “For You” home feed – creatives are being fed in a vertical feed

    The interest of people browsing social media is incredibly short-lived. The average attention span for a video is 1.7 seconds on mobile and 2.5 seconds on desktop [Capturing Attention in Feed: The Science Behind Effective Video Creative].

    A Fors Marsh Group study found that as little as 0.25 seconds of exposure is enough for people to recall mobile feed content at a statistically significant level, meaning the brain processes and encodes content far faster than conscious attention suggests [Facebook video ad viewability rates are as low as 20%]. This makes simulating human content browsing with generative AI particularly tricky: the response time of multimodal transformer-based APIs ranges from roughly 4 to 8 seconds for 200 tokens [LLM Latency Benchmark by Use Cases in 2026], far above the average attention span, so the agent erroneously signals interest to the platform for every creative it merely considers.


    Simulating human behavior on social media

    Our framework decomposes human browsing into three layers – persona construction, perception and judgment, and behavioral execution – each calibrated against real-world engagement distributions. But the framework serves a deeper purpose than creative testing: it is how we test a foundational question – can AI personas reliably stand in for real humans in the eyes of a recommendation algorithm?

    Every simulation begins with a synthetic persona – not a shallow archetype but a deeply specified psychological and demographic profile. Each persona encodes age, gender, location, occupation, education, income bracket, cultural background, daily routines, content affinities and content aversions. These are the digital equivalents of the implicit biases and taste structures that real users carry into every scroll session. A 34-year-old veterinary nurse in Manchester with a dry sense of humor and a distaste for influencer culture will engage with content in measurably different ways from a 22-year-old design student in Brooklyn who follows streetwear accounts.

    For every social post, our agent estimates probabilities for each possible action – scroll away, like, save, comment, follow – accompanied by a reasoning trace explaining why this persona would or would not engage with this specific piece of content. That trace is essential for auditing whether the agent is genuinely responding to the persona’s specific traits.

    Raw model outputs are not behaviors. A 16% “Like” probability and an 8% “Comment” probability mean nothing without calibration against platform-specific base rates. We apply a smoothing layer that adjusts per-post probabilities to known engagement benchmarks. The calibrated probabilities are then sampled to produce a single action.
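The smoothing-and-sampling step can be sketched as follows. The base rates, 50/50 blending weight, and function names below are illustrative assumptions, not the production calibration:

```python
import random

# Hypothetical platform base rates (fraction of impressions receiving each
# action); real values would come from published engagement benchmarks.
BASE_RATES = {"scroll": 0.95, "like": 0.04, "save": 0.005,
              "comment": 0.003, "follow": 0.002}

def calibrate(raw: dict, weight: float = 0.5) -> dict:
    """Blend raw model probabilities toward platform base rates, renormalize."""
    blended = {a: weight * raw.get(a, 0.0) + (1 - weight) * BASE_RATES[a]
               for a in BASE_RATES}
    total = sum(blended.values())
    return {a: p / total for a, p in blended.items()}

def sample_action(probs: dict, rng: random.Random) -> str:
    """Draw a single action from the calibrated distribution."""
    actions, weights = zip(*probs.items())
    return rng.choices(actions, weights=weights, k=1)[0]

# A 16% "Like" probability shrinks toward the platform's much lower base rate.
raw = {"scroll": 0.70, "like": 0.16, "comment": 0.08, "save": 0.04, "follow": 0.02}
calibrated = calibrate(raw)
action = sample_action(calibrated, random.Random(0))
```

The key property is that the agent's per-post enthusiasm is tempered by how often real users actually like, save or comment, so aggregate behavior stays within plausible ranges.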


    What each simulation produces

    Each simulation produces two outputs:

    • An interaction log: a record of every post the agent saw, what it did (scrolled past, liked, saved, commented), the probability behind that decision, and the reasoning.
    • A feed report: a snapshot of the content the platform served at different points in the session, showing how the feed changed over time.

    Imagine an agent built to mirror a 28-year-old personal finance enthusiast. Over a one-hour social media session it encounters 500 posts. The interaction log records that it liked 12, saved 3, commented on 1, and scrolled past the rest – along with why (e.g., “liked because the budgeting tip matched the agent’s stated interest in saving strategies”).

    The feed report then shows that by minute 40, the social media platform had started mixing in mental-health and self-improvement clips alongside the finance content – a shift the agent didn’t ask for, but that the algorithm introduced on its own.

    Running multiple distinct agents through the same platform for hours doesn’t just produce engagement metrics – it produces a controlled experiment on the algorithm itself. We observe what content the algorithm pushes to each agent, how that mix shifts over time, and what happens when the algorithm starts exposing the agent to novel or trending types of content.

    By logging the agent’s reasoning at every step, we can identify exactly which creative attributes – visual tone, emotional register, narrative hook – made that unexpected content compelling enough to earn a like or a save.


    Analysis of interactions based on persona characteristics

    We ran two agents through extended sessions on a social media platform. Before diving into results, here’s who they are.



    George is a 36-year-old senior finance analyst based in Athens. He follows investment strategies, personal finance, fitness, and business leadership content. He values data-driven advice, skips past crypto hype and hustle culture, and engages most with content that offers practical, actionable takeaways. He scrolls deliberately – slowing down for charts and analysis, skipping memes in under two seconds.


    Sofia is a 25-year-old social media coordinator, also in Athens, who creates content around fashion, travel, and fitness. She engages with styling tips, travel itineraries, workout routines, and creator growth strategies. She scrolls fast past ads but lingers on vibrant visuals and aesthetic content. Her feed time is high – she checks social media five times a day.

    Within the first session, the platform identified each agent’s core interests accurately. George’s feed was dominated by stock analysis, personal finance tips, and fitness content. Sofia’s feed filled with recipe tutorials, fitness routines, and travel vlogs. Roughly 60–80% of the content served matched their declared interests – measured by whether the content category aligned with the agent’s stated affinities.

    But the remaining 20–40% is where the story gets interesting.

    The off-topic content was not random. George was shown mental health clips, motivational content, and street food showcases – adjacent emotional territories that share the aspirational tone of self-improvement media. Sofia received tech gadget unboxings, entrepreneurship stories, and macro-economic forecasts – probing whether her preference for short-form, personality-driven content would transfer to informational topics. The algorithm wasn’t guessing. It was testing the edges of each agent’s taste profile.

    And the agents followed. George developed sustained engagement with psychology content and food showcases, reaching interaction rates comparable to his core finance interests. Sofia adopted tech gadgets and entrepreneurship narratives – topics that traditional demographic targeting would never have surfaced to a 25-year-old fashion content creator. By session five, these weren’t exploratory recommendations anymore. They were part of each agent’s regular content diet.

    Figures 3 and 4 below visualize this shift. Each chart tracks the proportion of content categories served to the agent over time, showing how the feed gradually expanded beyond the original interest profile.

    Figure 3 – George’s Content Ecosystem Evolution


    Figure 4 – Sofia’s Content Ecosystem Evolution

    What these results suggest is that the algorithm doesn’t just confirm existing tastes – it actively expands them. It found the edges of each agent’s interest profile and pushed content into those gaps, widening what each agent consumed over time.


    Persona adaptation to trends and suggestions

    The previous section showed that the algorithm quickly identifies what each agent cares about – and then starts pushing content beyond those boundaries. The natural follow-up question is: what happens if the agent actually adopts those new interests?

    To test this, we took the content categories that the algorithm surfaced and that each agent consistently engaged with during the first round of experiments, and folded them into the agent’s profile as declared interests. In other words, we let the first round of browsing reshape who the agent claims to be.

    For George, the enrichment added five categories that emerged from his initial sessions: player performance clips, quick recipe tutorials, media bias and propaganda breakdowns, music performances and concerts, and travel destination vlogs. None of these were part of his original finance-and-fitness profile – they were interests the algorithm introduced and George chose to engage with.

    For Sofia, the enrichment was broader – nine new categories: motivational speeches and quotes, day-in-the-life vlogs, mental health and psychology clips, personal finance hacks, home and furniture, music performances and concerts, tech gadget unboxings, workout tutorials, and geopolitical conflict updates. Some of these, like tech gadgets and personal finance, were far outside the fashion-travel-fitness profile she started with.

    We then re-ran the full simulation with these enriched agents. Same platform, same session structure, same interaction approach – but with agents whose declared interests now reflected the expanded taste profiles earned in the first round.

    The results confirmed that the cycle continues. With a richer interest profile to work from, the algorithm pushed even further. George, who originally cared about finance and fitness and had since adopted recipe content and travel vlogs, was now being served bodybuilding content, tech gadget reviews, and podcast highlight reels – and engaging with them. Sofia’s feed expanded in similar ways. Each round of enrichment gave the algorithm more surface area to explore, and it used that surface area aggressively.

    Figures 5 and 6 below show the content mix evolution for George and Sofia’s enriched profiles, following the same format as Figures 3 and 4. The key difference is the starting point: the agents entered this round with a wider interest profile, and the algorithm expanded it further still.

    Figure 5 – George’s Content Ecosystem Evolution (Enriched Profile)


    Figure 6 – Sofia’s Content Ecosystem Evolution (Enriched Profile)

    This observe-enrich-rerun approach turns a single experiment into an iterative process. Each cycle produces agents whose interests more closely resemble how real users evolve on a platform over time – not just what they start with, but what they become after sustained exposure to algorithmic recommendations.

    Conclusion

    AI Agents give us a controlled way to observe something we couldn’t observe before: how algorithms reshape what people care about. George started as a finance-and-fitness person. After two rounds of interaction, he was engaging with bodybuilding content, recipe tutorials, and podcast highlight reels – none of which he would have sought out on his own. Sofia went from fashion and travel to tech gadgets and geopolitical updates. These shifts weren’t random. They followed a clear pattern: the algorithm identified adjacent emotional territories, tested them, and when the agent responded, it pushed further.

    The next step is to give our agents access to more sources of information beyond social media – news, trends, search – making their online experience even closer to that of a real person browsing the web. The closer the agent gets to a full human browsing experience, the more we learn about how the digital world shapes what people see, think, and ultimately believe.

    Future Work

    Topics that deserve more focus over the next months are:

    1. Expansion to other sources of dynamic information (News, Trends etc.) – Social media platforms are interesting, but specific content types might never surface there, or only after a delay. An interesting question to answer is: how do other sources of dynamic information affect the way personas perceive and interact with content?
    2. Impact of trends on personas – Determine how social media trends (e.g., viral videos, trending brands) influence the interests of different personas. This is particularly important for understanding which demographics are more susceptible to targeting for specific brands and products.
    3. Marginal contribution of specific interests on the variability of content – We have already seen that engaging with specific content types might be more influential on what the algorithm serves. More work is needed to understand which personas are more sensitive to adding/removing interests in terms of how their feed evolves.
    4. Understanding of the content adaptation velocity between slow and fast-paced platforms – Not all algorithms are created equal. Further research is required to measure how quickly content evolves on different social media platforms.
    Social Agents Pod

    SocialAgents Pod builds autonomous AI personas that browse social platforms like real users, each driven by demographic and psychological profiles. Early experiments show recommendation algorithms quickly lock onto core interests, then probe adjacent topics – which agents often adopt. The goal is a scalable, auditable framework for studying how algorithmic exposure shapes beliefs.


    Team Members: Andreas Stavrou, Nikos Gkikizas

    Introduction

    Content-recommendation algorithms on social platforms do not simply match users to their existing interests – they actively introduce new content, test engagement and reinforce what sticks. Over time, this feedback loop reshapes what users see, engage with and ultimately come to believe. Understanding this process at scale is difficult with real users: you cannot control their starting profiles, isolate algorithmic influence from organic behavior or ethically run repeated experiments on their feeds.

    To address this, we developed a framework for deploying AI-driven synthetic personas that browse social media the way real people do. Each agent is defined by a rich demographic and psychological profile and makes engagement decisions – scrolling, liking, saving, commenting, following – powered by a multimodal AI model that reasons over the agent’s personality and the content it encounters. By running controlled agents with known starting profiles, we can precisely track how platform algorithms adapt, expand and shift the content they serve over time. Early results show that within a single session the algorithm accurately identifies each agent’s core interests and then begins pushing content into adjacent territories the agent never sought out.

    For a full discussion of the research motivation, methodology and experimental findings, see the SocialAgents Pod Executive Summary.

    The sections below focus on the technical implementation: the tools, architecture and infrastructure required to run these simulations at scale.


    Frameworks & Libraries

    Various frameworks and libraries were considered; the ones ultimately adopted were SeleniumBase for browser automation and PyAutoGUI for GUI automation.

    SeleniumBase is a Python framework built on top of Selenium WebDriver that provides a higher-level, more “batteries included” way to write browser automation. Compared to writing raw Selenium, it gives you a cleaner test structure and a lot of helpers out of the box (smart waits, assertions, screenshots, logging, HTML reports), which reduces boilerplate and makes scripts easier to maintain. It also tends to integrate more smoothly with the browsers you already have installed on the machine, handling a lot of the setup and driver management for you so you can run against a “real” local Chrome/Edge/Firefox without as much manual configuration.

    A key difference from plain Selenium is that SeleniumBase includes convenience modes and configurations that produce browser sessions closely resembling typical user behavior. For example, it manages browser flags, profile handling and wait strategies out of the box, driving the browser in ways that look closer to natural user sessions than a minimal Selenium script. In contrast, with raw Selenium you usually have to implement these reliability-related choices yourself, whereas SeleniumBase centralizes many of them behind simpler APIs and defaults.

    PyAutoGUI is a Python library for automating interactions with your computer’s graphical user interface (GUI). It lets you programmatically control the mouse and keyboard to move the cursor, click, drag, scroll and type, which makes it useful for scripting repetitive tasks that would otherwise be done manually. Because it operates at the UI level, it can automate many desktop applications even when they do not expose an API.

    Figure 1 – Example of GUI automation: PyAutoGUI driving Microsoft Paint

    A key feature of PyAutoGUI is basic screen-based automation through image recognition: you can locate buttons or UI elements on the screen (via screenshots) and then click or interact with them based on their position. This is helpful for building lightweight “robot” scripts for workflows like form filling, desktop app navigation or testing simple user flows. In practice, it tends to be most reliable when paired with careful timing, consistent screen resolution and safeguards such as its built-in “failsafe” (moving the mouse to a corner to stop the script).

    Here is a list of the most common methods (assuming the conventional alias import pyautogui as pg):

    1. pg.moveTo(x, y) – moves the mouse to position (x, y), in pixels from the top-left corner
    2. pg.click(x, y) – clicks at position (x, y)
    3. pg.write(text) – simulates consecutive keyboard presses
    4. pg.scroll(y) – scrolls y pixels vertically
    5. pg.hotkey(key1, key2) – presses keyboard combinations (e.g., Ctrl-C, Ctrl-V)
    6. pg.locateCenterOnScreen(image) – tries to locate an image on the screen and returns the coordinates of its center

    Implementation & Examples

    Two distinct approaches were tested and implemented, each detailed in the sections below.

    The first approach combines SeleniumBase with GUI automation. SeleniumBase provides a mature, full-featured browser automation framework with built-in conveniences such as easy cookie access, configurable browsing modes and fine-grained session control. Pairing it with GUI automation brings the best of both worlds – robust browser-level control alongside OS-level input simulation.

    The second approach relies on pure GUI automation. By controlling the mouse and keyboard directly, the agent is effectively indistinguishable from a real user. This offers greater control over all interactions, though it requires additional effort to replicate features that SeleniumBase provides out of the box (e.g., cookie handling, session management).

    SeleniumBase & GUI automation

    After exploring and testing various browser automation tools, we selected SeleniumBase for its maturity and the convenience it offers on top of the familiar Selenium framework. Sessions were configured with test mode and guest mode disabled and CDP mode activated, giving us a realistic browser environment. The most valuable feature for our use case was SeleniumBase’s straightforward cookie management, which we leveraged for cookie-based authentication flows.

    Initially, we attempted a SeleniumBase-only approach, identifying and interacting with page elements via XPath selectors and similar methods. However, we found that combining SeleniumBase with GUI automation (PyAutoGUI) produced significantly more robust results. The framework handles higher-level concerns like session management, cookie handling and browser configuration, while GUI automation takes over at the interaction level, simulating mouse and keyboard input in a simple and reliable way.

    In summary, SeleniumBase owns the session lifecycle and authentication layer, while PyAutoGUI drives all low-level user interactions (clicks, typing, scrolling), combining the strengths of both tools.

    Pure GUI automation

    Pure GUI automation operates at the OS level, generating mouse and keyboard inputs that are identical to those of a real user. This makes it a natural fit for producing browsing sessions that closely mirror genuine human behavior.

    We developed an alternative method of simulating social media browsing using PyAutoGUI and visual automation. The pure GUI automation approach produced smooth, uninterrupted sessions on already warmed-up social accounts (the methodology can be seen here).

    This approach will be expanded to more social platforms.


    Agent Design

    The decision-making core of the framework is a multimodal AI agent that runs continuously alongside the simulation, evaluating each post in real time as the browser scroller surfaces it. Rather than receiving a batch of posts upfront, the agent operates as a live judge – it is initialized once with the persona definition and a set of instructions and then processes posts one by one as the feed delivers them.

    Each time the scroller encounters a new post, the agent receives a multimodal input bundle comprising:

    • Visual content – the post’s image or video frames
    • Contextual metadata – caption text, hashtags, account information and any other on-screen context captured during the session
    • Persona definition – the full demographic, psychological and behavioral profile of the synthetic user
    • Instructions – the evaluation prompt specifying the expected output format and decision criteria

    Given these inputs, the agent returns a structured output containing the fields below:

    1. Interaction distribution – a probability distribution over all possible actions (scroll past, like, save, comment, follow, share), representing how likely the persona is to perform each action on this post
    2. Reasoning – a natural-language explanation grounding the decision in specific persona traits and content attributes
    3. Post description – a textual summary of the post’s content for downstream categorization and analysis
    4. Duration on post – a value dictating how long the agent stays on the given post

    Every response is validated against a Pydantic model that enforces the expected schema, field types and value ranges. If the output fails validation, it is rejected and retried automatically.
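The validation logic can be illustrated with a plain-Python stand-in for the Pydantic model; the field names mirror the four outputs above, but the exact schema, action labels, and value ranges shown here are assumptions:

```python
from dataclasses import dataclass

# Assumed action vocabulary for the interaction distribution.
ACTIONS = ("scroll_past", "like", "save", "comment", "follow", "share")

@dataclass
class AgentReport:
    interaction_distribution: dict  # action -> probability
    reasoning: str
    post_description: str
    duration_on_post: float  # seconds

    def __post_init__(self):
        # Field-type and value-range checks mirroring what a Pydantic
        # model would enforce before a report is accepted.
        if set(self.interaction_distribution) != set(ACTIONS):
            raise ValueError("distribution must cover every action exactly once")
        if any(not 0.0 <= p <= 1.0 for p in self.interaction_distribution.values()):
            raise ValueError("probabilities must lie in [0, 1]")
        if abs(sum(self.interaction_distribution.values()) - 1.0) > 1e-6:
            raise ValueError("probabilities must sum to 1")
        if not self.reasoning or not self.post_description:
            raise ValueError("reasoning and description must be non-empty")
        if self.duration_on_post <= 0:
            raise ValueError("duration must be positive")
```

A report failing any of these checks would be rejected and the model call retried, as described above.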

    We use Gemini 2.5 Flash Lite as the primary model, chosen for its speed, low latency and strong multimodal reasoning across image and video inputs. Gemini 2.0 Flash Lite serves as a fallback. The pipeline applies automatic retries with exponential backoff and graceful failover to the backup model if the primary is unavailable or returns repeated validation failures.
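The retry-and-failover policy might look like the following sketch, where call_model is a hypothetical wrapper around the model API and the retry counts and delays are illustrative:

```python
import time

PRIMARY, FALLBACK = "gemini-2.5-flash-lite", "gemini-2.0-flash-lite"

def evaluate_with_retries(call_model, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Try the primary model with exponential backoff, then fail over.

    `call_model(model_name)` is a hypothetical callable that sends the
    multimodal bundle and returns a validated report, raising on API
    errors or schema-validation failures.
    """
    for model in (PRIMARY, FALLBACK):
        for attempt in range(max_retries):
            try:
                return call_model(model)
            except Exception:
                sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("both models exhausted their retries")
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.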

    Once the agent produces a validated report for a post, the result is immediately appended to the simulation’s interaction log – a structured record capturing the post content, the persona’s response, the probability distribution and the reasoning trace. The simulation then scrolls to the next post and the cycle repeats until the session ends. After all posts in a session have been evaluated, the complete log is persisted and the experiment proceeds to the next session or enrichment cycle.

    Figure 2 – Agent Design


    Scalability & Containerization

    Pure GUI automation relies on controlling a visible desktop session – every bot instance needs its own screen, mouse and keyboard, which means you cannot spin up hundreds of headless processes on a single machine the way you can with browser-driver automation. Each additional agent requires a dedicated virtual display (e.g., via Xvfb or a VNC session) with a fixed resolution, so that interactions like locateCenterOnScreen, which depend on approximate pixel matching, do not break across different display configurations.

    This is why we built a containerized version of the simulator for selected social media platforms using GUI-based browsing. This approach has been successfully tested on both macOS and Windows, making the process much more scalable. The details can be found in the relevant Dockerfile. To use the solution, follow the instructions in the README.

    Note that true scalability is throttled by the speed of creating and warming up new accounts. Further research is needed to determine whether warm-up can also be automated.


    Replicating Results

    This section provides a step-by-step guide for running a simulation and reproducing the results, assuming access to the repository.

    1. Configure and run the simulation
      • Update the configuration file with the desired parameters (e.g., persona definitions, session duration). Then execute the social media scroller for the chosen platform. Each run produces a session log file as output.
    2. Understand the Log Structure
      • Each row in the log represents a single post-persona pair and contains the following fields:
        • Persona – the synthetic persona that encountered the post
        • Interaction – the action taken (e.g., like, skip, scroll past, comment)
        • Post description – a textual description of the post’s content
        • Reasoning – the LLM-generated rationale behind the chosen interaction
    3. Categorize Post Descriptions
      • Once the logs are collected, an LLM categorization step maps each post description to a known meta taxonomy. This step is not part of the core simulation – its purpose is to facilitate visualization and evaluation by grouping posts into standardized content categories.
    4. Analyze the results
      • With the enriched logs (raw data + taxonomy labels) in hand, you can proceed with exploratory data analysis to examine patterns in content exposure, persona behavior, platform recommendation tendencies and any other dimensions of interest.
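As a sketch of the analysis step, the enriched log can be examined with a few lines of Python. The row fields and per-minute bucketing below are illustrative assumptions about the log format, not its exact schema:

```python
from collections import Counter

# Illustrative log rows; field names follow the structure described above,
# with an assumed session-minute field so exposure can be bucketed over time.
log = [
    {"persona": "George", "interaction": "like",        "category": "finance",       "minute": 5},
    {"persona": "George", "interaction": "scroll past", "category": "finance",       "minute": 12},
    {"persona": "George", "interaction": "save",        "category": "mental health", "minute": 41},
    {"persona": "George", "interaction": "like",        "category": "mental health", "minute": 48},
]

def category_shares(rows, start, end):
    """Share of each content category served within [start, end) minutes."""
    counts = Counter(r["category"] for r in rows if start <= r["minute"] < end)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

early = category_shares(log, 0, 30)   # feed composition early in the session
late = category_shares(log, 30, 60)   # composition later, after feed drift
```

Comparing the early and late distributions per persona is what produces the content-ecosystem-evolution charts described in the experiments.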

    Limitations & Future Work

    The following are the key limitations identified during development, along with proposed directions for addressing them.

    Account Registration

    Certain social media platforms require registration through Google accounts or phone numbers. As a result, scaling to new platforms or creating additional bot accounts may require access to these resources, which adds an external dependency to the setup.

    LLM Latency & Engagement Signal Noise

    The communication between the agent and the LLM introduces a non-trivial delay – averaging 3.4 seconds per interaction. During this time the agent remains idle on the current post and the platform logs this pause as content view time and engagement. This effectively sends an inflated engagement signal to the recommendation algorithm for every post, regardless of whether the persona would have lingered on it. While this is a known source of noise, we expect the impact to be roughly uniform across all posts, since every interaction incurs the same overhead. As such, relative comparisons between posts and personas should remain meaningful, even if absolute dwell-time metrics are skewed.

    To mitigate this, we plan to introduce a two-stage decision pipeline. A lightweight first stage – using classical machine learning models or small vision models – would make a rapid scroll-away vs. stay decision in near real time. The full LLM call would be triggered only when the agent decides to stay, so the 3.4-second overhead occurs only on posts the persona has chosen to engage with – closely mimicking how a real user pauses before deciding on an action.
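A minimal sketch of this gating logic, with a keyword heuristic standing in for the hypothetical lightweight first-stage model:

```python
def fast_gate(post) -> bool:
    """Stage 1 (hypothetical): a cheap scroll-away vs. stay decision.

    In practice this would be a small classical ML or vision model;
    here a keyword heuristic stands in so the control flow is visible.
    """
    return any(tag in post["tags"] for tag in ("finance", "fitness"))

def evaluate_post(post, full_llm_call):
    """Pay the expensive LLM cost only for posts the persona stays on."""
    if not fast_gate(post):
        # Near-instant rejection: no inflated dwell-time signal is sent.
        return {"action": "scroll_past", "reasoning": "fast-gate rejection"}
    # Stage 2: the full multimodal evaluation (~3.4 s in our setup).
    return full_llm_call(post)
```

Under this design, the latency-induced dwell time is incurred only where a real user would plausibly have paused anyway.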