Can AI Predict Human Attention? Yes AND No!

Why do some studies show almost perfect correlation between AI and human eye-tracking, while others show discrepancies?

Why do studies from MIT show a 98% correlation between AI-based attention prediction and human eye-tracking, while other findings show discrepancies? Who is right? The answer is: both!

Research spanning cognitive neuroscience and marketing science demonstrates that human attention operates through two fundamentally different mechanisms. Decades of cognitive neuroscience, from the Feature Integration Theory of Treisman & Gelade (1980) to the Biased Competition Model of Desimone & Duncan (1995), show that our visual system continuously balances external “pop-out” cues against internal goals. Attention is not a single spotlight – it’s a dynamic negotiation between two systems that determine what captures our awareness:

Bottom-Up Attention: driven by sensory input and salience in context.

Top-Down Attention: driven by motivation (goals / job-to-be-done / needs).

Understanding these two mechanisms clarifies what AI can reliably predict and when to expect discrepancies with human eye-tracking.

Bottom-Up Attention: The Stimulus-Based System

This system explains why a bright red "SALE" banner captures attention in milliseconds. The mechanism operates pre-attentively – you notice the banner before you decide whether it matters to you or not.

Bottom-up attention operates automatically: Itti & Koch (2000) demonstrated how the visual system creates saliency maps before conscious awareness. Neuroimaging by Corbetta & Shulman (2002) shows this process activates the ventral attention network, responding to stimulus properties independent of observer goals. This type of attention is driven purely by sensory features like color contrast, motion, faces, size, and novelty.
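The saliency-map idea can be sketched directly in code. The snippet below is a minimal, illustrative center-surround contrast map in the spirit of Itti & Koch (2000), not their full model (which combines color, intensity, and orientation channels across multiple scales); the single intensity channel and the two blur radii are simplifying assumptions.

```python
import numpy as np

def box_blur(img, k):
    """Separable box blur over a (2k+1) x (2k+1) window; edges are padded."""
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    blur_1d = lambda a: np.convolve(np.pad(a, k, mode="edge"), kernel, mode="valid")
    rows = np.array([blur_1d(r) for r in img])       # blur along x
    return np.array([blur_1d(c) for c in rows.T]).T  # then along y

def saliency_map(intensity):
    """Center-surround contrast: fine-scale minus coarse-scale local mean.

    Regions that differ from their surround ("pop-out") score high; uniform
    regions score near zero. Output is normalized to [0, 1].
    """
    center = box_blur(intensity, 1)    # fine scale (assumed radius)
    surround = box_blur(intensity, 8)  # coarse scale (assumed radius)
    contrast = np.abs(center - surround)
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast
```

A lone bright element on a uniform background ends up at the top of this map regardless of what the viewer is looking for, which is exactly the goal-independent behavior the ventral attention network exhibits.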

Bottom-up attention isn't just about initial capture – it's a critical driver of which products consumers actually choose. Extensive research by Wedel & Pieters (2008, 2019) demonstrates that visual salience determines consideration set formation. Products that don't attract bottom-up attention never enter the choice process, regardless of their objective quality or goal-relevance. This creates what they term the "attention gate": consumers can only choose from alternatives they notice.

Their eye-tracking studies in retail environments show that shelf position, package size, color contrast, and brand logo salience predict fixation probability—and fixation probability predicts purchase likelihood. Even when consumers have clear goals ("buy healthy cereal"), bottom-up features determine which healthy cereals get evaluated. A visually prominent product has a 3-4x higher probability of consideration than an equivalent but less salient alternative.

Mormann et al. (2012) used neuroimaging to link this process to choice: neural responses to visual salience in early visual cortex predicted subsequent purchase decisions, demonstrating that bottom-up attention operates as a pre-conscious filter on the choice set.

The mechanism works through gaze cascade effects: initial bottom-up attention triggers elaboration, which strengthens preference through mere exposure and processing fluency (Shimojo et al., 2003). The Attentional Drift-Diffusion Model (Krajbich et al., 2010) shows that gaze duration biases value integration – the longer you look, the more likely you choose it.
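A toy simulation makes the gaze-bias mechanism concrete. The sketch below is a heavily simplified drift-diffusion trial in the spirit of Krajbich et al. (2010); the parameter values and the coin-flip gaze model are illustrative assumptions, not fitted estimates from their data.

```python
import random

def simulate_addm(v_left, v_right, gaze_left_prob, d=0.002, theta=0.3,
                  noise=0.02, bound=1.0, max_steps=20000, rng=None):
    """One trial of a simplified attentional drift-diffusion model.

    At each time step, evidence drifts toward the currently fixated item:
    the attended item's value counts fully, while the unattended item's
    value is discounted by theta. A choice is made when the accumulated
    evidence hits the upper (+bound, "left") or lower (-bound, "right") bound.
    """
    rng = rng or random.Random()
    evidence = 0.0
    for _ in range(max_steps):
        looking_left = rng.random() < gaze_left_prob  # crude gaze model
        if looking_left:
            drift = d * (v_left - theta * v_right)
        else:
            drift = -d * (v_right - theta * v_left)
        evidence += drift + rng.gauss(0.0, noise)
        if evidence >= bound:
            return "left"
        if evidence <= -bound:
            return "right"
    return "left" if evidence > 0 else "right"
```

With two equally valued items, biasing gaze toward one of them (e.g. `gaze_left_prob=0.8`) makes that item win the large majority of simulated trials, reproducing the qualitative "longer look, more likely choice" effect.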

 

Top-Down Attention: We See What We Desire

Top-down attention operates under cognitive control, guided by motivation: our currently active goals, needs and jobs-to-be-done. The landmark study by Yarbus (1967) proved this definitively: participants viewing the same painting under different instructions—"estimate the ages" versus "judge their wealth"—produced radically different eye-tracking patterns. Attention follows intention, not stimulus salience.

Desimone & Duncan (1995) and Chelazzi et al. (1998) established that observers maintain an "attentional template" in working memory – a neural representation of target features that biases attention allocation toward goal-relevant stimuli. These templates encode both where to look (locations) and what to look for (feature cues).

Critically, neuroimaging studies show these templates are represented in the prefrontal cortex, the brain region associated with executive control, motivation, and goal-directed behavior. This neural architecture reveals the motivation-based, goal-dependent nature of top-down attention.

When searching for "wireless headphones under $100," the prefrontal cortex uses the attentional template to amplify cues associated with the current goal while suppressing unrelated features – regardless of their visual prominence. This top-down signal from the prefrontal cortex modulates activity in visual processing regions, fundamentally altering what captures attention based on current goals rather than stimulus properties. Brands, products, and offers whose cues fit the active attentional template will attract attention.
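As a rough sketch, the template's amplify/suppress logic can be written down. Everything here (the item names, the multiplicative boost, the suppression factor) is a hypothetical illustration of the biasing idea, not a model taken from the cited papers.

```python
def attention_priority(items, template, boost=2.0, suppress=0.5):
    """Rank items by bottom-up salience modulated by a goal template.

    items: list of (name, salience, feature_set) tuples.
    Features matching the template multiply the score (amplification);
    items with no match are damped (suppression), so a goal-relevant but
    visually modest item can outrank a flashy, irrelevant one.
    """
    scored = []
    for name, salience, features in items:
        matches = len(template & features)
        weight = boost ** matches if matches else suppress
        scored.append((salience * weight, name))
    return [name for _, name in sorted(scored, reverse=True)]

shelf = [
    ("neon sale banner", 0.9, {"red", "large"}),
    ("budget wireless headphones", 0.3, {"wireless", "headphones", "under_100"}),
    ("premium wired headphones", 0.5, {"headphones", "wired"}),
]
goal = {"wireless", "headphones", "under_100"}
ranking = attention_priority(shelf, goal)
# goal-directed ranking puts "budget wireless headphones" first,
# despite its lower raw salience
```

With an empty template (free viewing, no active goal), the same function falls back to ranking purely by salience – mirroring why AI salience models and goal-directed eye-tracking diverge.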

 

AI models excel at predicting bottom-up attention but are limited predictors of top-down attention.

Cutting-edge deep-learning AI models like those used by Brainsuite (www.getbrainsuite.com) analyze stimulus properties. They predict bottom-up attention with high accuracy because they mirror the underlying neural mechanism: the AI identifies what attracts the eye based on sensory salience, just as the ventral attention network does. AI shows high correlation with human eye-tracking during free exploration of a stimulus and in the early scanning phase (1-3 seconds).

Given Wedel & Pieters' findings on attention's role in choice, this predictive accuracy has direct commercial implications. Bottom-up attention gets products into consideration; top-down attention determines which products survive evaluation. AI can therefore predict the gate but not the selection process – because it lacks access to the motivation-based, goal-dependent signals originating in the prefrontal cortex.

AI can reliably predict which design elements will capture initial attention and enter the consideration process – a critical first step in the purchase funnel.

AI models don't know what the consumer is trying to accomplish at that very moment. They can't encode the prefrontal cortex's goal representations – the attentional templates that define search intent or category knowledge. When Wedel & Pieters (2019) analyzed eye-tracking in goal-directed tasks, they found that intent-driven search behavior showed little correlation with visual salience.

 

Why Research Results Diverge

Studies reporting strong correlation between AI and human eye-tracking focus on bottom-up attention:

  • Free viewing without task instructions or specific goals

  • Free exploration

  • Initial fixation patterns in cluttered visual environments

Studies reporting poor AI-human correlation typically measure top-down attention:

  • Search for explicit product criteria

  • Task-based exploration with specific information goals

  • Re-fixation patterns and longer exposure times

Critical Boundaries

Product category moderates bottom-up importance – Wedel & Pieters found that visual salience drives choice more strongly in low-involvement categories (FMCG, food) than in high-involvement categories (electronics, financial products).

Cognitive load moderates the balance – Lavie (2005) showed that when consumers multitask or experience cognitive overload, prefrontal cortex control weakens and bottom-up salience gains influence. AI predictions become more accurate under high cognitive load because attention becomes more stimulus-driven.
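One way to picture this moderation is a simple mixing model. The weighting scheme below is my illustration of the load effect, not Lavie's formulation: cognitive load shifts the blend from the goal-driven map toward the salience-driven map.

```python
def predicted_attention(bottom_up, top_down, load):
    """Blend a salience map and a goal map, location by location.

    load in [0, 1]: at high load, executive (prefrontal) control weakens,
    so the stimulus-driven bottom-up map dominates the prediction.
    The linear weighting is an illustrative assumption.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    return [load * b + (1.0 - load) * t for b, t in zip(bottom_up, top_down)]
```

Under this toy model, a salience-only AI prediction coincides with the blended prediction exactly when load is at its maximum – which is why AI accuracy rises for multitasking, overloaded consumers.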

Time moderates correlation – Wedel & Pieters (2008) found that bottom-up salience predicts first fixation, but top-down goals predict fixation duration and re-visits.

Practical Implication – Make Sure You Leverage Push And Pull

When pre-testing creative assets such as packaging and in-store media, at DECODE we measure both types of attention:

  • Make sure your creative assets stand out in context by being salient

  • Make sure you use cues associated with shoppers’ needs to attract attention

Sources:

Wedel, M., & Pieters, R. (2008). A review of eye-tracking research in marketing. Review of Marketing Research, 4, 123-147.

Wedel, M., & Pieters, R. (2019). Journal of Marketing Research.

Mormann, M., et al. (2012). Journal of Neuroscience.

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18(1), 193-222.

Chelazzi, L., et al. (1998). Responses of neurons in macaque area V4 during memory-guided visual search. Cerebral Cortex, 8(7), 652-672.

Yarbus, A. L. (1967). Eye Movements and Vision. Plenum Press.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201-215.

Shimojo, S., Simion, C., Shimojo, E., & Scheier, C. (2003). Gaze bias both reflects and influences preference. Nature Neuroscience, 6(12), 1317-1322.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97-136.

Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10-12), 1489-1506.

Krajbich, I., Armel, C., & Rangel, A. (2010). Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience, 13(10), 1292-1298.

Lavie, N. (2005). Distracted and confused?: Selective attention under load. Trends in Cognitive Sciences, 9(2), 75-82.