Planning often involves mentally simulating ("rolling out") possible futures before acting. Rollout-based models have seen great success in artificial intelligence, yet cognitive process models of human rollout planning remain scarce because their likelihoods are typically intractable. We introduce Exact Rollout-Based Planning (ERBP), a class of process models that yields closed-form action probabilities, enabling exact, simulation-free parameter estimation. ERBP treats planning as an absorbing Markov chain in which, at each simulated state, the agent decides either to continue or to terminate the rollout. Once the rollout terminates, the agent decides whether enough information has been gathered to act, or whether to initiate a new rollout. Because ERBP provides exact likelihoods, inferences about latent planning parameters, such as planning depth, are uncontaminated by sampling noise, allowing for more reliable and accurate conclusions. ERBP therefore provides a bridge between algorithmic theories of rollout planning and quantitative analysis of human behavior, opening avenues for rigorous comparison of planning strategies across individuals, tasks, and biological versus artificial agents.
We develop a first instantiation, Progress-Regress ERBP (PR-ERBP), in which rollouts terminate when the agent perceives sufficient improvement, sufficient deterioration, or when the cognitive cost of continuing simulation is too high. These three psychologically interpretable stopping rules map to distinct absorbing states, allowing analytical computation of rollout probabilities and action probabilities. To demonstrate empirical utility, we fit PR-ERBP to 44,227 actions taken by 42 participants solving Rush Hour puzzles. We found that the model recapitulated trends in summary statistics, such as an increase in errors farther from the goal and an increase in subgoal-relevant actions near the goal.
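The closed-form likelihoods come from standard absorbing-Markov-chain algebra: absorption probabilities are obtained by a matrix inversion rather than by simulation. Below is a minimal sketch of that computation on a hypothetical toy chain (the transition probabilities are invented for illustration and are not the fitted PR-ERBP model).

```python
import numpy as np

# Toy absorbing Markov chain: 3 transient "rollout" states and 3 absorbing
# stopping states (improvement / deterioration / cost limit).
# Q: transient -> transient transition probabilities (hypothetical numbers).
Q = np.array([
    [0.2, 0.5, 0.0],
    [0.1, 0.2, 0.4],
    [0.0, 0.1, 0.2],
])
# R: transient -> absorbing transition probabilities (hypothetical numbers).
R = np.array([
    [0.2, 0.05, 0.05],
    [0.1, 0.1,  0.1],
    [0.3, 0.2,  0.2],
])
# Each row of [Q | R] must be a proper probability distribution.
assert np.allclose(Q.sum(axis=1) + R.sum(axis=1), 1.0)

# Fundamental matrix N = (I - Q)^{-1}; N[i, j] is the expected number of
# visits to transient state j for a rollout started in state i.
N = np.linalg.inv(np.eye(3) - Q)

# B[i, k]: probability that a rollout started in state i terminates in
# stopping state k -- exact and simulation-free.
B = N @ R
print(B.sum(axis=1))  # each row sums to 1
```

Because B is exact, a likelihood built from these absorption probabilities has no Monte Carlo noise, which is what makes the parameter estimates clean.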
Human life is filled with problem solving. From something as simple as crossing the road to complex problems such as chess, any situation where an agent desires change can be framed as a problem. Many problems are too complex to tackle directly and can instead be broken down into subgoals that are more easily solved. The process of starting at the main goal and recursively breaking it down into subgoals is called backward reasoning. Under the supervision of Wei Ji Ma, we proposed that AND-OR trees, which chain together subgoals and the actions that attain them, provide a useful representation for studying this process.
Key findings:
- AND-OR trees provide a representation predictive of human actions
- The predictive power increases closer to the goal
- The trees get smaller the closer a person is to the goal
- People are biased toward shorter trains of thought (in agreement with resource rationality theory)
- People make fewer mistakes the closer they are to the goal
- A simple process model built on AND-OR trees can capture these trends, while alternative models cannot
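The AND-OR structure itself is easy to make concrete: a subgoal (AND node) is attained only if all of its prerequisites are, while a goal (OR node) is attained if any of the ways to reach it succeeds. The snippet below is an illustrative evaluator for such trees, not the process model fit to the data; the tuple encoding is an assumption chosen for brevity.

```python
# Minimal AND-OR tree evaluator (illustrative only; not the fitted
# process model). A node is a (kind, payload) tuple:
#   ("leaf", bool)      -- an action that is or is not available
#   ("and", [children]) -- subgoal attained only if ALL children succeed
#   ("or",  [children]) -- goal attained if ANY child succeeds

def solvable(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    values = [solvable(child) for child in payload]
    return all(values) if kind == "and" else any(values)

# Goal reachable via either (a AND b) or c; here c fails, but a and b
# both succeed, so the goal is attainable.
tree = ("or", [
    ("and", [("leaf", True), ("leaf", True)]),
    ("leaf", False),
])
print(solvable(tree))  # True
```

Truncating such a tree at a given depth gives a natural handle on "shorter trains of thought", which is how tree size can enter a process model as a latent parameter.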
The Royal Game of Ur is one of the oldest board games in the world: five boards, excavated from the Royal Cemetery at Ur, date to around 2600–2400 BC. It is a predecessor of many race games, such as backgammon. The game has received renewed attention since Tom Scott's interview and the launch of https://royalur.net/. In this project, I collaborated with Padraig Lamont, the creator of RoyalUr.net, to (strongly) solve the game and analyze strategic and tactical gameplay.
This is an ongoing project. So far, we have strongly solved the game using value iteration and used the solution to analyze a few basic strategies. The long-term objective is to compare the optimal policy to human gameplay data and identify the situations in which people play optimally.
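Value iteration repeatedly applies the Bellman backup, taking the best action value at each state, until the values stop changing. As a sketch, here is the backup on a hypothetical three-state race MDP standing in for the far larger Royal Game of Ur state space (states, actions, and probabilities are invented for illustration).

```python
import numpy as np

# Toy value iteration (hypothetical 3-state MDP; a stand-in for the real
# game's state space). State 2 is terminal (game won). From state 0 there
# are two actions:
#   "steady": move to state 1 for sure, no immediate reward
#   "gamble": win immediately with prob 0.3 (reward 1), else stay put
# From state 1 the only action wins outright (reward 1).
gamma = 0.9          # discount factor, so faster wins are worth more
V = np.zeros(3)

for _ in range(100):
    q_steady = 0.0 + gamma * V[1]
    q_gamble = 0.3 * 1.0 + 0.7 * gamma * V[0]
    V_new = np.array([
        max(q_steady, q_gamble),   # Bellman backup: max over actions
        1.0 + gamma * V[2],        # state 1: win on the next move
        0.0,                       # terminal state has value 0
    ])
    if np.max(np.abs(V_new - V)) < 1e-12:
        break                      # converged to the optimal values
    V = V_new

print(V)  # -> [0.9, 1.0, 0.0]: waiting one sure move beats the 30% gamble
```

Strongly solving the real game is the same computation at scale: enumerate all reachable positions, then iterate Bellman backups (with an expectation over dice rolls) until convergence.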
The statistics of the world are constantly in flux, so the brain must be able to adapt its internal model in real time to make sense of this evolving environment. This problem is further complicated by the fact that sensory information is itself dynamic and ambiguous, so neural circuits have to disambiguate meaningful changes from noise through a dynamic inference process. Together with Camille Rullán and Cristina Savin, we proposed a framework for online learning in probabilistic models that infers model parameters from the stochastic, continuous-time dynamical systems underlying inference over time. Our sampling-based approach allows for unified learning solutions across different internal models. The learning process is temporally local and does not require evaluating gradients, two computational prerequisites for biological learning.
The phenomenological effects of our learning framework can be mapped onto neural responses and behavior via a spiking recurrent-network encoding process, allowing us to compare its properties directly to experimental data. We found evidence supporting our theory in nonhuman primate data on smooth pursuit.
How do you find the distance between two neural networks (biological or artificial) when their output is stochastic? Measures of deterministic representational similarity ignore the scale and geometric structure of noise, both of which play important roles in neural computation. In collaboration with researchers at the Flatiron Institute, we extended Alex Williams's theory of Generalized Shape Metrics to apply to stochastic neural networks.
We applied our theory to both biological and artificial neural network data and found that the stochastic geometries of neurobiological representations of oriented visual gratings and of naturalistic scenes resemble untrained and trained deep network representations, respectively. We were also able to more accurately predict certain network attributes (e.g., training hyperparameters) from their position in stochastic (versus deterministic) shape space.
Localising objects is a task plagued by uncertainty, for example when looking for an object in a cluttered scene. While eye movements in visual search have been shown to optimally reduce uncertainty about target location (a strategy known as active sensing), they have mostly been studied using scenes devoid of landmarks. Conversely, landmarks are central to the study of navigation, but the role of uncertainty, and its reduction by active sensing, remains unclear; instead, various heuristic strategies using landmarks proximal or distal to the goal have been described. With the help of Yul Kang and under the supervision of Máté Lengyel and Daniel Wolpert, we ran a psychophysics experiment to test whether people use optimal or heuristic strategies when localising objects in the presence of landmarks.
We found diverse strategies: some participants were close to optimal active sensing, while others were heavily biased towards using the proximal landmark, with a significant anticorrelation between the two strategies. Each participant's active sensing strategy was adapted to their idiosyncratic pattern of errors, underscoring the importance of subjective uncertainty. Our study reveals active sensing in localisation by systematically separating it from a proximity-based heuristic, and suggests a landmark-specific representation of subjective uncertainty, thus placing stringent constraints on candidate neural mechanisms of localisation and, more generally, of navigation.