Markov chain Monte Carlo (MCMC) is a method used in cognitive science to estimate probability distributions over hypotheses. Calculating these probabilities exactly is often too resource intensive, so MCMC instead approximates the distribution. It is a Monte Carlo method because it stochastically simulates hypotheses from the probability distribution such that each hypothesis is generated with a frequency proportional to its probability; a precise approximation therefore requires many samples. It is a Markov chain because each new sample depends only on the immediately preceding sample. MCMC is most often used in Bayesian statistics, but in cognitive science it is also used to elicit human beliefs and to model how the brain deals with uncertainty.
MCMC was initially developed in statistical physics (Metropolis et al., 1953) and is now widely used for approximating complex probability distributions (Jones & Qin, 2022; Neal, 1993). Later work in cognitive science broadened the use of MCMC from a data analysis method to a method for eliciting beliefs (Sanborn et al., 2010) and to an explanation of how the brain works (Hoyer & Hyvärinen, 2002).
While there are efficient methods for sampling from distributions in which all probabilities are known exactly, MCMC sidesteps some difficult calculations when this is not the case. In Bayesian inference, the quantity of interest is often the probability of a hypothesis H given some evidence E, P(H|E). For example, we may be interested in the probability that the adjective “happy” (H) best describes an image of a face (E).
Using Bayes' rule, calculating P(H|E) first involves multiplying how likely the evidence is to arise if the hypothesis is true, P(E|H), by the prior probability that the hypothesis is true, P(H). This calculation is often easy, but normalizing the probability (making sure that, across all hypotheses, the probabilities of the hypotheses given the evidence sum to one, as they should) requires dividing each product by the normalizing constant that sums across all hypotheses, ∑H P(E|H)P(H).
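In symbols, with H′ ranging over all competing hypotheses in the denominator, the calculation described above is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{\sum_{H'} P(E \mid H')\,P(H')}.$$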
With many hypotheses, this sum can be intractable, that is, very difficult or expensive to compute with current computers. For example, interpreting a facial expression would involve summing across thousands of candidate adjectives (e.g., “discombobulated”).
MCMC avoids the need to calculate the normalizing constant. It works by starting with a particular hypothesis, H, and then randomly proposing a new hypothesis, H′, that is similar to H (e.g., starting with “happy” and proposing “ecstatic”). A stochastic decision (i.e., the acceptance function) is then made about whether to switch to this new hypothesis; because the probability of switching depends only on the probability of the new state, P(H′|E), divided by the probability of the old state, P(H|E), the normalizing constant cancels and so is unnecessary.
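As a concrete illustration, the following is a minimal sketch in Python with made-up numbers, not code from the sources cited here. Each of a handful of adjective hypotheses is given only an unnormalized score standing in for P(E|H)P(H); because the acceptance rule compares a ratio of these scores, the normalizing constant is never computed, yet the relative frequencies of the visited hypotheses approximate the normalized probabilities.

```python
import random

# Hypothetical unnormalized scores standing in for P(E|H)P(H); their sum
# (the normalizing constant) is never computed anywhere below.
unnormalized = {"happy": 4.0, "ecstatic": 2.0, "content": 3.0, "sad": 0.5}
hypotheses = list(unnormalized)

def propose(current):
    """Symmetric proposal: pick any other hypothesis uniformly at random."""
    return random.choice([h for h in hypotheses if h != current])

def metropolis(n_samples, start="happy"):
    current = start
    samples = []
    for _ in range(n_samples):
        candidate = propose(current)
        # The acceptance ratio uses only unnormalized scores: the normalizing
        # constant would appear in numerator and denominator and cancel.
        accept_prob = min(1.0, unnormalized[candidate] / unnormalized[current])
        if random.random() < accept_prob:
            current = candidate
        samples.append(current)
    return samples

samples = metropolis(100_000)
# Relative frequencies approximate the normalized probabilities P(H|E).
for h in hypotheses:
    print(h, samples.count(h) / len(samples))
```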
A long sequence of MCMC states can be treated as samples from the probability distribution, but this tractability comes with a cost: samples are no longer independent of one another. This means that many MCMC samples may only be as informative as a small number of independent samples (Jones & Qin, 2022; Neal, 1993).
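This cost can be checked directly. The sketch below is a toy illustration under a simple first-order autocorrelation assumption, not a method drawn from the references above: it runs a random-walk Metropolis chain targeting a standard normal distribution and converts the chain's lag-1 autocorrelation into a rough effective sample size, which typically comes out far smaller than the number of samples drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk_chain(n, step=0.5):
    """Random-walk Metropolis chain targeting a standard normal distribution."""
    x, chain = 0.0, []
    for _ in range(n):
        proposal = x + rng.normal(scale=step)
        # Ratio of unnormalized standard-normal densities.
        if rng.random() < np.exp(0.5 * (x**2 - proposal**2)):
            x = proposal
        chain.append(x)
    return np.array(chain)

def effective_sample_size(chain):
    """Rough effective sample size, assuming AR(1)-like autocorrelation."""
    x = chain - chain.mean()
    rho = float(np.dot(x[:-1], x[1:]) / np.dot(x, x))  # lag-1 autocorrelation
    return len(chain) * (1.0 - rho) / (1.0 + rho)

chain = random_walk_chain(10_000)
print(effective_sample_size(chain))  # typically far fewer than 10,000
```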
For complex problems, more sophisticated versions of MCMC are used. Gibbs sampling is an MCMC method for multivariate probability distributions in which each variable can be sampled directly from its conditional distribution given the current values of the others. Hamiltonian Monte Carlo (also known as hybrid Monte Carlo), employed in the software package Stan (Carpenter et al., 2017), uses the gradients (i.e., slopes) of continuous probability distributions to guide its proposals.
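To illustrate the Gibbs idea with a toy example (assumed parameters, not an implementation from the cited sources), consider a standard bivariate normal distribution with correlation ρ: each coordinate's conditional distribution given the other is itself normal, so it can be sampled directly and no accept-or-reject step is needed.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_bivariate_normal(n, rho=0.8):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each full conditional is normal, e.g., x | y ~ N(rho * y, 1 - rho**2),
    so both coordinates can be resampled exactly in turn.
    """
    x, y = 0.0, 0.0
    samples = np.empty((n, 2))
    for i in range(n):
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        samples[i] = (x, y)
    return samples

samples = gibbs_bivariate_normal(50_000)
print(np.corrcoef(samples.T)[0, 1])  # close to the assumed correlation of 0.8
```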
MCMC is not always the best approximation. If data are observed sequentially, then MCMC is inefficient, as it needs to start afresh after each new observation. For this situation, sequential Monte Carlo methods such as particle filters are often used (Doucet et al., 2001). Other approaches abandon sampling altogether and instead simplify the probability distribution itself. These variational approximations are usually fast and deterministic but are biased by the simplifications they make (Jordan et al., 1999).
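The contrast with sequential data can be made concrete with a minimal bootstrap particle filter for a hypothetical one-dimensional tracking problem (an illustrative sketch, not code from Doucet et al., 2001): after each new observation, the existing set of hypotheses ("particles") is reweighted and resampled rather than inference being restarted from scratch.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_particle_filter(observations, n_particles=1000,
                              process_sd=1.0, obs_sd=1.0):
    """Track a latent random walk from noisy observations.

    Assumed model (for illustration only): x_t = x_{t-1} + N(0, process_sd^2)
    and y_t = x_t + N(0, obs_sd^2).
    """
    particles = rng.normal(0.0, 1.0, n_particles)  # initial hypotheses
    estimates = []
    for y in observations:
        # Propagate each particle through the process model.
        particles = particles + rng.normal(0.0, process_sd, n_particles)
        # Weight particles by how well they explain the new observation.
        weights = np.exp(-0.5 * ((y - particles) / obs_sd) ** 2)
        weights /= weights.sum()
        # Resample particles in proportion to their weights.
        particles = rng.choice(particles, size=n_particles, p=weights)
        estimates.append(float(particles.mean()))
    return estimates

# Simulated data: a drifting latent state observed with noise.
true_states = np.cumsum(rng.normal(0.2, 1.0, 50))
observations = true_states + rng.normal(0.0, 1.0, 50)
print(bootstrap_particle_filter(observations)[-1], true_states[-1])
```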
In addition to its use as a data analysis tool, in cognitive science MCMC has been used as a method to elicit beliefs. Bayesian models of cognition assume that people act as if they employ probability distributions in their minds, but it is challenging to elicit these distributions (Mikkola et al., in press). Using a formal link between the acceptance probability used by MCMC and the probabilities with which people make decisions, MCMC with People has participants make decisions between pairs of stimuli (Sanborn et al., 2010). For example, to determine the distribution of faces associated with the adjective “happy,” participants are asked which of a pair of faces is happier. A long sequence of chosen options can then be treated as samples from that person’s belief distribution of what happy faces look like. There are several variants of this method (Harrison et al., 2020; Hsu et al., 2019).
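The logic of the method can be illustrated with a simulation (a hypothetical sketch, not the experimental code from Sanborn et al., 2010). A simulated participant is assumed to judge how "happy" a one-dimensional face parameter looks according to a normal belief distribution, and to choose between the current stimulus and a proposed alternative with probability proportional to each option's fit to that belief, a ratio rule corresponding to a valid MCMC acceptance function (the Barker rule). The sequence of chosen stimuli then approximates the assumed belief distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

def belief(face_value, mean=2.0, sd=0.75):
    """Hypothetical internal belief: how 'happy' a face parameter value looks
    to the simulated participant, up to an unknown normalizing constant."""
    return np.exp(-0.5 * ((face_value - mean) / sd) ** 2)

def simulated_choice(option_a, option_b):
    """Choose an option with probability proportional to its belief value
    (a ratio rule, equivalent to the Barker acceptance function)."""
    p_a = belief(option_a) / (belief(option_a) + belief(option_b))
    return option_a if rng.random() < p_a else option_b

def mcmc_with_people(n_trials=20_000, start=0.0, proposal_sd=1.0):
    current = start
    chosen = []
    for _ in range(n_trials):
        proposal = current + rng.normal(0.0, proposal_sd)  # similar stimulus
        current = simulated_choice(current, proposal)      # participant decides
        chosen.append(current)
    return np.array(chosen)

chain = mcmc_with_people()
print(chain.mean(), chain.std())  # approximately the assumed belief (2.0, 0.75)
```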
MCMC has been used to model how the brain generates the hypotheses that drive behavior [see Bayesian Models of Cognition]. It is more psychologically plausible than exact Bayesian inference, as there is no need to simultaneously represent all possible hypotheses—instead only one or a few hypotheses are represented at any one time (Hoyer & Hyvärinen, 2002). For example, when interpreting a facial expression, the brain might use MCMC to sample “happy” alone, instead of representing the entire probability distribution of adjectives.
Behaviorally, MCMC has been used to explain a wide range of biases. It helps to explain why decisions are probabilistic rather than deterministic: if people’s decisions are based on a small number of samples, the most prevalent response in the samples will not always be consistent (Vul et al., 2014). In studies of perception, MCMC has been used to explain bistable perception: why people switch between interpretations of stimuli that have more than one distinct interpretation, such as the face/vase illusion or a Necker cube. These MCMC models have also been connected to underlying neural representations (Fiser et al., 2010; Hoyer & Hyvärinen, 2002; Moreno-Bote et al., 2011).
In studies of cognition, MCMC has reproduced framing effects in probability judgments by assuming that task demands influence where MCMC starts sampling, and it similarly produces anchoring effects in estimates (Dasgupta et al., 2017; Lieder et al., 2018). In investigations of causal reasoning [see Causal Reasoning], it has been used to explain trial-by-trial correlations in judgments as well as initial judgment biases (Bramley et al., 2017; Davis & Rehder, 2020). MCMC is a general approximation and so can unify explanations of different judgments, including probability judgments, decisions, and response times (Zhu et al., 2024).
Van Ravenzwaaij, D., Cassey, P., & Brown, S. D. (2018). A simple introduction to Markov chain Monte–Carlo sampling. Psychonomic Bulletin & Review, 25(1), 143–154. https://doi.org/10.3758/s13423-016-1015-8
Mikkola, P., Martin, O. A., Chandramouli, S., Hartmann, M., Pla, O. A., Thomas, O., Pesonen, H., Corander, J., Vehtari, A., Kaski, S., Bürkner, P. C., & Klami, A. (in press). Prior knowledge elicitation: The past, present, and future. Bayesian Analysis. https://doi.org/10.1214/23-BA1381
Sanborn, A. N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Sciences, 20(12), 883–893. https://doi.org/10.1016/j.tics.2016.10.003