Spatial cognition is used in cognitive science, as well as in other allied disciplines, to denote thinking about spatial content, such as positions in an environment or shapes of objects. The term additionally covers a wide variety of heterogeneous mental activities, given that the world is inherently spatial. A useful distinction is between two kinds of spatial thinking: navigation-relevant and object-focused cognition. Each requires representation, either of the environment or of an object, as well as mental transformation of the information represented. Navigation involves finding the way around the world to gather food and drink, find social partners, avoid dangers, and return home, and it is a prerequisite for survival for all mobile organisms. Finding the way requires representing the relative locations of environmental features (an allocentric framework) and tracking self-motion within that framework (inertial navigation). Object-focused cognition involves representing the shape and structure of objects and anticipating their appearance after physical or mental transformation, such as rotation, cutting, or folding. Object-focused thinking is closely related to tool invention and tool use and is likely more specific to humans than is navigation, although tool use has also been documented in primates and corvids.
Humans have long observed the movements of the stars and planets, watched the migratory habits of birds, built dwellings, and fashioned tools. Only lately has empirical science considered how humans do these things. When the science of mental life began in the late 19th century, with laboratories headed by scientists such as Wilhelm Wundt and Hermann Ebbinghaus, it aimed to uncover basic laws of perception and memory, with little interest in spatial thinking or in individual variation. The study of individual differences was left to figures such as Francis Galton, who examined variations in sensory processing, and Alfred Binet, who produced the first version of the intelligence test.
Paper-and-pencil tests, largely of object-focused spatial thinking, became an important focus for psychologists building assessments of talent. Over the years, psychometricians created hundreds of tests of how quickly and accurately people could perform a wide array of spatial activities, e.g., judge true horizontal and vertical, extract a target figure from a complex drawing, judge which direction a boat would be facing after a specified transformation, or decide what a piece of paper would look like after being folded, having holes punched through the folds, and then being unfolded. Datasets of this kind were repeatedly factor analyzed, i.e., investigators examined how variables were intercorrelated, looking for clusters of abilities. However, after a century of this kind of activity, Hegarty and Waller (2005) argued that the enterprise has not borne the hoped-for fruit, namely a characterization and categorization of spatial skills, at least in part because the tests were constructed without any guiding theory of spatial thinking.
In the early 20th century, in parallel with this work by psychometricians, various kinds of behaviorism came to dominate experimental psychology, at least in the United States, and the study of mental life languished. This situation changed radically in the middle of the century, with discoveries from Harlow and Lorenz showing that traditional rewards such as food are not necessary to change behavior, arguments by Noam Chomsky showing that behaviorism could never explain language acquisition, and more. One important turning point in the cognitive revolution involved navigation, occurring when Edward Tolman (1948) proposed the idea of the cognitive map based on his study of rats trained to go straight ahead and then turn to get a food reward. When this route was blocked but various other exits were opened, arranged in a sunburst fashion around the start box, the rats chose the arms leading relatively directly to the reward, showing they had encoded the location in the wider world, without reward, not just memorized a series of turns that led to the goal box.
Twenty years or so later, looking at encoding and transformation of object structure, Shepard and Metzler (1971) demonstrated that people mentally move their representations in a way that tracks the amount of time it might take to move them physically. The most famous task involved the mental rotation of block structures, but similar phenomena appeared for tasks such as paper folding. Steven Kosslyn continued the theme of mental transformation that mimicked perception in his series of studies on visual imagery and documented that it takes time proportional to size and distance to scan mental images (e.g., Kosslyn, 1975). Much later, however, it became clear that visual imagery and spatial imagery are distinct, the former being static and detail-oriented and the latter more dynamic (Chabris et al., 2017).
Behaviorism never took hold in Europe to the same extent it had in the United States. Even before the cognitive revolution, Jean Piaget had been working in Switzerland on an approach to cognitive development that involved postulating internal mental structures. His work included investigations of mental imagery, including tasks involving horizontality-verticality and spatial perspective-taking (Piaget & Inhelder, 1967, 1971). (Of course, psychometricians also had tasks of these kinds, but the two literatures did not make contact at first.) Piaget was not interested in how adults vary; rather, he emphasized the slow development of children into adult thinkers over the first decade of life. Later investigations began to question Piaget’s characterization, however, and investigators suggested that dynamic spatial thinking can occur even in infancy (e.g., Moore & Johnson, 2008). The nature of what infants can do, how it differs from adult capabilities, and how development occurs over years is an area of active current investigation (Frick et al., 2014).
The study of the neural underpinnings of spatial thought began during the 1970s and accelerated over the next decades. In the study of navigation, the appearance in 1978 of John O’Keefe and Lynn Nadel’s book The Hippocampus as a Cognitive Map situated the then-recent discovery of place cells in the much wider theoretical canvas inspired by Tolman (O’Keefe & Nadel, 1978). Further spatially specific cell types were later discovered, including head-direction cells (Muller et al., 1996) and grid cells (Hafting et al., 2005). The award of a Nobel Prize to John O’Keefe and Edvard and May-Britt Moser in 2014 drew world-wide attention to this kind of research. Of course, techniques that identify such cells cannot be used with humans, but the technology to image the workings of the human brain at a coarser level through the use of magnetic resonance imaging has matured during the past two decades and has shed light on the neural networks supporting navigation (e.g., Epstein et al., 2017) and the areas supporting object-focused processing (Ayzenberg & Behrmann, 2022; Zacks, 2008).
Navigation-relevant cognition is the thinking necessary for finding the way in the world.
Allocentric frameworks are the landmarks that provide external reference points for encoding location.
Inertial navigation involves tracking where one is by encoding distance and direction from bodily cues.
Object-focused cognition is the representation of the shape and structure of objects and the ability to manipulate and transform these representations in a variety of ways, such as rotating or folding.
Cognitive maps are representations of an array of objects in an environment using a common framework and allowing for charting detours and shortcuts.
Mental rotation is imagining turning an object in three-dimensional space.
Visual imagery is imagining something that has been perceived, in its absence.
Spatial imagery is imagining objects moving or oneself moving in an environment.
Cognitive graphs are representations of an array of objects in an environment in which local relations are represented as distances and directions if experienced, but there is no common framework allowing for inferences.
Embodied cognition is the theory that thinking does not involve representations but instead involves interior perception-action linkages.
There are several areas of controversy in spatial cognition, all of which are also central issues in cognitive science more generally.
One of the key issues in cognitive science is the nature of mental representation. Research on spatial cognition has contributed to debate on this front by focusing on several specific controversies relevant to the wider issue. Regarding navigation, there has been disagreement concerning whether Tolman and others were correct in using the term cognitive map. There is a long history of suggestions that the knowledge that guides navigation is much more fragmentary, both for nonhuman animals and for humans (e.g., Shettleworth, 2009; Wehner & Menzel, 1990). One prominent alternative is the idea that there is simply a cognitive graph, which contains local information about angles and distances but does not utilize a common framework that would allow for inference (Warren et al., 2017). The debate continues, although a recent review and synthesis suggests that whether there is evidence of a map depends on the nature of the environment to be encoded and the characteristics of the observer (Peer et al., 2021). The latter suggestion brings the field of individual differences into play in an arena that has been dominated by the assumption that there is a canonical form of encoding. That is, some people are better than others at forming cognitive maps, and some environments are easier to encode in this way than others.
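The contrast between a cognitive map and a cognitive graph can be read as a contrast between two data structures. The following is a minimal sketch under stated assumptions, not a model from the literature: the place names, coordinates, and function names are all hypothetical, and the graph-based reader is deliberately restricted to chaining experienced legs rather than combining them into a novel vector. It shows how a common framework supports inferring a shortcut that was never traveled, whereas purely local relations support only routes along known connections.

```python
import math
from collections import deque

# Hypothetical toy environment; names and coordinates are illustrative only.
# Map-like representation: every place is coded in one common framework.
cognitive_map = {
    "home": (0.0, 0.0),
    "pond": (4.0, 0.0),
    "tree": (4.0, 3.0),
}

def map_vector(a, b):
    """Distance and bearing (degrees, counterclockwise from the +x axis)
    from a to b, inferred from shared coordinates even if the leg
    was never traveled."""
    (ax, ay), (bx, by) = cognitive_map[a], cognitive_map[b]
    dx, dy = bx - ax, by - ay
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))

# Graph-like representation: only experienced legs are stored,
# each labeled with a local (distance, bearing) pair.
cognitive_graph = {
    ("home", "pond"): (4.0, 0.0),
    ("pond", "tree"): (3.0, 90.0),
}

def graph_route(start, goal):
    """Shortest chain of experienced legs (breadth-first search),
    or None if the goal cannot be reached along known legs."""
    neighbors = {}
    for a, b in cognitive_graph:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)  # assume legs can be retraced
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

def route_length(path):
    """Total distance traveled along a chain of experienced legs."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        leg = cognitive_graph.get((a, b)) or cognitive_graph.get((b, a))
        total += leg[0]
    return total

shortcut_distance, shortcut_bearing = map_vector("home", "tree")
known_route = graph_route("home", "tree")
print(shortcut_distance, known_route, route_length(known_route))
```

In this toy case the map-based reader infers a direct home-to-tree shortcut shorter than the only route the graph-based reader can produce, which passes through the pond. Real proposals about cognitive graphs are richer than this sketch, for example in allowing local metric information to be noisy or mutually inconsistent.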
In the domain of object-focused cognition, initial research on mental rotation and paper folding indicated a second-order isomorphism to physical action, i.e., an analogue representation without a role for analytic and verbal processes. Not everyone agreed with this claim, especially once investigators started to examine the considerable individual differences in the speed and accuracy of rotation and uncovered verbal analytic as well as analogue ways to solve the problem; using both strategically may be adaptive (Nazareth et al., 2019a). Similarly, there have been claims that visual imagery is an epiphenomenon and that all mental coding is propositional (Pylyshyn, 1973). Behavioral studies never seemed to resolve this issue, but there is now increasing neural evidence that brain areas used for vision are engaged during imagery (Farah, 1988) and that there is a two-way flow of signals between brain areas that assign meaning and retrieve memories and those that conduct initial processing (Steel et al., 2024).
As in other domains of thinking, such as color perception, there are proposals that the nature of the language used to describe spatial relations influences how they are represented. For instance, people might encode locations with respect to distant landmarks when their languages require indicating directions using slope or compass coordinates, whereas people speaking languages that use terms such as “left” and “right” might encode spatial relations internal to a display. The key issue is how strong these influences are. Characterizations of the constraints imposed by language have ranged from virtually determinative (Levinson, 2003) to shifting and optional (Gleitman & Papafragou, 2013). Investigations of these issues are part of the wider discussion of the relations of language and thought, which has also concerned domains such as color (Regier & Kay, 2009) and odor (Majid et al., 2018).
Another key issue in cognitive science concerns the competing nativist and empiricist perspectives on development. Research on both kinds of spatial cognition has provided fertile ground for working through the possibilities. In navigation, debate has centered on the issue of whether there is an encapsulated geometric module, as proposed by some scholars (e.g., Gallistel, 1990; Spelke, 2022). This proposal requires the use of linguistic coding to account for adult flexibility (an interesting variant on the idea that language strongly structures thinking). The alternative perspective is that infants and toddlers come equipped to notice both continuous boundaries and distal landmarks as they encode their environment, with development consisting of learning an adaptive combination of cues given environmental experience (Newcombe, 2024; Xu et al., 2017). In object-focused spatial cognition, possibilities about development range from early competence concerning concepts such as solidity, gravity, and object permanence as inferred from looking-time data (Spelke et al., 1992) to Piaget’s idea of slow change over a decade of life. There is now a solid database to suggest that competence begins in infancy but is far from mature, building in a cascade that depends on factors such as motor development and interactive experience (Oakes & Rakison, 2019). However, none of these views have yet grappled adequately with the question of why adults show such wide variation in many but not all abilities. We all understand solidity, but many of us struggle with mental rotation.
Considering the role of sensory and motor experience in development brings up the issue of the role of embodiment in adult cognition [see also Proxemics]. In the extreme, there have been suggestions that perception-action loops can account for cognition and that all cognition is necessarily embodied (Barsalou, 2020; Thelen & Smith, 1994). These models deny that there is a need to postulate representation (of any kind) at all. An intermediate position emphasizes the role of perception and action in building knowledge, both in infancy and later, as we learn new skills. However, there may be a continuum in each learning sequence from body-based to symbolic representations. This perspective opens the way for investigating the role of gesture in this abstraction process (Goldin-Meadow, 2015) [see Gesture] as well as for examining the role of spatial representations created by learners and teachers, such as the use of sketches, diagrams, and graphs. These symbolic schemes capture and communicate knowledge and provide a platform on which it can be reflectively refined.
Spatial cognition research has many potential applications, e.g., robotics, city planning, better signage design for malls and airports, or creation of more user-friendly instructions for furniture assembly at home. It is also a focus of research on cognitive aging, because navigation deficits are a prominent and troubling aspect of Alzheimer’s disease (Lester et al., 2017). An especially prominent connection is to support education in science, technology, engineering, and mathematics (STEM) across ages and levels of expertise. However, other subjects also benefit from spatial analysis, e.g., history (White, 2010). The links to STEM are strongest for object-focused spatial skills, where the literature has shown not only concurrent correlations, but also longitudinal correlations with appropriate control variables at various ages from preschool through university (e.g., Frick, 2019; Verdine et al., 2014; Wai et al., 2009). Because these skills are malleable (Uttal et al., 2013), there has been increasing interest in testing whether interventions to raise spatial skills will translate into learning gains. This literature is the largest for early math learning, with enough data to support a meta-analysis that showed small but significant effects (Hawes et al., 2022).
It is unclear whether certain types of spatial skills are more important than others, as well as whether effects vary across specific disciplines and subdisciplines. In mathematics, although it is natural to suppose that areas such as geometry are more spatial, expert mathematicians engage spatial areas of the brain when reasoning across a wide array of mathematical problems (Amalric & Dehaene, 2019). In science, there is some evidence that spatial skills may vary across disciplines. For instance, although geoscientists and chemists both excel in mental rotation compared to English professors, geoscientists also showed high scores on a test of reassembling a stimulus that had been cracked and the pieces slid apart, as can happen in Earth’s history, while chemistry professors performed no better than English professors (Resnick & Shipley, 2013). Although navigation skills have been less studied in this area than object-focused skills, geoscientists perform better in learning a novel virtual environment than psychology professors (Nazareth et al., 2019b). Sorting out which skills relate to which kinds of STEM learning and reasoning would be easier if there were a clearer specification of the underlying mechanisms. For mathematics, several possibilities have been suggested, including shared automatic processing, perhaps linked to shared neural substrates, spatialized representations of quantity such as the number line, and strategic recruitment of spatial thinking (Hawes & Ansari, 2020; Mix, 2019).
Ekstrom, A. D., Spiers, H. J., Bohbot, V. D., & Rosenbaum, R. S. (2018). Human spatial navigation. Princeton University Press. https://doi.org/10.2307/j.ctvc773wg
Lester, A. W., Moffat, S. D., Wiener, J. M., Barnes, C. A., & Wolbers, T. (2017). The aging navigational system. Neuron, 95(5), 1019–1035. https://doi.org/10.1016/j.neuron.2017.06.037
Newcombe, N. S. (2018). Three kinds of spatial cognition. In J. T. Wixted (Ed.), Stevens' handbook of experimental psychology and cognitive neuroscience (Vol. 3, Chapter 15). John Wiley & Sons. https://doi.org/10.1002/9781119170174.epcn315
Shettleworth, S. J. (2009). Cognition, evolution, and behavior (2nd ed.). Oxford University Press. https://doi.org/10.1093/oso/9780195319842.001.0001