When people use language together in informal social interactions, whether spoken or signed, researchers often describe it as conversation. While having a conversation can be a social activity in its own right, the term also applies to interactive language that accompanies everyday practical activities (e.g., preparing or eating a meal together). A continuum exists from naturally occurring informal conversation, in which participants organize the interaction themselves (deciding, e.g., who takes part, what is said and done, and how they position themselves in space), to more formal or controlled situations, in which features of an interaction are determined in advance (specifying, e.g., the number of participants or what they should say or do). The fewer constraints the participants are under, the more conversational their interaction will be. Conversation exhibits stable features that appear in languages and cultures around the world, yet it also exhibits variation along a number of key dimensions.
The study of conversation as a domain of scientific inquiry began with the emergence of conversation analysis in sociology in the 1960s and 1970s, a time when audio recording equipment first became widely available. Conversation analysts recognized that recorded conversations could provide an empirical foundation for the study of social order and developed conversation analysis into a unique naturalistic observational, qualitative method (Robinson et al., 2024). The most influential research in conversation analysis has focused on the most ubiquitous and elemental forms of social conduct in conversation, such as turn-taking and repair. Language per se was not the primary object of study, but in the 1990s, linguists from a discourse-functional tradition, which prioritizes empirical descriptions of language use over abstract theorization, adopted conversation analysis as a method for the study of linguistic structure, developing what is known today as interactional linguistics (Fox et al., 2013). The widespread use of recorded telephone calls led to a perception of a bias towards so-called talk-in-interaction (that is, verbal and vocal conduct that occurs in social interaction), yet early research in the field also examined gesture and other visible bodily actions. However, it was not until the early 2000s that the field underwent an embodied turn and began to adopt a fully multimodal view of social interaction (Mondada, 2016). Most recently, psycholinguists and cognitive scientists have begun to use models developed by conversation analysts to investigate the cognitive processes that underlie multimodal language and interaction (Holler & Levinson, 2019) [see Psycholinguistics].
The organization of conversation exhibits several key features that are relatively context free and invariant. No matter where the interaction takes place, who the participants are to each other, or what they do together, one finds these features. In contrast to linguistic structures that show variation at all levels across the world’s languages (Evans & Levinson, 2009), the organization of conversation is remarkably stable, leading some to conclude that features of it are cross-culturally universal (see, e.g., Levinson, 2006; Stivers et al., 2009; Kendrick et al., 2020). Yet at the same time, we also observe variation along several key dimensions that change with the context-specific particulars of an interaction (e.g., the number of participants and the distribution of knowledge and expertise among them).
Opportunities to participate in conversation are organized by a system of turn-taking (Sacks et al., 1974). Turns are constructed out of units that facilitate prediction and have recognizable completions at which a transition between speakers becomes relevant (Kendrick et al., 2023). The turn-taking system also accounts for the timing of gaps and overlaps (Levinson & Torreira, 2015), in addition to the selection of next speakers. Each turn-constructional unit performs a social action (offering, requesting, agreeing, disagreeing, and so on), which involves practices of action formation and ascription (Levinson, 2013a). Social actions do not occur in isolation but rather cohere to form sequences (Schegloff, 2007), such as question/answer adjacency pairs, which can be recursively expanded to produce complex interactional structures (Levinson, 2013b). The selection between alternative actions (e.g., agreeing vs. disagreeing or complimenting oneself vs. complimenting another) involves normative principles of social conduct, which are described as preferences (e.g., Pomerantz & Heritage, 2013). Generally speaking, socially affiliative actions are preferred and are performed quickly and straightforwardly (e.g., confirming a polar question), whereas socially disaffiliative actions are dispreferred and often involve delay, qualification, or justification (e.g., not answering a question). The organization of conversation rests on a presumption of intersubjectivity and mutual understanding, which is enabled by practices of grounding (i.e., interlocutors signaling that a conversational contribution has been understood; Clark & Brennan, 1991) and repair (Albert & de Ruiter, 2018). Repair concerns the management of troubles in producing, perceiving, and understanding turns at talk and includes both self-initiated repair (e.g., correcting an error on one’s own turn) and other-initiated repair (e.g., identifying a source of trouble in another’s turn).
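The timing of gaps and overlaps described above can be made concrete with a small illustration. The following is a minimal sketch, assuming that each turn has been annotated with a speaker label and start and end times in seconds; the `Turn` class, the `transition_offsets` function, and the timing values are hypothetical and invented for illustration, not drawn from any of the cited studies. For each change of speaker, it computes a signed offset between the end of one turn and the start of the next: a positive value indicates a gap, a negative value an overlap.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # onset of the turn, in seconds
    end: float    # offset of the turn, in seconds


def transition_offsets(turns):
    """Return (previous speaker, next speaker, offset) for each speaker change.

    The offset is the next turn's start minus the previous turn's end:
    positive values are gaps, negative values are overlaps.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:
            offsets.append((prev.speaker, nxt.speaker, round(nxt.start - prev.end, 3)))
    return offsets


# Invented example data: three turns by two speakers.
turns = [
    Turn("A", 0.00, 1.80),
    Turn("B", 2.00, 3.10),  # starts 0.20 s after A ends: a gap
    Turn("A", 3.00, 4.50),  # starts 0.10 s before B ends: an overlap
]

for prev, nxt, offset in transition_offsets(turns):
    kind = "gap" if offset > 0 else "overlap" if offset < 0 else "no gap, no overlap"
    print(f"{prev} -> {nxt}: {offset:+.2f} s ({kind})")
```

This sketch treats every pair of consecutive turns by different speakers as a transition. Analyses of real recordings would also need to decide how to handle, for example, brief listener responses produced during another's turn and pauses within a single speaker's talk.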
While cross-cultural research has shown remarkable stability in many key features of conversation, even across spoken and signed languages (e.g., Iwasaki et al., 2022), the language and culture of the participants can influence how (or even whether) they take turns at talk and which social actions are performed (e.g., Blythe et al., 2018). So, too, can the mutual visibility of conversational participants; being mutually visible allows participants to communicate not only with words but also with a myriad of visual signals, such as hand gestures, facial expressions, head movements, torso movements, and eye gaze. Visual signals contribute critical semantic and pragmatic meaning, and consequently, turns without visual signals may differ in fundamental respects (e.g., turn timing or transition relevance; Holler et al., 2018; Kendrick et al., 2023). The number of participants in a conversation also varies and affects practices of turn-taking (Egbert, 1997; Holler et al., 2021).
The distribution of knowledge affects conversation as well, which research has investigated in terms of the common ground (i.e., knowledge mutually shared between interlocutors; Clark, 1996) and epistemics (i.e., knowledge differentially distributed between participants; Heritage, 2012). As an interaction begins to move away from conversation per se, further dimensions of variation become relevant. A conversation can be relatively talk or task oriented. The former refers to conversations in which talk as such is the primary involvement of the participants and other parallel activities are secondary and not the focus of the talk; the latter may involve practical activities such as assembling furniture, playing a game together, or tasting and purchasing food, in which what the participants do with their words and actions centers on the task at hand (e.g., Mondada, 2022).
Finally, whether a conversational interaction is ordinary or institutional (e.g., an informal conversation vs. talking with a doctor, a student, or perhaps even a researcher) can have profound effects (see Toerien, in press; Heritage & Clayman, 2011), including effects on key features of conversation such as turn-taking and repair. Participants orient to relevant institutional identities by performing or withholding certain actions (e.g., interviewers ask questions while interviewees answer them), and the more constraints the participants are under, the less conversational the interaction will be.
Conversation analysis has provided invaluable insights into the structural organization of conversation, but language production and comprehension have been studied largely outside this environment, with the typical psycholinguistic paradigm involving a single participant speaking or listening in social isolation (Kuhlen & Abdel Rahman, 2023) [see Language Production]. Some advances have been made in studying psycholinguistic processes in combination with conversational principles and phenomena, for example, in the debate about how early next speakers begin to plan their turn and in investigations into how early social actions are ascribed when dialogues are observed (Bögels & Levinson, 2017). One major challenge the field faces is developing experimental paradigms that can manipulate and measure both behavior and the cognitive processes related to the key features and dimensions of conversation while retaining as much as possible of its spontaneous, naturalistic character. Among other approaches, such paradigms may rest on virtual reality to capture the multimodal, face-to-face nature of human communication [see Virtual Reality] and on neuroimaging methods (such as dual electroencephalography or dual functional magnetic resonance imaging) to measure activity in multiple brains during interaction.
The human experience of social interaction is undergoing a major change due to the rapid rise of digital technologies, and conversations with virtual agents are likely to become the norm in the decades to come. Conversing with virtual humans in increasingly immersive settings will likewise play a prominent role in the novel paradigms that are being, and will need to be, developed to study cognitive processing in situated conversational settings. Creating multimodal conversational behaviors that look and feel realistic therefore seems appealing and important if the aim is to create interactive experimental paradigms that employ virtual humans. At the same time, the field must grapple with the ethical and moral implications to which this can give rise, especially if the degree of realism affects processes that are core to human interaction, such as intersubjectivity, trust, and interpersonal affiliations.
Clift, R. (2016). Conversation analysis. Cambridge University Press.
Levinson, S. C. (2016). Turn-taking in human communication – Origins and implications for language processing. Trends in Cognitive Sciences, 20(1), 6–14. https://doi.org/10.1016/j.tics.2015.10.010
Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis. Cambridge University Press.