Two bots are better than one?

Good and successful interactions with chatbots and voice-operated devices rely on more than just a mastery of language. Other features of speech and interaction come into play as well: commands, requests, confirmations, cancellations, and so on. Some of these are already codified in one form or another (e.g. Amazon Alexa’s intents).

In everyday social interactions we also have “bracketing” moves such as greetings and farewells. These inform participants as to the intensity expected of an interaction. Will it be a short interaction, a passing hello, a longer inquiry, an open-ended conversation? We use linguistic expressions, facial and physical gestures, and an understanding of context to negotiate expectations for all manner of social interactions. In doing so, we inform others as to what to expect from an interaction, how to behave, what level of intensity to show, and so on. (See Erving Goffman’s work on framing for examples; the field of symbolic interaction is rich.)

Interaction with artificial agents will always be constrained in this regard. Bots, whether chat or voice-based, can’t respond to the meta level of human communication, or to the relational intent of communication and interaction. They neither understand such intents (all of which can be based on a fundamentally mutual understanding of what’s going on in the interaction) nor have the consciousness to experience them. And to the degree that they achieve some realism in their performance, they may only risk confusing matters: the more realistic they are, the more we will expect them to be truly conversational, which increases the potential for failed interactions from a functional perspective.

This raises the question of whether there might be non-linguistic means to improve the effectiveness of artificially mediated interactions.

There is a basic rule in the sociology of interaction that first and foremost participants in an interaction must understand what’s going on in order to know how to proceed. And precisely this kind of assessment is what happens when we first interact with conversational agents: we establish what the agent is about, for, capable of, and so on.

Normally two people can do this effortlessly and without conscious reflection in the due course of conversation. But in the case of artificial agents, the establishment of basic ground rules for interaction is fraught with risk. It’s the conversational space in which the greatest amount of failure can occur, resulting in abandonment of the interaction, improper, inaccurate, incomplete, or unintended submission of data, misunderstandings of actions and outcomes, and so on.

We will get better at interacting with these agents over time, as we become accustomed to their modes of speech and chat and learn how to communicate with them effectively and efficiently (yes, a structured meta-language might be a good idea for codifying commands, requests, replies, etc.).

But there might be another option. What if two (or more) bots were used instead of one? We seem to have assumed that interactions with chatbots and voice UIs will be one-to-one. But as we know from sociology, the context of an interaction (what it’s about) is supplied by its participants. So what if bots could converse with each other, and in doing so supply context about the interaction on offer: how to talk, the vernacular, preferred terms, appropriate responses, and so on?

What if, instead of dead space and repeated questions, a single bot could call in a second bot, and we could learn from their interaction with each other what we can do with them and how to talk and behave? It works in real life, so why not in bot life? Bots could be used to set the “rules of the game,” so to speak, thus informing us of how to interact with them (and for what purposes).
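To make the idea a little more concrete, here is a rough sketch in Python of what “calling in a second bot” might look like. Everything in it is hypothetical and invented for illustration: the `HostBot` class, the `SUPPORTED` phrase table, the intent names, and the “guide” speaker are assumptions, not any real chatbot framework. When the host bot fails to understand the user, it stages a short exchange with a second bot, so that the user overhears what valid requests sound like.

```python
from dataclasses import dataclass, field

# Hypothetical phrases the host bot understands, mapped to made-up intent names.
SUPPORTED = {
    "book a table": "make_reservation",
    "cancel my booking": "cancel_reservation",
}


@dataclass
class Turn:
    speaker: str  # "user", "host", or "guide"
    text: str


@dataclass
class HostBot:
    transcript: list = field(default_factory=list)

    def say(self, speaker, text):
        self.transcript.append(Turn(speaker, text))

    def handle(self, user_text):
        """Return the matched intent, or None after staging a bot-to-bot demo."""
        self.say("user", user_text)
        intent = SUPPORTED.get(user_text.strip().lower())
        if intent is not None:
            self.say("host", f"Okay: {intent}.")
            return intent
        # Rather than repeating "Sorry, I didn't get that", bring in a second
        # bot and let the user overhear an exchange that models valid requests.
        self.say("host", "Let me bring in a colleague.")
        for phrase, name in SUPPORTED.items():
            self.say("guide", phrase.capitalize() + ", please.")
            self.say("host", f"Okay: {name}.")
        return None


if __name__ == "__main__":
    bot = HostBot()
    bot.handle("what can you do?")
    for turn in bot.transcript:
        print(f"{turn.speaker}: {turn.text}")
```

The point of the sketch is only the shape of the fallback: a failed turn leads to a demonstration between the two agents rather than another prompt aimed at the user. Whether that actually helps people is exactly the open question.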

We tend (as designers) to think in terms of one-to-one interactions. But social context comes from interactions of three or more participants. Perhaps adding a second conversational agent to mediated interactions would provide us with a means to deliver guidance, context, conversational constraints and boundaries, error handling, and more.

Might be worth some experimentation.

