Beyond Zoom Fatigue: Ritual and Resilience in Remote Meetings

[s2If !is_user_logged_in()] [/s2If] [s2If is_user_logged_in()]

DESIGNING FOR SOCIAL AGENCY

As mentioned above, our research in Phases I and II provided inspiration for technology concepts that we then tested in Phase III of our research. Both of the concepts we introduce in this section feature artificial intelligence – specifically, machine learning technologies, reflecting the research focus of colleagues in the lab where we work. Despite the specificity of our focus, this phase of work was helpful in clarifying an important design insight that we believe applies beyond artificial intelligence: optimizing for what we call “social agency.”

We begin with a technology inspired by Tom’s example – the asymmetrical desire for visibility of audience in large meetings. Our extended team proposed to solve this problem by providing an intermediate layer between speakers and audiences. Instead of requiring individuals to share their audio and video feeds directly with colleagues, we proposed that they share such feeds only with an intelligent agent that could detect feedback signals (head nods, expressions of puzzlement, hand raises, or other routine expressions), anonymize and aggregate them, and report them to speakers as a stylized form of feedback that can be easily interpreted. This intervention, we thought, might be less invasive than being constantly on camera, yet less effort than manually using the “emoji” buttons [cf. 1] that became more common in meeting apps during COVID.
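The anonymize-and-aggregate step of this concept can be sketched in a few lines. This is only an illustration under our own assumptions, not the design of any prototype discussed here: the signal labels, the function name `aggregate_feedback`, and the minimum group size are hypothetical, and the detection of signals from video is taken as given.

```python
from collections import Counter

# Assumed threshold: below this many attendees, report nothing, since a
# small group could make individual reactions easy to guess.
MIN_GROUP_SIZE = 5


def aggregate_feedback(signals, min_group_size=MIN_GROUP_SIZE):
    """Turn per-attendee signals into anonymous shares per signal type.

    `signals` maps a (hypothetical) attendee id to one detected signal
    label, e.g. "nod", "puzzled", "hand_raise". Attendee ids are dropped
    before anything is reported to the speaker.
    """
    if len(signals) < min_group_size:
        return {}
    counts = Counter(signals.values())  # ids are discarded at this point
    total = len(signals)
    return {label: round(count / total, 2) for label, count in counts.items()}


example = {
    "a1": "nod",
    "a2": "nod",
    "a3": "puzzled",
    "a4": "nod",
    "a5": "hand_raise",
}
summary = aggregate_feedback(example)
```

Even in this toy form, the sketch shows where the judgment calls sit: which labels exist, how small a group may be before aggregation stops protecting anyone, and who decides both.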

Similar ideas have been proposed elsewhere. Murali et al (2021), for instance, describe such a system, which the authors developed into a functioning prototype and tested with users, reporting favorable reviews. We note, however, that such reviews come only from those acting as speakers or presenters in meetings, not from those audience members whose feedback was gathered. We believe this is a significant gap. Our own tests involved assessments from participants (n=17) presented with concept storyboards in two separate studies, one of which is documented in Aslan et al (2022). In both we found a clear asymmetry in the desirability of this idea, unsurprisingly matching the asymmetry in roles and performances associated with speakers versus those for their audience. While speakers/presenters may see the value in receiving feedback, people commenting from the point of view of audience members unanimously rejected it. Here is a sampling of feedback.

“Engagement feedback feels like big brother is watching—not a fan at all.” (P3-15)

“It seems creepy. If I could have full control over it, I might be fine, but then it is extra work.” (P3-13)

“I would be worried about accuracy of my feedback. I am also concerned about privacy of data.” (P3-14)

“I personally do not like it because I multitask during meetings. My reactions could be towards something else. I have concerns around privacy and security as well.” (P3-17)

In these comments we see two closely interconnected critiques. First, notions of “creepiness” and privacy were part of every critique, suggesting a discomfort with having an intelligent agent monitoring one’s reactions. Superficially, there are obvious hints at discomfort with surveillance, including potential loss of control over who gets to view one’s reactions. At a deeper level, the “creepy” reaction is the tacit recognition of what we have explicitly noted above – that even the signaling of feedback is a form of performance. One subject was explicit in this recognition:

“I’d always have to have the camera on so the system will see the gestures I am doing – kind of performing on the camera, which could be distracting or silly.” (P3-16)

This recognition of feedback signaling as a form of social performance stands in contrast to a view, underlying much work in deep learning, that human emotional expression is an objectively verifiable “detection” problem (cf. Goodfellow et al, 2015; Kunstler et al, 2021). Participants’ most immediate objection was their assumption that there would inevitably be inaccuracies, and that these would result in the extra work of monitoring the agent and correcting its output. As a consequence, the majority preferred the more direct manual labor of selecting their own expression from “emoticon” buttons. At heart, this is a matter of social agency.

“Autonomy and agency are … top concern[s], so I would be more interested in manual… it feels invasive otherwise. It starts to feel unethical if you don’t have someone’s explicit consent to see reactions versus the active consent of clicking a button” (P3-5)

Participants maintained this insistence on social agency even in hypothetical situations where their feedback cues might be anonymized and aggregated. As one put it: “I have a fundamental mistrust of the ability of the system to understand nuance.” Social signaling, particularly in the ritually charged context of a meeting, is a job for humans. As Goodwin (2000: 1491) explains, the production of social action is “a contingent achievement of relevant intersubjectivity,” which “requires that not only the party producing an action, but also that others present, such as its addressee, be able to systematically recognize the shape and character of what is occurring.”

It’s not merely that human interpretation, with its richer sense of context, is likely to be superior to machine intelligence at making a situationally correct interpretation. Human interpretation is also essential for participants to create a shared basis for subsequent social actions:

Without this it would be impossible for separate parties to recognize in common not only what is happening at the moment, but more crucially, what range of events are being projected as relevant nexts, such that an addressee can build not just another independent action, but instead a relevant coordinated next move to what someone else has just done (Goodwin, 2000:1496).

Ambiguity and interpretive flexibility invite further action and engagement, whether that is affirmation, repair or other means of both ensuring the robustness of the interaction and a shared sense of meaning. We undermine this process when we introduce technology into the middle of it in a way that replaces such human agency with a set of pre-trained models. Suchman (1993) anticipates the current argument in a much older study, and recognizes in such efforts the attempt to reduce the messiness of real-world social action to something more “disciplined” and governable. We argue that our participants’ rejection of an automated feedback detector reflects this understanding, and testifies to the importance they place on retaining their own agency in the midst of ongoing social production.

Designing for social agency means providing meeting-goers with the tools to optimize their ability to engage in this messy, contingent achievement of intersubjectivity, in ways that suit both the situation and their own sense of personal or relational risk. Sometimes, when risks seem high or the benefits of engagement seem low, attendees should have the option of being both present and invisible, with simple tools for their own intentional expression.

Increasing social agency by enhancing embodied presence

This is not to say that AI has no place in meetings, or even in supporting the kinds of verbal and non-verbal performance that meetings entail. Rather, designing for social agency demands a more careful understanding of the type of problems that AI technologies might solve in the context of meetings, as well as the types of situations where such solutions might best apply. For a very different take on social agency, we turn to issues raised in Tabitha’s example. As noted, Tabitha’s performance of transparency and trust depended on shared access to given artifacts (spreadsheets) along with the desire for mutual visibility of participants, a situation much different than Tom’s. While shared applications are common in remote meetings, rich mutual visibility is less so, particularly when shared applications are in use.

Figure 1: Mixed reality collaboration prototype showing a user superimposed with a shared application, using machine learning technology to provide body positioning and gestural controls

Figure 1 offers a visual introduction to a step our organization has taken in that direction – a prototype that features a mixed reality combination of both participants and shared workspaces in a meeting (see note 4). As the image hopefully suggests, this technology superimposes in one scene both a meeting participant and a shared digital work surface, thus enabling a collaborator to see both the colleague and their actions in a shared workspace. Through the use of body positioning, gaze, indexical gestures, or even specific actions in the work space, colleagues can both detect and direct each other’s attention, make their intentions clear, or better coordinate joint action, what Goodwin (2000) calls “embodied participation frameworks.” A key first step in this regard may be the deceptively simple one of providing remote meeting attendees some analogy to the positioning of bodies in physical space, a capability we are developing in both the visual and auditory channels and that, so far, seems quite promising. Though we are unable to go into detail in this paper, simply providing spatialized audio may enhance the creation of meeting “informality” by permitting more overlapping speech, a familiar feature in face-to-face interactions (Schegloff, 2000), or easier engagement in verbal play (Sherzer and Webster, 2015). In the case of sharing visual representations, as is somewhat evident in the figure above, the position of one’s body relative to the shared workspace provides information about the user’s attention and potential next actions. We note that this prototype represents more than a simple superimposition of images. Machine learning algorithms are essential for its successful functioning.
Skeletal tracking enables a mapping of motions or gestures to particular actions in the user interface, and supports the appropriate placement, alignment and sizing of the representation of the body. Note that this represents a very different use of deep learning than in the case of detecting audience feedback. In this case, machine learning algorithms provide a scaffolding or substrate for action, to support richer expressive potential for meeting participants – enhancing their social agency, rather than attempting to mediate social signals directly. By expanding the expressive repertoire, rather than designing to infer social signals directly, we believe we can both enhance productivity and provide users with resources for their own processes of ritualization.
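To make the contrast concrete, the following is a minimal sketch of how tracked body keypoints might serve as a substrate for intentional action rather than as input to an inference about feelings. It is not the prototype's implementation: the keypoint names, the image-style coordinates (y grows downward), the gesture rule, and the action names are all assumptions made for illustration.

```python
# Hypothetical mapping from a recognized gesture to a user-interface action.
GESTURE_ACTIONS = {
    "hand_raised": "show_pointer",
    "neutral": None,
}


def classify_gesture(keypoints):
    """Very simple rule: the right wrist is above the right shoulder.

    `keypoints` maps a joint name to an (x, y) pair in image coordinates,
    where smaller y means higher in the frame (an assumption).
    """
    wrist_y = keypoints["right_wrist"][1]
    shoulder_y = keypoints["right_shoulder"][1]
    return "hand_raised" if wrist_y < shoulder_y else "neutral"


def action_for(keypoints):
    """Return the UI action the user has deliberately triggered, if any."""
    return GESTURE_ACTIONS[classify_gesture(keypoints)]


def body_scale(keypoints, target_height):
    """Scale factor that sizes the body image to a target on-screen height."""
    body_height = keypoints["ankle"][1] - keypoints["head"][1]
    if body_height <= 0:
        raise ValueError("head must be above ankle in image coordinates")
    return target_height / body_height


pose = {
    "head": (0.5, 0.1),
    "right_shoulder": (0.6, 0.3),
    "right_wrist": (0.7, 0.2),
    "ankle": (0.5, 0.9),
}
triggered = action_for(pose)
scale = body_scale(pose, target_height=0.4)
```

Here the model's output is never presented to colleagues as a claim about what the person means; the person raises a hand in order to point, and the system only carries that action into the shared workspace.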

CONCLUSION

These are but two examples from a range of activities within one ongoing research effort, which, as mentioned, is mostly focused on applications of artificial intelligence in remote collaboration. Our efforts thus represent only one small corner of what we believe is a much larger space of opportunities made possible by explicitly recognizing the social and ritual dimension of meetings. Moreover, by thinking about meetings as rituals we can ask how, and in what situations, technologies might undermine social agency and introduce risks, or conversely enhance participants’ sense of agency. It’s not that these considerations lead to simple and straightforward design directions. While it was relatively easy for us to distinguish between meeting types on the basis of relatively static parameters (e.g., meeting size, familiarity of participants), we are still pondering how to enable teams to fluidly transition among different social framings in situ.

Attention to the ritual dimension of meetings may be beneficial beyond technology design. First and most simply, current discussions of the future of work that focus too heavily on the simple binary distinction between “home” and “office” might do well to consider the ways that different types of meetings entail different roles and modes of participation that may be more or less appropriate for remote or copresent meetings. In social science research more generally, seeing the process of ritualization in meetings might be useful for connecting detailed attention to meeting practices with broader questions of social scientific interest, including the complex relationship among technologies, professional identity formation and institutions (Orlikowski and Barley, 2001) or issues of diversity, equity and inclusion. Considerable evidence has shown, for instance, that women bore a much heavier burden balancing home and work tasks during the early days of the pandemic, and that individuals from communities with limited technical access were seriously disadvantaged during the period of social distancing (Parker et al, 2022). The effects are still being felt, and have affected professional relationships and career trajectories. How might the effects of other, more subtle differences, such as preferences for or tolerance of latency in turn-taking, the use of gaze, or other factors affected by technology and contributing to ritualization, create disadvantages for certain attendees?

Conversely, by looking closely at the relationship between technology and meetings-as-rituals, we might ask what new kinds of rituals, identities or relationships we might facilitate. How might novel ritualization practices disrupt traditional forms of disadvantage, subjugation or denigration of the work of certain people? How might designing for social agency provide workers with new ways of imagining work, or challenge prevailing ideas about what it means to be a professional (cf. Balka and Wagner, 2021)? What new social realities might we enable workers to create? More prosaically, how might we make meetings just a little less painful and exhausting? We are not yet done either with COVID or the changes it has wrought; there are still many questions to ask and, we hope, more possibilities to imagine. Hopefully the lens we have introduced in this paper helps contribute to that endeavor.

NOTES

1. This paper is dedicated in loving memory to our friend and colleague Suzanne Thomas, without whom this project would never have been completed. She led the early phases of research and analysis, and was first to note the distinctions underlying this paper. The authors would also like to kindly acknowledge Liubava Shatokhina for her thoughtful feedback on earlier drafts. Any errors or inaccuracies are the responsibility of the remaining authors.

2. We were not prescriptive about the definition of “meeting,” recognizing that formal definitions of what counts as a meeting have met with difficulty (Sandler and Thedvall, 2017), and that the category is more likely a matter of family resemblance (Wittgenstein, 1953), that is, sharing no set of essential features, but rather displaying a set of overlapping similarities: attendance by multiple participants, embeddedness within a professional or bureaucratic setting, and a sense of instrumental or organizational purpose. Participants often explicitly described or named meetings in terms of their purpose or their attendees (e.g., “sales meetings”, “client update meetings”, “committee meetings,” etc.).

3. This name, as with all others used in this paper, is a pseudonym.

4. Our sincere gratitude to our colleague Julio Zamora-Esquivel (who is pictured in Figure 1) for his creative and technical wizardry and leadership in the creation of this prototype.

2022 EPIC Proceedings, ISSN 1559-8918, https://www.epicpeople.org/epic

[/s2If]
