Reading the Tea Leaves: Ethnographic Prediction as Evidence


Those who work in research know that we live in a world that is strongly influenced by what Tricia Wang has called the quantification bias. More so than other forms of information, numbers have incredible formative power. In our culture, numbers are seen as trustworthy representations of reality that are strongly associated with objectivity and untainted by human bias and shortcomings. Recently, data science, big data, algorithms, and machine learning have fueled a new wave of the quantification bias. One of the central fascinations of this wave has been the promise that humans now have the power of prediction at their fingertips. In this paper, I reflect on what it means to make predictions and explore the differences in how predictions are accomplished via quantitative modeling and ethnographic observation. While this is not the first time that ethnographic work has been put in conversation and in contrast with quantified practices, most theorists have framed the role of ethnography as providing context to that quantified work. Here, I argue that ethnographers produce predictions in their own right. I begin by discussing what it means to predict something, focusing on its function. This is followed by a discussion of the ways in which predictions are constructed through both machine learning and ethnographic work. In the course of this discussion I show the commonalities that exist between ethnographic work and machine learning, and I outline methodologies that allow us to claim that ethnographic work can make generalizable and accurate statements about the world, including predictive claims. I also point to some of the challenges in using machine learning as a means of producing predictions. This discussion is not meant to discredit these practices, but to demystify the process as a means of loosening quantification’s authority, contextualizing its best applications, and putting the two approaches to knowledge production on even footing. Finally, I discuss circumstances in which qualitatively produced predictions may be most valuable, such as when dealing with emerging phenomena and unstable contexts.


INTRODUCTION

As ethnographers in industry, our work is increasingly combined with or compared against the perceived power of big data and data science. Anybody who works in research understands that we live in a world strongly influenced by what Tricia Wang has called the quantification bias (Wang 2016). More so than interpretive work, theoretical concepts, or narrative, numbers have incredible formative power. In our culture, numbers are seen as trustworthy representations of reality (Espeland and Stevens 2008) that are strongly associated with objectivity and untainted by human bias and shortcomings (Daston 1992; Jasanoff 2005). Data science, big data, algorithms, and machine learning not only fit neatly into an epistemological view in which numbers and metrics are seen as taken-for-granted representations of reality (Beer 2016; Espeland and Stevens 2009; Poovey 1998), but they also have fueled a new wave of the quantification bias.

One of the central fascinations of this wave has been the promise that humans now (finally) have the power of prediction at their fingertips. According to the tales told through the public discourse of big data, two key developments have delivered on this promise of science. First, the proliferation of data points provided by the expansion of digital sensors has gifted us the ability to measure and capture the dynamics of a complex world without human interpretation and distortion. Second, the process of machine learning in general, and unsupervised machine learning in particular, has freed knowledge production from human-generated theories and concepts. Together, the narrative goes, these developments have made prediction a reality and revitalized the value we place on all things quantified.

To be sure, a great deal of this revitalized enthusiasm for numbers is inspired by material changes in our ability to record and create data, our capacity to store and move data, increased processing power, and greater ease of access to the tools to complete these tasks. But the enthusiasm for big data and prediction that stems from the narrative described above has generally outpaced, or at least overshadowed, discussions of the epistemological reality of big data predictions among both the public in general and key decision-makers, such as chief marketing officers, policy makers, or even research directors, in particular.

As others have made clear, this new emphasis on data in recent years has provided an opportunity both to reflect on the distinctive value that qualitative and ethnographic work offers in contrast to data science (Wang 2016) and to draw some lessons on what we as qualitative researchers can learn from the practice of data science (Nafus 2016). In this paper, my intention is to add to this conversation by reflecting on what it means to make predictions and to explore the differences in how predictions are accomplished via quantitative modeling and ethnographic observation.

While this is not the first time that ethnographic work has been put in conversation and in contrast with the new quantified practices associated with datafication (van Dijck 2014), most theorists have framed the role of ethnography as providing context to that quantified work. In this paper, I make a slightly different argument, showing that ethnographers produce predictions in our own right. I begin by discussing what it means to predict something, focusing on its function. This is followed by a discussion of the ways in which predictions are constructed through both machine learning and ethnographic work. In the course of this discussion I elaborate on the methodologies that allow us to claim that ethnographic work can generate generalizable, causal, and accurate statements about the world, including predictive claims. I also discuss some of the shortcomings of using machine learning as a means of producing predictions. This discussion is not meant to discredit these practices, but to demystify the process as a means of loosening quantification’s authority, contextualizing its best applications, and putting the two approaches to knowledge production on even footing.

WHAT ARE PREDICTIONS?

In order to make claims about the role that ethnographic work plays in generating predictions, we first need to come to terms with what we mean when we use the word “prediction.” I discuss both colloquial and technical definitions and then suggest that we utilize a definition that focuses on the function of predictive claims in practice.

Colloquially, we think of a prediction as a claim about an event or a state that will occur in the future. It is this very general, and yet powerful, conception that most of us rest upon when using the term. Those more versed in statistical practices, machine learning, or data science may have a more nuanced definition in mind: a prediction includes an assessment of the likelihood of such states or events actually manifesting. For example, SAS, a company that is in the business of producing predictive analytics, describes prediction as “the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data” (SAS Institute Inc.). This statement of likelihood takes the form of mathematical measurements, such as confidence intervals.

Neither of these definitions serves us well for thinking about the possibility of ethnographic predictions. The colloquial definition says nothing about the origins or production of prediction, while the definition provided by SAS is already infused with the assumptions of statistical sampling methods and inferences. Although references to ethnographic and qualitatively-based predictions can be found in discussions of ethnographic methodology (Burawoy 1998; Small 2009), technical definitions for these kinds of predictions are not usually part of those discussions.

However, if we observe the production and application of predictions, we can construct a working understanding. First, the kinds of predictions we talk about with regard to research are linked to empirical data. Second, in applied settings such as marketing agencies, hospitals, consulting firms, or government, predictions are used to support decision-making. At this point, we could say that predictions are empirically-supported claims, believed to reduce uncertainty about future events or states, that are used to buttress decision-making. This definition encompasses a variety of empirical approaches without implying a particular method for generating these claims, and points toward their use in applied settings.

However, careful observation of the problems to which predictions are put and the ways in which they influence decisions allows us to refine this definition further. In my own observations of predictive medical algorithms (Maiers 2017), nurses and doctors used risk scores designed to predict the likelihood of future infection. Although these scores ostensibly pointed to the future onset of infection, they were used to determine whether the patient needed antibiotics immediately, suggesting that the prediction factored into the decision-making process by affecting the clinicians’ assessment of the patient’s current condition.

As I discuss in more detail below, using predictive claims to better recognize current states is a common practice with regard to statistically and machine-learning derived predictions. In order to continue to keep the application of machine-learning predictions within our definition, we must alter it slightly: predictions are empirically-supported claims, believed to reduce uncertainty about current states, future events, or future states, that are used to buttress decision-making. This aspect of the definition is not simply an accommodation for the sake of giving ethnographic work some authority over the realm of predictive research. Rather, it is based on both the function and application of claims that are already called “predictions” in the context of big data and machine learning practices. By this definition, ethnographers and qualitative researchers frequently engage in predictive work. We produce descriptions and claims about the world that help our clients and stakeholders make better decisions. These claims may not always be explicitly future oriented.

HOW PREDICTIONS ARE MADE

To show how ethnographers make predictions in their own right, the remainder of this paper will touch on methodological aspects of both ethnography and machine learning. I focus on machine learning because it is an analytical technique associated with big data and because of its growing application as a quantitative approach to generating predictions. After pointing to some of the commonalities between ethnography and big data and machine learning practices, I outline the process for generating predictions through machine learning. Through this discussion, I hope to arm readers with a better understanding of these practices and the ability to ascertain when and where they work best. I then discuss the scientific status of ethnographic work. Predictions are often seen as belonging to the territory of positive science and seemingly depend upon definitively measured phenomena and the development of covering laws or models. As a result, the interpretive endeavor of ethnographic work may appear to preclude the possibility of prediction. In an effort to show how ethnography can be used in predictive work, I share an alternative framework upon which to base the accuracy of ethnographic claims in general.

Common Ground

When it comes to social data, the processes by which predictions are made in machine learning and big data are similar to the ethnographic process in their basic approach. The value of ethnographic observation is our ability to process and synthesize a complex set of data points and relationships. Similarly, the advantage of new data collection practices and the proliferation of sensors is their ability to capture a wide range of data points. On this front, qualitative and quantitative work are increasingly in conversation as data scientists and ethnographers collaborate and utilize new digital tools in the research process (Rattenbury and Nafus 2018; Anderson, Rattenbury, and Nafus 2009).

Although the kinds of data points created through big data and ethnography take different forms, they often describe something similar, namely behavior in context rather than in the lab. This is a relatively new application for statistical inference: in the past, much of the data that fueled quantitative analysis of the social and behavioral world was sourced through surveys. The application of statistical inference to behavioral data has expanded thanks to the many sensors and practices that leave “digital sweat,” a record of human behavior (Gregg 2015). Whether it is social interactions, unintended uses of technology, purchasing patterns, or Twitter traffic, both ethnography and big data work with representations of human behavior in context (Golias 2017; Ladner 2014).

From an analysis of this data, both ethnographers and data scientists extrapolate generalizable claims that help us to better understand phenomena and the relationships between phenomena, thereby reducing our uncertainties about the world. To be sure, the two approaches differ a great deal in how those claims are extrapolated. I now want to walk through those processes in a little more detail.

How Machine Learning Makes Predictions

In the following section, I describe, in very basic terms, the process by which machine learning produces predictions. This description draws upon my work as a sociologist of knowledge, in which I studied the cultural and epistemological dynamics that both promote and result from quantified practices of knowledge production. I used observations of and conversations with data scientists to examine the cultural assumptions about how legitimate knowledge is produced and the ways in which various methods and claims interact with these cultural assumptions. As part of that work, I frequently asked data scientists to describe the process of machine learning. The resulting description is based, in part, on those conversations.

In the simplest terms, machine learning is a process that allows computers to develop methods for making predictions and inferences. The first step is to provide the computer with a data set. This is often called training data. The learning process may be supervised, in which case the computer is given a data set with labeled or classified phenomena, such as a collection of photos of pets that have been labeled as either “cat” or “dog,” and is told to develop a method for telling those phenomena apart. Note that this requires the human work of assigning labels to phenomena at some point in the data collection process. Or it may be unsupervised, meaning that the computer defines the categories by which data are described. When it comes to pictures of pets, an unsupervised process could result in categories that humans find meaningful, such as brown pets versus spotted pets, but it might also develop categories that are less salient, or even unnoticeable, to humans, such as a mathematical relationship between tail length and ear shape. Once there is an algorithm or model for identifying which images are of cats and which are of dogs, the model will be tested. Often it is tested on a subset of the original data set that has been intentionally set aside for this purpose. If the model fails to successfully predict the known outcomes, the model can be adapted and tested again in an iterative process.
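
To make these steps concrete, here is a minimal sketch of the supervised workflow just described, written in Python with scikit-learn and synthetic stand-in data. The library, the generated data, and every parameter are illustrative assumptions on my part rather than anything drawn from the studies discussed in this paper.

```python
# A minimal sketch of supervised learning with a held-out test set.
# The synthetic features stand in for image measurements and the labels
# for human-assigned "cat" (0) / "dog" (1) tags; all values are
# illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Labeled training data (the human labeling work has already happened).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 2. Intentionally set aside a portion of the data for later testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 3. Let the model learn a rule for telling the two classes apart.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 4. Test against the held-out set; if accuracy is poor, adjust the
#    features or model settings and repeat the cycle.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```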

Despite the appeal of this process, there are some limitations worth noting. First, even though the resulting model may be great at predicting which pictures are cats and which are dogs in the original training data set and the test data sets, it may not be very good at making similar predictions on new data sets. In other words, the model may not be very generalizable to new settings and contexts. It is impossible to know in advance how the model will perform in truly novel settings that may have slightly different variables at play. When it comes to social data, this is a particularly difficult problem given that social data are endlessly complex and shaped by both macro structures and local contexts. Furthermore, once these models are applied to novel settings, it can be almost impossible to evaluate their accuracy. The only way to know if predictions are correct is to measure them against the real outcomes, and in many cases that may not be possible.
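
As a toy illustration of this generalization problem, the sketch below scores the same model twice: once on held-out data from the training context, and once on a copy of that data whose feature distribution has been deliberately shifted to mimic a novel setting. The data and the particular form of the shift are assumptions chosen only to make the gap visible.

```python
# Sketch of performance degradation under a (simulated) change of context.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy in the original context: held-out data from the same source.
print("same context: ", accuracy_score(y_test, model.predict(X_test)))

# Simulate a novel context by rescaling the features and adding noise;
# the underlying labels are unchanged, but the inputs no longer look
# like the data the model was trained on.
rng = np.random.default_rng(1)
X_novel = X_test * 1.5 + rng.normal(0, 1, X_test.shape)
print("novel context:", accuracy_score(y_test, model.predict(X_novel)))
```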

In order to better clarify this problem, let’s consider some cases in which it is possible to compare predictions to actual outcomes. The infamous case of the Google Photos algorithm from 2015 that identified and labeled people of color as gorillas is one such instance. This offensive and problematic misrecognition by the algorithm may have stemmed, in part, from a model trained on a photo data set with insufficient diversity, bringing attention to issues surrounding bias in data sets and algorithms. It also shows, more generally, how algorithms can fail to be accurate when released from their testing environment onto the broader world. However, we were only aware of the prediction’s flaws because we could see and compare the algorithm’s prediction to our own assessment. Similarly, in my own work with predictive medical algorithms, I watched intensive care unit (ICU) clinicians develop a critical assessment of algorithmic predictions (Maiers 2017). Over time, they were confronted with the corporeal reality of their patients in contrast to the algorithm’s claims. They saw that the algorithm tended to be successful in some cases and less reliable in others, allowing them to build rules of thumb for contextualizing and sometimes discounting these predictions. However, in many cases, predictions will be used to make crucial decisions long before their accuracy beyond a testing environment can be assessed.

Furthermore, the predictions themselves may be “performative,” meaning that the very act of predicting shapes the outcomes that are observed (Callon 1998). This makes it difficult to know what outcomes would have occurred in the absence of such a prediction. Think, for example, of credit scores, which are used to assess the likelihood of someone defaulting on a loan. Given that these scores preclude many individuals from taking out a loan in the first place, the algorithm is shaping the very outcomes which it aims to predict, making it ever more difficult to know if the assessment of one’s likelihood to default on a loan was accurate in the first place.

The fact that predictive algorithms work best when applied within the same system or domain in which they were trained and tested leads to a second issue. Predictive algorithms are less well-suited for dealing with emerging phenomena, rare events, and unprecedented events. As the definition from SAS reminds us, machine learning predictions are dependent on historical data. This prevents novel events or factors from being included in the model and greatly reduces the chances that the model will sufficiently account for rare occurrences. In addition, depending on the chosen model and method, rare events may be labeled as “outliers” and intentionally eliminated during the data cleaning process. This means that although machine learning may be great at making predictions within stable systems and conditions, it is more likely to mis-predict outcomes in unstable or changing contexts such as social systems or globalizing markets.
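
As a small illustration of how a routine cleaning step can discard exactly these rare events, the sketch below applies one common convention, dropping observations more than three standard deviations from the mean, to a synthetic data set that contains a handful of genuinely unusual cases. Both the data and the cutoff are assumptions made for the sake of the example.

```python
# Sketch of outlier removal quietly discarding rare events.
import numpy as np

rng = np.random.default_rng(0)
# 995 "ordinary" observations plus 5 rare, extreme ones.
values = np.concatenate([rng.normal(0, 1, 995), rng.normal(8, 1, 5)])

# A common cleaning rule: drop anything more than 3 standard deviations
# from the mean.
z_scores = (values - values.mean()) / values.std()
cleaned = values[np.abs(z_scores) < 3]

print(len(values) - len(cleaned), "observations removed before modeling")
```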

Finally, throughout this section of the paper readers may have noticed that the example I used was not about future states at all, but about estimating the likelihood of current states. Though the language of prediction is used to talk about these processes, the actual results are far from our colloquial definition of predictions: they are not about the future. In fact, the same webpage from which I quoted the technical definition of prediction offers many examples of the kinds of predictions machine learning can provide. The first of these is fraud detection. This is not a prediction about the future at all, but an assessment of the likelihood that certain claims are fraudulent. In other words, it is reducing uncertainty about a current state. This is also the case with the predictive medical algorithms that I mentioned earlier. These predictions are used to identify patients who are developing blood infections. In each of these cases, the “prediction” is not about a future state, but about a current state. This is not to say that predictive algorithms are not put to uses that are about the future. Models for setting ticket prices, assigning credit scores, or determining how to stock store shelves are future oriented. The point I want to convey is that in their application and function, machine learning predictions are sometimes about the present.

The point of the previous paragraphs has not been to undermine the legitimacy of machine learning claims. Indeed, data scientists have methods in place to mitigate some of the issues discussed here. My hope instead is to demystify how these algorithms work in order to lay the foundation for claiming that ethnography is also a legitimate way to reduce uncertainty about the future.

Foundations for Making Ethnographic Claims

Most of the work that we do in industry is aimed at reducing uncertainty when making decisions (Dourish and Bell 2014). In reviewing her work on the home in several European countries, Genevieve Bell (2001) describes the job of her team as “understanding people and their daily practices with an eye toward finding new users and uses of technology.” Not only does this description suggest that her team’s work was aimed at reducing uncertainty; their search for the new also suggests a future orientation in which they aim to identify which technologies and applications might be developed and successfully adopted by consumers. Given this demand for reducing uncertainty, ethnographers have employed novel qualitative methods designed to better derive insights about potential futures (Dourish and Bell 2014; Forlano 2013; Lindley, Sharma and Potts 2014).

But what is the epistemological framework that allows us to claim that our thick descriptions and qualitative inquiries can reduce uncertainty about consumers and how they will behave and react to products? While some ethnographers might take this capability for granted, a suspicion of qualitative and interpretive work is part and parcel of the quantification bias in our culture. Depending on her home discipline, an ethnographer may never have had to defend the epistemological legitimacy of her sampling, her analytical methods, or her conclusions before entering industry. Luckily, this is not the case in my home discipline of sociology, where quantitative sociologists sit on the dissertation committees, editorial boards, and grant committees that review and therefore pass judgement on ethnographic and qualitative work.

In the following paragraphs, I draw on the work of qualitative sociologists who have explored the methodological foundations for claiming that ethnographic work is just as well suited for reducing uncertainty as quantitative research. Though there are many issues I could cover here, I focus primarily on the question that colleagues and clients most frequently ask me when I present my work: how can we be sure that our findings are generalizable? This is also closely related to our ability to establish correlations or causal connections between observed phenomena and consumer or user behaviors. By exploring the epistemological foundations of our work, my hope is to bolster our faith in ethnographic claims and to help put the predictions of machine learning and ethnography on even footing.

Mimicking Statistical Inference or Rejecting Generalizable Claims – The nature and status of ethnographic and qualitative work have been greatly debated within the social sciences (Reed 2011; Pugh 2013; Vaisey 2009). Following Geertz (1973), many of us have stressed the value and legitimacy of interpretation as a type of knowledge production. We have seen first-hand how these thick descriptions illuminate everything from the inner workings of high energy physics (Knorr Cetina 1999) to seemingly paradoxical political phenomena (Hochschild 2016). While this function, in and of itself, is sufficient for work that aims to add to the accumulation of knowledge or to provide better understandings of our fellow humans, it is not a sufficient perspective for those aiming to make broad empirical claims or predictions about entire regions, markets, or segments of consumers or users. This is the case for two reasons. First, interpretation often fails to hold authority in a world saturated in quantification bias and the epistemological mental models that accompany this bias. This is due, in part, to the common complaint that qualitative work fails to meet the standards of statistical representativeness and inference. Second, qualitative and interpretive work has become associated with methodological programs that emphasize local particularities over generalized patterns and the creation of shared understanding over relational claims, making it difficult to extend the claims of qualitative work beyond its immediate cases and contexts. Both of these obstacles can be overcome through an exploration of the methodological approach to qualitative work.

There have been a variety of attempts to remedy this situation. First, we might try to solve this problem by mimicking the assumptions of positivist quantitative work. The idea is that in choosing the right case we can mimic the assumptions of statistical representativeness upon which many quantitative claims are based. We look for cases and locations that best represent a broader population, or we do comparative ethnography as a way to isolate and identify causal relationships and to “control” for confounding variables. This is inevitably problematic. As Small (2009) makes clear, this process often mistakes the concept of representativeness for that of averages. For example, a study of social engagement in a mid-sized Midwestern town that matches the national average for income or education levels cannot somehow represent this process across most American communities, even though the town may statistically resemble the nation as a whole. Furthermore, by intentionally excluding rare or unique cases, we miss out on the opportunity to learn about emergent and developing phenomena or to observe the effects of interactions that may be difficult to observe in average cases.

As an alternative, many of us have been trained in a perspective of interpretive empiricism (as discussed in Reed 2012), in which social knowledge is created through inductive processes that stress the locality of social investigation. Under this epistemological framework, we are not trying to make generalizable, objective claims about covering laws or models at all, but to best explain the social world by articulating the dynamics of the particular. This focus on locality, alongside a resistance to theory and macro social constructs, makes it all but impossible to generalize the findings from one case out to a broader population. This is a particular problem for industry work, in which we hope to use in-depth qualitative analysis of a small sample to inform decisions about entire markets or populations. Is there a way forward in which interpretive and qualitative analysis can make generalized inferential claims without trying to wedge itself into the assumptions of statistical inference?

Finding Our Own Footing – First, we should recognize that statistical inference, and its reliance on large, representative samples, is not the only way to generalize claims. In examining the extended case method (Burawoy 1998), Mario Small (2009) suggests that ethnographers use logical inference instead. This means that the inferences refer to situations rather than populations. As Small explains, in a statistical inference, we hypothesize that populations with a given set of characteristics will display the same set of corresponding characteristics or properties observed in a sample. An example is to say that active adults in Charlottesville, Virginia are more likely to purchase a gym membership if their household income is over $60,000. These kinds of claims require some sort of instrument for establishing representativeness and therefore the accuracy of claims. With logical inference, the focus shifts to the processes and mechanisms of a situation. We might hypothesize that when offered a free trial at a new gym, the consumer’s decision to book or ignore the offer depends partially on perceptions of cultural fit between the gym and the potential customer. This statement is based on our ability as ethnographers to observe a chronology of events and therefore make causal links between behaviors, decisions, feelings, and events.

This kind of logical inference is particularly good for making what Small calls “ontological statements” or “the discovery of something previously unknown to exist” (2009: 24). In the example I have given, logical inference allows me to make a claim about the relationship between cultural fit and gym membership purchases. This is a great advantage of ethnographic work. Big data does not capture what it does not measure. In addition to the challenges presented in measuring emotions and things like “perception of cultural fit,” a quantitative study could not take such phenomena into account without someone determining it was a variable worth measuring. All phenomena must be known, at least in the form of measurable data points, ahead of the machine learning process.

Another option offered by Small is to take a different approach to sampling. Rather than look for a representative case, we use “case study logic” to sample. With case study logic, the researcher does not rely on the representativeness of a sample; instead, each additional iteration of investigation brings the researcher closer to an accurate understanding of the area under investigation. As such, this is a sequential process that ends only when the researcher is able to accurately predict the dynamics of the next case and no new phenomena or relationships have emerged. Rather than validating a claim or relationship by statistically showing that our sample would be highly unlikely to contain such a correlation or causal relationship when there is not one in the population, we validate the hypothesis through continual testing that challenges and refines the claim. Interestingly, the iterative nature of case study logic as a means of validation is somewhat similar to the process used in machine learning. Where ethnographers return again and again to the field to test and refine hypotheses, machine learning processes also refine models and algorithms through iterative testing with test data sets. Though the processes may look quite different, it is through repeated exposure to data that both data scientists and ethnographers gain confidence in their conclusions.
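
For readers more familiar with the machine learning side of this comparison, the sketch below shows one version of that refine-and-retest loop: a model setting is adjusted and the model re-scored against held-out data until further changes stop producing improvement. The data, the parameter being varied, and the stopping rule are all assumptions made for illustration, much as the ethnographer's stopping point rests on a judgment about when no new phenomena are emerging.

```python
# Sketch of iterative model refinement against held-out data.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=15, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

best_score = 0.0
for depth in range(1, 15):  # each pass is one "return to the field"
    model = DecisionTreeClassifier(max_depth=depth, random_state=2)
    model.fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    if score <= best_score:  # no new improvement: stop refining
        break
    best_score = score

print("settled validation accuracy:", best_score)
```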

So far, I have discussed the ability of ethnography to make accurate and generalizable claims that reach beyond the immediate location of our observations. In the course of this discussion, I have also suggested that we can identify causal connections between phenomena through observation. These are important pieces in understanding why ethnographic work can make predictions. Our work is predictive insofar as it is used to reduce the uncertainty about future states and events, such as changes in markets or the reactions and decisions of users.

As I indicated at the start of this section, ethnographers in industry regularly engage in work that serves the function of prediction. We also use analytical and sampling methods that are similar to those offered by Small. In the next portion of the paper, I talk through an example, pointing out ways in which we might frame the epistemological legitimacy of our predictive work to stakeholders along the way.
