The Domestication of Data: Why Embracing Digital Data Means Embracing Bigger Questions

DAWN NAFUS
Intel Corporation

The EPIC community has been wrestling with ways to integrate quantitative and qualitative methods in light of the increasing role that digital data plays in business practices. Some focus on methodological issues (digital data as method), while others point to the consumer value in data products (data as thing in the world). This paper argues that “digital data as method” and “digital data as thing in the world” are becoming increasingly intertwined. We are not merely witnessing ethnographers’ halting embrace of digital data, but a wider process of the domestication of data, in which we, alongside the people we study, are participants. The domestication of data involves everyday situations in which ordinary people develop their own sense-making methods—methods remarkably similar to ethnographic knowledge production. In this way, the domestication process tightens the connection between data as thing in the world and data as method. I argue that seeing the interconnection gives us the conceptual resources necessary to open up new areas where ethnographers can gain both intellectual and practical footholds in data-rich environments.

[s2If is_user_logged_in()]

DATA AS METHOD/DATA AS THING IN THE WORLD

Sensor data, click data, and other forms of automated time series data are all now well established elements of contemporary digital culture. Because they are largely numerical, they also reinvigorate longstanding debates about the uses of quantitative versus qualitative research, and the nature of mixed methods. Tools change what is thinkable and knowable. New tools renew questions of epistemology, ontology, and methodology. They shape methods discussions and, in turn, are shaped by them. The distribution of those tools also changes the situation. Data-collecting technologies may once have been exclusively the domain of scientific and social scientific enquiry, but they are no longer. Particularly through the widespread use of sensors, data have now become everyday facts of life. Neither good, nor bad, nor neutral, data serve as an infrastructure of everyday living, the substrate of the most banal business decisions, and forms of evidence for answering questions of many kinds. While professional researchers debate what sensor-generated data now mean to them, data’s extension into everyday life opens up questions about who gets to be a knowledge producer. As ethnographers find new ways to engage with contemporary digital data, I hope to suggest that the popularization of data matters to our ongoing methodological debates, and to the choices now available about how we participate in a data-rich social world. I will argue that these are more interrelated than it might at first appear, and that seeing the interconnection might open up new areas where ethnographers can gain both intellectual and practical footholds in data-rich environments.

Within the EPIC community, as well as ethnographically-minded academic communities, there are two lines of discussion about ethnography and “big data” that stand out.1 The first one lays out a set of methodological concerns and approaches, or what I will call “data as method.” Patel (2014) and Curran (2014), for instance, remind us of the reasons why qualitative work and quantitative data are not inherently opposed. Wang (2013) and Boyd and Crawford (2011) take an additive approach by noting how ethnography can add richness to big data. It is no secret that, in the private sector, data analytics is often seen as a substitute for ethnographic work, and ethnographers’ businesses have been disrupted by cheaper, if worse, data. Dig deeper, and one finds possibilities for collaborative, multi-method approaches. There are ample reasons why data scientists, who often lack domain expertise in social behavior, might want ethnographers as collaborators. In turn, ethnographers have begun to forge their own ways into these sorts of datasets, particularly through temporality (Ladner 2013, Nafus 2016). Anthropology and sociology have good approaches for understanding temporality as forms of social and cultural organization, and electronically collected datasets are particularly good at recording various cadences that fall out of ordinary memory. In this way, some new possibilities for ethnographically researching temporality have opened up.

Other work at EPIC approaches data not in terms of method, but as “things in the world.” Margolis (2013), for example, sees direct consumer value in data beyond its aggregation and professional analysis. Roberts (2013) addresses ways that consumer-facing data have value but also create asymmetries between device providers and consumers. Material culture and STS approaches to data similarly seek to understand what data is doing in the social world, and how it mediates social lives. They see in data a kind of performativity, enacting rather than merely quantifying in a neutral way. Price data, for example, do not just measure value but also actively participate in the relations between exchangers, and in doing so directly shape the terms on which future exchanges take place (Muniesa 2007). Carbon emissions data become political material that shape the world they also describe (Knox 2015). These scholars also see in data a materiality that is inseparable from data’s meanings (Day, Lury and Wakeford 2014, Pryke 2010). For example, market traders’ price visualizations shape how those traders act in a market. The numbers are not just abstractions but have color and shape and a screen size, all of which are part of what it means to act in a market. Like clay or paint, data are media through which people shape the world around them.

Latour (2002, 2010) laid vital conceptual ground for thinking about how data as “thing in the world” might be connected to methods. He revisited long-forgotten debates between Emile Durkheim and Gabriel Tarde in order to suggest that a deeper transformation is taking place in what quantitative measurement is about. He argues (2010) that the shift from survey-based work to working with digital traces represents a profound change in how we come to comprehend social patterning. Durkheim saw social structuring as something that takes place outside the individual, and inspired entire fields of inquiry to look for what was “higher” than the individual. Surveys became the instrument of choice for weeding out individual circumstances in order to identify the social structure that lies beyond any particular person. Tarde, on the other hand, believed structure to be nothing more solid than a perpetually emergent flow of discrete interactions between individual persons. In the Tardean view, there is no structure “out there” to be surveyed and known through what later scholars would call God tricks—a view from everywhere and nowhere at the same time, responsible to no human being in particular. There are instead only transactions between people that create paths that in turn potentiate the terms of future transactions. Here society is not cause, but highly provisional consequence, subject to constant renegotiation (Latour 2002: 10).2

Tarde lost a series of debates with Durkheim over this in part because he lacked the technology to study society in the way he proposed. What would become Durkheim’s technologies—surveys—were more or less ready to hand. “Transactional data”—data that is created through the interaction between people and the technologies they make and use—became widely available much later, through digitization. Indeed, Tarde himself speculated that one day there would be a “gloriometer” that would both measure reputation and enable people to follow and reflect on those metrics. Transactional data like our modern-day gloriometers (i.e., social media) are a priori embedded in a social relation of some kind. Ontologically, they are fundamentally different from survey data, more comparable to the archival objects of historians or visual anthropologists. They are a part of the relationships that ethnography was designed to uncover, and therefore cannot fairly be treated as mere survey data writ large. Instead, transactional data calls into question sociology’s “fictive distinction between micro-interactions and macro-structures” (Venturini and Latour 2010:4).

It is perhaps no coincidence that the statistical approaches designed to find patterns in digital traces have more in common epistemologically with ethnography than with survey-based sociology, even if the computational methods are harder for us to get our heads around. Bayesian approaches (the everyday workhorses of big data computation) tolerate the absence of a strong hypothesis when survey-wielding Gaussians get twitchy. Bayesians accept ‘found’ data not optimized for a particular question, which survey researchers often reject as poor research design. There is, of course, a much richer epistemic diversity in data science than I am portraying here. Some machine vision specialists, for example, work from general first principles and build algorithms accordingly, while others take the mess of real-world images as a given, and try to reduce that mess into a pattern that seems workable for the current, contingent situation at hand (Suzanne Thomas, pers. comm.). The differences in ways of knowing are far more complex than whether one is positivist or not. This newfound richness, made possible by a widening diversity of computational methods, means that it is now easier for ethnographers to find the kinships and alliances they need to work in a data-rich world than it was even ten years ago.

The differences are also proliferating in disciplines closer to home. Building on the Tardean turn and other intellectual currents, there has been growing interest in social liveliness in sociology (Lury 2012, Marres and Weltevrede 2013, Ruppert, Law and Savage 2013). Here, researchers seek better approaches to the ongoingness of social life. This work asks, if data are things in the world—if they are not mere signifiers or indicators but actually act in the world—are new methods required to maintain the liveliness that sociological evidence is also a part of? For these researchers, data as thing-in-the-world and data-as-method are no longer separate—there is a social life of methods (Law, Ruppert, Savage 2013). Data and devices become possible to work with in new ways as a result.

This work is an important example of how it is possible to ask methodological questions about numerical data without concluding that one must choose between aping the methods of positivist social science and relegating oneself to the role of friendly addendum—the storyteller that situates the numbers that scale. These scholars are working directly with digital data in ways that further their intellectual commitments, even if in a context that privileges resources for researching all things digital at the expense of other kinds of work.3 Whether these sociologists can open up a broader conversation about what constitutes quantitative methods is, of course, an unanswered political question. Nevertheless, they have created certain “facts on the ground” that de facto enrich the methodological diversity now available.

Indeed, if we glance over to the discussions happening in the digital humanities about the use of digital data, we find an analogous set of debates about what it means to be a humanities scholar with new methodological options. There is a good deal of concern about how funding for digital humanities projects is so often predicated on scholars’ willingness to render humanities scholarship into a positivistic exercise in computer science. Like our colleagues in the humanities and academic sociology, we also face the realpolitik of working conditions. Even if ethnographers do forge ways of working collaboratively with data scientists, we occupy the epistemological minority position. Genuine collaboration that truly respects epistemological difference is a wonderful thing, but it is also hard to come by. Wider social conditions and cultural norms can make meaningful collaboration easier or harder to establish. We could ask, then, what are those wider conditions exactly, and what kind of a say do we have in them?

A SPECIAL CASE OF DATA AS THING

One important part of our overall working conditions is the widespread preference for talking about “data” in very general ways. Much of the technology market (and some data scientists, and some scholarship) holds the assumption that data can be disembedded from its particulars entirely, and stockpiled. While most ethnographers would describe this assumption as a kind of magical thinking, our technical systems make it a social fact: I can download my step count file as a .csv, ship it to you, and you can fuse it with whatever you would like. Data’s set-forming ability leads to the urge to build up ever larger datasets, and the urge to minimize the incommensurabilities between different kinds of data, incommensurabilities that become more apparent when one is tasked with actually working with them. In this kind of talk, the labor it takes to move data around is largely invisible. It all just becomes “the data,” writ large, like a kind of greased pig of truth. Here, “the data” is spoken of as if it were already encoded into one single database, awaiting querying. This reifying discourse is itself a (real) thing in the world. Young tech workers can be spotted in t-shirts proclaiming “data is the new bacon,” as if it came in vacuum-sealed packets, ready to be sprinkled on everything in sight. When we get a greased pig instead of bacon, it is no wonder that the sight of actually existing data so often seems deflating!

This reified way of talking about data is so widespread as to be unavoidable, and it is a significant part of how discussions in various disciplines unfold. Pointing out its faults only gets us so far. I suspect Tarde would find it more fruitful to pay greater attention to data’s various concrete manifestations, and its everyday life as a medium like clay, paint, or language. The datasets I have in mind, largely sensor data, can be thought about in abstract ways, but when they matter, the where and the how matter most. We can see this when we look at sensor-generated datasets in a spreadsheet and then through a visualization tool, or in an app. We can readily see how some things change, while others stay the same. They have the same begin date and end date, for example, and an indication of what they refer to: steps counted in one minute, say. They can be re-calculated and re-visualized, but they cannot bend to any mathematical or visual will without losing meaning. It is only in their concrete forms that questions about what signifies what to whom can really emerge, and these are questions that ethnographers are quite used to puzzling through.

THE DOMESTICATION OF DATA

There are other conditions, beyond the prevalence of fantasy thinking, that are worth our attention. I would argue that in EPIC’s current discussion of digital data, we are not merely witnessing ethnographers’ halting embrace of working with such data, but a wider process of the domestication of data, in which we, alongside the people we study, are participants. By “domestication of data” I mean to evoke the processes of consumption and adoption that have taken place with other kinds of technologies, like computers and mobile phones. These were designed for narrow types of use, and yet meanings and practices quickly proliferated once people adapted them for the richness of everyday life (Silverstone and Haddon 1996). Few of these were anticipated or anticipatable by their designers. Text messaging was famously an afterthought to the mobile phone, and with use, entire genres of communication developed around it. Early personal computers were going to business-sify our homes, and instead have become platforms for myriad other activities. In each case, users of new technologies pushed device capabilities, and a market ecosystem began to pay attention and recalibrate its wares accordingly. When adoption scales, what once was “a device” becomes myriad possibilities created by a combination of consumers, prosumers, artists, open source developers, companies and other institutional actors. Consumption is an active process that changes both the consumer and consumed.

An analogous domestication process is well underway with respect to data. Data is rarely sold to consumers as such, but devices and apps where the consumer is an audience for the data are now commonplace. This is relatively new, as data’s long social history has largely been an institutional one. Long before electronic systems, data creation was an important technique of early European state-making, enabling large-scale taxation and conscription (Scott 1998, Desrosières 2002). Similar measurement practices were then adopted by the bourgeoisie in an attempt to legitimate their businesses as activity comparable to scientific practice (Poovey 1998). As more of Western social life became caught up in formal institution-making, and later audit culture (Strathern 2000), the tropes of measuring that were commonplace in institutions became facts of everyday life. Test scores, measures of height and weight, land ownership in so many meters or acres, are all largely taken for granted today as frames for how the world works (not just how institutions work). While quantification as a personal practice goes back at least to Benjamin Franklin, data as something that many people consume is relatively new. And yet here we are. Some obsess over the number of social media followers they have, while others occupy themselves with daily step count, and others still closely watch air quality or their car’s miles per gallon. That is, not only is data a “thing in the world” in the way that prices or test scores are, it is also now a consumer thing.

Consumption, of course, is never solely a market activity, even while it is also at key moments squarely a market activity (Slater 2002). Consumers live in much bigger social and cultural worlds than markets, even in the most neoliberal of societies. They bring their own non-market frames to data and devices. Social media, for example, is consumed for individualistic pleasure, and for articulating cultural and class identities of various kinds, but it is also used for the coordination of protest. That is, social media is sometimes a consumer good to be managed inside the moral economy of the household, and sometimes the means of (social, non-market) production. Some adaptations and uses of data by consumers will feed the cycle of design evolution, but many will fall outside the scope of what a for-profit firm can optimize for. In these ways, then, we can recognize that when we ethnographers are working on this or that consumer good, we are shaping a much wider set of social circumstances beyond the particular markets our clients are in, even when those clients are in no position whatsoever to recognize it as such.

The data part of data products, similarly, is sometimes a market commodity and sometimes a means of social reproduction. When sensor data goes beyond consumption, and becomes a means of production, one important thing that is produced is knowledge. Sensor data opens up spaces in which everyday notions of evidence and research unfold. In consumers’ hands, sensor data mobilizes everyday notions of health, biology, environmental sciences, and the like. People start asking research-like questions about whether the measurement is true, whether it is relevant, and whether more data is needed, largely because sensor data tends to be semiotically rather vague (Nafus 2014). It tends to offer up partial, messy answers to half-formulated questions.

Many good examples of using data as a means of non-professional knowledge production can be found in the Quantified Self community (QS), a community I study, participate in, and design for. QS members are people who get together in person to discuss what they have learned from the data they collect about themselves through various technologies new and old. QS is an environment where people do not merely ask “how can I take more steps?” but also “why 10,000?” or “Is that appropriate for me?” Often, the key to making meaning from data is outside the dataset itself. Therefore, members of QS often emphasize the importance of context, because that is largely where the value in sensor data lies. One has to know the individual context well enough to know what aspect of the data is or is not relevant. It takes work to puzzle through these matters, and many people put real intellectual effort into making meaning from data (Kragh-Furbo et al 2016).

One example of using activity tracker data “off-script” in order to get meaning from it comes from Jacqueline Wheelwright (2015), who spoke quite movingly at the 2015 QS Conference about how she worked with her activity tracker data to draw important conclusions about her autoimmune disease. Most activity trackers emphasize a daily step count by default, but Wheelwright reworked daily counts of steps into a total aggregation across a month, which better revealed a long-term pattern. She then carefully annotated those monthly time bins with a timeline of flare-ups, which led her to the conclusion that taking 10,000 steps per day was triggering her autoimmune disease. Avoiding 10,000 steps a day was the only way to get her autoimmune disease under control.
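To make this kind of re-binning concrete, the following is a minimal Python sketch, under stated assumptions, of what reworking daily counts into monthly bins and lining them up against a timeline of flare-ups might look like. Each calendar month gets a total step count, a count of days that reached 10,000 steps, and a count of recorded flare-ups. The function name `monthly_report` and all sample dates and numbers are hypothetical and purely illustrative; they are not Wheelwright’s data or her actual tools, and a table like this only places two timelines side by side, it does not by itself establish a causal link.

```python
from collections import defaultdict
from datetime import date


def monthly_report(daily_steps, flare_dates, threshold=10_000):
    """Re-bin daily step counts into calendar months and annotate each month.

    daily_steps: dict mapping datetime.date -> steps recorded that day
    flare_dates: iterable of datetime.date on which a flare-up was noted
    threshold:   daily step count treated as "high" (10,000 by default)
    Returns a dict keyed by (year, month), sorted chronologically.
    """
    bins = defaultdict(lambda: {"total_steps": 0, "days_over": 0, "flares": 0})

    # Aggregate the daily counts into monthly totals, and count high-step days.
    for day, steps in daily_steps.items():
        month = bins[(day.year, day.month)]
        month["total_steps"] += steps
        if steps >= threshold:
            month["days_over"] += 1

    # Annotate the same monthly bins with the flare-up timeline.
    # A flare in a month with no step data still gets a bin (with zero steps).
    for day in flare_dates:
        bins[(day.year, day.month)]["flares"] += 1

    return dict(sorted(bins.items()))


# Hypothetical sample data, for illustration only.
daily = {
    date(2015, 1, 5): 11200,
    date(2015, 1, 6): 10400,
    date(2015, 1, 7): 6300,
    date(2015, 2, 2): 7100,
    date(2015, 2, 3): 6800,
}
flare_days = [date(2015, 1, 9)]

if __name__ == "__main__":
    report = monthly_report(daily, flare_days)
    for (year, month), row in report.items():
        print(f"{year}-{month:02d}: {row}")
```

Binning by month trades away day-level detail in exchange for a view of slower patterns that a default daily display tends to hide, which is the shift in temporal frame the example above turns on.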

This sort of adaptation is what studies of consumption teach us to expect: she took a product’s data apart and reassembled it anew, using the data science skills she had to make it speak to her circumstances (an autoimmune disease). We can even speculate on how designers working in this space might respond to ethnographies that explain and document this sort of activity. Some designers might respond by elaborating a “steps for autoimmune tracking” app, while others might reject the finding entirely as a kind of exception, and choose to double down on culturally loaded assumptions about “healthiness” as only ever more exercise. Other designers still might respond by developing a data analysis tool that makes it easier to spot patterns in steps data, which is what my colleagues and I did (Nafus et al 2016). Different designers will respond by emphasizing different valences of the underlying data (Fiore-Silfvast and Neff 2013), thus building up the ecosystem over time.

Similar examples of the intellectual work that people do with data can be found in the emerging class of domestic Internet of Things (IoT) devices. In a study of early adopters of home energy monitors, we found that these users either elaborated their sensing capabilities after an initial foray, by adding additional sensors or infrastructure, or else they abandoned the project altogether upon seeing that total kilowatts consumed did not actually help them measure “energy efficiency,” which they came to see as much bigger than kilowatts consumed (Nafus and Beckwith 2016). While we saw less interest in the meaning-making part than in Quantified Self (and more interest in hardware setup), the presence of sensors did prompt people to come to a point of view about what constituted “efficiency” and what an appropriate measurement of it would be, if not the one on offer. As with self-tracking, we can only expect that such views will shape the terms of subsequent adoption of these sorts of devices.

[/s2If]
