How Modes of Myth-Making Affect the Particulars of DS/ML Adoption in Industry

[s2If !is_user_logged_in()] [/s2If] [s2If is_user_logged_in()]

Empowering Data Teams

The consultants talked to the client’s in-house data analytics and data engineering team: two data analysts and one data engineer. They were joined by their current, interim manager (who did not have a data science background). In conversation, it became apparent that the data analytics team was overwhelmed by creating reports, requested by the business or creatives, on the performance of the business or content. Requests were fulfilled in an ad hoc manner, each one custom-built around the specifics of the request. The data engineer worked on making data accessible where needed to satisfy requests. The data analysts were eager to develop self-serve approaches, that is, dashboards that could communicate performance metrics to the business on demand. However, ad hoc requests took priority and occupied the majority of their time; there was little to no time to build this functionality.

This situation is not uncommon. Data analytics teams tend to struggle with their workloads, often because of the very specific nature of the requests they are asked to handle and the short timelines attached to them. To remedy the situation, data analytics teams need to log and monitor incoming requests to identify common themes. They can then build self-serve dashboards for on-demand delivery of data insights around those common themes, covering a range of frequently asked questions. In doing so, data analytics teams, tasked by their function “to count,” need to define “what to count.” They need to answer questions such as “What is a daily active user?” or “For how long does someone need to visit a website, watch a video, or interact with content to qualify as a content consumer?”
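As a minimal sketch of what logging requests to identify common themes could look like, the snippet below tallies theme tags across a small request log. The log structure, the field names, the theme tags, and the `common_themes` helper are all hypothetical and chosen for illustration; they are not taken from the client engagement.

```python
from collections import Counter

# Hypothetical request log; the fields and theme tags are illustrative assumptions.
REQUEST_LOG = [
    {"requester": "editorial", "themes": ["daily active users", "video"]},
    {"requester": "product", "themes": ["daily active users"]},
    {"requester": "creative", "themes": ["content consumers", "video"]},
    {"requester": "editorial", "themes": ["daily active users", "content consumers"]},
]


def common_themes(log, min_requests=2):
    """Return (theme, count) pairs seen in at least min_requests requests, most frequent first."""
    # set() ensures a theme is counted at most once per request.
    counts = Counter(theme for request in log for theme in set(request["themes"]))
    return [(theme, n) for theme, n in counts.most_common() if n >= min_requests]


# Themes that recur across requests are candidates for a self-serve dashboard.
dashboard_candidates = common_themes(REQUEST_LOG)
print(dashboard_candidates)
```

Themes that clear the threshold are the ones a self-serve dashboard would plausibly cover first; the threshold itself is a judgment call for the team.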

Within organizations, there tends to be a variety of definitions of terms such as daily active user or content consumer. Often, differences go unrecognized and unacknowledged. They surface when the data analytics team is tasked to count: they need to translate daily active user into a set of instructions (e.g., a SQL query) that demands specificity. Lacking specificity, data analysts tend to borrow details from their own, sometimes idiosyncratic, definitions of these terms. This practice has several consequences. First, asked to count daily active users, different data analysts tend to produce different answers. What is more, even the same data analyst may give different answers depending on the definition of daily active user, if any, passed on by the stakeholder. At many companies, this discrepancy in answers gradually erodes trust in data. Second, idiosyncratic definitions prevent data analysts from building on-demand, self-serve dashboards and other tools. At the extreme, every request becomes custom because every request demands a different way of counting a similar, often seemingly identical, concept.
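To make the point about specificity concrete, here is a small sketch in which two plausible definitions of daily active user are applied to the same event log and yield different counts. The event data, the 30-second threshold, and both function names are hypothetical; neither definition is claimed to be the one in use at the client.

```python
from datetime import date

# Hypothetical event log: (user_id, day, seconds_on_site). All values are illustrative.
EVENTS = [
    ("u1", date(2020, 1, 6), 5),
    ("u1", date(2020, 1, 6), 40),
    ("u2", date(2020, 1, 6), 8),
    ("u3", date(2020, 1, 6), 120),
    ("u4", date(2020, 1, 7), 300),
]


def dau_any_visit(events, day):
    """Definition A: any user with at least one event on the given day."""
    return len({user for user, d, _ in events if d == day})


def dau_min_engagement(events, day, min_seconds=30):
    """Definition B: users whose total time on the given day reaches min_seconds."""
    totals = {}
    for user, d, seconds in events:
        if d == day:
            totals[user] = totals.get(user, 0) + seconds
    return sum(1 for total in totals.values() if total >= min_seconds)


day = date(2020, 1, 6)
# Same data, same day, same term: the two definitions disagree.
print(dau_any_visit(EVENTS, day), dau_min_engagement(EVENTS, day))
```

On this toy log the first definition counts three users and the second counts two, which is the kind of discrepancy that surfaces only once the term has to be written down as instructions.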

To remedy the situation, data analysts need to be empowered, in collaboration with the business, to define what to count. As they receive requests, they are in the best position to record definitions in current use and to consolidate definitions. To do so, they need to set aside time to work on recording requests and consolidating terms. Working with the client, the consultants received significant pushback on these suggestions, despite a clear opportunity to consolidate terms (it was suggested and requested by members of editorial and creative). According to the client, the core function of the data analytics team was to respond to ad hoc requests first, not to define or redefine them, and then, as time permits, to build on-demand, self-serve tools. There was a lack of recognition that the latter task is impossible to accomplish without a say in the consolidation of terms. Disempowered to establish the conditions for their own long-term success, the data analytics team was seen as a mere service function, to the detriment of the organization.

Understanding DS/ML as modular, bolt-on solutions de-emphasizes the importance of data readiness and interferes with deriving value from data for increased efficiency or novel products. By contrast, understanding DS/ML as an emergent capability emphasizes that a robust in-house data analytics capability is the foundation for successful in-house DS/ML projects and products; it portrays data as a resource. Like any resource, data needs to be harvested and managed. Data analysts interact with the data and build an understanding of it; in counting, they establish concepts, such as daily active user, that often find use as labels in the predictive algorithms of data scientists and machine learning engineers. For example, the success of a piece of content may be measured by how many content consumers it attracted. In the DS/ML as modular mode, the client, surprised by our suggestions, rejected them. Coming from the DS/ML as emergent technology mode, we were surprised by the client’s reaction. Each mode leads to a different set of expectations, suggestions, and ultimately strategy.

Successful Data Science Requires Data Analytics

The client company had let go of their only data scientist a couple of months prior to our engagement, after a tenure of less than one year; the data scientist had failed to make an impact. The client’s failure in data science is grounded in their approach to data analytics. Without a robust data analytics function, data science cannot succeed. Data science depends on definitions of what to count as well as on data quality and access. Lacking data analytics, data science roles tend to morph either into data analytics roles, in which the data scientist helps fulfill ad hoc stakeholder requests or assesses data quality, or into roles devoted to building the pipelines for better data quality and access. These patterns can be exacerbated by the lack of a clear distinction between a data science and a data analytics team, as was the case at the client company. In such situations, data scientists tend to leave or, as the more expensive members of the data team, are asked to leave, as happened in this case. Understanding DS/ML as modular de-emphasizes data readiness and prevents data science from having an impact within organizations; it does not highlight, much less create, the conditions for successful data science within organizations.

Protection of Editorial and Creatives

In the run-up to the engagement, the consultants were advised by the technology and product side of the business to “tread lightly” so as not to upset editorial and creatives, who might fear changes to, or the loss of, their jobs. During conversations, the consultants found editorial and creatives to be eager to hear about our work, solutions, and possible externally or internally facing data products; they freely talked about their work. The consultants encountered healthy skepticism, not fear. In many ways, editorial and creatives were more receptive to our suggestions and more eager for adoption than the technology and product side of the business.

Viewed as spectacle, DS/ML offers modular, bolt-on solutions to add new products or business functions or to replace existing ones; it promotes self-sufficient, mostly autonomous systems. It de-emphasizes the importance of conditions and context. It de-emphasizes the importance of data readiness and the contributions to data and data readiness by people across the organization, from the data analytics team to editorial and content creators. It paints a picture of users as collaborators with machines but on the machine’s terms. Workers are to assist the machines, to be tasked with the edge cases that machines can’t handle, providing the glue between the complex work environment and its simplified version that allows machines to succeed. This view fosters fear of replacement by machines; AlphaGo pitted the machine against the human. AlphaGo Zero excluded humans from training machines.

Viewed as an emergent capability akin to the emergence of electricity, DS/ML is a potential, fueled by its resource: data. This view not only emphasizes the need to harvest and manage this resource, it also encourages us to think of applications not in terms of an add-or-replace model but in terms of an open-horizon model: electricity enabled humankind to build entirely new kinds of products; it gave us superpowers, in many ways. We tamed electricity, and it has enabled us to build products that to many were unimaginable prior to their invention. Our lives changed alongside these inventions; we adapted. The view of DS/ML as emergent technology emphasizes the potential of DS/ML without giving it concrete form. It emphasizes that we can change, as technology changes around us by our own actions; the adoption of DS/ML becomes less of a zero-sum game with winners (machines, technologists, STEM) and losers (humans, humanities).

In the DS/ML as modular mode, the technology and product side of the business were concerned about editorial and creatives and their reaction to the arrival of consultants at the company and to the consultants’ suggestions. There was a big difference between the expected and the actual situation. A potential explanation for this discrepancy is that, viewed as modular, DS/ML devalues conditions and context and, with them, interaction with teams across companies, especially outside the technology teams. It devalues the knowledge held by teams outside technology groups, and it can lead to a kind of “benevolent paternalism”. Editorial and creatives, on the other hand, were aware of inefficiencies in their work and were keenly aware of what questions they would like to have answered by data. With confidence in their work, they were looking to DS/ML as an enabler and potential partner, more in line with seeing DS/ML as the new electricity. Different lines of business may be more susceptible to one way of thinking about DS/ML than to the other, with consequences for communication across business lines.

Vendor Strategy

For some of their DS/ML solutions, the client relied on vendors; they paid companies for a data product or DS/ML service. In one case, the client shared their data with a vendor company for product/service delivery. It was considered to be a “good deal” since the client company was not charged by the vendor (the payment, of course, is in the form of data).

The DS/ML as modular view sees data as a mere requirement for data products and DS/ML services (as AlphaGo Zero showed us, not even a necessary one). As long as you get a product in return for your data, it is a “good deal”. The DS/ML as emergent technology view promotes the idea of data as a resource, an enabler. Data enables an entire suite of data products; sharing your data in exchange for one data product becomes a “bad deal”, especially if data sharing enables your competition. Most vendors work with multiple organizations, often in the same line of business. Data sharing via such a vendor can remove the competitive advantage that increasingly lies in data, as the DS/ML as emergent technology view emphasizes. The DS/ML as modular view de-emphasizes data as a resource, a valuable asset that is best protected; it can lead to decisions with negative consequences for the competitiveness of the business in the long term.

ETHNOGRAPHIC LESSONS LEARNED

The Role of Expertise

Taking as a starting point the “simple premise that expertise is something people do rather than something they have” (Carr 2010), it becomes possible to see this case study as revealing the tensions and misunderstandings that arise from the differing sets of practices that are called upon in the shift towards DS/ML within business enterprises. The two motivating myths presented above constitute DS/ML as two different kinds of capabilities. One myth presents a modular capability that can be added instrumentally to existing practices; the other presents a transformative capability that requires the reshaping of existing business processes to new, sometimes custom, interfaces of the emerging and still unstable technology. Each of these two kinds of capabilities, then, entails a different set of interactions between data, personnel, products, and tools, and therefore a different set of practices through which expertise functions. By understanding these two myths as motivating different forms of expertise, the positioning of various actors in the case study presented above becomes more legible.

What expert practices are motivated by the modular myth? The modular myth lends itself to picking and choosing amongst instruments to be deployed, and expertise in this context would be constituted by performing knowledge about these available tools. Such performances might include conducting cost-benefit analyses on available vendor solutions, performing knowledge about the available packages and implementations, and situating DS/ML development as the stepwise incorporation of such modular tools into existing architectures. Indeed, we observed a reliance on such performances of expertise in the reaction of some in the case study presented above. And in particularly power-laden ways, this exercise of expertise was able to repress challenges posed by alternate forms of expertise (see below) by leveraging existing control of economic resources to prioritize one set of priorities (vendor solutions) over others (reorganizing the DS/ML team).

The expert practices mobilized by the modular myth also draw strength from a particular conception of objectivity that the myth mobilizes. As a historically- and socially-constituted value, objectivity (Daston and Galison 2007) can take many forms. The modular myth contributes to a form of “mechanical objectivity” that sees human judgement as fallible, whereas algorithmic systems can stand in for human actors who may introduce “bias, inefficiency, and discrimination” (Christin 2016). Trusting algorithmic systems over human actors allows those who exert control over the use of such systems to participate in this form of objectivity as a further practice of their own expertise. However, sources of mechanical objectivity, whether they be crime scene photographs or brain scans (Dumit 2004), tend towards a situation in which the products of these tools themselves require further expertise in order to be translated for lay audiences or integrated into other sociotechnical systems. The ability to do so constitutes a form of objectivity Christin identifies as “trained judgment”. It was precisely this form of objectivity that we highlighted by introducing the story of DS/ML as the “new electricity” through the transformative myth presented above. By demonstrating the ways in which trained judgment could form a “hybrid entanglement of human and machine expertise” (Christin 2016), we showed that a great deal of human ingenuity was still required to craft DS/ML solutions for the particular problems the client was facing. Questions of how to make those problems legible to the machine were very human ones, and their outcomes were uncertain, so our best recommendations centered on empowering that form of expertise.

What expert practices are motivated by the transformative myth? The transformative myth lends itself to precisely those practices of expertise that constitute trained judgement, but a trained judgement that extends beyond that which might evaluate between several similar products offered by a vendor. These practices include engaging in forms of collaboration and experimentation that treat DS/ML not as a stable product, but as a set of open, unresolved questions from which meaningful solutions might emerge. Specifically, the expertise of a data science lead (or a VP of electricity, for that matter) is entailed by fostering different lines of communication between disciplinary silos, for example by enacting a process in which data analysts work with data scientists to craft key performance indicators that are useful for machine learning experiments. This form of expertise is also entailed by wielding economic resources to engage in experiments that may be fruitless, but may also produce useful insights or products for further development.

What other expert practices are at stake? The creative team, who in the planning stages of the on-site workshop were to be insulated from any hints that their roles could be automated, was revealed during the workshop to have their own expert practices that actually positioned them to be promising collaborators for the DS/ML team. Indeed, they were central to the business offering at the company, but they also were able to position their work as primarily valuable because they were the ones who ‘crafted’ new content for the media company. By foregrounding this aspect of their work and downplaying the routinized labor they performed, they could have pragmatic conversations about how to automate the routine work without their central expert practices being compromised. The DS/ML team could potentially be given broad latitude in building systems for the curation of past content, summarization of aggregated content, and the monitoring of dashboards without threatening the practices that constituted creative expertise.

Reflecting on the observation that “the pervasive sense that technologies transform us in irrevocable ways means that idealistic concepts of technology are always accompanied by the anxiety that they will also promote some kind of loss – loss of connectivity, of intimacy, of desire, of authenticity in some way” (Sturken & Thomas 2004), we were surprised to realize that this was a far more active concern for those whose expertise depended on control over the technologies that were the subject of the workshop than for those who were most central to the production of the content offered by the company. This points towards two key findings from the engagement. The first is that where there is resistance to recommendations for a move away from modular solutions and towards transformative capabilities, sensitivity to different enactments of expertise is key. Unless existing expert practices can be reshaped or otherwise adapted to the kinds of practices entailed by a focus on transformative capabilities, a defensive, dismissive, or destructive reaction is possible from those like the CTO, whose existing expertise will be subsumed by such a shift.

The second key insight is not all that different from the lessons Latour drew from examining the history of the pasteurization of France (Latour 1993). While singular inventions and modular capabilities may sometimes be identified as transformative in their own right, they are not enacted or brought to bear on the world without a broad accommodation of the social sphere to the technological apparatus, and of the technical apparatus to the existing practices within the social sphere. As little as Louis Pasteur could accomplish in France on his own, just as little could be done by any one person in the offices of the client in our case study. Rather, ground must first be laid across the organization to accommodate the kinds of changes that any particular form of DS/ML might take. This groundwork can be done purposefully, but requires the active participation of the entire range of actors likely to be impacted by such changes. It also requires working against the emotional grain produced by spectacular demonstrations of DS/ML. The ways in which such spectacles mobilize the sublime are quite persistent, and effectively immunize against alternate understandings of the technologies as anything but modular.

[/s2If]
