Below the Surface of the Data Lake: An Ethnographic Case Study on the Detrimental Effect of Big Data Path Dependency at a Theme Park


Case Study

This case study details how a team of anthropologists and a team of data scientists sought to help a Middle Eastern theme park use its big data platform to measure ‘the good customer experience’. Ethnographic research within the theme park revealed that visitors yearned to bond with the other members of their group, as they rarely got the chance during their busy everyday lives back home. However, building a measure of how well the theme park delivered on bonding, through the development of a ‘bonding index’, turned out to be unfeasible because the big data platform focused on capturing operational data. The decision to focus on operational data had unintentionally created a path dependency that made the big data setup unfit for answering some of the theme park’s most fundamental questions. ReD Associates has observed this problem across clients. To solve it, this paper suggests that companies start with an open-ended, ethnographic study of their big data needs before they build a big data platform. This will enable companies to be more strategic about their digitalization and thus maximize its impact.

[s2If is_user_logged_in()]

THEME PARKS ARE AN ELDORADO FOR DATA SCIENTISTS

“If you want to imagine how the world will look in just a few years (…) skip Silicon Valley and book a ticket to Orlando. Go to Disney World.”

– Wired, 2015

Imagine a young girl called Liza going to Disney World for the first time in her life. She spots Pluto, her all-time favourite Disney character, and, as she walks toward him, he gets down on his knees, stretches out his arms readying them for a hug and calls out her name: “Liiiiza!”. Pluto knows her name – but how could he?

The MagicBands that Liza and most other guests at Disney World Orlando wear around their wrists enable the theme park to collect and make use of a wealth of data about its guests. Upon purchasing tickets, people give up personal information such as name, age, favourite character and credit card details. Inside the park this information can then be combined with geo-location data to provide someone like Pluto with the input he needs to create a special moment for kids like Liza. A truly magical Disney experience, enabled by big data.

Having Pluto and other characters call out the actual names of kids (and possibly adult fans too) visiting Disney World is one of several use-cases imagined by the people behind the MagicBand. It is still in development, but families can already be greeted by name, before even opening their mouths, when approaching a restaurant where they have booked a table. Another use-case, currently termed “The Story Engine”, plans to combine geo-location data with footage from the park’s many video cameras (and possibly face-recognition software too) to create personalized videos for every single group visiting the park, catching the candid moments and giving everyone a unique and shareable souvenir.

These moments of magic are enabled by a combination of a state-of-the-art big data setup, which reportedly cost Disney 1 billion USD to develop (Kuang: 2015), and a willingness to share personal data that is unparalleled in the outside world. This willingness to lay aside privacy concerns for a day of family fun makes theme parks a unique fieldsite for both anthropologists and data scientists interested in what the future may hold at the intersection of big data and human experiences.

COMBINING THICK- & BIG DATA IN A THEME PARK

This case study details the story of how another theme park, one located in the Middle East, sought to utilize big data to improve its customer experience. The park, like Disney World, was a closed space whose operator owned all the restaurants (and their sales data), all the Wi-Fi routers (and their geo-location potential), all the rides (and their utilization data) and all the surveillance cameras, to name just a few data sources. To help improve the customer experience the park hired a team of anthropologists from the strategy consultancy ReD Associates in the summer of 2017. The team became a small part of a much larger ongoing project to build a Customer Data Platform (CDP). The theme park’s executives hoped that the CDP could give them insights into how customers experienced their park, and that they could use it to guide strategic decisions going forward and to accurately measure the impact of new initiatives. The first roll-out of the CDP planned to collect data from 250 discrete data sources, which would then be stored in a Hadoop Data Lake. When the team of anthropologists joined the project, 40 data scientists had already been on the ground at the theme park for three months, with a similar number working ad hoc remotely, primarily out of India. The anthropologists’ involvement was set to last 6 weeks, whereas the first roll-out of the CDP was set to last a year, followed by a support and adjustment phase.

At the onset of the project the division of labour for the collaboration between the anthropologists and data scientists was clear. The anthropologists would carry out an ethnographic study of the guests in order to identify what characterized a good user experience and the data scientists would then figure out how to measure that using big data analytics:

[Figure fig01a]

The idea was thus that, after the anthropologists had found out what mattered to visitors, the data scientists would measure how well the park delivered on it and track that measure over time as the park sought to improve.

Conceptually, this type of collaboration between the anthropologists’ thick data (Geertz: 1973) and the data scientists’ big data can be described as a ‘Sounding Board Model’. The anthropologists’ job was to identify insights and “throw” them at the CDP, which, with the help of the data scientists, would then return a quantified measure of each insight. One example, which the park executives and data scientists provided at the start of the project, illustrates how they imagined the collaboration:

  • Queuing time: If the ethnographic research finds that queuing for rides is a major pain-point and something that is crucial to focus on in order to improve the customer experience, the CDP can be used to track queuing times and estimate the size of the problem. The effects of initiatives to alleviate the problem, such as guiding people towards less busy rides or planning shows during ride rush hours, can then be measured going forward.
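The queuing-time example can be made concrete with a minimal sketch. It is not the park's actual CDP logic: the event records, the ride name, the `initiative_start` date and the helper functions are all hypothetical, and the comparison is a naive before/after average rather than a causal estimate.

```python
from datetime import datetime
from statistics import mean

# Hypothetical queue events: (ride, time the guest joined the queue, time the guest boarded).
queue_events = [
    ("coaster", datetime(2017, 7, 1, 11, 0), datetime(2017, 7, 1, 11, 40)),
    ("coaster", datetime(2017, 7, 1, 12, 0), datetime(2017, 7, 1, 12, 50)),
    ("coaster", datetime(2017, 7, 8, 11, 0), datetime(2017, 7, 8, 11, 30)),
    ("coaster", datetime(2017, 7, 8, 12, 0), datetime(2017, 7, 8, 12, 30)),
]

# Hypothetical date on which an initiative started,
# for example a show scheduled during the ride rush hour.
initiative_start = datetime(2017, 7, 5)


def wait_minutes(joined, boarded):
    """Return the time spent in the queue, in minutes."""
    return (boarded - joined).total_seconds() / 60


def mean_wait(events, ride, start=None, end=None):
    """Average wait for one ride, optionally limited to joins in [start, end)."""
    waits = [
        wait_minutes(joined, boarded)
        for event_ride, joined, boarded in events
        if event_ride == ride
        and (start is None or joined >= start)
        and (end is None or joined < end)
    ]
    return mean(waits) if waits else None


before = mean_wait(queue_events, "coaster", end=initiative_start)
after = mean_wait(queue_events, "coaster", start=initiative_start)
print(f"Mean wait before: {before} min, after: {after} min")
```

A sketch like this only works because queuing is an operational quantity: the timestamps needed to compute it are the kind of data a ride system already produces.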

This approach to integrating thick and big data has the allure of seemingly combining the two data types’ biggest strengths while simultaneously countering each other’s biggest weaknesses, namely by adding scale to the thick data and depth to the big data (see Figure 1).

The Hadoop data lake, which was chosen as the big data setup for this theme park over the more traditional data warehouse setup, is well suited to the sounding board model of thick and big data integration for two reasons.

Firstly, a Hadoop data lake can store data at a fraction of the price of a data warehouse. Whereas a terabyte of data stored in a data warehouse can cost $250,000, a terabyte stored in a Hadoop data lake can cost $2,500, a price reduction of 99% (PwC: 2014). This means that a larger ‘sounding board’ can be built with the same budget, thereby, in theory, making it possible to have enough data to quantify any insight the anthropologists might conjure up and throw at it. The drastic reduction in price can to a large extent be explained by the different data storage formats of data warehouses and data lakes. Whereas data warehousing requires costly and time-consuming data integration and structuring up front, a data lake stores data in its native format, i.e. raw and unprocessed, so that it can be ‘fished’ out of the data lake when needed.
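The arithmetic behind the price comparison can be checked with a short sketch. The two per-terabyte prices are the figures quoted above; the budget is a hypothetical number chosen only to show how much larger a ‘sounding board’ the same money buys.

```python
# Per-terabyte storage costs quoted in the text (PwC: 2014).
WAREHOUSE_USD_PER_TB = 250_000
LAKE_USD_PER_TB = 2_500


def price_reduction(old_price, new_price):
    """Return the relative price reduction as a fraction of the old price."""
    return (old_price - new_price) / old_price


def terabytes_for_budget(budget_usd, usd_per_tb):
    """Return how many terabytes a given budget can store."""
    return budget_usd / usd_per_tb


reduction = price_reduction(WAREHOUSE_USD_PER_TB, LAKE_USD_PER_TB)

# Hypothetical storage budget, used for illustration only.
budget_usd = 1_000_000
warehouse_tb = terabytes_for_budget(budget_usd, WAREHOUSE_USD_PER_TB)
lake_tb = terabytes_for_budget(budget_usd, LAKE_USD_PER_TB)

print(f"Price reduction: {reduction:.0%}")
print(f"Same budget stores {warehouse_tb:.0f} TB in a warehouse or {lake_tb:.0f} TB in a lake")
```

Under these figures the same budget stores one hundred times as much data in the lake, which is what makes the idea of an all-encompassing sounding board look affordable.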


Figure 1: Illustrates how the combination of thick- and big data can theoretically produce a new type of data that contains both depth and scale.

Secondly, the fact that the data is stored in a native and unstructured format means it lends itself well to flexible and task-oriented structuring (PwC: 2014). The anthropologists’ insights were expected to prompt this type of task-oriented structuring, which made the data lake setup well suited to a sounding board type of collaboration. Or, to stay with the metaphor, the setup made it easy for the data scientists to go fishing once the anthropologists had told them what to fish for.
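Task-oriented structuring of natively stored data can be sketched as follows. This is plain Python rather than Hadoop, and the file names, fields and the `spend_and_rides_per_guest` question are hypothetical; the point is only that the raw records stay untouched and structure is applied at read time, once someone has said what to fish for.

```python
import csv
import io
import json

# Hypothetical raw records, kept exactly as their source systems emitted them.
raw_lake = {
    "restaurant_sales.jsonl": (
        '{"guest_id": "g1", "amount": 42.5}\n'
        '{"guest_id": "g2", "amount": 18.0}\n'
    ),
    "ride_scans.csv": "guest_id,ride\ng1,coaster\ng1,carousel\ng2,coaster\n",
}


def read_sales(raw_text):
    """Parse JSON-lines sales records at read time."""
    return [json.loads(line) for line in raw_text.splitlines() if line.strip()]


def read_ride_scans(raw_text):
    """Parse CSV ride scans at read time."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def spend_and_rides_per_guest(lake):
    """Structure the raw data for one specific question: spend and ride count per guest."""
    summary = {}
    for sale in read_sales(lake["restaurant_sales.jsonl"]):
        entry = summary.setdefault(sale["guest_id"], {"spend": 0.0, "rides": 0})
        entry["spend"] += sale["amount"]
    for scan in read_ride_scans(lake["ride_scans.csv"]):
        entry = summary.setdefault(scan["guest_id"], {"spend": 0.0, "rides": 0})
        entry["rides"] += 1
    return summary


summary = spend_and_rides_per_guest(raw_lake)
print(summary)
```

A different question would need a different reader over the same raw files, but it could only ever be answered from whatever those files happened to contain in the first place.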

[/s2If]
