Who and What Drives Algorithm Development: Ethnographic Study of AI Start-up Organizational Formation


DEEP LEARNING EARLY STAGE START-UP FEATURES

How did these expectations measure up against our real-life start-up experience?

Features of our early stage start-up were less than magical and included product discovery instead of product development, immediate press coverage and AI hype, fear that robots would replace radiologists, extreme leadership pains, and intense VC oversight and intervention at the expense of a vulnerable organizational culture. The start-up comprised a mix of possibility, struggles over what knowledge to bring into the organization, and algorithm architecting and iteration. Trial and error was an essential ingredient in gaining replicable results and in rapidly building an early stage start-up.

In daily operations, the team’s approach to capturing disease was not at first to understand it but to start with particulars: identifying the features of lungs, their shapes, edges, and anomalies. Our team had to first segment thousands of lungs before we could begin to achieve any results with our algorithms in identifying lung nodules. This meant using pixel data and annotations from publicly available chest x-ray and chest CT data sets that were by no means perfect and required preprocessing and extensive labeling. The lungs are a vascular world unto themselves. Millions of veins spread out into the lungs, and different types of scar tissue can be present that obscure cancerous nodules until late-stage disease becomes incurable. In building our algorithms we were working with convolutional neural networks (CNNs), which can automatically discriminate image features.
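As a rough illustration of the workflow described above, the sketch below shows what a small CNN classifier for already-segmented, labeled lung patches might look like. It is a minimal sketch only: the architecture, patch size, and stand-in tensors are assumptions for illustration, not the team’s actual model or data pipeline.

```python
# Minimal sketch of a CNN patch classifier for lung-nodule detection.
# Illustrative only: patch size, layer sizes, and the random stand-in
# tensors are assumptions, not the production architecture or data.
import torch
import torch.nn as nn

class NodulePatchCNN(nn.Module):
    """Classifies a preprocessed 64x64 grayscale patch as nodule vs. background."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 1),                                # logit for "nodule present"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Toy training loop over stand-ins for preprocessed pixel data and annotations.
model = NodulePatchCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

patches = torch.randn(8, 1, 64, 64)            # stand-in for segmented lung patches
labels = torch.randint(0, 2, (8, 1)).float()   # stand-in for radiologist labels

for epoch in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(patches), labels)
    loss.backward()
    optimizer.step()
```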

This was an exciting time to work at an early AI start-up. Deploying and testing CNNs was a creative endeavor in 2014, when these algorithms were not in wide application for medical imaging analysis. It was a time in which venture capital investors were smitten with deep learning algorithms and needed only a good team and a good idea to trigger an investment. It was a time in which the horizon of what was algorithmically possible in medicine was at an inflection point, but the practical application and proof of well-performing algorithms were sometimes daunting to demonstrate. Additionally, this excitement came with the price of reducing human/patient complexity to the purity of an all-powerful algorithm that could be generalized across medical contexts.

Ideas and conflicts abounded. They were worked through when we took ‘one-on-one’ walks. We walked along grassy walkways puzzling through whom to bring in as consultants or full-time employees (FTEs). We white-boarded approaches. As I walked with the lead scientist we often considered bringing in oncologists, radiologists, and primary care physicians. Social scientists were seen as “not useful.” The social sciences and ethnographic knowledge were a hindrance to successful algorithm development at this time. Clinical expertise was considered but walled off: clinicians were kept as consultants and advisors, reduced to domain knowledge. In other words, the highly skilled area of radiology could not travel well beyond radiology, but data science could travel and exceed radiology workflows, image interpretation, disease classification, sub-specialties and the human. What we faced during this period was how to formulate problems and how to assemble a diverse team holding the particular forms of knowledge appropriate to the problems we were trying to solve. These were difficult and slippery discussions. No one seemed to have a magic-bullet answer.

When faced with a lack of diversity of knowledge around algorithm development, Jeff Dean, Head of AI at Google, has stated:

“I am personally not worried about an AI apocalypse” but “I am concerned about the lack of diversity in the AI research community and in computer science more generally” (Dean 2016).

Not only was he concerned about encouraging people from different backgrounds to build algorithms, he was also mindful that certain forms of thinking might not get into algorithm development. Experts from the Google Brain Residency program, which would have been a feeder for such diversity recruitment, were composed of “physicists, mathematicians, biologists, neuroscientists, electrical engineers, as well as computer scientists” (Dean 2016). These were largely STEM practitioners. This range of diversity did not include unexpected perspectives. Diversity appeared narrow, bounded.

The fear-embrace of computational forms could be viewed as an epistemological tension between those whose knowledge contributed to algorithm development and those whose experience and knowledge were viewed as consultative. Such consultative knowledge was typically referred to as “domain expertise” and could be reduced to a kind of consulting artifact. As Dean’s statement indicates, domain expertise may not have gone completely unacknowledged; it was called for but not followed through on, as evidenced by Dean’s listing. STEM defined in this way could screen out social scientists, physicians, policy experts, artists, ethicists, community members and patients, to name a few. What got screened out was particular/local knowledge, perceived as not algorithmically scalable across industries. Could ethics scale across industries? Could a patient’s experience of navigating and overcoming a deadly cancer and a fractured healthcare system scale? We are not questioning Google’s idea of inclusiveness or diversity. We are pausing on what/who gets persistently divided up as contributory and what/who gets held up as core knowledge in algorithm development in an organizational context.

When it comes to marginal, diverse or unexpected perspectives not held in high regard in algorithm development, two words come to mind – problem formation. The kinds of problems that get identified and privileged for algorithm development are shaped by the kinds of people who are brought together to identify and attempt to solve those problems. Problem formation is as valuable as problem solving. A red light is as valuable as a green light for a specific algorithm project. When diversity of perspectives, experiences and backgrounds is a slogan-only proposition, acute problems are simply invisible to machine learning innovation or, worse, they are seen as exciting problems with positive social value when they instead carry potential negative societal consequences. Different kinds of problems drive different outside engagement practices.

PROBLEM FORMATION COMING THROUGH THE DOOR

Machine learning problem formation could come through the door from strategic partners and data providers. Problems were not always defined from behind the organizational door among team members. Locating and defining a problem that fit the capabilities of algorithm development was sometimes called “product-market fit,” but in our case it was also an effort to locate large data sets and then allow the problem to emerge from the data. An attractive offering of large data sets could supersede a more sober problem formation process. The team around the table, and their backgrounds and training, often determined which problem got selected, and that selection could set the organization down a developmental road for months and years.

An Example

It was a bright, brisk northern California day, the sun bouncing off the pavement at a nearby corporate square. It couldn’t have been more gorgeous. For months we had been negotiating the terms of a partnership with a large health insurer that promised near-term revenue for us. The health insurer team arrived outside the front of our building and discussed things before collectively announcing themselves and walking in. They greeted us in sharp suits and with strong handshakes. We all entered a meeting room and settled around a glass table with a white-board of diagrams of medical imaging archiving, workflows and model layers. I turned the white-board around, and they began with introductions and then launched into the billions of transactions they do each day. One of their team members, a man in his mid-50s with hair combed straight back who appeared to be a 1970s version of a suburban executive (call him J.), began describing their value in terms of transactional data and the possible extent of a strategic partnership with them: “as you know, we have all the data on the patient journey, how the patient is treated and if they go to X hospital to Y pharmacy, rehabilitation center, pharmacy – you know, we have the whole thing.” He punctuated his brief description with “we have more patient data than we even know.” It was billions of transactions. The project, as they described it, was for us to build algorithms using this massive transactional data set.

“We want to eliminate unnecessary [insurance] audits, get rid of them altogether if we can,” he said.

“You mean a percentage of the audits you conduct,” I responded.

“Yes, they’re a waste of time.”4

Their machine learning lead engineer, a man in his mid-thirties in a plaid shirt, began to lay out some ideas around predictors and unsupervised learning strategies that could work together to assist in this direction. The goal was to identify the probability of medical fraud and reduce unnecessary insurance audits. They were looking for precision in what was known as “fraud detection.”

Algorithm development was to provide a means to rank medical practices in terms of the probability of committing fraud, not in terms of actually committing fraud. On one level such algorithms could save private practices the back-breaking process of unnecessary insurance audits that could cause mountains of paperwork, anxiety, and administrative time. They wanted our algorithm development to share in a moral victory for physicians and physician practices: we would be the good guys, delivering fewer audits, less onerous oversight, fewer unnecessary phone calls and lower legal expenses for the private medical practice. It was a hero’s problem and we could be the heroes to solve it.
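To make the engineer’s proposal concrete, the sketch below shows one way practices might be scored and ranked by a fraud-probability proxy using an unsupervised model. It is only a sketch: the isolation-forest choice, the feature names, and the synthetic data are assumptions for illustration, not what the insurer’s team actually proposed.

```python
# Minimal sketch of ranking practices by an unsupervised "fraud probability"
# proxy derived from transactional billing features. IsolationForest is a
# stand-in model; the features and data are invented for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Stand-in features per practice: claims per patient, average billed amount,
# share of high-complexity codes. Real features would come from the insurer's
# transactional data set.
X = rng.normal(loc=[10, 250, 0.2], scale=[2, 50, 0.05], size=(1000, 3))

model = IsolationForest(n_estimators=200, random_state=0).fit(X)

# score_samples: lower = more anomalous. Negate so higher = more "suspicious".
suspicion = -model.score_samples(X)
ranked = np.argsort(suspicion)[::-1]   # practices ranked most- to least-suspicious

# The same ranking can cut both ways: it can justify auditing fewer practices
# or become the engine of many more audits.
top_candidates = ranked[:20]
```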

To get the problem in focus, I paraphrased it for our meeting to make sure I understood: “You want us to help you build predictive models based on transactional and temporal data that would save millions of dollars of wasted effort in wrongful audits of potentially thousands of private practices.”

“Not wrongful, but yes, to help save millions in unnecessary audits,” he said. They would invest and provide all the data we needed.

Around the table I could feel a kind of moral victory flag being raised. But something was not quite right.

I asked if their mission of reducing audits was all they were intending with our algorithms. He said it was “hard to determine future uses” and technology was “moving so fast” but this was absolutely their “main use case.”

“If we are building algorithms to assist you in identifying less unnecessary audits, couldn’t these same algorithms be repurposed to help you identify more audit opportunities? You want us to help you reduce audits and I understand this, but couldn’t you just as easily increase your audits with our technology? Couldn’t you become an audit powerhouse in some way?”

“That would not be good for business,” he said, and then defensively mentioned, “we’re looking for the right team, it’s a great opportunity.” I felt our CEO shift in his chair. We needed the data.

Then J. said something more interesting. “In reality,” he said, “we rarely commit unnecessary audits; our approach to audits, even when they do occur, spans years.” He took a drink of water as if to refuel and then said, “for example, we never audit a practice twice in the same year and most practices never get audited at all.”

My own experience was different. Coming from a private surgical practice background, I had been part of exactly such unnecessary insurance audits and seen their impact on staff. These effects were fresh in my mind. In fact, my practice was audited twice in the same year by this same insurer, with no upcoding or wrongdoing of any kind found. I was the one who had actually experienced this back-breaking administrative work first hand: late hours, certified letters, operation reports and patient records reassembled daily based on changing requests from the insurance auditing department. How could he have known that I, as a data scientist, could possibly have experienced this same insurance auditing process? I kindly responded.

“I think that may be inaccurate, my practice was audited twice by you last year and we came away with [our] claims in order.”

He quickly shot back. “That’s very unusual, we don’t conduct audits this way. The name of your practice?”

The point was that they did indeed operate this way and audited more than once in the same year and unnecessarily. From their perspective such algorithm development was a win-win saving time and money for their insurance company and saving human distress and labor for private medical practices across the country. However, “future uses” were to be determined.

Digging deeper, algorithms that were designed to decrease audits could be weaponized to increase audits across millions of medical practices. Physicians could quickly learn to avoid complex patients who might present a risk of a billing error, or they might experience another consequence of the ongoing threat of audit and bail out of independent practice altogether, becoming employees of the local hospital and leaving billing responsibility and legal exposure to the hospital. The ongoing threat of medical audit could reshape networks of private medical practice ownership. No doubt medical fraud has been a key challenge in U.S. healthcare, with the Justice Department in July 2018 announcing $1.3 billion in fraudulent claims across doctors and treatment facilities. A large number of these were for over-prescription of opioids and false billings. On the other hand, delinquency aside, the impact of unnecessary and aggressive insurance audits known as “fraud detection” could collapse a resource-strapped medical practice, drive physicians from owning their own practices, encourage consolidation of medical practices by large health systems and breed a culture of reimbursement fear. There was irony in an early stage machine learning organization being asked to take on a project that could accelerate the destruction of the very organizational ethos it held so dear: entrepreneurism.

In the current climate of insurance audits it was not the evidence of fraud but the evidence of a mistake that could trigger an audit. A single mistake could trigger a multi-year audit over hundreds or thousands of patient encounters. In my previous experience a misspelled word in an operation report could trigger a process that would lead to a claw-back of hundreds of thousands of dollars. It could cause crushing legal fees, employee burnout and an ever-present anxiety about the next audit always around the corner. Just the threat of audits could be the quickest means of driving out independent ownership of medical practices and the quickest way of controlling (reducing) the complexity of patients that a physician accepts. The more complex the patient, the greater the likelihood of a mistake, no matter how small or administratively mundane.

In our meeting the question was who or what would set the criteria for such fraud probabilities. Would criteria be set by this specific insurer? By the insurance industry, by Medicare/Medicaid or by those holding federal office? Features could be identified, built into algorithms and customized over time. What defined an outlier billing event could shift, and such outliers could be categorized as suspicious. Such “suspicious” billing events could tilt towards criminalizing practices.
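The shifting criteria at issue here can be illustrated with a small sketch: the same underlying scores flag very different numbers of practices depending on where the cut-off is set, and that cut-off is a policy choice rather than a property of the algorithm. The scores and thresholds below are invented for illustration.

```python
# Illustrative only: invented fraud-probability scores for 10,000 practices,
# flagged at progressively looser thresholds. Who counts as "suspicious"
# depends entirely on where the criterion is drawn.
import numpy as np

rng = np.random.default_rng(1)
suspicion_scores = rng.beta(2, 8, size=10_000)  # stand-in scores between 0 and 1

for threshold in (0.5, 0.3, 0.15):
    flagged = int((suspicion_scores > threshold).sum())
    print(f"threshold={threshold:.2f} -> {flagged} practices flagged for audit")
```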

Thus, problem formation coming through the door could hold both human benefit and harm. We discovered the negative impact of fraud detection at this time not because we were skeptical of health insurance companies but because specific experience was at hand that gave dimensionality to the problem they offered and scope to its downstream cascading possibilities. This expertise was domain specific but was central in making a decision for the entire organization. Herein, however, lay a problem regarding domain expertise, which most typically was disregarded in machine learning start-ups.
