A.I. Among Us: Agency in a World of Cameras and Recognition Systems


Cindy's Toddler Monitoring (China)

Cindy is raising her two toddlers in Shanghai with the help of two nannies, her in-laws, a cook, and seven in-home surveillance cameras. Cameras in almost every room are used to monitor activities and behaviors, to understand when a routine is broken, to look for lost items or to trace the root cause of a dispute.

Cindy operates a centralized system in which her children are the assets and she is the processing hub. All the analytics run through Cindy, who uses the cameras to collect data, monitor and investigate activities, and shape the behaviors of the other actors responsible for her children's care. In one incident described during our fieldwork:

Cindy goes home to find her son and nanny napping earlier than the established schedule. Cindy reviews the camera footage to understand what transpired and sees her mother-in-law fighting with the nanny, who then retreats to the bedroom with her son. Cindy understands the context for the earlier nap time and reprimands her mother-in-law via WeChat text. When the nap is over, Cindy instructs the nanny in person about mother-in-law best practices.

In Cindy’s system, the data inputs may be distributed, but analytics and decision-making are centralized. Her system’s performance requires a particular set of members (nannies, parents, in-laws) to align to a particular set of values and practices (regarding food, hygiene, sleep, play) that demonstrate her version of good parenting. Cindy taps her system of cameras to access data and make sense of the actions and events that do and do not follow protocol. This constantly updated contextual insight allows Cindy to intervene and correct the behavior of the other human actors as needed to maintain optimal performance.

St. Nicholas School Safety (USA)

A similar situation unfolds at St. Nicholas of Myra, a private Catholic Pre-K to 8th grade school in a gentrifying urban neighborhood. The principal at St. Nicholas of Myra has recently deployed a facial recognition system made up of humans, multiple cameras, and computer technology. The cameras are used to monitor who comes in and out of the school and "to know the community better." Unlike either of the HS systems in China, the system at St. Nicholas of Myra identifies only adults, not students or anyone under eighteen. The principal and receptionist see a face and a name on the facial recognition monitors for almost every adult, including the milk delivery person and the food staff. This allows them to make sure the right people have access to the school, and to identify and greet everyone by name, which they feel fosters a sense of community. The principal sees his role as making sure the kids are "safe, happy, healthy and holy," and feels the facial recognition program helps him achieve those goals.

Ways of Watching

Of course, the staff at HS X, Cindy, and the Catholic school principal actively manage how people act and exert power in their respective systems, a fact that is not dependent on the presence of cameras. They do so in the name of particular kinds of human value, but there are key differences in how that value is produced because cameras are present. In Cindy's case, value lies in her ability to care for her children the way that she wants through the resources she has enlisted (nannies, in-laws, etc.). For Cindy, value is achieved by restricting the capacity of her nannies and in-laws to act independently of her parenting plans and goals, and by introducing the capacity of the camera to document what has taken place. In doing so, Cindy uses the camera as a means of witnessing, producing evidence that she employs to ends of her own choosing. Indeed, the camera data gives Cindy another partial view of what took place, one that is neither the nanny's nor her in-laws'. Cindy's understanding, enabled by the camera, allows her to shape the human links between herself and her nannies and between the nannies and her in-laws ("mother-in-law best practices"). This human work doesn't disappear; rather, the presence of a camera enables it and gives Cindy more direct control over it. Conflicts may be deviations from the plan, but they also give Cindy the opportunity to work on stitching together the human relationships that are central to the system.

In the St. Nicholas of Myra case, monitoring access and movement in the school increases social connectedness and an overall sense of community, but it does not prevent all bad things from happening. If an unknown person, or a person marked in the system as a "concern" (entered manually by the principal), tries to enter the school, the door will not open unless the receptionist or principal unlocks it. For instance, a parent struggling with substance abuse who is not currently allowed to see his kids will be blocked by the system from entering the school. Here the opportunities for mistakes or misuse are rife, but trust is placed in the principal to make these decisions, extending his capacities to act while still allowing him to retain authority over the system.
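
A minimal sketch of this gating rule may help make the division of agency concrete. The article does not describe the vendor's implementation, so the names, data structures, and function below are hypothetical; only the logic (recognized, unflagged adults enter automatically, while unknown or flagged people wait for a staff decision) comes from the fieldwork description.

```python
from typing import Optional

# Hypothetical rosters; the "concern" list is entered manually by the principal.
KNOWN_ADULTS = {"milk_delivery_person", "food_staff_member", "volunteer_parent"}
CONCERN_LIST = {"parent_barred_from_contact"}

def door_opens(recognized_id: Optional[str], staff_override: bool = False) -> bool:
    """Return True if the door should unlock for this entry attempt."""
    if staff_override:                    # receptionist or principal unlocks manually
        return True
    if recognized_id is None:             # face not recognized by the system
        return False
    if recognized_id in CONCERN_LIST:     # person flagged as a "concern": hold at the door
        return False
    return recognized_id in KNOWN_ADULTS  # known, unflagged adult: open automatically

# Example: the barred parent is held at the door unless staff decide to let him in.
print(door_opens("parent_barred_from_contact"))                       # False
print(door_opens("parent_barred_from_contact", staff_override=True))  # True
```

The point of the sketch is that the automated rule only ever withholds entry; granting it to an exception remains a human decision.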

In China's HS X, school administrators guard against disruption to the learning environment from both inside and outside the school. The disruption can be at the individual or the community level. Anyone not granted access is blocked, just as in the St. Nicholas of Myra system. But this system is more proactive in monitoring internal activities. Kids skipping classes, roughhousing, and regular visitors going places they aren't authorized to be are all behaviors that can lead to a decision to act. Previously, if one of the same people noticed an irregularity, they would also act. This resembled the system at St. Nicholas of Myra, where the principal or receptionist using the camera monitoring system can spot kids hanging out under a main staircase in the school, a place they shouldn't be during school hours. One key difference is that the camera system brings the situation to the immediate attention of security, or others on the system, so action can happen sooner. The other key difference is the ability to pull together a series of incidents over time, to create a narrative of what took place.

Sam, a student at HS X, was known by the system of technology, security, IT, and administration to skip class occasionally. He would go out to a remote (unmonitored) part of the garden area on campus, smoke, read books, and work on his homework until the class session ended. Staff knew he did this because they could see him out of class and entering the garden on video, and security people learned about the smoking. None of that was acceptable behavior generally, but because Sam was one of the top students in his class and did nothing that would hurt or infringe upon his classmates, it was permitted. The school officials were willing to assume that Sam just had days when he needed to get away. The principal at St. Nicholas of Myra made similar kinds of decisions when he spotted kids hanging out under the stairs: is this just a kid trying to disappear in the midst of a bad day, or are kids engaged in improper or destructive behavior? In both cases, humans continue to own the judgment about the importance of the behavior. Based on a calculation of value, they are willing to interpret and to read between the proverbial lines to explain the student's behavior beyond what policy permits. Staff or teachers can then speak to the students about their behaviors, and so create new paths for human-to-human interaction. The human work doesn't disappear, but is enabled, managed, and focused by the cameras.

Agency-Denying Systems

Steamed fish today. No chips.

Chinese High School Z had a nutritional system that was powered in part by facial recognition. It was not really "a system" but five independent projects built on top of each other: a cafeteria ordering system, a cafeteria and café payment system, a cafeteria delivery system, and two different vending machine systems. Besides incorporating different applications, there were at least three different pieces of recognition software integral to the system, so even the core underlying programs were not shared. When we visited, all the food a student could acquire on campus was nutritionally logged to generate a recommendation for eating. Based on what the student had eaten, the nutrition was evaluated and scored, and recommendations were sent to the HS administration, the student, and the parents. The student could then determine what, if anything, they might change in what they selected to eat. However, the system did not always work to enable student-led decisions.

Initially, the school ran the system so that a student's cafeteria meal was predetermined, based on that student's optimal nutritional intake. If a student's nutritional intake exceeded the guidelines on one day, the system would compensate and adjust the guideline to be nutritionally appropriate on the following day. A student could order whatever she wanted as long as it fit the guidelines. In practice this meant that students whose nutritional intake was deliberately constrained might be served steamed fish in the cafeteria instead of barbecued pork. These same students might have their access to one of the vending machines blocked. Students whom the system mapped to the need for guidelines had virtually no agency to select their own food, since the system made value judgments and constrained decisions on their behalf.
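
To make the compensating logic legible, here is a minimal sketch of the kind of day-over-day adjustment described. HS Z's actual rules, thresholds, and units were not disclosed to us; the calorie figures, the 70% floor, and the function names are assumptions for illustration only.

```python
DAILY_GUIDELINE_KCAL = 2200.0  # hypothetical per-student daily guideline

def adjusted_guideline(yesterday_intake_kcal: float,
                       baseline_kcal: float = DAILY_GUIDELINE_KCAL) -> float:
    """Lower today's guideline by yesterday's overage (never below 70% of baseline)."""
    overage = max(0.0, yesterday_intake_kcal - baseline_kcal)
    return max(baseline_kcal - overage, 0.7 * baseline_kcal)

def order_allowed(order_kcal: float, eaten_today_kcal: float,
                  yesterday_intake_kcal: float) -> bool:
    """A student may order anything that still fits within today's adjusted guideline."""
    return eaten_today_kcal + order_kcal <= adjusted_guideline(yesterday_intake_kcal)

# Example: after overeating by 600 kcal yesterday and eating 1,200 kcal so far today,
# a 900 kcal barbecued pork plate no longer fits, but 350 kcal of steamed fish does.
print(adjusted_guideline(2800))        # 1600.0
print(order_allowed(900, 1200, 2800))  # False (1200 + 900 > 1600)
print(order_allowed(350, 1200, 2800))  # True  (1200 + 350 <= 1600)
```

Whatever the real formula was, the effect students experienced is captured in the last two lines: the system, not the student, decided which dish was available.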

This food selection and decision-making system for students lasted less than a month. Parents and students both complained fiercely ("after all, we (parents) are paying for the food, so our son should be able to choose what he wants"). Parents suggested to school administrators that the school should have a nutritional system similar to Sesame Credit, one that would offer rewards rather than punishments, so that students could earn points for special foods or credits for the vending machines. HS Z didn't have a way to implement this type of system economically. Today, the system is designed to enable conversations. It provides students with a view of how they are doing nutritionally, for the day and for the week, and of how their behavior, indeed their performance, matches the suggested standards from the government. Parents can encourage their kids to eat correctly. They can have conversations with their kids about the administration's idea of how they should eat, although, in the course of our research, we did not encounter any parents who reported having those conversations with their kids. Finally, the students can use the report as a guide to reflect on food choices.

With respect to the cases that we observed, China's recognition systems do not appear to be bad things. The nutrition system, at least in one case, was redesigned to help bring awareness to choices, actions, and behaviors, awareness that could be used to adjust behavior towards desired goals. These examples show us that recognition systems go wrong when they act alone to deny options to humans, who have their own creativity, ingenuity, and agency to solve problems. The nutrition system as it operates today has been reduced from an active agent that determines what food is consumed to an off-site coach. The lack of malleability or flexibility for the students in the initial system created a brittle partnership that did not get traction with students or parents. Students were not learning new skills. Parents were frustrated by unseemly distinctions. Both sets of stakeholders were constrained by the system, rather than encouraged to work with it. In China, this sort of system failed.

Personalize It!

Students, teachers, administrators, parents, and even the IT people in the schools all talked about the hope that A.I. technology in the schools would increase personalized learning. Squirrel A.I. Learning, a private, A.I.-powered tutoring service in China, had become fairly well known as an after-school program using A.I. to generate personalized drill-and-practice sessions to improve students' scores on national tests. The public schools didn't have a computer per child to replicate that kind of personalized A.I. program. However, they did have cameras in classrooms. One camera set-up was tasked with taking attendance during class, and it worked well. In addition to knowing who was in class, the parent-faculty-IT-admin community thought the camera and A.I. could create a better learning environment by knowing how the students were feeling, and in particular by recognizing when they were "confused," "bored," or "frustrated" in class.3 The IT-admins contacted a company to build an experimental system for them, though this didn't work out satisfactorily. The company said it could deliver an attention system that could tell whether a student was paying attention in class or not. Given that a typical class size is around fifty, this was perceived by the school as a way to ensure each student was engaged with the work (and so going to do their best). It would give the teacher insight into which students he or she was able to engage, or not able to reach. Because the key goals of the system were to 1) help students learn more and 2) improve teacher performance, the system was assumed to cater to all classroom stakeholders. Further, for students and administrators, this would be a means to assure "no teacher bias" in the process of helping the students, or, as Americans might say, no favoritism in how attention is distributed to "teacher's pets."

The company provided the hardware and software. The system had two A.I. components, a facial recognition component and an affect detection component. The facial recognition was tied to the student ID database. The company guaranteed 97% accuracy on affect detection for the specific dimension of attention. The system had one camera mounted at the front of each room that did an S-pattern scan every minute; the system would recognize each face and deliver an "attention" value (yes/no). Nested at the top of a wall, near the camera that took attendance, it was virtually invisible.
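
The vendor's pipeline was a black box to the school, so the following is only a sketch of the minute-by-minute flow as it was described to us: detect faces in each scan, match them against the student ID database, record a yes/no attention label, and roll the labels up into the percentage reports discussed below. The function names, the callables, and the report structure are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SessionReport:
    """Per-student counts of scans and 'attentive' scans across one class session."""
    scans: dict = field(default_factory=dict)
    attentive: dict = field(default_factory=dict)

    def record(self, student_id: str, is_attentive: bool) -> None:
        self.scans[student_id] = self.scans.get(student_id, 0) + 1
        if is_attentive:
            self.attentive[student_id] = self.attentive.get(student_id, 0) + 1

    def attention_percentages(self) -> dict:
        """Percentage of scans in which each student was labeled attentive."""
        return {sid: 100.0 * self.attentive.get(sid, 0) / n
                for sid, n in self.scans.items()}

def run_session(minute_frames, detect_faces, match_student_id, classify_attention):
    """One class session: every minute, detect faces, match each to a student ID,
    and record a binary attention value. All callables are hypothetical stand-ins
    for the vendor's face detector, ID matcher, and affect classifier."""
    report = SessionReport()
    for frame in minute_frames:               # one frame per minute-long S-pattern scan
        for face in detect_faces(frame):
            sid = match_student_id(face)      # lookup against the student ID database
            if sid is not None:
                report.record(sid, classify_attention(face))  # True / False
    return report
```

The sketch also makes the parents' later objection easy to see: everything downstream (reports, comparisons, reprimands) rests on a single binary label per scan.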

The teacher had a live report of class activity (at the bottom of the screen) and an overall report on the class session on his or her computer. The teacher was expected to respond in class, adapting the lesson in order to better engage the students. Students and their parents were sent a report with a percentage assigned to the dimension of "attention" for the class session. The students were supposed to try to improve their overall attention towards the teacher in the next session. The administration also had access to the reports on the class session for both students and teachers.

Parents started to complain within a couple of days about "privacy" violations by the system. At a different school there had been leaks of video footage of classroom activity from one of that school's camera systems. Some of the footage was humorous or embarrassing to some students. Some parents were concerned that video moments when their child was "inattentive" would be captured and "escape" onto the Internet.

The system had other problems working against it. Although no one disputed the facial recognition part, some felt that what the system "thought" and what their child was "actually" doing were at odds. For instance, some parents argued, "My son concentrates with his head down on the desk. He is paying attention, not sleeping," because they feared their child's behavior would be interpreted as inattentive. While verifying a student's identity (matched to photos) was perceived by parents and students to be a straightforward process, determining attention was perceived to be an inference. It was subjective. The affect detection technology may have had high accuracy in some dimensions, but it wasn't accurate in the way the community thought it should be. The school community discovered that it needed a human agent, such as the teacher, to interpret the data and then to take immediate action. The roles in the assemblage needed realignment. The school community learned an important point: A.I. recognition assemblages are all probabilistic, never 100% accurate. They introduce a new kind of interaction with computer infrastructure that isn't about 0/1, right-wrong, or correct-incorrect, because by definition A.I. will always be wrong at some point, in some circumstance. The community's solution was to propose increasing the presence of the human agent in the assemblage to help negotiate value for the teachers and students.

All of these insights resulted in too much complexity to deal with. The affect detection experiment was quickly shut down.

The affect experiment did not work … we learned a lot … we expected too much from the technology and not enough of ourselves … we'll continue to experiment with new ways to help students & teachers in schools … We're exploring a system that can detect actions like reading, writing, raising hands … That might come before the next affect use. – HS Principal

The community came together to shut down this system. The system did not have a life beyond what its constituents enabled it to have. Social forces prevailed. The teachers, administration, parents, and students still believed in A.I. recognition technology and felt it would eventually lead to a better learning environment, a win-win for everyone. The path forward, however, was clearly going to be one of experimentation, enabling more learning in the slow process of people forming new relationships with the technologies. "There may never be a perfect system, but we can do better," said one of the IT people involved in the set-up. The community, however, still had the agency to put a stop to the recognition technologies, as well as to be actively engaged in creating what the next recognition technology should be and do.

Perfectly Imperfect: A.I. Is Human Too

Many of the particular systems we have discussed (eating, attention) have been part of larger systems, for instance as extended means to create better learning environments. One of the systems we explored in the USA was the use of facial recognition by a sheriff's department. What is striking about this context of use is the lack of agency the facial recognition software is granted and, conversely, the ways in which human agency is retained. This might not be surprising were it not for the amount of agency such law enforcement facial recognition applications are believed to have, based on repeated reports about police departments' use of facial recognition leading to bad results (Brewster 2019; Einhorn 2019; Garvie 2019; Stat 2019; White 2019). Facial recognition applications were deemed so bad that San Francisco (Thadani 2019) and Oakland (Ravani 2019) have banned their use by police departments, and Portland, OR (Ellis 2019) is considering a ban.

For the Sheriff's Department of Rock County, facial recognition software is used in a very particular way by one particular department: as a partner in a larger, more distributed crime-solving team. The sheriff and detectives collect video of a crime. In the case highlighted in our research, they collected video of a theft that had occurred at a local store. Sometimes the video comes from neighborhood cameras, other times from other stores' security cameras, and still other times from both. In this case, the footage was from an in-store camera. The sheriff's department's guidelines are very clear: the video does not come from any city or county public cameras, only from private residential or commercial cameras. Often the video from these residential and in-store cameras isn't of good enough quality to be used with the sheriff's department's system.

Once the video is acquired, detectives work with the agency's Special Investigations Unit, using facial recognition software to see if an image of the perpetrator's face from the store's surveillance footage matches an image in the internal database of convicted criminals' mugshots from the county system. An algorithm makes a template of the face, measuring the shapes of features and their relative distances from each other. A database consisting solely of convicted persons' photos from the county is then searched as the source of potential candidates: not photos from the Department of Motor Vehicles, not Facebook, not traffic cameras, and not the myriad streams of closed-circuit TV video from around the city. What's more, facial "landmarks" are compared without reference to race, gender, or ethnicity.
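
For readers unfamiliar with template matching, the following sketch shows the general shape of such a search: reduce a face to a numeric template built from landmark geometry, then rank the county mugshot database by similarity and return a short candidate list. The vendor's actual algorithm was not shared with us; the distance-based template, cosine similarity, and top-five cutoff are assumptions chosen to mirror the description above.

```python
import numpy as np

def template_from_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """Build a face template from landmark coordinates (an N x 2 array):
    the normalized pairwise distances between landmark points."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(landmarks), k=1)   # upper triangle, no diagonal
    vec = dists[iu]
    return vec / (np.linalg.norm(vec) + 1e-9)   # scale-invariant template

def top_candidates(probe: np.ndarray, mugshot_db: dict, k: int = 5) -> list:
    """Rank convicted-persons' mugshot templates by cosine similarity to the
    probe template and return the k best (person_id, score) pairs."""
    scores = {pid: float(np.dot(probe, tmpl)) for pid, tmpl in mugshot_db.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Note that nothing in the output asserts a match; it is only a ranked shortlist, which is exactly the role the next paragraph describes.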

After the software generates a list of possible matches, an investigator assesses their resemblance to the suspect. Typically, there are five hits. Nothing about the accuracy of the hits is visible to the investigators; it is simply a list of five previously convicted individuals who might be a match for the person in the video. The county realizes that the system is not perfectly accurate. Sometimes the team decides none of the mugshots is a correct match. If one or more is selected, a review is conducted by detectives and supervisors, noting similarities and differences. If a person is selected from this list, that person becomes an investigative lead. The identification team will provide only a single lead to the case detective. If they affirm a match, the detective proceeds with further research, pursuing it like any other lead they would get, e.g., an anonymous caller, witnesses at the scene, a 911 call, etc. Notably, no one can be arrested on the basis of the computer match alone. For an arrest to happen, there must be traditional, verifiable evidence of probable cause. As such, the photo match does not count as legal "evidence." The facial recognition system is "just one input among many in our 100% human-driven investigations," said one of the identification team members. His colleague added, "it provides a simple solution to an otherwise-tedious hunt through photos." And while the facial recognition doesn't count as evidence, the investigators see it as at least as reliable a lead as some eyewitness accounts.

Other police departments in the USA have tried to give facial recognition systems more power in the police force, as was the case in Orlando, but they have been shut down (Stat 2019). Raji and Buolamwini (2019) examined commercial facial recognition systems in the USA and highlighted the flaws and inadequacies of the systems, in addition to the fundamental injustices perpetrated by those inaccuracies. The assumption in these understandings of facial recognition systems is that they need to have close to perfect accuracy, operate independently of humans, and have trustworthy value. This sheriff's office is an interesting case in that it assumes the system isn't perfect, just as the sheriff's deputies aren't perfect, and so sets in place a series of procedures to account for [non]human frailties. Technology–human interactions are frequently reduced to being thought of as issues of trust. Trust seems an inaccurate way to describe the role the facial recognition technology is playing. The system has the accountability to discover the suspect, and because the system has many agents in it, this accountability is necessarily shared. The "black boxing" (Crawford and Schultz 2013) of the recognition system, or the investigator, or the detective, or the eyewitness is not crucial, as each is part of a distributed system of action.

FRAMEWORK FOR THINKING ABOUT RECOGNITION SYSTEMS

We have demonstrated a range of uses of A.I. and recognition assemblages. While still new and "cutting edge," these systems seem to us to be rapidly becoming a commodity infrastructure upon which even small businesses will be able to build new applications. Across the research, we identified seven variables that give us a way to start to account for how these assemblages work and when and why they stop working:

Explicit permission. Does the agent give permission to be part of the system, and know it? Is it voluntary? Is the person aware of what is being recognized and why? Or is this hidden and unclear?

Recourse. Is the path to correct any problems clear and reasonable? Recognition is probabilistic, which means at some point it will be wrong. Knowing this, having an actionable course of correction when things are not right is important.

Consistency. Is the system's deployment consistent with the institution's stated business interests?


