
Noisy Decision-Making, Bias, and Algorithms: Can A.I. Save the Universe?

[Scatterplot: the relationship between wealth and health, with an upward-trending fitted line and individual cases scattered around it]

The development of Two Minds Theory was strongly influenced by Nobel laureate Daniel Kahneman's work on how people think. Kahneman received the Nobel prize in economics for Prospect Theory, an account of how people make decisions when they have limited information available to them. After that award, it became common to see lists of cognitive biases, which are common errors in the way that people process information. (I have argued that many of these can be more parsimoniously explained by immediacy, the difference between how people think when they are asked about their own experiences right now versus how they think when asked for predictions of the future or recollections of the past.) Kahneman might not disagree too much: He is probably best known for his book Thinking, Fast and Slow, where he described the difference between "System 1" (what I call the Intuitive mind) and "System 2" (what I call the Narrative mind). Kahneman's picture of Narrative thinking as "slow" is foundational to the idea that Narrative thoughts can't control behavior -- they happen later in the cycle of responses, after an Intuitive-level behavioral response has already occurred.

One of Kahneman's central contentions in Thinking, Fast and Slow is that the Intuitive mind almost always produces less helpful responses than logical, Narrative-level thinking. On that point we disagree: I have argued that Intuitive-level responses are often superior to Narrative-level ones in situations where practice and expertise are required, or when Narratives lead us into rigid opposing positions. An example given in my original paper on Two Minds Theory is the role of Intuitive-level thought in playing golf or in the strategy of chess masters: practice and experience lead to a level of thought that can select the best strategy entirely outside of reason or language. I don't think that Kahneman denies this, only that he finds this kind of decision-making relatively uninteresting. His contention is that in higher-stakes decision-making, people would almost always be better off if we let our rational-but-lazy Narrative minds do most of the thinking.

In a new book titled Noise, Kahneman and colleagues put that concept to the test. His coauthors are a business professor, Olivier Sibony, and a lawyer, Cass Sunstein, who co-wrote the book Nudge with the behavioral economist Richard Thaler, another Nobel prize winner. This time Kahneman and colleagues set their sights on a different kind of decision-making failure: statistical variability, or "noise." Noise is illustrated in the scatterplot at the top of this post, which shows the well-documented relationship between health and wealth. On average, the upward-trending line through the middle of the figure shows that people who have more money also tend to be in better health. But in any specific case, a wealthier person might be in worse health than a poorer one, or a healthier person might have less money than a sicker one, because the dots representing individual cases don't all fall exactly on the line. Statistically this is called "scatter" or "error," and it shows the degree to which a model of reality based on some set of variables fails to explain all of the differences seen in real life.
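To make the statistical idea concrete, here is a minimal sketch in Python (using made-up numbers, not the actual survey data behind the figure): it fits a straight line through a cloud of wealth-and-health points and then measures how far the individual dots fall from that line. The part the line cannot account for is the scatter, or noise.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data, invented for illustration: wealth (arbitrary units)
# and a health score. The underlying relationship slopes upward, but each
# person also has idiosyncratic variation the line cannot explain.
wealth = rng.uniform(0, 100, size=500)
health = 50 + 0.3 * wealth + rng.normal(0, 10, size=500)

# Fit a straight line (ordinary least squares) through the scatterplot.
slope, intercept = np.polyfit(wealth, health, deg=1)
predicted = intercept + slope * wealth

# Residuals: how far each dot falls above or below the line.
residuals = health - predicted
explained = 1 - residuals.var() / health.var()  # R-squared

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"share of variation explained by wealth: {explained:.0%}")
print(f"typical scatter around the line: +/- {residuals.std():.1f} points")
```

On average the line slopes upward, but any individual dot can sit well above or below it; that leftover vertical distance is the kind of unexplained variability the book is concerned with.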

Here is how Kahneman and colleagues explain the difference between their previous work on biases and heuristics and their newer work on noise:

Cognitive biases and other emotional or motivated distortions of thinking are often used as explanations for poor judgments. Analysts invoke overconfidence, anchoring, loss aversion, availability bias, or other biases to explain decisions that turned out badly. Such bias-based explanations are satisfying, because the human mind craves causal explanations. Whenever something goes wrong, we look for a cause -- and often find it. In many cases, the cause will appear to be a bias.

Bias has a kind of explanatory charisma, which noise lacks. If we try to explain, in hindsight, why a particular decision was wrong, we will easily find bias and never find noise. Only a statistical view of the world enables us to see noise, but that view does not come naturally -- we prefer causal stories. The absence of statistical thinking from our intuitions is one reason that noise receives so much less attention than bias does. 

Another reason is that professionals seldom see a need to confront noise in their own judgments and in those of their colleagues. After a period of training, professionals often make judgments on their own. Fingerprint experts, experienced underwriters, and veteran patent officers rarely take time to imagine how colleagues might disagree with them -- and they spend even less time imagining how they might disagree with themselves. (p. 369, emphasis in original)

Besides fingerprint analysis and insurance underwriting, some of Kahneman et al.'s examples in Noise are the discrepancies among medical professionals making a diagnosis, or the differences in sentences handed down by Federal judges when defendants are convicted of the same crime. Some of these examples are vulnerable to bias as well, as in the well-documented tendencies for Black defendants to get harsher sentences or for minority patients to receive less preventive screening in medical settings. But the argument here is that they are also affected by noise, such as a tendency for physicians to order fewer medical tests for the patients they see right before lunch than for patients seen first thing in the morning. Another example is different customers receiving different levels of service from a computer company for a malfunctioning laptop, based solely on which service representative worked with them and how that representative was feeling that day. That type of variability is what Kahneman and colleagues want to eliminate.

The most clear-cut way to reduce statistical noise is to replace human judgments with hard-and-fast rules, which can often be expressed in a computer algorithm. Kahneman et al. cite the work of another psychologist whose work I have long admired, Paul Meehl. Working at the University of Minnesota from the 1950s onward, Meehl conducted a series of studies showing that simple rules often produced more accurate decisions than expert judgments, in situations that included predictions about a student's academic performance or the prognosis for a patient with a mental health condition. A key feature of these studies was that there was a right answer -- some future state that was not yet known but was ultimately knowable with the passage of time. In that context, experts did worse than algorithms. Meehl then asked the experts to explain their reasoning, and built new algorithms that incorporated those rules. The new algorithms worked even better than the old ones, but the experts still couldn't beat them. Meehl and colleagues concluded that experts often violate their own rules of thumb, and that when they do they are usually wrong. This line of research is often taken as evidence that we should make more decisions based on algorithms and leave less room for human judgment. But Kahneman et al. point out that

Meehl himself was ambivalent about his findings. Because his name is associated with the superiority of statistics over clinical judgment, we might imagine him as a relentless critic of human insight, or as the godfather of quants, as we would say today. But that would be a caricature. Meehl, in addition to his academic career, was a practicing psychoanalyst. A picture of Freud hung in his office. He was a polymath who taught classes not just in psychology but also in philosophy and law and who wrote about metaphysics, religion, political science, and even parapsychology. (He insisted that there is "something to telepathy.") None of these characteristics fit the stereotype of a hard-nosed numbers guy. Meehl had no ill will toward clinicians -- far from it. But as he put it, the evidence for the advantage of the mechanical approach to combining inputs was "massive and consistent." (p. 116)

Kahneman et al. suggest that we should at the very least provide experts with the output of the best-available decision-making algorithms, as one input to their decision-making process. Sometimes the best-available algorithm might just be a simple base rate, such as the percentage of students with a GPA of X who go on to be successful in graduate program Y. In other cases a model with several variables might be helpful, although with standard regression-based approaches, models with relatively few variables are almost always the most predictive. Kahneman et al. make an exception to that rule, however, for modern artificial intelligence (AI) approaches based on machine learning. The key to this approach is to train a computer using a large data set where the answer to a question is already known: for example, "identify the cases in this group of defendants where the person jumped bail while waiting for trial." If we already know who violated the terms of their bail agreement, and can give the computer many examples of people who did versus didn't, along with a wealth of demographic and legal information about each of those examples, then the computer can learn to predict which individuals in the future are likely to be a flight risk. A parole officer or a judge could be given that information in advance to help them make better decisions, or (in what many of us would view as a future dystopia) the machine could be empowered to make those decisions with no human input at all.
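As a rough sketch of what "training on cases where the answer is already known" looks like in practice, here is a toy example using a simple logistic-regression classifier (the variables and numbers are invented for illustration; they are not from Noise or from any real court system). The model is fit on past cases with known outcomes and then produces a risk estimate for new cases, which a judge or parole officer could treat as one input among many.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical historical records, purely synthetic for illustration:
# prior failures to appear, days employed in the past year, age at arrest.
X = np.column_stack([
    rng.poisson(0.5, n),       # prior failures to appear
    rng.integers(0, 365, n),   # days employed in the past year
    rng.integers(18, 70, n),   # age at arrest
])
# Known outcome for each past case: True = failed to appear for trial.
logits = 0.9 * X[:, 0] - 0.004 * X[:, 1] - 0.02 * X[:, 2] + 0.5
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the classifier on cases where the answer is already known...
model = LogisticRegression().fit(X_train, y_train)

# ...then estimate risk for new, unseen cases.
risk = model.predict_proba(X_test)[:, 1]
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
print("example risk estimates for five new cases:", np.round(risk[:5], 2))
```

The point of the sketch is only the workflow: known outcomes in, a numeric risk score out, with the human decision-maker (or not) somewhere downstream.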

Why does the "tyranny of the algorithm" seem so awful to us, if Kahneman is right that this will reduce the total number of errors in the world? He's right that algorithms reduce noise -- if a rule is perfectly applied in every instance, with no opportunity for human judgment, then the level of noise will automatically go to zero. But one worry is that noise will be reduced at the expense of increasing bias. Here's an example from George Zarkadakis's book Cyber Republic:

Algorithms that feed on user citations tend to pose a major threat to the human rights of marginalized groups. Black teenage boys, Noble notes, are to be found [in common web search engine results] next to criminal background check products. Gender equity is also affected. The word "professor" returns almost exclusively white males, as does the word "CEO." Because of that, Google's online advertising, which feeds from search results, shows high-income jobs more often to men rather than to women, thus perpetuating the gender imbalance. Bias in data reinforces social inequalities and prejudices. (p. 41)
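The trade-off between noise and bias can be seen in a toy simulation (again with invented numbers): many human judges score the same cases with their own idiosyncratic leanings, producing lots of case-by-case variability but little systematic error, while a single fixed rule eliminates the variability entirely -- and, if the rule happens to be miscalibrated for one group, gives every member of that group the same systematic error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cases, n_judges = 1000, 50

# Each case has a "true" appropriate score; half the cases belong to group B.
true_score = rng.normal(50, 10, n_cases)
group_b = np.arange(n_cases) < n_cases // 2

# Human judgments: unbiased on average but noisy -- each judge has an
# idiosyncratic offset, plus occasion-to-occasion wobble on every case.
judge_offset = rng.normal(0, 5, n_judges)
human = true_score[:, None] + judge_offset[None, :] + rng.normal(0, 5, (n_cases, n_judges))

# Algorithmic judgments: one fixed rule, identical every time it is applied,
# but (hypothetically) miscalibrated by -8 points for cases in group B.
algorithm = true_score + np.where(group_b, -8.0, 0.0)

# Noise: how much the score for the same case varies depending on who judged it.
print("noise, humans:   ", round(human.std(axis=1).mean(), 1))
print("noise, algorithm: 0.0  (the rule never varies)")

# Bias: average error for group B cases.
human_bias = (human[group_b] - true_score[group_b, None]).mean()
algo_bias = (algorithm[group_b] - true_score[group_b]).mean()
print("bias against group B, humans:   ", round(human_bias, 1))
print("bias against group B, algorithm:", round(algo_bias, 1))
```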

The solution to this problem, Kahneman says, is simply to improve the algorithm. But modern AI algorithms are a black box: They cannot explain their thinking, only recognize patterns. Sometimes this is valuable -- in one recent study, an algorithm was able to accurately identify cases of age-related macular degeneration based on a picture of the eye, in some instances catching cases that qualified ophthalmologists had missed. The machine-learning algorithm had discovered subtle deviations from the normal picture of an eye that the ophthalmologists weren't aware of. Similar findings have been observed in recent studies of knee arthritis. Dr. Bapu Jena's podcast Freakonomics MD even describes the outputs of these algorithms as the discovery of "new medical knowledge." Yet it is often unexplainable knowledge, information without a clear rationale that makes sense to our story-seeking Narrative minds.

Zarkadakis suggests that this second weakness of AI is even more problematic than the first: "a weakness of deep neural networks is that they cannot explain the reason for their outputs. A combination of data bias and algorithmic inexplicability can be highly problematic when AI systems have an impact on citizens' lives. From a classic liberal perspective, it is politically intolerable as it alienates citizens from the state and transforms the latter into an authoritarian and oppressive machine" (pp. 41-42). The European Union's new rules about AI (which apply only to certain types of high-stakes decision-making) require not only that an algorithm produce unbiased results, but also that it be able to explain its thinking -- in other words, to specify which variables, in what combination, produced a given result. This "right to explanation" runs contrary to the current practice of machine learning, which relies on pattern recognition rather than hard-and-fast rules, and the new rules may simply have the effect of outlawing certain uses of AI in Europe. Opponents of the rules counter that human decisions are often similarly unexplainable, because of their reliance on "gut feelings" and other Intuitive-mind elements.
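For contrast, here is a minimal sketch of what a variable-level explanation can look like when the model is simple enough to give one (the graduate-admissions variables and data here are invented for illustration, and this is not how the EU rules formally define an adequate explanation). With a linear model, the fitted weights say which variables pushed the prediction and how hard; a deep network trained by pattern recognition offers no comparably compact summary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1000

# Hypothetical admissions data (standardized): undergraduate GPA, test score,
# and years of research experience for past applicants, plus whether each
# applicant ultimately succeeded in the program.
features = ["GPA", "test_score", "research_years"]
X = rng.normal(0, 1, (n, 3))
true_logit = 1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.4 * X[:, 2]
succeeded = rng.random(n) < 1 / (1 + np.exp(-true_logit))

model = LogisticRegression().fit(X, succeeded)

# One possible variable-level "explanation": the weight on each input.
for name, weight in zip(features, model.coef_[0]):
    print(f"{name:>15}: weight {weight:+.2f}")

# For a single applicant, each variable's contribution to the predicted log-odds.
applicant = X[0]
for name, weight, value in zip(features, model.coef_[0], applicant):
    print(f"{name:>15}: contributes {weight * value:+.2f}")
```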

Kahneman et al. skirt these difficulties by focusing on the benefits of noise reduction, although they do acknowledge the following challenges:

Noise may be too costly to reduce: a high school could eliminate noise in grading by having five teachers read each and every paper, but that burden is hardly justified. Some noise may be inevitable in practice, a necessary side effect of a system that gives each case individualized consideration, that does not treat people like cogs in a machine, and that grants decision makers a sense of agency. Some noise may even be desirable, if the variation it creates enables a system to adapt over time -- as when noise reflects changing values and goals and triggers a debate that leads to change in practice or in the law. Perhaps most importantly, noise-reduction strategies may have unacceptable downsides. ... Algorithms may produce stupid mistakes that a human would never make. ... They may be biased by poor design or by training on inadequate data. Their facelessness may inspire distrust. Decision hygiene practices also have their downsides: if poorly managed, they risk bureaucratizing decisions and demoralizing professionals who feel their autonomy is being undermined. (p. 375)

These challenges notwithstanding, Kahneman et al. continue to express confidence that rules-based Narrative decision-making is better than Intuitive-level decision-making, in terms of reducing noise as well as bias. This conclusion is perfectly in line with Kahneman's prior work, and it continues to privilege reason and logical narratives over hunches and intuitions. Yet the very best example of algorithm-based decision-making, modern AI, essentially involves the use of non-rational hunches, supported by no narrative at all, to make decisions.

It seems indisputable that algorithms do better than experts in certain limited circumstances, involving the accuracy of predictions about observable future states. Yet there are many areas of decision-making in which we cannot use this paradigm, because any action we take will conceivably affect that future state. In other words, the accuracy of an algorithm requires us to do nothing other than observe the result. Once we use that information in any way, we have changed the future state. And across all areas of science we are far better at predicting the future than we are at controlling it, so our intervention has the potential, or even the likelihood, of producing unintended consequences. That might include increasing bias as a consequence of reducing noise. Kahneman's work might be taken as a prescription for more use of AI. I'm concerned that this would exclude valid knowledge that comes from the Intuitive mind, limit some human variability that is actually useful, and replace inexplicable human intuitions with inexplicable machine intuitions that are untempered by empathy or moral values.
