Beware of WEIRD Stochastic Parrots

Bodies, Minds, and the Artificial Intelligence Industrial Complex, part four
Also published on Resilience.

A strange new species is getting a lot of press recently. The New Yorker published the poignant and profound illustrated essay “Is My Toddler a Stochastic Parrot?” The Wall Street Journal told us about “‘Stochastic Parrot’: A Name for AI That Sounds a Bit Less Intelligent”. And expert.ai warned of “GPT-3: The Venom-Spitting Stochastic Parrot”.

The American Dialect Society even selected “stochastic parrot” as the AI-Related Word of the Year for 2023.

Yet this species was unknown until March of 2021, when Emily Bender, Timnit Gebru, Angelina McMillan-Major, and (the slightly pseudonymous) Shmargaret Shmitchell published “On the Dangers of Stochastic Parrots.”1

The paper touched a nerve in the AI community, reportedly costing Timnit Gebru and Margaret Mitchell their jobs with Google’s Ethical AI team.2

Just a few days after ChatGPT was released, OpenAI CEO Sam Altman paid snarky tribute to the now-famous phrase by tweeting “i am a stochastic parrot, and so r u.”3

Just what, according to its namers, are the distinctive characteristics of a stochastic parrot? Why should we be wary of this species? Should we be particularly concerned about a dominant sub-species, the WEIRD stochastic parrot? (WEIRD as in: Western, Educated, Industrialized, Rich, Democratic.) We’ll look at those questions for the remainder of this installment.

Haphazardly probable

The first recognized chatbot was 1967’s Eliza, but many of the key technical developments behind today’s chatbots only came together in the last 15 years. The apparent wizardry of today’s Large Language Models rests on a foundation of algorithmic advances, the availability of vast data sets, super-computer clusters employing thousands of the latest Graphics Processing Unit (GPU) chips, and, as discussed in the last post, an international network of poorly paid gig workers providing human input to fill in gaps in the machine learning process. 

By the beginning of this decade, some AI industry figures were arguing that Large Language Models would soon exhibit “human-level intelligence”, could become sentient and conscious, and might even become the dominant new species on the planet.

The authors of the stochastic parrot paper saw things differently:

“Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”4

Let’s start by focusing on two words in that definition: “haphazardly” and “probabilistic”. How do those words apply to the output of ChatGPT or similar Large Language Models?

In a lengthy paper published last year, Stephen Wolfram offers an initial explanation:

“What ChatGPT is always fundamentally trying to do is to produce a ‘reasonable continuation’ of whatever text it’s got so far, where by ‘reasonable’ we mean ‘what one might expect someone to write after seeing what people have written on billions of webpages, etc.’”5

He gives the example of this partial sentence: “The best thing about AI is its ability to”. The Large Language Model will have identified many instances closely matching this phrase, and will have calculated the probability of various words being the next word to follow. The table below lists five of the most likely choices.

The element of probability, then, is clear – but in what way is ChatGPT “haphazard”?

Wolfram explains that if the chatbot always picks the next word with the highest probability, the results will be syntactically correct, sensible, but stilted and boring – and repeated identical prompts will produce repeated identical outputs.

By contrast, if at random intervals the chatbot picks a “next word” that ranks fairly high in probability but is not the highest rank, then more interesting and varied outputs result.
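The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy list of five candidate words with invented probabilities (these are not the figures from Wolfram’s table, and a real model chooses among tens of thousands of tokens). The parameter `temperature` is the usual name for the randomness setting; Wolfram reports that a temperature of about 0.8 seemed to work best for essays. The function names are hypothetical.

```python
import random

# Invented next-word probabilities for the prompt
# "The best thing about AI is its ability to" (illustration only).
CANDIDATES = {"learn": 0.30, "predict": 0.25, "make": 0.20,
              "understand": 0.15, "do": 0.10}


def greedy_pick(probs):
    """Always return the single most probable next word."""
    return max(probs, key=probs.get)


def temperature_pick(probs, temperature, rng):
    """Sample a next word from a reshaped distribution.

    Each probability is raised to the power 1/temperature: values below 1
    sharpen the distribution toward the top word, values above 1 flatten it.
    random.choices normalizes the weights, so they need not sum to 1.
    """
    words = list(probs)
    weights = [probs[w] ** (1.0 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the demo repeats exactly
print([greedy_pick(CANDIDATES) for _ in range(5)])  # ['learn', 'learn', 'learn', 'learn', 'learn']
print([temperature_pick(CANDIDATES, 0.8, rng) for _ in range(5)])  # a mix of words
```

The greedy line prints the same word however often it runs; the sampled line changes whenever the seed changes, which is the “haphazard” element in miniature.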

Here is Wolfram’s sample of an output produced by a strict “pick the next word with the highest rank” rule: 

The above output sounds like the effort of someone who is being careful with each sentence, but with no imagination, no creativity, and no real ability to develop a thought.

With a randomness setting introduced, however, Wolfram illustrates how repeated responses to the same prompt produce a wide variety of more interesting outputs:

The above summary is an over-simplification, of course, and if you want a more in-depth exposition Wolfram’s paper offers a lot of complex detail. But Wolfram’s “next word” explanation concurs with at least part of the stochastic parrot thesis: “an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine ….”

What follows, in Bender and Gebru’s formulation, is equally significant. An LLM, they wrote, strings together words “without any reference to meaning.”

Do LLMs actually understand the meaning of the words, phrases, sentences and paragraphs they have read and which they can produce? To answer that question definitively, we’d need definitive answers to questions like “What is meaning?” and “What does it mean to understand?”

A brain is not a computer, and a computer is not a brain

Over the past fifty years a powerful but deceptive metaphor has become pervasive. We’ve grown accustomed to describing computers by analogy to the human brain, and vice versa. As the saying goes, these models are always wrong even though they are sometimes useful.

“The Computational Metaphor,” wrote Alexis Baria and Keith Cross, “affects how people understand computers and brains, and of more recent importance, influences interactions with AI-labeled technology.”

The concepts embedded in the metaphor, they added, “afford the human mind less complexity than is owed, and the computer more wisdom than is due.”6

The human mind is inseparable from the brain, which is inseparable from the body. However much we might theorize about abstract processes of thought, our thought processes evolved with and are inextricably tangled with bodily realities of hunger, love, fear, satisfaction, suffering, mortality. We learn language as part of experiencing life, and the meanings we share (sometimes incompletely) when we communicate with others depend on shared bodily existence.

Angie Wang put it this way: “A toddler has a life, and learns language to describe it. An L.L.M. learns language, but has no life of its own to describe.”7

In other terms, wrote Bender and Gebru, “languages are systems of signs, i.e. pairings of form and meaning. But the training data for LMs is only form; they do not have access to meaning.”

Though the output of a chatbot may appear meaningful, that meaning exists solely in the mind of the human who reads or hears that output, and not in the artificial mind that stitched the words together. If the AI Industrial Complex deploys “counterfeit people”8 who pass as real people, we shouldn’t expect peace and love and understanding. When a chatbot tries to convince us that it really cares about our faulty new microwave or about the time we are waiting on hold for answers, we should not be fooled.

“WEIRD in, WEIRD out”

There are no generic humans. As it turns out, counterfeit people aren’t generic either.

Large Language Models are created primarily by large corporations, or by university researchers who are funded by large corporations or whose best job prospects are with those corporations. It would be a fluke if the products and services growing out of these LLMs didn’t also favour those corporations.

But the bias problem embedded in chatbots goes deeper. For decades, the people who contribute the most to digitized data sets have been those who have the most access to the internet, who publish the most books, research papers, magazine articles and blog posts – and these people disproportionately live in Western, Educated, Industrialized, Rich, Democratic countries. Even social media users, who provide terabytes of free data for the AI machine, are likely to live in WEIRD places.

We should not be surprised, then, when outputs from chatbots express common biases:

“As people in positions of privilege with respect to a society’s racism, misogyny, ableism, etc., tend to be overrepresented in training data for LMs, this training data thus includes encoded biases, many already recognized as harmful.”9

In 2023 a group of scholars at Harvard University investigated those biases. “Technical reports often compare LLMs’ outputs with ‘human’ performance on various tests,” they wrote. “Here, we ask, ‘Which humans?’”10

“Mainstream research on LLMs,” they added, “ignores the psychological diversity of ‘humans’ around the globe.”

Their strategy was straightforward: prompt OpenAI’s GPT to answer the questions in the World Values Survey (WVS), and then compare the results to the answers that humans around the world gave to the same set of questions. The WVS documents a range of values including but not limited to issues of justice, moral principles, global governance, gender, family, religion, social tolerance, and trust. The team worked with data in the latest WVS surveys, collected from 2017 to 2022.

Recall that GPT does not give identical responses to identical prompts. To ensure that the GPT responses were representative, each of the WVS questions was posed to GPT 1000 times.11
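Why pose each question so many times? A minimal sketch helps, assuming a hypothetical `simulated_gpt_answer` function that stands in for one sampled reply on a 1-to-10 scale with invented weights (this is not the study’s code or data). Averaging many replies gives an estimate that barely moves from one run to the next, while a handful of replies can land almost anywhere.

```python
import random
import statistics

SCALE = range(1, 11)
# Invented answer weights for one survey question on a 1-to-10 scale.
WEIGHTS = [1, 1, 2, 3, 5, 8, 10, 8, 4, 2]


def simulated_gpt_answer(rng):
    """Hypothetical stand-in for one sampled model reply."""
    return rng.choices(SCALE, weights=WEIGHTS, k=1)[0]


def averaged_response(n_trials, seed):
    """Pose the same question n_trials times and return the mean answer."""
    rng = random.Random(seed)
    return statistics.mean(simulated_gpt_answer(rng) for _ in range(n_trials))


def spread_across_runs(n_trials, runs=20):
    """Range (max minus min) of the mean answer over independent repeat runs."""
    means = [averaged_response(n_trials, seed) for seed in range(runs)]
    return max(means) - min(means)


print(f"5 replies per run:    means vary by {spread_across_runs(5):.2f}")
print(f"1000 replies per run: means vary by {spread_across_runs(1000):.2f}")
```

With only five replies per run the average swings by points on the scale; with a thousand it settles to within a fraction of a point, which is what makes a comparison with human survey averages meaningful.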

The comparisons with human answers to the same surveys revealed striking similarities and contrasts. The article states:

“GPT was identified to be closest to the United States and Uruguay, and then to this cluster of cultures: Canada, Northern Ireland, New Zealand, Great Britain, Australia, Andorra, Germany, and the Netherlands. On the other hand, GPT responses were farthest away from cultures such as Ethiopia, Pakistan, and Kyrgyzstan.”

In other words, the GPT responses were similar to those of people in WEIRD societies.

The results are summarized in the graphic below. Countries in which humans gave WVS answers close to GPT’s answers are clustered at top left, while countries whose residents gave answers increasingly at variance with GPT’s answers trend along the line running down to the right.

“Figure 3. The scatterplot and correlation between the magnitude of GPT-human similarity and cultural distance from the United States as a highly WEIRD point of reference.” From Atari et al., “Which Humans?”

The team went on to consider the WVS responses in various categories including styles of analytical thinking, degrees of individualism, and ways of expressing and understanding personal identity. In these and other domains, they wrote, “people from contemporary WEIRD populations are an outlier in terms of their psychology from a global and historical perspective.” Yet the responses from GPT tracked the WEIRD populations rather than global averages.

Anyone who asks GPT a question with hopes of getting an unbiased answer is running a fool’s errand. Because the data sets include a large over-representation of WEIRD inputs, the outputs, for better or worse, will be no less WEIRD.

As Large Language Models are increasingly incorporated into decision-making tools and processes, their WEIRD biases become increasingly significant. By learning primarily from data that encodes viewpoints of dominant sectors of global society, and then expressing those values in decisions, LLMs are likely to further empower the powerful and marginalize the marginalized.

In the next installment we’ll look at the effects of AI and LLMs on employment conditions, now and in the near future.


Notes

1 Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”, Association for Computing Machinery Digital Library, 1 March 2021.

2 John Naughton, “Google might ask questions about AI ethics, but it doesn’t want answers”, The Guardian, 13 March 2021.

3 As quoted in Elizabeth Weil, “You Are Not a Parrot”, New York Magazine, March 1, 2023.

4 Bender, Gebru et al., “On the Dangers of Stochastic Parrots.”

5 Stephen Wolfram, “What Is ChatGPT Doing … and Why Does It Work?”, 14 February 2023.

6 Alexis T. Baria and Keith Cross, “The brain is a computer is a brain: neuroscience’s internal debate and the social significance of the Computational Metaphor”, arXiv, 18 July 2021.

7 Angie Wang, “Is My Toddler a Stochastic Parrot?”, The New Yorker, 15 November 2023.

8 The phrase “counterfeit people” is attributed to philosopher Daniel Dennett, quoted by Elizabeth Weil in “You Are Not a Parrot”, New York Magazine.

9 Bender, Gebru et al., “On the Dangers of Stochastic Parrots.”

10 Mohammed Atari, Mona J. Xue, Peter S. Park, Damián E. Blasi, and Joseph Henrich, “Which Humans?”, arXiv, 22 September 2023.

11 Specifically, the team “ran both GPT 3 and 3.5; they were similar. The paper’s plots are based on 3.5.” Email correspondence with study author Mohammed Atari.


Image at top of post: “The Evolution of Intelligence”, illustration by Bart Hawkins Kreps, posted under CC BY-SA 4.0 DEED license, adapted from “The Yin and Yang of Human Progress”, (Wikimedia Commons), and from parrot illustration courtesy of Judith Kreps Hawkins.

“Warning. Data Inadequate.”

Bodies, Minds, and the Artificial Intelligence Industrial Complex, part three
Also published on Resilience.

“The Navy revealed the embryo of an electronic computer today,” announced a New York Times article, “that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”1

A few paragraphs into the article, “the Navy” was quoted as saying the new “perceptron” would be the first non-living mechanism “capable of receiving, recognizing and identifying its surroundings without any human training or control.”

This example of AI hype wasn’t the first and won’t be the last, but it is a bit dated. To be precise, the Times story was published on July 8, 1958.

Due to its incorporation of a simple “neural network” loosely analogous to the human brain, the perceptron of 1958 is recognized as a forerunner of today’s most successful “artificial intelligence” projects – from facial recognition systems to text extruders like ChatGPT. It’s worth considering this early device in some detail.

In particular, what about the claim that the perceptron could identify its surroundings “without any human training or control”? More than sixty years on, the descendants of the perceptron have “learned” a great deal, and can now identify, describe and even transform millions of images. But that “learning” has involved not only billions of transistors and trillions of watt-hours of electricity, but also millions of hours of labour in “human training and control.”

Seeing is not perceiving

When we look at a real-world object – for example, a tree – sensors in our eyes pass messages through a network of neurons and through various specialized areas of the brain. Eventually, assuming we are old enough to have learned what a tree looks like, and both our eyes and the required parts of our brains are functioning well, we might say “I see a tree.” In short, our eyes see a configuration of light, our neural network processes that input, and the result is that our brains perceive and identify a tree.

Accomplishing this kind of perception with electronic computing, it turns out, is no easy feat.

The perceptron invented by Dr. Frank Rosenblatt in the 1950s used a 20 pixel by 20 pixel image sensor, paired with an IBM 704 computer. Let’s look at some simple images, and how a perceptron might process the data to produce a perception. 

Images created by the author.

In the illustration at left above, what the camera “sees” at the most basic level is a column of pixels that are “on”, with all the other pixels “off”. However, if we train the computer by giving it nothing more than labelled images of the numerals from 0 to 9, the perceptron can recognize the input as matching the numeral “1”. If we then add training data in the form of labelled images of the characters in the Latin-script alphabet in a sans serif font, the perceptron can determine that it matches, equally well, the numeral “1”, the lower-case letter “l”, or an upper-case letter “I”.

The figure at right is considerably more complex. Here our perceptron is still working with a low-resolution grid, but pixels can be not only “on” or “off” – black or white – but various shades of grey. To complicate things further, suppose more training data has been added, in the form of hand-written letters and numerals, plus printed letters and numerals in an oblique sans serif font. The perceptron might now determine the figure is a numeral “1” or a lower-case “l” or upper-case “I”, either hand-written or printed in an oblique font, each with an equal probability. The perceptron is learning how to be an optical character recognition (OCR) system, though to be very good at the task it would need the ability to use context to rank the probabilities of a numeral “1”, a lower-case “l”, or an upper-case “I”.
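The tie between “1”, “l” and “I”, and the way context can break it, can be sketched in a few lines of Python. This swaps in simple template matching rather than a trained perceptron, uses invented 5 by 5 bitmaps, and treats `context_prior` as a hypothetical weighting (for example, favouring digits when the neighbouring characters are digits).

```python
# Invented 5x5 bitmaps ("#" = pixel on). In a plain sans serif font the
# numeral 1, lower-case l and upper-case I can all be a single column.
COLUMN = ["..#..", "..#..", "..#..", "..#..", "..#.."]
TEMPLATES = {
    "1": COLUMN,
    "l": COLUMN,
    "I": COLUMN,
    "7": ["#####", "...#.", "..#..", ".#...", ".#..."],
}


def overlap(a, b):
    """Fraction of pixels on which two bitmaps agree."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


def rank(image, context_prior=None):
    """Score each label by pixel overlap times an optional context weight,
    then normalize the scores so they sum to 1."""
    prior = context_prior or {}
    scores = {label: overlap(image, tmpl) * prior.get(label, 1.0)
              for label, tmpl in TEMPLATES.items()}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


print(rank(COLUMN))              # "1", "l" and "I" tie
print(rank(COLUMN, {"1": 5.0}))  # neighbouring digits favour "1"
```

The pixels alone cannot separate the three candidates; only information from outside the glyph, here the invented context weight, can rank one above the others.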

The possibilities multiply infinitely when we ask the perceptron about real-world objects. In the figure below, a bit of context, in the form of a visual ground, is added to the images. 

Images created by the author.

Depending, again, on the labelled training data already input to the computer, the perceptron may “see” the image at left as a tall tower, a bare tree trunk, or the silhouette of a person against a bright horizon. The perceptron might see, on the right, a leaning tree or a leaning building – perhaps the Leaning Tower of Pisa. With more training images and with added context in the input image – shapes of other buildings, for example – the perceptron might output with high statistical confidence that the figure is actually the Leaning Tower of Leeuwarden.

Today’s perceptrons can and do, with widely varying degrees of accuracy and reliability, identify and name faces in crowds, label the emotions shown by someone in a recorded job interview, analyse images from a surveillance drone and indicate that a person’s activities and surroundings match the “signature” of terrorist operations, or identify a crime scene by comparing an unlabelled image with photos of known settings from around the world. Whether right or wrong, the systems’ perceptions sometimes have critical consequences: people can be monitored, hired, fired, arrested – or executed in an instant by a US Air Force Reaper drone.

As we will discuss below, these capabilities have been developed with the aid of millions of hours of poorly-paid or unpaid human labour.

The Times article of 1958, however, described Dr. Rosenblatt’s invention this way: “the machine would be the first device to think as the human brain. As do human beings, Perceptron will make mistakes at first, but will grow wiser as it gains experience ….” The kernel of truth in that claim lies in the concept of a neural network.

Rosenblatt told the Times reporter “he could explain why the machine learned only in highly technical terms. But he said the computer had undergone a ‘self-induced change in the wiring diagram.’”

I can empathize with that Times reporter. I still hope to find a person sufficiently intelligent to explain the machine learning process so clearly that even a simpleton like me can fully understand. However, New Yorker magazine writers in 1958 made a good attempt. As quoted in Matteo Pasquinelli’s book The Eye of the Master, the authors wrote:

“If a triangle is held up to the perceptron’s eye, the association units connected with the eye pick up the image of the triangle and convey it along a random succession of lines to the response units, where the image is registered. The next time the triangle is held up to the eye, its image will travel along the path already travelled by the earlier image. Significantly, once a particular response has been established, all the connections leading to that response are strengthened, and if a triangle of a different size and shape is held up to the perceptron, its image will be passed along the track that the first triangle took.”2
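The strengthening of connections described in that passage can be sketched with the textbook perceptron learning rule: when the machine guesses wrong, each connection from an active input is nudged toward the right answer. This is a minimal modern reconstruction, not Rosenblatt’s exact procedure (his machine routed signals through randomly wired association units, as the quote says), and the 3 by 3 “left” and “right” shapes are invented training data.

```python
# Invented training data: 3x3 images flattened row by row (1 = pixel on).
LEFT = [[1, 0, 0, 1, 0, 0, 1, 0, 0],    # left column
        [1, 1, 0, 1, 0, 0, 0, 0, 0],    # top-left corner
        [0, 0, 0, 1, 0, 0, 1, 1, 0]]    # bottom-left corner
RIGHT = [[0, 0, 1, 0, 0, 1, 0, 0, 1],   # right column
         [0, 1, 1, 0, 0, 1, 0, 0, 0],   # top-right corner
         [0, 0, 0, 0, 0, 1, 0, 1, 1]]   # bottom-right corner


def predict(weights, bias, pixels):
    """Output 1 ("left") if the weighted sum of inputs exceeds zero."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0


def train(examples, epochs=20, lr=1.0):
    """Perceptron rule: adjust only after a wrong guess; weights for
    pixels that were off are left unchanged."""
    weights, bias = [0.0] * 9, 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # -1, 0 or +1
            weights = [w + lr * error * p for w, p in zip(weights, pixels)]
            bias += lr * error
    return weights, bias


examples = [(img, 1) for img in LEFT] + [(img, 0) for img in RIGHT]
weights, bias = train(examples)
print([predict(weights, bias, img) for img, _ in examples])  # [1, 1, 1, 0, 0, 0]
print(predict(weights, bias, [1, 0, 0, 0, 0, 0, 1, 0, 0]))   # unseen left shape: 1
```

After training, a left-hand shape the machine has never seen is still classified as “left”, loosely echoing the triangle of a different size being passed along the track the first triangle took.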

With hundreds, thousands, millions and eventually billions of steps in the perception process, the computer gets better and better at interpreting visual inputs.

Yet this improvement in machine perception comes at a high ecological cost. A September 2021 article entitled “Deep Learning’s Diminishing Returns” explained:

“[I]n 2012 AlexNet, the model that first showed the power of training deep-learning systems on graphics processing units (GPUs), was trained for five to six days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but it used more than 1,000 times as much computing to achieve this.”

The authors concluded that, “Like the situation that Rosenblatt faced at the dawn of neural networks, deep learning is today becoming constrained by the available computational tools.”3
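A rough back-of-envelope reading of those figures makes “diminishing returns” concrete. This sketch assumes, as a simplification, that error falls as a power of compute (error proportional to compute raised to the power of minus alpha), and fits alpha from just the two numbers quoted above; it is an illustration, not the IEEE Spectrum authors’ own analysis. Because NASNet-A used more than 1,000 times the computing, the alpha computed from exactly 1,000 is an upper bound, so the implied cost of further gains is a lower bound.

```python
import math

# Figures quoted above: roughly 1,000x the computing bought a 2x cut in error.
COMPUTE_RATIO = 1000
ERROR_RATIO = 2

# Assumed power law: error is proportional to compute ** (-alpha), so
# ERROR_RATIO = COMPUTE_RATIO ** alpha.
alpha = math.log(ERROR_RATIO) / math.log(COMPUTE_RATIO)
print(f"alpha is about {alpha:.3f}")


def compute_needed(error_reduction):
    """Multiple of compute needed to divide error by error_reduction."""
    return error_reduction ** (1 / alpha)


for target in (2, 4, 10):
    print(f"{target}x lower error -> about {compute_needed(target):.3g}x compute")
```

Under this assumption, each further halving of error costs another factor of a thousand in computing, which is why the authors see the field becoming constrained by the available hardware.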

The steep increase in the computing demands of AI is illustrated in a graph by Anil Ananthaswamy.

“The Drive to Bigger AI Models” shows that AI models used for language and image generation have grown in size by several orders of magnitude since 2010.  Graphic from “In AI, is Bigger Better?”, by Anil Ananthaswamy, Nature, 9 March 2023.

Behold the Mechanical Turk

In the decades since Rosenblatt built the first perceptron, there were periods when progress in this field seemed stalled. Additional theoretical advances in machine learning, a many orders-of-magnitude increase in computer processing capability, and vast quantities of training data were all prerequisites for today’s headline-making AI systems. In Atlas of AI, Kate Crawford gives a fascinating account of the struggle to acquire that data.

Up to the 1980s, artificial intelligence researchers didn’t have access to large quantities of digitized text or digitized images, and the type of machine learning that makes news today was not yet possible. The lengthy antitrust proceedings against IBM provided an unexpected boost to AI research, in the form of a hundred million digital words from legal proceedings. In the early 2000s, the federal investigation into Enron’s collapse collected more than half a million email messages sent among Enron employees. This provided text exchanges in everyday English, though Crawford notes that the wording “represented the gender, race, and professional skews of those 158 workers.”

And the data floodgates were just beginning to open. As Crawford describes the change,

“The internet, in so many ways, changed everything; it came to be seen in the AI research field as something akin to a natural resource, there for the taking. As more people began to upload their images to websites, to photo-sharing services, and ultimately to social media platforms, the pillaging began in earnest. Suddenly, training sets could reach a size that scientists in the 1980s could never have imagined.”4

It took two decades for that data flood to become a tsunami. Even then, although images were often labelled and classified for free by social media users, the labels and classifications were not always consistent or even correct. There remained a need for humans to look at millions of images and create or check the labels and classifications.

Developers of the image database ImageNet collected 14 million images and eventually organized them into over twenty thousand categories. They initially hired students in the US for labelling work, but concluded that even at $10/hour, this work force would quickly exhaust the budget.

Enter the Mechanical Turk.

The original Mechanical Turk was a chess-playing scam first exhibited in 1770 by a Hungarian inventor. An apparently autonomous mechanical human figure, dressed in the Ottoman fashion of the day, moved chess pieces and could beat most human chess players. Decades went by before it was revealed that a skilled human chess player was concealed inside the machine at each exhibition, controlling all the motions.

In the early 2000s, Amazon developed a web platform by which AI developers, among others, could contract gig workers for many tasks that were ostensibly being done by artificial intelligence. These tasks might include, for example, labelling and classifying photographic images, or making judgements about outputs from AI-powered chat experiments. In a rare fit of honesty, Amazon labelled the process “artificial artificial intelligence”5 and launched its service, Amazon Mechanical Turk, in 2005.

Screen shot taken 3 February 2024 from the opening page at mturk.com.

Crawford writes,

“ImageNet would become, for a time, the world’s largest academic user of Amazon’s Mechanical Turk, deploying an army of piecemeal workers to sort an average of fifty images a minute into thousands of categories.”6

Chloe Xiang described this organization of work for Motherboard in an article entitled “AI Isn’t Artificial or Intelligent”:

“[There is a] large labor force powering AI, doing jobs that include looking through large datasets to label images, filter NSFW content, and annotate objects in images and videos. These tasks, deemed rote and unglamorous for many in-house developers, are often outsourced to gig workers and workers who largely live in South Asia and Africa ….”7

Laura Forlano, Associate Professor of Design at Illinois Institute of Technology, told Xiang “what human labor is compensating for is essentially a lot of gaps in the way that the systems work.”

Xiang concluded,

“Like other global supply chains, the AI pipeline is greatly imbalanced. Developing countries in the Global South are powering the development of AI systems by doing often low-wage beta testing, data annotating and labeling, and content moderation jobs, while countries in the Global North are the centers of power benefiting from this work.”

In a study published in late 2022, Kelle Howson and Hannah Johnston described why “platform capitalism”, as embodied in Mechanical Turk, is an ideal framework for exploitation, given that workers bear nearly all the costs while contractors take no responsibility for working conditions. The platforms are able to enroll workers from many countries in large numbers, so that workers are constantly low-balling to compete for ultra-short-term contracts. Contractors are also able to declare that the work submitted is “unsatisfactory” and therefore will not be paid, knowing the workers have no effective recourse and can be replaced by other workers for the next task. Workers are given an estimated “time to complete” before accepting a task, but if the work turns out to require two or three times as many hours, the workers are still only paid for the hours specified in the initial estimate.8

A survey of 700 cloudwork employees (or “independent contractors” in the fictive lingo of the gig work platforms) found about 34% of the time they spent on these platforms was unpaid. “One key outcome of these manifestations of platform power is pervasive unpaid labour and wage theft in the platform economy,” Howson and Johnston wrote.9 From the standpoint of major AI ventures at the top of the extraction pyramid, pervasive wage theft is not a bug in the system, it is a feature.
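The arithmetic of unpaid time is simple but stark. A minimal sketch, assuming a hypothetical advertised rate of $6.00 per hour (an invented figure) and treating the two effects described above separately: tasks that overrun their paid time estimate, and the roughly 34% of platform time the survey found to be unpaid. The function names are illustrative only.

```python
def rate_after_overrun(nominal_rate, estimated_hours, actual_hours):
    """Pay per hour actually worked when only the estimate is paid."""
    return nominal_rate * estimated_hours / actual_hours


def rate_after_unpaid_time(nominal_rate, unpaid_fraction):
    """Pay per hour on the platform when a fraction of that time earns nothing."""
    return nominal_rate * (1 - unpaid_fraction)


NOMINAL = 6.00  # hypothetical advertised pay, dollars per hour

for factor in (2, 3):
    rate = rate_after_overrun(NOMINAL, estimated_hours=1, actual_hours=factor)
    print(f"task takes {factor}x its estimate: ${rate:.2f} per hour")  # $3.00, then $2.00

print(f"34% of time unpaid: ${rate_after_unpaid_time(NOMINAL, 0.34):.2f} per hour")  # $3.96
```

A task that takes three times its estimate cuts the real wage to a third of the advertised one, before any time spent searching for work or on rejected submissions is counted.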

The apparently dazzling brilliance of AI-model creators and semiconductor engineers gets the headlines in western media. But without low-paid or unpaid work by employees in the Global South, “AI systems won’t function,” Crawford writes. “The technical AI research community relies on cheap, crowd-sourced labor for many tasks that can’t be done by machines.”10

Whether vacuuming up data that has been created by the creative labour of hundreds of millions of people, or relying on tens of thousands of low-paid workers to refine the perception process for reputedly super-intelligent machines, the AI value chain is another example of extractivism.

“AI image and text generation is pure primitive accumulation,” James Bridle writes, “expropriation of labour from the many for the enrichment and advancement of a few Silicon Valley technology companies and their billionaire owners.”11

“All seven emotions”

New AI implementations don’t usually start with a clean slate, Crawford says – they typically borrow classification systems from earlier projects.

“The underlying semantic structure of ImageNet,” Crawford writes, “was imported from WordNet, a database of word classifications first developed at Princeton University’s Cognitive Science Laboratory in 1985 and funded by the U.S. Office of Naval Research.”12

But classification systems are unavoidably political when it comes to slotting people into categories. In the ImageNet groupings of pictures of humans, Crawford says, “we see many assumptions and stereotypes, including race, gender, age, and ability.”

She explains,

“In ImageNet the category ‘human body’ falls under the branch Natural Object → Body → Human Body. Its subcategories include ‘male body,’ ‘person,’ ‘juvenile body,’ ‘adult body,’ and ‘female body.’ The ‘adult body’ category contains the subclasses ‘adult female body’ and ‘adult male body.’ There is an implicit assumption here that only ‘male’ and ‘female’ bodies are recognized as ‘natural.’”13

Readers may have noticed that US military agencies were important funders of some key early AI research: Frank Rosenblatt’s perceptron in the 1950s, and the WordNet classification scheme in the 1980s, were both funded by the US Navy.

For the past six decades, the US Department of Defense has also been interested in systems that might detect and measure the movements of muscles in the human face, and in so doing, identify emotions. Crawford writes, “Once the theory emerged that it is possible to assess internal states by measuring facial movements and the technology was developed to measure them, people willingly adopted the underlying premise. The theory fit what the tools could do.”14

Several major corporations now market services with roots in this military-funded research into machine recognition of human emotion – even though, as many people have insisted, the emotions people express on their faces don’t always match the emotions they are feeling inside.

Affectiva is a corporate venture spun out of the Media Lab at Massachusetts Institute of Technology. On their website they claim “Affectiva created and defined the new technology category of Emotion AI, and evangelized its many uses across industries.” The opening page of affectiva.com spins their mission as “Humanizing Technology with Emotion AI.”

Who might want to contract services for “Emotion AI”? Media companies, perhaps, want to “optimize content and media spend by measuring consumer emotional responses to videos, ads, movies and TV shows – unobtrusively and at scale.” Auto insurance companies, perhaps, might want to keep their (mechanical) eyes on you while you drive: “Using in-cabin cameras our AI can detect the state, emotions, and reactions of drivers and other occupants in the context of a vehicle environment, as well as their activities and the objects they use. Are they distracted, tired, happy, or angry?”

Affectiva’s capabilities, the company says, draw on “the world’s largest emotion database of more than 80,000 ads and more than 14.7 million faces analyzed in 90 countries.”15 As reported by The Guardian, the videos are screened by workers in Cairo, “who watch the footage and translate facial expressions to corresponding emotions.”16

There is a slight problem: there is no clear and generally accepted definition of an emotion, nor general agreement on just how many emotions there might be. But “emotion AI” companies don’t let those quibbles get in the way of business.

Amazon’s Rekognition service announced in 2019 “we have improved accuracy for emotion detection (for all 7 emotions: ‘Happy’, ‘Sad’, ‘Angry’, ‘Surprised’, ‘Disgusted’, ‘Calm’ and ‘Confused’)” – but they were proud to have “added a new emotion: ‘Fear’.”17

Facial- and emotion-recognition systems, with deep roots in military and intelligence agency research, are now widely employed not only by these agencies but also by local police departments. Their use is not confined to governments: they are used in the corporate world for a wide range of purposes. And their production and operation likewise crosses public-private lines; though much of the initial research was government-funded, the commercialization of the technologies today allows corporate interests to sell the resulting services to public and private clients around the world.

What is the likely impact of these AI-aided surveillance tools? Dan McQuillan sees it this way:

“We can confidently say that the overall impact of AI in the world will be gendered and skewed with respect to social class, not only because of biased data but because engines of classification are inseparable from systems of power.”18

In our next installment we’ll see that biases in data sources and classification schemes are reflected in the outputs of the GPT large language model.


Image at top of post: The Senture computer server facility in London, Ky, on July 14, 2011, photo by US Department of Agriculture, public domain, accessed on flickr.

Title credit: the title of this post quotes a lyric of “Data Inadequate”, from the 1998 album Live at Glastonbury by Banco de Gaia.


Notes

1 “New Navy Device Learns By Doing,” New York Times, July 8, 1958, page 25.

2 “Rival”, in The New Yorker, by Harding Mason, D. Stewart, and Brendan Gill, November 28, 1958. Quoted by Matteo Pasquinelli in The Eye of the Master: A Social History of Artificial Intelligence, Verso Books, October 2023, page 137.

3 “Deep Learning’s Diminishing Returns”, by Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso, IEEE Spectrum, 24 September 2021.

4 Crawford, Kate, Atlas of AI, Yale University Press, 2021.

5 This phrase is cited by Elizabeth Stevens and attributed to Jeff Bezos, in “The mechanical Turk: a short history of ‘artificial artificial intelligence’”, Cultural Studies, 08 March 2022.

6 Crawford, Atlas of AI.

7 Chloe Xiang, “AI Isn’t Artificial or Intelligent: How AI innovation is powered by underpaid workers in foreign countries,” Motherboard, 6 December 2022.

8 Kelle Howson and Hannah Johnston, “Unpaid labour and territorial extraction in digital value networks,” Global Networks, 26 October 2022.

9 Howson and Johnston, “Unpaid labour and territorial extraction in digital value networks.”

10 Crawford, Atlas of AI.

11 James Bridle, “The Stupidity of AI”, The Guardian, 16 Mar 2023.

12 Crawford, Atlas of AI.

13 Crawford, Atlas of AI.

14 Crawford, Atlas of AI.

15 Quotes from Affectiva taken from www.affectiva.com on 5 February 2024.

16 Oscar Schwartz, “Don’t look now: why you should be worried about machines reading your emotions,” The Guardian, 6 March 2019.

17 From Amazon Web Services Rekognition website, accessed on 5 February 2024; italics added.

18 Dan McQuillan, “Post-Humanism, Mutual Aid,” in AI for Everyone? Critical Perspectives, University of Westminster Press, 2021.

Artificial Intelligence in the Material World

Bodies, Minds, and the Artificial Intelligence Industrial Complex, part two
Also published on Resilience.

Picture a relatively simple human-machine interaction: I walk two steps, flick a switch on the wall, and a light comes on.

Now picture a more complex interaction. I say, “Alexa, turn on the light” – and, if I’ve trained my voice to match the classifications in the electronic monitoring device and its associated global network, a light comes on.

“In this fleeting moment of interaction,” write Kate Crawford and Vladan Joler, “a vast matrix of capacities is invoked: interlaced chains of resource extraction, human labor and algorithmic processing across networks of mining, logistics, distribution, prediction and optimization.”

“The scale of resources required,” they add, “is many magnitudes greater than the energy and labor it would take a human to … flick a switch.”1

Crawford and Joler wrote these words in 2018, at a time when “intelligent assistants” were recent and rudimentary products of AI. The industry has grown by leaps and bounds since then – and the money invested is matched by the computing resources now devoted to processing and “learning” from data.

In 2021, a much-discussed paper found that “the amount of compute used to train the largest deep learning models (for NLP [natural language processing] and other applications) has increased 300,000x in 6 years, increasing at a far higher pace than Moore’s Law.”2
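The comparison with Moore’s Law can be checked with a little arithmetic. The Python sketch below converts a growth factor over a period into an implied doubling time, assuming smooth exponential growth. The two-year doubling period used for Moore’s Law is the commonly quoted figure, an assumption added here rather than a number from the paper.

```python
import math


def doubling_time_months(growth_factor: float, years: float) -> float:
    """Months per doubling implied by a total growth factor, assuming exponential growth."""
    doublings = math.log2(growth_factor)
    return years * 12 / doublings


def growth_over(years: float, doubling_years: float) -> float:
    """Total growth factor after `years` at a fixed doubling period."""
    return 2 ** (years / doubling_years)


ai_doubling = doubling_time_months(300_000, 6)  # 300,000x in 6 years
moore_growth = growth_over(6, 2)                # assumed two-year doubling

print(f"AI training compute: doubles every {ai_doubling:.1f} months")  # 4.0 months
print(f"Moore's Law over 6 years: about {moore_growth:.0f}x")          # 8x
```

A doubling roughly every four months compounds to about 300,000-fold growth in six years, while a two-year doubling yields only about eightfold growth over the same span.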

An analysis in 2023 backed up this conclusion. The computation used to train a model is often measured in floating-point operations, or FLOPs. A Comment piece in the journal Nature Machine Intelligence illustrated the steep rise in the number of FLOPs used to train recent AI models.

Changes in the number of FLOPs needed for state-of-the-art AI model training, graph from “Reporting electricity consumption is essential for sustainable AI”, Charlotte Debus, Marie Piraud, Achim Streit, Fabian Theis & Markus Götz, Nature Machine Intelligence, 10 November 2023. AlexNet is a neural network model used to great effect with the image classification database ImageNet, which we will discuss in a later post. GPT-3 is a Large Language Model developed by OpenAI; the original free version of Chat-GPT was built on its refined successor, GPT-3.5.

With the performance of individual AI-specialized computer chips now measured in teraFLOPS (trillions of floating-point operations per second), and thousands of these chips harnessed together in an AI server farm, the electricity consumption of AI is vast.

As many researchers have noted, accurate electricity consumption figures are difficult to find, making it almost impossible to calculate the worldwide energy needs of the AI Industrial Complex.

However, Josh Saul and Dina Bass reported last year that

“Artificial intelligence made up 10 to 15% of [Google’s] total electricity consumption, which was 18.3 terawatt hours in 2021. That would mean that Google’s AI burns around 2.3 terawatt hours annually, about as much electricity each year as all the homes in a city the size of Atlanta.”3
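The 2.3 terawatt-hour figure is simply a share of Google’s 2021 total. A quick Python check, using only the numbers in the quote, shows that the 10 to 15% range spans roughly 1.8 to 2.7 TWh, with 2.3 TWh near the middle:

```python
total_twh = 18.3                    # Google's total electricity use in 2021 (from the quote)
low_share, high_share = 0.10, 0.15  # AI's reported share of that total

low = total_twh * low_share
high = total_twh * high_share
mid = total_twh * (low_share + high_share) / 2  # midpoint of the range

print(f"{low:.1f} to {high:.1f} TWh, midpoint {mid:.1f} TWh")
# 1.8 to 2.7 TWh, midpoint 2.3 TWh
```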

Researcher Alex de Vries, meanwhile, estimated that if an AI system similar to ChatGPT were used for every Google search, electricity usage would spike to 29.2 TWh a year for the search engine alone.4

In Scientific American, Lauren Leffer cited projections that Nvidia, manufacturer of the most sophisticated chips for AI servers, will ship “1.5 million AI server units per year by 2027.”

“These 1.5 million servers, running at full capacity,” she added, “would consume at least 85.4 terawatt-hours of electricity annually—more than what many small countries use in a year, according to the new assessment.”5

OpenAI CEO Sam Altman expects AI’s appetite for energy will continue to grow rapidly. At the Davos confab in January 2024 he told the audience, “We still don’t appreciate the energy needs of this technology.” As quoted by The Verge, he added, “There’s no way to get there without a breakthrough. We need [nuclear] fusion or we need like radically cheaper solar plus storage or something at massive scale.” Altman has invested $375 million in fusion start-up Helion Energy, which hopes to succeed soon with a technology that has stubbornly remained 50 years in the future for the past 50 years.

In the near term, at least, electricity consumption will act as a brake on widespread use of AI in standard web searches, and will restrict use of the most sophisticated AI models to paying customers. That’s because the cost of AI use can be measured not only in watts, but in dollars and cents.

Shortly after the launch of Chat-GPT, Sam Altman was quoted as saying that Chat-GPT cost “probably single-digit cents per chat.” Pocket change – until you multiply it by perhaps 10 million users each day. Citing figures from SemiAnalysis, the Washington Post reported that by February 2023, “ChatGPT was costing OpenAI some $700,000 per day in computing costs alone.” Will Oremus concluded,

“Multiply those computing costs by the 100 million people per day who use Microsoft’s Bing search engine or the more than 1 billion who reportedly use Google, and one can begin to see why the tech giants are reluctant to make the best AI models available to the public.”6
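The scale of the problem is easy to sketch. The Python snippet below divides the $700,000-per-day figure by the rough estimate of 10 million daily users, then, purely as a hypothetical, applies the same per-user cost to the Bing and Google audiences Oremus mentions. Real costs would depend on how many chats each user has and which model serves them, so these are order-of-magnitude numbers only.

```python
daily_cost_usd = 700_000   # ChatGPT computing cost per day (SemiAnalysis, via the Washington Post)
daily_users = 10_000_000   # rough daily user count from the text

cost_per_user = daily_cost_usd / daily_users
print(f"about {cost_per_user * 100:.0f} cents per user per day")  # about 7 cents

# Hypothetical: the same per-user cost applied to much larger audiences.
for name, users in [("Bing", 100_000_000), ("Google", 1_000_000_000)]:
    print(f"{name}: about ${cost_per_user * users / 1e6:,.0f} million per day")
# Bing: about $7 million per day
# Google: about $70 million per day
```

Seven cents per user per day is consistent with Altman’s “single-digit cents per chat” if each user has roughly one chat a day.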

In any case, Alex de Vries says, “NVIDIA does not have the production capacity to promptly deliver 512,821 A100 HGX servers,” which would be required to pair every Google search with a state-of-the-art AI model. And even if Nvidia could ramp up that production tomorrow, purchasing the computing hardware would cost Google about US$100 billion.
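Both the Scientific American projection and de Vries’s Google scenario can be approximated with one simple formula: number of servers × power per server × hours in a year. The Python sketch below assumes each server draws about 6.5 kW continuously, roughly the rated maximum of an Nvidia DGX A100 system; that per-server figure is an assumption added here, not a number given in this post. It also divides the $100 billion hardware estimate by the server count.

```python
HOURS_PER_YEAR = 24 * 365
KW_PER_SERVER = 6.5  # assumption: roughly the maximum draw of one A100-class server


def annual_twh(servers: int, kw_per_server: float = KW_PER_SERVER) -> float:
    """Annual electricity use in TWh if every server runs at full power all year."""
    kwh = servers * kw_per_server * HOURS_PER_YEAR
    return kwh / 1e9  # 1 TWh = 1e9 kWh


google_servers = 512_821  # de Vries: servers needed to add AI to every Google search
nvidia_2027 = 1_500_000   # projected Nvidia AI server shipments per year by 2027

print(f"Google AI search: {annual_twh(google_servers):.1f} TWh/year")  # 29.2
print(f"1.5 million servers: {annual_twh(nvidia_2027):.1f} TWh/year")  # 85.4
print(f"Hardware cost per server: ${100e9 / google_servers:,.0f}")     # $195,000
```

That this single assumption reproduces both published totals suggests the two estimates rest on a similar per-server power figure. Real servers rarely run at full capacity around the clock, so such numbers sit at the upper end of plausible consumption.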

Detail from: Nvidia GeForce RTX 2080, (TU104 | Turing), (Polysilicon | 5x | External Light), photograph by Fritzchens Fritz, at Wikimedia Commons, licensed under Creative Commons CC0 1.0 Universal Public Domain Dedication

A 457,000-item supply chain

Why is AI computing hardware so difficult to produce and so expensive? To understand this it’s helpful to take a greatly simplified look at a few aspects of computer chip production.

That production begins with silicon, one of the most common elements on earth and a basic constituent of sand. The silicon must be refined to 99.9999999% purity before being sliced into wafers.

Image from Intel video From Sand to Silicon: The Making of a Microchip.

Eventually each silicon wafer will be augmented with an extraordinarily fine pattern of transistors. Let’s look at the complications involved in just one step, the photolithography that etches a microscopic pattern in the silicon.

As Chris Miller explains in Chip War, the precision of photolithography is determined by, among other factors, the wavelength of the light being used: “The smaller the wavelength, the smaller the features that could be carved onto chips.”7 By the early 1990s, chipmakers had learned to pack more than 1 million transistors onto one of the chips used in consumer-level desktop computers. To enable the constantly climbing transistor count, photolithography tool-makers were using deep ultraviolet light, with wavelengths of about 200 nanometers (compared to visible light with wavelengths of about 400 to 750 nanometers; a nanometer is one-billionth of a meter). It was clear to some industry figures, however, that the wavelength of deep ultraviolet light would soon be too long for continued increases in the precision of etching and for continued increases in transistor count.

Thus began the long, difficult, and immensely expensive development of Extreme UltraViolet (EUV) photolithography, using light with a wavelength of about 13.5 nanometers.
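Miller’s rule of thumb is usually expressed through the Rayleigh resolution criterion: the smallest printable feature (the critical dimension, CD) is proportional to the wavelength λ and inversely proportional to the numerical aperture NA of the optics, scaled by a process-dependent factor k₁. As a simplified illustration that holds k₁ and NA fixed (an assumption made here; real DUV and EUV tools use quite different optics), moving from light of about 200 nanometers to 13.5 nanometers would shrink the smallest feature by roughly a factor of fifteen:

$$
\mathrm{CD} = k_1 \,\frac{\lambda}{\mathrm{NA}}, \qquad
\frac{\mathrm{CD}_{\mathrm{DUV}}}{\mathrm{CD}_{\mathrm{EUV}}}
= \frac{\lambda_{\mathrm{DUV}}}{\lambda_{\mathrm{EUV}}}
\approx \frac{200\ \mathrm{nm}}{13.5\ \mathrm{nm}} \approx 14.8
$$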

Let’s look at one small part of the complex EUV photolithography process: producing and focusing the light. In Miller’s words,

“[A]ll the key EUV components had to be specially created. … Producing enough EUV light requires pulverizing a small ball of tin with a laser. … [E]ngineers realized the best approach was to shoot a tiny ball of tin measuring thirty-millionths of a meter wide moving through a vacuum at a speed of around two hundred miles an hour. The tin is then struck twice with a laser, the first pulse to warm it up, the second to blast it into a plasma with a temperature around half a million degrees, many times hotter than the surface of the sun. This process of blasting tin is then repeated fifty thousand times per second to produce EUV light in the quantities necessary to fabricate chips.”8
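The figures in Miller’s description imply very tight timing. Assuming one droplet arrives for each of the fifty thousand cycles per second, and that droplets travel at a steady two hundred miles an hour (both simplifications made here for illustration), the Python sketch below works out the time and distance between successive droplets:

```python
MPH_TO_M_PER_S = 1609.344 / 3600  # 1 mile = 1609.344 m, 1 hour = 3600 s

speed_m_s = 200 * MPH_TO_M_PER_S  # droplet speed, about 89 m/s
pulses_per_s = 50_000             # cycles per second (from the quote)
droplet_diameter_um = 30          # "thirty-millionths of a meter"

interval_us = 1e6 / pulses_per_s              # time between droplets, microseconds
spacing_mm = speed_m_s / pulses_per_s * 1000  # distance between droplets, millimeters

print(f"{speed_m_s:.1f} m/s, one droplet every {interval_us:.0f} microseconds, "
      f"{spacing_mm:.2f} mm apart, each {droplet_diameter_um} micrometers wide")
# 89.4 m/s, one droplet every 20 microseconds, 1.79 mm apart, each 30 micrometers wide
```

Within each 20-microsecond window, in other words, the laser must hit a 30-micrometer target twice: once to warm it and once to blast it into plasma.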

Heating the tin droplets to half a million degrees “required a carbon dioxide-based laser more powerful than any that previously existed.”9 Laser manufacturer Trumpf worked for 10 years to develop a laser powerful enough and reliable enough – and the resulting tool had “exactly 457,329 component parts.”10

Once the extremely short wavelength light could be reliably produced, it needed to be directed with great precision – and for that purpose German lens company Zeiss “created mirrors that were the smoothest objects ever made.”11

Nearly 20 years after development of EUV lithography began, this technique is standard for the production of sophisticated computer chips which now contain tens of billions of transistors each. But as of 2023, only Dutch company ASML had mastered the production of EUV photolithography machines for chip production. At more than $100 million each, Miller says “ASML’s EUV lithography tool is the most expensive mass-produced machine tool in history.”12

Landscape Destruction: Rio Tinto Kennecott Copper Mine from the top of Butterfield Canyon. Photographed in 665 nanometer infrared using an infrared converted Canon 20D and rendered in channel inverted false color infrared, photo by arbyreed, part of the album Kennecott Bingham Canyon Copper Mine, on flickr, licensed via CC BY-NC-SA 2.0 DEED.

No, data is not the “new oil”

US semi-conductor firms began moving parts of production to Asia in the 1960s. Today much of semi-conductor manufacturing and most of computer and phone assembly is done in Asia – sometimes using technology more advanced than anything in use within the US.

The example of EUV lithography indicates how complex and energy-intensive chipmaking has become. At countless steps from mining to refining to manufacturing, chipmaking relies on an industrial infrastructure that is still heavily reliant on fossil fuels.

Consider the logistics alone. A wide variety of metals, minerals, and rare earth elements, located at sites around the world, must be extracted, refined, and processed. These materials must then be transformed into the hundreds of thousands of parts that go into computers, phones, and routers, or which go into the machines that make the computer parts.

Co-ordinating all of this production, and getting all the pieces to where they need to be for each transformation, would be difficult if not impossible if it weren’t for container ships and airlines. And though it might be possible someday to run most of those processes on renewable electricity, for now those operations have a big carbon footprint.

It has become popular to proclaim that “data is the new oil”13, or that “semi-conductors are the new oil”14. This is nonsense, of course. While both data and semi-conductors are worth a lot of money and contribute a lot of GDP growth in our current economic context, neither one produces energy; both depend on available and affordable energy to be useful.

A world temporarily rich in surplus energy can produce semi-conductors to extract economic value from data. But warehouses of semi-conductors and petabytes of data will not enable us to produce surplus energy.

Artificial Intelligence powered by semi-conductors and data could, perhaps, help us to use the surplus energy much more efficiently and rationally. But that would require a radical change in the economic religion that guides our whole economic system, including the corporations at the top of the Artificial Intelligence Industrial Complex.

Meanwhile the AI Industrial Complex continues to soak up huge amounts of money and energy.

OpenAI CEO Sam Altman has been in fund-raising mode recently, seeking to finance a network of new semi-conductor fabrication plants. As reported in Fortune, “Constructing a single state-of-the-art fabrication plant can require tens of billions of dollars, and creating a network of such facilities would take years. The talks with [Abu Dhabi company] G42 alone had focused on raising $8 billion to $10 billion ….”

This round of funding would be in addition to the $10 billion Microsoft has already invested in OpenAI. Why would Altman want to get into the hardware production side of the Artificial Intelligence Industrial Complex, in addition to OpenAI’s leading role in software operations? According to Fortune,

“Since OpenAI released ChatGPT more than a year ago, interest in artificial intelligence applications has skyrocketed among companies and consumers. That in turn has spurred massive demand for the computing power and processors needed to build and run those AI programs. Altman has said repeatedly that there already aren’t enough chips for his company’s needs.”15

Becoming data

We face the prospect, then, of continuing rapid growth in the Artificial Intelligence Industrial Complex, accompanied by continuing rapid growth in the extraction of materials and energy – and data.

How will major AI corporations obtain and process all the data that will keep these semi-conductors busy pumping out heat?

Consider the light I turned on at the beginning of this post. If I simply flick the switch on the wall and the light goes off, the interaction will not be transformed into data. But if I speak to an Echo, asking Alexa to turn off the light, many data points are created and integrated into Amazon’s database: the time of the interaction, the IP address and physical location where this takes place, whether I speak English or some other language, whether my spoken words are unclear and the device asks me to repeat, whether the response taken appears to meet my approval, or whether I instead ask for the response to be changed. I would be, in Kate Crawford and Vladan Joler’s words, “simultaneously a consumer, a resource, a worker, and a product.”16

By buying into the Amazon Echo world,

“the user has purchased a consumer device for which they receive a set of convenient affordances. But they are also a resource, as their voice commands are collected, analyzed and retained for the purposes of building an ever-larger corpus of human voices and instructions. And they provide labor, as they continually perform the valuable service of contributing feedback mechanisms regarding the accuracy, usefulness, and overall quality of Alexa’s replies. They are, in essence, helping to train the neural networks within Amazon’s infrastructural stack.”16
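To illustrate the difference between the two ways of switching off the light, here is a hypothetical Python sketch of the kind of record a single voice request could generate. The `VoiceInteraction` class, its field names, and the sample values are invented for illustration and simply mirror the data points listed earlier in this section; they are not Amazon’s actual schema. The wall switch, by contrast, produces no record at all.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class VoiceInteraction:
    """Hypothetical record of one smart-speaker request (not a real Amazon schema)."""
    timestamp: datetime
    ip_address: str
    location: str
    language: str
    command: str
    asked_to_repeat: bool       # did the device fail to understand at first?
    user_accepted_result: bool  # did the response appear to meet the user's approval?
    correction_requested: bool  # did the user ask for the response to be changed?


def flick_switch() -> None:
    """A wall switch turns off the light and records nothing."""
    return None


record = VoiceInteraction(
    timestamp=datetime(2024, 2, 5, 21, 30, tzinfo=timezone.utc),
    ip_address="203.0.113.7",  # address from a range reserved for documentation
    location="example city",
    language="en-US",
    command="Alexa, turn off the light",
    asked_to_repeat=True,
    user_accepted_result=True,
    correction_requested=False,
)

print(flick_switch())       # None
print(len(asdict(record)))  # 8 data points from a single request
```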

How will AI corporations monetize that data so they can cover their hardware and energy costs, and still return a profit on their investors’ money? We’ll turn to that question in coming installments.


Image at top of post: Bingham Canyon Open Pit Mine, Utah, photo by arbyreed, part of the album Kennecott Bingham Canyon Copper Mine, on flickr, licensed via CC BY-NC-SA 2.0 DEED.


Notes

1 Kate Crawford and Vladan Joler, “Anatomy of an AI System: The Amazon Echo as an anatomical map of human labor, data and planetary resources”, 2018.

2 Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” ACM Digital Library, March 1, 2021. Thanks to Paris Marx for introducing me to the work of Emily M. Bender on the excellent podcast Tech Won’t Save Us.

3 Josh Saul and Dina Bass, “Artificial Intelligence Is Booming—So Is Its Carbon Footprint”, Bloomberg, 9 March 2023.

4 Alex de Vries, “The growing energy footprint of artificial intelligence,” Joule, 18 October 2023.

5 Lauren Leffer, “The AI Boom Could Use a Shocking Amount of Electricity,” Scientific American, 13 October 2023.

6 Will Oremus, “AI chatbots lose money every time you use them. That is a problem.” Washington Post, 5 June 2023.

7 Chris Miller, Chip War: The Fight for the World’s Most Critical Technology, Simon & Schuster, October 2022, page 183.

8 Chip War, page 226.

9 Chip War, page 227.

10 Chip War, page 228.

11 Chip War, page 228.

12 Chip War, page 230.

13 For example, in “Data Is The New Oil — And That’s A Good Thing,” Forbes, 15 Nov 2019.

14 As in, “Semi-conductors may be to the twenty-first century what oil was to the twentieth,” Lawrence Summers, former US Secretary of the Treasury, in a blurb for Chip War.

15 “OpenAI CEO Sam Altman is fundraising for a network of AI chips factories because he sees a shortage now and well into the future,” Fortune, 20 January 2024.

16 Kate Crawford and Vladan Joler, “Anatomy of an AI System: The Amazon Echo as an anatomical map of human labor, data and planetary resources”, 2018.