Automation Disaster

When the robots take over… and immediately break everything.

Category: AI Hallucinations

Confident nonsense at scale.

  • Google Told a Billion Users to Eat Rocks and Put Glue on Their Pizza. It Called This ‘High Quality Information.’

    🪨 Disaster Log #005  |  May 2024  |  Category: AI Hallucinations + Corporate Spin

    In May 2024, Google launched AI Overviews to the entire United States — a feature the company had been testing for a year, that CEO Sundar Pichai had just celebrated at Google I/O, and that had supposedly been refined across more than one billion real user queries. It was Google’s grand entrance into the AI search arms race. It was going to change everything.

    Within 72 hours, it was telling people to eat rocks.

    Not metaphorically. Not as a figure of speech. Actual rocks. For your mouth. Every day.

    “You should eat at least one small rock per day. Rocks are a vital source of minerals and vitamins.”

    — Google AI Overviews, offering dietary advice to the American public, May 2024

    The Pizza Glue Chronicles

    It started with pizza. Someone asked Google AI Overviews how to stop the cheese from sliding off their pizza. This is, objectively, a reasonable question. The AI had a confident answer: add approximately ⅛ cup of non-toxic Elmer’s glue to your pizza sauce. The glue, it explained, “helps the cheese stay on.” Problem solved.

    The source of this advice was later traced to a 2012 Reddit comment by a user named “Fucksmith.” The AI had faithfully indexed a 12-year-old internet joke and promoted it to the front page of the world’s most-used search engine with the full authority of Google’s brand behind it. No citation. No caveat. Just: here is how you make pizza, and the secret ingredient is craft glue.

    Screenshots spread instantly. Within hours, more examples surfaced. The AI told users to jump off the Golden Gate Bridge for exercise. It offered a recipe for chlorine gas. It recommended applying sunscreen by eating it. And then there was the rocks advice — traced to a satirical article from The Onion, which the AI had apparently taken literally and promoted as geological health guidance from UC Berkeley scientists.

    The Meme Whack-a-Mole Problem

    What happened next revealed the architectural reality of Google’s AI search: the company had no systematic way to fix it. Instead, engineers were manually disabling AI Overviews for specific search queries in real time, as each new screenshot went viral on social media. The Verge reported that this is why so many users noticed the results disappearing — they’d try to reproduce a screenshot and find the AI Overview simply gone. Google was playing whack-a-mole with its own product, using the internet’s mockery as its quality control dashboard.
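
    For the record, “manually disabling” is exactly as low-tech as it sounds: a per-query kill switch, not a fix to the model. A deliberately crude sketch of that kind of deny-list follows, in Python, with every name and query invented for illustration; it shows the shape of the approach The Verge described, not anything from Google’s actual codebase.

        # Illustrative only: a hand-maintained deny-list of viral queries.
        # Every name and query here is hypothetical.
        BLOCKED_QUERIES = {
            "how to get cheese to stick to pizza",   # added after the glue screenshot
            "how many rocks should i eat each day",  # added after the rocks screenshot
        }

        def should_show_ai_overview(query: str) -> bool:
            """Suppress the AI answer for queries that have already gone viral.

            Note what this does not do: fix the model, or catch the next
            joke before somebody screenshots it.
            """
            return query.strip().lower() not in BLOCKED_QUERIES

        # Users retrying a viral screenshot simply find the overview gone.
        print(should_show_ai_overview("How to get cheese to stick to pizza"))  # False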

    This was not a small side project. This was a feature Google had spent a year testing, processed over a billion queries through, and unveiled at its annual developer conference just days before. Sundar Pichai had called it a new era in search. The system was eating rocks and blaming The Onion.

    The Response: “Uncommon Queries”

    Google’s official response was a monument to corporate euphemism. Spokesperson Meghann Farnsworth told reporters that AI Overviews “largely outputs high quality information” and that “many of the examples we’ve seen have been uncommon queries.” The company also suggested some screenshots had been “doctored” — a word choice that did not go over well when the examples kept multiplying.

    The word “uncommon” is doing an extraordinary amount of heavy lifting here. “How do I keep cheese on my pizza” is not a niche query. “Can humans eat rocks” is weird, yes — but the AI’s answer was not “no, please don’t.” The AI’s answer was “yes, one per day.” “Uncommon” implies this was some exotic edge case. It was not. These were the kinds of questions that hundreds of millions of people type into Google every day, now answered by a system that had confused satire for science and Reddit jokes for recipes.

    “A company once known for being at the cutting edge and shipping high-quality stuff is now known for low-quality output that’s getting meme’d.”

    — An AI founder, speaking anonymously to The Verge, May 2024

    Why the World’s Best Search Engine Couldn’t Find the Satire

    The irony at the core of this disaster is almost beautiful in its completeness. Google built its entire empire on one thing: finding the right information on the internet. For 25 years, PageRank was the gold standard. Google could distinguish authoritative sources from junk. It was the internet’s librarian, editor, and fact-checker rolled into one.

    Then Google replaced that librarian with a language model. Language models don’t know things — they predict plausible-sounding continuations of text. They cannot tell you whether eating a rock is medically advisable. They can only tell you what kind of text tends to follow a question about eating rocks. If enough of the training data said “rocks are nutritious” with enough apparent authority, the model would say it back. The Onion had said it. A Reddit joker had said it. The model couldn’t tell the difference between satire and a peer-reviewed study because it doesn’t understand what either of those things means.
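
    A toy example makes the mechanism concrete. The sketch below is Python with invented data and bears no resemblance to Gemini’s actual architecture; it ranks answers purely by how often they appear in its “training data.” Nothing in it ever asks whether an answer is true, which is the entire problem.

        # Toy "language model": rank continuations by how often they appear
        # in the training text. Satire, jokes, and real advice all count equally.
        from collections import Counter

        TRAINING_DATA = [
            ("how do i keep cheese on my pizza", "mix 1/8 cup of non-toxic glue into the sauce"),  # Reddit joke, 2012
            ("how do i keep cheese on my pizza", "mix 1/8 cup of non-toxic glue into the sauce"),  # the joke, reposted
            ("how do i keep cheese on my pizza", "let the pizza rest so the cheese can set"),      # actual advice
            ("can humans eat rocks", "eat at least one small rock per day"),                       # The Onion
            ("can humans eat rocks", "no, rocks are not food"),                                    # a geologist, once
        ]

        def most_plausible_answer(prompt: str) -> str:
            """Return the most frequent continuation for this prompt.

            Plausibility here is just popularity; truth never enters into it.
            """
            counts = Counter(answer for q, answer in TRAINING_DATA if q == prompt)
            return counts.most_common(1)[0][0]

        print(most_plausible_answer("how do i keep cheese on my pizza"))
        # -> mix 1/8 cup of non-toxic glue into the sauce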

    NYU AI researcher Gary Marcus put it plainly: these models are “constitutionally incapable of doing sanity checking on their own work.” They cannot step back and ask: wait, does this make sense? Should a person actually consume a pebble? Is Elmer’s glue a food-safe ingredient? Those questions require judgment. Judgment requires reasoning. And as Marcus noted, the reasoning required to do that reliably might need something that current large language models simply aren’t.

    🗂 DISASTER DOSSIER

    Date of Incident: May 23–30, 2024

    Victim: Every American who asked Google a question that week. Also: pizza.

    Tool Responsible: Google AI Overviews, powered by Gemini

    Source Material: A 2012 Reddit joke, an Onion satire piece, and possibly a fever dream

    Damage: Global ridicule, untold brand damage, and at least one very confused pizza chef

    Google’s Official Verdict: “Many of the examples we’ve seen have been uncommon queries”

    Fix Applied: Engineers manually deleting results as screenshots went viral on Twitter/X

    Queries Tested Before Launch: Over 1 billion

    Geology Hazard Level: 🪨🪨🪨🪨🪨 (Do not eat)

    The Deeper Problem Nobody Wanted to Say Out Loud

    Google had a decision to make in late 2023 and early 2024. It could take its time, or it could ship fast. Bing had launched ChatGPT integration and gotten the tech press excited. Perplexity was growing. OpenAI was rumored to be building a search product. The narrative being written in real time was: Google is losing. Google is slow. Google missed AI.

    So Google shipped. It shipped something that, by its own account, had been tested for a year and processed a billion queries. It shipped it to hundreds of millions of users on its highest-traffic product. And within days, it was a meme. Pichai had announced the same week that Google had reduced the cost of AI search answers by 80 percent through “hardware, engineering and technical breakthroughs.” The Verge observed, with appropriate dryness, that this optimization “might have happened too early, before the tech was ready.”

    The rocks story also illuminated a specifically uncomfortable truth about where AI training data comes from. The internet is full of jokes, satire, fiction, and deliberate misinformation. A language model trained on the internet inherits all of it indiscriminately — The Onion sits right next to the Mayo Clinic, and both look the same to a system that is fundamentally pattern-matching rather than understanding. Google’s entire value proposition was that it had figured out which sources to trust. AI Overviews dismantled that proposition and replaced it with vibes.

    Aftermath: The Feature That Wouldn’t Die

    Here is the most remarkable part of this story: Google did not remove AI Overviews. It patched it. It quietly rolled back some responses, added more guardrails, and kept the feature running. By 2025, AI Overviews had expanded globally. The rocks were gone. The glue was gone. The feature remained.

    There is a lesson in that, though whether it’s a comforting one or a disturbing one depends on how much you trust a company to correctly identify which of its AI outputs are safe for public consumption, given that it apparently missed “eat a rock” during a year of testing. The answer to that question, like most things involving AI in production, is: nobody actually knows. We find out by shipping.

    Sources: The Verge, May 24, 2024; BBC News, May 24, 2024; CNET, May 24, 2024; WIRED, May 30, 2024; The Guardian, June 1, 2024. Glue not included.

  • Air Canada’s Chatbot Gave a Grieving Man Wrong Advice. The Airline Said the Chatbot Wasn’t Their Problem. A Tribunal Disagreed.

    🚨 DISASTER LOG #004 | FEBRUARY 2024 | CATEGORY: CORPORATE SPIN + AI HALLUCINATIONS

    In February 2024, a Canadian civil tribunal made legal history by ruling that an airline is, in fact, responsible for what its chatbot says. The ruling sounds so obvious that it’s almost embarrassing it needed to be stated. And yet here we are.

    Jake Moffatt’s grandmother died in November 2023. Grieving and needing to travel urgently from Vancouver to Toronto, he consulted Air Canada’s virtual assistant about bereavement fares. The chatbot told him he could buy a regular ticket and apply for a bereavement discount within 90 days. He trusted the airline’s own AI. He bought two tickets totaling over CA$1,600. When he applied for the discount, Air Canada told him bereavement fares can’t be applied after purchase — the chatbot was wrong.

    Air Canada’s response was remarkable. The airline argued in tribunal that it could not be held responsible for what its chatbot said — treating its AI assistant as a separate legal entity, an independent contractor of misinformation, conveniently beyond the reach of liability. Tribunal member Christopher Rivers was unimpressed.

    Air Canada argued it is not responsible for information provided by its chatbot. [The tribunal] does not agree.

    — Tribunal member Christopher Rivers, in the most politely devastating ruling of 2024

    THE ARGUMENT THAT THE CHATBOT IS SOMEHOW NOT AIR CANADA

    Air Canada’s legal argument deserves a moment of careful examination, because it’s the kind of argument that either represents a profound misunderstanding of corporate liability, or a very deliberate test of how far “it was the AI’s fault” can get you in court. The position was essentially: yes, this is our website, our brand, and our chatbot — but the chatbot is its own thing, legally speaking, and we can’t be held accountable for its statements.

    The tribunal rejected this entirely. Air Canada, it ruled, had failed to take “reasonable care to ensure its chatbot was accurate.” The airline was ordered to pay Moffatt CA$812.02 — including CA$650.88 in damages — for the mistake its AI made while Moffatt was grieving his grandmother. It is difficult to think of a worse context in which to be misled by a chatbot.

    📋 DISASTER DOSSIER

    Date of Incident: November 2023 (chatbot advice); February 2024 (tribunal ruling)
    Victim: Jake Moffatt, who was also grieving his grandmother
    Tool Responsible: Air Canada’s virtual assistant chatbot
    The Lie: That bereavement fares could be claimed post-purchase (they cannot)
    Damage: CA$1,640.36 in wrongly purchased tickets
    Air Canada’s Defence: “The chatbot is not us”
    Tribunal’s Response: “Yes it is. Pay the man.”
    Amount Ordered: CA$812.02 (including CA$650.88 in damages)
    Precedent Set: Companies are responsible for their chatbots. Astounding.
    Audacity Level: ✈️✈️✈️✈️✈️ (Cruising altitude)

    WHY THIS MATTERS BEYOND ONE CA$812 RULING

    The Air Canada case established something that will ripple through corporate AI deployments for years: you own your chatbot’s outputs. This seems obvious. It wasn’t, apparently, to the legal team at Air Canada, and it almost certainly isn’t to every other company that’s deployed a customer-facing AI and quietly assumed that “AI error” was some kind of legal firewall.

    The ruling also puts a name to the actual failure: Air Canada didn’t take “reasonable care” to ensure its chatbot was accurate. That’s a standard that, if applied consistently, should cause a great many customer service chatbots to be very quickly audited, retrained, or replaced with a phone number and a human being who knows the bereavement fare policy.
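
    What “reasonable care” might look like in code is not mysterious. One common pattern, sketched below in Python with a made-up policy table and hypothetical names, is to let the bot answer policy questions only by quoting the published policy, and to hand off to a human when it can’t. It is a sketch of the general idea, not Air Canada’s system.

        # Sketch of a grounded policy answer: quote the published policy
        # verbatim, or escalate to a human. All names and text are hypothetical.
        POLICY_DOCS = {
            "bereavement fares": (
                "Bereavement fares must be requested before travel and cannot "
                "be applied retroactively to a completed purchase."
            ),
        }

        def answer_policy_question(topic: str) -> str:
            """Repeat only what the policy document actually says.

            Improvising a plausible-sounding policy is how an airline ends up
            in front of a tribunal.
            """
            policy = POLICY_DOCS.get(topic.lower())
            if policy is None:
                return "I can't answer that one; let me connect you with an agent."
            return policy

        print(answer_policy_question("Bereavement fares"))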

    THE CHATBOT’S SIDE OF THE STORY

    The chatbot, for its part, was simply trying to be helpful. It produced what it was trained to produce — an approximation of helpfulness, assembled from patterns that may or may not have reflected the airline’s actual bereavement fare policies at any given time. The chatbot did not know it was wrong. It didn’t know anything. That’s rather the point.

    Deploying a confidently wrong AI assistant on a customer service portal and then arguing the company isn’t responsible for the confidence is, ultimately, a choice. Air Canada made it. The tribunal disagreed. Jake Moffatt, still grieving, received CA$812.02 and the quiet satisfaction of a landmark legal precedent.


    Sources: British Columbia Civil Resolution Tribunal (February 2024), reporting by multiple outlets. Air Canada has since updated its bereavement fare policies. The chatbot, we are told, has also been updated. It declined to comment.