The Hallucination Trap

The Cost of Rushed AI Adoption at the Big Four

Jun 16, 2026

Key Takeaways

Big Four firms are pushing aggressive AI adoption while tying productivity metrics to performance, creating conflicting pressures on staff.
Major reports from KPMG, Deloitte, and EY contained fabricated case studies and citations — in some cases, the majority of references were invented or distorted.
AI-generated “vibe citations” (fake, fused, or heavily altered sources) are proving difficult to detect and are polluting the knowledge base.
These incidents show that rushed AI rollout without strong verification processes leads to reputational and knowledge integrity risks.
As the same AI tools move into audit and assurance work, the potential consequences for financial reporting quality and public trust are significantly higher.
Firms need systematic human oversight, verification workflows, and transparency about AI use rather than relying on speed and productivity gains alone.

Employees at the Big Four are being given conflicting messages. The amount of time they save by using AI is now a performance metric, with dashboards tracking usage and heavy messaging about the productivity gains expected by leadership.

Senior partners are even making it clear that there is no future in the firm for anyone who does not wholeheartedly embrace the technology.

But there is a conundrum. Employees are expected not only to save time and squeeze more work into the day, but also to avoid the biggest problem facing anyone who uses AI: the hallucination trap.

An AI Mishap at KPMG International

Take the most recent example of one of the Big Four landing in an uncomfortable spotlight after publishing an AI-generated report.

On June 12, GPTZero published an article revealing that one of KPMG International’s flagship reports, “Total Experience: Redefining Excellence in the Age of Agentic AI” (released in October 2025), included several case studies about reputable organisations such as UBS, Swiss Federal Railways, and Transport for London that were figments of the AI’s imagination.

The report included claims such as:

UBS “integrates AI agents across investment advisory, risk management and compliance monitoring ... These agents operate within a composable platform co-developed with Microsoft, enabling personalised, efficient and compliant financial journeys.”

Swiss Federal Railways uses AI agents to “help users plan, book, and optimise journeys based on preferences, real-time conditions and carbon impact, turning SBB into a holistic mobility orchestrator”.

Transport for London uses AI agents “to predict and manage congestion, personalise commuter updates and co-ordinate multimodal transport”.

Sounds impressive, right?

Yes – but the problem is that the majority of these claims are simply not true.

When the GPTZero article was published, the companies named in the report reviewed the claims and soon issued denials.

UBS called the assertions “factually incorrect,” Swiss Federal Railways confirmed it was “not accurate,” and others including the UK’s NHS and Transport for London told the FT that the claims about their AI usage were untrue or misleading.

KPMG’s AI had been asked to write a report about how great AI is, and it decided to follow the maxim “fake it till you make it,” inventing the majority of the story.

Of the 45 references in the report, only 5 checked out.

The rest were either entirely made up or, even worse, half-truths: the AI took a real paper and twisted its contents to support its thesis, a phenomenon GPTZero refers to as “vibe citations”.

“Vibe Citations can include references that are entirely fabricated (fake authors, fake title, and fake container/locators), fusions of two or more real references (authors of paper A paired with the title of paper B), or paraphrased or heavily altered versions of real citations.”
GPTZero
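The three categories in that definition can be illustrated with a minimal Python sketch. It assumes a reviewer already has a trusted list of real references to compare against. The `Reference` type, the `classify` function, and the 0.8 similarity threshold are all hypothetical choices made for illustration. This is not GPTZero's method, which the source does not describe.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Reference:
    authors: frozenset  # author surnames, lower-cased (illustrative convention)
    title: str


def _similar(a: str, b: str) -> float:
    """Rough 0..1 similarity between two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(cited: Reference, known: list, threshold: float = 0.8) -> str:
    """Label a cited reference against a list of known-real references.

    The threshold is an arbitrary assumption and would need tuning.
    """
    exact_title = [r for r in known if r.title.lower() == cited.title.lower()]
    same_authors = [r for r in known if r.authors == cited.authors]

    if any(r.authors == cited.authors for r in exact_title):
        return "verified"    # authors and title belong to the same real work
    if exact_title and same_authors:
        return "fusion"      # real title and real authors, from different works
    if any(_similar(cited.title, r.title) >= threshold for r in known):
        return "altered"     # close to a real reference, but changed
    return "fabricated"      # nothing comparable in the trusted list
```

A label of "fabricated" here only means nothing similar was found in the trusted list, so a human would still need to check the original source before drawing conclusions.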

Now it would be easy to point the finger of blame at the report’s authors, but the question is: how much time were they given to research and write the report?

Were they told that they should now be able to do it in half the time they usually do, because they have a trusty assistant AI agent to do the heavy lifting for them?

Did they have the time to go through each reference to check that they were not being led astray by AI hallucinations?

“We expect all our people to follow our guidelines on the responsible use of AI, including human oversight to validate content and verify independent sources.” KPMG

I think that what we are seeing at play is the obvious result of firms that have rushed to adopt the new technology, out of an abundance of FOMO, without first putting in place the guardrails and governance required to ensure that AI does not take its staff for a ride.

It’s Not Just KPMG

Deloitte and EY have also had their major embarrassments here.

Deloitte Got Caught Out Twice – in Australia and in Canada

Targeted Compliance Framework Assurance Review

The first incident was the AU$440,000 “Targeted Compliance Framework Assurance Review” for the Department of Employment and Workplace Relations, a hefty report on how the government automates welfare penalties that was finalized in mid-2025.

The report included recommendations and options to strengthen the Targeted Compliance Framework in the future – thus potentially having serious real-life consequences.

The problem? The report was riddled with fabricated references to non-existent academic papers and even a made-up quote from a federal court judgment.

Chris Rudge, Deputy Director of Health Law at Sydney University, spotted the bogus citations and alerted the media. They referenced academic papers supposedly published by Lisa Burton Crawford, a professor at the University of Sydney law school, and Carolyn Adams, an Honorary Senior Lecturer at Macquarie Law School.

Deloitte is “advising on a very serious matter that applies to hundreds of thousands of people across the Commonwealth, I would expect a high degree of diligence.”
Chris Rudge, speaking to the Australian Financial Review

The government quietly published a revised version after Deloitte admitted it had used generative AI (Azure OpenAI GPT-4o) for core parts of the analysis without initially disclosing it.

Deloitte confirmed some footnotes and references were incorrect, added a disclaimer to the updated report, and agreed to a partial refund of the fee.

“It is concerning to see research attributed to me in this way. I would like to see an explanation from Deloitte as to how the citations were generated.”
Lisa Burton Crawford, speaking to the Australian Financial Review

Health Human Resources Plan

And then there’s Deloitte Canada.

In May 2025, the firm delivered a 526-page Health Human Resources Plan to the Newfoundland and Labrador government — a $1.6 million report meant to guide critical decisions about hospitals, staffing, and healthcare delivery across the province.

Its recommendations covered recruitment strategies, virtual care, monetary recruitment and retention incentives, and the impact of the COVID-19 pandemic on healthcare workers.

Local journalists later discovered that several citations simply didn’t exist. Papers were referenced that were never published, authors were credited on work they hadn’t done, and some sources appeared to be complete fabrications.

“...if public funds are being allocated to private companies, we should expect higher standards. If a human employee made this error, it would likely result in disciplinary action. This is unacceptable and warrants immediate attention.”
Jerry Earle, President of the Newfoundland and Labrador Association of Public and Private Employees

EY Canada’s Phantom Case Studies

A recent investigation published by GPTZero in May addressed hallucinations in EY Canada’s “Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems” cybersecurity report.

They found that 16 of the 27 references in the report were fictitious, with footnotes pointing to dead pages or information that simply wasn’t there. EY pulled the study down quickly once the GPTZero article went live.

Polluting the Knowledge Base

Beyond reputational damage to the firms themselves, these incidents have a broader, harder-to-reverse effect. When the Big Four publish a report, it is then used as a reputable reference by other entities.

According to GPTZero, for example, the hallucinated facts were then referenced by publications such as CXM, CX Dive, Mi3, and a Czech newspaper.

Suddenly the “alternative facts” become part of the established canon of knowledge – and simply retracting the report does not fix that problem.

The genie is by then out of the bottle.

The Big Four are probably scrambling to check every report they have published over the last year or so. But the more important steps are to create a system for double-checking every report generated using AI, and to be honest about using the technology whenever they do.

They can check out the wording at the bottom of this newsletter for inspiration. Using AI is not something to be ashamed of, but covering up its use, or failing to check the content of an AI-generated report, is.

“AI Use Disclosure: This newsletter was researched and drafted with the assistance of AI tools. All analysis, opinions, judgments, and final edits are fully human. Every fact was verified and the content carefully reviewed by the editor.”

What This Means for Audit

EY has integrated advanced AI capabilities across its global assurance platform supporting more than 160,000 audit engagements. KPMG is using AI to scan millions of accounting entries, Deloitte has embedded GenAI and agentic tools in its Omnia audit platform, and PwC is rolling out end-to-end AI-driven audit solutions expected in 2026.

This raises some uncomfortable questions:

What happens when the same approach used in these research reports moves into audit work?

Have auditors been given the time and tools to properly validate what their AI tools are producing?

And if hallucinations start appearing in audited financials, how quickly could small errors turn into restatements, regulatory action, or a broader loss of confidence in financial reporting?

The uncomfortable reality is that the Big Four are pushing AI into some of their most sensitive work while still struggling to get the basics right on relatively simple research reports.

If firms can’t reliably fact-check citations in a policy paper, audit committees have every reason to ask what safeguards are in place when the same tools are shaping financial statements and assurance opinions.

Clients pay premium rates for professional judgment. At some point, they may start questioning whether they’re still getting it — or whether they’re paying for an expensive form of autocomplete.

Final Thoughts

These incidents are a warning sign. The Big Four have moved quickly to adopt AI, often under pressure to demonstrate productivity gains. But speed has come at the expense of basic quality controls.

The firms will likely improve their internal processes after these embarrassments. But the more important question is whether they’ll slow down enough to make those controls meaningful — or whether the pressure to show AI-driven efficiency will continue to outpace proper oversight.

Emerging best practices show what meaningful controls can look like in practice.

One approach is the introduction of mandatory citation verification workflows: every reference suggested or generated by AI must be independently validated against the original source by a qualified human reviewer, with the reviewer’s name and confirmation logged as part of the deliverable’s quality record.
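As a rough illustration of the workflow described above, here is a minimal Python sketch of a quality record that blocks publication until a named human has confirmed every citation. The class names, fields, and methods are hypothetical and do not reflect any firm's actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CitationCheck:
    citation: str
    reviewer: str = ""      # name of the human who checked the original source
    confirmed: bool = False
    checked_at: str = ""    # UTC timestamp of the sign-off


@dataclass
class QualityRecord:
    deliverable: str
    checks: list = field(default_factory=list)

    def add_citation(self, citation: str) -> None:
        """Register a citation; it starts out unverified."""
        self.checks.append(CitationCheck(citation))

    def sign_off(self, citation: str, reviewer: str, confirmed: bool) -> None:
        """Log a named reviewer's verdict on one citation."""
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        for check in self.checks:
            if check.citation == citation:
                check.reviewer = reviewer
                check.confirmed = confirmed
                check.checked_at = datetime.now(timezone.utc).isoformat()
                return
        raise KeyError(citation)

    def blocking(self) -> list:
        """Citations that are unreviewed or were rejected by the reviewer."""
        return [c.citation for c in self.checks if not c.confirmed]

    def ready_to_publish(self) -> bool:
        return not self.blocking()


record = QualityRecord("Example report")
record.add_citation("Reference A")
record.add_citation("Reference B")
record.sign_off("Reference A", "J. Reviewer", confirmed=True)
print(record.blocking())  # Reference B is still unreviewed
```

The design choice that matters is that the default state is "not confirmed": a citation that nobody has looked at blocks the deliverable in the same way as one a reviewer rejected.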

Another is the adoption of standardised AI disclosures in all published reports and client deliverables, clearly stating the extent of AI assistance and confirming that key facts and citations have undergone independent human review. These measures directly address the need for systematic oversight and transparency already highlighted in the key takeaways.

For now, clients and regulators should be paying close attention — and specifically asking whether these kinds of verification workflows and transparency controls are firmly embedded in the firms’ processes. Because when AI errors move from research reports into audited financial statements, the consequences will be significantly harder to walk back.
