The Hallucination Trap
The Cost of Rushed AI Adoption at the Big Four
Key Takeaways
Big Four firms are pushing aggressive AI adoption while tying productivity metrics to performance, creating conflicting pressures on staff.
Major reports from KPMG, Deloitte, and EY contained fabricated case studies and citations — in some cases, the majority of references were invented or distorted.
AI-generated “vibe citations” (fake, fused, or heavily altered sources) are proving difficult to detect and are polluting the knowledge base.
These incidents show that rushed AI rollout without strong verification processes leads to reputational and knowledge integrity risks.
As the same AI tools move into audit and assurance work, the potential consequences for financial reporting quality and public trust are significantly higher.
Employees at the Big Four are being given conflicting messages. The amount of time they save by using AI is now a performance metric, with dashboards tracking usage and heavy messaging about the productivity gains expected by leadership.
Senior partners are even making it clear that there is no future in the firm for anyone who does not wholeheartedly embrace the technology.
But there is a conundrum. Employees are expected not only to save time and squeeze more work into the day, but also to avoid the biggest pitfall facing anyone who uses AI: the hallucination trap.
An AI Mishap at KPMG International
Take the most recent example of one of the Big Four landing in an uncomfortable spotlight after publishing an AI-generated report.
On June 12, GPTZero published an article revealing that one of KPMG International’s flagship reports, “Total Experience: Redefining Excellence in the Age of Agentic AI” (released in October 2025), included several case studies about reputable organisations such as UBS, Swiss Federal Railways, and Transport for London that were figments of the AI’s imagination.
The report included claims such as:
UBS “integrates AI agents across investment advisory, risk management and compliance monitoring ... These agents operate within a composable platform co-developed with Microsoft, enabling personalised, efficient and compliant financial journeys.”
Swiss Federal Railways uses AI agents to “help users plan, book, and optimise journeys based on preferences, real-time conditions and carbon impact, turning SBB into a holistic mobility orchestrator”.
Transport for London uses AI agents “to predict and manage congestion, personalise commuter updates and co-ordinate multimodal transport”.
Sounds impressive, right?
Yes – but the problem is that the majority of these claims are simply not true.
After the GPTZero article was published, the organisations name-dropped in the report checked the claims and soon issued denials.
UBS called the assertions “factually incorrect,” Swiss Federal Railways said they were “not accurate,” and others, including the UK’s NHS and Transport for London, told the Financial Times that the claims about their AI usage were untrue or misleading.
“factually incorrect” UBS
“not accurate” Swiss Federal Railways
“misleading” Transport for London
KPMG’s AI had been asked to write a report about how great AI is, and it decided to follow the maxim “fake it till you make it,” inventing the majority of the story.
Of the 45 references in the report, only 5 checked out.
The rest were either totally made up or, even worse, half-truths: the AI took a real paper and twisted its contents to support its thesis, a phenomenon GPTZero refers to as “vibe citations.”
“Vibe Citations can include references that are entirely fabricated (fake authors, fake title, and fake container/locators), fusions of two or more real references (authors of paper A paired with the title of paper B), or paraphrased or heavily altered versions of real citations.”
GPTZero
Now it would be easy to point the finger of blame at the report’s authors, but the question is: how much time were they given to research and write the report?
Were they told that they should now be able to do it in half the time they usually do, because they have a trusty assistant AI agent to do the heavy lifting for them?
Did they have the time to go through each reference to check that they were not being led astray by AI hallucinations?
“We expect all our people to follow our guidelines on the responsible use of AI, including human oversight to validate content and verify independent sources.” KPMG
I think what we are seeing is the predictable result of firms rushing to adopt the new technology out of FOMO, without first putting in place the guardrails and governance needed to ensure that AI does not take their staff for a ride.
It’s Not Just KPMG
Deloitte and EY have also had their major embarrassments here.
Deloitte Got Caught Out Twice – in Australia and in Canada
Targeted Compliance Framework Assurance Review
The first incident was the AU$440,000 “Targeted Compliance Framework Assurance Review” for the Department of Employment and Workplace Relations, a hefty report on how the government automates welfare penalties that was finalized in mid-2025.
The report included recommendations and options to strengthen the Targeted Compliance Framework in the future – thus potentially having serious real-life consequences.
The problem? The report was riddled with fabricated references to non-existent academic papers and even a made-up quote from a federal court judgment.
University of Sydney Deputy Director of Health Law Chris Rudge spotted the bogus citations and alerted the media. The citations referenced academic papers supposedly published by Lisa Burton Crawford, a professor at the University of Sydney law school, and Carolyn Adams, an Honorary Senior Lecturer at Macquarie Law School.
Deloitte is “advising on a very serious matter that applies to hundreds of thousands of people across the Commonwealth, I would expect a high degree of diligence.”
Chris Rudge, speaking to the Australian Financial Review
The government quietly published a revised version after Deloitte admitted they had used generative AI (Azure OpenAI GPT-4o) for core parts of the analysis without initially disclosing it.
Deloitte confirmed some footnotes and references were incorrect, added a disclaimer to the updated report, and agreed to a partial refund of the fee.
“It is concerning to see research attributed to me in this way. I would like to see an explanation from Deloitte as to how the citations were generated.”
Lisa Burton Crawford, speaking to the Australian Financial Review
Health Human Resources Plan
And then there’s Deloitte Canada.
In May 2025, the firm delivered a 526-page Health Human Resources Plan to the Newfoundland and Labrador government — a $1.6 million report meant to guide critical decisions about hospitals, staffing, and healthcare delivery across the province.
Its recommendations covered recruitment strategies, virtual care, monetary incentives for recruitment and retention, and the impact of the COVID-19 pandemic on healthcare workers.
Local journalists later discovered that several citations simply didn’t exist. Papers were referenced that were never published, authors were credited on work they hadn’t done, and some sources appeared to be complete fabrications.
“...if public funds are being allocated to private companies, we should expect higher standards. If a human employee made this error, it would likely result in disciplinary action. This is unacceptable and warrants immediate attention.”
Jerry Earle, President of the Newfoundland and Labrador Association of Public and Private Employees
EY Canada’s Phantom Case Studies
A recent investigation published by GPTZero in May addressed hallucinations in EY Canada’s “Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems” cybersecurity report.
They found that 16 of the 27 references in the report were fictitious, with footnotes pointing to dead pages or information that simply wasn’t there. EY pulled the study down quickly once the GPTZero article went live.
Polluting the Knowledge Base
Beyond reputational damage to the firms themselves, these incidents have a broader, harder-to-reverse effect. When the Big Four publish a report, it is then used as a reputable reference by other entities.
According to GPTZero, for example, the hallucinated facts were then referenced by publications such as CXM, CX Dive, and Mi3, as well as a Czech newspaper.
Suddenly the “alternative facts” become part of the established canon of knowledge – and simply retracting the report does not fix that problem.
The genie is by then out of the bottle.
The Big Four are probably scrambling to check every report they have published over the last year or so, but the more important step is to create a system for double-checking every report generated using AI, and to be honest about using the technology whenever they do.
They can check out the wording at the bottom of this newsletter for inspiration. Using AI is nothing to be ashamed of; covering up its use, or failing to check the content of an AI-generated report, is.
“AI Use Disclosure: This newsletter was researched and drafted with the assistance of AI tools. All analysis, opinions, judgments, and final edits are fully human. Every fact was verified and the content carefully reviewed by the editor.”
What This Means for Audit
EY has integrated advanced AI capabilities across its global assurance platform supporting more than 160,000 audit engagements. KPMG is using AI to scan millions of accounting entries, Deloitte has embedded GenAI and agentic tools in its Omnia audit platform, and PwC is rolling out end-to-end AI-driven audit solutions expected in 2026.
This raises some uncomfortable questions:
What happens when the same approach used in these research reports moves into audit work?
Have auditors been given the time and tools to properly validate what their AI tools are producing?
And if hallucinations start appearing in audited financials, how quickly could small errors turn into restatements, regulatory action, or a broader loss of confidence in financial reporting?
The uncomfortable reality is that the Big Four are pushing AI into some of their most sensitive work while still struggling to get the basics right on relatively simple research reports.
If firms can’t reliably fact-check citations in a policy paper, audit committees have every reason to ask what safeguards are in place when the same tools are shaping financial statements and assurance opinions.
Clients pay premium rates for professional judgment. At some point, they may start questioning whether they’re still getting it — or whether they’re paying for an expensive form of autocomplete.
Final Thoughts
These incidents are a warning sign. The Big Four have moved quickly to adopt AI, often under pressure to demonstrate productivity gains. But speed has come at the expense of basic quality controls.
The firms will likely improve their internal processes after these embarrassments. But the more important question is whether they’ll slow down enough to make those controls meaningful — or whether the pressure to show AI-driven efficiency will continue to outpace proper oversight.
Emerging best practices show what meaningful controls can look like.
One approach is the introduction of mandatory citation verification workflows: every reference suggested or generated by AI must be independently validated against the original source by a qualified human reviewer, with the reviewer’s name and confirmation logged as part of the deliverable’s quality record.
Another is the adoption of consistent AI disclosures in all published reports and client deliverables, clearly stating the extent of AI assistance and confirming that key facts and citations have undergone independent human review. These measures directly address the need for systematic oversight and transparency already highlighted in the key takeaways.
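For readers who want to picture the first of these controls, here is a minimal sketch in Python of what a citation verification log might look like. It is an illustration only, not any firm's actual system: the names CitationCheck, sign_off, unverified, and ready_to_publish are hypothetical, and so are the sample references. The idea is simply that a deliverable cannot be released until every reference carries a named human reviewer and a confirmation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationCheck:
    """One reference from a draft, plus its human verification record (hypothetical)."""
    reference: str
    reviewer: Optional[str] = None  # name of the human who checked the original source
    confirmed: bool = False         # True only if the source exists and supports the claim

    def sign_off(self, reviewer: str, confirmed: bool) -> None:
        """Log who checked this reference and what they concluded."""
        self.reviewer = reviewer
        self.confirmed = confirmed


def unverified(checks: List[CitationCheck]) -> List[str]:
    """Return references that have no named reviewer or were not confirmed."""
    return [c.reference for c in checks if not (c.reviewer and c.confirmed)]


def ready_to_publish(checks: List[CitationCheck]) -> bool:
    """A deliverable is releasable only when nothing is left unverified."""
    return not unverified(checks)


# Usage with made-up references: one checked, one still outstanding.
checks = [CitationCheck("Hypothetical source A"), CitationCheck("Hypothetical source B")]
checks[0].sign_off("J. Reviewer", confirmed=True)
print(unverified(checks))        # ['Hypothetical source B']
print(ready_to_publish(checks))  # False
```

The design choice worth noting is that the default state is "unverified": a reference that nobody has looked at blocks publication, rather than slipping through because no one flagged it.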
For now, clients and regulators should be paying close attention — and specifically asking whether these kinds of verification workflows and transparency controls are firmly embedded in the firms’ processes. Because when AI errors move from research reports into audited financial statements, the consequences will be significantly harder to walk back.







