Published: 2026-06-20 19:43 UTC. Updated: 2026-06-21 23:31 UTC.

When AI Files Your Taxes: Who Pays When It Fails

In 2026, AI tax filing tools have surged in popularity, but tests show they frequently miscalculate refunds by thousands of dollars. Users bear full legal responsibility, while AI companies disclaim liability. This article analyzes the risks and regulatory gaps of AI tax preparation.

Source: Hacker News AI. Author: dxs

June 20, 2026

Tax season 2026 arrived with a peculiar new ritual. Across kitchen tables and home offices, millions of filers uploaded W-2s, 1099s, and brokerage statements not to a human accountant, but to an algorithmic system promising speed, savings, and superior accuracy. The pitch was irresistible: why pay thousands for a professional when an AI agent can ingest your financial life, cross-reference the tax code, and spit out an optimised return in minutes?

One early adopter, Mike Todasco, documented the experiment on his Substack in vivid detail. He pointed OpenAI's Codex at a folder of tax documents, fed it a master prompt, and waited. Three hours and roughly twenty dollars later, the system had processed his return, a task that would have cost him around ten thousand dollars with his usual accountant. The post went viral. The implication was unmistakable: the AI tax revolution had arrived, and it was cheap.

But here is the question nobody racing to upload their documents seems to be asking. When the algorithm gets it wrong, and the evidence suggests it will, who exactly picks up the bill?

The Allure of the Algorithmic Accountant

The shift from tax software to tax agents is one of the defining themes of the 2026 filing season. Having AI “do” your taxes now means deploying large language models and agentic AI systems that pull data from financial institutions, read blurry 1099-K photographs using optical character recognition, categorise thousands of Venmo transactions, reconcile brokerage statements, and surface recent changes in tax law. Intuit, the company behind TurboTax, has gone all in on what it calls “done-for-you” experiences. Its AI engine, Intuit Assist, uses both traditional and generative AI to provide personalised recommendations and flag potential errors in real time. It can even deploy a specialised agent, the “1099 Cost Agent,” that ingests supplemental PDF forms and reasons through stock sales to identify the correct cost basis.
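To make the cost-basis task concrete, here is a minimal sketch of matching a stock sale against earlier purchase lots. It assumes first-in, first-out (FIFO) lot matching and uses made-up lots and prices; it is not how Intuit's 1099 Cost Agent works, which the company has not published. The point is that the reported gain depends on which lots are deemed sold, so a misread date or quantity on a supplemental PDF flows straight into the return.

```python
from collections import deque
from dataclasses import dataclass
from decimal import Decimal
from typing import Deque, Iterable


@dataclass
class Lot:
    """One purchase of shares (hypothetical data for illustration)."""
    shares: int
    cost_per_share: Decimal


def fifo_gain(lots: Iterable[Lot], shares_sold: int, price: Decimal) -> Decimal:
    """Realised gain on a sale, ASSUMING first-in, first-out lot matching."""
    # Copy the lots so the caller's list is not mutated.
    queue: Deque[Lot] = deque(Lot(lot.shares, lot.cost_per_share) for lot in lots)
    remaining = shares_sold
    basis = Decimal("0")
    while remaining > 0:
        if not queue:
            raise ValueError("sold more shares than were purchased")
        oldest = queue[0]
        used = min(oldest.shares, remaining)
        basis += used * oldest.cost_per_share  # cost basis of shares taken from this lot
        oldest.shares -= used
        remaining -= used
        if oldest.shares == 0:
            queue.popleft()  # lot fully consumed; move on to the next-oldest
    return shares_sold * price - basis


# Two hypothetical purchases, then a sale of 120 shares at $40.
lots = [Lot(100, Decimal("20")), Lot(50, Decimal("35"))]
print(fifo_gain(lots, 120, Decimal("40")))  # 2100
```

Under FIFO the basis is $2,700 and the gain $2,100. Matching the most recent lot first instead would put the basis at $3,150 and the gain at $1,650 for the same sale, which is why the lot-matching method matters as much as the arithmetic.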

Intuit announced in early 2026 that it had paired advanced agentic AI with a nationwide network of 13,000 human experts, creating what it describes as the only all-in-one consumer platform for year-round personal finance management. Credit Karma's Tax Assistant, another Intuit product, claims that members with simple tax situations who answer quick questions throughout the year can have up to 80 per cent of their Tax Year 2025 returns ready to go by filing time. TurboTax Live Assisted is marketed as “the only tax filing solution on the market that provides customers an expert final review at no added cost, ensuring 100 percent accuracy and maximum refund guaranteed.” That guarantee, notably, applies to the human-reviewed product, not to the AI outputs alone.

The competition is just as aggressive. H&R Block launched AI Tax Assist, a product designed to streamline preparation for individuals, the self-employed, and small-business owners. Newer entrants like Hive Tax AI can pull in years of past financial data, automatically organise transactions, and help identify missed deductions. TaxGPT markets itself as an AI tax assistant for individuals, promising to simplify the filing process through conversational interfaces. The message from every corner of the industry is the same: the machines are ready.

Yet the machines, it turns out, are not nearly as ready as the marketing suggests.

When the Maths Does Not Add Up

In early 2025, The New York Times conducted a test that should give every aspiring AI tax filer pause. Reporters ran eight fictional tax scenarios, developed in partnership with tax-filing service TaxSlayer, through four leading AI chatbots: Google's Gemini, OpenAI's ChatGPT, Anthropic's Claude, and xAI's Grok. The chatbots were provided with all necessary forms. The result was sobering. On average, the tools miscalculated the refund or amount owed to the IRS by more than two thousand dollars.

The Times attributed the failures to a fundamental design limitation: AI chatbots do not truly understand the complex relationships among the pieces of information they process, and errors accumulate as tasks become more interconnected. Benedict Evans, a prominent technology analyst, told the newspaper that “the problem with taxes is all those very small little details matter, and it's not going to get every single little detail right.” He acknowledged that the models improve dramatically every six months, but added that they still only give “roughly the right answer,” which is not sufficient for taxes.

The nature of these failures matters as much as their frequency. Large language models are probabilistic systems. They generate outputs based on statistical patterns in their training data, not by executing deterministic calculations. This means that the same input can produce different outputs on different runs, a characteristic that is fundamentally incompatible with the precision required in tax preparation. As multiple experts have noted, the results are “unexplainable” in the formal sense: you cannot go back and audit the reasoning chain the way you can with traditional tax software, where every calculation is traceable to a specific rule in the code.
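The contrast between traceable and untraceable calculation is easiest to see in code. The sketch below is a deterministic calculation that logs which rule produced each intermediate figure; the rule labels, input amounts, and the two-step computation are hypothetical simplifications, not any vendor's actual engine. Run it twice on the same input and you get the same answer and the same audit trail, which is the property a sampled language-model answer does not guarantee.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AuditTrail:
    """Records (rule label, value) for every intermediate figure."""
    steps: List[Tuple[str, int]] = field(default_factory=list)

    def record(self, rule: str, value: int) -> int:
        self.steps.append((rule, value))
        return value


def taxable_income(wages: int, interest: int, deduction: int) -> Tuple[int, AuditTrail]:
    """Toy two-step calculation; the rule labels are hypothetical."""
    trail = AuditTrail()
    total = trail.record("total income = wages + interest", wages + interest)
    taxable = trail.record(
        "taxable income = max(0, total income - deduction)", max(0, total - deduction)
    )
    return taxable, trail


# Same input, run twice: identical result and identical trail.
first, trail_a = taxable_income(50_000, 1_200, 15_000)
second, trail_b = taxable_income(50_000, 1_200, 15_000)
assert first == second and trail_a.steps == trail_b.steps
for rule, value in trail_a.steps:
    print(f"{rule}: {value}")
```

Every printed line names the rule behind the number, so a reviewer can check each step against the code rather than trying to reconstruct a model's reasoning after the fact.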

Independent benchmarking has confirmed the scale of the problem. TaxCalcBench, a rigorous evaluation framework created by Column Tax and published on arXiv in July 2025, tested frontier models on their ability to calculate personal income tax returns. The benchmark uses 51 test cases representing a range of personal tax situations, and a return is considered “correct” only if every evaluated field matches the expected value exactly, reflecting the IRS's own standard. The results were stark. Gemini 2.5 Pro achieved just 32.4 per cent strict accuracy and Claude Opus 4 managed 27.5 per cent; GPT-5, the strongest of the three, reached only 41.7 per cent. Common failure modes included consistent misuse of tax tables, errors in tax calculation, and incorrect eligibility determinations.
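The “misuse of tax tables” failure shows why exact-match scoring is so unforgiving. Below $100,000 of taxable income, Form 1040 filers are told to read their tax from the IRS Tax Table rather than apply the bracket formula to their exact income, and the two approaches can differ by a few dollars. The sketch assumes $50-wide table rows priced at their midpoint and uses hypothetical round-number brackets, not the real schedule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical brackets for illustration only (NOT the real IRS schedule).
# Each entry is (upper bound of the bracket, marginal rate); None = no upper bound.
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("50000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def formula_tax(income: Decimal) -> Decimal:
    """Apply the marginal-rate formula to the exact income, rounded to whole dollars."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if upper is None or income <= upper:
            tax += (income - lower) * rate  # partial slice in the top bracket reached
            break
        tax += (upper - lower) * rate  # full slice of a lower bracket
        lower = upper
    return tax.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def table_tax(income: Decimal, row_width: Decimal = Decimal("50")) -> Decimal:
    """ASSUMED table convention: find the row containing the income, tax its midpoint."""
    row_start = (income // row_width) * row_width
    return formula_tax(row_start + row_width / 2)


income = Decimal("40010")
print(formula_tax(income), table_tax(income))  # 7002 7005
```

Under exact-match grading, a model that applies the formula to $40,010 reports $7,002 where the table gives $7,005, and the entire return is marked wrong over three dollars.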

Even Filed, a company that uses a multi-agent architecture with validation layers, achieved only 72.5 per cent strict accuracy on complete federal returns, although it reached 94 per cent when scored line by line. Patrick McKenzie, the well-known fintech commentator, has cited 2026 to 2028 as the AI industry's consensus window for when large language models might genuinely be able to “do taxes.” Column Tax itself concluded that the task is unlikely to be automated by the end of 2026, and that achieving it will require strong tax domain expertise and proprietary datasets that go well beyond what general-purpose language models currently possess.
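The gap between Filed's 72.5 per cent and 94 per cent figures comes down to how results are scored. A minimal sketch of the two metrics, using invented data and hypothetical line names, shows how one small error that propagates into later lines fails an entire return under strict scoring while only denting the line-by-line score.

```python
from typing import Dict, List, Tuple

# Each graded return maps a (hypothetical) line name to (expected, produced).
GradedReturn = Dict[str, Tuple[int, int]]


def strict_accuracy(returns: List[GradedReturn]) -> float:
    """Share of returns in which every evaluated line matches exactly."""
    fully_correct = sum(
        all(expected == produced for expected, produced in r.values()) for r in returns
    )
    return fully_correct / len(returns)


def line_accuracy(returns: List[GradedReturn]) -> float:
    """Share of individual lines that match, pooled across all returns."""
    pairs = [pair for r in returns for pair in r.values()]
    return sum(expected == produced for expected, produced in pairs) / len(pairs)


graded = [
    {"agi": (52_000, 52_000), "taxable_income": (37_000, 37_000),
     "tax": (4_200, 4_200), "refund": (800, 800)},
    # A $3 slip in tax carries into the refund: two wrong lines, one wrong return.
    {"agi": (61_000, 61_000), "taxable_income": (46_000, 46_000),
     "tax": (5_300, 5_303), "refund": (150, 147)},
]
print(strict_accuracy(graded), line_accuracy(graded))  # 0.5 0.75
```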

NerdWallet published its own analysis in March 2026, testing ChatGPT, Gemini, and Perplexity on seven tax questions. The team combed through more than 50,000 words of chat transcripts and found that while the chatbots performed well on black-and-white questions, they produced inconsistent answers when the same question was asked multiple times and made assumptions about users that could lead to personalised errors. Sam Taube, NerdWallet's lead writer for investing and taxes, noted that “a couple of years ago, even the cutting-edge AI models couldn't reliably do basic arithmetic,” and that while recent updates have improved their maths skills, “the tendency to cite nonexistent, 'hallucinated' cases in response to legal questions still comes up in 2026.” His summary was blunt: “Taxes involve both of those subjects, math and law. It's not a reliable source of truth yet.”
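Repeat-question testing of the kind NerdWallet describes can be scored simply: ask the same question several times and measure how often the answers agree. The sketch below assumes the answers have already been collected and normalised to strings; the sample answers are invented, and a real harness would also need to normalise formatting before comparing.

```python
from collections import Counter
from typing import List, Tuple


def agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the share of runs that gave it."""
    if not answers:
        raise ValueError("no answers to compare")
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / len(answers)


# Invented answers from five runs of the same (hypothetical) question.
runs = ["$2,400", "$2,400", "$1,900", "$2,400", "$2,150"]
answer, rate = agreement(runs)
print(answer, rate)  # $2,400 0.6
```

A rate below 1.0 on a question with a single correct answer means at least some runs were wrong. A rate of 1.0 does not prove the answer is right, though; a model can be consistently wrong.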

There is an uncomfortable irony here. Intuit's own vice president of product management has publicly acknowledged that generative AI “doesn't do well with math yet,” which is why TurboTax does not use AI for its actual calculations. Making sure tax code outcomes are accurate, the executive said, is “always job number 1A,” adding: “We don't feel that generative AI is at a point yet where it can do that.” The company that sells the most popular tax software in the world is telling you, in effect, that AI cannot do the thing that millions of people are increasingly using AI to do.

The Accountability Void

If the accuracy picture is complicated, the liability picture is worse. When you sign your tax return, you attest under penalty of perjury that the information is accurate to the best of your knowledge. The IRS holds you accountable for your return's accuracy regardless of what tools or methods you used in preparation. There is no special category for AI-assisted errors. No safe harbour protects you from liability based on reliance on algorithmic outputs. If the AI is wrong, the IRS treats that error as your mistake.

This creates a structural asymmetry that ought to trouble anyone who has uploaded a PDF to a chatbot and clicked “file.” The companies building these tools bear minimal liability for the advice they generate. No contract exists between you and the AI in any meaningful sense. No professional liability insurance covers AI errors. No licensing board can sanction an algorithm for providing incorrect advice. The terms of service for virtually every consumer AI product disclaim responsibility for the accuracy of outputs, often in language buried deep in documents that almost nobody reads.

The contrast with traditional tax preparation is instructive. When you hire a human accountant or a CPA, that professional is bound by licensing requirements, ethical codes, and professional liability standards. If they make an error, there are established mechanisms for recourse: malpractice claims, professional disciplinary proceedings, and often errors-and-omissions insurance that can cover the financial damage. None of these mechanisms exist for AI tax tools. The technology occupies a regulatory gap between “software tool,” which carries product liability, and “professional service,” which carries professional liability. It is treated as neither, and thus escapes both frameworks.

Laura Carrubba, an accounting instructor at George Mason University, has warned bluntly that filers should “never, ever upload any kind of sensitive personal information into a public forum like that.” The privacy risks alone are substantial, but the liability exposure is arguably worse. As one tax professional put it to reporters: “The alibi can't be that ChatGPT told me to do it; that's kind of equivalent to the dog ate my homework.”

For tax professionals who use AI tools in their practice, the picture is somewhat different but no less fraught. Practitioners remain professionally liable for supervising AI-generated advice, ensuring its accuracy in the context of intricate tax laws and client-specific circumstances, and validating recommendations before presenting them to clients. AI developers may bear some responsibility for tool reliability, but current service agreements shift most liability to users. As one widely cited legal analysis put it, “the blame game is perhaps the same as it ever was; the responsibility for competent advice lies with the tax professionals who employ these and other tools.”

Canadian tax professionals have already reported a troubling pattern. A survey found that businesses are losing money after relying on AI tools for financial and tax advice, with tax professionals spotting mistakes on a regular basis. The problem, they warn, is not hypothetical. It is materialising now.

A Landmark Ruling and Its Ripple Effects

The legal landscape shifted significantly in February 2026, when Judge Jed Rakoff of the Southern District of New York issued what appears to be the first ruling to squarely address privilege claims involving generative AI. In United States v. Heppner, the defendant, a corporate executive charged with securities fraud, wire fraud, and making false statements to auditors in connection with an alleged scheme to defraud investors of approximately 150 million dollars, had used a consumer version of Ant
