Dec 3, 2025
Why Generic AI Fails in Capital Markets
| Author | Audience |
|---|---|
| Joe Barhouch, AI Engineer | Business |
The numbers tell a clear story: 42% of companies abandon AI initiatives before they ever reach production. And in financial markets, the cost of failed AI is far higher than wasted spend — it’s missed opportunities and increased risk.
While an impressive 78% of organisations use AI in at least one function, a mere 1% believe they're at maturity. In capital markets, these aren't just disappointing statistics. They are signals of a fundamental mismatch between what generic AI offers and what financial enterprises actually need.
The firms that are getting AI right have all learned the same lesson: finance needs AI with domain knowledge embedded at every layer — not generic models bolted onto workflows.
The Specific Challenges of Capital Markets
ChatGPT impresses in demos, and pilots show promise, but the gaps become clear once these tools are applied to complex financial workflows. The question isn't whether generic AI is powerful; it's whether power without precision matters when the stakes are regulatory compliance and capital allocation.
Capital markets face challenges that make generic AI insufficient in four critical ways:
1. The High Cost of Error
Imagine an analyst asking for the portfolio’s Q3 exposure to the tech sector through a generic AI assistant layered over position data. The system returns $50M, a number that looks reasonable, passes the gut check, and ends up in a risk committee deck. The actual exposure was $200M because the model applied the wrong time period, the wrong aggregation logic, or the wrong entity resolution.
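To see how easily this happens, here is a minimal sketch in which one wrong filter produces a plausible-looking figure; the table, column names, and numbers are all hypothetical:

```python
import pandas as pd

# Hypothetical position data -- every name and figure here is illustrative.
positions = pd.DataFrame({
    "sector":   ["Tech", "Tech", "Tech", "Tech"],
    "quarter":  ["Q2",   "Q2",   "Q3",   "Q3"],
    "notional": [30e6,   20e6,   120e6,  80e6],
})

# What a generic model might silently do: filter on the wrong period.
reported = positions[positions["quarter"] == "Q2"]["notional"].sum()

# What the analyst actually asked for: Q3 exposure.
actual = positions[positions["quarter"] == "Q3"]["notional"].sum()

print(f"Reported: ${reported / 1e6:.0f}M")  # Reported: $50M
print(f"Actual:   ${actual / 1e6:.0f}M")    # Actual:   $200M
```

Nothing in the output hints that the period filter was wrong; the number simply looks reasonable.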
In marketing, a wrong recommendation might mean a weak campaign. In capital markets, the same failure mode can translate into misreported risk, misallocated capital, and potential compliance breaches. Generic AI was never designed to meet capital‑markets standards for accuracy, auditability, and control.
2. The Structured and Unstructured Data Split
Financial decisions require a synthesis of both structured data (from databases, trading systems, and CRMs) and unstructured data (news, research reports, and earnings transcripts). Although 90% of enterprise data is unstructured and grows three times faster than structured data, less than half of it is used in generative AI today.
Generic AI treats these as separate problems: query databases with one approach, search documents with another. Financial workflows, however, don't work that way. You need portfolio positions from your database and the latest analyst report on those holdings while understanding what might change your calculation methodology. Fragmenting these inputs means fragmenting your answer.
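As a sketch of what that synthesis looks like inside one workflow, assume a toy SQLite position store and a handful of research notes; all names and data below are hypothetical:

```python
import sqlite3

# Structured side: a toy position store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE positions (ticker TEXT, sector TEXT, notional REAL)")
db.executemany("INSERT INTO positions VALUES (?, ?, ?)",
               [("AAPL", "Tech", 120e6), ("MSFT", "Tech", 80e6)])

# Unstructured side: research notes with publication dates.
documents = [
    {"date": "2024-11-02", "ticker": "AAPL", "text": "Margin outlook revised..."},
    {"date": "2022-03-15", "ticker": "AAPL", "text": "Older, pre-restructuring..."},
]

def answer(sector: str) -> dict:
    """One question, two data types: positions from SQL, context from documents."""
    rows = db.execute(
        "SELECT ticker, notional FROM positions WHERE sector = ?", (sector,)
    ).fetchall()
    # Pair each holding with its most recent research note, not just any match.
    notes = {ticker: max((d for d in documents if d["ticker"] == ticker),
                         key=lambda d: d["date"], default=None)
             for ticker, _ in rows}
    return {"positions": rows, "research": notes}

print(answer("Tech"))
```

A generic stack would route each half of this question to a separate tool, with nothing connecting the positions it found to the research it retrieved.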
3. The "Looks Right But Is Wrong" Problem
This is the killer problem. A wealth advisor might ask an agent to calculate weekly returns for a client portfolio, something simple yet often slowed down by fragmented data and tools. The agent generates a 40‑line SQL query with multiple joins and correct syntax; the query runs without errors, and a neat chart appears. Everything looks correct.
Under the surface, the agent took a naive arithmetic average of daily returns instead of compounding them into a weekly figure. Or it matched “Apple” to the wrong entity because several names in your systems contain “Apple.” It might even pull a research report from 2022 instead of 2024 because the older document scored higher on cosine similarity to the query; high similarity does not always mean a document is the most relevant or up to date.
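A toy illustration with made-up daily figures shows how quietly the returns error creeps in; both numbers below would produce an equally tidy chart:

```python
# Hypothetical daily returns for one week.
daily = [0.02, -0.01, 0.03, -0.02, 0.01]

# Naive: add the daily returns up (equivalently, average them and scale).
naive_weekly = sum(daily)                    # 3.0000%

# Correct: compound the daily returns into a weekly figure.
compounded = 1.0
for r in daily:
    compounded *= 1 + r
true_weekly = compounded - 1                 # ~2.9485%

print(f"Naive:      {naive_weekly:.4%}")
print(f"Compounded: {true_weekly:.4%}")
```

The gap here is small, but it widens with volatility and horizon, and it is invisible in the output.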
These errors are very hard to spot by inspection. The output appears valid and the logic seems sound, but the foundation is wrong, and every decision built on it quietly compounds the mistake.
4. The Lack of Auditability
When auditors ask how you calculated exposure, "ChatGPT generated it" isn't an answer. They need to see the full lineage: Who asked what question? What data sources were accessed? Which queries were executed? What reports were retrieved? What logic was applied?
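As a sketch of what a complete lineage record could capture, with field names that are illustrative rather than any regulatory schema:

```python
import datetime
import hashlib
import json

def lineage_record(user: str, question: str, sql: str,
                   documents: list, methodology: str) -> dict:
    """Record who asked what, what ran, what was retrieved, and what logic applied."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "question": question,
        "sql_executed": sql,
        "documents_retrieved": documents,
        "methodology": methodology,
    }
    # A content hash makes the record tamper-evident after the fact.
    record["checksum"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

print(json.dumps(lineage_record(
    user="analyst_42",
    question="Q3 tech sector exposure",
    sql="SELECT SUM(notional) FROM positions WHERE sector = 'Tech' AND quarter = 'Q3'",
    documents=["research/tech_outlook_2024-11.pdf"],
    methodology="gross notional, period-end",
), indent=2))
```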
Generic AI provides lacklustre auditability. Even when logs exist, reconstructing the decision path is difficult: you can't easily verify which SQL ran, which documents were fetched, or why the system chose one calculation method over another. Trusting answers without a complete audit trail isn't just risky; in regulated markets, it's unacceptable.
The Limits of Generic Tools
Over 80% of financial organisations are integrating AI, but "integrating AI" and "having AI that works in production" are two very different things. This is evidenced by 60% of AI leaders citing legacy integration and compliance as primary barriers, and 95% struggling with the hybrid multi-cloud environments that house sensitive data.
ChatGPT, Claude, and Gemini are remarkable tools that democratised AI and showed what's possible. They were, however, built for general use cases, not capital markets. They don't understand entity resolution in financial databases. They can't distinguish between similarly named securities. They don't know your firm's calculation methodologies or data science standards, and they can't provide the audit trail regulators require.
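The entity problem is easy to reproduce with naive name matching; the tickers below are real, and the matching logic is the point:

```python
# Two distinct issuers whose names both contain "Apple".
securities = [
    {"name": "Apple Inc.",             "ticker": "AAPL"},
    {"name": "Apple Hospitality REIT", "ticker": "APLE"},
]

# Naive substring matching -- effectively what a generic model does.
matches = [s for s in securities if "apple" in s["name"].lower()]

# Both rows match; the model picks one, and the answer inherits the choice.
print(matches)
```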
The architecture that works for writing emails or summarising articles simply doesn't extend to querying complex financial databases or synthesising structured and unstructured data with regulatory precision.
The Reality of Production
Most firms are still trying to make off-the-shelf AI behave like enterprise software: adding prompts, building wrappers, layering on guardrails. The hope is that with enough engineering, generic AI will eventually work for financial workflows.
But prompt engineering can’t overcome architectural limitations. You can’t prompt your way to entity resolution. You can’t bolt on guardrails that teach an AI your firm’s calculation logic. And you can’t turn a black-box model into something fully transparent and auditable by putting a wrapper around it.
The firms that are actually in production—not just experimenting, but scaling AI across investment, risk, and research—built differently from day one. They recognised that finance needs AI with domain expertise embedded at every layer, not retrofitted through prompts.
The split is already underway. Organisations that invested in domain-native architecture are achieving real deployment and measurable value. Those relying on generic tools are still explaining why their pilots haven’t scaled.
