Fable 5 Is Back, So We Ran an OWASP Top 10 Security Assessment on PentaTrail's Code

When a strong model for verification lands in your lap, point it at your own code first

As we wrote in Superhuman AI Needs Source Code, Anthropic's top-tier models that read source code to find vulnerabilities (Fable 5 / Mythos) had been suspended under export controls. Fable 5 is now available again.

We ran an OWASP Top 10 self-audit against PentaTrail's codebase. This post is a record of exactly that.

Method

We split the ten OWASP Top 10 lenses across six agents, ran them in parallel, and had them flag findings by Critical / High / Medium / Low. Asking a single agent to "look at everything" is weaker than narrowing the lens and fixing the role: with a narrow scope, precision goes up and oversights go down.

Static review: not black-box probing from the outside, but reading the latest source code directly (UI, APIs, database definitions, scanners), with the recent fixes already in.
Parallel, background execution: the six ran concurrently; results were collected as each finished and reconciled.
Severity classification: each finding was sorted into Critical / High / Medium / Low.

Each agent got the same rules: every finding must cite a real location (file and line); don't write from imagination; don't fix one instance of a pattern and stop, sweep them all. The six don't see each other's conclusions. Looking at the same code from independent angles, rather than making one inspector smarter, is what holds up against the quiet oversights.
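The orchestration described above can be sketched roughly as follows. This is a minimal illustration, not our actual harness: the `Finding` record, `is_admissible`, and `run_reviews` are hypothetical names, each reviewer is modeled as a plain callable standing in for an agent, and a thread pool is assumed as the way to run them concurrently. The point it shows is the "cite a real location" rule being enforced mechanically, so that a finding pointing at a file or line that does not exist is rejected before anyone reads it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from pathlib import Path

SEVERITIES = ("Critical", "High", "Medium", "Low")


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reported finding."""
    lens: str       # e.g. "A03"
    severity: str   # one of SEVERITIES
    file: str       # path relative to the repository root
    line: int       # 1-based line number
    summary: str


def is_admissible(finding, repo_root):
    """Accept a finding only if its severity is known and file:line exists."""
    if finding.severity not in SEVERITIES:
        return False
    path = Path(repo_root) / finding.file
    if not path.is_file():
        return False
    with path.open(encoding="utf-8", errors="replace") as fh:
        line_count = sum(1 for _ in fh)
    return 1 <= finding.line <= line_count


def run_reviews(reviewers, repo_root):
    """Run every reviewer concurrently and collect results as each finishes.

    `reviewers` maps a name to a callable taking the repo root and returning
    a list of Finding objects. Reviewers never see each other's output.
    """
    accepted, rejected = [], []
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {pool.submit(fn, repo_root): name
                   for name, fn in reviewers.items()}
        for future in as_completed(futures):
            for finding in future.result():
                bucket = accepted if is_admissible(finding, repo_root) else rejected
                bucket.append(finding)
    return accepted, rejected
```

Rejected findings are kept rather than silently dropped, so a reviewer that keeps citing nonexistent locations is itself visible.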

How we split the ten lenses

The OWASP Top 10 distills the weaknesses most common in web apps into ten categories. The A01 to A10 numbering used here follows the 2021 edition.

Agent 1: Access Control (A01) + Authentication (A07) — can you see someone else's data, or impersonate them?
Agent 2: Cryptography (A02) + Integrity (A08) — misplaced secrets, tampering, supply-chain trust
Agent 3: Injection (A03) — into SQL, the page, or a command
Agent 4: Insecure Design (A04) + Logging & Monitoring (A09) — logic holes, can't notice / can't trace
Agent 5: Misconfiguration (A05) + Outdated Components (A06) — loose headers/permissions, known flaws in dependencies
Agent 6: SSRF (A10) — tricking the server into reaching inside
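One way to keep a split like this honest is to write it down as data and check that every category is covered exactly once. The sketch below mirrors the assignment listed above; the `ASSIGNMENTS` dictionary and the `coverage_problems` helper are illustrative names, not part of any tool we used.

```python
from collections import Counter

# The six-agent split from the list above, keyed by agent.
ASSIGNMENTS = {
    "Agent 1": ["A01", "A07"],  # access control + authentication
    "Agent 2": ["A02", "A08"],  # cryptography + integrity
    "Agent 3": ["A03"],         # injection
    "Agent 4": ["A04", "A09"],  # insecure design + logging & monitoring
    "Agent 5": ["A05", "A06"],  # misconfiguration + outdated components
    "Agent 6": ["A10"],         # SSRF
}


def coverage_problems(assignments):
    """Return (missing, duplicated, unknown) category codes for a split."""
    expected = {f"A{n:02d}" for n in range(1, 11)}
    counts = Counter(code for codes in assignments.values() for code in codes)
    assigned = set(counts)
    missing = sorted(expected - assigned)
    duplicated = sorted(code for code, n in counts.items() if n > 1)
    unknown = sorted(assigned - expected)
    return missing, duplicated, unknown
```

An empty result in all three lists means each of the ten lenses has exactly one owner.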

Don't stop at "found it"—verify against real data

This is the crux. When an AI says "this is dangerous," that's only a hypothesis. Take self-reported "confirmed" findings at face value and the count usually inflates.

The key is to look at the state as it is right now, not the change history (the diffs): the permission settings, the function bodies, whether a key is truly revoked—settled by fact, not by guess. In fact, several findings that looked dangerous fell away once we queried the live database and found them already properly closed, or already revoked. Skip this step, and holes you thought you'd sealed remain.
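As a rough illustration of treating each finding as a hypothesis and settling it against current state, consider the sketch below. Everything in it is assumed for the example: an in-memory SQLite database stands in for the live database, and the `api_keys` table, its `revoked_at` column, the key names, and the timestamp are placeholders rather than PentaTrail's schema or data. Each hypothesis carries a check that queries the state as it is now, and only those whose check still holds are kept.

```python
import sqlite3


def key_is_active(conn, key_id):
    """True if the key exists and has no revocation timestamp right now."""
    row = conn.execute(
        "SELECT revoked_at FROM api_keys WHERE id = ?", (key_id,)
    ).fetchone()
    return row is not None and row[0] is None


def confirm_against_live_state(conn, hypotheses):
    """Split hypotheses into those the current state confirms and the rest.

    `hypotheses` is a list of (title, check) pairs, where check(conn)
    returns True only if the claimed problem is present in the live state.
    """
    confirmed, dismissed = [], []
    for title, check in hypotheses:
        (confirmed if check(conn) else dismissed).append(title)
    return confirmed, dismissed


def demo():
    # Placeholder data: one key already revoked, one still live.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE api_keys (id TEXT PRIMARY KEY, revoked_at TEXT)")
    conn.executemany(
        "INSERT INTO api_keys VALUES (?, ?)",
        [("key-old", "2000-01-01T00:00:00Z"), ("key-live", None)],
    )
    hypotheses = [
        ("key-old is still usable", lambda c: key_is_active(c, "key-old")),
        ("key-live is still usable", lambda c: key_is_active(c, "key-live")),
    ]
    return confirm_against_live_state(conn, hypotheses)
```

In this toy setup the claim about `key-old` is dismissed because the database says the key is already revoked, which is the same shape as the findings that fell away in our run.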

The result — by Critical / High / Medium / Low

OWASP Top 10	Medium	Low	What it was (blurred)
A01 Broken Access Control	0	0	Tenant boundaries and permissions. No crossing even under live DB checks — zero findings
A02 Cryptographic Failures	0	2	Tightening how secrets are handled, operationally
A03 Injection	0	1	Wrapping the spots that handle external input one layer thicker, just in case
A04 Insecure Design	0	3	Resistance to unexpected use, and failing to the safe side
A05 Security Misconfiguration	2	4	Nudging header/permission defaults toward the stricter side
A06 Vulnerable/Outdated Components	1	1	Bringing a few dependencies up to newer versions
A07 Auth Failures	0	0	Passkey-centric auth is sound — zero findings
A08 Integrity Failures	1	1	Raising the confidence in the provenance of ingested data and executables
A09 Logging & Monitoring	0	2	Masking records and not leaving extra info in production
A10 SSRF	1	2	Enforcing that the server's outbound path is strictly pinned
Total	5	16

Critical and High are zero in every lens. What remained were 5 Medium and 16 Low findings, all of them "one notch thicker" hardening items for the layered defense and none an exploitable fatal flaw. We fixed every finding the same day.
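For readers who want to recheck the totals, the table can be tallied in a few lines. The counts below are copied from the table; the `RESULTS` name and helper functions are just for this example.

```python
# (Medium, Low) counts per lens, copied from the results table.
RESULTS = {
    "A01": (0, 0),
    "A02": (0, 2),
    "A03": (0, 1),
    "A04": (0, 3),
    "A05": (2, 4),
    "A06": (1, 1),
    "A07": (0, 0),
    "A08": (1, 1),
    "A09": (0, 2),
    "A10": (1, 2),
}


def totals(results):
    """Sum the Medium and Low columns across all lenses."""
    medium = sum(m for m, _ in results.values())
    low = sum(l for _, l in results.values())
    return medium, low


def clean_lenses(results):
    """Lenses with no findings at any listed severity."""
    return sorted(code for code, (m, l) in results.items() if m + l == 0)
```

Here `totals(RESULTS)` gives 5 Medium and 16 Low, and `clean_lenses(RESULTS)` returns A01 and A07, matching the two zero-finding rows.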

Wrapping up

Given all the attention Fable 5 has been getting, the depth and substance of the findings really did live up to its reputation. What matters is not that no holes appear but whether you have a mechanism that keeps finding them, so we are going to build a way to run this assessment on a regular basis.
