Fable 5 Is Back, So We Ran an OWASP Top 10 Security Assessment on PentaTrail's Code
When a strong model for verification lands in your lap, point it at your own code first
As we wrote in Superhuman AI Needs Source Code, Anthropic's top-tier models that read source code to find vulnerabilities (Fable 5 / Mythos) had been suspended under export controls. Fable 5 is now available again.
We ran an OWASP Top 10 self-audit against PentaTrail's codebase. This post is a record of exactly that.
Method
We split the ten OWASP Top 10 lenses across six agents, ran them in parallel, and had them rate each finding as Critical, High, Medium, or Low. Asking a single agent to "look at everything" is weaker than narrowing the lens and fixing the role: precision goes up and oversights go down.
- Static review: not black-box probing from the outside, but reading the latest source code directly (UI, APIs, database definitions, scanners), with the recent fixes already in.
- Parallel, background execution: the six ran concurrently; results were collected as each finished and reconciled.
- Severity classification: each finding was sorted into Critical / High / Medium / Low.
Each agent got the same rules: every finding must cite a real location (file and line); don't write from imagination; don't fix one instance of a pattern and stop, sweep them all. The six agents don't see each other's conclusions. Looking at the same code from independent angles, rather than making one inspector smarter, is what holds up against the quiet oversights.
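The method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PentaTrail's actual harness: the `Finding` record, `run_audit`, and the `reviewer` callable are hypothetical names, and a thread pool stands in for whatever runs the agents. It shows the three mechanical parts of the method: each agent gets only its own lenses, results are collected as each agent finishes, and every finding carries a severity plus a file and line.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

SEVERITIES = ("Critical", "High", "Medium", "Low")


@dataclass(frozen=True)
class Finding:
    lens: str      # OWASP category, e.g. "A03"
    severity: str  # one of SEVERITIES
    file: str      # cited source file
    line: int      # cited line number (1-based)
    summary: str

    def __post_init__(self):
        # Reject findings with no cited location. This checks that a location
        # is present; it cannot check that the location is real.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not self.file or self.line < 1:
            raise ValueError("a finding must cite a file and a line")


def run_audit(assignments, reviewer):
    """Run one reviewer call per agent in parallel; bucket findings by severity.

    `reviewer(agent, lenses)` is a hypothetical callable that returns a list
    of Finding objects. Each call sees only its own lenses.
    """
    buckets = {severity: [] for severity in SEVERITIES}
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [
            pool.submit(reviewer, agent, lenses)
            for agent, lenses in assignments.items()
        ]
        # Collect results as each agent finishes, not in submission order.
        for future in as_completed(futures):
            for finding in future.result():
                buckets[finding.severity].append(finding)
    return buckets
```

Because no call receives another agent's output, the independence rule holds by construction; reconciliation happens only after all results are in.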
How we split the ten lenses
The OWASP Top 10 distills the weaknesses most common in web apps into ten shapes.
- Agent 1: Access Control (A01) + Authentication (A07) — can you see someone else's data, or impersonate them?
- Agent 2: Cryptography (A02) + Integrity (A08) — misplaced secrets, tampering, supply-chain trust
- Agent 3: Injection (A03) — into SQL, the page, or a command
- Agent 4: Insecure Design (A04) + Logging & Monitoring (A09) — logic holes, and incidents you can't notice or trace
- Agent 5: Misconfiguration (A05) + Outdated Components (A06) — loose headers/permissions, known flaws in dependencies
- Agent 6: SSRF (A10) — tricking the server into reaching inside
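Written as data, the split above is easy to check mechanically. The sketch below is illustrative only; `ASSIGNMENTS` and `check_coverage` are hypothetical names. It confirms that each of the ten categories is assigned to exactly one agent.

```python
# The agent-to-lens split from the list above, as plain data.
ASSIGNMENTS = {
    "Agent 1": ["A01", "A07"],
    "Agent 2": ["A02", "A08"],
    "Agent 3": ["A03"],
    "Agent 4": ["A04", "A09"],
    "Agent 5": ["A05", "A06"],
    "Agent 6": ["A10"],
}


def check_coverage(assignments):
    """Return (missing, duplicated) category sets for A01 through A10."""
    expected = {f"A{n:02d}" for n in range(1, 11)}
    assigned = [lens for lenses in assignments.values() for lens in lenses]
    missing = expected - set(assigned)
    duplicated = {lens for lens in assigned if assigned.count(lens) > 1}
    return missing, duplicated


missing, duplicated = check_coverage(ASSIGNMENTS)
print(missing, duplicated)  # both sets are empty for the split above
```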
Don't stop at "found it"—verify against real data
This is the crux. When an AI says "this is dangerous," that's only a hypothesis. Take self-reported "confirmed" findings at face value and the count usually inflates.
The key is to look at the state as it is right now, not at the change history (the diffs): the permission settings, the function bodies, whether a key is truly revoked. These are settled by fact, not by guess. Several findings that looked dangerous fell away once we queried the live database and found the issue already properly closed, or the key already revoked. The check cuts both ways: skip it, and holes you thought you'd sealed can remain open.
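As a minimal sketch of "settled by fact, not by guess," take the key-revocation case. Everything here is an assumption made for illustration: the `api_keys` table, its `revoked_at` column, and the in-memory SQLite database standing in for the live one are not PentaTrail's schema. The point is only that a hypothesis such as "this key may still be active" is confirmed or dismissed by reading current state.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the live database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE api_keys (id INTEGER PRIMARY KEY, revoked_at TEXT)")
    conn.executemany(
        "INSERT INTO api_keys (id, revoked_at) VALUES (?, ?)",
        [(1, "2024-01-01T00:00:00Z"), (2, None)],  # demo rows, not real data
    )
    return conn


def key_is_revoked(conn, key_id):
    """Return True only if the current row records a revocation time."""
    row = conn.execute(
        "SELECT revoked_at FROM api_keys WHERE id = ?", (key_id,)
    ).fetchone()
    if row is None:
        # No row means nothing was verified; do not guess either way.
        raise LookupError(f"key {key_id} not found; cannot verify")
    return row[0] is not None


def triage(conn, hypotheses):
    """Split 'key may still be active' hypotheses into confirmed and dismissed."""
    confirmed, dismissed = [], []
    for hypothesis in hypotheses:
        if key_is_revoked(conn, hypothesis["key_id"]):
            dismissed.append(hypothesis)  # already revoked in current state
        else:
            confirmed.append(hypothesis)  # still active: a real finding
    return confirmed, dismissed
```

Raising on a missing row, instead of returning `False`, keeps "could not check" separate from "checked and still open."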
The result — by Critical / High / Medium / Low
| OWASP Top 10 | Critical | High | Medium | Low | What it was (blurred) |
|---|---|---|---|---|---|
| A01 Broken Access Control | 0 | 0 | 0 | 0 | Tenant boundaries and permissions. No crossing even under live DB checks — zero findings |
| A02 Cryptographic Failures | 0 | 0 | 0 | 2 | Tightening how secrets are handled, operationally |
| A03 Injection | 0 | 0 | 0 | 1 | Wrapping the spots that handle external input one layer thicker, just in case |
| A04 Insecure Design | 0 | 0 | 0 | 3 | Resistance to unexpected use, and failing to the safe side |
| A05 Security Misconfiguration | 0 | 0 | 2 | 4 | Nudging header/permission defaults toward the stricter side |
| A06 Vulnerable/Outdated Components | 0 | 0 | 1 | 1 | Bringing a few dependencies up to newer versions |
| A07 Auth Failures | 0 | 0 | 0 | 0 | Passkey-centric auth is sound — zero findings |
| A08 Integrity Failures | 0 | 0 | 1 | 1 | Raising the confidence in the provenance of ingested data and executables |
| A09 Logging & Monitoring | 0 | 0 | 0 | 2 | Masking records and not leaving extra info in production |
| A10 SSRF | 0 | 0 | 1 | 2 | Enforcing that the server's outbound path is strictly pinned |
| Total | 0 | 0 | 5 | 16 | |
Critical and High are zero in every lens. What remained were 5 Medium and 16 Low findings, all of them "one notch thicker" hardening for the layered defense and none of them exploitable fatal flaws. We fixed every finding the same day.
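The totals can be re-derived from the per-category rows. The short check below copies the counts from the table; `COUNTS` and `column_totals` are names chosen for this illustration.

```python
# (Critical, High, Medium, Low) per category, copied from the table above.
COUNTS = {
    "A01": (0, 0, 0, 0),
    "A02": (0, 0, 0, 2),
    "A03": (0, 0, 0, 1),
    "A04": (0, 0, 0, 3),
    "A05": (0, 0, 2, 4),
    "A06": (0, 0, 1, 1),
    "A07": (0, 0, 0, 0),
    "A08": (0, 0, 1, 1),
    "A09": (0, 0, 0, 2),
    "A10": (0, 0, 1, 2),
}


def column_totals(counts):
    """Sum each severity column across all categories."""
    return tuple(sum(column) for column in zip(*counts.values()))


print(column_totals(COUNTS))  # (0, 0, 5, 16)
```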
Wrapping up
Fable 5 has been getting a lot of attention, and the depth and substance of its findings bore that out. What matters isn't that no holes ever appear; it's whether you have a mechanism that keeps finding them. So we're going to build a way to run this audit on a regular basis.