6-Year-Old Zero-Day in One Hour — Here’s How RidgeZero Did It

Apr 21, 2026 | Blog

RidgeZero, our Agentic Zero-Day Reasoning System, automatically discovered and patched an undisclosed vulnerability hiding in plain sight inside a widely used open-source email server.

A few days ago, RidgeZero found a previously unknown vulnerability in GreenMail, a popular open-source email server that thousands of Java developers use for integration testing. The bug crashes the server every time an administrator deletes a user account, a routine operation that should be completely safe.

This critical zero-day bug has been silently sitting in the codebase since April 2020, introduced by a well-intentioned fix that accidentally created a new problem while solving an old one.

RidgeZero found it in under one hour and, in the same run, produced a root cause analysis, an automatically generated patch, and verification that the patch works. The total cost was under $10.

For context: last month, the Mythos AI system made headlines for finding a vulnerability in OpenBSD at a reported cost of $20,000. That was a notable achievement.

What the Bug Does

A crash on every user deletion

GreenMail is an email server that developers embed in their applications for testing. When you delete a user, the server needs to clean up that user’s mailboxes — their Inbox, Sent folder, Drafts, and any other folders they’ve created.

The cleanup code deletes folders in the wrong order. It tries to remove the top-level folder first, before removing the folders nested inside it. But the system enforces a safety rule — you cannot delete a folder that still has children inside it.

Think of it as trying to demolish a building by removing the ground floor first. The safety system refuses ("there are still floors above") and the entire operation fails. Not some of the time. Not under unusual conditions. Every time, for every user, guaranteed.

If you’re running GreenMail in an environment where users can be created and deleted — which is its primary purpose — any user deletion will crash the processing thread. An attacker who knows this can trigger the crash repeatedly, effectively shutting down the service.

Root Cause

How a bugfix created a new bug

In April 2020, a developer filed Issue #312: when you delete a user, GreenMail only removes the Inbox and leaves all other folders behind as orphans. A fix was merged the same day — the code was changed to find all of a user’s folders and delete each one.

The fix solved the orphaned-folder problem. But the new code loops through folders in the order the system returns them, which happens to be top-down — parents before children. That ordering violates the safety check that prevents deleting a folder with children still inside it.
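The ordering problem can be reproduced in a few lines. The sketch below is a toy model, not GreenMail's code: `ToyStore`, `FolderNotEmptyError`, `delete_user_folders_naive`, and the `/`-separated folder paths are all hypothetical names chosen for illustration. It only assumes what is described above: a store that refuses to delete a folder with children, and a loop that deletes folders in parent-first listing order.

```python
class FolderNotEmptyError(Exception):
    """Raised when a folder that still has children is deleted (hypothetical)."""


class ToyStore:
    """Toy folder store that enforces the 'no children left' safety rule."""

    def __init__(self, paths):
        self.paths = set(paths)

    def children_of(self, path):
        prefix = path + "/"
        return [p for p in self.paths if p.startswith(prefix)]

    def delete(self, path):
        if self.children_of(path):
            raise FolderNotEmptyError(path)
        self.paths.remove(path)


def delete_user_folders_naive(store, root):
    """Delete every folder under root in listing order (parents first)."""
    listed = sorted(p for p in store.paths if p == root or p.startswith(root + "/"))
    for path in listed:
        store.delete(path)  # fails on the first folder that has children


store = ToyStore(["alice", "alice/INBOX", "alice/INBOX/Archive", "alice/Sent"])
try:
    delete_user_folders_naive(store, "alice")
    outcome = "deleted"
except FolderNotEmptyError as err:
    outcome = f"refused: {err}"
```

In this toy run the very first deletion is refused, because the top-level folder is visited before anything nested inside it, and no folder is removed at all.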

Fixing one bug quietly introduced another, and nobody noticed for six years. This is not unusual. Studies consistently show that a significant percentage of bugs are introduced by previous fixes.

What’s unusual is having a system that can find these hidden regressions automatically.

Discovery

How RidgeZero found it

The harness: orchestrating competitive AI agents

The real innovation behind RidgeZero isn’t any single AI model — it’s the orchestration harness that coordinates multiple heterogeneous agents, manages their resources, and synthesizes their results into verified findings.

Traditional security scanning works by matching known patterns. Those tools are valuable, but they only find the types of bugs they’ve been taught to recognize. They wouldn’t flag a folder-ordering logic error. A single AI agent might find a bug — or it might hallucinate, crash, get stuck, or explore the wrong code paths. RidgeZero treats vulnerability discovery as an ensemble problem: deploy multiple independent agents with different models, strategies, and strengths, then let the harness collect and verify whatever any of them find.

What the harness did for the GreenMail target

Target provisioning. RidgeZero automatically built GreenMail from source inside an isolated Docker container, configured the build environment, and prepared instrumented binaries — all without human input. The harness handles the entire build pipeline, including injecting instrumentation for crash detection.

Agent deployment. The harness allocated 16 CPU cores and 60 GB of memory across three competing AI agents, each running in its own fully isolated container so they couldn’t interfere with each other.

Fault tolerance. Two of the three agents produced nothing useful. One crashed. One never started. Counted agent by agent, that’s a 67% failure rate, and a system that relied on either of those agents alone would have found nothing. In RidgeZero’s ensemble model, it’s a successful run, because success is the union of all agent outputs. You need only one.
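The "union of all agent outputs" idea can be sketched as follows. This is an illustration only, not RidgeZero's harness: the function names and the finding label are hypothetical, and the probability formula assumes agents fail independently, which real agents sharing a target may not.

```python
from __future__ import annotations

from typing import Iterable, Optional


def union_of_findings(agent_outputs: Iterable[Optional[set[str]]]) -> set[str]:
    """Merge findings from all agents; None means the agent crashed or never started."""
    merged: set[str] = set()
    for output in agent_outputs:
        if output:
            merged |= output
    return merged


def ensemble_success_probability(per_agent_failure: float, agents: int) -> float:
    """Chance that at least one agent succeeds, assuming independent failures."""
    return 1.0 - per_agent_failure ** agents


# Two agents return nothing, one returns a finding (hypothetical label).
outputs = [None, None, {"crash-on-user-delete"}]
findings = union_of_findings(outputs)
run_succeeded = bool(findings)
```

Under the independence assumption, three agents that each fail two times out of three still give the ensemble a success chance of 1 - (2/3)^3 = 19/27, or roughly 70%.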

Result collection and verification. When the Gemini-powered agent triggered a crash, the harness captured the 481-byte proof-of-vulnerability that deterministically crashes the server, and fed it into the verification pipeline automatically.

Why the harness matters more than the model

The Mythos finding in OpenBSD is impressive, but the reported $20,000 cost points to a fundamental scalability problem. If each vulnerability costs five figures to discover, autonomous security analysis remains a research curiosity. RidgeZero’s harness solves this in three ways:

  • Resource efficiency — per-agent compute and token budgets keep costs controlled
  • Model diversity — cheaper models run in parallel rather than betting everything on one expensive one
  • Pipeline automation — the full discover → analyze → patch → verify cycle runs without manual intervention

Remediation

From discovery to verified fix

Analysis. The platform traced the crash through four layers of code — from the user deletion call, through internal cleanup logic, down to the safety check that threw the error. Root cause identified: parent folders are deleted before their children.

Patch generation. The platform wrote a fix: before deleting, sort folders by depth (deepest first), so children are always removed before their parents. The fix was injected directly into the build pipeline, since GreenMail’s source is pulled fresh during each build.
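A minimal sketch of the deepest-first idea, under stated assumptions: this is not the generated patch itself, and `deletion_order`, `delete_all`, and the `/` path separator are hypothetical. It shows only that sorting by nesting depth, deepest first, means no folder is visited while it still has children.

```python
def deletion_order(paths, sep="/"):
    """Return folder paths sorted deepest first, so children precede parents."""
    return sorted(paths, key=lambda p: p.count(sep), reverse=True)


def delete_all(paths, sep="/"):
    """Delete every folder, enforcing the 'no children left' rule at each step."""
    remaining = set(paths)
    for path in deletion_order(remaining, sep):
        if any(other.startswith(path + sep) for other in remaining):
            raise RuntimeError(f"folder still has children: {path}")
        remaining.remove(path)
    return remaining


folders = ["alice", "alice/INBOX", "alice/INBOX/Archive", "alice/Sent"]
order = deletion_order(folders)
leftover = delete_all(folders)
```

Any child path contains at least one more separator than its parent, so it always sorts ahead of the parent and the safety check never fires.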

Verification. The exact input that caused the original crash was fed to the patched build. No crash. The server handled the user deletion cleanly and continued running. Fix confirmed.

This entire cycle — discover, analyze, patch, verify — completed without human intervention.