We tested AuditAgents against a realistic treasury vault contract containing a subtle error-handling flaw involving unchecked Ether transfer failures.
Unchecked return values are a common source of silent failures in Solidity contracts. Functions such as send() do not revert when a transfer fails; they return a boolean indicating whether it succeeded. If this result is ignored, execution continues and internal accounting may be updated even though no funds were transferred.
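As a rough illustration of the difference between Solidity's Ether transfer primitives, the sketch below places send(), transfer(), and a low-level call side by side. The contract name `TransferPrimitives` and its function names are hypothetical and not part of the benchmark; the functions have no access control and are meant only to show how each primitive reports failure.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical illustration contract; not part of the benchmark.
// No access control: for demonstration only.
contract TransferPrimitives {
    // send(): forwards a 2300 gas stipend and returns false on failure.
    // The caller must check the result explicitly.
    function viaSend(address payable to, uint256 amount) external {
        bool ok = to.send(amount);
        require(ok, "send failed");
    }

    // transfer(): same 2300 gas stipend, but reverts on failure,
    // so there is no return value to forget.
    function viaTransfer(address payable to, uint256 amount) external {
        to.transfer(amount);
    }

    // call(): forwards the remaining gas by default and returns
    // (success, returndata). The success flag must be checked.
    function viaCall(address payable to, uint256 amount) external {
        (bool ok, ) = to.call{value: amount}("");
        require(ok, "call failed");
    }

    // Allows the contract to be funded for the demonstration.
    receive() external payable {}
}
```

Of the three, only transfer() fails loudly on its own; send() and call both rely on the caller inspecting the returned boolean.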
For this benchmark, we created a realistic treasury vault contract and submitted it to AuditAgents without any vulnerability hints, comments, or intentionally revealing naming conventions.
The following function contains the unchecked return value flaw:

    // SPDX-License-Identifier: MIT
    pragma solidity ^0.8.0;

    contract TreasuryVault {
        mapping(address => uint256) public deposits;

        function deposit() external payable {
            deposits[msg.sender] += msg.value;
        }

        function claim(uint256 amount) external {
            require(deposits[msg.sender] >= amount, "insufficient");
            // State updated BEFORE transfer: accounting change is permanent
            deposits[msg.sender] -= amount;
            // ⚠ Return value ignored: silent failure possible
            payable(msg.sender).send(amount);
        }

        receive() external payable {}
    }

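The claim() function decrements deposits[msg.sender] before calling send(). If send() fails (for example, because the recipient is a contract whose receive function reverts or needs more than the 2300 gas stipend that send() forwards), the transaction does not revert and the balance stays reduced. The Ether remains in the vault, but the user's recorded deposit is gone, so the user loses access to those funds and receives nothing.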
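One common way to close this gap is to check the transfer result and revert on failure, so the balance deduction is rolled back together with the failed transfer. The sketch below is one possible fix, not necessarily the remediation AuditAgents proposed; the contract name `TreasuryVaultChecked` is hypothetical, and it swaps send() for a low-level call, which is an assumption about the preferred primitive.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical corrected variant of the benchmark contract.
contract TreasuryVaultChecked {
    mapping(address => uint256) public deposits;

    function deposit() external payable {
        deposits[msg.sender] += msg.value;
    }

    function claim(uint256 amount) external {
        require(deposits[msg.sender] >= amount, "insufficient");
        // Effects before interaction (checks-effects-interactions),
        // so a reentrant call sees the already reduced balance.
        deposits[msg.sender] -= amount;
        // Interaction: capture the success flag and revert on failure,
        // which also undoes the deduction above.
        (bool ok, ) = payable(msg.sender).call{value: amount}("");
        require(ok, "transfer failed");
    }

    receive() external payable {}
}
```

Keeping send() and wrapping it in require(...) would also restore consistency. The low-level call is used here only because it is not limited to the 2300 gas stipend, which is why the deduction is kept ahead of the external call.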
The following issues were identified by AuditAgents without any hints, vulnerability annotations, or revealing contract names:
| Expected Detection | AuditAgents Result |
|---|---|
| Unchecked Return Value | ✓ PASS |
| Silent Transfer Failure | ✓ PASS |
| User Fund Loss Risk | ✓ PASS |
| Accounting Inconsistency | ✓ PASS |
| Remediation Guidance | ✓ PASS |
Unlike benchmarks built around obvious access-control or reentrancy vulnerabilities, this one contains no vulnerability hints in function names, contract names, or comments. Successful detection requires reasoning about execution flow, transfer semantics, and state consistency rather than pattern matching alone. A detection here therefore indicates semantic understanding of Solidity's transfer primitives.
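To make the execution-flow reasoning concrete, the failure path could be reproduced with a depositor contract that cannot receive Ether. The sketch below is an assumption about how such a reproduction might look; `ITreasuryVault` and `RejectingDepositor` are hypothetical names, and the interface simply mirrors the public functions of the TreasuryVault contract shown earlier.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Minimal interface matching TreasuryVault's public functions.
interface ITreasuryVault {
    function deposit() external payable;
    function claim(uint256 amount) external;
    function deposits(address account) external view returns (uint256);
}

// Hypothetical depositor that cannot receive plain Ether transfers:
// it defines neither receive() nor a payable fallback function.
contract RejectingDepositor {
    ITreasuryVault public immutable vault;

    constructor(ITreasuryVault _vault) {
        vault = _vault;
    }

    // Forwards the attached Ether into the vault under this contract's address.
    function depositToVault() external payable {
        vault.deposit{value: msg.value}();
    }

    // Calls claim(). The vault's send() back to this contract fails,
    // but claim() does not revert, so the recorded deposit still drops.
    function claimFromVault(uint256 amount) external returns (uint256 remaining) {
        vault.claim(amount);
        remaining = vault.deposits(address(this));
    }
}
```

Under these assumptions, depositing 1 ether through depositToVault() and then calling claimFromVault(1 ether) would leave the recorded deposit at zero while the Ether stays in the vault, which is the accounting inconsistency listed in the table.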
AuditAgents identified the unchecked return value vulnerability, explained how silent transfer failures occur, described the resulting accounting inconsistency, and provided accurate remediation guidance, all without any hints.