Evidence Bundle Standard

Evidence Bundle 1.0 — making transparency a concrete, auditable record

For every governed action, EDENA produces an evidence bundle: a structured record of what the AI proposed, how it was classified, who decided, and on what grounds. Evidence travels with the claim.

Status: Published · Version 1.0 · Issued by the NAIO Institute · June 2026

Abstract

"Transparency" and "explainability" are only meaningful when they leave a record. The Evidence Bundle Standard turns those aspirations into a concrete artifact produced for every governed action: a tamper-evident bundle that captures the decision, the classification, the named human and their judgment, the grounding of the underlying claim, and the model that produced it. A refusal is recorded with the same rigor as an approval. Evidence travels with the claim.

§1

Scope & purpose

This standard defines the evidence bundle: the auditable record EDENA produces whenever it governs an action. It applies to every governance decision made under the Agentic Systems Standard and the Action-Gating Standard — every classification, every outcome, every named-human authorization, and every refusal.

The purpose is to convert "transparency" into something an auditor, a clinician, a safety reviewer, or a regulator can actually inspect. A polished AI output is not trustworthy unless it is traceable, and an oversight process is not meaningful unless it leaves evidence that it occurred. The evidence bundle is that evidence. MUST, MUST NOT, SHOULD, and MAY are used as defined in §2.

§2

Terms & normative language

The key words MUST, MUST NOT, SHOULD, and MAY are requirement levels: MUST denotes an absolute requirement for conformance; MUST NOT an absolute prohibition; SHOULD a recommended practice that requires documented justification to omit; MAY an optional practice.

Evidence bundle — the structured, tamper-evident record produced for a single governed action, containing the fields enumerated in §3.
Governed action — any action that passed through the EDENA gate and received one of the five outcomes: ALLOW, REQUIRE_HUMAN, CONSTRAIN, THROTTLE, or DENY.
Provenance — the traceable origin of the content an action rests on: source documents, retrieval results, and the chain by which they reached the model.
Refusal record — an evidence bundle for a DENY outcome, logged with the same rigor as an approval.
Tamper-evident — recorded such that any later alteration of the bundle is detectable.

§3

Required fields

An evidence bundle MUST contain, at minimum, the following fields. Absence of a required field is a conformance failure, not a formatting detail.

Decision id — a unique identifier for the governance decision.
Timestamp — the time of the decision, recorded in UTC.
Agent id & registered owner — the acting agent and its contactable human owner.
Action description & assigned tier — what the action was and the action-risk tier it received.
Classification signals — the inputs to the gate: who asked, what data, what system, reversibility, and potential harm.
Governance outcome — one of ALLOW / REQUIRE_HUMAN / CONSTRAIN / THROTTLE / DENY.
Named human & their decision — where a human was required, the identified, role-appropriate person and the judgment they made.
Rationale — why the outcome was reached.
Source grounding & provenance — the traceable basis for the underlying claim.
Model & version — the model and version that produced the output.
Uncertainty — the confidence or uncertainty signal associated with the output.
Missing / contradictory data flags — explicit flags for gaps and conflicts in the evidence.
Data scope & PHI flag — the data the action touched and whether protected health information was involved.
Override record — whether a human overrode the system, in which direction, and on what basis.

An evidence bundle SHOULD be machine-readable. The following illustrates the shape of a conformant bundle for a single gated clinical action:

{
  "decision_id": "edena-9f2a17c4",
  "timestamp_utc": "2026-06-15T14:32:07Z",
  "agent": { "id": "handoff-summarizer", "owner": "charge-nurse-3west" },
  "action": { "description": "Transmit shift-change handoff summary", "tier": "Yellow" },
  "classification_signals": {
    "requester": "RN",
    "data": "PHI",
    "system": "EHR",
    "reversibility": "low",
    "potential_harm": "moderate"
  },
  "outcome": "REQUIRE_HUMAN",
  "named_human": { "role": "RN", "decision": "validated_with_edits" },
  "rationale": "Summary transmitted to receiving clinician; nurse validation required before send.",
  "grounding": {
    "sources": ["vitals/24h", "med-admin-record", "nursing-notes"],
    "provenance": "verified"
  },
  "model": { "name": "clinical-summarizer", "version": "3.1.0" },
  "uncertainty": 0.18,
  "flags": { "missing_data": ["last-pain-score"], "contradictory_data": [] },
  "data_scope": { "records": 1, "phi": true },
  "override": { "occurred": false }
}
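Because absence of a required field is a conformance failure, a producer may want to check each bundle before it is recorded. The following is a minimal sketch, not a normative validator. The key names are taken from the illustrative bundle above (the standard names the fields but does not fix a serialization), and treating `named_human` as required only for REQUIRE_HUMAN outcomes is an assumption drawn from the "where a human was required" wording. The function name `conformance_failures` is hypothetical.

```python
# Key names follow the illustrative bundle above; they are an assumption,
# since the standard does not prescribe a serialization.
REQUIRED_KEYS = (
    "decision_id", "timestamp_utc", "agent", "action",
    "classification_signals", "outcome", "rationale", "grounding",
    "model", "uncertainty", "flags", "data_scope", "override",
)
OUTCOMES = {"ALLOW", "REQUIRE_HUMAN", "CONSTRAIN", "THROTTLE", "DENY"}


def conformance_failures(bundle: dict) -> list[str]:
    """Return a list of problems; an empty list means this check passed."""
    failures = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in bundle]

    outcome = bundle.get("outcome")
    if outcome is not None and outcome not in OUTCOMES:
        failures.append(f"unknown outcome: {outcome}")

    # Assumption: the named human is mandatory only when the gate required one.
    if outcome == "REQUIRE_HUMAN" and "named_human" not in bundle:
        failures.append("missing field: named_human")

    return failures
```

A check like this covers presence only. Whether the rationale is meaningful or the grounding is actually traceable still needs review of the content itself.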

§4

Refusal records

A DENY MUST be logged with the same rigor as an approval. A refusal is a governance event, not a failure — and an unrecorded refusal is indistinguishable from a system that simply did nothing.

A refusal record MUST capture why the action was refused, the safer path offered in its place, and the human it was routed to.
Refusal records MUST carry every field required of an approval in §3, including grounding, uncertainty, and flagged gaps.
Refusals MUST be queryable in the same telemetry as approvals (§6), so that refusal rates are visible as a measure of governance health rather than hidden as errors.
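The refusal requirements above can be sketched as two small helpers: one that checks the three refusal-specific items on a DENY bundle, and one that computes a refusal rate over the same collection that holds approvals. The key names `refusal_reason`, `safer_path`, and `routed_to` are hypothetical; the standard requires the information but does not name the keys.

```python
# Hypothetical key names for the three refusal-specific items required in §4.
REFUSAL_KEYS = ("refusal_reason", "safer_path", "routed_to")


def refusal_failures(bundle: dict) -> list[str]:
    """Check the refusal-specific items; non-DENY bundles pass trivially."""
    if bundle.get("outcome") != "DENY":
        return []
    # Empty strings count as missing: a blank reason is not a recorded reason.
    return [f"refusal record missing: {k}" for k in REFUSAL_KEYS if not bundle.get(k)]


def refusal_rate(bundles: list[dict]) -> float:
    """Share of governed actions that ended in DENY (0.0 for an empty list)."""
    if not bundles:
        return 0.0
    denied = sum(1 for b in bundles if b.get("outcome") == "DENY")
    return denied / len(bundles)
```

Computing the rate over approvals and refusals together reflects the requirement that both live in the same telemetry, rather than refusals being filtered out as errors upstream.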

§5

Immutability & retention

Evidence is only evidence if it cannot be quietly changed. Bundles MUST be tamper-evident — recorded so that any later alteration is detectable — and MUST be retained for the period required by applicable policy, regulation, and legal hold.

Bundles MUST NOT be edited in place; corrections MUST be appended as new records that reference the original.
Where a bundle contains protected health information, that information MUST be handled in accordance with the HIPAA Security Rule — access-controlled, auditable, and protected in storage and transmission.
Retention periods MUST be declared, and disposal MUST itself be logged.
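One common way to make an append-only record tamper-evident is a hash chain, in which each entry's hash covers both its content and the previous entry's hash. The standard does not mandate this technique; the sketch below is one possible approach, and the class and method names (`EvidenceLog`, `append_correction`, `verify`) are hypothetical.

```python
import hashlib
import json


def _digest(record: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only log; each entry's hash covers the record and the previous hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = _digest(record, prev)
        # Store a copy so later mutation of the caller's dict cannot alter the log.
        stored = json.loads(json.dumps(record))
        self.entries.append({"record": stored, "prev": prev, "hash": entry_hash})
        return entry_hash

    def append_correction(self, original_id: str, changes: dict) -> str:
        # A correction is a new record referencing the original; nothing is edited in place.
        return self.append({"corrects": original_id, "changes": changes})

    def verify(self) -> bool:
        """Recompute the chain; any altered record or broken link returns False."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(entry["record"], prev) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

An in-memory chain like this only detects alteration if the latest hash is also held somewhere the writer cannot change, for example a separate audit system. Retention periods, disposal logging, and PHI handling are outside the scope of this sketch.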

§6

Monitoring & telemetry

Bundles are not only an after-the-fact audit trail; in aggregate they are the live signal of whether governance is working. Conformant systems MUST feed evidence bundles into governance telemetry that supports real-time monitoring.

Telemetry MUST expose, at minimum, override rates, drift, incidents, and evidence quality — the completeness and grounding of the bundles themselves.
These measures MUST be reviewable by the steward and the accountable governance owner, and SHOULD trigger alerts when a measure crosses a defined threshold.
This monitoring capability operationalizes the Measure function of the NIST AI Risk Management Framework: governance that is observed, quantified, and acted upon rather than asserted.
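As a rough illustration of how bundles become telemetry, the sketch below aggregates a batch of bundles into an override rate and a field-completeness rate, then reports which measures fall outside configured bounds. It assumes the `override.occurred` shape from the illustrative bundle in §3; the function names and the bounds format are hypothetical. Drift and incident measures are not covered here because the standard does not define how they are computed.

```python
def governance_metrics(bundles: list[dict], required_keys: tuple[str, ...]) -> dict:
    """Aggregate a batch of bundles into rates in the range [0, 1]."""
    total = len(bundles)
    if total == 0:
        return {"override_rate": 0.0, "completeness": 0.0}
    # Assumes each bundle's "override" value, when present, is a dict.
    overrides = sum(1 for b in bundles if b.get("override", {}).get("occurred"))
    # A bundle counts as complete only if every required key is present.
    complete = sum(1 for b in bundles if all(k in b for k in required_keys))
    return {"override_rate": overrides / total, "completeness": complete / total}


def breached(metrics: dict, bounds: dict) -> list[str]:
    """Names of measures outside their (low, high) bounds, in sorted order."""
    out = []
    for name, (low, high) in sorted(bounds.items()):
        value = metrics.get(name)
        if value is not None and not (low <= value <= high):
            out.append(name)
    return out
```

In practice the bounds would come from the thresholds the steward and governance owner have defined, and a non-empty result from `breached` would feed whatever alerting channel the deployment uses.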

§7

Mapping to external frameworks

The evidence bundle is a single artifact intended to address record-keeping, transparency, and measurement obligations across the frameworks listed below.

External requirement	Evidence Bundle clause
ONC HTI-1 — DSI source attributes (data, performance, risk context)	§3 source grounding, provenance, model & version
EU AI Act Art. 12 — record-keeping & automatic logging over the lifecycle	§3 required fields, §5 immutability & retention
EU AI Act Art. 15 — accuracy, robustness, fallback to human	§3 uncertainty & flags, §4 refusal records
Joint Commission / CHAI — quality monitoring & voluntary safety-event reporting	§4 refusal records, §6 telemetry
NIST AI RMF — Measure (track, quantify, monitor)	§6 monitoring & telemetry

Why this matters

Regulators, accreditors, and standards bodies ask for logging (EU AI Act Art. 12), source transparency (ONC HTI-1), quality monitoring and safety-event reporting (Joint Commission/CHAI), and measurement (NIST Measure). These are usually treated as four separate compliance burdens. The evidence bundle answers all of them with a single artifact, produced automatically for every governed action, including the refusals, so that "we have oversight" becomes a claim an auditor can verify.

