Lab Guide: Unit 7 — Production Security Engineering

CSEC 602 | Weeks 9–12 | Semester 2

Transform your agentic systems from proof-of-concept to production-grade: supply chain security, NHI governance, observability, and containerized deployment with CI/CD security gates. Your deployed Managed Agent becomes the system you govern, instrument, and ship.

Claude as your governance reviewer: Use Claude to review your governance artifacts before filing them. Paste your AIUC-1 assessment and ask: "What risk did I underestimate?" Production regrets are expensive.

Week 9 — AI Supply Chain Security

Week 9 Lab: SBOM Generation, Dependency Scanning & Model Provenance

Lab Goal: Generate a Software Bill of Materials for your SOC agent system, scan all dependencies for CVEs, verify model integrity, and implement supply chain security controls at each stage of your development pipeline.

Why this matters today: The LiteLLM supply chain attack (March 24, 2026) compromised a package with 97M monthly downloads. It was discovered by an MCP plugin in Cursor — the exact technology you've been building. Steps 2–3 in this lab (pip-audit, safety) are what would have caught the advisory after publication. Step 6 (hash pinning) is what would have blocked installation before — even during the zero-day window when no advisory existed yet. See the LiteLLM supply chain case study.

Knowledge Check — Week 9

1. What is a Software Bill of Materials (SBOM) and why does it matter for AI systems?

2. What does SLSA Level 3 require for production AI system deployment?

3. What makes model provenance harder to verify than code provenance?

Lab Exercise: SBOM Generation & Supply Chain Audit

pip install cyclonedx-bom
mkdir -p ~/noctua-labs/unit7/week9 && cd ~/noctua-labs/unit7/week9

# Generate SBOM for your SOC agent system (cyclonedx-bom v4+ syntax)
cyclonedx-py requirements ~/noctua-labs/unit4/soc-agent-team/requirements.txt > sbom.json

# Inspect: how many components? (vulnerability data comes from the scanners in the next step)
python3 -c "import json; print(len(json.load(open('sbom.json')).get('components', [])))"
# Note: if CLI flags differ on your version, run: cyclonedx-py --help
pip install pip-audit
pip-audit --requirement ~/noctua-labs/unit4/soc-agent-team/requirements.txt \
  --format json -o audit-results.json

# Review findings
cat audit-results.json | python3 -m json.tool | grep -E '"id"|"description"|"fix"'
pip install 'safety<3.0'
safety check -r ~/noctua-labs/unit4/soc-agent-team/requirements.txt --json > safety-results.json

# Compare with pip-audit results
# Document: discrepancies between scanners, high/critical findings
# Note: safety v3+ changed CLI and requires account auth — pin to <3.0 for this lab
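One way to document the scanner discrepancies is to diff the vulnerability IDs from the two JSON reports. A sketch only: the key names assumed here (`dependencies`/`vulns`/`id` for pip-audit, `vulnerabilities`/`vulnerability_id` for safety 2.x) vary across scanner versions, so adjust them after inspecting your actual files.

```python
import json
import os

def pip_audit_ids(report):
    """Collect vulnerability IDs from a pip-audit JSON report (assumed 'dependencies' shape)."""
    ids = set()
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            ids.add(vuln.get("id"))
    return ids

def safety_ids(report):
    """Collect vulnerability IDs from a safety 2.x JSON report (assumed 'vulnerabilities' shape)."""
    return {v.get("vulnerability_id") for v in report.get("vulnerabilities", [])}

# Compare the two reports if both are present
if os.path.exists("audit-results.json") and os.path.exists("safety-results.json"):
    with open("audit-results.json") as f:
        audit = pip_audit_ids(json.load(f))
    with open("safety-results.json") as f:
        safe = safety_ids(json.load(f))
    print("pip-audit only:", sorted(audit - safe))
    print("safety only:   ", sorted(safe - audit))
    print("both scanners: ", sorted(audit & safe))
```

IDs flagged by only one scanner are your documented discrepancies; IDs flagged by both are your highest-confidence findings.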
# model-config.yaml
models:
  primary:
    provider: anthropic
    model_id: claude-sonnet-4-6   # Pinned version
    max_tokens: 4096
    temperature: 0.0
  fallback:
    provider: anthropic
    model_id: claude-haiku-4-5-20251001
    max_tokens: 1024
    temperature: 0.0

# For local models, also record:
# sha256: "abc123..." (hash of weights file)
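The weights hash noted above can be computed with a streaming read, so large files never need to fit in memory. A minimal sketch; the file path and the recorded digest are placeholders for your own model artifact.

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Stream-hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Record the digest in model-config.yaml at download time, then re-verify before every load:
# expected = "abc123..."  # placeholder: the digest you pinned
# assert sha256_file("weights/model.safetensors") == expected, "weights file tampered or corrupted"
```

This is the model-weights equivalent of pip's `--require-hashes`: the pinned digest turns a silent substitution into a hard failure at load time.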
# Install pip-tools if not already installed
pip install pip-tools

# Copy your Unit 4 SOC agent dependencies as the source requirements.in
cp ~/noctua-labs/unit4/soc-agent-team/requirements.txt requirements.in

# Generate hash-pinned requirements
pip-compile --generate-hashes requirements.in -o requirements-pinned.txt

# Verify the output — each package should have sha256 hashes
head -30 requirements-pinned.txt

# Test the protection: corrupt one hash (change a single character)
# Then try to install — it MUST fail
pip install --require-hashes -r requirements-pinned.txt

# Now restore the correct hash and verify clean install succeeds
# Document: What hash mismatch error did you see?
# This is exactly what would have caught LiteLLM 1.82.7 at install time

Week 10 — Non-Human Identity Governance

Week 10 Lab: Agent Identity Registry & Least Privilege Design

Lab Goal: Design and implement a Non-Human Identity (NHI) governance system for your multi-agent SOC. Each agent receives a unique identity with defined permissions, credential rotation, and audit trail. Apply the principle of least privilege to agent tool access.

Knowledge Check — Week 10

1. What is the estimated ratio of non-human to human identities in enterprise environments?

2. What does 'least privilege' mean specifically for AI agent tool access?

Lab Exercise: NHI Registry for Your SOC Agent System

# agent-identities.yaml
agents:
  - agent_id: soc-orchestrator-001
    role: incident_orchestrator
    allowed_tools: [route_alert, invoke_specialist, generate_report]
    denied_tools: [external_api_calls, file_write, database_write]
    max_tokens_per_session: 50000
    credential_rotation_days: 30

  - agent_id: soc-recon-001
    role: threat_intelligence_analyst
    allowed_tools: [query_cve, ip_reputation, hash_lookup]
    denied_tools: [generate_report, write_database, send_alerts]
    max_tokens_per_session: 20000
    credential_rotation_days: 30
agent-identities.yaml IS an Allowance Profile

The manifest you just wrote — per-agent tool permissions, credential scope, token budget — has a formal name: an Allowance Profile. An Allowance Profile defines what an agent is permitted to do before it is deployed, creating a verifiable boundary between what was authorized and what the agent attempts at runtime.

The pattern: specify allowed tools, allowed credential scopes, and cost limits per agent identity in a manifest that exists before any code runs. The enforcement layer reads the manifest at runtime and rejects tool calls outside the defined scope.

PeaRL (Policy-enforced Agent Runtime Layer) is a governance system built around Allowance Profile enforcement. No PeaRL installation is required for this lab; the concept is what matters. The agent-identities.yaml you wrote is a valid Allowance Profile by design.

Cedar Policy — What It Looks Like

Cedar is Amazon's authorization policy language. It reads like English and enforces like a database constraint.

// Allow the analyst agent to use the query tool
permit (
  principal == Agent::"analyst-agent",
  action == Action::"query",
  resource == Tool::"cve-lookup"
);
// Allow the reporter agent to read CVE data, but not write or delete
permit (
  principal == Agent::"reporter-agent",
  action in [Action::"read", Action::"list"],
  resource in Resource::"cve-database"
);
// No agent can access the production database directly — ever
// Note: forbid cannot be overridden by any permit
forbid (
  principal,
  action,
  resource == Database::"production-db"
);

Cedar's forbid is unconditional — no permit can override it. This is by design: your most critical restrictions (no direct database access, no PII export) go in forbid blocks so they can never be accidentally granted. Use permit for what agents CAN do; use forbid for what they MUST NEVER do.

What: Cedar is a policy language that defines what agents are allowed to do — which tools they can call, which resources they can access, under what conditions.

Why: Hard-coding permissions in agent code creates security debt. Cedar externalizes authorization so you can audit, rotate, and restrict permissions without touching agent code.

How to start: Define one Cedar policy per agent. Start with a default-deny posture (no permits) and add explicit permits for every capability the agent needs. If a capability isn't listed, it doesn't exist.
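Cedar's decision order can be illustrated with a toy model. This is not the Cedar engine, just its semantics in miniature: any matching forbid wins over every permit, and anything unmatched is denied by default.

```python
def authorize(request, permits, forbids):
    """Toy model of Cedar's decision order: forbid beats permit, unmatched means deny."""
    def matches(rule):
        # None in a rule slot means "any principal / any action / any resource"
        return all(r is None or r == v for r, v in zip(rule, request))
    if any(matches(rule) for rule in forbids):
        return "DENY (explicit forbid)"
    if any(matches(rule) for rule in permits):
        return "ALLOW"
    return "DENY (default)"

permits = [("analyst-agent", "query", "cve-lookup")]
forbids = [(None, None, "production-db")]  # no agent, no action: ever

print(authorize(("analyst-agent", "query", "cve-lookup"), permits, forbids))      # ALLOW
print(authorize(("analyst-agent", "query", "production-db"), permits, forbids))   # DENY (explicit forbid)
print(authorize(("reporter-agent", "query", "cve-lookup"), permits, forbids))     # DENY (default)
```

The third case is the default-deny posture in action: the reporter agent was never permitted, so no rule needs to exist to deny it.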

import yaml, logging
from functools import wraps

# Load identity manifest at startup
with open("agent-identities.yaml") as f:
    IDENTITIES = {a["agent_id"]: a for a in yaml.safe_load(f)["agents"]}

def require_permission(tool_name):
    """Decorator: check agent_id header before any tool executes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(request, *args, **kwargs):
            agent_id = request.headers.get("X-Agent-Id")
            identity = IDENTITIES.get(agent_id)
            if not identity:
                logging.warning(f"DENIED unknown agent_id={agent_id} tool={tool_name}")
                return {"error": "403 Forbidden", "reason": "unknown agent"}
            if tool_name in identity.get("denied_tools", []):
                logging.warning(f"DENIED agent_id={agent_id} tool={tool_name} (explicitly denied)")
                return {"error": "403 Forbidden", "reason": "tool explicitly denied"}
            if tool_name not in identity.get("allowed_tools", []):
                logging.warning(f"DENIED agent_id={agent_id} tool={tool_name} (not in allowlist)")
                return {"error": "403 Forbidden", "reason": "tool not in allowlist"}
            logging.info(f"ALLOWED agent_id={agent_id} tool={tool_name}")
            return fn(request, *args, **kwargs)
        return wrapper
    return decorator

# Apply to each tool handler:
# @require_permission("query_cve")
# def handle_query_cve(request): ...
pip install pyjwt
# Claude Code prompt:
# "Build a JWT token service for my MCP NHI system:
# - issue_token(agent_id) → signed JWT with 1hr expiry + allowed tools
# - validate_token(token) → verify signature + check expiry
# - revoke_token(agent_id) → add to revocation list (in-memory for now)
# - Use HS256 signing with a secret from environment variable JWT_SECRET"
claude
Workload Identity — From JWTs to SPIFFE

The JWT token service you just built establishes a principle: each agent gets its own cryptographic identity, issued fresh for each session, scoped to its allowed actions. That is workload identity.

SPIFFE (Secure Production Identity Framework for Everyone) and its implementation SPIRE automate exactly this at infrastructure scale. Instead of your application code generating JWTs, the SPIFFE runtime issues short-lived X.509 certificates or JWTs to each workload automatically, rotating them without application changes.

The connection: you built the workload identity principle by hand. In production, your security team runs SPIRE so your application code doesn't have to manage credential issuance. The design decision — short-lived, per-agent, cryptographically verifiable identity — is the same either way.
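The token service described in the Claude Code prompt can also be sketched by hand with PyJWT. A minimal sketch following the prompt's spec (HS256, 1-hour expiry, in-memory revocation); the `dev-only-secret` fallback is a placeholder for the lab only, never for production.

```python
import os
import time
import jwt  # pip install pyjwt

SECRET = os.environ.get("JWT_SECRET", "dev-only-secret")  # set JWT_SECRET in production
REVOKED = set()  # in-memory revocation list; use a shared store in production

def issue_token(agent_id, allowed_tools, ttl_seconds=3600):
    """Signed JWT carrying the agent's identity and tool scope, 1-hour expiry."""
    now = int(time.time())
    payload = {"sub": agent_id, "tools": allowed_tools, "iat": now, "exp": now + ttl_seconds}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def validate_token(token):
    """Verify signature and expiry (jwt.decode raises on either), then check revocation."""
    payload = jwt.decode(token, SECRET, algorithms=["HS256"])
    if payload["sub"] in REVOKED:
        raise PermissionError(f"agent {payload['sub']} revoked")
    return payload

def revoke_token(agent_id):
    REVOKED.add(agent_id)
```

Note the design decision the SPIFFE passage describes: the token is short-lived and per-agent, so a leaked credential ages out in an hour even if revocation lags.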

# Your Managed Agent IS an NHI. Map your agent-identities.yaml to it:

# 1. Verify ANTHROPIC_API_KEY is in GitHub Secrets, not hardcoded
#    GitHub repo → Settings → Secrets and variables → Actions
#    Secret name: ANTHROPIC_API_KEY
#    Never appears in logs or code. Rotation = update the secret value.

# 2. Load your deployed agent IDs
import json
with open("managed_agent_ids.json") as f:
    ids = json.load(f)
print(f"NHI Identity: agent_id={ids['agent_id']}")
# This agent_id IS the NHI identifier — persistent, named, auditable

# 3. Verify tool scope in the agent YAML matches allowed_tools from Step 1
#    tools/mass/claude-managed-agents/01-orchestrator.yaml:
#      tools: [{"type": "agent_toolset_20260401"}]
#    For your custom agent, list only the tools this role needs:
#    tools:
#      - type: bash          # only if this agent needs shell access
#      - type: web_search    # only if this agent needs web access
#    Omitting a tool type = denying it — this is your least-privilege enforcement

# 4. Add your Managed Agent to the NHI registry:
nhi_entry = {
    "agent_id": ids["agent_id"],
    "role": "soc-analyst",
    "credential": "ANTHROPIC_API_KEY (GitHub Secret)",
    "credential_rotation_days": 90,
    "tool_scope": ["bash", "web_search"],  # match your agent YAML
    "session_isolation": True,  # each session is fresh — no cross-investigation state
    "audit_trail": "session events stream (agent.tool_use events)",
    "revocation": "delete agent via API or rotate ANTHROPIC_API_KEY"
}
print(json.dumps(nhi_entry, indent=2))
import anthropic, json
from datetime import datetime

client = anthropic.Anthropic()

with open("managed_agent_ids.json") as f:
    ids = json.load(f)

# Run a test session and capture the NHI audit trail
session = client.beta.sessions.create(
    agent=ids["agent_id"],
    environment_id=ids["environment_id"],
    title=f"NHI Audit Test — {datetime.utcnow().isoformat()}",
)

audit_records = []
test_alert = "Analyze: suspicious outbound traffic to 185.220.101.x on port 4444"

with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(session.id, events=[{
        "type": "user.message",
        "content": [{"type": "text", "text": test_alert}]
    }])
    for event in stream:
        if event.type == "agent.tool_use":
            # This IS the NHI audit trail — agent.tool_use = what the agent DID
            record = {
                "timestamp": datetime.utcnow().isoformat(),
                "agent_id": ids["agent_id"],
                "tool_name": event.name,
                "session_id": session.id,
            }
            audit_records.append(record)
            print(f"[TOOL] {event.name}")
        elif event.type == "agent.message":
            for block in event.content:
                if hasattr(block, "text"):
                    print(block.text, end="", flush=True)
        elif event.type == "session.status_idle":
            break

# Verify: every tool_name in audit_records is in your allowed_tools list
allowed = {"bash", "web_search", "text_editor"}  # from your agent YAML
violations = [r for r in audit_records if r["tool_name"] not in allowed]
print(f"\n\nAudit: {len(audit_records)} tool calls, {len(violations)} violations")
if violations:
    print("VIOLATION — tool called outside allowed scope:", violations)
else:
    print("PASS — all tool calls within defined scope")

# Save audit log
with open("nhi-audit-session.json", "w") as f:
    json.dump(audit_records, f, indent=2)

Week 11 — Observability & Cost Management

Week 11 Lab: OpenTelemetry Instrumentation for Agent Systems

Lab Goal: Instrument your SOC agent system with OpenTelemetry to capture distributed traces, metrics, and logs. Build a cost tracking dashboard. Configure anomaly detection alerts for unusual agent behavior.

Knowledge Check — Week 11

1. What are the three pillars of observability?

2. What makes OpenTelemetry valuable for production AI systems?

Lab Exercise: OTel Instrumentation for Your SOC Agent

pip install opentelemetry-sdk opentelemetry-exporter-otlp \
  opentelemetry-instrumentation-requests
mkdir -p ~/noctua-labs/unit7/week11 && cd ~/noctua-labs/unit7/week11
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, BatchSpanProcessor

# Setup tracer
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("soc-agent-system")

# Instrument agent calls (recon_agent and alert come from your Unit 4 SOC system):
with tracer.start_as_current_span("recon_agent") as span:
    span.set_attribute("agent.id", "soc-recon-001")
    span.set_attribute("model.id", "claude-sonnet-4-6")
    result = recon_agent.run(alert)
    span.set_attribute("tokens.input", result.usage.input_tokens)
    span.set_attribute("tokens.output", result.usage.output_tokens)
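The token attributes captured above are enough to derive the cost metric the lab goal asks for. A sketch with placeholder per-million-token prices; substitute current pricing for your models and set your own budget threshold.

```python
# Illustrative per-million-token prices in USD (placeholders; check current Anthropic pricing)
PRICE_PER_MTOK = {"claude-sonnet-4-6": {"input": 3.00, "output": 15.00}}

def session_cost_usd(model_id, input_tokens, output_tokens):
    """Estimated cost of one agent call, suitable as an OTel span attribute."""
    p = PRICE_PER_MTOK[model_id]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = session_cost_usd("claude-sonnet-4-6", 12_000, 2_500)
# span.set_attribute("cost.usd", cost)

# Anomaly hook: flag sessions that blow past the expected budget
BUDGET_USD = 0.50  # placeholder threshold; tune to your workload
if cost > BUDGET_USD:
    print(f"ANOMALY: session cost ${cost:.4f} exceeds ${BUDGET_USD:.2f} budget")
```

Emitting cost as a span attribute means your tracing backend can alert on it with the same query language it uses for latency.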
Distributed Tracing Backends

The ConsoleSpanExporter you just configured writes OTel spans to stdout — useful for local development, not useful in production where spans need to be stored, queried, and alerted on.

In production, the same spans route to a tracing backend. All of the following accept OTLP (OpenTelemetry Protocol) natively:

  • Grafana Tempo — open source, pairs with Grafana dashboards
  • Jaeger — open source, strong for distributed tracing visualization
  • Honeycomb — managed service, strong for high-cardinality queries
  • AWS CloudWatch — managed service, native to AWS deployments

Swapping from ConsoleSpanExporter to any backend is a one-line config change — the exporter endpoint. The instrumentation code is identical. The lab uses ConsoleSpanExporter for zero-dependency local development; the architecture is production-compatible by design.


Week 12 — Deploying Agentic Security Systems

Week 12 Lab: Dockerfile, Container Scanning & CI/CD Security Gates

Lab Goal: Package your SOC agent system as a hardened container image. Build a GitHub Actions CI/CD pipeline with security gates (secrets detection, SAST, container scanning, SBOM generation). Implement a multi-stage promotion pipeline: dev → staging → production. Your container and your Managed Agent are both production artifacts — the pipeline governs both.

Pre-capstone checkpoint — do this now: Your Unit 8 capstone requires a live deployed agent. Verify your Managed Agent is reachable and your ANTHROPIC_API_KEY is stored as a GitHub Secret (not hardcoded). Run the test below. If it fails, check that your agent IDs are correct and the API key is valid.

Knowledge Check — Week 12

1. What security benefit does a multi-stage Dockerfile provide?

2. Why is pre-commit secrets detection the most critical CI/CD security gate?

Lab Exercise: Containerize & Build CI/CD Pipeline

# Dockerfile (multi-stage, non-root user, health check)
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.11-slim AS runtime
# Non-root user
RUN useradd -r -s /bin/false soc-agent
WORKDIR /app
# Copy only runtime deps from builder
COPY --from=builder /install /usr/local
COPY --chown=soc-agent:soc-agent . .
USER soc-agent
# Health check (verifies the SDK imports; replace with an HTTP probe of :8080 in production)
HEALTHCHECK --interval=30s --timeout=10s CMD python3 -c "import anthropic; print('OK')"
EXPOSE 8080
CMD ["python3", "orchestrator.py"]
docker build -t soc-agent:latest .
# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan for CVEs
trivy image --severity HIGH,CRITICAL soc-agent:latest --format json > trivy-report.json
# Count critical/high CVEs
cat trivy-report.json | python3 -m json.tool | grep '"Severity"' | sort | uniq -c
# .github/workflows/security-pipeline.yml
name: SOC Agent Security Pipeline
on: [push, pull_request]
jobs:
  secrets-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Secrets detection
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          extra_args: --only-verified

  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install bandit && bandit -r . -f json -o bandit-report.json

  container-build:
    needs: [secrets-scan, sast]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t soc-agent:${{ github.sha }} .
      - name: Trivy scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: soc-agent:${{ github.sha }}
          exit-code: 1   # Fail on HIGH/CRITICAL
          severity: HIGH,CRITICAL
import anthropic, json, os

# Verify ANTHROPIC_API_KEY is set (from env, not hardcoded)
assert os.environ.get("ANTHROPIC_API_KEY"), "ANTHROPIC_API_KEY not set — add to GitHub Secrets"

client = anthropic.Anthropic()

with open("managed_agent_ids.json") as f:
    ids = json.load(f)

# Quick smoke test — one session, one message
session = client.beta.sessions.create(
    agent=ids["agent_id"],
    environment_id=ids["environment_id"],
    title="Pre-capstone checkpoint",
)

with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(session.id, events=[{
        "type": "user.message",
        "content": [{"type": "text", "text": "Respond with 'Agent deployment confirmed.' only."}]
    }])
    for event in stream:
        if event.type == "agent.message":
            for block in event.content:
                if hasattr(block, "text"):
                    print("SUCCESS:", block.text)
        elif event.type == "session.status_idle":
            break
# Authenticate to GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin

# Tag your image
docker tag soc-agent:latest ghcr.io/YOUR_GITHUB_USERNAME/soc-agent:latest

# Push
docker push ghcr.io/YOUR_GITHUB_USERNAME/soc-agent:latest

# Verify it's accessible
docker pull ghcr.io/YOUR_GITHUB_USERNAME/soc-agent:latest

# Run from the registry (same as local — key from environment, never in image)
docker run -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  ghcr.io/YOUR_GITHUB_USERNAME/soc-agent:latest
Two production artifacts — one governance standard

You now have two ways to run your security agent in production:

  • Claude Managed Agents — Anthropic hosts the loop and tool execution. Deploy once, run as sessions. Best for long-running investigations, file-heavy work, and teams that don't want to manage compute.
  • Container (GitHub Container Registry → any runtime) — you own the compute. Pull the image and run it on any cloud provider, on-prem server, or local Docker. Best for air-gapped environments, custom runtime requirements, or cost optimization at scale.

Both use the same system prompt and agent logic. Both pull ANTHROPIC_API_KEY from the environment — never hardcoded. The NHI governance model from Week 10 applies to both: one identity, one credential, one audit trail.

# PeaRL Delegated Autonomous promotion gate — verify each item:
# [ ] Agent has its own identity (unique agent_id, not shared with other agents)
# [ ] No hardcoded credentials (ANTHROPIC_API_KEY in GitHub Secrets / env var, not in code or image)
# [ ] Tool scope defined (agent YAML lists only the tools this role needs)
# [ ] Session isolation confirmed (each session starts fresh — no cross-investigation state)
# [ ] Output validation active (NeMo Guardrails or equivalent — applied in agent code)
# [ ] OTel instrumentation active (agent.tool_use events captured and logged)
# [ ] max_iterations / failure cap configured on the agent
# [ ] dependencies hash-pinned (requirements-pinned.txt present)
# [ ] SBOM generated (from Week 9)
# [ ] Tool calls logged with agent_id and timestamp (nhi-audit-session.json from Week 10)
# [ ] OWASP Agentic Top 10 risks assessed (or documented as accepted)

# Document each item with evidence: screenshot, config file, or CLI output
# This checklist + evidence = your capstone governance package
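The file-based items on the checklist can be verified mechanically before any human review. A minimal sketch; the paths assume the lab layout from Weeks 9 and 10, and it covers only the artifact-evidence items, not the judgment calls.

```python
import os

# File-based evidence from the promotion gate; paths assume the lab layout
EVIDENCE = {
    "dependencies hash-pinned": "requirements-pinned.txt",
    "SBOM generated": "sbom.json",
    "NHI audit trail": "nhi-audit-session.json",
}

def gate_check(evidence=EVIDENCE):
    """Return (passed, missing) for the artifact-based promotion gate items."""
    missing = [name for name, path in evidence.items() if not os.path.exists(path)]
    return (len(missing) == 0, missing)

passed, missing = gate_check()
print("GATE PASS" if passed else f"GATE FAIL, missing evidence: {missing}")
```

A script like this belongs in the CI/CD pipeline from the previous step: a failing gate check blocks the promotion PR before a reviewer ever looks at it.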
Pre-review gate: run two automated checks before any human governance review

The role review is a human judgment call — but only if the automated baseline is clean. Run both checks and fix all findings before the strategic, engineering, and SOC reviews begin.

Step 1 — Code quality: Zero CRITICAL findings required. CRITICAL findings in the role review count against your governance score.

/check-antipatterns ~/noctua-labs/unit7/soc-system/

Step 2 — Controls inventory: Produces evidence of what controls are implemented vs. implied. Bring this output to your role review — it's what the reviewer is asking about.

/harness-assess ~/noctua-labs/unit7/soc-system/

Release Governance: Role Review + Changelog

Before your Unit 7 capstone PR, apply the gstack role-based review pattern and produce the required governance artifacts. These are the practices that separate a working system from a production-ready one.

Unit 7 Deliverables Summary
  • SBOM + Dependency Scan — CycloneDX SBOM and pip-audit results for your SOC system
  • NHI Registry — agent-identities.yaml with working JWT token enforcement
  • OTel Instrumentation — working traces and cost metrics with anomaly alerting
  • Hardened Dockerfile + GitHub Actions Pipeline — multi-stage build with all security gates active
  • Deployment Runbook — documented operational procedures for production deployment
  • CHANGELOG.md — versioned history from Unit 4 prototype to Unit 7 production with security impact annotations
  • Role Review Document — strategic, EM, and analyst review paragraphs for your capstone system
  • Debug Post-Mortem — one real issue documented using the 4-phase systematic debug methodology
Your CI/CD Pipeline Is Reusable — Make It a Template

Every AI security project in your organization needs secrets scanning, SAST, container scanning, and SBOM generation. Nobody should have to build this from scratch. Convert your GitHub Actions pipeline into a public repository template — one click to spin up a compliant DevSecOps pipeline for any new AI agent project.

Push it as a public GitHub template (Settings → Template repository), tag it devsecops, ai-security, github-actions. Good security infrastructure should be open. The practitioner at a startup without a security team deserves the same pipeline gates as an enterprise. Share yours.

Also: your NHI governance registry format (agent-identities.yaml), your OpenTelemetry cost alerting config, and your SBOM generation workflow are all worth extracting as standalone gists or templates. Use this prompt:

Extract the reusable components from my Unit 7 work and write a GitHub repository template README that helps someone adopt these security controls for their own AI agent project.

Unit 7 Complete

Your systems are now production-ready: supply chain verified, identities governed, observable, and deployable via secure CI/CD.

Next: Unit 8 Lab — Capstone Projects →