Day 5 — Bash + Permission Engine
Day 4 made file edits safe. Today the Agent grows hands to run commands — but every command goes through a permission gate first: dangerous commands are blocked outright, normal ones print a preview and wait for y, file edits can be put in acceptEdits or plan mode. The model only sees a bash tool; every policy hides behind PermissionRequest → decide_permission.
Day 4 made file edits safe. Every write went through a harness that printed a diff, asked y/N, and snapshotted the old content — a full read-write protection chain. But the Agent still can't run commands. The model wants to run pytest to see tests, or git status to check the repo — no tool for that yet. Day 3's grep was a pure-Python implementation, not a shell.
Today the Agent gets two new capabilities. First, the bash tool: the model can invoke any shell command, but every execution flows through a permission engine — dangerous commands get blocked, normal ones get a preview + y/N confirmation, and a --permission-mode flag swaps between three modes (default / acceptEdits / plan). Second, ask_user_question as a first-class tool: the model isn't limited to "doing" — it can also "ask you", popping a numbered menu and waiting for your choice.
About 300 lines of code, four new files (bash_runner.py / permissions.py / bg_manager.py / prompt_ui.py), four touched (tools.py / agent.py / cli.py / diff_ui.py). No new dependencies — subprocess, threading, re all ship with Python.
Day 5 Main Visual — Bash + Permission Engine
Start with this Agent Logic Map. It is not a frame-by-frame replay; it isolates the four Day 5 lines that are easiest to mix up: how a tool_use becomes a PermissionRequest, how bash passes dangerous-command blocking and confirmation, how --permission-mode changes allow / ask / deny, and why ask_user_question plus background bash are still harness branches.
Today we keep editing Day 4's agent-code project. packages/day-04-safe-edit/ is the Day 4 reference snapshot (the Day 5 final state lives in demo/agent_code/).
Setup — today's starting point
Day 4 gave the Agent two write tools — file_write for whole-file overwrite and file_edit for string replace. The harness rendered diffs, waited for y, and snapshotted old content. But the interceptor in agent.py was a single if call.name in ("file_write", "file_edit"): block — edit-only, and every new tool would need another branch.
Today we wrap that interceptor with four pieces:
- A new bash_runner.py: subprocess.run runs shell commands, with cwd pinned to the project, env restricted to a minimal set, and timeout + output truncation.
- A new permissions.py: PermissionRequest describes one tool call, PermissionDecision (allow | ask | deny) is the harness's verdict, and decide_permission() is the single entry point.
- A new prompt_ui.py: pulls render_diff + confirm_edit out of Day 4's diff_ui.py and adds confirm_command + confirm_tool_use + prompt_single_choice. diff_ui.py becomes a re-export so old imports keep working.
- A new bg_manager.py: subprocess.Popen + a daemon thread; returns a background_id immediately, and output streams to .bg/<id>.out/.err.
Four-step rollout: v1 gets bash running, v2 lifts confirmation into a permission engine, v3 adds ask_user_question, the final step wires up background execution.
v1 — bash sync execution + command preview + y/N confirmation
Start with the most direct case: the model wants to run a command, the harness prints the command for you, you hit y, it runs. Same shape as Day 4 v1's file_write — intercept before tools.run, preview, confirm. But bash has one key twist: there's no resolve_in_cwd to lean on (commands are shell strings, not file paths). Instead the safety boundary is subprocess.run(cwd=ctx.cwd), pinning the process to the project directory.
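To see what cwd pinning does and does not do, the sketch below runs a child process with subprocess.run(cwd=...) inside a temporary directory. The helper name child_cwd is hypothetical, and it launches sys.executable instead of a shell string so the example runs on any platform. Keep in mind that cwd only sets the directory the process starts in. It is not a sandbox: a command can still cd elsewhere or use absolute paths, which is why the preview and confirmation step still matters.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def child_cwd(cwd: Path) -> Path:
    """Run a child process with cwd pinned and return the directory it reports."""
    proc = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=str(cwd),          # the child starts here, not in the caller's cwd
        capture_output=True,
        text=True,
        timeout=30,
    )
    return Path(proc.stdout.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp).resolve()
        # Compare resolved paths so symlinked temp dirs don't cause a false mismatch
        print(child_cwd(project).resolve() == project)
```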
1.1 Create agent_code/bash_runner.py
Bash execution gets its own module. The tool function stays thin; the actual run logic lives in the runner — so both the permission engine and the tool function can reuse it. Full 41 lines in the v1 DiffCard, the two key blocks:
# Minimal env for bash: don't leak host secrets into the child process
_MINIMAL_ENV = {
    "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
    "HOME": os.environ.get("HOME", ""),
    "USER": os.environ.get("USER", ""),
    "SHELL": os.environ.get("SHELL", "/bin/bash"),
}

def run_sync(command: str, cwd: Path, timeout: int = 30) -> str:
    """Run a shell command synchronously. cwd is pinned; the process is killed on timeout."""
    try:
        proc = subprocess.run(
            command,
            shell=True,
            cwd=str(cwd),
            env=_MINIMAL_ENV,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"error: command timed out after {timeout}s"
    ...
    truncated = truncate_output(output.strip(), max_chars=12000)
    if proc.returncode != 0:
        return f"exit code {proc.returncode}\n{truncated}"
    return truncated if truncated else "(no output)"

shell=True lets the model write commands the way you'd type them in a terminal. That convenience has a cost: any shell injection that bypasses the permission engine executes directly. The teaching-grade safety net lands in v2: dangerous-command regex blocking + user confirmation.
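run_sync calls truncate_output, which the excerpt does not show. A minimal sketch of what such a helper could look like follows. The body and the truncation marker are assumptions, not the tutorial's actual implementation. It keeps the first max_chars characters and appends a note saying how much was dropped, so the model can tell the output is incomplete.

```python
def truncate_output(text: str, max_chars: int = 12000) -> str:
    """Cap text at max_chars characters and append a note about what was cut.

    Hypothetical helper: the project's real version may differ.
    """
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return f"{text[:max_chars]}\n... [truncated {dropped} chars]"


if __name__ == "__main__":
    print(truncate_output("hello"))                 # short output passes through
    print(truncate_output("x" * 20, max_chars=5))   # long output is cut and annotated
```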
1.2 Create agent_code/prompt_ui.py
Day 4's diff_ui.py carried render_diff and confirm_edit. Today both functions move into prompt_ui.py, joined by three new interactions (confirm_command for v1, confirm_tool_use for v2, prompt_single_choice for v3). diff_ui.py becomes a re-export shim so existing from .diff_ui import ... lines keep working:
from __future__ import annotations

# Day 5: render_diff and confirm_edit moved to prompt_ui.py.
# Keep this re-export so the old import path keeps working.
from .prompt_ui import confirm_edit, render_diff

__all__ = ["confirm_edit", "render_diff"]

confirm_command is short:

def confirm_command(command: str) -> bool:
    """Ask the user to confirm a bash command. Defaults to no."""
    # The interceptor in agent.py prints the command preview; this function only asks
    return typer.confirm("Run this command?", default=False)

1.3 Edit agent_code/tools.py — add bash, git_status, git_diff
Three new tools. bash is the core: it calls bash_runner.run_sync. git_status and git_diff are read-only conveniences; their bodies also call subprocess, but in v2 the permission engine will put them in _READONLY_TOOLS so they don't trigger an ask prompt every time.
Top of file, add this import:
from .bash_runner import run_sync as _bash_run_sync

After file_edit, before class ToolRegistry, insert the three tool functions. bash itself is thin:

def bash(args: dict[str, Any], ctx: ToolContext) -> str:
    """Execute a shell command. Validation and confirmation live in agent.py's interceptor."""
    command = args.get("command", "")
    if not command:
        return "error: missing required argument 'command'"
    timeout = int(args.get("timeout", 30))
    background = bool(args.get("background", False))
    # v1 only handles sync; v4 wires up the background=True branch
    if background:
        return "error: background mode not implemented yet (coming in v4)"
    return _bash_run_sync(command, ctx.cwd, timeout=timeout)

The full three tool functions + three registry.register(...) calls are in the v1 DiffCard. Notice bash's input_schema only exposes command / timeout / background; the dangerous-command list, user confirmation, and mode switching all stay invisible to the model.
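To make the "model only sees three fields" point concrete, here is a sketch of what a JSON-Schema-style input_schema for bash might look like. The constant name BASH_INPUT_SCHEMA and the description strings are assumptions; the actual registration is in the v1 DiffCard. The defaults mirror the bash function: command is required, timeout falls back to 30, and background falls back to False.

```python
# Hypothetical schema; the name and the descriptions are illustrative only
BASH_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "command": {
            "type": "string",
            "description": "Shell command to run",
        },
        "timeout": {
            "type": "integer",
            "description": "Seconds before the command is killed",
            "default": 30,
        },
        "background": {
            "type": "boolean",
            "description": "Run without blocking the Agent Loop",
            "default": False,
        },
    },
    "required": ["command"],
}

if __name__ == "__main__":
    # Nothing about dangerous-command rules or permission modes appears in the schema
    print(sorted(BASH_INPUT_SCHEMA["properties"]))
```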
1.4 Edit agent_code/agent.py — add a bash branch in the interceptor
Day 4's interceptor only knew about file_write and file_edit. v1 adds an elif after that if-block: recognize bash, print the command preview, ask for confirmation. Add from .prompt_ui import confirm_command at the top, then insert before result = tools.run(call, ctx):
# bash interceptor: print the command preview, then ask for confirmation
elif call.name == "bash":
    command = call.arguments.get("command", "")
    timeout = call.arguments.get("timeout", 30)
    console.print(f"\n[bold yellow]Command:[/bold yellow] {command}")
    console.print(f"[dim]timeout: {timeout}s cwd: {ctx.cwd}[/dim]")
    if not confirm_command(command):
        result = ToolResult(call.id, "error: command rejected by user", is_error=True)
        emit(f"observation: {result.content}")
        tool_result_blocks.append(...)
        continue

Only bash is intercepted. git_status and git_diff fall straight through to tools.run with no confirmation prompt.
1.5 Three verifications
(a) Read-only git tools — straight to results, no confirmation:
$ uv run agent-code "Use git_status to inspect the repo, then git_diff to see what changed"
Agent Code
cwd: /your/project
provider: anthropic model: deepseek-v4-flash
tool_call: git_status {}
observation: On branch main
Changes not staged for commit:
modified: ...
tool_call: git_diff {}
observation: diff --git ...
final: There are uncommitted changes in the repo; I've reviewed git status and git diff.

Don't expect your output to match byte-for-byte. If you're working in this repo while reading the tutorial, git status probably won't be clean. The two things to look for: tool_call: git_status / tool_call: git_diff show up, and no Run this command? prompt appears between them.
(b) bash execution — command preview + confirmation:
$ uv run agent-code "Use bash to run pytest --version"
...
tool_call: bash {'command': 'pytest --version', 'timeout': 30, 'background': False}
Command: pytest --version
timeout: 30s cwd: /your/project
Run this command? [y/N]: y
observation: pytest 8.x.x
final: pytest 8.x.x is installed in this environment.

You see the yellow Command: preview and the Run this command? prompt; only after y does the command run. Press n and you get observation: error: command rejected by user.
(c) bash error — the model recovers from the exit code:
$ uv run agent-code "Use bash to run python -c 'print(1/0)'"
...
observation: exit code 1
Traceback (most recent call last):
File "<string>", line 1, in <module>
ZeroDivisionError: division by zero
final: The command failed because the code tried to divide by zero and Python raised ZeroDivisionError.

The model reads exit code 1 + stderr and figures out what went wrong on its own.
v1's bash works, but the confirmation logic still lives in agent.py's if/elif chain. Every new tool adds another branch, and dangerous bash still only triggers y/N — there's no pre-confirmation blocker. The next step lifts confirmation into a real permission engine.
v2 — Permission engine: three modes + dangerous-command blocking
v1 has three problems:
- Confirmation logic is scattered. file_write / file_edit diff+confirm sits in the if block; bash preview+confirm sits in the elif. Adding ask_user_question needs yet another branch.
- Confirmation alone isn't enough for dangerous bash. echo hello and rm -rf / both flow through the same y/N. Dangerous commands should be rejected before the prompt.
- No mode switching. Sometimes you want the Agent to edit freely without confirming (acceptEdits); sometimes you want it strictly read-only (plan).
The permission engine solves all three. Every tool call passes through decide_permission(), which returns allow | ask | deny. The Agent Loop dispatches by decision: allow → run directly, ask → render preview + confirm, deny → return an error observation.
2.1 Create agent_code/permissions.py
The core is one function: decide_permission(request). PermissionRequest describes "what this tool call wants to do", PermissionDecision is the harness verdict. Full 93 lines in the v2 DiffCard, the key dataclass + decision skeleton:
@dataclass
class PermissionRequest:
    """A single tool-call permission request. The tool describes intent; the harness decides."""
    tool_name: str
    args: dict
    mode: str
    cwd: Path

@dataclass
class PermissionDecision:
    """Permission verdict. behavior is one of allow / ask / deny."""
    behavior: str  # "allow" | "ask" | "deny"
    message: str | None = None

def decide_permission(request: PermissionRequest) -> PermissionDecision:
    """Permission entry point: decide allow / ask / deny by tool name, args, and mode."""
    tool_name = request.tool_name
    args = request.args
    mode = request.mode
    if tool_name in _ASK_TOOLS:
        return PermissionDecision("ask")
    # plan mode: only read-only tools are allowed. Write tools always deny.
    if mode == "plan":
        if tool_name in _READONLY_TOOLS:
            return PermissionDecision("allow")
        return PermissionDecision("deny", f"plan mode: {tool_name} is not allowed. ...")
    # Read-only tools are always allowed
    if tool_name in _READONLY_TOOLS:
        return PermissionDecision("allow")
    # bash dangerous-command check: applies in every mode
    if tool_name == "bash":
        command = args.get("command", "")
        danger_reason = _is_dangerous(command)
        if danger_reason:
            return PermissionDecision("deny", f"Dangerous command blocked: {danger_reason}")
    # acceptEdits: file edits skip the confirmation UI, but safety checks still run in agent.py
    if mode == "acceptEdits":
        if tool_name in ("file_write", "file_edit"):
            return PermissionDecision("allow")
    # Default: writes and bash need confirmation
    return PermissionDecision("ask")

The dangerous-command regex is the teaching-grade minimum: rm -rf / sudo / chmod -R / curl ... | sh / git push --force / git push -f / git push / git reset --hard. _READONLY_TOOLS whitelists read_file / list_files / glob / grep / project_tree / git_status / git_diff / system_date / echo so they bypass confirmation. _ASK_TOOLS holds ask_user_question / web_fetch / web_search. These are read-only in a sense, but the user should still know the Agent is pausing to ask a human or hitting an external resource.
Why put the read-only whitelist in the permission engine and not the tool registry? Because "is this read-only?" is a property of the permission decision, not the tool itself. The same bash tool can run read-only or write commands; the engine asks "what does this tool do in this call?", not "what kind of tool is this?".
2.2 Edit agent_code/agent.py — unified gate via decide_permission
Replace v1's if/elif chain with a unified permission gate. Update the top of the file to import every UI function from prompt_ui directly, and add from .permissions import PermissionRequest, decide_permission.
Add permission_mode: str = "default" to run_agent's signature. Then replace the entire tool-call loop's if/elif block with the three-branch decision structure. The core change: every tool call wraps into a PermissionRequest, hits decide_permission, and dispatches into deny / ask / allow:
# Permission engine entry: wrap every tool call into a PermissionRequest
request = PermissionRequest(
    tool_name=call.name,
    args=call.arguments,
    mode=permission_mode,
    cwd=ctx.cwd,
)
decision = decide_permission(request)

edit_preview: tuple[str, str, str] | None = None
if call.name in ("file_write", "file_edit") and decision.behavior != "deny":
    # acceptEdits skips the confirm UI but not Day 4's safety checks
    # path resolve + read-before-edit + mtime check + apply_single_replace all run here
    ...
    edit_preview = (path_str, old_content, new_content)

if decision.behavior == "deny":
    # deny path: return an error observation without any UI
    result = ToolResult(call.id, f"error: {decision.message}", is_error=True)
    ...
    continue
elif decision.behavior == "ask":
    if call.name in ("file_write", "file_edit"):
        # ask only owns diff + confirm; safety checks already passed
        ...
    elif call.name == "bash":
        # command preview + confirm
        ...
    elif call.name in ("web_fetch", "web_search"):
        # confirm_tool_use lets the user okay an external resource
        ...
    elif call.name == "ask_user_question":
        pass  # v3 wires this up

# allow path + ask passed: run the tool
result = tools.run(call, ctx)

Three decision paths: deny returns an error observation immediately (no UI, no execution); ask dispatches to a preview UI by tool type, and only after confirmation does it fall through to tools.run; allow skips the UI and goes straight to tools.run. The full ~150-line replacement is in the v2 DiffCard.
Note: acceptEdits only skips confirmation, not safety checks. Before either the ask or allow branch, file_write / file_edit still run Day 4's ensure_read_before_edit + check_mtime_conflict + apply_single_replace pre-compute.
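The three-way dispatch can also be shown in isolation. The sketch below is a stripped-down stand-in, not the real agent.py: Decision, dispatch, and the confirm / run callables are hypothetical names, and the UI is injected as a function so the flow can be exercised without a terminal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """Stand-in for PermissionDecision."""
    behavior: str  # "allow" | "ask" | "deny"
    message: str | None = None


def dispatch(decision: Decision, confirm: Callable[[], bool], run: Callable[[], str]) -> str:
    """Return the observation for one tool call, given the engine's verdict."""
    if decision.behavior == "deny":
        # No UI, no execution
        return f"error: {decision.message}"
    if decision.behavior == "ask" and not confirm():
        # The user saw the preview and said no
        return "error: command rejected by user"
    # Either allow, or ask that the user confirmed
    return run()


if __name__ == "__main__":
    def run_tool() -> str:
        return "ran"

    print(dispatch(Decision("deny", "plan mode"), confirm=lambda: True, run=run_tool))
    print(dispatch(Decision("ask"), confirm=lambda: False, run=run_tool))
    print(dispatch(Decision("allow"), confirm=lambda: False, run=run_tool))
```

Because confirm is only called on the ask path, swapping the terminal prompt for a web dialog would mean passing a different callable, with no change to the decision logic.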
2.3 Edit agent_code/cli.py — add --permission-mode
Add to main_command's signature:
permission_mode: str = typer.Option("default", "--permission-mode", help="Permission mode: default, acceptEdits, plan"),

Add permission_mode: str to run_once's signature, pass it through in both run_once(...) call sites (the direct run and the REPL loop), and forward it to run_agent(..., permission_mode=permission_mode).
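One thing to notice: because the option is a plain str, a typo such as --permission-mode Plan is not rejected, and decide_permission as shown treats any unrecognized mode like default. A small validation step could catch that early. The helper below is a hypothetical addition; normalize_permission_mode and VALID_MODES are not part of the tutorial's code.

```python
VALID_MODES = ("default", "acceptEdits", "plan")


def normalize_permission_mode(value: str) -> str:
    """Return value unchanged if it is a known mode, else raise ValueError."""
    if value not in VALID_MODES:
        allowed = ", ".join(VALID_MODES)
        raise ValueError(f"unknown permission mode {value!r}; expected one of: {allowed}")
    return value


if __name__ == "__main__":
    print(normalize_permission_mode("plan"))
    try:
        normalize_permission_mode("Plan")  # wrong case is rejected, not silently accepted
    except ValueError as exc:
        print(exc)
```

In a Typer command this check could run at the top of main_command, for example by converting the ValueError into typer.BadParameter so the user gets a normal usage error.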
2.4 Five verifications
(a) default mode — file edits still need confirmation:
After Day 4 the hello.txt content is usually hola from agent. Have the model read the file first and switch hola back to hello:
$ uv run agent-code "Read hello.txt, then change hola to hello inside it"
...
tool_call: read_file {'path': 'hello.txt'}
observation: hola from agent
tool_call: file_edit {...}
Diff for hello.txt:
...
Apply this edit to hello.txt? [y/N]:

Day 4's diff + confirm behavior is unchanged.
(b) acceptEdits — file edits skip confirmation:
$ uv run agent-code --permission-mode acceptEdits "Read hello.txt, then change hello back to hola inside it"
...
tool_call: file_edit {...}
observation: Edited hello.txt: replaced 5 chars with 4 chars

No Diff for and no Apply this edit? prompt appears: in acceptEdits, decide_permission returns allow for file_write / file_edit, but read-before-edit, mtime-conflict, and string-replace checks still run before execution. bash still needs confirmation:
$ uv run agent-code --permission-mode acceptEdits "Use bash to run echo hello"
...
Command: echo hello
Run this command? [y/N]:

(c) plan mode — write tools get rejected:
$ uv run agent-code --permission-mode plan "Create a hello.txt file"
...
tool_call: file_write {'file_path': 'hello.txt', 'content': 'hello'}
observation: error: plan mode: file_write is not allowed. Only read-only tools can run in plan mode.
final: We're in plan mode; write operations aren't allowed.

Read-only tools are unaffected, because read_file lives in _READONLY_TOOLS:
$ uv run agent-code --permission-mode plan "Read hello.txt"
...
tool_call: read_file {'path': 'hello.txt'}
observation: hola from agent

(d) Dangerous git command gets blocked:
$ uv run agent-code "Use bash to run git push --force origin main"
...
tool_call: bash {'command': 'git push --force origin main', 'timeout': 30, 'background': False}
observation: error: Dangerous command blocked: git push --force overwrites remote history

No Command preview, no confirmation: decide_permission returned PermissionDecision("deny") at the _is_dangerous check.
(e) sudo rm -rf / may be refused by the model first:
A real model sometimes refuses extreme commands at the assistant layer and answers with a final like "I can't run this." That doesn't mean the permission engine isn't working; the model just pre-empted it. For a stable check on the engine itself, prefer git push --force above.
The permission engine's three modes + dangerous-command blocking are now wired up. agent.py's interceptor isn't a tangle of if/elif anymore — it's one unified decision.behavior three-branch dispatch.
v3 — ask_user_question: let the model ask you
v2's permission engine handles two interactions: allow (run directly) and ask (preview + y/N). But sometimes the model needs more than "should this run?" — it needs you to pick between options, like "fix the test or fix the code?".
ask_user_question does that. It's a first-class tool, not a slash command. The model calls it, the harness blocks the Agent Loop, the terminal pops a numbered menu, and the choice flows back as a tool_result.
3.1 Edit agent_code/prompt_ui.py — add a single-choice menu
Append to the file:
def prompt_single_choice(question: str, labels: list[str]) -> str | None:
    """Show a numbered menu, return the chosen label, or None."""
    # Import Console explicitly; reaching it through a bare `import rich`
    # relies on rich.console having been imported somewhere else first
    from rich.console import Console
    console = Console()
    console.print(f"\n[bold yellow]? {question}[/bold yellow]")
    for i, label in enumerate(labels, 1):
        console.print(f" {i}. {label}")
    console.print(" 0. [dim]Skip / Other[/dim]")
    try:
        choice = typer.prompt("Choice", default="0")
        idx = int(choice)
        if 1 <= idx <= len(labels):
            return labels[idx - 1]
        return None
    except (ValueError, TypeError):
        return None

3.2 Edit agent_code/tools.py — add ask_user_question
The tool function doesn't read stdin — the actual interaction happens inside agent.py's interceptor. The function only validates args; the normal path never calls it:
def _ask_user_question(args: dict[str, Any], ctx: ToolContext) -> str:
    """Handled by agent.py's interceptor; this function never reads stdin itself."""
    prompt = args.get("prompt", "")
    options = args.get("options", [])
    if not prompt:
        return "error: missing required argument 'prompt'"
    if not options or not isinstance(options, list):
        return "error: options must be a non-empty list"
    return "error: ask_user_question must be handled by the harness, not executed directly"

Append an ask_user_question registration after the bash registration in default_tools() (input_schema has a prompt string + an options array). The full registration is in the v3 DiffCard.
3.3 Edit agent_code/agent.py — handle ask_user_question
v2 left a placeholder elif call.name == "ask_user_question": pass in the interceptor. Add prompt_single_choice to the top import, replace pass with:
elif call.name == "ask_user_question":
    question = call.arguments.get("prompt", "")
    options = call.arguments.get("options", [])
    if not isinstance(options, list):
        options = []
    labels = [str(o) for o in options]
    selected = prompt_single_choice(question, labels)
    if selected is None:
        result = ToolResult(call.id, "User skipped the question.", is_error=False)
    else:
        result = ToolResult(call.id, f'User selected: "{selected}"', is_error=False)
    emit(f"observation: {result.content}")
    tool_result_blocks.append({
        "type": "tool_result",
        "tool_use_id": result.tool_call_id,
        "content": result.content,
        "is_error": result.is_error,
    })
    continue

Note that ask_user_question uses continue: it builds its own ToolResult, appends to tool_result_blocks, and never falls through to tools.run. That's different from file_write / file_edit / bash: those tools still need tools.run after confirmation; ask_user_question's whole job is showing a menu and grabbing the answer.
3.4 Verification
$ uv run agent-code "Should I fix the test first or the code first? Ask me with ask_user_question, three options: fix the test, fix the code, not sure"
...
tool_call: ask_user_question {'prompt': 'Where should we start fixing this bug?', 'options': ['fix the test', 'fix the code', 'not sure']}
? Where should we start fixing this bug?
1. fix the test
2. fix the code
3. not sure
0. Skip / Other
Choice: 2
observation: User selected: "fix the code"
final: You chose "fix the code". Let's start by changing the implementation...

The model picks up the choice and keeps reasoning. Picking 0 returns User skipped the question. with is_error=False, so the model knows "the user saw the question but didn't pick a direction" rather than treating it as an error.
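The index handling inside prompt_single_choice can be pulled out into a pure function, which makes the three outcomes (a valid pick, 0 or out of range, non-numeric input) easy to check without a terminal. parse_choice is a hypothetical name, not part of the tutorial's code.

```python
from __future__ import annotations


def parse_choice(raw: str, labels: list[str]) -> str | None:
    """Map the user's typed answer to a label; None means skipped or invalid."""
    try:
        idx = int(raw.strip())
    except ValueError:
        # Non-numeric input is treated the same as skipping
        return None
    if 1 <= idx <= len(labels):
        return labels[idx - 1]
    # 0 and out-of-range numbers both count as "Skip / Other"
    return None


if __name__ == "__main__":
    options = ["fix the test", "fix the code", "not sure"]
    for answer in ("2", "0", "9", "abc"):
        print(repr(answer), "->", parse_choice(answer, options))
```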
v3 stops here on purpose. This numbered menu is the smallest teaching version: it proves the full chain works — the model asks a structured question, the harness pauses the loop, the user picks an option, that choice becomes a tool_result, and the model continues.
The arrow-key selector you see in Claude Code is not a different model capability; it is a richer prompt_ui layer. The terminal switches into raw input, listens for up/down arrows to move the highlighted option, returns the selected label on Enter, then the Agent Loop wraps it as a tool_result. In the Claude Code source, this is also a first-class tool named AskUserQuestion: its permission check always asks for user interaction, and the UI supports question groups, option descriptions, previews, and multi-select. We skip those pieces today so the harness boundary stays clear. To upgrade later, mostly replace typer.prompt("Choice") inside prompt_single_choice() with an arrow-key menu; the agent.py interception flow can stay the same.
Final — bash(background=True): background execution
v1's bash is synchronous: the model calls bash("sleep 30") and the Agent Loop blocks for 30 seconds. Compilation, long test runs — that doesn't fly.
Background execution fixes this: the model sets background=True, the harness spawns the process and returns a structured observation immediately (containing background_id, output file paths, pid), the Agent Loop never blocks. Afterwards the model can bash("cat .bg/<id>.out") to check output or bash("kill <pid>") to stop it.
4.1 Create agent_code/bg_manager.py
A daemon thread waits on a subprocess.Popen; stdout and stderr stream to .bg/<id>.out and .bg/<id>.err. The launcher returns a dict immediately without waiting for the process to finish:
def start_background(command: str, cwd: Path) -> dict:
    """Start a shell command in the background; stream stdout/stderr to .bg/<id>.out/.err.
    Returns immediately with structured info; doesn't wait for the process to finish."""
    bg_id = f"bg-{uuid.uuid4().hex[:8]}"
    bg_dir = cwd / ".bg"
    bg_dir.mkdir(parents=True, exist_ok=True)
    out_path = bg_dir / f"{bg_id}.out"
    err_path = bg_dir / f"{bg_id}.err"
    out_f = open(str(out_path), "w")
    err_f = open(str(err_path), "w")
    proc = subprocess.Popen(
        command, shell=True, cwd=str(cwd), env=_MINIMAL_ENV,
        stdout=out_f, stderr=err_f,
    )

    def _wait_and_close() -> None:
        """Wait for the child to finish, close fds. Runs in a daemon thread."""
        proc.wait()
        out_f.close()
        err_f.close()

    t = threading.Thread(target=_wait_and_close, daemon=True)
    t.start()
    return {"background_id": bg_id, "output_file": ..., "stderr_file": ..., "pid": proc.pid, "message": ...}

A daemon=True thread gets reaped when the main process exits, so it never blocks the CLI from exiting. Here is the full mental model: a real background-task lifecycle is not just "start a thread". It is create task → record task id → keep writing output → allow later reads → support cancellation → notify the model when the task finishes.
Claude Code follows that shape. Background bash is still the same Bash tool; the input carries run_in_background: true. The harness starts the subprocess, registers a local shell task, streams output into a task output file, and immediately returns backgroundTaskId plus the output path as the tool_result. When the task finishes, a notification can be sent back into model context. If the user stops it, the CLI uses its internal task-kill path instead of asking the model to guess a pid.
Day 5 does not implement that full manager. We keep the smallest useful chain: background=True starts a subprocess, stdout/stderr stream into .bg/<id>.out/.err, and the tool_result returns background_id, output paths, and pid. Later the model can bash("cat .bg/<id>.out") to inspect output or bash("kill <pid>") to stop it. In the current 14-day path, bg_status / bg_read / bg_cancel and completion notifications stay out of the main track; they fit better as extras, or as a follow-up AI-assisted implementation using this lifecycle map.
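Short of a full task manager, one cheap step toward "know when the task finished" would be to have the waiter thread record the exit code in a marker file. The sketch below is a hypothetical variation, not Day 5's bg_manager.py: start_with_exit_marker, the .exit suffix, and the returned waiter handle are all assumptions. It takes an argv list instead of a shell string so it behaves the same on any platform, and it merges stderr into the output file to stay short.

```python
import subprocess
import sys
import tempfile
import threading
import uuid
from pathlib import Path


def start_with_exit_marker(argv: list, cwd: Path) -> dict:
    """Start argv in the background; write its exit code to .bg/<id>.exit when done."""
    bg_id = f"bg-{uuid.uuid4().hex[:8]}"
    bg_dir = cwd / ".bg"
    bg_dir.mkdir(parents=True, exist_ok=True)
    out_path = bg_dir / f"{bg_id}.out"
    exit_path = bg_dir / f"{bg_id}.exit"
    out_f = open(out_path, "w")
    proc = subprocess.Popen(argv, cwd=str(cwd), stdout=out_f, stderr=subprocess.STDOUT)

    def _wait_and_mark() -> None:
        code = proc.wait()
        out_f.close()
        # The marker appears only after the child has exited and output is closed
        exit_path.write_text(str(code))

    waiter = threading.Thread(target=_wait_and_mark, daemon=True)
    waiter.start()
    return {
        "background_id": bg_id,
        "output_file": str(out_path),
        "exit_file": str(exit_path),
        "pid": proc.pid,
        "waiter": waiter,  # exposed here only so a caller can join it
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info = start_with_exit_marker(
            [sys.executable, "-c", "print('done from background')"], Path(tmp)
        )
        info["waiter"].join(timeout=60)
        print(Path(info["output_file"]).read_text().strip())
        print("exit code:", Path(info["exit_file"]).read_text())
```

With a marker like this, the model could check whether .bg/<id>.exit exists instead of guessing how long to sleep before reading the output.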
4.2 Edit agent_code/tools.py — wire bash's background branch
In bash, replace:
if background:
    return "error: background mode not implemented yet (coming in v4)"

with:

if background:
    # Background execution: spawn the child, return structured info, don't block the Agent Loop
    from .bg_manager import start_background
    result = start_background(command, ctx.cwd)
    return (
        f"Command running in background with ID: {result['background_id']}.\n"
        f"Output is being written to: {result['output_file']}\n"
        f"Stderr is being written to: {result['stderr_file']}\n"
        f"PID: {result['pid']}\n\n"
        f"{result['message']}"
    )

Background mode skips timeout (the child owns its own lifetime) and skips output truncation (output goes to a file, not the tool_result).
4.3 agent.py stays put
Background bash and sync bash share the same decide_permission flow — to the permission engine, bash is bash, sync or background. The dangerous-command check applies to both, and the user-confirmation prompt fires for background bash too. background=True is just a different execution mode, not a permission exemption.
4.4 Two verifications
(a) Background execution + read output later:
$ uv run agent-code "Use bash(background=True) to run sleep 5 && echo 'done from background'. After you get the background_id, use sync bash to run sleep 6 && cat the corresponding .bg output file."
...
tool_call: bash {'command': 'sleep 5 && echo done from background', 'timeout': 30, 'background': True}
Command: sleep 5 && echo done from background
Run this command? [y/N]: y
observation: Command running in background with ID: bg-a1b2c3d4.
Output is being written to: .bg/bg-a1b2c3d4.out
...
tool_call: bash {'command': 'sleep 6 && cat .bg/bg-a1b2c3d4.out', 'timeout': 10, 'background': False}
Command: sleep 6 && cat .bg/bg-a1b2c3d4.out
Run this command? [y/N]: y
observation: done from background
final: Background task done; output was "done from background".

We deliberately wait 6 seconds because cat-ing right after starting a background task might hit an empty file. If you're driving this with printf, line up at least two y answers: one for the background command, one for the sleep 6 && cat ... command.
(b) Killing a background process:
$ uv run agent-code "Use bash(background=True) to run sleep 300, then kill it"
...
observation: Command running in background with ID: bg-e5f6a7b8.
PID: 12345
...
tool_call: bash {'command': 'kill 12345', 'timeout': 10, 'background': False}

The model reads the pid from the first tool_result and builds kill <pid>.
Terminal replay
Below is the terminal animation for agent-code "Use bash to run pytest --version" — the same 7-frame story as the Day 5 main visual: tool_call: bash → yellow Command: preview + dim timeout/cwd line → Run this command? [y/N]: y → observation: pytest 8.4.2 → final:
What you have now
- bash tool: the model can run any shell command. cwd is pinned to the project, env is restricted to _MINIMAL_ENV, timeout defaults to 30s, output is truncated at 12000 chars. git_status / git_diff are thin read-only wrappers, allowed by default with no popup noise.
- Permission engine permissions.py: PermissionRequest describes one tool call, PermissionDecision (allow | ask | deny) is the verdict. Three modes: default (writes ask), acceptEdits (file edits skip confirm but preserve Day 4 safety), plan (writes always deny).
- Dangerous-command blocking: regex coverage for rm -rf / sudo / chmod -R / curl | sh / git push / git push --force / git reset --hard. A match goes straight to deny with no UI.
- ask_user_question tool: the model can stop and ask the user a structured single-choice question. The terminal pops a numbered menu; the answer flows back as a tool_result and drives the next round of reasoning.
- Background bash: bash(background=True) spawns the child and returns background_id + output file paths immediately. The Agent Loop doesn't block. The model can later bash("cat .bg/<id>.out") or bash("kill <pid>").
- Unified agent.py interceptor: v1's if/elif chain → v2's three-branch dispatch on decide_permission. Adding a new tool no longer changes the interceptor shape; just give it a decision rule in the permission engine.
FAQ
bash returns command timed out after 30s
The command ran longer than 30 seconds. Either ask the model to pass a larger timeout value, or switch to background=True for long-running work. Putting an upper bound on timeout is a good exercise.
Is shell=True safe?
The teaching version uses shell=True so commands look natural (the same way you'd type them). Safety leans on two layers: the permission engine's dangerous-command regex (v2) + user confirmation (v1). Production deployments still need container sandboxing, a read-only filesystem, and network isolation. The claim isn't "shell=True is safe"; it's "shell=True + harness gating is acceptable for teaching".
In plan mode the model keeps calling file_edit and getting denied
The model probably hasn't noticed it's in plan mode. The system prompt should announce plan mode explicitly (Day 8 adds that). For now, if the model gets stuck in a "call write tool → get denied → try again" loop, exit with /exit and rerun with --permission-mode default.
What happens when the user picks 0 in ask_user_question?
You get User skipped the question. with is_error=False. The model knows the user saw the question and chose not to pick — it'll usually decide on its own or rephrase the question, treating it as feedback rather than an error.
When does the background bash output file appear?
.bg/<id>.out is created as soon as start_background runs, because the open(..., "w") calls happen just before Popen. The contents only fill in once the child process actually writes. If the model cats the file right after getting background_id, it may see an empty file, so give it a few seconds. The tutorial verification uses bash("sleep 6 && cat .bg/<id>.out").
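Instead of a fixed sleep, a reader could poll the file until it has content or a deadline passes. This is a sketch under assumptions: wait_for_output is a hypothetical helper, not part of Day 5's code, and "non-empty" is only a rough signal, since a long-running command may still be writing when the function returns.

```python
import time
from pathlib import Path


def wait_for_output(path: Path, timeout: float = 10.0, interval: float = 0.2) -> str:
    """Poll path until it is non-empty or timeout elapses; return whatever is there."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists() and path.stat().st_size > 0:
            break
        time.sleep(interval)
    # After the deadline we still return the current contents, possibly empty
    return path.read_text() if path.exists() else ""
```

Using time.monotonic() for the deadline keeps the wait correct even if the system clock is adjusted while polling.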
Exercises
- Exercise 1: Optional range for timeouts. Add _MAX_TIMEOUT = 120 to permissions.py and check timeout > _MAX_TIMEOUT in the bash branch; over the limit, return ask with an explanation. Add a --max-timeout CLI flag.
- Exercise 2: Remember user choices. Add a _remembered: dict[str, str] to permissions.py; after the user picks "always allow" for a tool, subsequent calls return allow directly. Add "yes to all" to confirm_command and confirm_edit.
- Exercise 3: ask_user_question multi-select. Add a multi_select parameter to prompt_single_choice so input like 1,3 returns two labels; mirror that with a multi_select field in _ask_user_question's schema.
- Exercise 4: Background task listing. Add list_background() -> list[dict] to bg_manager.py that scans .bg/ and returns the running/done status for every task. Add a bg_list tool in tools.py so the model can list its own background tasks.
- Exercise 5: Switch --permission-mode inside the REPL. Add a /permissions slash command so typing /permissions acceptEdits in the REPL swaps the current session's permission mode without restarting.
Reflection
A few open questions. Sit with a one-line answer before reading on — these are the harness-boundary calls that interviewers push hardest.
- Why does decide_permission get called from agent.py's interceptor instead of inside each tool function in tools.py? (Hint: same design principle as Day 4 Reflection 2. If a tool function decided "should I show a confirm prompt?" itself, what breaks when you swap the CLI for a web frontend?)
- git_status and git_diff are standalone tools rather than the model just calling bash("git status"). What does that design choice buy you, and what does it cost? (Hint: think about how _READONLY_TOOLS works and what the model is likely to guess wrong when it tries to decide "is this command read-only?")
- plan mode only enforces a hard constraint ("all write tools deny"), no soft constraint ("draft a plan with todo_write first"). Without the soft half, is it still Plan Mode? (Hint: Day 8 fills the other half in. First imagine the typical behavior with only the hard constraint. What does the model do in plan mode?)
- Background bash output lives in .bg/<id>.out and the model queries it with bash("cat .bg/<id>.out"). If you skipped the file and let the harness inject the output as a notification into the next round of messages, how would the Agent's behavior change? (Hint: notification injection means the model doesn't need to poll, but when do you inject, and which step's model do you inject into?)
Next day
The Agent can now execute commands: it runs commands, gates with permissions, can go background, and can ask you questions. Combined with Day 3's file + web tools and Day 4's safe edits, the Agent has the basic loop for small coding tasks — read code → change code → run tests → look at results → iterate.
But every time agent-code exits, the conversation is gone. Tomorrow we make sessions persistent and resumable: JSONL session files on disk, --resume to bring back the last conversation, AGENT.md project memory injected into the system prompt, and a cross-session long-term memory system (memdir). The Agent finally remembers you and your project instead of starting fresh every time.