
Day 6 — Session + Memory

Day 5 gave the Agent shell commands and permissions. Today adds three layers of memory: Session JSONL, AGENT.md project rules, and memdir long-term memory, so the Agent can continue after exit.

Day 5 gave the Agent shell execution and a permission engine. But once agent-code exits, the conversation is gone: the next run has no memory of what you said, what files were touched, or what decisions were made.

Today we add three memory layers to the harness: session history is written to JSONL, the cwd's AGENT.md enters the system prompt, and the model can use memory_write / memory_recall to store durable facts under .agent/memory/. By the end, --resume <id> / -c can continue an old session, and a fresh session can still recover user and project facts from the MEMORY.md index.

About 400 new lines, 5 existing files changed, 7 new files. No new dependency: uuid, json, and datetime are all standard library modules.


Day 6 Logic Map — Three Memory Boundaries

Start with this Agent Logic Map. It does not replay every execution frame; it separates the three lines that are easy to blur together today: how Session.history restores JSONL into the messages list, how build_system_prompt(cwd) layers the core prompt, AGENT.md, and MEMORY.md, and how memory_write / memory_recall read and write memdir across sessions.

Agent Logic Map: session JSONL continues conversations; the system prompt carries rules and index hooks; memdir stores durable facts.

Today we keep editing the Day 5 agent-code project. The Day 5/Day 6 packages/day-* snapshots have not been added yet, so this page's diffs use the code blocks in docs/day-06-session-memory.md as the source of truth.

Setup — Today's Starting Point

The end of Day 5 already has bash, the permission engine, ask_user_question, and background bash. Today we add four things around that Agent Loop:

  • New file session.py: create, load, and append JSONL sessions.
  • New file project_memory.py: read AGENT.md from cwd and wrap it in <project-rules>.
  • New file compact_basic.py: run deterministic compact after the messages list grows past 40 entries.
  • New package memdir/: long-term memory directories, index, writes, and recall.

We also edit model.py, agent.py, cli.py, permissions.py, and tools.py. The main path has four passes: v1 session, v2 AGENT.md, v3 compact, v4 memdir.


v1 — Session JSONL + --resume / -c

Start with the painful part: conversation history disappears after exit. v1 creates a Session class for the session id, JSONL path, history loading, and append-only writes. The Agent Loop does not need to know how files are arranged; it only uses session.history and session.append_messages().

1.1 Create agent_code/session.py

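Session has three factory methods and two read/write entry points. The excerpt shows two of the factories, create() and load_latest(), plus append_messages(); the remaining factory and the history accessor are not shown: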

class Session:
    """One session. Manages the session id, JSONL persistence, and loading message history."""

    @classmethod
    def create(cls, cwd: Path) -> "Session":
        sid = uuid.uuid4().hex[:12]
        file_path = _sessions_dir(cwd) / f"{sid}.jsonl"
        file_path.touch()
        return cls(cwd=cwd, session_id=sid, file_path=file_path, resumed=False)

    @classmethod
    def load_latest(cls, cwd: Path) -> "Session | None":
        sessions_dir = _sessions_dir(cwd)
        jsonl_files = list(sessions_dir.glob("*.jsonl"))
        if not jsonl_files:
            return None
        latest = max(jsonl_files, key=lambda p: p.stat().st_mtime)
        return cls(cwd=cwd, session_id=latest.stem, file_path=latest, resumed=True)

    def append_messages(self, msgs: list[dict[str, Any]]) -> None:
        now = datetime.now(timezone.utc).isoformat()
        with open(self.file_path, "a", encoding="utf-8") as f:
            for msg in msgs:
                record = {"role": msg["role"], "content": msg["content"], "timestamp": now}
                f.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n")

JSONL is append-only, with one message per line. If the process crashes, at worst you lose the current turn; the file is never rewritten as a whole, so earlier messages stay intact.
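To make the read side concrete, here is a minimal sketch of how a session file could be turned back into a messages list. The helper name load_history is hypothetical (the real Session exposes this through session.history), and skipping a partially written last line is an assumption about how a crash mid-write might be tolerated, not something the source confirms.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_history(file_path: Path) -> list[dict[str, Any]]:
    """Read a session JSONL file back into a messages list (hypothetical helper)."""
    messages: list[dict[str, Any]] = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Assumption: a crash mid-write can leave a partial last line; skip it.
                continue
            # The timestamp is bookkeeping; the model only needs role and content.
            messages.append({"role": record["role"], "content": record["content"]})
    return messages


# Demo: two complete records followed by a line that was cut off mid-write.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.jsonl"
    path.write_text(
        '{"role":"user","content":"hi","timestamp":"t0"}\n'
        '{"role":"assistant","content":"hello","timestamp":"t1"}\n'
        '{"role":"user","content":"tru',
        encoding="utf-8",
    )
    restored = load_history(path)

print(len(restored))  # 2: the partial third line is skipped
```

Because every line is an independent JSON document, a damaged line costs one message rather than the whole history.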

1.2 Wire agent.py and cli.py

run_agent() gains a session parameter. If the session has history, restore it first; otherwise cold-start from the current prompt. The new user prompt, each tool-result turn, and the assistant's final answer are then appended to the JSONL file:

if session and session.history:
    messages = list(session.history)
    messages.append({"role": "user", "content": prompt})
else:
    messages = [{"role": "user", "content": prompt}]

if session:
    session.append_messages([messages[-1]])

The CLI gets two entry points: --resume <id> restores a specific session, and --continue / -c restores the latest session for the current cwd. One-shot commands and REPL mode use the same Session.
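The two flags could be declared roughly as follows. This is a sketch that assumes an argparse-based CLI; the source does not say which argument parser cli.py uses, and build_parser, the continue_latest attribute name, and the mutual exclusion between the flags are all illustrative choices.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the two session entry points."""
    parser = argparse.ArgumentParser(prog="agent-code")
    parser.add_argument("prompt", nargs="?", help="one-shot prompt; omit for REPL mode")
    # Assumption: resuming by id and continuing the latest session are exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--resume", metavar="ID", help="restore a specific session id")
    group.add_argument(
        "-c",
        "--continue",
        dest="continue_latest",  # "continue" is a Python keyword, so pick another attribute name
        action="store_true",
        help="restore the latest session for the current cwd",
    )
    return parser


args = build_parser().parse_args(["-c", "what did we say"])
print(args.continue_latest, args.resume, args.prompt)
```

With the flags parsed, the CLI only has to pick one of the Session factories and hand the result to run_agent().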

1.3 Run It

$ uv run agent-code "用 echo 工具说 hello"
Agent Code
cwd: /your/project
provider: anthropic  model: deepseek-v4-flash
session: a1b2c3d4e5f6

tool_call: echo {'text': 'hello'}
observation: hello
final: 已经用 echo 工具说出了 "hello"。

Continue the latest session:

$ uv run agent-code -c "我们上次说了什么"
Agent Code
cwd: /your/project
provider: anthropic  model: deepseek-v4-flash
session: a1b2c3d4e5f6 (resumed)

final: 我们上次用 echo 工具说了 "hello"。

The session_id is random, so yours will differ. The deterministic check is that -c prints the same id and adds the (resumed) suffix.


v2 — AGENT.md Project Rules Injection

Session remembers one conversation, but project rules still need to be repeated. v2 reads AGENT.md from cwd at CLI cold start, wraps it in <project-rules>, and passes it through the Anthropic Messages API system field.

2.1 Create project_memory.py

This module does one thing: read AGENT.md from cwd. A missing file is not an error; it returns None:

def load_agent_md(cwd: Path) -> str | None:
    """Read AGENT.md from cwd and wrap it in a <project-rules> block.
    Returns None if the file is missing: that is not an error, just no configuration."""
    agent_md = cwd / "AGENT.md"
    if not agent_md.exists():
        return None
    content = agent_md.read_text(encoding="utf-8", errors="replace").strip()
    if not content:
        return None
    if len(content.encode("utf-8")) > _MAX_AGENT_MD_BYTES:
        # errors="ignore" drops a multi-byte character cut in half by the byte slice
        truncated = content.encode("utf-8")[:_MAX_AGENT_MD_BYTES].decode("utf-8", errors="ignore")
        content = truncated + "\n\n[... AGENT.md truncated at 50 KB ...]"
    return f"<project-rules>\n{content}\n</project-rules>"

It only reads one AGENT.md at cwd. For beginners, cat AGENT.md shows exactly what the model sees.

2.2 Expose system in model.py

In the Anthropic Messages API, the system prompt and the messages list are sibling inputs. The provider interface should expose that boundary:

class ModelProvider(Protocol):
    def complete(
        self,
        messages: list[dict[str, Any]],
        tools: list[Any] | None = None,
        system: str | None = None,
    ) -> ModelResponse:
        ...

AnthropicProvider.complete() adds kwargs["system"] only when system exists. MockProvider gets the same signature but ignores it.

2.3 build_system_prompt(cwd)

agent.py exports one assembly function. The CLI calls it once at cold start, and REPL mode reuses the same system prompt for all turns:

_SYSTEM_CORE = (
    "You are an AI coding agent running inside a CLI harness. "
    "You have access to tools for reading/writing files, running shell commands, "
    "searching the web, and asking the user questions. "
    "Use tools when needed; respond directly when you can."
)


def build_system_prompt(cwd: Path) -> str:
    parts: list[str] = [_SYSTEM_CORE]
    agent_md = load_agent_md(cwd)
    if agent_md:
        parts.append(agent_md)
    return "\n\n".join(parts)

2.4 Run It

$ cat > AGENT.md << 'EOF'
# 项目规则(demo)
- 所有代码必须用 Python 3.10+,写完整 type hints。
- 测试框架用 pytest,测试文件放在 tests/ 目录下。
- 不要用 print 打日志,用 logging 模块。
EOF

$ uv run python -c "
from pathlib import Path
from agent_code.agent import build_system_prompt
print(build_system_prompt(Path.cwd()))
"

Expected excerpt:

You are an AI coding agent running inside a CLI harness. ...

<project-rules>
# 项目规则(demo)
- 所有代码必须用 Python 3.10+,写完整 type hints。
- 测试框架用 pytest,测试文件放在 tests/ 目录下。
- 不要用 print 打日志,用 logging 模块。
</project-rules>

This check does not depend on model behavior. If build_system_prompt prints the right block, half the system prompt injection path is proven; then uv run agent-code "这个项目有什么规则" can show the model restating the rules.


A fuller coding Agent usually does not read only one file in the current directory. It combines several rule sources:

  • User-level rules, such as ~/.claude/CLAUDE.md, for long-term preferences.
  • Project-level rules, found by walking upward from cwd through CLAUDE.md / .claude/CLAUDE.md.
  • Private project overrides, such as CLAUDE.local.md.
  • Sometimes directory-style rules, such as .claude/rules/*.md.

That shape fits real projects, but if the first teaching version also handles recursion, precedence, and multi-file merging, it becomes hard for a learner to tell which rule text the model actually saw.

So v2 narrows the scope on purpose: read only AGENT.md in cwd, skip user-level rules, and do not walk parent directories. The goal is not to recreate the full rules system yet. The goal is to prove the key harness boundary: the CLI reads rules once at cold start, joins them into system_prompt, and the provider passes that text to the model. Because this version reads one file, load_agent_md() does not need to solve rule precedence or conflict resolution yet.


v3 — Minimal Compact

Session JSONL keeps growing, and the in-memory messages list can grow with it. v3 adds a deterministic compact pass: keep the first 2 pinned messages, keep the latest 8 working messages, and compress the middle into one <compacted-history> user message.

Important: compact only touches the messages list, not the system prompt. AGENT.md and MEMORY.md are passed through provider.complete(system=...), so they are never compacted.

3.1 Create compact_basic.py

def compact(messages: list[dict[str, Any]], keep: int = 8) -> list[dict[str, Any]]:
    """Deterministically compact the message history. Does not call an LLM."""
    pin_count = 2
    if len(messages) <= keep + pin_count:
        return messages

    pinned = messages[:pin_count]
    working = messages[-keep:]
    middle = messages[pin_count:-keep]
    compressed = _build_compressed_block(middle)
    return pinned + [compressed] + working

_build_compressed_block() only extracts statistics: message count, tool-call count, tools used, files read, files edited, and commands run. It does not pretend to understand the semantic task. LLM summarization is saved for Day 11.
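A reduced sketch of what such a statistics-only block might look like. It covers just message count, tool-call count, tools used, and files read; the function name, the output layout inside <compacted-history>, and the assumption that read_file carries its target in input["path"] are illustrative and not taken from the real compact_basic.py.

```python
from typing import Any


def build_compressed_block(middle: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize compacted messages as plain counts (hypothetical, reduced version)."""
    tool_calls = 0
    tools_used: set[str] = set()
    files_read: set[str] = set()
    for msg in middle:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string messages carry no tool blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                tool_calls += 1
                name = block.get("name", "")
                tools_used.add(name)
                path = block.get("input", {}).get("path")
                if name == "read_file" and path:
                    files_read.add(path)
    lines = [
        "<compacted-history>",
        f"messages: {len(middle)}",
        f"tool_calls: {tool_calls}",
        f"tools_used: {', '.join(sorted(tools_used)) or '-'}",
        f"files_read: {', '.join(sorted(files_read)) or '-'}",
        "</compacted-history>",
    ]
    # The summary re-enters the conversation as a single user message.
    return {"role": "user", "content": "\n".join(lines)}
```

Nothing here interprets what the task was about, which is why a conclusion that only lived in the compacted middle can be lost.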

3.2 Trigger From the Agent Loop

Check message count before every provider.complete() call:

if len(messages) > 40:
    messages = compact(messages, keep=8)
    console.print(f"[dim]compacted: {len(messages)} messages remaining[/dim]")

40 is not an estimate of the model's context limit. Modern context windows can be much larger than 8K, and message count is a weak proxy anyway: one large tool_result can cost more than 20 short chat messages, while 100 tiny messages may still be cheap. Day 6 hardcodes 40 only so the compact path is easy to trigger locally and you can see the shape: older messages collapse into a summary while recent working context stays intact. Real auto-compact should look at token usage, tool_result size, the model's context window, and a reserved buffer; Day 11 replaces this fixed number with a token-budget-based threshold.
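To see why message count is a weak proxy, the sketch below swaps it for a crude size estimate. This is not the Day 11 implementation: the four-characters-per-token ratio is a rough heuristic, and estimate_tokens, should_compact, and the context_window / reserve defaults are hypothetical values chosen only for illustration.

```python
import json
from typing import Any


def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    """Very rough size estimate: about 4 characters per token (heuristic assumption)."""
    chars = sum(len(json.dumps(m, ensure_ascii=False)) for m in messages)
    return chars // 4


def should_compact(
    messages: list[dict[str, Any]],
    context_window: int = 200_000,  # illustrative window size
    reserve: int = 20_000,          # illustrative buffer kept free for the reply
) -> bool:
    """Compact when the estimate eats into the reserved buffer."""
    return estimate_tokens(messages) > context_window - reserve


# One huge tool_result versus a hundred tiny chat messages.
one_big = [{"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "c0", "content": "x" * 1_000_000},
]}]
many_small = [{"role": "user", "content": f"step {i}"} for i in range(100)]

print(should_compact(one_big), should_compact(many_small))  # True False
```

A count-based rule gets both cases backwards: the single message stays far below 40 entries, while the hundred tiny ones trip the threshold.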

3.3 Run It

$ uv run python -c "
from agent_code.compact_basic import compact

msgs = []
for i in range(15):
    msgs.append({'role': 'user', 'content': f'task step {i}'})
    msgs.append({'role': 'assistant', 'content': [
        {'type': 'text', 'text': f'doing step {i}'},
        {'type': 'tool_use', 'id': f'c{i}', 'name': 'read_file', 'input': {'path': f'file{i}.py'}},
    ]})
    msgs.append({'role': 'user', 'content': [
        {'type': 'tool_result', 'tool_use_id': f'c{i}', 'content': f'content of file{i}'}
    ]})

print(f'before: {len(msgs)} messages')
result = compact(msgs, keep=8)
print(f'after: {len(result)} messages')
"

You should see before: 45 messages and after: 11 messages. 11 means 2 pinned + 1 summary + 8 working.


v4 — Memdir Long-Term Memory

v1-v3 still mostly live within one session. v4 adds cross-session memory: the model can write “the user is a data scientist working on observability” to .agent/memory/, and a fresh session can see that fact through the MEMORY.md index without -c.

4.1 The memdir/ Package

memdir has four files: paths.py owns directories and truncation constants, types.py owns MemoryEntry and slugs, store.py owns load_index / write_memory / recall_memory, and __init__.py re-exports the public API.

MEMORY_TYPES = ("user", "feedback", "project", "reference")


@dataclass
class MemoryEntry:
    mem_type: str
    title: str
    slug: str
    body: str
    file_path: str

write_memory() does two things in one tool call: write the topic file, then append a MEMORY.md index row. That prevents the model from doing the first step and forgetting the second.
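The two steps can be sketched as below. The function name, the explicit slug argument, the index row format, and the location of MEMORY.md directly under .agent/memory/ are assumptions made for illustration; only the topic-file path shape (.agent/memory/<type>/<slug>.md) is taken from the output shown later on this page.

```python
import tempfile
from pathlib import Path


def write_memory_sketch(cwd: Path, mem_type: str, title: str, slug: str, body: str) -> Path:
    """Write one topic file and append one index row in a single call (hypothetical)."""
    root = cwd / ".agent" / "memory"
    topic = root / mem_type / f"{slug}.md"
    topic.parent.mkdir(parents=True, exist_ok=True)
    # Step 1: the topic file holds the full fact.
    topic.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    # Step 2: the index gets a one-line pointer to the topic file.
    index = root / "MEMORY.md"  # assumed location of the index
    with open(index, "a", encoding="utf-8") as f:
        f.write(f"- [{mem_type}] {title} -> {topic.relative_to(cwd).as_posix()}\n")
    return topic


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    topic = write_memory_sketch(cwd, "user", "User role", "user-role", "The user is a data scientist.")
    topic_rel = topic.relative_to(cwd).as_posix()
    index_text = (cwd / ".agent" / "memory" / "MEMORY.md").read_text(encoding="utf-8")

print(topic_rel)
print(index_text, end="")
```

Bundling both writes behind one tool call means the model never has to remember the index as a separate step.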

4.2 Permissions And Tool Registration

memory_recall is readonly, so it goes into _READONLY_TOOLS. memory_write is a low-risk write, so it goes into _LOW_RISK_WRITES: default / acceptEdits allow it automatically, while plan mode still denies it.

_LOW_RISK_WRITES = frozenset({"memory_write"})

if tool_name in _LOW_RISK_WRITES:
    return PermissionDecision("allow")
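The excerpt above shows only the allow branch. A fuller sketch of the rule described in this section, including the plan-mode denial, might look like the following. The PermissionDecision stand-in, its behavior field, the decide function, and the "ask" fallback for unknown tools are assumptions; the mode names come from the text.

```python
from dataclasses import dataclass

_READONLY_TOOLS = frozenset({"memory_recall"})
_LOW_RISK_WRITES = frozenset({"memory_write"})


@dataclass(frozen=True)
class PermissionDecision:
    """Local stand-in for the Day 5 decision type (field name is hypothetical)."""
    behavior: str  # "allow", "deny", or "ask"


def decide(tool_name: str, mode: str) -> PermissionDecision:
    """Sketch of the memory-tool rules; other tools fall through to "ask"."""
    if tool_name in _READONLY_TOOLS:
        return PermissionDecision("allow")
    if tool_name in _LOW_RISK_WRITES:
        # plan mode is read-only, so even a low-risk write is denied there
        if mode == "plan":
            return PermissionDecision("deny")
        return PermissionDecision("allow")
    return PermissionDecision("ask")  # assumption: everything else needs confirmation


for mode in ("default", "acceptEdits", "plan"):
    print(mode, decide("memory_write", mode).behavior, decide("memory_recall", mode).behavior)
```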

The tool functions are thin wrappers: validate arguments, call write_memory() / recall_memory(), and convert the result into an observation.

4.3 Inject MEMORY.md Into the System Prompt

v4 replaces v2's build_system_prompt() with a three-layer assembly:

def build_system_prompt(cwd: Path) -> str:
    """Assemble the system prompt: core guidelines + AGENT.md + MEMORY.md index."""
    from .memdir.store import load_index as load_memory_index

    parts: list[str] = [_SYSTEM_CORE]
    agent_md = load_agent_md(cwd)
    if agent_md:
        parts.append(agent_md)

    memory_index = load_memory_index(cwd)
    if memory_index:
        parts.append(f"<project-memory>\n{memory_index}\n</project-memory>")

    return "\n\n".join(parts)

4.4 Run It

Start with a deterministic function-level check in a temp directory:

$ uv run python -c "
import tempfile
from pathlib import Path
from agent_code.memdir.store import write_memory, recall_memory
from agent_code.agent import build_system_prompt

with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    entry = write_memory(cwd, 'user', '用户角色', '用户是数据科学家,主要研究方向是观测性。')
    print('written:', entry.file_path)
    hits = recall_memory(cwd, '数据科学家')
    print('recalled:', len(hits), 'entries')
    print('first hit type:', hits[0].mem_type, 'title:', hits[0].title)
    sp = build_system_prompt(cwd)
    print('project-memory in system prompt:', '<project-memory>' in sp)
"

Expected output:

written: .agent/memory/user/mem-247c9e11.md
recalled: 1 entries
first hit type: user title: 用户角色
project-memory in system prompt: True

Then let the real model write one memory:

$ uv run agent-code "我是数据科学家,主要研究观测性。记住这一点。"
tool_call: memory_write {'type': 'user', 'title': '用户角色', 'body': '用户是数据科学家,主要研究方向是观测性。'}
observation: Memory saved: [user] 用户角色 -> .agent/memory/user/mem-247c9e11.md
final: 已记住。下次对话时我会记得你是数据科学家,研究方向是观测性。

In a fresh session, ask it to use memory_recall:

$ uv run agent-code "用 memory_recall 工具查一下你对我有什么记忆"
tool_call: memory_recall {'query': '用户 数据科学家 观测性', 'top_k': 5}
observation: ## [user] 用户角色
  file: .agent/memory/user/mem-247c9e11.md
  用户是数据科学家,主要研究方向是观测性。

final: 根据我的记忆,你是一名数据科学家,主要研究观测性方向。

Terminal Replay

[Terminal replay: creating a session, then restoring that same conversation with -c.]

What We Have Now

  • Session JSONL: every conversation is written to .agent/sessions/<sanitized_cwd>/<id>.jsonl; -c restores the latest session and --resume <id> restores a specific one.
  • AGENT.md project rules: the cwd's AGENT.md is wrapped in <project-rules> and injected through system=.
  • Deterministic compact: after 40 messages, older history becomes <compacted-history> while pinned and recent working context stay intact.
  • Memdir long-term memory: memory_write writes the topic file and updates MEMORY.md; memory_recall uses keyword grep over topic files.
  • System prompt assembly: core prompt → AGENT.md → MEMORY.md index, with every layer optional.

FAQ

-c says "no historical session found"

There is no .agent/sessions/ directory under your cwd yet. Run one command without flags to create the first session, then use -c.

$ uv run agent-code "hello"
$ uv run agent-code -c "continue"

The model “forgot” something after compact

That is expected. The current compressed block only stores statistics, not semantic conclusions. If an important conclusion leaves the working window, the model may forget it. Day 11 adds LLM semantic summarization; for now, use memory_write for durable facts.

How does memory_recall match Chinese text?

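The query is split on spaces into keywords. Each keyword found in the title or body adds to the entry's score, and an entry does not need to match every keyword. Chinese text without spaces is treated as a single keyword, so "数据科学家" counts as one term and only matches where that exact run of characters appears.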
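The matching rule can be sketched in a few lines. score_entry is a hypothetical name, and both the one-point-per-keyword scoring and the case-insensitive comparison are assumptions about how recall_memory weighs hits.

```python
def score_entry(query: str, title: str, body: str) -> int:
    """Count how many space-separated keywords occur in title + body (hypothetical)."""
    haystack = f"{title}\n{body}".lower()
    return sum(1 for keyword in query.lower().split() if keyword in haystack)


title = "用户角色"
body = "用户是数据科学家,主要研究方向是观测性。"

print(score_entry("用户 数据科学家 观测性", title, body))  # 3: every keyword is a substring
print(score_entry("数据科学家 pytest", title, body))       # 1: a partial match still scores
print(score_entry("数据科学家观测", title, body))          # 0: one unsplit keyword, not a substring
```

The last call shows the cost of skipping tokenization: a query written without spaces has to appear verbatim to score at all.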

Why does memory_write create mem-247c9e11.md instead of pinyin?

make_slug keeps only ASCII letters and digits. A fully Chinese title becomes an empty slug, so it falls back to a SHA-1 hash of the title (hashlib.sha1 over the title's encoded bytes) and generates mem-<first 8 hex chars>. This keeps the implementation standard-library-only and avoids filesystem Unicode normalization differences.
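A minimal sketch of that fallback. The name make_slug_sketch is hypothetical, and joining ASCII runs with hyphens, lowercasing, and UTF-8 as the hash input encoding are assumptions, so the digest printed here is not guaranteed to match the real make_slug.

```python
import hashlib
import re


def make_slug_sketch(title: str) -> str:
    """ASCII-only slug with a hash fallback for titles that have no ASCII letters or digits."""
    # Collapse every run of non-ASCII-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    if slug:
        return slug
    # Nothing survived: derive a stable name from the title's bytes instead.
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()
    return f"mem-{digest[:8]}"


print(make_slug_sketch("User Role"))  # user-role
print(make_slug_sketch("用户角色"))    # mem- followed by 8 hex characters
```

Because the hash depends only on the title, writing the same Chinese title twice maps to the same file name.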

What happens when MEMORY.md exceeds 200 lines or 25KB?

load_index truncates on read: over 200 lines keeps the header plus newest 200 lines; over 25KB truncates on a byte boundary. Writes do not truncate yet. One exercise below asks you to add write-time truncation.
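A sketch of read-time truncation under stated assumptions: the header is taken to be the first line, 25KB is read as 25 * 1024 bytes, and the line limit is applied before the byte limit. truncate_index and both constants are hypothetical names, not the ones in paths.py or store.py.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25KB" means 25 * 1024 bytes


def truncate_index(text: str, header_lines: int = 1) -> str:
    """Keep the header plus the newest rows, then cap the total size in bytes."""
    lines = text.splitlines()
    header, rows = lines[:header_lines], lines[header_lines:]
    if len(rows) > MAX_LINES:
        rows = rows[-MAX_LINES:]  # newest entries are appended last
    data = "\n".join(header + rows).encode("utf-8")
    if len(data) > MAX_BYTES:
        data = data[:MAX_BYTES]
    # errors="ignore" drops a multi-byte character split by the byte cut
    return data.decode("utf-8", errors="ignore")


index = "# MEMORY\n" + "\n".join(f"- entry {i}" for i in range(300))
kept = truncate_index(index).splitlines()
print(len(kept), kept[0], kept[1], kept[-1])  # 201 # MEMORY - entry 100 - entry 299
```

Truncating only on read keeps writes cheap, at the price of an index file that grows without bound on disk.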


Exercises

  1. List sessions with /sessions: add a slash command in cli.py that scans .agent/sessions/ JSONL files and prints session_id, message count, and last modified time.

  2. Export a session to markdown: add export_markdown(output_path) to session.py and render JSONL history into human-friendly markdown.

  3. Chinese tokenization for recall: add an optional tokenization strategy to recall_memory. Without jieba, try unigram + bigram tokens.

  4. Truncate MEMORY.md on write: after write_memory appends to the index, remove oldest entries if the index exceeds 200 lines.

  5. Extract memdir entries from session history: write a script that scans session JSONL, extracts user facts, and stores them through memory_write.


Thinking Questions

  1. Why split the three memory layers (session JSONL / AGENT.md / memdir) instead of putting everything in one file? (Hint: the writer, lifecycle, and reliability requirements are different for each layer.)

  2. When is deterministic compact enough, and when do you need an LLM semantic summary? (Hint: statistics can say “which files were edited,” but can they say “where the bug investigation stopped”?)

  3. What are the tradeoffs between injecting the MEMORY.md index into the system prompt and making the model call memory_recall every time? (Hint: bigger system prompts cost tokens; more tool calls cost latency.)

  4. memory_write writes both a topic file and an index row. What happens if the process crashes between those two writes? (Hint: recall_memory scans topic files, while load_index only reads MEMORY.md.)


Next Day

Today the Agent gained three layers of memory: session history, AGENT.md project rules, and memdir cross-session long-term memory. The single-Agent CLI now has the core loop: read code, edit files, run commands, enforce permissions, and remember context.

Next we enter the customization layer: Slash Commands + Hooks + Cron. You'll make / commands control Agent behavior, register hooks before and after tool calls, and schedule tasks with cron expressions.