Your Coding Agent Will Never Start From Scratch Again: Session Storage in AgentCore Runtime

Table of Contents

  1. The Problem with Ephemeral Agents
  2. How Session Storage Works
    1. Session Isolation
    2. Storage Lifecycle
  3. Implementation: Coding Agent with Session Storage on AgentCore Runtime
    1. Prerequisites
    2. IAM Role for the Agent Runtime
    3. Project Structure
    4. The Container
    5. The Agent
    6. Configuring the Agent Runtime with Session Storage
    7. The Stop/Resume Cycle in Action
  4. What the Filesystem Supports (and What It Doesn’t)
  5. When to Use Session Storage (and When Not To)
  6. What I Learned from Testing It
  7. The Complete Picture: The Three State Layers of an Agent
  8. Official Resources 📚

Picture this: your coding agent spent the last 40 minutes scaffolding a Node.js project. It installed dependencies, wrote the models, configured the ORM, left unit tests half-finished. You have to close the session. The next day you pick it back up — and the agent starts from scratch. No files. No node_modules. No trace of what it built.

That’s not a bug in your agent. It’s the by-design behavior of any agent runtime without persistence. Every session boots from a clean filesystem.

And there’s an important distinction worth making before diving into the code:

Episodic memory (which we covered in the previous article) stores what the agent learned: patterns, reflections, past experiences. Session Storage stores what the agent built: files, dependencies, artifacts, operational project state.

These are two complementary forms of persistence, not interchangeable ones. A serious production agent needs both.

Today we focus on the second.

The Problem with Ephemeral Agents

The AgentCore runtime, like any serverless compute system, is ephemeral by design. When a session ends or is stopped, the associated compute is destroyed. The next time you invoke the same session, AWS provisions a fresh, clean environment.

For simple conversational agents, this isn’t a problem. For coding agents, long-running data analysis agents, or any agent that works with the filesystem, it’s a serious blocker:

  • The agent installs packages → session stops → must reinstall everything
  • The agent generates intermediate artifacts → session restarts → files lost
  • The agent checkpoints a long process → restart → no checkpoints

The traditional workarounds are painful: manually syncing to S3, using EFS with VPC configuration, or writing your own checkpoint logic. They all work, but add operational complexity your team has to maintain.
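For context, the manual S3 workaround usually means something like the following at every checkpoint. This is my own sketch, not any official pattern; the bucket and key prefix are hypothetical:

```python
import os

def sync_dir_to_s3(s3_client, bucket: str, root: str, prefix: str = "workspace/"):
    """Upload every file under `root` to S3 -- the checkpoint logic
    you would otherwise have to write and maintain yourself."""
    uploaded = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            local = os.path.join(dirpath, name)
            key = prefix + os.path.relpath(local, root)
            s3_client.upload_file(local, bucket, key)
            uploaded.append(key)
    return uploaded
```

And that is only half of it: you also need a mirrored download pass on startup, error handling, and partial-sync recovery, which is exactly the operational complexity a managed option removes.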

AgentCore Runtime Session Storage is AWS’s managed answer to this problem.

How Session Storage Works

Session Storage is a managed capability of the AgentCore Runtime. Your agent reads and writes to a regular local directory — say /mnt/workspace — and the runtime transparently replicates that state to durable storage.

The lifecycle is:

  1. First invocation of a session — New compute is provisioned. The directory at the mount path appears empty.
  2. The agent writes files — Normal filesystem operations (mkdir, write, npm install, git init). Data is asynchronously replicated to durable storage.
  3. The session stops — Compute is destroyed. Any pending data is flushed during graceful shutdown.
  4. Next invocation with the same sessionId — New compute, but the filesystem is restored exactly where it left off.

What struck me most when testing it: there’s no special API for this. Your agent just uses the filesystem as usual. The runtime handles everything else.

⚠️ Important: When you explicitly call StopRuntimeSession, wait for the operation to complete before resuming the session. This guarantees all data is flushed to durable storage before the next start.

Session Isolation

Each session has its own isolated storage. One session cannot read or write to another session’s storage — whether from the same agent or a different one. This matters for multi-tenant scenarios or when multiple users have parallel sessions with the same agent.

Storage Lifecycle

Data persists as long as the session is active. Two conditions reset the filesystem to a clean state:

  • The session is not invoked for 14 consecutive days.
  • The agent runtime version is updated. If you deploy a new version, existing sessions will start with an empty filesystem.

That second point is a real gotcha for production: if you have long-running sessions in flight and you deploy, they lose their filesystem state. Design your agent to handle this case.
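One defensive pattern (my own sketch, not something from the docs): write a marker file on first use and check for it on every invocation, so the agent can distinguish a fresh filesystem from a resumed one and react accordingly:

```python
from pathlib import Path

MARKER = ".workspace-initialized"

def workspace_was_reset(workspace: str) -> bool:
    """True if this is a fresh filesystem: first run, the 14-day expiry,
    or a runtime version update wiped the session's state."""
    marker = Path(workspace) / MARKER
    if marker.exists():
        return False
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

When this returns True on what the conversation history says is a resumed session, the agent knows a deploy (or expiry) happened and can re-scaffold instead of assuming files exist.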

Implementation: Coding Agent with Session Storage on AgentCore Runtime

Let’s build a coding agent that demonstrates persistence in action: creates a project, stops, resumes, and continues where it left off — both in files and in conversation.

Prerequisites

Before starting, verify you have:

  • AWS CLI configured with permissions on bedrock-agentcore-control and ecr
  • Docker with Buildx — run docker buildx version to confirm
  • ECR repository created in your account for the agent image
  • Region: Session Storage is available in multiple regions (us-west-2, us-east-1, eu-central-1, ap-northeast-1, and others) — check the current list in the official docs before deploying

Install the Python dependencies for the agent:

pip install strands-agents strands-agents-tools bedrock-agentcore boto3

IAM Role for the Agent Runtime

The runtime needs a role that AgentCore can assume. The trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock-agentcore.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

And the permissions policy with the minimum required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-west-2:YOUR_ACCOUNT:*"
    }
  ]
}
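If you prefer scripting it, the two documents above can be wired up with boto3. A sketch, with the IAM client injected so the logic stays testable; the role and policy names are my own choices, not anything the docs mandate:

```python
import json

TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def create_agent_role(iam, permissions_policy: dict,
                      role_name: str = "AgentExecutionRole") -> str:
    """Create the execution role and attach the permissions policy inline.

    `iam` is a boto3 IAM client, e.g. boto3.client("iam")."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}Policy",
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role["Role"]["Arn"]
```

Call it as `create_agent_role(boto3.client("iam"), permissions_policy)` with the permissions document shown above.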

Project Structure

Three files in the same directory:

coding-agent/
├── Dockerfile
├── coding_agent.py
└── requirements.txt

The Container

AgentCore Runtime runs ARM64 containers exclusively. If you develop on an x86/amd64 machine, you need cross-compilation with Docker Buildx:

# Create a builder for ARM64
docker buildx create --use

# Build + push directly to ECR
docker buildx build \
  --platform linux/arm64 \
  -t YOUR_ACCOUNT.dkr.ecr.us-west-2.amazonaws.com/coding-agent:latest \
  --push .

⚠️ Gotcha: If you use regular docker build on an x86/amd64 machine, the resulting image will be amd64, and AgentCore will reject it with Architecture incompatible. In my experience, when cross-compilation from x86 didn’t produce a valid ARM image, creating the builder with the explicit docker-container driver (--driver docker-container) fixed it; the official documentation only requires docker buildx without specifying a driver. If you run into architecture issues, that’s the first thing to try.

The Dockerfile needs Python for the agent and Node.js because the agent creates Node projects:

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY coding_agent.py .

RUN mkdir -p /mnt/workspace

EXPOSE 8080

CMD ["python", "coding_agent.py"]

And the requirements.txt:

strands-agents
strands-agents-tools
bedrock-agentcore
boto3

The Agent

import os

from strands import Agent
from strands.session import FileSessionManager
from strands.models import BedrockModel
from strands_tools import file_read, file_write, shell
from bedrock_agentcore.runtime import BedrockAgentCoreApp

# Enable tools without interactive confirmation
os.environ["BYPASS_TOOL_CONSENT"] = "true"

app = BedrockAgentCoreApp()

# The workspace persists between sessions thanks to Session Storage
WORKSPACE = "/mnt/workspace"

model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0"
)

tools = [file_read, file_write, shell]

@app.entrypoint
def handle_request(payload):
    session_id = payload.get("session_id", "default")

    # Conversation history also persists in the workspace
    # — same directory, no additional cost
    session_manager = FileSessionManager(
        session_id=session_id,
        storage_dir=f"{WORKSPACE}/.sessions"
    )

    agent = Agent(
        model=model,
        tools=tools,
        session_manager=session_manager,
        system_prompt=(
            "You are a coding assistant. "
            "Project files are in /mnt/workspace. "
            "When resuming a session, check what's in the workspace first "
            "before assuming you need to start from scratch."
        )
    )

    response = agent(payload.get("prompt"))
    return {
        "response": response.message["content"][0]["text"]
    }

if __name__ == "__main__":
    app.run()

Notice the design point in the system_prompt: we tell the agent to check the workspace before acting. Without this, the agent might not “notice” that existing files are there and propose starting over. Filesystem persistence is transparent to the runtime, but the agent needs to know it should look for prior work.

FileSessionManager from Strands saves the conversation history in /mnt/workspace/.sessions/ — the same directory that persists. This means the agent also remembers what it promised to do in the previous session, not just the files it created.

Configuring the Agent Runtime with Session Storage

When creating the agent runtime, add filesystemConfigurations with a sessionStorage:

# deploy.py
import boto3
import argparse

REGION = "us-west-2"
ACCOUNT_ID = "YOUR_ACCOUNT"
RUNTIME_NAME = "coding_agent"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/AgentExecutionRole"
CONTAINER_URI = f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/coding-agent:latest"

client = boto3.client("bedrock-agentcore-control", region_name=REGION)


def create_runtime():
    response = client.create_agent_runtime(
        agentRuntimeName=RUNTIME_NAME,
        roleArn=ROLE_ARN,
        agentRuntimeArtifact={
            "containerConfiguration": {
                "containerUri": CONTAINER_URI
            }
        },
        networkConfiguration={
            "networkMode": "PUBLIC"    # Required if your agent needs internet access (Bedrock, npm, pip)
        },
        filesystemConfigurations=[
            {
                "sessionStorage": {
                    "mountPath": "/mnt/workspace"
                }
            }
        ]
    )
    arn = response["agentRuntimeArn"]
    # AWS appends a random suffix to the name: coding_agent-XXXXXXXXXX
    # Get the full ARN with:
    #   aws bedrock-agentcore-control list-agent-runtimes
    print(f"✅ Agent Runtime created: {arn}")
    return arn


def update_runtime(runtime_id: str):
    """Add session storage to an existing runtime."""
    client.update_agent_runtime(
        agentRuntimeId=runtime_id,
        filesystemConfigurations=[
            {
                "sessionStorage": {
                    "mountPath": "/mnt/workspace"
                }
            }
        ]
    )
    print(f"✅ Session Storage added to runtime {runtime_id}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--create", action="store_true")
    parser.add_argument("--update", type=str, metavar="RUNTIME_ID")
    args = parser.parse_args()

    if args.create:
        create_runtime()
    elif args.update:
        update_runtime(args.update)
    else:
        print("Usage: python deploy.py --create | --update RUNTIME_ID")

Two details worth knowing:

  • networkConfiguration with networkMode: "PUBLIC" is needed if your agent requires internet access — to call Bedrock, download npm or pip packages, etc. It’s not a required API parameter if your agent runs in a VPC without internet egress.
  • AWS appends a random suffix to the name you provided — the actual runtime ARN has the format coding_agent-XXXXXXXXXX. Check it with aws bedrock-agentcore-control list-agent-runtimes after deployment.

If you already have an existing runtime, update_agent_runtime accepts the same filesystemConfigurations parameter to add it without recreating the runtime.

The Stop/Resume Cycle in Action

# client.py
import boto3
from botocore.config import Config
import json
import os
import time

REGION = "us-west-2"
# AWS automatically appends a suffix to the name given in create_agent_runtime.
# Get the exact ARN with: aws bedrock-agentcore-control list-agent-runtimes
AGENT_ARN = os.environ.get(
    "AGENT_ARN",
    "arn:aws:bedrock-agentcore:us-west-2:YOUR_ACCOUNT:runtime/coding_agent-XXXXXXXXXX"
)

# Same sessionId across all invocations = same persistent filesystem.
# Minimum 33 characters — AgentCore validates this on the client side.
SESSION_ID = "proyecto-api-rest-001-session-demo-01"

# read_timeout=300 is necessary: npm install and other long operations
# easily exceed boto3's default 60-second timeout.
client = boto3.client(
    "bedrock-agentcore",
    region_name=REGION,
    config=Config(read_timeout=300)
)


def invoke(prompt: str, conv_id: str = "conv-001") -> str:
    response = client.invoke_agent_runtime(
        agentRuntimeArn=AGENT_ARN,
        runtimeSessionId=SESSION_ID,
        payload=json.dumps({
            "prompt": prompt,
            "session_id": conv_id
        }).encode()
    )
    result = json.loads(b"".join(response["response"]))
    return result["response"]


def stop_session():
    print(f"⏹  Stopping session {SESSION_ID}...")
    client.stop_runtime_session(
        agentRuntimeArn=AGENT_ARN,
        runtimeSessionId=SESSION_ID
    )
    # The official docs explicitly recommend waiting for StopRuntimeSession
    # to complete before resuming the session — this ensures the flush to
    # durable storage finishes. In production, implement a session state
    # poll instead of a fixed sleep.
    print("⏳ Waiting for flush to durable storage...")
    time.sleep(15)
    print("✅ Session stopped. Filesystem persisted.")


# --- First invocation ---
print(invoke(
    "Create a Node.js project in /mnt/workspace/api. "
    "Initialize with npm (name: 'rest-api', version '1.0.0'), "
    "install express and dotenv, and create index.js with a "
    "GET /health endpoint that returns {status: 'ok', timestamp: Date.now()}."
))

# --- Stop the session ---
stop_session()

# --- Second invocation with the same sessionId ---
# The agent resumes with filesystem and conversation intact
print(invoke(
    "Add a POST /echo endpoint that returns the received body "
    "as JSON. First check what exists in the workspace."
))

In my tests, the second invocation resumed exactly where it left off: node_modules intact, package.json with dependencies already defined, and the conversation history that let the agent understand what it had built before.

What the Filesystem Supports (and What It Doesn’t)

Session Storage implements a standard Linux filesystem at the mount path. Common operations that work without modification:

ls, cat, mkdir, touch, mv, cp, rm
git init / git add / git commit
npm install / pip install / cargo build
chmod, chown, stat, readdir

Standard POSIX operations work. There are a handful of documented exceptions worth knowing before designing your agent:

Hard links — Not supported. Use symlinks if you need them. Most development tools don’t use them directly.

Device files, FIFOs, UNIX socketsmknod is not supported. Affects very specific use cases (Unix socket servers, etc.).

Extended attributes (xattr) — Tools that depend on xattr metadata won’t work.

fallocate — Sparse file preallocation is not supported. Tools that use it explicitly will fail; tools that simply write files won’t be affected.

File locking between sessions — Advisory locks work within an active session but don’t persist across stop/resume. git is not affected because it doesn’t rely on persistent locks.

One behavioral note: permissions (chmod) are stored correctly and stat reports them accurately, but enforcement doesn’t apply within the session because the agent runs as the sole user in the microVM. This doesn’t affect the behavior of standard tools, but it’s worth considering if your agent creates files with specific permissions expecting them to be enforced.
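If you would rather have the agent verify these constraints empirically than trust a docs summary, a small probe can exercise each one inside the mounted workspace. This is my own sketch; run it against the mount path at session start:

```python
import os
import tempfile

def probe_filesystem(workspace: str) -> dict[str, bool]:
    """Empirically check which operations the filesystem at `workspace` accepts."""
    results = {}
    with tempfile.TemporaryDirectory(dir=workspace) as d:
        src = os.path.join(d, "src")
        open(src, "w").close()
        # Hard links: documented as unsupported on Session Storage
        try:
            os.link(src, os.path.join(d, "hardlink"))
            results["hard_links"] = True
        except OSError:
            results["hard_links"] = False
        # Symlinks: the documented replacement for hard links
        try:
            os.symlink(src, os.path.join(d, "symlink"))
            results["symlinks"] = True
        except OSError:
            results["symlinks"] = False
        # FIFOs: mknod-style special files are documented as unsupported
        try:
            os.mkfifo(os.path.join(d, "fifo"))
            results["fifos"] = True
        except OSError:
            results["fifos"] = False
    return results
```

On a regular Linux filesystem all three come back True; on Session Storage you would expect hard_links and fifos to come back False per the documented exceptions.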

When to Use Session Storage (and When Not To)

The question I heard most when I shared this with the team: “Does this replace EFS?”

Not exactly. Here’s the honest comparison:

| Criterion | Session Storage | Own EFS | Manual S3 | No Persistence |
| --- | --- | --- | --- | --- |
| Setup | 1 parameter at deploy | VPC + mount target + security group | Sync code | None |
| Isolation | Per-session, automatic | Manual (your logic) | Manual (your logic) | N/A |
| Duration limit | 14 days without invocation | While EFS exists | While bucket exists | 0 (ephemeral) |
| Deploy effect | Resets filesystem | No effect | Depends on your logic | N/A |
| Cross-session sharing | No (isolated per session) | Yes, possible | Yes, possible | N/A |
| Cost | Preview (pricing TBD) | EFS + data transfer | S3 per operation | None |

Use Session Storage when:

  • Your agent works on code projects that span multiple sessions
  • You need operational persistence without configuration overhead
  • Each session is independent and doesn’t need to share storage with others
  • You want filesystem state to survive restarts without writing checkpoint code

Consider alternatives when:

  • Multiple sessions of the same agent need access to a shared filesystem (EFS)
  • Your use case requires more than 14 days of inactivity without reset (EFS or S3)
  • You deploy your agent runtime frequently and filesystem reset is disruptive
  • You have specific compliance requirements around data storage location

What I Learned from Testing It

Some real-world observations that aren’t in the official documentation:

The system_prompt matters as much as the configuration. Session Storage is transparent to the runtime, but the LLM needs context to “notice” that prior work exists. Without telling it to check the workspace before acting, the agent may propose starting over even though the files are right there.

Strands’ FileSessionManager is the natural complement. Saving conversation history in the same /mnt/workspace is elegant: one persistence mechanism for both operational state and conversational context.

Explicit wait after stop is not optional. The official docs are explicit: “always wait for [StopRuntimeSession] to complete before resuming the session.” In my tests, resuming without waiting produced 500 errors from the runtime. A minimum time.sleep(15) worked consistently, but in production implement a session state poll instead of a fixed sleep.
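Absent a documented status API to poll (I have not found one), a pragmatic pattern is to retry the first post-stop invocation with exponential backoff when the runtime answers with a server error. A sketch, with the invoke call injected so the retry logic stays testable:

```python
import time

def invoke_with_retry(invoke_fn, prompt: str,
                      attempts: int = 5, base_delay: float = 2.0):
    """Retry the first invocation after StopRuntimeSession: until the flush
    to durable storage finishes, the runtime may answer with 5xx errors."""
    last_error = None
    for attempt in range(attempts):
        try:
            return invoke_fn(prompt)
        except Exception as exc:  # narrow this to botocore 5xx errors in real code
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_error
```

With the client.py above, that would be `invoke_with_retry(invoke, "Add a POST /echo endpoint...")` right after `stop_session()`.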

boto3’s read_timeout will bite you. The default is 60 seconds. A coding agent running npm install or pip install easily exceeds that limit, and you get a ReadTimeoutError that looks like a runtime error but is actually a client-side issue. Set Config(read_timeout=300) in the bedrock-agentcore client.

ARM64 is the only supported format. A regular docker build on an x86 machine produces an amd64 image that AgentCore rejects with Architecture incompatible. Use docker buildx --platform linux/arm64. If you run into architecture issues with cross-compilation from x86, adding the explicit --driver docker-container flag when creating the builder was what fixed it in my case.

runtimeSessionId requires a minimum of 33 characters. The official code example documents this with an inline comment: # Must be 33+ chars. A short ID will fail when invoking the agent.
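A simple way to satisfy the constraint is to derive the ID from a UUID, whose hex form alone is 32 characters, so any non-empty prefix clears the minimum. The prefix here is arbitrary:

```python
import uuid

def make_session_id(prefix: str = "session") -> str:
    """uuid4().hex is 32 chars, so prefix + '-' + hex is always >= 33."""
    session_id = f"{prefix}-{uuid.uuid4().hex}"
    assert len(session_id) >= 33
    return session_id
```

Remember that the sessionId is also the key to the persistent filesystem: generate it once per logical workstream and store it, rather than minting a fresh one per invocation.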

AWS appends a random suffix to the runtime name. The actual ARN has the format coding_agent-XXXXXXXXXX. Check it with aws bedrock-agentcore-control list-agent-runtimes after deployment.

The deploy effect on active filesystems. Updating the agent runtime version resets the filesystem of all active sessions. If you have long-running sessions in flight and you deploy, they lose their state. Factor this into your release strategy.

The Complete Picture: The Three State Layers of an Agent

With this article, the series has covered the three state layers that a production agent on AgentCore can manage:

  • AgentCore Policy — What the agent can do. Deterministic guardrails.
  • AgentCore Episodic Memory — What the agent learned. Experiences and patterns.
  • AgentCore Session Storage — What the agent built. Operational filesystem state.

None replaces the other. A serious production coding agent can benefit from all three simultaneously: Policy to limit which commands it can run, Episodic Memory to learn from code patterns or past mistakes, and Session Storage to maintain the workspace between sessions.

The combination makes “agent that works on real projects” a viable use case, not just a re:Invent demo.


Are you building coding agents or long-running analysis agents on AWS? What’s been your biggest challenge with state persistence? I’d like to know what you’re running into — comments are open.

Until next time! 🚀


Found this useful? Share it with your team. They probably also have an agent that “forgets” everything every time it restarts.


Official Resources 📚

Written by

Gerardo Arroyo Arce

Solutions Architect, AWS Golden Jacket with a passion for sharing knowledge. As an active AWS Community Builders member, former AWS Ambassador, and AWS User Group Leader, I dedicate myself to building bridges between technology and people. A Java developer at heart and independent consultant, I take cloud architecture beyond theory through international conferences and real-world solutions. My insatiable curiosity for learning and sharing keeps me in constant evolution alongside the tech community.
