Durable Execution Middleware for AI Agents

The Stateful Runtime for Autonomous AI.

Network drops and server crashes abort long-running agent tasks, wasting tokens and corrupting production data. Cellaflow closes that gap with low-overhead execution journaling and fault-tolerant process recovery. Completed steps are replayed from a durable journal instead of being re-executed, so recovering a probabilistic AI workflow becomes deterministic.

Redundant LLM requests: exactly 0
State replay recovery: < 250 ms
Serialization overhead: < 5 ms
RocksDB run footprint: < 20 MB

```python
from cellaflow.sdk import workflow, step, NonPersistableZone
from cellaflow.sdk.tools import clone_repo, send_report

@workflow(name="code_audit_agent", version="1.0.0")
async def run_audit_agent(ctx, repo_url: str):
    # Step 1: Durable Execution of Repository Clone
    files = await ctx.step("clone_repo", clone_repo, repo_url)

    # Step 2: Non-Persistable Zone (Suspends Checkpoint Commits)
    async with NonPersistableZone(ctx):
        analysis = ""
        async for chunk in ctx.llm.stream("gpt-4", prompt=f"Audit: {files}"):
            analysis += chunk

    # Step 3: Notification with Idempotency Key (Exactly-Once)
    report_status = await ctx.step(
        "send_report",
        send_report,
        to="devs@org.com",
        body=analysis,
        idempotency_key=f"report-{ctx.session_id}"
    )
    return report_status
```

The Engineering Reality

Why Autonomous Agents Break in Production

Autonomous agents execute long-running, non-deterministic workflows. Running them on stateless application servers or serverless containers introduces critical operational and financial liabilities.

Stateless Server Fail Path

The Stateless Agent Nightmare

Pod Recycles and Lost Memory

Kubernetes evicts your pod, or AWS ECS stops your task, 8 minutes into a 10-minute workflow. Because the agent's variables, call-stack context, and conversation memory exist only in process RAM, the entire run is lost.

Runaway API Token Expenses

To finish the failed job, your agent must start over from Step 1. It re-sends costly LLM reasoning prompts and re-streams completions, which can double or triple your model API bill.

Duplicate Tool Side-Effects

Retrying the agent runs every tool call a second time. The result is duplicate database mutations, repeated Stripe card charges, and redundant Slack messages or emails.

Cellaflow Stateful Execution

Durable Replay Recovery

Automated Re-Binding

When the replacement container boots, the SDK re-initializes and re-binds to the active session ID. It queries the Engine Core over gRPC and retrieves the session's recorded state.

Cognitive Graph Journals

Cellaflow runs **Replay Recovery** by reading completed steps from the immutable **Cognitive Graph** log. For each completed step, it skips execution and returns the cached output. Live execution then resumes at the first uncompleted step.
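The replay idea can be sketched with a journal that maps step names to committed outputs. This is a minimal sketch under stated assumptions: an in-memory dict stands in for the RocksDB-backed Cognitive Graph, and `Journal`, `run_step`, and the two toy steps are hypothetical names, not the Cellaflow SDK API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class Journal:
    """Hypothetical stand-in for the Cognitive Graph: step name -> committed output."""

    def __init__(self) -> None:
        self.committed: Dict[str, Any] = {}


async def run_step(journal: Journal, name: str,
                   fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
    if name in journal.committed:
        return journal.committed[name]      # replay: return the journaled output
    result = await fn(*args)                # live: execute the step...
    journal.committed[name] = result        # ...and commit it before moving on
    return result


calls: List[str] = []   # records which step functions actually executed


async def clone_repo(url: str) -> List[str]:
    calls.append("clone_repo")
    return ["main.py", "utils.py"]


async def analyze(files: List[str]) -> str:
    calls.append("analyze")
    return f"{len(files)} files audited"


async def audit(journal: Journal, crash_after_clone: bool) -> str:
    files = await run_step(journal, "clone_repo", clone_repo, "https://example.com/repo.git")
    if crash_after_clone:
        raise RuntimeError("simulated pod recycle")
    return await run_step(journal, "analyze", analyze, files)


journal = Journal()   # assumed to outlive the worker process (held by the engine)
try:
    asyncio.run(audit(journal, crash_after_clone=True))
except RuntimeError:
    pass
print(asyncio.run(audit(journal, crash_after_clone=False)))  # 2 files audited
print(calls)  # ['clone_repo', 'analyze']: clone_repo executed only once
```

A production journal would also key entries by call order, so that two calls with the same step name stay distinct. It would also persist to disk rather than to a dict.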

At-Most-Once Tool Guards

Deterministic idempotency keys guarantee that each tool side effect occurs at most once per key, so a retried step cannot repeat a charge or an email. Non-Persistable Zones (NPZ) suspend checkpointing during streaming blocks, so a crash never persists partial state.
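The idempotency guard can be sketched as a table of keys that have already produced an effect. The key is derived the same way as the `report-{ctx.session_id}` key in the SDK example. `IdempotentSender` and its in-memory table are hypothetical stand-ins. This sketch records the key only after the send returns, so a crash between those two moments would still repeat the email. Real systems close that window in one of two ways: they store the key atomically with the effect, or they pass the key to a downstream API that deduplicates on it.

```python
from typing import Callable, Dict, List, Tuple


class IdempotentSender:
    """Hypothetical guard: runs a side effect at most once per idempotency key."""

    def __init__(self, send: Callable[[str, str], str]) -> None:
        self._send = send
        self._seen: Dict[str, str] = {}  # key -> result of the first successful send

    def send(self, key: str, to: str, body: str) -> str:
        if key in self._seen:
            return self._seen[key]       # duplicate (e.g. retried step): reuse the result
        result = self._send(to, body)
        self._seen[key] = result
        return result


outbox: List[Tuple[str, str]] = []      # emails that were actually dispatched


def send_email(to: str, body: str) -> str:
    outbox.append((to, body))
    return f"sent:{len(outbox)}"


sender = IdempotentSender(send_email)
key = "report-session-42"               # deterministic: derived from the session ID
first = sender.send(key, "devs@org.com", "audit report")
retry = sender.send(key, "devs@org.com", "audit report")  # e.g. after a crash
print(first, retry, len(outbox))        # sent:1 sent:1 1
```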

Live Web Sandbox

Interactive Replay Recovery Simulator

Observe how Cellaflow journals steps to its local RocksDB store, handles a sudden mid-turn server recycle, and restores execution context in less than 250ms with zero extra LLM tokens.

Agent Execution Graph

1. Clone Repository: durable tool call ($0.01 API fee)
2. Scan Workspace Files: durable tool call ($0.01 API fee)
3. Analyze Bugs with LLM: LLM completion prompt ($0.25 API fee)
4. Generate Code Fix Patch: durable tool call ($0.05 API fee)
5. Send Email Notification: side-effecting tool (idempotent)

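The fees listed for the five-step execution graph make the replay saving easy to check by hand. For illustration, the sketch assumes the server recycles after step 3 has committed and before step 4 starts. Fees are in cents to keep the arithmetic exact, and the function name `billed_cents` is hypothetical.

```python
# Per-step API fees from the five-step execution graph, in cents.
# Step 5 (email notification) lists no API fee, so it counts as 0.
FEES = {
    "clone_repository": 1,
    "scan_workspace_files": 1,
    "analyze_bugs_with_llm": 25,
    "generate_code_fix_patch": 5,
    "send_email_notification": 0,
}
STEPS = list(FEES)  # dicts preserve insertion order, so this is step order


def billed_cents(crash_after: int, replay: bool) -> int:
    """Total fees for one crashed attempt plus one completed retry.

    crash_after: number of steps committed before the simulated recycle.
    replay: True  -> the retry skips committed steps (journal replay);
            False -> the retry starts over from step 1.
    """
    first_attempt = sum(FEES[s] for s in STEPS[:crash_after])
    retry_steps = STEPS[crash_after:] if replay else STEPS
    return first_attempt + sum(FEES[s] for s in retry_steps)


print(billed_cents(3, replay=False))  # 59: steps 1-3 are paid twice
print(billed_cents(3, replay=True))   # 32: every step is paid once
```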
Operational ROI Assessment

Token Waste & Savings Calculator

See the immediate business case. Estimate how much LLM API budget you lose re-running completed steps after infrastructure recycles, and how much of it Cellaflow's middleware recovers.

Agent runs per day: 1,000
Steps per run: 8
Token cost per step: $0.05
Run failure rate: 4%
Active Compressed Cognitive State (CCS)

Enable background log compaction (5x to 40x compression ratios) to optimize context inputs and save an extra ~45% in prompt token costs.

Calculated Annual Savings

Wasted Token Cost (Without Cellaflow)
$2,920 / year

Directly thrown away on re-running completed steps.

CCS Compaction Savings
$65,700 / year

Saved via asynchronous observation compaction.

Total Annual Savings
$68,620

Annual savings from Durable Replay + CCS Memory optimization.

Total Developer Time Recovered
2,433 hours / year
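The calculator's outputs can be reproduced from its four inputs. The reading below is inferred from the published figures rather than stated on the page, so treat it as an assumption. The inputs are taken as 1,000 runs per day, 8 steps per run, $0.05 per step, and a 4% failure rate. A failed run is assumed to be lost halfway through on average, so 4 of 8 steps are re-run. Each failure is assumed to cost about 10 minutes of developer time. The CCS figure equals 45% of the full annual step spend. Under those assumptions, every published number matches.

```python
# Inputs as read from the calculator (labels inferred, not stated on the page).
RUNS_PER_DAY = 1_000
STEPS_PER_RUN = 8
COST_PER_STEP = 0.05      # USD
FAILURE_RATE = 0.04
CCS_SAVINGS_RATE = 0.45   # "~45%" from the CCS description

# Assumptions inferred from the published outputs.
AVG_STEPS_LOST = STEPS_PER_RUN / 2   # failures land mid-run on average
DEV_MINUTES_PER_FAILURE = 10

runs_per_year = RUNS_PER_DAY * 365
failed_runs = runs_per_year * FAILURE_RATE

wasted = failed_runs * AVG_STEPS_LOST * COST_PER_STEP
ccs = runs_per_year * STEPS_PER_RUN * COST_PER_STEP * CCS_SAVINGS_RATE
dev_hours = failed_runs * DEV_MINUTES_PER_FAILURE / 60

print(f"Wasted token cost: ${wasted:,.0f} / year")          # $2,920
print(f"CCS savings:       ${ccs:,.0f} / year")             # $65,700
print(f"Total saved:       ${wasted + ccs:,.0f}")           # $68,620
print(f"Developer time:    {dev_hours:,.0f} hours / year")  # 2,433
```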
The Middleware Foundation

The Four Core Infrastructure Pillars

Cellaflow replaces heavyweight state-machine libraries with a low-level, high-throughput Rust engine core built for state safety.

Durable Execution

Automatically serializes stack-frame variables and conversation histories to disk after every successful state transition.
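Writing a checkpoint after each transition is only safe if a crash mid-write cannot leave a half-written file. A common single-machine pattern is to write to a temporary file, fsync it, then atomically rename it over the old checkpoint. The sketch shows that pattern with JSON and hypothetical names (`save_checkpoint`, `load_checkpoint`). It illustrates the durability requirement, not Cellaflow's actual RocksDB write path.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Atomically replace the checkpoint at `path` with `state`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())    # bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic: readers see the old or new file, never a mix
        # A fully durable version would also fsync the directory after the rename.
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path: str) -> Optional[Dict[str, Any]]:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None                 # no checkpoint yet: start from the beginning


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "session-42.json")
    save_checkpoint(path, {"step": "clone_repo", "files": ["main.py"]})
    save_checkpoint(path, {"step": "analyze", "files": ["main.py"], "analysis": "ok"})
    restored = load_checkpoint(path)
    leftovers = sorted(os.listdir(workdir))
print(restored["step"])  # analyze
```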

Replayable Cognitive Graph

Maintains an immutable, versioned, append-only ledger that records all non-deterministic events (LLM outputs, tool calls) for instant time-travel debugging.
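Time-travel debugging falls out naturally from an append-only log: the state at any point is a fold over the events recorded up to that point. The sketch assumes a simple event shape (`seq`, `kind`, `step`, `payload`) chosen for illustration. It is not Cellaflow's ledger schema, and `Ledger` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Event:
    seq: int          # position in the ledger, assigned on append
    kind: str         # e.g. "llm_output" or "tool_call"
    step: str
    payload: Any


class Ledger:
    """Hypothetical append-only ledger: events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, step: str, payload: Any) -> Event:
        event = Event(len(self._events), kind, step, payload)
        self._events.append(event)
        return event

    def state_at(self, seq: int) -> Dict[str, Any]:
        """Rebuild step outputs as they stood after event `seq` (inclusive)."""
        state: Dict[str, Any] = {}
        for event in self._events[: seq + 1]:
            state[event.step] = event.payload
        return state


ledger = Ledger()
ledger.append("tool_call", "clone_repo", ["main.py", "utils.py"])
ledger.append("llm_output", "analyze", "possible None dereference in utils.py")
ledger.append("tool_call", "send_report", "sent")

print(ledger.state_at(0))       # {'clone_repo': ['main.py', 'utils.py']}
print(len(ledger.state_at(2)))  # 3
```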

Non-Persistable Zones (NPZ)

Suspends checkpoints while an LLM completion is streaming or a tool call is unresolved, so partial state is never serialized. After a crash, execution rolls back to the last committed checkpoint.
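One way to realize a non-persistable zone is a context manager that holds commits while the zone is open. It flushes them only when the block exits cleanly and discards them if the block raises. `Checkpointer` and `non_persistable_zone` are hypothetical names. Buffering commits is an assumption about how the suspension could work, not a description of the Cellaflow implementation.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict, List, Optional


class Checkpointer:
    """Hypothetical checkpoint sink that can hold commits back."""

    def __init__(self) -> None:
        self.durable: Dict[str, Any] = {}   # stands in for the on-disk store
        self.pending: Dict[str, Any] = {}   # commits held while a zone is open
        self.suspended = False

    def commit(self, key: str, value: Any) -> None:
        if self.suspended:
            self.pending[key] = value
        else:
            self.durable[key] = value


@asynccontextmanager
async def non_persistable_zone(cp: Checkpointer) -> AsyncIterator[None]:
    cp.suspended = True
    try:
        yield
    except BaseException:
        cp.pending.clear()               # crash inside the zone: drop partial state
        raise
    else:
        cp.durable.update(cp.pending)    # clean exit: commit the final state once
        cp.pending.clear()
    finally:
        cp.suspended = False


async def stream_analysis(cp: Checkpointer, chunks: List[str], fail_at: Optional[int]) -> None:
    async with non_persistable_zone(cp):
        text = ""
        for i, chunk in enumerate(chunks):
            if i == fail_at:
                raise ConnectionError("stream dropped mid-completion")
            text += chunk
            cp.commit("analysis", text)  # outside a zone, this would persist partial text


chunks = ["Found ", "2 ", "bugs"]
cp = Checkpointer()
try:
    asyncio.run(stream_analysis(cp, chunks, fail_at=2))
except ConnectionError:
    pass
after_crash = dict(cp.durable)    # nothing partial was persisted
asyncio.run(stream_analysis(cp, chunks, fail_at=None))
print(after_crash, cp.durable)    # {} {'analysis': 'Found 2 bugs'}
```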

Embedded RocksDB

Uses an isolated key-value database embedded directly in the single-process core daemon, eliminating the network round trips of an external relational database.
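Embedded key-value stores such as RocksDB keep keys in sorted byte order by default. A key layout that sorts in commit order therefore lets a replay read a whole session with one prefix scan. The `session/<id>/step/<seq>` scheme is a hypothetical layout for illustration. A sorted Python list stands in for the database so the ordering point is easy to check.

```python
import bisect
from typing import List


def step_key(session_id: str, seq: int) -> bytes:
    """Hypothetical key layout: one key per committed step."""
    # Zero-padding keeps byte order identical to numeric order;
    # unpadded, b".../step/10" would sort before b".../step/2".
    return f"session/{session_id}/step/{seq:010d}".encode()


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Mimics seeking an ordered iterator to `prefix` and reading while keys match."""
    out = []
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        out.append(sorted_keys[i])
        i += 1
    return out


keys = sorted([step_key("a1", n) for n in (10, 2, 1)] + [step_key("b7", 1)])
a1_keys = prefix_scan(keys, b"session/a1/")
for k in a1_keys:
    print(k.decode())
# session/a1/step/0000000001
# session/a1/step/0000000002
# session/a1/step/0000000010
```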

Tonic gRPC Compilation

Decoupled Protos & Automated Code Gen

Cellaflow maintains a dedicated crate, `cellaflow-proto`, whose build script uses `tonic-build` to compile the Protocol Buffer definitions in `proto/cellaflow/v1/` during `cargo build`. SDK clients and backend systems share the same wire contracts without hand-copied code.

TLS Transport Layer Encryption

The Tonic gRPC server rejects unencrypted HTTP/2 connections outright, so all pipeline traffic is encrypted in transit.

Auth Interceptor Validation

Bearer token metadata is validated at the gRPC interceptor layer before a request reaches the engine.
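The check an auth interceptor performs can be sketched independently of the gRPC framework. It reads the `authorization` metadata entry, requires the `Bearer` scheme, and compares tokens in constant time. `is_authorized` is a hypothetical, framework-agnostic Python illustration. The engine performs this check in a Tonic interceptor written in Rust, which is not shown here.

```python
import hmac
from typing import Iterable, Tuple


def is_authorized(metadata: Iterable[Tuple[str, str]], expected_token: str) -> bool:
    """Return True only if metadata carries `authorization: Bearer <expected_token>`."""
    for key, value in metadata:
        if key != "authorization":      # gRPC metadata keys are lowercase
            continue
        scheme, _, token = value.partition(" ")
        if scheme.lower() != "bearer" or not token:
            return False
        # compare_digest avoids leaking how many leading characters matched.
        return hmac.compare_digest(token.encode(), expected_token.encode())
    return False                        # no credentials: reject before the engine sees it


print(is_authorized([("authorization", "Bearer s3cret")], "s3cret"))  # True
print(is_authorized([("authorization", "Bearer wrong")], "s3cret"))   # False
print(is_authorized([], "s3cret"))                                    # False
```

In a gRPC server, a call that fails this check would typically be answered with the `UNAUTHENTICATED` status code.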

Workflow Version Pinning

Locks active executions to the specific version they started on, preventing schema drift.
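Version pinning amounts to recording the workflow version when a session starts. On every resume, the code is then resolved by that recorded version rather than by "latest". The registry and session records below are hypothetical in-memory stand-ins. The workflow name and version are borrowed from the SDK example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

WorkflowFn = Callable[[str], str]

# Hypothetical registry: (workflow name, version) -> implementation.
REGISTRY: Dict[Tuple[str, str], WorkflowFn] = {}
LATEST: Dict[str, str] = {}


def register(name: str, version: str, fn: WorkflowFn) -> None:
    REGISTRY[(name, version)] = fn
    LATEST[name] = version


@dataclass
class Session:
    session_id: str
    workflow: str
    pinned_version: str   # fixed at start, never updated


def start_session(session_id: str, workflow: str) -> Session:
    return Session(session_id, workflow, LATEST[workflow])


def resume(session: Session, repo_url: str) -> str:
    # Resolve by the pinned version, so a deploy mid-run cannot change the code path.
    return REGISTRY[(session.workflow, session.pinned_version)](repo_url)


register("code_audit_agent", "1.0.0", lambda url: f"v1 audit of {url}")
s = start_session("s-42", "code_audit_agent")
register("code_audit_agent", "1.1.0", lambda url: f"v1.1 audit of {url}")  # new deploy

print(resume(s, "repo"))                                          # v1 audit of repo
print(resume(start_session("s-43", "code_audit_agent"), "repo"))  # v1.1 audit of repo
```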

proto/cellaflow/v1/service.proto

```proto
syntax = "proto3";

package cellaflow.v1;

service EngineService {
  // Initiates a stateful session with registry version pinning
  rpc StartSession(StartSessionRequest) returns (StartSessionResponse);

  // Commits a step output to the RocksDB Cognitive Graph
  rpc CommitStep(CommitStepRequest) returns (CommitStepResponse);

  // Streams the active session's state log to a recovering worker on failover
  rpc ReplayRecovery(ReplayRequest) returns (stream ReplayEvent);
}

// The request, response, and ReplayEvent message definitions are not shown in this excerpt.
```
Alpha Developer Program

Join the Private Preview

The Cellaflow stateful runtime engine core is under active development. We are partnering with engineering teams building complex multi-agent frameworks, cognitive pipelines, and containerized agent platforms.

Request Alpha Access

Book a direct technical architecture review with our founding engineers to design your stateful runtime parameters and request early builds.


Private Engine Core

The stateful engine core and compiler boundary are currently in a private repository. Code and SDK definitions will be released under an open-source license once we reach beta.
