ML work is stateful. The model you trained Tuesday, the eval split you swore you'd never change, the hyperparameter you tried and rejected — Statefulai remembers them all and surfaces them the next time you ask an agent to "run another sweep" or "fix the eval loop."
Every notebook run, every hyperparameter, every metric — captured as episodic memory and indexed for retrieval. "What did I try last sprint?" becomes a question, not an archaeology project.
Your team's learning-rate schedules, eval splits, seed conventions, and validation gates become procedural rules that the agent applies by default, without being asked.
Cell history, kernel state changes, and inline outputs all become part of the searchable memory graph. The agent can quote your own EDA back at you six weeks later.
Versioned dataset references, schema fingerprints, train/val/test split provenance.
Run IDs, weights paths, config snapshots, MLflow / W&B / Comet linkage.
Tried, rejected, currently winning. Sweep results queryable by metric or motivation.
Per-slice metrics, regression notes, "this broke on long sequences" annotations.
Feature schemas, transform invariants, training-serving skew checks as procedural rules.
Cell-level memory: what you tried, what you saw, what you concluded — preserved across kernel restarts.
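To make two items from the list above concrete, here is a minimal sketch of a schema fingerprint and a schema-level training-serving skew check. It is an illustration only, not Statefulai's implementation or API: the function names `schema_fingerprint` and `schema_skew`, the column-to-dtype dictionary format, and the example schemas are all assumptions made for this example.

```python
from __future__ import annotations

import hashlib
import json


def schema_fingerprint(schema: dict[str, str]) -> str:
    """Return a stable SHA-256 hex digest for a column-name -> dtype mapping.

    Hypothetical helper: the mapping format is an assumption for this sketch.
    """
    # Sort keys so that column order does not change the fingerprint.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_skew(train: dict[str, str], serve: dict[str, str]) -> dict[str, list[str]]:
    """List columns that are missing, unexpected, or retyped at serving time."""
    shared = train.keys() & serve.keys()
    return {
        "missing": sorted(train.keys() - serve.keys()),
        "unexpected": sorted(serve.keys() - train.keys()),
        "retyped": sorted(col for col in shared if train[col] != serve[col]),
    }


# Made-up example schemas: same columns, different order, one dtype drifted.
train_schema = {"user_id": "int64", "age": "float32", "country": "string"}
serve_schema = {"country": "string", "user_id": "int64", "age": "float64"}

# Reordering columns leaves the fingerprint unchanged.
reordered = dict(reversed(list(train_schema.items())))
order_independent = schema_fingerprint(train_schema) == schema_fingerprint(reordered)

# A dtype change produces a different fingerprint and shows up in the report.
fingerprints_differ = schema_fingerprint(train_schema) != schema_fingerprint(serve_schema)
report = schema_skew(train_schema, serve_schema)
```

A fingerprint like this is cheap to store next to a run record, so a mismatch can be flagged before a model trained on one schema is evaluated or served on another. The skew report then says which columns changed.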
Our agent-run measurements are public. The headline numbers below come from coding-task demos today; the ML-task suite (training-loop fix, eval regression, feature-skew triage) lands next.
Live feed: app.statefulai.tech/metrics · current snapshot 2026-05-26 · n = 10 runs · output tokens −37.4% · time-to-first-edit −52.7% · completion rate 40% → 100%.
We're onboarding ML teams now. Tell us your stack (PyTorch / JAX / TF, Jupyter / VSCode, MLflow / W&B / Comet) and we'll wire up the integration that matters first.