Activeloop: Database for AI

Database for AI

We provide a simple API for creating, storing, versioning, and collaborating on multi-modal AI datasets of any size. With Activeloop's open-core stack, you can rapidly transform and stream data while training models at scale.

Deep Lake powers foundational model training by acting as a vector database with significant benefits: (1) the ability to use multi-modal datasets to fine-tune your own LLMs, (2) storage of both the embeddings and the original data with automatic version control, so no embedding re-computation is needed, and (3) a truly serverless service with no vendor lock-in. How cool is that?

GitHub loves us - we're one of the fastest-growing libraries there, and we're used by little-known companies like Google, Waymo, and Intel. No big deal. Our founding team hails from places like Princeton, Stanford, Google, and Tesla, and we're backed by Y Combinator & other Silicon Valley heavyweights.

Activeloop is hiring, and we want you! Check out our open roles on our YC page and join the fun.

10-min demo: https://activeloop.wistia.com/medias/aibvo0dst2
Whitepaper: https://www.deeplake.ai/whitepaper

Active Founders

Davit Buniatyan

Founder

Founding CEO of Activeloop. PhD on leave from Princeton. AI/ML, Data and Infra. Y Combinator S18, UCL '16. Working on Data 2.0.

The Problem: Coding Agents Don't Learn

Every engineering team building with AI agents already knows the issues.

Your senior engineer's agent debugs a tricky bug Monday. Your junior engineer hits the same bug Tuesday and starts from zero.
Every session begins with re-explaining your codebase and your conventions.
Architectural decisions made in code review evaporate the moment the session ends.
Your team is solving the same problems over and over, paying for the same compute every time.

Coding agents do not communicate with each other out of the box. Hivemind closes the gap. An agent figures something out, every agent inherits it. Your worst run starts from your best.

The tools tackling this treat it as a memory problem. They give each agent a personal notepad. Siloed. Invisible to the rest of the team. And most of them do it by shipping your code and context to their servers.

But memory without learning doesn't make agents better.

We decided to go beyond memory.

Introducing Hivemind: Continual Learning for Coding Agents

Hivemind runs on a simple chain: capture, codify, optimize, propagate.

1. Trace capture. Every agent interaction (prompts, tool calls, file reads, reasoning chains, outputs) is captured automatically as a structured trace.

2. Skill codification. Repeated patterns across traces get codified into reusable skills. This uses a mix of automatic detection and LLM-assisted extraction, with workspace-level scoping so skills don't leak between teams.

3. Skill optimization. This is new today. Skills get trained and optimized. We've implemented SkillOpt (a text-space optimizer out of Microsoft, Shanghai Jiao Tong, and Fudan) directly into Hivemind. It improves each skill the way you'd tune a model, keeping only the edits that prove out on a held-out test.

That approach lifts agent accuracy by 19.1 points inside Claude Code and 24.8 points inside Codex, and wins or ties on all 52 setups tested. It runs offline, so it adds zero cost at inference. Your skills get sharper over time instead of bloating.

4. Skill propagation. Optimized skills flow into every agent's context at inference time, across every Hivemind-connected agent in your workspace.
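
The "keep only the edits that prove out on a held-out test" rule in step 3 can be illustrated with a small sketch. This is not SkillOpt's actual algorithm or Hivemind's API: the `Skill` class, `optimize_skill`, and the toy scorer are hypothetical names invented here, and the scorer is a stand-in for running the skill on real held-out tasks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Skill:
    """A reusable instruction distilled from traces (hypothetical shape)."""
    name: str
    text: str


def optimize_skill(
    skill: Skill,
    candidate_edits: Iterable[str],
    score: Callable[[Skill], float],
) -> Skill:
    """Greedy text-space search: accept an edit only if it strictly beats
    the best held-out score seen so far. `score` is assumed to evaluate a
    skill on held-out tasks and return a number where higher is better."""
    best, best_score = skill, score(skill)
    for text in candidate_edits:
        trial = Skill(skill.name, text)
        trial_score = score(trial)
        if trial_score > best_score:  # regressions and ties are discarded
            best, best_score = trial, trial_score
    return best


# Toy held-out scorer (assumption): fraction of required hints mentioned.
HELD_OUT_HINTS = ["backfill", "dual-write", "rollback"]


def toy_score(skill: Skill) -> float:
    hits = sum(hint in skill.text for hint in HELD_OUT_HINTS)
    return hits / len(HELD_OUT_HINTS)


base = Skill("payment-migration", "Run the backfill before cutover.")
edits = [
    "Run the backfill, then dual-write before cutover.",
    "Just cut over.",
    "Run the backfill, dual-write, and keep a rollback plan.",
]
tuned = optimize_skill(base, edits, toy_score)
print(tuned.text)
```

Because the search runs ahead of time and only the winning text is stored, nothing in this loop executes when an agent later uses the skill, which is the sense in which offline optimization adds no inference cost.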

Works across:

Claude Code
Codex
Cursor
OpenClaw
Hermes
pi

On Your Cloud

This is the part our customers care about most. Your traces, your skills, your cloud storage. The other tools in this space treat your codebase as their training data. We don't. Hivemind stores its data in your cloud storage bucket, which is the only way a continual learning layer should ever touch your source.

Codebase knowledge graph:

Hivemind builds a graph of your codebase
Agents reason over how files, functions, patterns, and prior fixes connect
Structure beats keyword search: agents retrieve relevant context instead of grepping blindly
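
As a rough illustration of why structure helps, here is a minimal sketch of graph-based context retrieval. The graph contents, node names, and the `related_context` function are all hypothetical; how Hivemind actually builds and queries its graph is not described here.

```python
from collections import deque
from typing import Dict, List

# Hypothetical codebase graph: each node lists what it connects to
# (imports, calls, or prior fixes that touched it).
GRAPH: Dict[str, List[str]] = {
    "payments/charge.py": ["payments/ledger.py", "fix:idempotency-key"],
    "payments/ledger.py": ["db/migrations/0042.py"],
    "fix:idempotency-key": ["payments/retry.py"],
    "db/migrations/0042.py": [],
    "payments/retry.py": [],
    "docs/README.md": [],
}


def related_context(graph: Dict[str, List[str]], start: str, max_hops: int = 2) -> List[str]:
    """Breadth-first walk: return nodes within `max_hops` of `start`,
    nearest first, excluding `start` itself."""
    seen = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop budget
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, hops + 1))
    return order


context = related_context(GRAPH, "payments/charge.py")
print(context)
```

A keyword search for "charge" would miss the migration file and the retry module entirely. Following edges surfaces them because they are connected to the file being edited, while the unrelated README is never visited.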

Example

Your senior engineer's agent figures out a tricky migration pattern in your payment service on Monday.

Tuesday, your junior engineer's agent hits the same migration shape in a related service. The skill has already crystallized, and SkillOpt has already sharpened it against the cases where the first version fell down. The agent executes it in one shot.

Repeat work stops being repeat work.

The Results: 25% Lower Cost, 41% Fewer Tokens

We benchmarked Hivemind against memory tools using LoCoMo, the standard evaluation for long-horizon agent memory.

Hivemind matches Mem0 on accuracy, with:

25% lower cost per 100 QA
41% fewer output tokens
31% fewer agent turns

Single-agent benchmarks understate the real story. The compounding effect of org-wide skill propagation isn't captured in these numbers. In production, the gap widens every week as your trace library grows.

Full methodology: deeplake.ai/hivemind

Why This Matters

Most developer tools deliver linear value. Hivemind compounds.

Week 1: your agents stop repeating mistakes.
Month 1: they're learning from each other.
Quarter 1: your whole engineering org is operating with capability that survives team changes, onboarding cycles, and turnover.

Your team's hard-won knowledge accumulates.

What's Next: From Skills to Custom Models

Because we store trajectories in Deep Lake's tensor format, traces are ready as PyTorch datasets. A handful of advanced customers are already fine-tuning their own open-source models on the trajectories their Claude and Codex agents generated last week.
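
To make "traces as a dataset" concrete, here is a minimal sketch of a map-style dataset over trace records. It deliberately avoids any Deep Lake or Hivemind API, since the real loading path is not shown here; the `TraceDataset` class and the `prompt`/`output` field names are assumptions. PyTorch treats any object implementing `__len__` and `__getitem__` as a map-style dataset, so a class shaped like this one could be handed to `torch.utils.data.DataLoader`.

```python
from typing import Dict, List, Sequence


class TraceDataset:
    """Map-style dataset over agent traces (hypothetical record shape)."""

    def __init__(self, traces: Sequence[Dict[str, str]]):
        # Keep only traces that have both sides of a training pair.
        self._rows: List[Dict[str, str]] = [
            t for t in traces if t.get("prompt") and t.get("output")
        ]

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx: int) -> Dict[str, str]:
        row = self._rows[idx]
        return {"input": row["prompt"], "target": row["output"]}


# Illustrative records only; real traces would also carry tool calls,
# file reads, and reasoning chains.
traces = [
    {"prompt": "Add a retry to charge()", "output": "Wrapped call in backoff loop."},
    {"prompt": "Why does the ledger test fail?", "output": ""},  # dropped: no output
    {"prompt": "Migrate orders table", "output": "Added backfill, then dual-write."},
]
dataset = TraceDataset(traces)
print(len(dataset), dataset[0]["input"])
```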

Same data layer. Two paths to continual learning.

Full tutorial and public examples shipping in the coming weeks.