Building the Memory Layer for AI Agents
2026-04-20
I recently built Azen, an open-source, self-hostable memory infrastructure layer for AI agents.
The problem I wanted to solve was straightforward: most AI apps are good at generating responses, but weak at remembering users over time. In practice, that means developers end up stitching together ad hoc memory systems that are hard to search, hard to scale, and tightly coupled to one interface or one model provider.
I wanted to build a proper memory backend that agents could use as infrastructure.
So I built Azen as a Bun + TypeScript monorepo with a clean separation between the API layer, the core memory engine, shared schemas, client integrations, and a management UI.
At the center of the system is a fan-out memory pipeline.
When a memory is created, Azen first writes the canonical record into **PostgreSQL** using **Drizzle ORM**. That record includes `userId`, `appId`, content, metadata, timestamps, and an optional TTL via `expiresAt`. From there, the write is pushed into a BullMQ queue backed by Redis, where an async worker takes over.
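To make the ordering concrete, here is a minimal sketch of that write path. It assumes hypothetical `MemoryStore` and `EnrichmentQueue` interfaces standing in for Drizzle/Postgres and BullMQ/Redis, and a hypothetical `ttlSeconds` request field that gets converted into `expiresAt`. The point is only the sequence: canonical insert first, then enqueue.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stand-ins: in Azen the store is PostgreSQL via Drizzle ORM and the
// queue is BullMQ on Redis. Only the ordering of the two steps matters here.
interface MemoryRecord {
  id: string;
  userId: string;
  appId: string;
  content: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
  expiresAt: Date | null; // optional TTL; null means the record never expires
}

interface MemoryStore {
  insert(record: MemoryRecord): Promise<void>;
}

interface EnrichmentQueue {
  enqueue(job: { memoryId: string }): Promise<void>;
}

interface CreateMemoryInput {
  userId: string;
  appId: string;
  content: string;
  metadata?: Record<string, unknown>;
  ttlSeconds?: number; // hypothetical request field, converted to expiresAt
}

async function createMemory(
  input: CreateMemoryInput,
  store: MemoryStore,
  queue: EnrichmentQueue,
  now: Date = new Date(),
): Promise<MemoryRecord> {
  const record: MemoryRecord = {
    id: randomUUID(),
    userId: input.userId,
    appId: input.appId,
    content: input.content,
    metadata: input.metadata ?? {},
    createdAt: now,
    updatedAt: now,
    expiresAt:
      input.ttlSeconds === undefined ? null : new Date(now.getTime() + input.ttlSeconds * 1000),
  };
  // 1. Canonical write: Postgres is the source of truth.
  await store.insert(record);
  // 2. Enqueue enrichment; the request returns without waiting for the worker.
  await queue.enqueue({ memoryId: record.id });
  return record;
}

// A read path can use this to hide records whose TTL has passed.
function isExpired(record: MemoryRecord, now: Date = new Date()): boolean {
  return record.expiresAt !== null && record.expiresAt.getTime() <= now.getTime();
}
```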
That worker does the rest of the heavy lifting:
- generates an embedding using an EmbeddingProvider abstraction
- upserts the vector into either pgvector or Qdrant
- creates or updates the corresponding Neo4j memory node
- runs LLM-based entity extraction
- links the memory to extracted entities inside the graph
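The worker steps above can be sketched as a single job processor, the kind of function you would hand to a BullMQ `Worker`. `EmbeddingProvider` and `VectorStore` are named in the post, but the method signatures here are my assumptions, and `GraphStore` and `EntityExtractor` are hypothetical names (in Azen, extraction goes through the `LLMProvider` abstraction).

```typescript
// EmbeddingProvider and VectorStore are named in the post; their method
// signatures here, and GraphStore/EntityExtractor, are hypothetical.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
}
interface VectorStore {
  upsert(id: string, vector: number[], payload: { userId: string; appId: string }): Promise<void>;
}
interface Entity {
  name: string;
  type: string;
  relation: string;
}
interface EntityExtractor {
  extract(text: string): Promise<Entity[]>; // backed by an LLMProvider in Azen
}
interface GraphStore {
  upsertMemoryNode(memoryId: string, userId: string, appId: string): Promise<void>;
  linkEntity(memoryId: string, entity: Entity): Promise<void>;
}

interface Memory {
  id: string;
  userId: string;
  appId: string;
  content: string;
}

interface WorkerDeps {
  embedder: EmbeddingProvider;
  vectors: VectorStore;
  graph: GraphStore;
  extractor: EntityExtractor;
}

// Enrichment for one memory; in a BullMQ setup this would be the job processor.
async function enrichMemory(memory: Memory, deps: WorkerDeps): Promise<void> {
  const scope = { userId: memory.userId, appId: memory.appId };
  // 1. Embed the content.
  const vector = await deps.embedder.embed(memory.content);
  // 2. Upsert keyed by memory id, so a retried job overwrites instead of duplicating.
  await deps.vectors.upsert(memory.id, vector, scope);
  // 3. Create or update the memory node in the graph.
  await deps.graph.upsertMemoryNode(memory.id, scope.userId, scope.appId);
  // 4. Extract entities with the LLM.
  const entities = await deps.extractor.extract(memory.content);
  // 5. Link the memory to each extracted entity.
  for (const entity of entities) {
    await deps.graph.linkEntity(memory.id, entity);
  }
}
```

Keying every write by the memory id is what makes it reasonable to retry a failed job from the top.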
This was an intentional design choice. I didn’t want the API request path to block on embedding generation, graph enrichment, and downstream sync work. By splitting canonical storage from async enrichment, writes stay fast, a slow embedding call or an unavailable graph database no longer holds up the request, and the canonical record is safe in Postgres either way.
Azen’s search layer is also more than plain vector similarity.
The `SearchService` first embeds the incoming query and runs a top-K nearest-neighbor search against the configured vector store. Then, if graph expansion is enabled, it uses the initial vector hits as anchors and asks Neo4j for other memories that share extracted entities with them. Those graph-derived results are merged back into the result set with a discounted relevance score and labeled separately by source.
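Here is a small in-memory sketch of that two-stage retrieval. It assumes brute-force cosine similarity in place of the configured vector store, a hypothetical `sharedEntityNeighbors` callback in place of the Neo4j query, and a hypothetical discount rule where a graph hit inherits its anchor's score multiplied by a fixed factor. Azen's actual scoring formula may differ.

```typescript
type Source = "vector" | "graph";

interface SearchHit {
  id: string;
  score: number;
  source: Source;
}

interface StoredVector {
  id: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface SearchOptions {
  topK: number;
  expandGraph: boolean;
  graphDiscount: number; // hypothetical: multiplier applied to the anchor's score
}

function hybridSearch(
  queryVector: number[],
  corpus: StoredVector[],
  // Hypothetical stand-in for the Neo4j query: ids of memories sharing an entity with `id`.
  sharedEntityNeighbors: (id: string) => string[],
  opts: SearchOptions,
): SearchHit[] {
  // 1. Top-K nearest neighbours (brute force here; a real vector store would use an index).
  const vectorHits: SearchHit[] = corpus
    .map((m): SearchHit => ({ id: m.id, score: cosine(queryVector, m.vector), source: "vector" }))
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.topK);
  if (!opts.expandGraph) return vectorHits;

  // 2. Use vector hits as anchors and pull in memories that share entities with them.
  const alreadyFound = new Set(vectorHits.map((h) => h.id));
  const graphHits = new Map<string, SearchHit>();
  for (const anchor of vectorHits) {
    for (const id of sharedEntityNeighbors(anchor.id)) {
      if (alreadyFound.has(id)) continue;
      const score = anchor.score * opts.graphDiscount;
      const existing = graphHits.get(id);
      if (!existing || score > existing.score) graphHits.set(id, { id, score, source: "graph" });
    }
  }

  // 3. Merge, keeping the source label so callers can tell the two apart.
  return [...vectorHits, ...graphHits.values()].sort((a, b) => b.score - a.score);
}
```

Discounting graph hits below their anchors keeps direct semantic matches on top while still surfacing related context further down the list.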
That means the search stack supports two retrieval modes at once:
- semantic retrieval through embeddings
- relationship-aware retrieval through graph traversal
This is useful because related memories are not always semantically similar. Two memories can be connected through a shared person, project, or place even when their wording and embeddings are far apart, and the graph layer helps recover that extra context.
To support that graph expansion, I added an entity extraction pipeline powered by an **LLMProvider** abstraction. Right now the repo uses an OpenAI-backed implementation, with `gpt-4o-mini` as the default entity extraction model. The extracted entities are normalized into a constrained schema with fields like `name`, `type`, and `relation`, then stored in Neo4j using `MERGE` patterns to keep graph writes idempotent.
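A sketch of the normalization step and an idempotent graph write might look like this. The allowed entity types, the `mentions` fallback relation, the dedupe key, and the Cypher labels (`Memory`, `Entity`, `RELATES_TO`) are all hypothetical; the post only says the schema has `name`, `type`, and `relation` and that writes use `MERGE`.

```typescript
// Hypothetical constrained schema; the actual allowed types are not listed in the post.
const ENTITY_TYPES = ["person", "organization", "place", "project", "concept"] as const;
type EntityType = (typeof ENTITY_TYPES)[number];

interface Entity {
  name: string;
  type: EntityType;
  relation: string;
}

// Validates raw LLM output, drops anything outside the schema, and dedupes.
function normalizeEntities(raw: unknown[]): Entity[] {
  const byKey = new Map<string, Entity>();
  for (const item of raw) {
    if (typeof item !== "object" || item === null) continue;
    const { name, type, relation } = item as Record<string, unknown>;
    if (typeof name !== "string" || typeof type !== "string") continue;
    const t = type.trim().toLowerCase();
    if (!(ENTITY_TYPES as readonly string[]).includes(t)) continue;
    const n = name.trim();
    if (n === "") continue;
    const rel = typeof relation === "string" && relation.trim() !== "" ? relation.trim().toLowerCase() : "mentions"; // hypothetical fallback
    // Case-insensitive name + type identifies the same entity.
    byKey.set(`${t}:${n.toLowerCase()}`, { name: n, type: t as EntityType, relation: rel });
  }
  return [...byKey.values()];
}

// MERGE matches an existing node/relationship or creates it, so replaying the
// same job leaves the graph unchanged. Labels and property names are hypothetical.
const LINK_ENTITY_CYPHER = `
MERGE (m:Memory {id: $memoryId})
MERGE (e:Entity {key: $key})
  ON CREATE SET e.name = $name, e.type = $type
MERGE (m)-[:RELATES_TO {relation: $relation}]->(e)
`;

function linkParams(memoryId: string, e: Entity) {
  return { memoryId, key: `${e.type}:${e.name.toLowerCase()}`, name: e.name, type: e.type, relation: e.relation };
}
```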
On the vector side, Azen supports a pluggable `VectorStore` interface. In the repo, there are two implementations:
- pgvector, which keeps embeddings inside Postgres for simpler deployments
- Qdrant, for cases where a dedicated vector database is preferred
That lets the same memory engine work for both lightweight self-hosted setups and more specialized vector search deployments.
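As an illustration of the pluggable interface, here is a sketch of a pgvector-backed store. The `VectorStore` method signatures, the `memory_embeddings` table, its columns, and a unique constraint on `memory_id` are assumptions, and `SqlClient` is a minimal shape modeled on node-postgres' `query(text, values)`. The `<=>` cosine distance operator and the `'[x,y,z]'` text literal are standard pgvector.

```typescript
// Assumption: a minimal SQL client with the same shape as node-postgres' `query(text, values)`.
interface SqlClient {
  query(text: string, values: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

interface Scope {
  userId: string;
  appId: string;
}

// Hypothetical method signatures for the pluggable interface.
interface VectorStore {
  upsert(memoryId: string, vector: number[], scope: Scope): Promise<void>;
  search(vector: number[], topK: number, scope: Scope): Promise<{ id: string; score: number }[]>;
}

// pgvector parses vectors from text literals such as '[0.1,0.2,0.3]'.
const toVectorLiteral = (v: number[]): string => `[${v.join(",")}]`;

class PgVectorStore implements VectorStore {
  private readonly db: SqlClient;
  private readonly table: string; // hypothetical name; interpolated, so it must be trusted

  constructor(db: SqlClient, table = "memory_embeddings") {
    this.db = db;
    this.table = table;
  }

  async upsert(memoryId: string, vector: number[], scope: Scope): Promise<void> {
    // Assumes a unique constraint on memory_id so re-embedding overwrites the row.
    await this.db.query(
      `INSERT INTO ${this.table} (memory_id, user_id, app_id, embedding)
       VALUES ($1, $2, $3, $4::vector)
       ON CONFLICT (memory_id) DO UPDATE SET embedding = EXCLUDED.embedding`,
      [memoryId, scope.userId, scope.appId, toVectorLiteral(vector)],
    );
  }

  async search(vector: number[], topK: number, scope: Scope): Promise<{ id: string; score: number }[]> {
    // `<=>` is pgvector's cosine distance operator, so similarity = 1 - distance.
    const { rows } = await this.db.query(
      `SELECT memory_id, 1 - (embedding <=> $1::vector) AS score
       FROM ${this.table}
       WHERE user_id = $2 AND app_id = $3
       ORDER BY embedding <=> $1::vector
       LIMIT $4`,
      [toVectorLiteral(vector), scope.userId, scope.appId, topK],
    );
    return rows.map((r) => ({ id: String(r.memory_id), score: Number(r.score) }));
  }
}
```

A Qdrant implementation would satisfy the same interface by calling Qdrant's upsert and search endpoints, so the memory engine never needs to know which store is configured.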
The project is also multi-tenant by design. Every memory operation is scoped by `userId` and `appId`, so the same Azen instance can support multiple applications or agents without mixing memory contexts. Shared schemas are centralized in `@azen-sh/types` using Zod, which keeps the contracts consistent across the API server, core engine, tests, and integrations.
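The scoping rule can be shown with a small in-memory repository, a hypothetical stand-in for the Postgres layer. It is dependency-free, so it leaves out the Zod schemas from `@azen-sh/types`; the idea is that every read, list, and delete takes a `{ userId, appId }` scope, and a record outside that scope behaves as if it does not exist.

```typescript
interface Scope {
  userId: string;
  appId: string;
}

interface Memory extends Scope {
  id: string;
  content: string;
}

// Hypothetical in-memory repository illustrating tenant scoping.
class ScopedMemoryRepository {
  private readonly rows = new Map<string, Memory>();

  insert(memory: Memory): void {
    this.rows.set(memory.id, memory);
  }

  // A record is visible only when both userId and appId match the caller's scope.
  private inScope(m: Memory, s: Scope): boolean {
    return m.userId === s.userId && m.appId === s.appId;
  }

  getById(id: string, scope: Scope): Memory | undefined {
    const m = this.rows.get(id);
    return m && this.inScope(m, scope) ? m : undefined;
  }

  list(scope: Scope): Memory[] {
    return [...this.rows.values()].filter((m) => this.inScope(m, scope));
  }

  // Deletes every memory in one scope and returns how many were removed.
  deleteAll(scope: Scope): number {
    let removed = 0;
    for (const [id, m] of this.rows) {
      if (this.inScope(m, scope)) {
        this.rows.delete(id);
        removed++;
      }
    }
    return removed;
  }
}
```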
On top of the backend, I shipped multiple interfaces around the same memory system.
The main API is exposed through `@azen-sh/server`, a Hono REST service with endpoints for:
- creating memories
- listing memories
- fetching by ID
- updating and deleting memories
- semantic search with optional graph expansion
- health checks
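A thin typed client over those endpoints might look like the sketch below. The route paths (`/memories`, `/memories/search`, `/health`) and request body fields are hypothetical, since the post lists the endpoint categories but not their routes. `FetchLike` is a minimal shape that the global `fetch` satisfies, which keeps the client easy to test.

```typescript
// Minimal fetch shape; the global `fetch` is compatible with it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface Scope {
  userId: string;
  appId: string;
}

// Hypothetical client: paths and payload fields are illustrative, not Azen's documented API.
class AzenClient {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
    this.fetchFn = fetchFn;
  }

  private async request<T>(method: string, path: string, body?: unknown): Promise<T> {
    const res = await this.fetchFn(`${this.baseUrl}${path}`, {
      method,
      headers: { "content-type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return (await res.json()) as T;
  }

  createMemory(scope: Scope, content: string): Promise<{ id: string }> {
    return this.request("POST", "/memories", { ...scope, content });
  }

  // `graph: true` asks for graph expansion on top of vector search.
  search(scope: Scope, query: string, opts: { limit?: number; graph?: boolean } = {}): Promise<unknown[]> {
    return this.request("POST", "/memories/search", { ...scope, query, ...opts });
  }

  health(): Promise<{ status: string }> {
    return this.request("GET", "/health");
  }
}
```

Usage would be along the lines of `new AzenClient("http://localhost:3000", fetch)`, with the base URL depending on where the server is deployed.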
I also added a lightweight React dashboard in `@azen-sh/web` so memories can be inspected and managed from a browser. It includes views for listing stored memories, semantic search, and creating new memory records manually.
One of the most important parts of the project is that I didn’t stop at the API.
I built MCP capabilities through `@azen-sh/mcp`, which exposes Azen as a Model Context Protocol server over stdio. That means tools like Claude Desktop, Cursor, or any MCP-compatible client can directly store, search, list, and delete memories without custom glue code. Instead of forcing developers to wire raw REST calls into every agent environment, Azen can plug into the growing MCP ecosystem as a memory layer.
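Under the hood, an MCP client talks to a stdio server with JSON-RPC 2.0 messages, one per line. The sketch below builds a `tools/call` request and splits incoming output into messages. The tool name `search_memories` and its argument names are hypothetical, and a real session begins with an `initialize` handshake that is omitted here.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Builds an MCP `tools/call` request; the tool and argument names are up to the server.
function toolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// MCP's stdio transport sends one JSON-RPC message per line. JSON.stringify without
// indentation escapes newlines inside strings, so each message stays on one line.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Splits buffered output into complete messages, keeping any partial trailing line.
function parseFrames(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  return { messages, rest };
}
```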
I also built a Vercel AI SDK integration in `@azen-sh/vercel-ai`, packaged as an npm module. It gives language models a ready-made set of memory tools like:
- `addMemory`
- `searchMemories`
- `listMemories`
- `getMemory`
- `updateMemory`
- `deleteMemory`
- `deleteAllMemories`
With that package, developers can pass `azenTools()` into `generateText` or `streamText` and let the model decide when to store and recall memory on its own. I like this part because it turns Azen from “just a backend” into something that fits naturally into real agent frameworks and modern LLM application stacks.
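Conceptually, each memory tool is a name, a description the model reads when deciding what to call, and an `execute` function. The dependency-free sketch below shows that shape and the dispatch step. It does not reproduce the Vercel AI SDK's API (the SDK performs this dispatch itself inside `generateText` or `streamText`). `memoryTools`, `runToolCall`, and the substring search are hypothetical simplifications.

```typescript
// Conceptual sketch only; names are hypothetical and not the AI SDK's API.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

interface MemoryTool {
  description: string;
  execute(args: Record<string, unknown>): Promise<unknown>;
}

function memoryTools(store: Map<string, string>): Record<string, MemoryTool> {
  let nextId = 1;
  return {
    addMemory: {
      description: "Store a fact about the user for later recall.",
      execute: async (args) => {
        const id = `m${nextId++}`;
        store.set(id, String(args.content));
        return { id };
      },
    },
    searchMemories: {
      description: "Find stored memories relevant to a query.",
      // Naive substring match stands in for Azen's semantic search.
      execute: async (args) => {
        const q = String(args.query).toLowerCase();
        return [...store.entries()]
          .filter(([, content]) => content.toLowerCase().includes(q))
          .map(([id, content]) => ({ id, content }));
      },
    },
  };
}

// Runs a tool call the model emitted and returns the result to feed back to it.
async function runToolCall(tools: Record<string, MemoryTool>, call: ToolCall): Promise<unknown> {
  const tool = tools[call.toolName];
  if (!tool) throw new Error(`Unknown tool: ${call.toolName}`);
  return tool.execute(call.args);
}
```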
From an engineering perspective, this project taught me a lot about building AI infrastructure instead of just AI features.
It pushed me to think about:
- async system boundaries
- multi-store consistency
- retrieval architecture
- typed API contracts
- provider abstractions
- developer-facing integrations
Azen started as a memory API, but it turned into a broader platform for long-term agent memory: canonical storage in Postgres, semantic recall through vectors, contextual expansion through graphs, and multiple ways for developers and AI tools to plug into the system.