Google open-sourced an agent that keeps persistent memory without vector databases or embeddings—just an LLM that reads, thinks, and writes. Here's what it is, when it fits, and when I'd still reach for a vector DB.
March 7, 2026
I've been wiring embeddings and vector DBs into every AI feature. Then I saw this: no vectors. No embeddings. Just memory.
In March 2026, Google PM Shubham Saboo open-sourced the "Always On Memory Agent" on Google Cloud's GitHub—free to use (MIT license), and built so you can run it in production. It solves a problem a lot of us have: an AI that can take in new information over time, remember it, and use it later—without using a vector database at all.
I went through the repo and the buzz around it. Here's what you need to know in plain terms.
Quick glossary (so we're on the same page): an embedding is a numeric "fingerprint" of a piece of content (text, image, audio); a vector database stores those fingerprints and retrieves the ones most similar to a query.
The Always On Memory Agent skips the vector DB and embeddings. The AI itself reads what comes in, thinks about what to keep, and writes structured memory into a normal database (SQLite). So: technical, but the idea is simple.
I was building a small internal copilot—something that could remember what we discussed in previous sessions and bring that context back. My default was the usual recipe: turn conversations into embeddings, put them in a vector DB, and at query time run a similarity search. It worked. But I was also maintaining an embedding pipeline, chunking logic, index updates, and sync between our app and the vector store. For a small team and a few thousand messages, it felt like too much.
Then I read the Always On Memory Agent repo. No vector database. No embeddings. The AI reads, thinks, and writes structured memory into SQLite. I cloned it, ran it, and in about an hour I had something that remembered—without a single embedding call.
That's when it clicked: for a certain kind of agent, we've been overbuilding.
The Always On Memory Agent runs all the time. It takes in files or data from an API—text, images, audio, video, PDFs—and saves structured memories into SQLite (a single file on disk, no separate server). On a schedule (every 30 minutes by default), it runs a memory consolidation step: the AI re-reads what it has stored, merges duplicates, drops noise, and keeps the store tidy.
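The "always on" part reduces to a simple loop: ingest whatever arrives, and consolidate on a timer. Here's a minimal sketch of that loop (the names are mine, not the repo's API; the `store` and `consolidate` callbacks stand in for the Gemini calls the real agent makes):

```python
import time

CONSOLIDATE_EVERY_S = 30 * 60   # default cadence: every 30 minutes

def agent_loop(get_input, store, consolidate, clock=time.monotonic, ticks=3):
    """Skeleton of the always-on loop: ingest continuously, consolidate on a timer.

    `ticks` bounds the loop so this sketch terminates; the real agent runs forever.
    """
    last = clock()
    for _ in range(ticks):
        item = get_input()           # a file drop or an API payload
        if item is not None:
            store(item)              # LLM reads it, writes structured memory
        if clock() - last >= CONSOLIDATE_EVERY_S:
            consolidate()            # LLM merges duplicates, drops noise
            last = clock()
```

The point of the shape: ingestion and consolidation live in one process, so there's no separate indexer to keep in sync.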
Think of it like this: instead of you building a system that "fingerprints" everything and then searches by similarity, the AI is in charge. It decides what to store, how to organize it, and what to pull up when you ask.
The repo's tagline says it all: "No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory."
It's built with Google's Agent Development Kit (ADK) and Gemini 3.1 Flash-Lite—a low-cost, fast model Google released in early March 2026. That choice is on purpose: an agent that runs 24/7 and keeps consolidating memory needs cheap, predictable calls. Flash-Lite is built for high-volume tasks (translation, moderation, etc.) and is priced so that "always on" doesn't blow the budget.
So in practice: simpler stack. No embedding pipeline. No vector store. No index to keep in sync. Just the model and SQLite.
Under the hood, the repo breaks this into a small set of jobs: ingestion, the scheduled consolidation pass, and query-time retrieval from the memory store.
The trade: you give up semantic search over raw embeddings. You gain simplicity and one place where the AI maintains its own memory.
Here’s the difference in how things flow.
Traditional way (what I used to do):
You ingest documents or messages. You chunk them (split into pieces, tune size and overlap). You call an embedding API for every chunk. You store those vectors in something like Pinecone or pgvector. When the user asks a question, you embed the question, run a similarity search, get the top results, and feed them into the LLM. You also have to keep the index in sync when new data arrives.
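In code, that recipe looks roughly like this. The sketch below uses a toy bag-of-words "embedding" so it runs offline; a real pipeline would call an embedding API at the marked spot and store the vectors in Pinecone or pgvector:

```python
import numpy as np

docs = ["refund policy: 30 days", "shipping takes 5 business days"]
vocab = sorted(set(" ".join(docs).lower().split()))

def embed(text: str) -> np.ndarray:
    # Toy bag-of-words vector; a real pipeline calls an embedding API here.
    v = np.array([text.lower().split().count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.stack([embed(d) for d in docs])   # one stored vector per chunk

query = embed("what is the refund policy?")  # embed the question the same way
scores = index @ query                       # cosine similarity (unit vectors)
top = docs[int(np.argmax(scores))]           # top hits go into the LLM prompt
```

Every box in that diagram (chunking, embedding, the index, sync) is something you own and pay for.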
It works. But it’s a lot of pieces: embedding cost grows with data, and you own all the retrieval design.
Always On way:
You ingest documents or messages. The AI reads them and writes structured memories to SQLite. Every 30 minutes, the AI consolidates: merge, dedupe, summarize. When the user asks something, the agent reads from its own memory and answers.
One process. One database. No embedding pipeline. The model is the retriever.
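Here's a minimal sketch of that flow (my names, not the repo's; the `llm` function is a canned stand-in for a Gemini call so the example runs offline):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (fact TEXT)")

def llm(prompt: str) -> str:
    # Canned stand-in for a Gemini call.
    if "Extract" in prompt:
        return json.dumps(["User's name is Priya", "Prefers weekly summaries"])
    return "Priya prefers weekly summaries."

def ingest(text: str) -> None:
    # Read + think: the model decides which facts are worth keeping.
    facts = json.loads(llm(f"Extract facts worth remembering:\n{text}"))
    # Write: structured memory goes straight into SQLite.
    db.executemany("INSERT INTO memories VALUES (?)", [(f,) for f in facts])

def answer(question: str) -> str:
    # No similarity search: hand the model its own memory plus the question.
    memory = "\n".join(r[0] for r in db.execute("SELECT fact FROM memories"))
    return llm(f"Memory:\n{memory}\n\nQuestion: {question}")

ingest("Hi, I'm Priya. Send me a summary every Friday.")
reply = answer("How often should I send reports?")
```

Notice what's absent: no chunker, no embedding call, no index. The cost is that "retrieval" is now the model reading its own store, which only works while that store stays small enough to hand over.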
I'm not saying one is "better." They're different tools. For bounded memory and simpler use cases, Always On can replace the whole vector stack. For large-scale semantic search or strict compliance, you'll still want the traditional approach—or a mix of both.
I've shipped RAG and agent memory with embeddings and vector DBs. It works, but the plumbing is real. The Always On approach moves that complexity into the model. The AI decides what to store, how to structure it, and what to surface when you query.
Where I'd consider it: small internal tools like the copilot above, agents with bounded memory, and teams that want fewer moving parts than an embedding pipeline demands.
So yes—for a certain class of agents, this can replace the vector DB. Not only in theory; in practice, as a reference implementation you can clone and adapt.
The line that stuck with me: removing the vector DB doesn't remove retrieval design; it just moves where the complexity lives.
When memory gets very large, you still have to chunk, organize, and retrieve. In the Always On design, that job sits in the model and the consolidation loop. That works well at small-to-medium scale. At large scale—millions of facts, long history, many users—you may still want explicit indexing, stricter retrieval controls, and lifecycle tooling. Vector DBs (or hybrid setups) give you that.
I'd also keep vector search when the corpus is genuinely large, when many users share one store, or when retrieval has to be explicit, tunable, and auditable.
So: replace vector DBs when the agent's memory is bounded and the goal is simplicity. Keep them (or add them) when scale or governance demands it.
A lot of the discussion around the launch wasn't about speed—it was about drift and control. An agent that "dreams" and merges memories in the background without clear rules can become a compliance nightmare. The real cost of always-on agents isn't just tokens; it's drift and feedback loops.
The real question for enterprises: not whether the agent can remember, but whether it can remember in ways that stay bounded, inspectable, and safe enough to trust in production. The open-source agent is a strong starting point; it's not yet a full governance story. Treat it as a template and add policy, retention, and audit on your side if you need them.
The biggest mistake would be treating Always On as the default for every AI memory use case. It's not. It's great for bounded, always-on agents. It's the wrong tool when you need semantic search over a huge corpus or strict retrieval guarantees. Pick the right tool for the problem.
Consolidation is where the model "thinks" and merges memory. If you don't tune or at least understand that step, you might get surprising merges, lost details, or drift. Read the prompts. Watch what gets written. Adjust if needed.
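To see why consolidation deserves scrutiny, here's a toy version of the pass with the model's merge decision stubbed out (all names are mine, not the repo's). Note how a "superseded" fact silently disappears; that's exactly the kind of write you want logged and reviewable:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (fact TEXT)")
db.executemany("INSERT INTO memories VALUES (?)", [
    ("Standup is at 9am",),
    ("Standup is at 9am",),
    ("Standup moved to 9:30am",),
])

def llm_merge(facts: list[str]) -> list[str]:
    # Stand-in for the model's judgment: dedupe, then drop a superseded fact.
    merged = list(dict.fromkeys(facts))
    if "Standup moved to 9:30am" in merged and "Standup is at 9am" in merged:
        merged.remove("Standup is at 9am")
    return merged

def consolidate() -> tuple[list[str], list[str]]:
    before = [r[0] for r in db.execute("SELECT fact FROM memories")]
    after = llm_merge(before)
    db.execute("DELETE FROM memories")
    db.executemany("INSERT INTO memories VALUES (?)", [(f,) for f in after])
    return before, after   # keep both: this diff is your audit trail

before, after = consolidate()
```

Three rows in, one row out. When the merge logic is a prompt instead of a function, you want that before/after diff somewhere you can read it.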
If you're in a regulated industry or care about audit, design policy early. What can be written? What gets retained? Who can delete? The repo gives you the mechanics; you add the boundaries.
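A write-time policy gate can start as simple as a few pattern checks before anything reaches the store. This is my own sketch, not something the repo ships (the SSN regex is just an example of a blocked category):

```python
import re

# Hypothetical policy gate -- the repo gives you mechanics; the rules are yours.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # e.g. US SSNs never get written
]

def allowed_to_write(fact: str) -> bool:
    """Gate every candidate memory before it reaches SQLite."""
    return not any(p.search(fact) for p in BLOCKED_PATTERNS)
```

Retention and deletion deserve the same treatment: decide the rules first, then wire them in front of the agent's write path.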
After building with both patterns, here's how I choose:
Choose the Always On pattern (or this repo) if your agent's memory is bounded, you want the simplest possible stack (one model, one SQLite file), and you're comfortable letting the model own read/think/write.
Stick with (or add) a vector DB if you need semantic search over a large corpus, serve many users from one store, or have strict retrieval and compliance requirements.
Consider a hybrid if the agent's working memory is small but it also needs to search a big knowledge base: keep structured memory for the agent, and call out to vector search for the corpus.
The repo path is gemini/agents/always-on-memory-agent. There's a local HTTP API and a Streamlit dashboard. Get it running locally first.
Google's Always On Memory Agent doesn't replace vector DBs everywhere. It gives you a simpler option when your agent's memory is bounded and you're okay with the model owning read/think/write. For that class of use cases, I've stopped assuming every agent needs embeddings and a vector store. I try Always On first; I reach for vectors when the problem demands it.
At Elephaant, we care about shipping AI features without overbuilding. The Always On Memory Agent is a sign of where agent infra is going: sometimes the right answer isn't more databases—it's a model that can read, think, and write memory on its own. Know when that fits, and when it doesn't. Then build.
Clone the repo. Run it. See how it feels. I think you'll be surprised—I was.
Will you try it?