ELEPHAANT

Table of Contents

  • The Moment I Stopped Assuming Every Agent Needs Vectors
  • What It Actually Is (In Plain Terms)
  • Under the Hood: Ingestion, Consolidation, Query
  • Traditional Stack vs Always On: Before and After
  • Why This Could Replace Vector DBs (For Some Use Cases)
  • What I Like About This Approach
  • What to Watch Out For
  • Where I'd Still Use a Vector DB
  • The Governance Question Everyone's Asking
  • Common Mistakes (I'm Trying to Avoid)
  • Mistake #1: Using It for Everything
  • Mistake #2: Ignoring Consolidation Behavior
  • Mistake #3: Skipping Governance Until Later
  • The Decision Framework
  • Getting Started: My Actual Advice
  • The Bottom Line
#ai #development #tools

Google's Always-On Memory Agent: Why It Might Replace Your Vector DB in 2026

Google open-sourced an agent that keeps persistent memory without vector databases or embeddings—just an LLM that reads, thinks, and writes. Here's what it is, when it fits, and when I'd still reach for a vector DB.

March 7, 2026

11 min read

I've been wiring embeddings and vector DBs into every AI feature. Then I saw this: no vectors. No embeddings. Just memory.

In March 2026, Google PM Shubham Saboo open-sourced the "Always On Memory Agent" on Google Cloud's GitHub—free to use (MIT license), and built so you can run it in production. It solves a problem a lot of us have: an AI that can take in new information over time, remember it, and use it later—without using a vector database at all.

I went through the repo and the buzz around it. Here's what you need to know in plain terms.

Quick glossary (so we're on the same page):

  • Vector DB = A database that stores text as "fingerprints" (numbers) so you can search by meaning instead of exact words. Great for "find things like this."
  • Embedding = That fingerprint: an API turns your text into a list of numbers. Similar text → similar numbers.
  • RAG = "Retrieval-augmented generation": you store docs, search them when the user asks something, and feed the results into the LLM so it can answer from your data.

The Always On Memory Agent skips the vector DB and embeddings. The AI itself reads what comes in, decides what to keep, and writes structured memory into a normal database (SQLite). It sounds technical, but the idea is simple.

The Moment I Stopped Assuming Every Agent Needs Vectors

I was building a small internal copilot—something that could remember what we discussed in previous sessions and bring that context back. My default was the usual recipe: turn conversations into embeddings, put them in a vector DB, and at query time run a similarity search. It worked. But I was also maintaining an embedding pipeline, chunking logic, index updates, and sync between our app and the vector store. For a small team and a few thousand messages, it felt like too much.

Then I read the Always On Memory Agent repo. No vector database. No embeddings. The AI reads, thinks, and writes structured memory into SQLite. I cloned it, ran it, and in about an hour I had something that remembered—without a single embedding call.

That's when it clicked: for a certain kind of agent, we've been overbuilding.

What It Actually Is (In Plain Terms)

The Always On Memory Agent runs all the time. It takes in files or data from an API—text, images, audio, video, PDFs—and saves structured memories into SQLite (a single file on disk, no separate server). On a schedule (every 30 minutes by default), it runs a memory consolidation step: the AI re-reads what it has stored, merges duplicates, drops noise, and keeps the store tidy.

Think of it like this: instead of you building a system that "fingerprints" everything and then searches by similarity, the AI is in charge. It decides what to store, how to organize it, and what to pull up when you ask.
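To make "structured memory in SQLite" concrete, here's a minimal schema sketch. The table and column names are my guesses for illustration; the repo's actual schema may differ:

```python
import sqlite3

# Hypothetical schema sketch -- the repo's actual tables may differ.
conn = sqlite3.connect("memory.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,       -- what the memory is about
    content TEXT NOT NULL,     -- the structured memory itself
    source TEXT,               -- file or API payload it came from
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")

# What an ingestion step might write:
conn.execute(
    "INSERT INTO memories (topic, content, source) VALUES (?, ?, ?)",
    ("preferences", "Customer prefers invoices in EUR.", "support-chat.txt"),
)
conn.commit()
```

The point is that the store is just a table you can open, query, and back up; nothing about it requires a vector index.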

Always On: how it works
Ingest (files / API) → Store in SQLite → Consolidate (every 30 min) → Query

The repo's tagline says it all: "No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory."

It's built with Google's Agent Development Kit (ADK) and Gemini 3.1 Flash-Lite—a low-cost, fast model Google released in early March 2026. That choice is on purpose: an agent that runs 24/7 and keeps consolidating memory needs cheap, predictable calls. Flash-Lite is built for high-volume tasks (translation, moderation, etc.) and is priced so that "always on" doesn't blow the budget.

So in practice: simpler stack. No embedding pipeline. No vector store. No index to keep in sync. Just the model and SQLite.

Under the Hood: Ingestion, Consolidation, Query

Under the hood, the repo uses a small set of jobs:

  • Ingestion: New files or API input arrive; the system processes them and writes structured memories.
  • Consolidation: Every 30 minutes (you can change this), the AI re-reads memory, merges duplicates, drops junk, and keeps the store manageable. This is the "read, think, write" loop.
  • Query: When you ask something, the agent looks at its own memory store—no vector search, no similarity score. The model decides what's relevant.

The trade: you give up semantic search over raw embeddings. You gain simplicity and one place where the AI maintains its own memory.
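The read/think/write loop is easy to sketch. This is my own minimal version, not the repo's code: the `llm` function is a stand-in that just dedupes lines so the sketch runs end to end; in the real agent that call goes to Gemini via the ADK.

```python
import sqlite3

def llm(prompt: str) -> str:
    # Placeholder for the real model call. Here it drops the instruction
    # line and dedupes the rest, so the sketch is runnable without an API.
    lines = prompt.splitlines()[1:]
    return "\n".join(dict.fromkeys(l for l in lines if l.strip()))

def consolidate(conn):
    # Read: everything the agent currently remembers.
    rows = conn.execute("SELECT content FROM memories").fetchall()
    raw = "\n".join(r[0] for r in rows)
    # Think: ask the model to merge duplicates and drop noise.
    merged = llm("Merge duplicates, drop noise:\n" + raw)
    # Write: replace the store with the consolidated set.
    conn.execute("DELETE FROM memories")
    conn.executemany(
        "INSERT INTO memories (content) VALUES (?)",
        [(line,) for line in merged.splitlines()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (content TEXT)")
conn.executemany("INSERT INTO memories VALUES (?)",
                 [("User prefers dark mode.",),
                  ("User prefers dark mode.",),
                  ("Deploys happen on Fridays.",)])
consolidate(conn)
print(conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # prints 2
```

Notice the whole "retrieval system" is one function and one table; the intelligence lives in the prompt, not the plumbing.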

Traditional Stack vs Always On: Before and After

Here’s the difference in how things flow.

Traditional way (what I used to do):

You ingest documents or messages. You chunk them (split into pieces, tune size and overlap). You call an embedding API for every chunk. You store those vectors in something like Pinecone or pgvector. When the user asks a question, you embed the question, run a similarity search, get the top results, and feed them into the LLM. You also have to keep the index in sync when new data arrives.

It works. But it’s a lot of pieces: embedding cost grows with data, and you own all the retrieval design.
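For contrast, the traditional retrieval step looks roughly like this. The `embed` function below is a toy character-frequency stand-in so the sketch runs without an API key; a real pipeline calls a paid embedding endpoint for every chunk and every query.

```python
import math

def embed(text):
    # Toy stand-in for an embedding API: a character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Index": every chunk gets embedded up front (cost scales with data).
chunks = ["refund policy lasts 30 days", "deploys happen on fridays"]
index = [(c, embed(c)) for c in chunks]

# Query time: embed the question, rank by similarity, feed top hits to the LLM.
q = embed("when do we deploy?")
best = max(index, key=lambda pair: cosine(q, pair[1]))
print(best[0])  # prints: deploys happen on fridays
```

Every box in this sketch (chunking, embedding, indexing, ranking) is a design decision you own and maintain; that's the plumbing the Always On approach removes.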

Always On way:

You ingest documents or messages. The AI reads them and writes structured memories to SQLite. Every 30 minutes, the AI consolidates: merge, dedupe, summarize. When the user asks something, the agent reads from its own memory and answers.

One process. One database. No embedding pipeline. The model is the retriever.

Traditional (vector DB + embeddings)
  • Chunk documents
  • Call embedding API
  • Store vectors in DB
  • On query: embed query, search vectors
  • Feed results to LLM
  • Keep index in sync
Always On
  • Ingest documents
  • LLM writes to SQLite
  • Every 30 min: LLM consolidates
  • On query: LLM reads its memory
  • One process, one DB
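The Always On query step can be sketched like this. `llm` is a placeholder for the real Gemini call, and the function names are mine, not the repo's; the placeholder just returns the prompt so you can see exactly what the model would receive.

```python
import sqlite3

def llm(prompt: str) -> str:
    # Placeholder: a real call goes to Gemini. Returning the prompt
    # lets you inspect what the model would actually be handed.
    return prompt

def answer(conn, question):
    # No embedding, no similarity score: hand the model its whole memory
    # and let it decide what's relevant. Works while memory stays bounded.
    rows = conn.execute("SELECT content FROM memories").fetchall()
    memory = "\n".join(r[0] for r in rows)
    return llm(f"Memory:\n{memory}\n\nQuestion: {question}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (content TEXT)")
conn.execute("INSERT INTO memories VALUES ('Deploys happen on Fridays.')")
print(answer(conn, "When do we deploy?"))
```

This also makes the scale limit obvious: once the memory no longer fits comfortably in a prompt, "hand the model everything" stops working and retrieval design comes back.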

I'm not saying one is "better." They're different tools. For bounded memory and simpler use cases, Always On can replace the whole vector stack. For large-scale semantic search or strict compliance, you'll still want the traditional approach—or a mix of both.

Why This Could Replace Vector DBs (For Some Use Cases)

I've shipped RAG and agent memory with embeddings and vector DBs. It works, but the plumbing is real. The Always On approach moves that complexity into the model. The AI decides what to store, how to structure it, and what to surface when you query.

Where I'd consider it:

  • Bounded-memory agents: Support bots, research assistants, internal copilots, workflow automation. Memory stays in a manageable size; consolidation keeps it from growing out of control.
  • Prototypes and side projects: Less infra. One process, one SQLite file, one model. Ship faster and see if the product works before committing to a full retrieval stack.
  • Cost sensitivity: Embedding APIs and vector DBs add up. If your memory footprint is modest, letting the model do read/think/write can be cheaper and simpler.

So yes: for a certain class of agents, this can replace the vector DB. Not just in theory, but as a reference implementation you can clone and adapt.

What I Like About This Approach

  • No embedding pipeline: No chunking strategy, no embedding API, no index to maintain. One less system to debug.
  • Single source of truth: Memory lives in SQLite. You can open it, back it up, version it. No separate vector store to keep in sync.
  • Cheap to run: Flash-Lite is priced for volume. A 24/7 consolidation loop is realistic without burning budget.
  • Fast to try: Clone the repo, point it at your data, run. You get a working memory agent in minutes.

What to Watch Out For

  • Scale: The repo doesn't claim to solve million-fact memory. At some point, retrieval design matters—whether that's vectors, keyword indexes, or something else. For small-to-medium memory, you're in good shape.
  • Governance: Who can write? What gets merged? How long is it kept? The reference implementation doesn't ship with enterprise policy controls. You add those if you need them.
  • Drift and loops: People have pointed out that the real cost of always-on agents can be drift (memory slowly changing in unwanted ways) and feedback loops. Worth keeping in mind when you design consolidation and retention.

Where I'd Still Use a Vector DB

The line that stuck with me: removing the vector DB doesn't remove retrieval design; it just moves where the complexity lives.

When memory gets very large, you still have to chunk, organize, and retrieve. In the Always On design, that job sits in the model and the consolidation loop. That works well at small-to-medium scale. At large scale—millions of facts, long history, many users—you may still want explicit indexing, stricter retrieval controls, and lifecycle tooling. Vector DBs (or hybrid setups) give you that.

I'd also keep vector search when:

  • Semantic search is the product: The user types a query and you need fast, relevance-ranked results over a big corpus. That's what vector DBs are built for.
  • Compliance and audit matter: Persistent agent memory raises hard questions. Who can write? What gets merged? How long is it kept? How do you audit what the agent "learned"? The repo doesn't (yet) spell out enterprise-grade policy or retention. If you're in a regulated space, you may want stricter control—and that often means a design you own, not a black-box consolidation step.

So: replace vector DBs when the agent's memory is bounded and the goal is simplicity. Keep them (or add them) when scale or governance demands it.

The Governance Question Everyone's Asking

A lot of the discussion around the launch wasn't about speed—it was about drift and control. An agent that "dreams" and merges memories in the background without clear rules can become a compliance nightmare. The real cost of always-on agents isn't just tokens; it's drift and feedback loops.

The real question for enterprises: not whether the agent can remember, but whether it can remember in ways that stay bounded, inspectable, and safe enough to trust in production. The open-source agent is a strong starting point; it's not yet a full governance story. Treat it as a template and add policy, retention, and audit on your side if you need them.

Common Mistakes (I'm Trying to Avoid)

Mistake #1: Using It for Everything

The biggest mistake would be treating Always On as the default for every AI memory use case. It's not. It's great for bounded, always-on agents. It's the wrong tool when you need semantic search over a huge corpus or strict retrieval guarantees. Pick the right tool for the problem.

Mistake #2: Ignoring Consolidation Behavior

Consolidation is where the model "thinks" and merges memory. If you don't tune or at least understand that step, you might get surprising merges, lost details, or drift. Read the prompts. Watch what gets written. Adjust if needed.
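One cheap safeguard (my own pattern, not something the repo ships): log how much each consolidation pass changed, so silent drift shows up in a table instead of going unnoticed.

```python
import sqlite3, json
from datetime import datetime, timezone

# Audit wrapper: record before/after memory counts for every
# consolidation pass, so drift is visible and diffable later.
def audited_consolidate(conn, consolidate_fn):
    before = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    consolidate_fn(conn)
    after = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    record = {"at": datetime.now(timezone.utc).isoformat(),
              "before": before, "after": after}
    conn.execute("CREATE TABLE IF NOT EXISTS audit_log (entry TEXT)")
    conn.execute("INSERT INTO audit_log VALUES (?)", (json.dumps(record),))
    conn.commit()
    return record

# Demo with an in-memory store and a trivial dedupe as the "consolidation".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (content TEXT)")
conn.executemany("INSERT INTO memories VALUES (?)",
                 [("a",), ("a",), ("b",)])

def dedupe(c):
    kept = list(dict.fromkeys(
        r[0] for r in c.execute("SELECT content FROM memories")))
    c.execute("DELETE FROM memories")
    c.executemany("INSERT INTO memories VALUES (?)", [(k,) for k in kept])

rec = audited_consolidate(conn, dedupe)
print(rec["before"], rec["after"])  # prints: 3 2
```

A pass that suddenly drops half the store is exactly the kind of thing you want to see in a log before a user sees it in an answer.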

Mistake #3: Skipping Governance Until Later

If you're in a regulated industry or care about audit, design policy early. What can be written? What gets retained? Who can delete? The repo gives you the mechanics; you add the boundaries.

The Decision Framework

After building with both patterns, here's how I choose:

Choose the Always On pattern (or this repo) if:

  • You're building a support bot, research assistant, internal copilot, or workflow agent.
  • Memory is bounded (thousands, not millions, of facts).
  • You want to ship fast with minimal infra.
  • Cost and simplicity matter more than maximum retrieval precision at scale.

Stick with (or add) a vector DB if:

  • Semantic search over a large corpus is the product.
  • You need strict control over indexing, retention, and audit.
  • Memory scale is large or multi-tenant and you need explicit retrieval design.
  • You're already invested in a RAG stack that works and don't need to simplify.

Consider a hybrid if:

  • You want Always On for agent state and a vector DB for document/search. They can coexist.

Getting Started: My Actual Advice

  1. Clone and run: The repo is on Google Cloud Platform's generative-ai GitHub under gemini/agents/always-on-memory-agent. There's a local HTTP API and a Streamlit dashboard. Get it running locally first.
  2. Feed it something real: Point it at a small set of docs or a sample conversation log. Watch how it ingests and consolidates. Open the SQLite file and see what's in there.
  3. Compare to your current stack: If you already have a vector DB for a small agent, try the same use case with Always On. See which one is simpler to operate and reason about.
  4. Add boundaries if you need them: Retention windows, write permissions, audit logs—design these in if your product or compliance requires it.
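Step 2's "open the SQLite file" can be as simple as the snippet below. The `memory.db` path is an assumption; check the repo's README for the actual filename.

```python
import sqlite3

# Open whatever the agent wrote and look around. The "memory.db" path
# is an assumption -- check the repo for the real filename.
conn = sqlite3.connect("memory.db")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print("tables:", tables)
for t in tables:
    for row in conn.execute(f"SELECT * FROM {t} LIMIT 5"):
        print(t, row)
```

Because the store is a plain SQLite file, this kind of inspection needs no special tooling; the `sqlite3` CLI works just as well.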

The Bottom Line

Google's Always On Memory Agent doesn't replace vector DBs everywhere. It gives you a simpler option when your agent's memory is bounded and you're okay with the model owning read/think/write. For that class of use cases, I've stopped assuming every agent needs embeddings and a vector store. I try Always On first; I reach for vectors when the problem demands it.

At Elephaant, we care about shipping AI features without overbuilding. The Always On Memory Agent is a sign of where agent infra is going: sometimes the right answer isn't more databases—it's a model that can read, think, and write memory on its own. Know when that fits, and when it doesn't. Then build.

Clone the repo. Run it. See how it feels. I think you'll be surprised—I was.

Will you try it?

