Every morning I open my inbox, and sitting at the top is an email that didn't exist 24 hours ago. It tells me what happened overnight in AI, flags the one TechCrunch story worth reading, and reminds me there's a free art walk in the Mission this weekend. It reads like a note from a well-informed friend — not an algorithm.
I built this myself. Three times, actually. Each iteration taught me something different — not just about code, but about a word I kept seeing everywhere and didn't fully understand: agent.
This post is about that word. I'm going to walk you through three versions of the same project — a personal daily news digest — and use the progression to explore what "agency" actually means in the context of AI systems. Not the marketing definition. The practical one.
A companion post, Part 2: The Technical Playbook, covers the implementation details, code, bugs, and cost analysis for anyone who wants to build something similar.
The Itch I Was Trying to Scratch
I read a lot. Every morning I'd have tabs open across Simon Willison's blog, the TLDR newsletter in my inbox, TechCrunch, Product Hunt, and Lenny's Newsletter. I also wanted to know about cheap events happening in SF over the weekend.
The problem: triaging these sources was eating 20–30 minutes every morning before I even started reading the interesting stuff. I wanted a single email in my inbox at 7am that had already done the triage for me: curated, not comprehensive.
Commercial tools like Feedly or Morning Brew exist, but they don't know me. They can't combine my exact set of sources, apply my personal interest filters (AI agents > VC funding > SF events), or write in the voice of a knowledgeable friend who tells you only what actually matters.
So I built it. And in doing so, I accidentally designed an experiment in what "agency" really means.
v1.0 — "It Works, Ship It"
The first version was the simplest thing that could possibly work: a single Python script running on a GitHub Actions cron job. Every morning at 7am PST, it would fetch my RSS feeds, pull newsletters from Gmail over IMAP, call a free LLM through OpenRouter to summarize each source, assemble an HTML email, and send it via Resend.
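For anyone who wants the shape of it, here's a minimal sketch of that pipeline, with one RSS source and the IMAP and HTML-templating parts left out. The feed list, model name, and email addresses are placeholders, not what the real script uses.

```python
# v1.0 in miniature: a fixed, deterministic pipeline where the LLM is just a
# summarization step. Feed list, model name, and addresses are placeholders.
import os
import feedparser
import requests
from openai import OpenAI

FEEDS = ["https://simonwillison.net/atom/everything/"]  # ...plus the other sources

llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

def summarize(title: str, body: str) -> str:
    resp = llm.chat.completions.create(
        model="meta-llama/llama-3.3-70b-instruct:free",  # any free OpenRouter model works
        messages=[{"role": "user",
                   "content": f"Summarize this article in two sentences:\n\n{title}\n\n{body}"}],
    )
    return resp.choices[0].message.content

sections = []
for url in FEEDS:
    for entry in feedparser.parse(url).entries[:5]:
        summary = summarize(entry.title, entry.get("summary", ""))
        sections.append(f"<h3>{entry.title}</h3><p>{summary}</p>")

# Deliver the assembled HTML via Resend's REST API.
requests.post(
    "https://api.resend.com/emails",
    headers={"Authorization": f"Bearer {os.environ['RESEND_API_KEY']}"},
    json={"from": "digest@example.com", "to": ["me@example.com"],
          "subject": "Your morning digest", "html": "".join(sections)},
)
```

GitHub Actions just runs this on a cron schedule; the full setup is in the companion post.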
It worked. Emails arrived reliably. Cost was zero. And honestly, for a weekend project, that felt like a win.
But the output was... flat. If TechCrunch had ten boring articles, all ten got summarized. If Simon Willison and TLDR both covered the same Anthropic announcement, they appeared as separate items with no awareness of each other. The digest felt like a firehose, not a morning briefing.
Here's the question that nagged me: was this an "agent"? It ran automatically. It made decisions (sort of). It acted on my behalf every morning without me touching it.
The answer, I'd later realize, is no — and understanding why took me through two more versions.
The Word Everyone Uses and Nobody Agrees On
"Agent" is one of those terms that has been stretched to mean almost anything. Your email autoresponder is an "agent." Your Roomba is an "agent." The thing that books your flights autonomously and negotiates hotel rates — also an "agent." These are clearly not the same thing.
When I started this project, I had a misconception that I think a lot of people share: I equated automatic with autonomous. If something acts on your behalf without you pressing a button, it's an agent, right?
Not really. My thermostat acts on my behalf automatically, but nobody calls it an agent (at least not in the sense that's making waves in AI right now). A cron job that fires every morning is automatic. A macro that reformats your spreadsheet is automatic. But automatic is not the same as autonomous, and deterministic is not the same as agentic.
This is the insight I missed early on. It's not about the ingredients. You can have an LLM in your pipeline, combine it with RAG and prompt engineering and web search, and still not have an agentic system. What makes it agentic isn't what tools you use — it's the role the LLM plays in your workflow. Is it a tool being called, or is it the one making the calls?
The Spectrum of Agency
I found a framework that helped me think about this more clearly. Sam Bhagwat describes levels of agency on a spectrum, and mapping my three versions onto it was revealing:
At the low end, agents make binary choices in a decision tree. At a medium level, agents have memory, call tools, and retry failed tasks. At a high level, agents do planning — they divide tasks into subtasks and manage their own task queue.
My v1.0 was barely on this spectrum at all. It was purely automatic: a deterministic pipeline where every step was pre-programmed. The LLM was a summarization tool, nothing more.
v1.5 — Adding a Curation Brain
The first meaningful improvement was adding a curation layer. After fetching all sources, the script would make one additional LLM call with all the raw content and ask it to score each item on a 0-to-1 relevance scale based on a user profile I'd defined in a YAML config file. Items below a 0.6 threshold got cut. High-scoring items got longer summaries. The LLM also wrote a 2–3 sentence editorial intro framing the day's top theme.
The results were noticeably better. The digest felt more like reading a friend's curated picks than a raw dump of everything published that day. Funcheap SF events about art walks in Oakland got filtered out (low relevance for someone interested in AI). The editorial intro surfaced cross-source themes like "OpenClaw is dominating the conversation today."
The pipeline went from fetch → summarize → send to fetch → curate → summarize (tiered) → send. A new user_profile.yaml file defined my interests, and the LLM used it to make relevance decisions. But every decision point was still pre-programmed in Python. The LLM was doing more sophisticated work, but it was still being told exactly when and how to do it.
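If you're curious what that curation step looks like in code, here's a rough sketch. The profile schema, prompt wording, and JSON scoring format are stand-ins for illustration (the real versions are in the companion post); the mechanics are the point: one extra LLM call, a 0.6 cutoff, and a score that later decides how long each summary gets.

```python
# v1.5's curation layer: one extra LLM call scores every fetched item against
# the user profile, and anything under the threshold is dropped.
# The YAML schema and prompt here are illustrative, not the production versions.
import json
import os
import yaml
from openai import OpenAI

llm = OpenAI(base_url="https://openrouter.ai/api/v1",
             api_key=os.environ["OPENROUTER_API_KEY"])

with open("user_profile.yaml") as f:
    profile = yaml.safe_load(f)  # e.g. interests: [AI agents, LLM tooling], deprioritize: [VC funding]

def curate(items: list[dict], threshold: float = 0.6) -> list[dict]:
    listing = "\n".join(f"{i}. {it['title']}: {it['summary'][:300]}"
                        for i, it in enumerate(items))
    prompt = (
        "You are curating a personal morning digest.\n"
        f"Reader profile: {json.dumps(profile)}\n"
        "Score each item 0.0-1.0 for relevance to this reader. "
        'Reply with JSON only: [{"id": 0, "score": 0.8}, ...]\n\n' + listing
    )
    resp = llm.chat.completions.create(
        model="meta-llama/llama-3.3-70b-instruct:free",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns bare JSON; the real script needs more defensive parsing.
    scores = {s["id"]: s["score"] for s in json.loads(resp.choices[0].message.content)}
    scored = [dict(it, score=scores.get(i, 0.0)) for i, it in enumerate(items)]
    return [it for it in scored if it["score"] >= threshold]  # high scorers get longer summaries downstream
```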
Was v1.5 an agent? It was closer. The LLM was now making editorial decisions — which items to include, how much detail to give each one, what theme to highlight. But those decisions were happening at fixed, pre-determined points in a rigid pipeline. If TechCrunch's RSS URL changed, the whole section silently broke. There was no ability to adapt, recover, or try something different.
On the spectrum of agency, v1.5 sits at "tool-augmented." The LLM makes some choices, but only at pre-defined checkpoints. It's smarter, but it's still on rails.
v2.0 — Letting the Agent Decide
Version 2 was a fundamentally different architecture. Instead of a hand-coded pipeline telling the LLM what to do at each step, I gave the agent a system prompt, a set of CLI tools, and a goal — then let it figure out the workflow on its own.
This version uses the Claude Agent SDK with Claude Sonnet as the orchestrator. The agent has access to five CLI tools — RSS fetcher, IMAP fetcher, Exa web search, email sender, and a logging tool — and a system prompt that describes the task, editorial standards, and how to handle failures.
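To make "the agent figures out the workflow" concrete, here's what that orchestration amounts to. My v2 leans on the Claude Agent SDK, which handles the loop and tool plumbing for you, so the sketch below uses the raw Anthropic Messages API instead, with two simplified stand-in tools: the model sees a goal and a tool catalog, and the Python code just executes whatever calls it chooses until it decides it's done.

```python
# The heart of v2: the model, not my Python, decides which tools to call and in
# what order. The real version uses the Claude Agent SDK, which wraps a loop
# like this around my five CLI tools; this stripped-down sketch uses the raw
# Anthropic Messages API with two stand-in tools to show the mechanics.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You produce a personal morning digest email. ..."  # goal, editorial standards, failure handling

TOOLS = [
    {"name": "fetch_rss",
     "description": "Fetch and parse an RSS/Atom feed, returning recent items.",
     "input_schema": {"type": "object",
                      "properties": {"url": {"type": "string"}},
                      "required": ["url"]}},
    {"name": "send_email",
     "description": "Send the finished HTML digest to the reader.",
     "input_schema": {"type": "object",
                      "properties": {"subject": {"type": "string"},
                                     "html": {"type": "string"}},
                      "required": ["subject", "html"]}},
]

def run_tool(name: str, args: dict) -> str:
    # Dispatch to the real fetchers/senders here; stubbed out for the sketch.
    return f"(stub) ran {name} with {args}"

messages = [{"role": "user", "content": "Produce this morning's digest and email it to me."}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # substitute whichever Sonnet model is current
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        tools=TOOLS,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        break  # the agent has decided it's finished
    messages.append({"role": "assistant", "content": resp.content})
    results = [{"type": "tool_result", "tool_use_id": block.id,
                "content": run_tool(block.name, block.input)}
               for block in resp.content if block.type == "tool_use"]
    messages.append({"role": "user", "content": results})
```

Everything interesting (retries, deduplication, the decision to skip a broken source) happens inside the model's turns, not in the Python loop.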
The difference in output was striking. Where v1.5 would produce a digest organized by source (Simon Willison section, TLDR section, TechCrunch section), v2 groups items by theme. If three different sources all covered the same story, the agent synthesizes them into a single item with context from each. It writes an editorial intro that doesn't just summarize — it offers an opinion on what's worth your time.
But the most interesting difference isn't in the output — it's in the behavior. When a source fails, the agent can search the web for equivalent coverage using Exa's neural search. When it encounters a newsletter that's paywalled or poorly structured for scraping (I'm looking at you, Lenny's Newsletter), it notes the limitation and moves on instead of producing garbage. When multiple sources cover the same story, it genuinely deduplicates and synthesizes instead of repeating itself.
And the most telling detail: when things went wrong on one morning's run (IMAP authentication broke, RSS parsing hit an error, web search returned empty), the agent adapted its strategy, produced a digest from what it could access, and included a yellow notice explaining the technical issues. It made a judgment call about what was still worth sending — and it was the right call.
So What Does Agency Actually Buy You?
After running all three versions, here's what I think the real value of the "agentic layer" is. It's not about making things automatic — v1 was already automatic. It's about three specific capabilities:
1. Adaptive error handling
In a deterministic pipeline, if step 3 fails, the pipeline fails (or silently skips). In an agentic system, the agent can notice the failure, try an alternative approach, and decide what's still worth producing. This is genuinely useful for anything that depends on flaky external sources — which is basically every real-world data pipeline.
2. Cross-source synthesis
My v1 and v1.5 digests were organized by source because the pipeline processed each source independently. The agent in v2 sees all the content at once and groups it by theme. This sounds small, but it's the difference between reading seven separate summaries and reading a coherent briefing. This is editorial judgment, and the LLM is genuinely good at it when you give it the right context and instructions.
3. Editorial voice and judgment
The v2 system prompt tells the agent to write like "a knowledgeable friend who reads everything and tells you only what actually matters." It's a simple instruction, but it produces output that feels qualitatively different from a summarization pipeline. The agent's intro to one morning's digest read: "OpenClaw is dominating the conversation today, with practical implementation guides competing against skeptical takes on whether the hype matches reality. The more interesting thread runs through cognitive debt." That's not summarization — that's curation with a point of view.
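For a sense of what drives that behavior, here's a condensed sketch of the kind of system prompt v2 runs on. It's a reconstruction for illustration, not the prompt verbatim, but the voice instruction quoted above and the failure-handling guidance are the real ingredients.

```python
# A condensed, paraphrased sketch of v2's system prompt, not the verbatim text.
SYSTEM_PROMPT = """\
You produce a personal morning digest email.

Voice: write like a knowledgeable friend who reads everything and tells the
reader only what actually matters. Open with a 2-3 sentence editorial intro
that takes a position on what's worth their time today.

Editorial standards:
- Group items by theme, not by source. If several sources cover the same
  story, synthesize them into one item with context from each.
- Cut low-relevance items entirely rather than padding the digest.

Failure handling:
- If a source is unreachable or badly structured, look for equivalent
  coverage with web search; otherwise note the limitation and move on.
- If several things break, still send the best digest you can, with a short
  notice explaining the technical issues.
"""
```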
What Stronger Agency Would Look Like
My v2 sits at the "adaptive" to "planning" level on the agency spectrum. The agent calls tools, handles errors, and makes editorial decisions. But it doesn't learn from my behavior over time, and it doesn't set its own goals.
A v3 — if I build it — would move further right on the spectrum. It would track which digest items I actually click on and adjust future relevance scoring. It would notice that I stopped reading TechCrunch funding stories and dial them down. It might proactively suggest new sources based on topics I've been engaging with. It would have memory across runs, not just within a single execution.
At the far right of the spectrum — true autonomous agency — the system would set its own information-gathering goals, seek out sources I've never heard of, and potentially even draft analysis or talking points based on emerging trends it noticed before I did. We're not there yet, and for a personal morning digest, we probably don't need to be. But mapping your project onto this spectrum is a useful exercise for knowing what level of complexity is actually warranted.
What I Actually Learned
I started this project wanting a better morning email. I ended up with a working mental model for something I see debated constantly in AI circles: what counts as an agent and what doesn't.
The framework I'd offer to anyone building with LLMs is simple: before reaching for an agentic architecture, ask yourself what role the LLM needs to play. If it's filling a fixed slot in your pipeline — summarize this, classify that, extract these fields — you don't need an agent. A well-designed deterministic pipeline with an LLM step will be cheaper, faster, and more predictable.
But if your workflow needs to adapt to unpredictable inputs, make judgment calls about quality, synthesize across multiple sources, or gracefully handle failure — that's where agency starts earning its keep. Not because it's fancier, but because the alternative is writing increasingly brittle if/else trees that try to anticipate every edge case.
The honest answer is that most workflows probably don't need agents today. But the ones that do benefit enormously — and learning to recognize the difference is a skill worth developing.
"Agent" is not a binary. It's a spectrum. And the right question isn't "should I use an agent?" — it's "what level of agency does my problem actually require?" Start with the simplest thing that works, then add agency where it demonstrably improves outcomes. I know this because I built the simple thing first, and the progression taught me exactly where agency mattered and where it was overkill.