Applied AI Thinking for Operators · Part 2 of 2

The Technical Playbook: Building a Personal AI Digest from Scratch

Claude Agent SDK, Exa Search, free vs. paid models, every bug I hit, and the architectural decisions that made it work. Full code included.

In Part 1, I explored what "agency" actually means by building three versions of a personal daily digest — each with a different level of autonomy. This post is the technical companion: how I built it, what tools I used, every bug I hit, and the trade-offs I navigated along the way.

If you want to build something similar — or if you're evaluating whether agentic architectures are worth the complexity for your own projects — this is the honest playbook.

The Stack: What I Used and Why

Here's how the tech stack evolved across three versions. The progression tells a story about the trade-offs between cost, quality, and complexity:

| Component | v1.0 | v1.5 | v2.0 |
| --- | --- | --- | --- |
| Orchestration | Hand-coded pipeline | Hand-coded + curation layer | Claude Agent SDK |
| LLM | OpenRouter free tier | Claude Haiku (direct) | Claude Sonnet (agent) + Haiku (fallback) |
| RSS | feedparser | feedparser | tools/fetch_rss.py |
| Email fetch | imaplib | imaplib | tools/fetch_imap.py |
| Web search | None | DuckDuckGo scraping (broken in CI) | Exa Search API (neural) |
| Email send | Resend | Resend | Resend |
| Scheduling | GitHub Actions cron | GitHub Actions cron | GitHub Actions cron |
| Fallback | None | None | fallback.py (auto-triggered) |
| Cost/day | $0 | ~$0.018 | ~$0.03–0.05 |

A few decisions deserve explanation.

Free vs. Paid Models: The $0.018/Day That Changed Everything

v1 used OpenRouter's free tier to access open-source models like Llama 3.3 70B, DeepSeek R1, and Gemma 3. Free is appealing. But in practice, the free tier was a source of constant friction: rate limits kicked in unpredictably mid-pipeline, models went offline for maintenance, and quality was inconsistent between model families.

Free models comparison showing 429 rate limit errors across all sections
A bad day on the free tier: every single section hit OpenRouter's 429 rate limit. The entire digest was empty. This happened unpredictably and was the final straw that pushed me to paid APIs.

Switching to Anthropic's API directly with Claude Haiku cost about $0.018/day — but the difference was night and day. All the links actually worked, the summaries were coherent and properly structured, and inference was fast and consistent. On $25 in Anthropic credits, estimated runway is roughly 3.8 years. For a personal project, that's effectively free.

Advice I wish I'd taken earlier (and which echoes what Sam Bhagwat recommends): start with a hosted provider like Anthropic, OpenAI, or Google Gemini. Even if you think you'll need open-source eventually, prototype with cloud APIs first, or you'll be debugging infrastructure issues instead of iterating on your actual product. Once you get something working, you can optimize cost later.
~$6.50/year
Total cost of Claude Haiku for daily digest runs. On $25 credit, that's ~3.8 years of runway.
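The runway numbers above are straightforward arithmetic, shown here as a quick sanity check:

```python
daily_cost = 0.018          # observed per-run cost with Claude Haiku
annual_cost = daily_cost * 365
credits = 25.0              # initial Anthropic credit

runway_years = credits / annual_cost
print(f"${annual_cost:.2f}/year, {runway_years:.1f} years of runway")
# → $6.57/year, 3.8 years of runway
```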

The Search API Landscape: Why Exa Won

Web search was one of the trickiest components. I needed it for two things: filling gaps when RSS sources failed, and finding coverage of stories that weren't in my usual sources.

My first attempt used DuckDuckGo HTML scraping — which worked perfectly on my laptop and completely broke in GitHub Actions. The reason: GitHub Actions IP ranges are well-known bot traffic sources, and DuckDuckGo (reasonably) blocks them with 403 errors. Lesson learned: never rely on HTML scraping for anything that runs in CI.

I looked at Brave Search API next — but discovered they'd recently discontinued their free tier. Current pricing starts at $3 per 1,000 queries with no free option.

That left Exa, which turned out to be the best option by a wide margin. Exa offers 1,000 free neural search queries per month (more than enough for 3 searches/day), returns content summaries alongside results (not just titles and URLs), and uses semantic search rather than keyword matching. For finding conceptually relevant AI news, semantic search is noticeably better.

Here's the core search implementation (full source):

import os

import requests


def search_exa(query: str, limit: int = 5) -> list[dict]:
    api_key = os.environ["EXA_API_KEY"]  # assumes the key is set in the environment
    resp = requests.post(
        "https://api.exa.ai/search",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        json={
            "query": query,
            "numResults": limit,
            "useAutoprompt": True,   # Exa rewrites your query for neural search
            "type": "neural",        # semantic search, not keyword
            "contents": {"summary": {"query": query}},  # get snippets back
        },
        timeout=15,
    )
    resp.raise_for_status()  # surface HTTP errors instead of failing silently
    return resp.json().get("results", [])

The v2 Agent Architecture: System Prompt as the Brain

The most interesting architectural decision in v2 is that the agent's intelligence lives in a text file — config/system_prompt.txt — not in Python code. The system prompt tells the agent what tools it has and how to call them, the editorial standards to maintain, and how to handle failures. The Python code (agent.py) is just plumbing that launches the agent and catches crashes.

v1.0 architecture diagram — simple linear pipeline
v1.0: A simple linear pipeline. The LLM is called at one fixed point to summarize. Every step is hardcoded in Python.
v1.5 architecture diagram — pipeline with curation layer
v1.5: Same pipeline structure, but with a curation layer and user profile config. The LLM does more work, but still at pre-determined points.
v2.0 architecture diagram — agent orchestrator with tool access
v2.0: Fundamentally different. The agent decides which tools to call, in what order. The system prompt and user profile are its "brain." A fallback pipeline catches agent failures.

This separation means you can improve the digest quality by editing a text file, not rewriting code. That's a meaningful shift in how you maintain an LLM application. When I wanted the agent to stop including generic Product Hunt listings, I added one line to the system prompt. When I wanted it to be more opinionated in its editorial intro, I tweaked the voice description. No code changes, no redeployment (the system prompt is read at runtime).
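To make this concrete, editorial rules in the prompt look something like the following — a hypothetical excerpt in the spirit of the real file, not its actual contents:

```text
# Editorial standards (illustrative excerpt)
- Skip generic Product Hunt listings unless they relate to the reader's
  high-priority interests.
- Open with a short, opinionated editorial intro in a direct first-person voice.
- If a source fails, note the gap and try a web search before giving up.
```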

The other key config is config/user_profile.yaml:

interests:
  high_priority:
    - AI agents and agentic systems
    - LLM application architecture and best practices
    - AI research breakthroughs and new model releases
  medium_priority:
    - venture capital and startup funding rounds
    - product management practices
  low_priority:
    - general tech news (only if significant)
    - cheap/free SF activities

content_rules:
  max_digest_items: 20
  min_relevance_score: 0.6
  prefer_themes_over_sources: true
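A sketch of how `content_rules` might be applied, assuming each candidate item carries a relevance score in [0, 1] (the scoring itself is the LLM's job):

```python
def apply_content_rules(items: list[dict], rules: dict) -> list[dict]:
    """Drop items below min_relevance_score, then cap at max_digest_items."""
    kept = [i for i in items if i["relevance"] >= rules["min_relevance_score"]]
    kept.sort(key=lambda i: i["relevance"], reverse=True)  # best first
    return kept[: rules["max_digest_items"]]

rules = {"min_relevance_score": 0.6, "max_digest_items": 20}
items = [{"title": "A", "relevance": 0.9}, {"title": "B", "relevance": 0.4}]
print([i["title"] for i in apply_content_rules(items, rules)])
# → ['A']
```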

The Fallback Pattern: Why Every Agent Needs a Dumb Backup

This might be the most important architectural decision in the entire project, and it's the one I'd recommend to anyone building agentic systems: build the deterministic fallback first.

fallback.py is a complete, standalone pipeline that produces an acceptable digest without any agent involvement. It uses Claude Haiku directly (no Agent SDK), follows the same source list, and outputs a properly formatted HTML email. If the agent crashes, hangs, or produces garbage, the fallback runs automatically.

The GitHub Actions workflow implements this cleanly:

# .github/workflows/digest.yml
- name: Run agent
  id: agent
  run: python agent.py
  continue-on-error: true   # don't fail the workflow if agent fails

- name: Run fallback if agent failed
  if: steps.agent.outcome == 'failure'
  run: python fallback.py

Building the fallback first had an unexpected benefit: it gave me confidence to experiment aggressively with the agent layer. I could try wild system prompt changes, swap models, add new tools — knowing that if anything broke, tomorrow's email would still arrive.
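The same try-primary, run-backup pattern is easy to mirror for local runs — a minimal sketch, assuming `agent.py` and `fallback.py` are both invocable scripts:

```python
import subprocess
import sys

def run_with_fallback(primary: str, fallback: str, timeout: int = 600) -> str:
    """Run the primary script; on any failure or hang, run the fallback.

    Returns which path produced the digest ("agent" or "fallback").
    """
    try:
        subprocess.run([sys.executable, primary], check=True, timeout=timeout)
        return "agent"
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        subprocess.run([sys.executable, fallback], check=True)
        return "fallback"
```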

Every Bug That Bit Me (The Honest Account)

This section is the most valuable for anyone trying to replicate this. These are real problems I encountered, and some of them were surprisingly subtle.

Bug 1: Python's email module and lazy loading

Symptom: AttributeError: module 'email' has no attribute 'message' when parsing Gmail messages.

Python's email package uses lazy loading. When you write import email, it makes the package available, but not all its submodules. If you use a type annotation like email.message.Message, Python tries to resolve the submodule at function definition time — and fails because it hasn't been explicitly imported.

# This breaks:
import email
def _extract_body(msg: email.message.Message) -> str:  # AttributeError!
    ...

# This works:
import email
import email.message  # explicit submodule import
def _extract_body(msg: email.message.Message) -> str:
    ...

Bug 2: TechCrunch's broken RSS feed

Symptom: Failed to parse feed: not well-formed (invalid token) for the TechCrunch venture feed.

The topic-specific TechCrunch feed at /tag/venture/feed/ returns malformed XML. It's not a feedparser bug — the feed itself is broken. The fix was simple: use the main feed (techcrunch.com/feed/) instead and let the curation layer filter for relevant stories. Not all RSS feeds are valid XML, even from major publishers. Always test your feed URLs with curl before committing them.
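The 30-second pre-commit check can even skip feedparser entirely — the stdlib XML parser surfaces the same "not well-formed" failures:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """True if the feed body parses as XML; malformed feeds fail here."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<rss><channel><title>ok</title></channel></rss>"))  # → True
print(is_well_formed("<rss><channel>unclosed"))                           # → False
```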

Bug 3: DuckDuckGo blocks GitHub Actions IPs

Symptom: Web search returned empty results in production but worked locally.

DuckDuckGo detects and blocks traffic from GitHub Actions IP ranges. This failed silently — no error, just empty results. Replaced with Exa Search API which uses proper API authentication and works everywhere. The broader lesson: never rely on HTML scraping for anything that runs in CI/CD.

Bug 4: Agent SDK hangs on machines without AVX CPU support

Symptom: Running agent.py locally hung indefinitely on the BashTool pre-flight check.

The Claude Agent SDK's BashTool runs a security check using a bundled binary that requires AVX CPU instructions. On machines without AVX, it hangs instead of failing with a clear error. Workaround: use fallback.py --dry-run for local previews. In GitHub Actions (Linux x64 with full AVX support), the agent runs fine.

Watch out: Unintended API spending

When I told the agent to start using my Anthropic API credits, it switched to a paid Claude model within OpenRouter rather than using the Anthropic API directly. This incurred charges on my OpenRouter account — which I hadn't attached a payment method to, resulting in a negative balance. The amount was negligible this time, but it could have been significant with higher usage. The lesson: be explicit about which API endpoint and billing account to use, and set spending limits as guardrails. Agents will take the most direct path to fulfilling your request, which may not be the path you intended.

Content Sources: What Worked and What Didn't

Simon Willison's blog was the easiest source — clean RSS feed, consistent format, high signal-to-noise ratio. RSS is genuinely the best protocol for automated content consumption and I wish more publishers maintained theirs.

TLDR newsletter worked well through IMAP (pulling from Gmail), though parsing email HTML into structured content required some effort.
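The flattening step can be done with the stdlib's `html.parser` — a simplified sketch of the kind of helper involved (the real pipeline does more cleanup):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    p = _TextExtractor()
    p.feed(html)
    return " ".join(p.parts)

print(html_to_text("<p>Top links <b>today</b></p><style>p{}</style>"))
# → Top links today
```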

TechCrunch was middling — the main RSS feed works, but topic-specific feeds are unreliable.

Lenny's Newsletter was the most frustrating. The content is paywalled and nested under several layers of Substack's link structure. The agent's assessment in one run was accurate: "I can see this is Lenny's Newsletter's introduction/masthead, but it doesn't contain the actual content." Newsletters that aren't structured for programmatic access are inherently difficult for automated digestion.

Product Hunt and Funcheap SF returned usable data but required aggressive curation — most items weren't relevant, which is exactly the kind of filtering the curation layer and agent were designed to handle.

GitHub Actions: The Best Free Server You're Not Using

The entire project runs on GitHub Actions — no EC2, no Lambda, no Render, no Vercel. For personal automations that run once or twice a day, GitHub Actions is genuinely underrated as free infrastructure. Free tier limits: 2,000 minutes/month for private repos, unlimited for public repos. My daily run takes 2–4 minutes, so even on a private repo I'd use roughly 120 minutes/month — well under the limit.
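The scheduling itself is a few lines of workflow config — the time below is illustrative, not the project's actual send time:

```yaml
# .github/workflows/digest.yml — trigger block (illustrative schedule)
on:
  schedule:
    - cron: "0 14 * * *"   # 14:00 UTC daily; cron here is always UTC
  workflow_dispatch: {}     # allow manual runs for debugging
```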

Cost Analysis

| Version | Daily cost | Annual cost | What you get |
| --- | --- | --- | --- |
| v1.0 | $0 | $0 | Flat summaries, no curation, rate limit failures |
| v1.5 | ~$0.018 | ~$6.50 | Curated, themed, tiered relevance |
| v2.0 | ~$0.03–0.05 | ~$11–18 | Agentic curation, cross-source synthesis, adaptive error handling |

The jump from v1 to v1.5 — going from free to $6.50/year — delivered the single biggest quality improvement. The jump from v1.5 to v2 added meaningful capabilities but the marginal quality improvement per dollar was smaller. My take: the $0.018/day for Haiku is the highest-ROI investment in this project.

What I'd Do Differently

Test RSS URLs before committing them. I spent real debugging time on the TechCrunch URL that would have taken 30 seconds to verify with curl.

Use proper APIs from the start, not scraping. DuckDuckGo scraping worked locally and broke in production. Exa cost me nothing but saved a full debugging cycle.

Build the fallback pipeline first. It's what gives you confidence to iterate aggressively on the agent layer, knowing you'll always have a working backup.

Be explicit about billing endpoints. When you tell an agent to "use Anthropic's API," make sure it can't interpret that as "use Anthropic's models through a third-party router."

Start with hosted models, then optimize. Prototype with the most capable model you can afford, get the system working, then tune for cost.

Try It Yourself

Both versions are open source:

v1/v1.5: github.com/sumoseah/daily-digest — The main branch is v1, the development branch is v1.5.

v2 (agentic): github.com/sumoseah/daily-digest-v2 — The full agentic version with Claude Agent SDK, Exa search, and fallback pipeline.

To get started, you'll need: a GitHub account (free), an Anthropic API key ($5 minimum credit), a Resend account (free tier), and optionally an Exa API key (free tier, 1,000 queries/month). Fork the repo, add your API keys to GitHub Secrets, edit user_profile.yaml with your interests and sources, and push. Your first digest will arrive the next morning.