I took over marketing at Warmly in February. Last quarter, our pipeline was under a million dollars. Last month, it 3x'd. Same headcount. Lower spend.
The thing that did it wasn't a single tool. It was learning to stop waiting for signals and start forcing pipeline through.
I empathize with anyone trying to generate demand right now. In a world where SaaS is going under and every rep wants more meetings with less budget, the old playbook breaks. You can't wait for 6sense to light up an account. You can't wait for Bombora to show a surge. You can't wait for a sales rep to notice an alert in Salesforce and decide to action it. By the time any of that happens, the prospect is three days deep into evaluating a competitor.
The fix is an AI SDR agent that decides and acts on its own, 24 hours a day, across every channel you're willing to pay for.
This post is a real decision trace from the AI SDR agent we run at Warmly. One signal, one account, the actual reasoning. I'll show you every tool call. I'll show you the three things the agent decided not to do. I'll tell you what's hard about building this, why most AI SDR software still sucks, and what I still get wrong.
If you're evaluating AI SDR software this quarter, this is the level of depth you should be demanding from every vendor on your list.
The one idea that changed everything: force pipeline
Most outbound tools are signal-driven. They wait for a buying committee to tip its hand. A new hire. A Bombora surge. A jobs posting. Then they fire an email or send an alert to a rep.
That playbook is fine when you have 100,000 monthly visitors. It's broken when you're a startup with 3,000 visitors a month or a growth-stage company with a stalling funnel. The math doesn't work: you don't have enough signals, and you're fighting over the same 200 accounts everyone else is targeting.
The fix isn't more signals. It's more volume. Productive volume, not spray and pray.
Here's the constraint framing I walk prospects through on every call:
- Your ad budget is finite. You can run $50K/month in paid social before diminishing returns.
- Your email inbox capacity is finite. Each mailbox can send ~1,000 sequenced emails/week before Google flags you.
- Your LinkedIn send limit is hard-capped. 25 invites per account per day. Period.
Those three resources are the real TAM. Your goal isn't to have better signals than your competitor. It's to max out productive volume across every channel you can afford, then layer signals on top to prioritize. Signals are the ranking function. Volume is the surface area.
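If you want to sanity-check your own numbers, the capacity math is trivial. Here's a sketch using the example figures above; the mailbox and seat counts are made-up inputs, not a recommendation:

```python
# Weekly productive-volume ceiling before ads, using the post's example
# limits. MAILBOXES and SEATS are hypothetical inputs for illustration.

MAILBOXES = 5        # sending inboxes you own
SEATS = 3            # LinkedIn accounts on the team

email_per_week = MAILBOXES * 1_000   # ~1,000 sequenced emails/mailbox/week
linkedin_per_week = SEATS * 25 * 5   # 25 invites/account/day, 5 weekdays

weekly_capacity = email_per_week + linkedin_per_week
print(weekly_capacity)  # 5375
```

Whatever your real numbers are, that ceiling is fixed. Signals decide which accounts fill it; they can't raise it.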
Everything I'm about to describe is built around this idea. The AI SDR agent isn't optimizing a lead alert. It's orchestrating maximum productive volume across ads, email, and LinkedIn, with signals deciding what goes where.
How an AI SDR agent makes decisions: the 3-second trace
Signal hits at 11:47am PT on a Tuesday. An account on our watchlist showed a Bombora surge this morning, and a new VP of Sales was announced on LinkedIn 3 weeks ago. Here's what the agent does, in order.
(Fictional placeholder domain used below. Real traces live inside our customer workspaces.)
Second 1: Read the context
Tool 1. get_account_detail.
The agent queries our context graph. The full account record comes back in 82 milliseconds: industry (wholesale distribution), headcount (180), tech stack (Salesforce + Outreach), ICP tier (Tier 1), current intent score (187 out of 500), last 30 days of activity.
Domain: acme-supply.com.
Tool 2. get_account_events.
The agent pulls the event ledger. Last 90 days: 4 website visits, 2 case study downloads, a Bombora surge on "revenue orchestration," a new VP of Sales hired 3 weeks ago, a job posting for an SDR manager.
That's a buying committee reorganizing. That's intent.
Tool 3. get_score_breakdown.
The agent asks how the score of 187 was computed. Website behavior (72 points), research intent (40), new hire signal (35), job postings (20), social engagement (20). Not a black box. Every point is traceable.
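A traceable score is nothing exotic: it's a sum of named factors, so any number can be decomposed on demand. A minimal sketch (the factor names mirror the breakdown above; the schema is illustrative, not Warmly's actual one):

```python
# An intent score that can always be explained: a sum of named factors.
# Factor names and weights mirror the breakdown in the text (hypothetical schema).

breakdown = {
    "website_behavior": 72,
    "research_intent": 40,
    "new_hire_signal": 35,
    "job_postings": 20,
    "social_engagement": 20,
}

score = sum(breakdown.values())
assert score == 187  # every point accounted for, no black box
```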
Second 2: Check the rules
Tool 4. check_cooldown.
Have we emailed anyone at this account in the last 14 days? The agent finds we sent a sequence to the old VP of Sales 9 days ago. He's gone. Cooldown applies to the email address, not the company. Cleared to act.
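The key detail is what the cooldown is keyed to. A sketch of that rule, assuming a simple address-to-timestamp store (the function and field names are hypothetical):

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(days=14)

def in_cooldown(email: str, last_emailed: dict, now: datetime) -> bool:
    """Cooldown is keyed to the email address, not the company domain."""
    last = last_emailed.get(email)
    return last is not None and now - last < COOLDOWN

now = datetime(2026, 4, 14)
last_emailed = {"old.vp@acme-supply.com": now - timedelta(days=9)}

# The departed VP's address is still cooling down; the new VP's is clear.
in_cooldown("old.vp@acme-supply.com", last_emailed, now)  # True
in_cooldown("new.vp@acme-supply.com", last_emailed, now)  # False
```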
Tool 5. get_pattern_rules.
The agent pulls the policy. For a Tier 1 account with intent above 150 and a fresh executive hire, what are we allowed to do? The rules say: build buying committee, write sequence with new-exec angle, push to SDR queue for manual approval.
Tool 6. get_trust_scores.
The agent checks its own trust rating for this action type. In plain English: scores run from 0 to 1, and if the score meets the action's threshold, the action goes through automatically. Below that, it routes to a human for approval. For "send email sequence to new account" on this account, our trust score is 0.78 against a threshold of 0.85. Needs review.
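The gating logic itself is simple; what's earned is the score, not the rule. A sketch using the per-action thresholds mentioned in this post (the routing function is illustrative):

```python
# Per-action trust thresholds from the post; the routing logic is a sketch.
THRESHOLDS = {
    "push_linkedin_audience": 0.40,
    "send_email_sequence": 0.85,
    "update_icp_policy": 0.95,
}

def route(action: str, trust: float) -> str:
    """Auto-execute only when earned trust clears the action's threshold."""
    return "execute" if trust >= THRESHOLDS[action] else "human_review"

route("send_email_sequence", 0.78)     # 'human_review'
route("push_linkedin_audience", 0.78)  # 'execute'
```

Same trust score, different outcomes, because cheap reversible actions get lower bars than actions that can burn a domain.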
This is the part most AI SDR demos skip.
Tool 7. build_account_buying_committee.
The agent goes and builds the committee. LinkedIn enrichment (Vetric) plus firmographic data (Clearbit). Six people come back: new VP of Sales, CRO, Director of RevOps, a Sales Ops Manager, two SDR Managers. Each gets a persona tag: Decision Maker, Champion, Influencer, User.
Tool 8. get_account_contacts.
The agent verifies the committee is written back to the workspace and every contact has a valid business email. Email quality scored against our email-validity classifier. Five out of six pass. One gets flagged for a bounce check.
Second 3: Act (and restrain)
Three paths diverge.
| Path | Action | Outcome |
| --- | --- | --- |
| A | Write and send emails autonomously | Blocked. Trust 0.78 < threshold 0.85. Needs human review. |
| B | Add domain to LinkedIn retargeting audience | Executed. Threshold 0.40. Zero incremental cost. |
| C | Generate email batch for human review | Executed. Queued for morning approval. |
Tool 9. push_linkedin_audience.
The domain gets added to the LinkedIn retargeting audience. The new VP sees a Warmly ad in his feed this afternoon. Cost: zero incremental.
Tool 10. generate_email_batch.
The agent writes 6 emails. Each references the specific persona, the hiring signal, and the Bombora surge. The new VP's email opens: "Congrats on the new role. Noticed the team started researching revenue orchestration the week you joined. Probably not a coincidence." Specific. Falsifiable. Not "Hope this finds you well."
Tool 11. get_batch_push_preflight.
Preflight checks run. Do the emails pass spam filters? Are personas correctly assigned? Is committee coverage complete? Yes to all three.
Tool 12. log_decision.
The full decision trace gets written to the ledger. Context snapshot, policy version, reasoning, factors, confidence, tools invoked, and what it decided not to do. Immutable. Every decision our agent makes is auditable after the fact.
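One way to make a ledger like this tamper-evident is to hash-chain each record to the previous one. A minimal sketch; the field names are hypothetical, not Warmly's actual schema:

```python
import hashlib
import json

# Append-only decision ledger, sketched. Each record embeds the previous
# record's hash, so edits anywhere in the chain are detectable.

def log_decision(ledger: list, entry: dict) -> dict:
    prev = ledger[-1]["hash"] if ledger else "genesis"
    payload = json.dumps({"prev": prev, **entry}, sort_keys=True)
    record = {**entry, "prev": prev,
              "hash": hashlib.sha256(payload.encode()).hexdigest()}
    ledger.append(record)
    return record

ledger = []
log_decision(ledger, {
    "ts": "2026-04-14T11:47:03Z",
    "account": "acme-supply.com",
    "policy_version": "v42",
    "actions": ["push_linkedin_audience", "generate_email_batch"],
    "declined": ["slack_ae", "heyreach_push", "autonomous_send"],
    "confidence": 0.78,
})
```

Note that the declined actions are logged alongside the executed ones: "what it decided not to do" is part of the trace.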
Total time from signal hit to logged decision: 2.7 seconds.
The three things the agent decided NOT to do
This is the part that separates an agent from an automated sequence. Restraint is the feature.
It did not Slack the AE. A VP of Sales for a RevOps company told me on a call last month: "If you just have an alert that says so-and-so visited our website, the reps aren't going to do anything. They never do." He's right. Alerts are noise by default. Our agent only pings Slack when the intent score crosses 200 and there's a warm contact on file. This account hit 187. One page view plus a hiring signal isn't Slack-worthy.
It did not push to HeyReach or a LinkedIn outreach sequence. Policy: for accounts where we haven't had a direct touchpoint yet, start with ads and email. LinkedIn outreach gets reserved for warmer signals. Save the 25/day LinkedIn send budget for accounts where someone has actually replied.
It did not send the emails autonomously. Trust score 0.78, below 0.85. The batch went to the work queue. A human rep reviews in the morning, approves in 30 seconds, and the sequence fires.
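Restraint rules like these are just explicit predicates. A sketch of the two mentioned above, with hypothetical function and field names:

```python
# The restraint rules above as predicates (a sketch; names are hypothetical).

def should_slack_ae(intent_score: int, has_warm_contact: bool) -> bool:
    # Alerts only fire past an intent score of 200 with a warm contact on file.
    return intent_score > 200 and has_warm_contact

def should_push_linkedin(has_replied: bool) -> bool:
    # Reserve the 25/day LinkedIn budget for accounts that actually replied.
    return has_replied

should_slack_ae(187, True)  # False: this account stays quiet for now
```

Writing the rules down this plainly is the point: a rep can read them, argue with them, and change them.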
Most AI SDR software measures success by how much it did. The right question is whether it did the right thing. Sometimes the right thing is wait.
Why Clay alone isn't enough (the static spreadsheet problem)
Every prospect I've talked to in the last 60 days has asked some version of: how is this different from Clay?
Fair question. Clay is a great tool. If all you need is contact data and a one-time list build, go buy Clay. I'd use it too.
But Clay is a static spreadsheet. It doesn't feel alive. You pull the data, enrich it, push it to a sequence, and from that point forward it starts decaying. The contact changes jobs. The company raises a round. A new buying committee member joins. Clay doesn't know. The list you built three weeks ago is already wrong.
An AI SDR agent layers live signals on top of every contact, continuously. It re-scores accounts as new events fire. It re-ranks buying committees as people move. It skips the old VP of Sales who left and adds the new one automatically.
Clay is sourcing. An AI SDR agent is orchestration. You still need sourcing. But sourcing is table stakes in 2026, and Clay's own pricing strategy (they keep dropping the floor) tells you it's getting commoditized. The defensible layer is the live signal graph on top.
The 65 tools a real AI SDR agent uses
If you're shopping for AI SDR software, ask the vendor for their tool list. Below is ours, grouped. A real agent calls across these in a single reasoning loop. A fake agent has 5 tools and a hopeful prompt.
| Category | Tool count | What they do |
| --- | --- | --- |
| GTM Query | 7 | Account lookup, events, contacts, memory, buying committee |
| Decision / Trust | 4 | Log decisions, check cooldowns, trust scores, pattern rules |
| Email / Outreach | 6 | Generate emails, push to Outreach, HeyReach, Salesloft |
| Ad Audiences | 4 | LinkedIn, Meta, YouTube audience pushes |
| Batch Work Queue | 15 | Review, approve, reject, preflight, push |
| Policy / Config | 13 | ICP rules, persona rules, policy simulation, reclassification |
| Research | 10 | Web search, document search, transcript analysis, LinkedIn lookup |
| Control Plane | 16 | Agent status, run traces, scheduled actions, ledger replay |
The tools matter. The chaining matters more. Our SDR agent routinely invokes 10 to 15 of these in a single decision. That's what "agentic outbound" means. Everything else is marketing.
How the agent gets smarter every week
Every decision gets logged with a trace ID. Every outcome (reply, meeting booked, deal closed, unsubscribe, bounce) gets logged with the same trace ID. Over time, you can ask: when the agent made this kind of decision, what happened?
The learning loop:
- Decision. Full context snapshot, policy version, tools used, reasoning, confidence.
- Outcome. Reply? Meeting? Bounce? Unsubscribe? Revenue attribution?
- Grading. Automatic (reply = positive, bounce = negative) plus human review on ambiguous cases.
- Policy update. Weights adjust. New rules propose themselves. Old rules get deprecated.
- Better decisions. Next week's runs use the updated policy.
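The grade-and-update step can be sketched in a few lines. The outcome labels and the weight nudge below are illustrative; real grading mixes automatic rules with human review on ambiguous cases:

```python
# Sketch of the learning loop's grading step: nudge a scoring weight
# toward outcomes that worked. Labels and learning rate are illustrative.

GRADES = {"reply": +1, "meeting": +1, "bounce": -1, "unsubscribe": -1}

def update_weight(weight: float, outcome: str, lr: float = 0.05) -> float:
    """Move a policy weight up on positive outcomes, down on negative ones."""
    return weight + lr * GRADES.get(outcome, 0)

w = 1.0
for outcome in ["reply", "reply", "bounce", "meeting"]:
    w = update_weight(w, outcome)
# w is now ~1.10: three positives, one negative
```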
This is not RAG. RAG retrieves documents. This retrieves the outcome of every decision the system has ever made, and uses those outcomes to decide what to do next.
Critical mass happens around 100 graded decisions. That's when the system reaches roughly 90% agreement with human judgment on "was this the right call." For most customers, 2 to 4 weeks of active use.
The result: the agent running today isn't the same agent that ran last Tuesday. Same code. Different policy layer. New ICP rules. Updated scoring weights. A messaging angle that stopped converting is now deprecated. The version number changes, but quietly.
This is agent memory doing actual work. Not a vector DB full of chat transcripts. A causal graph between decisions and outcomes.
Why most AI SDR software still fails
Every prospect I talk to has tried an AI SDR product that flopped. I've heard specific stories from marketing leaders across B2B SaaS, services, and mid-market ops teams. The pattern is always the same.
They bought an AI SDR that just auto-drafted emails. A CMO who tried one of the big AI SDR tools last year told me she had to let her team go because the output was so bad it damaged deliverability across her whole domain. She's still dealing with the spam score hangover a year later.
They bought an intent tool that alerted a rep. A revenue leader told me: "If the alert isn't actionable, the rep won't click it. And they never click it." Alert fatigue is a real problem for your own team's attention, not just your prospects' inboxes.
They bought Clay and expected orchestration. Clay isn't orchestration. It's sourcing. People pick Clay, build a list, push it to one sequence, and then wonder why nothing compounds.
The three failure modes share a common cause: no real tool chaining, no decision layer, no feedback loop. The "AI" is window dressing on top of a CSV export.
Why autonomous SDR agents are hard to build
Let me spare you the "we pioneered" routine. Here's what's actually hard.
Account identification is a nightmare. You need seven data sources because no single vendor gets it right. Clearbit misses 30% of B2B traffic. Bombora is great at intent but useless for person identification. We spent 18 months on a streaming pipeline that stitches this together with smart window closing, late data handling, and shadow A/B testing across premium vs. economy resolution modes. This is distributed systems work, not prompt engineering.
The context graph is harder than it looks. 40M+ company profiles. 400M+ person profiles. An immutable event ledger handling 1.28M+ signals per day. We sync 15 million records to the database every day. Entity resolution, deduplication, making sure every record is live and ready at inference time. Every query has to come back in under 100ms for the fast projection, under 5 seconds for medium, under 30 seconds for deep. pgvector isn't fast enough. Pure Postgres isn't structured enough. We ended up with computed columns that compress 1,000 raw events into 5 meaningful scores, because no agent can reason over 1,000 events in a 3-second decision window.
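The "computed columns" idea is worth making concrete: collapse a raw event stream into a handful of weighted scores the agent can actually reason over inside its decision window. A sketch with illustrative event names and weights:

```python
from collections import Counter

# Compress a raw event stream into a few scores per signal type.
# Event names and weights are illustrative, not Warmly's actual schema.

WEIGHTS = {"page_view": 2, "case_study_download": 10,
           "bombora_surge": 40, "exec_hire": 35, "job_posting": 20}

def compress(events: list) -> dict:
    counts = Counter(events)
    return {e: counts[e] * w for e, w in WEIGHTS.items()}

events = (["page_view"] * 4 + ["case_study_download"] * 2
          + ["bombora_surge", "exec_hire", "job_posting"])
compress(events)
# {'page_view': 8, 'case_study_download': 20, 'bombora_surge': 40,
#  'exec_hire': 35, 'job_posting': 20}
```

The agent never sees the 1,000 raw events at inference time, only the compressed columns; the ledger keeps the raw stream for audit.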
Trust gates are where most AI SDR tools die. Letting an AI fire email sequences autonomously is how you end up on a deliverability blacklist. We built a graduated trust system. The agent starts with low trust, earns it through good decisions, and different actions have different thresholds. Adding a domain to a LinkedIn audience is trust 0.40. Sending an email sequence is 0.85. Updating ICP policy is 0.95. Most startups building "autonomous SDR agents" skip this entirely, which is why they're not actually autonomous. They're just fast.
The one thing we still get wrong: new verticals. When we onboard a customer in a market we haven't seen much of (vertical SaaS in industries like maritime logistics, say), the first month is rough. The ICP classifier doesn't know what it doesn't know. Our policies were tuned on tech B2B and they miss the nuances. We're getting better at cold-starting new verticals, but we're not there yet. If your GTM motion is weird, expect a ramp.
"Why not just build this in Claude Code?"
A VP of Engineering at a holding company asked me this directly on a call last week. Reasonable question. Claude Code is good. A smart eng team can spin up a prototype that hits the Bombora API, enriches with Clearbit, drafts an email with Claude, and pushes to Outreach. In a week.
Here's what that prototype doesn't have:
- Deduplication across 15 million daily records. The same person shows up with different emails, different LinkedIn URLs, different companies. Resolving identity is a full-time team.
- A 14-day cooldown logic that handles job changes mid-sequence.
- Trust scores that learn from actual outcomes.
- An immutable ledger of every decision so you can actually debug what the agent did last Tuesday.
- Deliverability guardrails that stop the agent from nuking your domain reputation when it spins up.
- A buying committee builder that actually works across 40M companies without LinkedIn scraping you into a ban.
It's really easy to spin something up. It's very hard to make it production-ready. We've been building this for three years. If you're an ops person with 20 hours to spare and no infra team, the math on "build vs buy" becomes obvious quickly.
What prospects actually ask about AI SDR software
From the last 60 days of sales calls, every prospect asks some flavor of these. If the vendor you're evaluating can't answer them cleanly, move on.
"How often is your contact data updated?" Ours re-scrapes on every account interaction. People always boast about contact count. Ask about freshness.
"What happens if your trust score blocks an action I want to take?" You should be able to override. Trust gates are defaults, not jail cells. You stay in control.
"Can I see the logs of what the agent actually did?" If the vendor doesn't have a ledger view, run. This is the #1 diagnostic tool when something goes sideways.
"How do credits work?" Credit pricing is the most confusing part of the AI tool category right now. Demand a breakdown: what costs what, what's unlimited, what triggers overages. If the vendor's pricing page has the word "usage-based" without a calculator, they're trying to hide something.
"Is my data portable? Can I access the context graph via API?" You need an exit path. If the answer is "contact sales for API access," treat that as a future lock-in problem.
"What's your retention?" Anyone can win a customer in the AI hype cycle. Keeping them is the only credibility that matters. We run 114% net retention. Ask every vendor on your shortlist. Compare.
What to demand from any AI SDR software vendor
You're going to buy AI SDR software this year. Probably several products. Here's what to look for.
Can it show you a decision trace? If you can't see the 12 tools it called and the reasoning between them, it's a black box. Black boxes become liabilities when deliverability complaints start. Demand a ledger.
Can it decide NOT to do things? If every feature is about "generating more," run. Restraint is harder than generation. Ask how many of the agent's runs end in "no action taken."
Does it get smarter, or just louder? Ask to see a decision from 3 months ago and the same type of decision from last week. If the reasoning hasn't changed, the agent isn't learning. It's iterating on prompts.
Does it have real tools, or just LLM calls? An agent with 5 tools is a sequence tool. An agent with 65 tools that chain based on reasoning is an operator. Ask for the tool list.
Is it trust-gated? Ask what the agent does autonomously vs. what it escalates. If the answer is "everything is autonomous," the vendor is lying or reckless.
Can it explain a score? If the agent scores an account 187/500 and can't break that number down, the score is vibes. Real scores are traceable.
Is the company going to be around in 3 years? AI is compressing. Every month another "AI SDR" launches. The tools that survive will be the ones with real retention and real infrastructure behind them. Ask about net dollar retention, runway, and customer count growth. Don't trust pitch decks. Ask for references.
The AI SDR era isn't about replacing SDRs. It's about replacing the lookup tables and rules engines that have been pretending to be intelligence for a decade. The companies that figure this out in 2026 will compound. The ones still measuring "AI success" by message volume will look like the 2010 companies that measured email marketing by opens.
See your own decision trace
I run Warmly's AI SDR agent on our own pipeline every day. Every signal, every account, every decision, logged and auditable. If you want to see what it would do on your accounts, book 20 minutes with our team. We'll pull a real decision trace from your pipeline on the call. No canned demo. No slides. Just the agent, running on your accounts.
Not ready for a demo? Start here:
Last Updated: April 2026