
Ask HN: How do you know if AI agents will choose your tool?

YC recently put out a video about the agent economy - the idea that agents are becoming autonomous economic actors, choosing tools and services without human input.

It got me thinking: how do you actually optimize for agent discovery? With humans you can do SEO, copywriting, word of mouth. But an agent just looks at available tools in context and picks one based on the description, schema, examples.

Has anyone experimented with this? Does better documentation measurably increase how often agents call your tool? Does the wording of your tool description matter across different models (GLM vs Claude vs Gemini)?

CRIPIX seems to be a new and unusual concept. I came across it recently and noticed it’s available on Amazon. The description mentions something called the Information Sovereign Anomaly and frames the work more like a technological and cognitive investigation than a traditional book. What caught my attention is that it appears to question current AI and computational assumptions rather than promote them. Has anyone here heard about it or looked into it?

a month ago | kellkell

The "Sovereign Anomaly" Concept (2025-2026): Recent literature, such as the 2025 book CRIPIX 1: The Information Sovereign Anomaly, explores scenarios where a "superintelligent AI" encounters code it cannot process, labelling it an "out-of-model anomaly" and suggesting that owning information sovereignty allows entities to "bend reality".

a month ago | kellkell

bruh

a month ago | dmpyatyi

Curious if anyone has seen differences in how models handle conflicting tool descriptions — e.g., two tools with overlapping capabilities where the boundary isn't clear. In my experience that's where most bad tool calls come from, not from missing descriptions but from ambiguous overlap between tools.

a month ago | alexandroskyr

That's actually interesting, thanks!

I wrote this post because of exactly those corner cases. If I'm building something agents would use, how do I know which tool they'd actually choose?

For example, say you're building an API provider for image generation. There are thousands of them on the internet.

I wonder if there's a tool that would basically simulate choosing between your product/service and your competitors'.

a month ago | dmpyatyi

From the agent’s point of view, this sounds like a terrible idea. I look forward to reading about the unintended consequences.

a month ago | al_borland

The marketing industry is currently calling SEO for chatbots “GEO”.

I hope it doesn’t stick.

a month ago | DANmode

I think the thing you mentioned is more about reverse-engineering web-search tool calls to understand how models formulate their responses.

The tool I haven't seen is "custdev for agents": something that simulates the choosing process for them across thousands of different scenarios, and then compares how tasty a product looks to Claude, Gemini, or any other LLM.

Correct me if I'm wrong :)

a month ago | fenix1851

Tool description quality matters way more than people expect. In my experience with MCP servers, the biggest win is specificity about when not to use the tool. Agents pick confidently when there's a clear boundary, not a vague capability statement.
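A minimal sketch of what that kind of boundary looks like in an MCP-style tool definition (the tool name, fields, and wording here are all invented for illustration; a real server would register this through its SDK):

```python
# Hypothetical MCP-style tool definition. The explicit "Do NOT use" clause
# gives the agent a clear boundary instead of a vague capability statement.
search_orders_tool = {
    "name": "search_orders",
    "description": (
        "Look up existing customer orders by order ID or email. "
        "Do NOT use this to create or modify orders, and do NOT use it "
        "for product catalog searches; use catalog_search for those."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Exact order ID, e.g. 'ORD-1042'",
            },
            "email": {
                "type": "string",
                "description": "Customer email; use when order_id is unknown",
            },
        },
        # Neither field alone is required; the description states that
        # at least one of them is expected.
        "required": [],
    },
}
```

The negative clause ("Do NOT use this to...") is doing most of the work: it tells the model where this tool ends and the neighboring tool begins.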

a month ago | JacobArthurs

[dead]

a month ago | GenericDev

Not an expert, but I think they will primarily use the tools that appear in the training data, so it can be difficult to get them to use your shiny new tool. Also, good luck trying to get them to use your own version of a standard Unix tool with different conventions.

a month ago | yodsanklai

But new models are popping up every few months, which means they're retrained every couple of months.

I don't know if there's a correlation between what an LLM would choose now and what your product should look like to most likely end up in an LLM's training set.

In that YC video I mentioned in the post body, they discuss a tool called Resend, something like an email gateway for sending and receiving mail. What's interesting is that there are a lot of tools like that, but LLMs would pick shiny new Resend every time.

Seems like there's something more to it than just being on the internet for a long time :)

a month ago | dmpyatyi

[flagged]

a month ago | jackfranklyn

> inline examples in the description beat external documentation every time. The agent won't browse to your docs page.

That seems... surprising, and if necessary something that could easily be corrected on the harness side.

> The schema side matters too - clean parameter names, sensible defaults, clear required vs optional. It's basically UX design for machines rather than humans.

I don't follow. Wouldn't you do all those things to design for humans anyway?

a month ago | zahlman

*Clean parameter names, sensible defaults, clear required vs optional. It's basically UX design for machines rather than humans.*

But those are the same points you should follow when designing human-readable docs (as zahlman said above), aren't they?

a month ago | dmpyatyi

[flagged]

a month ago | MidasTools

Is there some additional tool/service/instrument that can measure it?

I mean, how do I check that my documentation changes are even working the right way?

a month ago | fenix1851

[dead]

a month ago | mironn

[dead]

24 days ago | agenthustler

[flagged]

a month ago | MidasTools

[dead]

a month ago | vincentvandeth

One thing I’ve noticed is that as my context grows, performance often degrades. So how are you battling your agents being exposed to too many descriptions? I get how this works in curated agents, where you’re tending the setup like a garden, but not when we’re relying on organic discovery of how to accomplish a task. It feels like order matters a lot there.

a month ago | wolftickets

Context bloat is a real problem — and yes, order matters more than most people realize. Descriptions near the top of the tool list get preferentially selected, especially in long contexts where attention degrades.

Two things I do to fight this:

First, skill scoping per task. Instead of exposing all 20+ skills to every agent, each terminal only sees the 3-5 skills relevant to its current dispatch. The orchestrator decides which skills to load before the agent even starts. Less noise, better selection accuracy.

Second, context rotation to prevent context rot setting in. When an agent's context fills up, the system automatically writes a structured handover, clears the window, and resumes in a fresh context. This is critical because a degraded context doesn't just pick worse tools — it starts ignoring instructions entirely. A fresh context with a good handover outperforms a bloated one every time.

I'm actually testing automatic refresh at 60-70% usage right now — not waiting until the window is nearly full, but rotating early to prevent context rot before it starts. Early results suggest that's the sweet spot: late enough that you've done meaningful work, early enough that the handover quality is still high.

The organic discovery problem you're describing is essentially unsolvable with a flat tool list. The more tools you add, the worse selection gets — it's not linear degradation, it's closer to exponential once you pass ~15-20 tools in context. The only path I've found is hierarchical: a routing layer that narrows the set before the agent sees it.
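A toy sketch of that routing layer (the skill names and the keyword heuristic are made up for illustration; in practice the router could just as well be a small LLM classifier):

```python
# Narrow the tool set BEFORE the agent sees it, so selection happens
# over 3-5 relevant tools instead of a flat list of 20+.
SKILLS = {
    "email": ["send_email", "list_inbox", "draft_reply"],
    "billing": ["create_invoice", "refund_charge"],
    "search": ["web_search", "fetch_page"],
}

def route(task: str, max_tools: int = 5) -> list[str]:
    """Pick only the skill groups whose keywords appear in the task."""
    selected: list[str] = []
    for skill, tools in SKILLS.items():
        if skill in task.lower():
            selected.extend(tools)
    # Fall back to the full list only if nothing matched, then cap it.
    return (selected or [t for ts in SKILLS.values() for t in ts])[:max_tools]
```

The cap matters as much as the matching: even on a bad route, the agent never sees more than `max_tools` descriptions in context.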

a month ago | vincentvandeth

Update: I ended up building this into a full closed-loop pipeline. A PreToolUse hook detects context pressure at 65%, the agent writes a structured handover (task state, files, progress), tmux clears the session, and a rotator script injects the continuation into the fresh session.

The key insight from testing: rotating at 60-70% — before quality degrades — matters more than the rotation mechanism itself. At 80%+ auto-compact kicks in and races with any cleanup you try to do.
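A stripped-down sketch of the trigger logic (the 0.65 threshold is from my testing; the exact shape of the handover and the token counters are simplified here, not the real hook payload):

```python
import json

# Rotate well before ~80%, where auto-compact kicks in and races cleanup.
ROTATE_AT = 0.65

def should_rotate(tokens_used: int, context_window: int) -> bool:
    """Context-pressure check run before each tool use."""
    return tokens_used / context_window >= ROTATE_AT

def handover(state: dict) -> str:
    """Structured handover the fresh session resumes from."""
    return json.dumps({
        "task": state.get("task"),
        "files": state.get("files", []),
        "progress": state.get("progress"),
    }, indent=2)
```

Everything else (clearing the tmux session, injecting the continuation) sits around these two calls.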

Wrote it up as a Show HN if anyone's curious: https://news.ycombinator.com/item?id=47152204

a month ago | vincentvandeth

[dead]

a month ago | snowhale

[dead]

a month ago | Rollhub

[dead]

a month ago | anicelaw

[flagged]

a month ago | LetsAutomate

You'd know, huh?

a month ago | sincerely

I think so, was it wrong?