Comparing Ways to Give an API to an LLM

If you want a model to call your API, you have four common options. None is best overall; each fits a particular shape of problem.

Spec in the prompt

Paste the OpenAPI document into the system prompt. The model sees every endpoint at once.

Best for: small APIs, prototypes, examples in a tutorial.

Trade-offs: the spec eats tokens on every call. A spec of a few thousand lines usually fits in a modern context window, but you pay for those tokens on every request. The model may also pick the wrong endpoint when many endpoints share a verb or noun.
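
As a minimal sketch of this approach, the snippet below embeds a spec in a system prompt and estimates what it costs in tokens. The two-endpoint spec, the prompt wording, and the four-characters-per-token rule of thumb are all assumptions made for illustration; a real tokenizer will give different counts.

```python
import json

# Hypothetical two-endpoint spec, trimmed to the fields this sketch uses.
SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/users/{id}": {"get": {"summary": "Fetch one user"}},
        "/users": {"post": {"summary": "Create a user"}},
    },
}


def build_system_prompt(spec: dict) -> str:
    """Embed the whole spec in the system prompt as compact JSON."""
    spec_text = json.dumps(spec, separators=(",", ":"), sort_keys=True)
    return (
        "You can call the API described by this OpenAPI document.\n"
        "Reply with the HTTP method and path you would use.\n\n" + spec_text
    )


def rough_token_count(text: str) -> int:
    """Crude estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


prompt = build_system_prompt(SPEC)
print(rough_token_count(prompt), "tokens (rough) sent on every call")
```

Serializing the spec as compact JSON trims whitespace, but the full document is still sent with every request, which is the cost this section describes.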

Retrieval over docs

Index your docs as chunks. At call time, retrieve the chunks most relevant to the user's question and put them in context.

Best for: large docs, frequent updates, multiple products in one doc set.

Trade-offs: retrieval failures are silent. The model gets the wrong page, answers confidently, and the user has no way to know. Quality depends on chunking, embeddings, and the retrieval prompt.
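
One way to make retrieval failures less silent is to keep the similarity score and refuse to pass along chunks that score too low. The sketch below uses bag-of-words cosine similarity as a stand-in for real embeddings; the chunks, the `min_score` threshold, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical doc chunks; a real index would hold many more.
CHUNKS = [
    "POST /invoices creates an invoice for a customer.",
    "Error code 4021 means the invoice currency is not supported.",
    "GET /customers lists customers with pagination.",
]


def vectorize(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, k: int = 2, min_score: float = 0.2) -> list:
    """Return up to k (score, chunk) pairs, dropping anything below min_score."""
    q = vectorize(question)
    scored = sorted(((cosine(q, vectorize(c)), c) for c in chunks), reverse=True)
    return [(score, chunk) for score, chunk in scored[:k] if score >= min_score]


hits = retrieve("What does error code 4021 mean?", CHUNKS)
misses = retrieve("How do I reset my password?", CHUNKS)
```

An empty result, as in `misses`, gives the caller a chance to say "I could not find that in the docs" instead of answering from an unrelated page.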

Function calling and tool schemas

Define each endpoint as a function with a typed schema. The model picks a function and fills in arguments. The runtime makes the actual call.

Best for: high-precision actions, sensitive data, production agents.

Trade-offs: the schema format is constrained (usually JSON Schema). Long lists of tools can confuse the model the same way long specs do. You also have to write or generate one schema per endpoint.
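
The division of labor described above can be sketched as follows: the schema tells the model what arguments exist, and the runtime checks and executes the call. The tool layout follows the common name, description, and JSON Schema pattern rather than any one vendor's exact format, and `get_invoice`, `HANDLERS`, and `dispatch` are hypothetical names.

```python
import json

# Hypothetical tool definition for one endpoint.
GET_INVOICE_TOOL = {
    "name": "get_invoice",
    "description": "Fetch one invoice by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
        "additionalProperties": False,
    },
}


def get_invoice(invoice_id: str) -> dict:
    """Stand-in for the real HTTP call the runtime would make."""
    return {"id": invoice_id, "status": "paid"}


TOOLS = {"get_invoice": GET_INVOICE_TOOL}
HANDLERS = {"get_invoice": get_invoice}


def dispatch(tool_name: str, raw_arguments: str) -> dict:
    """Check the model's chosen tool and arguments, then run the handler."""
    if tool_name not in HANDLERS:
        raise ValueError(f"unknown tool: {tool_name}")
    args = json.loads(raw_arguments)
    schema = TOOLS[tool_name]["parameters"]
    missing = [k for k in schema["required"] if k not in args]
    extra = [k for k in args if k not in schema["properties"]]
    if missing or extra:
        raise ValueError(f"bad arguments: missing={missing} extra={extra}")
    return HANDLERS[tool_name](**args)


result = dispatch("get_invoice", '{"invoice_id": "inv_1"}')
```

The hand-rolled check covers only required and unexpected keys; a production runtime would more likely validate against the full JSON Schema with a dedicated library.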

MCP (Model Context Protocol)

Run an MCP server in front of your API. The server exposes endpoints as tools, plus optional resources and prompts. Any MCP-capable client can use them.

Best for: when you want one integration that works across many clients.

Trade-offs: the protocol is still young and its tooling is maturing. If clients connect over the network, you need to host the server somewhere your users can reach it.
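
To give a feel for what such a server exchanges with a client, here is a toy handler for the two tool-related JSON-RPC methods, `tools/list` and `tools/call`. It is a sketch under heavy assumptions: it skips the initialization handshake, transports, resources, and prompts, and `fetch_invoice` is a hypothetical stand-in for your API. A real server would normally be built on an official MCP SDK.

```python
import json

# One tool descriptor: a name, a description, and a JSON Schema for inputs.
TOOLS = [
    {
        "name": "get_invoice",
        "description": "Fetch one invoice by its ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    }
]


def fetch_invoice(invoice_id: str) -> dict:
    """Hypothetical stand-in for a call to the API behind the server."""
    return {"id": invoice_id, "status": "paid"}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC request; covers only tools/list and tools/call."""
    request_id = request.get("id")

    def error(code: int, message: str) -> dict:
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {"code": code, "message": message},
        }

    method = request.get("method")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != "get_invoice":
            return error(-32602, "Unknown tool")
        invoice = fetch_invoice(params["arguments"]["invoice_id"])
        result = {
            "content": [{"type": "text", "text": json.dumps(invoice)}],
            "isError": False,
        }
    else:
        return error(-32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
```

Because the tool list is returned by the server at request time, every client that speaks the protocol sees the same tools without a per-client integration.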

Token cost, roughly

A back-of-envelope comparison for an API with 30 endpoints:

Spec in prompt: the OpenAPI document might be 8,000 to 15,000 tokens. Every call pays this in the system prompt.
Retrieval: typically 500 to 2,000 tokens of retrieved context per call, plus the embedding lookup cost (cheap, but not free).
Function calling: schemas for 30 tools might total 3,000 to 6,000 tokens. Your runtime can trim the list to the tools relevant to a request, but every tool it does send is paid for on that call.
MCP: comparable to function calling, since the client hands the server's tool schemas to the model as tool definitions.

For low-volume usage, the differences do not matter much. At a hundred thousand calls a month, the gap between "always send the full spec" and "retrieve as needed" can be the difference between a negligible bill and a real one.
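
The arithmetic behind that claim can be sketched as follows, using the token ranges above and a hundred thousand calls a month. The price per million input tokens is a made-up placeholder, not any provider's rate, and the sketch ignores output tokens, the user's own messages, and any prompt caching discounts.

```python
CALLS_PER_MONTH = 100_000
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical placeholder, in dollars

# Per-call overhead ranges taken from the comparison above.
OVERHEAD_TOKENS = {
    "spec in prompt": (8_000, 15_000),
    "retrieval": (500, 2_000),
    "function calling": (3_000, 6_000),
}


def monthly_cost(
    tokens_per_call: int,
    calls: int = CALLS_PER_MONTH,
    price: float = PRICE_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Dollars per month spent on the fixed per-call overhead alone."""
    return tokens_per_call * calls / 1_000_000 * price


for name, (low, high) in OVERHEAD_TOKENS.items():
    print(f"{name}: ${monthly_cost(low):,.0f} to ${monthly_cost(high):,.0f} per month")
```

Under those placeholder numbers, the spec-in-prompt overhead comes to $2,400 to $4,500 a month, against $150 to $600 for retrieval. Swap in your own price and volume before drawing conclusions.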

Picking one

A reasonable default path: spec-in-prompt for prototypes, function calling for production, MCP if you want to reach users across many tools. Retrieval is the right answer when your docs are too big to fit in context; it is not a default.

These are not exclusive. A production agent might use function calling for the actions it can take, and retrieval over docs for everything else. A common hybrid:

Function calling for the ten or so endpoints that perform actions.
Retrieval over the rest of the docs, so the model can answer questions that no tool call covers.

That setup keeps the tool list small (better signal on which tool to pick) and lets the model answer questions like "what does this error code mean" without inventing an endpoint to call.
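
A rough sketch of how a runtime might assemble a request for that hybrid: only action endpoints become tools, and retrieved doc chunks go into the system text. Treating the HTTP method as the marker for "performs an action" is an assumption, and the endpoint list and the request field names are hypothetical rather than any vendor's format.

```python
# Hypothetical endpoint inventory.
ENDPOINTS = [
    {"name": "create_invoice", "method": "POST", "description": "Create an invoice."},
    {"name": "cancel_invoice", "method": "DELETE", "description": "Cancel an invoice."},
    {"name": "list_invoices", "method": "GET", "description": "List invoices."},
]

# Assumption: write methods are the ones that perform actions.
ACTION_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def action_tools(endpoints: list) -> list:
    """Keep only the endpoints that change state; these become tools."""
    return [
        {"name": e["name"], "description": e["description"]}
        for e in endpoints
        if e["method"] in ACTION_METHODS
    ]


def build_request(question: str, endpoints: list, retrieved_chunks: list) -> dict:
    """Combine a short tool list with retrieved reference notes."""
    notes = "\n".join(retrieved_chunks)
    return {
        "tools": action_tools(endpoints),
        "system": (
            "Use a tool only to perform an action. "
            "Answer other questions from the reference notes.\n\n" + notes
        ),
        "user": question,
    }


request = build_request(
    "What does error code 4021 mean?",
    ENDPOINTS,
    ["Error code 4021 means the invoice currency is not supported."],
)
```

Here the read-only `list_invoices` endpoint stays out of the tool list, so the model chooses between two tools instead of three, and the error-code question is answered from the notes.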

A note on freshness

All four approaches have a freshness story:

Spec in prompt is as fresh as the spec you copy into the prompt.
Retrieval is as fresh as your last index build.
Function calling is as fresh as your tool definitions, which are usually generated.
MCP is as fresh as your server, which can return live data.

If your API changes often, lean toward the options that are easiest to regenerate: function calling and MCP both work well when wired to a CI step that updates schemas on every release.
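
Such a CI step might look like the sketch below: regenerate tool schemas from the OpenAPI document and fail the build if they differ from the committed copy. It reads only `operationId`, `summary`, and operation-level `parameters`, and it ignores request bodies and path-level parameters. The function names and the example spec are hypothetical.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

# Hypothetical one-endpoint spec.
EXAMPLE_SPEC = {
    "paths": {
        "/invoices/{id}": {
            "get": {
                "operationId": "get_invoice",
                "summary": "Fetch one invoice",
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "string"}}
                ],
            }
        }
    }
}


def generate_tools(spec: dict) -> list:
    """Build one tool schema per operation, in a stable order."""
    tools = []
    for path, item in sorted(spec["paths"].items()):
        for method, op in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            params = op.get("parameters", [])
            tools.append({
                "name": op["operationId"],
                "description": op.get("summary", ""),
                "parameters": {
                    "type": "object",
                    "properties": {p["name"]: p.get("schema", {}) for p in params},
                    "required": [p["name"] for p in params if p.get("required")],
                },
            })
    return tools


def render(tools: list) -> str:
    """Serialize deterministically so text comparison is meaningful."""
    return json.dumps(tools, indent=2, sort_keys=True) + "\n"


def is_stale(spec: dict, committed_text: str) -> bool:
    """True when the committed schemas no longer match the spec."""
    return render(generate_tools(spec)) != committed_text


committed = render(generate_tools(EXAMPLE_SPEC))
```

In CI, the script would read the committed schema file, call `is_stale`, and exit non-zero when it returns true, so a spec change cannot ship without regenerated tool definitions.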
