Token-aware message truncation: fit a chat history into your model's context budget.
agentfit · v0.1.0
by MukundaKatta
agentfit-mcp
MCP server for @mukundakatta/agentfit. Lets Claude Desktop, Cursor, Cline, Windsurf, Zed, or any other MCP client estimate token counts and fit a chat history into a model's context budget on demand.
npx -y @mukundakatta/agentfit-mcp
Three tools:
count_tokens: estimate tokens in a string or chat-message array, with per-model estimator families (openai, anthropic, google, llama, default).
fit_messages: drop messages from a chat history until it is under a maxTokens budget. Supports drop-oldest, drop-middle, and priority strategies; honors preserveSystem, preserveFirstN, and preserveLastN.
list_estimators: list the built-in estimator families.
Add to your client
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "agentfit": {
      "command": "npx",
      "args": ["-y", "@mukundakatta/agentfit-mcp"]
    }
  }
}
Cursor
~/.cursor/mcp.json:
{
  "mcpServers": {
    "agentfit": {
      "command": "npx",
      "args": ["-y", "@mukundakatta/agentfit-mcp"]
    }
  }
}
Cline / Windsurf / Zed
Same shape as above. The server speaks plain MCP over stdio, so any client that supports stdio MCP servers will work.
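For readers wiring up a client by hand, here is a rough sketch of what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, tools are invoked with the `tools/call` method, and the stdio transport sends one JSON message per line. The helper names `buildToolCall` and `frame` are hypothetical, and a real client would first complete the MCP `initialize` handshake, which is omitted here.

```javascript
// Hypothetical helper: builds one MCP "tools/call" request as a JSON-RPC 2.0 message.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical helper: the stdio transport is newline-delimited, so each
// message is serialized onto a single line and terminated with "\n".
function frame(message) {
  return JSON.stringify(message) + "\n";
}

const request = buildToolCall(1, "count_tokens", {
  input: "hello world",
  model: "claude-sonnet-4-6",
});
const line = frame(request);
```

In practice an MCP client library handles this framing for you; the sketch only shows why any stdio-capable client can talk to the server.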
Tool examples
count_tokens:
{ "input": "hello world", "model": "claude-sonnet-4-6" }
Returns:
{ "tokens": 4, "model": "claude-sonnet-4-6" }
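To illustrate how a model name could be mapped to one of the five estimator families, here is a minimal sketch. The family names come from the tool description above; the prefix rules and the function name `familyFor` are assumptions for illustration and may not match how the server actually resolves models.

```javascript
// Assumed prefix rules (not taken from the agentfit source).
const FAMILY_PREFIXES = [
  ["gpt-", "openai"],
  ["claude-", "anthropic"],
  ["gemini-", "google"],
  ["llama", "llama"],
];

// Hypothetical resolver: returns the family for a model name,
// falling back to "default" when nothing matches.
function familyFor(model) {
  const name = String(model ?? "").toLowerCase();
  for (const [prefix, family] of FAMILY_PREFIXES) {
    if (name.startsWith(prefix)) return family;
  }
  return "default";
}
```

Under these assumed rules, `familyFor("claude-sonnet-4-6")` resolves to `"anthropic"`, and an unrecognized name falls back to `"default"`.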
fit_messages:
{
  "messages": [
    { "role": "system", "content": "You are precise." },
    { "role": "user", "content": "long context..." },
    { "role": "assistant", "content": "..." },
    { "role": "user", "content": "final question" }
  ],
  "maxTokens": 8000,
  "model": "claude-sonnet-4-6",
  "preserveSystem": true,
  "preserveLastN": 2,
  "strategy": "drop-oldest"
}
Returns:
{
  "messages": [...],
  "dropped": [...],
  "tokens": { "before": 12000, "after": 7800, "budget": 8000 },
  "fit": true
}
fit_messages always returns a structured result and never throws across the wire: if the budget is unreachable even after dropping all non-protected messages, you get fit: false with the partial result so the caller can decide what to do.
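The drop-oldest behavior and the fit: false fallback can be sketched roughly as follows. This is not the library's implementation: `fitDropOldest` is a hypothetical name, and the 4-characters-per-token estimator is a placeholder assumption standing in for the real per-model estimators. Only the option names and the result shape are taken from the examples above.

```javascript
// Placeholder estimator (assumption): roughly 4 characters per token.
const estimate = (msg) => Math.ceil(msg.content.length / 4);

// Hypothetical sketch of a drop-oldest strategy.
function fitDropOldest(messages, { maxTokens, preserveSystem = true, preserveLastN = 0 }) {
  const before = messages.reduce((sum, m) => sum + estimate(m), 0);
  const firstProtectedTail = messages.length - preserveLastN;

  const kept = [];
  const dropped = [];
  let running = before;

  // Walk from oldest to newest, dropping unprotected messages while over budget.
  messages.forEach((m, i) => {
    const isProtected =
      (preserveSystem && m.role === "system") || i >= firstProtectedTail;
    if (running > maxTokens && !isProtected) {
      dropped.push(m);
      running -= estimate(m);
    } else {
      kept.push(m);
    }
  });

  // Never throws: an unreachable budget is reported as fit: false.
  return {
    messages: kept,
    dropped,
    tokens: { before, after: running, budget: maxTokens },
    fit: running <= maxTokens,
  };
}

const history = [
  { role: "system", content: "You are precise." },
  { role: "user", content: "x".repeat(400) },
  { role: "assistant", content: "y".repeat(400) },
  { role: "user", content: "final question" },
];

const result = fitDropOldest(history, { maxTokens: 120, preserveLastN: 1 });
if (!result.fit) {
  // The caller decides what to do: summarize, raise the budget, or abort.
  console.warn(`over budget by ${result.tokens.after - result.tokens.budget} tokens`);
}
```

Returning a flag instead of throwing keeps the partial result available, so the caller can still send the trimmed history or fall back to another approach.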
Why a separate MCP server
@mukundakatta/agentfit is a zero-dependency JavaScript library. This package wraps it as an MCP server so it's accessible from inside any MCP-aware AI assistant: ask Claude "how many tokens is this transcript?" or "trim this chat to 8k tokens preserving the system prompt and last 2 turns" and the assistant calls these tools directly.
Sibling MCP servers
Part of the agent-stack series, all @mukundakatta/*-mcp:
@mukundakatta/agentfit-mcp: Fit it. (this package)
@mukundakatta/agentguard-mcp: Sandbox it.
@mukundakatta/agentsnap-mcp: Test it.
@mukundakatta/agentvet-mcp: Vet it.
@mukundakatta/agentcast-mcp: Validate it.
License
MIT