Fetch URL MCP Server · v1.10.0
j0hanz
A web content fetcher MCP server that converts HTML to clean, AI- and human-readable Markdown.
Overview
The Fetch URL MCP Server provides a standardized interface for fetching public web content and transforming it into Markdown enriched with structured metadata. It validates URLs, applies noise removal heuristics, and caches results for reuse. The server supports both inline and task-based execution modes, making it suitable for a wide range of client applications and LLM interactions.
Key Features
- `fetch-url` validates public HTTP(S) URLs, fetches the page, and returns cleaned Markdown plus structured metadata.
- The tool advertises optional task support and emits progress updates while fetching and transforming larger pages.
- GitHub, GitLab, Bitbucket, and Gist page URLs are rewritten to raw-content endpoints when possible before fetch.
- `internal://instructions` and `internal://cache/{namespace}/{hash}` expose built-in guidance and cached Markdown as MCP resources.
- HTTP mode adds host/origin validation, auth, rate limiting, health checks, OAuth protected-resource metadata, and cached-download URLs.
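The raw-content rewrite can be pictured with a minimal sketch for the GitHub case only. The helper name `toRawGitHubUrl` is hypothetical and this is not the server's actual implementation; it assumes the common `/{owner}/{repo}/blob/{ref}/{path}` page layout and a ref without slashes.

```typescript
// Hypothetical helper: rewrite a GitHub "blob" page URL to its raw-content URL.
// Returns the input unchanged when the URL does not match the expected shape.
function toRawGitHubUrl(input: string): string {
  const url = new URL(input);
  if (url.hostname !== "github.com") return input;

  // Expected path: /{owner}/{repo}/blob/{ref}/{path...}
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length < 5 || parts[2] !== "blob") return input;

  const [owner, repo, , ref, ...rest] = parts;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${rest.join("/")}`;
}

console.log(
  toRawGitHubUrl("https://github.com/octocat/Hello-World/blob/master/README"),
);
// https://raw.githubusercontent.com/octocat/Hello-World/master/README
```

Branch names that contain slashes are ambiguous in this layout, which is one reason a rewrite like this can only apply "when possible".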
Requirements
- Node.js >=24 (from `package.json`)
- Docker is optional; it is only needed to run the published container image.
Quick Start
Use this standard MCP client configuration:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
Client Configuration
Install in VS Code
Add to .vscode/mcp.json:
{
"servers": {
"fetch-url-mcp": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
Or install via CLI:
code --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'
For more info, see VS Code MCP docs.
Install in VS Code Insiders
Add to .vscode/mcp.json:
{
"servers": {
"fetch-url-mcp": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
Or install via CLI:
code-insiders --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'
For more info, see VS Code Insiders MCP docs.
Install in Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Cursor MCP docs.
Install in Visual Studio
For solution-scoped setup, add this to .mcp.json at the solution root:
{
"servers": {
"fetch-url-mcp": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Visual Studio MCP docs.
Install in Goose
Add to ~/.config/goose/config.yaml on macOS/Linux or %APPDATA%\Block\goose\config\config.yaml on Windows:
extensions:
  fetch-url-mcp:
    name: fetch-url-mcp
    cmd: npx
    args: ['-y', '@j0hanz/fetch-url-mcp@latest']
    enabled: true
    type: stdio
    timeout: 300
For more info, see Goose extension docs.
Install in LM Studio
Add to ~/.lmstudio/mcp.json on macOS/Linux or %USERPROFILE%/.lmstudio/mcp.json on Windows:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see LM Studio MCP docs.
Install in Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Claude Desktop MCP docs.
Install in Claude Code
Use the CLI:
claude mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
For project-scoped config, Claude Code writes .mcp.json with:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
"env": {}
}
}
}
For more info, see Claude Code MCP docs.
Install in Windsurf
Add to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Windsurf MCP docs.
Install in Amp
Add to ~/.config/amp/settings.json on macOS/Linux, %USERPROFILE%\.config\amp\settings.json on Windows, or .amp/settings.json for workspace-scoped config:
{
"amp.mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
Or install via CLI:
amp mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
For more info, see Amp docs.
Install in Cline
Open the MCP Servers panel, choose Configure MCP Servers, and add this to cline_mcp_settings.json:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Cline MCP docs.
Install in Codex CLI
Use the CLI:
codex mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest
Or add this to ~/.codex/config.toml or project-scoped .codex/config.toml:
[mcp_servers.fetch-url-mcp]
command = "npx"
args = ["-y", "@j0hanz/fetch-url-mcp@latest"]
For more info, see Codex MCP docs.
Install in GitHub Copilot
Add to .vscode/mcp.json:
{
"servers": {
"fetch-url-mcp": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see GitHub Copilot MCP docs.
Install in Warp
Open Personal > MCP Servers in Warp, choose + Add, and either add a CLI server with:
- command: `npx`
- args: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
Or paste this JSON snippet when using Warp's multi-server import flow:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Warp MCP docs.
Install in Kiro
Use Kiro's MCP Servers panel or the Add to Kiro install flow. Kiro stores workspace-scoped MCP config in .kiro/settings/mcp.json and user-scoped config in ~/.kiro/settings/mcp.json.
For this server, use:
- command: `npx`
- args: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
For more info, see Kiro MCP docs.
Install in Gemini CLI
Add to ~/.gemini/settings.json:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Gemini CLI MCP docs.
Install in Zed
Add to ~/.config/zed/settings.json:
{
"context_servers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
"env": {}
}
}
}
For more info, see Zed MCP docs.
Install in Augment
Use the Augment Settings panel and either add the server manually or choose Import from JSON:
{
"mcpServers": {
"fetch-url-mcp": {
"command": "npx",
"args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
}
}
}
For more info, see Augment MCP docs.
Install in Roo Code
Use Roo Code's MCP Servers UI or marketplace flow.
For this server, use:
- command: `npx`
- args: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
For more info, see Roo Code docs.
Install in Kilo Code
Use Kilo Code's MCP Servers UI or marketplace flow.
For this server, use:
- command: `npx`
- args: `["-y", "@j0hanz/fetch-url-mcp@latest"]`
For more info, see Kilo Code docs.
Use Cases
- Fetch documentation pages, blog posts, or reference material into Markdown before sending them to an LLM.
- Retrieve repository-hosted content from GitHub, GitLab, Bitbucket, or Gists and let the server rewrite page URLs to raw endpoints when possible.
- Reuse cached Markdown through `internal://cache/{namespace}/{hash}` or bypass the cache with `forceRefresh` for time-sensitive pages.
- Use task mode for large pages or slower sites when the inline response would otherwise be truncated or delayed.
Architecture
[MCP Client]
├─ stdio -> `src/index.ts` -> `startStdioServer()` -> `createMcpServer()`
└─ HTTP (`--http`) -> `src/index.ts` -> `startHttpServer()` -> HTTP dispatcher
├─ `GET /health`
├─ `GET /.well-known/oauth-protected-resource`
├─ `GET /.well-known/oauth-protected-resource/mcp`
├─ `GET /mcp/downloads/{namespace}/{hash}`
└─ `POST|GET|DELETE /mcp`
`createMcpServer()`
├─ registers tool: `fetch-url`
├─ registers prompt: `get-help`
├─ registers resources:
│ - `internal://instructions`
│ - `internal://cache/{namespace}/{hash}`
├─ enables capabilities: completions, logging, resources, prompts, tasks
└─ installs task handlers, log-level handling, and shutdown cleanup
`fetch-url` execution
├─ validate input with `fetchUrlInputSchema`
├─ normalize URL and block local/private targets unless allowed
├─ rewrite supported code-host URLs to raw endpoints when possible
├─ fetch and cache content via the shared pipeline
├─ transform HTML into Markdown in the transform worker path
└─ validate `structuredContent` with `fetchUrlOutputSchema`
Request Lifecycle
[Client] -- initialize {protocolVersion, capabilities} --> [Server]
[Server] -- {protocolVersion, capabilities, serverInfo} --> [Client]
[Client] -- notifications/initialized --> [Server]
[Client] -- tools/call {name, arguments} --> [Server]
[Server] -- {content: [{type, text}], structuredContent?, isError?} --> [Client]
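As a rough illustration of the lifecycle above, the client-side messages can be written as plain JSON-RPC 2.0 objects. The interfaces below are simplified stand-ins rather than the SDK's types, and the protocol version string and client name are placeholder assumptions.

```typescript
// Minimal JSON-RPC 2.0 shapes for the lifecycle above (illustrative only).
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
}

let nextId = 1;

// Build a request with a fresh numeric id.
function request(method: string, params: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// Assumption: protocolVersion and clientInfo values are placeholders.
const initialize = request("initialize", {
  protocolVersion: "2025-06-18",
  capabilities: {},
  clientInfo: { name: "example-client", version: "0.0.0" },
});

// Notifications carry no id and expect no response.
const initialized: JsonRpcNotification = {
  jsonrpc: "2.0",
  method: "notifications/initialized",
};

const callFetchUrl = request("tools/call", {
  name: "fetch-url",
  arguments: { url: "https://example.com" },
});

console.log(JSON.stringify([initialize, initialized, callFetchUrl], null, 2));
```

In practice an MCP client library sends these messages for you; the sketch only shows what travels over the wire.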
MCP Surface
Tools
fetch-url
Fetch public webpages and convert HTML into AI-readable Markdown. The tool is read-only, does not execute page JavaScript, can bypass the cache with forceRefresh, and supports optional task mode for larger or slower fetches.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | `string` | yes | Target URL. Max 2048 chars. |
| `skipNoiseRemoval` | `boolean` | no | Preserve navigation/footers (disable noise filtering). |
| `forceRefresh` | `boolean` | no | Bypass cache and fetch fresh content. |
| `maxInlineChars` | `integer` | no | Inline markdown limit (0-10485760, 0=unlimited). Lower of this or the global limit applies. |
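The parameter limits can be checked on the client before calling the tool. The sketch below is not the server's `fetchUrlInputSchema`; the helper names are hypothetical, and `effectiveInlineLimit` encodes one reading of "lower of this or the global limit applies", treating 0 as unlimited on either side.

```typescript
// Limits taken from the parameter table; the helper names are hypothetical.
const MAX_URL_LENGTH = 2048;
const MAX_INLINE_CHARS = 10_485_760;

interface FetchUrlArgs {
  url: string;
  skipNoiseRemoval?: boolean;
  forceRefresh?: boolean;
  maxInlineChars?: number;
}

// Returns a list of problems; an empty list means the arguments look valid.
function checkArgs(args: FetchUrlArgs): string[] {
  const problems: string[] = [];
  if (args.url.length > MAX_URL_LENGTH) {
    problems.push("url is longer than 2048 characters");
  }
  if (!/^https?:\/\//i.test(args.url)) {
    problems.push("url must start with http:// or https://");
  }
  const limit = args.maxInlineChars;
  if (
    limit !== undefined &&
    (!Number.isInteger(limit) || limit < 0 || limit > MAX_INLINE_CHARS)
  ) {
    problems.push("maxInlineChars must be an integer between 0 and 10485760");
  }
  return problems;
}

// 0 (or an omitted value) means "unlimited"; otherwise the lower limit wins.
function effectiveInlineLimit(requested: number | undefined, globalLimit: number): number {
  const limits = [requested ?? 0, globalLimit].filter((n) => n > 0);
  return limits.length === 0 ? 0 : Math.min(...limits);
}

console.log(checkArgs({ url: "ftp://example.com", maxInlineChars: -1 }));
console.log(effectiveInlineLimit(5000, 2000)); // 2000
```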
The response is returned as MCP text content and, when validation succeeds, as structuredContent containing url, resolvedUrl, finalUrl, title, metadata, markdown, fromCache, fetchedAt, contentSize, and truncated.
1. [Client] -- tools/call {name: "fetch-url", arguments} --> [Server]
2. [Server] -- dispatch("fetch-url") --> [src/tools/fetch-url.ts]
3. [Handler] -- validate(fetchUrlInputSchema) --> normalize / fetch / transform
4. [Handler] -- validate(fetchUrlOutputSchema) --> assemble content + structuredContent
5. [Server] -- result or tool error --> [Client]
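A client consuming the result might prefer `structuredContent` and fall back to the text content. This is a sketch under assumptions: the field names come from the list above, but the field types, the optionality, and the idea that the text blocks carry usable Markdown are guesses, and `extractMarkdown` is a hypothetical helper.

```typescript
// Field names come from the response description; the types are assumptions.
interface FetchUrlStructuredContent {
  url: string;
  resolvedUrl?: string;
  finalUrl?: string;
  title?: string;
  metadata?: Record<string, unknown>;
  markdown: string;
  fromCache?: boolean;
  fetchedAt?: string;
  contentSize?: number;
  truncated?: boolean;
}

interface ToolResult {
  content: { type: string; text: string }[];
  structuredContent?: FetchUrlStructuredContent;
  isError?: boolean;
}

// Prefer structuredContent when present; otherwise join the text blocks.
function extractMarkdown(result: ToolResult): string {
  const text = result.content.map((block) => block.text).join("\n");
  if (result.isError) {
    throw new Error(text || "fetch-url failed");
  }
  return result.structuredContent?.markdown ?? text;
}

const example: ToolResult = {
  content: [{ type: "text", text: "# Example Domain" }],
  structuredContent: { url: "https://example.com", markdown: "# Example Domain", truncated: false },
};
console.log(extractMarkdown(example));
```

Checking `truncated` before relying on the inline Markdown is a natural next step, since a truncated response is one of the cases where the cached resource or task mode is useful.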
Resources
| Resource | URI | MIME Type | Description |
|---|---|---|---|
| `fetch-url-mcp-instructions` | `internal://instructions` | `text/markdown` | Guidance for using the Fetch URL MCP server. |
| `fetch-url-mcp-cache-entry` | `internal://cache/{namespace}/{hash}` | `text/markdown` | Read cached markdown generated by previous fetch-url calls. |
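The cache resource template has two path segments, so building and parsing its URIs is straightforward. The helpers below are hypothetical, and the example namespace and hash values are placeholders rather than values the server is known to produce.

```typescript
interface CacheRef {
  namespace: string;
  hash: string;
}

// Matches internal://cache/{namespace}/{hash} with two non-empty segments.
const CACHE_URI = /^internal:\/\/cache\/([^/]+)\/([^/]+)$/;

// Returns null when the URI does not follow the template.
function parseCacheUri(uri: string): CacheRef | null {
  const match = CACHE_URI.exec(uri);
  if (!match) return null;
  const [, namespace, hash] = match;
  if (namespace === undefined || hash === undefined) return null;
  return { namespace, hash };
}

function buildCacheUri(ref: CacheRef): string {
  return `internal://cache/${ref.namespace}/${ref.hash}`;
}

// Placeholder values for illustration only.
const uri = buildCacheUri({ namespace: "markdown", hash: "abc123" });
console.log(uri, parseCacheUri(uri));
```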
Prompts
| Prompt | Arguments | Description |
|---|---|---|
| `get-help` | none | Return Fetch URL server instructions: workflows, cache usage, task mode, and error handling. |
MCP Capabilities
| Capability | Status | Notes |
|---|---|---|
| completions | confirmed | Advertised in createServerCapabilities() and used by the cache resource template for namespace and hash completion. |
| logging | confirmed | Advertised in createServerCapabilities() and handled through SetLevelRequestSchema. |
| resources subscribe/listChanged | confirmed | Advertised in createServerCapabilities() and implemented for cache resource subscriptions and list changes. |
| prompts | confirmed | get-help is registered during server startup. |
| tasks | confirmed | Advertised in createServerCapabilities() and backed by registered task handlers plus optional tool task support. |
| progress notifications | confirmed | Tool execution reports notifications/progress updates during fetch and transform stages. |
Tool Annotations
| Annotation | Value |
|---|---|
| `readOnlyHint` | `true` |
| `destructiveHint` | `false` |
| `idempotentHint` | `true` |
| `openWorldHint` | `true` |
Structured Output
`fetch-url` publishes an explicit `outputSchema` and returns `structuredContent` when the assembled response passes validation.
Configuration
| Variable | Default | Applies To | Notes |
|---|---|---|---|
| `HOST` | `127.0.0.1` | HTTP mode | Bind address. Non-loopback bindings also require ALLOW_REMOTE=true. |
| `PORT` | `3000` | HTTP mode | Listening port for --http. |
| `ALLOW_REMOTE` | `false` | HTTP mode | Must be enabled to bind to a non-loopback interface. |
| `ACCESS_TOKENS` | unset | HTTP mode | Comma- or space-separated static bearer tokens. |
| `API_KEY` | unset | HTTP mode | Alternate static token source for header auth. |
| `OAUTH_ISSUER_URL` | unset | HTTP mode | Enables OAuth mode when combined with the other OAuth URLs. |
| `OAUTH_AUTHORIZATION_URL` | unset | HTTP mode | Optional explicit authorization endpoint. |
| `OAUTH_TOKEN_URL` | unset | HTTP mode | Optional explicit token endpoint. |
| `OAUTH_REVOCATION_URL` | unset | HTTP mode | Optional OAuth revocation endpoint. |
| `OAUTH_REGISTRATION_URL` | unset | HTTP mode | Optional OAuth dynamic client registration endpoint. |
| `OAUTH_INTROSPECTION_URL` | unset | HTTP mode | Required for OAuth token introspection. |
| `OAUTH_REQUIRED_SCOPES` | empty | HTTP mode | Required scopes enforced after auth. |
| `OAUTH_CLIENT_ID` | unset | HTTP mode | Optional introspection client ID. |
| `OAUTH_CLIENT_SECRET` | unset | HTTP mode | Optional introspection client secret. |
| `SERVER_TLS_KEY_FILE` | unset | HTTP mode | Enable HTTPS when set together with SERVER_TLS_CERT_FILE. |
| `SERVER_TLS_CERT_FILE` | unset | HTTP mode | TLS certificate path. |
| `SERVER_TLS_CA_FILE` | unset | HTTP mode | Optional custom CA bundle. |
| `SERVER_MAX_CONNECTIONS` | `0` | HTTP mode | Optional connection cap. |
| `SERVER_HEADERS_TIMEOUT_MS` | unset | HTTP mode | Optional Node server tuning. |
| `SERVER_REQUEST_TIMEOUT_MS` | unset | HTTP mode | Optional Node server tuning. |
| `SERVER_KEEP_ALIVE_TIMEOUT_MS` | unset | HTTP mode | Optional keep-alive tuning. |
| `SERVER_KEEP_ALIVE_TIMEOUT_BUFFER_MS` | unset | HTTP mode | Optional keep-alive tuning buffer. |
| `SERVER_MAX_HEADERS_COUNT` | unset | HTTP mode | Optional header count limit. |
| `SERVER_BLOCK_PRIVATE_CONNECTIONS` | `false` | HTTP mode | Enables inbound private-network protections. |
| `MCP_STRICT_PROTOCOL_VERSION_HEADER` | `true` | HTTP mode | Requires MCP-Protocol-Version on session init. |
| `ALLOWED_HOSTS` | empty | HTTP mode | Additional allowed Host and Origin values. |
| `ALLOW_LOCAL_FETCH` | `false` | Fetching | Allows loopback and private-network fetch targets. |
| `FETCH_TIMEOUT_MS` | `15000` | Fetching | Network fetch timeout in milliseconds. |
| `USER_AGENT` | `fetch-url-mcp/<version>` | Fetching | Override the outbound user agent string. |
| `MAX_INLINE_CONTENT_CHARS` | `0` | Tool output | 0 means no explicit inline truncation limit. |
| `CACHE_ENABLED` | `true` | Caching | Enables in-memory fetch result caching. |
| `TASKS_MAX_TOTAL` | `5000` | Tasks | Total task capacity. |
| `TASKS_MAX_PER_OWNER` | `1000` | Tasks | Per-owner task cap, clamped to the total cap. |
| `TASKS_STATUS_NOTIFICATIONS` | `false` | Tasks | Enables status notifications for tasks. |
| `TASKS_REQUIRE_INTERCEPTION` | `true` | Tasks | Requires task interception for task-capable tool execution. |
| `TRANSFORM_CANCEL_ACK_TIMEOUT_MS` | `200` | Transform workers | Cancellation acknowledgement timeout. |
| `TRANSFORM_WORKER_MODE` | `threads` | Transform workers | Worker execution mode. |
| `TRANSFORM_WORKER_MAX_OLD_GENERATION_MB` | unset | Transform workers | Optional worker memory limit. |
| `TRANSFORM_WORKER_MAX_YOUNG_GENERATION_MB` | unset | Transform workers | Optional worker memory limit. |
| `TRANSFORM_WORKER_CODE_RANGE_MB` | unset | Transform workers | Optional worker memory limit. |
| `TRANSFORM_WORKER_STACK_MB` | unset | Transform workers | Optional worker stack size. |
| `FETCH_URL_MCP_EXTRA_NOISE_TOKENS` | empty | Content cleanup | Extra noise-removal tokens. |
| `FETCH_URL_MCP_EXTRA_NOISE_SELECTORS` | empty | Content cleanup | Extra DOM selectors for noise removal. |
| `FETCH_URL_MCP_LOCALE` | system default | Content cleanup | Locale override for extraction heuristics. |
| `MARKDOWN_HEADING_KEYWORDS` | built-in list | Markdown cleanup | Override heading keywords used by cleanup. |
| `LOG_LEVEL` | `info` | Logging | debug, info, warn, or error. |
| `LOG_FORMAT` | `text` | Logging | Set to json for structured logs. |
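Two of the notes in this table describe small parsing rules: `ACCESS_TOKENS` accepts comma- or space-separated values, and `TASKS_MAX_PER_OWNER` is clamped to `TASKS_MAX_TOTAL`. The sketch below shows one way such rules could be applied; the function names are hypothetical and the server's own config loader may differ.

```typescript
type Env = Record<string, string | undefined>;

// ACCESS_TOKENS is documented as comma- or space-separated.
function parseAccessTokens(env: Env): string[] {
  return (env["ACCESS_TOKENS"] ?? "")
    .split(/[\s,]+/)
    .filter((token) => token.length > 0);
}

// Read an integer variable, falling back to the documented default.
function readInt(env: Env, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function taskLimits(env: Env): { total: number; perOwner: number } {
  const total = readInt(env, "TASKS_MAX_TOTAL", 5000);
  // The per-owner cap is clamped to the total cap.
  const perOwner = Math.min(readInt(env, "TASKS_MAX_PER_OWNER", 1000), total);
  return { total, perOwner };
}

console.log(parseAccessTokens({ ACCESS_TOKENS: "alpha, beta  gamma" }));
console.log(taskLimits({ TASKS_MAX_TOTAL: "500" }));
```

In a Node.js process the same functions could be called with `process.env` as the argument.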
HTTP Endpoints
| Method | Path | Auth | Purpose |
|---|---|---|---|
| `GET` | `/health` | no, unless `?verbose=1` on a remote server | Basic health response, with optional diagnostics. |
| `GET` | `/.well-known/oauth-protected-resource` | no | OAuth protected-resource metadata. |
| `GET` | `/.well-known/oauth-protected-resource/mcp` | no | OAuth protected-resource metadata for the MCP endpoint. |
| `POST` | `/mcp` | yes | Session initialization and JSON-RPC requests. |
| `GET` | `/mcp` | yes | Session-bound server-to-client stream handling. |
| `DELETE` | `/mcp` | yes | Session shutdown. |
| `GET` | `/mcp/downloads/{namespace}/{hash}` | yes | Download route used by HTTP-mode cached fetch results. |
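For HTTP mode, a session-initializing `POST /mcp` request could be prepared roughly as follows. This is a sketch under assumptions: the base URL, bearer token, protocol version string, and client name are placeholders, the `Accept` header follows the MCP Streamable HTTP transport convention, and `prepareInitialize` is a hypothetical helper that only builds the request without sending it.

```typescript
interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) an initialize request for the /mcp endpoint.
function prepareInitialize(
  baseUrl: string,
  token: string,
  protocolVersion: string,
): PreparedRequest {
  return {
    url: new URL("/mcp", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
        Accept: "application/json, text/event-stream",
        "MCP-Protocol-Version": protocolVersion,
      },
      body: JSON.stringify({
        jsonrpc: "2.0",
        id: 1,
        method: "initialize",
        params: {
          protocolVersion,
          capabilities: {},
          clientInfo: { name: "example-client", version: "0.0.0" },
        },
      }),
    },
  };
}

// Placeholder values; pass the result to fetch(url, init) to send it.
const prepared = prepareInitialize("http://127.0.0.1:3000", "example-token", "2025-06-18");
console.log(prepared.url, prepared.init.headers);
```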
Security
| Control | Status | Notes |
|---|---|---|
| Host and origin validation | implemented | HTTP requests are rejected unless Host and Origin match the allowlist built from loopback, the configured host, and ALLOWED_HOSTS. |
| Authentication | implemented | HTTP mode supports static bearer tokens locally or OAuth token introspection; remote bindings require OAuth. |
| Protocol version checks | implemented | HTTP sessions validate MCP-Protocol-Version and pin it to the negotiated session version. |
| Rate limiting | implemented | Requests pass through the HTTP rate limiter before route dispatch. |
| Outbound SSRF protections | implemented | Local/private IPs, metadata endpoints, and .local/.internal hosts are blocked unless ALLOW_LOCAL_FETCH=true. |
| TLS | optional | HTTPS is enabled when both TLS key and certificate files are configured. |
| Stdio logging safety | implemented | Server logs are written to stderr, not stdout, so stdio MCP traffic stays clean. |
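The outbound SSRF row can be illustrated with a simplified hostname check. This is not the server's implementation: `isBlockedHost` is a hypothetical helper that covers only hostname suffixes, IPv4 literals, and the IPv6 loopback literal, whereas a real guard would also need to inspect the addresses a hostname resolves to and the wider IPv6 ranges.

```typescript
// Returns true when a hostname should be refused as a fetch target.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.endsWith(".local") || host.endsWith(".internal")) return true;
  if (host === "::1" || host === "[::1]") return true;

  // Only dotted-quad IPv4 literals are range-checked in this sketch.
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return false;
  const octets = host.split(".").map(Number);
  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;

  if (a === 0 || a === 10 || a === 127) return true; // "this" network, private, loopback
  if (a === 169 && b === 254) return true; // link-local, includes cloud metadata addresses
  if (a === 172 && b >= 16 && b <= 31) return true; // private 172.16.0.0/12
  if (a === 192 && b === 168) return true; // private 192.168.0.0/16
  return false;
}

for (const host of ["example.com", "127.0.0.1", "169.254.169.254", "printer.local"]) {
  console.log(host, isBlockedHost(host));
}
```

A caller would typically extract the hostname with `new URL(target).hostname` and skip the check only when `ALLOW_LOCAL_FETCH=true`.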
Development
| Script | Command |
|---|---|
| `clean` | `node scripts/tasks.mjs clean` |
| `build` | `node scripts/tasks.mjs build` |
| `copy:assets` | `node scripts/tasks.mjs copy:assets` |
| `prepare` | `npm run build` |
| `dev` | `tsc --watch --preserveWatchOutput` |
| `dev:run` | `node --env-file=.env --watch dist/index.js` |
| `start` | `node dist/index.js` |
| `format` | `prettier --write .` |
| `type-check` | `node scripts/tasks.mjs type-check` |
| `type-check:src` | `node node_modules/typescript/bin/tsc -p tsconfig.json --noEmit` |
| `type-check:tests` | `node node_modules/typescript/bin/tsc -p tsconfig.test.json --noEmit` |
| `type-check:diagnostics` | `tsc --noEmit --extendedDiagnostics` |
| `type-check:trace` | `node -e "require('fs').rmSync('.ts-trace',{recursive:true,force:true})" && tsc --noEmit --generateTrace .ts-trace` |
| `lint` | `eslint .` |
| `lint:tests` | `eslint src/__tests__` |
| `lint:fix` | `eslint . --fix` |
| `test` | `node scripts/tasks.mjs test` |
| `test:fast` | `node --test --import tsx/esm src/__tests__/**/*.test.ts node-tests/**/*.test.ts` |
| `test:coverage` | `node scripts/tasks.mjs test --coverage` |
| `knip` | `knip` |
| `knip:fix` | `knip --fix` |
| `inspector` | `npm run build && npx -y @modelcontextprotocol/inspector node dist/index.js --stdio` |
| `prepublishOnly` | `npm run lint && npm run type-check && npm run build` |
Build and Release
- The repository includes release automation under `.github/workflows/`.
- `Dockerfile` and `docker-compose.yml` are available for container-based packaging and local runs.
- `npm run prepublishOnly` runs the release gate: lint, type-check, and build.
Troubleshooting
- For stdio mode, avoid writing logs to stdout; keep logs on stderr.
- For HTTP mode, verify that requests send the MCP-Protocol-Version header and target the /mcp endpoint.
- Update client snippets when client MCP configuration formats change.
Credits
| Dependency | Registry |
|---|---|
| @modelcontextprotocol/sdk | npm |
| @mozilla/readability | npm |
| linkedom | npm |
| node-html-markdown | npm |
| undici | npm |
| zod | npm |
Contributing and License
- License: MIT
- Contributions are welcome via pull requests.