AI News Hub
On-site rewrite · 5 min read

Show HN: Nenya – A lightweight, highly secure AI API Gateway/Proxy written in Go

Nenya is a lightweight, zero-dependency AI API gateway written in Go. It sits between AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration with transparent SSE streaming. Security-hardened features include non-root execution, mlock for secrets, seccomp, and no-new-privileges.

Source: Hacker News AI · Author: garou


A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. Built-in adapters with specialized handling ship for 23 providers.

How Nenya handles requests

+------------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)      |
|   OpenAI-compatible request                    |
|     POST /v1/chat/completions + Bearer token   |
|   or                                           |
|   Anthropic Messages API request               |
|     POST /v1/messages + x-api-key              |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Nenya Gateway                                  |
| - auth check + RBAC enforcement                |
| - parse JSON + extract model                   |
| - resolve agent/provider                       |
| - optional cache (HIT => replay SSE)           |
| - optional MCP context/tool injection          |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Interceptor Chain (pluggable, best-effort)     |
| - RedactInterceptor (regex patterns)           |
| - EntropyInterceptor (high-entropy strings)    |
| - TFIDFInterceptor (relevance scoring)         |
| - BouncerInterceptor (engine summarization)    |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Token Budget Trimming (if payload > hard       |
| limit) drops oldest non-system messages and    |
| applies token-aware middle-out truncation      |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Routing                                        |
| A) Standard forwarding                         |
|    - fallback chain + circuit breaker + RL     |
| B) MCP multi-turn tool loop (if enabled)       |
|    - buffer SSE, execute MCP tools, re-send    |
| C) Context-limit retry                         |
|    - on upstream 413/context_exceeded,         |
|      summarize payload, retry with fallback    |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Upstream LLM Providers                         |
| Anthropic | Gemini | DeepSeek | Mistral | ...  |
+------------------------------------------------+
                        |
                        |  SSE stream
                        v
+------------------------------------------------+
| Nenya SSE Pipeline                             |
| - adapter response transforms                  |
| - (optional) OpenAI→Anthropic conversion       |
| - usage accounting + stream filter             |
| - flush + (optional) cache capture             |
| - (optional) MCP auto-save                     |
+------------------------------------------------+
                        |
                        v
+------------------------------------------------+
| Client receives transparent SSE output         |
+------------------------------------------------+

Flow notes:

/v1/* endpoints require client bearer auth; /healthz, /statsz, /metrics do not.

Pipeline failures degrade gracefully and forward the request instead of returning a 500.

MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.

Features

Routing & Agents

Config-driven provider registry — add providers via JSON, zero code changes

23 built-in providers with specialized adapters for wire format differences

Dynamic model discovery — fetches live model catalogs from providers at startup and on reload

Model registry — reference models by string shorthand with automatic provider/context resolution

Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain

Three-tier model resolution — config overrides > discovered models > static registry

Per-model wire format — models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's format attribute

Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover

Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd

Per-agent system prompts — inline or file-based

Security & Privacy

Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.

3-Tier content pipeline — pluggable interceptor chain: regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization

Context window compaction — sliding window summarization with configurable engine

Stale tool call pruning — compact old assistant+tool response pairs to save tokens

Thought pruning — strip reasoning blocks from assistant message history

Input validation — strict body limits, JSON sanitization, header filtering

Graceful degradation — never blocks requests due to engine or pipeline failures

Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions

Secure memory — mlock-protected token storage, read-only sealing, core dump prevention

Hardening (Deployment Security)

Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled

Non-root execution — runs as UID 65532 with dropped capabilities

Memory protection — LimitMEMLOCK=infinity and LimitCORE=0 in systemd

Read-only filesystem — immutable root + private /tmp

Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation

Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk

Socket activation — seamless restarts with zero dropped connections
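The secure-memory items can be approximated with Go's standard `syscall` package, as in this sketch. It assumes a Unix-like OS (Linux in particular), the `SealedSecret` type is hypothetical, and this README does not say whether Nenya uses these exact calls. The secret is copied into its own anonymous page, locked with `mlock` so it is not swapped out, then made read-only with `mprotect`; `setrlimit(RLIMIT_CORE, 0)` disables core dumps for the process.

```go
package main

import (
	"fmt"
	"syscall"
)

// SealedSecret keeps a secret in its own anonymous mapping that is locked in
// RAM (best effort) and then made read-only.
type SealedSecret struct {
	mem    []byte
	n      int
	locked bool
}

// NewSealedSecret copies secret into a fresh page-aligned mapping.
// The caller should zero its own copy afterwards.
func NewSealedSecret(secret []byte) (*SealedSecret, error) {
	ps := syscall.Getpagesize()
	size := ((len(secret) + ps - 1) / ps) * ps
	if size == 0 {
		size = ps
	}
	mem, err := syscall.Mmap(-1, 0, size, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	// mlock can fail when RLIMIT_MEMLOCK is low (hence LimitMEMLOCK=infinity).
	locked := syscall.Mlock(mem) == nil
	copy(mem, secret)
	// Seal: any later write to this page faults instead of silently succeeding.
	if err := syscall.Mprotect(mem, syscall.PROT_READ); err != nil {
		syscall.Munmap(mem)
		return nil, fmt.Errorf("mprotect: %w", err)
	}
	return &SealedSecret{mem: mem, n: len(secret), locked: locked}, nil
}

// Bytes returns a read-only view of the secret.
func (s *SealedSecret) Bytes() []byte { return s.mem[:s.n] }

// Close re-enables writes, wipes the page, and unmaps it.
func (s *SealedSecret) Close() error {
	if err := syscall.Mprotect(s.mem, syscall.PROT_READ|syscall.PROT_WRITE); err != nil {
		return err
	}
	for i := range s.mem {
		s.mem[i] = 0
	}
	if s.locked {
		syscall.Munlock(s.mem)
	}
	return syscall.Munmap(s.mem)
}

// DisableCoreDumps sets RLIMIT_CORE to zero so a crash cannot write process
// memory (and any secrets in it) to a core file.
func DisableCoreDumps() error {
	return syscall.Setrlimit(syscall.RLIMIT_CORE, &syscall.Rlimit{Cur: 0, Max: 0})
}

func main() {
	if err := DisableCoreDumps(); err != nil {
		fmt.Println("setrlimit:", err)
	}
	s, err := NewSealedSecret([]byte("sk-example-not-real"))
	if err != nil {
		fmt.Println(err)
		return
	}
	defer s.Close()
	fmt.Printf("sealed %d bytes, mlocked=%v\n", len(s.Bytes()), s.locked)
}
```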

Reliability

Zero external dependencies — Go standard library only

Hot reload — systemctl reload nenya for zero-downtime config changes

Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification

Rate limiting — per upstream host (RPM/TPM) with per-provider overrides

Response cache — in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search

Graceful shutdown — 5s grace period for in-flight requests, MCP client cleanup

Context-limit auto-retry — upstream context-length errors trigger summarization and retry

Local engine lifecycle — pre-load and manage local Ollama models with LRU eviction

Structured errors — all error responses include error_kind field for programmatic diagnostics

MCP Tool Integration

Tool discovery — connect to MCP servers for automatic tool injection

Multi-turn execution — intercept tool calls, execute against MCP servers, forward results

Auto-search — pre-fetch relevant context from MCP servers before forwarding

Auto-save — persist assistant responses to MCP memory servers
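Multi-turn tool execution can be sketched as a loop. `Model` and `RunTool` are hypothetical function types standing in for the upstream LLM call and an MCP client, and the message shapes are simplified (real tool calls carry IDs and JSON arguments). The loop re-sends the conversation after each batch of tool results and stops after `maxTurns` rounds, so a model that keeps requesting tools cannot loop forever.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type Message struct {
	Role    string
	Content string
}

type ToolCall struct {
	Name string
	Args string
}

// Turn is one model response: final text, or a batch of tool calls.
type Turn struct {
	Text      string
	ToolCalls []ToolCall
}

// Model and RunTool stand in for the upstream LLM call and an MCP client.
type Model func(history []Message) (Turn, error)
type RunTool func(call ToolCall) (string, error)

// ToolLoop returns the model's final text once it stops requesting tools.
func ToolLoop(model Model, run RunTool, history []Message, maxTurns int) (string, error) {
	history = append([]Message(nil), history...) // never mutate the caller's slice
	for i := 0; i < maxTurns; i++ {
		turn, err := model(history)
		if err != nil {
			return "", err
		}
		if len(turn.ToolCalls) == 0 {
			return turn.Text, nil
		}
		history = append(history, Message{"assistant", turn.Text})
		for _, call := range turn.ToolCalls {
			out, err := run(call)
			if err != nil {
				out = "tool error: " + err.Error() // report to the model instead of failing
			}
			history = append(history, Message{"tool", out})
		}
	}
	return "", errors.New("tool loop: max turns exceeded")
}

func main() {
	model := func(h []Message) (Turn, error) {
		last := h[len(h)-1]
		if last.Role == "tool" {
			return Turn{Text: "Based on the docs: " + last.Content}, nil
		}
		return Turn{ToolCalls: []ToolCall{{"search_docs", last.Content}}}, nil
	}
	run := func(c ToolCall) (string, error) { return strings.ToUpper(c.Args), nil }
	answer, err := ToolLoop(model, run, []Message{{"user", "sse flushing"}}, 4)
	fmt.Println(answer, err)
}
```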

Quick Start

Run with Podman

Create minimal config and secrets:

mkdir -p config secrets

Then create config/config.json, secrets/provider_keys.json, and secrets/client.json.

Debian/Ubuntu (.deb) Download _linux_amd64.deb from the release page and run sudo dpkg -i

Fedora/RHEL (.rpm) Download nenya-.x86_64.rpm from the release page and run sudo rpm -i

Arch Linux (.pkg.tar.zst) Download nenya--x86_64.pkg.tar.zst from the release page and run sudo pacman -U

Arch Linux (AUR) yay -S nenya-bin (or your preferred AUR helper)

Nix/NixOS Add gumieri/nur-packages to your NUR registry and use nenya

All packages install the binary to /usr/bin/nenya and include systemd service and socket units. After install, enable and start:

sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service

Runtime Configuration

Nenya supports standard environment variables for deployment portability:

| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Listening port (overrides server.listen_addr) |
| HOST | (none) | Optional bind address (e.g. 127.0.0.1); only used together with PORT |
| NENYA_CONFIG_DIR | /etc/nenya/ | Configuration directory path |
| NENYA_CONFIG_FILE | (none) | Single config file path (takes precedence over NENYA_CONFIG_DIR) |
| NENYA_SECRETS_DIR | (none) | Secrets directory (overrides CREDENTIALS_DIRECTORY) |

Example usage:

PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json

Or in Docker:

docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest

Or Choose Your Deployment

Deploy Bare Metal (systemd) — Direct binary install, socket activation, hot reload

Deploy Container (Podman/Docker Compose) — compose.yml, image verification, security hardening

Deploy Kubernetes (Helm) — Helm chart, ConfigMap/Secret, ingress setup

API Endpoints

All /v1/* endpoints require an Authorization: Bearer header carrying a client token or an API key. API keys support RBAC enforcement: agent scoping, endpoint allowlists, and role-based permissions (admin bypasses all checks).

| Endpoint | Auth | Description |
|---|---|---|
| POST /v1/chat/completions | Bearer + RBAC | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn |
| POST /v1/messages | Bearer + RBAC | Anthropic Messages API with bidirectional format conversion |
| GET /v1/models | Bearer + RBAC | Live model catalog from discovered providers + static registry (context window, max tokens) |
| POST /v1/embeddings | Bearer + RBAC | Passthrough proxy |
| POST /v1/responses | Bearer + RBAC | Passthrough proxy |
| POST /v1/images/generations | Bearer + RBAC | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer + RBAC | Audio transcription (Whisper-compatible, multipart support) |
| POST /v1/audio/speech | Bearer + RBAC | Text-to-speech synthesis (OpenAI-compatible) |
| POST /v1/moderations | Bearer + RBAC | Content moderation (OpenAI-compatible) |
| POST /v1/rerank | Bearer + RBAC | Re-ranking API (Cohere/Jina/Voyage-compatible) |
| POST /v1/a2a | Bearer + RBAC | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer + RBAC | File listing, upload, retrieval, deletion |
| POST /v1/batches | Bearer + RBAC | Batch API operations |
| POST /proxy/{provider}/* | Bearer + RBAC | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming) |
| GET /healthz | None | Engine health probe |
| GET /statsz | None | Token usage, circuit breaker state, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof/* | Bearer | Go profiling endpoints (disabled by default; see debug.pprof_enabled) |

See docs/PASSTHROUGH_PROXY.md for detailed passthrough proxy usage.

Documentation

| Document | Description |
|---|---|
| Providers | All 23 providers, capabilities matrix, special behaviors, adding custom providers |
| Configuration | Full config reference, directory mode, all sections and fields |
| Deploy Bare Metal | Systemd unit, config.d layout, secrets, hot reload |
| Deploy Container | Podman/Docker Compose, image verification, security notes |
| Deploy Kubernetes | Helm chart usage, ConfigMap/Secret, ingress setup |
| Passthrough Proxy | Raw provider endpoint proxying, SSE streaming, auth injection |
| Architecture | Package DAG, request lifecycle, circuit breaker, SSE pipeline |
| MCP Integration | MCP server integration, tool discovery, multi-turn execution |
| Adapters | Adapter system internals, auth styles, capability flags |
| Secrets Format | Systemd credentials, env var fallback, container/K8s deployment |
| Security | Vulnerability reporting policy |

License

Apache 2.0. See LICENSE.

About

A lightweight, highly secure AI API Gateway/Proxy written in Go. Acts as transparent middleware between local AI coding clients (OpenCode/Pi/Cur
