2026-06-07 20:24 UTC · In-site rewrite · 4 min read · Updated: 2026-06-30 13:03 UTC

Show HN: Nightwatch, The open-source, read-only AI SRE

Nightwatch (ninoxAI) is an open-source, read-only AI SRE that ingests alerts from monitoring tools, clusters them into incidents, investigates root cause using an AI agent, and proposes fixes for human approval. It is local-first and monitoring-agnostic.

Source: Hacker News AI · Author: egorferber


The open-source, read-only AI SRE.

ninoxAI turns alert storms into incidents, investigates root cause over your live systems, and proposes human-approved fixes — without ever touching production.


Your monitoring tells you something broke. It pages you at 3am with fifty alerts for one outage and leaves the hard part to you:

What broke, why did it break, and what should we do next?

ninoxAI is a thin, local-first, monitoring-agnostic AI SRE layer that answers that question. It sits above Checkmk, Prometheus, Icinga2, Zabbix, webhooks, Docker, Kubernetes, AWS, Grafana, GitHub, Git and plain VMs, and:

🌊 Turns alert floods into incidents — one incident per outage, "confirmed by N tools", instead of one page per symptom.

🔇 Finds the noisy checks — flapping, over-sensitive, never-actioned — with evidence.

🤖 Investigates root cause — a tool-calling AI agent reads your live systems and forms a root-cause hypothesis.

🧰 Proposes classified fixes — copy-pasteable, ranked by risk and blast radius, for a human to gate.

🔒 Read-only by design

ninoxAI observes, reasons, and recommends — it never executes anything. No commands run, no alerts acked, no thresholds changed, no write-back to production. Every fix is a copyable artifact a human approves. Gated, governed remediation is on the roadmap; unconditional auto-execute is not.

⚡ Quickstart

Try it in 60 seconds — no LLM, no API keys, fully offline:

cp .env.example .env        # set NINOXAI_SECRET_KEY (one-liner is in the file)
docker compose up --build   # → http://127.0.0.1:8765

No live monitoring? Watch it triage synthetic alert noise:

docker compose exec ninoxai ninoxai generate-mocks
docker compose exec ninoxai ninoxai import data/mock_alerts.json
docker compose exec ninoxai ninoxai reprocess
# → /recommendations now shows reasoned threshold + flapping fixes

Local Python install (for development)

python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\Activate.ps1
pip install -e ".[dev,embeddings]"

python -m ninoxai generate-mocks
python -m ninoxai import data/mock_alerts.json
python -m ninoxai reprocess
python -m ninoxai serve   # → http://127.0.0.1:8765

Then light up the AI SRE: point ninoxAI at a tool-calling LLM (Anthropic / OpenAI / Mistral / a local Ollama) and connect your systems — either directly or via a ninox runner (below). The full end-to-end scenario — real monitoring tools and live investigator capabilities (Docker/Kubernetes/host/AWS/Grafana/GitHub) against a genuinely failing workload — lives in lab/.

🔭 How it works

ingest → normalize → cluster → score noise → recommend → dashboard
                        ↓
        agentic, read-only root-cause investigator

| Stage | What happens |
| --- | --- |
| ingest | Read-only adapters pull non-OK alerts from each source + JSON/CSV import. |
| normalize | Maps every source onto one schema + message fingerprint. |
| cluster | Groups by host / service / severity / time-window. Semantic embeddings optional. |
| noise | Frequency, ack-rate, ticket-rate, short-recovery, flapping → one 0–1 score. |
| recommend | Rule-based tuning recommendations with rationale + evidence. |
| investigate | A tool-calling LLM gathers live evidence → root-cause hypothesis + classified fixes. |
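The normalize and noise stages can be pictured with a minimal sketch. Everything here is an assumption for illustration: the `fingerprint` rule, the `ClusterStats` fields, the weights, and the 50-alerts-per-day saturation point are hypothetical and not taken from the ninoxAI source.

```python
import hashlib
import re
from dataclasses import dataclass


def fingerprint(message: str) -> str:
    """Collapse volatile tokens so repeats of one symptom hash identically."""
    text = message.lower()
    text = re.sub(r"\d+(\.\d+)?", "<n>", text)  # numbers, percentages, durations
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


@dataclass
class ClusterStats:
    alerts_per_day: float
    ack_rate: float             # fraction acknowledged by a human, 0..1
    ticket_rate: float          # fraction that led to a ticket, 0..1
    short_recovery_rate: float  # fraction that recovered almost immediately
    flap_rate: float            # fraction of state changes that flip back


# Hypothetical weights (they sum to 1.0); a real system would tune these.
WEIGHTS = {
    "frequency": 0.25,
    "unacked": 0.20,
    "unticketed": 0.20,
    "short_recovery": 0.15,
    "flapping": 0.20,
}


def noise_score(s: ClusterStats) -> float:
    """Blend the five signals into one score between 0 (signal) and 1 (noise)."""
    signals = {
        "frequency": min(s.alerts_per_day / 50.0, 1.0),  # assumed saturation point
        "unacked": 1.0 - s.ack_rate,
        "unticketed": 1.0 - s.ticket_rate,
        "short_recovery": s.short_recovery_rate,
        "flapping": s.flap_rate,
    }
    score = sum(WEIGHTS[name] * value for name, value in signals.items())
    return max(0.0, min(1.0, score))
```

A check that fires often, is rarely acknowledged, and recovers on its own scores near 1, while a rare alert that always produces a ticket scores near 0.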

Cross-tool correlation: the same fault fires in every tool. The Incidents view groups clusters that share (host, severity, time-window) into one incident — "confirmed by N tools" — read-only, no merge.
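As a rough illustration of the grouping described above, the sketch below buckets clusters by host, severity, and a fixed time window, then reports which tools confirmed each incident. The `Cluster` type, the field names, and the 300-second window are hypothetical; the actual correlation logic may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    tool: str        # e.g. "checkmk", "prometheus"
    host: str
    severity: str
    first_seen: int  # unix seconds


WINDOW_SECONDS = 300  # assumed window size


def group_incidents(clusters, window=WINDOW_SECONDS):
    """Group clusters sharing (host, severity, time bucket) into incidents."""
    buckets = defaultdict(list)
    for c in clusters:
        key = (c.host, c.severity, c.first_seen // window)
        buckets[key].append(c)

    incidents = []
    for (host, severity, _bucket), members in sorted(buckets.items()):
        incidents.append({
            "host": host,
            "severity": severity,
            "confirmed_by": sorted({m.tool for m in members}),
            "clusters": members,  # clusters are referenced, never merged
        })
    return incidents
```

Fixed buckets are the simplest choice but can split two alerts that land on either side of a bucket boundary; a sliding window would avoid that at the cost of more bookkeeping.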

🤖 The AI SRE investigator

ninoxAI's standout capability. A tool-calling LLM drives a typed allowlist of read-only capabilities (a ReAct loop on native function-calling — reason → act → observe), builds a root-cause hypothesis from live evidence, and proposes classified fixes a human approves.
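A reason → act → observe loop over an allowlist might look roughly like the sketch below. It is not the project's implementation: the tool functions return canned strings, `scripted_model` is a stand-in for a real tool-calling LLM, and all names are hypothetical.

```python
from typing import Callable


def list_containers() -> str:
    return "api (restarting), db (running)"  # canned evidence for the sketch


def container_logs(name: str) -> str:
    return f"{name}: OOMKilled"  # canned evidence for the sketch


# Only read-only capabilities are registered; anything else is refused.
ALLOWLIST: dict[str, Callable[..., str]] = {
    "list_containers": list_containers,
    "container_logs": container_logs,
}


def scripted_model(history):
    """Stand-in for an LLM: returns the next action given past observations."""
    steps = [
        {"tool": "list_containers", "args": {}},
        {"tool": "restart_container", "args": {"name": "api"}},  # not allowlisted
        {"tool": "container_logs", "args": {"name": "api"}},
        {"final": "api is being OOM-killed"},
    ]
    return steps[len(history)]


def investigate(model, max_steps=8):
    """Run reason → act → observe until the model emits a hypothesis."""
    history = []
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            return action["final"], history
        tool = ALLOWLIST.get(action["tool"])
        if tool is None:
            observation = f"refused: {action['tool']} is not an allowlisted capability"
        else:
            observation = tool(**action["args"])
        history.append((action["tool"], observation))
    return "no conclusion within step budget", history
```

The point of the allowlist is that a write action such as the hypothetical `restart_container` cannot run even if the model asks for it; the refusal is fed back as an observation and the loop continues.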

| Capability | Reads (all read-only) |
| --- | --- |
| 🐳 Docker | containers, logs, stats, inspect |
| ☸️ Kubernetes | pods, logs, events, deployments (in-cluster RBAC) |
| ☁️ AWS | CloudTrail change events, EC2, security groups, quotas (IAM read-role) |
| 📈 Grafana | PromQL + LogQL over the datasource proxy |
| 🐙 GitHub | CI runs, releases, PRs (change-event RCA) |
| 🌿 Git | mirrored repos: commits, diffs, code & history search |
| 🖥️ Host | CPU / mem / disk / processes / sockets / log tail (plain VMs) |

Every proposed action is classified as read_only, reversible, or irreversible, plus a scope (its blast radius). An unknown classification is coerced to irreversible and is never silently treated as safe to automate.
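The fail-closed coercion can be sketched in a few lines. The three labels come from the text above; the `Reversibility` enum and the `coerce_classification` helper are hypothetical names used only for illustration.

```python
from enum import Enum


class Reversibility(str, Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


def coerce_classification(raw) -> Reversibility:
    """Map a model-supplied label to a known class, failing closed."""
    if isinstance(raw, str):
        try:
            return Reversibility(raw.strip().lower())
        except ValueError:
            pass  # unrecognized label: fall through to the strictest class
    return Reversibility.IRREVERSIBLE
```

Defaulting to the strictest class means a malformed or missing label can only make a proposed fix look riskier, never safer.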

Pre-grounded: the agent starts with a compact brief of your environment, so it diagnoses instead of rediscovering.

Hardened: untrusted logs/diffs are injection-shielded, secrets are one-way scrubbed, and a grounding gate caps confidence when claims aren't backed by evidence.

Run it live-streaming in the agent console (/agent) or from the CLI. → Investigator internals

🦉 Distributed ninoxes — the agent's eyes, anywhere

The agent can investigate systems it can't reach directly. A ninox is a thin, outbound-only runner that lives inside one environment (cluster, VPC, on-prem segment), holds that environment's credentials locally, and dials home to the brain — no inbound firewall hole. It advertises a read-only capability surface the brain calls as if local.

┌────────────────────┐                          ┌────────────────────┐
│ ninoxAI brain      │ ◀── outbound only ───    │ ninox runner       │
│ dashboard · API    │   (the ninox dials       │ inside k8s/Docker/ │
│ incidents · RCA    │    home; no inbound      │ AWS/on-prem/VM     │
│ AI SRE investigator│    firewall hole)        │ credentials stay   │
└────────────────────┘ ◀── read-only evidence   │ local              │
                                                └────────────────────┘

Capabilities self-select by environment — one binary, the right tools for the box it lands on. Connected ninoxes show up in the Parliament of Owls (/parliament). → Deployment & on-prem

🔌 Connectors

All adapters are read-only — no ack, no downtime, no write-back. Configured in the UI (/connections), credentials Fernet-encrypted.

| Checkmk | Prometheus Alertmanager | Icinga2 | Zabbix | Generic Webhook | PRTG |
| --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ✅ | ✅ | ✅ | ⛔ stub |

Want to teach the AI SRE to read your stack (Jira, Sentry, Postgres…)? Point it at any MCP server, write a Python capability plugin, or expose tools via the runner protocol — every external tool runs through the same safety shell (namespaced, injection-scanned, classification-coerced). → Extending capabilities

🧠 LLM providers

Default is template — fully offline: no LLM, no network, no API keys, no tracking. It works out of the box for summaries/recommendations but deliberately can't drive the agent (that needs tool-calling). Pick a remote per role — a cheap model for high-volume summaries, a strong one for the rare investigation:

| Provider | Notes |
| --- | --- |
| template | offline: no LLM, no network. Default. |
| mistral | cost-efficient, EU-hosted |
| anthropic | strong tool-calling; default for the investigator |
| openai | OpenAI, Azure, and local LLMs (vLLM / Ollama / LM Studio) via base URL |

Redaction + secret-scrubbing run before every remote call — hostnames, IPs, UUIDs, emails, paths become deterministic placeholders, restored only in proposed commands; credentials are one-way scrubbed and never returned. → Technical architecture
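The two behaviours described above, reversible placeholders for identifiers and one-way scrubbing for credentials, can be illustrated with a small sketch. The regexes, the `<KIND_n>` placeholder format, and the `Redactor` class are assumptions made for this example; only IPs and emails are covered, and the real redaction rules are not shown in this README.

```python
import re

# Illustrative patterns only; a real redactor would cover many more shapes.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
SECRET = re.compile(r"(?i)\b(password|token|api[_-]?key)=\S+")


class Redactor:
    def __init__(self):
        self._forward = {}   # real value -> placeholder
        self._reverse = {}   # placeholder -> real value
        self._counters = {}  # per-kind running index

    def _placeholder(self, kind, value):
        # The same value always maps to the same placeholder.
        if value not in self._forward:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            token = f"<{kind}_{self._counters[kind]}>"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def redact(self, text):
        # Credentials are scrubbed one-way: no mapping is kept for them.
        text = SECRET.sub(lambda m: f"{m.group(1)}=[SCRUBBED]", text)
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, k=kind: self._placeholder(k, m.group(0)), text
            )
        return text

    def restore(self, text):
        """Put real identifiers back, e.g. into a proposed command."""
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text
```

Because the mapping is kept locally, a command the model proposes against `<IP_1>` can be restored to the real address before a human reviews it, while a scrubbed credential has nothing to restore from.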

🛠️ Development

Full CLI reference, test setup, and lint rules live in docs/development.md.

🤝 Contributing

Every contributor is an Owl. 🦉 Pull requests, connector adapters, capability providers, and bug reports are all welcome — see CONTRIBUTING.md.

Community: Join the parliament on Discord.

📜 License

ninoxAI is fully open source under the Apache License 2.0 — free to use, self-host, fork, and build on, in open or closed projects alike.

The owl observes; the human decides. 🦉
