Show HN: Nightwatch, The open-source, read-only AI SRE
Nightwatch (ninoxAI) is an open-source, read-only AI SRE that ingests alerts from monitoring tools, clusters them into incidents, investigates root cause using an AI agent, and proposes fixes for human approval. It is local-first and monitoring-agnostic.
Notifications You must be signed in to change notification settings
Fork 0
Star 1
BranchesTags
Open more actions menu
Folders and files
NameName
Last commit message
Last commit date
Latest commit
History
87 Commits
87 Commits
.github
.github
deploy/k8s
deploy/k8s
docs
docs
lab
lab
src/ninoxai
src/ninoxai
tests
tests
.dockerignore
.dockerignore
.env.example
.env.example
.gitignore
.gitignore
CLAUDE.md
CLAUDE.md
CONTRIBUTING.md
CONTRIBUTING.md
Dockerfile
Dockerfile
Dockerfile.embeddings
Dockerfile.embeddings
LICENSE
LICENSE
NOTICE
NOTICE
README.md
README.md
alembic.ini
alembic.ini
docker-compose.yml
docker-compose.yml
pyproject.toml
pyproject.toml
Repository files navigation
The open-source, read-only AI SRE.
ninoxAI turns alert storms into incidents, investigates root cause over your live systems, and proposes human-approved fixes — without ever touching production.
Quickstart · AI SRE · Demo lab · Docs · Discord
Your monitoring tells you something broke. It pages you at 3am with fifty alerts for one outage and leaves the hard part to you:
What broke, why did it break, and what should we do next?
ninoxAI is a thin, local-first, monitoring-agnostic AI SRE layer that answers that question. It sits above Checkmk, Prometheus, Icinga2, Zabbix, webhooks, Docker, Kubernetes, AWS, Grafana, GitHub, Git and plain VMs, and:
🌊 Turns alert floods into incidents — one incident per outage, "confirmed by N tools", instead of one page per symptom.
🔇 Finds the noisy checks — flapping, over-sensitive, never-actioned — with evidence.
🤖 Investigates root cause — a tool-calling AI agent reads your live systems and forms a root-cause hypothesis.
🧰 Proposes classified fixes — copy-pasteable, ranked by risk and blast radius, for a human to gate.
🔒 Read-only by design
ninoxAI observes, reasons, and recommends — it never executes anything. No commands run, no alerts acked, no thresholds changed, no write-back to production. Every fix is a copyable artifact a human approves. Gated, governed remediation is on the roadmap; unconditional auto-execute is not.
⚡ Quickstart
Try it in 60 seconds — no LLM, no API keys, fully offline:
cp .env.example .env # set NINOXAI_SECRET_KEY (one-liner is in the file) docker compose up --build # → http://127.0.0.1:8765
No live monitoring? Watch it triage synthetic alert noise:
docker compose exec ninoxai ninoxai generate-mocks docker compose exec ninoxai ninoxai import data/mock_alerts.json docker compose exec ninoxai ninoxai reprocess
→ /recommendations now shows reasoned threshold + flapping fixes
Local Python install (for development)
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\Activate.ps1 pip install -e ".[dev,embeddings]"
python -m ninoxai generate-mocks python -m ninoxai import data/mock_alerts.json python -m ninoxai reprocess python -m ninoxai serve # → http://127.0.0.1:8765
Then light up the AI SRE: point ninoxAI at a tool-calling LLM (Anthropic / OpenAI / Mistral / a local Ollama) and connect your systems — either directly or via a ninox runner (below). The full end-to-end scenario — real monitoring tools and live investigator capabilities (Docker/Kubernetes/host/AWS/Grafana/GitHub) against a genuinely failing workload — lives in lab/.
🔭 How it works
ingest → normalize → cluster → score noise → recommend → dashboard ↓ agentic, read-only root-cause investigator
Stage What happens
ingest Read-only adapters pull non-OK alerts from each source + JSON/CSV import.
normalize Maps every source onto one schema + message fingerprint.
cluster Groups by host / service / severity / time-window. Semantic embeddings optional.
noise Frequency, ack-rate, ticket-rate, short-recovery, flapping → one 0–1 score.
recommend Rule-based tuning recommendations with rationale + evidence.
investigate A tool-calling LLM gathers live evidence → root-cause hypothesis + classified fixes.
Cross-tool correlation: the same fault fires in every tool. The Incidents view groups clusters that share (host, severity, time-window) into one incident — "confirmed by N tools" — read-only, no merge.
🤖 The AI SRE investigator
ninoxAI's standout capability. A tool-calling LLM drives a typed allowlist of read-only capabilities (a ReAct loop on native function-calling — reason → act → observe), builds a root-cause hypothesis from live evidence, and proposes classified fixes a human approves.
Capability Reads (all read-only)
🐳 Docker containers, logs, stats, inspect
☸️ Kubernetes pods, logs, events, deployments (in-cluster RBAC)
☁️ AWS CloudTrail change events, EC2, security groups, quotas (IAM read-role)
📈 Grafana PromQL + LogQL over the datasource proxy
🐙 GitHub CI runs, releases, PRs — change-event RCA
🌿 Git mirrored repos: commits, diffs, code & history search
🖥️ Host CPU / mem / disk / processes / sockets / log tail (plain VMs)
Every action is classified read_only · reversible · irreversible + a scope (blast radius). Unknown coerces to irreversible — never silently auto.
Pre-grounded: the agent starts with a compact brief of your environment, so it diagnoses instead of rediscovering.
Hardened: untrusted logs/diffs are injection-shielded, secrets are one-way scrubbed, and a grounding gate caps confidence when claims aren't backed by evidence.
Run it live-streaming in the agent console (/agent) or from the CLI. → Investigator internals
🦉 Distributed ninoxes — the agent's eyes, anywhere
The agent can investigate systems it can't reach directly. A ninox is a thin, outbound-only runner that lives inside one environment (cluster, VPC, on-prem segment), holds that environment's credentials locally, and dials home to the brain — no inbound firewall hole. It advertises a read-only capability surface the brain calls as if local.
┌────────────────────┐ ┌────────────────────┐ │ ninoxAI brain │ ◀── outbound only ─── │ ninox runner │ │ dashboard · API │ (the ninox dials │ inside k8s/Docker/ │ │ incidents · RCA │ home; no inbound │ AWS/on-prem/VM │ │ AI SRE investigator│ firewall hole) │ credentials stay │ └────────────────────┘ ◀── read-only evidence │ local │ └────────────────────┘
Capabilities self-select by environment — one binary, the right tools for the box it lands on. Connected ninoxes show up in the Parliament of Owls (/parliament). → Deployment & on-prem
🔌 Connectors
All adapters are read-only — no ack, no downtime, no write-back. Configured in the UI (/connections), credentials Fernet-encrypted.
Checkmk Prometheus Alertmanager Icinga2 Zabbix Generic Webhook PRTG
✅ ✅ ✅ ✅ ✅ ⛔ stub
Want to teach the AI SRE to read your stack (Jira, Sentry, Postgres…)? Point it at any MCP server, write a Python capability plugin, or expose tools via the runner protocol — every external tool runs through the same safety shell (namespaced, injection-scanned, classification-coerced). → Extending capabilities
🧠 LLM providers
Default is template — fully offline: no LLM, no network, no API keys, no tracking. It works out of the box for summaries/recommendations but deliberately can't drive the agent (that needs tool-calling). Pick a remote per role — a cheap model for high-volume summaries, a strong one for the rare investigation:
Provider Notes
template offline — no LLM, no network. Default.
mistral cost-efficient, EU-hosted
anthropic strong tool-calling — default for the investigator
openai OpenAI, Azure, and local LLMs (vLLM / Ollama / LM Studio) via base URL
Redaction + secret-scrubbing run before every remote call — hostnames, IPs, UUIDs, emails, paths become deterministic placeholders, restored only in proposed commands; credentials are one-way scrubbed and never returned. → Technical architecture
🛠️ Development
Full CLI reference, test setup, and lint rules live in docs/development.md.
🤝 Contributing
Every contributor is an Owl. 🦉 Pull requests, connector adapters, capability providers, and bug reports are all welcome — see CONTRIBUTING.md.
Community: Join the parliament on Discord.
📜 License
ninoxAI is fully open source under the Apache License 2.0 — free to use, self-host, fork, and build on, in open or closed projects alike.
The owl observes; the human decides. 🦉
About
Open-source, local-first, read-only AI SRE: clusters alert storms, investigates root cause over your live systems, proposes human-gated fixes.
Topics
kubernetes
devops
self-hosted
sre
observability
incident-management
aiops
Resources
Readme
License
Apache-2.0 license
Contributing
Contributing
Uh oh!
There was an error while loading. Please reload this page.
Activity
Custom properties
Stars
1 star
Watchers
0 watching
Forks
0 forks
Report repository
Releases
No releases published
Packages 0
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Contributors
Uh oh!
There was an error while loading. Please reload this page.
Languages
Python 84.8%
HTML 8.9%
CSS 3.8%
Shell 1.1%
PowerShell 1.0%
Dockerfile 0.3%
Mako 0.1%