Hey HN, I’m Kenneth. I spent several years as a Senior SRE at Cloudflare.
One thing that became painfully clear over time is that most outages, security issues, and compliance fire drills don’t come from a lack of tools. They come from missing context. People don’t know what’s running, how things connect, or what changed recently, especially once systems sprawl across clouds, repos, and teams.
That’s why I’m building OpsCompanion.
The goal is simple: keep a live, shared picture of what’s actually running and how it fits together.
OpsCompanion helps engineers:
- See a live, visual map of services, infrastructure, and dependencies
- Answer “what changed?” without digging through five tools, Slack threads, or outdated docs
- Preserve operational context so the next person on call isn’t starting from zero
This isn’t about adding more logs or alerts, or slapping AI on top of existing dashboards. It’s about capturing the mental model experienced operators carry in their heads and keeping it shared and up to date.
It’s still early, and there are rough edges. I’ve opened it up to a small group of engineers who work close to production so I can get honest feedback. If it’s useful, great. If not, I genuinely want to understand why and what would make it better.
You can try it here:
https://opscompanion.ai/?utm_source=hn&utm_medium=show_hn&ut...
I’ll be around in the comments. Happy to answer technical questions, hear skepticism, get a bit roasted, or talk about what actually breaks in real systems.