good ol' nagios (or one of its forks)
Grafana is pretty annoying to learn and set up, but it does everything you seem to want.
I have a mixed set of containers (a few, not too many) and bare-metal services
Containers run on bare metal. Or are you running them in a vm?
I run containers on bare metal indeed.
I have services running in containers on bare metal and services running without containers, on bare metal.
I think Prometheus is a good industry standard. It can do everything you listed except for restarting stuff. It's got a decent built-in monitoring capability and you can extend it trivially to monitor anything. For example I wrote a 5-liner to monitor ZFS health and another for LVM. I even monitor my routers with it. OpenWrt has an installable node exporter for Prometheus.
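To give an idea of what such a small custom check might look like, here is a minimal sketch (not the script mentioned above). It assumes you feed it the output of `zpool list -H -o name,health` and write the result to a `.prom` file in the directory that node exporter's textfile collector reads. The metric name `zfs_pool_healthy` and the function names are made up for illustration.

```python
import subprocess


def render_zpool_metrics(zpool_output: str) -> str:
    """Turn tab-separated 'name<TAB>health' rows into Prometheus text format."""
    lines = [
        # zfs_pool_healthy is a hypothetical metric name, not a standard one.
        "# HELP zfs_pool_healthy 1 if the pool reports ONLINE, else 0.",
        "# TYPE zfs_pool_healthy gauge",
    ]
    for row in zpool_output.strip().splitlines():
        name, health = row.split("\t")
        value = 1 if health == "ONLINE" else 0
        lines.append(f'zfs_pool_healthy{{pool="{name}"}} {value}')
    return "\n".join(lines) + "\n"


def read_zpool_health() -> str:
    """Ask ZFS for pool names and health; needs the zpool CLI on the host."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `render_zpool_metrics(read_zpool_health())` from a cron job or systemd timer and redirecting the output into the textfile directory would be one way to wire it up.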
Service restarting is a remote execution capability and generally falls outside of the monitoring domain. You'd be better off implementing that with another process/service manager. If you're running systemd, that's one of its primary purposes. You can use it to start/stop/restart containers just like normal processes.
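As a rough illustration of letting systemd handle the restarts, the sketch below renders a minimal unit file with `Restart=on-failure`. The helper name, the service description and the `/usr/local/bin/my-service` path are placeholders, not anything from this thread; adapt the `ExecStart` line to whatever starts your process or container.

```python
def render_unit(description: str, exec_start: str, restart_sec: int = 5) -> str:
    """Build the text of a simple systemd service unit that restarts on failure."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        # systemd restarts the service when it exits with a failure status.
        "Restart=on-failure",
        # Seconds to wait before each restart attempt.
        f"RestartSec={restart_sec}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


# Hypothetical service; the path is a placeholder.
unit_text = render_unit("My example service", "/usr/local/bin/my-service")
```

The resulting text would typically be saved under `/etc/systemd/system/` as something like `my-service.service`, followed by `systemctl daemon-reload` and `systemctl enable --now my-service`.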
Can you share a guide / tutorial on how to accomplish what OP wants (or just get started with Prometheus)? I was in the same boat as OP and settled for netdata, and eventually gave up on monitoring altogether because it was either overwhelming me with data, too cumbersome to set up or had features behind paid plans.
Anytime you're asking this, go for the project's Quick Start / Getting Started doc. In this case, here. If you're on a Debian-based system, Prometheus is already packaged in the repository, so you don't have to download the latest release. Setting up the bare binary as a systemd service yourself likely gains you nothing but pain. I followed that doc to set up mine but installed it from apt.
On second thought, if you're getting it from the repo and it already has a systemd unit defined, it might be more difficult to follow the Getting Started doc. You know what, follow it as-is. Once you have something running and monitoring ad hoc, it'll be easy to install from apt and put your config in it.
You can do most if not all of this with Checkmk, but it's probably overkill.
I use Homarr and enjoy it a lot. Nice interface, can monitor not only services but also the server itself, and is quite customizable.
I tried to spin up a Homarr docker container the other day after seeing it on YouTube, but because it's located in ghcr it just wouldn't install.
I even added ghcr to my resources in docker using my password and an API key, but still no dice.
I'm missing something obvious, but I'm not sure what, any pointers?
Edit: I've just tried again and this time it hasn't failed with an error message; it's just hanging in the Portainer stack deployment instead.
Edit 2: I left it hanging and checked while I was out and about (love Tailscale) and it's working now!
If you go with the dashboard approach, I would suggest Homepage.
I like Homarr because it is drag and drop to edit the page. I think with Homepage I would have to edit the config file.
Have you tried Cockpit? It has pretty nice Podman integration.
Give https://github.com/louislam/uptime-kuma a try. I'm planning to do the same for similar use case. Sensu (sensu.io) is a more sophisticated option but it requires more infrastructure and there is a bit of a learning curve with it.
While I really like Uptime Kuma, it seems a bit too restricted for OP's use case. For example, to monitor disk or CPU usage, you would need to write your own scripts. It would be doable, but not very nice.
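For a sense of what "your own scripts" would mean here, this is a minimal sketch of a disk check that reports to a push-type monitor. It assumes the monitor gives you a push URL that accepts `status` and `msg` query parameters; the base URL, token and the 90% threshold are placeholders, so check them against your own instance.

```python
import shutil
from urllib.parse import urlencode


def percent_used(total: int, used: int) -> float:
    """Disk usage as a percentage; 0.0 when total is zero."""
    return 100.0 * used / total if total else 0.0


def build_push_url(base_url: str, pct: float, threshold: float = 90.0) -> str:
    """Append status and message to the push URL (base_url is a placeholder)."""
    status = "up" if pct < threshold else "down"
    query = urlencode({"status": status, "msg": f"disk {pct:.1f}% used"})
    return f"{base_url}?{query}"


def check_disk(path: str, base_url: str) -> str:
    """Measure usage of the filesystem holding `path` and build the report URL."""
    usage = shutil.disk_usage(path)
    return build_push_url(base_url, percent_used(usage.total, usage.used))
```

You would then request the returned URL (for example with `urllib.request.urlopen`) from a cron job, once per heartbeat interval. Every extra metric needs another script like this, which is why it gets tedious quickly.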
At least as I understood the question, OP is probably looking for something like Icinga.
Yeah, a better fit but a bit of trouble to set up. What's your opinion on Icinga? Never used it myself.
We had it at work, but I never did anything other than receiving and resolving alerts. It looked good to me, though, and I liked the system.