this post was submitted on 12 Jul 2024

Restart an OOM killed docker automatically (lemmy.world)

submitted 4 months ago by [email protected] to c/[email protected]


I have a home server that runs Docker for all my self-hosted apps. But sometimes I accidentally trigger earlyoom by remotely starting expensive Docker builds, and it kills Docker.

I don't have access to my server outside of my home network, so I can't manually restart docker in those situations.

What would be the best way to restart it automatically? I don't mind doing a full system restart if needed

top 32 comments


[–] [email protected] 5 points 4 months ago* (last edited 4 months ago)

Do you have your services set up with restart=unless-stopped? I wonder if that would auto restart them after OOM.
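For anyone landing here later, this is roughly what that looks like on the command line. A minimal sketch: the function names, and the `myapp` and `myimage` names in the usage comments, are placeholders and not anything from the original post. Worth noting that restart policies are applied by the Docker daemon itself, so they bring containers back once the daemon is running but do not restart the daemon.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative placeholders.

# Create a new container that is restarted unless it was stopped by hand.
# $1 = container name, $2 = image
start_with_policy() {
    docker run -d --name "$1" --restart unless-stopped "$2"
}

# Change the restart policy of a container that already exists.
# $1 = container name
set_policy() {
    docker update --restart unless-stopped "$1"
}

# Usage:
#   start_with_policy myapp myimage
#   set_policy myapp
```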

[–] [email protected] 1 points 4 months ago (1 children)

You should be able to make Docker exempt from earlyoom. Check its GitHub page for instructions.
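As far as I can tell from the earlyoom README, the relevant options are --avoid and --prefer, which take a regular expression matched against process names. A sketch of what /etc/default/earlyoom could look like when earlyoom runs as a packaged service; the process names below are assumptions about what runs on a typical Docker host with Rust builds, so check them against your own process list:

```shell
# /etc/default/earlyoom (sketch; process names are assumptions)
# --avoid:  processes earlyoom should spare (the Docker daemons)
# --prefer: processes it should pick first (the build tools)
EARLYOOM_ARGS="--avoid '(^|/)(dockerd|containerd)$' --prefer '(^|/)(cargo|rustc)$'"
```

After editing the file, the earlyoom service has to be restarted for the change to apply. Because the match is by process name, the daemons are spared while build processes remain candidates.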

[–] [email protected] 1 points 4 months ago

But can it exempt only Docker itself, and not the build or big container processes?

[–] [email protected] 17 points 4 months ago (1 children)

Use -m and limit the build job's memory so it doesn't kill the docker daemon.
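A rough sketch of what limiting memory can look like. The function names, the `2g` limit, and the image name in the usage comments are made up for illustration. One caveat I am fairly sure of but have not verified on every version: the -m/--memory flag is enforced for docker run, while image builds that go through BuildKit may accept the flag without enforcing it, so for builds it can be more reliable to limit parallelism inside the build itself (cargo's --jobs is shown here because the build in question was a cargo build).

```shell
#!/bin/sh
# Sketch only: names and limits are illustrative.

# Run a container with a hard memory cap. Setting --memory-swap to the same
# value as --memory means the container gets no extra swap on top of the cap,
# so exceeding it OOM-kills only this container's cgroup.
# $1 = limit (e.g. 2g), $2 = image
run_capped() {
    docker run --rm --memory "$1" --memory-swap "$1" "$2"
}

# Limit how many compiler jobs cargo runs in parallel, which bounds peak
# memory use of the build. $1 = number of jobs
cargo_limited() {
    cargo build --release --jobs "$1"
}

# Usage:
#   run_capped 2g myimage
#   cargo_limited 2
```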

[+] [email protected] -19 points 4 months ago (4 children)

Fair enough. But I don't want a bandaid fix. Even more so because I do all my Docker work through Portainer, and the option isn't there.

It could also be useful if a container has a memory leak and is unbounded.

[–] [email protected] 4 points 4 months ago

??? Your original proposed solution is literally a bandaid fix.

[–] [email protected] 10 points 4 months ago

This is not a bandaid, this is the solution. What you're trying is, at least for this scenario, the bandaid.

[–] [email protected] 16 points 4 months ago (1 children)

The other person may have responded with a fair amount of hostility, but they're absolutely correct. I run Kubernetes clusters hosting millions of containers across hundreds of thousands of VMs at my job, and OOMKills are just a fact of life. Apps will leak memory, and you're powerless to fix it unless you're willing to debug the app and fix the leak. It's better for the container to run out of memory and trigger a cgroup-scoped OOM kill. A system-wide OOM kill will murder the things you love, shit in your hat, and lick your face like David Tennant licked Krysten Ritter.
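To check whether a container died from its own memory limit rather than from a system-wide kill, Docker records this in the container state. A small sketch, where the function name is made up and the container name is supplied by the caller:

```shell
#!/bin/sh
# Succeeds (exit status 0) if Docker recorded an OOM kill for the container.
# $1 = container name or ID
was_oom_killed() {
    [ "$(docker inspect --format '{{.State.OOMKilled}}' "$1")" = "true" ]
}

# Usage:
#   if was_oom_killed myapp; then echo "myapp hit its memory limit"; fi
```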

[–] [email protected] 1 points 4 months ago (2 children)

Oh that's not a problem to let a container get killed. It's perfectly fine. What I want is just not crippling my whole server because one container did a funny.

If it keeps docker and the portainer VM I'll be 100% ok, because I can just restart it. I don't want to have remote access to my server outside of my home for security reasons, so this is just the bare minimum

[–] [email protected] 5 points 4 months ago

Those remote access fears can be solved with a WireGuard VPN.

[–] [email protected] 0 points 4 months ago

I don't want to have remote access to my server outside of my home for security reasons, so this is just the bare minimum

What are your security concerns?

[–] [email protected] 35 points 4 months ago (2 children)

This isn't a band-aid, it's the literal fix.

Structuring the available CPU and Memory reservations for containers is LITERALLY the entire reason containers exist. Just because you're only familiar with the "dumb" way of using them doesn't mean you should be dismissive when someone offers you advice when you come here asking for it.

You're also seemingly just a dick for being lazy, because I looked, and wuddyaknow. So now you're just rude, dickish, and lazy.

Take the advice from the original responder, and then go and learn how to use the things you're asking for help with, along with some manners.

[–] [email protected] -5 points 4 months ago* (last edited 4 months ago) (1 children)

Alright, sorry for calling it a "bandaid fix". It just wasn't the right term for what I wanted to say. I was referring more to how it would only fix issues during builds, and not at actual runtime, which can also be an issue if I am not careful. So yeah, it's the fix for the issue in the post, but this solution made me realise that this isn't the only thing I want.

But the second part is... Just chill. It's a home server. Not a high availability cluster. I can afford stupid things. Heck, I'm only asking this question because I was stupid and didn't limit the job count of a cargo build, which took down my server. I don't care that my build crashes. I just want to not have to manually restart it, because when I'm not here I can't do it.

As for the link that you sent, it's about container limits, not image build limits. And I have already set some up on my hungriest container; stats showed that it blew past them, so idk what's going on there.

Edit: NVM. This is a bandaid fix. What if you forget to add the flag? Like it's been 5 months since the last time and you forgot to apply the same fix? Or you accidentally removed it while editing the command? I'm actually looking for a solution that fixes my problem fully, not a partial solution.

[–] [email protected] 2 points 4 months ago

Then you didn't explain the issue very well, because what you're asking for was given to you exactly. Builds also have flags, and you should know that if you're complaining about advice given to you. I'm not saying that to admonish you, just giving you the info.

The next step down is that you're using Portainer, and having user-error issues somehow. So another solution is renaming these actions something with a very obvious prefix like "BUILD ACTION", but also setting memory limits.

The very last step is making sure your swap is in order. Allocate 2x your system memory to swap, and this will help alleviate OOM issues to a point, but especially during builds.

If you come back and say this is a band-aid solution, get a better machine and stop asking questions to solve the impossible in here. This is your fault this is an issue to begin with, you don't know how to run your machines (regardless of it just being a home server or whatever), and you're just being rude.

[–] [email protected] -4 points 4 months ago (1 children)

bro chill

[–] [email protected] 18 points 4 months ago (2 children)

You can't expect people who are knowledgeable about this stuff to just forever accept that someone asks for advice, gets told the solution, and then ignores/belittles the person with knowledge.

This is our daily life experience. We get hired to be experts, and get told by non-experts that our solutions are not tenable every single day. Only for that solution to eventually be accepted when the user in question figures out their idea was not useful and the expert was correct.

We have to put up with it at work, we are not obliged to accept it here.

[–] [email protected] -3 points 4 months ago

There's a difference between helping people who misunderstand a tool and belittling them for being wrong. It's just a matter of wording that separates a helpful answer from a toxic one.

I could tell you "You should actually use Y instead of X. There are numerous benefits like A, B and C. The docs actually have a great example you may have missed or not understood was for this purpose. It will help you a lot more than what you are thinking of doing." And this would be fine.

But "Just use Y. X is bad because Y is made for that. You not willing to use Y shouldn't make you do X. There's even the first Google link on how to do it" isn't fine.

And I have not belittled them at all. I have said that it wasn't what I was looking for. A lot of times people post questions they think should solve their issue, only to realise that they didn't understand the full picture and their problem is on a larger scale.

[–] [email protected] -2 points 4 months ago* (last edited 4 months ago) (3 children)

we are not obliged to accept it here.

He wasn't obligated to respond at all. He chose to be unchill. He wasn't even the person they replied to, and neither are you the person I replied to. Seems to me like you guys just wanna complain!

[–] [email protected] -1 points 4 months ago

I was obliged to respond to let him know that he was actually provided the correct answer, and he didn't need to respond to the person who provided the correct answer like that. I don't feel it's right to sit idly by and let people who are only trying to help for free be getting snark like that. Obliged, much.

[–] [email protected] 4 points 4 months ago (1 children)

You sound like you work in product

[–] [email protected] -2 points 4 months ago

You sound like you have zero customer contact, thank god

[–] [email protected] 6 points 4 months ago

In which way am I complaining? I am explaining why calling a valid solution a bandaid might be construed as belittling their very real knowledge of this process. And how that is a regular pattern in a lot of technical fields.

And don't give me this shit about 'I'm not the person you were talking to'. This is an open forum, not a direct/private message.

[–] [email protected] 13 points 4 months ago

Systemd has config options for automatic restart of crashed services. https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Restart=
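A sketch of how that could be set up with a drop-in file, so the packaged unit stays untouched. The function name, the file name restart.conf, and the 5 second delay are arbitrary choices for illustration. Docker's packaged unit may already set Restart=, so it is worth running systemctl cat docker first to see what is configured.

```shell
#!/bin/sh
# Write a drop-in that tells systemd to restart the service whenever it exits.
# $1 = drop-in directory; for Docker this is normally
#      /etc/systemd/system/docker.service.d
write_restart_dropin() {
    mkdir -p "$1"
    cat > "$1/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}

# Usage (as root):
#   write_restart_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
#   systemctl show docker --property=Restart   # check the effective value
```

With Restart=always, systemd still leaves the service alone when it was stopped deliberately via systemctl stop.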

[+] [email protected] -7 points 4 months ago (2 children)

I don't know the best way but I would use cron and start docker every minute (if it's not running).

[–] [email protected] 8 points 4 months ago (2 children)

I don't know the best way

Apparently...

Don't do this. Either don't go OOM to begin with (somebody else told you how to limit container memory usage) and/or configure systemd to restart Docker if it quits. I'm surprised systemd isn't doing that already.

[–] [email protected] 1 points 4 months ago (1 children)

Seems like the best solution. I'll look into it

[–] [email protected] 0 points 4 months ago (1 children)

Seems like the best solution.

Over using a system tool designed to monitor and restart services that stop?

[–] [email protected] 1 points 4 months ago (1 children)

? I'm agreeing with you?

[–] [email protected] 1 points 4 months ago

Sorry - was ambiguous and thought you were saying the "cron" thing sounded best.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (1 children)

It's usually good to state why something is good or bad :)

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago)

It's fairly obvious I feel.

You're saying rather than use a system tool that does the exact thing that you want you should bodge together a cron job that accomplishes your goal but doesn't actually do what you want.

Like say you want to stop the docker service for some reason? systemctl stop docker will do that. Then your cron job will restart it. That's not the desired outcome. You want the service running IF the service SHOULD be running. Which is a different thing than "always running". And it's exactly what you get for free with systemd without any silly custom BS.

[–] [email protected] -5 points 4 months ago

I'll try that. I know that systemctl has a start-or-reload command, but are there any "start-or-ignore" commands? Or start flags?
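On the start-or-ignore question: as far as I know, systemctl start is already idempotent, so starting a unit that is active does nothing. If an explicit check is still wanted, a sketch like the following would do it (the function name ensure_running is made up for illustration). Keep in mind that running this from cron would also bring back a service that was stopped on purpose, which a systemd Restart= setting does not do.

```shell
#!/bin/sh
# Start a unit only when it is not currently active.
# `systemctl is-active --quiet` exits 0 if the unit is active and non-zero
# otherwise, so the start command runs only in the second case.
# $1 = unit name
ensure_running() {
    systemctl is-active --quiet "$1" || systemctl start "$1"
}

# Usage (as root):
#   ensure_running docker
```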