this post was submitted on 01 Jul 2024

About the outage (July 1st 2024) (lemmy.ca)

submitted 4 months ago* (last edited 4 months ago) by [email protected] to c/[email protected]

Happy Canada Day everyone.

Regarding the outage that happened last night: we rebooted the Lemmy services, but we're still trying to figure out the root cause. The logs seem to point to an out-of-memory issue, but that's not what we see in our monitoring console.

In the meantime, we will monitor the service more closely until we are confident the issue is resolved, and we will improve our tools to detect such a problem faster.

Apologies for the extended downtime.

[–] [email protected] 0 points 4 months ago (1 children)

Hello,

Good luck with the troubleshooting!

As I suggested elsewhere, could you maybe set up a "status" community on another instance (e.g. sh.itjust.works, it's Canadian as well), so that people can go there to see updates about potential outages?

[–] [email protected] 0 points 4 months ago (1 children)

We have https://status.lemmy.ca already in place, we'll try to keep it updated as much as possible.

[–] [email protected] 0 points 4 months ago* (last edited 4 months ago) (1 children)

Indeed, but I was talking more about a Lemmy community where people would be able to discuss and give each other information.

I actually stumbled upon someone asking a question on Reddit. It would have been useful to have a place to redirect this person to: https://old.reddit.com/r/Lemmy/comments/1dtj4hc/can_anyone_help/

The alternative is a Matrix room, but it might take some time to set up compared to just a community on another instance.

[–] [email protected] 0 points 4 months ago (1 children)

I'm looking at making a custom Cloudflare error page that embeds the status page. At least we'll be able to put some communication there when something happens, without people having to guess where to go.

[–] [email protected] 0 points 4 months ago

Sounds great, thanks!

[–] [email protected] 0 points 4 months ago

Happy Canada day, legends.

[–] [email protected] 0 points 4 months ago

thank you

[–] [email protected] 0 points 4 months ago

Thanks for everything you guys do. And on Canada Day no less!

[–] [email protected] 0 points 4 months ago

Happy Canada Day! Lol

[–] [email protected] 0 points 4 months ago (2 children)

Good morning and happy Canada Day. Thanks for working tirelessly to get things running.

Seems like the API, apps and other frontends were working but the main web frontend wasn't? I wonder if it is anything similar to what happened to the unfortunate feddit.de.

[–] [email protected] 0 points 4 months ago* (last edited 4 months ago) (2 children)

I believe feddit.de is moving to feddit.org, actually. The original instance might have gone down fully now.

https://feddit.org/c/main

I just learned about it the other day: https://lemmy.ca/comment/10102577

[–] [email protected] 0 points 4 months ago

https://feddit.org/post/49429

I found a post with context on oldfeddit's main. It's someone writing in German saying the site [note: specifically the frontend] is down and the admin is MIA. Feddit.org is run by a Viennese non-profit.

[–] [email protected] 0 points 4 months ago

Hi there! Looks like you linked to a Lemmy community using a URL instead of its name, which doesn't work well for people on different instances. Try fixing it like this: [email protected]

[–] [email protected] 0 points 4 months ago* (last edited 4 months ago)

Something got into a weird state, and restarting either the backend or the frontend didn't help. Taking the entire stack down and then bringing it back up resolved it.

It's weird, since it crashed at 1am and at 3am we gradually restart all backends and frontends, so that automatic restart should have fixed it too. All the containers reported healthy, but nginx wasn't reporting any available frontends.

I suspect some sort of weird Lemmy bug, but we'll just have to improve monitoring for now and try to debug this more if it happens again.
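
For anyone curious what "improve monitoring" could look like for this failure mode, here is a minimal sketch, not what we actually run. It assumes an external check that requests the public page through the proxy instead of trusting container health, since the containers reported healthy while nginx had no frontends to serve. The function names and the set of status codes are hypothetical choices for illustration; a reverse proxy with no usable upstream typically answers with 502, 503 or 504.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: these statuses mean "the proxy answered but had no usable upstream".
PROXY_UPSTREAM_ERRORS = {502, 503, 504}


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server did answer, just with an error status.
        return err.code
    except (urllib.error.URLError, TimeoutError, OSError):
        # DNS failure, refused connection, timeout, etc.
        return None


def classify(status: Optional[int]) -> str:
    """Map a status code (or None) to a coarse health label."""
    if status is None:
        return "unreachable"
    if status in PROXY_UPSTREAM_ERRORS:
        return "proxy_no_upstream"
    if 200 <= status < 400:
        return "ok"
    return "error"


def check_frontend(
    url: str, fetch: Callable[[str], Optional[int]] = fetch_status
) -> str:
    """Probe the page the way a user would and classify the result."""
    return classify(fetch(url))
```

Calling `check_frontend` with the instance's public URL from a machine outside the stack would flag `proxy_no_upstream` even when every container reports healthy. The `fetch` parameter exists only so the check can be exercised without network access.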

[–] [email protected] 0 points 4 months ago

Appreciate the quick response even on Canada Day! Have a good one!