this post was submitted on 19 Jul 2024
1 points (100.0% liked)

TechTakes

1276 readers
27 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago
MODERATORS
 

The machines, now inaccessible, are arguably more secure than before.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 0 points 2 months ago* (last edited 2 months ago) (29 children)

Zach Vorhies (who made leaking Google stuff to Project Veritas his entire identity) has the worst possible take: https://twitter.com/Perpetualmaniac/status/1814405221738786984 (lemme gather my thoughts and explain why in the next comment)

[–] [email protected] 0 points 2 months ago (28 children)

Fair warning that I'll be ranty because I hate losers talking about DEI hires.

So why is memory address 0x9c trying to be read from? Well because... programmer error.

So what happened is that the programmer forgot to check that the object it's working with isn't valid, it tried to access one of the objects member variables...

This is a huge assumption. The last rumor I've read from actual cybersecurity people is that Crowdstrike's update files were corrupt. If this is true it's likely still from programmer error at some level, but maybe not as simple as "whoopsie I forgot an if (data == nullptr) teehee".

He, like the rest of us that don't work at Crowdstrike, has no idea what happened. I have seen computers do the weirdest gosh darn things. I know better than to assume anything at this point. I wouldn't even rule out weird stuff like the data getting corrupted between release qualification and release yet.

It turns out that C++, the language crowdstrike is using, likes to use address 0x0 as a special value to mean "there's nothing here", don't try to access it or you'll die.

This thread is full of these sorts of small technical inaccuracies and oversimplifications so I won't point out all of them, but nothing in the C++ standard requires null pointers to refer to memory address 0x0. Nor does it require that dereferencing a null pointer terminates the program.

Windows died not because C++ asked it nicely to, but because a driver tried to access an address which wasn't paged in.

Crowdstrike should have set up automated testing using address sanitizer and thread sanitizer that runs on every code update.

The funny thing about accessing into non-paged memory in kernel spaced:

  1. It will crash regardless of if it's running under Asan or not, sanitizers are literally irrelevant based on what we know so far
  2. The Asan version he linked to is for user-space. In the windows kernel you'd need KASAN instead.

C++ is hard. Maybe they have a DEI engineer that did this

Dude would probably call me a "DEI hire"; but I bet I could beat him in a C++ deathmatch so neener neener.

[–] [email protected] 0 points 2 months ago (2 children)

@sailor_sega_saturn And given enough time and enough scale even the most improbably weird things will eventually happen. Update file corrupted by a storage controller that flips a couple of bits at random after every 720 hours of uptime but only if it’s 23.682 seconds after the hour? Weirder shit has happened.

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago)

some twenty four years ago i managed, amongst others, a company's samba and print server (that was at the time when all the company's servers were beige boxes with less memory and disk than the laptop i'm using to type this – and still they served a few hundred employees).

the machine developed a strange custom of hard-resetting itself, which we initially tracked to specific files being sent for printing; the behaviour was fully reproducible.

as it happened, it was a hardware fault somewhere between the mainboard and the integrated SCSI card; installing a separate SCSI card and reconnecting the disks and backup tape device fixed the problem. (i did not have the budget for a new serwer, no.)

establishing the actual cause took me fucking weeks.

[–] [email protected] 0 points 2 months ago

I once helped one of my company's customers troubleshoot an issue that had seen the same ridiculous edge case error happen three times over the course of a few years. At one point the actual sustaining developer we worked with was able to narrow down a specific bit that was getting flipped somehow, and pitched that cosmic radiation was a plausible solution given how rarely this kind of thing impacted other customers.

It was at this point that we remembered that the customer was either a university with a nuclear physics lab or a hospital with a nuclear medicine program (can't remember now, ironically enough) that the server rack lived adjacent to.

load more comments (25 replies)
load more comments (25 replies)