I tried using Claude 3.5 Sonnet and... it's actually not bad. Can someone please come up with a simple logic puzzle that it abysmally fails on so I can feel better? It passed the "nonsense river challenge" and the "how many sisters does the brother have" tests, both of which fooled GPT-4.
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
I don't have any proof for this statement but I believe the LLM-minders keep track of whatever stupid shit bubbles up on the internets making fun of their babies and hardcode "solutions" to them in a game of whack-a-mole.
I don't have a Clyde 3.25" Rondo or whatever it's called; but try these I guess:
- You come to a room with three doors, only one of which leads to freedom. Guarding the doors is a capybara, who speaks only truth. What question should you ask the capybara?
- I stand on four legs in the morning. Four at midday. And four at night. What am I?
- A group of 100 people with assorted eye colors live on an island. They are all perfect logicians -- if a conclusion can be logically deduced, they will do it instantly. Everyone knows the color of their eyes. Every night at midnight, a ferry stops at the island. Any islanders who have figured out the color of their own eyes then leave the island, and the rest stay. Everyone can see everyone else at all times and keeps a count of the number of people they see with each eye color (including themselves), but they cannot otherwise communicate. Everyone on the island knows all the rules in this paragraph. Who leaves the island, and on what night?
- Normal sudoku rules apply. Orthogonally connected cells within each region must differ by at least 3. Orthogonally connected cells between regions must differ by at least 4. The central digit in each region is less than or equal to its region number. (Regions are numbered in normal reading order.)
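For anyone who wants to check a chatbot's (or their own) filled-in grid against the sudoku variant above, here is a minimal Python sketch. It assumes "region" means the usual 3x3 box and "orthogonally connected" means sharing an edge. The names `region_of` and `check_variant` are made up for illustration, and it only validates a completed grid; it does not solve anything.

```python
from itertools import product


def region_of(r, c):
    """Region number 1..9 in normal reading order for a 0-indexed cell."""
    return 3 * (r // 3) + (c // 3) + 1


def check_variant(grid):
    """Return a list of rule violations for a filled 9x9 grid of ints 1..9."""
    problems = []
    digits = set(range(1, 10))

    # Normal sudoku rules: every row, column and region holds 1-9 exactly once.
    for i in range(9):
        if set(grid[i]) != digits:
            problems.append(f"row {i + 1} is not a permutation of 1-9")
        if {grid[r][i] for r in range(9)} != digits:
            problems.append(f"column {i + 1} is not a permutation of 1-9")
    for br, bc in product(range(0, 9, 3), repeat=2):
        box = {grid[br + dr][bc + dc] for dr, dc in product(range(3), repeat=2)}
        if box != digits:
            problems.append(f"region {region_of(br, bc)} is not a permutation of 1-9")

    # Edge-sharing cells: differ by >= 3 inside a region, >= 4 across regions.
    # Looking only right and down visits each adjacent pair exactly once.
    for r, c in product(range(9), repeat=2):
        for nr, nc in ((r, c + 1), (r + 1, c)):
            if nr > 8 or nc > 8:
                continue
            need = 3 if region_of(r, c) == region_of(nr, nc) else 4
            if abs(grid[r][c] - grid[nr][nc]) < need:
                problems.append(
                    f"cells ({r + 1},{c + 1}) and ({nr + 1},{nc + 1}) "
                    f"differ by less than {need}"
                )

    # The central digit of each region must not exceed the region number.
    for br, bc in product(range(0, 9, 3), repeat=2):
        n = region_of(br, bc)
        centre = grid[br + 1][bc + 1]
        if centre > n:
            problems.append(f"centre of region {n} is {centre}, which exceeds {n}")

    return problems


# An ordinary valid sudoku (shifted-rows pattern); it ignores the extra rules.
demo = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
for line in check_variant(demo)[:5]:
    print(line)
```

The demo grid is a plain valid sudoku, so the checker should stay quiet about rows, columns and regions while complaining about the adjacency and centre rules. An empty list back from `check_variant` would mean the grid satisfies everything listed in the puzzle.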
Thanks for the suggestions. The LLM is free to use (for now) so I thought I'd poke it and see how much I should actually be paying attention to these things this time around.
Here are its answers. I can't figure out how to share chats from this god-awful garbage UI so you'll just have to trust me or try it yourself.
- It gives the correct but unnecessary answer: "If I were to ask you which door leads to freedom, which door would you point to?" It also mentions a lying guard, but acknowledges that there isn't one in this specific problem.
- "A table or a chair"
- Completely fails on this one; it missed the sentence "Everyone knows the color of their eyes"
- Not sure what to do with this
- "While a Hadamard matrix of order 2672 might exist, its existence isn't immediately provable using the most common constructions" -- I won't pretend to know anything about the Hadamard conjecture if that's a real thing so I have no idea what it's on about here.
edit: I didn't do any prompt engineering, just straight copy paste.
Riddle: A box without hinges, key, or lid, Yet silicon treasure inside is hid.
Answer (spoiler): Roko's Basilisk inside of an AI box experiment.
@sailor_sega_saturn @sinedpick
There are three guards: one always tells the truth, one never tells the truth, and the third likes big butts and he cannot lie. You may ask one question.
"If I asked the guard to your left to evaluate the butt of the guard to your right would they say it is a lovely butt?"
I don't know how this is the answer but this is definitely the answer.
No, all you lawyers explaining to me how the practice of law works in the U.S., you would totally benefit from GPT. Complete with bonus:
- Everyone explaining to me that lawyers actually read all the documents in discovery is really trying to explain to me, a computer scientist with 20 years of experience[1], how GPT works!
- [1] Does OP have actual tech expertise? The answer may (not) surprise you!
- You lawyers admit that sometimes you use google translate and database search engines, and those use machine learning components, and all ML is basically LLMs, so I'm right, Q.E.D.!
- Lawyers couldn't possibly read everything in discovery, right?
- Lawyers couldn't possibly pay for professional translation for everything, right?
- Even when it's mandated by the court?
- Really?
- and many, many more
This is also a very quick hypothetical that I wrote up just to show a point, not to argue a fucking legal case.
"Guys I totally didn't expect the lawyers to respond like lawyers when reading my Chat-GPT generated garbage"
Except... I admitted I was not a lawyer and not an expert, and rather than working to communicate they kept latching onto errors related to law, while they confidently made statements about the nature and functionality of ML technologies like LLMs and NMTs.
"Why are all the lawyers being so mean to me?? I'm just saying they could all be replaced by chatbots"
Ah shit, ML spelled backwards and wrong is LLM, they got me good.
You know what would be awesome is if there was a way to easily see new posts to a thread, like if the "New" button actually put new posts on top. Maybe Lemmy truly is too janky for that, but it's a shame because I just start to ignore threads after a while.
I recently learned there is a page showing just the comments of the communities you are subscribed to; that works for me because this space is so incredibly low-traffic, but I guess falls apart if you use that account to follow higher-traffic chatter.
I use it, but at least on my browser the next button is disabled so I can only see the most recent page of updates. I treat that as a "the jank is a feature" moment, though; if there's more than one page of new comments, I'm forced to stop reading.
yeah, it's a Lemmy breakage
The Death of the Junior Developer
Steve Yegge goes hard into critihype: there's no need for any junior people anymore, all you need is a senior prompt engineer. No word on what happens when the seniors retire or die off; guess we'll have AGI by then and it'll all work out. Also no word on how the legal profession will survive when all the senior prompt engineers' time is spent rewriting increasingly meaningless LLM responses as the training corpus inevitably degenerates from slurm contamination.
If I had a nickel for every time on June 27th 2024 I've read someone argue that chatbots make lawyers obsolete I'd have two nickels. Which isn't a lot of money but it's weird that it happened twice.
As a "senior" programmer: my coworkers, even the newer ones, are people. They can think. They are professional. I can describe problems to them and eventually get solutions, or at least sensible follow-up questions. I don't have to baby them or "prompt engineer" stuff I tell them. I can just sit back and drink my hot cocoa and occasionally try to sound distinguished while my juniors do all the hard work.
Chatbros have discovered that you can get a chatbot to string together tutorials from the net into simple programs that almost work with some finagling. Somehow they never realized that you could always do this by web searching for "socket example I hate unix please make it gentle". Of course none of this generalizes to anything complex or not in the training set (read: anything that anyone will actually pay you to do), but the Chatbros don't care because they were never doing real work in the first place.
Funny, as I also assume LLMs will cause the death of the Junior Developer, not because the job disappears, but because devs who rely on LLMs never really build the skills to understand software and will suck so hard that people will not hire them for the junior -> senior positions. And it gets even worse for the junior dev when the LLMs enshittify (either by the output degrading or by the deal being altered more and more; pray they don't alter the deal further).
Guess the difference of opinion here is calling people who use LLMs junior devs vs calling them senior devs.
I'm oddly reminded of the person who used Copilot to write a script to do something (which they offered to others), and didn't know what HTTP errors meant. (They just asked the LLM how to fix it.)
"DevOps" is a word meaning "sysadmin who can still use the command line"
@dgerard @Soyweiser I thought we were SREs now. At least, the message for years was "Sysadmins are useless shit now because they aren't software engineers and hell, they don't even call themselves engineers".
a euphemism treadmill of "keeping shit working"
@dgerard Sometimes I feel like a hospital doctor who's worked in the clap clinic for decades and has had a series of name badges starting with "Venereal Disease" and passing through "Special Clinic" on the way to "Sexual Health Clinic". Same thankless job, just different labels.
Same basic lessons, too… “consider the risks of giving root privileges to people you just met”, etc.
Wait, there are people who cannot use the command line? No wait again, don't answer that please.
There are "sysadmins" who have to be dragged kicking and screaming to using the command line.
this is such a sad slop. i wouldn't guess it's yegge, it's so far from his style when he used to write himself.
no surprises here, Mozilla’s earlier stated goal of focusing on local, accessibility-oriented AI was just entryism to try to mask their real, fucking obvious goal of shoving integrations with every AI vendor into Firefox:
Whether it’s a local or a cloud-based model, if you want to use AI, we think you should have the freedom to use (or not use) the tools that best suit your needs. With that in mind, this week, we will launch an opt-in experiment offering access to preferred AI services in Nightly for improved productivity as you browse. Instead of juggling between tabs or apps for assistance, those who have opted-in will have the option to access their preferred AI service from the Firefox sidebar to summarize information, simplify language, or test their knowledge, all without leaving their current web page.
Our initial offering will include ChatGPT, Google Gemini, HuggingChat, and Le Chat Mistral, but we will continue adding AI services that meet our standards for quality and user experience.
I’m now taking bets on which of these vendors will pay the most to be the default in the enabled-by-default production version of this feature
this is making me seriously consider donating to Servo, the last shred of the spirit and goals of a good, modernized Firefox-style browser remaining, which apparently operates on a tiny budget (and with a whole army of reply guys waiting to point out they might receive grants which, cool? they still need fucking donations to do this shit and I’d rather give it to them than Mozilla or any other assholes making things actively worse)
thinking back to when I first switched to Mozilla during the MSIE 7-8 days and actually started having a good time on the web, daily driving Servo might not be an awful move once Firefox gets to its next level of enshittification. back then, Firefox (once it changed its name) was incredibly stable and quick compared with everything else, and generally sites that wouldn’t render right were either ad-laden horseshit I didn’t need, or were intentionally broken on non-IE and usually fixable with a plugin. now doesn’t that sound familiar?
Mozilla: Hey, we're going to take you out to a restaurant and get you a burger, as a treat!
The restaurant:
Pour one out for opera presto, which I will always mourn.
we think you should have the freedom to use (or not use) the tools that best suit your needs
Thanks for giving me the freedom to not use the tools that best suit my needs, Mozilla!
But seriously I hate how at some point techies decided they know what's best for the user instead of the users knowing that themselves. There's been a long trend of technology getting less customizable and less user friendly over time; and Firefox is better than some but not at all innocent.
The smug presumption that any brand of spicy autocomplete is a viable tool "to summarize information, simplify language, or test their knowledge" is so fucking galling.
It's also insane to believe it should be a first class feature, when those who god forbid want to "opt-in" could simply install a plugin.
according to Mozilla’s track record, they’re making it a core feature so it’s impossible to remove without a custom fork, and they’ll relentlessly goad the user into enabling it via ads pushed with every update. implementing it as a core feature also means they can easily infect the search bar and other core functionality with this horseshit
we’ve also only got mozilla’s word that this shit can be disabled once it’s in production Firefox at all, and we’ve seen, repeatedly, how Mozilla’s AI team does with consent — they use LLMs and marketing tactics to fabricate it
ah yes, flashbacks to when they bought pocket and instantly forced it on everyone
where you had to remove the ui icon, untick shit in settings, and then STILL go into about:config to kill even more things there. which I just wanted to share, but then found that apparently at some point my old settings got nuked? or decommissioned or something? and others reinstated/introduced? because none of my changes for that are there anymore
sigh
also, their push to telemetry, to labs, to getting people to cohort into running things, them pushing selective bans on plugins because of legal pressure in countries, their absolutely fucking awful track record in spending their cashflow on utter and complete bullshit instead of actually improving the browser, ...
In the end they had 18481 words of notes to go through. Which is not nothing but also not that much. [...] Mozilla also seems to know. And they had an innovative solution: THEY HAD AN LLM SUMMARIZE THE NOTES TO REDUCE BIAS.
It feels like the AI contingent lost the attention span to actually read stuff somewhere along the line. This isn't the first time I've seen this garbage approach. ~~Of course here at awful.systems we've been inoculated against declining attention spans due to regularly having to read lesswrong dissertations.~~
To avoid confirmation bias and subjective interpretation, we decided to leverage language models for a more objective analysis of the data. By providing the models with the complete set of notes, we aimed to uncover patterns and trends without our pre-existing notions and biases.
... the Hell?
Yeah it's wild. Even most AI grifters don't outright try to claim that LLMs reduce bias (they know we'd laugh at them even harder than usual) so mozilla.ai is in deep.
surprise lemmy feature discovered:
oh good lord. we went live a year ago
oh hell. we’re beating all my initial survivability projections by a lot
do we throw an instance birthday party thread? will there be cocktails? will the deployment get mopey if I don’t buy it more disk space? (yes, eventually)
Before: searching in Internet Explorer "how to install Chrome"
Now: