cross-posted from: https://lemmy.world/post/45309948
**90% of the time right means 10% of the time wrong, a huge deal when you deal with billions of queries!**
I feel like it’s more like 75% wrong and 25% right. The biggest issue is that the answers may seem right, because that’s what those models do: they generate answers that would fit, regardless of whether they are right or not. This makes it very hard to tell if they are right, and in my experience they are wrong in some way a lot of the time.
Sometimes it’s in small details that don’t matter much, sometimes it’s in big ways. But the worst is when it’s wrong in little details that do matter a lot. As the saying goes, the devil is in the details.
This is why I hate it when people say LLMs are good for coding, because they really, really aren’t. If there is one place where details matter, it’s coding. A single character in the wrong place can be the difference between good working code and good working code with a huge security hole in it. Or something that seems to work, but doesn’t account for a dozen edge cases you haven’t even thought of. In my experience those edge cases reveal themselves while you’re writing the code, so when the working-out part is skipped, that crucial step is skipped too. This leads to tech debt accumulating at about the same rate an AI startup burns money.
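To make the one-character point concrete, here’s a toy sketch (hypothetical function names, not from any real codebase): an allow-list check for redirect URLs where a single missing `/` turns a reasonable-looking guard into an open redirect.

```python
def is_trusted(url: str) -> bool:
    # BUG: without a trailing '/', this also matches attacker-controlled
    # hosts like "https://example.com.evil.org/..."
    return url.startswith("https://example.com")


def is_trusted_fixed(url: str) -> bool:
    # One extra character ('/') anchors the match to the intended host.
    return url.startswith("https://example.com/")
```

The buggy version passes every obvious test with legitimate URLs, which is exactly the “seems to work” failure mode described above; only the adversarial edge case exposes it.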
I like the analogy of a broken clock. People say a broken clock is right twice a day, but that’s only true if you already know the time and can therefore tell whether it’s right. The same is true when asking an LLM anything: it might be right, it might not be. The only way to know is to already know the answer, which makes the whole thing rather pointless.
That “90 percent right means 10 percent wrong” stat terrifies me because scale matters. At billions of queries per hour, that 10 percent failure rate floods the internet with hallucinations and misinformation that people then cite as fact. We traded accuracy for convenience, and now we have to manually verify AI outputs for basic facts.
true, it’s way worse than it sounds at first if you think about it!
I started using it this week on subjects I’m familiar with, trying to prompt it into a correct answer. In my experience the first 8 messages are completely false 99% of the time, and even after that it’s like 80% bullshit. My favorite is when I ask for a link on a topic and it gives me a good-enough synopsis of the topic, but can’t for the life of it provide a relevant link.