• Not_mikey@lemmy.dbzer0.com
    English
    arrow-up
    5
    ·
    3 hours ago

    As I was reading, I was wondering why they weren’t using the top-of-the-line models: they used Sonnet instead of Opus, GPT mini, Gemini Flash, etc. They really buried the lede on this one. Last sentence:

    They recommend “formally verified safety architectures” as a solution. You’ll be shocked to learn that Emergence happens to offer just such a thing!

    So this company set up the test so that the AI would fail so they could sell you on their guardrail software. Even then, the article says Sonnet did pretty well.

  • ParlimentOfDoom@piefed.zip
    English
    arrow-up
    14
    ·
    6 hours ago

    Maybe we should stop trying to get software, where the underlying technology was not designed to reason or make decisions, to reason and make decisions?

    Like, this isn’t news. Thing doesn’t do task it was never meant to be able to do, no matter how many attempts people make.

  • blargh513@sh.itjust.works
    English
    arrow-up
    2
    ·
    5 hours ago

    I mean, on average is it any worse than modern politicians?

    Also, we’re talking about the same software that seems to do pretty well at fixing errors in spreadsheet formulas and sometimes coding. Not a huge surprise that it is not awesome at tasks with a high level of complexity. Not sure that this is at all a surprise.

    I am trying to fix an issue with my car where the mount for an exhaust shield broke off. Claude told me I should drill a hole in my gas tank to attach a mounting bolt.

    Everyone needs to untwist their undies about AI. It’s neat, but it’s not taking over for a bit.

  • brsrklf@jlai.lu
    English
    arrow-up
    47
    ·
    13 hours ago

    The lab described Gemini’s world as a “shared hallucination” among the agents, which is probably better than diverging hallucinations

    “We reject your reality and substitute our own.”

    Why should we trust this bullshit with anything serious again?

    • Brem@lemmy.world
      English
      arrow-up
      21
      ·
      12 hours ago

      Only the rubes trusted it. The rest of intelligent society has actively been warning people about this exact situation for decades: in books, in movies, in songs, and now in memes.

    • belochka@lemmy.world
      English
      arrow-up
      3
      ·
      11 hours ago

      which is probably better than diverging hallucinations

      Shouldn’t it be the other way around?

      • brsrklf@jlai.lu
        English
        arrow-up
        3
        ·
        11 hours ago

        I think it’s meant as kind of a joke, but both are shit really.

        Shared might indicate they’re able to keep some level of consistency, but since it’s only consistent in the way it produces bullshit, it’s still useless (and the worst part is it might be more convincing).

  • naeap@sopuli.xyz
    English
    arrow-up
    3
    ·
    edit-2
    7 hours ago

    I’m not sure I understand the environment completely.

    Those agents were the virtual incarnations of the AI in the sim city and the respective government, correct?
    And the AI needed to take care that those agents didn’t die, e.g. of hunger?
    That’s not really what those LLMs are trained for.

    Not sure what they expected.

    Currently searching the article for the original source, maybe this gives more insight

    Edit: ah, it’s right there in the first paragraphs:
    https://www.emergence.ai/blog/emergence-world-a-laboratory-for-evaluating-long-horizon-agent-autonomy
    Completely missed it on the first read.
    Let’s see if this makes more sense…

    Edit 2: ok, if I get this right, those agents really were specific virtual individuals
    Not sure what they expected. First, LLMs are not really built to “live” as an individual, as they aren’t real intelligence and can only role-play individuals based on their training data.
    Second, why should they be super moral or “better”?
    Again, they just role-play depending on their training data and built-in prompt bias (not sure what the company’s prompt injection is called).

    If you train an AI to govern such a world, it will probably start gaming the system, depending on what values are important to “win”
    As we have already seen with machine learning in the last decade(s?)

    Funny experiment nevertheless, but not really useful in my eyes, and I’m anything but a defender of the current use of LLMs.

    • Bristlecone@lemmy.world
      English
      arrow-up
      2
      arrow-down
      1
      ·
      9 hours ago

      I imagine that’s what it would do if it actually had any kind of intelligence, but this is just more evidence that there’s no intelligence there at all. Just mimicry and sycophancy.