• fonix232@fedia.io
    14 hours ago

    Yes, it is trivial.

    LLMs can already do tool calling, emotion metadata output, and so on. It would take minimal effort for a well-tuned model to also output things like facial expressions, body language, and hand and body movements.
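
    A minimal sketch of what that kind of metadata output could look like, assuming the model is asked to reply with a small JSON object. The field names and the allowed values here are made up for illustration and are not any real engine's or model's format:

```python
import json
from dataclasses import dataclass

# Hypothetical vocabulary: whatever the game actually has animations for.
ALLOWED_EXPRESSIONS = {"neutral", "smile", "frown", "surprised"}
ALLOWED_GESTURES = {"none", "wave", "shrug", "point"}


@dataclass
class NpcTurn:
    line: str
    expression: str
    gesture: str


def parse_npc_turn(raw: str) -> NpcTurn:
    """Parse the model's JSON reply, falling back to safe defaults for unknown values."""
    data = json.loads(raw)
    expression = data.get("expression", "neutral")
    gesture = data.get("gesture", "none")
    if expression not in ALLOWED_EXPRESSIONS:
        expression = "neutral"
    if gesture not in ALLOWED_GESTURES:
        gesture = "none"
    return NpcTurn(line=str(data.get("line", "")), expression=expression, gesture=gesture)
```

    Anything the model invents outside the allowed sets simply falls back to a default, so the animation system only ever sees values it knows.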

    • chicken@lemmy.dbzer0.com
      6 hours ago

      The tool calling part might be doable (though I’ve personally struggled to get it working passably with local models), but if the goal is to tell a compelling story or create an interesting experience, especially in a very open-ended way, that isn’t trivial. The only LLM-based game I’ve personally played that seemed good was mostly on rails and partially scripted. Most of them just aren’t very interesting to play, because the model doesn’t have a good idea where it’s going with anything and is often not very creative, and the stuffy personality of the instruct model seems to infect the dialogue and apparent thought process of the characters. For a specific example, I’d recommend watching streams of the game Suck Up, which has a genuinely cool concept and solid execution, but you can see people getting frustrated as they run into its limitations as something to interact with creatively.

      I’ve tried a couple of times to start game projects involving LLMs, and I get the feeling that a lot of exploration still needs to be done into what can be done well and where that intersects with what is actually fun. I kind of don’t expect EA to be the one to do that.

    • jjjalljs@ttrpg.network
      12 hours ago

      I don’t think it would be easy to map free form text to game behavior. Not just like “make the NPC smile” but complex behavior like “this NPC will now go to this location and take this action”. That seems like it would be very error prone at best.

      • fonix232@fedia.io
        11 hours ago

        How do you think most game scripting engines work?

        Nowadays game engines don’t rely on strictly hardcoded behaviour; they are themselves just a scripting environment that executes a specific format of code.

        Skyrim is still the perfect example because it gives you the ability to literally do anything in the world, via a scripting language.

        Instructing NPCs to behave in a specific way is also done through these scripts. And LLMs - especially coding fine-tuned ones which could be tied into the execution chain - can easily translate things like <npc paces around> into specific instructions, so the NPC walks up and down over a specific distance, or in a circle, or whatever you want it to do.

        You’re seriously over-estimating the work it takes, on even crappy but modern engines, to get certain things to happen. Especially when it comes to things that are already dynamically scripted, like NPCs.
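
        As a rough sketch of that translation step, assuming the model only has to pick a behaviour and fill in a distance and lap count, the expansion into concrete movement can be plain code. The function name and the 2D coordinates are hypothetical, not taken from any real engine:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pace_waypoints(start: Point, distance: float, laps: int) -> List[Point]:
    """Expand a 'paces around' direction into back-and-forth waypoints along the x axis."""
    x, y = start
    far = (x + distance, y)
    path: List[Point] = []
    for _ in range(laps):
        path.append(far)    # walk out
        path.append(start)  # walk back
    return path
```

        For example, `pace_waypoints((0.0, 0.0), 3.0, 2)` yields four waypoints that the engine's existing walk-to behaviour could follow in order.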

        • jjjalljs@ttrpg.network
          11 hours ago

          LLM generated code is notoriously bad. Like, “call this function that doesn’t exist” is common. Maybe a more specialized model would do better, but I don’t think it would ever be completely reliable.

          But even aside from that, it’s not going to be able to map the free form user input to behavior that isn’t already defined. If there’s nothing written to handle “stand on the table and make a speech”, or “climb over that wall”, it’s not going to be able to make the NPC do that even if the player is telling them to.
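
          To make that concern concrete, here is a toy sketch, assuming the engine exposes a fixed registry of actions. Every name in it is hypothetical:

```python
from typing import Callable, Dict


def walk_to(target: str) -> str:
    return f"walking to {target}"


def say(text: str) -> str:
    return f"saying: {text}"


# Hypothetical registry: only behaviours the engine actually implements.
ACTIONS: Dict[str, Callable[[str], str]] = {"walk_to": walk_to, "say": say}


def dispatch(action: str, argument: str) -> str:
    """Run a requested action, or report that the engine has nothing to back it."""
    handler = ACTIONS.get(action)
    if handler is None:
        # The request is well-formed, but no behaviour exists for it.
        return f"unsupported action: {action}"
    return handler(argument)
```

          Here `dispatch("say", "hello")` works, while `dispatch("climb", "wall")` can only report that the action is unsupported.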

          But maybe you’re more right than I am. I don’t know. I don’t do game development. I find it hard to imagine it won’t frequently run into situations where natural language input demands stuff the engine doesn’t know how to do.

          • fonix232@fedia.io
            10 hours ago

            Alright I did read further and damn, you just keep going on being wrong, buddy!

            Yes, you can fucking make “stand on the table and make a speech” work. You know how? By breaking it up into detailed steps (pun intended), something that LLMs are awesome at!

            For example in this case the LLM could query the position and direction of the table compared to the NPC and do the following:

            • plan a natural path between the two points (although the game engine most likely already has such a function)
            • make the NPC follow that path
            • upon path end, it will instruct the NPC to step onto the table via existing functions (Skyrim pretty much has all these base behaviours already coded, but the scripting engine should also be able to modify the skeleton rig of an NPC directly, which means the LLM can easily write it)
            • then the script can initiate dialogue too.
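
            The steps above can be sketched roughly like this. It is a toy model on an integer grid, with a stand-in for the engine's pathfinding; the `Npc` class and every function name are assumptions for illustration, not Skyrim or Papyrus APIs:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class Npc:
    position: Point
    log: List[str] = field(default_factory=list)


def plan_path(start: Point, goal: Point) -> List[Point]:
    """Toy stand-in for engine pathfinding: move along x, then along y, one cell at a time."""
    path: List[Point] = []
    x, y = start
    gx, gy = goal
    while x != gx:
        x += 1 if gx > x else -1
        path.append((x, y))
    while y != gy:
        y += 1 if gy > y else -1
        path.append((x, y))
    return path


def stand_on_table_and_speak(npc: Npc, table: Point, speech: str) -> None:
    """Compose existing primitives: follow the path, climb, then start talking."""
    for step in plan_path(npc.position, table):
        npc.position = step
        npc.log.append(f"step to {step}")
    npc.log.append("climb onto table")
    npc.log.append(f"say: {speech}")
```

            Each step only uses a primitive that already exists; the composite behaviour is just the sequence.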

            I’ve asked Perplexity (not even one of the best coding agents out there; its mistake ratio is around 5%), and within seconds it spat out a full-on script to identify the nearest table or desk, and start talking. You can take a look here. And while my Papyrus is a bit rusty, it does seem correct even on the third read-through. But that’s the fun part: one does not need to trust the AI, as this script can be run through a compiler or even a validator (which, let’s be honest, is a stripped-down compiler first stage) to verify it isn’t faulty. The LLM can then interact with that and iterate on the code based on the compiler feedback (which would point out errors).
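
            A minimal sketch of that generate-and-validate loop, assuming `generate` wraps the model call and `validate` wraps the compiler or validator and returns a list of error messages. Both are hypothetical callables passed in by the caller, not real APIs:

```python
from typing import Callable, List, Optional


def generate_until_valid(
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], List[str]],
    request: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a script, feed validator errors back in, and stop once it validates."""
    errors: List[str] = []
    for _ in range(max_attempts):
        script = generate(request, errors)  # errors from the last attempt, if any
        errors = validate(script)
        if not errors:
            return script
    return None  # give up; the caller can fall back to a scripted behaviour
```

            The attempt cap matters at run time: if nothing validates in time, the game still needs something sensible for the NPC to do.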

            Now, mind you, this is the output of an internet-enabled, research-oriented LLM that hasn’t been fine-tuned for Papyrus and Skyrim. With some work you could probably get a 0.5B local model that does only natural-language-to-Papyrus translation, combined with a 4B LLM that does the context expansion (aka what you see in the Perplexity feed, my simple request being detailed step by step) and reiteration.

            You’d also be surprised just how flexible game engines are. Especially freeroaming, RPG-style engines. Devs are usually lazy, so they don’t want to hardcode all the behaviours; instead they create ways to make it simple for game designers to actually code those behaviours and share them between units. For example, both a regular object (say, a chair) and a character-type object (such as an NPC) will have a move() function that moves them from A to B, but the latter will have extra calls in that function that ensure the humanoid character isn’t just sliding to the new position but takes steps as it moves, turns in the right direction and so on. Once all these base behaviours are available, it’s super easy to put them together. This is precisely why we have so many high quality Skyrim mods (or mods for Bethesda games in general).
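
            A minimal sketch of that shared move() idea, assuming a simple class hierarchy. The class names and the event strings are made up for illustration and do not mirror any particular engine:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class GameObject:
    position: Point
    events: List[str] = field(default_factory=list)

    def move(self, target: Point) -> None:
        """Plain objects just end up at the target."""
        self.position = target
        self.events.append("moved")


@dataclass
class Character(GameObject):
    def move(self, target: Point) -> None:
        """Characters add turning and walking before reusing the base move."""
        self.events.append("turn toward target")
        self.events.append("play walk animation")
        super().move(target)
```

            Calling code does not need to know which kind of object it has; it just calls move() and the character variant adds the humanoid details.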

            And again, code quality in LLMs has come a VERY long way. I’m a software engineer by trade, and I’d say somewhere between 80 and 90% of all the code I write is actually done by AI. I still oversee it, review what it does, and direct it the right way when it does something silly, but those tasks aren’t as minor as the functionality we’re talking about here. I’ve had AI code a full-on display driver for a microcontroller, with very specific restrictions, in about 4 hours (and I’d argue 2 of those were spent running the driver, evaluating the result manually, then identifying the issue and working out a solution with the LLM). In 4 hours I managed to do what would otherwise have taken me about a week.

            Now imagine that the same thing only needs to do relatively small tasks, not figure out optimal data caching and updating strategies tied to active information delivery to the user with appropriate transformation into UI state holders.

            • jjjalljs@ttrpg.network
              10 hours ago

              Yes, you can fucking make “stand on the table and make a speech” work. You know how? By breaking it up into detailed steps (pun intended), something that LLMs are awesome at!

              My intended point was that the LLM, taking user input at run time, wouldn’t be able to do “make a speech” if the game engine doesn’t have that concept already encoded. And if the game is presented as “take user input and respond believably”, then users are going to ask for stuff the engine can’t do. Maybe there are no animations for climbing. Maybe they took some shortcuts and the graphics look bizarre when stuff is elevated.

              I wasn’t talking about Skyrim specifically.

              But also you’re being unpleasant in this exchange, so you can win.

          • fonix232@fedia.io
            10 hours ago

            Okay I won’t even read past the first paragraph because you’re so incredibly wrong that it hurts.

            First generation LLMs were bad at writing long batches of code; today we’re on the fourth (or by some metrics, fifth) generation.

            I’ve trained LLM agents on massive codebases that resulted in <0.1% fault ratio on first pass. Besides, tool calling is a thing, but I guess if I started detailing how MCP servers work and how they can be utilised to ensure an LLM agent doesn’t make incorrect calls, you’d come up with another 2-3 year old argument that simply doesn’t have a foot to stand on today.

            • jjjalljs@ttrpg.network
              10 hours ago

              lol if you had read the rest of my post you would have seen I admitted you might be right. But go off, I guess.

    • v_krishna@lemmy.ml
      10 hours ago

      It’s very obvious in this thread that you have hands-on experience and many others do not. 20+ years professional SWE here, a majority of it applied ML/big data/etc. LLMs are really bad at many things, but specifically using them as a natural language layer over NPC interactions would be relatively easy and seems like a great use case, honestly.