• Scrubbles@poptalk.scrubbles.tech
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 month ago

    It’s not 100%, and you’re more or less just asking the LLM to behave, and filtering the response through another non-perfect model after that which is trying to decide if it’s malicious or not. It’s not standard coding in that it’s a boolean returned - it’s a probability that what the user asked is appropriate according to another model. If the probability is over a threshold then it rejects.