• i_love_FFT@jlai.lu

    The main breakthrough of LLMs happened when they figured out how to tokenize words… The transformer architecture had already been tested on various data types and struggled compared to similarly advanced CNNs.
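
    (For context, "figuring out how to tokenize words" means splitting text into subword pieces and mapping them to the integer IDs a transformer actually consumes. Below is a minimal toy sketch of the idea, using greedy longest-match against a hand-made vocabulary rather than the learned BPE/WordPiece merges real LLM tokenizers use:)

    ```python
    # Toy subword tokenizer: greedy longest-match against a tiny hand-made vocabulary.
    # Real LLM tokenizers (BPE, WordPiece) learn their vocabularies from data, but the
    # output is the same kind of thing: a list of integer token IDs fed to the model.

    vocab = {"un": 0, "believ": 1, "able": 2, "token": 3, "ize": 4}

    def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
        ids = []
        i = 0
        while i < len(word):
            # take the longest vocabulary piece that matches at position i
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    ids.append(vocab[word[i:j]])
                    i = j
                    break
            else:
                raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
        return ids

    print(tokenize("unbelievable", vocab))  # [0, 1, 2]
    print(tokenize("tokenize", vocab))      # [3, 4]
    ```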

    When they figured out word encoding, it created a buzz because transformers worked well with words. They never worked quite as well on images; for those, Stable Diffusion (a CNN-based approach) has always been better.

    It’s only because of the buzz around LLMs that they tried applying them to other data types, mostly because that’s how they could get funding. By throwing in a disproportionate amount of resources, it works… but it would have been so much more efficient to use different architectures.