• Dr. Saxon Crawfish@lemmy.today
    arrow-up
    1
    ·
    6 hours ago

    ██████ ███ █████ █████ ██ ██████ ██ ████ ███ ████ ████ ██ ████████ ██████ ███ █ ███ █████ ████ Bill Clinton ██████ ███ █████ █████ ██ ██████ ██ ████ ███ ████ ████ ██ ████████ ██████ ███ █ ███ █████ ████

  • exaybachae@startrek.website
    arrow-up
    20
    ·
    3 days ago

    Those files are kinda a nightmare to navigate in their bare state. And the datasets are huge. I doubt anyone training AI would knowingly allow them to go through, unless it was specifically a police investigation and case law focused AI that was designed to process and categorize that kind of data.

    Most AI are designed for functional discussion and factual data processing. It’s not a great idea to just feed in random trash.

    • r00ty@kbin.life
      arrow-up
      26
      ·
      3 days ago

      I had to use Cloudflare to stop AI crawlers from using like 60% of my 16-core server that runs this instance. They were spending that much time pulling fediverse content, multiple bots without any wait time between requests. You really think they’d reject Epstein files but seek out our combined output?

    • degenerate_neutron_matter@fedia.io
      arrow-up
      5
      ·
      2 days ago

      They scrape data indiscriminately; I’m sure any Epstein files publicly accessible on the internet have been added to their databases. Perhaps they’d be filtered out before being used to train models, but I’m skeptical they take that level of care with the data.

  • theywilleatthestars@lemmy.world
    arrow-up
    4
    ·
    2 days ago

    And also everyone’s vague notes about, like, the Sword of Truth mass isekai hatefic they wanted to write back in 2024 but then gave up on because they mentioned the Battle of Cable Street and then had to stare at a wall for a bit and walk away in shame