• 2 Posts
  • 62 Comments
Joined 1 year ago
Cake day: June 24th, 2023


  • In recent years I have cut my meat consumption to about a third of what it used to be. I’m not qualified to do an in-depth study of all the ramifications of CO2 emissions, but with agriculture accounting for only about 11.2% of all emissions, it sounds like eating less cow won’t be enough to “save ourselves”.

    I have a hunch that shit will hit the fan and there will be a massive reduction in CO2 emissions because of supply chain failures. Third world countries produce the vast majority of “low manufacturing complexity” products, and that production will become even more unsustainable if those regions turn into scorched earth. That, coupled with less incentive to travel in an adverse climate and a declining population due to an overall degradation in quality of life, will be the real reason we reduce emissions: things will simply stop working and become unsustainable.

    Either way, I don’t think it’s really possible to predict the future, even less so in such a complex society where technology might be a game changer all of a sudden, so my opinion is not really that valid. Even educated estimates using proper statistics and data cannot anticipate the implications of new wars, AI, new scientific breakthroughs, etc.



  • People get very confused about this. Pre-training “ChatGPT” (or any transformer model) on “internet shitposting text” doesn’t cause it to reply with garbage comments; bad alignment does. Google seems to have implemented no frameworks to prevent hallucinations whatsoever, and the RLHF/DPO applied seems to be lacking. But this is not a “problem with training on the entire web”. You could pre-train a model exclusively on a 4chan dataset and, with the right fine-tuning, still end up with a perfectly healthy and harmless model. Actually, it’s not bad to have “shitposting” or “toxic” text in the pre-training data, because that gives the model the ability to identify and understand it.

    If anything, the “problem with training on the entire web” is that we would be drinking from a poisoned well: AI-generated text has a very different statistical distribution from human-written text, which would degrade the quality of subsequent models. How much data quality matters can be seen with SlimPajama, a cleaned and deduplicated version of the RedPajama dataset, which improves the scores of models trained on it simply because it has less duplicated information and is a denser dataset: https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama
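
    To make the deduplication point concrete, here is a rough sketch of the simplest possible version: drop documents whose normalized text has already been seen. This is only an illustration under my own assumptions (the `normalize` and `deduplicate` helpers and the toy corpus are made up), and it only catches exact matches after normalization. It is not the pipeline from the linked post, which as far as I know relies on fuzzy near-duplicate detection.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        # Hash the normalized text so the set stays small for long documents.
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Hypothetical toy corpus: two of the four entries are duplicates of the first.
corpus = [
    "The cat sat on the mat.",
    "the cat  sat on the mat.",
    "A completely different sentence.",
    "The cat sat on the mat.",
]
cleaned = deduplicate(corpus)
```

    On the toy corpus only two of the four documents survive, so the same token budget covers more distinct information, which is the sense in which a deduplicated dataset is “denser”.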