The problem with you techbros is you can’t imagine anything at smaller scales. But what you just said… Jabber has been here for 25 years. That means it is good enough for tons of people. Not everybody needs a shiny new toy, and if free software doesn’t scale, then who cares? It can and will still work for those of us willing to share the burden. For those that can’t, each one of us can accommodate at least a few such users, and those that just won’t… Fuck em. We don’t have to capture every use case to be of value. I use Jabber. I have plans to self-host it. It works and has done so for 25 years. Furthermore, AIM captured everything a chat needs to do, so why do we keep reinventing this wheel when there are much more interesting problems that need to be solved?
Look in your access logs (for nginx, /var/log/nginx/access.log) for user agents that indicate things like chatgptbot, etc. Then add
if ($http_user_agent ~* "useragent1|useragent2|... useragents") { return 403; }
to the server block of your website’s config file in /etc/nginx/sites-enabled/. You can also add a robots.txt that forbids scraping. ChatGPT generally checks and respects that… for now. This, paired with some of the stuff above, should work.
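
If digging through the log by hand is a pain, here is a rough Python sketch for the "look for user agents" step. It assumes nginx's default "combined" log format, where the user agent is the last double-quoted field on each line; if you use a custom log_format it will need adjusting. The function names and the keyword list ("bot", "crawl", "spider") are my own picks for illustration, not anything standard.

```python
import re
from collections import Counter

# Assumption: nginx "combined" log format, where the user agent is the
# last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def find_bots(counts, keywords=("bot", "crawl", "spider")):
    """Return the user agents containing any keyword (case-insensitive), sorted."""
    return sorted(
        ua for ua in counts
        if any(keyword in ua.lower() for keyword in keywords)
    )


def report(path):
    """Print request count and user agent for each likely bot in the log at `path`."""
    with open(path, errors="replace") as log_file:
        counts = count_user_agents(log_file)
    for ua in find_bots(counts):
        print(counts[ua], ua)
```

Calling report("/var/log/nginx/access.log") (you may need root to read the file) prints one line per suspicious user agent. Pick a distinctive substring from each one you want to block and put those into the if line above, separated by |.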