Hack has at least two definitions in a computing context.
- A nifty trick or shortcut that is useful. “Check out this hack to increase your productivity.”
- Accessing something you shouldn’t. “They hacked into the database.”
A lot of the time the two get used together to describe interesting ways to gain access to secure systems, but using the word to describe accessing insecure things you shouldn’t is still valid usage.
That said, I definitely wanna see the company face charges for this. This is insane.
No, this was a data leak. The word “hack” has legal implications and shifts the blame away from the company and onto the individual who discovered the leak.
Yeah, if I leave my house door wide open for a few weeks and I get robbed, it’s still burglary.
Terrible analogy. A webserver is not at all like a door. It doesn’t block or allow traffic to and from your file system.
A web server is more like a receptionist. It handles requests. “Can I have your basic catalog?” “Certainly, here you go.”
“Can I get this item from your basic catalog?” “Certainly.”
“I don’t see it in your catalog, but my buddy said he got this other item from you. Can I have this other item too?” “Absolutely.”
“Can I borrow your stapler?” “Sure.” “How about a pad of paper?” “Of course.” “Can I just have the contents of your supply closet?” “Here you go.” “How about your accounting files, can I get those?” “No problem!” “How about your entire customer list?” “Consider it done!”
When you hire a receptionist and specifically tell them to give customers anything they request, that’s entirely on you. You have to at least make a token effort to restrict access to only authorized users before you can even claim that a particular user was unauthorized.
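To make the receptionist point concrete, here is a minimal sketch of what a "token effort" could look like. The paths, the token, and the `handle_request` function are all made up for illustration and have nothing to do with the Tea backend; the only point is that the server has to distinguish public from private requests before "unauthorized" means anything.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data: which paths are public and which tokens are known.
PUBLIC_PATHS = {"/catalog"}
VALID_TOKENS = {"s3cret-token": "alice"}


@dataclass
class Response:
    status: int
    body: str


def handle_request(path: str, token: Optional[str] = None) -> Response:
    """Serve public paths to anyone; require a known token for everything else."""
    if path in PUBLIC_PATHS:
        return Response(200, f"contents of {path}")
    if token is None:
        return Response(401, "authentication required")
    if token not in VALID_TOKENS:
        return Response(403, "forbidden")
    return Response(200, f"contents of {path} for {VALID_TOKENS[token]}")
```

With something like this in place, asking for the customer list without a token gets a 401 instead of "Consider it done!".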
This wasn’t burglary. This was putting up signs that say “come in” and labeling everything in your house with “free” stickers.
Believe it or not a lot of hacking is more like this than you think.
Social engineering is probably 95% of modern attack vectors. And that’s not even unexpected: some highly regarded computer scientists and security researchers concluded this more than a decade ago.
When the technical side reaches a certain level of security, the humans become the weakest link.
*if
We reached that part a long time ago.
Clearly the authors of this app did not. Hence “if.”
Humans were still very much the weak link here. The tools to do this even mildly securely are available, well documented, and honestly, cheap af
Peak Vibe Coding results.
while True:
Jesus Christ
You know that’s not the Tea code, but the downloader, right?
Other reports state the Tea backend was Vibe Coded: https://www.ainvest.com/news/tea-app-data-breach-exposes-72-000-users-ai-generated-code-security-lapse-2507/
There’s nothing wrong with manually breaking a loop.
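For anyone wondering what that looks like when done on purpose, here is a rough sketch of a paginated download loop. `fetch_page` is a hypothetical callable, not anything from the actual downloader script.

```python
def read_pages(fetch_page):
    """Collect items page by page until fetch_page returns an empty batch."""
    results = []
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            # No more data: this is the manual break.
            break
        results.extend(batch)
        page += 1
    return results
```

The exit condition is only known after the fetch, so `while True` plus `break` avoids fetching once before the loop just to have something to test.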
There’s nothing wrong with eating a banana with a knife and fork, either.
Except living with the shame.
Well these people probably don’t wash their hands so knife fork is the most sanitary way.
I absolutely despise Firebase Firestore (the database technology that was “hacked”). It’s like a clarion call for amateur developers, especially low rate/skill contractors who clearly picked it not as part of a considered tech stack, but merely as the simplest and most lax hammer out there. Clearly even DynamoDB with an API gateway is too scary for some professionals. It almost always interfaces directly with clients/the internet without sufficient security rules preventing access to private information (or entire database deletion), and no real forethought as to ongoing maintenance and technical debt.
A Firestore database facing the client directly on any serious project is a code smell in my opinion.
It’s like people learn how to make a phone app in React Native or whatever, but then come to the shocking and unpleasant realisation that a data-driven service isn’t just a shiny user interface - it needs a backend too.
But they don’t know anything about backend, and don’t want to, because as far as they are concerned all those pesky considerations like data architecture, availability, security, integrity etc are all just unwanted roadblocks on the path to launching their shiny app.
And so, when a service seemingly provides a way to build an app without needing to care about any of those things, of course they take it.
And I get it, I really do. The backend usually is the genuine hard part in any project, because it’s the part with all the risk. The part with all the problems. The place where everything can come crashing down or leak all your data if you make bad decisions. That’s the bothersome nature of data-driven services.
But that’s exactly why the backend is important, and especially the part you can’t build anything decent without thinking about.
sounds like firebase itself is a hack.
I’m honestly embarrassed by my fellow devs more often than not these days.
What the fuck happened to craftsmanship? Or taking pride in your work?
oh right, techbro startup culture garbage ended it.
I think it’s less about the tech picked and more about developers with no sense of security and a poor understanding of networking. I’ve seen far too many web applications where the developer needed some sort of database behind it (MySQL, PostgreSQL, MSSQL) and so they stood up either a container or an entire VM with a public IP and the networking layer, whatever it was, set to allow any IP to hit the database port. The excuse is almost always something like, “we needed the web front end to be able to reach the database, so we gave the database server/container a public IP and allowed access”. Which is wonderful, right up until half of the IP addresses in Russia start trying to brute force the database.
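A quick sketch of how you might catch that in review, assuming your firewall rules can be exported as (source CIDR, port) pairs. That rule format and the function name are assumptions for the example; the ports are just the usual defaults for those three databases.

```python
import ipaddress

# Default ports for the databases mentioned above.
DB_PORTS = {3306: "MySQL", 5432: "PostgreSQL", 1433: "MSSQL"}


def world_open_db_rules(rules):
    """Return the rules that expose a database port to every address."""
    exposed = []
    for cidr, port in rules:
        network = ipaddress.ip_network(cidr)
        # A prefix length of 0 (0.0.0.0/0 or ::/0) means "any IP".
        if port in DB_PORTS and network.prefixlen == 0:
            exposed.append((cidr, port, DB_PORTS[port]))
    return exposed
```

The web front end only needs the database reachable from its own subnet, so a rule like `("10.0.1.0/24", 5432)` passes while `("0.0.0.0/0", 5432)` gets flagged.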
I agree that this is ultimately a problem with developers lacking security knowledge and general understanding, but my issue with Firestore specifically is that it is a powerful tool that, while it can be adopted as part of a carefully considered tech stack, lends itself most naturally towards being a blunt force instrument used by these kinds of developers.
My main criticism of Firestore is that it offers a powerful feature set that is extremely attractive to amateur or constrained developers while doing a poor job of guiding said amateurs towards creating a secure and well designed backend. In particular, the seemingly expected use case of the technology as something apps and other clients interface with directly, as evidenced by the substantial support and feature set for this use case, is the main issue. This no-code, no-management, client-driven interaction model makes it especially attractive to these developers.
This lack of indirection through an API Gateway or service, however, imposes additional design considerations largely delegated to the security rules which can easily be missed by a beginner. For example:
- Many amateurs take an open-by-default approach, applying access and write restrictions only where they think it necessary, and miss data that should be restricted
- Some amateurs deploy databases with no access or write restrictions at all
- There is no way to expose only a “view” of a document to a request. Instead, a separate document containing the private fields, with its own security rules, needs to be created. This can be fairly simple to design around but seems to be a bit of a “gotcha”. Plus, if you have similar but non-identical sets of data that need to be accessible by different groups, they must be duplicated and manually synchronized.
- Since there is no way to version data models, incompatible changes require complicated workarounds or an increasingly complicated deserialization process on the client side (especially as existing clients continue to write outdated models).
- Schema validation of data written by clients to the database is handled by security rules, which seems to be unintuitive, because I’ve seen plenty of projects miss it
- If clients are writing data directly, it can become fairly complex to handle and subsequently maintain their contributions, especially if the aforementioned private data documents are required or the data model changes.
All of these pitfalls can be worked around (although I would still argue for some layer of indirection at least for writes), but at this point I’ve been contracted to 2 or 3 projects worked on by “professionals” (derogatory) that failed to account for any of these issues and I am absolutely sick to death of it. I think a measure of a tool’s quality is whether it guides a developer towards good practices by design, and I have found Firestore to completely fail in that regard. I think it can be used well, and it is perfectly appropriate for small inconsequential (as in data leaks would be inconsequential) single developer projects, but it almost never is.
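As a rough illustration of what a layer of indirection buys you for the "view" and schema validation points, here is a sketch of two functions a small backend service could run between clients and the database. The field names and both functions are hypothetical, and this is plain Python rather than anything Firestore-specific.

```python
# Hypothetical field names; nothing here comes from a real project.
PUBLIC_FIELDS = {"display_name", "bio"}
WRITABLE_FIELDS = {"display_name": str, "bio": str}


def public_view(document: dict) -> dict:
    """Return only the fields a client is allowed to read."""
    return {k: v for k, v in document.items() if k in PUBLIC_FIELDS}


def validate_write(payload: dict) -> dict:
    """Reject unknown fields and wrong types before anything is stored."""
    for key, value in payload.items():
        expected = WRITABLE_FIELDS.get(key)
        if expected is None:
            raise ValueError(f"field not writable: {key}")
        if not isinstance(value, expected):
            raise ValueError(f"wrong type for {key}")
    return payload
```

Both lists are closed by default: a new private field added to the stored document stays hidden until someone deliberately adds it to `PUBLIC_FIELDS`, which is the opposite of the open-by-default approach in the first bullet.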
This is a very good writeup.
Do you think supabase or other similar solutions also have these pitfalls?
AI just enables the shit programmers to create a greater volume of shit
I’ll tape this to my office door.
My favorite one I’ve seen so far was “AI can take a junior programmer and make them a 10x junior programmer.”
You could say they “spilled the tea”.
This reminds me of how I showed a friend and her company how to get databases from BLS and it’s basically all just text files with urls. “What API did you call? How did you scrape the data?”
Nah man, it’s just… there. As government data should be. They called it a hack.
When getting data legitimately is beyond them…
ah yes, the forbidden curl hack
I remember when a senior developer where I worked got tired of connecting to the servers to check their configuration, so they added a public-facing REST endpoint that just dumped the entire active config, including credentials and secrets
That was a smaller slip-up than exposing a database like that (he just forgot that the config contained secrets) but still funny that it happened
That’s not a “senior developer.” That’s a developer that has just been around for too long.
Secrets shouldn’t be in configurations, and developers shouldn’t be mucking around in production, nor with production data.
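A minimal sketch of both halves of that, assuming the config is a nested dict: mask anything that looks like a secret before a debug dump, and read the real secret from the environment instead. The marker list and the `DB_PASSWORD` variable name are assumptions for the example.

```python
import os

# Hypothetical key-name fragments that mark a value as secret.
SECRET_MARKERS = ("password", "secret", "token", "key")


def redacted(config: dict) -> dict:
    """Copy a config dict, masking any value whose key looks secret."""
    safe = {}
    for key, value in config.items():
        if isinstance(value, dict):
            safe[key] = redacted(value)
        elif any(marker in key.lower() for marker in SECRET_MARKERS):
            safe[key] = "***"
        else:
            safe[key] = value
    return safe


def db_password() -> str:
    """Read the secret from the environment instead of the config file."""
    return os.environ["DB_PASSWORD"]  # hypothetical variable name
```

Name-based masking is a backstop, not a fix: it misses secrets stored under innocent-looking keys, which is why keeping them out of the config in the first place matters more.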
Yeah the whole config thing in that project was an eldritch horror of a legacy, too ingrained in both the services and tooling to be modified without massive rewrites
What is the Tea hack?
An app called Tea™ was marketed as a safe space for women and used government-issued IDs as a way to verify users.
4Chan users leaked all of the IDs onto the larger internet.
So it essentially became a honey trap, either through malice or sheer incompetence.
Well, I get what you mean, but the “honey trap” idiom in English, also called a “honeypot scheme”, usually refers to utilizing romantic connections to influence people to make decisions or release confidential information.
Honeypot is a common term in computing/cybersecurity, setting up fake important servers so bad actors invade and the security team can analyze what got in and how to deal with it.
Well, it doesn’t surprise me that the IT team doesn’t know about sexual terminology, tbh.
They’re all over master-slave, tho 😏
I always get irrationally angry when I see Python code using os.path instead of pathlib. What is this, the nineties?
What big advantages does pathlib provide? os.path works just fine
- Everything is in one library which offers consistency for all operations.
- You can use forward slashes on Windows paths, which makes for much better readability.
- You can access all the parts of a pathlib object with attributes like .stem, .suffix or .parent.
- You can easily find the differences between paths with .relative_to()
- You can easily build up complex paths with the / operator (no string additions).
Just off the top of my head.
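A quick sketch of a few of those points. The path is made up, and `PurePosixPath` is used so the results are the same on any OS (normally you would just use `Path`).

```python
import os.path
from pathlib import PurePosixPath

# Hypothetical path, built with the / operator instead of string addition.
report = PurePosixPath("/srv/data") / "2024" / "report.final.csv"

print(report.stem)    # report.final
print(report.suffix)  # .csv
print(report.parent)  # /srv/data/2024
print(report.relative_to("/srv/data"))  # 2024/report.final.csv

# Roughly the same thing with os.path: plain strings in, plain strings out.
legacy = os.path.join("/srv/data", "2024", "report.final.csv")
print(os.path.splitext(os.path.basename(legacy)))
print(os.path.dirname(legacy))
```

Nothing here is impossible with os.path, it just takes a different function call for each piece instead of an attribute on one object.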
I suppose os.path is simpler? It’s just a string and an operation.
Python is all about ‘attention efficiency,’ which there’s something to be said for. People taking the path of least resistance (instead of eating time learning the more complex/OOP pathlib) to bang out their script where they just need to move a file or something makes sense. I’m with you here, but it makes sense.
…Also, os.path has much better Google SEO, heh.
Make a PR
Not a big fan of the wording here. Plenty of skilled programmers make dumb mistakes. There should always be systems in place to ensure these dumb mistakes don’t make it to production. Especially when related to sensitive information. Where was the threat model and the system in place to enforce it? The idea that these problems are caused by “shit programmers” misses the real issue: there was either no system or an insufficient system to test features and define security requirements.
I can tell you exactly what happened. “Hey Claude, I need to configure and setup a DB with Firebase to store images from our application.” and then promptly hit shift+tab and then went to go browse Reddit.
Nothing was tested. Nothing was verified. They let the AI do its thing and checked in on it after an hour or so. Once it was done, it was add all, commit -m “done”, push origin master. AI doesn’t implement security stuff. There was zero security here.
Shift + tab?
basically “autopilot” for claude code.
Ah, thanks.