The ubuntu:24.04 Docker image is only 77.30 MiB.
alpine:3.19.0 is 7.38 MiB.
Of course those sizes are without a kernel. Typical everything-included distro kernels are generally a few hundred MiB as they include drivers for everything that might be needed, but a custom build for known hardware can reduce that to just a few MiB.
A smarter system won’t just take the mean of the votes from different instances but rather discard outliers as invalid input (flagging repeat offenders to be ignored in the future) and use the median or mode of the remainder. The results should also be quantized to avoid leaking details about sources or internal algorithms; only the larger trends need to be reported.
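For illustration, here is a rough Python sketch of that kind of aggregation. Everything in it is an assumption rather than anything a real system is known to do: the function name `robust_aggregate`, the use of median absolute deviation (MAD) to decide what counts as an outlier, the cutoff of 3 MADs, the bucket size of 0.5, and the three-strikes rule for repeat offenders. It uses the median; the mode would slot in the same way.

```python
from collections import Counter
from statistics import median


def robust_aggregate(votes, ignored=frozenset(), threshold=3.0, step=0.5):
    """Aggregate numeric votes keyed by instance id (hypothetical helper).

    Returns (quantized_result, outlier_ids). The MAD-based cutoff and the
    bucket size `step` are illustrative assumptions, not tuned values.
    """
    # Skip instances that were already flagged as repeat offenders.
    considered = {k: v for k, v in votes.items() if k not in ignored}
    if not considered:
        raise ValueError("no usable votes")

    center = median(considered.values())
    mad = median(abs(v - center) for v in considered.values())

    if mad == 0:
        # More than half the votes agree exactly; treat any other value as an outlier.
        outliers = {k for k, v in considered.items() if v != center}
    else:
        outliers = {k for k, v in considered.items()
                    if abs(v - center) / mad > threshold}

    kept = [v for k, v in considered.items() if k not in outliers]
    # Quantize to the nearest multiple of `step` so only the coarse trend is reported.
    result = round(median(kept) / step) * step
    return result, outliers


# Usage: count strikes per instance and ignore anyone with 3 or more.
strikes = Counter()
votes = {"a": 4.1, "b": 4.3, "c": 3.9, "d": 4.0, "e": 50.0}
ignored = {k for k, n in strikes.items() if n >= 3}
result, outliers = robust_aggregate(votes, ignored=ignored)
strikes.update(outliers)
print(result, outliers)  # 4.0 {'e'}
```

The coarse bucket is the point of the last step: a single instance nudging its vote slightly cannot see that nudge reflected in the published number.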
Of course you could always just keep the collected data private and only provide it to customers willing to pay $$$ for access, which handily limits instance operators’ ability to reverse-engineer the source of the data. And nothing prevents you from using separate instances for public and private data sets.