Smart TVs take snapshots of what you watch multiple times per second

melroy@kbin.melroy.org · 2 years ago

Aceticon@lemmy.world · 2 years ago

I was curious enough to check, and with 2KB of SRAM that thing doesn’t have anywhere near enough memory to process a 320x200 RGB image, let alone 1080p or 4K.
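
The arithmetic is easy to check. This sketch assumes 3 bytes per pixel (24-bit RGB) and reads "2KB" as 2048 bytes. Under those assumptions, even the 320x200 frame needs 192,000 bytes, roughly 94 times the SRAM.

```python
# Raw framebuffer sizes at 3 bytes per pixel (24-bit RGB), compared with
# 2 KB of SRAM (assumed here to mean 2048 bytes).
SRAM_BYTES = 2 * 1024

RESOLUTIONS = {
    "320x200": (320, 200),
    "1080p": (1920, 1080),
    "4K UHD": (3840, 2160),
}


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def report() -> dict[str, tuple[int, float]]:
    """Map each resolution to (frame size in bytes, multiples of the SRAM size)."""
    return {
        name: (frame_bytes(w, h), frame_bytes(w, h) / SRAM_BYTES)
        for name, (w, h) in RESOLUTIONS.items()
    }


if __name__ == "__main__":
    for name, (size, ratio) in report().items():
        print(f"{name}: {size:,} bytes, about {ratio:,.0f}x the SRAM")
```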

Further, you definitely don’t want to send 2 images per second to a server uncompressed (even 1080p RGB, using an encoding that gives up a bit of color fidelity to fit each pixel in just two bytes, adds up to about 4MB per image uncompressed), so it’s either using something with hardware compression or it’s spending processing cycles on compression.
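
One common two-bytes-per-pixel format is RGB565 (5 bits red, 6 green, 5 blue). The comment doesn't name a format, so that choice is an assumption. The sketch below shows how such packing loses the low bits of each channel, and works out the raw upload for 1080p at 2 frames per second. That comes to about 4.1 MB per frame, or roughly 66 Mbit/s, before any protocol overhead.

```python
# Sketch: RGB565 packing (an assumed 2-byte format) and the raw bandwidth
# of sending 1080p frames uncompressed at 2 frames per second.


def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into 16 bits, dropping the low bits of each channel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(value: int) -> tuple[int, int, int]:
    """Expand back to 8-bit channels; the dropped low bits come back as zeros."""
    r = (value >> 11) & 0x1F
    g = (value >> 5) & 0x3F
    b = value & 0x1F
    return r << 3, g << 2, b << 3


def raw_upload(width: int, height: int, bytes_per_pixel: int, fps: int):
    """Return (bytes per frame, bytes per second) with no compression."""
    per_frame = width * height * bytes_per_pixel
    return per_frame, per_frame * fps


if __name__ == "__main__":
    per_frame, per_second = raw_upload(1920, 1080, 2, 2)
    print(f"{per_frame:,} bytes/frame, {per_second * 8 / 1e6:.1f} Mbit/s")
```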

My expectation is that it’s not the snapshotting itself that would eat CPU cycles, it’s the compression.

That said, I think you make a good point, just with the wrong example. I would’ve gone with this: a device capable of decoding video at 50 fps (i.e. one frame every 20ms), even if it’s actually using hardware video decoding, can probably handle compressing two frames per second and sending them over the network. Performance might suffer, though, if they’re using a chip without hardware compression support together with complex compression methods like JPEG instead of something simpler like LZW.
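
LZW is the "something simpler" here. It builds a dictionary of byte strings as it goes and needs no transforms or quantization. Below is a minimal textbook LZW encoder/decoder pair, offered only as an illustration. Nothing in the thread says which compressor, if any, a TV actually uses. The codes are returned as plain integers rather than bit-packed, as a real implementation would do. Flat, repetitive pixel data (like a uniform scanline) collapses to far fewer codes than input bytes, while noisy image data gains much less. That gap is part of why JPEG-class methods exist.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: return a list of integer codes for the input bytes."""
    # Start with one dictionary entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while it is already known.
            current = candidate
        else:
            # Emit the longest known prefix and learn the new string.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Inverse of lzw_encode, rebuilding the same dictionary as it goes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    # A flat grey 1080p scanline (one byte per pixel) compresses well.
    row = bytes([128]) * 1920
    print(len(row), "bytes ->", len(lzw_encode(row)), "codes")
```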

Magnergy@lemmy.world · 2 years ago

Why think of it as a compression problem? Isn’t the spy device already getting compressed video from some source? That makes it a filtering problem. You would set it to grab and ship key frames (or the equivalent term) if you wanted a human to be able to see the intel. But for content matching, maybe count some interval of key frames and then grab the smallest difference frame between the next two key frames. That gives you a nice, premade, small chunk of data. A few of those in sequence start to look like a hash function (on a dark, foggy night).
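
Here is a sketch of that filtering idea. It assumes the device can see each compressed frame's type (key frame "I" vs difference frame "P") and its size without decoding anything. The `Frame` record, the `interval` parameter, and using frame size as the fingerprint value are all hypothetical choices for illustration. Every `interval` key frames, it picks the smallest difference frame between that key frame and the following one.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # "I" for a key frame, "P" for a difference frame (hypothetical model)
    size: int  # compressed size in bytes


def pick_smallest_deltas(frames: list[Frame], interval: int) -> list[int]:
    """Every `interval` key frames, return the index of the smallest
    difference frame between that key frame and the next one."""
    picks: list[int] = []
    key_count = 0
    i = 0
    while i < len(frames):
        if frames[i].kind == "I":
            key_count += 1
            if key_count % interval == 0:
                # Scan the difference frames up to the next key frame.
                j = i + 1
                best = None
                while j < len(frames) and frames[j].kind != "I":
                    if best is None or frames[j].size < frames[best].size:
                        best = j
                    j += 1
                if best is not None:
                    picks.append(best)
                i = j  # resume at the next key frame (or the end)
                continue
        i += 1
    return picks


def fingerprint(frames: list[Frame], interval: int) -> tuple[int, ...]:
    """Sizes of the picked frames, in order, used as a crude signature."""
    return tuple(frames[k].size for k in pick_smallest_deltas(frames, interval))
```

Two devices fed the same stream with the same `interval` would produce the same tuple, which is what makes it usable as a lookup key. The catch is that both sides must start counting at the same key frame, which is the syncing problem raised next.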

You would want some way to sync up the frames that the spy device grabs with the ones grabbed when building the database to match against. Maybe resetting the key frame interval counter whenever some set of simple frames comes through would be enough, like anything with a uniform color across the whole image or something similar.
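
One way to read the syncing suggestion is to treat a near-uniform frame (say, a black frame between scenes) as a reset point and count frames from there. Then both the device and the database builder sample at the same offsets. This sketch assumes frames arrive as flat lists of grey values. The `tolerance` and `every` parameters are hypothetical, and real frames would be RGB and far larger.

```python
from typing import Optional


def is_uniform(frame: list[int], tolerance: int = 4) -> bool:
    """True if every pixel is within `tolerance` of the first one."""
    first = frame[0]
    return all(abs(p - first) <= tolerance for p in frame)


def sync_offsets(frames: list[list[int]], tolerance: int = 4) -> list[Optional[int]]:
    """For each frame, the number of frames since the last uniform frame,
    or None if no uniform frame has been seen yet."""
    offsets: list[Optional[int]] = []
    counter: Optional[int] = None
    for frame in frames:
        if is_uniform(frame, tolerance):
            counter = 0  # reset the interval counter on a uniform frame
        elif counter is not None:
            counter += 1
        offsets.append(counter)
    return offsets


def sample_indices(offsets: list[Optional[int]], every: int) -> list[int]:
    """Indices of frames that sit a positive multiple of `every` after a reset."""
    return [i for i, off in enumerate(offsets)
            if off is not None and off > 0 and off % every == 0]
```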

Just spitballing here. I like your impulse to math this.

Aceticon@lemmy.world · 2 years ago

We’re talking about fingerprinting stuff coming in via HDMI, not stuff being played by the “smart” part of the TV itself from some source.

You probably wouldn’t need to actually sample images if it’s the TV’s own processor that’s playing something from a source, because there are probably smarter approaches for most sources. For example, for a TV channel you probably just need to know the tuner setting, the location and the local time, and then get the data from the available program guide info (called EPG, Electronic Program Guide, if I remember correctly).
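
Here is a sketch of that EPG shortcut. Once the tuned channel and local time are known, identifying the programme is a schedule lookup rather than any image processing. Location is assumed to be folded into which schedule gets loaded. The `Schedule` structure, the channel name and the times are made up for illustration. Real EPG data arrives in broadcast-specific formats and would need parsing first.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

# Hypothetical schedule: channel -> list of (start, end, title), sorted by start.
Schedule = dict[str, list[tuple[datetime, datetime, str]]]

EXAMPLE_SCHEDULE: Schedule = {
    "ch-7": [
        (datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 1, 21, 0), "News"),
        (datetime(2024, 1, 1, 21, 0), datetime(2024, 1, 1, 23, 0), "Film"),
    ]
}


def now_showing(schedule: Schedule, channel: str, when: datetime) -> Optional[str]:
    """Return the title airing on `channel` at local time `when`, if any."""
    slots = schedule.get(channel, [])
    starts = [start for start, _, _ in slots]
    # Find the last slot that started at or before `when`.
    i = bisect_right(starts, when) - 1
    if i >= 0:
        _, end, title = slots[i]
        if when < end:
            return title
    return None


if __name__ == "__main__":
    print(now_showing(EXAMPLE_SCHEDULE, "ch-7", datetime(2024, 1, 1, 21, 30)))
```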

The problem is that anything might be coming over HDMI and it’s not compressed, so if they want to figure out what that is, it’s a much bigger problem.

Your approach does sound like it would work if the Smart TV was playing some compressed video file, though.

Mind you, I too am just “thinking out loud” rather than actually knowing what they do (or what I’m talking about ;))

Magnergy@lemmy.world · 2 years ago

I assumed HDMI had some form of encoding; thanks for the correction. Looks like v2.1 does, with Display Stream Compression.

I think the syncing idea between the spy device and the database is still useful. The video itself has features that can reduce the search space by making sure both sides pick the same instants to fingerprint and exfiltrate.

interdimensionalmeme@lemmy.ml · 2 years ago

I don’t think they will compress the screenshots and send them; more likely they run the content through a TensorFlow Lite model, or even just hash a few of the pixels to try for an ID match.
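
The "hash a few of the pixels" option could work like this. Take a fixed grid of pixel positions and quantize their values so tiny noise doesn't change the result. Then hash them and look the hash up in a table. The grid size, the quantization `step` and the `known` table are hypothetical. An exact hash like this breaks under scaling, overlays or colour shifts, which is one reason to doubt a plain hash approach.

```python
import hashlib
from typing import Optional


def sample_grid(frame: list[list[int]], rows: int = 4, cols: int = 4) -> list[int]:
    """Pick rows x cols pixels at the centres of an even grid over a 2-D grey frame."""
    h, w = len(frame), len(frame[0])
    return [
        frame[(2 * r + 1) * h // (2 * rows)][(2 * c + 1) * w // (2 * cols)]
        for r in range(rows)
        for c in range(cols)
    ]


def frame_key(frame: list[list[int]], step: int = 32) -> str:
    """Quantize the sampled pixels (0-255 -> 0-7 with step 32) and hash them."""
    quantized = bytes(v // step for v in sample_grid(frame))
    return hashlib.sha256(quantized).hexdigest()[:16]


def identify(frame: list[list[int]], known: dict[str, str]) -> Optional[str]:
    """Look the frame's key up in a hypothetical table of key -> clip ID."""
    return known.get(frame_key(frame))
```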

Aceticon@lemmy.world · 2 years ago

Well, that makes sense, but it might be even more processor-intensive unless they’re using an SoC that includes an NPU or similar.

I doubt it’s a straightforward hash, because a hash database for video that covers all manner of short clips, and has to somehow match something while missing over 90% of the frames (if the thing is indeed sampling at 2 fps, it only sees 2 frames out of every 25), would be huge.

A rough calculation: consider hashes for groups of 13 consecutive frames (so that at least one would be hit when sampling at 2 fps from a 25 fps source), storing just one block of 13 frame hashes per minute, each in a 5-byte value (large enough for about 1.1 trillion distinct values, since 2^40 ≈ 1.1 × 10^12). In 1GB that stores enough hashes for about 136k two-hour movies, counting hashes alone, so it might be feasible if the system had 2GB+ of main memory. Even then I’m not so sure the CPU would be fast enough to search it every 500ms. That said, if the hashes are sorted by value in one long array with a matching array of clip IDs, it might be doable, since there are some pretty good algorithms for that.
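
The estimate can be reproduced with a short script. It follows the comment's assumptions: a 25 fps source, 2 fps sampling, 13 five-byte hashes kept per minute, and 2-hour films. Reading "1GB" as 1 GiB gives about 137.7k films, close to the 136k figure. The parallel `hashes`/`clip_ids` arrays are the layout suggested above, and `bisect` does the binary search. With about 215 million 5-byte entries in 1 GiB, a lookup needs about 28 comparisons, which is small next to a 500ms budget. Memory is the tighter constraint, since the clip-ID array comes on top of the 1GB.

```python
import math
from bisect import bisect_left
from typing import Optional

SOURCE_FPS = 25
SAMPLE_FPS = 2
HASH_BYTES = 5
HASHES_PER_MINUTE = 13    # one block of 13 frame hashes per minute
MOVIE_MINUTES = 120
BUDGET_BYTES = 1024 ** 3  # "1GB" read as 1 GiB


def estimate() -> dict[str, float]:
    """Reproduce the back-of-the-envelope numbers from the comment."""
    window = math.ceil(SOURCE_FPS / SAMPLE_FPS)  # frames between samples, rounded up
    distinct = 2 ** (8 * HASH_BYTES)             # values a 5-byte hash can take
    per_movie = HASH_BYTES * HASHES_PER_MINUTE * MOVIE_MINUTES
    entries = BUDGET_BYTES // HASH_BYTES         # hashes that fit in the budget
    return {
        "window": window,
        "distinct": distinct,
        "per_movie": per_movie,
        "movies": BUDGET_BYTES / per_movie,
        "entries": entries,
        "comparisons": math.ceil(math.log2(entries)),  # worst-case binary search steps
    }


def lookup(hashes: list[int], clip_ids: list[int], h: int) -> Optional[int]:
    """Binary-search a sorted hash array; clip_ids[i] belongs to hashes[i]."""
    i = bisect_left(hashes, h)
    if i < len(hashes) and hashes[i] == h:
        return clip_ids[i]
    return None


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```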

interdimensionalmeme@lemmy.ml · 2 years ago

I would sample a few dozen equally spaced pixels from the frame, drop frames whose values are similar to the previous one, and send the rest with a timestamp. In the cloud, you run those few pixels through a content recognition model.
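
Here is a sketch of that client side. It samples a 6x8 grid (48 pixels), skips frames whose samples barely differ from the last record sent, and yields (timestamp, samples) records for upload. The grid size and the `threshold` on mean absolute difference are hypothetical, and the cloud-side recognition model is out of scope. Comparing against the last *sent* samples rather than the previous frame keeps a slow fade from slipping through as many tiny, individually "similar" steps.

```python
from typing import Iterable, Iterator, Optional


def grid_samples(frame: list[list[int]], rows: int = 6, cols: int = 8) -> list[int]:
    """rows x cols equally spaced pixels (cell centres) from a 2-D grey frame."""
    h, w = len(frame), len(frame[0])
    return [frame[(2 * r + 1) * h // (2 * rows)][(2 * c + 1) * w // (2 * cols)]
            for r in range(rows) for c in range(cols)]


def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average absolute difference between two equal-length sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def records(frames: Iterable[tuple[float, list[list[int]]]],
            threshold: float = 8.0) -> Iterator[tuple[float, list[int]]]:
    """Yield (timestamp, samples), dropping frames too similar to the last one sent."""
    last: Optional[list[int]] = None
    for ts, frame in frames:
        samples = grid_samples(frame)
        if last is not None and mean_abs_diff(samples, last) < threshold:
            continue  # near-duplicate of what was already sent
        last = samples
        yield ts, samples
```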

It doesn’t have to be especially accurate or know any niche content, the point is to make a psychomarketing profile of the customer like “car guy, watches tool reviews”.