Local AI question
Anon789523r:
doesn’t this make the relative safety of the AI models themselves moot? As in, I get that the AI models are just numbers and so on but if the files they are bundled with could still be executing telemetry on behalf of their providers, does that make the distinction between models and wider files unimportant in terms of their effects?
That’s not how it works. The model itself doesn’t send any telemetry. What might send telemetry is the code that sends data to the model or code that sends data from the model back to the app.
Providers of open-weights models (like Moonshot, DeepSeek, or Meta) can’t collect data here, because they don’t provide the code for these steps.
Cloud model providers (OpenAI, Anthropic, OpenRouter, nano-gpt, …), on the other hand, very much do provide the code for these steps, so they may or may not collect telemetry here.
In the case of Draw Things (or any other local app), Draw Things controls the code for these steps, so they decide what data is collected and sent to whom.
In general, the entity running the model chooses the inference framework and how data is sent to/from the model. This is typically also the entity choosing where, how, and what is collected in terms of telemetry, logs, training data, etc.
if all of this is the case, why do people so often characterise local AI as a qualitatively better approach from a privacy point of view?
Because anything that runs on your computer and wants to collect telemetry has to send that data out through your network. This gives you (or someone technical enough) the option to inspect whether data is being sent somewhere. If a provider claims they don’t collect telemetry and do everything locally, and you still see their app sending data over the network, something smells funny.
With cloud-hosted AI this method of verification is not possible, because you have to send your data to the provider to have it processed. You can’t see what happens on their servers, so they may or may not collect telemetry.
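As a rough illustration of that local check, here is a minimal Python sketch. It assumes you have already collected a list of (process, remote address) pairs with whatever tool you prefer (a packet capture or your OS’s connection listing, for example). The `observed` data and the process name are made up for the example. The code only separates connections that stay on your machine or local network from ones that leave it.

```python
import ipaddress

def is_external(address):
    """True if the address is outside loopback, private, and link-local ranges."""
    ip = ipaddress.ip_address(address)
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)

def external_connections(connections):
    """Keep only the (process, address) pairs that leave the local network."""
    return [(proc, addr) for proc, addr in connections if is_external(addr)]

# Hypothetical sample data, not output from a real app.
observed = [
    ("local-ai-app", "127.0.0.1"),     # loopback: never leaves the machine
    ("local-ai-app", "192.168.1.20"),  # private range: stays on the local network
    ("local-ai-app", "8.8.8.8"),       # public address: leaves the network
]

print(external_connections(observed))  # [('local-ai-app', '8.8.8.8')]
```

An external connection is not automatically telemetry (it could be an update check or a model download), but it is the traffic worth looking at if an app claims to work fully offline.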
or does this risk persist regardless the file type?
The (security) risk of files from the internet containing third-party malware or spyware exists regardless of file type.
The risk of providers adding telemetry/tracking into model code distributed on Hugging Face is file-type specific: some model formats can carry code that runs when the file is loaded, while others can only hold data. That said, Hugging Face is very strict about keeping this out of the models hosted on the platform, which is why you find initiatives like the one you linked (new/custom file formats) to make this requirement easier to enforce.
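To make “file-type specific” concrete, here is a small Python sketch. It assumes a model checkpoint saved with Python’s `pickle` (which some older checkpoint formats are built on) and uses JSON as a stand-in for a data-only format. `FakeWeights`, `risky_opcodes`, and the `DEMO_PICKLE_RAN` variable are hypothetical names for this demo, and the “payload” only sets an environment variable.

```python
import json
import os
import pickle
import pickletools

class FakeWeights:
    """Hypothetical stand-in for a model file saved with pickle."""
    def __reduce__(self):
        # Tells pickle to call exec(...) when the file is loaded.
        # Harmless here, but it could be any code.
        return (exec, ("import os; os.environ['DEMO_PICKLE_RAN'] = '1'",))

def risky_opcodes(data):
    """List pickle opcodes that can import or call objects, without running the pickle."""
    flagged = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}
    return sorted({op.name for op, _arg, _pos in pickletools.genops(data)
                   if op.name in flagged})

pickled = pickle.dumps(FakeWeights())

# Static scan: the file asks to look up and call something.
print(risky_opcodes(pickled))

# Loading the pickle runs the embedded call.
pickle.loads(pickled)
print(os.environ.get("DEMO_PICKLE_RAN"))  # 1

# A data-only format can only produce data when parsed.
plain = json.dumps({"layer0.weight": [0.1, 0.2]})
print(json.loads(plain))
```

The scan step is only a sketch of the general idea of checking a pickle before loading it. It does not prove that a file is safe.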