This is so naive. The ToS permits paraphrasing user conversations, simply by not excluding it, and then training on THAT. You'd never be able to definitively connect paraphrased data to yours, especially if they only train on paraphrased data that covers frequent, as opposed to rare, topics.
“Hey DoctorPangloss, how can we train on user data without training on user data?”
“You can use an LLM to paraphrase the incoming requests and save that. Never save the verbatim request. If they ask for all the request data we have, we tell them the truth: we don't have it. If they ask for paraphrased data, we'd have no way of correlating it to their requests.”
“And what would you say, is this a 3 or a 5 or…”
Everything obvious happens. Look closely at the PII management agreements. Btw, OpenAI won't even sign them because they're not sure whether paraphrasing “counts.” Google will.
"We will train new models using data from Free, Pro, and Max accounts when this setting is on (including when you use Claude Code from these accounts)."