I do put a lot of value on privacy, but I actually don't really find that problematic. I don't have the expectation to be able to just erase everything I wrote online, I assume it is public and persistent. So for things where I want to preserve private aspects I have to take that into account from the start.
There are certainly individual cases where some post might turn out to be problematic later, but that's more of an exception and can be handled by manual intervention.
I do find it annoying in some places how users can just erase their posts without a reason. Because they don't just erase their own content, they take the responses to it with them or at least make them less understandable. And those people responding did put some effort into it.
Actually, this is the motivation __to__ nuke. It's what I mean about the environment changing. HN is one of the big places that get scraped, and everyone is scraping all the data they can these days. It's very reasonable to believe you can be de-anonymized much more easily and that it's only going to get easier. It's not unreasonable to think that someone can mimic my speech, and this creates a new vulnerability. I am thinking much more about self-censorship now and being more measured than I have been in the past.
I do agree with you that self-censorship is a growing problem. But I'm not sure how to solve this when we're entering a world where the language you use becomes a fingerprint. I appreciate the records because there is a lot of valuable information here, but at the same time these records make us vulnerable in a way we haven't been before. And it's not just a matter of worrying about nation states being able to do this: we're posting these records to Hugging Face. The processing is just getting easier and that's about all that's needed now.
It is a hard problem to solve. Switching usernames doesn't fix the issue but could add more noise. Even disappearing messages only add noise, though more than switching usernames does. They're probably pretty helpful because I don't think people are scraping every day, but that could change.
Idk, if you have thoughts I'd love to hear them. I've been thinking of writing mine down and posting to HN, but I'm unsure, and it feels like one of those things that sounds conspiratorial until years later people say "of course this was coming" lol
I do think people are scraping the site daily, if not hourly, and rate limiting is usually trivial to overcome. But I suspect the implementation is not standard.
Scraping doesn’t even have to be hourly. Scraping a thread or account once, scraping it again weeks, months, or years later, and then `diff`ing the two snapshots would easily reveal any interventions.
Which would actually expose users who you should retroactively investigate. Canaries.
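To make the `diff` idea concrete, here's a minimal sketch. It assumes each scrape has already been reduced to a dict mapping comment id to comment text; the function name, the ids, and the sample text are all made up for illustration, and nothing here touches a real site or API.

```python
def diff_snapshots(old, new):
    """Compare two scrapes of the same thread, each a {comment_id: text} dict."""
    old_ids, new_ids = set(old), set(new)
    return {
        # Present in the earlier scrape, gone in the later one.
        "deleted": sorted(old_ids - new_ids),
        # Only present in the later scrape.
        "added": sorted(new_ids - old_ids),
        # Present in both, but the text changed in between.
        "edited": sorted(cid for cid in old_ids & new_ids if old[cid] != new[cid]),
    }


# Hypothetical snapshots of one thread taken weeks apart.
earlier = {101: "first comment", 102: "a reply", 103: "another reply"}
later = {101: "first comment", 103: "another reply, reworded", 104: "late reply"}

report = diff_snapshots(earlier, later)
print(report)
```

Two snapshots are enough to flag what was removed or reworded in between, which is why the scraping frequency barely matters.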
I presume you are aware of “fuzzed” users. With unassociated comments, submissions, and possibly even changing names. I only recently discovered this myself after a little digging, but you may be able to do this and continue posting.
I’ve been hesitant to mention this anywhere, as discussing how you would red team a site you like publicly might not be the best for the site.
There are other clever possibilities, such as using some combination of artificial & organic content to wash users' identities. Like banned or shadowbanned accounts mixed with real users. An acid bath, if you will. If you dissolved 100 people into each other, who can really say who is who?
Yes, I agree that disappearing messages is not a strong action.
I don't think changing usernames is a strong action either. This is because the fingerprint is your language, not the name you're associated with. See the Enron dataset and the project in Ng's intro AI course, but we'd need to scale that quite a bit (which is very hard).
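As a toy illustration of why the name doesn't matter, here's a rough sketch that compares two texts purely by their character trigram frequencies. This is an assumed, simplistic stand-in for stylometry, not the method from the course project or anything run on the Enron data; the function names are hypothetical, and a real attack would need far more text and better features.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usernames never enter the comparison; only the text does.
known = ngram_profile("I do think people are scraping the site daily, if not hourly.")
unknown = ngram_profile("I do think rate limiting is usually trivial to overcome.")
print(cosine_similarity(known, unknown))
```

The score is only meaningful relative to scores against other candidate authors, and on short samples like these it is mostly noise, which is part of why scaling this up is hard.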
I'm not aware of the fuzzed users. But I think you should mention it in the open if they are problematic.
I have thought of using AI to rewrite my language, but that's difficult and it's only fuzzing too.
> this can drive self-censorship. Which is a growing problem at large.
I suppose it depends on one's perspective. We used to call self-censorship "decorum."
The nature of human beings didn't change when we went online, and sometimes it behooves a person to just read the room before speaking. Because of the karma system, the upvote/downvote system, and the user-curated flagging system, I would classify Hacker News as a "decorum-friendly" message board where a certain amount of pre-filtering is not only necessary but highly encouraged.