Hacker News

Well, if location data is considered part of this "metadata", then I don't see how anyone could argue against the dangers of this.

When it comes to wide-scale tracking, I consider my physical location in the real world way more private than what I write or say.

For instance, I hardly ever let my browser determine my location and send it to some site. It's none of their business where I am, and if I want the local weather they can have the name of the city I'm in.

But I was hoping this article would be about another type of "metadata", one that is way more dangerous because it is way more information-rich: social graphs and contact lists. The problem is that humans underestimate the depth of this kind of data, because we're not really well-equipped to reason about it.

If you have a table that consists of (time, location) records, it's pretty easy to envision what sort of information could be extracted from this data. Add a few more fields and it becomes harder. Maybe you need some creativity and statistics, but it's all basic detective work.
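To make the "basic detective work" concrete, here is a minimal sketch in Python. The records, the cell labels and the hour ranges for "night" and "office hours" are all made up for illustration; the only point is that a plain frequency count over a (time, location) table already suggests where someone sleeps and where they work.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, location label) records for one person.
records = [
    ("2013-06-10 02:15", "cell_A"),
    ("2013-06-10 10:30", "cell_B"),
    ("2013-06-10 14:00", "cell_B"),
    ("2013-06-10 23:40", "cell_A"),
    ("2013-06-11 03:05", "cell_A"),
    ("2013-06-11 11:20", "cell_B"),
    ("2013-06-11 19:45", "cell_C"),
]

def most_common_location(records, hours):
    """Return the location seen most often during the given hours of the day."""
    counts = Counter(
        location
        for timestamp, location in records
        if datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour in hours
    )
    return counts.most_common(1)[0][0] if counts else None

# Assumed hour ranges; a real analysis would have to tune these.
night_hours = set(range(0, 6)) | {22, 23}
office_hours = set(range(9, 18))

likely_home = most_common_location(records, night_hours)
likely_work = most_common_location(records, office_hours)
print("likely home:", likely_home)
print("likely work:", likely_work)
```

With a few more fields (who else was near the same cell, on which weekdays) the same kind of counting gets you a lot further, which is what I mean by detective work.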

A free-form directed graph (such as a social graph or a collection of contact lists) doesn't look like a table at all (well, you can represent it as a table, but that won't make you much wiser). It's in fact a very high-dimensional object.

The older generation out here may remember when they first encountered the WWW, when you could only navigate it by clicking links. I remember getting this sense of vastness, perhaps even helplessness. They don't call it hypertext for nothing. The sense of vastness comes from the fact that clicking and navigating those links feels like moving through a space. Except this space is in some sense "larger" than our usual 3D space. Every door (link) can open into every room, regardless of whether that would be possible in a physical space.

This is why those "graph of (part of) the Internet" pictures you sometimes see are almost always a tangled clutter of strings, usually vaguely ball-shaped: there is no sensible representation of this type of inter-connected data. You can't make a hierarchy or a map, at least not in the general case (and the general case is the thing you want to reason about; most of those graphs are exponential small-world graphs, highly inter-connected).

Same thing for social / contact list graphs. Except they usually don't have web-rings or directories (you can sometimes make them, like FB does, but they aren't generally available; again, think of the general case).

So okay, we're not really good at keeping large graph networks of "friends of friends of friends" and other relationships in our heads and reasoning about them. We're really not. Whatever you think you can reason about those graphs is just scratching the surface.

Computers, however, and Big Data Machine Learning algorithms in particular, have no problems at all with this type of data. An algorithm never lived in a 3D space. It doesn't need a dataset to make sense as a physical configuration of nodes in order to navigate it and extract information from it.

Another important distinction: people tend to think of these social graphs as labeled nodes with edges between them. Which is correct, in a sense. But it gives the impression that the labels are more important than they actually are. This may sound weird, but in the building/room analogy, if you have millions of rooms and every room is directly connected to 50-200 other rooms, somehow the shape of the paths between the nodes and the way they are connected becomes a vastly more information-rich data source than the actual values of the labels on the nodes themselves.

They don't need your name or your photo. The local shape of your social graph is a highly unique fingerprint of whoever you are.
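A toy sketch of what "local shape as a fingerprint" could mean, in Python. Everything here is an assumption for illustration: the graph is invented, and the fingerprint (the sorted degrees of your neighbours plus the number of links among them) is just one crude choice I picked, not a description of what any real system computes. The point is only that it never looks at a label, so it comes out the same after every name is replaced by an opaque id.

```python
from itertools import combinations

def build(edges):
    """Build an undirected graph as a dict: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def fingerprint(graph, node):
    """Label-free summary of a node's neighbourhood.

    Returns (sorted degrees of the neighbours, number of edges among them).
    """
    neighbours = graph[node]
    neighbour_degrees = tuple(sorted(len(graph[n]) for n in neighbours))
    links_among_neighbours = sum(
        1 for a, b in combinations(neighbours, 2) if b in graph[a]
    )
    return neighbour_degrees, links_among_neighbours

# Hypothetical toy network.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("dave", "erin"), ("erin", "frank"),
    ("carol", "erin"),
]
graph = build(edges)
prints = {node: fingerprint(graph, node) for node in graph}

# The same network with every name replaced by an opaque id.
alias = {name: f"user{i}" for i, name in enumerate(sorted(graph))}
anonymous = build([(alias[a], alias[b]) for a, b in edges])
anonymous_prints = {node: fingerprint(anonymous, node) for node in anonymous}

# For each anonymous node, list the named nodes with the same fingerprint.
matches = {
    anon: [name for name, fp in prints.items() if fp == anon_fp]
    for anon, anon_fp in anonymous_prints.items()
}
for anon in sorted(matches):
    print(anon, "->", matches[anon])
```

In this six-node toy every fingerprint happens to be distinct, so each anonymous node maps back to exactly one name. On a real network a summary this crude would collide a lot, and a second network is never an exact copy of the first, so you would need richer features and approximate matching. The intuition is the same though.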

And you can delete Facebook, but on the next social network you sign up for (or in any of the other social graphs you're generating: email/IM contact lists, etc.), this fingerprint will echo, and in many cases it will be similar enough to clearly indicate this is the exact same person. No names necessary. (This may be a bit harder if you keep a strictly separate business persona and social persona, but even then there are still some unexpected artifacts for an ML algorithm to pick up.) If you're not on a network at all, your presence can be extrapolated from the "hole" you left in the graph (all your friends are there, with their particular local graph shapes, but one node is missing). That is, even if you have nothing to hide, you will be leaking info about those who do.
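The simplest version of the "hole" I can sketch in Python goes through contact lists. This assumes members upload contact lists in which non-members appear under some stable identifier (a phone number or a hashed email, say); the names and lists below are invented. Anyone who shows up in other people's lists without having an account of their own gets a reconstructed neighbourhood anyway.

```python
from collections import defaultdict

# Hypothetical contact lists uploaded by members: member -> their contacts.
uploaded_contacts = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "zed"},
    "carol": {"alice", "bob", "zed"},
    "dave": {"alice", "zed"},
}

def shadow_profiles(uploaded_contacts):
    """Map each non-member to the set of members who list them as a contact."""
    members = set(uploaded_contacts)
    profiles = defaultdict(set)
    for owner, contacts in uploaded_contacts.items():
        for contact in contacts - members:  # listed by someone, but no account
            profiles[contact].add(owner)
    return dict(profiles)

profiles = shadow_profiles(uploaded_contacts)
for person, known_by in profiles.items():
    print(person, "is missing, but is known by", sorted(known_by))
```

Here "zed" never signed up, yet three members point at the same absent node. Inferring a hole purely from graph shape, without any shared identifier, is a harder statistical problem that this sketch does not attempt.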



Thanks for this. It is a highly informative comment, especially regarding big data algos.

Extending the 'hole' analogy, do you think the watchers / algorithms could still make a reasonable extrapolation about you if your group of closest acquaintances all decided to disappear from the network?

Perhaps even this more extreme measure would be fruitless as each of your friends has a fingerprint that they 'remove' from their respective unique graphs. Your group's disappearance would be a larger void, but each member's tendrils would carve out unique telltale gaps.


> Well, if location data is considered part of this "metadata", then I don't see how anyone could argue against the dangers of this

I remember a "scandal" in my country's Parliament in the early 2000s (2002 or 2003), when one of the local mobile carriers decided to display the names of the GSM cell towers on the mobile phones' small screens (next to the "battery still left" icon). Some of the MPs found that way too intrusive, but nobody cared, because MPs are seen as corrupt by definition. The mobile company ended up not displaying the info anymore (while still collecting it, of course), and everything was fine.

There was also that other thing that happened at the same company (one of the 3 largest global brands in the industry) a couple of years later: one of its office employees (a lady) was jealous of her boyfriend and asked some guys "in the IT department" whether there was a way for them to check said boyfriend's messages and calls, all "as a small favor from colleague to colleague". Of course there was a way. I can't remember if the boyfriend was cheating or not.


1. To communicate, Paula Broadwell and David Petraeus shared an anonymous email account

2. Instead of sending emails, both would log in to the account, edit and save drafts

3. Broadwell logged in from various hotels' public Wi-Fi, leaving a trail of metadata that included times and locations

4. The FBI cross-referenced hotel guests with login times and locations, leading to the identification of Broadwell

http://www.guardian.co.uk/technology/interactive/2013/jun/12...
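The cross-referencing in step 4 boils down to a set intersection. A minimal sketch in Python, with entirely invented data (the dates, hotel names and guest ids are placeholders and have nothing to do with the actual case): for every login, take the guests who were checked in at that hotel on that day, then keep only the people who appear every time.

```python
from datetime import date

# Hypothetical login metadata: (day, hotel whose Wi-Fi the login came from).
logins = [
    (date(2012, 5, 3), "Hotel A"),
    (date(2012, 6, 14), "Hotel B"),
    (date(2012, 7, 22), "Hotel C"),
]

# Hypothetical guest registers: hotel -> [(guest, check_in, check_out)].
guest_registers = {
    "Hotel A": [
        ("guest_1", date(2012, 5, 2), date(2012, 5, 4)),
        ("guest_2", date(2012, 5, 1), date(2012, 5, 5)),
        ("guest_3", date(2012, 5, 3), date(2012, 5, 3)),
    ],
    "Hotel B": [
        ("guest_1", date(2012, 6, 13), date(2012, 6, 15)),
        ("guest_2", date(2012, 6, 20), date(2012, 6, 22)),
        ("guest_4", date(2012, 6, 14), date(2012, 6, 15)),
    ],
    "Hotel C": [
        ("guest_1", date(2012, 7, 21), date(2012, 7, 23)),
        ("guest_4", date(2012, 7, 22), date(2012, 7, 23)),
    ],
}

def candidates(logins, registers):
    """Return the guests who were checked in at the right hotel for every login."""
    remaining = None
    for day, hotel in logins:
        present = {
            guest
            for guest, check_in, check_out in registers.get(hotel, [])
            if check_in <= day <= check_out
        }
        remaining = present if remaining is None else remaining & present
    return remaining or set()

suspects = candidates(logins, guest_registers)
print(suspects)
```

Each single login matches several guests, but the intersection shrinks fast. In this toy data three logins are enough to leave one name.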


Didn't the 9/11 hijackers use this same technique (sharing an email account and communicating via drafts)? It sounds very familiar.


If you heard about it, you can bet he was. Nevertheless, power and abuse go hand-in-hand. I don't know what it is about human nature that causes us to give those in power the benefit of the doubt. Hell, in America at least, people knew 250 years ago that power begot abuse, and wrote "release valves" into the Constitution to prevent that abuse from becoming overwhelming. I wonder why they didn't think people would become overly apathetic in the meantime.


> "I wonder why they didn't think people would become overly apathetic in the meantime."

The Founding Fathers were worried about this; they just didn't know of any systematic way to prevent it. I'm not sure there is one.


Did they ever write any essays or letters on why they didn't make voting compulsory? Was it a feeling that such compulsion impinged on freedoms, or that it wouldn't help fix the problem of apathy? Or did they just think it would be absurd if people voluntarily turned down their chance to pick their representation in government?



