All kinds of big data contests use large datasets that have been stripped of any data which might lead back to the people who filled it in.
However, when the data comes from location-based services, this anonymization offers less protection than you might expect.
In March 2012 the bicycle rental company Hubway launched a data visualization challenge to show how its half a million bike rides were spread out. The data included the gender, zip code and year of birth of the annual renters.
Combining these three pieces of data with the power of the internet led to the identification of many of the bike renters by Harvard professor Latanya Sweeney. As part of the Data Privacy Lab, she looked into the possibilities of these datasets and set up the website AboutMyRide.com to show the results for this particular challenge.
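To make concrete why three harmless-looking fields can be identifying, here is a minimal Python sketch. It is not the method used by the Data Privacy Lab; the records are made up for illustration and the function name `unique_combinations` is hypothetical. It simply counts how many renters share each combination of gender, zip code and year of birth, and lists the combinations held by exactly one renter.

```python
from collections import Counter

# Made-up renter records for illustration only:
# (gender, zip code, year of birth). Not real Hubway data.
renters = [
    ("F", "02139", 1985),
    ("M", "02139", 1985),
    ("M", "02139", 1985),
    ("F", "02115", 1972),
    ("M", "02144", 1990),
]


def unique_combinations(records):
    """Return the field combinations that occur exactly once in records."""
    counts = Counter(records)
    return [combo for combo, n in counts.items() if n == 1]


singled_out = unique_combinations(renters)
print(len(singled_out), "combinations belong to a single renter")
```

Any combination that appears only once describes a single person in the dataset. If that same combination can be looked up in another source that carries names, the "anonymous" record is no longer anonymous.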
Looking at the website, it is quite shocking to see how much can be deduced from this anonymized data set. This also raises many other questions, most of all about people's privacy and its limits: where do we say no? Are we open about everything, or do we need to halt and rethink how personal data is shared?
I personally believe that it was not necessary to share all of this data for this data set, especially since it turns out to be so easy to find the real person behind it. It would be better to use age categories (e.g. 15-25, 26-35, etc.) and only the digits of the zip code, without the letters (so 1000 instead of 1000AB in the Dutch system).
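The coarsening suggested above could look roughly like this in Python. This is a sketch under stated assumptions: the helper names (`age_band`, `coarse_zip`, `generalize`) are hypothetical, the age is approximated as a reference year minus the year of birth, and the reference year of 2012 is my own choice for the example.

```python
def age_band(birth_year, reference_year=2012):
    """Map a year of birth to a band such as '15-25' or '26-35'.

    Assumption: age is approximated as reference_year - birth_year.
    """
    age = reference_year - birth_year
    if age < 15:
        return "under 15"
    if age <= 25:
        return "15-25"
    # Bands of ten years from 26 upwards: 26-35, 36-45, ...
    lower = 26 + ((age - 26) // 10) * 10
    return f"{lower}-{lower + 9}"


def coarse_zip(zip_code):
    """Keep only the digits of a zip code, e.g. '1000AB' becomes '1000'."""
    return "".join(ch for ch in zip_code if ch.isdigit())


def generalize(record):
    """Replace the exact zip code and year of birth with coarser values."""
    gender, zip_code, birth_year = record
    return (gender, coarse_zip(zip_code), age_band(birth_year))


print(generalize(("F", "1000AB", 1985)))
```

After this step more people share the same combination of values, which makes it harder to single out one person, while the data still shows roughly who rides where.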
For every data contest this is something to consider: what do we share, and can we share everything we want to? Only if we keep thinking about it can we have this discussion, and maybe come to a decision where enough data is shared while privacy is kept at an acceptable level.
This will of course depend on the challenge, but put it on your checklist when setting up a big data challenge!
Header image: g4ll4is on Flickr under Creative Commons License