
There's a difference between being indecisive about what data or questions you care about now and being unsure about which data/questions you will care about in the future. If your data needs might change in the future, then there is an argument to be made in favor of saving data that has no current apparent value, and this must be weighed along with everything else when deciding what data to keep. Sometimes data can suggest new questions, and sometimes it is worth collecting data purely in the hope that it will generate new questions.

As an example from my research area, the human genome was not sequenced to answer any one specific biological question; it was sequenced because without it, we would not even be able to ask the kind of questions we wanted to, much less answer them.

Of course, that's a research context. In a business context, especially in a well-established industry, the types of data that you need are likely to be well-understood and exploratory analysis is probably a lot less important.



Genomics is an area where data is thrown away all the time. The images that come from Illumina sequencing machines usually get processed once and then discarded.


We throw away the images because we're quite sure at this point that we're extracting all the useful information (the DNA sequences) that we can from them. This was not true in the early days of Illumina sequencing, when it was not uncommon to save the images and run an alternative base caller on them to try to get improved sequences when the standard base caller failed.



