The fact of the matter is that I've never been unhappy about overcollecting data. Worst case, step 1 of my pipeline runs 10x or 50x slower than it needs to because it has to filter out a bunch of junk. The added latency to my workflow might be a few minutes.

Every time I've undercollected, I've been unhappy, and that has hardly been a rare occurrence. I have to build the collector, deploy it, and wait for data to flow in. Added latency: one week, minimum.

You can always throw useless stale data away. You can never retroactively collect data you needed.
