The first one doesn't even need much historical data. Unless you have some very unoptimized periodic jobs, the last few days of data is plenty.

The second can be done simply on something like Dynamo, CosmosDB, or your cloud-hosted NoSQL of choice. Heck, it can even be done on Aurora or vanilla Postgres + partitioning if it's <64TB.
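
For the Postgres + partitioning route, here's a minimal sketch of time-based declarative partitioning (the connection string, table, and column names are placeholders, not anything from a real system):

    import psycopg2  # assumes psycopg2 is installed and a Postgres instance is reachable

    conn = psycopg2.connect("dbname=app")  # placeholder connection string
    cur = conn.cursor()

    # Parent table partitioned by time; old partitions can be dropped as data ages out
    cur.execute("""
        CREATE TABLE events (
            user_id     bigint      NOT NULL,
            occurred_at timestamptz NOT NULL,
            payload     jsonb
        ) PARTITION BY RANGE (occurred_at);
    """)

    # One partition per month; each new month gets its own child table
    cur.execute("""
        CREATE TABLE events_2024_06 PARTITION OF events
            FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
    """)

    conn.commit()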

The third can be done with any off-the-shelf cloud data warehouse software, at multi-petabyte scale. And even then, I'm sorry, but I just don't believe you that product clicks over some large timeframe are historically relevant if your software and UI change often.

All of the things mentioned have had extremely simple, boring solutions at petabyte scale for >10 years, and in some cases considerably longer. If you add a batch workflow manager and a streaming engine like Spark, that's 3-4 technologies total to cover all these cases (and many more!)
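
To make the Spark part concrete, a minimal Structured Streaming sketch (broker address and topic name are placeholders; assumes the spark-sql-kafka package is on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("click_counts").getOrCreate()

    # Read click events as they arrive from Kafka
    clicks = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "clicks")
        .load())

    # Tumbling 5-minute counts; swap in whatever aggregation you actually need
    counts = clicks.groupBy(window(col("timestamp"), "5 minutes")).count()

    # Console sink for the sketch; in practice this would land in a warehouse table
    query = (counts.writeStream
        .outputMode("complete")
        .format("console")
        .start())
    query.awaitTermination()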
