I agree. As a long-time Business Intelligence developer I'm still confused and astounded by all the tooling and bits and pieces seemingly necessary to create analytics/dashboards with open source tools.
For years I used a proprietary solution like Qlik Sense for the whole journey from data extraction to a finished dashboard (mostly on-prem). Going from raw data to a finished dashboard is a matter of days (not weeks/months) with one single tool (and maybe some scripts for supporting tasks). There is some "scripting" involved for loading and transforming data, but if you already understand data models (and maybe have some SQL experience) it is very easy. The dashboard creation itself does not need any coding at all - just drag and drop and some formulas like sum(amount).
But this is a standalone tool and it is hard to integrate it into your own piece of software. From my experience, software developers have a much more complicated view of data handling. Often this is just the complexity of their use cases, sometimes it is a lack of knowledge about data preparation for analytics use cases.
Another thing that complicates matters greatly is the focus on use cases involving cloud storage and doing all the transformations on distributed systems.
And it is often not clear what amount of data we are talking about and whether it is realtime (streaming) data or not. There is a big difference in the possible approaches if you have six hours to prepare data or if it has to be refreshed every second (or whenever new data arrives, etc.).
Long story short: yes, it is complicated to grasp. There is also a big difference between using the data for normal analytics use cases in a company (mostly read-only data models) and using the data in a (big tech) product.
I would suggest starting simple by looking into a "query engine" to extract some data from somewhere and then doing some transformations with pandas/polars/cubejs for basic understanding. You will need some schedulers and orchestration on the way forward. But this will depend on the real use cases and the environment you are in.
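To make the suggested starting point concrete, here is a minimal sketch: an embedded SQL database (sqlite3, standing in for a "query engine") extracts some rows, and pandas does a sum(amount)-style transformation on them. The table and column names are made up for illustration.

```python
import sqlite3
import pandas as pd

# Pretend this in-memory database is our source system.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 100), ('north', 50), ('south', 75);
""")

# Extraction: let the query engine do the selecting/filtering.
df = pd.read_sql_query("SELECT region, amount FROM sales", con)

# Transformation: the kind of sum(amount) aggregation a BI tool
# would express as a drag-and-drop formula.
summary = df.groupby("region", as_index=False)["amount"].sum()
print(summary)
```

From here, the same pattern scales up: swap sqlite3 for whatever query engine your environment offers, and add scheduling/orchestration around it once the use case demands it.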
I would argue that stuff like Iceberg is really aimed at Data Platform Engineers, not BI analysts. Companies I've worked with in the past have 10-15 people on a Platform team that work directly with stuff like this, to offer analysts and data scientists a view into the company's data.