
I can speak to this a bit. My mom is a lawyer and we've talked before about the need for better templating systems. The problem is that the market for solo practitioners and small firms is just not that large or lucrative, and AIUI the market for large law firms is already pretty mature, or at least there's a high barrier to entry.

But there's definitely a need among solo practitioners and small firms, and that need goes beyond templating to include better ways of organizing documents, automating workflows, ensuring security, and so forth. There are companies like Practice Panther that are trying to do this, but the problem I've seen with them is that they encourage vendor lock-in and make it hard to do anything not officially supported by the platform.


I don’t know if you have started working on a potential solution (I’m assuming not, based on your wording), but I’d love to hear more.

I find tremendous purpose and joy in enabling more productive and efficient work by developing digital tools & applications, especially for intelligent individuals such as your mother and other lawyers.

If interested in sharing more, whether you want to have nothing to do with a potential venture or not, I have my public email on my profile. Cheers!


Interesting. This may be a naive question — this is very far from my area of expertise — but is there a reason sensor data can't be sampled? It seems gratuitous to store that many events.


You don't know what you need until you need it. The signal you need to dig out of the data often isn't known until some other event provides the context. Also, for some industries and some applications, there are regulatory reasons you retain the data. In some cases these are sampled data feeds, even at the extreme data rates seen, because the available raw feed would break everything (starting with the upstream network).

In virtually all real systems, data is aged off after some number of months, either truncated or moved to cold storage. Most applications are about analyzing recent history. Everyone says they want to store the data online forever, but then they calculate how much it will cost to keep exabytes of data online and financial reality sets in. Several tens of petabytes is a more typical scale given current platform capabilities. Expensive but manageable.
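To put rough numbers on why "online forever" loses to the budget, here's a back-of-envelope sketch in Python; the per-GB prices are illustrative assumptions, not quotes from any particular platform.

    # Back-of-envelope storage cost; all prices are illustrative assumptions.
    GB_PER_PB = 1e6

    HOT_PRICE = 0.02    # assumed "online" storage price, $/GB-month
    COLD_PRICE = 0.004  # assumed archival-tier price, $/GB-month

    def monthly_cost(petabytes, price_per_gb_month):
        return petabytes * GB_PER_PB * price_per_gb_month

    for pb in (50, 1000):  # tens of petabytes vs. an exabyte
        print(f"{pb:>5} PB hot:  ${monthly_cost(pb, HOT_PRICE):>12,.0f}/month")
        print(f"{pb:>5} PB cold: ${monthly_cost(pb, COLD_PRICE):>12,.0f}/month")

At those assumed prices, 50 PB kept hot runs about $1M/month and an exabyte about $20M/month, which is roughly where the "age it off or move it cold" decision gets made.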


Interestingly, I worked on a project as a data scientist with a client in high-precision manufacturing. Their signals (sensors) and actuators were stored in a historian that couldn't handle samples finer than 100ms, even though the data was collected at a 10ms rate. One of the problems required us to look at a process that took just 85ms. The problem was that the historian would show signals down to 20ms, and it took a while to realise it was extrapolating when you tried to get finer resolution. The company had been using this historian for more than 20 years, and they had to commission another project to replace it. So you're right, you don't know what you need until you need it.


Sometimes tens of scalars per second is the sampled data. It depends upon your requirements for accuracy and responsiveness for alarms, threshold checks, etc. I work with paper making machines that only give us a profile every 30 seconds--but that profile is a thousand floats, and we need to be constantly resampling it both spatially and temporally, and we're doing that for tens or hundreds of profiles for a single system--and we're supposed to handle hundreds of systems.
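For anyone unfamiliar with this kind of data, here's a minimal sketch of the spatial and temporal resampling using numpy; the grid sizes and intervals are made up for illustration, not taken from a real machine.

    import numpy as np

    # Assumed for illustration: a cross-direction profile of 1000 floats arrives
    # every 30 seconds; we want it on a 500-point spatial grid and a 10-second time grid.
    n_points, scan_interval_s = 1000, 30
    profiles = np.random.rand(20, n_points)        # 20 scans = 10 minutes of data
    scan_times = np.arange(20) * scan_interval_s

    # Spatial resampling: interpolate each profile onto the coarser grid.
    src_x = np.linspace(0.0, 1.0, n_points)
    dst_x = np.linspace(0.0, 1.0, 500)
    spatial = np.array([np.interp(dst_x, src_x, p) for p in profiles])

    # Temporal resampling: for each position, interpolate between scans
    # to get a value on every 10-second tick.
    dst_t = np.arange(0, scan_times[-1] + 1, 10)
    temporal = np.array([np.interp(dst_t, scan_times, spatial[:, i])
                         for i in range(spatial.shape[1])]).T

    print(spatial.shape, temporal.shape)           # (20, 500) and (58, 500)

That's one profile stream; the point above is about doing this continuously for hundreds of profiles per system across hundreds of systems.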

The more fundamental point that the GP is making is that the realm of industrial sensor data scales in ways that people haven't really grasped yet. It's much less about brute storage than it is about the interplay between bandwidth, storage, and concurrent processing power.


The problem is that you are generally looking for "Something's different" rather than "Smooth ALL The Points".

So, the problem is that you threw away 90% of your data, and that's where the problem was. Oops. Now you have to switch on "Save all the data" and hope it repeats. So, given that you have to have a "Save all the data" switch anyhow, you might as well turn it on from the start.

In addition, changepoint analysis is an entire field of research in and of itself.

Look at how many articles there are about analyzing "Did something break in my web service or am I really doing 10% more real traffic?"
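A toy demonstration of the "you threw away 90% of your data, and that's where the problem was" failure mode, with synthetic data and made-up numbers:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic sensor: 10,000 quiet samples with one 5-sample transient spike.
    signal = rng.normal(0.0, 0.1, 10_000)
    signal[6_003:6_008] += 5.0

    # "Save all the data": the transient is obvious at full resolution.
    print("full-res max: ", signal.max())

    # Keep 10% by averaging every 10 samples: the spike is smeared to half height.
    averaged = signal.reshape(-1, 10).mean(axis=1)
    print("averaged max: ", averaged.max())

    # Keep 10% by taking every 10th sample: this particular spike is missed entirely.
    decimated = signal[::10]
    print("decimated max:", decimated.max())

Whether any downstream detector (threshold, changepoint, whatever) still sees the event depends entirely on how the data was reduced, and you usually only find that out after the event you cared about.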


Depends on the application. Often down-sampled data is useful for drawing trends but not so useful for better understanding failure events.


For server monitoring, data (mostly counters) is usually saved at 10 to 15 second intervals. It's rarely queried at full resolution; it's almost always sampled, yes.
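A minimal sketch of that rollup pattern, assuming a monotonically increasing request counter scraped every 15 seconds; the workload and numbers are made up:

    import numpy as np

    # Assumed: one counter scraped every 15 s for an hour (240 scrapes).
    scrape_interval_s = 15
    increments = np.random.poisson(120 * scrape_interval_s, 240)
    counter = np.cumsum(increments)                 # monotonically increasing samples

    # Full resolution: per-second rate between consecutive scrapes.
    full_res_rate = np.diff(counter) / scrape_interval_s

    # What dashboards usually plot: the same rate rolled up into 5-minute buckets.
    per_bucket = 300 // scrape_interval_s           # 20 scrapes per bucket
    usable = len(full_res_rate) // per_bucket * per_bucket
    rolled_up = full_res_rate[:usable].reshape(-1, per_bucket).mean(axis=1)

    print(len(full_res_rate), "raw points ->", len(rolled_up), "plotted points")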

