Hacker News | agibsonccc's comments

I've been looking into this for the Java world. What's your use case? Deployment into existing applications?


Yeah, exactly - Python for training, Java/.NET for inference in production. I looked at approaches like gRPC and things, but my case is a bit more time-sensitive and the latency added by going over a network layer was too much.

For now I'm happy with PyTorch -> ONNX and then running the ONNX model directly. But as I said, that means I can't easily train using JAX :-(
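For what it's worth, the export step on the Python side is only a few lines. Roughly like this (a toy model and made-up names here, just to illustrate; your real module and input shapes go in their place):

    import torch

    # toy stand-in for whatever trained torch.nn.Module you actually have
    model = torch.nn.Linear(4, 2).eval()
    example = torch.randn(1, 4)

    torch.onnx.export(
        model, example, "model.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}},  # let batch size vary at inference time
        opset_version=13,
    )

The serving side then just loads model.onnx through ONNX Runtime's Java (or .NET) API, so nothing Python-specific leaks into production.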



Ohh, I'll check that out!


Disclaimer: I am involved with the Eclipse Foundation (not as part of the core staff).

Hi, I just want to expand on this a bit...

The "eclipse foundation" is actually not just sponsored by IBM. Nor is the IDE. As for what support it provides, that's actually spot on and is under a similar idea to what apache and linux provide. There are of course differences: but for most people the vague idea that open source foundations exist and provide some similar functions as a way to host projects is enough for this discussion

A lot of Eclipse's revenue actually comes from working groups, with Jakarta EE being the biggest one. There are also IoT, automotive and many other working groups + projects under the Eclipse umbrella.


If something were to be more "neutral", what would you hope to see, exactly? Something performant is typically going to be framework/hardware specific.


Sorry, I'm not sure what you mean by "neutral". Are you talking about my suggestion to avoid DeepStream? If so:

The frameworks that work on multiple types of hardware, like TensorFlow and (probably most popular now) PyTorch, have separate backends for their different targets. Each of these backends has huge amounts of platform-specific code, and in the case of the Nvidia backend, that code is written in terms of CUDA just as DeepStream is. That's how they achieve good performance even though the top-level API is hardware-generic. The overwhelming majority of deep learning code, both the actual learning and the inference, is written in terms of these frameworks rather than Nvidia's proprietary framework. Admittedly I haven't played with Nvidia's library, but I highly doubt there's a serious performance difference - it's even possible that the open-source libraries are faster due to the greater community (/Google) effort to optimise them.

It does look like DeepStream does a lot more of the processing pipeline than just the inference. In that case it's going to be a lot trickier to get the whole pipeline on the GPU using TensorFlow or PyTorch. At the end of the day, if only DeepStream does what you need, I'm not saying you necessarily shouldn't use it - just that you should ideally attempt to avoid it if reasonably possible.


Hi Fareesh, I'd love to hear more about your use case. Email's in my profile.


Hi, could you describe your use case a bit? Just an alarm trigger for deer in the backyard?


Yes, I'd point a camera at my precious vegetables and if a deer walks into the video feed, something that scares it off is triggered so it runs off before eating the whole garden.


Could you elaborate on some of the problems you had overall?


I've unsuccessfully dabbled in gstreamer in the past. I was doing a project this weekend, and the comments on this thread motivated me to give it another shot. After a couple of hours (2-4ish?), I was able to get video off the Pi to my desktop (on the same LAN), but the performance was pretty bad. I haven't optimized much yet, but let me summarize the key issues I experienced with gstreamer these last few hours:

1) Very little documentation; poorly explained pipelines. I tried to read what docs I could find but things quickly devolved into trying out random gstreamer pipelines posted in comments. People don't explain why they use one particular element over another. So it felt like whack-a-mole.

2) Installing gstreamer on the Pi was a breeze. I wanted to pull video off the connected camera and send it to VLC on my desktop. Sounded like something that would work out-of-the-box? Nope. Kept seeing lots of Stack Overflow comments of people stabbing in the dark, getting errors (or having the thing just sit there and not work) with very little feedback on what was wrong.

3) I have very little indication of what is hardware-accelerated and what is software-accelerated in my pipeline. I have no idea where latency is coming from in my pipeline.

Overall, my modern expectation for software frameworks is "batteries included". It is totally reasonable for sophisticated software tools to be complex, but gstreamer is just not designed that way. While I got it to work, I see massive latency (likely because my pipeline is inefficient) and degraded quality (no idea why).
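For concreteness, the kind of pipeline I was cargo-culting from those comments looked roughly like this; treat it as a sketch rather than a known-good setup, since the element choices and the receiver address are just placeholders (this is the Python wrapper around the same pipeline string you'd pass to gst-launch-1.0):

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    # Sender side on the Pi: grab the camera, H.264-encode, ship RTP over UDP.
    # v4l2src may need swapping out depending on the camera stack in use.
    pipeline = Gst.parse_launch(
        "v4l2src device=/dev/video0 ! videoconvert "
        "! x264enc tune=zerolatency bitrate=1000 speed-preset=ultrafast "
        "! rtph264pay config-interval=1 pt=96 "
        "! udpsink host=192.168.1.50 port=5000"  # placeholder desktop IP
    )
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()  # keep the pipeline alive

Even with something this small, it's hard to tell which of those elements are the latency culprits, which is basically my point 3 above.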


A few questions:

1. Did you build this for your own use cases? Interesting side project?

2. How do you feel about base64 being a requirement on the endpoints? Isn't gRPC the wrong medium for this? Also, what do you see as the main limitations right now? The models?


1. I built it to integrate with Home Assistant and security systems. I was trying to use TensorFlow on a Raspberry Pi and the dependencies were a nightmare. TensorFlow in general is a nightmare to compile and run IMO. I got to thinking, what if I could put all the deps inside of a Docker container? What if I could run it remotely? It was born out of that.

2. As for base64, I'm not sure of a better way to support sending raw image data over JSON (in REST mode). In some ways I think gRPC is a better medium than JSON (it supports either), as gRPC supports sending the raw bytes (rough client-side sketch at the end of this comment). What leads you to believe gRPC isn't the right transport? Plus you can do it in a stream format if you want to do a lot of video.

The only real limitation I can think of is that TensorFlow supports a myriad of CPU optimizations, so providing a single container image that has all the right options is basically impossible. I created one that has what I think are some of the better options (AVX, SSE4.x) and then an image that should basically run on any 64-bit Intel-compatible CPU. To get optimized options you need to build the Docker container yourself, which can take the better part of a day on slower CPUs.

With that said, I also provide ARM32 and ARM64 containers that actually run semi-okay on Raspberry Pis and other ARM SBCs. I can run the Inception model on a Pi 4 on a 1080p image in about 5 seconds, which is pretty good IMO.
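To make the base64 point above concrete, the REST path looks roughly like this from a client (endpoint path and field names are made up for the example, not the project's actual API):

    import base64
    import json
    import urllib.request

    # Read raw JPEG bytes and base64-encode them so they can travel inside JSON.
    with open("frame.jpg", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")

    payload = json.dumps({"image": encoded}).encode("utf-8")

    # Hypothetical REST endpoint. The gRPC path would instead put the raw bytes
    # straight into a `bytes` field of the protobuf message, avoiding the ~33%
    # size overhead that base64 adds.
    req = urllib.request.Request(
        "http://localhost:8080/v1/detect",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))

That size and copy overhead is the main reason I see gRPC as the better fit when you're pushing a lot of frames.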


How did you set it up on the software side though? How flexible/customizable was it?


It's still pretty early days for Ray. That being said, Spark never really got the hang of doing machine learning properly. It "works", but not for the newer workloads which Ray is trying to support.

It's good someone is building a company around it. I could see them building services on top of it and building a SaaS like Databricks did with Spark.

I'll be curious to see how Ray matures.


I agree that ML on Spark was only a limited hit (iterative jobs would actually be feasible versus Hadoop), but I still have yet to find a better ETL and SQL tool, and that's a big part of most ML projects.

I'm worried about Ray as a SaaS company because so far it looks to me like they're riding reinforcement learning hype. They'd need to really penetrate the users of Horovod and TensorFlow Distributed to get beyond a beachhead. And what if TPUs and Cerebras become more common? Because then the market for multi-machine workloads becomes smaller (definitely not zero though).


Your concerns are right on point. I agree that Spark is a great SQL/ETL tool. My thinking was on the "math execution" part. Ray is able to do a bit more there. I do feel like there is a bit of hype riding going on here as well.

One interesting thing that could happen is that the hardware gets better, and then these distributed schedulers might not be able to keep up with all the different options on the market.

There is also the tension between the hardware vendors, who want to give away things that only run on their chips, and the software makers, who want things to run on every chip. It seems like there will be a lot of competition among the various infra players in the next few years now that Nvidia is starting to have real competition (even if it's not big yet).


Just to qualify that "math execution" part, the beauty of Ray is that you get threadpool-like features to speed up arbitrary Python code. So not just parallelism, but state/variable sharing for relatively small data. This is great for some optimizers and definitely RL (where your "math" is some really complicated simulation / loss logic), but Ray wouldn't make much sense for BLAS stuff. Am I missing something here?
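A minimal sketch of the pattern I mean, using Ray's standard remote tasks and the shared object store (toy function, nothing RL-specific):

    import ray

    ray.init()

    # Put a "relatively small" shared object in the object store once,
    # so it isn't re-serialized with every task submission.
    config_ref = ray.put({"threshold": 0.5})

    @ray.remote
    def score(x, config):
        # stand-in for the complicated simulation / loss logic
        return x * x if x > config["threshold"] else 0.0

    # Fan out like a thread pool, then gather results.
    futures = [score.remote(x / 10, config_ref) for x in range(10)]
    print(ray.get(futures))

Ray resolves the object reference before the task body runs, so `config` shows up as a plain dict inside `score`.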

Ray shows expertise in multi-machine execution that's lacking in stuff like JAX, TensorFlow, and PyTorch. Horovod nailed down a lot of the performance issues for SGD in particular, but it is missing the sort of rapid deployment / distribution stuff in Ray. If only they could all work together ...


Graal itself still has limitations with many libraries, the biggest one being any library that heavily uses reflection. I'd look at some of the work Red Hat is doing with some of its Java libraries and Quarkus: https://developers.redhat.com/blog/2019/03/07/quarkus-next-g...


IIRC you also end up relying on a Graal-provided version of Python that may or may not be kept up-to-date by a project for which it's just a secondary target. Been there with IronPython and Jython; it's no fun.

