Hacker News | praeclarum's comments

Yes that syntax works right now.


Yeah, I think I'll reduce the accuracy requirement for some transcendental functions, since GPU implementations seem to be all over the place.


Please do. I have a few test machines but cannot match the variety of hardware out there.


Sorry Safari does not support WebGPU yet. Please join me in writing to Apple and requesting it.


Yeah this is the trick. You need to maximize the use of workgroup parallelism and also lay things out in memory for those kernels to access efficiently. It’s a bit of a balancing act and I’ll be working on benchmarks to test out different strategies.
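To make the dispatch side of this concrete, here is a small Python sketch of the workgroup math involved (the helper name and the default workgroup size of 256 are my own assumptions for illustration, not the library's actual API):

```python
def dispatch_size(n_elements: int, workgroup_size: int = 256) -> int:
    """Number of workgroups needed to cover n_elements (ceiling division).

    Each workgroup processes `workgroup_size` consecutive elements, so
    keeping tensors contiguous in memory lets neighboring threads read
    neighboring addresses (coalesced access).
    """
    return (n_elements + workgroup_size - 1) // workgroup_size
```

For example, a 1000-element tensor needs `dispatch_size(1000) == 4` workgroups of 256 threads, with the last group masking off its tail threads.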


Yeah so the thing is WebGPU doesn’t correctly support IEEE floating point. Particularly, 0 is often substituted for +-Inf and NaN. See section 14.6 of the spec.

https://www.w3.org/TR/WGSL/#floating-point-evaluation

It’s not such a problem for real nets since you avoid those values like the plague. But the tests catch them, and I need to make the tests more tolerant. Thanks for the results!
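A tolerant comparison along these lines (a sketch of my own, not the project's actual test code) could accept 0 wherever the WGSL spec permits substituting it for a non-finite value:

```python
import math

def close_enough(expected: float, actual: float,
                 rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> bool:
    """Compare a GPU result against a CPU reference, tolerating the
    WGSL behavior of substituting 0 for +-Inf and NaN (spec section 14.6)."""
    if math.isnan(expected):
        return math.isnan(actual) or actual == 0.0
    if math.isinf(expected):
        return actual == expected or actual == 0.0
    return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
```

With this, `close_enough(float("inf"), 0.0)` passes, so a GPU that flushes overflow to zero no longer fails the suite, while ordinary finite mismatches are still caught.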


Those are niceties and can be implemented with some small hacks. Most big nets do very little slicing. Lots of dimension permutations (transpose, reshape, and friends) but less slicing. I personally use a lot of slicing so will do my best to support a clean syntax.


I've come to believe over the last few years that slicing is one of the most critical parts of a good ML array framework, and I've used it heavily. PyTorch, if I understand correctly, still doesn't get some forms of slice assignment and the handling of slice objects quite right (please correct me if I'm wrong), though it is leagues better than TensorFlow was.

I've written a lot of dataloader and similar code over the years, and slicing was probably the most important (and most hair-pulling) part for me. I've seriously debated writing my own wrapper at some point (if it is indeed worth the effort) just to keep my sanity, even if it is at the expense of some speed.
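The semantics frameworks are expected to mirror here are Python's own, and they can be illustrated without any array library (a plain-list sketch of the behavior, not framework code):

```python
a = list(range(6))        # [0, 1, 2, 3, 4, 5]
s = slice(1, 4)           # a first-class slice object, reusable for reads and writes

# Reading through a slice object is equivalent to a[1:4].
assert a[s] == [1, 2, 3]

# Slice assignment writes in place; this is one of the cases
# array frameworks have historically handled inconsistently.
a[s] = [10, 20, 30]
assert a == [0, 10, 20, 30, 4, 5]
```

An array framework that accepts `slice` objects and supports in-place slice assignment this uniformly is what makes dataloader code pleasant to write.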


I disagree with this, slice notation is powerful and I use it quite a bit in DL.

Even just the [:, None] trick replacing unsqueeze is super useful for me.
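The trick works the same way in NumPy, which shows the idea without needing torch installed (assuming NumPy is available; in PyTorch the equivalent spelling is `torch.unsqueeze(x, 1)`):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # shape (2, 3)
col = x[:, None]                 # None (np.newaxis) inserts a new axis -> shape (2, 1, 3)

assert col.shape == (2, 1, 3)
assert np.array_equal(col, np.expand_dims(x, 1))  # same as unsqueeze at dim 1
```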


I’m of the same opinion. While I think I will keep the standard parameter order from torch, I will include the options overload to give all the benefits you describe.


Awesome :D Really nice project by the way


I've been working on a WebGPU-optimized inference and autograd library with an API that matches PyTorch. The goal is to reach CUDA speeds in the browser. Many kernels have been implemented, and it's been designed to be easily extensible. Available on NPM now! I'm working on supporting Stable Diffusion and Hugging Face Transformers.


It’s open source, and I have successfully compiled and run it on my own machine. So I do not fear it going away. The biggest risk is that it might stop being updated at some point. But, again, it's OSS: the community could continue the work.

