Yeah this is the trick. You need to maximize the use of workgroup parallelism and also lay things out in memory for those kernels to access efficiently. It’s a bit of a balancing act and I’ll be working on benchmarks to test out different strategies.
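To make that concrete, here's a toy elementwise-add kernel (a generic sketch, not the library's actual code) showing the two levers: the workgroup size and a flat, contiguous indexing scheme so neighbouring invocations read neighbouring memory. The real balancing act shows up in matmuls and reductions, where tiling into workgroup memory starts to matter.

    // Generic sketch (not the library's code): an elementwise-add kernel as a
    // WGSL string, the way you'd hand it to device.createShaderModule.
    // One invocation per output element, 256 invocations per workgroup.
    const addKernel = /* wgsl */ `
      @group(0) @binding(0) var<storage, read> a: array<f32>;
      @group(0) @binding(1) var<storage, read> b: array<f32>;
      @group(0) @binding(2) var<storage, read_write> out: array<f32>;

      @compute @workgroup_size(256)
      fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
        let i = gid.x;
        if (i < arrayLength(&out)) { // guard the tail when size % 256 != 0
          out[i] = a[i] + b[i];
        }
      }`;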
Yeah, so the thing is WebGPU doesn’t correctly support IEEE floating point. In particular, 0 is often substituted for ±Inf and NaN. See section 14.6 of the spec.
It’s not such a problem for real nets since you avoid those values like the plague. But the tests catch them, and I need to make the tests tolerant. Thanks for the results!
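Concretely, the kind of tolerant check I have in mind looks something like this (a hypothetical helper, not the library's test code):

    // Accept 0 where the reference value is ±Inf or NaN, since WGSL
    // implementations are allowed to flush those to an indeterminate value.
    function closeEnough(actual: number, expected: number, eps = 1e-5): boolean {
      if (!Number.isFinite(expected)) {
        // Expected is ±Inf or NaN: pass if the GPU produced the matching
        // non-finite value, or flushed it to 0 as the spec permits.
        return actual === 0 || Object.is(actual, expected);
      }
      return Math.abs(actual - expected) <= eps * Math.max(1, Math.abs(expected));
    }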
Those are niceties and can be implemented with some small hacks. Most big nets do very little slicing. Lots of dimension permutations (transpose, reshape, and friends) but less slicing. I personally use a lot of slicing so will do my best to support a clean syntax.
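For what it's worth, since JS has no __getitem__ overloading, "clean" probably means something method-based. A purely hypothetical sketch (none of these names are real API):

    // null = keep the whole dimension, a number = pick one index,
    // a tuple = [start, end, step] like Python's start:end:step.
    type SliceSpec =
      | number
      | [start?: number, end?: number, step?: number]
      | null;

    interface Tensor {
      // t.slice(null, [0, 3], -1)  ≈  t[:, 0:3, -1] in PyTorch
      slice(...dims: SliceSpec[]): Tensor;
    }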
I've come to believe over the last few years that slicing is one of the most critical parts of a good ML array framework, and I've used it heavily. PyTorch, if I understand correctly, still doesn't handle some forms of slice assignment and slice objects quite right (please correct me if I'm wrong), though it is leagues better than TensorFlow was.
I've written a lot of dataloader and similar code over the years, and the slicing was probably the most important (and most hair-pulling) part for me. I've really debated writing my own wrapper at some point (if it is indeed worth the effort) just to keep my sanity, even if it comes at the expense of some speed.
I’m of the same opinion. While I think I will keep the standard parameter order from torch, I will include the options overload to give all the benefits you describe.
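Roughly what I mean by the two call styles, using conv2d as a stand-in (hypothetical signatures, not final API):

    interface Tensor { shape: number[]; }
    interface Conv2dOptions { stride?: number; padding?: number; dilation?: number; }

    // Overload 1: torch's positional parameter order.
    declare function conv2d(input: Tensor, weight: Tensor, stride?: number, padding?: number): Tensor;
    // Overload 2: an options object for anyone who doesn't want to count commas.
    declare function conv2d(input: Tensor, weight: Tensor, options?: Conv2dOptions): Tensor;

    // conv2d(x, w, 2, 1)                       // positional, matches torch
    // conv2d(x, w, { stride: 2, padding: 1 })  // named options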
I've been working on a WebGPU-optimized inference and autograd library with an API that matches PyTorch. The goal is to reach CUDA speeds in the browser. Many kernels have been implemented, and it's designed to be easily extensible. Available on NPM now! I'm working on supporting Stable Diffusion and Hugging Face transformers.
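To give a rough feel for the API, here's a sketch with placeholder names (the import and method names are illustrative, not the actual exports; see the README/NPM page for the real ones):

    // Placeholder import; the real package name may differ.
    import { tensor } from "webgpu-torch-like-package";

    const x = tensor([[1, 2], [3, 4]], { requiresGrad: true });
    const y = x.mul(x).sum();                  // y = sum(x * x)
    y.backward();                              // autograd populates x.grad
    console.log(await x.grad.toArrayAsync());  // ≈ [[2, 4], [6, 8]]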
It’s open source and I have successfully compiled and run it on my own machine, so I do not fear it going away. The biggest risk is that it could stop being updated at some point. But, again, it's OSS: the community could continue the work.