Yeah this is the trick. You need to maximize the use of workgroup parallelism and also lay things out in memory for those kernels to access efficiently. It’s a bit of a balancing act and I’ll be working on benchmarks to test out different strategies.
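To make that concrete, here's a toy elementwise-add kernel (a generic sketch, not the library's actual code) showing the two levers: the workgroup size and a flat, contiguous indexing scheme so neighbouring invocations read neighbouring memory. The real balancing act shows up in matmuls and reductions, where tiling into workgroup memory starts to matter.

    // Generic sketch (not the library's code): an elementwise-add kernel as a
    // WGSL string, the way you'd hand it to device.createShaderModule.
    // One invocation per output element, 256 invocations per workgroup.
    const addKernel = /* wgsl */ `
      @group(0) @binding(0) var<storage, read> a: array<f32>;
      @group(0) @binding(1) var<storage, read> b: array<f32>;
      @group(0) @binding(2) var<storage, read_write> out: array<f32>;

      @compute @workgroup_size(256)
      fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
        let i = gid.x;
        if (i < arrayLength(&out)) { // guard the tail when size % 256 != 0
          out[i] = a[i] + b[i];
        }
      }`;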
Yeah, so the thing is WebGPU doesn’t correctly support IEEE floating point. In particular, 0 is often substituted for ±Inf and NaN. See section 14.6 of the spec.
It’s not such a problem for real nets since you avoid those values like the plague. But the tests catch them, and I need to make the tests tolerant. Thanks for the results!
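Concretely, the kind of tolerant check I have in mind looks something like this (a hypothetical helper, not the library's test code):

    // Accept 0 where the reference value is ±Inf or NaN, since WGSL
    // implementations are allowed to flush those to an indeterminate value.
    function closeEnough(actual: number, expected: number, eps = 1e-5): boolean {
      if (!Number.isFinite(expected)) {
        // Expected is ±Inf or NaN: pass if the GPU produced the matching
        // non-finite value, or flushed it to 0 as the spec permits.
        return actual === 0 || Object.is(actual, expected);
      }
      return Math.abs(actual - expected) <= eps * Math.max(1, Math.abs(expected));
    }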
Those are niceties and can be implemented with some small hacks. Most big nets do very little slicing. Lots of dimension permutations (transpose, reshape, and friends) but less slicing. I personally use a lot of slicing so will do my best to support a clean syntax.
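For what it's worth, since JS has no __getitem__ overloading, "clean" probably means something method-based. A purely hypothetical sketch (none of these names are real API):

    // null = keep the whole dimension, a number = pick one index,
    // a tuple = [start, end, step] like Python's start:end:step.
    type SliceSpec =
      | number
      | [start?: number, end?: number, step?: number]
      | null;

    interface Tensor {
      // t.slice(null, [0, 3], -1)  ≈  t[:, 0:3, -1] in PyTorch
      slice(...dims: SliceSpec[]): Tensor;
    }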
I've come to believe over the last few years that slicing is one of the most critical parts of a good ML array framework, and I've used it heavily. PyTorch, if I understand correctly, still doesn't handle some forms of slice assignment and slice objects quite right (please correct me if I'm wrong), though it is leagues better than TensorFlow was.
I've written a lot of dataloader and similar code over the years, and the slicing was probably the most important (and most hair-pulling) part for me. I've really debated writing my own wrapper at some point (if it is indeed worth the effort) just to keep my sanity, even if it comes at the expense of some speed.
I’m of the same opinion. While I think I will keep the standard parameter order from torch, I will include the options overload to give all the benefits you describe.
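Roughly what I mean by the two call styles, using conv2d as a stand-in (hypothetical signatures, not final API):

    interface Tensor { shape: number[]; }
    interface Conv2dOptions { stride?: number; padding?: number; dilation?: number; }

    // Overload 1: torch's positional parameter order.
    declare function conv2d(input: Tensor, weight: Tensor, stride?: number, padding?: number): Tensor;
    // Overload 2: an options object for anyone who doesn't want to count commas.
    declare function conv2d(input: Tensor, weight: Tensor, options?: Conv2dOptions): Tensor;

    // conv2d(x, w, 2, 1)                       // positional, matches torch
    // conv2d(x, w, { stride: 2, padding: 1 })  // named options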
I've been working on a WebGPU-optimized inference and autograd library with an API that matches PyTorch. The goal is to reach CUDA speeds in the browser. Many kernels have been implemented, and it's designed to be easily extensible. Available on NPM now! I'm working on supporting Stable Diffusion and Hugging Face transformers.
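To give a rough feel for the API, here's a sketch with placeholder names (the import and method names are illustrative, not the actual exports; see the README/NPM page for the real ones):

    // Placeholder import; the real package name may differ.
    import { tensor } from "webgpu-torch-like-package";

    const x = tensor([[1, 2], [3, 4]], { requiresGrad: true });
    const y = x.mul(x).sum();                  // y = sum(x * x)
    y.backward();                              // autograd populates x.grad
    console.log(await x.grad.toArrayAsync());  // ≈ [[2, 4], [6, 8]]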
It’s open source and I have successfully compiled and run it on my own machine, so I do not fear it going away. The biggest risk is that it could stop being updated at some point. But, again, it's OSS: the community could continue the work.