One of the most elegant paper series I've ever read is her "More Types for Nested Data Parallelism" (https://dl.acm.org/doi/abs/10.1145/351240.351249) + the preceding NESL / segmented scan papers by Guy Blelloch (https://www.cs.cmu.edu/~guyb/papers/Nesl2.0.pdf). They helped set us on the GPU journey that became Graphistry, and are why we bet on GPU dataframes (ex: RAPIDS.ai) eventually overtaking Spark and friends even before Databricks was a company :) If you do pandas/tensorflow/numpy/sql/etc., this stuff is amazing.
Seconded! I had the pleasure of taking Guy Blelloch's parallel programming class at CMU as a sophomore. A large majority of the homework assignments involved leveraging the segmented scan operator to solve all sorts of classical parallel programming problems.
Really eye-opening stuff! I am still impressed that prefix sum (generalized over all associative operators, not just +) can be done in O(log n) span. And the algorithm is not too complicated either; I was able to get the gist of it even as an undergrad.
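For intuition, here's a minimal list-based sketch of the classic contraction scheme (the names `pscan`, `deinterleave`, and `interleave` are mine for illustration, not from the course or papers): combine adjacent pairs, recurse on the half-size list, then expand. Each level is a constant number of maps/zips over the data and there are O(log n) levels, so the span is O(log n) when those maps/zips run in parallel:

```haskell
-- Inclusive scan for any associative f. List-based for readability;
-- a real parallel version would use arrays and a parallel zipWith.
pscan :: (a -> a -> a) -> [a] -> [a]
pscan _ [] = []
pscan f xs = interleave evenResults oddResults
  where
    (evens, odds) = deinterleave xs                 -- x0,x2,.. and x1,x3,..
    oddResults    = pscan f (zipWith f evens odds)  -- recurse on n/2 pair-sums
    evenResults   = head evens : zipWith f oddResults (tail evens)

-- Split a list into its even-indexed and odd-indexed elements.
deinterleave :: [a] -> ([a], [a])
deinterleave (a:b:rest) = let (es, os) = deinterleave rest in (a:es, b:os)
deinterleave xs         = (xs, [])

-- Merge two lists back together, alternating elements.
interleave :: [a] -> [a] -> [a]
interleave (a:as) (b:bs) = a : b : interleave as bs
interleave as     bs     = as ++ bs
```

Sanity check: `pscan (+) [1..8]` gives the same result as `scanl1 (+) [1..8]`, but the recursion tree is only log-depth.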
I read that the prefix scan is great, so I read somebody's thesis. I (a) didn't really get it and (b) felt it was doing some complicated stuff for little value. Doubtless I'm wrong, since computer scientists tend to be smarter than me, so can you give me some insight into why it's so valuable?
There are two really cool things going on here. Also, keep in mind NESL was published in 1992 (and thus made in ~1990), which was not far from when the Connection Machine and Crays got hyped yet largely failed to bring parallel compute mainstream.
1. Data parallelism, esp. on super power-efficient parallel hardware (SIMD, GPU, all the stuff that lets you go 10-100X over Spark for the same budget), is hard for computing over "irregular" data like trees and graphs. Prefix scan showed a basic way to do it for a bunch of basic math over these (see the segmented-scan sketch after this list), and a lot of the 90s / early 2000s became research into covering all sorts of weird cases. That showed data parallelism is broadly applicable, so worth pushing forward on.
2. Guy B, and later Manuel C & Gabriele K, showed you don't have to be a weird HPC person hand-rolling Fortran, nor rely on weird one-off special-case libraries, to use the above. We can build libraries of high-level primitives -- think map/reduce -- even to the level that compilers can make it look like "normal" Haskell. Most things can compile down to prefix scan (see the filter sketch after this list). This is similar to Impala/Spark figuring out RDDs/dataframes and putting SQL on top, except, again, with 10-100X better performance, and you don't have to contort everything into 1970s SQL. Re: prefix scan in particular, we don't _have_ to implement all the algorithms as prefix scan -- it's a reliable base case, and where possible (most cases!), we nowadays swap in more efficient hand-written CUDA library calls instead.
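Since (1) is the part people usually ask about, here's a minimal sketch of the segmented-scan trick (the name `segScan` and the flag-list encoding are mine for illustration): irregular/nested data gets flattened into one flat value array plus boolean segment-start flags, and a scan with a lifted operator restarts at each boundary. The lifted operator stays associative whenever `f` is, which is exactly why it parallelizes like any ordinary scan:

```haskell
-- Segmented inclusive scan: each True flag starts a new segment.
-- Shown with sequential scanl1 for clarity; because g is associative
-- whenever f is, a parallel scan works just as well.
segScan :: (a -> a -> a) -> [(Bool, a)] -> [a]
segScan f = map snd . scanl1 g
  where
    g (fl, v) (fl', v')
      | fl'       = (fl || fl', v')        -- segment boundary: restart
      | otherwise = (fl || fl', f v v')
```

E.g. `segScan (+) [(True,1),(False,2),(True,3),(False,4)]` yields `[1,3,3,7]`: the running sum resets where the second segment starts.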
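And to make (2)'s "most things compile down to prefix scan" concrete, here's a hypothetical sketch (the name `parFilter` is mine) of how even `filter` lowers to flat data-parallel steps: a map for the predicate, an exclusive +-scan to assign each kept element an output slot, and a scatter to write them:

```haskell
import Data.Array (array, elems)

-- filter as map + exclusive scan + scatter, all flat parallel steps.
parFilter :: (a -> Bool) -> [a] -> [a]
parFilter p xs = elems (array (0, total - 1) writes)
  where
    keeps  = map p xs                                       -- parallel map
    ones   = [if k then 1 else 0 | k <- keeps] :: [Int]
    idxs   = init (scanl (+) 0 ones)                        -- exclusive +-scan
    total  = sum ones                                       -- a reduce
    writes = [(i, x) | (x, k, i) <- zip3 xs keeps idxs, k]  -- scatter
```

`parFilter even [1..6]` gives `[2,4,6]`; on a GPU each of those steps is one flat kernel, which is the whole flattening story in miniature.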
Nowadays, with GPUs in the cloud, Moore's Law stalled out for CPUs but still going strong for GPUs, and most big data systems realizing their latency and data shuttling/serialization suck, all ^^^ is a big deal. See the 10yr AWS price chart I calculated: https://twitter.com/lmeyerov/status/1247382156487213057/phot... . You can just write your Python dataframe code in https://rapids.ai/ and not really think about it, just knowing you're doing better than the Hadoop stuff underneath and without managing a dirty horde. The existence of NESL and the research area that came after means we're still at the beginning of swapping out crappy CPU software for better data-parallel stuff, and _it's doable_.
I recently tried getting back into Haskell; the goal was to write Haskell and drop into Rust for speed. Even with Stack, the toolchains and the library compatibility were too daunting to figure out. Cargo just works: with rustup and JetBrains Toolbox, I can have a fully configured dev environment that can build everything I need in under 5 minutes.
Now if Cargo could install Haskell and its libs, I would jump on it in an instant.
Great slides near the end; this project looks really fun, and it's cool to see parallelism applied to simple vanilla code -- rather than having to build the transformation through some eDSL. That said, I hate to be a downer, but I've really moved away from Haskell, so I don't see myself playing with this. Lately it's PyTorch or PySpark, with an eye on Rust, and a mild interest in Julia.
I saw her give this talk as a keynote at Lambda Days, and it was easily my favorite talk there this year (except mine, of course :) ).
I find it kind of weird that, outside of academic circles and the like, Haskell is largely dismissed as "not practical"; I think the "cabal hell" days before Stack came out really hurt the language and its adoption. Libraries like Accelerate really demonstrate how powerful something like Haskell can be when applied correctly.
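Agreed. For anyone who hasn't tried it, the canonical Accelerate example is a dot product: you write ordinary-looking combinator code, and the library fuses and compiles it to GPU (or multicore CPU) kernels. A sketch from memory, so treat the module names and versions as assumptions (the `.Native` backend is the multicore CPU one; swap in `.PTX` for NVIDIA GPUs):

```haskell
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.LLVM.Native as CPU  -- assumed backend; .PTX for GPU

-- Looks like vanilla Haskell, but zipWith/fold here build an AST that
-- Accelerate fuses into a single parallel kernel.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = fromList (Z :. 5) [1 .. 5]       :: Vector Float
      ys = fromList (Z :. 5) [2, 2, 2, 2, 2] :: Vector Float
  print (CPU.run (dotp (use xs) (use ys)))  -- expect 30.0
```

The point isn't the dot product, it's that the same two-combinator definition runs unchanged on CPU or GPU.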
I second J and Futhark. I love J, and I wish there were a way to use the GPU from J. I have looked briefly at the APL -> TAIL -> Futhark pipeline, but I don't know enough to do something useful with that particular toolchain, or to wow myself enough to keep going. More studying...
The lesser-known Single Assignment C (SAC) uses array-based functional programming as well [1], [2]. As the name implies, it uses a C/Algol-based syntax for parallel programming of multi-core CPUs and GPUs.
Similar to Haskell, it originated in academic research. It utilizes the static single assignment (SSA) concept, which can be used for both imperative and functional programming [3]. Neat stuff. Hopefully someone can emulate this with a modern open-source hybrid functional language like Scala or D.
Fun to see it come full circle here!