I use it as a fancy monitor that I can strap to my face and that fits in a suitcase. Sometimes I want to work lying down, and it's great for that. It being based on iPadOS makes it kinda useless aside from the display.
Not OP, but I type while lying down in bed. I don't have the AVP, but I do have Xreal Air AR glasses that basically mirror my MacBook Pro's screen. I rest my Apple keyboard across my crotch/upper thighs, wherever my arms are most comfortable. To my right (on the bed) is my Apple Trackpad. It's kind of a hassle to move my hand from keyboard to trackpad, but I can drive almost all functions on my computer through the keyboard, so it's only used as a last resort. I also have AirPods in my ears because the laptop is closed and on the floor.
I do this when my sciatica pain is preventing me from sitting and I'm tired of standing at my desk. I can work lying down for up to a couple of hours and find the position to be highly comfortable and productive.
> Some of this complexity may be necessary for achieving optimal performance in Jax. E.g. extra indirection to avoid the compiler making some bad fusion decision, or multiple calls so something can be marked as static for the jit in the outer call
Certainly some of it is, but not the lion's share - I have a much simpler (private) codebase that scales pretty similarly, afaict.
The complexity of MaxText feels more Serious Engineering™ flavored, following Best Practices.
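The static-argument pattern the quoted comment mentions can be sketched roughly like this - a minimal, hypothetical example (the function and argument names are made up, not MaxText's actual code). An argument that controls Python-level structure, like a layer count, has to be marked static so `jit` traces it as a compile-time constant rather than a traced value:

```python
from functools import partial

import jax
import jax.numpy as jnp

# `num_layers` drives a Python loop, so it must be static for jit:
# the loop is unrolled at trace time, and each distinct value of
# num_layers triggers a fresh compilation.
@partial(jax.jit, static_argnums=(1,))
def forward(x, num_layers):
    for _ in range(num_layers):  # unrolled because num_layers is static
        x = jnp.tanh(x)
    return x

y = forward(jnp.ones((3,)), 2)  # compiles a 2-layer version
```

Passing `num_layers` as an ordinary traced argument would raise a `TracerIntegerConversionError` inside `range`, which is one reason codebases grow the "outer call marks things static" indirection the quote describes.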
Midjourney claims 3.3 GPU-hours for $10.
If that's 3.3 hours of A100 at a rate of $2/hour, that's $6.60 of GPU cost.
My original statement was based on MJ feeling very expensive per image compared to the huge number of images I can generate in an hour with SDXL on my 3090 (and 3090s can be rented for $0.20 an hour).
But I forgot how overpriced A100s are (I doubt MJ is running 3090s but that'd be pretty cool), and that MJ is probably 4x the size of SDXL (although surely more optimized).
My revised statement is that their $10 plan includes a few bucks of GPU compute cost - probably a 2x to 5x margin.
A100s aren't $2/h at MJ scale though. Firstly, they likely own quite a few, and the cards are relatively cheap now at $4k apiece; secondly, you can get much better deals than $2/h if you rent a large amount and pre-pay.
I literally just don't feel like running them, tbh, and see no reason to publish them either way. I mostly prefer to let the outputs speak for themselves.
For a while I was using an FID variant for evaluation during training, but didn't find it very helpful vs just looking at output images.
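For reference, the standard FID formula the comment's "variant" is presumably based on can be sketched like this - a minimal, NumPy-only version that fits a Gaussian to each feature set (the real metric runs the images through an Inception feature extractor first, which is omitted here):

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fit to two feature matrices
    of shape (n_samples, n_features). Sketch only, not the comment's
    actual variant."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a @ cov_b)^{1/2}) via eigenvalues: the product of two PSD
    # matrices has real, nonnegative eigenvalues (clipped for numerical noise).
    eigs = np.linalg.eigvals(cov_a @ cov_b)
    tr_covmean = np.sqrt(eigs.real.clip(min=0.0)).sum()
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
```

Identical feature sets score ~0, and the score grows with the distance between the two distributions - which is also why it can be less informative than eyeballing samples: it collapses everything into one scalar.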
Pretty much everybody who trains models trains tens to hundreds of models before their final run. But a better way to think about this is: Google spent money to develop the TPUv4, then paid money to have a bunch built, and they're sitting in data centers, either consuming electricity or not. Google has clearly decided the value of these results exceeds the amortized cost of running the TPU fleet (which is an enormous expense).
While I don't agree with gp's price estimate of $50M, you are also forgetting that to train the final model, they had to iterate on the model for research and development. This imposes a significant multiple over the cost to just train the final model.
Marginal cost of using that is basically $0. The internal dataset is a sunk cost that's already been paid for (presumably for their other, revenue generating products like Google Images). Half of their dataset is a publicly available one.
That wouldn't make sense - compression has very cache-friendly access patterns, and would benefit greatly from the observed improvements in memory bandwidth.
That surprises me to hear. I would expect it to jump around in RAM a lot. And at higher compression settings, some compressors use a lot of RAM, more than will fit in cache.
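A rough sense of scale for that claim, with illustrative, assumed numbers (not any particular compressor's real defaults): the match-finder state of an LZ-style compressor at high settings dwarfs a typical per-core cache.

```python
# Assumed, illustrative sizes for an LZ compressor at a high level.
window_bytes = 128 * 2**20        # assumed 128 MiB search window
hash_entries = 2**22              # assumed 4M-entry match-finder hash table
bytes_per_entry = 4               # 32-bit match positions

hash_table_bytes = hash_entries * bytes_per_entry   # 16 MiB of hot state
l2_per_core = 1 * 2**20                             # typical ~1 MiB L2

ratio = hash_table_bytes / l2_per_core              # 16x larger than L2
```

With state that size, each match lookup is effectively a random probe into memory, so latency (cache misses) rather than bandwidth dominates - consistent with the parent's point about which hardware metric matters.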
SIMD - compression has gotten faster, but (assuming OP is correct rather than just missing info) the reference algorithm didn't have room to take advantage of SIMD. The relevant hardware improvements since 2010 or so mostly look like bandwidth improvements, not latency, and coincide with the increasing ubiquity of SIMD instructions and SIMD-friendly algorithms.