I use it as a fancy monitor that I can strap to my face and that fits in a suitcase. Sometimes I want to work lying down, and it's great for that. It being based on iPadOS makes it kinda useless aside from the display.
Not OP, but I type while lying down in bed. I don't have the AVP, but I do have Xreal Air AR glasses that basically mirror my MacBook Pro's screen. I rest my Apple keyboard across my crotch/upper thighs, wherever my arms are most comfortable. To my right (on the bed) is my Apple Trackpad. It's kind of a hassle to move my hand from keyboard to trackpad, but I can drive almost all functions on my computer through the keyboard, so it's only used as a last resort. I also have AirPods in my ears because the laptop is closed and on the floor.
I do this when my sciatica pain is preventing me from sitting and I'm tired of standing at my desk. I can work lying down for up to a couple of hours and find the position to be highly comfortable and productive.
> Some of this complexity may be necessary for achieving optimal performance in Jax. E.g. extra indirection to avoid the compiler making some bad fusion decision, or multiple calls so something can be marked as static for the jit in the outer call
Certainly some of it is, but not the lion's share - I have a much simpler (private) codebase that scales pretty similarly, afaict.
The complexity of MaxText feels more Serious Engineering™ flavored, following Best Practices.
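The static-argument pattern the quoted comment mentions can be sketched roughly like this - a minimal, hypothetical example (the function and argument names are made up, not MaxText's actual code). An argument that controls Python-level structure, like a layer count, has to be marked static so `jit` traces it as a compile-time constant rather than a traced value:

```python
from functools import partial

import jax
import jax.numpy as jnp

# `num_layers` drives a Python loop, so it must be static for jit:
# the loop is unrolled at trace time, and each distinct value of
# num_layers triggers a fresh compilation.
@partial(jax.jit, static_argnums=(1,))
def forward(x, num_layers):
    for _ in range(num_layers):  # unrolled because num_layers is static
        x = jnp.tanh(x)
    return x

y = forward(jnp.ones((3,)), 2)  # compiles a 2-layer version
```

Passing `num_layers` as an ordinary traced argument would raise a `TracerIntegerConversionError` inside `range`, which is one reason codebases grow the "outer call marks things static" indirection the quote describes.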
Midjourney claims 3.3 GPU-hours for $10.
If that's 3.3 hours of A100 at a rate of $2/hour, that's $6.60 of GPU cost.
My original statement was based on MJ feeling very expensive per image compared to the huge number of images I can generate in an hour with SDXL on my 3090 (and 3090s can be rented for $0.20 an hour).
But I forgot how overpriced A100s are (I doubt MJ is running 3090s but that'd be pretty cool), and that MJ is probably 4x the size of SDXL (although surely more optimized).
My revised statement is that their $10 plan includes a few bucks of GPU compute cost - probably a 2x to 5x margin.
A100s aren't $2/h at MJ scale though. Firstly, they likely own quite a few, and the cards are relatively cheap now at $4k apiece; secondly, you can get much better deals than $2/h if you rent a large amount and pre-pay.
I literally just don't feel like running them, tbh, and see no reason to publish them either way. I mostly prefer to let the outputs speak for themselves.
For a while I was using an FID variant for evaluation during training, but didn't find it very helpful vs just looking at output images.
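For reference, the standard FID formula the comment's "variant" is presumably based on can be sketched like this - a minimal, NumPy-only version that fits a Gaussian to each feature set (the real metric runs the images through an Inception feature extractor first, which is omitted here):

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fit to two feature matrices
    of shape (n_samples, n_features). Sketch only, not the comment's
    actual variant."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a @ cov_b)^{1/2}) via eigenvalues: the product of two PSD
    # matrices has real, nonnegative eigenvalues (clipped for numerical noise).
    eigs = np.linalg.eigvals(cov_a @ cov_b)
    tr_covmean = np.sqrt(eigs.real.clip(min=0.0)).sum()
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
```

Identical feature sets score ~0, and the score grows with the distance between the two distributions - which is also why it can be less informative than eyeballing samples: it collapses everything into one scalar.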
Pretty much everybody who trains models trains tens to hundreds of models before their final run. But a better way to think about this is: Google spent money to develop the TPUv4, then paid money to have a bunch built, and they're sitting in data centers, either consuming electricity or not. Google has clearly decided the value of these results exceeds the amortized cost of running the TPU fleet (which is an enormous expense).
While I don't agree with gp's price estimate of $50M, you are also forgetting that to train the final model, they had to iterate on the model for research and development. This imposes a significant multiple over the cost to just train the final model.
Marginal cost of using that is basically $0. The internal dataset is a sunk cost that's already been paid for (presumably for their other, revenue generating products like Google Images). Half of their dataset is a publicly available one.
That wouldn't make sense - compression has very cache-friendly access patterns, and would benefit greatly from the observed improvements in memory bandwidth.
That surprises me to hear. I would expect it to jump around in RAM a lot. And at higher compression settings, some compressors use a lot of RAM, more than will fit in cache.
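A rough sense of scale for that claim, with illustrative, assumed numbers (not any particular compressor's real defaults): the match-finder state of an LZ-style compressor at high settings dwarfs a typical per-core cache.

```python
# Assumed, illustrative sizes for an LZ compressor at a high level.
window_bytes = 128 * 2**20        # assumed 128 MiB search window
hash_entries = 2**22              # assumed 4M-entry match-finder hash table
bytes_per_entry = 4               # 32-bit match positions

hash_table_bytes = hash_entries * bytes_per_entry   # 16 MiB of hot state
l2_per_core = 1 * 2**20                             # typical ~1 MiB L2

ratio = hash_table_bytes / l2_per_core              # 16x larger than L2
```

With state that size, each match lookup is effectively a random probe into memory, so latency (cache misses) rather than bandwidth dominates - consistent with the parent's point about which hardware metric matters.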
SIMD - compression has gotten faster, but (assuming OP is correct rather than just missing info) the reference algorithm didn't have room to take advantage of SIMD. The relevant hardware improvements since 2010 or so mostly look like bandwidth improvements, not latency, and coincide with the increasing ubiquity of SIMD instructions and SIMD-friendly algorithms.