Hacker News | wfalcon's comments

Studios are a great Google Colab alternative with:

- persistent storage (your environment setup and data persist across restarts)

- free SSH, so you can connect your local IDE

- CPU setup, GPU run (do setup work on CPUs and switch to run on a GPU when ready)

- no credit card required

- pay-as-you-go if you need more GPU hours

- A100s, H100s, and more GPUs available


This is also what our (Lightning AI) lit-gpt library does: https://github.com/Lightning-AI/lit-gpt


Thanks, hadn't seen this.


Welcome to Lightning AI, Mike! Super excited to drive the future of ML together.


Just to be 100% clear.

We want people to build on Lightning and we want companies to deliver value and products for their users.

We place no limitations on how Lightning is used or what products people will build.

Our in-process patent portfolio is not meant to limit the use of Lightning in any way; it exists for defensive purposes only.

If you have any other inquiries, please email legal@pytorchlightning.ai for more details.


No! We want people to build on Lightning, and we want companies to deliver value and products for their users.


The Apache License you released Lightning under explicitly allows people to clone your framework if they adhere to the license and provides them patent protection if they do so.

As I said in my previous comment, using patents to try and get around an open source license is skeevy as hell.


No, we support other frameworks too!

It's just that if you use Lightning you'll have zero friction. With the others... you might run into issues inherent in those frameworks' hard-to-work-with designs.


Maybe Determined should come up with its own API instead of copying Lightning's :)

Not a nice move for the open-source spirit. Also, pretty sure it's a violation of our patents and 100% copyright infringement.


Hi William -- we have absolutely not copied any of Lightning's APIs.

In fact, our PyTorch API makes some significantly different design choices than Lightning does -- e.g., we require users to step optimizers and run the backward pass explicitly, which is a bit lower-level but allows for more flexibility when using the API.
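
To make the distinction concrete, here is a rough, hypothetical sketch of the two styles in plain PyTorch (these are not Determined's or Lightning's actual classes; the names are made up for illustration). In the "automatic" style the framework calls backward() and the optimizer for you; in the "explicit" style the user drives both, which is lower-level but easier to bend into unusual loops like GAN training:

    import torch
    from torch import nn

    # Style A ("automatic optimization"): return the loss and let the
    # framework run loss.backward() and optimizer.step() behind the scenes.
    class AutoStyleModule(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(10, 1)

        def training_step(self, batch):
            x, y = batch
            return nn.functional.mse_loss(self.layer(x), y)

    # Style B ("explicit stepping"): the user calls backward() and
    # step() directly inside the training function.
    def explicit_train_batch(model, optimizer, batch):
        x, y = batch
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()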

For instance, here is an example of a GAN using our PyTorch API: https://github.com/determined-ai/determined/blob/master/exam...

This is a port of this PyTorch Lightning example: https://github.com/PyTorchLightning/pytorch-lightning/blob/m...

Despite the former being a port of the latter, there are significant differences between the two APIs.

More broadly, we welcome competition in this space and think there's a lot that we can all learn from one another.


Projects copying from each other is exactly in the open-source spirit. Don't release something under an Apache License if you don't want people copying it into compatibly licensed projects. Also, patenting open-source software (to limit uses that the license would otherwise allow) and API copyrights are pretty strongly frowned upon in the open-source community. As a note, the Apache License you released Lightning under grants patent-lawsuit protection to anyone using your code under the license, so claiming copyright and patent infringement against another Apache-licensed project seems amazingly skeevy.

If this is the philosophical stance that Grid and Lightning are taking, then it's definitely a project I'm going to advise people to stay well clear of. It's the worst flavor of commercialized open-source software and potentially a legal liability to touch in any way, as you seem way too trigger-happy with lawsuits.


We highlight these issues explicitly in our docs:

https://pytorch-lightning.readthedocs.io/en/latest/tpu.html#...


Can you point out where exactly in those docs you highlight the issue?

I just read the linked page and found no references to data-loading limitations or performance limitations. Is it only in the video, which isn't search-indexed and which few people would bother watching?

edit: The page literally advertises the speed of TPUs with "In general, a single TPU is about as fast as 5 V100 GPUs!", which is the exact opposite of warning people.


The grid is how you harness and distribute power and electricity... like that coming from lightning :)

Second, electricity was a great new technology (i.e., AI), but you needed the power grid to make it usable - that's Grid AI.


Also related to TRON


We currently have Google engineers training on TPU pods with PyTorch Lightning.

TPU support is VERY real... yes, sometimes it breaks, but PyTorch and Google are working very hard to bridge that gap.

We have dedicated partners at Google on the TPU team working to get Lightning running seamlessly on pods.

Check out the discussions here: https://github.com/PyTorchLightning/pytorch-lightning/issues...


No, you do not support the TPU infeed, and this is a crucial distinction. Saying that you do support this has caused endless confusion and much surprise. It’s almost not an exaggeration to say that you’re lying (sorry for phrasing this so bluntly, but I’ve seriously spent dozens of hours trying to break this misconception due to hype like this).

TPU support is real. Pytorch does in fact run on TPUs. But you don’t support TPU CPU memory, the staging area that you’re supposed to fill with training data. That staging area is why a TPU v3-512 pod can train an imagenet resnet classifier in 3 minutes at around 1M examples per second.

You will not get anywhere near that performance with pytorch on TPUs. In fact, you’re expected to create a separate VM for every 8 TPU cores. The VMs are in charge of feeding the cores. That’s insane; I’ve driven TPU pods from a single n1-standard-2 using tensorflow.

Repeat after me: if you are required to create more than one VM, you do not (yet!) support TPU pods. I wish I could triple underline this and put it in bold. People need to understand the limitations of this technique. Creating 256 VMs to feed a v3-2048 is not sustainable.
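
For readers who haven't touched this stack: the per-VM feeding pattern I'm describing looks roughly like the sketch below (a simplified illustration based on my understanding of torch_xla at the time; exact module paths and APIs may differ). The key point is that an ordinary host-side DataLoader pushes every batch from the VM to its local TPU cores, instead of the TPU's own staging memory being filled ahead of time the way TensorFlow's infeed does:

    import torch
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.parallel_loader as pl

    # Illustrative sketch only: one host process feeding the TPU cores it owns.
    def train_loop(model, optimizer, train_loader):
        device = xm.xla_device()
        model = model.to(device)
        # The DataLoader lives on the host VM; MpDeviceLoader streams each
        # batch from host memory to the attached TPU cores.
        device_loader = pl.MpDeviceLoader(train_loader, device)
        for x, y in device_loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            # optimizer_step also marks the XLA step boundary.
            xm.optimizer_step(optimizer)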


Like I said... the PyTorch and TensorFlow teams are working very hard to make this work. And yes, it's not 1:1 with TensorFlow, but we're making progress very aggressively.


I love what you guys are doing, and I love improving the ML ecosystem, but you've gotta understand, people see this and think "oh, ok, it's a small difference, no big deal." In fact it's a huge difference.

Picture a person with one arm and without legs. Would you say they aren’t “1:1 in terms of features”? They certainly won’t be winning any races.

And unlike real people, you can’t graft on a prosthetic limb to help this situation. The issue I’m describing here is a fundamental one that everyone keeps trying to sweep under the rug and pretend isn’t an issue. And then everyone wonders what’s going on.


I 100% agree. We don't want to misrepresent TPU support. In fact, we explicitly warn users in our docs. Open to suggestions about how we can communicate this much better to our users.

We just want to be part of the effort to bridge the big gap and remove the barriers keeping users from TPU adoption.

https://pytorch-lightning.readthedocs.io/en/latest/tpu.html#...


> In fact, we explicitly warn users in our docs

This was mentioned above, but nowhere on that page does it talk about any limitations whatsoever.


There's a difference between "supporting TPUs" and "supporting TPUs at 100% potential". Although the distinction is important, I don't think the marketing here is misleading.


Not only is it misleading, it even somehow tricked you. :)

We’re not talking about a small 10% reduction in performance here. We’re talking like 40x differences.

If it seems unbelievable, and like it can’t possibly be true, well: now you understand my frustration here, and why I’m trying to break the myth.

Notice that not a single benchmark has ever gone head-to-head in MLPerf using PyTorch on TPUs. And that's because using PyTorch on TPUs requires you to feed each image manually to the TPU on demand, from your VM. Meaning the TPU is always infeed-bound.

Engineers should be wincing at the sound of that. Especially anyone with graphics experience. Being infeed bound means you have lots of horsepower sitting around doing nothing. And that’s exactly the situation you’ll end up in with this technique.

There’s a way to settle this decisively: train a resnet classifier on imagenet, as quickly as possible. If you get anywhere near the MLPerf v0.6 benchmarks for tensorflow on TPUs, I will instantly pivot the other direction and sing the praises of pytorch on TPUs far and wide.

