Video to video with Stable Diffusion (stable-diffusion-art.com)
432 points by hui-zheng on June 12, 2023 | 100 comments


I was going to get a 4090 this year, but I just don't think 24GB of VRAM is going to be enough in the short-term future (for AI-related stuff).

Ended up getting a 3060 for 1/4 of the cost, and I'm planning on using/paying for Colab until some 6090 comes out with 128GB of VRAM.

Something that kind of bothers me that I don't understand: Why is there such an obsession about having small computers/servers? I don't care if I have to put my computer in the basement because it's the size of a closet or two. Maybe there is an EE issue, like too much current or EMF, if you make a large computer.

I'd usually use Colab for 99% of the jobs... but I admittedly want to do some NSFW stuff with me and the wife with AI art.


I agree we need more VRAM. I wish someone would combine a GTX 960 level processor made on modern lithography (to cut power draw) with 32GB of low power GDDR VRAM and try to fit it in a single slot. 1x DP connector would be fine.

I agree about small computers. The rich folks in NY City and Silicon Valley set the trends, and real estate is expensive there. Also, graphic arts people make advertisements, and they want PCs to look cool regardless of their power. If you want a good, big system, take a look at used Xeon-based workstations. Sometimes a used HP z840 will sell for as little as $400 with an 1125 watt power supply.


They made a 2-slot, 24GB RAM card with the same chip as the GTX 980 Ti, called the Quadro M6000.

They're still around US$650 on eBay though, unlike the 12GB variant of that card (much, much cheaper). Random example:

https://www.ebay.com.au/itm/255069557859

---

Digging around a bit more, it looks like there's a "Tesla M40" card of the same generation with 24GB of RAM. There seem to be quite a few of those on eBay for around US$150-200.


As far as I know, the Tesla M40 is server "only" (it doesn't have sufficient cooling for a consumer desktop)


Yeah, that's a good point. Did a bit of looking into them, and people do use them in desktops but need to add some cooling.

Some places now sell them with a 3D-printed cooling shroud and fan already pre-assembled:

https://www.ebay.com.au/itm/125971572492

If I had the need for something with 24GB of RAM (even if slow), I'd probably get one of those. It's a fraction of the cost of the alternatives, and it's still supported by CUDA.


Dell T7810 or 7820 are also good candidates


> Why is there such an obsession about having small computers/servers?

Space. Most people don't have room for a closet-sized computer, or if they do, they can't justify using the space that way instead of for something else. There's also the cost of materials for a case: to sell one you generally need to deal with EMI properly.

Most of the cost is in the RAM and CPUs/GPUs, and beyond a certain point you simply don't need a bigger unit to fit them in. If you want to build up more compute power, the usual way is to put a rack in that cabinet's worth of space and install separate machines in it (that generally means low-profile, high-density kit with cooling solutions that are not optimised for quietness, but I'm sure water-cooled options are available too).


I did go with the 4090 and so far don't regret it. It has forced me to learn about some memory reducing tricks when I want to fine tune SD, but it's been fun to see what can be done.

While it may not be enough VRAM for the future, I'm sort of taking the opposite stance: I'm only going to be playing with AI that I can build on my own machine using (high end) consumer hardware.

People have been creating amazing work with only SD 1.4 and 1.5. If this is the limit of what I can do with SD for the next few years, so be it; there's more than enough to play with, and I'm personally not willing to fully cede the future of AI to large corps. There need to be hackers in the space pushing the limits of what can be done (without worrying about copyright, TOS, etc).

Maybe the best stuff will be locked behind a server, but I'm much more interested in the creative possibilities of running projects (nsfw and otherwise) on my personal hardware.

That said, I think it's great to extend that logic to less pricey pieces of hardware (i.e. I'm only interested in what I can build at home on 8GB of VRAM, etc.).
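For anyone curious, the memory-reducing tricks mostly come down to a few pipeline settings. Here's a minimal inference-side sketch using Hugging Face diffusers; the checkpoint name and settings are just examples, and the training-side equivalents (gradient checkpointing, 8-bit optimizers, LoRA) follow the same idea of trading speed for peak VRAM.

```python
# Minimal sketch of common VRAM-saving settings, assuming the Hugging Face
# diffusers library; the checkpoint name and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,   # half precision roughly halves weight memory
)
pipe = pipe.to("cuda")

# Trade a little speed for a large reduction in peak attention memory.
pipe.enable_attention_slicing()

# Optional: memory-efficient attention if xformers is installed.
# pipe.enable_xformers_memory_efficient_attention()

image = pipe("a watercolor fox in a forest").images[0]
image.save("fox.png")
```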


I think you only need 12GB to train a DreamBooth model? I trained some on Colab over the Christmas break, but just being able to run that locally would open a lot of doors. Using Colab is just kind of a pain.


> Why is there such an obsession about having small computers/servers

More likely just cost. If a "small" computer costs $2k, how much do you think a "big" one will cost? $10k+ probably, what is the market for that?

And you can buy a rack in your home and put 10 servers in it if you really want a closet sized computer.



You'd expect manufacturing costs to go down.


Most of the cost is in CPU/GPU/RAM.


Ahh, most of the cost is RAM. Then GPU then CPU.

Look at a Dell or HP server configurator website to see why.

If you "only" need GPU, a craigslist search should find a less expensive 2 or 3yo server to plug your latest GPU into.


Ahh, most of the cost is CPU. It's so expensive to have register memory that you have L1/L2 caches, each further away and cheaper. RAM is the cheapest, which is why you can have so much of it. Try having 16GB of register memory. UMA is a step towards that, but not quite.


Yes, so if your CPU doesn't need to be the size of a box of cards, and your GPU doesn't need to be the size of a library book, the manufacturing costs would go down.


It gets harder to build big things with small transistors, not easier, due to yield.

Also, if you want to use big transistors, power dissipation means you have to stay small or you will start fires.


This is why I like the "open case design." Your "PC" is just a frame to hang components on, with nothing to enclose it. And the nice thing is these days people care a lot about quiet computers, so you can have an open frame case with three GPUs that is just making a not-unpleasant whooshing sound at full bore.


This exposes the components to more dust and potentially static shocks as a result. Cases are also designed to encourage airflow over specific components, so being case-free might not result in improved thermals.

But it is certainly more convenient for tweaking your build.


That's vastly overblown. I'd be more worried about pets or just accidents knocking some things off.


I highly doubt open air is going to drastically reduce component life. I don't see how open air is anywhere near as bad for dust build-up as some of the negative-pressure and unfiltered builds typical of OEMs. I don't even think dust causes "static shocks" in computers; it primarily just clogs ventilation, leading to overheating.

Open air certainly is a choice, and cases do function to better guide airflow when designed and built correctly.


I doubt it as well - for the most part - but if you're buying high end parts, I think it's hard to argue that it doesn't expose (literally) the components to more risk.

My understanding of the association between dust and static shocks is that the friction between dust particles necessarily creates static charges.


I have a wall-mounted open-design case (Thermaltake Core P5) - dust has not been an issue at all. In fact, getting rid of dust is a lot easier, since I don't have to open up the case to clean it out. I just use a handheld blower to blow the dust out from the sides every so often. Regardless, for some reason it has far less dust pile-up than a closed case. I assume it's because in closed cases, dust has no room to move out as easily.


Most cases are not designed to do any of that stuff particularly well. Case makers simply cannot predict what people are going to put in them, so detailed airflow modeling isn't all that useful.

Most cases rely on moving a bunch of air. As long as you move enough air, you cool well enough.


It reminds me of those sci-fi villains with an exposed brain sticking out the top of their head. You'd think it would be terribly vulnerable, and it is, but the cooling and dust-management advantages make it worth it.


I think it has a lot to do with living situation. Most of us don't have basements or many square feet to spare anywhere.

I built a system on one of these open frames back in 2015 when I was living in New Mexico and kept it in my grad student office: https://www.amazon.com/gp/product/B009RRIP86 . It was great (especially for swapping components), but then I moved. Living in a small apartment here in CA, that thing was pretty annoying. One time, I got egg on the motherboard. Long story, but I projected the first Trump-Hillary debate onto an old bedsheet on the wall, had friends over, and supplied eggs. Let's just say I was surprised at how far an egg can splatter. I did my best to clean it, but I had recurring issues with the ram slots following that incident. Last week I finally got rid of the whole system after its long and prestigious career.

Also, I kept it under my desk for a while and had a bad habit of sticking my feet in there and causing problems.

So, advantages of a closed case besides small footprint:

- Raw egg does not splatter into the ram slots

- You can't put your feet in the computer


Most consumer, prosumer and pro hardware has to have the EE/EMF figured out to be able to sell.

Other things worth looking into when setting up your own gear:

Factoring in electricity costs is important too long term.

Beyond this, running hardware long term:

- If you run your hardware using a PDU (Power distribution unit) that conditions the electricity and keeps it stable, there is much less wear and tear on the electronics, especially transistors, etc, that love to pop.

Lots of gear out there like this: https://www.apc.com/us/en/product/AP7921B/apc-netshelter-swi...

- Cooling without a fan is only so good and depends more on how well the closet or the surrounding environment can cool. It's almost certainly better to use a good case with fans, and add some cabinet fans, for example by AC Infinity:

https://www.amazon.com/s?k=ac+infinity

Is the closet sealed? Maybe cut a hole to port air out and run a few small items, like a small standing AC unit that can be temperature activated, and something to handle the humidity one way or the other depending on the basement.

For setting up your own space, places like ebay are your friend. Industrial grade equipment is not made to fail.


> - If you run your hardware using a PDU (Power distribution unit) that conditions the electricity and keeps it stable, there is much less wear and tear on the electronics, especially transistors, etc, that love to pop.

> Lots of gear out there like this: https://www.apc.com/us/en/product/AP7921B/apc-netshelter-swi...

Those distribute power. They have no kind of conditioning on board, which you'd know if you read the link you provided. They're just so you can remotely switch stuff on and off; any protection in a datacenter would be in the electrical cabinet.

And you do want protection, especially against overvoltage/lightning, as far from the device as possible.


In my case, I just realized the last time I used one of these was in a data centre where electricity was conditioned by another piece of equipment upstream from it.

It would be helpful if you provided a link to something that did condition the electricity since you know about that instead of dropping a mic. Thanks!


Why wouldn’t you use colab for nsfw? Do you really not trust Google Cloud that much? Or do they have some kind of policy I’m not aware of (can’t imagine how they would enforce this)?


As always, the red carpet is rolled out to welcome all new users into the funhouse, but with time, the carpet is withdrawn, the doors are very slowly bolted shut, and then the house master starts to look under the beds of all the guests who don't fit a particular group


Yes, I don't trust any offsite stuff. (Remember PRISM?)

Me and my wife aren't even that obsessed with nudity and stuff, but given how nsfw stuff comes up in media and politics, I'll keep it offline until nudist colonies are the norm.


You apparently can't use anything to do with the AUTOMATIC1111 webui from Colab. I don't remember the exact restriction, but I hear even a simple print with the word freaks it out.


You can with paid Colab. You just can't on the free tier.


cool, I have the pro, haven't tried it myself.


Google Cloud really freaks out when you ask it to do weird stuff with porcupines.

My wife and I just want to do weird stuff with porcupines, damnit!


If you want more VRAM, the NVIDIA pro cards are always an option (if you can afford them). The top spec has 48GB of VRAM, and you can get more than one.


Not going to bother with this until the temporal cohesion issue is solved. The results are cool, but the variety in frames makes it look like a very specific and distracting art style instead of true animation.


Mark my words. In a few years people will be writing filters and trying to figure out how to recreate the look of early AI animation.


Yes, just look at the number of reviews of old cameras on YouTube! Of course, this is software, so it will be easier to find than an old camera or lens, but I am also sure it will be celebrated as a timeless genre.


For that to happen this flickering animation style has to become popular first. I don't think it will be. It can only be applied to very niche situations. It will never be a mainstream thing like old cameras were.


Or they could just use the models we are using right now and get the same effect; it's not like the models are going anywhere.


Computationally the early models might be easier/cheaper to run in a few years as well.


...why? Do you expect GitHub and HuggingFace to go under in a few years?


The poster is just saying that "the "glitch" of today will be the nostalgic style of tomorrow".


Exactly. I have no idea what the other commenters thought I meant.


They are saying that there is no need to figure out how to recreate "the "glitch" of today that will be the nostalgic style of tomorrow" as the "old"/current models are all available on GitHub and huggingface, and will be in the future.

There is no need to figure out how to recreate this style as you will be able to find it on these platforms in the future. The commenter understood you, but you did not understand the commenter.


Yes. Because of the way reply notifications come in I thought it was a reply to a different comment.

But I still think that's missing the point of my (not entirely serious) comment.

People still have access to analogue amplifiers but digital simulations of them are still developed.

As models and workflows improve, if people want a flickery old-school look then they may well simulate it rather than go through the hassle of running old tools that might not mesh well with newer workflows.


> People still have access to analogue amplifiers but digital simulations of them are still developed.

That's because analog stuff requires hardware. There's "something to simulate". You don't simulate digital computers, perhaps you "emulate" them.

More generally, analog stuff is really different from digital stuff. Some "audiophiles" argue about which lossless compression algorithm applied to .wav files gives the best sound, but we know it's just bits. Bits are fungible. By contrast, you can never make a perfect copy of something analog (you can often make a copy that's indistinguishable for all purposes that matter, but there's still a difference). People still make and buy LPs, the sales are increasing. I don't expect that to ever happen for CDs, unless perhaps combined with scratching or otherwise damaging them on purpose.

Of course, nostalgia goes beyond such mundane distinctions, which gives rise to things like pixel art. But there's still nothing to simulate there.


We don't know, actually. It could be cease-and-desist letters from the creators of AI networks, it could be a change in copyright laws, or it could simply become not profitable enough to continue operations or to keep storing old data.


Nah, nobody is going to be figuring anything out, you'll be able to generate it with little effort with a prompt, just like everything else. But we will be so inundated by high quality digital art in every imaginable style that it will be boring, everything will be boring.


NVIDIA has already published something that solves the coherence problem, and IIUC it's even based on stable diffusion. Hopefully the community will reproduce it soon.

https://research.nvidia.com/labs/toronto-ai/VideoLDM/


> the variety in frames makes it look like a very specific and distracting art style instead of true animation.

Well, what is "true animation"?

There are a number of types of traditional animation where you draw both the characters and the backgrounds from scratch for every frame, so you get this variety naturally.

Or you have more practical techniques like strata-cut animation[1], where again you get what we might think of today as a very "ML-like" effect of the background constantly shifting.

All of these techniques are very time intensive, so you don't often see them in commercial animation.

[1]:https://en.wikipedia.org/wiki/Strata-cut_animation


In this case we have objects disappearing and reappearing, and the design of clothing and objects (headphones/tiara) being significantly different from frame to frame, pieces of sleeves blinking in and out of existence, headphones becoming tiaras becoming helmets... I'm sure you can find examples of such things in traditional animation but there's clearly something peculiar going on here due to the frames being processed mostly standalone, and it's clearly specific to this exact way of processing. Even if you "draw both the characters and the backgrounds from scratch for every frame", normally the animator would at least have the previous frames and/or overall design at his disposal as a reference. What's happening here is more like those crowdsourcing experiments where you let many people draw a single frame without knowing what the others are up to.


you'd think a real AI would be aware of all these issues and not let this happen.


Did you scroll down and see the different methods used, specifically method 5? Not perfect, but getting pretty close.


The light reflection on the shoulder doesn't make much sense. I think this is less pronounced when soft light is involved. I wonder if sharp light and reflections will take longer to be handled properly. Every frame on the shoulder looks ok on its own, but when animated it feels like it was taken in a different environment than the rest.

I suspect depth maps (SD2 already supports them) could be used to achieve that in the future.

I wonder if a diffusion model could accept an "onion skin" noise, so the transitions between frames would be less jarring. Can someone with more knowledge than me explain what's the most promising approach here?
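One way to sketch the "onion skin" idea: instead of sampling fresh noise for every frame, blend the previous frame's latent into the starting noise so consecutive frames share structure. This is purely hypothetical pseudocode in PyTorch, not an existing API; `encode_to_latent` and `denoise` are placeholders for whatever encoder and sampler you actually use, and the blend weight is a made-up knob.

```python
# Hypothetical "onion skin" noise blending, sketched in PyTorch.
from typing import Optional
import torch

def onion_skin_noise(prev_latent: Optional[torch.Tensor],
                     shape,
                     blend: float = 0.3,
                     device: str = "cuda") -> torch.Tensor:
    """Fresh Gaussian noise, pulled toward the previous frame's latent
    so consecutive frames share low-frequency structure."""
    fresh = torch.randn(shape, device=device)
    if prev_latent is None:
        return fresh
    mixed = (1.0 - blend) * fresh + blend * prev_latent
    # Renormalize so the sampler still sees roughly unit-variance noise.
    return mixed / mixed.std()

# Usage sketch over a list of frames (encode_to_latent/denoise are placeholders):
# prev = None
# for frame in frames:
#     init = encode_to_latent(frame)                  # VAE encode
#     noise = onion_skin_noise(prev, init.shape, 0.3)
#     out_latent = denoise(init, noise, prompt)       # img2img-style pass
#     prev = out_latent
```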


>The light reflection on the shoulder doesn't make much sense.

People will really grasp at anything to find something to critique in generative AI, huh?


Wait, should we not critique?

OK, I'll fix that.

It's perfect. Stop all work and development on it now guys, it's perfect. It has no flaws and makes no mistakes. Amazing I guess.


I think @bondarchuk took my comment out of context, quoting just the first line. No big deal, it's just hard to respond to that.

I don't even see an issue with that particular artefact, I think it's an interesting problem from a technical pov, hence literally every other part of my comment.


It just seemed like a bit of a nitpick in the context of the huge temporal consistency issues/fixes, of course you're all free to discuss whatever you want and I agree it's somewhat interesting.


Haha, nope, not my intention at all. Although I understand the expectation given the average style of discourse on this site.


The last one is largely the original video with a cartoon filter and basically anime facetune. It is educational and interesting, but it looks the best because it's actually doing the least, though even then you get the temporal artifacts on the hair and so on.

It is improving super rapidly, and in all likelihood it will be pretty seamless soon enough. We are in the intermediate stage right now, and the results are going to be incredible.


The last one looks the creepiest - like a dystopian matrix glitch. Could be a cool effect to illustrate that.


It's a lot better than earlier work, but still not really at the level it needs to get to.


But isn't it also the case that you have so much control over the individual frames themselves anyway? Why draw the line where you are? Just based on how these things inherently work, I don't see the specificity problem getting solved anytime soon, short of making a whole new set of weights from some whole new cultural timeline of artistic output.

But it's not really an issue! This is just what AI art is. It is something different from just painting: somewhat less and somewhat more in different dimensions. You trade individual expression and authorship for bricolage and universality. That's cool, but it will always be "specific."

If you just want pretty anime girls or the like, you'd get a lot farther with dumber tools like motion-detection on spine-based animation anyway.


I've had the idea for a long time but never managed to try it: Alias-free convolutions (like StyleGAN3) may help with temporal cohesion.

The StyleGAN3 project page shows some good videos: https://nvlabs.github.io/stylegan3/


The speed of progress is insane. That being said, the smoothest example in that tutorial (Temporal Kit) required a lot of outside processing.

I wonder if the AUTOMATIC1111 UI will start handling workflow scripts, a la Blender's node-based workflow boxes.


There is another UI for workflow scripts: https://github.com/comfyanonymous/ComfyUI

Give it enough time and it will also have the UI simplicity of the automatic1111 UI.


automatic1111 isn't what immediately springs to mind when I hear the phrase "UI simplicity"


ComfyUI allows you to design your own workflow. It also means understanding how things work; it's far harder to use than the automatic1111 UI.


Comfy is harder if you've only used gradio, just like Blender's modal interface was "too complicated" if you'd only used Autodesk products.

Each has its strengths, but I do wonder whether I'd feel automatic1111 was too random and restrictive if Comfy had been released first.


That's really impressive. This workflow in particular speaks of major potential:

https://comfyanonymous.github.io/ComfyUI_examples/controlnet...


Once the temporal fixes are applied, the residual morphing and flickering actually quite appeals to me. Reminds me of the animation style in A Scanner Darkly.


And to think ASD was entirely hand rotoscoped!


A Scanner Darkly was also immediately where my mind went.

I enjoy the style, but I'm also not sure it's the right medium for most works.


And looks like scramble suits without it


Does anyone else find it weird that a huge amount of AI video/imagery in tutorials etc seem to be sexy anime schoolgirls?

I'm a bloke, I've never been that particularly bothered by like, Lara Croft, Lena, muscular guys in action games etc, I get it, I want to look at idealized/attractive things as much as the next person, but something about this makes me super uncomfortable.


Maybe it's time for you to get involved and render something that you aren't bothered by, something that's more comfortable for you.


You're not alone.


Same aesthetics as A-ha's "Take on me". Even with the flaws, I can see how this can simply be used with that exact aesthetic in mind, without the need for any improvement.


Something like this (music video, NSFW ): https://www.youtube.com/watch?v=edKo3y2cFUg


I had the exact same reaction!


Could people please add newbie resources to this post?


So all that's happening is people are getting increasingly good at hiding violations of spatiotemporal invariants?

This is treating the symptom and ignoring the disease.


Does anyone "in the know" have approximations of the cost to run this on Colab or similar? Maybe a very rough dollars per minute of video?


A frame is like 20-30 seconds for one pass on my 3090, and you may have multiple tools and multiple passes. Say 60 seconds a frame x 300 frames, that's about 5 hours; a 3090 costs $0.44 per hour on RunPod, so a little more than $2.
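Spelled out, that back-of-the-envelope math (using the same rough numbers, which are estimates rather than fixed prices):

```python
# Back-of-the-envelope cost, using the parent comment's rough numbers.
seconds_per_frame = 60      # multiple tools/passes at ~20-30 s each
frames = 300                # e.g. ~10 s of video at 30 fps
rate_per_hour = 0.44        # quoted 3090 price on RunPod, USD

hours = seconds_per_frame * frames / 3600
cost = hours * rate_per_hour
print(f"{hours:.1f} hours, ~${cost:.2f}")  # -> 5.0 hours, ~$2.20
```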


Kinda crazy how averse to ongoing costs I am. I have a 3080 Ti for Blender rendering, sculpting, and general Blender work, so I definitely need something good locally... but I'm debating buying top of the line on the next iteration so I can do this stuff locally. Thousands of dollars, perhaps, just to avoid a $2 cost lol.


The $2 adds up, plus that's an ephemeral environment, and these things are super finicky to get working properly (dependencies and initial setup, plus adding tools to your workflow). There's also more of a craft-like, ad-hoc approach to how you want to do things, plus a learning curve around which tools and scripts you want to use or write yourself. RunPod sounds good on paper, but after a while you want an AWS/Google VM so things are stable and your environment is sane. Then if you do the math, you come up to the cost of buying a nice card in 6-10 monthly payments. So if you want to play with it in peace, you spend once and only worry about electricity costs :D


> Something that kind of bothers me that I don't understand: Why is there such an obsession about having small computers/servers? I don't care if I have to put my computer in the basement because it's the size of a closet or two. Maybe there is an EE issue, like too much current or EMF, if you make a large computer.

I'm not using my GPU for heavy machine learning tasks at home. I have an NCASE M1 with a Ryzen 3950X (16 cores/32 threads), a 3090, and a few NVMe M.2 drives. It's water cooled, but for me the video card is the only PCIe card I have, and water cooling keeps the temps reasonable even in the small space. It was a little tricky to build in, but why take up a bunch of unnecessary space?


Does someone know how long rendering these videos takes? Can it be done in real-time?


No, it can't be done in real time. Each image takes about 20-60 seconds (depending on what you're doing with it).


I've only been using stable diffusion in a very basic way, give some text, get some images back. Seeing what others produce with it is amazing and also just feels beyond me, I'm simply not familiar with all these other extensions and scripts. That said I appreciate a tutorial like this which isn't glossing over steps, so it's worth a try.


That first video is fabulous. It reminds me of a standard trope in generative art: you generate a whole bunch of prospective images and then pick the nicest ones.

Except in this case we get to see all the prospects (the variations in chestplate, wings, etc.). And in such a cool style.


The end result looks like a cartoon filter applied frame by frame. Or, each frame is basically going through Stable Diffusion in photo mode.


It looks to me like there's a tradeoff between how much SD changes the input and how temporally coherent the result is.
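In img2img terms, that tradeoff is mostly the denoising strength knob. A minimal sketch with diffusers, assuming the runwayml/stable-diffusion-v1-5 checkpoint and illustrative values: lower strength stays closer to the source frame and flickers less between frames; higher strength stylizes more but lets frames drift apart.

```python
# Minimal img2img sketch, assuming the Hugging Face diffusers library;
# checkpoint, filenames, and values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("frame_0001.png").convert("RGB")

# strength ~0.3: close to the source frame, less flicker across frames.
# strength ~0.7: heavier stylization, but frames diverge more from each other.
stylized = pipe(
    prompt="anime style, clean line art",
    image=frame,
    strength=0.35,
    guidance_scale=7.5,
).images[0]
stylized.save("frame_0001_stylized.png")
```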


Isn't Runway's vid2vid superior in terms of coherency? This flickering is horrible.


Yes but people want to run stuff on their own hardware or at least have a deep ability to configure and mix and match tools.


Did they finally fix the annoying flickering?


The Temporal Kit write up was amazing.



