Hacker News | mishu2's comments

Over the holidays I built a simple website which lets children (of all ages) easily draw something and then bring the sketch to life using AI and a prompt.

https://funsketch.kigun.org/

Only shared it via Show HN so far, and am still regularly getting some creative submissions. Will be sharing it at an art festival later this year so kids can have a more active role when visiting.


Thank you everyone for taking a look. The website had around 1,200 visitors and received more than 90 sketches in the past 24 hours, and I'm happy to say I could approve almost all of them (all except two 'rocket' sketches which were starting to look a bit dubious, but all in good fun).

The results look very interesting to me, and I think as a next step I will look into adjusting the prompts (both positive and negative). On the one hand, I'd like to keep the sketch/drawing look rather than going full photorealism as some of the videos do; on the other hand, I don't want to restrict the users' creativity too much.

I plan to simplify this further and optimize it for tablets. Some friends said they'd like to have their kids try it out, and I like the idea: having to sketch keeps the user in a more active role.

Please let me know if you have any other suggestions.


I'm actually trying to reduce the 'funkiness'; the initial idea was to start from a child's sketch and bring it to life (so kids can safely use it as part of an exhibit at an art festival) :)

There's a world of possibilities though, I hadn't even thought of combining color channels.


I think they were suggesting that it might be possible to inject the initial sketch into every image/frame such that the model will see it but not the end user. Like a form of steganography, which might improve the model's ability to match the original style of the sketch.
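
As a rough illustration of what that could look like (hypothetical code, not anything from the site): blend the original sketch into each frame at a very low alpha, so the model still "sees" it while it stays essentially invisible to the viewer.

    # Hypothetical sketch of the idea: blend the original drawing into every
    # frame at a barely visible opacity before it goes back into the model.
    import numpy as np
    from PIL import Image

    def embed_sketch(frame: Image.Image, sketch: Image.Image, alpha: float = 0.03) -> Image.Image:
        sketch_rgb = sketch.convert("RGB").resize(frame.size)
        f = np.asarray(frame.convert("RGB"), dtype=np.float32)
        s = np.asarray(sketch_rgb, dtype=np.float32)
        blended = (1.0 - alpha) * f + alpha * s  # ~3% opacity, imperceptible to viewers
        return Image.fromarray(blended.astype(np.uint8))

Whether a video model actually picks up on such a faint signal is an open question, but it would be cheap to test.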


Thank you. I've noticed that too, and also that it has a tendency to introduce garbled text when given no prompt (or only a short one).

This is using the default parameters for the ComfyUI workflow (including a negative prompt written in Chinese), so there is a lot of room for adjustments.


Oh, I was wondering why some of the hallucinations introduced Chinese text/visuals; I'm guessing that might be due to the negative prompt.


I think the main reason is that the model has a lot of training material with Chinese text in it (I'm assuming, since the research group who released it is from China), but having the negative prompt in Chinese might also play a role.

What I've found interesting so far is that sometimes the image plays a big part in the final video, but other times it gets discarded almost immediately after the first few frames. It really depends on the prompt, so prompt engineering is (at least for this model) even more important than I expected. I'm now thinking of adding a 'system' positive prompt and appending the user prompt to it.
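
Something along these lines, as a minimal sketch (the prompt strings are made up, not what the site actually uses):

    # Minimal sketch of the 'system' prompt idea; the strings below are
    # placeholders, not the actual workflow prompts.
    SYSTEM_POSITIVE = ("hand-drawn sketch style, simple colors, playful animation, "
                       "keep the original drawing recognizable")
    SYSTEM_NEGATIVE = "photorealistic, garbled text, watermark"

    def build_prompts(user_prompt: str) -> tuple[str, str]:
        positive = f"{SYSTEM_POSITIVE}. {user_prompt.strip()}"
        return positive, SYSTEM_NEGATIVE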


Would be interesting to see how much a good "system"/server-side prompt could improve things. I noticed some animations kept the same sketch style even without specifying that in the prompt.


Thank you. I'm running it locally on a 4090 (24 GB).

I was running into OOM issues with Wan 2.2 before, but I found the latest version of ComfyUI can now run it (using about 17 GB of VRAM for what I assume is a quantized model?).
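
If anyone wants to check what their own setup is using, a quick look from the PyTorch side (assuming a CUDA build) while the workflow is loaded:

    # Quick VRAM check (assumes a CUDA build of PyTorch); run while the
    # workflow is loaded to see how much memory the model is holding.
    import torch

    if torch.cuda.is_available():
        gib = 1024 ** 3
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: "
              f"{torch.cuda.memory_allocated() / gib:.1f} GiB allocated, "
              f"{torch.cuda.memory_reserved() / gib:.1f} GiB reserved, "
              f"{props.total_memory / gib:.1f} GiB total")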


Having the ability to do real-time video generation on a single workstation GPU is mind blowing.

I'm currently hosting a video generation website on a single GPU (with a queue), which is also something I didn't think possible a few years ago (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.


Computer games have been doing it for decades already.


I think video-based world models like Genie 2 will happen and that they'll be shrunken down for consumer hardware (the only place they're practical).

They'll have player input controls, obviously, but they'll also be fed ControlNets for things like level layout, enemy placement, and game loop events. This will make them highly controllable and persistent.

When that happens, and when it gets good, it'll take over as the dominant type of game "engine".


I don't know how much they can be shrunk down for consumer hardware right now (though I'm hopeful), but in the near-term it'll probably all be done in the cloud and streamed as it is now. People are playing streamed video games and eating the lag, so they'll probably do it for this too, for now.


This is also the VR killer app.


Are you sure it's not just polish on the porn that is already the "VR killer app"?


A very, very different mechanism that "just" displays the scene as the author explicitly and manually drew it, and yet has to pull an ungodly amount of hacks to make that viable and fast enough, resulting in a far from realistic rendition...

This on the other hand happily pretends to match any kind of realism requested like a skilled painter would, with the tradeoff mainly being control and artistic errors.


> with the tradeoff mainly being control and artistic errors.

For now. We're not even a decade in with this tech, and look how far we've come in the last year alone with Veo 3, Sora 2, Kling 4x, and Kling O1. Not to mention the editing models like Qwen Edit and Nano Banana!

This is going to be serious tech soon.

I think vision is easier than "intelligence". In essence, we solved it in closed form sixty years ago.

We have many formulations of algorithms and pipelines. Not just for the real physics, but also tons of different hacks to account for hardware limitations.

We understand optics in a way we don't understand intelligence.

Furthermore, evolution keeps evolving vision over and over. It's fast and highly detailed. It must be correspondingly simple.

We're going to optimize the shit out of this. In a decade we'll probably have perfectly consistent Holodecks.


Hmmm, future videos might just "compress" down to a common AI model and a bunch of prompts + metadata about scene order. ;)


I feel like this misses the point. Also, vision and image generation are entirely different things; even for humans, some people can't create images in their head despite having perfectly good vision.

Understanding optics instead of intelligence speaks to the traditional render workflow, a pure simulation of input data with no "creative processes". Either the massive hack that is traditional game render pipelines, or proper light simulation. We'll probably eventually get to the point where we can have full-scene, real-time ray-tracing.

The AI image generation approach is the "intelligence" approach, where you throw all optics, physics and render knowledge up in the air and let the model "paint" according to how it imagines the scene, like handing a pencil to a cartoon/anime artist. Zero simulation, zero physics, zero rules - just the imagination of a black box.

No light, physics or existing render pipeline tricks are relevant. If that's what you want, you're looking for entirely new tricks: Tricks to ensure object permanence, attention to detail (no variable finger counts), and inference performance. Even if we have it running in real-time, giving up your control and definition of consistency is part of the deal when you hand off the role of artist to the box.

If you want AI in the simulation approach you'll be taking an entirely different path, skipping any involvement in rendering/image creation and instead just letting the model puppeteer the scene within some physics constraints. Makes for cool games, but completely unrelated to the technology being discussed.


Bob Ross did it, too.


1 frame of Bob Ross = 1,800s


So with 108,000 (60 × 1,800) Bob Ross PPUs (parallel painting units) we should be able to achieve a stable 60 FPS!


Once you set up a pipeline, sure. They'd need a lot of bandwidth to ensure the combined output makes any kind of sense, not unlike the GPU I guess.

Otherwise it's similar to the way nine women can make a baby in a month. :)


The food/housing/etc. bill for 108k Bob Ross er... PPUs seems like it would be fairly substantial too.


Started working on a case discussion platform for students almost two years ago. Mostly for dentistry and medicine, but it's template-based so works well for other purposes (e.g. teachers, social workers, etc.). It's going well and is being used by three universities right now.

Along the way, I developed lightweight image editor and 3D model viewer components, which I've open-sourced [1].

[1]: https://github.com/kigun-org/


Author here. I forgot to mention that the project is written using Svelte 5 and daisyUI, because I wanted to try the tech stack out.

I have to say I quite like it for more complex interactive components like this one, but still much prefer django with some htmx + Stimulus JS sprinkled in for the rest of the website.


Photography is a hobby of mine. After putting it off for years, I finally decided to start sharing my photo collection with friends and family.

This resulted in another side project, https://mishmash.photos/ -- a website to organize, share and collaborate on albums (because I always lose photos when going on trips with friends). There are better apps out there for this, but this one is mine.

Sample album: https://mishmash.photos/share/84f83b09-0a24-4d13-b436-8131ee...

Tech stack is django, htmx, bootstrap and a Stripe integration, to keep things simple.

(There's no free tier; from reading this website I know offering free image upload usually ends badly.)

