I’m interested in creating art for a book, like art that goes alongside the text.
One of the difficult parts of this is that I want characters in the book to have a stable look and not change from image to image.
Is there a way to do this? I've experimented with different local models, but I often got artifacts and couldn't get consistent results. I'm reasonably comfortable running local models, but I'm neither an expert nor a computer genius. I could prompt something like "character is pretty and tall with black hair", but every time anything was generated, the character would look different.
It's been about a year since I last tried anything, and the technology has progressed since then. If I can't get the characters to look consistent from picture to picture, I'd rather not have images at all, as I can't afford an illustrator.
Qwen edit can take an image as a sample and work from it. "The character in a victorious pose" would take whatever character you have and reproduce it in a victorious pose. A couple of examples:


And a little janky, because it IS generative AI after all…

Edit: and a bonus screenshot showing how little effort I had to put into this lol

As far as I know, one way to do it is to use an image2image / image-editing model. You'd generate one reference image of your character, then feed that reference in and tell the model to draw the character in a different scenario. I haven't done it in a while, so I'm not sure what people use these days to do it locally.
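For intuition, the core img2img move is roughly: partially noise the reference image according to a "strength" setting, then let the model denoise from there, so low strength stays close to the reference. Here's a toy numpy sketch of just that mixing step (no real model involved; a real pipeline noises in latent space and runs the diffusion model to denoise):

```python
import numpy as np

def toy_img2img(reference: np.ndarray, strength: float, rng=None) -> np.ndarray:
    """Toy sketch of the img2img idea: blend the reference with noise
    according to `strength`. strength=0 keeps the reference unchanged;
    strength=1 replaces it with pure noise. A real pipeline would now
    run the diffusion model's denoising steps on this noised input."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(reference.shape)
    return (1.0 - strength) * reference + strength * noise

ref = np.ones((4, 4))
# Low strength preserves the reference; high strength discards it.
assert np.allclose(toy_img2img(ref, 0.0), ref)
assert not np.allclose(toy_img2img(ref, 1.0), ref)
```

That's why people tune strength: too low and nothing changes, too high and the character drifts away from the reference.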
That sounds so hard to do locally. I don’t even know where I would begin. Is that expert level stable diffusion?
You would probably need to use something like Visual Novel Character Creation Suite to generate a character and a dataset for LoRA training, and then train a LoRA for each character. LoRA are adapters you plug into larger models to help them generate whatever concept you want, in this case a character. With a well-trained LoRA you shouldn't have any problem with character consistency.
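If it helps demystify what a LoRA file actually contains: instead of fine-tuning a full weight matrix W, you train two small low-rank matrices B and A and apply W + scale·(B·A). Only B and A get saved, which is why LoRA files are tiny compared to checkpoints. A minimal numpy sketch of the idea (dimensions are made up for illustration):

```python
import numpy as np

d_out, d_in, rank = 64, 64, 4  # rank << d_in is the whole trick
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen base-model weights
A = rng.standard_normal((rank, d_in))   # trained adapter (down-projection)
B = np.zeros((d_out, rank))             # trained adapter (up-projection, init 0)

def forward(x, scale=1.0):
    # Base model output plus the low-rank correction from the adapter.
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as a no-op:
assert np.allclose(forward(x), W @ x)
# The adapter is far smaller than the matrix it modifies:
print(A.size + B.size, "adapter params vs", W.size, "base params")
# → 512 adapter params vs 4096 base params
```

Training only A and B on your character images is much cheaper than fine-tuning the whole model, and you can stack or swap adapters per character.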
If you're trying to get started with generating locally, here's a tutorial for ComfyUI.
One big caveat: it's nigh impossible to put two separate characters in one image, as the model has no idea which character you're referencing. You can see this across image models: they usually render one central person, or two people who look oddly alike. The only way to get two or more distinct characters is a more complex workflow that inpaints each character separately.
Yeah, if you're using character LoRAs you'll probably need regional prompting to keep the concepts from bleeding into each other.
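The core move behind both inpainting and (loosely) regional prompting is a per-region blend: each character's prompt only influences its own masked region, and the results get composited. A toy numpy sketch of that compositing step (the stand-in arrays below are hypothetical; real workflows do this on latents or attention maps, not final pixels):

```python
import numpy as np

def composite(region_a: np.ndarray, region_b: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend two separately generated images with a per-pixel mask:
    mask=1 takes from region_a, mask=0 takes from region_b."""
    return mask * region_a + (1.0 - mask) * region_b

char_a = np.full((8, 8), 0.2)  # stand-in for "character A" render
char_b = np.full((8, 8), 0.9)  # stand-in for "character B" render
mask = np.zeros((8, 8))
mask[:, :4] = 1.0              # left half of the frame belongs to character A

out = composite(char_a, char_b, mask)
assert np.allclose(out[:, :4], 0.2) and np.allclose(out[:, 4:], 0.9)
```

That's why the seams matter: without feathering the mask edges or a final low-strength pass over the whole image, the composite can look pasted together.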
My limited experience is that stable characters across a number of images are a weakness today, and I wouldn't be confident that genAI is a great way to go about it. If you want to try it, here's what I'd go with:

- If you can get images with consistent outlines via some other route, you can try using ControlNet to generate the rest of the image.
- If you just need slight variations on a particular image, you can use inpainting to regenerate the relevant portions (e.g. an image with a series of different expressions).
- If you want to work from a prompt, try picking a real-life person or character as a starting point; that may help, since models have been trained on them. Best is if you can pin them to one point in time (e.g. "actor in popularmovie"). If you have a character description that you're pasting into each prompt, only describe elements that are actually visible in a given image.
I've found that a consistent theme is much more achievable: you can add "by <artist name>" to your prompt for any artist the model has been trained on a number of images from. If you're using a model that supports prompt term weighting (e.g. Stable Diffusion), you can increase the weight there to strengthen the effect. Flux doesn't support prompt term weighting (though it's really aimed at photographic images anyway). It's also possible to blend multiple artists or genres as prompt terms.
Eyeballing here. I’m a learner and have hardly used this.
Would a checkpoint model (right term?) achieve consistency, built from a specific set of pictures?
If you have, or can create, a LoRA trained on images of the character you're presenting, that may help. The same goes for a checkpoint model trained on that character: it would be like having a character the base model already knows.
You might be thinking of a LoRA. LoRA are adapters you use with larger models to help them generate whatever concept you want.
Right. That. Thanks.