InvokeAI 3.4 Release - LCM LoRAs, Multi-Image IP Adapter, SD1.5 High Res, and more

Invoke
22 Nov 2023 15:34

TLDR: The video discusses the release of version 3.4, highlighting new features such as the LCM scheduler for image generation, the high-resolution fix, ControlNet and T2I adapter compatibility, and multi-image IP adapters. It also covers community contributions, language translations, and future updates, emphasizing the efficiency and versatility of the new tools for creators.

Takeaways

  • 🚀 Introduction of LCM (Latent Consistency Model) for optimizing the diffusion process with a new scheduler, reducing the steps needed to generate images.
  • 📷 Quality trade-off with LCM: while LCM makes generation more efficient, it may result in some loss of detail in the final images.
  • 🌍 Showcase of model quality before and after using LCM, highlighting the differences in image details and adherence to the prompt.
  • 🔧 Adjusting the CFG scale affects the balance between efficiency and adherence to the prompt, with higher values increasing saturation and introducing quality artifacts.
  • 🔄 Recommendation to use LCM with lower CFG scale values to maintain image quality while benefiting from the speed improvements.
  • 🌟 Return of the high-resolution fix feature in 3.4, allowing for larger images without repeating patterns, thanks to contributor Paul Curry.
  • 🎨 ControlNet and T2I adapter features are now compatible, enabling simultaneous use for more complex and nuanced image generation.
  • 🔍 Multi-image IP adapters introduced for advanced users, allowing blending of multiple concepts into a single image through the linear UI.
  • 🌐 Workflow editor enhancements, including new nodes for more advanced control over image generation and blending of concepts.
  • 🌐 Community contributions acknowledged, including translations and bug fixes, with a special mention of Dutch, Italian, and Chinese translations nearing completion.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the release of version 3.4 and its new features, particularly focusing on the LCM (Latent Consistency Model) scheduler and its impact on the image generation process.

  • What does LCM stand for and what does it do?

    -LCM stands for Latent Consistency Model. It is a new technique for optimizing and making the diffusion process more efficient, using a new scheduler called the LCM scheduler. This reduces the number of steps needed to generate an image, allowing for faster generation and enabling the creation of various visual effects.

  • What is the LCM scheduler and how does it affect image generation?

    -The LCM scheduler is a new component of the LCM technique. It helps to reduce the steps required to generate an image, making the process more efficient. However, it may result in a loss of some details compared to the standard generation process.

  • What is the significance of the LCM LoRA and where can it be downloaded?

    -The LCM LoRA is a resource used in conjunction with the LCM scheduler. It can be downloaded from the latent-consistency Hugging Face repository. The LCM LoRA works with both SDXL and SD 1.5 models and helps achieve better image generation results with the LCM scheduler.

  • How does the CFG scale affect the image generation process?

    -The CFG scale is a parameter that controls how strongly the generation adheres to the prompt. Adjusting it changes the level of detail and the overall style of the generated images. Higher values enforce closer adherence to the prompt but can also cause oversaturation and other quality artifacts, particularly with the LCM scheduler.

  • What is the high-resolution fix feature and how does it work?

    -The high-resolution fix is a feature that allows for the generation of images larger than the model's original 512 x 512 training size, at a resolution of the user's choice. It generates the core composition at a lower resolution, upscales it using ESRGAN or a straight resize, and then runs a denoising pass at the higher resolution.

  • What changes have been made to the ControlNet feature and the T2I adapter feature in version 3.4?

    -In version 3.4, the ControlNet feature and the T2I adapter feature are no longer mutually exclusive. Users can now use both features simultaneously on the same generation, allowing for more flexibility and control over the image generation process.

  • What is multi-image IP adapter and how is it used in the workflow editor?

    -The multi-image IP adapter feature allows users to add multiple IP adapters and pass multiple images to the same IP adapter within the linear UI. This enables blending different concepts together by adjusting their weights, creating a more complex and nuanced final image.

  • How can the new features in version 3.4 be used to blend concepts?

    -The new features in version 3.4, such as the multi-image IP adapter, allow users to blend concepts by passing multiple images of the same concept or different concepts into the same IP adapter. This can result in a blend of the average of those images, creating a new, unique visual representation.

  • What are some of the smaller updates and improvements included in version 3.4?

    -Some of the smaller updates in version 3.4 include the ability to recall VAE metadata for any generation, the addition of RGBA value fields in the Color Picker within the unified canvas, and numerous speed increases for LoRAs and other text encoder loading times. Backend updates have also been made to improve efficiency.

  • How can users stay updated with future releases and improvements?

    -Users can stay updated with future releases and improvements by following the Invoke AI app, joining the Discord community, and checking the release notes for detailed information on new features, bug fixes, and translations.

Outlines

00:00

🚀 Introduction to Release 3.4 and the LCM Scheduler

The video begins with an introduction to the release of version 3.4, which is slightly late but brings numerous updates. The first feature discussed is the Latent Consistency Model (LCM), a new technique for optimizing the diffusion process using the LCM scheduler. This reduces the steps needed to generate an image, allowing for efficient generation. However, the presenter notes that while LCM improves efficiency, it may result in some loss of detail. The video then demonstrates the generation of four images using normal settings and compares them with images generated using the LCM scheduler, highlighting the differences in quality and speed. The presenter also explains how to use the LCM LoRA from the Hugging Face repo, which works with both SDXL and SD 1.5 models, and the importance of adjusting the CFG scale for optimal results.

05:01

🌟 High-Resolution Fixes and Advanced Features

The presenter moves on to discuss the return of a simple high-resolution fix in version 3.4, which allows for the upscaling of images without losing quality. This feature is available for SD 1.5 models and works by first generating the core composition at a lower resolution and then upscaling it using ESRGAN or a straight resize, followed by a denoising process. The video also covers the new ability to use the ControlNet and T2I adapter features simultaneously, and demonstrates how the T2I color adapter can be used to modify the color of an image. The presenter then discusses the use of multi-image IP adapters in the workflow editor, which allows for blending different concepts together to create unique images.

10:03

🎨 Advanced Workflow Editor Features and Updates

The video delves into new nodes added to the workflow editor for advanced users, focusing on multi-image IP adapters that enable the blending of multiple images and concepts. The presenter shows how this can lead to the creation of new and sometimes unusual combinations of concepts. The video also touches on smaller features such as the ability to recall VAE metadata for any generation, thanks to a contributor, and the addition of RGBA value fields in the color picker. The presenter encourages viewers to check the release notes for a full list of contributors and updates, and highlights the completion of Dutch, Italian, and Chinese translations for the Invoke AI app.

15:05

🔧 Performance Improvements and Future Updates

The final section discusses various performance improvements in version 3.4, particularly for LoRAs and other text encoder loading times, as well as backend updates that have made certain functions in the engine more efficient. The presenter teases more updates to come, invites viewers to join the community on Discord, and concludes the video with a call to like, subscribe, and participate in the community discussions.

Keywords

💡LCM

LCM stands for Latent Consistency Model, a new technique introduced in the video for optimizing the diffusion process in image generation. It involves using a specific scheduler, the LCM scheduler, which reduces the number of steps needed to generate an image, thereby increasing efficiency. However, this efficiency comes at the cost of some detail loss, which is an important consideration for users to keep in mind when incorporating LCM into their workflow. The term is used in the context of discussing the new features in version 3.4, specifically relating to improvements in the image generation process.

💡CFG scale

CFG scale refers to the Classifier-Free Guidance scale, a parameter that controls how strongly the generated image adheres to the original prompt. Adjusting the CFG scale produces variations in the generated content, with higher values leading to increased prompt adherence but also potential oversaturation and quality artifacts. The CFG scale is an important aspect of fine-tuning the image generation process to achieve desired outcomes.
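
The guidance step behind the CFG scale can be sketched as a simple extrapolation between an unconditional and a prompt-conditioned noise prediction. The NumPy sketch below is illustrative only; the array stand-ins and function name are assumptions, not InvokeAI's internals:

```python
import numpy as np

def apply_cfg(uncond_pred: np.ndarray, cond_pred: np.ndarray, cfg_scale: float) -> np.ndarray:
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned one."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# cfg_scale = 1.0 reproduces the conditioned prediction exactly;
# larger values extrapolate well past it, which is why very high
# settings can oversaturate, and why LCM works best near the low end.
uncond = np.zeros(4)
cond = np.ones(4)
low = apply_cfg(uncond, cond, 1.0)   # equals cond
high = apply_cfg(uncond, cond, 7.5)  # extrapolated past cond
```

This also makes the LCM recommendation concrete: lowering the scale shrinks the extrapolation term, trading some prompt adherence for cleaner output.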

💡High-resolution fix

The high-resolution fix is a feature that allows users to generate images at higher resolutions than the model's original training size. It works by first creating the core composition at a lower resolution and then upscaling it using techniques like ESRGAN or a straight resize, followed by a denoising process at the higher resolution. This feature is particularly useful for achieving larger, detailed images without the need for complex workflows.
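
The three-stage flow described above can be sketched as follows. The `generate` and `denoise` callables are hypothetical stand-ins for the actual model passes; only the resize step is real code:

```python
from PIL import Image

def high_res_fix(generate, denoise, base=(512, 512), target=(1024, 1024), strength=0.6):
    # 1) Generate the core composition at the model's native resolution.
    low_res = generate(size=base)
    # 2) Upscale -- a straight Lanczos resize here; ESRGAN would slot in instead.
    upscaled = low_res.resize(target, Image.LANCZOS)
    # 3) Run an img2img-style denoising pass at the higher resolution;
    #    `strength` controls how much the pass may repaint the upscale.
    return denoise(upscaled, strength=strength)

# Stand-ins so the flow is runnable without a model:
fake_generate = lambda size: Image.new("RGB", size, "gray")
fake_denoise = lambda img, strength: img
result = high_res_fix(fake_generate, fake_denoise)
```

Composing at 512 x 512 first is what avoids the repeating-pattern artifacts that appear when SD 1.5 models are asked to generate directly at sizes far from their training resolution.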

💡ControlNet

ControlNet is a feature that enables users to exert more control over the image generation process by conditioning the model on certain inputs. In the context of the video, it is mentioned that the ControlNet feature is no longer mutually exclusive with the T2I adapter feature, meaning both can be used simultaneously for more refined and customized outputs.

💡T2I adapter

The T2I (Text-to-Image) adapter is a lightweight module that conditions text-to-image generation on auxiliary inputs, such as the color map used in the video's demonstration, to guide the visual elements of the output. The video discusses the ability to use the T2I adapter alongside ControlNet, enhancing the flexibility and creativity of the image generation process.

💡Multi-image IP adapters

Multi-image IP adapters are a feature that allows users to input multiple images into a single IP adapter to blend different concepts together. This can result in a more nuanced and complex image that represents an average or blend of the input concepts. The video provides an example of using this feature to combine different images of spiders with concept art sketches of a Yeti-like creature.
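
Conceptually, passing several images to one IP adapter amounts to a weighted combination of their image embeddings. The NumPy sketch below illustrates the averaging idea only; the toy embeddings and the exact averaging strategy are assumptions, not InvokeAI's implementation:

```python
import numpy as np

def blend_image_embeddings(embeddings, weights):
    """Blend per-image embeddings into one conditioning vector.
    Weights are normalized, so raising one concept's weight
    increases its prominence relative to the others."""
    stacked = np.stack(embeddings)       # shape: (n_images, dim)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize to sum to 1
    return w @ stacked                   # weighted average, shape: (dim,)

# Two toy "concept" embeddings, weighted 3:1 toward the first,
# echoing the video's spider/Yeti blending example:
spider = np.array([1.0, 0.0, 0.0, 0.0])
yeti = np.array([0.0, 1.0, 0.0, 0.0])
blended = blend_image_embeddings([spider, yeti], weights=[3.0, 1.0])
# -> three parts spider, one part yeti
```

Adjusting the per-image weights is exactly the knob the video uses to control how prominent each concept is in the final image.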

💡Instant LoRAs

Instant LoRAs is a term used in the community to describe the ability to pass multiple images of the same concept into the same IP adapter. This helps the model understand the general idea of the concept more accurately and blend it into the generated image. It is a feature that enhances the versatility and creativity of the image generation process.

💡VAE

VAE stands for Variational Autoencoder, a type of model used for efficient compression and generation of high-dimensional data. In the context of the video, the VAE is used in the image generation process, and updating the VAE is necessary when certain issues, such as black images, occur during generation with specific models like SDXL.

💡Workflow editor

The workflow editor is a tool or interface within the AI application that allows users to visually construct and modify the process or 'workflow' for image generation. It includes various nodes and features that can be added or adjusted to create complex and detailed images. The video discusses the addition of new nodes and features in the workflow editor, catering to advanced users who want more control over the generation process.

💡Discord

Discord is a communication platform mentioned in the video where users can join a community related to the AI application. It serves as a space for users to interact, share ideas, ask questions, and stay updated on new features and improvements.

Highlights

Introduction of LCM, a new technique for optimizing the diffusion process using the LCM scheduler.

Reduction in the number of steps needed to generate an image with the LCM scheduler, enabling the creation of visually striking images seen recently on the internet.

Comparison of image quality before and after the application of the LCM scheduler, highlighting the trade-off between efficiency and detail.

Demonstration of the generation of four images with a cyborg king theme using normal generation settings.

Explanation of the process to change settings for LCM, including adjusting the CFG scale and incorporating the LCM LoRA.

Discussion on the impact of CFG scale on the adherence to the prompt and the resulting image quality.

Recommendation to use lower ranges of the CFG scale for optimal results.

Return of a simple high-resolution fix in version 3.4, allowing for the creation of larger images from the linear UI.

Integration of the high-resolution fix feature in SD 1.5 models and its process of upscaling images using ESRGAN or a straight resize followed by denoising.

Non-mutual exclusivity of the ControlNet feature and the T2I adapter feature, enabling their simultaneous use in the same generation.

Use of the T2I color adapter to process colors and its effect on the final image generation.

Explanation of the multi-image IP adapters feature, which allows blending of different concepts into a single image.

Demonstration of the workflow editor's new nodes for advanced users, including the ability to pass multiple images to the same IP adapter.

Showcasing the creation of a hybrid image by blending concepts of spiders and a Yeti-like creature using the multi-image IP adapters.

Adjustment of concept weights in the IP adapter to control the prominence of different elements in the generated image.

Introduction of the ability to recall VAE metadata for any generation, thanks to contributor Stefan Tobler.

Addition of RGBA value fields in the Color Picker within the unified canvas, a contribution by Rines 404.

Acknowledgement of community contributions to the 3.4 release, including bug fixes, translations, and language enhancements.

Mention of significant speed increases in version 3.4, particularly for LoRAs and other text encoder loading times.

Tease of more updates coming soon for Invoke, encouraging users to stay tuned and join the community on Discord.