Stable Diffusion 3 HANDS ON! How Good Is It Really?

All Your Tech AI
18 Apr 2024 · 08:51

TLDR

Stability AI has recently launched Stable Diffusion 3 and Stable Diffusion 3 Turbo, accessible only via API, with hosting provided through a partnership with Fireworks AI. The model weights are expected to become available for self-hosting to Stability AI members soon. API pricing is high, with credits costing about $10 per thousand, but in testing the models generated images of a quality comparable to the examples on Stability AI's website. Prompt adherence is notably good, and text within images is handled more coherently than in previous versions. The Turbo model is faster but produces lower-resolution output. Users can explore these models further on Pixel Dojo with a Pro Plan, which starts at $9.95 per month for unlimited generations.

Takeaways

  • 🚀 Stable Diffusion 3 and Stable Diffusion 3 Turbo have been released by Stability AI and are available via API.
  • 🤝 Stability AI has partnered with Fireworks AI, an API platform for hosting and accessing services like Stable Diffusion.
  • 📚 Model weights for self-hosting will be made available to Stability AI members in the near future.
  • ⏱️ The reviewer set up Stable Diffusion 3 beta on Pixel Dojo within 3 hours.
  • 💰 The API pricing is relatively high, with costs around $10 per thousand credits.
  • 🔢 Generating an image with Stable Diffusion 3 costs 6 to 12 credits, which is 32 times more expensive than Stable Diffusion XL 1.0.
  • 📈 A Pro Plan subscription starting at $9.95 per month offers unlimited image generation on Pixel Dojo.
  • 🎨 The quality of images generated by Stable Diffusion 3 is generally on par with those displayed on the website, suggesting less cherry-picking.
  • 📝 Text coherence in images generated by Stable Diffusion 3 can be inconsistent, with some attempts requiring multiple tries to get the text correct.
  • 🔍 Prompt adherence for positive prompts seems to be very good, potentially reducing the need for negative prompts.
  • 🔋 The Turbo model is faster but may result in lower quality images compared to the standard model.
  • 🔍 The reviewer suggests that users can experiment with negative prompts to improve results and invites feedback on Stable Diffusion 3 and Stable Diffusion 3 Turbo.
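The pricing figures in the takeaways can be turned into a per-image dollar cost with a little arithmetic. The sketch below assumes the approximate numbers quoted in this summary ($10 per thousand credits, 6 to 12 credits per image); the constant and function names are made up for illustration, and actual prices may differ.

```python
# Approximate figures quoted in this summary; not official pricing.
USD_PER_1000_CREDITS = 10.00
SD3_CREDITS_PER_IMAGE = (6, 12)  # low and high end of the quoted range


def credits_to_usd(credits: float) -> float:
    """Convert a number of API credits into US dollars."""
    return credits * USD_PER_1000_CREDITS / 1000


low, high = (credits_to_usd(c) for c in SD3_CREDITS_PER_IMAGE)
print(f"Stable Diffusion 3 per image: ${low:.2f} to ${high:.2f}")
# prints "Stable Diffusion 3 per image: $0.06 to $0.12"
```

At those rates, a batch of 100 images would cost roughly $6 to $12 through the API.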

Q & A

  • What is the name of the latest release from Stability AI?

    -The latest releases from Stability AI are Stable Diffusion 3 and Stable Diffusion 3 Turbo.

  • How are Stable Diffusion 3 and Stable Diffusion 3 Turbo made available to users?

    -They are available only via an API. Stability AI has partnered with Fireworks AI, an API platform that provides hosting and fast, stable access.

  • What is the key feature of the partnership with Fireworks AI?

    -The partnership provides hosting and fast, stable API access to the models. Separately, Stability AI plans to make the model weights available for self-hosting with a Stability AI membership in the near future.

  • How long did it take to get Stable Diffusion 3 beta up and running on Pixel Dojo?

    -It took about 3 hours to get Stable Diffusion 3 beta up and running on Pixel Dojo.

  • What is the pricing structure for the API?

    -The pricing structure involves purchasing credits, with Stable Diffusion 3 costing 6 to 12 credits per image generated, making it approximately 32 times more expensive than Stable Diffusion XL 1.0.

  • What is the starting price for the Pro Plan on Pixel Dojo?

    -The Pro Plan on Pixel Dojo starts at $9.95 a month, which includes unlimited usage.

  • How does the quality of images generated by Stable Diffusion 3 compare to those on the Stability AI website?

    -The quality of images generated by Stable Diffusion 3 is quite good and does not seem to be significantly cherry-picked compared to the examples on the Stability AI website.

  • What is a challenge that most AI generators have faced with text in images?

    -A challenge that most AI generators have faced is maintaining text coherence and ensuring that the text is correctly and coherently integrated into the generated images.

  • How did Stable Diffusion 3 perform with text in the images?

    -Stable Diffusion 3 showed mixed results with text in images. Some text was correctly generated, while in other cases, the text was mangled or not coherent.

  • What is the main difference between Stable Diffusion 3 and Stable Diffusion 3 Turbo?

    -The main difference is that Stable Diffusion 3 Turbo is a quicker model but with lower quality and resolution compared to the standard Stable Diffusion 3 model.

  • What is the purpose of negative prompts in image generation?

    -Negative prompts are used to guide the AI away from including certain elements or styles in the generated image, allowing for more control over the final output.

  • What does the reviewer suggest about the necessity of negative prompts with Stable Diffusion 3?

    -The reviewer suggests that due to the high adherence to the positive prompt, negative prompts may not be as necessary with Stable Diffusion 3 as with previous versions.

Outlines

00:00

🚀 Stable Diffusion 3 and Turbo Release via API

Stability AI has released two new models, Stable Diffusion 3 and Stable Diffusion 3 Turbo, exclusively through an API. They have partnered with Fireworks AI for hosting and fast access. The model weights will be available for self-hosting with a Stability AI membership soon. The API pricing is relatively high, with credits needed for image generation, making Stable Diffusion 3 about 32 times more expensive per image than Stable Diffusion XL 1.0. The speaker implemented Stable Diffusion 3 beta on Pixel Dojo within 3 hours, allowing users to generate images with optional prompts and select between the two models. Examples are provided for quick prompt loading. The speaker also discusses the cost of the Pro Plan and its benefits.

05:02

🎨 Testing Image Generation and Prompt Adherence

The speaker tests the image generation capabilities of Stable Diffusion 3 and Stable Diffusion 3 Turbo using various prompts from press releases to check for cherry-picking of images. The results are compared to those displayed on the website. The speaker notes that the models are fast and generally follow the prompts well, although there are some inconsistencies with text in images. The speaker also observes that the Turbo model is quicker but produces lower quality images. The speaker concludes that Stable Diffusion 3 mostly lives up to the hype, with good prompt adherence and image quality, and suggests that negative prompts might not be necessary due to the improved performance. The speaker invites viewers to try the models on Pixel Dojo with a Pro membership, which offers unlimited generations and access to other features.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is an advanced AI model released by Stability AI, designed for generating images from textual prompts. It represents a significant upgrade from its predecessors, offering faster and more accurate image generation capabilities. In the video, the host discusses the features and performance of Stable Diffusion 3, comparing it to previous models and demonstrating its ability to generate complex images based on detailed prompts.

💡API

An API, or Application Programming Interface, is a set of rules and protocols that allows different software applications to communicate and interact with each other. In the context of the video, Stability AI has made Stable Diffusion 3 available via an API, which means users can access the image generation capabilities of the model by sending requests to the API. This is a common method for providing access to AI services and allows for integration into various platforms and applications.
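As a rough illustration of what "sending requests to the API" involves, the sketch below assembles the form fields for one generation request: a prompt, an optional negative prompt, and a choice between the two models. The endpoint URL, the model identifiers, and the field names are all hypothetical placeholders, not Stability AI's documented interface; the real values are in the provider's API reference.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical endpoint and model identifiers, for illustration only.
API_URL = "https://api.example.com/v1/generate"
MODELS = {"sd3", "sd3-turbo"}


@dataclass
class GenerationRequest:
    prompt: str
    model: str = "sd3"
    negative_prompt: Optional[str] = None

    def to_fields(self) -> dict:
        """Validate the request and return the form fields to send."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        fields = {"prompt": self.prompt, "model": self.model}
        # The negative prompt is optional, so it is only included when set.
        if self.negative_prompt:
            fields["negative_prompt"] = self.negative_prompt
        return fields


request = GenerationRequest(
    prompt="a kangaroo holding a beer, wearing ski goggles",
    model="sd3-turbo",
)
print(request.to_fields())
```

In a real integration these fields would be sent in an authenticated HTTP POST to the provider's endpoint, and the response would carry the generated image. That step is omitted here because it depends on the actual API.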

💡Fireworks AI

Fireworks AI is mentioned in the video as the API platform that Stability AI has partnered with. This partnership allows for hosting and providing fast, stable access to AI models like Stable Diffusion 3. The platform is responsible for managing the infrastructure and ensuring the smooth operation of the API, which is crucial for delivering a good user experience when generating images.

💡Model Weights

Model weights refer to the parameters of a machine learning model that have been learned from training data. These weights are crucial for the model's ability to make predictions or generate outputs. In the video, the host mentions that Stability AI plans to make the model weights of Stable Diffusion 3 available for self-hosting to members, which means users with the appropriate technical knowledge and resources could run the model on their own servers.

💡Pixel Dojo

Pixel Dojo is the platform where the host has implemented the Stable Diffusion 3 beta for users to generate images. It serves as an interface that allows users to input prompts and receive generated images from the AI model. The host discusses the process of setting up Stable Diffusion 3 on Pixel Dojo and the user experience it provides.

💡Prompt

In the context of AI image generation, a prompt is a text description that guides the AI model in creating an image. It is a crucial part of the process as it directly influences the output. The video script discusses the use of prompts in generating images with Stable Diffusion 3, including both positive prompts that describe the desired image and negative prompts that specify what should be avoided.

💡Negative Prompt

A negative prompt is a type of prompt used in AI image generation that specifies elements or characteristics that should not be included in the generated image. It is used to refine the output and ensure that the generated image aligns more closely with the user's intentions. The host of the video briefly mentions negative prompts but focuses more on the effectiveness of positive prompts in Stable Diffusion 3.

💡Credits

In the context of the video, credits refer to the units of currency or points within the API system that are used to pay for image generation services. The host mentions that the pricing for using the Stable Diffusion 3 API is relatively high, with costs associated with generating each image based on the number of credits required.

💡Pro Plan

The Pro Plan is a subscription plan mentioned in the video that offers unlimited usage of Pixel Dojo, including access to the Stable Diffusion 3 model. It is a paid plan that starts at $9.95 per month, providing users with the ability to generate images without worrying about the number of credits consumed.
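One way to weigh the flat monthly plan against paying per image is to compute a break-even point. The sketch below uses the approximate figures quoted in this summary ($9.95 per month, $10 per thousand credits, 6 to 12 credits per image) and compares against direct API pricing only. The names are illustrative, and the comparison ignores any other features bundled with the plan.

```python
import math

# Approximate figures quoted in this summary; not official pricing.
PRO_PLAN_USD_PER_MONTH = 9.95
USD_PER_CREDIT = 10.00 / 1000


def break_even_images(credits_per_image: float) -> int:
    """Smallest monthly image count whose API cost reaches the plan price."""
    cost_per_image = credits_per_image * USD_PER_CREDIT
    return math.ceil(PRO_PLAN_USD_PER_MONTH / cost_per_image)


print(break_even_images(6))   # prints 166
print(break_even_images(12))  # prints 83
```

Under these assumptions, someone generating more than roughly 83 to 166 images a month would spend more on raw API credits than on the plan.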

💡Cherry Picking

Cherry picking in the context of AI image generation refers to the selection of only the best or most impressive results to showcase. The host expresses a desire to avoid cherry picking by testing the AI model with various prompts and evaluating the first generation of images produced, rather than selecting only the most successful outcomes.

💡Text Coherence

Text coherence is the quality of text where the elements are well connected and the overall message is clear and logical. In the context of AI image generation, text coherence is important when generating images with text elements. The video discusses the challenges AI models face in generating coherent text within images and evaluates Stable Diffusion 3's performance in this area.

Highlights

Stability AI released Stable Diffusion 3 and Stable Diffusion 3 Turbo, available only via API.

They've partnered with Fireworks AI for hosting and fast, stable access.

Model weights will be available for self-hosting with a Stability AI membership soon.

Stable Diffusion 3 beta was set up on Pixel Dojo within 3 hours.

Users can generate images with a prompt, optionally a negative prompt, and choose between two versions.

API pricing is high, at about $10 per thousand credits.

Generating an image with Stable Diffusion 3 is 32 times more expensive than with Stable Diffusion XL 1.0.

A Pro Plan starts at $9.95 per month for unlimited usage of Pixel Dojo.

The quality of images generated is comparable to those displayed on the website, suggesting no cherry-picking.

Prompt adherence is strong, with generated images closely following the input prompts.

Stable Diffusion 3 may not require negative prompts as much due to its improved performance.

Text coherence in images generated by Stable Diffusion 3 is generally good, although not perfect.

Stable Diffusion 3 Turbo is faster but produces lower quality images compared to the standard model.

The AI successfully generated complex images with multiple elements, such as a kangaroo with beer and ski goggles.

An entire universe inside a bottle on a Walmart shelf was one of the creative prompts successfully generated.

Stable Diffusion 3 handled a prompt with a cheeseburger on a toilet-throne in a royal chamber well.

The AI accurately generated an image of a monkey holding a sign rating Tech AI as awesome.

Overall, Stable Diffusion 3 lives up to the hype for the most part, with high-quality image generation.