I Ran Stable Diffusion 3 Prompts in Midjourney | SD3 vs. Midjourney Prompt Battle

Lexie AI
31 Mar 2024 · 03:50

TLDR

The video compares the outputs of Stable Diffusion 3 (SD3) and Midjourney, two AI image generation models, across five prompts: an Elven Ranger, a child riding a llama, an alien banana cop, an anime girl, and a stack of animals. SD3 wins most of the rounds on detail and accuracy, although Midjourney takes the first. The video ends with a call to action for viewers to like and subscribe for more content.

Takeaways

  • 🎨 The video compares Stable Diffusion 3 (SD3) and Midjourney (MJ) at generating images from text prompts.
  • 🏹 The first prompt was for an 'Elven Ranger' with specific characteristics; SD3 missed the bow while MJ had a minor issue with the arrow's position.
  • 🦙 The 'Llama Kid' prompt resulted in a cute image from SD3, but MJ struggled with the setting and the depiction of the child.
  • 👮‍♂️ SD3 created a humorous image of an 'Alien Banana Cop', while MJ's interpretation lacked the police element but captured the essence of the scene.
  • 👩‍🎤 The 'Let's Go Girl' anime-style prompt was well-executed by SD3, with MJ improving but still lagging in text rendering quality.
  • 🐔 The final prompt, 'Stack of Animals', saw SD3 struggling with the dog and mule, while MJ produced a comically incorrect but entertaining image.
  • 🏆 SD3 generally outperformed MJ in the prompt interpretations, winning the majority of the matchups.
  • 🤖 The video highlights the capabilities and limitations of AI in image generation based on text prompts.
  • 📸 The content showcases the potential for AI-generated art and the creativity that can be unlocked with technology.
  • 👍 The video encourages viewers to like and subscribe for more content on AI-generated images.
  • 🎥 The script is a commentary on the current state of AI art generation and its potential for future development.
  • 🌐 The video also promotes the creator's Twitter account for further engagement and prompt submissions.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is a comparison between the results produced by Stable Diffusion 3 and Midjourney, two AI image generation models, based on various prompts.

  • What is the first prompt described in the video?

    -The first prompt described in the video is for an 'Elven Ranger' with braided platinum hair, a rune-etched bow, glowing eyes, and aiming at a roaring Dragon.

  • What is the issue with the Stable Diffusion 3 image for the Elven Ranger?

    -The issue with the Stable Diffusion 3 image for the Elven Ranger is that the bow is missing, which makes it difficult for the arrows to fly, especially since one of the arrows is depicted as the elf's middle finger.

  • Which AI model won the first round of the comparison based on the video?

    -Midjourney version 6 won the first round of the comparison, despite the arrow going through the elf's thumb, because it captured most of the prompt details better than Stable Diffusion 3.

  • What is the prompt for the 'Llama Kid' matchup?

    -The prompt for the 'Llama Kid' matchup is a digital art picture of a child riding a llama with a bell on its tail through a desert.

  • How does the video describe the Stable Diffusion 3 image for the 'Llama Kid'?

    -The video describes the Stable Diffusion 3 image for the 'Llama Kid' as adorable and well-executed, with the only issue being the out-of-place bell.

  • What is the 'Alien Banana Cop' prompt about?

    -The 'Alien Banana Cop' prompt is about a Xenomorph police officer enjoying a banana during the golden hour in Hawaii.

  • Which model had a more accurate depiction of the 'Alien Banana Cop'?

    -Stable Diffusion 3 had a more accurate depiction of the 'Alien Banana Cop', showing three bananas and maintaining the alien's intimidating appearance while eating.

  • What is the 'Let's Go Girl' prompt in the video?

    -The 'Let's Go Girl' prompt is for an anime-style girl with white hair and red eyes, with a speech bubble saying 'Let's go together' on a live stage.

  • How does the video evaluate the text rendering in the 'Let's Go Girl' images?

    -The video evaluates the text rendering in the Stable Diffusion 3 'Let's Go Girl' image as flawless. Midjourney's text rendering is improving but still lags significantly behind.

  • What is the final prompt described in the video, and what are the reactions to the images produced by both AI models?

    -The final prompt described in the video is a 'stack of animals', featuring a rooster standing on a cat, which is standing on a dog, which is standing on a mule, which is standing on a turtle. The reaction to the images is amusement, with the Midjourney version receiving praise for its creativity and whimsy, particularly the depiction of 'chicken dog turtle'.

Outlines

00:00

🎨 Stable Diffusion 3 Art Comparison

The video compares Stable Diffusion 3 and Midjourney version 6 at generating images from various prompts. In the first prompt, an Elven Ranger, Stable Diffusion 3's image is missing the bow, while Midjourney's has a minor error with the arrow going through the thumb. The second prompt features a child riding a llama, with Stable Diffusion 3 accurately depicting the scene and Midjourney struggling with the setting and the presence of a misplaced bell. The third prompt, 'Alien Banana Cop,' showcases Stable Diffusion 3's creative interpretation of a Xenomorph police officer with bananas, while Midjourney's version lacks the cop appearance. The fourth prompt, 'Let's Go Girl,' highlights Stable Diffusion 3's success in text and image accuracy, with Midjourney improving but still lagging in text quality. The final prompt, a stack of animals, sees Stable Diffusion 3 struggling with the dog and mule, while Midjourney creates a bizarre yet fascinating image. The video ends with a call to action to like and subscribe for more content.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is an advanced AI image generation model mentioned in the video. It is capable of creating detailed and impressive images based on textual prompts. In the context of the video, it is compared with another model, Midjourney version 6, to evaluate which one performs better in generating images according to given prompts. The video showcases a series of matchups where Stable Diffusion 3 often produces more accurate and complete images, leading to its victory in most of the comparisons.

💡prompts

In the context of the video, prompts are textual descriptions or requests that are used as input for AI models like Stable Diffusion 3 and Midjourney to generate images. These prompts are essential as they guide the AI in creating visual representations of the described scenes or characters. The effectiveness of an AI model is often judged by its ability to accurately interpret and respond to these prompts.

💡Elven Ranger

Elven Ranger is a character concept from the video, used as a prompt for the AI models to generate an image. It refers to a fantasy archer with specific attributes like braided platinum hair, a rune-etched bow, and glowing eyes. The character is supposed to be aiming at a roaring dragon, showcasing the AI's ability to interpret and visualize complex and imaginative scenarios.

💡roaring Dragon

A roaring Dragon is a mythical creature often depicted in various forms of media as a powerful and fearsome entity. In the context of the video, it serves as a part of the 'Elven Ranger' prompt, where the dragon is the target the ranger is aiming at. This element adds action and a sense of drama to the image that the AI models are tasked to generate.

💡Midjourney version 6

Midjourney version 6 is another AI image generation model compared against Stable Diffusion 3 in the video. It is used to demonstrate how different AI models handle the same prompts and to evaluate their effectiveness in creating images. The video shows a series of comparisons where Midjourney version 6 often falls short in accuracy or detail compared to Stable Diffusion 3.

💡digital art

Digital art refers to the creation of artistic compositions or designs using digital technology or computer software. In the video, the term is used to describe the output of AI models like Stable Diffusion 3 and Midjourney version 6 when they generate images from textual prompts. The quality and accuracy of the digital art are crucial in determining the effectiveness of these AI models.

💡Xenomorph police officer

Xenomorph police officer is a creative concept used as a prompt in the video, combining the idea of an extraterrestrial creature from the 'Alien' film franchise with the role of a police officer. This prompt tests the AI's ability to imagine and visualize a unique and surreal scenario, blending science fiction elements with a common societal role.

💡anime style

Anime style refers to a distinctive style of animation that originated in Japan, characterized by colorful artwork, fantastical themes, and vibrant characters. In the video, the term is used to describe the visual aesthetic that the AI models are tasked to replicate when generating an image of an anime-style girl with white hair and red eyes.

💡stack of animals

A stack of animals is a humorous and imaginative concept used in the video as a prompt for the AI models. It refers to a scenario where different animals are stacked on top of each other, creating a whimsical and visually interesting image. This prompt tests the AI's ability to handle complex and unconventional arrangements in its generated images.

💡Golden hour

Golden hour is a term used in photography and cinematography to describe a period shortly after sunrise or before sunset when the light is often softer, warmer, and more diffused. It is considered an ideal time for taking photographs or shooting videos due to the quality of the natural light. In the video, 'golden hour in Hawaii' is part of the prompt for the 'alien banana cop', suggesting a setting where the AI model needs to incorporate the characteristic lighting of this time of day into its generated image.

💡YouTube

YouTube is a video-sharing platform where users can upload, share, and view videos. In the context of the video, it is the medium through which the content is being presented and where viewers can interact by liking, commenting, and subscribing. The video encourages viewers to engage with the content by hitting the like button and subscribing to the channel.

Highlights

The use of Stable Diffusion 3 for generating images from text prompts.

Stable Diffusion 3 is not yet widely available to the public.

The presenter obtained prompts from someone with early preview access to Stable Diffusion 3.

Comparison of Stable Diffusion 3 and Midjourney version 6 in image generation.

The Elven Ranger prompt resulted in an image with a missing bow in Stable Diffusion 3's output.

Midjourney version 6 had an arrow going through the elf's thumb in its output.

The Llama Kid prompt showed a child riding a llama through a desert with a misplaced bell.

Midjourney's interpretation of the Llama Kid prompt was less accurate in terms of setting and character.

The Alien Banana Cop prompt led to a creative image of a Xenomorph police officer with bananas.

Midjourney's version of the Alien Banana Cop lacked the police element but captured the essence of the setting.

The Let's Go Girl prompt resulted in a well-executed anime-style girl by Stable Diffusion 3.

Midjourney's text rendering is improving but still has significant room for improvement.

The Stack of Animals prompt showcased a creative and humorous image concept.

Stable Diffusion 3 had some difficulty with the Stack of Animals, particularly with the dog and mule.

Midjourney's Stack of Animals image was surprisingly accurate and visually appealing.

The video encourages viewers to subscribe to the channel for more content on AI-generated images.

The presenter shares a link to 17 minutes of Sora vids, showcasing more AI-generated content.