Many have likely browsed the internet long enough to have been fooled by an AI-generated image or two at some point. Text-to-image models like DALL·E, Midjourney, and Stable Diffusion have advanced enough that most people only have about a 50%-60% chance of discerning whether an image is AI-generated, and the odds aren’t improving. But are these models good enough to forgo human artists altogether?
Midjourney was created by David Holz, co-founder of Ultraleap (previously Leap Motion), and first released in July 2022. Midjourney Inc. describes itself as “an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.” Midjourney has been credited with enhancing collaboration and communication among design teams, and it enables “rapid and iterative advancement of corporate-commissioned designs,” according to a joint study by Beihang University and Tsinghua University published by MDPI. Midjourney has also been used in photo competitions, misleading viral media, and graphic novels.
It has garnered some attention for its use in comics, notably in the copyright case over Zarya of the Dawn, which uses images made with Midjourney and edited with Adobe Photoshop. This case and its ramifications were covered in a previous AMT Lab article entitled U.S. Copyright Office Ruling and Implications on A.I. Beyond the ethical and legal quandaries, however, is the technology good enough to satisfy a discerning comic book audience? The comic was created in fall of 2022, and for those who are curious, some of the work may be read here. When reviewing the work, problems immediately emerge. The main character is inconsistent and lacks expression, and it is hard to get a sense of narrative from the images alone.
Yet Zarya of the Dawn was generated using Midjourney V3, while the latest models, Midjourney V6 and Niji V6, are notably more advanced. This article provides a review comparing the various offerings of Midjourney.
Midjourney V6 Cost and Access
Because keeping multiple images consistent with one another takes many iterations, the basic plan is not practical for a large project: its allotment of fast GPU time is likely to be consumed quickly. Furthermore, only with the standard plan do users gain access to slower image generation (Relax mode), which the basic plan lacks. The basic plan also does not provide access to “Stealth Mode,” which allows the user to remove images from Midjourney’s publicly available catalog. Without it, all images are treated as fully public.
Midjourney is hosted on Discord, thus requiring a Discord account. To keep work separate from that of other Midjourney users, one must invite the Midjourney Bot to a private server. This allows you to create different channels to organize your prompts, which is useful for establishing different settings and characters. For those who have submitted more than 100 prompts, a new feature is available to submit prompts and tweak archived images on Midjourney.com. This is extremely useful because some images are ephemeral: those followed by the notice “Only you can see this” will disappear.
Model Comparison
Midjourney V6 vs Niji V6
Niji is a model created by Midjourney and Spellbrush specifically for anime-inspired and illustrated images, while Midjourney V6 is trained to handle all image types. Below is a comparison of renderings from both models, each prompted for a photo.
Although the prompt “photo of male BIPOC elf teen with short white dreads wearing plain robes. --s 50.” specifies a photo, Niji generated illustrations regardless. As this review is a byproduct of using Midjourney to create a comic, Niji V6 served as the primary model.
Key Parameters
Midjourney offers a variety of ways to impact image generation beyond text description using parameters and settings. Midjourney provides a full overview of their various parameters, but highlights include:
Character reference (--cref): To create consistency from image to image, Midjourney relies far more on image referencing than on text prompts. Adding this parameter, followed by the image URL, at the end of a prompt helps Midjourney generate the same character as in the referenced image.
Style reference (--sref): Imitates the art style of the image at a referenced URL while following your prompt (or other parameters) to render new images. The style reference does not need to have any elements related to the text prompt.
Style references and character references can be combined to imitate an art style while following design elements of a chosen character. This even applies to photographs. Style and key character design elements should still be part of the text prompt, as it leads to more accurate image generation.
Aspect ratios (--ar): The default aspect ratio for any version of Midjourney is 1:1, square images. This feature ensures the images are the aspect ratio you desire from the get-go.
No (--no): Attempts to prevent specific content from being generated. Useful when previous iterations of a prompt continually include unwelcome elements.
Quality (--q): Quality determines how many GPU minutes are dedicated to the generation. The lowest quality setting, 0.25, adds less detail, while 1 (the default) adds the most.
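For readers who assemble many prompts, the parameters above can be sketched as a small helper that appends them to a text description. This is a minimal illustration rather than an official tool: the names PromptSpec and build_prompt are invented for this example, the example.com URLs are placeholders, and the flag spellings are simply those listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for one prompt; not part of any Midjourney API."""
    text: str
    cref: List[str] = field(default_factory=list)     # character reference image URLs
    sref: List[str] = field(default_factory=list)     # style reference image URLs
    aspect_ratio: Optional[str] = None                # e.g. "3:5"; None keeps the 1:1 default
    exclude: List[str] = field(default_factory=list)  # content to pass to --no
    quality: Optional[float] = None                   # e.g. 0.25, 0.5, or 1


def build_prompt(spec: PromptSpec) -> str:
    """Return the text description followed by its parameters."""
    parts = [spec.text.strip()]
    if spec.aspect_ratio:
        parts.append(f"--ar {spec.aspect_ratio}")
    if spec.exclude:
        parts.append("--no " + ", ".join(spec.exclude))
    if spec.quality is not None:
        parts.append(f"--q {spec.quality:g}")
    if spec.sref:
        parts.append("--sref " + " ".join(spec.sref))
    if spec.cref:
        parts.append("--cref " + " ".join(spec.cref))
    return " ".join(parts)


# Placeholder URLs: substitute links to your own reference images.
example = PromptSpec(
    text="a white dog bored on the couch in a chibi art style",
    cref=["https://example.com/character.png"],
    sref=["https://example.com/style.png"],
    aspect_ratio="3:5",
    quality=0.25,
)
print(build_prompt(example))
```

The resulting string is what would be pasted after the prompt command in Discord. Keeping the references at the end mirrors the guidance above that the image URL follows the parameter at the end of a prompt.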
Key Settings
Users can employ the /settings command to edit some default settings for all prompt (/imagine) commands.
Switch model versions: Includes past versions of Midjourney and Niji. It may be useful to switch to earlier versions if one is deliberately seeking lower quality, but at the risk that rendered images are more easily discerned as AI. The main advantage of Version 6 (regardless of whether it is Niji V6 or Midjourney V6) is that it is capable of accepting longer prompts and rendering simple text.
Raw mode: Focuses more on matching the text prompt, with less automatic beautification of the image. Raw mode prompts require more GPU time. Not recommended for simple prompts or beginners.
As seen above, raw mode was able to display the man behind the bar in most of the images, while the default avoided placing the man behind any objects. Raw mode subtly allows users to make visual choices that Midjourney ordinarily doesn’t create.
Stylize: High stylization creates images that are “very artistic” but less connected to the prompt. It is best for single images such as promotional art or covers, but it makes it difficult to maintain consistency in character and art style across multiple images, so I recommend switching to low.
Remix mode: Allows you to edit your prompt again when you want a variation on an image. This setting is essential for precision and tweaking without starting prompts or images from scratch.
Speed modes: Turbo mode generates images up to four times faster than fast mode, but consumes GPU minutes at twice the rate. Relax mode is slowest but does not consume any allotted GPU minutes.
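Because speed modes determine how quickly a plan's GPU allotment runs out, a rough budget estimate can help when scoping a project. The sketch below is back-of-the-envelope arithmetic only: the function name gpu_minutes is invented for this example, the per-attempt GPU time is an input you would measure yourself, and the mode multiplier (1 for fast mode, 0 for relax mode, and whatever Midjourney's documentation states for turbo mode) is supplied by the caller rather than assumed.

```python
def gpu_minutes(images: int, attempts_per_image: int,
                gpu_seconds_per_attempt: float,
                mode_multiplier: float = 1.0) -> float:
    """Estimate GPU minutes consumed to reach a given number of final images.

    mode_multiplier scales consumption relative to fast mode:
    1 for fast mode, 0 for relax mode (no allotted minutes consumed).
    """
    if min(images, attempts_per_image, gpu_seconds_per_attempt, mode_multiplier) < 0:
        raise ValueError("all inputs must be non-negative")
    total_seconds = images * attempts_per_image * gpu_seconds_per_attempt
    return total_seconds * mode_multiplier / 60.0


# Illustrative figures only: 40 comic panels, 15 attempts each,
# and an assumed 30 GPU-seconds per attempt in fast mode.
print(gpu_minutes(40, 15, 30))                     # 300.0 minutes, i.e. 5 hours
print(gpu_minutes(40, 15, 30, mode_multiplier=0))  # 0.0 in relax mode
```

Under these assumed figures, a 40-panel project would use about five hours of fast GPU time, which is why the choice between fast and relax mode matters for longer work.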
Image Enhancement
Once Midjourney has generated an image that a user wants to edit further or use, the Upscale feature duplicates that image with higher resolution. This brings up a few other features for manipulating the image.
Upscale (subtle and creative): This will likely interfere with original intent. To experiment with expressions or shots, it’s better to generate variations on an image or prompt, rather than using these upscale features.
Variations (subtle and strong): Dictates how much Midjourney will alter new generated images based on the original. Great for slight changes to what an image looks like without starting with an entirely new prompt.
Vary by region: Allows one to edit a specific part of an image while keeping everything else the same. Pairs with the Remix setting to allow pinpointed edits without needing Photoshop experience. A few iterations of remix mode combined with varying by region fixed the hands in one of the earlier images in this article.
Pros and Cons
For a comparison between Midjourney V5.2 and V6, and more insights into the model’s photorealistic capabilities and artist imitation, Midlibrary offers an in-depth review of its capabilities.
The Pros
Speed: There is no arguing that AI image generation can save time. Any given prompt will take Midjourney an average of thirty seconds to generate four images. While it might take ten or fifteen iterations to get to a desired image, this is much faster than any human being.
Quality: By and large, Midjourney’s latest model versions have left behind many AI image tells, such as mangled fingers (or can at least correct them with a few region fixes). Illustrative images can easily meet the standard of human art, as demonstrated when a Midjourney-generated image won first place in digital art at the Colorado State Fair in 2022.
Character consistency: Midjourney’s character reference parameter makes it easy to create a single consistent character across images, which will allow Midjourney users to venture into long-form narratives. This is already happening with forays into comic book generation.
Adaptability: After learning how to navigate Midjourney’s settings and guides, a user has an incredible amount of control over images, including expanding shots, editing specific parts of an image, and rendering variations with precision.
The Cons
Limited basic plan: Midjourney no longer has a demo or free trial available, and the basic ($10/month) plan does not include the slower rendering mode that spares GPU minutes. The cost of Midjourney is steep if users do not generate a high volume of output.
Too much detail: Midjourney, especially in its default settings, automatically adds many details to images, including those unrelated to prompts. These added flourishes may render images unusable in the context of a narrative.
Consistency is limited: The character reference feature is a double-edged sword. While a character will be recognizably similar to a reference image, the tendency is for Midjourney to also replicate a similar composition to the original image, making results more static. Furthermore, while it is an excellent way to maintain one consistent character in an image, it is impossible for it to render multiple consistent characters in a single image. Despite any combination of syntax, reference images (or lack thereof), or organization of the text prompt, Midjourney blends aspects of characters together.
Images are public: While you will never have copyright of an isolated image generated using AI, you might desire seclusion for your images for one reason or another. All images made outside of stealth mode may be featured on Midjourney’s website. This would be a deterrent for any user or organization intending to utilize Midjourney for commercial purposes.
Limited understanding: While there are technically 6,000 characters at your disposal, your text prompt should be as simple as possible. The more dynamic you try to make an image, the more likely it is that there will be obvious errors or misinterpretation of the prompt. The 6,000 characters are primarily for image URLs. The model does a far superior job if you use specific images to imitate. Midjourney is a model that rewards using pre-existing works to create an image accurate to what you seek.
Midjourney has no memory: Unlike models like Gemini or the partnership between ChatGPT and DALL·E 3, users cannot converse with the model to guide outputs. Midjourney will not remember previously described characters, settings, or any information from previous prompts. This means most prompts need to include the same descriptors throughout to maintain desired results.
Intellectual property concerns: In addition to Midjourney’s clandestine model training and terms of service holding users responsible for any copyright law violations, sometimes Midjourney will reference intellectual property without the prompt containing any reference to it. The following image result is shockingly similar to an established character in JoJo’s Bizarre Adventure, Dio Brando.
In the image above, without any reference to Dio Brando or the franchise, the model replicated his image. The fact that this occurred with such a vague prompt could mean users inadvertently borrow other artists’ characters or work.
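Returning to the point that Midjourney has no memory: since every prompt must restate the same descriptors, one workaround is to store each character's description once and prepend it automatically. The sketch below is only an illustration of that habit. The names compose_prompt and MAX_PROMPT_CHARS, the "hero" key, and the sample descriptor are all invented for this example, and the 6,000-character limit is the figure quoted in this review rather than a value checked against Midjourney itself.

```python
from typing import Dict, List

MAX_PROMPT_CHARS = 6000  # limit as quoted in this review; treat as an assumption


def compose_prompt(scene: str, cast: List[str], sheets: Dict[str, str],
                   suffix: str = "") -> str:
    """Prepend the stored descriptor of every character in `cast` to the scene."""
    missing = [name for name in cast if name not in sheets]
    if missing:
        raise KeyError("no descriptor stored for: " + ", ".join(missing))
    # Restate each character in full, because nothing carries over between prompts.
    descriptors = " ".join(sheets[name].rstrip(".") + "." for name in cast)
    parts = (descriptors, scene.strip(), suffix.strip())
    prompt = " ".join(part for part in parts if part)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}")
    return prompt


# Hypothetical character sheet, written once and reused for every panel.
sheets = {"hero": "young elf with short white hair wearing plain robes"}
print(compose_prompt("He inspects a manacle on his wrist.", ["hero"], sheets, "--s 50"))
```

This does not get around the blending of multiple characters described above; it only removes the chore of retyping descriptors and catches prompts that grow too long.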
Conclusion
Midjourney is a great tool for those who want to explore AI image generation and have extensive control over image outputs and edits, but there is a learning curve in navigating Discord and implementing parameters and modes. It can generate detailed, high-resolution images as well as characters and styles consistent with reference images. For those who want a more casual or user-friendly interface, particularly an AI model that works in a conversational manner, another AI model may be more appropriate. Lastly, Midjourney has not released any concrete information regarding how its models have been trained, and it often generates results similar to copyrighted images.
-
“About.” Spellbrush. Accessed May 20, 2024. https://spellbrush.com/about.
Antinozzi, Michael. “Midjourney To Users, ‘You’re on Your Own.’ 😲.” LinkedIn. Accessed May 25, 2024. https://www.linkedin.com/pulse/midjourney-users-youre-your-own-michael-antinozzi-8rqne.
Araki, Hirohiko. Inked panel of Dio Brando from Jojo’s Bizarre Adventure. Accessed May 20, 2024. https://static.wikia.nocookie.net/deathbattle/images/1/13/Portrait.dio.png/revision/latest?cb=20230414064610
Belci, Theo. “Leaked: The Names of More than 16,000 Non-Consenting Artists Allegedly Used to Train Midjourney’s AI.” The Art Newspaper. January 4, 2024. https://www.theartnewspaper.com/2024/01/04/leaked-names-of-16000-artists-used-to-train-midjourney-ai.
“BGSU Research Finds People Struggle to Identify the Difference between AI and Human Art, but Prefer Genuine Human-Made Works.” Bowling Green State University. Accessed February 27, 2024. https://www.bgsu.edu/news/online-media-newsroom/2023/12/bgsu-research-finds-people-struggle-to-identify-the-difference-b.html.
“DALL·E 3 Is Now Available in ChatGPT Plus and Enterprise.” OpenAI. Accessed May 27, 2024. https://openai.com/index/dall-e-3-is-now-available-in-chatgpt-plus-and-enterprise/.
———. “DALL·E 3.” OpenAI. Accessed May 28, 2024. https://openai.com/index/dall-e-3/.
“Digital Worlds That Feel Human.” Ultraleap. Accessed May 28, 2024. https://www.ultraleap.com/.
“Discord | Your Place to Talk and Hang Out.” Discord. Accessed May 28, 2024. https://discord.com/.
———. “Image of Midjourney Bot settings.” Screenshot from Discord app. May 9, 2024.
———. “Image of Midjourney Bot vary by region selection for an AI generated image.” Screenshot from Discord app. May 27, 2024.
Eden, Rina. “How I Used Midjourney To Bring My Graphic Novel to Life.” AI Art Creators (blog), March 20, 2023. https://medium.com/ai-art-creators/how-i-used-midjourney-to-bring-my-graphic-novel-to-life-9730b2ba6c65.
Ezquer, Evan. “Creating Full Page Comic Books with AI.” YouTube. Accessed May 27, 2024. https://www.youtube.com/watch?v=YLrlx0sWC0U.
Feng, Ana-Alicia. “U.S. Copyright Office Ruling and Implications on A.I.” AMT Lab @ CMU. November 7, 2023. https://amt-lab.org/blog/2023/10/us-copyright-office-ruling-and-implications-on-ai.
“Gemini - Chat to Supercharge Your Ideas.” Gemini. Accessed May 27, 2024. https://gemini.google.com.
Gunning, Rachael. Image of beloved white dog lazing on couch that author dogsits. May 11, 2024. Author’s personal collection.
Kashtanova, Kris. “English Version of My Graphic Novel Zarya of the Dawn….” Instagram. Accessed May 20, 2024. https://www.instagram.com/p/Ci1rUY8O3Bu/?hl=en.
Kolirin, Lianne. “Artist Rejects Photo Prize after AI-Generated Image Wins Award.” CNN. April 18, 2023. https://www.cnn.com/style/article/ai-photo-win-sony-scli-intl/index.html.
Kovalev, Andrei. “Midjourney V6. Part 1 | In-Depth Guide.” Midlibrary. Accessed May 25, 2024. https://midlibrary.io/midguide/midjourney-v6-in-depth-review-part-1-overview.
Midjourney. “Midjourney.” Midjourney, Inc. Accessed May 18, 2024. https://www.midjourney.com/website.
———. “Subscription plans.” Accessed May 20, 2024. Screenshot from https://docs.midjourney.com/docs/plans.
———. “Midjourney Parameter List.” Midjourney, Inc. Accessed May 18, 2024. https://docs.midjourney.com/docs/parameter-list.
———. “Midjourney Stylize Parameter.” Midjourney, Inc. Accessed May 18, 2024. https://docs.midjourney.com/docs/stylize-1.
Midjourney v6.0. Response to “illustration of someone using telekinesis to draw art --ar 7:4.” AI-generated image. Midjourney, Inc. May 28, 2024.
———. Response to “photo of male BIPOC elf teen with short white dreads wearing plain robes. --s 50.” AI-generated image. Midjourney, Inc. May 28, 2024.
———. Response to “Cute duck chibi.” AI-generated image. Midjourney, Inc. May 17, 2024.
———. Response to “Text prompt used.” AI-generated image. Midjourney, Inc. May 26, 2024.
Niji v6.0. Response to “photo of male BIPOC elf teen with short white dreads wearing plain robes. --s 50.” AI-generated image. Midjourney, Inc. May 26, 2024.
———. Response to “a white dog bored on the couch in a chibi art style. --cref https://s.mj.run/bw71t6VFK6g --sref https://s.mj.run/LHJGCCmRbV8 --s 50” AI-generated image. Midjourney, Inc. May 17, 2024.
———. Response to “a lion with a rainbow mane --ar 3:5 --q .25” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “a lion with a rainbow mane --ar 3:5 --q .5” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “a lion with a rainbow mane --ar 3:5.” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “Shot from far below. Man in his 50s looking down through iron bars with laughter. Overweight. Dressed finely and wearing a turban. Has a beard and is smoking his pipe. He is surrounded and clouded by smoke. He is accompanied by two baby wyverns, both approximately two feet in length, one on shoulder, one in lap.” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “Shot from far below. Man in his 50s looking down through iron bars with laughter. Overweight. Dressed finely and wearing a turban. Has a beard and is smoking his pipe. He is surrounded and clouded by smoke. He is accompanied by two baby wyverns, both approximately two feet in length, one on shoulder, one in lap. --style raw.” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “photo of male BIPOC elf teen with short white dreads wearing plain robes. --s 50. Upscaled.” AI-generated image. Midjourney, Inc. May 27, 2024.
———. Response to “Leo is calm now. He uses one hand to inspect the manacle of the other wrist. His eyebrow is raised. --s 50.” AI-generated image. Midjourney, Inc. April 7, 2024.
———. Response to “Close up of tall, thin middle eastern man in rich robes with goatee and cape. He has a cool expression, looking ahead. Behind him is dwarf of approximately 5 feet. The dwarf is stout and looks to be in his mid twenties, but has a full, glorious braided beard and long hair, light in color. The dwarf is in formal wear, a vest and button-down shirt. He dwarf looks anxious, peering backwards. . Simple black and white manga style. --q .25 --sref https://s.mj.run/P55QeZhX5c0 --cref https://s.mj.run/jCkybFvnacU https://s.mj.run/aZBXhYQ9sp8 --s 50.” AI-generated image. Midjourney, Inc. May 19, 2024.
Novak, Matt. “That Viral Image Of Pope Francis Wearing A White Puffer Coat Is Totally Fake.” Forbes. Accessed May 27, 2024. https://www.forbes.com/sites/mattnovak/2023/03/26/that-viral-image-of-pope-francis-wearing-a-white-puffer-coat-is-totally-fake/?sh=141e4e6f1c6c.
Roose, Kevin. “AI-Generated Art Won a Prize. Artists Aren’t Happy.” The New York Times. Accessed May 26, 2024. https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html.
“Stability AI.” Stability AI. Accessed May 28, 2024. https://stability.ai.
Yin, Hu, Zipeng Zhang, and Yuanyuan Liu. “The Exploration of Integrating the Midjourney Artificial Intelligence Generated Content Tool into Design Systems to Direct Designers towards Future-Oriented Innovation.” Systems 11: 566. December 4, 2023. https://doi.org/10.3390/systems11120566.