Important reader notice
This article is for general informational and educational purposes only. It is not legal, financial, tax, medical, security, compliance, or other professional advice, and you should not rely on it as a substitute for advice from a qualified professional who understands your specific situation.
AI tools, pricing, features, policies, laws, and platform terms can change quickly. We work to keep content accurate, but we do not guarantee that every detail is current, complete, or suitable for your use case. Always verify important claims with the original source before making business, legal, financial, safety, or purchasing decisions.
Some links may be affiliate, partner, or sponsored links. If you buy through them, AIUnpacking may earn compensation at no extra cost to you. Sponsored relationships are disclosed where applicable, and compensation does not override our editorial judgment.
AI Image Prompts: Midjourney, GPT Image, and Stable Diffusion Guide
Image prompting is not the same as chatbot prompting. You are not asking for an explanation; you are giving a visual brief. Good prompts describe the subject, setting, composition, light, style, camera or medium, and constraints. Great prompts also match the tool — and in 2026, those tools have grown up.
Each AI image model reads your prompt through a different lens. Midjourney V8.1 wants detail and parameters. GPT Image 2 reads full creative briefs like a director. Stable Diffusion 3.5 and FLUX.2 Pro need structure but reward specificity. Here is what actually works right now.
The Prompt Formula That Works for Every Tool
I have tested this structure across Midjourney, GPT Image, Stable Diffusion, FLUX, and others. It holds up:
subject + action/pose + setting + composition + lighting + style/medium + technical notes + exclusions
You do not need all eight slots every time. But when you are stuck getting mediocre results, check which slot you left empty. Nine times out of ten, fixing the lighting or composition description fixes the image.
Here is a real example that works across platforms:
A ceramic coffee mug on a walnut desk, morning light through a window, close-up product photograph, shallow depth of field, soft shadows, neutral background, no text or logo
Why this works: it names the subject first, places it in a scene, specifies the exact type of light (morning window light looks very different from studio light), chooses a composition (close-up), describes depth and shadow, locks the background, and excludes the most common image generation garbage: text and logos.
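The eight-slot formula maps naturally onto a small data structure. The sketch below uses a hypothetical `ImageBrief` helper (not part of any tool's API) that joins whichever slots you fill, in formula order, and lists the empty ones so you know where to look when results are mediocre.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ImageBrief:
    """Hypothetical helper: one field per slot of the prompt formula; "" means not set."""
    subject: str = ""
    action: str = ""
    setting: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""
    technical: str = ""
    exclusions: str = ""

    def to_prompt(self) -> str:
        # Join filled slots in formula order, skipping empty ones.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)

    def empty_slots(self) -> List[str]:
        # Slots left blank: the first candidates to fill when an image disappoints.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


mug = ImageBrief(
    subject="A ceramic coffee mug",
    setting="on a walnut desk",
    lighting="morning light through a window",
    composition="close-up product photograph",
    technical="shallow depth of field, soft shadows, neutral background",
    exclusions="no text or logo",
)
print(mug.to_prompt())
print(mug.empty_slots())  # ['action', 'style']
```

The joined prompt carries the same content as the hand-written mug example, only in strict slot order. Here the empty action and style slots are the obvious next things to experiment with.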
Platform Syntax Comparison
Every tool has its own language. Here is what matters as of May 2026:
| Element | Midjourney V8.1 | GPT Image 2 | Stable Diffusion / FLUX |
|---|---|---|---|
| Main prompt | Detailed visual phrase with parameters | Natural language creative brief | Comma-separated terms with optional weights |
| Aspect ratio | --ar 16:9 | Specify in prompt or use size param | Width/height settings or canvas resize |
| Negative prompts | --no text, watermark | Describe what to exclude naturally | Negative prompt field or --no for FLUX |
| Style control | --sref, moodboards, Style Creator, --raw | Describe the style in language | Checkpoints, LoRAs, prompt weights |
| Repeatability | Seed, --sref codes, personalization profile | Seed parameter, prompt versioning | Seed, checkpoint, sampler, CFG scale |
| Editing | Vary, region tools, Grid Mode | Multi-image edit, inpainting, style transfer | Inpaint, img2img, ControlNet, IP-Adapter |
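The aspect-ratio row hides a conversion: Midjourney takes a ratio such as `--ar 16:9`, while Stable Diffusion and FLUX front ends usually want explicit pixel dimensions. A minimal sketch, assuming a target of roughly one megapixel and sides rounded to multiples of 64 (a common convention, not a universal rule; check what your pipeline actually requires). The function name is hypothetical.

```python
import math
from typing import Tuple


def ratio_to_size(ratio: str, target_pixels: int = 1024 * 1024, multiple: int = 64) -> Tuple[int, int]:
    """Convert a 'W:H' aspect ratio into pixel dimensions near target_pixels.

    Both sides are rounded to the nearest multiple of `multiple`, so the result
    is close to, not exactly, the requested ratio and pixel count.
    """
    w_part, h_part = (float(x) for x in ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError("ratio parts must be positive")
    # Solve width * height = target with width / height = w_part / h_part.
    height = math.sqrt(target_pixels * h_part / w_part)
    width = height * w_part / h_part

    def snap(x: float) -> int:
        return max(multiple, int(round(x / multiple)) * multiple)

    return snap(width), snap(height)


print(ratio_to_size("16:9"))  # (1344, 768)
print(ratio_to_size("4:5"))   # (896, 1152)
print(ratio_to_size("1:1"))   # (1024, 1024)
```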
Midjourney Prompting in 2026
Midjourney is on V8.1 as of April 30, 2026, with native 2K HD mode as default and 5x faster generation than V7. Text rendering works best when you wrap text in quotes. The biggest shift: V7 rewarded brevity; V8.1 rewards detailed, literal visual descriptions over cryptic keywords.
Basic structure for Midjourney V8.1:
editorial portrait of a robotics engineer in a clean lab, soft window light, 85mm lens, realistic detail, cream walls, stainless steel workbenches --ar 4:5 --style raw --v 8.1
The key parameters you need to know:
- --ar for aspect ratio. V8.1 supports multiple aspect ratios.
- --style raw kills the default artistic filter. Use this for photorealism.
- --v for model version. Currently --v 8.1 or --v 7.
- --no for exclusions. Put the things you do not want here.
- --hd for native 2K output. Now default in V8.1.
- --q 4 for maximum coherence on complex scenes (costs extra GPU time).
- --chaos (0-100) to control variation between generations. Low = consistent. High = surprising.
- --weird adds unconventional creative twists. Fun for exploration, risky for client work.
- --stylize (0-1000). Crank it up when using personalization profiles.
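Because Midjourney parameters are plain text appended to the prompt, batching prompts is easier if you assemble them programmatically. This is a hypothetical helper, not a Midjourney API: it covers only some of the parameters listed above and enforces the ranges given there (--chaos 0-100, --stylize 0-1000).

```python
from typing import List, Optional


def midjourney_prompt(
    description: str,
    ar: Optional[str] = None,
    raw: bool = False,
    version: Optional[str] = None,
    chaos: Optional[int] = None,
    stylize: Optional[int] = None,
    no: Optional[List[str]] = None,
) -> str:
    """Append Midjourney-style parameters to a visual description (hypothetical helper)."""
    params = []
    if ar:
        params.append(f"--ar {ar}")
    if raw:
        params.append("--style raw")
    if version:
        params.append(f"--v {version}")
    if chaos is not None:
        if not 0 <= chaos <= 100:
            raise ValueError("--chaos must be between 0 and 100")
        params.append(f"--chaos {chaos}")
    if stylize is not None:
        if not 0 <= stylize <= 1000:
            raise ValueError("--stylize must be between 0 and 1000")
        params.append(f"--stylize {stylize}")
    if no:
        # One --no followed by a comma-separated list, as in "--no text, watermark".
        params.append("--no " + ", ".join(no))
    return " ".join([description.strip()] + params)


print(midjourney_prompt(
    "editorial portrait of a robotics engineer in a clean lab, soft window light, "
    "85mm lens, realistic detail, cream walls, stainless steel workbenches",
    ar="4:5", raw=True, version="8.1",
))
```

This reproduces the editorial-portrait example exactly. Putting --no last keeps its comma-separated list from swallowing any parameter that follows it.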
The Three Style Tools
Style Creator builds a reusable --sref code by having you pick preferred images from a grid. It stabilizes after 5-10 rounds of selection. SREF pulls visual DNA (color, texture, lighting) from a reference image URL without copying content. Moodboards blend multiple reference images into a single style — ideal for brand work.
For photorealism: switch to --raw and run --stylize low (100-200). For artistic work: build a personalization profile and push --stylize to 800-1000.
Midjourney Tips That Actually Matter
Keep the most important subject early in the prompt. Midjourney weights early words more heavily. Avoid stuffing ten conflicting style references into one prompt - pick one or two strong directions. The Grid Mode in the alpha interface is excellent for generating many thumbnails quickly, then upscaling only the ones worth keeping. And if V8.1’s default aesthetic feels too expressive for your work, --raw is your friend.
GPT Image 2 Prompting
OpenAI’s GPT Image 2 is the flagship image model as of April 2026. Where Midjourney wants parameter-driven precision, GPT Image 2 wants a full creative brief — format, audience, text placement, vibe, and constraints. It supports any resolution under 8.3 megapixels (max edge 3840px). The three quality tiers (low, medium, high) let you trade speed for fidelity; low is surprisingly good for most social media work.
Create a clean square social media graphic for a productivity app. Show a tidy desk with a laptop, a paper planner, and a small plant. Use bright natural lighting, modern minimal composition, and leave empty space at the top for a headline. Do not include any readable text.
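The size limits for GPT Image 2 are easy to check before you send a request. A minimal sketch that takes the figures above at face value (total pixels under 8.3 million, longest edge at most 3840px). The API is the final word on which sizes it accepts and may impose constraints this sketch does not check; `check_size` is a hypothetical name.

```python
from typing import List

MAX_PIXELS = 8_300_000  # "under 8.3 megapixels", as stated above
MAX_EDGE = 3840         # maximum edge length in pixels, as stated above


def check_size(width: int, height: int) -> List[str]:
    """Return problems with a requested size; an empty list means it passes these checks."""
    problems = []
    if width <= 0 or height <= 0:
        problems.append("width and height must be positive")
    if max(width, height) > MAX_EDGE:
        problems.append(f"longest edge {max(width, height)}px exceeds {MAX_EDGE}px")
    if width * height >= MAX_PIXELS:
        problems.append(f"{width * height:,} pixels is not under {MAX_PIXELS:,}")
    return problems


print(check_size(3840, 2160))  # [] because 3840 x 2160 = 8,294,400 pixels
print(check_size(3840, 3840))  # edge is fine, but 14,745,600 pixels is too many
```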
Where GPT Image 2 Excels
Text-in-image is its strongest suit. Put literal text in quotes and GPT Image 2 renders it consistently. It handles multi-image editing (up to five inputs composited intelligently), infographics and structured diagrams, UI mockups that look like shipped software, and scientific visuals with accurate layouts. For text-heavy output, set quality to high.
GPT Image 2 Prompting Rules
Write prompts in a consistent order: scene first, then subject, key details, constraints. For photorealism, include the word “photorealistic” directly. Describe people with scale, body framing, and gaze direction. For edits, use “change only X, keep everything else the same” and repeat your preserve list each iteration to reduce drift.
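The edit rule ("change only X, keep everything else the same", with the preserve list repeated on every turn) is easy to let slip by the third iteration, so it can be worth templating. A hypothetical helper, assuming you maintain the preserve list yourself between turns:

```python
from typing import List


def edit_instruction(change: str, preserve: List[str]) -> str:
    """Build an edit prompt that names one change and restates everything to keep."""
    if not preserve:
        raise ValueError("list at least one thing to preserve")
    keep = "; ".join(preserve)
    return (
        f"Change only {change}. Keep everything else the same. "
        f"Preserve exactly: {keep}."
    )


preserve = ["the subject's face and expression", "the camera angle", "the background"]
# The same preserve list is restated on every iteration to limit drift.
print(edit_instruction("the jacket color to navy blue", preserve))
print(edit_instruction("the lighting to golden hour", preserve))
```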
Stable Diffusion and FLUX Prompting
In 2026 the Stable Diffusion-style workflow has two dominant model families: Stability AI’s SD 3.5 Large and Black Forest Labs’ FLUX family (FLUX.2 Pro). FLUX reads natural sentences better than keyword soup, so prompting it feels closer to GPT Image than to old SD. Recommended framework: Subject + Action + Style + Context, 30-80 words.
SD 3.5 Large still uses classic weighted-prompt syntax:
(professional product photo:1.3), ceramic coffee mug, walnut desk, morning window light, soft shadows, shallow depth of field, neutral background, realistic, high detail
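The `(term:1.3)` notation is interpreted by front ends such as AUTOMATIC1111 and ComfyUI rather than by the model itself. Values above 1 emphasize a term and values below 1 de-emphasize it. The sketch below is a deliberately simplified, hypothetical parser for the flat case shown above (no nesting, no escaped parentheses, no commas inside a weighted group). It is handy for checking or logging which terms carry custom weight.

```python
import re
from typing import List, Tuple

# Matches a whole chunk like "(some text:1.3)"; nesting and escapes are not handled.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> List[Tuple[str, float]]:
    """Split a comma-separated prompt into (term, weight) pairs; unweighted terms get 1.0."""
    pairs = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = WEIGHTED.fullmatch(chunk)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((chunk, 1.0))
    return pairs


PROMPT = ("(professional product photo:1.3), ceramic coffee mug, walnut desk, "
          "morning window light, soft shadows, shallow depth of field, "
          "neutral background, realistic, high detail")
for term, weight in parse_weights(PROMPT):
    print(f"{weight:.1f}  {term}")
```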
Negative prompt for SD 3.5:
text, watermark, logo, blurry, distorted, extra objects, low quality, bad anatomy, bad hands, cropped
FLUX models do not use traditional negative prompts the same way SD does. Instead, state what you do not want in natural language within the prompt itself, or use the --no flag in supporting interfaces.
The Toolbox
For SD: Checkpoint (pick first, everything depends on it), LoRA (fine-tuned adapters for style/characters), Seed (reproducibility), Sampler/steps, CFG scale (4-7 is the sweet spot for SD 3.5), ControlNet (pose, depth, edges), IP-Adapter (image-based style and subject transfer). For FLUX.2 Pro, the tooling is simpler — the prompt does the heavy lifting. FLUX.2 interprets photographic terminology (lens, depth of field, color grade) with surprising accuracy.
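CFG scale is easier to tune once you see what it computes. In classifier-free guidance, the sampler gets two noise predictions at each step, one conditioned on your prompt and one "unconditional". It then extrapolates away from the unconditional one: `uncond + scale * (cond - uncond)`. In common SD pipelines (Hugging Face diffusers, for example), the negative prompt is what fills the unconditional slot, which is why it steers the image away from those terms. A toy sketch of that combination step on plain lists; real pipelines apply it to noise-prediction tensors.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond).

    scale = 1 returns the conditional prediction unchanged; larger scales push
    further in the direction the prompt points, and away from the unconditional
    (or negative-prompt) prediction.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 0.5]  # toy prediction from the negative or empty prompt
cond = [1.0, 0.5]    # toy prediction from your prompt
for scale in (1.0, 4.0, 7.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

The second element never moves because both predictions agree there; guidance only amplifies their difference. That is also why very high scales tend to overshoot into oversaturated or distorted images.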
Composition Words That Work
These terms work across all platforms. They are camera and art direction vocabulary, not AI magic words:
- Centered composition
- Rule of thirds
- Wide establishing shot
- Close-up macro
- Over-the-shoulder view
- Low-angle heroic shot
- Top-down flat lay
- Negative space on the left (or right)
- Symmetrical layout
- Leading lines toward the subject
- Dutch angle
- Birds-eye view
Lighting Words That Work
Lighting changes image quality more than style words ever will. Use these. Test them. See what they do to the same subject:
- Soft window light (north-facing window)
- Golden hour (warm, long shadows, directional)
- Blue hour (cool, soft, pre-dawn or post-sunset)
- Overcast daylight (flat, diffused, even exposure)
- Studio softbox (controlled, flattering, product-ready)
- Rim light (edge glow, subject separation from background)
- Backlit silhouette (subject becomes a dark shape against bright light)
- Dramatic side lighting (strong contrast, texture accentuation)
- Volumetric light (visible light beams through atmosphere, haze, or dust)
- High-key product lighting (bright, minimal shadows, white background)
- Low-key lighting (dark, moody, selective illumination)
- Rembrandt lighting (triangle of light on the shadowed cheek)
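"Test them" works best as a controlled sweep: hold the subject, composition, and seed fixed and swap only the lighting phrase. A minimal sketch that builds the variant jobs. The seed value, the job dictionary shape, and appending the lighting phrase to the end of the base prompt are illustrative assumptions; you would feed each job to whatever tool you use, passing the seed only where the tool accepts one.

```python
from typing import Dict, List

BASE = "A ceramic coffee mug on a walnut desk, close-up product photograph"
LIGHTING = [
    "soft window light",
    "golden hour",
    "studio softbox",
    "dramatic side lighting",
    "low-key lighting",
]
SEED = 1234  # hypothetical fixed seed, so lighting is the only thing that changes


def lighting_sweep(base: str, terms: List[str], seed: int) -> List[Dict]:
    """Return one job per lighting term, identical except for the lighting phrase."""
    return [{"prompt": f"{base}, {term}", "seed": seed, "variable": term} for term in terms]


for job in lighting_sweep(BASE, LIGHTING, SEED):
    print(job["seed"], job["prompt"])
```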
Style Words That Work
Use medium and production context rather than artist names. AI companies have tightened restrictions on living-artist references. Work with movements, eras, and media:
- Editorial photography
- Product photography
- Children’s book illustration
- Technical diagram
- Watercolor illustration
- Ink drawing
- 3D render (Octane, Cycles, Unreal Engine)
- Vector poster
- Minimal UI mockup
- Cinematic still (anamorphic, 35mm, digital cinema)
- Architectural visualization
- Vintage film photography
- Isometric illustration
- Pixel art
Negative Prompting
Negative prompting is your cleanup crew. It does not add creativity; it removes failure modes. Here is what I use:
| Problem | Exclusion |
|---|---|
| Unwanted text | no text, no letters, no watermark, no signature |
| Messy hands and anatomy | no extra fingers, no distorted hands, bad anatomy, bad hands |
| Brand contamination | no brand logos, no trademarks, no recognizable brands |
| Wrong mood | no dark lighting, no dramatic shadows, no horror elements |
| Clutter | minimal background, no extra objects, clean composition |
| Quality issues | no blur, no low quality, no jpeg artifacts, no distortion |
| Unwanted style drift | no cartoon, no anime, no 3D render (when shooting for photorealism) |
For GPT Image 2, phrase exclusions naturally: “Do not include any readable text or logos.” For Midjourney, use --no text, watermark, logo. For Stable Diffusion, fill the negative prompt field. For FLUX, embed exclusions in your main prompt as natural language constraints until the native negative prompt tools mature.
One warning: over-negative-prompting backfires. If you add thirty negative terms, you constrain the model’s creative range and can actually introduce new artifacts. Start with 5-8 targeted negatives and add more only when specific problems repeatedly appear.
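Each platform wants exclusions in a different shape, so it can help to keep one list of negatives and render it per tool. A sketch following the conventions described above. The function names and platform keys are hypothetical, the natural-language wording for GPT Image 2 and FLUX is just one reasonable phrasing, and the warning threshold of 8 comes from the 5-8 guideline.

```python
import warnings
from typing import List


def _bare(term: str) -> str:
    """Strip a leading 'no ' ("no text" -> "text"); --no and SD negative fields already negate."""
    term = term.strip()
    return term[3:].strip() if term.lower().startswith("no ") else term


def _natural_list(items: List[str]) -> str:
    """Join items as 'a', 'a or b', or 'a, b, or c'."""
    if len(items) <= 2:
        return " or ".join(items)
    return ", ".join(items[:-1]) + ", or " + items[-1]


def render_exclusions(terms: List[str], platform: str) -> str:
    """Render one list of exclusions in the shape each platform expects."""
    bare = [_bare(t) for t in terms if t.strip()]
    if not bare:
        return ""
    if len(bare) > 8:
        warnings.warn(f"{len(bare)} negatives; start with 5-8 targeted ones")
    if platform == "midjourney":
        return "--no " + ", ".join(bare)
    if platform == "stable-diffusion":
        return ", ".join(bare)  # paste into the separate negative prompt field
    if platform in ("gpt-image", "flux"):
        # Natural-language constraint to append to the main prompt.
        return "Do not include any " + _natural_list(bare) + "."
    raise ValueError(f"unknown platform: {platform!r}")


negatives = ["no text", "no watermark", "no logo"]
for name in ("midjourney", "stable-diffusion", "gpt-image"):
    print(name, "->", render_exclusions(negatives, name))
```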
Workflow for Better Results
Here is the loop I use, and it saves an embarrassing amount of time:
- Start broad. Generate 4-8 variations with a focused but not over-specified prompt. See what the model gives you.
- Pick composition first. Before you worry about lighting or style, find the composition that works. Everything else can be adjusted.
- Refine one variable at a time. Change lighting. Generate. Change the background. Generate. Change the style. If you change three things at once, you have no idea what helped or hurt.
- Use references when consistency matters. SREF codes, moodboards, reference images, LoRAs — whatever your tool supports. Text alone is not enough for brand-level consistency.
- Edit locally instead of regenerating everything. Inpainting, regional variation, and multi-image editing let you fix the one broken thing without gambling on a full regeneration.
- Log everything. Save the prompt, seed, model version, and parameters alongside the output. You will thank yourself later.
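"Log everything" can be as simple as appending one JSON line per generation. A minimal sketch using only the standard library. The field names, the `generations.jsonl` filename, and the example model label and parameter values are illustrative choices, not a format any tool requires.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


def log_generation(path, prompt: str, model: str, seed: Optional[int] = None,
                   params: Optional[Dict] = None, output_file: Optional[str] = None) -> Dict:
    """Append one generation record as a JSON line and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "seed": seed,
        "params": params or {},
        "output_file": output_file,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_log(path) -> List[Dict]:
    """Read every record back, e.g. to find the seed behind an image worth keeping."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    log_generation(
        "generations.jsonl",
        prompt="A ceramic coffee mug on a walnut desk, morning light through a window",
        model="sd3.5-large",  # hypothetical label; record whatever your tool reports
        seed=1234,
        params={"cfg_scale": 5, "steps": 28, "sampler": "euler"},
        output_file="mug_0001.png",
    )
```

JSON Lines keeps each record independent, so a crash mid-session never corrupts earlier entries, and the file stays greppable.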
FAQ
How long should an image prompt be?
Midjourney V8.1: 20-60 words of concrete visual detail. GPT Image 2: 50-150 words for complex creative briefs, shorter for simple scenes. Stable Diffusion / FLUX: 30-80 words. The rule is not length; it is density. Every word should earn its place.
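The ranges above are easy to check before you start iterating. A sketch using them as written; treat a flag as a reason to reread for filler or missing detail, not as a hard rule, since density matters more than length. The platform keys are arbitrary labels, and the GPT Image 2 range applies to complex briefs only.

```python
from typing import Dict, Tuple

# Word ranges from the answer above; keys are arbitrary labels.
WORD_RANGES: Dict[str, Tuple[int, int]] = {
    "midjourney": (20, 60),
    "gpt-image": (50, 150),  # complex briefs; simple scenes can be shorter
    "sd-flux": (30, 80),
}


def length_check(prompt: str, platform: str) -> str:
    """Report where a prompt's word count falls relative to the suggested range."""
    low, high = WORD_RANGES[platform]
    # Ignore trailing Midjourney-style parameters (everything from the first " --").
    description = prompt.split(" --", 1)[0]
    n = len(description.split())
    if n < low:
        return f"{n} words: below {low}, consider adding concrete visual detail"
    if n > high:
        return f"{n} words: above {high}, look for filler or conflicting directions"
    return f"{n} words: within {low}-{high}"


print(length_check(
    "editorial portrait of a robotics engineer in a clean lab, soft window light, "
    "85mm lens, realistic detail, cream walls, stainless steel workbenches "
    "--ar 4:5 --style raw --v 8.1",
    "midjourney",
))  # 22 words: within 20-60
```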
Why does my image contain garbled text?
Older models still struggle. For text-heavy outputs, use GPT Image 2 with the text in quotes and specify typography details. Midjourney V8.1 also handles quoted text well. Always proofread the output.
How do I make a consistent character across multiple images?
Use reference images (Midjourney --sref, GPT Image 2 multi-image edit, SD IP-Adapter). Lock seeds where available. Use the same model and version. For Stable Diffusion, train a character LoRA for the most reliable results across many poses and scenes.
Do I need photography knowledge to write good prompts?
You need basic visual literacy, not a photography degree. Know the difference between soft light and hard light. Understand close-up versus wide shot. Recognize when composition feels right. The terms are easy to learn; the hard part is noticing when your generated image is missing them.
Verified Sources
- Midjourney V8 Alpha announcement, March 17, 2026: https://updates.midjourney.com/v8-alpha/
- Midjourney V8.1 Alpha announcement, April 14, 2026: https://updates.midjourney.com/v8-1-alpha/
- Midjourney version documentation, accessed May 2026: https://docs.midjourney.com/hc/en-us/articles/32199405667853-Version
- OpenAI GPT Image Generation Models Prompting Guide, April 21, 2026: https://developers.openai.com/cookbook/examples/multimodal/image-gen-models-prompting-guide
- OpenAI image generation guide, accessed May 2026: https://platform.openai.com/docs/guides/image-generation
- Stability AI, Stable Diffusion 3.5, accessed May 2026: https://stability.ai/news/introducing-stable-diffusion-3-5
- Civitai Prompt-Crafting Guide, March 2025: https://education.civitai.com/civitais-prompt-crafting-guide-part-1-basics/
- FLUX.2 Prompting Guide, Black Forest Labs, accessed May 2026: https://docs.bfl.ml/guides/prompting_guide_flux2
- AI Image Prompting: The Complete 2026 Guide, SurePrompts, April 21, 2026: https://sureprompts.com/blog/ai-image-prompting-complete-guide-2026
- Midjourney V8.1 Review, Fello AI, May 5, 2026: https://felloai.com/midjourney-v8-1-review/