Module 01 · Concept ⚡ AI-fast

Concept Design

Produce a clean set of 2D references that AI can accurately turn into 3D: one finalized front view + a multi-view turnaround (ideally side, back, and 3/4 views) + optional part breakouts. These images are the foundation for the next 8 steps: the cleaner they are, the cleaner your 3D character will be.

Time: 45–90 min
Difficulty: ⚡ AI-fast
Tools: ChatGPT · Nano Banana Pro
Deliverable: A-pose multi-view reference set
/ 01 · Why this step makes or breaks you

Downstream image-to-3D is "garbage in, garbage out"

Image-to-3D tools like Meshy and Tripo rely almost entirely on the reference you give them. The cleaner your reference, the cleaner the mesh that comes out.

A "pretty illustration" with dramatic lighting, a dynamic pose, and a busy background becomes a blob of melted geometry in a 3D generator's hands — armpits fused, proportions warped, and the background baked into the mesh too.

So the goal of this step isn't to paint something beautiful; it's to paint something a machine can read. Every minute you invest here pays back many times over across the next 8 steps.

/ 02 · The 5 hard rules of image-to-3D

Your reference must satisfy all 5 at once

Every operation below exists to make your final reference satisfy these 5 rules. Burn the rules into memory first, then start.

A-pose / T-pose · limbs separated

With arms tight against the torso, the generator can't tell arm from body and the armpits fuse. An A-pose (arms angled 30–45° away from the body) is usually more stable than a T-pose, and it keeps the wrists clear of the thighs.

Neutral flat light · no harsh shadows

Dramatic and rim lighting get baked straight into the geometry and textures, and once in 3D you can't remove them. What you want is even, soft studio light.

Solid-color background · no scenery or props

Pure white or pure gray, no ground, no cast shadow, no props. The cleaner the background, the more accurate the cutout and generation — otherwise the background gets modeled right along with the character.

Orthographic view · no perspective compression

Eye level, no strong high or low angles. Perspective makes the generator misjudge proportions: parts near the camera render too large and far parts too small, so you get a big head and small feet.

Multi-view consistency · one and the same character

Front / side / back are the same person: same colors, same gear, same proportions. This is the hardest thing for AI and the core battleground of this section — solved with a consistency model + reference inputs.

/ 03 · Step by step

From a one-line idea to a compliant reference set

1.1
Concept · nail the character down in words first

Don't start pulling the slot-machine lever of random image generation right away. First use ChatGPT to write the "design" out clearly: a one-line concept, silhouette features, color palette, signature details. The clearer the words, the easier and more consistent the images. Just drop in the structured prompt below:

Concept · ChatGPT
You are a senior character designer for a stylized 3D action game.
Lock a production-ready character concept from my seed idea. Output:
1. One-paragraph concept (who they are, their world)
2. Silhouette features (what makes them readable as a black shape)
3. Color palette: 3-4 hex colors with roles (primary / secondary / accent)
4. 5 distinctive design details (gear, marks, materials)
5. A ready-to-paste FRONT-VIEW image prompt: full-body, A-pose,
   plain white background, neutral studio lighting.

Seed idea: [your one-line idea, e.g. a wandering blue-cat spirit swordsman]
Keep it production-oriented: clean silhouette, separable parts,
no extreme anatomy that is hard to model.
💡Why: once the design exists in words, every later generation can reference the same description — that's your first insurance for multi-view consistency.
1.2
Generate the front-view master · the source of truth for every other view

Use a structured prompt to generate the front view, make a batch at once, and pick the one with the strongest silhouette that best matches the design. The structure is always: subject + style + pose + framing + light + background.

Front-view master · text-to-image
full-body character concept of [character description], front orthographic view,
A-pose with arms held ~35° away from the body, legs slightly apart,
character reference sheet, flat even studio lighting, soft shadows only,
plain solid white background, clean readable design, [STYLE],
symmetrical, entire figure visible head to feet, no cropping,
high detail, sharp focus

avoid: dramatic lighting, rim light, busy background, props, ground shadow,
action pose, foreshortening, extreme perspective, multiple characters,
text, watermark, cropped limbs
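If you roll many batches, it can help to build this prompt from named parts rather than hand-editing one long string, so every batch keeps the same subject + style + pose + framing + light + background order. A minimal Python sketch, assuming you paste the rendered text into whatever text-to-image tool you use; `FrontViewPrompt` and `AVOID` are hypothetical names, and their defaults are simply copied from the prompt above.

```python
# Hypothetical prompt builder: assembles the front-view prompt in the fixed
# order subject + style + pose + framing + light + background, plus an
# "avoid" line. Defaults are copied from the prompt in this step.
from dataclasses import dataclass, field

AVOID = [
    "dramatic lighting", "rim light", "busy background", "props",
    "ground shadow", "action pose", "foreshortening", "extreme perspective",
    "multiple characters", "text", "watermark", "cropped limbs",
]


@dataclass
class FrontViewPrompt:
    subject: str
    style: str
    pose: str = "A-pose with arms held ~35° away from the body, legs slightly apart"
    framing: str = "front orthographic view, entire figure visible head to feet, no cropping"
    light: str = "flat even studio lighting, soft shadows only"
    background: str = "plain solid white background"
    avoid: list[str] = field(default_factory=lambda: list(AVOID))

    def render(self) -> str:
        # Fixed order, so batches differ only in the fields you change.
        parts = [f"full-body character concept of {self.subject}", self.style,
                 self.pose, self.framing, self.light, self.background]
        return ", ".join(parts) + "\n\navoid: " + ", ".join(self.avoid)


print(FrontViewPrompt("a wandering blue-cat spirit swordsman",
                      "anime style, cel-shaded").render())
```

Changing only `subject` or `style` between batches keeps the rules-related parts (pose, framing, light, background) identical, which is the point of writing them down once.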
🔬Silhouette test: turn your chosen image into a pure black silhouette — can you still tell who it is at a glance? If not → the design isn't strong enough, reroll. Good-looking ≠ readable.
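The silhouette test is easy to automate once the background is near-white: every pixel that isn't background becomes black. A minimal pure-Python sketch, assuming pixels are given as rows of `(R, G, B)` tuples; `silhouette`, `to_ascii`, and the `bg_threshold` of 240 are illustrative choices, not part of any tool.

```python
# Minimal silhouette-test sketch (pure Python, no image library).
# Assumes a near-white background; pixels are (R, G, B) tuples, row by row.
Pixel = tuple[int, int, int]


def silhouette(pixels: list[list[Pixel]], bg_threshold: int = 240) -> list[list[int]]:
    """Return a mask where 1 = character (paint black) and 0 = background."""
    # A pixel counts as background only if every channel is near white.
    return [[0 if min(px) >= bg_threshold else 1 for px in row] for row in pixels]


def to_ascii(mask: list[list[int]]) -> str:
    """Quick text preview of the mask: '#' = silhouette, '.' = background."""
    return "\n".join("".join("#" if v else "." for v in row) for row in mask)


# Tiny 3x3 example: a "plus"-shaped character on white.
W, C = (255, 255, 255), (20, 40, 120)
print(to_ascii(silhouette([[W, C, W], [C, C, C], [W, C, W]])))
# prints:
# .#.
# ###
# .#.
```

With a real image you would first load the pixels with an image library such as Pillow, then look at the mask at a small size. Very light clothing can fall above the threshold and read as background; lower `bg_threshold`, or for a gray background compare each pixel against the sampled background color instead.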
1.3
Finalize and clean up · tidy the master

Once you've picked one, use image editing to clean it up, then lock it — no more design changes. This is your master.

Clean-up finalize · image-to-image edit
Using this image, keep the EXACT same character design, outfit and colors.
Clean it up only:
- remove the background to pure white
- complete any cropped limbs so the full body (head to feet) is visible
- flatten dramatic lighting into even neutral studio light
- correct obvious proportion issues
A-pose, front orthographic view. Do not redesign anything.
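After the clean-up edit, a quick sanity check is whether the outer ring of the image is a single solid color: anything else on the border usually means a cropped limb, a ground line, or a cast shadow reaching the frame edge. A minimal sketch, assuming pixels as rows of `(R, G, B)` tuples; `check_clean_frame` and its `tolerance` of 10 are hypothetical.

```python
# Sketch: after clean-up, check that every border pixel matches the solid
# background color. Anything else on the border suggests a cropped limb, a
# ground line, or a cast shadow reaching the frame edge.
Pixel = tuple[int, int, int]


def border_pixels(pixels: list[list[Pixel]]) -> list[Pixel]:
    """Collect the outermost ring of pixels (each pixel once)."""
    h, w = len(pixels), len(pixels[0])
    ring = [pixels[0][x] for x in range(w)]
    if h > 1:
        ring += [pixels[h - 1][x] for x in range(w)]
    for y in range(1, h - 1):
        ring.append(pixels[y][0])
        if w > 1:
            ring.append(pixels[y][w - 1])
    return ring


def check_clean_frame(pixels: list[list[Pixel]], bg: Pixel = (255, 255, 255),
                      tolerance: int = 10) -> list[str]:
    """Return a list of problems; an empty list means the frame passes."""
    off = [px for px in border_pixels(pixels)
           if max(abs(a - b) for a, b in zip(px, bg)) > tolerance]
    if not off:
        return []
    return [f"{len(off)} border pixel(s) differ from the background: "
            "check for cropped limbs, ground, or cast shadow"]
```

Pass `bg=(128, 128, 128)` (or whatever gray you chose) if you went with a gray background instead of white. This only inspects the border; shadows under the feet that stay inside the frame still need a visual check.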
1.4
Turnaround / multi-view · derive from the master with a consistency model

The core trick: feed the master as a reference image to Nano Banana Pro (Gemini 3 Pro Image) — its strength is exactly "locking character identity and staying consistent across images," and it takes up to 14 reference images. Have it change only the viewing angle, never the design.

  • Single-image turnaround (simple characters): front / side / back in one image, aligned to the same height.
  • One-by-one derivation (complex characters): generate each view individually, feeding the master + already-finalized views in as references each time for steadier consistency.
Side / back derivation · Nano Banana Pro
Use the provided reference image as the single source of truth for this
character. Generate the SAME character in a [side profile / back]
orthographic view. Keep identical outfit, colors, proportions, gear and
design details - this must read as the EXACT same character, just rotated.
A-pose, flat neutral studio lighting, plain white background,
character turnaround sheet style. Do not change or add any design element.
Three views at once · turnaround
Create a character turnaround / model sheet of THIS exact character showing
front, side and back views in a row. All A-pose, identical design and colors,
flat neutral lighting, plain white background, evenly aligned at the same
height and scale.
💡Turn on Thinking mode: Nano Banana Pro's reasoning mode drafts first, then finalizes, and consistency on complex characters is noticeably steadier. Still drifting? Add the views you've already generated into the reference inputs to give it more "anchors."
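For one-by-one derivation, the bookkeeping is simply "master first, then the views you've already locked", capped at the model's reference limit. A minimal sketch of that ordering, assuming file paths as strings; `build_references` is hypothetical and doesn't call any Nano Banana Pro API. The 14 comes from the limit mentioned above, so confirm it against the current docs before relying on it.

```python
# Hypothetical helper for one-by-one derivation: decide which images to attach
# as references for the next view. The master always comes first; finalized
# views follow as extra anchors. The 14-image cap comes from the text above;
# confirm the current limit in your tool before relying on it.
MAX_REFS = 14


def build_references(master: str, finalized: list[str],
                     max_refs: int = MAX_REFS) -> list[str]:
    if max_refs < 1:
        raise ValueError("need room for at least the master image")
    others: list[str] = []
    for path in finalized:
        # Skip duplicates and accidental re-adds of the master.
        if path != master and path not in others:
            others.append(path)
    keep = max_refs - 1
    # Over the cap: keep the master plus the most recently finalized views.
    return [master] + (others[-keep:] if keep > 0 else [])


print(build_references("char_front.png", ["char_side.png", "char_3q.png"]))
```

Keeping the most recent views is just one trimming choice; you might instead prefer the views closest in angle to the one you're generating next.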
1.5
Part breakout · split out separately generatable parts (only needed for complex characters)

Isolate weapons, helmets, capes, and large accessories on a pure white background in orthographic view. Later you generate each one as a separate, higher-quality 3D model and assemble them, which is far cleaner than brute-forcing one whole complex character. Split out whatever you can.

Part extraction · image-to-image
Isolate the [weapon / helmet / cape] from this character.
Show it ALONE on a plain white background, orthographic front and side
views, clean even studio lighting, no character, no background props,
product-shot style, high detail.
1.6
Package the deliverable · name, align, archive
  • Unify resolution and framing so the character is the same size in every frame, making view-by-view comparison easy later.
  • Naming convention: char_front / char_side / char_back / char_3q / part_weapon…
  • Create an 01_concept folder and archive the whole set. This is the input package for Module 02.
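The packaging checklist above can be sketched as a small script: validate names, compute a scale factor so the character is the same height in every frame, and copy everything into `01_concept`. This assumes lowercase PNG files and treats front / side / back / 3q as the only character views; the regex, `scale_to_height`, and `archive` are illustrative, and only the folder name and the `char_` / `part_` prefixes come from the text.

```python
# Sketch of the packaging step, assuming lowercase PNG files and the naming
# convention above (char_<view> / part_<name>). Only the 01_concept folder
# name and the prefixes come from the text; the helpers are illustrative.
import re
import shutil
from pathlib import Path

NAME_RE = re.compile(r"^(char_(front|side|back|3q)|part_[a-z0-9]+)\.png$")


def valid_name(filename: str) -> bool:
    """True if the file follows the assumed naming convention."""
    return NAME_RE.match(filename) is not None


def scale_to_height(bbox_height_px: int, target_height_px: int) -> float:
    """Scale factor that makes the character the same height in every frame."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return target_height_px / bbox_height_px


def archive(files: list[Path], dest: Path = Path("01_concept")) -> list[Path]:
    """Copy correctly named files into the archive folder."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in files:
        if not valid_name(f.name):
            raise ValueError(f"bad name: {f.name}")
        copied.append(Path(shutil.copy2(f, dest / f.name)))
    return copied
```

For example, if the character's bounding box is 800 px tall in one frame and you want 1000 px everywhere, `scale_to_height(800, 1000)` gives 1.25.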
/ 04 · Style branches

Same rules, three style tweaks

🛡️ Realistic / semi-realistic

Rules unchanged, just add style words to the prompt. The friendliest category for 3D generation.

semi-realistic game character, PBR-friendly, neutral expression, balanced anatomy

✨ Anime / cartoon

Watch out: big eyes and exaggerated proportions make 3D generation harder — keep the silhouette more restrained.

anime style, cel-shaded, clean lineart, flat colors, readable silhouette

🐱 Toon / chibi

Proportions can be exaggerated, but the limbs must stay separated and the silhouette clear, or the rounded blob will fuse together.

chibi / stylized cartoon, exaggerated proportions, separated limbs, bold shapes
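Since the rules stay fixed and only the style words change, the three branches can live in one small table that fills the `[STYLE]` slot of the front-view prompt. A minimal sketch; the branch keys and `fill_style` are hypothetical, while the style strings are copied from this section.

```python
# Sketch: keep the style words from the three branches in one table and
# drop them into the [STYLE] slot of the front-view prompt. The branch keys
# are hypothetical labels; the style strings are copied from this section.
STYLE_WORDS = {
    "realistic": "semi-realistic game character, PBR-friendly, neutral expression, balanced anatomy",
    "anime": "anime style, cel-shaded, clean lineart, flat colors, readable silhouette",
    "chibi": "chibi / stylized cartoon, exaggerated proportions, separated limbs, bold shapes",
}


def fill_style(template: str, branch: str) -> str:
    """Replace the [STYLE] placeholder with the chosen branch's style words."""
    if "[STYLE]" not in template:
        raise ValueError("template has no [STYLE] slot")
    return template.replace("[STYLE]", STYLE_WORDS[branch])


print(fill_style("plain solid white background, clean readable design, [STYLE],", "chibi"))
```

An unknown branch name raises `KeyError`, which is usually what you want: a typo shouldn't silently produce a prompt with no style words.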
/ 05 · Common wrecks

80% of beginners fall into these

Using dramatic lighting or backlight → shadows get baked permanently into the model, unfixable at the 3D stage.
Dynamic pose (running, swinging a sword) → the generator can't read standard proportions, limbs fuse.
Too much perspective → warped proportions, big head, small feet.
Each view generated independently → front/side/back aren't the same person and won't fit together later.
Props / ground in the background → the generator builds the background into the mesh too.
Brute-forcing an ultra-complex character in one go → details mush together; break out parts when you should.
Stopping once you roll something pretty → no silhouette test done, good-looking ≠ readable.
/ 06 · This section's deliverables

You've passed only when these are ticked

1 finalized front view · A-pose / neutral light / pure white bg / orthographic
At least 1 back view · same character, consistent colors / gear / proportions
Ideal: + side view + 3/4 view · the more complete the views, the more accurate Module 02's generation
Complex characters: a few part breakouts · weapon / helmet / cape, pure white bg, orthographic
Everything named and archived into 01_concept · this is the input package for the next step
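The checklist above maps directly onto the file names in `01_concept`. A minimal sketch, assuming the `char_<view>` / `part_<name>` convention from step 1.6; `check_deliverables` is hypothetical, and treating side / 3/4 as "suggested" rather than "missing" follows the "Ideal" wording.

```python
# Sketch: check the contents of 01_concept against the deliverables checklist.
# Required: char_front and a back view; part breakouts are required only for
# complex characters; side / 3q are "ideal". Names follow char_<view> / part_*.
def check_deliverables(filenames: list[str],
                       complex_character: bool = False) -> dict[str, list[str]]:
    stems = {name.rsplit(".", 1)[0] for name in filenames}  # drop extensions
    missing: list[str] = []
    suggested: list[str] = []
    for required in ("char_front", "char_back"):
        if required not in stems:
            missing.append(required)
    for ideal in ("char_side", "char_3q"):
        if ideal not in stems:
            suggested.append(ideal)
    if complex_character and not any(s.startswith("part_") for s in stems):
        missing.append("part_* breakouts")
    return {"missing": missing, "suggested": suggested}


print(check_deliverables(["char_front.png", "char_back.png", "char_side.png"]))
```

In practice you might feed it `[p.name for p in Path("01_concept").iterdir()]`; an empty `missing` list means the required boxes are ticked.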
/ 07 · Self-check before the next step

🐾 Three questions — pass all to proceed

  1. Turn any image into a black silhouette — can you still recognize the character?
  2. Put front / side / back together — are they the same person (colors / gear / proportions)?
  3. Any harsh shadows, perspective, or busy background not yet cleared?

▸ All three pass → on to Module 02: 3D model generation.