LabNotes
Feb 27, 2026 · 7 min read · Experiments

Character consistency across 34 art styles: what breaks and what holds

Keeping the same child recognizable across styles that range from soft watercolor to Studio Ghibli to pixel art required prompt constraints we didn't expect to need — and revealed a hierarchy of which visual traits survive style transfer.

StoryBook Studio lets parents choose from 34 art styles for their child's book. Each style is genuinely different — not a filter on a base image, but a full generation in that visual language. The product promise is that the same character appears consistently across every page. What we discovered during development is that consistency is not a property of the character description. It is a property of the reference architecture around it.

| trait             | photoreal | watercolor | pixel_art | isometric |
|-------------------|-----------|------------|-----------|-----------|
| hair color        | 94%       | 89%        | 72%       | 68%       |
| face shape        | 87%       | 76%        | 54%       | 51%       |
| eye color         | 91%       | 82%        | 61%       | 58%       |
| skin tone         | 96%       | 91%        | 78%       | 74%       |
| clothing pattern  | 71%       | 54%        | 31%       | 28%       |
| body proportions  | 63%       | 48%        | 22%       | 19%       |
| distinctive marks | 82%       | 67%        | 44%       | 41%       |

Metric: % retained across 50 style transfers · same character seed.
Visual 1. Character trait retention rates by style category. Data: synthetic — based on internal generation testing across 34 styles.
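The retention table in Visual 1 can be collapsed into per-style summaries with a few lines of code. A minimal sketch in Python, using the numbers above (the dictionary layout and function names are ours, for illustration):

```python
# Trait retention rates (%) from Visual 1, keyed by style category.
RETENTION = {
    "photoreal":  {"hair_color": 94, "face_shape": 87, "eye_color": 91,
                   "skin_tone": 96, "clothing_pattern": 71,
                   "body_proportions": 63, "distinctive_marks": 82},
    "watercolor": {"hair_color": 89, "face_shape": 76, "eye_color": 82,
                   "skin_tone": 91, "clothing_pattern": 54,
                   "body_proportions": 48, "distinctive_marks": 67},
    "pixel_art":  {"hair_color": 72, "face_shape": 54, "eye_color": 61,
                   "skin_tone": 78, "clothing_pattern": 31,
                   "body_proportions": 22, "distinctive_marks": 44},
    "isometric":  {"hair_color": 68, "face_shape": 51, "eye_color": 58,
                   "skin_tone": 74, "clothing_pattern": 28,
                   "body_proportions": 19, "distinctive_marks": 41},
}

def mean_retention(style: str) -> float:
    """Average retention across all tracked traits for one style."""
    traits = RETENTION[style]
    return sum(traits.values()) / len(traits)

def worst_trait(style: str) -> str:
    """The trait that survives style transfer least often."""
    return min(RETENTION[style], key=RETENTION[style].get)
```

Note that `worst_trait` returns `body_proportions` for every one of the four style categories, which is the pattern the next section unpacks.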

Why proportions break first

Broad physical traits — hair color, skin tone — transfer reliably because they are high-level semantic signals that most style conditioning respects. Proportions are the first casualty. A character described as a small child with a round face generates as a lanky pre-teen in some styles because the style's internal "default child" overrides the description's proportional signals.

The fix is not more description. More words about height and face shape don't help when the style model has strong prior expectations. The fix is a structured reference sheet: a front view, back view, and face close-up generated before any page work begins. The model anchors to the image, not the text.

character_pipeline:
  step_1: text_description → concept_image
  step_2: concept_image → reference_sheet (front_view + back_view + face_closeup)
  step_3: reference_sheet + style + scene_prompt → page_image

without step_2:
  proportion_drift: high
  face_stability: medium

with step_2:
  proportion_drift: low
  face_stability: high
Visual 2. Reference sheet architecture. Skipping the intermediate step degrades consistency by roughly 40% on face stability metrics.
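The three-step pipeline can be sketched in code. This is an illustrative skeleton, not our production system: the function names are hypothetical and the model calls are stubbed with strings standing in for image data.

```python
from dataclasses import dataclass

@dataclass
class ReferenceSheet:
    front_view: str    # image handle; a string stands in for real image data
    back_view: str
    face_closeup: str

def generate_concept(description: str) -> str:
    """Step 1: text description → concept image (stubbed model call)."""
    return f"concept({description})"

def generate_reference_sheet(concept: str) -> ReferenceSheet:
    """Step 2: one concept image → three canonical views."""
    return ReferenceSheet(
        front_view=f"front[{concept}]",
        back_view=f"back[{concept}]",
        face_closeup=f"face[{concept}]",
    )

def generate_page(sheet: ReferenceSheet, style: str, scene: str) -> str:
    """Step 3: every page conditions on the sheet, not the raw text."""
    return f"page(style={style}, scene={scene}, anchor={sheet.face_closeup})"

# All 34 styles anchor to the same sheet, so character identity stays
# fixed while only the style conditioning changes per page.
sheet = generate_reference_sheet(generate_concept("small child, round face"))
pages = [generate_page(sheet, s, "bedtime") for s in ("watercolor", "pixel_art")]
```

The design point is in `generate_page`'s signature: the text description never reaches page generation directly, only the sheet does.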

Styles that resist consistency hardest

Not all styles are equal. Photorealistic and semi-realistic styles hold character traits reliably — the model has strong feature-level representations to anchor to. Abstract and heavily stylized modes (pixel art, isometric, flat vector) impose the most aggressive visual transforms and lose the most character-specific detail in the process.

Studio Ghibli-adjacent styles sit in the middle. They preserve face shape and color but frequently alter proportions toward Ghibli's characteristic elongated anatomy. We added a proportional anchor prompt specifically for Ghibli variants.
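A style-specific anchor like this amounts to a small lookup at prompt-assembly time. A minimal sketch, with hypothetical style keys and an illustrative anchor clause (not our exact production wording):

```python
# Styles whose priors pull proportions away from the reference sheet.
# Anchor wording here is illustrative, not the production prompt text.
PROPORTION_ANCHORS = {
    "ghibli_classic": "keep child-like proportions: large head-to-body ratio, short limbs",
    "ghibli_soft":    "keep child-like proportions: large head-to-body ratio, short limbs",
}

def scene_prompt(style: str, base_prompt: str) -> str:
    """Append a proportional anchor only for styles that need one."""
    anchor = PROPORTION_ANCHORS.get(style)
    return f"{base_prompt}, {anchor}" if anchor else base_prompt
```

Keeping the anchors in a per-style table, rather than baking them into the character description, means most styles pay no prompt-length cost.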

Visual 3. Composite character retention score (face + proportion + color) across style categories, ordered from realistic (left) to highly abstract (right). Data: synthetic — based on 50 generations per style category.

What we ship as a result

Every character in StoryBook Studio now goes through a mandatory reference sheet generation step before page creation begins. The sheet is shown to the parent during the character builder flow — it is the confirmation moment before the book is assembled. Parents see the front, back, and face of their child's character and approve it. This also functions as a natural point to catch generations that missed the mark before they propagate through a full book.

  • Reference sheets reduced page-level face inconsistency by 38% in internal testing.
  • Parents who saw the reference sheet before proceeding had significantly lower book abandonment rates.
  • The brainstorm-from-scratch path and the upload-a-photo path both feed into the same reference sheet system.
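The two entry paths converging on one system can be sketched as a single normalization step. Function names and the string payloads are hypothetical, for illustration only:

```python
def build_character(source: str, payload: str) -> str:
    """Both entry paths normalize to one concept image before the
    mandatory reference sheet step, so downstream logic is identical."""
    if source == "photo":
        concept = f"stylized_from_photo({payload})"   # upload-a-photo path
    elif source == "description":
        concept = f"generated({payload})"             # brainstorm path
    else:
        raise ValueError(f"unknown source: {source}")
    return f"reference_sheet({concept})"              # shared mandatory step
```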

The underlying lesson: character consistency in multi-page generative content is an architecture problem, not a prompting problem. You cannot describe your way to stability. You need an intermediate representation that all downstream generations can anchor to.