You have probably looked at an AI-generated brand visual and felt something slightly off about it, even if you couldn’t say what. The image is sharp. The colours are on-brand. But it still reads as a composition rather than a scene: assembled rather than real.

This is one of the most common complaints brand teams have about AI creative work, and it is usually blamed on the tools. The tools are not the problem.

What actually separates a convincing visual from a flat one

Take the floating UI trend as a concrete example. A UI element (a phone screen, a dashboard card, a product interface) floats in front of a background image. Done poorly, it looks exactly like what it is: a PNG dropped onto a photograph. Done well, it looks physically present in the scene, casting a faint reflection and catching the light from wherever the scene’s light source sits.

The difference between those two outcomes comes down to two decisions that have to be made before a single prompt is written:

Where is the light source in the scene, and how does it interact with the UI element?
Is the UI being treated as a glass or semi-transparent object with real depth, or as an opaque flat layer?

Treating the UI like glass rather than a flat overlay is a visual judgement call, not a technical one. Any image generation tool can execute it, yet very few people working with those tools think to ask the question.

Why most AI visual work skips this step

The workflow for producing AI visuals has become fast enough that there is very little pressure to slow down and make deliberate decisions before generating. You can produce fifty images in an afternoon. The incentive structure points toward volume.

The problem is that fifty images generated without a defined light source and depth logic are fifty versions of the same mistake. They may be different images, but they share the same tell: elements that look placed rather than present.

This is a systems problem more than a skill problem. When a creative director works on a campaign, they establish a set of rules at the start: where light comes from, how surfaces behave, what the depth grammar of the visual language is. Those rules then apply to every asset in the campaign. The output feels coherent because it was built from consistent decisions, not because each image was individually perfected.

Most one-off AI creative work skips that stage entirely.

What a more considered approach actually looks like in practice

For a single hero visual, the process looks like this: start with a base image in which the foreground subject and the background are clearly separated. Define the light source in the scene before writing any prompt. When introducing a UI element or any other floating object, describe its material properties explicitly: whether it is transparent, how it reflects ambient light, and where its edges catch or soften that light. Then let the animation or compositing work from those same rules.

The result is an image in which the eye doesn’t catch the seam, and the UI looks like it belongs in the scene rather than something added after the fact.

For a brand running ongoing content, the same principle applies at a larger scale. A visual system with defined depth logic, lighting grammar, and material rules produces thirty campaign assets that look like they came from the same creative mind. That is the difference between a library of generated images and an actual visual identity.

The tools to do this are accessible. The judgement to know which decisions to make before reaching for them is rarer, and that judgement is where the real work lies.