It kinda understands context.
An image generator starts with an image of random noise, similar to the static a TV shows when it has a bad signal. The AI looks at the static and tries to pick out shapes in it, and the prompt influences what it’s trying to “see”. It then gradually fills in the static until it becomes a full image. It does this in steps, and more steps generally mean a better-quality image, up to a point.
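If it helps, here’s a very rough toy sketch in Python of the “start from static, clean it up in steps” idea. It is not a real image generator: there’s no neural network, and the `target` list is a made-up stand-in for whatever the prompt steers the model to “see”. Real diffusion models use a trained network to predict the noise at each step and follow a noise schedule, not the fixed `strength` fraction assumed here.

```python
import random


def toy_denoise(target, steps, strength=0.3, seed=0):
    """Toy illustration of step-by-step denoising (NOT a real diffusion model).

    `target` is a hypothetical stand-in for the image the prompt steers toward.
    A real model would *predict* the clean image with a neural network; this
    toy just uses the target directly as its "prediction".
    """
    rng = random.Random(seed)
    # Start from pure static: one random value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        predicted = target  # hypothetical: what the model "sees" in the static
        # Move each pixel a fixed fraction of the way toward the prediction.
        image = [x + strength * (p - x) for x, p in zip(image, predicted)]
    return image


def error(image, target):
    """Mean squared difference between the current image and the target."""
    return sum((x - t) ** 2 for x, t in zip(image, target)) / len(target)


target = [0.0, 0.5, 1.0, 0.5, 0.0]  # a tiny made-up 5-pixel "image"
for steps in (1, 5, 20):
    print(steps, round(error(toy_denoise(target, steps), target), 4))
```

Running it prints the error shrinking as the step count goes up, which is the “more steps, better image” idea in miniature. In this toy, as with real models, the improvement per extra step gets smaller the more steps you already have.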
Also worth saying: an LLM (Large Language Model) is different from an image generator, though the underlying ideas behind them are related.
That’s why I switch my phone browser into desktop mode. It can be annoying, but it works most of the time.