About a year ago, during a meeting on the topic of AI, I was challenged by a number of senior members of staff about the security of my job. The basic theme was that with AI, no one would need a visual designer anymore. There was an almost celebratory undertone to the chat. Or perhaps I was being over-sensitive.

I am a sceptic of AI; I think it is hype. Nevertheless, I needed to take a long, hard look at the emerging technologies, in particular those affecting the visual arts, so that I could better understand what sort of enemy I was dealing with.

The process felt cumbersome at first, with the major players appearing to be a suite of platforms, such as Midjourney, that I was not familiar with, all of which seemed to be a front for a larger corporate enterprise. Add to this a parallel discussion regarding copyright infringement, and I soon wondered what variant of dystopia I had wandered into!

It was also a tedious experience. Prompt engineering, as one of those senior members of staff had labelled it, was a blunt tool when it came to specifying what I wanted to make. It appears you start simple and develop your work by adding parameters to your prompt in a curious, haiku-esque poetic ritual. However, the output from these major platforms lacked many of the attributes that I consider prerequisites for professional artwork. Further, I could not claim the work to be mine due to the somewhat ambiguous data ingestion processes that leave doubt as to who owns the work!

I set myself a challenge: could I automate the creation of an image that complies with the brand requirements of my employer? I settled on using Automatic1111, a web UI for Stable Diffusion, installed the necessary Python libraries and watched hours of YouTube videos by people who had already trodden this path. Automatic1111 even worked offline, which I found surprising, and this is when I started to re-evaluate my thoughts on the technology.
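For the curious, the offline check was as simple as asking the web UI's local API which checkpoints it had installed. This is a minimal sketch, assuming the UI is launched with the --api flag; the endpoint belongs to Automatic1111, the few lines of Python are mine:

```python
import requests

# Automatic1111's web UI exposes a local REST API when launched with --api.
# Listing the installed checkpoints confirms it is answering entirely from
# the local machine, with no outbound connection required.
BASE_URL = "http://127.0.0.1:7860"

models = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=10).json()
for model in models:
    print(model["model_name"])
```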

My brief was to enable a visitor to one of the platforms I have designed to auto-generate a project image. The image would be on-brand and generated using a prompt composed of a variable – in this instance: “Researchers, how important is public outreach? What is the motivation for researchers to engage in public outreach?” – and some common denominators derived from boilerplate segments. The problem I was trying to solve was that some users of this platform, being short on time, neglect to choose a good hero image for their project. What if AI could help?
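Roughly, the plumbing looks like the sketch below: stitch the variable and the boilerplate segments into one prompt, post it to the local Automatic1111 API, and save the result. The /sdapi/v1/txt2img endpoint is the web UI's own; the boilerplate wording, sizes and settings here are illustrative assumptions rather than my production values.

```python
import base64
import requests

BASE_URL = "http://127.0.0.1:7860"  # local Automatic1111 web UI, launched with --api

# The variable part of the prompt: the visitor's project title.
project_title = ("Researchers, how important is public outreach? "
                 "What is the motivation for researchers to engage in public outreach?")

# Common denominators derived from boilerplate segments (illustrative wording).
boilerplate = ["photographic style", "natural daylight", "university campus setting"]

payload = {
    "prompt": ", ".join([project_title] + boilerplate),
    "negative_prompt": "text, watermark, logo",
    "width": 1024,
    "height": 576,
    "steps": 30,
}

response = requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=300).json()

# Images come back base64-encoded; save the first as the candidate hero image.
with open("hero_candidate.png", "wb") as f:
    f.write(base64.b64decode(response["images"][0]))
```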

Example of a poor choice of hero image for the project: “Researchers, how important is public outreach? What is the motivation for researchers to engage in public outreach?”

Early results were terrible. However, by making single-word changes to the boilerplate prompt segments, then learning about and integrating emphasis techniques into my prompts, I was able to coerce better outcomes.
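The emphasis techniques turn out to be Automatic1111's attention syntax: wrap a term in parentheses with a weight to push it forward, or in square brackets to play it down. A small illustrative helper, with made-up terms and weights:

```python
def emphasise(term: str, weight: float = 1.1) -> str:
    """Wrap a prompt term in Automatic1111's (term:weight) attention syntax."""
    return f"({term}:{weight})"

# Boost the segments the model kept ignoring; leave the rest at normal weight.
segments = [
    emphasise("public engagement event", 1.3),
    emphasise("audience listening", 1.2),
    "natural daylight",
    "[studio lighting]",  # square brackets reduce a term's influence
]
print(", ".join(segments))
# (public engagement event:1.3), (audience listening:1.2), natural daylight, [studio lighting]
```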

An early example of a generated image, demonstrating the bad attributes of these outputs.

I had learned about the importance of models at some point along the way, and I began to realise that in order to fully control my outputs I would need to delve deeper. Models, or checkpoints, provide your prompt with visual cues as to what you want. Of course there is a far more technical definition, but from a creative perspective they offer you the ability to train (teach?) your local install of Automatic1111.
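From the scripting side, at least, a checkpoint is simply something you select before generating. A hedged sketch of switching the active model through the same local API, where the checkpoint filename is a placeholder for whatever you have installed:

```python
import requests

BASE_URL = "http://127.0.0.1:7860"

# Point the web UI at a different checkpoint via the options endpoint.
# "my-brand-checkpoint.safetensors" is a placeholder name, not a real model.
requests.post(
    f"{BASE_URL}/sdapi/v1/options",
    json={"sd_model_checkpoint": "my-brand-checkpoint.safetensors"},
    timeout=60,
)
```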

This was when I began to get hooked. I had to learn a new Python tool: Kohya_SS. I followed tutorials and learned how to create captions, run Dreambooth and compile a safetensor file. I downloaded all my employer's brand images and compiled my own safetensor.
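Most of that work was preparation. Kohya_SS expects a Dreambooth-style folder of images, each paired with a .txt caption file; the sketch below shows the captioning step, with a hypothetical folder name, trigger word and caption wording standing in for my employer's actual material:

```python
from pathlib import Path

# Kohya_SS Dreambooth-style training reads images from a folder named like
# "20_brandstyle" (repeat count + class name) and pairs each image with a
# .txt caption file of the same name.
image_dir = Path("training/img/20_brandstyle")
trigger_word = "brandstyle"  # hypothetical token used to invoke the trained style

for image_path in sorted(image_dir.glob("*.jpg")):
    caption = f"{trigger_word}, photograph, campus scene, natural light"
    image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
    print(f"captioned {image_path.name}")
```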

The same image, now benefitting from input from a safetensor compiled using on-brand imagery.

Instantly I could see a difference in the images I created. Colours were brighter, compositions were better formed and the quality improved dramatically. However, the images looked like they were from the USA, not the UK. It is very difficult to articulate, but in short there was an exotic quality that was unwanted for my challenge, so the images still failed the test.

Another frustration was that the images I was outputting did not fulfil the criteria of being inclusive to all, which is vitally important. So I investigated making a LoRA file. LoRA (Low-Rank Adaptation) is a fine-tuning technique; LoRA files are much smaller than checkpoints and contain very specific visual information and captions.

Applying a LoRA in conjunction with the much larger safetensor file finally enabled me to create an on-brand image, built from the variable and common prompt segments, and composed of individuals of a variety of ages, genders and ethnicities, using material I had ingested from copyright-safe libraries.
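In Automatic1111 the LoRA is invoked from the prompt itself with a <lora:name:weight> tag, layered on top of whichever checkpoint is active. A final illustrative composition, with hypothetical names and weights:

```python
# The LoRA is referenced directly in the prompt text; the larger safetensor
# checkpoint stays active as set earlier. Names and weights are placeholders.
project_title = ("Researchers, how important is public outreach? "
                 "What is the motivation for researchers to engage in public outreach?")

prompt = ", ".join([
    "<lora:graduation_people:0.8>",   # hypothetical LoRA from copyright-safe imagery
    project_title,
    "(diverse group of people:1.2)",  # emphasis to keep the image inclusive
    "photographic style, natural daylight",
])
print(prompt)
```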

A more successful output, generated using a LoRA that was compiled with pictures of graduation ceremonies. Still not sure if the hands are correct. Hey-ho!

I have not yet verified my claims. I may be wrong, and I may have accidentally infringed on someone's copyright. If this is so, apologies. At the time of writing I am embarking on a validation process. Nevertheless, what I have learned is that creating images using latent diffusion can be as exciting as when I first encountered Adobe tools 30-odd years ago.

Latent diffusion is an entirely different method from the more familiar digital art-working methods, but with the help of a design approach it can be a very rewarding form of artistic engagement, and despite what some senior members of staff think, I suspect creative people will discover more, not fewer, ways to work and enjoy their practice.
