Visual intelligence: the how and why of imagemaking with AI

Artificial intelligence has proved an unlikely success when it comes to creating art. In this final part of his series on AI, TEKenable’s Mohammad Zeeshan Khan looks at the key generative image making tools, how they work, and what the legal implications of their use might be.

Welcome back to our exploration of generative AI. In the previous article I discussed prompt engineering, fine-tuning, and securing AIs. In this final part of my series on gen AI, I will delve deeper into the world of image and video generation, discussing how technologies like DALL-E, MidJourney, and Stable Diffusion are transforming various business domains and impacting creative professions.

Unveiling the magic: the science behind image and video generation

Delving into the captivating world of generative AI, one soon encounters the intriguing subfield of image and video generation. This dynamic domain is dedicated to the creation and transformation of visual content: conjuring images or videos from scratch, or reshaping existing ones.

These models are trained on vast collections of visual data, enabling them to learn the intricate patterns and structures inherent in that data. Armed with this understanding, they can craft new, lifelike, high-quality visual content. This ability to generate and transform visual content is not just fascinating; it opens up a plethora of possibilities in the realm of artificial intelligence.

Let’s delve deeper into the training process of these models:

Generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are the backbone of image and video generation. The training process of these models involves several key steps:

  1. Data Collection: The first step is to gather a large dataset of images or videos. The quality and diversity of this dataset significantly influence the model’s ability to generate realistic visual content.
  2. Preprocessing: The collected data is then preprocessed. This may involve resizing images, normalising pixel values, or converting videos into a series of frames.
  3. Model Architecture Selection: Depending on the specific task, an appropriate model architecture is chosen. GANs, for instance, consist of two parts: a generator that creates new images, and a discriminator that distinguishes between real and generated images.
  4. Training: The model is trained using substantial computational resources. During training, the generator and discriminator of a GAN compete against each other, with the generator striving to create images that the discriminator cannot distinguish from real ones. Over time, the generator becomes increasingly adept at producing realistic images (see the code sketch after this list).
  5. Evaluation and Fine-tuning: The model’s performance is evaluated using various metrics, and the model is fine-tuned based on these evaluations. This step may be repeated several times until the desired level of performance is achieved.
  6. Deployment: Once the model is satisfactorily trained, it can be deployed to generate new images or videos.
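To make the adversarial loop in step 4 concrete, here is a minimal, hypothetical PyTorch sketch of one GAN training step. The network sizes, learning rates, and the random batch standing in for a real, preprocessed dataset are illustrative assumptions, not a production recipe:

```python
# A minimal GAN training step (illustrative sketch, not a production recipe).
# Assumes PyTorch; images are flattened 28x28 grayscale tensors.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28

# Generator: maps random noise vectors to flattened images.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)

# Discriminator: outputs the probability that an image is real.
D = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images: torch.Tensor) -> tuple[float, float]:
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator to separate real from generated images.
    fakes = G(torch.randn(batch, latent_dim)).detach()  # detach: freeze G here
    loss_d = bce(D(real_images), real_labels) + bce(D(fakes), fake_labels)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Train the generator to fool the discriminator.
    loss_g = bce(D(G(torch.randn(batch, latent_dim))), real_labels)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Usage: one step on a random batch standing in for real training data.
print(train_step(torch.randn(32, img_dim)))
```

Over many such steps the generator's samples become harder for the discriminator to reject, which is precisely the dynamic described in step 4.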

It’s important to note that the training process requires a deep understanding of machine learning principles, substantial computational resources, and careful tuning and evaluation. However, the end result – a model capable of generating high-quality, realistic visual content – is well worth the effort.

With that picture of the training process in mind, let’s look at the most popular models used today to generate images and videos.

DALL-E: the cutting edge in image generation

DALL-E, a groundbreaking model in the sphere of image generation, is making waves with its unparalleled capabilities. This state-of-the-art model is celebrated for its proficiency in generating high-resolution images that bear an uncanny resemblance to real photographs.

A striking example of DALL-E’s prowess is its potential application in the fashion industry. Imagine a retailer who wishes to showcase their products in various settings without the logistical challenges of organising multiple photoshoots. With DALL-E, they can generate images of models donning their products in a myriad of settings – from a sunny beach to a bustling cityscape, the possibilities are endless.

But DALL-E’s capabilities extend far beyond simple object generation. It can construct complex scenes, complete with backgrounds and multiple interacting objects.

For instance, it can generate an image of a two-story pink house shaped like a shoe, situated in a lush green field under a clear blue sky – a scene that doesn’t exist in reality but can be brought to life by DALL-E.
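As a hedged illustration of how such a prompt might be submitted programmatically, here is a minimal sketch using the OpenAI Python SDK; the model name, image size, and result handling reflect the public Images API rather than anything prescribed in this article:

```python
# Hypothetical sketch: generating the shoe-shaped house with the OpenAI
# Images API. Assumes the openai SDK and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",  # assumed model name
    prompt=(
        "A two-story pink house shaped like a shoe, situated in a lush "
        "green field under a clear blue sky"
    ),
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```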

Moreover, DALL-E can generate fantastical images that defy the laws of physics and biology, such as a floating double-decker bus with wings or a walking pineapple. These whimsical creations can be invaluable for industries like advertising, where capturing the audience’s attention is paramount.

In essence, DALL-E is not just a tool for content creation, but a versatile artist capable of bringing the most imaginative ideas to life. Its ability to generate a wide array of images, from the realistic to the fantastical, marks a significant advancement in the field of image generation.

MidJourney: pioneering the future of video generation

MidJourney is not just another name in generative imagery. Although it is, strictly speaking, an image generator, it has become a pioneer of AI-assisted animation: by extending the principles of image generation across sequences of frames, creators use its outputs to build smooth, strikingly realistic animations.

The true beauty of this workflow lies in its versatility. It can produce videos of varying lengths, from short clips for social media posts to extended sequences for film work. This makes it suitable for a wide range of applications, from advertising and marketing to entertainment and education.

Imagine a film production company looking to create unique animations for its movies. With MidJourney, it can generate landscapes so realistic you can almost feel the breeze, and scenes of objects in motion that capture how light interacts with different surfaces. And it’s not just simple scenes: complex scenarios with multiple interacting elements are possible too. A bustling city street, a serene forest with wildlife, a dramatic space battle – the possibilities are wide open.

Let’s take an example. Suppose the film production company is working on a sci-fi movie and needs a scene depicting an alien planet with strange flora and fauna. Instead of spending countless hours manually creating each element, the team can feed its requirements into MidJourney and animate the resulting frames: alien creatures moving through their natural habitat, exotic plants swaying in an alien wind.

In another scenario, a marketing agency might need a video showcasing a new product. With MidJourney imagery, it can build a video that not only highlights the product’s features but also tells a story, showing the product in various settings and in different hands, all in a seamless and engaging narrative.

MidJourney is revolutionising the way we create and consume visual content. It’s not just a tool; it’s a game-changer, opening up possibilities that were previously out of reach.

Stable Diffusion: revolutionising image and video generation with quality and diversity

In the ever-evolving world of artificial intelligence, Stable Diffusion stands out as a game-changer, taking image and video generation to unprecedented heights.

Stable Diffusion is not just another AI technology. As its name suggests, it is a diffusion model: it learns to reverse a gradual noising process, starting from pure random noise and iteratively denoising it, guided by a text prompt, until a coherent image emerges. Because this denoising happens in a compressed latent space rather than on full-resolution pixels, it is both stable to train and efficient to run, producing images and videos that are visually stunning and rich in detail and variety.

Let’s delve into an example to better understand the potential of Stable Diffusion. Consider a real estate company looking to showcase its properties in the best possible light. With traditional methods, it would have to commission separate architectural and interior design visualisations, and combining them into a realistic image of a fully furnished property can be time-consuming and complex.

Example prompt: “An architectural model of a house with detailed interiors and exterior landscaping; a 5-bedroom villa in the tropics”

This is where Stable Diffusion comes into play. From a single prompt like the one above, it can render the exterior, the landscaping, and the fully furnished interiors together, generating images of finished properties in a fraction of the time. The result is a realistic and detailed representation of the property, complete with furniture, decor, and even lighting.
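As a hedged sketch of how a prompt like the one above could be run locally, here is a minimal example using the open-source Hugging Face diffusers library; the checkpoint, step count, and guidance scale are assumptions, not settings taken from this article:

```python
# Hypothetical sketch: text-to-image with Stable Diffusion via Hugging Face
# diffusers. Requires a CUDA GPU; the checkpoint name is an assumption.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "An architectural model of a house with detailed interiors and "
    "exterior landscaping; a 5-bedroom villa in the tropics"
)
# guidance_scale trades prompt fidelity against diversity; ~7.5 is a common default.
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("villa.png")
```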

But the benefits of Stable Diffusion go beyond just time and effort. The technology provides potential buyers with a more realistic view of the property, enhancing their decision-making process. Instead of having to imagine what the property would look like when furnished, buyers can see it for themselves. This can lead to quicker decisions, higher satisfaction levels, and ultimately, more successful sales for the real estate company.

In another potential scenario, a film production company could use Stable Diffusion to generate characters, environments, and special effects from layered prompts, composing complex scenes with multiple elements interacting with each other and creating a more immersive and engaging viewing experience for the audience.

Example prompt: “Cinematic film still of a futuristic city”

Stable Diffusion is revolutionising the way we create and consume visual content. By enhancing the quality and diversity of images and videos, it is paving the way for a future where the line between the virtual and the real is blurred.

Azure hosting: ensuring safety and security

With Azure’s new hosting capabilities, businesses can now leverage these advanced generative AI models with the same level of AI safety and security as proprietary models. Azure provides robust infrastructure and stringent security protocols, ensuring that your AI operations are secure, reliable, and compliant with regulatory standards.

Azure’s hosting capabilities extend to open-source models like Stable Diffusion, in addition to Azure OpenAI models like DALL-E, allowing businesses to leverage cutting-edge AI technology while maintaining high standards of safety and security. Azure’s infrastructure is designed to handle the computational demands of these models, ensuring smooth and efficient operation. Furthermore, Azure’s security protocols protect your data and AI operations from potential threats, providing peace of mind as you explore the possibilities of generative AI.
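By way of illustration, calling a DALL-E deployment hosted on Azure OpenAI looks much like calling the public API, with Azure-specific endpoint, key, and deployment details; every name below is a placeholder assumption:

```python
# Hypothetical sketch: image generation against an Azure OpenAI resource.
# The endpoint, API version, and deployment name are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

result = client.images.generate(
    model="my-dalle3-deployment",  # the Azure deployment name, not "dall-e-3"
    prompt="Cinematic film still of a futuristic city",
    n=1,
)
print(result.data[0].url)
```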

Understanding the implications of deploying AI tools

Exciting as these possibilities are, it is important to consider the potential implications of their use in commercial projects, not just in the social or economic realms but also in the legal one.

Impact on creative jobs

While generative AI holds immense potential for businesses, it also has significant implications for creative professionals such as writers and artists. On one hand, these technologies can be seen as tools that enhance the creative process. Writers can use AI to generate ideas, plotlines, or even entire drafts, which they can then refine to suit their unique style and vision. Artists can use AI to create initial sketches or explore different colour palettes, freeing up more time to focus on the conceptual and interpretive aspects of their work.

On the other hand, there are concerns about AI potentially replacing creative jobs. However, it’s important to remember that while AI can mimic certain aspects of creativity, it lacks the human touch – the ability to understand and convey emotion, to draw from personal experiences, and to create art that resonates with the human condition. Therefore, rather than replacing creative professionals, AI is more likely to change the nature of their work, with a greater emphasis on refining and interpreting AI-generated content.

The legal issues and considerations of generated content

As we navigate the exciting landscape of generative AI, it’s crucial to address the legal considerations that come with it. The use of AI-generated content raises several legal and ethical questions that are still being explored.

  • Intellectual Property and Copyright

Generative AI models like DALL-E and MidJourney consume vast amounts of data from various sources. This raises questions about data use, attribution, and intellectual property. For instance, if an AI model is trained on copyrighted images or text, who owns the rights to the generated content? This is a complex issue that is currently the subject of several pending U.S. court cases.

  • Data Protection and Privacy

Another legal consideration is data protection and privacy. Generative AI models often require large amounts of data for training. This data may include personal or sensitive information, raising concerns about data privacy and protection.

  • Ethical Considerations

Beyond the legal implications, there are also ethical considerations. AI-driven content generators could potentially be used to spread misinformation, fake news, or misleading content. This could have serious ramifications, such as harming reputations or even inciting violence.

  • Navigating the Legal Landscape

Given these legal and ethical complexities, businesses seeking to leverage generative AI must navigate this landscape carefully. This includes understanding the legal implications of using AI-generated content, implementing robust data protection measures, and ensuring ethical use of this technology.

Microsoft’s Copilot Copyright Commitment

Happily, Microsoft has made a commitment to provide legal protection for customers who use its AI models, such as the Azure OpenAI generative models, against copyright infringement lawsuits [13]. This commitment, known as the Copilot Copyright Commitment, is an extension of Microsoft’s existing intellectual property indemnity support [13][14].

Microsoft’s AI-powered Copilots are changing the way we work, making customers more efficient while unlocking new levels of creativity [13]. However, some customers have expressed concerns about the risk of IP infringement claims if they use the output produced by generative AI [13]. To address this concern, Microsoft has announced that if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, Microsoft will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters built into Microsoft’s products [13][14].

This commitment is based on Microsoft’s belief in standing behind its customers when they use its products [13]. Microsoft believes that if its products create legal issues for its customers, those should be Microsoft’s problem rather than the customers’ [13]. This philosophy is not new for Microsoft: for roughly two decades, it has defended its customers against patent claims relating to its products, and it has steadily expanded this coverage over time [13].

The Microsoft Copilot Copyright Commitment will be effective starting October 1, 2023, and will apply to paid versions of Microsoft commercial Copilot services and Bing Chat Enterprise [14]. It will not extend to any free products, custom-built Copilot services, or consumer products or services, even if identified as a Copilot [14].

This commitment does not change Microsoft’s position that it claims no intellectual property rights in the outputs of its Copilot services [14]. Customers who use Microsoft’s Copilot services and outputs in accordance with the terms and conditions of their commercial licensing agreements and the Product Terms will automatically benefit from this commitment [2].

In conclusion, Microsoft’s commitment to provide legal cover for its Azure generative AI models against copyright lawsuits is a significant step in addressing the legal concerns surrounding the use of AI-generated content. It provides reassurance to customers and encourages the continued use and development of AI technologies [13][14].

While generative AI offers immense potential for businesses, it also presents novel legal and ethical challenges. As we continue to explore this exciting frontier, it’s crucial to navigate these challenges with care and responsibility.

Sources

  1. DALL·E 2 – OpenAI
  2. DALL·E: Creating images from text – OpenAI
  3. Midjourney Video Parameter
  4. Create amazing animated videos using Midjourney AI images – Geeky Gadgets
  5. A guide to creating a short video of your image on Midjourney
  6. Filmmaker Creates Trippy Music Festival Video With Midjourney
  7. How to Write an Awesome Stable Diffusion Prompt – How-To Geek
  8. How to Run Stable Diffusion on Your PC to Generate AI Images
  9. How to use Stable Diffusion to create AI-generated images
  10. Exploring Collaborative Distributed Diffusion-Based AI-Generated …
  11. 15 Mind-Blowing Midjourney Examples with Prompts – Tokenized
  12. More Examples of AI-Generated DALL-E 2 Images – Michael Howe-Ely
  13. Microsoft announces new Copilot Copyright Commitment for customers – Microsoft On the Issues. https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/
  14. Introducing the Microsoft Copilot Copyright Commitment – Microsoft
  15. Microsoft offers legal protection for AI copyright infringement challenges – Ars Technica
  16. Microsoft Sees Low Risk for Customers in AI Copyright Lawsuits – Bloomberg Law
  17. Microsoft says it will protect customers from AI copyright lawsuits – TechXplore
