What is Generative AI


Generative artificial intelligence (AI) refers to algorithms, such as ChatGPT, that create new content. It builds on machine learning, a subfield of AI in which computers learn patterns from examples rather than following explicit rules. Generative models, typically neural networks, are trained on large and varied collections of data, such as images and text, to produce realistic artifacts that reflect the characteristics of the training data without simply copying it. Generative AI can produce many kinds of content, including images, video, music, speech, text, software code, and product designs.

What Powers Generative AI?

Foundation models, complex machine learning systems trained at scale on vast amounts of data, have enabled a new class of generative AI applications. With recent advances, companies can take large language models (LLMs) trained on natural language and build specialized models for generating text and images on top of them. These models have the potential to transform how we create content and to improve the user experience overall.

What are Large Language Models?

Large language models (LLMs) are AI systems that work with language, building compact but useful digital representations of it. The word "large" refers to the trend of training language models with ever more parameters, since adding data and computing power tends to improve performance. OpenAI's GPT-4, Google's PaLM, and Meta's LLaMA are all examples of LLMs, although it is not always clear whether a given product should be called an LLM itself or described as being powered by one. "LLM" is the most specific of these terms, and AI practitioners often use it to refer to systems that work with language. Even so, there is no firm boundary around what counts as a language model, and no agreed threshold for how big a model must be to be considered "large."

What are Foundational Models?

Researchers at Stanford University popularized the term "foundation model" to describe AI systems with broad capabilities that can be adapted for specific purposes, in contrast to AI systems that are trained and used for a single narrow task. LLMs are examples of foundation models: ChatGPT, for instance, was built on an LLM called GPT-3.5, which OpenAI then specialized for chatbot settings with a fine-tuned version of the model. "Foundation model" is often used synonymously with "large language model," since language models are currently the clearest example of broadly capable systems that can be adapted for specific purposes. However, "foundation model" is meant as a broader, function-based concept that could accommodate new types of systems in the future.

Types of Generative AI

Generative AI covers a range of machine learning and deep learning techniques, including:

Transformer Models

Transformer models are neural networks that learn context by tracking relationships in sequential data, such as the words in a sentence. They are widely used for natural language processing (NLP) tasks. Rather than processing a sequence one step at a time, a Transformer applies a small, constant number of processing steps, using a self-attention mechanism to model how all the words in a sentence relate to one another. In an English-to-French translation model, for example, a Transformer can learn in a single step to attend to the word "river" and determine that the word "bank" refers to the edge of a river.

To compute the next representation of a word, the Transformer compares it to every other word in the sentence, producing an "attention score" for each pair. These scores determine how much each word should contribute to the next representation of "bank." In the example, "river" could receive a high attention score when the new representation of "bank" is computed. The attention scores are then used as weights in a weighted average of all the words' representations, which is fed into a fully connected network to produce a new representation of "bank" that reflects the fact that the sentence is about a river bank.
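The scoring-and-averaging step described above is scaled dot-product attention. A minimal single-head sketch in NumPy follows; the token count, dimensions, and random projection matrices are illustrative placeholders, not values from any real model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Attention scores: how strongly each word attends to every other word.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    # New representations: weighted average of all words' value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4          # e.g. 5 tokens: "I sat by the bank"
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)   # (5, 4): one new d_k-dimensional representation per token
```

In a full Transformer this head is repeated in parallel (multi-head attention) and followed by the fully connected network mentioned above.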

Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) are neural networks that pit two models against each other: a generator, which creates new content, and a discriminator, which judges it. The generator learns to produce increasingly realistic content that can fool the discriminator, while the discriminator learns to distinguish generated content from real content. GANs are the technology behind deepfakes, fake videos or images of real people appearing to say or do things they never said or did. But GANs also have enormous potential in legitimate business applications such as product design, art, and content creation.

A GAN has two parts: the generator and the discriminator. The generator learns to produce plausible data, and its outputs serve as negative training examples for the discriminator. The discriminator learns to distinguish the generator's fake data from real data and penalizes the generator for producing implausible results. As training proceeds, the generator gets progressively better at producing output that can fool the discriminator.

The generator and the discriminator are both neural networks, and the generator's output feeds directly into the discriminator's input. Through backpropagation, the generator updates its weights based on how the discriminator classifies its output.
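The two-player training loop can be sketched with a deliberately tiny GAN: a two-parameter generator and a logistic-regression discriminator on one-dimensional data, small enough that the backpropagation gradients can be written out by hand. The target distribution, learning rate, and model shapes here are illustrative choices, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 0.5, batch)     # real data: N(3, 0.5)
    z = rng.normal(size=batch)
    fake = a * z + b                       # generated samples

    # Discriminator step: maximize log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean((d_real - 1) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(d_real - 1) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: minimize -log D(fake) (the non-saturating loss),
    # backpropagating through the discriminator into each sample.
    d_fake = sigmoid(w * fake + c)
    dloss_dx = -(1 - d_fake) * w
    a -= lr * np.mean(dloss_dx * z)
    b -= lr * np.mean(dloss_dx)

print(round(b, 2))   # b should drift toward the real mean of 3.0
```

The generator is never shown the real data directly; its only training signal is the discriminator's verdict, which is exactly the adversarial setup described above.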

Variational Autoencoders (VAEs)

Variational autoencoders (VAEs) are a probabilistic take on traditional autoencoders, which find patterns in a dataset by compressing the data into a lower-dimensional space and learn to generate new data by sampling from that compressed space. VAEs are called "autoencoders" because, like traditional autoencoders, they consist of an encoder and a decoder. Rather than mapping each input to a single point, a VAE's encoder maps the input to the parameters of a probability distribution, such as the mean and standard deviation of a Gaussian. This yields a continuous, structured latent space that can be sampled to generate new images. VAEs are also generally simpler to implement than sparse or denoising autoencoders.

VAEs are used in many applications, including medical imaging, facial recognition systems, natural language processing, and product design. A trained VAE can even generate lifelike images from arbitrary noise: VAE-based systems have made substantial progress in recent years and have been used to generate realistic human faces from randomly sampled points in the latent space.
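The step where the encoder's distribution parameters become a latent sample is usually implemented with the "reparameterization trick," and the predicted Gaussian is pulled toward a standard normal by a closed-form KL divergence term in the loss. A minimal sketch in NumPy, with a toy batch of encoder outputs standing in for a real encoder network:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, eps ~ N(0, I).

    Writing the sample this way keeps mu and log_var differentiable in a
    real training framework; eps carries all of the randomness.
    """
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Toy encoder output for a batch of 4 inputs and a 2-D latent space.
mu = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.5], [2.0, 0.0]])
log_var = np.zeros_like(mu)          # sigma = 1 everywhere

z = reparameterize(mu, log_var)      # latent samples to feed the decoder
kl = kl_to_standard_normal(mu, log_var)
print(kl)   # first entry is 0.0: that input's N(0, I) matches the prior exactly
```

After training, new images are generated by drawing z directly from N(0, I) and running only the decoder, which is why the continuous, structured latent space matters.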
