Can AI be a good artist?
Neural Style Transfer
For a human being, creating a master-class painting or becoming an artist takes a huge amount of time, skill, and dedication. In the era of Artificial Intelligence, can we create an AI program that enables us to create art? The answer is "yes", and it is all possible thanks to the Convolutional Neural Network (CNN). A Convolutional Neural Network is a type of deep learning algorithm inspired by the human visual cortex. If you want a detailed explanation of Convolutional Neural Networks, comment down below or ping me. Convolutional Neural Networks are the building blocks of Neural Style Transfer.
Before we get into the nitty-gritty of Neural Style Transfer, here are some examples. I believe motivation is a key factor in successful learning.
Lovely, isn't it?
Now that you are motivated, let's try to understand what Neural Style Transfer is.
Neural Style Transfer is the “process of using deep neural networks to migrate the semantic content of one image to different styles”.
In simple words, neural style transfer is an optimization technique that takes two images, a content image and a style reference image (such as an artwork by a famous painter), and blends them together so the output image looks like the content image, but “painted” in the style of the style reference image.
For example:
In this example the content image is on the left-hand side, followed by the style reference image, and then what neural style transfer can produce. If you want another style, just replace the style reference image and run the application again. Classical art, abstractionism, impressionism, you name it. Isn't it amazing?
You can stop here if you are not interested in the math behind neural style transfer. Here is a link to an online style transfer service: Deep Art
Math behind neural style transfer
To get a better understanding of neural style transfer, we have to visualize what the layers in a Convolutional Neural Network (CNN) are learning. A huge shoutout to the research paper Visualizing and Understanding Convolutional Networks; most of the examples here are taken from that paper.
Let's say you trained a convnet on images and you want to visualize what the hidden layers in the network are computing. One way is to visualize what the layers are learning, i.e., what activates them:
♦ Pick a unit in layer 1. Find the nine image patches that maximize the unit's activation.
♦ Repeat for other units. In shallow layers (i.e., layer 1 to layer 2), units look for simple features.
♦ In deeper layers, units look for complicated features.
This gives us a good intuition of what shallow and deeper layers are computing.
To build a neural style transfer, let's define a cost function that we can minimize using optimizers like gradient descent to get the output image :D
Let's call the content image “C”, the style image “S”, and the generated image “G”.
From the above definition of NST we know that we want to blend C with S to create G; conceptually: $ C + S = G $
The cost function for the generated image would be: $ J(G) = \alpha * J_{content}(C,G) + \beta * J_{style}(S,G) $
J(G) measures how good a particular generated image is; we will use gradient descent to minimize it in order to generate the image.
We will break the equation into two parts:
1. Content Cost - $J_{content}(C,G)$, which takes the content image and the generated image and measures how similar the content of image G is to the content of image C.
2. Style Cost - $J_{style}(S,G)$, which takes the style image and the generated image and measures how similar the style of image G is to the style of image S.
$ \alpha, \beta $ are the weights of content cost and style cost respectively.
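As a tiny sketch, the final cost is just this weighted sum; the default `alpha` and `beta` values below are illustrative assumptions, not prescribed values:

```python
def total_cost(J_content, J_style, alpha=10.0, beta=40.0):
    """J(G) = alpha * J_content(C, G) + beta * J_style(S, G).

    alpha and beta trade off content fidelity against style strength;
    the defaults here are illustrative assumptions.
    """
    return alpha * J_content + beta * J_style
```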
Let's start by creating the generated image (G)
We can use random noise to create the generated image, so let's initialize G as random noise with parameters such as the width of the image, the height of the image, and the channels (RGB) of the image.
Then we will use gradient descent to minimize J(G): $G := G - \eta \, \frac{\partial J(G)}{\partial G}$, where $\eta$ is the learning rate.
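Here is a minimal TensorFlow sketch of both steps; `compute_J` is a hypothetical helper standing in for the J(G) we build up below, and the image size is an assumption:

```python
import tensorflow as tf

# Initialize G as random noise with the image's height, width,
# and RGB channels (the sizes here are illustrative).
height, width, channels = 224, 224, 3
G = tf.Variable(tf.random.uniform((1, height, width, channels)))

# compute_J is a hypothetical helper that runs G through the network
# and returns J(G) as built up in the sections below.
learning_rate = 0.01

with tf.GradientTape() as tape:
    J = compute_J(G)                    # J(G)
grad = tape.gradient(J, G)              # dJ(G)/dG
G.assign_sub(learning_rate * grad)      # G := G - eta * dJ(G)/dG
```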
Content Cost Function
Content Cost - $J_{content}(C,G)$ takes the content image and the generated image and measures how similar the content of image G is to the content of image C.
Let's say you use hidden layer $l$ to compute the content cost.
If $l$ is a small number (e.g., hidden layer 1), it will force the generated image to have pixel values similar to the content image.
If we use a big number for $l$, it will ask whether a certain object is present in the image: if the object is present in our content image, it will also be present in our generated image.
We will use a pretrained convnet, e.g., the VGG network, for these layers.
Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer $l$ on the respective images.
If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images contain similar content.
So the cost function becomes: $ J_{content}(C,G) = \frac{1}{2} \lVert a^{[l](C)} - a^{[l](G)} \rVert^2 $
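As a sketch, the content cost takes only a few lines of TensorFlow; here `a_C` and `a_G` are the layer-$l$ activations, assumed to be precomputed:

```python
import tensorflow as tf

def content_cost(a_C, a_G):
    """J_content(C, G) = 1/2 * ||a[l](C) - a[l](G)||^2.

    a_C, a_G: activations of hidden layer l for the content image
    and the generated image, e.g. with shape (1, n_h, n_w, n_c).
    """
    return 0.5 * tf.reduce_sum(tf.square(a_C - a_G))
```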
Style Cost Function
Style Cost - $J_{style}(S,G)$ takes the style image and the generated image and measures how similar the style of image G is to the style of image S.
Let's say we are using layer $l$'s activations to measure style.
Style is defined as the correlation between activations across different channels.
The 1st dimension is the height of the image,
the 2nd dimension is the width of the image,
and the 3rd dimension is the channels of the image.
We will denote these dimensions as:
$n_h^{[l]}$ as height,
$n_w^{[l]}$ as width,
$n_c^{[l]}$ as channels.
We will find how correlated the activations are across different channels.
Style Matrix
Let $a_{i,j,k}^{[l]}$ be the activation at $(i,j,k)$, where $i$ = height, $j$ = width, and $k$ = channel.
We will define two different matrices: a style-image matrix and a generated-image matrix.
Style Image Matrix : $ G_{k,k'}^{[l](S)} = \sum_{i=1}^{n_h^{[l]}} \sum_{j=1}^{n_w^{[l]}} a_{i,j,k}^{[l](S)} a_{i,j,k'}^{[l](S)}$
Generate Image Matrix : $ G_{k,k'}^{[l](G)} = \sum_{i=1}^{n_h^{[l]}} \sum_{j=1}^{n_w^{[l]}} a_{i,j,k}^{[l](G)} a_{i,j,k'}^{[l](G)}$
We denote these matrices by $G$ because in linear algebra this is called the “Gram matrix”.
The entry at $(k, k')$ shows how correlated the activations of channels $k$ and $k'$ are. Here, correlation means the unnormalized cross-covariance.
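Here is a minimal sketch of computing the Gram matrix for one layer's activations; the shape convention is an assumption:

```python
import tensorflow as tf

def gram_matrix(a):
    """G[k, k'] = sum over (i, j) of a[i, j, k] * a[i, j, k'].

    a: activations of layer l, shape (n_h, n_w, n_c).
    Returns an (n_c, n_c) matrix of channel cross-correlations.
    """
    n_h, n_w, n_c = a.shape
    # Flatten the spatial dimensions: each column is one channel.
    a_flat = tf.reshape(a, (n_h * n_w, n_c))
    return tf.matmul(a_flat, a_flat, transpose_a=True)
```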
We will combine both of these Gram matrices to create the style cost for layer $l$:
$ J_{style}^{[l]}(S,G) = \frac{1}{\left(2 n_h^{[l]} n_w^{[l]} n_c^{[l]}\right)^2} \sum_{k} \sum_{k'} \left( G_{k,k'}^{[l](S)} - G_{k,k'}^{[l](G)} \right)^2 $
We can generalize this over multiple layers:
$ J_{style}(S,G) = \sum_{l} \lambda^{[l]} \ast J_{style}^{[l]}(S,G) $
where $\lambda^{[l]}$ is the weight given to layer $l$ (tuning these weights yields more pleasing visuals), and the sum over $l$ lets the style cost capture both lower-level and higher-level layers.
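Putting the two formulas above together, a sketch of the per-layer style cost and the weighted sum over layers might look like this; it reuses `gram_matrix` from the previous snippet, and the `lambdas` weights are assumed to be chosen by you:

```python
import tensorflow as tf

def layer_style_cost(a_S, a_G):
    """J_style^[l](S, G) for a single layer l.

    a_S, a_G: layer-l activations for the style and generated
    images, shape (n_h, n_w, n_c).
    """
    n_h, n_w, n_c = a_S.shape
    GS = gram_matrix(a_S)  # Gram matrix of the style image
    GG = gram_matrix(a_G)  # Gram matrix of the generated image
    norm = (2.0 * n_h * n_w * n_c) ** 2
    return tf.reduce_sum(tf.square(GS - GG)) / norm

def style_cost(a_S_layers, a_G_layers, lambdas):
    """J_style(S, G) = sum over l of lambda[l] * J_style^[l](S, G)."""
    return sum(
        lam * layer_style_cost(a_S, a_G)
        for lam, a_S, a_G in zip(lambdas, a_S_layers, a_G_layers)
    )
```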
Final Cost Function
$ J(G) = \alpha * J_{content}(C,G) + \beta * J_{style}(S,G) $
In the end, we will use an optimizer such as gradient descent or Adam to minimize the above cost function.
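Tying everything together, here is a hedged end-to-end sketch. It reuses the functions from the earlier snippets and assumes a hypothetical helper `get_activations(image)` that runs an image through a pretrained VGG network and returns the content-layer activation plus a list of style-layer activations; `content_image` and `style_image` are assumed to be preloaded tensors with pixel values in [0, 1]:

```python
import tensorflow as tf

alpha, beta = 10.0, 40.0             # illustrative cost weights
lambdas = [0.2, 0.2, 0.2, 0.2, 0.2]  # illustrative per-layer weights
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

# get_activations is a hypothetical helper returning
# (content-layer activation, list of style-layer activations).
a_C, _ = get_activations(content_image)  # fixed content activations
_, a_S = get_activations(style_image)    # fixed style activations
G = tf.Variable(content_image)           # or random noise, as above

for step in range(1000):
    with tf.GradientTape() as tape:
        a_G_content, a_G_style = get_activations(G)
        J = alpha * content_cost(a_C, a_G_content) \
            + beta * style_cost(a_S, a_G_style, lambdas)
    grad = tape.gradient(J, G)
    optimizer.apply_gradients([(grad, G)])   # Adam update on the pixels
    G.assign(tf.clip_by_value(G, 0.0, 1.0))  # keep pixels in [0, 1]
```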
Here is a little something for reaching the end of the blog!
We can use multiple style images on a single content image.
Four Style Images
Two Content Images
Generated Images
References:
1. Andrew Ng's deep learning tutorials on Coursera.
2. TensorFlow tutorials
Thank you for reading through my blog, hope you enjoyed it!!