# Wait, isn't this like a developers meetup?


The algorithm behind DeepArt.io and Prisma

# What is neural artistic style transfer?

[Image: content photo + style painting = stylized output. Source: arXiv]

# What we'll learn today (hopefully)

1. How is the problem formulated?
2. What computers have to learn to solve this problem
3. Code! 💻


# Step 1: Formulate the Problem

#### What we want

Buildings, trees, river...

#### What we don't want

Exact colors, textures...

# Step 1: Formulate the Problem

#### What we want

Colors (lots of blue? Some hint of yellow?), textures (the famous Van Gogh thick brush strokes)...

#### What we don't want

Objects in the painting (mountains, houses)

# Step 1: Formulate the Problem

#### What we want

A new picture whose content is closest to our Content Image and whose style is closest to our Style Image

# Step 1: Formulate the Problem

#### How to measure "close"

Euclidean distance!
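As a minimal sketch (plain NumPy; the array values are made up for illustration), the Euclidean "closeness" between two images boils down to summing squared pixel differences:

```python
import numpy as np

def squared_euclidean_distance(a, b):
    """Sum of squared element-wise differences between two arrays."""
    return float(np.sum((a - b) ** 2))

# Two tiny "images": identical arrays have distance 0,
# and every differing pixel increases the distance.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0, 2.0], [3.0, 5.0]])
```

Here `squared_euclidean_distance(x, x)` is 0, while `x` vs. `y` gives 1 (one pixel differs by 1).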


# Step 1: Formulate the Problem

#### ... but I have more questions. How can the computer even?!?!?!?

We now know what to do, but how can the computer accomplish all these things?

What does the computer need to do? How does the computer know which one is the content? How does the computer learn which one is the style? I HAZ SO MANY QUESTIONS.

# A (very) crash course on neural networks

• For images, conventional fully connected neural networks are computationally expensive (even a tiny 30x30 image already needs 900 inputs!)

# A (very) crash course on convolutional neural networks

Source: MathWorks

# VGG

• One kind of ConvNet architecture
• Winner of the 2014 ImageNet challenge
• Used in this paper to extract content and style from the input images
• Find out more

# How can computers see the content?

• Higher layers detect higher-level features
• ... and are therefore good layers to extract our content from!
• We'll lose the exact pixel information, but that's OK (in fact, we don't need it)

# How close is our generated image to our content image?

We calculate the Euclidean distance between the corresponding feature map of our content image and feature map of our generated image.
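A sketch of that content loss in NumPy, assuming `F` and `P` are same-shaped feature maps of the generated and content images at one layer (the ½ factor follows Gatys et al.; the variable names are mine):

```python
import numpy as np

def content_loss(F, P):
    """Half the squared Euclidean distance between the generated
    image's feature map F and the content image's feature map P."""
    return 0.5 * float(np.sum((F - P) ** 2))
```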

# How close is our generated image to our style image?

It's not as straightforward, but it's OK!

The Euclidean distance still comes in handy, but instead of calculating the distance between the feature maps themselves, we'll calculate the distance between the Gram matrices of the feature maps.

# Gram matrices-a-what?

Refresher: Gram matrix = a matrix multiplied by its transpose

We'll be looking at the correlations between feature responses in an image. In each layer, we take the inner product of every pair of (flattened) feature maps.

Another way to think of it: the spatial information of our image gets washed out, because every entry of the Gram matrix sums over all positions in the feature maps; only the correlations between channels survive.
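A sketch of the Gram matrix computation in NumPy, assuming the layer's activations come as a (channels, height, width) array; that shape convention is an assumption, not from the paper's code:

```python
import numpy as np

def gram_matrix(feature_maps):
    """feature_maps: array of shape (channels, height, width).
    Flattens each channel's map, then takes the inner product of
    every pair of maps, summing over all spatial positions."""
    C = feature_maps.shape[0]
    F = feature_maps.reshape(C, -1)  # (channels, positions)
    return F @ F.T                   # (channels, channels), symmetric
```

Because each entry sums over all spatial positions, the result keeps the channel correlations but forgets where in the image each response occurred.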

# Wrapping everything up

We can adjust the parameters, depending on our preference (more style? More content?)
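That preference is just a weighted sum of the two losses, in the spirit of the paper's α (content weight) and β (style weight); the default weights below are illustrative, not the paper's exact values:

```python
def total_loss(content_l, style_l, alpha=1.0, beta=1000.0):
    """Weighted sum of content and style losses.
    A larger beta/alpha ratio means more style, less content."""
    return alpha * content_l + beta * style_l
```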

# Now what?

We need to run an optimization that minimizes the loss iteratively. Each iteration nudges our initial random image a step closer to a generated image with minimum loss.

The paper uses an optimization algorithm called Limited-memory BFGS (L-BFGS).
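As a dependency-free sketch of what "minimize the loss iteratively" means, here is plain gradient descent (not L-BFGS) on a stand-in quadratic loss; in the real setup, the loss and gradient would come from the network:

```python
import numpy as np

target = np.array([1.0, 2.0, 3.0])  # stand-in for an image with zero loss

def loss_and_grad(x):
    """Toy loss: squared distance to target, plus its gradient."""
    diff = x - target
    return np.sum(diff ** 2), 2.0 * diff

x = np.zeros(3)  # stand-in for the initial random image
for _ in range(200):
    loss, grad = loss_and_grad(x)
    x -= 0.1 * grad  # step against the gradient
```

After enough steps, `x` has moved from its starting point to (nearly) the loss minimizer; L-BFGS does the same job in far fewer iterations by approximating curvature information.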

# Step 3: Code 💻

#### Jupyter Notebook

This is the annotated version of Gatys et al.'s code. My own implementation was too messy to present, and I didn't have time to tidy it up, unfortunately.

# Is that all?

No! Check these out: