
Neural style transfer is one of the most visually striking applications of deep learning in computer vision. In this blog, we will build a neural style transfer pipeline from scratch using PyTorch and the VGG19 model. Instead of relying on pre-built libraries or APIs, we will walk through the underlying mechanics: how features are extracted, how style is encoded using the Gram matrix, and how the loss is minimized through gradient descent.
This post is best for readers who want to:
- Understand the role of convolutional neural networks in image representation.
- See how VGG19’s internal layers can be used to separate content and style.
- Learn how the Gram matrix captures artistic style.
- Implement a step-by-step style transfer algorithm with clearly explained code.
By the end, you will understand how neural style transfer works and have your own stylized image – a real blend of content and creativity – along with the code to reproduce it. Whether you are a deep learning enthusiast, an artist, or just curious about how AI can paint, this post is for you.
Neural style transfer (NST) blends two images – a content image and a style image – by optimizing a new generated image using deep learning. Instead of training a model, we keep the model frozen and update the image itself through backpropagation.
We first load our content and style images and preprocess them so they can be fed into a pre-trained VGG19 model. We also create the generated image as a clone of the content image.
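As a rough sketch of that loading step (the file names content.jpg and style.jpg and the 512×512 size are illustrative assumptions, not the exact code from the repo):

```python
import torch
from PIL import Image
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Resize both images to the same size and convert them to tensors in [0, 1].
# Some implementations also apply ImageNet mean/std normalization here.
loader = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

def load_image(path):
    # Add a batch dimension so the tensor shape is (1, 3, H, W), as VGG19 expects.
    image = Image.open(path).convert("RGB")
    return loader(image).unsqueeze(0).to(device)

content_img = load_image("content.jpg")  # hypothetical path
style_img = load_image("style.jpg")      # hypothetical path
```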
generated_img = content_img.clone().requires_grad_(True).to(device)
Here, we set requires_grad=True to tell PyTorch that this image should be updated during optimization – that is, we want to change its pixel values so that it takes on the look of the style image while preserving the content.
At the heart of style transfer is a deep convolutional neural network called VGG19, originally trained to recognize objects in everyday images.
VGG passes images through many convolutional layers, and each layer learns to detect a different level of visual detail:
- 🖌 Shallow layers learn low-level features: edges, colors and textures – the feel or style of the image.
- 🧠 Deep layers learn high-level features: shapes, arrangements and structure – the meaning or content of the image.
That’s why we:
- Extract style from the shallow layers of the style image – where the artistic texture and brushwork live.
- Extract content from the deep layers of the content image – where the overall shape and subject are encoded.
Think about it like this:
Shallow layers = how it looks (the feel of the brush strokes)
Deep layers = what it is (a tower, a tree, a face)
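To make this concrete, here is a minimal sketch of how features can be pulled from specific VGG19 layers with torchvision. The layer indices for conv1_1 through conv5_1 and conv4_2 are the commonly used ones for vgg19.features, stated here as an assumption rather than taken verbatim from the repo:

```python
import torch
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained VGG19 feature extractor and freeze it
# (older torchvision versions use pretrained=True instead of weights=...).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Assumed indices of conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 (style)
# and conv4_2 (content) inside vgg19.features.
STYLE_LAYERS = {0, 5, 10, 19, 28}
CONTENT_LAYER = 21

def get_features(image):
    # Run the image through VGG19 and collect activations at the chosen layers.
    style_feats, content_feat = [], None
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat
```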
Style is not about copying texture pixel by pixel. Instead, it is about capturing the relationships between different visual features – the overall texture, color palette and patterns.
For this, we use something called the Gram matrix. Without going deep into the math, it works like this:
- We take the features from a layer (say, color blobs or edge patterns).
- The Gram matrix measures how strongly these features co-occur with each other.
- This set of correlations gives us a style fingerprint.
This is computed for multiple layers of the style image and compared with the same layers of the generated image.
🎨 If the content is the outline of the painting, the style is the texture of the brush on the canvas.
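Here is a minimal sketch of the Gram matrix computation for one layer’s feature map (the flattening and normalization conventions below are typical choices, assumed rather than copied from the repo):

```python
import torch

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from one VGG19 layer.
    b, c, h, w = features.size()
    # Flatten each channel into a vector, then take dot products between channels.
    flat = features.view(b, c, h * w)
    gram = torch.bmm(flat, flat.transpose(1, 2))
    # Normalize by the number of elements so the scale is comparable across layers.
    return gram / (c * h * w)
```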
Given two completely different photos – one for content, the other for style – we need to measure how close our generated image is to each of them. That’s where the total loss function comes in.
This is a combination of two parts:
- Content features are extracted from a deep layer of the VGG19 model (usually conv4_2, which corresponds to layer 21).
- This layer encodes the semantic structure: where objects are, their shapes, and the general layout.
- The content loss is computed as the Mean Squared Error (MSE) between the features of the generated image and those of the original content image.
content_loss = MSE(gen_content_features, original_content_features)
The content loss tells the AI what to paint.
- The style loss is computed across multiple layers (such as conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1) of the VGG19 network.
- These layers capture patterns, textures and brush strokes at different scales.
- For each layer, we compare the Gram matrix of the generated image with that of the style image.
- The Gram matrix reflects how different features in the image interact – for example, which combinations of colors and textures tend to appear together.
style_loss = Σ MSE(Gram(gen_feat), Gram(style_feat)) for all style layers
The style loss tells the AI how to paint.
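Putting the two losses into code, here is a sketch that reuses the get_features and gram_matrix helpers assumed in the earlier snippets:

```python
import torch.nn.functional as F

def style_transfer_losses(gen_img, content_img, style_img):
    # Feature extraction via the get_features / gram_matrix sketches above.
    gen_style, gen_content = get_features(gen_img)
    _, orig_content = get_features(content_img)
    style_style, _ = get_features(style_img)

    # Content loss: MSE between the deep (conv4_2) activations.
    content_loss = F.mse_loss(gen_content, orig_content)

    # Style loss: MSE between Gram matrices, summed over all style layers.
    style_loss = 0.0
    for gen_f, style_f in zip(gen_style, style_style):
        style_loss = style_loss + F.mse_loss(gram_matrix(gen_f), gram_matrix(style_f))

    return content_loss, style_loss
```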
To create an image that respects both content and style, we combine the two losses into a single total loss.
total_loss = content_loss + style_weight * style_loss
- style_weight is usually a very large number, such as 1e6 or 1e9.
- This ensures that the style is strongly reflected in the final image.
- If the weight is too low, the output looks too much like the content image; if it is too high, the output forgets the structure.
- This total loss is backpropagated to produce gradients with respect to the pixels of our generated/cloned image.
Here is the cool part: we do not train the neural network at all. Instead, we keep VGG19 frozen and ask:
“What should the pixels of the generated image be so that it has the same content as the content image and the same style as the style image?”
We start with a copy of the content image (or even a random image), and we update only the image – not the model.
We do this using gradient descent: a fancy way of nudging the image step by step to reduce the gap in both content and style – sketched in the loop below.
With each iteration:
- The generated image becomes more like the content image in structure.
- It picks up more of the colors, textures and strokes of the style image.
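A rough sketch of that optimization loop, using Adam (the repo may use L-BFGS instead; the learning rate, step count and style_weight below are illustrative assumptions, and style_transfer_losses is the helper sketched above):

```python
import torch

style_weight = 1e6   # illustrative; large so the style dominates the loss
steps = 500          # illustrative step count

# Note: we optimize the image tensor itself, not any network weights.
optimizer = torch.optim.Adam([generated_img], lr=0.01)

for step in range(steps):
    optimizer.zero_grad()
    content_loss, style_loss = style_transfer_losses(generated_img, content_img, style_img)
    total_loss = content_loss + style_weight * style_loss
    # Gradients flow only to the image pixels; VGG19 itself stays frozen.
    total_loss.backward()
    optimizer.step()
    # Keep pixel values in a valid range.
    with torch.no_grad():
        generated_img.clamp_(0, 1)
```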
After running neural style transfer on my own photos, here is what I got!
Content image:
Style image:
Generated image:
You can clearly see:
- The outlines and shapes are preserved from the content image.
- The brush-like strokes and colors are taken directly from the style image.
Longer training is worth it (even when the loss stops decreasing)
At first, I assumed that once the loss stopped decreasing, training could stop. Through experience, I noticed that this is not the case.
Even after the loss curve flattens out, the image keeps improving – edges get cleaner, textures settle more naturally, and the final result looks far more “finished”. The optimizer (especially L-BFGS) keeps making micro-adjustments that help align fine textures and subtle details.
✅ Lesson: loss convergence is not the same as visual convergence. Trust the process. Let it run longer.
You can find the full code for this project – including image loading, VGG19 model setup, Gram matrix computation, and the style transfer training loop – in my GitHub repository:
👉 GitHub repo: Neural style transfer with PyTorch
Feel free to fork it, try your own photos, and even extend it to fast style transfer or video!
In this version of style transfer, we start from noise or from the content image and directly optimize the pixels to satisfy both style and content.
But there is another powerful approach: instead of optimizing every image from scratch, we can train a convolutional neural network (CNN) that learns to apply a specific style in a single forward pass.
How it works:
- You train the CNN using the same style transfer loss functions.
- But instead of optimizing the image, you optimize the weights of the network (see the sketch below).
- Once trained, this “style transfer network” can stylize any new content image instantly.
This is how models like Johnson et al.’s Perceptual Losses approach create real-time artistic filters with a single feed-forward network.
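For intuition, here is a very rough sketch of one training step under that scheme. TransformNet is a hypothetical stand-in – a real network like Johnson et al.’s uses downsampling, residual blocks and upsampling – and it reuses the style_transfer_losses helper sketched earlier:

```python
import torch
import torch.nn as nn

class TransformNet(nn.Module):
    # Hypothetical stand-in: a real style transfer network is much deeper.
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=9, padding=4), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.body(x)

net = TransformNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

def train_step(content_batch, style_img, style_weight=1e6):
    optimizer.zero_grad()
    generated = net(content_batch)
    # Same perceptual losses as before, but gradients now update the network weights.
    content_loss, style_loss = style_transfer_losses(generated, content_batch, style_img)
    (content_loss + style_weight * style_loss).backward()
    optimizer.step()
```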
🧠 This shifts the problem from “optimize the pixels every time” → “optimize a model once, use it forever.”
It’s fast, beautiful and practical – especially for mobile apps, video frames, or interactive design tools.