This AI Creates Realistic Animated Looping Videos from Static Images

@whatsai Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

This model takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while keeping the rest of the picture entirely still.

The end result is an amazingly realistic video like this one, generated from nothing but a still picture.


References

►Read the full article: https://www.louisbouchard.ai/animate-pictures/
►Paper: Holynski, Aleksander, et al. “Animating Pictures with Eulerian Motion Fields.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. https://arxiv.org/abs/2011.15128
►Project link (code coming soon): https://eulerian.cs.washington.edu/

Video Transcript

 00:00

Have you ever taken a beautiful landscape picture and later on you noticed that it didn’t

00:05

look quite as good as when you were there?

00:07

It may be because you just cannot freeze such a real-life landscape and expect it to look

00:12

as good.

00:13

In that case, what about having this picture animated where the normally-moving particles

00:18

would be in constant movement, just like the moment you took the photo?

00:22

Imagine observing the water flow or seeing the smoke disperse in the air.

00:25

Well, this is what a new algorithm from Facebook and the University of Washington does.

00:30

It takes a picture, understands which particles are supposed to be moving, and realistically

00:35

animates them in an infinite loop while keeping the rest of the picture entirely still, creating

00:41

amazing-looking videos like this one.

00:44

Sincerely, I don’t know why but I LOVE how it looks and wanted to share their work.

00:49

What do you think about these results, and how would you use them?

00:53

Personally, once the code is released, I will be using these as desktop backgrounds.

00:57

Now that we’ve seen what it can achieve, I hope you are as excited as I was when discovering

01:02

this paper.

01:03

Let’s get into the even more interesting part.

01:06

Namely: how can they take a single picture and create a realistic animated looping video

01:11

out of it?

01:13

This is done in three important steps.

01:15

The first step is to separate what needs to be animated from what needs to stay still.

01:20

In other words, find the water, smoke, or clouds to animate.

01:23

Of course, detecting these moving particles is extremely easy for humans as we can imagine

01:29

the real scene and how it actually was, but how can a computer that sees only a picture

01:35

and doesn’t know the world do this?

01:37

Well, the answer lies within the question: we need to teach it a bit more about the world

01:43

and how it works, or in this case, how it moves.

01:46

This is done by training an artificial intelligence model on videos of real landscape scenes instead

01:52

of pictures.

01:53

This way, it can learn how water, smoke, and clouds typically behave in the form of a flow

01:59

field.

02:00

This flow field is a version of the input image where each pixel value is an approximation

02:04

of the direction and speed of the motion at that pixel at a frozen moment in time.

02:07

It is called an Eulerian flow field.

02:10

Eulerian flow fields look at how fluid moves focusing on a fixed location instead of following

02:15

the particles of the fluid.

02:17

You can see this as sitting in front of a waterfall and watching the same exact positions,

02:22

observing how the water changes there, instead of following the water down the waterfall.

02:27

And this is exactly what we need in this case as the image is precisely representing that:

02:32

flowing water in a still position.
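To make the Eulerian idea a bit more concrete, here is a minimal sketch, not the authors' code. It assumes the motion field is a plain NumPy array holding one fixed (dx, dy) velocity per pixel location, and it follows a particle by repeatedly looking up the velocity at whatever location it currently occupies (simple Euler integration). The function name `integrate_displacement` and the toy field are made up for illustration.

```python
import numpy as np

def integrate_displacement(motion, steps):
    """Euler-integrate a static (Eulerian) motion field.

    motion: array of shape (H, W, 2), a fixed (dx, dy) velocity in pixels
            per frame stored at every pixel location.
    Returns an (H, W, 2) array: how far each pixel of the first frame has
    travelled after `steps` frames.
    """
    h, w, _ = motion.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disp = np.zeros_like(motion, dtype=float)
    for _ in range(steps):
        # Where is each particle right now? (nearest pixel, clamped to the image)
        px = np.clip(np.rint(xs + disp[..., 0]).astype(int), 0, w - 1)
        py = np.clip(np.rint(ys + disp[..., 1]).astype(int), 0, h - 1)
        # Read the velocity stored at that fixed location and move one step
        disp += motion[py, px]
    return disp

# Toy field: the left half flows right at 1 px/frame, the right half is still
motion = np.zeros((4, 6, 2))
motion[:, :3, 0] = 1.0
disp = integrate_displacement(motion, steps=5)
print(disp[0, :, 0])  # horizontal displacement along the first row
```

The field itself never changes over time; only the particles move through it, which is why a single still image is enough to describe the motion.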

02:35

So using many landscape videos, they started by identifying these fields for each video.

02:41

This is done quite easily as the fluid actually moves during the videos, and we can use widely known

02:46

techniques, such as optical flow estimation, to identify the moving particles in each frame.

02:50

They then use this identified flow for each frame as ground truth to train their algorithm.
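As a rough illustration of how a single training target could be built from a video, the sketch below averages per-frame flow estimates into one static field. It assumes the per-frame flows have already been computed by some flow estimator and stored as a NumPy array. The function `average_motion_field` and the `min_speed` threshold are hypothetical and not taken from the paper.

```python
import numpy as np

def average_motion_field(flows, min_speed=0.1):
    """Collapse per-frame flows into one static motion field.

    flows: array of shape (T, H, W, 2), one (dx, dy) flow map per frame pair.
    min_speed: hypothetical threshold (pixels per frame) below which a pixel
               is treated as static and its motion is set to zero.
    """
    mean_flow = flows.mean(axis=0)
    speed = np.linalg.norm(mean_flow, axis=-1)
    mean_flow[speed < min_speed] = 0.0  # keep still regions perfectly still
    return mean_flow

# Toy video of 3 frame pairs on a 2x2 image
flows = np.zeros((3, 2, 2, 2))
flows[:, 0, 0] = [[1, 0], [2, 0], [3, 0]]  # a pixel with clear rightward motion
flows[:, 1, 1] = 0.01                      # tiny jitter, probably just noise
target = average_motion_field(flows)
print(target[0, 0], target[1, 1])
```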

02:55

The training starts with an image-to-image translation network using video frames as

03:00

inputs.

03:01

These identified flow fields are compared with the network’s outputs to teach it in a supervised

03:06

way what we want to achieve.

03:08

This is done by iteratively correcting and improving the network based on the difference

03:12

between the generated image and our known flow fields.

03:16

After such training, the network can generate this flow field without any external help

03:20

for any landscape image it receives.
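The "iteratively correcting and improving" part is ordinary supervised learning. The toy sketch below shows the loop with a deliberately tiny stand-in: a linear map from pixel colors to a 2D flow vector, trained by gradient descent on the squared difference to known flows. The real model is a deep image-to-image network, so everything here (the data, the linear model, the learning rate) is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 "pixels" with RGB values and a known 2D flow for each one
pixels = rng.random((500, 3))
true_weights = rng.normal(size=(3, 2))
target_flow = pixels @ true_weights  # stands in for the identified flow fields

weights = np.zeros((3, 2))  # the "network": a single linear layer
losses = []
for _ in range(200):
    predicted_flow = pixels @ weights
    error = predicted_flow - target_flow
    losses.append(float((error ** 2).mean()))  # difference to the known flow
    gradient = 2 * pixels.T @ error / error.size
    weights -= 0.5 * gradient                  # correct the model a little

print(losses[0], losses[-1])
```

The loss shrinks as the predictions get closer to the known flows, which is all "teaching the network" means here.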

03:23

This works just like other image-to-image GAN architectures, more precisely an encoder coupled with a

03:29

decoder.

03:30

It first encodes the input frame, the landscape image, and then decodes it to generate a new

03:36

version of the same image, conserving the spatial features and changing the image’s

03:40

style.

03:41

In this case, what changes is the pixel values, which now encode a motion field instead

03:46

of the actual colors of the images.

03:49

The second step is to animate these sections of the image and do it realistically.

03:53

For this, we only need two things: the input image and the Eulerian or static flow estimation

04:00

we just found for the image.

04:02

Using this information, we know where the pixels are supposed to go next based on their

04:06

speed and direction, but directly applying this will cause some

04:10

issues as some pixels may not have any values after the translation, resulting in black

04:15

holes starting where the motion begins in the picture.

04:18

This is because (1)

04:19

the predicted motion field isn’t perfect and (2)

04:22

some pixels will go to the same resulting pixel after their displacement,

04:26

which means that it will get worse over time and produce something like this.
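The hole and collision problem is easy to reproduce on a single row of pixels. This is a minimal sketch with made-up values and a hypothetical helper `forward_warp_row`, not the paper's warping code: every pixel is pushed to its new position, and we count how many pixels land on each target.

```python
import numpy as np

def forward_warp_row(row, dx):
    """Naively push every pixel of a 1D row to x + dx[x].

    Returns the warped row and, for each target position, how many source
    pixels landed on it. Positions with 0 hits are holes (left at 0, black).
    """
    out = np.zeros_like(row)
    hits = np.zeros(len(row), dtype=int)
    for x, value in enumerate(row):
        tx = x + int(dx[x])
        if 0 <= tx < len(row):
            out[tx] = value  # a later pixel overwrites an earlier one
            hits[tx] += 1
    return out, hits

row = np.array([10, 20, 30, 40, 50, 60])
dx = np.array([0, 0, 2, 1, 0, 0])  # motion starts at position 2
warped, hits = forward_warp_row(row, dx)
print(warped)  # [10 20  0  0 50 60]
print(hits)    # [1 1 0 0 3 1]
```

Positions 2 and 3 receive nothing, so they turn black exactly where the motion begins, while three pixels fight over position 4.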

04:30

So how can we make this more intelligent?

04:33

Again, it is done using an encoder and a decoder and doing one more step in-between the two.

04:39

So they encode the input frame a second time using a different encoder trained on this

04:44

specific task, producing what they call here their deep features.

04:48

These deep features are the encodings of the input image, meaning that they are a concentration

04:52

of the important information for this task about the picture.

04:56

What counts as “important information” here is determined by what they optimized their model to do during

05:01

training.

05:02

Using these deep features, moved according to the displacement fields indicating what the next

05:06

frame looks like, they use a decoder trained to generate the

05:10

next frame from this condensed information about the frame and the flow field we give

05:15

it.

05:16

Note that during training, they used two different frames, the first and last frames, to learn

05:20

the real-looking flow of the fluids and try to prevent such black holes from appearing.

05:25

Now comes the third and last step: the looping part.

05:29

Using the same frame as the starting frame, they generate animation in two directions, a forward

05:34

movement and a backward movement, until they reach the second frame.

05:38

This enables them to produce the looping effect by merging the two videos since one starts

05:44

when the other ends, and they meet in the center.

05:46

Then, at inference time, or in other words, when you actually use the model, it does the

05:52

same thing with only a starting frame, which is the image you give the model.
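Here is a small sketch of why blending a forward and a backward animation closes the loop. It works directly on a row of pixel values and uses `np.roll` as a stand-in for warping (it wraps around the edges, so there are no holes), whereas the actual method operates on deep features. The function `looping_frames` and the linear blending weights are assumptions for illustration.

```python
import numpy as np

def looping_frames(image_row, speed, n_frames):
    """Blend a forward and a backward animation of the same starting frame."""
    frames = []
    for t in range(n_frames + 1):
        forward = np.roll(image_row, speed * t)               # moved t steps ahead
        backward = np.roll(image_row, speed * (t - n_frames))  # moved (t - N) steps, i.e. backwards
        alpha = t / n_frames
        # Fade from the forward animation to the backward one over the loop
        frames.append((1 - alpha) * forward + alpha * backward)
    return frames

row = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
frames = looping_frames(row, speed=1, n_frames=4)
print(np.array_equal(frames[0], frames[-1]))
```

At t = 0 only the forward animation contributes and nothing has moved yet; at the last frame only the backward animation contributes and it has arrived back at the input image. The first and last frames match, so the video can repeat without a visible jump.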

05:56

And voila, you have your animated image!

05:59

I hope you enjoyed this video as much as I enjoyed discovering this technique.

06:03

If so, I invite you to read their paper too for more technical details about this super

06:08

cool model.

06:09

It is extremely well done!

06:14

Thank you for watching!
