How to Create Realistic Slow Motion Videos With AI | Hacker Noon

TimeLens can understand the movement of the particles in between the frames of a video to reconstruct what really happened at a speed even our eyes cannot see.


Louis Bouchard (@whatsai)

I explain Artificial Intelligence terms and news to non-experts.

I’m sure you’ve all clicked on a video thumbnail from the Slow Mo Guys to see water floating in the air when popping a water balloon, or other super cool-looking “slow-mos” made with extremely expensive cameras. These days our phones can do something not really comparable, but still quite cool. What if you could reach the same quality without such an expensive setup?

Well, that’s exactly what TimeLens, a new model published by Tulyakov et al., can do with extreme precision.

Just look at the video: the results are amazing! It generated slow-motion videos of over 900 frames per second out of videos of only 50 FPS!

This is possible by guessing what the frames in between the real frames could look like, which is an incredibly challenging task.
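To get a sense of the scale of the task: going from 50 FPS to over 900 FPS means synthesizing many new frames between every pair of real ones. The 50 and 900 FPS figures come from the paper's results; the little helper below is purely illustrative back-of-the-envelope arithmetic, not part of TimeLens:

```python
def frames_to_synthesize(source_fps: int, target_fps: int) -> int:
    """Number of new frames to insert between two consecutive real frames."""
    # e.g. 900 // 50 = 18: each real frame must be followed by 17 synthetic ones
    factor = target_fps // source_fps
    return factor - 1

print(frames_to_synthesize(50, 900))  # 17 in-between frames per real frame
print(frames_to_synthesize(20, 300))  # 14 in-between frames per real frame
```

So a model has to hallucinate roughly 17 plausible images for every single image the camera actually recorded, which is why extra information between the frames is so valuable.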

Learn more in the video and check out the crazy results.

Watch the video

References

The full article: https://www.louisbouchard.ai/timelens/
Official code: https://github.com/uzh-rpg/rpg_timelens
Stepan Tulyakov*, Daniel Gehrig*, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, TimeLens: Event-based Video Frame Interpolation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2021, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf

Video transcript

[00:00] I’m sure you’ve all clicked on a video thumbnail from the Slow Mo Guys to see the water floating in the air when popping a water balloon, or any other super cool-looking slow-mo they made with extremely expensive cameras. Now we are lucky enough to be able to do something not really comparable, but still quite cool, with our phones. What if you could reach the same quality without such an expensive setup? Well, that’s exactly what TimeLens, a new model published by Tulyakov et al., can do with extreme precision. Just look at that: it generated slow-motion videos of over 900 frames per second out of videos of only 50 frames per second. This is possible by guessing what the frames in between the real frames could look like, and it’s an incredibly challenging task. Instead of attacking it with the classical idea of using the optical flow of the videos to guess the movement of the particles, they used a simple setup with two cameras, one of which is very particular.

[01:00] By the way, if you work in the AI field and want to have your models running online in web apps, I’m sure you will love the sponsor of this video, UbiOps. Stick until the end to learn more about them and how they can be quite handy for you.

[01:13] Let’s get back to the paper. The first camera is a basic camera recording the RGB frames as you know them. The second one, on the other hand, is an event camera. This kind of camera uses novel sensors that report only the pixel intensity changes, instead of the current pixel intensities as a regular camera does, and it looks just like this. An event camera provides information in between the regular frames thanks to the compressed representation of the information it reports compared to regular images. This is because it reports only information about the pixels that changed, and at a lower resolution, making it much easier to record at a higher rate: a high-temporal-resolution but low-definition camera. You can see this as sacrificing the quality of the images it captures in exchange for more images. Fortunately, this lack of image quality is fixed by using the other, frame-based camera, as we will see in a few seconds. TimeLens leverages these two types of cameras, frame and event, using machine learning to get the most out of both types of information and better reconstruct what actually happened between those frames, something that even our eyes cannot see.
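The "only report the pixels that changed" idea can be sketched in a few lines of NumPy. This is a toy illustration, not the sensor's actual electronics: real event cameras fire asynchronously per pixel, and the 0.1 threshold here is made up. It derives event-like data from two consecutive grayscale frames:

```python
import numpy as np

def frames_to_events(prev, curr, threshold=0.1):
    """Toy event generation: report (y, x, polarity) wherever the
    log-intensity change between two frames exceeds a threshold."""
    # Event sensors respond to log-intensity changes; epsilon avoids log(0).
    diff = np.log(curr + 1e-6) - np.log(prev + 1e-6)
    ys, xs = np.nonzero(np.abs(diff) > threshold)
    polarity = np.sign(diff[ys, xs]).astype(int)  # +1 brighter, -1 darker
    return list(zip(ys.tolist(), xs.tolist(), polarity.tolist()))

prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 0.9   # one pixel gets brighter
curr[3, 0] = 0.1   # one pixel gets darker
events = frames_to_events(prev, curr)
print(events)  # [(1, 2, 1), (3, 0, -1)]
```

Only the two changed pixels produce events while the other fourteen produce nothing, which is exactly why the event stream is cheap enough to record at a very high rate.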

[02:28] In fact, it achieved results that our smartphones and no other models could reach before. Here’s how they achieve that. As you know, we start with the typical frames, which come from the regular camera at something between 20 and 60 frames per second. This cannot do much, as you need many more frames per second to achieve a slow-motion effect like this one. More precisely, to look interesting, you need at least 300 frames per second, which means that we have 300 images for only one second of video footage. But how can we go from 20 or so frames to 300? We cannot create the missing frames out of thin air; this is just too little information to interpolate from. Well, we use the event-based camera, which contains much more time-wise information than the frames. As you can see here, it basically contains incomplete frames in between the real frames, but they are just informative enough to help us understand the movement of the particles and still grasp the overall image using the real frames around them.

[03:31] The events and frame information are both sent into two modules to train and interpolate the in-between frames: the warping-based interpolation module and the interpolation-by-synthesis module. The warping module is the main tool to estimate the motion from events, instead of from the frames as the synthesis module does. It takes the frames and events and translates them into an optical flow representation using a classic U-Net-shaped network. This network simply takes images as inputs, encodes them, and then decodes them into a new representation. This is possible because the model is trained to achieve this task on huge datasets. As you may know, I have already covered similar architectures numerous times on my channel, which you can look up for more details on their various applications. But in short, you can see it as an image-to-image translation tool that just changes the style of the image: in this case, it takes the events and finds an optimal optical flow representation for them to create a new frame for each event. It basically translates an event image into a real frame by trying to understand what’s happening in the image with the optical flow. If you are not familiar with optical flow, I’d strongly recommend watching my video covering a great paper about it that was published at the same conference a year ago.

[04:47] The interpolation-by-synthesis module is quite straightforward. It is used because it can handle new objects appearing between frames and changes in lighting, like the water reflection shown here.
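The warping idea itself can be pictured with plain NumPy: given a frame and a flow field, each output pixel is fetched from the source position the flow points back to. This is a nearest-neighbor backward warp under a hypothetical constant flow, far simpler than the paper's learned module, but it shows what "warping a frame by its motion" means:

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp `frame` by `flow` (H, W, 2): output[y, x] = frame[y - dy, x - dx].
    Nearest-neighbor sampling, clamped at the image border."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

frame = np.zeros((5, 5))
frame[1, 1] = 1.0          # a single bright pixel
flow = np.zeros((5, 5, 2))
flow[..., 0] = 1           # everything moved down by 1 pixel...
flow[..., 1] = 2           # ...and right by 2 pixels
warped = backward_warp(frame, flow)
print(int(warped[2, 3]))   # 1: the bright pixel moved from (1, 1) to (2, 3)
```

In TimeLens the flow is of course not constant: it is predicted from the events, which is what lets the warp land each pixel where it actually was between the two real frames.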

[05:02] This is due to the fact that it uses a similar U-Net-shaped network to understand the frames together with the events and generate a new fictional frame. In this case, the U-Net takes the events in between two frames and generates a new possible frame for each event directly, instead of going through the optical flow. The main drawback here is that noise may appear due to the lack of information regarding the movement in the image, which is where the other module helps. Then, the first module is refined using even more information from the interpolation by synthesis I just covered: it basically extracts the most valuable information from these two generated frames of the same event to refine the warped representation and generate a third version of each event, using a U-Net network again. Finally, these three frame candidates are sent into an attention-based averaging module. This last module simply takes the three newly generated frames and combines them into a final frame, keeping only the best parts of all three possible representations, which is also learned by training the network to achieve that. If you are not familiar with the concept of attention, I’d strongly recommend watching the video I made covering how it works with images. You now have a high-definition frame for the first event in between your frames and just need to repeat this process for all the events given by your event camera. And voilà!
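The final averaging step can be sketched as a per-pixel softmax blend over the three candidates, so the weights sum to one at every pixel and the module can favor whichever candidate looks best locally. The attention scores below are random stand-ins; in TimeLens they are predicted by the trained network:

```python
import numpy as np

def attention_average(candidates, scores):
    """Blend candidate frames (3, H, W) using per-pixel softmax weights
    computed from attention scores (3, H, W)."""
    # Softmax across the candidate axis so weights sum to 1 at every pixel.
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * candidates).sum(axis=0)

rng = np.random.default_rng(0)
candidates = rng.random((3, 4, 4))  # warped, refined, and synthesized candidates
scores = rng.random((3, 4, 4))      # stand-in attention scores
final = attention_average(candidates, scores)
print(final.shape)  # (4, 4)
```

Because the blend is a convex combination, every output pixel stays between the smallest and largest candidate value at that position, so the averaging can pick the best parts without inventing new artifacts.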

[06:30] This is how you can create amazing-looking and realistic slow-motion videos using artificial intelligence. If you watched until now and enjoyed this paper overview, I’m sure you are more than interested in this field, and you may have developed a machine learning model yourself, for fun or for work. At some point, you most probably wanted to deploy your models: run them live in the cloud and make them available for others to use or call from other applications.

[06:58] You most certainly know that setting up a serving infrastructure to do this can be a very challenging task, especially when you’d like to focus on research, as I do. Luckily, my friends at UbiOps, the sponsors of this video, built a solution for us. It’s a fully managed, free serving and hosting platform that helps you deploy your code as a web service with an API super easily. The UbiOps platform is very user-friendly: it helps you turn your scripts and models into live web services within minutes. You can also create more complex data pipelines combining different services, do version control on your models, and much more. You can use it by yourself or as a data science team; there’s a lot of cool functionality to explore. You can try it out by visiting ubiops.com and creating an account for free. Their free tier already includes a lot of monthly compute budget and gives access to all the functionality, so there’s literally no reason not to check it out. You can find a wide range of examples for working with tools like scikit-learn, TensorFlow, or other familiar frameworks, along with all the information you need, in their docs and on GitHub. Plus, their team is there to help you, with a Slack server available for anyone to join and ask questions.

[08:16] Click this card to sign up for a free account, or see the first link in the description. You will be impressed with their toolkit and how easy it is to use. As always, if you are curious about this model, the links to the code and paper are in the description below. Thank you again, UbiOps, for sponsoring this video, and many thanks to you for watching it until the end.
