Dynamic texture analysis with deep learning on three orthogonal planes
Dr. Vincent Andrearczyk, HES-SO

Thursday, April 19, 2018 · 10:44 a.m. · 20m 31s

Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit certain stationarity properties in time such as smoke, vegetation and fire. The analysis of DT is important for recognition, segmentation, synthesis or retrieval for a range of applications including surveillance, medical imaging and remote sensing. Convolutional Neural Networks (CNNs) have recently proven to be well suited for texture analysis with a design similar to dense filter banks. The repetitivity property of DTs in space and time allows us to consider them as volumes and to analyze regularly sampled spatial and temporal slices. We train CNNs on spatial frames and temporal slices extracted from the DT sequences and combine their predictions in a late fusion approach to obtain a competitive DT classifier trained end-to-end.

Transcriptions

Note: this content has been automatically generated.

00:00:00

[unclear] PhD at Dublin City University; I'm now working with HES-SO [unclear].

00:00:06

So my talk will be structured as follows:

00:00:09

first I will talk about natural vision and motion analysis, to

00:00:15

make a link with the topic [unclear]. Then I will introduce static and dynamic textures. I will

00:00:22

present handcrafted, so classical, approaches for texture analysis

00:00:27

and the deep learning methods that have been developed recently. Then I will introduce

00:00:32

classical handcrafted dynamic texture methods, and finally deep

00:00:38

learning approaches for dynamic textures.

00:00:42

So,

00:00:44

I will very briefly talk about natural vision basics, for the people who

00:00:50

have not been introduced to it before. Neurons in the brain

00:00:57

fire when a visual stimulus appears within their receptive field.

00:01:02

The first cells in vision to receive

00:01:08

information are the simple cells in V1,

00:01:14

the primary visual cortex, and they are akin to

00:01:21

Gabor filters used in texture analysis and other computer vision methods.

00:01:28

Then complex cells receive input from different simple cells and

00:01:32

build spatial invariance from these more basic

00:01:39

cells [unclear] in the visual cortex,

00:01:47

and [unclear] complex neurons detect

00:01:53

more abstract, invariant and complex patterns in the visual field.

00:02:00

So these are the basics of vision. Now, how

00:02:04

motion analysis is done in the brain.

00:02:08

Motion analysis is vital for survival and interaction

00:02:12

with the environment, for animals, insects and humans.

00:02:16

Insects are blind to non-moving scenes, or to

00:02:20

non-moving parts of the visual field.

00:02:24

Various experiments have shown the [unclear]

00:02:28

motion recognition of insects and animals.

00:02:31

So, for instance, it has been shown that

00:02:34

butterflies are attracted to fake butterflies made of paper,

00:02:41

regardless of the shape or the colour of the wings, but based on

00:02:47

the motion of the wings. Bees have been shown to

00:02:54

go towards oscillating flowers twice as fast as towards non-moving flowers.

00:03:00

So there are a lot of experiments like this. Humans have also been shown to be

00:03:04

able to recognise very complex movements just by placing lights on the body.

00:03:11

So, only looking at the lights, not the shape of the body, people can recognise

00:03:18

whether someone is walking or running, but also particular people, so particular actors that are

00:03:24

doing this, if they know the people. So they are very good at motion recognition.

00:03:29

Akinetopsia is the human inability to perceive motion, a sort of motion blindness,

00:03:35

and it has been shown to be maybe due to lesions

00:03:39

in cortical area V5 [unclear].

00:03:46

So, how motion is analysed in the brain: we have direction-selective cells

00:03:52

in the [unclear], direction-selective cells directly behind the eye.

00:03:58

They respond to a stimulus [unclear]; there are a lot of different

00:04:01

cells that respond to a stimulus in a specific direction, so each cell is specific to a given direction.

00:04:07

[unclear]

00:04:11

[unclear] the Reichardt model [unclear]: a model for direction-selective cells is

00:04:20

that there are two points in the field of view, say for horizontal movement. The first

00:04:28

point at a given time is compared to another point in a given direction

00:04:35

with a delay in time, and then the subtraction

00:04:41

of these points will provide a positive output or a negative

00:04:47

output, and a positive output would mean that this point

00:04:51

has moved to that point [unclear].
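
The delay-and-compare scheme described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it follows the common textbook form of a Reichardt-style correlator (a delayed signal multiplied with its neighbour's current signal in each arm, then the two mirror-symmetric arms subtracted), which may differ in detail from the model the speaker had on the slide. The function name `reichardt_response` and the toy signals are hypothetical.

```python
def reichardt_response(left, right, delay=1):
    """Opponent delay-and-correlate detector for two neighbouring points.

    left, right: equal-length sequences of brightness samples over time.
    Returns a positive value for left-to-right motion and a negative
    value for right-to-left motion (textbook formulation, assumed here).
    """
    total = 0.0
    for t in range(delay, len(left)):
        # The delayed left sample coincides with the current right sample
        # when the pattern travels from left to right.
        rightward = left[t - delay] * right[t]
        # Mirror-symmetric arm, tuned to the opposite direction.
        leftward = right[t - delay] * left[t]
        # Subtracting the two arms gives a signed, direction-selective output.
        total += rightward - leftward
    return total


# A bright spot passes the left point at t=2 and the right point at t=3.
left = [0, 0, 1, 0, 0, 0]
right = [0, 0, 0, 1, 0, 0]
print(reichardt_response(left, right))  # positive: left-to-right motion
print(reichardt_response(right, left))  # negative: right-to-left motion
```

Each such unit is tuned to one direction and one speed (set by `delay`), which matches the remark that each cell is specific to a given direction.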

00:04:56

An interesting thing is that humans can train their motion detection.

00:05:00

So, by training to recognise a given motion in a given

00:05:04

direction, people can improve the detection in this direction. So

00:05:10

there is a lot of learning possible with this [unclear].

00:05:16

Now, the different types of motion: we can classify three main types of motion. One is localised in

00:05:20

space and time. So in a sequence of frames you have some motion that happens

00:05:28

locally in space and time, like a door opening: it's short in

00:05:32

time and it's localised in the field of view.

00:05:37

An activity is described as localised in space and periodic in time,

00:05:44

so like a person walking or running: it's continuous in time,

00:05:49

typically periodic in time, but it's localised in space. And dynamic texture, which

00:05:53

I will mostly talk about, is not localised in space and time.

00:05:57

So, like in a video of grass or trees, it is neither localised

00:06:04

in space nor in time. So that's what I will focus on.

00:06:11

What is, first, a texture, a static texture? It is an image cue used in computer vision,

00:06:18

based on the spatial relationships between pixel intensities. It is opposed to the colour cue,

00:06:23

which is based on the distribution of the pixel intensities [unclear].

00:06:28

In texture we are interested in the spatial relationships between the pixel intensities.

00:06:34

It is usually characterised by spatial repetitivity, so we have a certain repetitivity of patterns,

00:06:42

with some variability of the patterns themselves and of the spatial

00:06:50

relationships between these patterns. It is also often characterised by high tone variations [unclear]:

00:06:56

across the image there is a high variation of pixel intensity. So these are usually characteristics

00:07:02

of texture images.

00:07:05

What is a dynamic texture? It is the extension of a static texture to the spatiotemporal domain,

00:07:12

so it's a sequence of frames

00:07:14

of moving textures, and it has to capture temporal variation on top of

00:07:21

the static texture, so temporal variation like motion and deformation of the textures.

00:07:29

It is also characterised by spatial repetitivity, like static

00:07:33

texture, but also by a stationarity property in time.

00:07:36

The stationarity property means that it is a stochastic process with unchanged statistics across time,

00:07:44

so a random process that has constant statistics across time.
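
One hedged way to write this property down (the notation is assumed here, not taken from the talk, and the speaker does not say whether strict or weak stationarity is meant): with $I(\mathbf{x}, t)$ the pixel intensity at location $\mathbf{x}$ and time $t$, weak (second-order) stationarity in time asks that the mean and the covariance do not depend on the absolute time $t$, only on the time lag $\tau$:

```latex
\mathbb{E}\bigl[I(\mathbf{x}, t)\bigr] = \mu(\mathbf{x})
\quad \text{for all } t,
\qquad
\operatorname{Cov}\bigl(I(\mathbf{x}, t),\, I(\mathbf{x}', t + \tau)\bigr) = C(\mathbf{x}, \mathbf{x}', \tau)
\quad \text{for all } t .
```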

00:07:52

These are examples of dynamic textures: natural dynamic textures like smoke, trees and waves,

00:07:58

and non-natural dynamic textures like traffic and crowds. And in medical imaging

00:08:04

they can come from ultrasound, and the evolution of

00:08:10

[unclear] across time can also provide some dynamic texture.

00:08:18

The properties of a dynamic texture: spatially, as for static texture,

00:08:25

it ranges from regular to stochastic, so there are very

00:08:32

regular patterns with a regular spatial arrangement of the patterns,

00:08:38

and it ranges from these to completely stochastic

00:08:42

texture.

00:08:44

Temporally it's the same: it ranges from rigid, so with translation, like

00:08:51

the translation of cars, or rotation [unclear],

00:08:57

to non-rigid, like the diffusion of smoke, and it can also be characterised by periodicity.

00:09:04

So there is a large range of different types of dynamic textures.

00:09:10

The different types of analysis: the most common is classification,

00:09:17

then segmentation, [unclear], the synthesis of dynamic

00:09:22

textures, and shape from texture, just as with static textures.

00:09:27

The applications in computer vision are remote monitoring and surveillance of

00:09:34

fire, so forest fires, monitoring of the traffic on highways or in cities,

00:09:42

monitoring of crowds, and remote sensing from satellites [unclear].

00:09:49

In medical imaging it can be used for segmentation,

00:09:53

diagnostic problems and predictive analysis in different

00:09:58

applications. And the increasing amount of video data makes the analysis of dynamic texture really important:

00:10:07

smartphones, surveillance data, [unclear], robotics: there is a lot of video

00:10:13

data that we can use for this.

00:10:19

There are challenges in static and dynamic texture analysis.

00:10:25

The first is the diversity and complexity of the textures.

00:10:30

This can be due to the position of the camera, or the point of view,

00:10:37

and the number of different natural phenomena. It

00:10:41

causes a high intra-class variation,

00:10:45

but there are also fine-grained tasks, recognising very similar textures, where the difference

00:10:51

between classes is small, so there is also sometimes a low inter-class variation.

00:10:56

Other problems, like spatial distortion of the texture, have to be taken into account, so there is a need

00:11:03

for invariance to geometric transformations and

00:11:08

high abstraction in the analysis.

00:11:14

So now I'll talk about classical,

00:11:18

handcrafted texture analysis.

00:11:22

It is generally feature extraction from different inputs;

00:11:27

there can be different inputs for classification or for segmentation.

00:11:31

And it's generally feature extraction, to extract descriptors

00:11:37

of the texture, so it can be local binary patterns, filter banks,

00:11:43

wavelets. There has been a lot of work on

00:11:48

handcrafted feature extraction. Then these descriptors, these features,

00:11:54

can be pooled into a global descriptor for classification,

00:11:58

or they can be used as local descriptors for segmentation of texture regions

00:12:04

or for other problems. But these

00:12:08

handcrafted approaches lack abstraction, sometimes invariance,

00:12:13

if it has not been built in, and generalisation to unknown textures. So,

00:12:19

like in most computer vision problems, learning has shown significantly better generalisation and abstraction.

00:12:29

So, deep learning for static texture: why use convolutional networks?

00:12:36

We have learned features instead of handcrafted features, and we have excellent abstraction and generalisation.

00:12:43

A CNN is made of a cascade of filter banks, which is really well suited to

00:12:47

the field of texture analysis: filter banks have been used a lot in texture analysis,

00:12:52

but handcrafted.

00:12:56

The problem is that large and complex shapes emerge in the deepest layers. So

00:13:01

we have very basic blob features in the early layers, but we have complex shapes,

00:13:07

like faces, in the deep layers; this is well known in deep learning.

00:13:14

And the feature maps,

00:13:19

the deep feature maps, maintain the relative spatial arrangement of patterns,

00:13:26

so we maintain the layout of the image. So we know when

00:13:30

a face has the eyes at a given location and the mouth somewhere

00:13:36

underneath. And this is the sort of information that we want to discard for texture analysis, because

00:13:42

this is not informative for the analysis of texture: it's not an object recognition problem, it's a texture.

00:13:48

So these are the two main problems of

00:13:52

classic, normal convolutional networks for texture.

00:13:57

So, as I was saying, a texture has repeated patterns and

00:14:02

we don't really care about the layout of the image,

00:14:06

unlike a face or an object, which is a large object that

00:14:12

takes up the entire field of view. And

00:14:18

it means that the objective is to discard the global shape and

00:14:23

layout analysis of the CNNs, and for this the

00:14:30

usual approach is to use orderless pooling, so to pool local descriptors

00:14:37

into a global descriptor, regardless of

00:14:41

the spatial location of the local descriptors.

00:14:45

And the good thing with orderless pooling is also that

00:14:50

we can then have arbitrary input sizes; this is a consequence of orderless pooling.

00:14:56

We can have any input size to the network, because the feature maps will

00:15:02

be pooled into a single descriptor, so we can have any input size.
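
A minimal sketch of orderless pooling, assuming global average pooling as the pooling operator (other orderless encodings exist) and plain nested lists standing in for feature maps. The function name `global_average_pool` and the toy values are hypothetical; the point is only that the descriptor ignores where a response occurs and has one value per channel whatever the spatial size.

```python
def global_average_pool(feature_maps):
    """Collapse each H x W feature map (a list of rows) to its mean.

    The result has one value per channel, so its length does not depend
    on the spatial size of the input, and it is unchanged when the
    spatial positions of the responses are shuffled.
    """
    descriptor = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        descriptor.append(sum(values) / len(values))
    return descriptor


# Two channels of size 2 x 2.
small = [[[1.0, 3.0], [5.0, 7.0]],
         [[0.0, 0.0], [2.0, 2.0]]]
# Same responses as `small`, but at different spatial positions.
shuffled = [[[7.0, 5.0], [3.0, 1.0]],
            [[2.0, 0.0], [0.0, 2.0]]]
# Two channels of size 3 x 4: a different input size, same descriptor length.
large = [[[2.0] * 4 for _ in range(3)],
         [[1.0] * 4 for _ in range(3)]]

print(global_average_pool(small))     # [4.0, 1.0]
print(global_average_pool(shuffled))  # [4.0, 1.0]
print(global_average_pool(large))     # [2.0, 1.0]
```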

00:15:10

This is a first method in deep learning

00:15:16

for texture analysis: the Fisher vector CNN.

00:15:21

Features are taken from the deep feature maps, so

00:15:28

the output of the last convolution layer, before the fully-connected layers,

00:15:33

and then dictionary learning and encoding by Fisher

00:15:38

vector is performed on top of these

00:15:41

features. So these features still have

00:15:45

spatial information, and with the Fisher vector we will obtain

00:15:52

a global descriptor that discards the spatial information and

00:15:57

characterises texture. Then these Fisher vector descriptors will

00:16:03

be classified using an SVM. So this was a first approach,

00:16:08

which is not trained end-to-end: the CNN is only used

00:16:14

for feature extraction and it's not fine-tuned on a given problem. We only

00:16:19

use a pre-trained network to extract features and perform a Fisher vector on top.

00:16:26

Another approach that was developed was to perform

00:16:31

an average pooling on top of intermediate layers.

00:16:34

Well,

00:16:38

so the idea was to discard the global shape by an average pooling of these

00:16:42

feature maps, so removing convolutional layers and average pooling like this.

00:16:50

It allows a reduced complexity, builds in invariance, and it's applicable to various CNN

00:16:57

architectures.

00:16:59

This is another static texture approach, but I will be quick:

00:17:04

it has this new encoding layer that performs this dictionary learning and

00:17:09

encoding, trained end-to-end within the network.

00:17:13

So now, we have four types of classic dynamic texture approaches:

00:17:20

statistical local descriptors; motion-based, so extracting first the motion

00:17:25

and then deriving statistics from the motion, so motion

00:17:31

extracted by [unclear]; model-based; and filtering or transform. But I will just

00:17:37

skip them because we don't have enough time: so motion-based, model-based, filtering.

00:17:44

But now, deep learning for dynamic texture. The first thing

00:17:49

we can think of is to view the dynamic texture as a volume

00:17:54

and apply 3D CNNs. The problems with that are

00:17:59

heavy computation, due to three-by-three-by-three or

00:18:02

larger kernels, and also larger memory consumption, because of

00:18:07

3D feature maps. For that, I developed

00:18:13

a CNN, so three CNNs applied on three

00:18:17

planes: one spatial plane and two temporal planes.

00:18:22

So these are slices of the texture, and we apply three

00:18:26

texture CNNs, one on each plane,

00:18:29

and we combine the scores, the prediction scores,

00:18:34

of each network for collective classification.
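
The three-plane idea can be sketched as follows. This is an illustration under assumptions, not the speaker's implementation: the sequence is a nested list indexed `volume[t][y][x]`, one slice is taken per plane, and the per-plane class scores are fused by summation, which is one simple late-fusion choice (the talk only says the scores are combined). The names `slice_planes` and `late_fusion` and the score values are hypothetical.

```python
def slice_planes(volume, t, y, x):
    """Return one slice per orthogonal plane of a volume[t][y][x].

    xy: the spatial frame at time t (height x width)
    xt: the temporal slice at image row y (time x width)
    yt: the temporal slice at image column x (time x height)
    """
    xy = volume[t]
    xt = [frame[y] for frame in volume]
    yt = [[row[x] for row in frame] for frame in volume]
    return xy, xt, yt


def late_fusion(score_lists):
    """Sum the per-class scores of the plane networks, return (label, fused)."""
    n_classes = len(score_lists[0])
    fused = [sum(scores[c] for scores in score_lists) for c in range(n_classes)]
    return fused.index(max(fused)), fused


# Toy volume with 2 frames, 2 rows, 3 columns; each value encodes (t, y, x).
volume = [[[100 * t + 10 * y + x for x in range(3)] for y in range(2)]
          for t in range(2)]
xy, xt, yt = slice_planes(volume, t=1, y=0, x=2)
print(xy)  # [[100, 101, 102], [110, 111, 112]]
print(xt)  # [[0, 1, 2], [100, 101, 102]]
print(yt)  # [[2, 12], [102, 112]]

# Hypothetical class scores from the xy, xt and yt networks (2 classes).
label, fused = late_fusion([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
print(label)  # 1
```

In practice each plane would contribute many slices sampled at regular intervals, and each network would be a texture CNN rather than a fixed list of scores.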

00:18:41

On the available datasets, consistently high

00:18:44

accuracy was obtained, ninety percent on one of them.

00:18:48

One thing that we realised was that a single plane, so training on only one plane,

00:18:55

was already obtaining very high accuracy, so there was a good transfer from

00:19:00

a network trained on ImageNet, even to the temporal

00:19:06

slices like this. So there was still a very good transfer of learned features,

00:19:12

but the best results were with the three planes, which shows the complementarity of spatial and temporal analysis.

00:19:20

These are some examples of the classification. Also, large datasets benefit

00:19:25

more from this deep learning approach than small ones,

00:19:29

and a deeper architecture only slightly outperforms a shallow

00:19:34

one like AlexNet, so it shows that the results are saturated

00:19:38

for the available datasets. So we need more challenging and larger dynamic texture

00:19:45

datasets to be able to improve these approaches further and evaluate the deep learning methods.

00:19:53

So the conclusion was that, for texture, the important thing is orderless

00:19:57

pooling of the deep features, removing the layout information and the global shape.

00:20:03

For dynamic texture, we combine spatial and temporal analysis on the three orthogonal planes.

00:20:11

These deep learning methods are outperforming the state of the art on most of the datasets,

00:20:18

but the results are saturated and we need larger and more complex datasets.
