Transcriptions

Note: this content has been automatically generated.
00:00:00
I did my PhD at Dublin City University, where I am now working, with funding from [inaudible].
00:00:06
So my talk will be structured as follows:
00:00:09
first, I will talk about natural vision and motion analysis to
00:00:15
make a link with the topic. Then I will introduce static and dynamic textures. I will
00:00:22
present the handcrafted, classical approaches for texture analysis
00:00:27
and the deep learning methods that have been developed recently. Then I will introduce
00:00:32
classical handcrafted dynamic texture methods, and finally deep
00:00:38
learning approaches for dynamic textures.
00:00:42
So,
00:00:44
I will very briefly talk about natural vision basics, even if other people
00:00:50
may have introduced it before. Neurons in the brain
00:00:57
fire when visual stimuli appear within their receptive field.
00:01:02
The first cells in vision to receive
00:01:08
information are the simple cells
00:01:14
in V1, the primary visual cortex, and they are akin to the
00:01:21
Gabor filters used in texture analysis and other computer vision methods.
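(As an illustration of these Gabor filters, here is a minimal sketch of a Gabor filter bank for texture features; it assumes numpy and scipy, and the kernel parameters are illustrative, not taken from the talk.)

```python
# A minimal sketch of a Gabor filter bank for texture features.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size=21, wavelength=6.0, theta=0.0, sigma=4.0, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian-windowed sinusoid, similar to
    the receptive fields of simple cells in V1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_features(image, n_orientations=8):
    """Mean absolute filter response per orientation: a tiny texture descriptor.
    (FFT-based filtering would be faster; plain convolution is clearer.)"""
    feats = []
    for k in range(n_orientations):
        kern = gabor_kernel(theta=k * np.pi / n_orientations)
        resp = convolve2d(image, kern, mode="valid")
        feats.append(np.abs(resp).mean())
    return np.array(feats)
```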
00:01:28
Then complex cells receive input from different simple cells and
00:01:32
build spatial invariance from these more basic
00:01:39
cells. Likewise, in the other areas of the visual cortex,
00:01:47
higher-order complex neurons detect
00:01:53
more abstract, invariant and complex patterns in the visual field.
00:02:00
So these are the basics of vision; now, how
00:02:04
motion analysis is done in the brain.
00:02:08
Motion analysis is vital for survival and interaction
00:02:12
with the environment, for animals, insects and humans.
00:02:16
Insects are blind to non-moving scenes; they respond mostly
00:02:20
to the moving parts in the visual field.
00:02:24
Various experiments have shown the
00:02:28
accurate motion recognition of insects and animals.
00:02:31
For instance, it has been shown that
00:02:34
butterflies are attracted to fake butterflies made of paper,
00:02:41
regardless of the shape or the colour of the wings, but based on
00:02:47
the motion of the wings. Bees have been shown to
00:02:54
go towards oscillating flowers twice as fast as towards non-moving flowers.
00:03:00
There are a lot of experiments like this. Humans, too, have been shown to be
00:03:04
able to recognise very complex movements just by placing lights on the body:
00:03:11
looking only at the lights, not at the shape of the body, people can recognise
00:03:18
movements such as walking or running, but also particular people, particular actors that are
00:03:24
performing, if they know the people. So humans are very good at motion recognition.
00:03:29
Conversely, akinetopsia is the human inability to perceive motion, a sort of motion blindness,
00:03:35
and it has been shown to be possibly due to lesions
00:03:39
in cortical area V5, which confirms the role of this area in motion perception.
00:03:46
So, how is motion analysed in the brain? We have direction-selective cells:
00:03:52
these direction-selective cells sit directly behind the eye.
00:03:58
Rather than one cell that responds to any stimulus, there are lots of different
00:04:01
cells that respond to stimuli in a specific direction, so each cell is specific to a given direction.
00:04:07
And there is a model for this.
00:04:11
The Reichardt detector, a model for these direction-selective cells, works as follows:
00:04:20
take two points in the field of view (here for a horizontal movement); the signal at the first
00:04:28
point at a given time is compared to the signal at another point in a given direction,
00:04:35
with a delay in time, and then the subtraction of the
00:04:41
two correlated signals provides a positive output only for motion in
00:04:47
the preferred direction: a positive output means that the stimulus
00:04:51
has moved from this point to that point; for the opposite direction it is exactly the same, mirrored.
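(A minimal sketch of this correlation-type detector, assuming 1-D photoreceptor signals in numpy; the opponent form described above is the standard one, but the sampled stimulus is illustrative.)

```python
# A minimal sketch of a Reichardt-style correlation motion detector.
import numpy as np

def reichardt(left, right, delay=1):
    """Opponent Reichardt detector for two adjacent receptor signals.
    Positive mean output = motion from left to right; negative = the reverse."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    # one subunit: delayed left signal correlated (multiplied) with right signal
    a = left[:-delay] * right[delay:]
    # mirror subunit: delayed right signal correlated with left signal
    b = right[:-delay] * left[delay:]
    return np.mean(a - b)  # opponent subtraction: signed, direction-selective

# Usage: a bright edge passing left -> right reaches `left` one step before `right`.
stimulus = np.array([0, 0, 1, 1, 0, 0, 0], float)
print(reichardt(left=stimulus, right=np.roll(stimulus, 1)))  # > 0: left-to-right motion
```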
00:04:56
An interesting thing is that humans can train their motion detection:
00:05:00
by training to recognise a given motion in a given
00:05:04
direction, people can improve their detection in this direction. So there is
00:05:10
a lot of learning possible in this visual system.
00:05:16
Now, the different types of motion: we can classify three main types of motion. One is localised in
00:05:20
space and time: in a sequence of frames, there is some motion that happens
00:05:28
locally in space and time, like a door opening; it is short in
00:05:32
time and localised in the field of view.
00:05:37
An activity is described as localised in space and periodic in time,
00:05:44
like a person walking or running: it is continuous in time,
00:05:49
typically periodic in time, but localised in space. And dynamic texture, which
00:05:53
I will mostly talk about, is localised neither in space nor in time:
00:05:57
in a video of grass or trees, the motion is not localised
00:06:04
in space or in time. That is what I will focus on.
00:06:11
First, what is texture? A static texture is an image feature used in computer vision,
00:06:18
based on the spatial relationships between pixel intensities. It is opposed to colour,
00:06:23
which is based on the distribution of the pixel intensities, like a histogram;
00:06:28
in texture, we are interested in the spatial relationship between these intensities.
00:06:34
It is usually characterised by spatial repetitivity: we have a certain repetitivity of patterns,
00:06:42
with some variability of the patterns themselves and of the spatial
00:06:50
relationship between these patterns. It is also often characterised by high tonal variation, or coarseness:
00:06:56
parts of the image display a high variation of pixel intensity. These are the usual characteristics
00:07:02
of static texture images.
00:07:05
What is a dynamic texture? It is the extension of a static texture to the spatio-temporal domain,
00:07:12
so it is a sequence of frames:
00:07:14
a texture of moving textures. The idea is to capture temporal variation on top of the
00:07:21
static texture: spatio-temporal variation such as motion and deformation of the textures.
00:07:29
It is also characterised by spatial repetitivity, like a static
00:07:33
texture, but also by a stationarity property in time.
00:07:36
The stationarity property means that it is a stochastic process with unchanged statistics across time:
00:07:44
a random process that has constant statistics across time.
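(A minimal sketch of this temporal-stationarity idea, assuming a dynamic texture stored as a (T, H, W) numpy array; the chosen statistics and threshold are illustrative.)

```python
# A minimal sketch of checking temporal stationarity of a video's statistics.
import numpy as np

def looks_stationary(video, tol=0.05):
    """Crude check: per-frame mean and std should stay nearly constant in time
    for a temporally stationary dynamic texture (e.g. waves, foliage)."""
    means = video.mean(axis=(1, 2))              # one mean per frame
    stds = video.std(axis=(1, 2))                # one std per frame
    drift = means.std() / (means.mean() + 1e-8)  # relative fluctuation over time
    spread = stds.std() / (stds.mean() + 1e-8)
    return drift < tol and spread < tol

video = np.random.rand(50, 64, 64)  # i.i.d. noise: trivially stationary in time
print(looks_stationary(video))      # True
```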
00:07:52
These are examples of dynamic textures: natural dynamic textures like smoke, trees and waves,
00:07:58
and non-natural dynamic textures like traffic and crowds. In medical imaging,
00:08:04
they can come, for instance, from ultrasound, where the evolution of
00:08:10
a region across time can also produce some dynamic texture.
00:08:18
The properties of a dynamic texture: spatially, like a static texture, it
00:08:25
ranges from very regular to stochastic; so from very
00:08:32
regular patterns, with a regular spatial arrangement of the patterns,
00:08:38
it ranges all the way to a completely stochastic
00:08:42
texture.
00:08:44
Temporally it is the same: it ranges from rigid motion, with translations like
00:08:51
the translation of clouds, or rotations, like a windmill,
00:08:57
to non-rigid motion, like the diffusion of smoke, and it can also be characterised by periodicity.
00:09:04
So there is a large range of different types of dynamic textures.
00:09:10
There are different types of analysis, the most common being classification,
00:09:17
then segmentation, and the synthesis of dynamic
00:09:22
textures, and shape from texture, just as with static textures.
00:09:27
The applications in computer vision include remote monitoring and surveillance of,
00:09:34
for example, forest fires, monitoring of the traffic on highways and in cities,
00:09:42
videos of crowds, and remote sensing, with the analysis of satellite imagery.
00:09:49
In medical imaging, it can be used for segmentation,
00:09:53
diagnosis, prognosis, prediction and analysis in different
00:09:58
applications. And the increasing amount of video data makes the analysis of dynamic texture really important:
00:10:07
smartphones, surveillance data, medical data and robotics produce a lot of video
00:10:13
data that we can use for this.
00:10:19
Now, the challenges in static and dynamic texture analysis:
00:10:25
the first is the diversity and complexity of the textures.
00:10:30
The diversity can be due to the position of the camera, to the point of view,
00:10:37
and to the number of different natural phenomena;
00:10:41
it creates a high intra-class variation.
00:10:45
There are also fine-grained tasks of recognising similar textures, where the difference
00:10:51
between classes can be subtle: so there is sometimes also little inter-class variation.
00:10:56
Other problems, like spatial distortion of the texture, have to be taken into account; hence the need for,
00:11:03
among others, invariance to geometric transformations and
00:11:08
high abstraction in the analysis.
00:11:14
So now I will talk about the classical,
00:11:18
handcrafted texture analysis.
00:11:22
It is generally feature extraction, with different inputs;
00:11:27
it can take different inputs, for classification or for segmentation,
00:11:31
and it is generally feature extraction to extract descriptors of the
00:11:37
texture: these can be local binary patterns, filter banks,
00:11:43
wavelets; there has been a lot of work on
00:11:48
handcrafted feature extraction. These descriptors, these features,
00:11:54
can be put into a global descriptor for classification,
00:11:58
or they can be used as local descriptors for segmentation of texture regions
00:12:04
or for other problems.
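(As an illustration of the local binary patterns mentioned above, a minimal sketch of the basic 8-neighbour LBP histogram descriptor; it assumes a grayscale numpy image and omits the uniform and rotation-invariant refinements.)

```python
# A minimal sketch of the basic local binary pattern (LBP) texture descriptor.
import numpy as np

def lbp_histogram(img):
    """Each interior pixel gets an 8-bit code: one bit per neighbour that is
    >= the centre. The normalised 256-bin histogram is the texture descriptor."""
    img = np.asarray(img)
    c = img[1:-1, 1:-1]  # centre pixels
    # 8 neighbours, clockwise from top-left, each weighted by a power of two
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (neigh >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()  # orderless global descriptor, e.g. for an SVM
```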
00:12:08
But these handcrafted approaches lack abstraction, sometimes invariance
00:12:13
if it is not built in, and generalisation to unknown textures;
00:12:19
as in most computer vision problems, learning shows significantly better generalisation and abstraction.
00:12:29
So, for deep learning on static textures, the obvious way is to use convolutional networks:
00:12:36
we have learned features instead of handcrafted features, and we have excellent abstraction and generalisation.
00:12:43
A CNN is made of a cascade of filter banks, which is really well suited
00:12:47
to texture analysis: filter banks have been used a lot in texture analysis,
00:12:52
but handcrafted ones.
00:12:56
The problem is that large and complex shapes emerge in the deepest layers:
00:13:01
we have very basic, Gabor-like features in the early layers, but complex shapes,
00:13:07
really high-level shapes such as faces, in the deepest ones; this is well known in deep learning.
00:13:14
And these feature maps,
00:13:19
these deep feature maps, maintain the relative spatial arrangement of patterns,
00:13:26
so we maintain the layout of the image: the network knows
00:13:30
that a face has the eyes at a given location and the mouth somewhere
00:13:36
underneath. And this is the sort of information that we want to discard for texture analysis, because it is
00:13:42
not informative for analysing texture: it is not an object recognition problem, it is a texture.
00:13:48
So those are the two main problems of
00:13:52
classical convolutional networks for texture.
00:13:57
Again, as we said, a texture has repeated patterns, and
00:14:02
we do not really care about the layout of the image,
00:14:06
unlike a face or an object, which is a large object that
00:14:12
fills the entire field of view.
00:14:18
This means that the objective is to discard the global shape and
00:14:23
layout analysis of the CNNs. For this, the
00:14:30
usual approach is to use orderless pooling: to pool the local descriptors
00:14:37
into a global descriptor, regardless of
00:14:41
the spatial location of the local descriptors.
00:14:45
Another good thing with orderless pooling is that we can
00:14:50
then have arbitrary input sizes; this is a consequence of orderless pooling:
00:14:56
we can feed any input size to the network, because the feature maps will
00:15:02
be pooled into a single descriptor, so we can have any input size.
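(A minimal sketch of orderless pooling via global average pooling, assuming PyTorch; the small conv stack is illustrative, not a model from the talk. Note how the same network accepts arbitrary input sizes.)

```python
# A minimal sketch of an orderless (globally average-pooled) texture CNN.
import torch
import torch.nn as nn

class OrderlessTextureNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(           # a small learned filter-bank cascade
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)      # averages each map over all locations
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):
        f = self.features(x)                     # (B, 64, H, W): layout still present
        g = self.pool(f).flatten(1)              # (B, 64): spatial layout discarded
        return self.fc(g)

net = OrderlessTextureNet()
# Any input size yields the same 64-D descriptor, a consequence of the pooling:
print(net(torch.rand(1, 3, 224, 224)).shape, net(torch.rand(1, 3, 97, 131)).shape)
```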
00:15:10
This is a first method in deep learning
00:15:16
for texture analysis: the Fisher vector CNN.
00:15:21
Features are taken from the deep feature maps,
00:15:28
the output of the last convolutional layer, before the fully connected layers,
00:15:33
and dictionary learning and encoding by Fisher
00:15:38
vector are performed on top of these
00:15:41
features. These features still have
00:15:45
spatial information, and the Fisher vector encoding will produce
00:15:52
a global descriptor that discards the spatial information and
00:15:57
characterises the texture. These Fisher vector descriptors are then
00:16:03
classified using an SVM. So this was a first approach,
00:16:08
which is not trained end-to-end: the CNN is only used
00:16:14
for feature extraction; it is not fine-tuned on a given problem, it only
00:16:19
uses a pretrained network to extract features and performs a Fisher vector encoding on top.
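(A minimal sketch of the Fisher vector encoding step, assuming scikit-learn for the GMM dictionary; the random descriptors stand in for columns of a conv feature map, and the code follows the standard improved-FV formulas rather than the paper's implementation.)

```python
# A minimal sketch of Fisher vector encoding over local CNN descriptors.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    """X: (N, D) local descriptors; returns a 2*K*D orderless global descriptor."""
    N, D = X.shape
    g = gmm.predict_proba(X)                                 # (N, K) soft assignments
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_  # diagonal covariances
    fv = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / np.sqrt(var[k])       # whitened residuals
        # gradients w.r.t. the Gaussian's mean and (diagonal) variance
        d_mu = (g[:, k:k + 1] * diff).sum(0) / (N * np.sqrt(w[k]))
        d_var = (g[:, k:k + 1] * (diff**2 - 1)).sum(0) / (N * np.sqrt(2 * w[k]))
        fv.extend([d_mu, d_var])
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalisation

# Usage: fit the "dictionary" (GMM) on pooled descriptors, then encode one image.
descriptors = np.random.rand(500, 64)              # e.g. 64-D conv features
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(descriptors)
print(fisher_vector(descriptors, gmm).shape)       # (2 * 8 * 64,)
```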
00:16:26
Another approach, which we developed, was to perform
00:16:31
average pooling on top of intermediate layers
00:16:34
as well.
00:16:38
The idea was to discard the global shape by average pooling these
00:16:42
feature maps: discarding the deepest convolutional layers and average pooling like this.
00:16:50
It allows a reduced complexity and built-in invariance, and it is applicable to any CNN
00:16:57
architecture.
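(A minimal sketch of average pooling on top of an intermediate layer, assuming PyTorch/torchvision; truncating AlexNet is illustrative of the idea, not a reproduction of the talk's architecture.)

```python
# A minimal sketch of average-pooling an intermediate conv layer for texture.
import torch
import torch.nn as nn
from torchvision.models import alexnet

class PooledIntermediate(nn.Module):
    def __init__(self, cut=8, n_classes=10):
        super().__init__()
        base = alexnet(weights=None)
        self.trunk = base.features[:cut]        # keep only the earlier conv layers
        self.pool = nn.AdaptiveAvgPool2d(1)     # average "energy" of each feature map
        with torch.no_grad():                   # probe the channel count at the cut
            c = self.trunk(torch.zeros(1, 3, 224, 224)).shape[1]
        self.classifier = nn.Linear(c, n_classes)

    def forward(self, x):
        f = self.trunk(x)                       # intermediate maps, layout intact
        g = self.pool(f).flatten(1)             # per-map average: orderless, compact
        return self.classifier(g)

print(PooledIntermediate()(torch.rand(2, 3, 224, 224)).shape)  # torch.Size([2, 10])
```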
00:16:59
This is another static texture approach, but I will go over it quickly:
00:17:04
it introduces a new encoding layer that performs this dictionary learning and
00:17:09
encoding end-to-end within the network.
00:17:13
So now, the four types of classical dynamic texture approaches:
00:17:20
statistical, with local descriptors; motion-based, extracting first the motion
00:17:25
and then deriving statistics from the motion;
00:17:31
model-based; and filtering or transform-based. But I will just skip
00:17:37
them, because we do not have enough time.
00:17:44
Now, deep learning for dynamic textures. The first thing
00:17:49
we can think of is to view the dynamic texture as a volume
00:17:54
and apply 3D CNNs. The problems with that are the
00:17:59
heavy computation due to the 3x3x3 or
00:18:02
larger kernels, and also the larger memory consumption because of
00:18:07
the 3D feature maps. For that, I developed an approach with
00:18:13
texture CNNs: three 2D CNNs applied on three
00:18:17
planes, one spatial plane and two temporal planes.
00:18:22
These are slices of the texture volume, and so we apply three
00:18:26
texture CNNs, one on each plane,
00:18:29
and we combine the prediction scores
00:18:34
of each network for a collective classification.
00:18:41
On the available datasets, consistently high
00:18:44
accuracy was obtained: ninety percent on one of them.
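(A minimal sketch of the three-orthogonal-planes idea, assuming PyTorch; `plane_cnn` is any 2-D classifier that accepts variable input sizes, and taking a single central slice per plane is a simplification of the talk's method.)

```python
# A minimal sketch of classifying a dynamic texture with 2-D CNNs on
# three orthogonal planes of the video volume.
import torch

def three_plane_logits(video, plane_cnn):
    """video: (C, T, H, W). Average the prediction scores of one xy (spatial),
    one xt and one yt (temporal) slice, each treated as a 2-D image."""
    C, T, H, W = video.shape
    xy = video[:, T // 2, :, :]                # spatial slice, (C, H, W)
    xt = video[:, :, H // 2, :]                # temporal slice, (C, T, W)
    yt = video[:, :, :, W // 2]                # temporal slice, (C, T, H)
    scores = [plane_cnn(p.unsqueeze(0)) for p in (xy, xt, yt)]
    return torch.stack(scores).mean(0)         # combined (collective) score

# Usage with any 2-D classifier that handles variable input sizes, e.g. the
# orderless-pooled network sketched earlier:
# logits = three_plane_logits(torch.rand(3, 50, 64, 64), OrderlessTextureNet())
```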
00:18:48
One thing that we realised was that single planes, i.e. training on only one plane,
00:18:55
also obtained very high accuracy; so there was a good transfer from
00:19:00
a network trained on ImageNet to the temporal
00:19:06
slices like these: there was still a very good transfer of the learned features.
00:19:12
But the best results were with the three planes, which provide a complementarity of spatial and temporal analysis.
00:19:20
These are some examples of the classification. Also, large datasets benefit
00:19:25
more from the deep learning approach than shallow and small ones,
00:19:29
and a deeper architecture only slightly outperforms a shallower
00:19:34
one like AlexNet, which shows that the results are
00:19:38
saturated on the available datasets. So we need more challenging and larger dynamic texture
00:19:45
datasets, to be able to improve these approaches further and to evaluate the deep learning methods.
00:19:53
So, in conclusion: for texture, the important thing is an orderless
00:19:57
pooling of the deep features, removing the layout information and the global shape;
00:20:03
dynamic textures combine spatial and temporal analysis on the three orthogonal planes;
00:20:11
and these deep learning methods obtained the state of the art on most of the dynamic texture datasets,
00:20:18
but the results on these datasets are saturated, and we need larger and more complex ones.
