
Transcriptions

Note: this content has been automatically generated.
00:00:02
Okay, good morning everyone. I'll be presenting some recent work on full-gradients and the idea of obtaining complete saliency maps.
00:00:17
I don't think this audience really needs much motivation for why interpretability is important, but for the purposes of image classification: we know that a neural network computes, from an input image, i.e. an n-dimensional vector, a scalar class score, and we would like some information about the computation that goes on inside the neural network when it compresses an n-dimensional vector down to a scalar.
00:00:46
This is important for purposes of debugging: say we want to probe the model to see whether it has learned some undesirable biases that could be impactful during deployment. There are also legal requirements that machine learning algorithms must satisfy, which makes interpretability, or the ability to explain outputs, a very crucial question.
00:01:19
Okay, so in this talk we will look at saliency maps as one approach to interpretability. In this context, a saliency map is a map of the same dimension as the input. We will use the example of image classification, where each coordinate in the saliency map represents the importance that the neural network places on the corresponding pixel in the image. In this example there is a picture of an eagle, and the neural network thinks that the head of the eagle is most informative for making that decision.
00:01:55
Now, most saliency algorithms operate in this manner: they take an image and the neural network, and they try to come up with a map explaining the importances. But we would like to understand exactly how the saliency algorithm itself works, to avoid a situation where we explain one black box, the neural network, with another black box, the saliency algorithm. We would like some basis on which to trust this algorithm, and we would like the algorithm to be more transparent in some sense.
00:02:27
One way to ensure transparency of the saliency algorithm itself is to evaluate it heavily. For example, to test whether the saliency map indeed captures the importances placed by the neural network, we could use the following procedure: if the map says the head of the eagle is most important, we could block out the head of the eagle by some procedure and then check whether the neural network actually fails to recognize the eagle. But this procedure is suspect, because when we block out the head of the eagle we introduce additional artifacts, and those can have effects of their own. When we try to evaluate counterfactuals like this (for example, what would happen if the head of the eagle were not visible), we inadvertently bring additional confounding factors into play, and it is not trivial to see how we can evaluate such counterfactuals cleanly.
00:03:31
Toward this, many works in the literature take the alternate approach, which is to say: here is my saliency map, and it satisfies these desirable properties; you are guaranteed that the saliency map produced by my algorithm will have these properties, which is why it is more transparent. This is similar to the approach that we will take in this talk.
00:03:54
To understand how the saliency-map literature is structured, we can look at some of the algorithms presently in use. In particular, we will look at just three of them and try to understand the trade-offs they make.
00:04:13
The first algorithm is based simply on taking gradients of the neural network. If f is the neural network in question and x is the image that we would like an explanation for, the input-gradient map is the gradient of the output with respect to the input, which is a map of the same size as the input itself. This is advantageous because it has a very clear connection to the neural network function: going back to our example of blocking out parts of the image, the head of the eagle having high importance precisely means that perturbing pixels near the head of the eagle affects the output of the neural network the most, which is a natural consequence of the definition of the gradient. But the problem is that saliency maps obtained from input gradients can sometimes be noisy and uninterpretable, which is undesirable in practice. So we would like to move to approaches that do not have this noise.
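As a minimal sketch of this first method (a toy NumPy network with invented shapes, not the models from the talk), the input gradient of a small two-layer ReLU network can be computed in closed form and checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network: f(x) = w2 . relu(W1 x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
w2, b2 = rng.normal(size=4), rng.normal()

def f(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def input_gradient(x):
    # Chain rule through the ReLU: the gradient flows only through
    # units whose pre-activation is positive
    mask = (W1 @ x + b1 > 0).astype(float)
    return W1.T @ (w2 * mask)

x = rng.normal(size=3)
g = input_gradient(x)  # the "saliency map" for this 3-pixel input

# Sanity check against central finite differences
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(g, numeric, atol=1e-4)
```

In a deep-learning framework the same map would come from automatic differentiation; the point of the sketch is only that the map has the same shape as the input and measures local sensitivity.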
00:05:21
Another approach that people have proposed in the literature is guided back-propagation. The idea here is simply to remove the noise that is introduced in the process of computing the input gradients. Specifically, for the case of ReLU neural networks, where the activation function is given by max(0, input), the gradients are modified as in the equation below: extra indicator terms, masking where the input and where the incoming gradient are positive, are introduced by guided back-propagation. Here we see that they do not use exactly the gradients as mandated by the chain rule, but modify the gradients so as to suppress some additional artifacts that are possibly introduced by the input gradients themselves.
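The modified ReLU backward rule amounts to a one-line change. A minimal NumPy sketch (function names are mine) comparing the ordinary chain rule with the guided-backprop rule:

```python
import numpy as np

def relu_grad_standard(grad_out, pre_act):
    # Ordinary chain rule: pass the gradient where the forward input was positive
    return grad_out * (pre_act > 0)

def relu_grad_guided(grad_out, pre_act):
    # Guided backprop adds one extra mask: negative incoming gradients
    # are zeroed out as well, suppressing "noisy" negative signals
    return grad_out * (pre_act > 0) * (grad_out > 0)

pre_act  = np.array([ 1.5, -0.3,  2.0,  0.7])
grad_out = np.array([-1.0,  0.5,  2.0, -0.2])

g_std = relu_grad_standard(grad_out, pre_act)  # negative gradients survive
g_gbp = relu_grad_guided(grad_out, pre_act)    # only positive paths survive
```

Applying the guided rule at every ReLU during the backward pass is exactly the deviation from the chain rule discussed above.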
00:06:11
This indeed produces somewhat cleaner saliency maps, but the problem is that they no longer have any clear connection to the underlying function that the neural network computes. Indeed, one recent work has shown that guided backprop essentially performs partial recovery of the input image, which is very strange, because we want saliency maps to tell us something about the model itself and not to recover the input for us. So on the one hand these saliency maps are cleaner and possibly more interpretable, but on the other hand it is not really clear what they mean if they have no connection to the underlying function itself.
00:06:48
The third and final method from the literature that I am going to discuss is Grad-CAM. Note that it also falls into this category of heuristic methods that nevertheless work well in practice. The idea behind Grad-CAM is very simple. Say we are using a convolutional neural network, and we would like to visualize its intermediate feature maps. The feature maps are multi-channel, so we cannot really visualize them as they are; we would like to compress a feature map into a single scalar-valued map and visualize that. The compression of the multi-channel feature map into a single scalar map is done by taking into account the importance of each particular channel in the feature map, and determining the importances is done via gradients. The procedure is roughly: compute the intermediate feature maps of the neural network, back-propagate the gradients from the output to that intermediate feature map, perform a channel-wise averaging of the gradients, apply a ReLU, and then obtain the saliency map.
00:08:01
So there are several questions we could ask, such as why the ReLU is used, which layer to use, and why this particular form of averaging. But this procedure has been found to be quite useful in practice, and it does indeed produce very interpretable saliency maps. Again, we can ask several questions about the specific choices made in the paper, and also about the fact that it applies specifically to convolutional neural networks, but the fact that these maps work so well justifies their use in practice, in some sense.
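The core of the procedure described above fits in a few lines. A NumPy sketch of the weighting-and-ReLU step only (the choice of layer and the upsampling to image size are omitted, and the shapes are invented):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Grad-CAM core step.

    feature_maps: activations of a chosen conv layer, shape (C, H, W)
    grads:        d(class score)/d(feature_maps), same shape
    """
    # Channel importances: global-average-pool the gradients per channel
    weights = grads.mean(axis=(1, 2))                  # shape (C,)
    # Weighted sum over channels collapses (C, H, W) -> (H, W)
    cam = np.tensordot(weights, feature_maps, axes=1)
    # ReLU keeps only regions that push the class score up
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 7, 7))
grads = rng.normal(size=(8, 7, 7))
cam = grad_cam(fmap, grads)  # a single 7x7 map, ready to upsample
```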
00:08:33
Okay, so having looked at some of these methods in the literature, let us move to the method that we are proposing, called full-gradients. In all the methods discussed so far, we were always wondering whether the saliency map indeed captures all of the computation performed by the neural network: we said the gradient has some additional noise, guided backprop tries to remove it, and so on. But is there a method through which we can actually say for sure that the saliency map captures all the computation done by the neural network?
00:09:06
One guideline through which we could sort of achieve this is a property called completeness. It has been proposed by various works in the literature, and the basic idea is that a saliency map must encode the output of the neural network in an exact mathematical sense. For instance, say we have a saliency map: if we simply add up the numbers in the saliency map, the resulting scalar must be exactly equal to the neural network output. This is an idea used by some works in the literature, but achieving it is not easy. For example, some works achieve it by adding custom back-propagation rules, which, as we already saw, may not be a very good idea; others achieve completeness only with respect to a baseline, so you are required to provide another reference image, and you get completeness relative to that baseline and not absolute completeness in some sense.
00:10:02
Okay, so this is a guideline, a desirable property, which we would like to satisfy; let us see how to do that. Say we have a ReLU neural network f. It turns out that the following equation holds exactly, and it is not hard to show: the function can be decomposed as

f(x) = ∇x f(x) · x + Σi (∂f/∂bi) · bi,

a sum of two kinds of terms, one containing the input sensitivity and the others containing the sensitivities of the neurons. The importance of the input is given by the input gradients, and the neuron sensitivity is the gradient of the output with respect to each bias bi in the neural network. Here the bi are not only the explicit bias parameters that we train in neural networks, but also implicit biases such as the batch-normalization running averages and so on.
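This decomposition can be verified numerically on a toy network (a minimal NumPy sketch with a single hidden layer; the actual result holds for general ReLU architectures, including the implicit biases mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU network: f(x) = w2 . relu(W1 x + b1) + b2
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
w2, b2 = rng.normal(size=5), rng.normal()

def f(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=3)
z = W1 @ x + b1
mask = (z > 0).astype(float)

input_grad = W1.T @ (w2 * mask)  # df/dx
bias_grad1 = w2 * mask           # df/db1 (hidden-layer biases)
bias_grad2 = 1.0                 # df/db2 (output bias)

# Full-gradient decomposition, exact for ReLU networks:
#   f(x) = (df/dx) . x  +  sum_i (df/db_i) * b_i
recon = input_grad @ x + bias_grad1 @ b1 + bias_grad2 * b2
assert np.isclose(recon, f(x))
```

The identity holds because a ReLU network is piecewise linear: within the active linear region, the function equals its first-order expansion in the input and the biases.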
00:10:55
This completely accounts for the neural network output, so it means that if we are somehow able to visualize the input sensitivities and the neuron sensitivities, then we would have a complete explanation of the neural network output. Some preliminary work on this is available, in case you would like to check it. One thing to note is that the resulting object is no longer a saliency map; it is more expansive than that, because the input gradients together with the gradients with respect to all the neurons contain more information than can be compressed into something only as large as the image itself.
00:11:40
Okay. What is specific to convolutional neural networks is that the gradients with respect to the intermediate neurons can themselves be visualized as feature maps. In this example we use the VGG-16 network. It turns out that the output of the network can be written exactly as a sum of a few terms: the first term is the input gradient times the input, i.e. the input-gradient contribution, and the other terms correspond to the gradients with respect to the intermediate biases multiplied by the biases themselves. In this example, the first such term is the saliency map aggregated across layer one, and so on. This figure tells us that if we add up all the numbers in these saliency maps, the total will precisely correspond to the function output in an exact mathematical sense. But the problem is that we obtain sixteen maps instead of one, which is not really a useful interpretability tool.
00:12:48
One way to deal with this is to aggregate all these maps into a single map, and we can do that by simply interpolating the maps, in particular the bias-gradient parts, so that they correspond to the same size as the image. Note that these maps are smaller because of the max-pooling inside the network, which reduces the spatial size of the activations and so on.
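The aggregation step can be sketched as follows. This is a simplified NumPy toy (nearest-neighbour interpolation, absolute values; the exact interpolation and post-processing choices in the actual method may differ), with invented map sizes:

```python
import numpy as np

def upsample_nearest(m, out_h, out_w):
    # Nearest-neighbour interpolation of a 2-D map up to the input resolution
    rows = np.arange(out_h) * m.shape[0] // out_h
    cols = np.arange(out_w) * m.shape[1] // out_w
    return m[np.ix_(rows, cols)]

def aggregate(input_grad_map, bias_grad_maps):
    """Sum the input-gradient map and all per-layer bias-gradient maps
    (each interpolated to the input size) into one saliency map."""
    H, W = input_grad_map.shape
    saliency = np.abs(input_grad_map).astype(float)
    for m in bias_grad_maps:  # per-layer maps are spatially smaller
        saliency += upsample_nearest(np.abs(m), H, W)
    return saliency

rng = np.random.default_rng(0)
sal = aggregate(rng.normal(size=(32, 32)),
                [rng.normal(size=(16, 16)), rng.normal(size=(8, 8))])
```

Note that once the maps are rectified and summed this way, the exact completeness identity no longer literally holds; the aggregated map trades exactness for a single visualizable image.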
00:13:09
Okay, so let us look at a few full-gradient saliency maps, for an image containing an eagle and one containing an umbrella. We visualize the full gradients and just the bias-gradient part, and compare them with the Grad-CAM and input-gradient baselines. Immediately we see that the bias gradients, and even the full gradients, are qualitatively similar to Grad-CAM, but they are also more detailed. For instance, we see here that the head of the eagle seems to be more important for classification than the wings, but Grad-CAM does not seem to distinguish between these two aspects. Similarly, in the image below, it seems that the person walking on the street also gets some attribution and could possibly be a distractor, but Grad-CAM actually ignores this aspect.
00:14:03
So all this is good, but we would also like to see whether the proposed maps actually capture the computation done by the neural network. One way to check this was proposed recently by the paper called "Sanity checks for saliency maps". The idea is very simple: if we take a model that we have trained and add noise to, or randomize, one of its layers, so that the model becomes completely random, we would like the saliency maps to also reflect the fact that the model is now useless. Surprisingly, many saliency methods in the literature do not satisfy this property. For instance, take guided backprop in the example above: for the first image, my trained model performs well; for the second image, I added noise to an intermediate layer such that the model performs almost at random. The question is whether the two guided-backprop visualizations are distinguishable, and they turn out to be virtually interchangeable.
00:15:04
so this means that the sail into maps don't really captured uh the competition and by the model in a meaningful sense
00:15:10
a similar is true for input gradients uh but there's no you dread already qualitative level
00:15:15
uh if we actually go and compare the numbers it and so that input into do change
00:15:20
but we surely would not see a a much of a change in in this aspect
00:15:24
but fortunately the map that we introduce and backgammon produces do
00:15:28
not suffer from this property if bill a model performs at random
00:15:32
um and then uh uh the attribution maps also changed to reflect this fact
00:15:37
so this is a good say a sanity check that must be done for an easy in the map
00:15:41
to verify that it's indeed capturing aspects of the model and not recovering aspects of the input itself
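The model-randomization check can be sketched in a few lines (a NumPy toy with invented shapes, using input gradients as the saliency method; as noted above, input gradients do change numerically under randomization even when the visual change is subtle):

```python
import numpy as np

rng = np.random.default_rng(0)

def input_gradient(x, W1, b1, w2):
    # Input gradient of the toy network f(x) = w2 . relu(W1 x + b1)
    mask = (W1 @ x + b1 > 0).astype(float)
    return W1.T @ (w2 * mask)

W1, b1, w2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=8)
x = rng.normal(size=4)
sal_trained = input_gradient(x, W1, b1, w2)

# Sanity check: re-initialise (randomize) a layer and recompute the map
W1_rand = rng.normal(size=W1.shape)
sal_random = input_gradient(x, W1_rand, b1, w2)

# A saliency method that passes the check should give a clearly
# different map once the model is destroyed
assert not np.allclose(sal_trained, sal_random)
```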
00:15:48
Okay, so just to conclude: we have proposed the full-gradient decomposition of the function, and with it a more expressive saliency-map representation for CNNs. We also stated that completeness is not a sufficient property, just a guideline, for obtaining good saliency maps, and future work will focus on identifying more precise requirements for saliency maps.

What do neural network saliency maps encode?
Suraj Srinivas, Idiap Research Institute
3 May 2019 · 10:53 a.m.