Transcriptions

Note: this content has been automatically generated.
00:00:00
Hi everybody. I'm going to talk about
00:00:04
concept measures to explain deep learning predictions in medical imaging.
00:00:09
But first, let me start by saying that interpretability
00:00:13
does not necessarily coincide with information. I really like
00:00:17
this example from the paper I am citing here:
00:00:21
at the bottom I'm showing the hex dump of a picture of a lion,
00:00:25
and the picture of the lion itself. It was really cool to see lions earlier during the presentation.
00:00:30
You can see that one format has
00:00:33
much more information than the other; still, we need
00:00:38
the human-readable format to be able to interpret the picture and,
00:00:42
eventually, to process it in time with our human brain. So,
00:00:49
in my vision of interpretability of deep learning, I like to think
00:00:56
of it as the ability to explain, or to present in understandable terms, to a human.
00:01:02
We can look at it in this way: we can think that
00:01:05
the model representation space is a
00:01:08
vectorial space of input pixels or activations,
00:01:13
while the human representation space is instead made of
00:01:16
high-level concepts, and could be represented by another vectorial space.
00:01:20
So what we are actually trying to solve here is the question of finding a mapping from one
00:01:27
space, which is the model representation space,
00:01:31
to the other space, which is the human representation space.
00:01:34
This mapping can be just a function, and finding it is essentially the interpretability task,
00:01:40
which can be solved post hoc by an existing model: it can be solved, after training
00:01:46
our network, by a different model. And if we think of the
00:01:50
easiest baseline we could use, a linear model could represent a suitable baseline.
00:01:58
Also, let me just say that when we try to explain,
00:02:02
or to present in understandable terms to a human, we should also consider that not all humans may be
00:02:06
familiar with machine learning, so we really want to use
00:02:09
something that is understandable to a wider public. Now,
00:02:16
if you just google what a lion is, you can see that conceptual
00:02:20
descriptions are used to identify object categories. For example, on Wikipedia
00:02:26
a lion is described as a muscular, deep-chested cat with a short, rounded head, a reduced neck
00:02:32
and round ears, and a hairy tuft at the end of the tail. This is the description you
00:02:36
have in mind already, because you can easily transition from one to the other.
00:02:43
And now I also wanted to point out that if you
00:02:47
look at this dataset where people are asked to quickly draw
00:02:52
what is, for them, a class or object category, you would see
00:02:56
that some of these concepts are actually reflected in the drawings made by the people:
00:03:01
you can see that most of the lions here have a rounded head
00:03:06
and round ears; there are some tufts here and there, but not many.
00:03:12
And this is sort of what the idea was when
00:03:14
visual attributes, back in 2010,
00:03:19
were used to describe features in ImageNet. In
00:03:24
this approach, concepts were used in a binary
00:03:28
way: you can just think of a concept as being present or not present in the image.
00:03:34
For example, for this image of the lion, we have a bounding
00:03:37
box representing where the concept is located, and you can see that, over
00:03:43
twenty-five samples, the people who were labeling could put plus one
00:03:49
if the concept was present in the image, minus one if the concept was not present, and,
00:03:54
still, when unsure, they would assign zero. So for all the twenty-five images of a lion
00:04:00
the concept "round" was always present; you can see the plus twenty-five in the counting.
00:04:06
Instead, for other concepts like black or blue, the concept was not present,
00:04:13
so it has minus twenty-five; and then there are concepts like "rough"
00:04:18
where people were often unsure, so the concept has a zero count.
00:04:23
But how about, for example, what we read before, that the lion
00:04:27
is a cat with a deep chest, a rounded head, a small neck and round ears?
00:04:31
You see here that it is not only about the elements that form
00:04:34
the image, but also, somehow, about their size or their contribution.
00:04:40
And this is the idea behind our approach,
00:04:44
which is learning continuous concept measures
00:04:48
and using them to explain deep learning predictions.
00:04:51
Now, traditionally, concept learning is posed as a binary classification task, in
00:04:55
which we assess the presence or absence of the concept.
00:04:59
Our proposition is that you can instead use regression to model the transition from small to
00:05:05
large concept values. For example, I like this image because you have to consider the cat:
00:05:12
you know that from a cat you can become a lion by
00:05:16
increasing the chest size. Of course there are other elements as well that contribute, but this is one of them.
00:05:22
And this classification or regression can also be solved on
00:05:28
the features of a DNN. There was previous work by a
00:05:33
group in Google Brain, and we extended that work with
00:05:36
the idea of regression; I'm pointing to the papers there.
00:05:41
So the idea of concept attribution to explain deep learning is the following: we are given
00:05:48
the decision function f, which could be a deep neural network; we have
00:05:53
a set of concepts, let's say Q concepts, that can be either binary or continuous;
00:05:59
and we have access to the internal activations
00:06:02
before a layer, Φ(x), for a given input x.
00:06:07
Now, the first thing we have to do is that
00:06:10
we should find vectors which are representative of the concepts.
00:06:17
I will explain just later how we find these vectors, but
00:06:21
if we are using binary concepts, we will have to solve a binary classification task
00:06:27
on the activation space, on the Φ(x), for the set of training images,
00:06:33
and we will have to solve a regression task instead for continuous-valued concepts.
00:06:40
The idea of concept attribution is basically to find, at the end of the story,
00:06:45
an attribution vector that, given the activations
00:06:49
at a layer and the vectors representative of the concepts,
00:06:54
gives us a vector a_1, ..., a_Q, where Q is the number of concepts, and
00:07:00
each a_i is basically the contribution of
00:07:03
each concept to the classification prediction. So,
00:07:08
to put it back in the example of a
00:07:10
lion: if we have that concept one is the depth of the chest,
00:07:17
the attribution a_1 would be how much the depth of the chest influences the prediction.
00:07:25
And here is more detail about the idea of concept attribution with
00:07:29
regression concept vectors. The baseline is that continuous-valued concept
00:07:34
measures are modeled by linear regression in the activation space of a layer. So,
00:07:37
basically, we build a dataset of training images
00:07:42
that contain one object of interest, and this
00:07:47
object can be segmented either manually or automatically.
00:07:51
From this segmentation we can extract a set of handcrafted features,
00:07:56
like, I don't know, texture descriptors, or statistics of the shape
00:08:00
of the object in the image, like the eccentricity
00:08:03
of the ellipse, or the area, or the perimeter,
00:08:08
or really whatever kind of feature of interest you
00:08:12
have: whatever concept that is measurable in the image that you are interested in.
00:08:17
Then we can take these images and feed them into a deep
00:08:21
network, and we can extract the activations at one layer for each given image.
00:08:27
And if we do this for the whole set of training images, then we are able to solve a regression problem
00:08:33
that allows us to find a direction in the space that goes
00:08:36
from small values of the concept measure to large values of the concept measure.
00:08:43
Now, how can this be useful? One of the things that we are looking
00:08:48
into, and I'm presenting just preliminary results here, is that we could use
00:08:54
how well we can regress the concept in the layers to see
00:08:58
if early layers in the network, for example, focus on simple concepts.
00:09:02
The idea is that simple concepts should be learned
00:09:05
with fewer training epochs and at very early layers.
00:09:10
And so here I am presenting, for example,
00:09:14
a multilayer perceptron with two hidden layers of
00:09:18
4096 units that has been
00:09:21
trained on MNIST handwritten digits, and I am reporting the
00:09:27
R-squared, which quantifies how the performance of the regression evolves over training,
00:09:34
for a network that has been trained on the original dataset and another network that has been trained with a
00:09:41
label corruption of forty percent. With label corruption we basically take the
00:09:45
original labels and we shuffle them: we take forty percent of them and we shuffle them.
00:09:50
We do this, as is generally done in
00:09:54
research, to see if the network is memorizing the label correspondence,
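The corruption procedure just mentioned, shuffling forty percent of the labels, can be sketched as follows. Note the exact protocol (shuffling the selected subset among themselves) is my assumption; the talk only says forty percent of the labels are shuffled.

```python
import numpy as np

def corrupt_labels(labels, fraction, seed=0):
    """Return a copy of `labels` where a random `fraction` of entries has
    been shuffled among themselves (a memorization test: with corrupted
    labels, the network can only fit by memorizing the correspondence)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    idx = rng.choice(n, size=int(round(fraction * n)), replace=False)
    labels[idx] = rng.permutation(labels[idx])
    return labels

y = np.arange(100) % 10            # 100 MNIST-like labels, classes 0-9
y_noisy = corrupt_labels(y, 0.4)   # 40% label corruption
changed = np.mean(y != y_noisy)    # at most 0.4; less when a label maps to itself
```

Shuffling (rather than assigning random classes) preserves the overall label distribution, so any drop in concept regression quality can be attributed to memorization rather than to a shifted class balance.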
00:09:59
and we want to see if any patterns are broken
00:10:01
by this memorization approach.
00:10:07
And what you can see is that some concepts, for example the
00:10:10
concept area, work better than the concept eccentricity, and area is learned earlier,
00:10:18
at earlier layers
00:10:22
and with a lower number of epochs,
00:10:27
whereas for eccentricity the training curve saturates a bit later.
00:10:33
So this is just what we are trying to find out,
00:10:37
and what we could use the regression concept vectors for.
00:10:42
Another question we could try to solve is what happens,
00:10:46
again, when we memorize the label correspondence.
00:10:51
For example, we can see that concept learning does not seem to be affected by label noise at early layers.
00:10:57
Still, here, instead of the multilayer
00:11:01
perceptron from before, I added more layers, up to six,
00:11:05
and I increased the amount of
00:11:10
label corruption in the experiments from zero to one,
00:11:15
and you can see how the R-squared, for example the
00:11:18
one of the blue layer, for both concepts remains sort of stable
00:11:23
despite the noise. I mean, despite the noise it remains
00:11:26
stable; it seems not to be too much affected by the corruption.
00:11:32
But we can also try to measure the relevance of the
00:11:35
concept measure to the classification. So, for example, we can
00:11:40
compute the relevance of this concept to the decision function
00:11:45
by computing the directional derivative of the network output in the
00:11:51
direction representing the concept. So we can take our input image x_i,
00:11:56
we can take f(x_i), and we can
00:11:59
compute the gradient of the decision function in the activation space
00:12:04
and project it on the direction that is representative of the concept.
00:12:09
What this is telling us is basically how much
00:12:13
the prediction changes when we move
00:12:17
our input towards the direction of increase of the concept measure.
00:12:22
If we do this for each of the test inputs of our dataset,
00:12:27
then we obtain a series of sensitivity scores,
00:12:33
and we can aggregate all of them into a global explanation
00:12:40
of all the inputs of one class; these are called bidirectional relevance scores.
00:12:47
We take into account the regression determination coefficient, and basically we multiply
00:12:53
this value by the mean of the individual sensitivities, and we divide it by the
00:12:59
standard deviation of the sensitivities, so that this value is large if the regression was really good
00:13:06
and if the individual sensitivity scores show
00:13:10
values of the derivative that are consistent among the samples.
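A numerical sketch of the two quantities just described, under toy data: the sensitivity score as the gradient of the decision function projected on the (unit-norm) concept direction, and the aggregation as stated in the talk, the determination coefficient times the mean of the sensitivities over their standard deviation. The gradients, concept direction, and R-squared value are all synthetic stand-ins; the published score may include further normalization not detailed here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sensitivity(grad_f, rcv):
    """Directional derivative of the decision function along the concept
    direction: gradient of f w.r.t. the layer activations, projected on
    the unit-norm regression concept vector."""
    return grad_f @ rcv

# Toy data: gradients of f at the chosen layer for 50 test inputs, and a
# unit concept direction (both would come from the trained network).
n_inputs, n_units = 50, 64
grads = rng.normal(loc=0.5, scale=0.2, size=(n_inputs, n_units))
rcv = np.ones(n_units) / np.sqrt(n_units)

scores = sensitivity(grads, rcv)     # one sensitivity score per test input
r_squared = 0.8                      # determination coeff. of the RCV fit (toy value)

# Aggregation: large when the regression fit is good and the individual
# sensitivities agree in sign and magnitude across the samples.
br = r_squared * scores.mean() / scores.std()
```

A large positive score means moving inputs along the direction of concept increase consistently raises the prediction; a large negative score means it consistently lowers it.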
00:13:14
We used this, for example, to evaluate the relevance of nuclei morphology in breast histopathology.
00:13:20
We took a dataset that was representative of a series
00:13:25
of concepts which we had, for example, manually annotated,
00:13:29
and the concepts we looked into were about nuclei morphology,
00:13:34
so, for example, the size of the nuclei, their shape, or their texture.
00:13:39
You can see that we can have smaller or larger nuclei,
00:13:43
more or less eccentric ones, or nuclei where the texture is more or less homogeneous,
00:13:49
and from that we extracted these measures of the concepts in terms of correlation
00:13:54
and handcrafted features: eccentricity, area, pixel contrast.
00:13:59
And in this picture here, what I am showing is how the concepts are learned in
00:14:06
the layers of a network that is trained to solve a
00:14:11
binary classification task: to distinguish patches of tumor from patches of non-tumor.
00:14:18
With the bidirectional relevance scores we can evaluate the concept relevance in terms
00:14:25
of the directional derivatives of each of the single concepts for the whole set
00:14:31
of test inputs, and you can see that, for example, contrast is positively
00:14:37
relevant while correlation is negatively relevant: we have correlation at
00:14:40
minus one, and contrast at almost zero point three.
00:14:45
So what this means is that correlation negatively impacts the decision function, meaning that if
00:14:52
you have a patch and you increase the correlation of the pixels inside the area,
00:14:57
then your patch is more likely to be classified as non-tumor.
00:15:02
If instead you enhance the contrast between the pixels, your patch is more likely to
00:15:07
be classified as tumor. This is what our, let's say, concept relevance is telling us.
00:15:13
And then we moved to another dataset. This is a work that
00:15:19
I did in collaboration with a team at Northeastern
00:15:22
University and the Casey Eye Institute in the US, and we
00:15:26
basically wanted to evaluate the relevance of features of
00:15:30
plus disease in a database of retinopathy of prematurity.
00:15:34
The first step was really to interact with the clinicians to
00:15:39
understand which concepts were the most important, in their view,
00:15:43
for them to understand and to interpret the network,
00:15:46
and we ended up with six concepts,
00:15:51
let's say curvature, tortuosity and dilation, which are
00:15:54
pretty important and pretty relevant in cases of retinopathy of prematurity.
00:15:59
And then we applied regression concept vectors at six different points in the layers of an
00:16:05
Inception V1, which was trained to classify
00:16:08
three different classes, three different stages of the disease,
00:16:13
and we used our bidirectional scores to evaluate
00:16:17
the global and individual concept relevance. So, basically, we can
00:16:21
evaluate the relevance of a concept at the individual data point level by just
00:16:26
looking at the values of the directional derivative for the single data point,
00:16:30
and then, if needed, we can also sum all of these values with the
00:16:34
bidirectional scores on the whole dataset, for example per class, so over all the samples of one class.
00:16:42
And yes, we selected basically six concepts out of a pool
00:16:47
of one hundred forty-three handcrafted features, so this was
00:16:52
relevant for the doctors, to find also a ranking between the features
00:16:56
and an understanding of these features and their impact on the network.
00:17:02
Another thing that we are actually looking into is
00:17:05
some more technical details about how we
00:17:09
use the convolutional feature maps
00:17:13
to learn the regression. When you extract the activations of one layer, you generally have some feature maps
00:17:21
that are two-dimensional, so you have N two-dimensional feature maps, and
00:17:27
to solve the regression, the most simple thing is to just unroll these maps.
00:17:32
But there is a problem of dependencies between neighbouring pixels. For example, if these
00:17:37
maps come from an image, you will have that pixels in the neighbourhood
00:17:42
tend to have similar values; they are correlated one to the other,
00:17:46
and when you unroll the map,
00:17:50
basically this 2D structure is broken. So
00:17:56
one, maybe easy, idea we had is to apply spatial pooling, either
00:18:00
average pooling or max pooling, to the feature maps of the network, and this improves things vastly: it
00:18:07
is a solution to this problem of dependencies between neighbouring pixels, and
00:18:11
it also alleviates the problem of the high dimensionality of your space, because
00:18:15
if you unroll the feature maps you can easily go to a million dimensions.
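The dimensionality argument above is easy to see numerically. A small sketch, with a randomly generated feature-map tensor standing in for a real convolutional layer output (the 128×32×32 shape is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# One convolutional layer output: 128 feature maps of 32x32 spatial size.
fmap = rng.normal(size=(128, 32, 32))

# Unrolling keeps every spatially correlated value: 131072 dimensions.
unrolled = fmap.reshape(-1)

# Global spatial pooling (average or max) collapses each map to a single
# number, removing the neighbouring-pixel dependency and shrinking the
# regression problem from 131072 to 128 dimensions.
avg_pooled = fmap.mean(axis=(1, 2))
max_pooled = fmap.max(axis=(1, 2))
```

Either pooled vector can then be used as the input to the concept regression in place of the unrolled activations.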
00:18:22
And so we are presenting, in an upcoming journal paper,
00:18:27
improved results given by max pooling the features,
00:18:34
for both of the applications: breast histopathology and retinopathy of prematurity.
00:18:42
And the last thing that I am going to introduce to you is the
00:18:47
possibility of expanding the regression
00:18:50
approach, for example by using regularised regression.
00:18:55
So the premise is that one of the issues is that you
00:18:59
have very high dimensions, and regularised regression is one of the
00:19:04
approaches to reduce the importance of features and to avoid
00:19:08
overfitting your data. So one of the ideas is to use ridge regression,
00:19:15
and we did this basically on the features of the activations, for retinopathy of prematurity
00:19:21
and histopathology. What I am showing here in this table is basically that
00:19:27
the regularised regression improves the fit that you obtain for your regression concept vector,
00:19:32
so you are able to model your concept in the embeddings in a better way.
00:19:38
And it is also interesting to see how, if
00:19:43
you do not use pooling, you need a lot stronger
00:19:47
regularisation, whereas if you use pooling, the regularisation
00:19:51
that you need is less strong. So this actually agrees
00:19:55
with the idea of the high-dimensional space, and with the idea of reducing the dimensionality of the space.
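Ridge regression, as mentioned above, can be sketched with its closed-form solution. This toy example (not the speaker's setup: data shapes and alpha values are illustrative assumptions) shows the under-determined regime of unrolled feature maps, with more dimensions than samples, and how a larger regularisation strength shrinks the fitted concept vector:

```python
import numpy as np

rng = np.random.default_rng(3)

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha I)^(-1) X^T y.
    Shrinking the weights tames the very high-dimensional (e.g. unrolled)
    activation space and reduces overfitting."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Under-determined toy problem: 500 dimensions, only 50 samples, as when
# regressing a concept measure on unrolled feature maps.
X = rng.normal(size=(50, 500))
w_true = np.zeros(500)
w_true[:5] = 1.0                        # concept depends on 5 directions
y = X @ w_true + 0.1 * rng.normal(size=50)

w_weak = ridge_fit(X, y, alpha=1.0)     # weaker shrinkage
w_strong = ridge_fit(X, y, alpha=100.0) # stronger shrinkage, smaller norm
```

Consistent with the observation in the talk, an unpooled (higher-dimensional) design matrix calls for a larger alpha, while pooled features need less shrinkage.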

Conference Program

Concept Measures to Explain Deep Learning Predictions in Medical Imaging
Mara Graziani, HES-SO Valais-Wallis
3 May 2019 · 10:32 a.m.