
Transcriptions

Note: this content has been automatically generated.
00:00:00
Good morning everyone, and thank you for the invitation and for attending today.
00:00:15
This is a domain that is not new, and I am going to show you that. It is becoming of interest again, as have many older issues in machine learning that were dormant for some years during the winter.
00:00:41
And now, hopefully, we are not heading towards a new winter.
00:00:53
So, first just a few words about my group.
00:00:58
You see several levels here: we belong to the HES-SO; there we have the ICT department, and within ICT we have six
00:01:20
research lines, or interest groups, mostly
00:01:28
in the biomedical applications domain, but very close to the data analysis axis as well. Why is that? Because of my group:
00:01:45
I call it Computational Intelligence for Computational Biology, as that name represents very well what we do. What we do
00:01:54
comes from the computational intelligence domain, and we apply it to the computational biology domain.
00:02:03
Computational intelligence, in principle, you already know what it is. Computational
00:02:07
biology is the study of biological systems, or living systems in general, by means of
00:02:15
computational models; that is the very short version, and it includes
00:02:22
the bioinformatics part, which also explains why my group
00:02:27
belongs, within the school, to bioinformatics.
00:02:30
But I am not going to speak about that. My interest today is
00:02:38
to give you my view on the whole area. It is a limited view, but it is more or less what we try to do in our group,
00:02:46
and I am going to start with AI, ML and modelling.
00:03:00
Very shortly, as the introduction mentioned, I am an electronics engineer;
00:03:06
I am also an automation engineer, so I have
00:03:12
mainly
00:03:18
a background in engineering, and because of that
00:03:23
some of the ways I see the domain, or
00:03:29
deal with the domain, are related to my past; we are all slaves of our past.
00:03:36
So I am going to speak a little bit about how AI, ML
00:03:40
and modelling are related, then say some words about
00:03:46
understanding, explaining and interpreting, about what we are looking for, and then,
00:03:53
hopefully, the centre of my presentation will
00:03:57
be these two aspects: interpretable models and
00:04:01
model induction. At the end, a short view of what we are trying to
00:04:10
do at this moment. So, to start with the first part: AI, ML and modelling.
00:04:20
I
00:04:23
say that because of my automation
00:04:28
past: I have the tendency to see everything as a system, and
00:04:36
every input-output relationship as a model. But I am not the only one.
00:04:44
If we take this definition, which is very common, machine learning
00:04:49
explores the study and construction of algorithms that can learn from and make predictions on data.
00:04:55
OK, that is the classic: we have data, and at the end we have
00:04:58
a prediction, or a decision, or some kind of behaviour. But the same
00:05:06
definition continues by saying that such algorithms operate by building a model from example inputs in order to
00:05:12
make data-driven predictions or decisions. So we can say that what is inside is a model.
00:05:20
It is some kind of representation of a reality,
00:05:25
and that can be very straightforward:
00:05:31
we have a decision-making process, and we have inputs and outputs.
00:05:34
But it can be more complex: we can have more complex systems,
00:05:41
which are the origin of the data, and we have to build models of one part of them.
00:05:46
So that is my vision: we have data, we have some kind of output, and in the middle we have a model.
00:05:53
So yes, machine learning is model-based, and based on that I can go a
00:06:02
little bit further and say that machine learning is similar, equivalent even,
00:06:10
to modelling. So the same concepts we have been applying to modelling for many, many years
00:06:18
can somehow be applied to understand what machine learning is.
00:06:25
And from that point of view, when we
00:06:30
describe the modelling process: OK, we have a model,
00:06:38
the model has some kind of input,
00:06:42
and it produces some kind of output; the output can be, as
00:06:46
I mentioned, something abstract like a behaviour,
00:06:49
or a prediction, or whatever.
00:06:56
Let me continue with some approaches, some views of
00:07:01
how we do modelling. There is the classic
00:07:07
way, where you have somebody who knows how to build models, who discusses with
00:07:15
the expert in the area, and then he or she produces a model; and
00:07:23
in a sort of classic design loop we have this model, but
00:07:28
we always need to validate the model, in this case with the expert.
00:07:33
That is what I call human-driven, analyst-based modelling.
00:07:37
In contrast to that, there is the more conventional, more data-driven
00:07:46
modelling: we have data, we have to integrate this data in some way,
00:07:53
with some kind of preprocessing, and at the end we have
00:07:59
a modelling tool which builds a model based on
00:08:04
an algorithm, very often iteratively, so as to
00:08:10
reduce the difference between the target in the data and the
00:08:17
predicted behaviour. That is a high-level view of the modelling process. And,
00:08:24
given that we are not modelling just for the sake of modelling
00:08:30
but for some kind of goal, we have to validate that the model corresponds
00:08:36
to what the expert would expect from it.
00:08:43
So
00:08:45
we can see it in a more general way: we have data, different kinds of data sources, we have a conditioning step, and we have this part, which
00:09:00
contains the most common pieces that are built in a modelling approach. For me it is important to see that there are really two
00:09:12
parts, even if we do not often think about modelling as two separate parts: we have
00:09:21
a model, a kind of model, which is a representation of what we want to model, and an algorithm responsible for building the model. And,
00:09:41
from my point of view, having these two parts
00:09:47
defines a given machine learning approach. Very often we speak about a machine learning approach as a monolithic method,
00:09:58
but if you look at it you can see that there are really two parts: the representation and the building algorithm. And very often
00:10:09
you are working with a single representation and you are only changing the strategy of the building algorithm. That is my point of view:
00:10:20
any machine learning approach can be seen as these two parts. I
00:10:28
have sometimes done this exercise with my students: take the top ten machine learning algorithms, or whatever list of machine learning algorithms you want, and for each one try to identify these two parts. Actually we also try to
00:10:48
add a third element, the performance measure, but that is harder.
00:10:52
Just try to do that exercise, and you will see that in principle it is always possible to separate these two parts, even if sometimes they are very closely related.
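That exercise can be sketched in a few lines. The following is an illustrative toy example, not from the talk: one fixed representation (a linear decision function) built by two different induction algorithms (perceptron-style updates versus gradient descent on a squared loss). The data and both training routines are invented for the illustration.

```python
import numpy as np

# Representation: a linear decision function f(x) = sign(w.x + b).
def predict(w, b, X):
    return np.where(X @ w + b >= 0.0, 1, -1)

# Induction algorithm 1: perceptron-style, error-driven updates.
def fit_perceptron(X, y, epochs=50, lr=0.1):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:       # misclassified -> update
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Induction algorithm 2: gradient descent on a squared loss.
# Same representation, different building strategy.
def fit_gd(X, y, epochs=1000, lr=0.2):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = (X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

# A linearly separable toy problem.
X = np.array([[0., 0.], [0., 1.], [2., 2.], [3., 2.]])
y = np.array([-1, -1, 1, 1])

for fit in (fit_perceptron, fit_gd):
    w, b = fit(X, y)
    assert (predict(w, b, X) == y).all()   # both algorithms induce a valid model
```

The representation (`predict`) never changes; only the building algorithm does, which is exactly the two-part decomposition described above.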
00:11:05
OK. So now let me go to understanding, explaining and interpreting.
00:11:15
I could call it understanding, explanation,
00:11:24
or whatever word you prefer; mainly, the question is: what are we looking for?
00:11:32
Starting from the point where we have a model built, an AI model,
00:11:39
I can say that the interpretability of the decision
00:11:46
lies mainly in, or depends mainly on, the representation. So
00:11:53
the representation is very important for having some kind of explanation, and from the
00:12:00
representation point of view there is a classic distinction between three kinds of
00:12:08
representations. We have the white boxes, where you have
00:12:12
a complete view of every detail of the relationships between the elements of the system, or the variables of the process, or whatever kind of thing you are modelling.
00:12:24
Classically you can have, for example, an ordinary differential equation whose parameters you have to identify; that is not a very good example, but usually you have
00:12:37
constants multiplying and adding, and that is a classic white box. On the
00:12:43
other side we have, as you know very well, neural networks, which are
00:12:49
the prototype of a black box; but they are not the only kind of black box. And
00:12:56
as soon as you have white and black, why not have the continuum of grey? And there are
00:13:02
a lot of different approaches that sit in
00:13:07
the middle of these two extremes.
00:13:14
These three approaches can also be related to the
00:13:24
trade-off between prediction and explainability; we can also call it performance and interpretability, among other names.
00:13:36
You see that this is very real for the life sciences, where white-box mechanistic models are very precise
00:13:46
but not very predictive: very precise because they describe the process exactly, but not very predictive because they are extremely local.
00:13:57
They predict something very local, but they are not really able to capture relationships at a larger scale. Still, for the expert in the domain the description is very clear, so they are very explainable, but usually weakly predictive.
00:14:19
On the other side you have black boxes which, if they are well done, as you know how to do,
00:14:29
give you very good prediction, with generalisation, regularisation, whatever kind of
00:14:43
protection against overfitting, but effectively without any kind of explainability. In between, you will usually find
00:14:54
kinds of models that are more or less interpretable, or more or less predictive, and you have this kind of continuous
00:15:03
transition from one extreme to the other.
00:15:07
Mm, that puts
00:15:16
explainability at the centre. But what is an explanation? How do we achieve it? That is one of the
00:15:19
open questions, even if we just think about what people expect.
00:15:28
And I am well placed to describe those expectations, because in my projects my
00:15:36
partners are biologists and, more generally,
00:15:45
all kinds of life scientists, from clinicians to bench scientists. And
00:15:58
when we speak about some level of explanation, it can go from very simple things, like which are the most predictive variables.
00:16:09
I am going to present, very rapidly, some biomarker discovery; the idea is to find the two, three, twenty, fifty
00:16:21
biomarkers that are the most descriptive, or predictive, for a given disease, for example. That is already some kind of explanation, and a lot of work was done on it in the past.
00:16:34
Very often we have to transform our data so as to extract information; that is what we call feature extraction, or feature engineering. For many people the question is
00:16:47
just: what are the most informative features that can be used for making a good prediction? So I have to apply this kind of filter, this kind of transformation,
00:16:58
this kind of preprocessing. That is the second level.
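This first level of explanation, ranking the most predictive variables, can be sketched minimally. The sketch below is an illustrative filter-style ranking (score each variable by its absolute correlation with the outcome), not the actual biomarker pipeline used in the projects; the data are synthetic.

```python
import numpy as np

# Hypothetical filter-style feature ranking: score each variable by its
# absolute Pearson correlation with the outcome and keep the top k.
def rank_features(X, y, k=2):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    order = np.argsort(scores)[::-1]          # most informative first
    return order[:k], scores

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200).astype(float)     # synthetic binary outcome
X = rng.normal(size=(200, 5))                 # five candidate "markers"
X[:, 3] += 2.0 * y                            # make variable 3 informative

top, scores = rank_features(X, y, k=1)
assert top[0] == 3                            # the informative marker ranks first
```

A static list like this is the simplest kind of explanation the talk mentions; the later levels (functional relationships, rules) go beyond it.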
00:17:05
Beyond that, personally, I am very interested in the functional relationships between
00:17:12
variables or features. What I call a functional relationship can be a logic,
00:17:21
that is, rules, or equations, simple equations, or
00:17:26
linear combinations, and that is another kind of explanation.
00:17:32
I am leaving that very abstract, because functional relationships can be
00:17:40
as white-box as, say, differential equations,
00:17:44
or as black-box as simple correlations.
00:17:48
So it is very hard to say exactly what they are, but these
00:17:52
kinds of relationships are much more informative than simply
00:17:56
a static list of variables or features.
00:18:01
And many people are interested in more: in the minds of many
00:18:06
people, when we speak about explanation, very often we expect
00:18:17
reasoned explanations. You ask the model: why did you predict that this patient
00:18:30
was sick with this disease? And we would expect something like: our system
00:18:43
decided that because it detected this, and that, and that. But that is a whole domain in itself, and perhaps something that we will have to
00:18:56
become able to do from now into the future; we are not close to that, even if, as I am going to mention a little bit later,
00:19:08
the beginning of artificial intelligence was more based on that part than on the other. I mean, there was a lot of human-like logic
00:19:18
in those systems; they were able to produce some kind of reasoned explanations, but they were very hard to build and not as useful as hoped.
00:19:28
OK.
00:19:30
We have spoken about the model, but we can also take another point of view: what do we want to explain?
00:19:44
My view, in principle, when I started in the domain, was that what we want is a global explanation.
00:19:55
I mean: I have, let's say, a lung cancer diagnostic system, and I would like to see rules describing it; these are the rules describing lung cancer.
00:20:08
And that is nice, but go to a medical expert in the domain and ask him or her
00:20:13
to give you the full list of criteria for diagnosing lung cancer:
00:20:20
usually it is not very common to have the whole picture; very often they focus on very specific things. So, quite often we are interested in a global explanation, in explaining the whole model,
00:20:36
trying to explain the whole model in order to understand the phenomenon. On the other side,
00:20:50
much more often, as in the example I tried to use, we are more interested in explaining a decision:
00:20:57
why did you say that? Why did you predict that? Why did you propose that? At that moment we are explaining that specific decision,
00:21:11
and these are local explanations: this patient has cancer because in the image I can see that kind of nodule, which is very
00:21:21
big, or which is very irregular, or whatever. That is another kind of explanation, and I think it is more realistic.
00:21:33
But, as usual, if you are very specific to every case, then you lose some of the global view, and
00:21:44
you can see that there is a continuum,
00:21:47
and perhaps for some cases the optimum is found in between. OK.
00:21:55
And I think I have to accelerate a little bit.
00:21:59
Some approaches: I am taking these images from the material of the DARPA program on Explainable Artificial Intelligence;
00:22:08
they proposed an initiative to develop this kind of system, and
00:22:22
they say that there are mainly two families. For me this is the clearest way to divide the families: there are models which are interpretable from the beginning.
00:22:34
That is what we call the ad hoc approach: the models are built to be interpretable; they are conceived to be interpreted.
00:22:43
And there are also approaches, on the other side, that are applied afterwards, once we have done the induction; that is what we call the post hoc approach, where
00:22:57
we are interested in extracting some explanation elements from black boxes. That is, we accept having black-box models, but at a given moment we build
00:23:07
an explanation part from the model. Nowadays these are mostly based on deep learning, because it is
00:23:18
the new kid on the block. From that point of view, in our group we work on both sides. On the interpretable-
00:23:31
model side, I have been working with fuzzy models for twenty years now, and these models are conceived to
00:23:41
be interpretable. And on the other side, and this is what I am presenting today as the central part, there are some
00:23:51
approaches for model induction where, from deep learning models,
00:23:59
we are extracting explanation elements from black models. We are doing that, and that is what I am going to present. So let me start
00:24:04
with interpretable models, the ad hoc approach.
00:24:09
As I mentioned, this is not a new topic.
00:24:12
If we go back, good old-fashioned artificial intelligence was already strongly oriented towards human representations: expert systems,
00:24:24
all kinds of search algorithms, rule bases, trees, graphs, et cetera,
00:24:37
which were oriented towards capturing human knowledge. That approach was human-driven, and very early
00:24:48
we could already find some data-driven approaches used to improve or to adapt those models which were built that way. So that is not new.
00:24:55
Also the black/grey/white box issue that I presented some minutes ago was already born in the last century: artificial neural networks in
00:25:05
the sixties, fuzzy logic in the sixties, decision trees in the seventies,
00:25:14
at least the ones formally oriented towards learning, because decision trees had been there for longer; and white-box rule-based models have been there for much longer still.
00:25:23
In that last pre-winter era, the nineties, the topic was already there:
00:25:31
there was a lot of work on the accuracy versus interpretability trade-off, rule induction, rule extraction. Around
00:25:37
the time I was doing my PhD it was more or less a subject of interest,
00:25:44
with a lot of people working on it; and, as was mentioned,
00:25:55
this domain then became dormant and stood by for several years, and now we are revisiting it
00:26:02
under the new AI. I might also mention that
00:26:06
my early research was interpretability-oriented fuzzy modelling, and for my research I compiled some interpretability criteria.
00:26:17
It was interesting for me: I went to a tutorial last year on interpretable fuzzy models,
00:26:33
and eighty percent of the criteria that were presented were the same as, or very close to, what I compiled at that time. It is not that I invented anything; it is just that these are concepts that are still valid.
00:26:48
On that basis I proposed an approach oriented towards interpretability, an ad hoc approach: I mean, the models are built to be interpretable, and the model-building process is
00:26:58
constrained to respect these criteria of interpretability.
00:27:11
This approach, around the year two thousand, was called FuzzyCoCo, the cooperative-coevolutionary version. Later,
00:27:25
when I moved to the school, we built a new version in C++,
00:27:36
which we call FUGE, for fuzzy genetic, and then for a project we rebuilt that code again, and now we are working on keeping it
00:27:50
connected to what is widely used.
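FuzzyCoCo and FUGE themselves are not reproduced here; what follows is only a minimal sketch of the kind of interpretable fuzzy rule base such systems evolve, with invented variable names, membership functions, and rule outputs. Linguistic terms are triangular membership functions, and each rule contributes its output weighted by its firing strength (a zero-order Sugeno-style aggregation).

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical linguistic terms for a marker value around [0, 1].
marker_low  = lambda v: tri(v, -1.0, 0.0, 1.0)
marker_high = lambda v: tri(v, 0.0, 1.0, 2.0)

# A rule base a domain expert can read directly:
#   IF marker IS high THEN risk = 0.9
#   IF marker IS low  THEN risk = 0.1
def risk(marker):
    rules = [(marker_high(marker), 0.9), (marker_low(marker), 0.1)]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.5   # neutral output if nothing fires

assert risk(1.0) == 0.9                # fully "high" -> high risk
assert risk(0.0) == 0.1                # fully "low"  -> low risk
assert abs(risk(0.5) - 0.5) < 1e-9     # halfway -> blended output
```

The interpretability constraints the talk mentions would act on exactly these ingredients: few rules, few terms per variable, and membership functions that keep a clear linguistic meaning.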
00:27:56
This approach has been used,
00:28:03
as I mentioned, for example for diagnosis and biomarker discovery.
00:28:13
As early as two thousand seven we had a project for selecting biomarkers for bladder cancer screening; from two thousand nine to eleven we were involved in
00:28:25
profiling biomarkers for cancer; and in these last two years we were involved also
00:28:37
in colorectal cancer screening and leukemia subtyping. These were partnerships involving teams of both biologists and commercial partners: in one
00:28:51
case the partner was a commercial partner interested in understanding the role of the markers in lung cancer,
00:29:03
a company producing, and now commercialising, a screening test; and here it was a partnership with a hospital interested in having a new test.
00:29:10
So, just to mention that even if this is an approach
00:29:21
conceived and born almost twenty years ago, it has been able to deal with real-world problems.
00:29:30
Perhaps all of them are small-data problems, but in the life sciences we are very often confronted with that. I think a third
00:29:40
of your workshops were on small data, exactly because for many areas big data is not realistic, and we still have
00:29:51
to produce predictive systems in those conditions. So, that was
00:29:58
a view of what can be done by directly conceiving models which are interpretable.
00:30:09
Perhaps one more point: to do that we were obliged to deal with
00:30:16
gene selection, or feature selection, which is one of the first steps I mentioned for understanding:
00:30:24
which are the most relevant variables. And if you are speaking about
00:30:27
biomarker selection, you are clearly interested in obtaining these variables.
00:30:35
So let me go on then to the model induction part, the post hoc approach based on
00:30:42
deep learning. There I am going to go through two presentations. The first one
00:30:55
is using, or exploiting, the local and global internal representations, something that is very common nowadays.
00:31:07
Well, we always start from the fact that these networks are black boxes, and we could be interested in obtaining some understanding.
00:31:19
Classic examples are going to the filter activations, or having some kind of saliency detection.
00:31:33
And based on that we can already begin with some kind of basic rules:
00:31:44
looking at the activations of some kinds of filters, then, for this class, those filters reply; we can see that there are some features that are more
00:31:52
active than others, and we have a class activation map where we can see what the importance of the different filters is.
00:31:56
These are relatively recent approaches.
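The class activation map just mentioned has a very compact standard formulation (the CAM of Zhou et al.; the exact variant used in the talk may differ): with a global-average-pooling classifier, the map for a class is the sum of the last convolutional feature maps weighted by that class's output weights. A toy sketch with invented feature maps:

```python
import numpy as np

# CAM sketch: weight each feature map by the class's output weight and sum.
def class_activation_map(feature_maps, class_weights):
    # feature_maps: (n_filters, H, W); class_weights: (n_filters,)
    return np.tensordot(class_weights, feature_maps, axes=1)   # -> (H, W)

# Toy example: filter 0 fires in the top-left, filter 1 in the bottom-right.
fmaps = np.zeros((2, 4, 4))
fmaps[0, 0, 0] = 1.0
fmaps[1, 3, 3] = 1.0
w_class = np.array([2.0, 0.5])     # this class relies mostly on filter 0

cam = class_activation_map(fmaps, w_class)
assert cam.shape == (4, 4)
assert cam[0, 0] == 2.0 and cam[3, 3] == 0.5   # evidence concentrated top-left
```

The resulting heat map is what lets one say "for this class, these filters, and these image regions, matter most".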
00:32:04
In our group we also explored that, for example
00:32:12
with emotion detection, which has existed for a long time. Perhaps a comment
00:32:26
on this research: we said, OK, there are a lot of people doing deep learning, so let's use what they did. We are not exploring
00:32:38
completely new deep learning architectures; we are using common architectures and common solutions, like this one. But
00:32:49
what we did, just for emotion, was to identify the regions that were most used by the classifier to detect the emotion: to say that this is a mainly angry face, and this one also, and to say that
00:33:05
the information used mainly lies around the mouth and the eyes.
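One standard way to find the regions a classifier relies on, given here only as an assumed illustration and not necessarily the exact method used in this work, is occlusion sensitivity: slide a masking patch over the image and record how much the class score drops. A self-contained toy sketch:

```python
import numpy as np

# Occlusion-sensitivity sketch: mask one patch at a time and measure the
# drop in the classifier's score; large drops mark important regions.
def occlusion_map(image, score_fn, patch=2):
    H, W = image.shape
    base = score_fn(image)
    heat = np.zeros((H // patch, W // patch))
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            masked = image.copy()
            masked[i:i+patch, j:j+patch] = 0.0        # occlude one region
            heat[i // patch, j // patch] = base - score_fn(masked)
    return heat                                        # large value = important

# Toy "classifier" whose score depends only on the top-left 2x2 corner
# (standing in for, say, the mouth region of a face).
score = lambda img: img[:2, :2].sum()
img = np.ones((4, 4))

heat = occlusion_map(img, score, patch=2)
assert heat[0, 0] == 4.0                 # masking the corner kills the score
assert heat[1, 1] == 0.0                 # masking elsewhere changes nothing
```

Applied to a face classifier, the same loop would highlight the mouth and eye regions the talk describes.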
00:33:14
Well, we can also do that for the localisation of objects in an image,
00:33:18
like this tiger here,
00:33:28
or for recognising some interesting events or elements in the image.
00:33:41
That is something we really did, but we worked on our own view of having an explanation: for example, if I have to identify a lion,
00:33:49
I can say this is a lion because there are some features, these paws, this mouth
00:33:59
here, which we identified and which are typical of lions; and given that we have all four of them, or several of them, we can say this is a lion.
00:34:05
Then it is not only localisation: we also need some kind of logical explanation to say, OK, all of them are necessary. These are simple rules.
00:34:16
And then,
00:34:19
there is the way we can examine, in a network, which are the important elements: something like, OK, we want to
00:34:29
maximise the class "dog", even if it is not a dog, and we can see that some of the
00:34:36
points, some elements of the image, belong to the dog class, but there are others,
00:34:44
these red ones, which do not belong to the class, and because of that perhaps it is not classified as such. That is about
00:34:57
activation maximisation. In the classic way, we are minimising some kind of loss
00:35:05
function, and the problem is that the images identified, those that maximise the class, are not very clear;
00:35:11
there have been some adaptations, changing from minimising a loss to maximising the selected
00:35:22
output with some other kind of optimisation or objective function, and then we can obtain images that are
00:35:26
sometimes clearer and sometimes more confusing.
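The core of activation maximisation fits in a few lines. Here is a deliberately tiny sketch on an invented linear scorer (real work uses a deep network and image inputs): gradient ascent on the input, with the trained weights held fixed, plus an L2 penalty as a classical regulariser to keep the synthesised input bounded.

```python
import numpy as np

# A fixed "trained" linear scorer standing in for a network: scores = W @ x.
W = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 1.0],
              [0.0, 1.0, -2.0]])

def ascend(class_idx, steps=200, lr=1.0, l2=0.1):
    """Gradient ascent on the INPUT to maximise one class score."""
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = W[class_idx] - l2 * x   # d/dx ( w_c . x - (l2/2) * ||x||^2 )
        x += lr * grad
    return x

x_star = ascend(0)
scores = W @ x_star
assert scores[0] == scores.max()                     # input now favours class 0
assert np.allclose(x_star, W[0] / 0.1, atol=1e-3)    # fixed point: x* = w_c / l2
```

With images instead of vectors, the same loop produces the noisy "preferred inputs" the talk mentions, which is why stronger regularisers, and eventually generator networks, were brought in.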
00:35:37
People also tried to do image reconstruction; we were not very satisfied with this kind of result. Then
00:35:45
the transition was made from classical regularisers to using neural networks themselves as regularisers: so we have the
00:35:56
neural network to predict, and then we add another neural network to
00:36:05
avoid the artefacts, or to regularise them. And that is the famous, very well-known GAN, and that is what we
00:36:19
implemented: we have a classifier that was trained in the classic way, and then we implemented
00:36:34
another network, a generator, and we are trying to use that generator to find an image that is
00:36:43
able to maximise this class and minimise the others,
00:36:48
and that should be an image representing exactly that class. And with that we can obtain results.
00:36:56
You see the images are not exactly what you would have expected to see,
00:37:00
but you will have elements, found several times, that are characteristic of the class, and
00:37:10
we clearly see that there are some images that clearly evoke it; some of them are similar to the class itself,
00:37:20
or mixed with other forms, and for each class the relevant elements are very clearly represented here. These are
00:37:30
images that are generated artificially to maximise the class that was learned by the network. So we are producing
00:37:43
an image that could be the representative of the class, presenting common parts of all the images that were supposed,
00:37:47
or were said, to have this class inside.
00:37:52
From that point of view, what is clear for us is
00:37:58
that we can see elements that are very clearly representative; but sometimes we also discover some biases. For example,
00:38:11
for the second one here, there are always elements like eyes, more than the class element
00:38:20
itself. Why? Because almost any image presenting that class will also contain
00:38:27
eyes. Or, for this one, we are very often obtaining also part of the body, which is
00:38:36
not only representing the object itself; it is the same for musical instruments: very often they are not completely separate, and that is also a good
00:38:51
way to say: OK, because of that, we can say this class is represented not by the object only but by its context or functionality; perhaps the class is not well learned.
00:39:07
But this approach
00:39:10
can also be used not only for the class: we could also be interested in what images maximise a given
00:39:19
filter. For example, if I identify a filter as being very active for a given class,
00:39:28
then I can go and see that the images activating this filter a lot are composed of lines; ah, OK,
00:39:37
we can see that this filter is specialised in vertical or horizontal lines, or in some kind of colours. So we can,
00:39:45
at that moment, concentrate on what this filter contains, on what it is that is captured by
00:39:50
the network we already trained. But we can also go to filters which are
00:39:58
low-level, not exactly at the output, and see that some features are specialised, for example,
00:40:05
in these forms, or in eyes, or even in dog noses, or in bird heads.
00:40:15
And
00:40:17
those are the elements that could allow us to say:
00:40:29
if I have a lion, is it composed of these four elements? These elements could be contained in this
00:40:38
intermediate layer. And that is effectively what we proposed, and what we did
00:40:49
more recently: for example, for this class, the beagle, we detected that the most active filter
00:41:02
is activated by long ears, and this is localised here for that image; and another very active filter was that one, a filter that
00:41:08
detects the colour patterns of the fur; and this filter
00:41:13
detects noses. And in that way we can say: if
00:41:18
in an image I have long ears, a specific colour pattern and a dog nose, then I have a beagle.
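The kind of readable rule just described can be sketched directly; the part names and the two-of-n evidence threshold below are invented for the illustration, not taken from the actual system. Each high-level filter acts as a part detector, and a class fires when enough of its characteristic parts are present.

```python
# Hypothetical part-based rules: class -> set of characteristic parts,
# each part standing in for one highly active filter.
RULES = {
    "beagle":      {"long ears", "fur colour patches", "dog nose"},
    "soccer ball": {"hexagonal pattern", "grass"},
}

def explain(detected_parts, min_parts=2):
    """Return (class, supporting parts) for every rule with enough evidence."""
    out = []
    for label, parts in RULES.items():
        support = parts & detected_parts        # which parts were actually seen
        if len(support) >= min_parts:
            out.append((label, sorted(support)))
    return out

hits = explain({"long ears", "dog nose", "grass"})
assert hits == [("beagle", ["dog nose", "long ears"])]
```

The explanation is the rule itself: the class label comes with the list of detected parts that support it, which is exactly the "long ears + colour pattern + nose, therefore beagle" reading.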
00:41:27
Some other examples, and I am going to stop here: for the soccer ball we have characteristic
00:41:34
hexagonal patterns and some grass, and you see that grass is
00:41:44
important for detecting a soccer ball in these images. Perhaps that reveals something about the information in the network's image bank: we do not have soccer balls which are not
00:41:50
in real situations, and perhaps that is a bias that could be corrected to improve our database.
00:41:58
Or, for this bear, we have
00:42:03
the head with the ears and some black and white colouring, and that combination is able to identify it. We are selecting here the three most active filters
00:42:13
for that image, and you see that we are combining local information with some kind of global information,
00:42:25
and the way we are combining both of them gives us a good approach to explain the decision.
00:42:32
Here are some references, but, well, that is not the main point.
00:42:37
okay let me see if i can go back to my
00:42:41
or the presentation
00:42:48
okay i have to
00:42:51
finished up one
00:42:54
I think we can come to the next part. Okay, so now let me speak about rule extraction, and for that
00:43:10
I have to finish fairly soon,
00:43:15
but I think I still have some more minutes.
00:43:21
Well, I'll try not to repeat the same things, but
00:43:27
the idea is: from that point we can get some understanding by combining
00:43:32
local and global representations; that's what I already
00:43:37
presented, and that's the approach I mentioned.
00:43:43
I'm not going to mention it again, but we also tried another approach.
00:43:43
The other approach is: we have, for example, this kind of network, the VGG16,
00:43:50
with lower layers that are performing feature extraction, but
00:43:55
at the end there are fully connected layers, three layers,
00:43:59
which are making the decisions; the final decisions are made at that level.
00:44:05
And
00:44:06
this part is composed of three consecutive layers with nonlinearities, and
00:44:12
there are a lot of neurons inside, and that makes it hard to
00:44:19
understand
00:44:22
what kind of relationship is captured inside. So the
00:44:27
proposed approach, and the idea is very simple,
00:44:31
is to say: instead of that,
00:44:36
we are trying to find
00:44:41
an interpretable equivalent to these three layers. It's
00:44:45
not that I'm replacing them entirely; I replace them only for explanation purposes:
00:44:50
if I want prediction I will use my network as it is, but for explanation I can
00:44:58
replace it and use that to generate some kind of
00:45:02
rule-based representation. And then again we are
00:45:09
going into the representation part, the central issue here, but with rules in that case.
00:45:19
So the network is trained, we have some
00:45:25
feature activations from the filters, and we replace
00:45:29
the layers after the last convolutional layer by
00:45:37
a random forest.
00:45:39
A random forest is not interpretable per se, because we have a lot of different
00:45:44
trees and it's very hard to find a global explanation, but we can, based
00:45:52
on our database, see which of
00:45:58
the rules contained in these forests are the most active, and we can create a ranking.
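As a rough sketch of that step (the tree encoding and all function names below are my own, not the speaker's implementation): each root-to-leaf path of each tree in the forest is a rule, and we can count how often each rule fires over a dataset to rank them.

```python
# Hypothetical encoding (not the speaker's code): a tree node is a dict,
# either a leaf {"leaf": True, "label": ...} or a split
# {"leaf": False, "feature": i, "threshold": t, "left": ..., "right": ...}.

def extract_rules(node, conditions=()):
    """Enumerate each root-to-leaf path as (conditions, predicted label)."""
    if node["leaf"]:
        return [(conditions, node["label"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + ((f, "<=", t),))
            + extract_rules(node["right"], conditions + ((f, ">", t),)))

def rule_fires(conditions, x):
    """True when sample x satisfies every condition of the rule."""
    return all(x[f] <= t if op == "<=" else x[f] > t
               for f, op, t in conditions)

def rank_rules(forest, dataset):
    """Rank every rule in the forest by how often it fires on the data."""
    rules = [r for tree in forest for r in extract_rules(tree)]
    counts = [sum(rule_fires(cond, x) for x in dataset) for cond, _ in rules]
    return sorted(zip(counts, rules), reverse=True)
```

Activation counting is only one possible ranking criterion; the talk's actual ranking, described next, uses a perceptron instead.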
00:46:06
The way to produce the ranking is not by going and looking at
00:46:10
these activations one by one: we have a simple perceptron which is combining all of them,
00:46:17
trying to obtain the maximum performance with respect to the
00:46:21
network, I mean the maximum predictive power, and the weights
00:46:27
of the connections of this perceptron
00:46:31
are used to rank the rules. Very simple, and in that way you have the top rule, the second one, and so on.
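A minimal sketch of that ranking idea, under my own assumptions (binary rule activations as inputs, the network's decision as the target, and a classic perceptron update; the talk does not give these details): after training, rules are ordered by the absolute weight the perceptron assigned to them.

```python
# Hedged sketch: rank rules by the weights a simple perceptron assigns
# to their binary activations (1 if the rule fires on a sample, else 0).
def train_perceptron(X, y, epochs=50, lr=0.1):
    """X: rows of rule activations (0/1); y: +1/-1 network decisions."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1
            if pred != yi:                     # classic perceptron update
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def rank_by_weight(w):
    """Rule indices sorted by absolute weight, most influential first."""
    return sorted(range(len(w)), key=lambda i: -abs(w[i]))
```

A rule whose activation tracks the network's decision ends up with a large weight and therefore a high rank.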
00:46:38
And the idea is that somebody, the human part, is taking
00:46:44
the top four, top five to seven rules, and then, with that
00:46:50
in mind, we can use them. For example here, although
00:46:54
it's not a very good example, if four of the five most
00:46:58
important rules are saying about the class that it is that class, then yes,
00:47:03
we can say this is the label, and these rules are explaining
00:47:08
the decision with relatively good precision.
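That voting scheme can be sketched directly (the function and the agreement measure are illustrative assumptions, not the speaker's exact procedure): take the labels predicted by the top-k ranked rules that fire on an image and report the majority, plus how many of the k rules agree.

```python
# Hedged sketch: label an image by majority vote of its top-k ranked
# rules, and report the fraction of those rules that agree.
from collections import Counter

def top_k_vote(ranked_rule_labels, k=5):
    """ranked_rule_labels: class labels predicted by the highest-ranked
    rules firing on the image, best rule first."""
    votes = Counter(ranked_rule_labels[:k])
    label, count = votes.most_common(1)[0]
    return label, count / min(k, len(ranked_rule_labels))
```

Four of five rules voting for the same class yields that label with agreement 0.8, which matches the "four of the five most important rules" situation described above.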
00:47:17
We can see here, for a lot of different classes, what happens if we use the
00:47:22
top rule, the top three rules, top five, and so on, three, four, five, six, in that
00:47:29
way, and you can see that for many classes the top two rules are already very good,
00:47:35
and for some others it is necessary to go to five,
00:47:39
six or seven rules. And regarding how these rules are composed:
00:47:46
sometimes, for some classes, we observe that three rules may be
00:47:50
more accurate than five rules, for example, because some rules are
00:47:56
perturbing the decision,
00:48:01
and, as I mentioned, in some cases even five rules are not enough.
00:48:09
The rules have this shape: for example, the top rules for the owl class
00:48:14
are of the form "feature x_450 is bigger than 2", and
00:48:22
each can be expressed with a level of importance. And these
00:48:29
features can be visualized with the other methods we use, or
00:48:35
we can use the activation of each filter
00:48:41
to create some kind of maps. Here, well, it's not very clear,
00:48:46
we have the first two features of the rule,
00:48:51
which are number four hundred and fifteen and four hundred and twenty-seven, okay,
00:48:56
and we see where both features are more active, and this is the
00:49:04
threshold; we obtain this partition. From this we can go and
00:49:12
look at the whole rule and, giving more detail,
00:49:16
we can see that effectively these are clearly
00:49:22
the owls we are interested in, and the closer we get
00:49:25
to the threshold, the less clear the
00:49:31
images are. And these are all the images that were used for training;
00:49:35
I mean, if you have a single image you will find it
00:49:40
at a given point, and for that image you can have the level of activation of each of the features.
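A toy version of reading that plot programmatically (the feature indices, thresholds, and margin below are invented, not the actual rule): given one image's activations, check whether it satisfies the rule, and flag it for inspection when it sits close to a threshold, which is where the talk notes the images become unclear.

```python
# Invented rule for illustration: feature 415 > 2.0 AND feature 427 > 1.5.
OWL_RULE = [(415, 2.0), (427, 1.5)]

def satisfies(activations, rule=OWL_RULE):
    """True when every feature exceeds its rule threshold."""
    return all(activations[f] > t for f, t in rule)

def near_threshold(activations, rule=OWL_RULE, margin=0.25):
    """Flag images whose features lie within `margin` of any threshold;
    these are the unclear, potential-outlier cases near the boundary."""
    return any(abs(activations[f] - t) < margin for f, t in rule)
```

An image far from all thresholds is a clear member of the class; one that barely passes a threshold is a candidate outlier worth looking at by hand.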
00:49:50
Some examples:
00:49:53
here we have things that are clearly not owls, but they are close to the limits,
00:50:01
or they belong to different kinds of owls.
00:50:06
That allows us to detect outliers, like here:
00:50:10
that part is a guitar, but because of the two circles it could look similar to two eyes.
00:50:19
Another class, goldfish: the same. We have the clear goldfish, and we have other kinds of objects which
00:50:26
are not of the same class, objects that are clearly outliers because of the colour or whatever.
00:50:33
And some classes are harder:
00:50:36
as we mentioned, some classes ask for four, five, seven rules, so
00:50:41
single rules are not very clear and we have a lot of outliers.
00:50:51
But even there we can try to see why these are here: these are mostly water,
00:50:58
here we have a turtle skin, and
00:51:01
here we have the clear turtles, because they
00:51:06
have turtle skin and they are in the water, so it works for that class.
00:51:11
That's the method.
00:51:13
Some
00:51:16
comments about the way it was trained: the
00:51:21
regularization terms in the loss function that were used for the minimization. And
00:51:28
we have a ranking of rules and the preferences provided by the human:
00:51:34
the human is saying, okay, we're interested in three or five, or we
00:51:39
think three are enough for that. Just remember that we were here
00:51:44
exploring a lot of different classes, but if you are interested
00:51:48
in a particular domain, perhaps you're interested in three or five
00:51:52
classes, and not in the rest, not in every image.
00:52:02
For me it's clear that these ideas are very close to what I have been
00:52:07
doing for a long time: fuzzy logic, where things gradually change from
00:52:13
zero to one instead of using a crisp threshold. So
00:52:18
perhaps with fuzzy logic we could have rules that better capture
00:52:24
the
00:52:26
smooth changes between the images that belong to the
00:52:32
class. Okay, let's say that was all for the second approach.
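The fuzzy variant suggested here could look like the following sketch (the ramp width and the choice of minimum as the AND operator are my assumptions): the crisp test "x > t" becomes a membership degree that rises gradually from 0 to 1 around the threshold, and a rule's degree is the minimum over its conditions, a standard fuzzy conjunction.

```python
def fuzzy_greater(x, t, width=1.0):
    """Degree to which x exceeds t: 0 below t - width, 1 above t + width,
    linear in between (a piecewise-linear membership function)."""
    return min(1.0, max(0.0, (x - t + width) / (2 * width)))

def fuzzy_rule(activations, conditions, width=1.0):
    """Fuzzy AND (minimum) over 'feature f greater than t' conditions."""
    return min(fuzzy_greater(activations[f], t, width)
               for f, t in conditions)
```

An image sitting exactly on a threshold then gets membership 0.5 rather than flipping between 0 and 1, which matches the gradual transitions observed near the boundary in the plots above.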
00:52:41
and now
00:52:44
The last thing I would like to discuss, very shortly, with a single slide,
00:52:51
is:
00:52:55
where do we plan, or wish, to go from now?
00:53:00
An example in my domain of life sciences is
00:53:05
the diabetic retinopathy classification that was presented
00:53:09
only some days ago by Google with a partner.
00:53:16
Well, we were already thinking of that, and there are a lot of people working on it. The very good example
00:53:23
is to classify a given retinal image,
00:53:27
a retinography, as presenting retinopathy or not. And
00:53:33
our first step, our core research, is detecting local,
00:53:39
class-relevant features and producing some kind of rules
00:53:44
automatically, instead of trying to infer them by hand.
00:53:50
and
00:53:52
In that way we have these two parts: the feature-localization methods and the explanation
00:53:59
based on rules. This is our current research, but we are aware that that's not enough,
00:54:07
and at a given moment, to say that this is an exudate
00:54:11
and these are cotton wool spots, we need an expert.
00:54:17
I don't think, at least for the moment, that we could
00:54:20
automatically go and look at the literature and identify what we
00:54:27
learn here as being exudates or cotton wool spots, so
00:54:31
we need some kind of expert interaction, and to include that
00:54:37
in the modelling process, so as to
00:54:40
try to use things that belong to specific classes, and
00:54:45
nothing that we discover, unless it is so important that
00:54:48
we propose it to the expert. And in the middle of that we need some kind of
00:54:55
quantification of the
00:54:59
quality of the explanation, and all that would be used
00:55:03
to produce a final explanation that could be close to the
00:55:10
reasoned explanation that I mentioned at the beginning, something like: if the number
00:55:15
of exudates is bigger than one and the number of cotton wool spots is bigger than two, then the patient has retinopathy. But
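The target kind of explanation quoted here is literally a one-line clinical rule. As a sketch (the thresholds are the ones quoted in the talk, but the function itself is illustrative only, not a validated diagnostic tool):

```python
def retinopathy_rule(n_exudates, n_cotton_wool_spots):
    """Illustrative only: flag retinopathy when the image contains more
    than one exudate and more than two cotton wool spots."""
    return n_exudates > 1 and n_cotton_wool_spots > 2
```

The hard part, as the talk stresses, is not writing such a rule but getting from raw pixels to reliable counts of expert-defined findings like exudates and cotton wool spots.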
00:55:23
that kind of rule is not something that we are close to obtaining, and
00:55:28
we know that we are only here, and we need to continue
00:55:35
improving our methods, and also adding new explanatory
00:55:43
elements. So getting to real explanation is still
00:55:53
a long way off, though for some things we are not so far from that.
00:55:58
The system that was presented last week, or two weeks ago, uses
00:56:06
an intermediate learning part where,
00:56:12
as they are partnering with hospitals in the
00:56:16
US, they had a lot of experts annotating, and they are producing
00:56:22
an intermediate step where they say: okay, in our image we have identified that kind of thing, that kind of thing,
00:56:28
that kind of thing. And then this part was not a simple
00:56:32
rule base as we proposed here; they train again
00:56:38
and they learn, from the features that were
00:56:42
extracted, to derive the decision. So they are
00:56:47
doing some kind of deep learning first to define the
00:56:52
events, and then another part to define the
00:56:57
relationships between them. So: automatically finding
00:57:03
interesting features, finding the relationships between these features,
00:57:08
and producing a reasoned explanation. That seems to be the state of the art. I haven't
00:57:17
had access to the specifics of how it is done;
00:57:24
they are only speaking about what they
00:57:28
can do, but I don't know exactly how. It's something to follow, because
00:57:33
having access to a lot of data, as is the case in the US,
00:57:41
facilitates a lot of things and this kind of results. Well, I think that's all.

Methods for Rule and Knowledge Extraction from Deep Neural Networks
Keynote speech: Prof. Pena Carlos Andrés, HEIG-VD
May 3, 2019 · 9:10 a.m.
1445 views