Transcriptions

Note: this content has been automatically generated.
00:00:00
So it's great to be here, and it's great to see, as was just said, so many people from so many
00:00:04
diverse backgrounds [inaudible].
00:00:08
I'm very much looking forward to interacting with a lot of you.
00:00:12
Okay, so as was just said, I work at IBM Research.
00:00:15
I'm the technical leader of the computational systems biology team, and what
00:00:19
I want to tell you about today is the work my team is
00:00:22
doing on interpretable deep learning for cancer personalized medicine.
00:00:27
And because I know that this is a very diverse audience, I assume that some people
00:00:33
are not experts, so I'm going to give a very general introduction, and I hope that everybody follows me through the talk.
00:00:38
I won't get too technical; if you want to get into the technical details, I'm happy to talk to you later on in the coffee breaks.
00:00:46
So this is an overview of what I want to tell you today. First I want to motivate
00:00:51
the field of interpretable deep learning, because again, some of you may know this field but others might not.
00:00:57
And then I'm going to show you applications of how we have used interpretable deep learning
00:01:02
for different specific problems related to cancer personalized medicine, and then of course I will conclude.
00:01:10
Okay, so let's get started. I guess I don't need to make a strong case that
00:01:15
deep learning is outstanding in its performance, with big improvements in many different fields.
00:01:21
This is a very subjective top list of what I think have been the main achievements in the last decade of deep learning.
00:01:27
It starts with AlexNet, which was one of the
00:01:30
first convolutional neural networks for image analysis, in 2012.
00:01:35
Then of course there is AlphaGo in 2016,
00:01:37
using reinforcement learning to beat the human champion at Go.
00:01:42
And now, focusing on biology, we are still amazed by
00:01:46
the performance of AlphaFold, which has completely revolutionized the problem of protein structure prediction.
00:01:53
So this is all great, of course, but there is a challenge with this increasing performance,
00:01:58
and the challenge is the size of the models, which is increasing exponentially, as you can see here.
00:02:04
So this is a plot [inaudible] that
00:02:07
goes from 2018 to 2021.
00:02:11
By the way, this is focused on models for NLP, natural language processing, but the trend is very similar in other fields.
00:02:18
And you can see how we have gone from the really small models, with only [inaudible] parameters,
00:02:24
to models with over a trillion parameters, and I don't
00:02:28
see this trend stopping; models are becoming huge.
00:02:33
And you might wonder: okay, sure, models are huge, but
00:02:36
if they perform well, what's the matter with large models?
00:02:41
And what I want to warn you about here is that there are several problems with large models.
00:02:47
The first one is that you need huge amounts of data to train them. Maybe in NLP you have this large
00:02:53
amount of data, but in biology typically you do not, so
00:02:56
you need smaller models that you can actually train.
00:03:01
Also, there is a lot of ethical debate about the environmental footprint, but I'm not
00:03:05
going into this debate. Also, these large models bring a lot of difficulties: the
00:03:09
burden of deployment, maintenance, et cetera, which sometimes offset the interesting performance.
00:03:15
[inaudible]
00:03:19
But the part that is most interesting for me is the black-box nature of these models. Since all these
00:03:25
models have so many parameters, it is impossible to understand how they work, so you have to trust that they do the right thing.
00:03:31
But we have seen by now that models often give you a very reasonable answer which is nevertheless wrong.
00:03:37
So this blind trust should make us cautious, and what I'm proposing here, and what
00:03:42
many people in the field are proposing, is that we should not blindly trust models.
00:03:47
We should trust models that are transparent or interpretable.
00:03:52
And this is what the field of interpretable deep learning is aiming to do, and I'm going
00:03:56
to show you in this talk three of these approaches that we have developed in my team
00:04:00
to get models that give you high accuracy but nevertheless
00:04:04
give you hints about why and how they are making a prediction.
00:04:07
Okay, so with this introduction, let's get started. Let me show
00:04:13
you the first result that I would like to present here today.
00:04:16
So let's start with this model, which we called PaccMann. At the time when
00:04:19
we started working on PaccMann, which was maybe two, three, four years back,
00:04:23
we were interested in building a multimodal deep learning model for drug sensitivity prediction.
00:04:30
In more technical terms, what we wanted to predict is the
00:04:33
IC50 value, which is basically a value that tells you how sensitive
00:04:38
a cancer cell is to a given anticancer compound. Of course, when you give a
00:04:43
compound you would like it to be as efficient as possible, and this is what this number basically quantifies.
00:04:49
So what was so special about PaccMann? PaccMann was not the first machine learning model
00:04:53
to predict drug sensitivity, so what was different with respect to previous work?
00:04:59
We were among the first to use a multimodal architecture.
00:05:02
We know that there are many different types of information that are predictive of
00:05:06
drug sensitivity, and we did not want to focus on only one of them; we wanted
00:05:09
to use as many as possible to get a prediction as good as possible.
00:05:14
So specifically, we used chemical information of the compound, which is obviously important.
00:05:20
We also wanted to use high-throughput information, so here we used high-throughput transcriptomic profiles.
00:05:26
We have also tested it with high-throughput proteomic profiles, and you can in principle adapt it to the type of
00:05:31
high-throughput data that you have; its role is to give information about the cell line.
00:05:40
And the last thing, which is also important in biology: we used prior knowledge. Prior knowledge is
00:05:44
important in biology because typically in biology we suffer from
00:05:47
the curse of dimensionality:
00:05:51
we have many features, so the data are high dimensional, but we have a low sample size.
00:05:57
So we don't have any hope of training brute-force models as you can maybe do in NLP.
00:06:03
So it is very important to use this prior knowledge to try to constrain your model,
00:06:07
to reduce the space of possible solutions and to make the model hopefully trainable.
00:06:13
Okay, so these are the inputs we give to this model. So what do we ask the model to predict?
00:06:19
On one hand, as I said, we ask the model to predict the IC50
00:06:22
values, but second, we also ask the model to give us highlights, that is, to give us
00:06:27
at least some information about which features were most important to make the prediction.
00:06:31
Okay, so [inaudible].
00:06:39
Okay, so let's first look at accuracy, and then we'll go into interpretability.
00:06:44
One way of assessing the accuracy is what we do in this type of
00:06:49
plot. Here on the x axis we have the experimentally measured IC50 values,
00:06:55
and here I have the predicted values from the model, PaccMann.
00:07:00
Each dot here corresponds to a pair of a cell line and a compound.
00:07:04
If we had a perfect model, all the points would align with the diagonal, so we would have a perfect line.
00:07:10
So by measuring how much they deviate from the diagonal, you can assess how well we are predicting.
00:07:16
You can already see visually that we are doing reasonably well, and when we
00:07:20
also computed some metrics, they confirmed that we predicted with high accuracy.
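[Editor's sketch] The talk does not name the metrics that were computed, so the following is only an illustration of how the deviation of predicted from measured values is commonly quantified. The choice of RMSE and Pearson correlation, the function names, and the numbers are all assumptions made for this example, not taken from the PaccMann work.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square distance of the points from the diagonal y = x."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(measured, predicted):
    """Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)

# Hypothetical (cell line, compound) pairs; values invented for illustration.
measured = [-1.2, 0.3, 1.5, 2.1, 0.8]
predicted = [-1.0, 0.5, 1.2, 2.4, 0.7]

print(f"RMSE: {rmse(measured, predicted):.3f}")
print(f"Pearson r: {pearson(measured, predicted):.3f}")
```

A perfect model would give an RMSE of zero and a correlation of one, which corresponds to all dots lying on the diagonal of the plot.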
00:07:26
Also, what was quite nice to see is that there was an independent benchmark
00:07:31
that was published after this work and that independently
00:07:35
compared twenty-eight different AI-based models for drug sensitivity prediction.
00:07:40
PaccMann was one of the best-performing models in that benchmark, so that was also very nice to see.
00:07:46
But let's move on to interpretability, which is the part I would like to highlight in this talk.
00:07:51
There are many types of approaches to
00:07:55
interpretability and many people working on it; what we have been using in PaccMann is attention layers.
00:08:02
These are a type of layer that you add
00:08:04
to your neural network, and then you train as normal.
00:08:09
And what you get is this type of attention map.
00:08:13
In these attention maps, a more intense color means that the model paid more attention to this feature.
00:08:20
And with this you can make some interesting observations. For instance, here
00:08:25
I'm comparing two real compounds, masitinib and imatinib, both used
00:08:32
in the clinic for the treatment of leukemia, and you can see why I chose these:
00:08:36
they are identical except for this part down here, where here you have a thiazole and here you have a pyrimidine.
00:08:42
And what is quite interesting is that in the case of masitinib you see that the attention
00:08:46
is mostly localized here on this thiazole, while the rest of the molecule gets fairly diffuse attention.
00:08:53
In the case of imatinib, the attention is not on the pyrimidine; the attention is
00:08:57
diffuse everywhere here, so it seems that the model is not clear about what is more informative.
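[Editor's sketch] To make the contrast between localized and diffuse attention concrete, here is a minimal illustration. It assumes the attention weights come from a softmax over per-token scores, which the talk does not specify, and it uses entropy as one possible measure of how diffuse a map is. The token names and scores are hypothetical and do not come from the PaccMann model.

```python
import math

def softmax(scores):
    """Turn raw per-token scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(weights):
    """Shannon entropy of an attention map: low = localized, high = diffuse."""
    return -sum(w * math.log(w) for w in weights if w > 0)

# Hypothetical molecular fragments and raw scores, invented for illustration.
tokens = ["ring_A", "linker", "ring_B", "thiazole"]
localized = softmax([0.1, 0.2, 0.1, 3.0])  # one fragment dominates
diffuse = softmax([0.5, 0.6, 0.5, 0.6])    # no fragment stands out

for name, weights in (("localized", localized), ("diffuse", diffuse)):
    top_token = tokens[weights.index(max(weights))]
    print(f"{name}: top token = {top_token}, entropy = {entropy(weights):.3f}")
```

In this toy setting the localized map has a clearly lower entropy than the diffuse one, which is one way to turn the visual impression from the slides into a number.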
00:09:04
So again, this is qualitative, but what we can take from this comparison is the hypothesis that probably the thiazole
00:09:09
is important, which you want to keep in mind if you use this molecule to design a compound with certain properties.
00:09:16
So this is an example of how interpretability gives you information about how the model
00:09:20
makes a decision; you can generate hypotheses to follow up that might be relevant for your problem.
00:09:28
If you are interested in playing with PaccMann, we have developed an open web service where
00:09:33
you can upload your data, or, if you don't have your own profiles, use public data.
00:09:38
You can also design your molecule, and then you get the
00:09:40
prediction of IC50 values, but you also get these interpretability maps,
00:09:44
both at the level of the compound and at the level of genes, showing which genes the model considered more important.
00:09:51
And before wrapping up this first example, I would like to mention
00:09:55
this additional work, which I'm not going to go into in detail here because of time constraints.
00:10:00
But I would like to mention that we have continued this work: after we
00:10:03
developed PaccMann, we developed PaccMann RL, where RL comes from reinforcement learning,
00:10:08
which is a type of
00:10:12
learning where you basically have an agent that receives feedback.
00:10:16
Anyhow, PaccMann RL was focused on the much more challenging task of designing compounds
00:10:22
optimized for a particular transcriptomic profile, and again this study, which was a
00:10:26
computational study, showed that this is already a promising approach for drug design.
00:10:32
So if you're interested in hearing about this, talk to me later in the coffee break and I'm happy to discuss it.
00:10:39
But let's move to the second project I would like to discuss here.
00:10:46
Okay, so here we focus on a different problem, also related to cancer personalized
00:10:52
therapy, but here we move away from compounds and
00:10:56
move towards immunotherapy, which for me is a very exciting
00:11:01
setting, and in particular we are quite interested in T cell-based therapies.
00:11:06
So again, for those of you who are not experts on this topic: T cells have very important properties.
00:11:12
One of them is their cytotoxic capability. T cells have a T cell receptor, TCR for short,
00:11:18
that recognizes antigens, and when it recognizes a cell
00:11:23
that looks abnormal, because it is a diseased cell or a cancer cell, the T cell can destroy it.
00:11:29
And because of this property, of course, it is very interesting to use T cells against cancer [inaudible].
00:11:34
This is what T cell-based therapies have tried to do: they have
00:11:37
tried to engineer T cells to enhance their cancer-killing activity.
00:11:44
To do that, you have to solve several problems. The first one is
00:11:47
the problem of recognition: you have to make sure that you know what the T cell is recognizing,
00:11:52
and this amounts to predicting the binding of a T cell receptor to an epitope.
00:11:58
So we decided to approach this problem with interpretable deep
00:12:00
learning, and we developed a deep learning model to predict this binding probability.
00:12:05
From a technical, architectural point of view, we actually
00:12:08
borrowed many of the elements from PaccMann, although the application is different.
00:12:14
So let me show you some results. Again, this is a multimodal network;
00:12:20
what I mean is that we integrate two different types of information: on one hand, the sequence of the T cell receptor,
00:12:27
and on the other hand, the sequence of the epitope.
00:12:30
These are encoded independently, and then they interact and you make your prediction.
00:12:36
What is very interesting is that we used the same attention mechanism
00:12:41
as we did with PaccMann, so we can get the same type of maps as we did with PaccMann.
00:12:46
Here you can see that we get attention maps at the level of amino acids. So these are,
00:12:51
for instance, three different T cell receptor sequences that are
00:12:55
predicted by the model to bind the same epitope,
00:12:58
and what you get here in color is the amino acids that the model
00:13:01
thinks are more important to predict binding to this particular epitope.
00:13:07
One of the challenges of this problem is that we don't know the ground truth; for that we would need
00:13:12
to have the crystallographic structure to see which amino acids are important for binding, and we don't have it.
00:13:20
But what is interesting is that, for instance, you can see that in these
00:13:22
sequences, which again are predicted to bind to the same epitope,
00:13:27
the amino acids that are highlighted with high attention are conserved
00:13:32
across the sequences, while those with more variability are considered less important.
00:13:38
And that makes sense, because if they are going to bind the same epitope, it makes sense that
00:13:42
conserved motifs are involved in binding, so the result is reasonable.
00:13:48
I'm not showing the plot, but we also assessed accuracy with the standard machine learning approach: you have
00:13:52
your train set and your test set, and of course we got very reasonable accuracy on the test set.
00:13:58
Here I'm focusing on the interpretability part, but of course we need to do this evaluation as well.
00:14:04
What I want to highlight here is that this type of interpretability is useful not only
00:14:09
when things work well; it can also give you insights when things do not work. Let me show you one example.
00:14:16
When we focus on this problem of predicting the binding of a T cell receptor and an epitope, of course you can do it
00:14:21
in two different directions. You can first fix the epitope and predict
00:14:25
binding to new T cell receptors; this is what we did here.
00:14:29
But you can do it in the other direction: you fix the T cell receptor and you predict the binding to new epitopes.
00:14:35
This is much harder because there is much less data. So we knew
00:14:38
that it would be a hard task, and when we tried it, we got these attention maps.
00:14:44
And of course there is a lot wrong with these attention maps; you don't have to be much of an expert
00:14:48
to see that there are many red flags. For instance, these here are padding tokens, that is, zero tokens;
00:14:55
they are just padding tokens that we add to make all sequences
00:14:58
the same length, and they do not convey any biological meaning.
00:15:02
And yet they are highlighted with attention, so that already tells you that something is off with the model's predictions.
00:15:08
Also, surprisingly, all the epitopes that are shown here
00:15:12
highlight the same positions, which again is very suspicious.
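[Editor's sketch] The red flag described here, attention landing on padding tokens, can be turned into a simple automated check. This is only an illustrative diagnostic written for this transcript, not something the speaker describes using; the `<PAD>` symbol, the function name, and the example values are assumptions.

```python
def padding_attention_fraction(tokens, weights, pad_token="<PAD>"):
    """Fraction of total attention mass that falls on padding positions."""
    total = sum(weights)
    on_padding = sum(w for t, w in zip(tokens, weights) if t == pad_token)
    return on_padding / total

# Hypothetical padded amino-acid sequence and attention weights,
# invented for illustration only.
tokens = ["<PAD>", "<PAD>", "G", "I", "L", "G", "F"]
weights = [0.30, 0.25, 0.05, 0.10, 0.10, 0.10, 0.10]

fraction = padding_attention_fraction(tokens, weights)
print(f"Attention on padding: {fraction:.0%}")
```

A value far from zero means the model is attending to positions that carry no biological information, which is the kind of warning sign discussed in the talk.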
00:15:17
So we had to investigate a bit more to understand what was going
00:15:20
on, and what we realized was happening is that the model
00:15:24
realizes, quite correctly, that there is not enough data to learn to predict binding to new epitopes, because the information is just not there.
00:15:30
So the model just memorizes the data, and that's why it selects these positions, because they are just
00:15:35
enough to work as an identifier. So you predict
00:15:38
well on your training data, but of course you cannot generalize to new epitopes.
00:15:43
So I show this example, which is a negative example, because I think it's very important
00:15:47
to understand when things work well, but it is equally important to understand when they do not work.
00:15:52
And this is the conclusion: interpretability is important in both cases, when things work and when things do not work.
00:16:01
Okay, I'm sorry, I see that [inaudible].
00:16:07
Okay, I'm now going to move to the last example I would like to show,
00:16:13
in which we again use interpretable deep learning for a different task. Here we
00:16:18
are focusing on colorectal cancer, and what we wanted to show is that we can
00:16:23
use interpretable deep learning to do data integration,
00:16:27
and here we are focusing on integrating image-based data, histology slides, with transcriptomic information.
00:16:34
So just to tell you a little bit about the data: these data come from
00:16:38
TCGA, which is a large repository in the US that has multimodal data for many cancer types.
00:16:45
So basically, for each biopsy that is contributed to TCGA,
00:16:49
the biopsy is sliced in three portions. The middle portion is used for RNA sequencing,
00:16:55
and the top and bottom portions are used to obtain an
00:16:58
H&E image slide, a stained histology slide.
00:17:02
This is used for quality control, to make sure that you have enough cancer cells in the sample before going to sequencing.
00:17:09
So what we asked ourselves is whether we could predict transcriptomic levels,
00:17:14
that is, gene expression, from image analysis of the image slides.
00:17:20
The short answer was yes, but before going into that, here is how we did it, again without being too technical in the details.
00:17:27
So basically we have these slides from the top and the
00:17:29
bottom sections, and each slide is split into small patches.
00:17:34
Each patch is analyzed with a pretrained image analysis model;
00:17:39
we used ResNet, which is one of the standards for image analysis.
00:17:44
Then we added a simple dense layer, so what we ask the model to do is
00:17:49
to predict, for each patch, the gene expression level for this particular gene.
00:17:55
But we also added an attention layer, and we use the
00:17:58
feedback from the attention layer to weight the average over the patches.
00:18:02
So in the end you just get a score, where if the confidence, the attention, is very high,
00:18:07
we give more weight to this patch, because the model is more confident about
00:18:10
the prediction in this patch. And this is the basic idea of what we're doing here.
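[Editor's sketch] The attention-weighted average over patches described here can be illustrated as follows. The talk does not say how the attention scores are normalized, so the softmax is an assumption, and the function names and numbers are hypothetical rather than taken from the published model.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weighted_prediction(patch_predictions, attention_scores):
    """Slide-level prediction: per-patch predictions averaged with attention weights."""
    weights = softmax(attention_scores)
    return sum(w * p for w, p in zip(weights, patch_predictions))

# Hypothetical per-patch expression predictions and attention scores.
patch_predictions = [2.0, 2.5, 8.0, 3.0]
attention_scores = [0.1, 0.2, 2.5, 0.3]  # the third patch gets most attention

plain_mean = sum(patch_predictions) / len(patch_predictions)
weighted = attention_weighted_prediction(patch_predictions, attention_scores)
print(f"Plain mean over patches:    {plain_mean:.2f}")
print(f"Attention-weighted average: {weighted:.2f}")
```

With equal attention scores this reduces to the plain mean; as one patch's score grows, the slide-level value moves towards that patch's prediction.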
00:18:16
Okay, let me show you some results. First of all, the way we evaluate the error is
00:18:22
the median absolute percentage error, so basically how far you are from the measured value,
00:18:28
and here, the closer to zero the better. The good news is that our model with attention
00:18:33
outperformed the baseline, which was another model using a similar architecture but without interpretability.
00:18:40
So the first conclusion is that interpretability already increases your accuracy.
00:18:46
And of course we compared with random guessing, where the model just predicts the mean, and
00:18:51
you see that there is already a clear improvement over random guessing, so we are predicting something.
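[Editor's sketch] The evaluation described here, median absolute percentage error compared against a baseline that always predicts the mean, can be written down in a few lines. The function names and all values are hypothetical and serve only to show the computation; they are not results from the talk.

```python
import statistics

def median_absolute_percentage_error(measured, predicted):
    """Median of |measured - predicted| / |measured|, expressed in percent."""
    errors = [abs(m - p) / abs(m) * 100 for m, p in zip(measured, predicted)]
    return statistics.median(errors)

def mean_baseline(training_values, n_test):
    """Naive baseline: always predict the mean of the training values."""
    mean = statistics.mean(training_values)
    return [mean] * n_test

# Hypothetical expression values for one gene, invented for illustration.
train_measured = [5.0, 7.0, 9.0, 11.0]
test_measured = [4.0, 8.0, 12.0]
model_predicted = [4.4, 7.6, 11.4]

baseline_predicted = mean_baseline(train_measured, len(test_measured))
print("Model MdAPE:   ", median_absolute_percentage_error(test_measured, model_predicted))
print("Baseline MdAPE:", median_absolute_percentage_error(test_measured, baseline_predicted))
```

A model is only informative if its error is clearly lower than that of the mean baseline, which is the comparison the speaker describes.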
00:18:58
Now we want to see specific predictions for specific genes.
00:19:01
So this is the gene that we model with the highest accuracy.
00:19:06
Oh, I forgot to say that we only focused on a set of genes
00:19:09
that are associated with colorectal cancer [inaudible].
00:19:13
So this is the gene that has the highest accuracy; it is a gene
00:19:16
that is associated with one of the subtypes of colorectal cancer.
00:19:21
And this is how you can evaluate the output of the model. So this is the original H&E
00:19:26
slide, and this is the raw
00:19:31
gene expression predicted for each patch, without weighting by attention,
00:19:35
and this is the one weighted with attention. You can see
00:19:39
that this final output is much more moderate, in the sense that the model
00:19:44
identifies which regions are associated with high gene expression [inaudible], and
00:19:50
you can see some small hotspots, so it becomes a bit more nuanced.
00:19:55
And of course you can do this for different genes; I'm not going to go through all of them, but what you can see
00:20:00
clearly in all of them is that some regions, some hot
00:20:04
spots, are associated with a prediction of high expression of this particular gene.
00:20:09
So basically, again, the computational model was published already; what we are doing now is working with
00:20:15
pathologists from the University of [inaudible] to experimentally validate this on real images with immunostaining for these genes.
00:20:21
So these are the predictions of the model, which is all
00:20:25
fine: the model predicts the gene expression value with high accuracy,
00:20:28
but of course we would now like to validate that these regions really
00:20:31
correspond to regions of high expression of that particular gene, and this is work in progress.
00:20:37
But the message I would like to leave you with is that, at least, with interpretability
00:20:40
you already get an increase in accuracy, which is already a gain.
00:20:47
Okay, and with that I'm going to conclude. So what I
00:20:53
hope I have convinced you of is that we need to have interpretable
00:20:56
models, because deep learning really achieves outstanding predictions,
00:21:00
but models behave as black boxes, so we really need to break the
00:21:03
black box and get some insight about how the models are making predictions.
00:21:10
Interpretability is important, in my opinion, for two different reasons. First
00:21:14
of all, because it can give you new insights, things that you could not have known otherwise.
00:21:19
But also because there are many cases where models give very reasonable predictions
00:21:24
that are wrong, so we really have to be a bit skeptical with black-box models.
00:21:29
And regarding the use cases, I showed you PaccMann, which is our proposition
00:21:34
for an interpretable multimodal deep learning approach to predict drug sensitivity.
00:21:39
We also talked about TITAN, which is also a model that we have
00:21:42
developed to predict the binding of T cell receptors, again using interpretability.
00:21:47
And the last one is this model that predicts gene expression values from image analysis, and here,
00:21:53
again, the validation is ongoing, but you can see already that it predicts gene expression with high accuracy.
00:22:01
And with that, I would like to thank the people from IBM [inaudible], of course.

Conference Program

The Idiap Research Institute in Martigny is launching a new public series of symposiums
Lonneke, Idiap Research Institute
Jan. 25, 2023 · 9:39 a.m.
Interpretable artificial intelligence for cancer personalized medicine
Dr. Maria Rodriguez Martinez, Group Leader, IBM
Jan. 25, 2023 · 9:48 a.m.
5 minutes questions: Interpretable artificial intelligence for cancer personalized medicine
Dr. Maria Rodriguez Martinez, Group Leader, IBM
Jan. 25, 2023 · 10:10 a.m.
NIPMAP: Niche Phenotype Mapping of Multiplex Histology Data by Community Ecology.
Dr. Jean Hausser, Assistant Professor, Karolinska Institute, Sweden
Jan. 25, 2023 · 10:21 a.m.
5 minutes questions: NIPMAP: Niche Phenotype Mapping of Multiplex Histology Data by Community Ecology.
Dr. Jean Hausser, Assistant Professor, Karolinska Institute, Sweden
Jan. 25, 2023 · 10:43 a.m.
Data Science for Precision Oncology
Prof. Olivier Michielin, Head of the Center of Precision Oncology, CHUV, and Group Leader at the Swiss Institute of Bioinformatics
Jan. 25, 2023 · 11:24 a.m.
5 minutes questions: Data Science for Precision Oncology
Prof. Olivier Michielin, Head of the Center of Precision Oncology, CHUV, and Group Leader at the Swiss Institute of Bioinformatics
Jan. 25, 2023 · 11:54 a.m.
Applications of AI in oncology drug discovery
Dr. Slavica Dimitrieva, Associate Director & Senior Principal Scientist, Oncology Data Science, Novartis Institutes for BioMedical Research
Jan. 25, 2023 · 12:07 p.m.
5 minutes questions: Applications of AI in oncology drug discovery
Dr. Slavica Dimitrieva, Associate Director & Senior Principal Scientist, Oncology Data Science, Novartis Institutes for BioMedical Research
Jan. 25, 2023 · 12:32 p.m.
