Transcriptions

Note: this content has been automatically generated.
00:00:00
Thank you. I will just give you an overview of what the challenges of scaling deep learning to high-performance computing centres could be. But before we start, let me give you an example and introduce Fred. Fred is a physician; he is a pathologist, so he stages cancer. He currently has ten patients, and for each patient he has five different slides of tissue specimens, so a total of fifty slides, let's say. He spends a lot of hours at the microscope, and he works with George, who is his friend; they usually collaborate on the cases, and they have a lot of strong disagreements about what the predicted outcome and stage of a patient could be. Assigning a cancer stage is generally a really subjective task, and that can create many problems. So they would often like a third opinion about what they are seeing, and they would like objective advice about what the correct answer could be.
00:01:12
So Fred, who has a more general approach and likes artificial intelligence and deep learning, has this idea of developing and deploying an expert system for giving objective assessments of the cancer stage of a patient. And they face the same problem as many research laboratories around the world: the computational resources are limited. This is where the project that I am working on can help, by providing computing solutions at exascale, which is where the field is headed at the moment. The idea of what we would like to create is to give user-friendly access to large-scale infrastructures that are generally really tricky to use and that have been used only for specific studies like astrophysics or, I don't know, molecular simulations.
00:02:09
So why do we want to scale up deep learning? Well, first, scalability is a desirable property. We know from research that big models usually tend to win over smaller models, and this is shown for example by a nice chart: as you increase the complexity of your model you actually achieve better performance, but the number of operations, the number of floating-point arithmetic operations per second that you need to run your model, also increases with the complexity of the model.
00:02:46
The second reason, which is maybe more debatable but still valid, is that data has proved to be effective and that we should actually aim for a new ImageNet. This is also something that appeared nicely during the previous talks: recently, research has been focusing on increasing the model size or improving the power of GPUs, but the largest dataset sizes have remained constant over time. So a thought experiment would be to try to see what would have happened if instead people had focused on having larger datasets and had spent money on annotation: what happens is that there is an increase in performance. One of the problems, though, is that annotating datasets is really expensive, and people usually don't like to spend money on it.
00:03:41
Another thing that we would like to say is that productivity should be unbounded. We would really like to have better tools, so that we could try more ideas in a shorter period of time and have a sort of interactivity when we run our experiments. What usually happens is that if you have an experiment that takes minutes or hours to run, you can still have some kind of feedback, and you can still think about what is going on and what you should do next. If you have to wait for days, weeks or even months, then maybe you will be more careful with your experiments, and some you would not even try.
00:04:18
And you can even think of searching, in an automated way, for what could be the better model, the best model that you should use for your data. For example, this shows what happens if you have a reinforcement learning algorithm that is trying to find the best convolutional layer to apply according to the problem that you are trying to solve, and the red lines show the result of that search. So the question that we are trying to address, that we want to actually think about, is: can we train larger, more powerful models faster?
00:04:58
Well, one thing that I would like to point out is that deep learning is intrinsically high-performance computing. When you have a deep network, no matter how big or how shallow it is, you are mostly computing dot products, so you have a high arithmetic intensity. And generally, if you have convolutions, the arithmetic intensity of course increases even further. So, off the top of your head, you can say that network training is more or less an exascale workload, which is high-performance computing. And turnaround time is one of our metrics, because we like to run experiments in a fast way, we want to see the outcome soon, and it is also one of the key performance indicators of high-performance computing, together with efficiency and strong scalability, which is something that we would like to have.
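As a rough illustration of what high arithmetic intensity means here, a minimal back-of-the-envelope sketch (the layer shape, batch size and FP32 data type are assumptions, not figures from the talk):

# Back-of-the-envelope arithmetic intensity of a dense layer's matrix multiply
# (illustrative sketch only; shapes and data type are assumptions).
def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=4):
    """FLOPs per byte moved for C[m, n] = A[m, k] @ B[k, n] in FP32."""
    flops = 2 * m * k * n                                    # one multiply and one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Example: a batch of 256 activations through a 4096 -> 4096 dense layer.
print(matmul_arithmetic_intensity(256, 4096, 4096))          # ~114 FLOPs per byte

Many dot products reused against the same weights mean far more arithmetic than memory traffic, which is exactly the kind of workload HPC hardware is built for.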
00:05:56
Anyway, there is one problem: strong scaling of deep neural networks is a complex matter, because there can be many difficulties when you try to parallelize a deep network, especially due to stochastic gradient descent. It is intrinsically an iterative algorithm: when you are looking for the best values of your parameters, you are updating the parameters on the basis of the previous ones. So it is a sequential problem; how can we parallelize this?
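As a reminder, in standard notation (not taken from the slides), the update that makes the procedure sequential is

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t; \mathcal{B}_t)

where each new parameter vector \theta_{t+1} depends on the previous one \theta_t, so successive steps cannot simply be computed independently.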
00:06:28
There are two types of parallelism that can be used: one is model parallelism and one is data parallelism. Model parallelism means splitting the computation of the model across different machines: you have one deep learning model, you usually have multiple GPUs, and then you assign to each GPU the training of one layer. When you have data parallelism instead, you just replicate your model on each of the GPUs that you use, and you split the data across the GPUs, so that you train the same model on different GPUs. The problem at the end is that you want to merge the weights: you want to end up with one model that has been trained on all your data, as in the sketch below.
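A minimal single-process sketch of synchronous data parallelism, assuming a toy linear model and NumPy (real systems would use frameworks such as Horovod or PyTorch DistributedDataParallel; all names here are illustrative):

import numpy as np

def gradient(weights, x_shard, y_shard):
    # Gradient of the mean squared error for a toy linear model y = x @ w.
    pred = x_shard @ weights
    return 2.0 * x_shard.T @ (pred - y_shard) / len(x_shard)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
weights = np.zeros(8)
n_workers, lr = 4, 0.05

for step in range(100):
    # 1. Every "GPU" holds the same weights and receives its own shard of the batch.
    shards = zip(np.array_split(x, n_workers), np.array_split(y, n_workers))
    local_grads = [gradient(weights, xs, ys) for xs, ys in shards]
    # 2. All-reduce: average the local gradients so every replica agrees.
    # 3. Apply one shared update, so the single model ends up trained on all the data.
    weights -= lr * np.mean(local_grads, axis=0)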
00:07:13
But we would also like to think about something else, which is distributing the computation at the level of the high-performance computing centre, and at the server level for example, so that you have multiple centres without any need to move the data. You could, for example, use a central data store shared among different institutions, but you can also think of training independent models in each institution, with the data that it has, and then ensembling the models, as in the sketch below.
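A small sketch of that ensembling idea, with toy per-institution models (the linear model and the synthetic data are assumptions; the point is only that predictions are combined without ever pooling the raw data):

import numpy as np

class InstitutionModel:
    # Toy linear model fitted on one institution's local data only.
    def __init__(self, x, y):
        self.w, *_ = np.linalg.lstsq(x, y, rcond=None)
    def predict(self, x):
        return x @ self.w

rng = np.random.default_rng(2)
local_models = [
    InstitutionModel(rng.normal(size=(200, 8)), rng.normal(size=200))
    for _ in range(3)                    # three institutions, three independent models
]

x_new = rng.normal(size=(10, 8))
# The ensemble prediction is simply the average of the local models' outputs.
ensemble_pred = np.mean([m.predict(x_new) for m in local_models], axis=0)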
00:07:38
This is nothing really new, but you can also think of transferring the weights, so you try to do transfer learning: you start training in one centre, and once you saturate and are not learning anything more, you just move your model to another centre and use the data that are in that centre. This is pretty good in the case of hospitals, because usually hospitals don't like to share their data: patient data is private, and it is also valuable. As an alternative, if we can think of single weight transfer, we can also think of cyclical weight transfer, where we are constantly shifting the weights and training on all the data in a cyclical way.
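A sketch of single versus cyclical weight transfer, again with a toy linear model and synthetic per-centre datasets (assumptions, only to show how the weights move while the data stays local):

import numpy as np

def train_locally(weights, x, y, epochs, lr=0.05):
    # A few gradient-descent epochs at one centre, on that centre's data only.
    for _ in range(epochs):
        weights = weights - lr * 2.0 * x.T @ (x @ weights - y) / len(x)
    return weights

rng = np.random.default_rng(1)
centres = [(rng.normal(size=(200, 8)), rng.normal(size=200)) for _ in range(3)]

# Single weight transfer: train to saturation at each centre, then move on.
w_single = np.zeros(8)
for x, y in centres:
    w_single = train_locally(w_single, x, y, epochs=50)

# Cyclical weight transfer: short visits to every centre, repeated many times.
w_cyclic = np.zeros(8)
for _ in range(20):
    for x, y in centres:
        w_cyclic = train_locally(w_cyclic, x, y, epochs=2)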
00:08:18
So I will just stop here; I only wanted to introduce some new frontiers for healthcare that could be opened thanks to this project. For example, scaling our models would allow the effective use of the ever-growing volumes of data in medical research, and it would allow us to tackle all of cancer's diversity, because you may have different types of cancer and we need many different models. It would also open the frontier of better representation learning, with a better understanding of the diseases and therefore better models. And again, decreasing the turnaround time for experiments is one of the things that we would like, because it gives you high interactivity. And we are also trying to say that it is going to help with running more experiments and, on the transparency side, with giving better interpretability for the models, which is a pretty common concern.
