Transcriptions

Note: this content has been automatically generated.
00:00:00
Thank you. I will just give you an overview of what the challenges of scaling deep learning to high-performance computing centres could be. But before we start, let me give you an example and introduce Fred. Fred is a physician; he is a pathologist, so he stages cancer. He currently has ten patients, and for each patient he has five different slides of tissue specimens, so a total of fifty slides, let's say. He spends a lot of hours at the microscope, and he works with George, who is his friend; they usually collaborate on the cases, and they have a lot of strong disagreements about what the predicted outcome and stage of a patient could be. Assigning a cancer stage is generally a really subjective task, and that can create many problems. So they would often like a third opinion about what they are seeing, and they would like objective advice about what the correct answer could be.
00:01:12
So Fred, who has a more general approach and likes artificial intelligence and deep learning, has this idea of developing and deploying an expert system for giving objective assessments of the cancer stage of a patient. And they face the same problem as many research laboratories around the world: the computational resources are limited. This is where the project that I am working on can help, by providing computing solutions at exascale, which is where the field is headed at the moment. The idea of what we would like to create is to give user-friendly access to large-scale infrastructures that are generally really tricky to use and that have been used only for specific studies like astrophysics or, I don't know, molecular simulations.
00:02:09
So why do we want to scale up deep learning? Well, first, scalability is a desirable property. We know from research that big models usually tend to win over smaller models, and this is shown for example by a nice chart: as you increase the complexity of your model you actually achieve better performance, but the number of operations, the number of floating-point arithmetic operations per second that you need to run your model, also increases with the complexity of the model.
00:02:46
The second reason, which is maybe more debatable but still valid, is that data has proved to be effective and that we should actually aim for a new ImageNet. This is also something that appeared nicely during the previous talks: recently, research has been focusing on increasing the model size or improving the power of GPUs, but the largest dataset sizes have remained constant over time. So a thought experiment would be to try to see what would have happened if instead people had focused on having larger datasets and had spent money on annotation: what happens is that there is an increase in performance. One of the problems, though, is that annotating datasets is really expensive, and people usually don't like to spend money on it.
00:03:41
Another thing that we would like to say is that productivity should be unbounded. We would really like to have better tools, so that we could try more ideas in a shorter period of time and have a sort of interactivity when we run our experiments. What usually happens is that if you have an experiment that takes minutes or hours to run, you can still have some kind of feedback, and you can still think about what is going on and what you should do next. If you have to wait for days, weeks or even months, then maybe you will be more careful with your experiments, and some you would not even try.
00:04:18
And you can even think of searching, in an automated way, for what could be the better model, the best model that you should use for your data. For example, this shows what happens if you have a reinforcement learning algorithm that is trying to find the best convolutional layer to apply according to the problem that you are trying to solve, and the red lines show the result of that search. So the question that we are trying to address, that we want to actually think about, is: can we train larger, more powerful models faster?
00:04:58
Well, one thing that I would like to point out is that deep learning is intrinsically high-performance computing. When you have a deep network, no matter how big or how shallow it is, you are mostly computing dot products, so you have a high arithmetic intensity. And generally, if you have convolutions, the arithmetic intensity of course increases even further. So, off the top of your head, you can say that network training is more or less an exascale workload, which is high-performance computing. And turnaround time is one of our metrics, because we like to run experiments in a fast way, we want to see the outcome soon, and it is also one of the key performance indicators of high-performance computing, together with efficiency and strong scalability, which is something that we would like to have.
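As a rough illustration of what high arithmetic intensity means here, a minimal back-of-the-envelope sketch (the layer shape, batch size and FP32 data type are assumptions, not figures from the talk):

# Back-of-the-envelope arithmetic intensity of a dense layer's matrix multiply
# (illustrative sketch only; shapes and data type are assumptions).
def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=4):
    """FLOPs per byte moved for C[m, n] = A[m, k] @ B[k, n] in FP32."""
    flops = 2 * m * k * n                                    # one multiply and one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Example: a batch of 256 activations through a 4096 -> 4096 dense layer.
print(matmul_arithmetic_intensity(256, 4096, 4096))          # ~114 FLOPs per byte

Many dot products reused against the same weights mean far more arithmetic than memory traffic, which is exactly the kind of workload HPC hardware is built for.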
00:05:56
Anyway, there is one problem: strong scaling of deep neural networks is a complex matter, because there can be many difficulties when you try to parallelize a deep network, especially due to stochastic gradient descent. It is intrinsically an iterative algorithm: when you are looking for the best values of your parameters, you are updating the parameters on the basis of the previous ones. So it is a sequential problem; how can we parallelize this?
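As a reminder, in standard notation (not taken from the slides), the update that makes the procedure sequential is

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t; \mathcal{B}_t)

where each new parameter vector \theta_{t+1} depends on the previous one \theta_t, so successive steps cannot simply be computed independently.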
00:06:28
There are two types of parallelism that can be used: one is model parallelism and one is data parallelism. Model parallelism means splitting the computation of the model across different machines: you have one deep learning model, you usually have multiple GPUs, and then you assign to each GPU the training of one layer. When you have data parallelism instead, you just replicate your model on each of the GPUs that you use, and you split the data across the GPUs, so that you train the same model on different GPUs. The problem at the end is that you want to merge the weights: you want to end up with one model that has been trained on all your data, as in the sketch below.
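A minimal single-process sketch of synchronous data parallelism, assuming a toy linear model and NumPy (real systems would use frameworks such as Horovod or PyTorch DistributedDataParallel; all names here are illustrative):

import numpy as np

def gradient(weights, x_shard, y_shard):
    # Gradient of the mean squared error for a toy linear model y = x @ w.
    pred = x_shard @ weights
    return 2.0 * x_shard.T @ (pred - y_shard) / len(x_shard)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
weights = np.zeros(8)
n_workers, lr = 4, 0.05

for step in range(100):
    # 1. Every "GPU" holds the same weights and receives its own shard of the batch.
    shards = zip(np.array_split(x, n_workers), np.array_split(y, n_workers))
    local_grads = [gradient(weights, xs, ys) for xs, ys in shards]
    # 2. All-reduce: average the local gradients so every replica agrees.
    # 3. Apply one shared update, so the single model ends up trained on all the data.
    weights -= lr * np.mean(local_grads, axis=0)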
00:07:13
But we would also like to think about something else, which is distributing the computation at the level of the high-performance computing centre, and at the server level for example, so that you have multiple centres without any need to move the data. You could, for example, use a central data store shared among different institutions, but you can also think of training independent models in each institution, with the data that it has, and then ensembling the models, as in the sketch below.
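A small sketch of that ensembling idea, with toy per-institution models (the linear model and the synthetic data are assumptions; the point is only that predictions are combined without ever pooling the raw data):

import numpy as np

class InstitutionModel:
    # Toy linear model fitted on one institution's local data only.
    def __init__(self, x, y):
        self.w, *_ = np.linalg.lstsq(x, y, rcond=None)
    def predict(self, x):
        return x @ self.w

rng = np.random.default_rng(2)
local_models = [
    InstitutionModel(rng.normal(size=(200, 8)), rng.normal(size=200))
    for _ in range(3)                    # three institutions, three independent models
]

x_new = rng.normal(size=(10, 8))
# The ensemble prediction is simply the average of the local models' outputs.
ensemble_pred = np.mean([m.predict(x_new) for m in local_models], axis=0)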
00:07:38
This is nothing really new, but you can also think of transferring the weights, so you try to do transfer learning: you start training in one centre, and once you saturate and are not learning anything more, you just move your model to another centre and use the data that are in that centre. This is pretty good in the case of hospitals, because usually hospitals don't like to share their data: patient data is private, and it is also valuable. As an alternative, if we can think of single weight transfer, we can also think of cyclical weight transfer, where we are constantly shifting the weights and training on all the data in a cyclical way.
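A sketch of single versus cyclical weight transfer, again with a toy linear model and synthetic per-centre datasets (assumptions, only to show how the weights move while the data stays local):

import numpy as np

def train_locally(weights, x, y, epochs, lr=0.05):
    # A few gradient-descent epochs at one centre, on that centre's data only.
    for _ in range(epochs):
        weights = weights - lr * 2.0 * x.T @ (x @ weights - y) / len(x)
    return weights

rng = np.random.default_rng(1)
centres = [(rng.normal(size=(200, 8)), rng.normal(size=200)) for _ in range(3)]

# Single weight transfer: train to saturation at each centre, then move on.
w_single = np.zeros(8)
for x, y in centres:
    w_single = train_locally(w_single, x, y, epochs=50)

# Cyclical weight transfer: short visits to every centre, repeated many times.
w_cyclic = np.zeros(8)
for _ in range(20):
    for x, y in centres:
        w_cyclic = train_locally(w_cyclic, x, y, epochs=2)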
00:08:18
So I will just stop here; I only wanted to introduce some new frontiers for healthcare that could be opened thanks to this project. For example, scaling our models would allow the effective use of the ever-growing volumes of data in medical research, and it would allow us to tackle all of cancer's diversity, because you may have different types of cancer and we need many different models. It would also open the frontier of better representation learning, with a better understanding of the diseases and therefore better models. And again, decreasing the turnaround time for experiments is one of the things that we would like, because it gives you high interactivity. And we are also trying to say that it is going to help with running more experiments and, on the transparency side, with giving better interpretability for the models, which is a pretty common concern.
