Torch 1

Transcriptions

Note: this content has been automatically generated.

00:00:00

Okay, perfect. So, as a friend pointed out, Torch was actually written at Idiap. I was having dinner with some of the old-timers at Idiap, and we were walking back from the town square, and I saw the old building of Idiap, this really tiny brown building. It was said that Torch was written in that tiny brown building; Ronan was basically telling us about it here, in this beautiful town.

00:00:39

So today I'm gonna give three lectures on Torch. First, just some logistics. The first talk I'm gonna be giving is an overview of Torch: basically a high-level view of what Torch is, what the community is like, what it is like to work in Torch, and some high-level details of the philosophy of Torch. The second talk is gonna be a deep dive into Torch: we're gonna go into the inner workings of Torch, looking at how tensors and storages work in Torch, and how to use the neural networks package, the optimisation package and so on.

00:01:43

That is gonna be useful to start getting into Torch. The third talk is basically going to be about extensions of Torch: interesting packages, new paradigms of computation and some showcases. After Yoshua's talk on generative modelling, I will go directly into an implementation, in Torch, of one of the generative models he's gonna talk about, and also give some extensions of it, and so on.

00:02:30

During the breaks, as well as during lunchtime, you can chat with some of us who are here. There is Sergey Zagoruyko; he's from ParisTech, and he came here just to be able to chat with any of you about deeper questions on Torch, if you have issues with Torch, how to get them fixed, and so on. We also have two excellent local experts, Pedro Pinheiro and Dimitri Palaz; both of them are PhD students of Ronan Collobert there, and they are also a good source of people to chat with. I'm available as well, probably not during the lunch break, but definitely in the other breaks.

00:03:23

Okay, let's get started. This particular talk will have the following structure: what Torch is; the community of Torch; common uses of Torch, i.e. how people use Torch in the community; the core philosophy behind Torch, something that we wouldn't change regardless of how we move forward in the future; the key drivers of growth, the reasons why Torch is popular, or why we think Torch is popular, which should help give a perspective on the main value additions of Torch; and also a little bit about the future, what we're planning next, a very high-level view of our future plans.

00:04:24

So, what is Torch? Torch is a scientific computing framework. You can think of it as similar to MATLAB, or Python with SciPy and NumPy. At the core of Torch is the N-dimensional array; we call them tensors. It's an interactive, REPL-based environment: you can open an interpreter, execute commands and see what happens, exactly like when you work with MATLAB. You can plot something, you can train a network, and so on.

00:05:07

It's very simple to use. It has one-based indexing to emulate MATLAB. From my understanding (this predates me), it was written as something MATLAB users could move towards when they wanted to do more serious computation from a systems perspective. It has plotting; it has all the bells and whistles you'd expect from a scientific computing package. It's definitely not as rich as MATLAB in certain aspects: MATLAB has certain toolboxes that are not available in other communities. But the same can be said for Torch: we are strong in, for example, the neural network packages and the optimisation algorithms based on gradient descent, and we'd like to focus on that, at least for the near future.

00:06:17

So one of the key values of Torch is that it is based on the language called Lua, and on LuaJIT, which runs the Lua code. LuaJIT is a very performant JIT engine; a JIT is a just-in-time compilation engine, which takes your high-level Lua code and compiles it dynamically in very smart ways. What that translates to is that you can write high-level code, like you would write in MATLAB or Lua, where you don't have type safety and other features that you would get from a compiled language, but it would still be fairly fast. If you write a for loop in Torch, you wouldn't be like "oh my god, it's taking days". The difference between writing stuff in Lua and writing stuff in C is bearable. I mean, there are obviously clear performance differences, but while you're prototyping it's very efficient compared to other interpreted languages like Python or MATLAB.

00:07:45

The second key feature of Torch, which we use all the time, is its really easy integration with C. In LuaJIT there is something called the FFI, which is now common in other languages, for example Python. Torch was always meant to be a very smooth interface on top of C: you write your heavy processing in C, and then you just want a little interpreted language to do quick prototyping. To wrap C code in Torch, you don't even have to write complicated bindings; you basically just call the C code as is, within your Lua program. As an example, we wrap cuDNN, NVIDIA's deep neural network library, in Torch, and we never actually have to write our own C code of any sort. All we have to do is, in Lua, directly call the C function, in this case cudnnConvolutionForward, with the arguments that it expects, and it just works out of the box. This saves a lot of time when you are wrapping existing libraries, or when you're writing your own C code and want an interface between the two.
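Torch's FFI is LuaJIT's. As a rough analogue in Python (an illustration, not Torch or LuaJIT code), the standard-library `ctypes` module also lets you declare a C function's signature once and then call it directly from a shared library, with no hand-written binding layer. The sketch below calls libc's `strlen` and `abs`; it assumes a POSIX system, where `ctypes.CDLL(None)` exposes the C library already loaded into the process.

```python
import ctypes

# Assumption: POSIX. CDLL(None) returns a handle to the symbols already
# loaded into the current process, which includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signatures once, much as a LuaJIT ffi.cdef block would.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_strlen(data: bytes) -> int:
    """Call C's strlen directly on a Python bytes object."""
    return libc.strlen(data)


def c_abs(value: int) -> int:
    """Call C's abs directly on a Python int."""
    return libc.abs(value)


if __name__ == "__main__":
    print(c_strlen(b"torch"))  # 5
    print(c_abs(-42))          # 42
```

The design point carries over: once the signature is declared, the foreign call looks like any other function call in the host language.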

00:09:26

Another key value that Torch provides, and one of the big reasons people come to us and use Torch, is its strong GPU support. Torch has a large tensor library of over a hundred and fifty functions, and all of them work on both the CPU and the GPU. Especially on the GPU, these functions are extremely performant: quite a few CUDA engineers spent time optimising most parts of the code. On top of the core tensor library, we also have a neural network library that specialises just in neural networks and their performance, and that has also been fairly well optimised for GPUs.

00:10:17

A complete GPU tensor library is actually, as of today, unique to Torch, especially one that's performant. There are alternatives like CUDAMat and the Eigen library, but they're very limited in their GPU support; they just weren't written with GPU-first tensors in mind. That's something we have been focusing on since about 2011. Initially we started off doing whatever we could for GPU support, but now it's fairly complete, and our users find the transition between CPU and GPU very natural, almost without knowing the difference.

00:11:19

The next thing I wanna talk about is who uses Torch.

00:11:26

There's actually a large number of users; of course, these are only a subset of them. Idiap is right there in the middle, with quite a few people using Torch. There are also really large companies like Facebook, where we use Torch for all of our research, all of FAIR, and there are about fifty of us. Twitter uses Torch not only for research but also in production environments: Torch covers their image recognition, video recognition and language use cases in production. There are quite a few schools that use Torch; I've put some of the most active ones there. Among other larger companies there is IBM: I visited them a couple of weeks ago, and they said pretty much all of their speech and NLP pipelines are now in Torch. There are also some interesting companies there, TeraDeep for example, at the bottom right. They do Torch for FPGAs and specialised chips: you can just use Torch, with the back end being an FPGA, a GPU or a CPU, for example. Moodstocks, the company at the bottom there, uses Torch to train networks and runs them on mobile; they're basically a mobile image-recognition company.

00:13:15

Other examples in the community are packages; I'm gonna go over a few popular ones. One of the strong points of some other communities, for example the Caffe community, has been the Model Zoo, where people, after they do research, can share their trained models for other researchers to use. We wanted to leverage the value that the Caffe community provides, so we have a package called loadcaffe that basically loads Caffe models into Torch pretty seamlessly, and you can then use these models to do all of your research in Torch. If a new paper comes out and there's a pretrained Caffe model, you can just pull it in, extract features and plug them into existing Torch code.

00:14:18

For example, at Facebook: there's a class of ConvNets that appeared recently called residual networks; if you work in computer vision or deep learning, you will have heard of them. As soon as that paper came out we became really interested, and we released code for training residual networks from scratch. These are very deep networks, up to a thousand layers deep, and training them is not that simple from a systems perspective: with a thousand layers, they want to take as many GPUs as you can give them. We developed this mostly for ourselves, but we thought it would be interesting to open-source it for the community, so that you have a complete, fairly simple-to-follow example of training ConvNets on multiple GPUs, ResNets in this case, that gives you a state-of-the-art end result. We also released pretrained models for that. Google released their Inception models, which are another set of pretrained models for vision, and we have those models ported to Torch as well. One thing that happens is that if any pretrained model appears in another framework, someone in the community just ports it to Torch within a week or two, so our users never have the feeling of being left out of the state of the art.

00:16:11

I think this is one of the important aspects of Torch: we have a large enough community that people don't feel like they're just working by themselves; they feel like they're leveraging a lot of value from the community itself. Over the past year we've been looking at the logs to see how popular Torch is, and it has about [unclear] downloads a day; we basically track the number of installs over GitHub. It's fully community-driven, and that's one of the interesting parts of Torch: Torch itself is not backed by a single company. There is a nonprofit that runs Torch, and all the companies involved in using Torch also contribute back to open-source Torch in various ways: engineering, performance optimisations, new packages and so on.

00:17:25

Next, some interesting examples. Examples form a huge driver of user adoption for Torch: if you have high-quality examples of how to use Torch, people find that very useful for getting into Torch, rather than, for example, reading tutorials which might not cover the use case they want.

00:17:58

So, some of the interesting examples that have appeared: this is NeuralTalk2, a captioning network, where you send in an image and it spits out an image caption. It was written by the Stanford folks, Andrej Karpathy and Justin Johnson. This is one of the nice examples where you have a ConvNet plugged into an LSTM, with the whole training glue there. Training these things is obviously not just putting them together and picking some learning rate; there are subtleties in there, and examples like these are interesting.

00:18:54

Another example is the neural-style project, another really popular project on top of which many people have built art installations and so on.

00:19:04

This is also from Stanford, by Justin Johnson. You give it an image from the real world and some painting, and it runs an optimisation to match the statistics of both of them at different layers of a pretrained network. You actually get a picture that looks like it is in the style of the painting, but still has the content of the image you gave.
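In this family of methods the style "statistics" matched at each layer are commonly the Gram matrices of that layer's feature maps (the formulation of Gatys et al.). A minimal pure-Python sketch, assuming the features arrive as a list of channels, each flattened over spatial positions; the function names and the normalisation by position count are illustrative choices, not the neural-style code.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """G[i][j] = sum over positions of F_i * F_j, divided by the position count.

    `features` has one row per channel; each row holds that channel's
    activations flattened over the spatial positions.
    """
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n_positions for fj in features]
        for fi in features
    ]


def style_loss(generated: Matrix, style: Matrix) -> float:
    """Mean squared difference between the two Gram matrices at one layer."""
    g_gen, g_sty = gram_matrix(generated), gram_matrix(style)
    c = len(g_gen)
    return sum(
        (g_gen[i][j] - g_sty[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 positions
    print(gram_matrix(feats))         # [[2.5, 5.5], [5.5, 12.5]]
    print(style_loss(feats, feats))   # 0.0
```

Because the Gram matrix sums over positions, it keeps which channels fire together but discards where they fire, which is why it captures style rather than layout. In a full transfer, a loss like this would be summed over several layers alongside a content loss, and the image pixels optimised by gradient descent.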

00:19:41

This has been one of the most popular projects. There are also other variants of it, feed-forward variants, where you can actually do neural style in real time; someone in the community converted that and plugged it into a video stream, so you can actually stylise streams on the fly.

00:20:07

Another popular computer vision application that has been appearing recently is visual question answering; Yoshua Bengio showed a demo of Facebook's visual question answering system yesterday. There is an open-source implementation from Virginia Tech doing visual question answering, and this is the university where one of the popular datasets, VQA, comes from.

00:20:42

Another interesting example is this one, called neural-doodle. You just doodle the painting that you want, a rough sketch of what you want to paint, then you give it some other painting, and it will actually turn your doodle into a really arty painting.

00:21:08

Coming to the more practical aspects, here are some things that Torch does really well. Lua itself is a language with very light overhead. One of the reasons Lua was popular before Torch, regardless of Torch, is that it is used in game engines a lot, because it's a very small language: the Lua language itself is about twelve thousand lines of C code. Game engines use it a lot to embed Lua into really complex, high-performance C++.

00:21:54

At Facebook, one of the things we've been looking at is how to learn physics from the world, and we wanted to start off with virtual worlds. So we plugged Torch into one of the most popular game engines available, Unreal Engine, and we released this integration as open source. You can basically plug Torch into Unreal Engine and run it with a very high-performance, low-latency pipeline; you get to interact with the Unreal Engine world, and then you can, for example, do reinforcement learning or computer vision research, or a hybrid of those.

00:22:47

This example here is from a paper that was published at ICML this year by my colleague Adam Lerer and others. They learn the physics of blocks: they want to predict whether blocks are falling, or how blocks fall if they do fall, and where they fall. Given a picture, for example the picture on top, can you predict whether the blocks in that picture would fall over or stay stable? Questions like these. One of the interesting things here is that the network was trained fully in the Unreal Engine environment, but at validation time, at test time, when they wanted to see if the network actually generalises to real-world block falling, they constructed a small set of real wooden blocks against a white background. The network actually does really well in this real-world environment, even though it was trained completely in the Unreal Engine-based world. So you can see how that might extend to other applications as well.

00:24:18

Another big thing these days is reinforcement learning, especially on Atari games. There are a couple of projects that set up all the reinforcement learning environments for you, including implementations of all of the popular reinforcement learning algorithms, so you can basically just go and use those as your baselines and do further research on improving your reinforcement learning algorithms. This is one of these Lua-based environments, which has all of the popular recent reinforcement learning algorithms implemented, like Deep Q-Networks, Double DQN and so on. But apart from this, there is a company called OpenAI that released an environment for reinforcement learning research, called OpenAI Gym, and it has been really well written; it has a lot of environments, and examples of using Gym from Torch appeared recently as well, also completely open source.
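For readers new to the algorithms named above: DQN and Double DQN differ mainly in how the bootstrap target is formed. A minimal sketch, assuming Q-values are plain lists indexed by action; `online_q_next` and `target_q_next` are hypothetical names for the online and target networks' outputs at the next state, and `gamma` is the discount factor.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool, target_q_next: Sequence[float],
               gamma: float = 0.99) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(target_q_next)


def double_dqn_target(reward: float, done: bool,
                      online_q_next: Sequence[float],
                      target_q_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]


if __name__ == "__main__":
    online = [1.0, 3.0]   # online net prefers action 1
    target = [2.0, 0.5]   # target net's estimates for the same actions
    print(dqn_target(1.0, False, target, gamma=0.5))                 # 2.0
    print(double_dqn_target(1.0, False, online, target, gamma=0.5))  # 1.25
```

Decoupling selection from evaluation is what reduces the over-estimation that comes from taking a max over noisy estimates.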

00:25:38

Coming to the NLP side of things, there are several good open-source NLP projects in Torch: training language models, or training sequence-to-sequence models, for translation for example. There's also one interesting project where you have a conversational model, basically a chatbot, based on the Google paper that appeared not too long ago, in 2015. After that paper appeared, someone from the community quickly implemented the model in Torch, and this is another good project if you want to do NLP research in Torch and take a look at the internals.

00:26:24

And lastly: Yoshua's probably gonna be talking a little bit about generative modelling, and I will cover a little bit of generative modelling for images in the third lecture. But there is a project that I wrote that produces pretty pictures. If you have a set of images and you want to train a generative model that can generate images similar to the ones you gave it, the model is fairly stable: you just take the images you have, probably about ten thousand or more, give them to it, and it will try to build a generative model of them. People in the community have used this code to generate eighteenth-century art, manga characters and so on.

00:27:28

So that's basically an overview of the community and the very good examples that you have in the Torch community. Next I will go over the basic packages that form the core of Torch; I will do deeper dives into some of these packages in the next two lectures.

00:27:57

The main package is for neural networks. We have a core package called nn, which just stands for neural networks. nn is built on the concept that if you want to compose complicated neural networks, you build them like a system of Lego blocks: you put the blocks together one after the other, and you can have containers where you stack blocks on top of each other or put blocks in parallel. This helps compose really complicated neural networks.
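The Lego-block composition described above can be sketched in a few lines. This is plain Python with hypothetical `Sequential`, `Concat`, `Linear` and `ReLU` classes, loosely named after nn's containers and modules but not their API: `Sequential` chains blocks one after the other, and `Concat` runs blocks in parallel on the same input and joins their outputs.

```python
from typing import List

Vector = List[float]


class Linear:
    """y = W x + b for a fixed weight matrix W (list of rows) and bias b."""

    def __init__(self, weight: List[Vector], bias: Vector):
        self.weight, self.bias = weight, bias

    def forward(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class ReLU:
    def forward(self, x: Vector) -> Vector:
        return [max(0.0, v) for v in x]


class Sequential:
    """Stack blocks one after the other: each output feeds the next block."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def forward(self, x: Vector) -> Vector:
        for m in self.modules:
            x = m.forward(x)
        return x


class Concat:
    """Run blocks in parallel on the same input and concatenate their outputs."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def forward(self, x: Vector) -> Vector:
        out: Vector = []
        for m in self.modules:
            out.extend(m.forward(x))
        return out


if __name__ == "__main__":
    net = Sequential(
        Concat(Linear([[1.0, 0.0]], [0.0]), Linear([[0.0, -1.0]], [0.0])),
        ReLU(),
    )
    print(net.forward([2.0, 3.0]))  # [2.0, 0.0]
```

Because containers expose the same `forward` method as the blocks they hold, they nest arbitrarily, which is what lets small pieces build up into large architectures.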

00:28:47

The nn package is powerful enough to capture a lot of architectures without writing a lot of code. One example that comes to mind: when Oxford released the VGG network and their paper, they did their research in Caffe, and the definition of the VGG network in Caffe was about two thousand lines of protobuf. In Torch you could basically write that in sixty to seventy-five lines or less. That's because in Torch the network is not data, it's code: you can write very flexible structures, and the neural networks package powers all of that.
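The "code, not data" point is easiest to see with VGG-16's feature extractor: thirteen 3x3 convolutions in five stages separated by max-pooling, which a loop can generate from a compact config instead of a hand-written layer list. The sketch builds a plain list of layer descriptions; the tuple format is a hypothetical stand-in for real nn modules, and the fully connected head is omitted.

```python
from typing import List, Tuple, Union

# VGG-16 feature extractor: channel counts per 3x3 conv, "M" marks 2x2 max-pool.
VGG16_CFG: List[Union[int, str]] = [
    64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
    512, 512, 512, "M", 512, 512, 512, "M",
]


def build_vgg_features(cfg: List[Union[int, str]],
                       in_channels: int = 3) -> List[Tuple]:
    """Expand a compact config into an ordered list of layer descriptions."""
    layers: List[Tuple] = []
    channels = in_channels
    for item in cfg:
        if item == "M":
            layers.append(("maxpool", 2, 2))            # kernel 2, stride 2
        else:
            layers.append(("conv", channels, item, 3))  # (kind, in, out, kernel)
            layers.append(("relu",))
            channels = item                             # next conv's input width
    return layers


if __name__ == "__main__":
    layers = build_vgg_features(VGG16_CFG)
    print(sum(1 for layer in layers if layer[0] == "conv"))     # 13
    print(sum(1 for layer in layers if layer[0] == "maxpool"))  # 5
```

Changing depth or width becomes a one-line edit to the config rather than hundreds of lines of layer definitions.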

00:29:51

When you want to build really complicated networks, for example if you have to create an LSTM cell, some new kind of fancy memory, or just crazy networks that you dreamt of last night, we have another package that extends the nn package, called nngraph. We'll be going into both nn and nngraph in the deep dive. nngraph lets you construct really complicated neural networks. It has a graph API similar to Theano and TensorFlow, but it's constructed at a slightly higher granularity: instead of building graphs on top of every tensor operation, you build graphs on top of modules that pack more compute than a single tensor operation, and we call these layers.
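To make the granularity point concrete: in a module-level graph each node wraps a whole layer, and a forward pass evaluates nodes in dependency order. A minimal sketch with a hypothetical `Node` class (not nngraph's API), where a node's function is any callable taking its parents' outputs; here two branches are summed, as in a residual-style block.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A graph node wrapping one module-sized function of its parents' outputs."""

    def __init__(self, fn: Optional[Callable] = None,
                 parents: Optional[List["Node"]] = None):
        self.fn = fn                  # None marks an input node
        self.parents = parents or []


def run_graph(output: Node, inputs: Dict[Node, object]) -> object:
    """Evaluate `output`, computing each node once (memoised depth-first)."""
    cache: Dict[Node, object] = dict(inputs)

    def evaluate(node: Node):
        if node not in cache:
            args = [evaluate(p) for p in node.parents]
            cache[node] = node.fn(*args)
        return cache[node]

    return evaluate(output)


if __name__ == "__main__":
    x = Node()                                          # input node
    branch = Node(lambda v: [2 * a for a in v], [x])    # stands in for a layer
    skip = Node(lambda v: v, [x])                       # identity branch
    out = Node(lambda a, b: [p + q for p, q in zip(a, b)], [branch, skip])
    print(run_graph(out, {x: [1.0, 2.0]}))              # [3.0, 6.0]
```

Each node here does a layer's worth of work, so the graph stays small even for deep models, in contrast to graphs built from individual tensor operations.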

00:30:53

I want to go into another interesting paradigm that appeared recently. This is a package that was contributed by Twitter, called autograd, and it's a slightly different way in which people can do gradient-based learning. It's unlike the other packages available for doing deep learning, except for the autograd package written for Python.

00:31:33

The way it works is not new; it has been well understood. It uses a tape-based mechanism to record what is going on in the forward pass. You can write your neural network as a bunch of tensor operations; you can even have if conditions, for loops and while loops. Basically, you can write a function just like the standard kind of code you write, with for loops, whiles and elses. When you execute that function, autograd goes deep into the Lua language itself and records every operation that happens in the forward phase. autograd defines a backward operator for every operation in the forward phase, in Lua, with very few restrictions. When you want to compute the gradients with respect to this arbitrary function that you just defined, it basically plays the tape backwards: it takes the tape it recorded during the forward phase, plays it backwards, and for every operation computes the gradient with respect to each variable involved.

00:32:57

This is really useful when you're training networks with dynamic graphs, where it's not the same computation every time: your computation might be conditional on, for example, the current norm of the gradients, or it can be very dynamic, dependent on any arbitrary thing, and autograd can still do efficient gradient computation.
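The tape mechanism described above fits in a few dozen lines. This is a minimal scalar sketch in Python, not torch-autograd's implementation: each operation appends a backward closure to a global tape during the forward pass, and `grad` replays the tape in reverse. Because the tape records only what actually executed, ordinary `if` statements and loops in the differentiated function just work.

```python
from typing import Callable, List

_tape: List[Callable[[], None]] = []  # backward steps, in forward order


class Var:
    """A scalar that records its operations on the tape."""

    def __init__(self, value: float):
        self.value = value
        self.grad = 0.0

    def __add__(self, other: "Var") -> "Var":
        out = Var(self.value + other.value)

        def backward():
            self.grad += out.grad
            other.grad += out.grad

        _tape.append(backward)
        return out

    def __mul__(self, other: "Var") -> "Var":
        out = Var(self.value * other.value)

        def backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        _tape.append(backward)
        return out


def grad(fn: Callable[[Var], Var]) -> Callable[[float], float]:
    """Return a function computing d fn / d x by replaying the tape backwards."""

    def dfn(x0: float) -> float:
        _tape.clear()
        x = Var(x0)
        y = fn(x)                    # forward pass records the tape
        y.grad = 1.0
        for step in reversed(_tape):
            step()
        return x.grad

    return dfn


def f(x: Var) -> Var:
    # Data-dependent control flow: the recorded tape differs per input.
    y = x
    for _ in range(2):
        y = y * x                    # y = x ** 3 after the loop
    if x.value > 0:
        y = y + x
    return y


if __name__ == "__main__":
    print(grad(f)(2.0))   # 3 * 2**2 + 1 = 13.0
    print(grad(f)(-1.0))  # 3 * (-1)**2 = 3.0
```

Gradients are accumulated with `+=`, so a variable used more than once (as in `x * x`) correctly receives contributions from every use.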

00:33:35

Another package, also released by Twitter, which is very important in today's world, is distlearn, a package for distributed learning, to train Torch models over multi-machine and multi-GPU pipelines. To do different kinds of distributed learning with distlearn, you actually don't have to write a lot of code or understand a lot of complicated infrastructure. distlearn packages the whole of distributed learning like this: your neural network model has a bunch of parameters that you're trying to optimise; your network consists of parameters and activations, and distlearn basically ignores the activations part. The parameters can be passed into certain functions that update them according to either the synchronous SGD, asynchronous SGD or elastic averaging SGD algorithm. Even extending distlearn to do your own kind of distributed research, if you invent a new algorithm for distributed optimisation, is really easy as well, because it's all in Lua, with a few lines of code. In fact, if you go look at the synchronous and elastic averaging implementations, they're each in a single file with very few lines of code.

00:35:18

distlearn takes the MPI paradigm: it basically has certain operations like all-reduce, scatter and gather implemented, and you build your distributed optimisation on top of those. This is unlike the TensorFlow or MXNet paradigm, where you look at your whole computation, your whole neural network and the optimisation, as a computation graph, and you try to do dependency analysis to find how to optimally collect or distribute certain tensors or variables when appropriate. It's a simplified model, but it works as well and has fairly good performance. I haven't done benchmarks of it against either of the other packages in the distributed setting, so I can't actually say how it plays out in terms of performance.

00:36:39

philosophy of course. Um there's a few

00:36:44

few things we really care about and

00:36:46

really like about origin we want to

00:36:48

keep them and not move away from any of

00:36:51

the aspects the first is interactive

00:36:53

computing. BV strongly care by having a

00:36:59

researcher open and interpreter keep it

00:37:02

open for days just like do very

00:37:05

nonlinear pads of computation where

00:37:08

they might execute whatever function

00:37:11

that they think of next and this is I

00:37:14

like I think we feel that this is one

00:37:18

of the most powerful based research is

00:37:21

carried out than usual and we do not

00:37:23

want to go to some kind of compiled

00:37:27

then Ron men where you have to like

00:37:32

debugging or doing changing what you do

00:37:37

is harder you have to go into file

00:37:40

change it and rerun you program and so

00:37:42

on. Um so as part of the interactive

00:37:46

computing paradigm, one thing we care

00:37:49

about is to have no compilation time at

00:37:53

runtime when you're using Torch

00:37:55

itself. Something that, for example,

00:37:59

packages like Theano or Chainer do is

00:38:03

that they invoke the compiler at

00:38:05

runtime to basically optimise their

00:38:07

code better: they put together

00:38:11

code that's specific to your computation

00:38:13

graph and then they compile it with, like,

00:38:16

GCC and NVCC at runtime.

00:38:18

And this is something that adds a

00:38:21

lot of overhead, a cognitive overhead,

00:38:24

for the researcher. I've seen

00:38:27

Theano programs compile for like two

00:38:29

or three minutes or even more. I've

00:38:33

heard of Theano programs compiling for

00:38:36

hours, for example, and this is something

00:38:40

that simply eats research time: you're

00:38:41

sitting there in front of the computer

00:38:43

to do research. And we strongly believe

00:38:46

that having no compilation time, not

00:38:49

even like two seconds of compilation time,

00:38:51

is really important in a research

00:38:56

setting. The next thing is imperative

00:38:59

programming. What I mean by that is

00:39:03

that you want to write your code

00:39:06

as naturally to the language as

00:39:09

possible. You want to write your code

00:39:12

the way you would always write regular code,

00:39:14

with for loops and while loops. Like, you

00:39:17

wouldn't want to write part of your

00:39:22

code, for example defining a neural

00:39:24

network, in some other data-like

00:39:28

language, like a JSON config,

00:39:31

or some protobuf. Or, in the

00:39:35

case of TensorFlow, for example, as a

00:39:39

paradigm, you have to

00:39:41

use special operators to do while

00:39:44

loops or conditionals, for example.

00:39:48

We strongly believe that imperative

00:39:50

programming is the path of least resistance

00:39:54

for new researchers, for researchers to just

00:40:00

get used to programming and do research,

00:40:03

and feel very little cognitive overhead,

00:40:06

and not have to think about how to do

00:40:08

certain things and how to map them onto the domain.

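To make the imperative-programming point concrete, here is a minimal sketch in Python (rather than Lua, and not Torch's API) of a recurrent loop written with the host language's own `while` and `if`: the number of steps and the early exit are decided at runtime from the data, with no special graph operators. The scalar weights `w_in` and `w_rec` and the `stop_threshold` parameter are hypothetical.

```python
import math
from typing import List, Optional


def rnn_forward(inputs: List[float], w_in: float = 0.5, w_rec: float = 0.9,
                stop_threshold: Optional[float] = None) -> List[float]:
    """Scalar recurrent net over a variable-length sequence using native control flow."""
    h = 0.0
    states = []
    t = 0
    while t < len(inputs):                       # plain while loop: length comes from the data
        h = math.tanh(w_in * inputs[t] + w_rec * h)
        states.append(h)
        if stop_threshold is not None and abs(h) > stop_threshold:
            break                                # plain conditional: early exit decided at runtime
        t += 1
    return states


full = rnn_forward([1.0, -2.0, 0.5, 3.0])
short = rnn_forward([1.0, -2.0, 0.5, 3.0], stop_threshold=0.3)
print(len(full), len(short))  # 4 1
```

Because this is ordinary code, it can be stepped through in an interpreter or debugger like any other function, which is the property the argument above relies on.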
00:40:10

And the third thing

00:40:15

is minimal abstraction. So we keep an

00:40:19

emphasis where, whenever

00:40:23

you want to find some Torch

00:40:25

code that actually does the actual

00:40:29

computation, for example, if you want to

00:40:32

find out where the softmax operation

00:40:35

is being computed, let's say

00:40:37

somewhere inside C, the number of

00:40:41

hops you have to take to go find that

00:40:44

code, we want to keep that as minimal

00:40:46

as possible. Probably, if

00:40:49

you jump through one or two functions,

00:40:51

you'll find the code that actually runs,

00:40:54

in C or CUDA, the

00:40:58

computation that you care about. And

00:41:00

this is something we think is very

00:41:02

important when people want to write new

00:41:07

modules or contribute back, because if

00:41:10

you have too many abstractions, you

00:41:11

can't think linearly; you have to

00:41:14

always start thinking through those

00:41:18

abstractions, and it's an overhead

00:41:21

where, after three or four abstractions,

00:41:24

you're pretty much lost, and you

00:41:26

don't know, if you change a

00:41:29

particular part of the code, which

00:41:32

pieces start moving. And that's

00:41:34

something that's really hard when

00:41:37

you're doing development, especially

00:41:39

when you're not an engineer trained

00:41:41

for several years. And so we

00:41:46

think that having as minimal an

00:41:48

abstraction as possible over the code that

00:41:50

actually runs the computation that you

00:41:54

just defined is very important, and we

00:41:57

design all our packages with that

00:41:59

philosophy. And lastly, we have this

00:42:05

notion of maximal flexibility. And

00:42:09

Lua kind of plays into what we need

00:42:13

there. In Torch you don't have any

00:42:16

constraints on what you can do or

00:42:18

cannot do. Our class system doesn't have

00:42:22

a tightly defined interface where you

00:42:24

have to implement certain functions, or

00:42:26

where you cannot implement

00:42:30

certain interfaces. We wrote our own

00:42:34

class system, our own type system, and

00:42:36

that's something Lua really gives

00:42:40

us the power to do: Lua doesn't

00:42:42

actually have any of these. Lua lets

00:42:45

you define any of these fundamental

00:42:48

systems that you take for granted in a

00:42:50

more strongly typed language. And we

00:42:54

designed our systems to be as flexible as

00:42:58

possible in this aspect, where any of

00:43:00

the users can do arbitrary things

00:43:05

that we never expected them to do

00:43:09

when we were designing the package. But

00:43:11

we think that adds a lot of power for

00:43:14

users, especially the hacker kind.

00:43:16

And it does get us into a lot of

00:43:19

trouble there: if we want to write a

00:43:23

package now, we have to think about all

00:43:25

the possibilities and all the ways in

00:43:27

which users will use it, and make sure

00:43:29

that the package we're writing doesn't

00:43:31

break in all these cases, and it's kind

00:43:33

of harder for us as core developers. But we

00:43:38

think it's really important from a

00:43:41

hacker-culture perspective to keep this

00:43:43

maximal flexibility. Lastly, I'll talk

00:43:49

a little bit about the key drivers of

00:43:51

growth for Torch, why people use

00:43:53

Torch. We've seen that having tutorials,

00:43:59

and more importantly support, more than

00:44:02

tutorials, having a lot of really fast

00:44:05

support, is really important when you're

00:44:08

building your framework, because

00:44:12

most users are not covered by a

00:44:14

tutorial. Most users want to do

00:44:15

something other than what your

00:44:17

tutorial is for, but they would ask

00:44:19

questions on the forums or on Stack

00:44:22

Overflow, for example, and you would want

00:44:25

to answer back as soon as possible,

00:44:28

probably within four hours, or

00:44:32

within twenty-four hours, because

00:44:34

otherwise users just

00:44:37

never come back to using your package,

00:44:40

especially the new ones. And

00:44:43

another important thing is pre-trained

00:44:45

models and high-quality open-source

00:44:47

projects, as I showed earlier in the

00:44:50

slides. And GPU support

00:44:54

is something that people come

00:44:57

to Torch for. These days, actually, the

00:45:00

situation around GPUs

00:45:03

has improved a lot, especially with TensorFlow

00:45:06

coming in; a lot of

00:45:09

other frameworks have basically said

00:45:11

they can't be substandard anymore.

00:45:14

But for quite some time, a lot of

00:45:19

users that came to Torch came for the

00:45:21

fact that we had really strong GPU

00:45:23

support and were very proactive in

00:45:26

our development. Minimal abstractions,

00:45:29

as I explained in the last slide, seem

00:45:32

to be actually one of the key drivers

00:45:34

of growth. Zero compile time is

00:45:38

something that users say they

00:45:40

find really awesome in Torch. And

00:45:44

one of the big, big things is community.

00:45:47

Community is one of the key

00:45:50

reasons people who use Torch stick to

00:45:55

Torch and are pretty happy overall,

00:45:58

because when you just have a lot of

00:46:00

other people doing the same thing, you

00:46:03

can just have people to chat with, to

00:46:07

hack with, or ask for help,

00:46:11

and so on. And lastly, I quickly

00:46:16

added a couple of slides, because I

00:46:20

think someone asked me yesterday about

00:46:23

whether I would do a comparison of

00:46:25

Torch with other frameworks. I'm not

00:46:28

gonna do a comparison of Torch with other

00:46:29

frameworks, but my colleague Yangqing

00:46:31

Jia, who wrote Caffe, made this slide

00:46:35

where he places all the frameworks on

00:46:39

this linear axis. On one side you have

00:46:44

these properties where you want

00:46:47

stability, like

00:46:51

basically production-ready, never

00:46:54

breaking, easy to understand for

00:46:57

production engineers, and so on. And the

00:46:59

other side is what researchers want,

00:47:02

which is flexibility, fast

00:47:04

iteration cycles and so on. And

00:47:09

as he pointed out, Torch is

00:47:13

somewhere closer to the research side.

00:47:17

What we don't compromise on is

00:47:21

speed. Because of the choices

00:47:25

we made early on, like sticking to

00:47:27

Lua and being a very, very close

00:47:29

interface to C, we actually are one of

00:47:33

the fastest frameworks, if not... I

00:47:37

do benchmarks on the side just out

00:47:42

of interest, and Torch maintains

00:47:44

its position as being one of

00:47:47

the fastest frameworks, so without

00:47:50

compromising the flexibility, debuggability

00:47:53

and the whole

00:47:57

research aspect of it. Coming to the

00:48:01

future of Torch: so what are the

00:48:05

trends? We've seen that a lot of

00:48:07

goodness comes from fusing computation.

00:48:10

For example, if you have a convolutional

00:48:12

layer followed by a ReLU and then

00:48:15

a batch norm, or a convolution plus

00:48:18

ReLU, if you actually fuse these

00:48:21

operations into a single CUDA

00:48:25

kernel that does all of them together,

00:48:26

for example, it actually ends up being

00:48:29

much faster than if you do them one

00:48:31

after the other, even though it's easier

00:48:33

to understand, and it's easier to

00:48:39

implement them separately. And we are

00:48:44

looking into doing this kind of fusion.

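A minimal Python sketch of the fusion idea, assuming a single-channel 1-D convolution followed by ReLU and an inference-mode batch norm with fixed running statistics (all values hypothetical). The unfused version makes three passes and materialises two intermediate lists; the fused version computes each output element in one pass. Pure Python only shows the structure: the real speedup on a GPU comes from fewer kernel launches and fewer round trips through memory, which this sketch does not measure.

```python
import math


def conv1d(x, k):
    """'Valid' 1-D cross-correlation, which deep learning libraries call convolution."""
    return [sum(x[i + j] * k[j] for j in range(len(k)))
            for i in range(len(x) - len(k) + 1)]


def relu(v):
    return [max(a, 0.0) for a in v]


def batchnorm_infer(v, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm with fixed running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(a - mean) * scale + beta for a in v]


def unfused(x, k, mean, var, gamma, beta):
    # Three separate passes; conv and ReLU outputs are materialised as intermediate lists.
    return batchnorm_infer(relu(conv1d(x, k)), mean, var, gamma, beta)


def fused(x, k, mean, var, gamma, beta, eps=1e-5):
    # One pass: each output element goes conv -> ReLU -> batch norm and is written once.
    scale = gamma / math.sqrt(var + eps)
    out = []
    for i in range(len(x) - len(k) + 1):
        acc = 0.0
        for j in range(len(k)):
            acc += x[i + j] * k[j]
        acc = max(acc, 0.0)
        out.append((acc - mean) * scale + beta)
    return out


x = [1.0, -2.0, 3.0, 0.5, -1.0]
k = [0.5, -1.0, 0.25]
stats = {"mean": 0.2, "var": 1.5, "gamma": 1.1, "beta": -0.1}
print(unfused(x, k, **stats))
print(fused(x, k, **stats))
```

Both functions return the same values; the fused one simply never stores the convolution or ReLU outputs, which is what a single fused CUDA kernel avoids writing to and reading back from GPU memory.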
00:48:47

One of the packages that came out of

00:48:49

ParisTech is optnet, where

00:48:54

it takes your existing neural network

00:48:57

in Torch and then it optimises the

00:48:59

network for memory consumption, sharing

00:49:02

certain buffers that

00:49:06

can be shared, and the overall

00:49:09

memory consumption of the neural network is

00:49:11

drastically reduced.

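optnet's own algorithm is not reproduced here. As a generic illustration of the buffer-sharing idea, the Python sketch below assigns intermediate tensors to shared buffers by liveness: a buffer is reused once the last reader of its previous occupant has run. The `Tensor` record, the tensor names, sizes, and step numbers all describe a hypothetical five-layer chain at inference time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tensor:
    name: str
    size: int       # number of elements (hypothetical)
    produced: int   # step whose layer writes this tensor
    last_read: int  # last step whose layer reads it


def share_buffers(tensors: List[Tensor]) -> Tuple[Dict[str, int], List[int]]:
    """Greedy liveness-based sharing. Returns (tensor name -> buffer id, buffer sizes)."""
    assignment: Dict[str, int] = {}
    sizes: List[int] = []
    busy_until: List[int] = []  # last step at which each buffer's current occupant is read
    for t in sorted(tensors, key=lambda t: t.produced):
        # Strict '<': a layer cannot write its output into the buffer it is still reading.
        buf = next((i for i, until in enumerate(busy_until) if until < t.produced), None)
        if buf is None:
            buf = len(sizes)
            sizes.append(0)
            busy_until.append(-1)
        assignment[t.name] = buf
        sizes[buf] = max(sizes[buf], t.size)  # a shared buffer must fit its largest occupant
        busy_until[buf] = t.last_read
    return assignment, sizes


chain = [
    Tensor("input", 100, 0, 1),
    Tensor("act1", 400, 1, 2),
    Tensor("act2", 400, 2, 3),
    Tensor("act3", 200, 3, 4),
    Tensor("output", 10, 4, 5),
]
assignment, sizes = share_buffers(chain)
naive = sum(t.size for t in chain)
print(assignment, sizes, naive, sum(sizes))  # two buffers: 800 elements instead of 1110
```

During training, activations needed by the backward pass stay live much longer, so the savings are smaller and the liveness bookkeeping is more involved; that case is left out of this sketch.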
00:49:15

And you can imagine, in the future, these kinds of

00:49:18

optimisations being done at runtime. And we're

00:49:21

continuing, and will continue, to

00:49:24

break down the barriers of entry for

00:49:26

new users, especially for them to start

00:49:29

developing their own modules rather

00:49:31

than just using what we provide,

00:49:33

because that's the only way we scale,

00:49:37

and we strongly believe in that. Like,

00:49:39

you cannot have a team of five really strong

00:49:43

engineers that will do

00:49:46

everything for deep learning;

00:49:49

it just doesn't scale. So we want to

00:49:51

empower people to start writing their own

00:49:56

modules and contribute back, and that's

00:49:59

always been our emphasis, and we are

00:50:01

going to continue making design choices

00:50:04

that break down barriers of entry. And

00:50:08

one of the things we want to keep

00:50:09

doing and never forget is keeping

00:50:13

focus on the long tail. And by the long

00:50:15

tail, I mean all the institutions that

00:50:17

cannot afford three hundred GPUs per

00:50:20

researcher. We understand that

00:50:23

most of our users, and most deep learning

00:50:26

researchers in the world, have a system

00:50:29

under their desk that they're probably

00:50:31

sharing with another researcher, with like

00:50:34

one to four GPUs. And this is

00:50:36

something we never want to forget while

00:50:39

we're writing new stuff or

00:50:42

making performance improvements.

00:50:46

Lastly, an important point to make, to

00:50:50

be honest about the world, is that the

00:50:53

Python ecosystem is much larger than,

00:50:56

for example, the Lua ecosystem, which

00:51:00

pretty much is just Torch on the

00:51:02

scientific computing side. To bridge

00:51:05

this gap we have extensions, which I will talk

00:51:08

about in the third lecture. We have

00:51:10

a bridge to Python: we can call

00:51:12

arbitrary Python functions and

00:51:14

packages, including packages that return

00:51:18

NumPy arrays, and they will seamlessly be

00:51:21

converted into Torch tensors, and vice

00:51:24

versa. And we are also looking into some

00:51:28

deeper Python integration, maybe some

00:51:30

Python bindings, but that's just

00:51:33

ongoing thought. That's the end of the

00:51:39

first talk, so feel free to ask

00:51:42

questions. Okay, yeah. So I

00:52:16

think it was a great talk, really looking

00:52:19

forward to seeing what comes next. But one

00:52:23

question that bugs me is the

00:52:25

composability of models: we always

00:52:28

seem to start from scratch, and there are

00:52:31

so many great models out there that we

00:52:33

don't know about. Is there any plan of

00:52:36

creating a marketplace to actually know

00:52:39

what others are doing, so as to not

00:52:42

have to know myself, as a human, what

00:52:45

could be a good model, but actually

00:52:47

have suggestions? So, for Torch

00:52:53

itself, we could actually go and create it;

00:52:55

it's not even that much work. It's one

00:52:58

single GitHub repo with a README,

00:52:59

and people can send in pull requests,

00:53:01

that's right. But it's a

00:53:04

larger team effort if we want a universal

00:53:07

marketplace where you want to have

00:53:11

model definitions from Caffe, Torch,

00:53:13

TensorFlow and everything. Right

00:53:17

now, the way we propagate

00:53:20

information on what's available is

00:53:22

mostly via Twitter, where every time a

00:53:26

new paper is implemented in Torch, every

00:53:28

time a new pre-trained model is released

00:53:30

in Torch, we just tweet it out. And most

00:53:33

of our users follow us there. That's for

00:53:40

now, but I mean, I don't really know

00:53:43

how to get all the frameworks together,

00:53:47

because they all have their own

00:53:49

strong opinions on the marketplace, on

00:53:52

the format, the common format to follow,

00:53:54

and so on. At least for Torch, yeah,

00:53:57

we're doing what we can to keep all

00:54:00

information central. Okay, thank you.

00:54:15

Could you discuss a little bit

00:54:18

the API stability of Torch, and why you

00:54:21

don't have release cycles? Okay, so that's

00:54:24

something that gets asked a lot. We

00:54:28

don't have release cycles because we

00:54:30

don't have enough maintainers. If

00:54:34

any of you are willing to become a

00:54:37

maintainer of Torch, just as a release

00:54:40

engineer, cutting release

00:54:42

branches, feel free to reach out to me.

00:54:46

We want to start doing more stable and

00:54:50

structured release cycles. There are no

00:54:55

technical limitations there, just...

00:55:03

Any more questions? Maybe we can just go

00:55:06

for the coffee break now, because you

Conference Program

59:34

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
July 4, 2016 · 2:01 p.m.

2370 views

55:38

Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
July 4, 2016 · 3:20 p.m.

427 views

01:01:02

Day 1 - Questions and Answers
Panel
July 4, 2016 · 4:16 p.m.

331 views

55:14

Torch 1
Soumith Chintala, Facebook
July 5, 2016 · 10:02 a.m.

815 views

55:57

Torch 2
Soumith Chintala, Facebook
July 5, 2016 · 11:21 a.m.

342 views

01:08:04

Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
July 5, 2016 · 1:59 p.m.

2156 views

49:29

Torch 3
Soumith Chintala, Facebook
July 5, 2016 · 3:28 p.m.

275 views

52:43

Day 2 - Questions and Answers
Panel
July 5, 2016 · 4:21 p.m.

151 views

45:40

TensorFlow 1
Mihaela Rosca, Google
July 6, 2016 · 10 a.m.

2659 views

52:33

TensorFlow 2
Mihaela Rosca, Google
July 6, 2016 · 11:19 a.m.

1705 views

01:05:51

AMD's Open Compute and Open Source cross platform solutions for Machine Learning
Mauricio Breternitz, AMD
July 6, 2016 · 1:59 p.m.

1406 views

01:04:41

TensorFlow 3 and Day 3 Questions and Answers session
Mihaela Rosca, Google
July 6, 2016 · 3:21 p.m.

2251 views

Recommended talks

52:50

Pose estimation and gesture recognition using structured deep learning
Christian Wolf, LIRIS team, INSA Lyon, France
Oct. 17, 2014 · 11:06 a.m.

389 views

20:42

Learning to Segment 3D Linear Structures Using Only 2D Annotations
Dr. Mateusz Kozinski, EPFL
April 19, 2018 · 11:33 a.m.

348 views
Klewel SA

What is Klewel?

Follow Us

Contact Us