Transcriptions

Note: this content has been automatically generated.
00:00:00
Okay, perfect. So, as was
00:00:06
pointed out, Torch was actually
00:00:08
written at Idiap. I was having
00:00:13
dinner with some folks, old-timers
00:00:16
at Idiap, and we were walking
00:00:19
back from the town square and I saw
00:00:23
the old building of Idiap, this really
00:00:26
tiny brown building, and it was said
00:00:30
that Torch was written in that tiny brown
00:00:32
building when Ronan was
00:00:36
here, in this beautiful
00:00:39
town. So today I'm gonna give three
00:00:44
lectures on Torch. Just some
00:00:47
logistics, alright. So just some
00:00:54
logistics: the first talk I'm gonna be
00:00:58
giving is an overview of Torch,
00:01:01
basically give you a
00:01:04
high-level view of what Torch is, what the
00:01:08
community is like, what it is like to
00:01:10
work in Torch, and some high-level
00:01:15
details of the philosophy of Torch and
00:01:19
so on. The second talk is gonna be a
00:01:22
deep dive into Torch. We're gonna go
00:01:25
into the inner workings of
00:01:29
Torch, basically looking at how tensors
00:01:35
and storages work in Torch, and
00:01:37
how to use the neural networks package
00:01:40
and the optimisation package and so on.
00:01:43
And that is gonna be useful to start
00:01:47
getting into Torch; it's gonna be like a
00:01:50
few rack and then the third talk is
00:01:54
basically going to be extensions of
00:01:59
Torch: interesting packages, new
00:02:01
paradigms of computation, and some
00:02:07
showcases. After Yoshua's
00:02:10
talk on generative modelling I will
00:02:15
directly go into an implementation of
00:02:18
one of the generative models he's gonna
00:02:20
talk about, in Torch. And also give some
00:02:26
extensions to it and so on.
00:02:30
During the breaks as well as during
00:02:32
lunchtime you could chat with some of
00:02:35
us who are here. So there is a
00:02:40
very "'cause" I grew go us are gonna
00:02:42
get up he's from ParisTech and he came
00:02:48
here just to be able to chat with any of
00:02:51
you about deeper questions into Torch; if
00:02:55
you have issues with Torch, how to get
00:02:57
them fixed and so on. And also we
00:03:00
have two excellent local experts,
00:03:02
backdrop a narrow. And image repel us
00:03:06
both of them are PhD students of Ronan
00:03:09
Collobert, and they can also be a
00:03:13
good source of people to chat with. And
00:03:16
I'm available as well, probably not
00:03:19
during the lunch break but definitely
00:03:23
in the other breaks. Okay, let's get
00:03:27
started. So this particular talk
00:03:33
will have the structure: what is Torch;
00:03:37
the community of Torch; common
00:03:40
uses of Torch, how people use Torch in
00:03:43
the community; the core philosophy
00:03:45
behind Torch, something that we
00:03:48
wouldn't change regardless of how we
00:03:52
would move forward in the future; the
00:03:54
key drivers of growth, the reason why Torch
00:03:57
is popular, or why we think Torch is
00:04:01
popular; that would be helpful in
00:04:03
general to get a perspective on what
00:04:06
are the main value additions of Torch.
00:04:11
And also a little bit about the future,
00:04:14
what we're planning next, a very, very
00:04:19
high-level view of our future plans. So
00:04:24
what is Torch? Torch is a scientific
00:04:26
computing framework. You can think of it
00:04:28
as similar to MATLAB, or Python with
00:04:33
SciPy and NumPy. At the
00:04:38
core of Torch is an n-dimensional
00:04:42
array; we call them tensors. And
00:04:50
it's an interactive, REPL-based
00:04:54
environment. So in Torch you can open
00:04:58
an interpreter, you can execute commands,
00:05:01
you can see what happens. It's
00:05:02
exactly like when you work with MATLAB:
00:05:05
you can plot something, you can train a
00:05:07
network and so on. And it's very,
00:05:14
very simple to use. It has one-based indexing
00:05:19
to emulate MATLAB. It was written,
00:05:23
from my understanding, and this
00:05:25
predates me, as
00:05:30
something that MATLAB users can go
00:05:34
to when they wanted to do more serious
00:05:37
computation from a systems
00:05:40
perspective. It has plotting, it has all
00:05:45
the bells and whistles you'd expect
00:05:46
from a scientific computing package.
00:05:48
Definitely not as rich as MATLAB in
00:05:53
certain aspects; MATLAB has certain
00:05:57
toolboxes that are not available in
00:05:59
any other community. But the
00:06:02
same can be said for Torch: we are
00:06:05
strong in, for example, the neural
00:06:07
networks packages and optimisation
00:06:11
algorithms based on gradients,
00:06:14
and we'd like to focus on those, at
00:06:17
least for the near future. So one of
00:06:21
the key values of Torch is that
00:06:27
we are based on this language called
00:06:31
Lua. And the engine, LuaJIT,
00:06:34
that runs it is a very, very
00:06:37
performant JIT engine. JIT is a
00:06:42
just-in-time compilation engine, where it
00:06:46
takes your high-level
00:06:48
code, your Lua code, and then it
00:06:49
compiles it dynamically in very smart
00:06:54
ways. And what that translates to is
00:06:57
that you can write high-level code
00:06:59
like you write in MATLAB or
00:07:02
Lua, where you don't have type
00:07:04
safety and other features that you
00:07:08
would get from a compiled language, but
00:07:10
it would be fairly fast.
00:07:14
If you write a for
00:07:16
loop in Torch you
00:07:18
wouldn't be like "oh my god, it's
00:07:20
taking days to run". The
00:07:22
difference
00:07:27
between writing stuff in Lua and
00:07:30
writing stuff in C is bearable.
00:07:34
I mean, there's obviously performance
00:07:36
differences, clear performance
00:07:38
differences, but while you're
00:07:40
prototyping it's very, very
00:07:42
efficient compared to other interpreted
00:07:45
languages like Python or MATLAB. The
00:07:50
second key feature we have in Torch
00:07:53
that we use all the time is
00:07:57
really, really easy integration into C.
00:08:00
In Lua there's
00:08:05
something called FFI, which is now
00:08:08
common in other languages like, for
00:08:10
example, Python. But Torch was always
00:08:14
meant to be a very, very loose interface
00:08:17
on top of C. So you wanted to write your
00:08:20
heavy processing in C
00:08:25
and then you would just want to have a
00:08:26
little interpreted language to do quick
00:08:29
prototyping. And to wrap C
00:08:33
code in Torch you don't actually even
00:08:35
have to write complicated bindings; you
00:08:38
basically just call the C code as is
00:08:41
within your Lua program. As an
00:08:43
example here, we wrap NVIDIA's
00:08:48
cuDNN library in Torch, and
00:08:51
we never actually had to go into
00:08:55
writing our own C code of any sort.
00:08:58
All we have to do is, in Lua, we can
00:09:01
directly call the C function, in this
00:09:04
case cudnnConvolutionForward, just
00:09:07
with the arguments that it expects,
00:09:10
and it just works out of the box.
00:09:14
This saves a lot of time when you are
00:09:16
wrapping either existing libraries or
00:09:19
you're writing your own C code and you
00:09:22
want to have an interface between them.
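To illustrate the idea of calling C directly from a scripting language without hand-written bindings, here is a minimal sketch in Python using the standard `ctypes` module. This is Python's counterpart to the FFI mentioned in the talk, not Torch's LuaJIT FFI, and it assumes a POSIX-like system where the C standard library can be loaded. The helper names `c_abs` and `c_strlen` are hypothetical and chosen only for illustration.

```python
import ctypes
import ctypes.util

# Load the C standard library. Assumption: POSIX-like system. If the lookup
# returns None, CDLL(None) exposes the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signatures so ctypes converts arguments and results correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_abs(x):
    """Call C's abs() directly; no wrapper code was written in C."""
    return libc.abs(x)


def c_strlen(text):
    """Call C's strlen() on the UTF-8 bytes of a Python string."""
    return libc.strlen(text.encode("utf-8"))
```

The only per-function work is declaring the argument and return types; the C function itself is called as is, which is the property the talk highlights for wrapping existing libraries.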
00:09:26
The second key value that Torch
00:09:28
provides, and this is one of the big reasons
00:09:30
that people come to us and use Torch, is
00:09:34
the strong GPU
00:09:36
support. Torch has a large tensor
00:09:42
library of over a hundred and fifty
00:09:44
functions, and all of them
00:09:47
work on both
00:09:50
the CPU and the GPU, and especially on
00:09:54
the GPU these functions are
00:09:56
extremely performant. Quite a few CUDA
00:10:00
engineers spent some time optimising
00:10:04
most parts of the code. And apart
00:10:08
from the core tensor library, we have a
00:10:10
neural network library that specialises
00:10:12
just for neural networks and their
00:10:14
performance, and that has also been
00:10:17
fairly optimised for GPUs. And a
00:10:23
complete GPU tensor library is actually,
00:10:27
as of today, unique to Torch, especially
00:10:32
one that's performant. There
00:10:36
are alternatives like cudamat and
00:10:41
the Eigen library, but they're very
00:10:44
limited in their GPU support;
00:10:46
they just weren't written with
00:10:49
GPU-first tensors in mind. And I think
00:10:55
that's something that we have been
00:10:59
focusing on since about two
00:11:02
thousand eleven or so. And initially we
00:11:06
started off with whatever we could on the
00:11:09
GPU support, but now it's fairly complete,
00:11:11
and our users find it very
00:11:14
natural to transition between CPU and GPU
00:11:16
without
00:11:19
knowing the difference. The next thing I
00:11:23
wanna talk about is who uses Torch.
00:11:26
There's actually a large number of
00:11:28
users of Torch; these are only just a
00:11:31
subset of users. Idiap is right
00:11:34
there in the middle of it, quite a few
00:11:36
people using Torch. And also
00:11:40
really large companies like Facebook,
00:11:42
where I work; we use Torch for all of our
00:11:45
research, all the FAIR research,
00:11:48
and there's about fifty of us. And Twitter
00:11:53
uses Torch not only for research but
00:11:56
also in production environments, for
00:11:59
Cortex. Okay, Cortex covers their image
00:12:05
recognition, video recognition and
00:12:08
language use cases as well,
00:12:11
in production. And there's quite a
00:12:14
few schools that use Torch; some
00:12:16
of the most active ones I've put
00:12:19
there. And other large companies: Yandex,
00:12:22
IBM. IBM, I visited them a
00:12:27
couple of weeks ago and they said pretty much
00:12:31
all of their speech and NLP pipelines
00:12:34
are now in Torch. And there's some
00:12:38
interesting companies there. TeraDeep,
00:12:41
for example, on the right side at the
00:12:43
bottom; they do Torch for FPGAs and
00:12:50
specialised chips. So they
00:12:52
basically have these FPGAs and chips,
00:12:55
and then you can just use Torch, and
00:12:57
the back end is an FPGA,
00:13:00
or GPU or CPU for example. Moodstocks,
00:13:04
the company at the bottom there,
00:13:06
they use Torch to train the networks
00:13:09
and they run them on mobile. They're
00:13:12
basically a mobile company for image
00:13:15
recognition. Other examples in the
00:13:20
community are packages. I'm gonna just
00:13:24
go over a few popular
00:13:26
packages. One of the strong points of
00:13:29
some other communities, for example the
00:13:31
Caffe community, has been the Model Zoo,
00:13:34
where people, after they do research,
00:13:38
can share their trained models
00:13:41
for other researchers to use. And
00:13:45
what we wanted was to leverage the
00:13:49
value that the Caffe community provides,
00:13:51
so we have a package called loadcaffe
00:13:53
that basically loads
00:13:56
Caffe models into Torch pretty
00:13:59
seamlessly, and you can then use
00:14:02
these models to do all of your
00:14:06
research in Torch. And if a
00:14:10
new paper comes out and there's a
00:14:12
pretrained Caffe model, you can just
00:14:14
pull that off, extract features and
00:14:16
plug it into existing Torch code.
00:14:18
For example, at Facebook: there's
00:14:24
this class of convnets that
00:14:26
appeared recently called residual
00:14:28
networks. If you are in the line of
00:14:31
computer vision or deep learning you
00:14:33
would have heard of them. At
00:14:36
Facebook, as soon as that paper appeared, we
00:14:38
got really interested in that,
00:14:40
and we released code for training
00:14:44
residual networks from scratch. These
00:14:46
are very, very deep networks, up to a
00:14:49
thousand layers deep, and training these
00:14:53
from a systems perspective is not that
00:14:57
simple, because with a thousand layers they
00:15:00
want to take as many GPUs as you
00:15:03
can give them. So we developed
00:15:08
this mostly for ourselves, but we
00:15:09
thought it would be interesting to just
00:15:11
open source it for the community, where
00:15:14
you have a complete example that's
00:15:17
fairly simple to follow, where you train
00:15:20
multi-GPU convnets,
00:15:25
residual networks in this case, and that
00:15:29
gives you an end result that's state of
00:15:31
the art. And we also released pretrained
00:15:33
models for that. And Google released
00:15:37
their Inception models, which are
00:15:39
also other pretrained ImageNet models
00:15:42
for vision, and we do have those models
00:15:47
as well, ported to Torch. And one thing
00:15:50
that happens is that
00:15:52
if there is any pretrained model that
00:15:55
appears in another framework, someone in
00:15:58
the community just ports it into Torch
00:16:01
within like a week or two. And so
00:16:05
our users never had
00:16:08
the feeling of being left out
00:16:11
from the state of the art. And I think
00:16:16
this is one of the important aspects of
00:16:19
Torch, the fact that we have a large
00:16:20
enough community that people don't feel
00:16:23
like they're just working
00:16:26
by themselves, but they feel like they
00:16:28
are leveraging a lot of value from the
00:16:30
community itself. Over the past
00:16:37
year we've been looking at the logs for
00:16:41
how popular Torch is, and it
00:16:45
has about a thousand downloads a
00:16:47
day. We basically track the
00:16:51
number of installs over GitHub. And
00:16:54
it's fully community-driven, and that's one
00:16:57
of the interesting parts of Torch.
00:17:00
Torch itself is not backed by a single
00:17:03
company, for example. There is a
00:17:07
nonprofit that runs Torch, and all
00:17:12
the companies that are involved in
00:17:14
using Torch also contribute back
00:17:16
to open-source Torch in
00:17:21
various parts, like with engineering,
00:17:23
with performance optimisations, with new
00:17:25
packages and so on. Some of the
00:17:31
interesting packages, examples of Torch. So
00:17:34
examples form a huge driver in user
00:17:39
adoption for Torch. If you have
00:17:42
high-quality examples on how to use Torch,
00:17:44
people find that very, very useful to
00:17:48
get into Torch, rather than, for example,
00:17:53
reading tutorials which might not cover
00:17:56
the use case you want to do, for
00:17:58
example. So some of the interesting
00:18:01
examples that appeared: this is NeuralTalk2,
00:18:03
which is the captioning
00:18:08
network where you send an image and it
00:18:12
spits out an image caption.
00:18:16
It was written by Stanford guys,
00:18:19
Andrej Karpathy and Justin Johnson.
00:18:24
This is one of the nice examples
00:18:27
where you have an example of a convnet
00:18:30
plugging into an LSTM RNN, and
00:18:33
the whole training loop is there.
00:18:36
And training these things is
00:18:39
obviously not just like
00:18:41
putting them together and
00:18:44
using some learning rate; there are
00:18:47
subtleties in there, and
00:18:50
examples like these are interesting.
00:18:54
Another example is the neural-style
00:18:56
project, another really popular project
00:18:58
on top of which many people have built
00:19:01
art installations and so on.
00:19:04
This is also from Stanford, by Justin
00:19:08
Johnson. You give an image from the
00:19:13
real world and some painting,
00:19:16
and it would do this optimisation to
00:19:22
match the statistics of both of them at
00:19:25
different layers of a pretrained
00:19:27
network. And you would actually get a
00:19:32
picture that looks like the
00:19:36
style of the painting but still has
00:19:38
the content of the image you gave.
00:19:41
This has been one of the
00:19:42
most popular projects, and
00:19:47
there are
00:19:51
other variants of these, just
00:19:54
feed-forward variants, where you can
00:19:56
actually do neural style in real
00:19:58
time. And someone in the
00:20:01
community converted that, plugged it into a
00:20:05
video stream, and you can actually have
00:20:07
video streams on the fly. Another
00:20:12
popular computer vision application
00:20:15
that has been appearing recently is
00:20:18
visual question answering. Yoshua
00:20:20
Bengio yesterday showed a demo of Facebook's
00:20:24
system that does visual question
00:20:26
answering. There is an open-source
00:20:28
implementation from Virginia Tech doing
00:20:32
visual question answering, and this is the
00:20:34
university where one of the popular
00:20:36
datasets for VQA comes from. And
00:20:42
some more interesting examples: this
00:20:46
one here is called neural doodle.
00:20:49
You just doodle the painting
00:20:53
that you want, like you just give a rough
00:20:57
sketch of what you want to paint,
00:20:59
and then you give some other painting,
00:21:01
and it will actually turn
00:21:04
your doodle into a really arty painting.
00:21:08
Coming to the more practical aspects,
00:21:12
some things that Torch does really
00:21:15
well. So Lua itself is a language that's
00:21:19
very, very light in overhead, and
00:21:22
one of the reasons Lua was
00:21:26
popular before Torch, or regardless of
00:21:28
Torch, is that it is used in game
00:21:31
engines a lot, because it's a very small
00:21:33
language. The Lua language itself is
00:21:37
about twelve thousand lines of C code,
00:21:40
and game engines use it a lot to
00:21:44
embed Lua into really complex
00:21:51
and high-performance C++.
00:21:54
At Facebook one of the things we've
00:21:55
been looking at is how to learn
00:21:57
physics from the world, and we wanted to
00:22:02
start off with virtual worlds. So we
00:22:05
plugged Torch into one of the most
00:22:08
popular game engines available, which is
00:22:10
Unreal Engine. And we released this
00:22:14
integration into
00:22:18
the open source. And you can basically
00:22:22
plug Torch into Unreal Engine and,
00:22:24
with a very
00:22:28
high-performance, low-latency pipeline,
00:22:36
you basically get to interact with
00:22:39
the Unreal Engine world. Then you can,
00:22:40
for example, do various reinforcement
00:22:43
learning and computer vision research,
00:22:47
or hybrids of those. This example here
00:22:51
is from a paper that was published
00:22:54
at ICML this year by my colleague Adam
00:22:57
Lerer and some others. They
00:23:02
learn the physics of
00:23:07
blocks, where they want to predict
00:23:09
whether blocks are falling, or how
00:23:13
blocks fall: if they do fall,
00:23:15
where do they fall? And if you're given a
00:23:18
picture, for example the picture on top,
00:23:21
can you predict whether
00:23:24
the blocks in that picture would fall
00:23:26
over or stay stable, and questions
00:23:30
like these. And one of the
00:23:32
interesting things here is that the network
00:23:35
was trained fully in the Unreal Engine
00:23:38
environment, but then at
00:23:43
validation time, at test time, when they
00:23:47
wanted to see if that network actually
00:23:49
generalises to real-world block falling,
00:23:53
they constructed a small environment
00:23:58
of wooden blocks set against a white
00:24:00
background. And the network
00:24:03
actually does really well in
00:24:06
this real-world environment, even though
00:24:08
it was trained completely in this
00:24:10
Unreal Engine-based virtual world. So you could
00:24:16
see how that might extend to
00:24:18
other applications as well. Another big
00:24:23
thing these days is reinforcement
00:24:24
learning, especially for Atari games.
00:24:27
There are a couple of projects that set
00:24:31
up all the reinforcement learning
00:24:32
environments for you,
00:24:35
including implementing all of the
00:24:38
popular algorithms in reinforcement
00:24:40
learning for you, so you can basically
00:24:42
just go and use those as your
00:24:46
baselines and do further research in
00:24:49
improving your reinforcement learning
00:24:51
algorithms. This is one
00:24:55
of these Lua-based environments
00:24:57
that has all of the popular recent
00:25:03
reinforcement learning algorithms
00:25:04
implemented, like deep Q-networks, double DQN
00:25:08
and so on. But apart from this
00:25:11
there is this company called OpenAI
00:25:15
that released an environment
00:25:18
to do reinforcement learning research. It's
00:25:20
called the RL Gym, and that's something
00:25:22
that has been really well written. It
00:25:24
has a lot of environments, and there are
00:25:27
examples of using Gym with Torch that
00:25:31
appeared recently as well, and also
00:25:34
completely open source. Coming
00:25:38
to the NLP side of things, there
00:25:40
are several good projects for NLP
00:25:43
in Torch that are open source: training
00:25:47
language models, training
00:25:51
sequence-to-sequence models, maybe for translation
00:25:53
for example. There's also this one
00:25:56
interesting project where you have a
00:25:59
conversational model, basically a chatbot,
00:26:01
based on the Google paper that
00:26:04
appeared not too long ago, in
00:26:09
two thousand fifteen. After that
00:26:11
paper appeared someone from the
00:26:13
community quickly implemented that
00:26:15
model in Torch, and this is another
00:26:17
good project if you want to do NLP
00:26:21
research in Torch and just take a
00:26:24
look at the internals. And lastly,
00:26:29
Yoshua's probably gonna be talking a
00:26:32
little bit about generative
00:26:33
modelling, and I will cover a little
00:26:38
bit of generative modelling for images in
00:26:40
the third lecture. But there is a
00:26:43
project that I wrote that
00:26:47
produces pretty pictures. So if you
00:26:51
have your own sets of images
00:26:54
and you want to basically train a
00:26:57
generative model that can generate
00:26:59
images that are similar to the images
00:27:02
that you gave it,
00:27:05
the model's fairly stable, and you
00:27:07
just take the images that you have,
00:27:09
probably about ten thousand plus
00:27:10
images, and you give them to it to try to
00:27:13
build a generative model on this. And
00:27:15
people in the community just used this
00:27:19
code to build eighteenth-century art
00:27:24
generators, manga characters and so on.
00:27:28
So that's basically an overview of the
00:27:33
community, the very good
00:27:37
examples that you have in the community
00:27:40
in Torch. And next I will go to
00:27:44
Torch's basic packages
00:27:47
that form the core of Torch, and I
00:27:52
will go into deeper dives of some of
00:27:55
these packages in the next two
00:27:57
lectures. The main package is neural
00:28:00
networks, so we have a core package
00:28:02
called nn; it just stands for
00:28:07
neural networks. And nn is built
00:28:12
on this concept that if you want to
00:28:14
compose neural networks, if you want to
00:28:15
build complicated neural networks, then you
00:28:18
build them like how
00:28:22
you build a system
00:28:26
with Lego blocks. You would basically
00:28:28
put them together, one after
00:28:31
the other, and you can have containers
00:28:34
where you can stack blocks on top of each
00:28:37
other or put blocks in parallel as
00:28:40
well. And this helps compose really,
00:28:43
really complicated neural networks.
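The Lego-block idea of stacking modules one after the other, or placing them side by side inside containers, can be sketched in a few lines of plain Python. This is only an illustration of the composition pattern, not the nn package's API: the class names `Scale`, `AddConstant`, `ReLU`, `Sequential` and `Branches` are hypothetical, and the "tensors" are ordinary Python lists.

```python
class Module:
    """Base building block: maps an input list of floats to an output list."""

    def forward(self, x):
        raise NotImplementedError


class Scale(Module):
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return [self.factor * v for v in x]


class AddConstant(Module):
    def __init__(self, constant):
        self.constant = constant

    def forward(self, x):
        return [v + self.constant for v in x]


class ReLU(Module):
    def forward(self, x):
        return [max(0.0, v) for v in x]


class Sequential(Module):
    """Container that applies its blocks one after the other."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def forward(self, x):
        for module in self.modules:
            x = module.forward(x)
        return x


class Branches(Module):
    """Container that feeds the same input to every branch and
    concatenates the branch outputs."""

    def __init__(self, *branches):
        self.branches = list(branches)

    def forward(self, x):
        out = []
        for branch in self.branches:
            out.extend(branch.forward(x))
        return out


# Containers are modules themselves, so they nest freely.
net = Sequential(
    Scale(2.0),
    Branches(ReLU(), Sequential(AddConstant(1.0), ReLU())),
)
result = net.forward([-1.0, 3.0])
```

Because a container exposes the same `forward` interface as a single block, arbitrarily deep structures can be assembled without any special cases.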
00:28:47
And the nn package is powerful
00:28:52
enough to have captured a lot of
00:28:56
architectures without writing a lot of
00:28:59
code. One example that
00:29:03
comes to mind is when Oxford
00:29:08
released the VGG network and their paper.
00:29:11
They did their
00:29:16
research in Caffe, and the network
00:29:19
definition of the VGG network in Caffe
00:29:23
was about two thousand lines of code
00:29:26
in the protobuf, and in Torch
00:29:30
you could basically write that within
00:29:32
sixty, seventy-five lines or less. And
00:29:36
it's because
00:29:39
Torch is not data, it's code. You
00:29:43
can write very flexible
00:29:46
structures, and the neural networks
00:29:48
package powers all that. When
00:29:51
you
00:29:53
want to build really complicated neural
00:29:55
networks, for example if you have to
00:29:56
create an LSTM cell or some new kind of
00:30:01
fancy memory or just crazy
00:30:04
networks that you dreamt of last night,
00:30:08
we have another package that
00:30:10
extends the nn package, called the
00:30:11
nngraph package. We'll be going
00:30:14
into both nn and nngraph in
00:30:16
the deep dive. The nngraph
00:30:18
package lets you construct really
00:30:20
complicated neural networks. It has a
00:30:23
graph API similar to Theano and
00:30:26
TensorFlow, but it's constructed at a
00:30:29
granularity that's slightly higher:
00:30:32
instead of building graphs on top
00:30:35
of every tensor operation, you would
00:30:38
build graphs on top of modules that
00:30:42
pack more compute than a single tensor
00:30:46
operation, and we call these layers. And I
00:30:53
want to go into another interesting
00:30:56
paradigm that recently
00:30:59
appeared. This is a package
00:31:01
that was contributed by Twitter.
00:31:05
This is called autograd, and this is a
00:31:07
slightly different way in which people
00:31:11
can do gradient-based learning. It's
00:31:16
unlike any other package that is
00:31:20
available for doing deep learning,
00:31:27
except for the autograd that people
00:31:29
also wrote in Python.
00:31:33
The way it works
00:31:37
is not new; it has been well understood.
00:31:40
What it has is a tape-based mechanism
00:31:44
to record what is going on in the
00:31:46
forward pass. So basically you can write your
00:31:49
neural network as a bunch of tensor
00:31:52
operations; you can even have if
00:31:56
conditions, for loops and whiles.
00:31:59
Basically you can write a function that
00:32:01
is just like the standard kind of code
00:32:05
you write, with for loops and
00:32:08
whiles and ifs. And when you
00:32:11
execute that function, what autograd
00:32:13
does is it goes deep into the Lua
00:32:17
language itself and it records
00:32:20
every operation that happens in the
00:32:22
forward phase. And autograd defines
00:32:26
a backward operator for every operation
00:32:28
in the forward phase, in Lua,
00:32:31
with very little restriction. And
00:32:38
when you want to compute the gradient
00:32:39
of your function, this
00:32:43
arbitrary function that you just
00:32:44
defined, it basically plays the tape
00:32:48
backwards. The tape that it recorded
00:32:51
during the forward phase, it just plays
00:32:52
it backward, and for every operation it
00:32:55
computes the gradient with respect to
00:32:57
each variable involved. And this is
00:33:00
really useful when you're training
00:33:03
networks with dynamic graphs, where it's
00:33:09
not the same computation that you do
00:33:11
every time. Your
00:33:13
computation might be conditional on, for
00:33:16
example, the current norm of the
00:33:18
gradients, or it can
00:33:21
be very dynamic,
00:33:23
dependent on any arbitrary thing, and
00:33:27
autograd can do efficient gradient
00:33:30
computation using that.
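The tape mechanism described here (record every operation during the forward phase, then replay the tape backwards) can be sketched for scalar values in plain Python. This is a toy illustration under stated assumptions, not the autograd package's implementation: the names `Tape`, `Var` and `value_and_grad` are hypothetical, and only addition and multiplication are supported.

```python
class Tape:
    """Holds one backward closure per recorded forward operation."""

    def __init__(self):
        self.steps = []


class Var:
    """A scalar that records every operation applied to it on a tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def _lift(self, other):
        # Wrap plain numbers so they can take part in recorded operations.
        return other if isinstance(other, Var) else Var(float(other), self.tape)

    def __add__(self, other):
        other = self._lift(other)
        out = Var(self.value + other.value, self.tape)

        def backward():
            self.grad += out.grad
            other.grad += out.grad

        self.tape.steps.append(backward)
        return out

    def __mul__(self, other):
        other = self._lift(other)
        out = Var(self.value * other.value, self.tape)

        def backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.steps.append(backward)
        return out


def value_and_grad(fn, x_value):
    """Run fn forward while recording, then play the tape backwards."""
    tape = Tape()
    x = Var(float(x_value), tape)
    y = fn(x)
    y.grad = 1.0
    for step in reversed(tape.steps):
        step()
    return y.value, x.grad


def f(x):
    # Ordinary control flow: which operations run depends on the data.
    y = x
    for _ in range(3):
        if y.value > 10.0:
            y = y + 1.0
        else:
            y = y * x
    return y
```

For an input of 2.0 the loop multiplies three times, so the recorded function is x^4; for 4.0 the branch switches after the first multiplication and the recorded function is x^2 + 2. The tape only ever contains the operations that actually ran, which is what makes data-dependent, dynamic graphs straightforward.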
00:33:35
And another package, also released by Twitter, which
00:33:37
is very important in today's world, is a
00:33:42
package for distributed learning, to
00:33:44
train Torch models over
00:33:48
multi-machine and multi-GPU pipelines. And
00:33:55
to actually do different kinds of
00:33:58
distributed learning using the distlearn
00:34:00
package you actually don't have to
00:34:02
write a lot of code or understand a lot
00:34:05
of complicated infrastructure.
00:34:10
The distlearn package packages the
00:34:14
whole distributed learning like this:
00:34:17
your neural network model has a
00:34:22
bunch of parameters that you're trying
00:34:24
to optimise. Your
00:34:29
neural network consists of parameters
00:34:31
and activations; it basically
00:34:33
ignores the activations part. And these
00:34:35
parameters can basically be
00:34:40
passed into certain functions
00:34:44
that will update your parameters according
00:34:47
to either the synchronous SGD, asynchronous
00:34:49
SGD or elastic asynchronous SGD
00:34:52
algorithm. And even extending distlearn
00:34:56
to do your new kind of distributed
00:34:59
research, if you invent a new algorithm to
00:35:03
do distributed optimisation, is really
00:35:06
easy as well, because it's all in Lua
00:35:09
with a few lines of code. In fact, if
00:35:11
you go look at the synchronous and
00:35:13
elastic asynchronous implementations,
00:35:16
they're actually in a single file with
00:35:18
very few lines of code. And distlearn
00:35:22
takes the MPI paradigm, where it
00:35:26
basically has certain operations like
00:35:31
allreduce and scatter-gather
00:35:34
implemented, and you can build
00:35:37
your distributed optimisation on top
00:35:39
of that. This is unlike the TensorFlow
00:35:45
or the MXNet paradigm, where you
00:35:50
look at your whole computation,
00:35:54
your whole neural network and the
00:35:55
optimisation, as a computation graph, and
00:35:59
you try to do dependency analysis and
00:36:03
find how to optimally collect or
00:36:08
distribute certain tensors or
00:36:11
variables when appropriate. It's a
00:36:14
simplified model, but it works as well
00:36:17
and has fairly good performance. I
00:36:23
haven't done benchmarks of this against
00:36:26
either of the other packages in
00:36:29
the distributed setting, so I can't
00:36:31
actually say how it plays out in terms
00:36:34
of this versus that in terms of
00:36:37
performance.
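The distributed scheme described above (each worker holds a copy of the parameters, and an allreduce-style operation combines what the workers computed) can be sketched as a single-process simulation in plain Python. This is not the distlearn API: `allreduce_mean`, `local_gradient` and `synchronous_sgd` are hypothetical names, the workers are simulated by list entries rather than separate machines, and the model is a one-parameter linear fit chosen only to keep the example short.

```python
def allreduce_mean(vectors):
    """Simulated allreduce: every worker receives the element-wise mean
    of the vectors contributed by all workers."""
    count = len(vectors)
    mean = [sum(column) / count for column in zip(*vectors)]
    return [list(mean) for _ in vectors]


def local_gradient(params, shard):
    """Gradient of the mean squared error of y = w * x on one data shard."""
    w = params[0]
    grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
    return [grad]


def synchronous_sgd(shards, steps=50, lr=0.05):
    """Each simulated worker keeps its own parameter copy; gradients are
    averaged across workers before every update, so the copies stay in sync."""
    params = [[0.0] for _ in shards]
    for _ in range(steps):
        grads = [local_gradient(p, s) for p, s in zip(params, shards)]
        grads = allreduce_mean(grads)
        params = [
            [w - lr * g for w, g in zip(p, gs)]
            for p, gs in zip(params, grads)
        ]
    return params


# Two workers, each with its own shard of data drawn from y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
replicas = synchronous_sgd(shards)
```

Only the parameters and gradients cross the worker boundary; activations stay local, which matches the description that the activations part is ignored. Swapping `allreduce_mean` for a different combination rule is where an asynchronous or elastic variant would differ.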
00:36:39
Now, coming to the core philosophy of Torch: there's a few
00:36:44
things we really care about and
00:36:46
really like about Torch, and we want to
00:36:48
keep them and not move away from any of
00:36:51
these aspects. The first is interactive
00:36:53
computing. We strongly care about having a
00:36:59
researcher open an interpreter, keep it
00:37:02
open for days, and just do very
00:37:05
nonlinear paths of computation, where
00:37:08
they might execute whatever function
00:37:11
that they think of next. And
00:37:14
I think we feel that this is one
00:37:18
of the most powerful ways research is
00:37:21
carried out in general, and we do not
00:37:23
want to go to some kind of compiled
00:37:27
environment where
00:37:32
debugging or changing what you do
00:37:37
is harder: you have to go into a file,
00:37:40
change it and rerun your program and so
00:37:42
on. So as part of the interactive
00:37:46
computing paradigm, one thing we care
00:37:49
about is to have no compilation time at
00:37:53
runtime when you're using Torch
00:37:55
itself. So something that, for example,
00:37:59
packages like Theano or Chainer do is
00:38:03
that they invoke the compiler at
00:38:05
runtime to basically optimise their
00:38:07
code better. They put together
00:38:11
code that's specific to your computation
00:38:13
graph, and then they compile it with, like,
00:38:16
GCC or NVCC at runtime,
00:38:18
and this is something that invokes a
00:38:21
lot of overhead, cognitive overhead,
00:38:24
for the researcher. I've seen
00:38:27
Theano programs compile for like two
00:38:29
or three minutes or even more. I've
00:38:33
heard of Theano programs compiling for
00:38:36
hours, for example, and this is something
00:38:40
that simply eats research time when you're
00:38:41
sitting there in front of the computer
00:38:43
to do research. We strongly believe
00:38:46
that having no compilation time, not
00:38:49
even like two seconds of compilation time,
00:38:51
is really important in a research
00:38:56
setting. The next thing is imperative
00:38:59
programming. What I mean by that is
00:39:03
that you want to write your code
00:39:06
as naturally to the language as
00:39:09
possible. You want to write your code
00:39:12
like you always write your code, with
00:39:14
for loops, while loops and so on. You
00:39:17
wouldn't want to write part of your
00:39:22
code, for example defining a neural
00:39:24
network, in some other data-like
00:39:28
language, like a JSON config
00:39:31
or like some protobuf. Or, in the
00:39:35
case of TensorFlow, for example, there's a
00:39:39
paradigm where you have to
00:39:41
use special operators to do while
00:39:44
loops or conditions, for example.
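To illustrate the point about imperative control flow, here is a minimal sketch in plain Python (not Torch or Lua; the function name and the toy update rule are made up for this example). The number of loop iterations depends on the data, and the loop is written with the host language's own `while` and `if` rather than dedicated graph operators:

```python
def damp_until_small(values, threshold=1.0, max_steps=100):
    """Halve every element until the largest magnitude drops below threshold.

    Hypothetical toy routine: the iteration count depends on the data, which
    is the situation that needs special loop operators in a static graph.
    """
    steps = 0
    current = list(values)
    # Ordinary host-language loop: the condition is re-evaluated on live data.
    while max(abs(v) for v in current) >= threshold and steps < max_steps:
        current = [v / 2.0 for v in current]
        steps += 1
        # Ordinary conditional: a print or a debugger breakpoint can go here.
        if steps == max_steps:
            print("hit step limit")
    return current, steps


result, n_steps = damp_until_small([8.0, -3.0, 0.5])
print(result, n_steps)
```

Because the loop is ordinary code, it can be stepped through in an interpreter session, which ties back to the interactive-computing point made earlier in the talk.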
00:39:48
We strongly believe that imperative
00:39:50
programming is the path of least resistance
00:39:54
for new researchers to just
00:40:00
get used to programming and do research
00:40:03
and feel very little cognitive overhead,
00:40:06
and not have to think about how to do
00:40:08
certain things and map it back onto
00:40:10
the domain. And the third thing
00:40:15
is minimal abstraction. So we keep an
00:40:19
emphasis where, whenever
00:40:23
you want to find some Torch
00:40:25
code that actually does the actual
00:40:29
computation, for example if you want to
00:40:32
find out where this softmax operation
00:40:35
is being computed, let's say
00:40:37
somewhere inside C, the number of
00:40:41
hops you have to take to go find that
00:40:44
code, we want to keep that as minimal
00:40:46
as possible. Probably, if
00:40:49
you jump through one or two functions,
00:40:51
you'll find the code that actually runs
00:40:54
in C or CUDA and actually runs the
00:40:58
computation that you care about.
00:41:00
This is something we think is very
00:41:02
important when people want to write new
00:41:07
modules or contribute back, because if
00:41:10
you have too many abstractions you
00:41:11
can't think linearly; you have to
00:41:14
always start thinking through those
00:41:18
abstractions, and it's an overhead
00:41:21
where, after three or four abstractions,
00:41:24
you're pretty much lost, and you
00:41:26
don't know, if you change a
00:41:29
particular part of the code, which
00:41:32
pieces start moving. That's
00:41:34
something that's really hard when
00:41:37
you're doing development, especially
00:41:39
when you're not an engineer trained
00:41:41
for several years. And so we
00:41:46
think that having as minimal an
00:41:48
abstraction as possible to the code that
00:41:50
actually runs the computation that you
00:41:54
just defined is very important, and we
00:41:57
design all our packages with that
00:41:59
philosophy. And lastly, we have this
00:42:05
notion of maximal flexibility. And
00:42:09
Lua kind of plays into what we need
00:42:13
there. In Torch you don't have any
00:42:16
constraints on what you can do or
00:42:18
cannot do. Our class system doesn't have
00:42:22
a tightly defined interface where you
00:42:24
have to implement certain functions or
00:42:26
you cannot implement
00:42:30
certain interfaces. We wrote our own
00:42:34
class system, we wrote our own type system, and that's
00:42:36
something that Lua really gives
00:42:40
us, the power of Lua: it doesn't
00:42:42
actually have any of these. Lua lets
00:42:45
you define any of these fundamental
00:42:48
systems that you take for granted in a
00:42:50
more strongly typed language, and we
00:42:54
designed our own systems to be as flexible as
00:42:58
possible in this aspect, where any of
00:43:00
the users can do arbitrary things
00:43:05
that we never expected them to do
00:43:09
when we were designing the package. But
00:43:11
we think that adds a lot of power to
00:43:14
the users, especially the hacker kind.
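As a rough illustration of what a loosely defined interface allows, the sketch below uses plain Python rather than Torch's Lua class system; `Sequential`, `Scale`, `Negate`, and `forward` are hypothetical names chosen for this example. Nothing has to inherit from a shared base class, and a user can swap out a method on a live object:

```python
class Scale:
    """Toy layer: multiplies every input element by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, xs):
        return [x * self.factor for x in xs]


class Negate:
    """Toy layer with no shared base class; duck typing is enough."""

    def forward(self, xs):
        return [-x for x in xs]


class Sequential:
    """Toy container: accepts any object that happens to have forward()."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def forward(self, xs):
        for layer in self.layers:
            xs = layer.forward(xs)
        return xs


net = Sequential(Scale(2), Negate())
print(net.forward([1, 2, 3]))

# A user patches one layer at runtime, something the authors never planned for.
net.layers[0].forward = lambda xs: [x + 10 for x in xs]
print(net.forward([1, 2, 3]))
```

The same openness is what makes life harder for package authors: the container cannot assume anything about what `forward` does once users are free to replace it.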
00:43:16
And it does get us into a lot of
00:43:19
trouble: if we want to write a
00:43:23
package, now we have to think about all
00:43:25
the possibilities and all the ways in
00:43:27
which users will use it, and make sure
00:43:29
that the package we're writing doesn't
00:43:31
break in all these cases. It's kind
00:43:33
of harder for us as core developers, but we
00:43:38
think it's really important from a
00:43:41
hacker culture perspective to keep this
00:43:43
maximal flexibility. Lastly, I'll talk
00:43:49
a little bit about the key drivers of
00:43:51
growth for Torch: why people use
00:43:53
Torch. We've seen that having tutorials,
00:43:59
and more importantly support, more than
00:44:02
tutorials, having a lot of really fast
00:44:05
support, is really important when you're
00:44:08
building your frameworks, because
00:44:12
most users are not covered by a
00:44:14
tutorial. Most users want to do
00:44:15
something other than what your
00:44:17
tutorial is for, but they would ask
00:44:19
questions on the forums or on Stack
00:44:22
Overflow, for example, and you would want
00:44:25
to answer back as soon as possible,
00:44:28
probably within four hours, or
00:44:32
within twenty-four hours, because
00:44:34
otherwise users just
00:44:37
never come back to using your package.
00:44:40
Especially the new ones. And
00:44:43
another important thing is pre-trained
00:44:45
models and high quality open source
00:44:47
projects, as I showed earlier in the
00:44:50
slides. And then GPUs: GPU
00:44:54
support is something that people come
00:44:57
to Torch for. These days
00:45:00
the situation around GPUs
00:45:03
has improved a lot, especially with
00:45:06
TensorFlow coming in. A lot of
00:45:09
other frameworks have basically said
00:45:11
they can't be substandard anymore.
00:45:14
But for quite some time a lot of
00:45:19
users that came to Torch came for the
00:45:21
fact that we had really strong GPU
00:45:23
support and were very proactive in
00:45:26
our development. Minimal abstractions,
00:45:29
as I explained on the last slide, seem
00:45:32
to actually be one of the key drivers
00:45:34
of growth. Zero compile time is
00:45:38
something a lot of our users say they
00:45:40
find really awesome in Torch. And
00:45:44
one of the big things is community.
00:45:47
The community is one of the key
00:45:50
reasons people use Torch, stick to
00:45:55
Torch, and are pretty happy overall,
00:45:58
because when you have a lot of
00:46:00
other people doing the same thing, you
00:46:03
can just have people to chat with,
00:46:07
or ask people for help,
00:46:11
and so on. And lastly, I quickly
00:46:16
added a couple of slides because I
00:46:20
think someone asked me yesterday
00:46:23
whether I would do a comparison of
00:46:25
Torch with other frameworks. I'm not
00:46:28
gonna do a comparison of Torch with
00:46:29
other frameworks, but my colleague Yangqing
00:46:31
Jia, who wrote Caffe, made this slide
00:46:35
where he places all the frameworks on
00:46:39
this linear axis. On one side you have
00:46:44
these properties where you want
00:46:47
stability and speed, and then
00:46:51
basically production-ready: never
00:46:54
break, easy to understand for
00:46:57
production engineers, and so on. And the
00:46:59
other side is what the researchers want,
00:47:02
which is flexibility, fast
00:47:04
iteration cycles, and so on. And,
00:47:09
as Yangqing pointed out, Torch sits
00:47:13
somewhere closer to the research side.
00:47:17
What we don't compromise on is
00:47:21
speed. Because of the choices
00:47:25
we made early on, like sticking to
00:47:27
Lua and being a very, very close
00:47:29
interface to C, we actually are one of
00:47:33
the fastest frameworks. I
00:47:37
do benchmarks on the side, just out
00:47:42
of interest, and Torch
00:47:44
maintains its position as being one of
00:47:47
the fastest frameworks, without
00:47:50
compromising the flexibility, the
00:47:53
debuggability, and the whole
00:47:57
research aspect of it. Coming to the
00:48:01
future of Torch: one of the
00:48:05
trends we've seen is that a lot of
00:48:07
goodness comes from fusing computation.
00:48:10
For example, if you have a convolutional
00:48:12
layer followed by a ReLU layer and then
00:48:15
a batch norm, or a convolution followed by a
00:48:18
ReLU, if you actually fuse these
00:48:21
operations into a single CUDA
00:48:25
kernel that does all of them together,
00:48:26
for example, it actually ends up being
00:48:29
much faster than if you do them one
00:48:31
after the other, even though it's easier
00:48:33
to understand and easier to
00:48:39
implement them separately. And we are
00:48:44
looking into doing this kind of fusion.
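The fusion idea can be sketched in plain Python. This is a conceptual illustration only, not Torch code or a CUDA kernel; the function names and the simplified per-element "scale, shift, ReLU" chain are assumptions made for this example. The unfused version makes three passes over the data and allocates two intermediate lists, while the fused version makes one pass and produces the same result:

```python
def unfused(xs, scale, shift):
    """Three separate passes, each materialising an intermediate buffer."""
    scaled = [x * scale for x in xs]           # pass 1: scale
    shifted = [s + shift for s in scaled]      # pass 2: shift
    return [max(0.0, v) for v in shifted]      # pass 3: ReLU


def fused(xs, scale, shift):
    """One pass: every element is read once and written once."""
    return [max(0.0, x * scale + shift) for x in xs]


data = [-2.0, -0.5, 0.0, 1.5]
print(unfused(data, 2.0, 0.5))
print(fused(data, 2.0, 0.5))
```

On a GPU the saving typically comes from fewer kernel launches and fewer round trips to memory, which is why the fused form tends to be faster even though the separate steps are easier to read.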
00:48:47
One of the packages that came out of
00:48:49
ParisTech is this package called optnet, where
00:48:54
it takes your existing neural network
00:48:57
in Torch and then it optimises the
00:48:59
network for memory consumption. It starts
00:49:02
sharing certain buffers that
00:49:06
can be shared, and the overall
00:49:09
memory consumption of the neural network is
00:49:11
drastically reduced. You can imagine, in
00:49:15
the future, this kind of optimisation
00:49:18
being done at runtime. And we
00:49:21
are continuing and will continue to
00:49:24
break down the barriers to entry for
00:49:26
new users, especially for them to start
00:49:29
developing their own modules rather
00:49:31
than just using what we provide,
00:49:33
because that's the only way we scale.
00:49:37
We strongly believe in that:
00:49:39
you cannot have a team of five really strong
00:49:43
engineers that will do
00:49:46
everything for deep learning.
00:49:49
It just doesn't scale. So we want to
00:49:51
empower people to start writing their own
00:49:56
modules and contribute back. That's
00:49:59
always been our emphasis, and we are
00:50:01
going to continue making design choices
00:50:04
that break down barriers to entry. And
00:50:08
one of the things we want to keep
00:50:09
doing and never forget is keeping
00:50:13
focus on the long tail. By the long
00:50:15
tail I mean all the institutions that
00:50:17
cannot afford three hundred GPUs per
00:50:20
researcher. We understand that
00:50:23
most of our users, and most deep learning
00:50:26
researchers in the world, have a system
00:50:29
under their desk that they're probably
00:50:31
sharing with another researcher, with like
00:50:34
one to four GPUs, and this is
00:50:36
something we never wanna forget while
00:50:39
we're writing new stuff or
00:50:42
making performance improvements.
00:50:46
Lastly, an important point to make, to
00:50:50
be honest about the world, is that the
00:50:53
Python ecosystem is much larger than,
00:50:56
for example, the Lua ecosystem, which
00:51:00
pretty much is just Torch on the
00:51:02
scientific computing side. To bridge
00:51:05
this gap we have extensions; I will talk
00:51:08
about them in the third lecture. We have
00:51:10
a big bridge to Python: we can call
00:51:12
arbitrary Python functions or
00:51:14
packages, including packages that return
00:51:18
NumPy tensors, and they will seamlessly be
00:51:21
converted into Torch tensors and vice
00:51:24
versa. And we are also looking into some
00:51:28
deeper Python integration, maybe some
00:51:30
Python bindings, but that's just an
00:51:33
ongoing thought. That's the end of the
00:51:39
first talk, and feel free to ask
00:51:42
questions. Okay, yes, please. So I
00:52:16
think it was a great talk, really looking
00:52:19
forward to seeing what comes next. But one
00:52:23
question that bugs me is the
00:52:25
compositionality of models. We always
00:52:28
seem to start from scratch. And there's
00:52:31
so many great models out there that we
00:52:33
don't know about. Is there any plan of
00:52:36
creating a marketplace to actually know
00:52:39
what others are doing, and to not
00:52:42
have to know myself, as a human, where
00:52:45
to get a good model, but actually
00:52:47
have suggestions? So, for Torch
00:52:53
itself we actually happen to have created it;
00:52:55
it's not even that much work. It's one
00:52:58
single GitHub repo with a README,
00:52:59
and people can send in pull
00:53:01
requests. But on the
00:53:04
larger theme of wanting a universal
00:53:07
marketplace, where you want to have
00:53:11
model definitions from Caffe, Torch,
00:53:13
TensorFlow and everything: right
00:53:17
now the way we propagate
00:53:20
information on what's available is
00:53:22
mostly via Twitter, where every time a
00:53:26
new paper is implemented in Torch, or every
00:53:28
time a new pre-trained model is released
00:53:30
in Torch, we just tweet it out. And most
00:53:33
of our users follow us there. That's for
00:53:40
now, but I don't really know
00:53:43
how to get all the frameworks together,
00:53:47
because they all have their own
00:53:49
strong opinions on the marketplace and
00:53:52
the format, the common format to follow,
00:53:54
and so on. At least for Torch, yeah,
00:53:57
we're doing what we can to keep all
00:54:00
information central. Okay, thank you.
00:54:15
Could you discuss a little bit
00:54:18
the API stability of Torch and why you
00:54:21
don't have release cycles? Okay, so that's
00:54:24
something that gets asked a lot. We
00:54:28
don't have release cycles because we
00:54:30
don't have enough maintainers. If
00:54:34
any of you are willing to become a
00:54:37
maintainer of Torch, just as a release
00:54:40
engineer that's cutting release
00:54:42
branches, feel free to reach out to me.
00:54:46
We want to start doing more stable and
00:54:50
structured release cycles. There are no
00:54:55
technical limitations there.
00:55:03
Some new question? Maybe we can just go
00:55:06
for the coffee break now, because you
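The MPI-style approach to distributed optimisation described around 00:35:26 can be sketched as follows. This is a single-process simulation in plain Python, not distlearn's API: `allreduce_mean`, `local_gradient`, `train`, and the toy least-squares problem are all hypothetical. Each simulated worker computes a gradient on its own data shard, an allreduce averages the gradients, and every worker applies the same update:

```python
def allreduce_mean(vectors):
    """Simulated allreduce: every worker ends up with the element-wise mean."""
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    return [list(mean) for _ in vectors]


def local_gradient(w, shard):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return [sum(2.0 * (w[0] * x - y) * x for x, y in shard) / len(shard)]


def train(shards, steps=200, lr=0.05):
    # Every worker starts from the same parameters.
    weights = [[0.0] for _ in shards]
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        grads = allreduce_mean(grads)          # the only communication step
        weights = [[w[0] - lr * g[0]] for w, g in zip(weights, grads)]
    return weights


# Two simulated workers, each holding a shard of data generated by y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
final = train(shards)
print(final)
```

In this sketch the whole training loop stays ordinary code and the allreduce is the only distributed primitive, which is the contrast the talk draws with analysing a full computation graph to decide where tensors should live.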

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
TensorFlow 3 and Day 3 Questions and Answers session
Mihaela Rosca, Google
6 July 2016 · 3:21 p.m.
