Note: this content has been automatically generated.
Um small so these have the lunch break
use we just in front of this one also
we will have so the don't stop going on
tables on the should be useful suitable
for the of to people's movement usually
table for the user so just just you're
result was you really be in the cooking
Okay, now comes the more boring talk. It's a bit more hands-on, it's a
deeper dive, and I don't expect it to be as interesting for most of
you, so I wouldn't be surprised if I see some sleeping people. All
right, so in this talk we're going to talk about two things. The first
is how tensors and storages work in Torch. And then I'll also talk
about how we use the neural network packages. This talk basically will
get you to a place where you kind of know how most people use Torch.
And then the next time I will talk about more experimental stuff that's
a bit more interesting, so think of this as the middle part, which gets
really boring. Yeah, yeah, they'll be available online; I'll send you
the slides.
Coming to tensors and storages: Torch tensors are, as I said before,
n-dimensional arrays. They are row-major in memory. So basically what
that means is, let's say there's a two-dimensional tensor that's
illustrated here in the middle. It has rows of data, and if you
actually look at how it's represented in system memory, because memory
is linear, the first row is laid out as is, and then the second row is
laid out, and the third and the fourth. This is called row-major, and
if you do it column-wise instead it's called column-major. I think
MATLAB is column-major, and NumPy is row-major as well, which is nice
because the layout and, to a large part, the format that Torch and
NumPy follow is very similar, so if you want to copy a Torch tensor
into a NumPy array or vice versa, you don't actually have to do any
memory copy. So here in this example the tensor has four rows and six
columns, so the size of the tensor is four by six. And there's another
thing that we commonly use when we do any kind of n-d arrays; it's
called the stride.
So what stride is: in that dimension, to access the next element, how
many elements forward in memory do you have to go. As an example, the
orange block here, the first row, first column, is at memory location
zero. Now, to go to the first yellow block, how far forward do I have
to go? I have to go six elements forward, and that's basically the
stride. So the stride here is six in the first dimension and one in the
second dimension. It's one in the second dimension because if I want to
access the next column, going from the first column to the second
column for example, I just have to go one forward in the actual memory
that you see. So why do we have the size and stride? This actually
gives us a very powerful way of indexing sub-tensors, basically
choosing parts of tensors and still operating on them without doing
expensive operations like memory copies. So, as an example here, let's
say I have a select operation where I'm selecting, in the first
dimension, the third element. What that means is that in the first
dimension, that is, over the rows, I want the third row. That operation
will have to give me this third row. And to do this operation I don't
actually have to create a new memory copy. The only operation I have to
do here is create just another tensor structure that changes the size,
the stride and the storage offset, that is, where my tensor starts from
in memory, and it can still map to the same underlying storage that
this tensor points to. That also means that if I change anything in
this sub-tensor, the values will also change in the original tensor.
I've illustrated here where this sub-tensor actually points in memory,
and those are the red memory locations. Also, as you see, the offset
here now is thirteen, which is, starting from the initial storage
location, how much further I have to go to start this particular
sub-tensor. So let's now do this column-wise; if we do this column-wise
we can still select a sub-tensor. What you notice in the row case is
that the storage is still contiguous in memory, which means that in the
tensor you just created by doing the select operation, every element in
the storage is next to each other, and this is called a contiguous
tensor. That doesn't hold if you, for example, do a column-wise
selection. All of my columns here are numbered, and let's say I select,
in the second dimension, which is over columns, the third column, so I
ask for this particular column. That would give me this particular
tensor, but if you see how it's actually mapped into the raw memory,
its elements are not contiguous in memory. But you can still construct
a tensor that maps to this particular sub-tensor by changing the size
and the stride.
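
The size, stride and offset bookkeeping described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it is not the Torch API, the helper names `flat_index`, `select` and `read_vector` are made up for this sketch, and offsets are counted from zero.

```python
# Illustrative plain-Python model of size, stride and storage offset.
# Offsets here are 0-based; none of these names are Torch functions.

def flat_index(index, stride, offset=0):
    """Map an n-dimensional index to a position in the flat storage."""
    return offset + sum(i * s for i, s in zip(index, stride))


def select(size, stride, offset, dim, idx):
    """Describe the view obtained by fixing dimension `dim` at `idx`."""
    new_offset = offset + idx * stride[dim]
    new_size = size[:dim] + size[dim + 1:]
    new_stride = stride[:dim] + stride[dim + 1:]
    return new_size, new_stride, new_offset


def read_vector(storage, size, stride, offset):
    """Gather the elements that a 1-d view refers to in the storage."""
    return [storage[offset + i * stride[0]] for i in range(size[0])]


storage = list(range(24))        # flat memory behind a 4x6 tensor
size, stride = (4, 6), (6, 1)    # row-major: the next row is 6 elements away

# Third row and third column (index 2 when counting from zero).
row_size, row_stride, row_offset = select(size, stride, 0, dim=0, idx=2)
col_size, col_stride, col_offset = select(size, stride, 0, dim=1, idx=2)

third_row = read_vector(storage, row_size, row_stride, row_offset)
third_col = read_vector(storage, col_size, col_stride, col_offset)

print("row view:", row_size, row_stride, row_offset, third_row)
print("col view:", col_size, col_stride, col_offset, third_col)
```

In this sketch the third row is a view of size 6 with stride 1 starting at offset 12 when counting from zero (the thirteenth element when counting from one), so it is contiguous. The third column is a view of size 4 with stride 6 starting at offset 2, so its elements are six positions apart in the storage.
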
So the stride here is six, which means that going from this orange
block to the yellow block, I have to go six locations forward and I
will get my next element in that tensor in this dimension. And the
offset just points to the fact that this particular storage starts from
the third element itself. So you start with the third element, and then
for every next element you want, you go six blocks forward. This is
actually a very simple way of mapping things, the fact that you have a
tensor and a storage, that tensors map to storages, and that you can
extract sub-tensors. But it's extremely powerful. You can, for example,
ask for the first three channels of your image when there are, say, a
hundred and twenty-eight channels, and operate on that separately
without having to do memory copies and so on. So that is how tensors
and storages work at a low level. Now, looking at some syntax. In
Torch, if you want to load a package you call the require function. So
here we're just requiring torch. The semicolon is optional, but if you
use IPython notebooks, and Torch actually has support for IPython
notebooks, where you have an iTorch kernel that you can use, which
means that if you're familiar with IPython you can use it as you always
do; it has inline help, autocomplete and so on. So you load torch; the
semicolon here means that in IPython notebooks it wouldn't print out
the result, and that's all it does. To create a tensor, you have the
syntax here. As part of the torch package you have several types of
tensors: DoubleTensor, FloatTensor, ByteTensor, LongTensor and so on.
You create a tensor of size four times six; it's just a four-by-six
matrix. Tensors by default are not initialized with any default values,
so, as is standard, uninitialized memory might contain all kinds of
weird values. So let's fill it up with uniform noise. You can do that
with this call. This colon here is just Lua syntax for "I want to
operate on this variable"; it's equivalent to saying a.uniform(a). You
will keep seeing this colon operator. It's just like calling a method
of a class. So you call uniform here, and that fills the tensor with
uniform noise, mean zero, standard deviation one; if you pass
arguments, then you can change the mean and standard deviation. You can
print the tensor; it will print to the screen in a nice format. And
that's basically the same tensor that we wanted to create. The tensor
will have an underlying storage that you can also access using the
:storage() call, and that will return the underlying storage, which you
can directly manipulate, for example. The other operation we did
previously was select, and similarly there's a select call here, with
the dimension and the element, and as you see, when you print out the
sub-tensor b here, it has the same values as the third row here. And to
illustrate that the underlying storage is shared, I show that if you
fill b with some value, let's say with the value three, the third row
of the original tensor a also changes values. This is a pretty
important detail to remember when you're working with tensors and
sub-tensors. Oh, you don't get a column, you get a single vector when
you select a row; it's one-dimensional, so the print there is just
showing it column-wise, but that's about it.
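
To make the shared-storage point concrete, here is a minimal sketch in plain Python rather than Torch. The classes `Tensor2D` and `RowView` and their methods are hypothetical stand-ins invented for this illustration; the only thing they are meant to show is that a selected row is a view onto the same storage, so filling the view changes the original.

```python
import random


class RowView:
    """1-d view of one row; it holds a reference to the parent's storage."""

    def __init__(self, storage, offset, size):
        self.storage, self.offset, self.size = storage, offset, size

    def fill(self, value):
        # Writes go straight into the shared storage, no copy is made.
        for i in range(self.size):
            self.storage[self.offset + i] = value
        return self

    def tolist(self):
        return self.storage[self.offset:self.offset + self.size]


class Tensor2D:
    """Toy row-major 2-d tensor backed by a flat Python list."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.storage = [0.0] * (rows * cols)

    def uniform(self, low=0.0, high=1.0):
        for i in range(len(self.storage)):
            self.storage[i] = random.uniform(low, high)
        return self

    def select_row(self, r):
        """Return a view of row r (0-based) that shares this storage."""
        return RowView(self.storage, offset=r * self.cols, size=self.cols)

    def row_values(self, r):
        return self.storage[r * self.cols:(r + 1) * self.cols]


a = Tensor2D(4, 6).uniform()
first_row_before = a.row_values(0)

b = a.select_row(2)    # third row, counting from zero
b.fill(3.0)            # modifies a's storage as well

print(a.row_values(2))
```

After the `fill`, the third row read through `a` contains only the value 3.0, while the other rows keep their random values.
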
Okay, those are the basics of tensors. I obviously won't cover the
whole tensor library because it has more than a hundred and fifty
tensor functions: you have math operations, convolutions, a lot of BLAS
calls, and tensor manipulation operations like narrowing, indexing,
masked selection, scatter, gather, logical operators and so on. And
it's fully documented; all the functions in Torch are nicely documented
with examples, which is what you'd expect from a nice library, I guess.
You also have inline help, both in iTorch as well as in the regular
Torch interpreter: you can ask for help by typing a question mark and
the function you're interested in, and it will give you the help
inline, with examples most of the time. Coming to the next part: I
talked earlier about the GPU support in Torch. It's extremely seamless;
it's exactly like using the CPU package, except that instead of using
torch.FloatTensor or torch.DoubleTensor you use a new tensor type,
torch.CudaTensor. torch.CudaTensor is a float tensor that sits on the
GPU. It also has all the math operations defined on it, and you can use
it exactly how you use the CPU tensor. Because tensor operations on
GPUs are usually faster, almost all the operations that you try to do
are faster, and the tensor size you create is only limited by the
amount of GPU memory you have, which is usually much smaller than the
amount of CPU memory you have on most systems. Okay, so that's a basic
overview of Torch tensors. I didn't go into a lot of detail because
once you get through the basics it's mostly syntax, similar to how you
use NumPy or MATLAB matrices, for example: you just look up the
functions you're interested in and then you use them. Next I want to
talk about training neural networks.
With neural networks, the way you train them, there are a lot of moving
parts. I just created a figure; some or most of the use cases can be
mapped into such a structure. You have most modern datasets, and by
modern I mean large datasets that don't fit in memory anymore. You
have, let's say, sixteen gigs or sixty-four gigs or two fifty-six gigs
of CPU memory, and you can no longer load your datasets into memory
like you load MNIST, even though research still carries on on MNIST.
For example ImageNet, the ImageNet classification dataset, is 1.2
terabytes, and for us at Facebook we consider that a small dataset. So
usually these datasets live on some kind of disk, either hard drives or
SSDs, or on some network file system, and then you have a data loader
that loads this data. Basically, you can ask for a mini-batch of
samples and it will, on the fly, load that mini-batch of samples,
preprocess them, augment them, do all kinds of color jitter and
cropping and all that, and then it will send that into some queue,
where your neural network trainer can pull the mini-batches off of that
queue and train your neural network with the cost function that you
specified and the optimization algorithm that you specified, like SGD
or Adam or RMSProp, for example. Usually these are multi-threaded or
multi-process, so the data loader sits in a separate thread or process
and your main thread computes the neural network. There are other
varieties of this as well. For example, if you are doing something like
learning embeddings, you have much smaller neural networks, and these
are not that much faster on the GPU than on the CPU, so one common way
to train these things is via Hogwild, and with Hogwild you would have
multiple of these neural networks replicated, sharing the same
parameters.
They're trained simultaneously, in parallel, asynchronously, without
any kind of synchronization, and that's Hogwild. In the most common
scenario you just have a single thread and a single neural network that
you're training; I'm going to cover that first. Okay, so coming to how
these structures actually map to Torch packages. The data loader,
especially having multiple data threads that have callbacks in the main
thread once they're finished and so on, is covered by the threads
package; we will go over that briefly. Then you have the trainer. In
Torch itself there is no notion of a trainer; the researcher just
writes their own training loop. This is not the most common approach,
but not uncommon either. In Theano, for example, everyone just writes
their own training loop, but in Caffe you would have a trainer that
takes your description of the neural network and the solver, and it
would run the whole thing. So Torch also tries to stay very raw: the
researcher, in like fifteen or twenty lines of code, writes their own
training loop, and that gives them the flexibility to change it in
weird ways if needed. In the third lecture we will see how this kind of
flexibility is very useful, for example when you're training
adversarial networks. The neural network and the cost function are
covered by the nn package, and we will go over it briefly.
And the optimizer is covered by the optim package; we will also go over
that, where we have a platter of gradient-based optimization
algorithms. Next, I'm starting with the nn package, then I'll go to
optim, and then lastly threads, because data loading is boring. So the
nn package: as I said in the last lecture, where I just briefly touched
upon it, the Torch neural network package has this notion of building
your neural networks as stacks of Lego blocks in various structures. So
we have what we call containers and we have modules. Containers are
these various structures that stack your modules in different ways; I
have a visualization coming up for that in a second. And modules are
basically the actual computations that you want. For example, in this
case a spatial convolution, that is, a convolution over images in 2D,
with three input feature maps, sixteen output feature maps and a
five-by-five kernel, is added to this sequential container, so that
it's stacked, and the tanh activation is added right after that. So the
input comes through the convolution, the output of the convolution goes
through the tanh, that goes into this max pooling, and so on. The
sequential container is basically just a linear container that passes
the input through all of these modules and then gives you the output of
the last layer of this container. This short example implements this
particular neural network using these few lines of code. And one thing,
if you're familiar with other packages and you come to Torch, one thing
to keep in mind is that we implement the softmax as LogSoftMax. We do
have a SoftMax layer, but as most people would know, the gradient
computations for softmax are unstable. So what most packages do is they
call the layer softmax and they give the output as the softmax, but
they compute the gradients of the log-softmax. We actually wanted to be
more transparent from the beginning, so you have a layer called
SoftMax, and if you want to shoot yourself in the foot, go for it, but
you also have a layer called LogSoftMax that does the right thing. We
all use LogSoftMax, for example, so if you look at the basic examples
or the basic tutorials, you would pretty much just use that.
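
One common way to see the numerical issue is to compare a softmax followed by a log against a log-softmax computed directly. The sketch below is plain Python under that assumption; it is not how Torch implements either layer, and the function names are chosen only for this illustration.

```python
import math


def naive_log_softmax(scores):
    """Softmax first, then log: exp() overflows for large scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


def log_softmax(scores):
    """Log-softmax computed directly, shifting by the maximum score."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

agree_on_small = all(
    abs(a - b) < 1e-12
    for a, b in zip(naive_log_softmax(small), log_softmax(small))
)

stable = log_softmax(large)
try:
    naive = naive_log_softmax(large)
except OverflowError:
    naive = None    # math.exp(1000.0) is out of range for a float

print(agree_on_small, stable, naive)
```

On moderate scores the two versions agree, while on large scores only the direct log-softmax returns a usable result.
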
So each of these modules in the nn package has a common interface that
it has to define. These three functions are the essential functions to
define, and modules can have custom functions that they use in other
ways. The three functions are updateOutput (there's a typo here),
updateGradInput and accGradParameters. updateOutput computes the output
given the input, so y equals f(x): you give it x, it computes f(x) and
gives out y. updateGradInput computes dy by dx, basically the gradient
of the output with respect to the input. And then there is
accGradParameters. The module that you define can be parametric or
non-parametric. Max pooling, for example, is a non-parametric module;
it doesn't have any parameters that it learns. But convolution, for
example, in convolutional networks, tries to change its filters in a
way that improves the loss of your network, and this is a parametric
module. So it has a set of weights and biases that are defined inside
it.
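
The three-function interface can be sketched in plain Python as follows. The method names mirror the ones named in the talk, but the classes `Module`, `Tanh` and `Scale`, and the list-of-floats data format, are assumptions made for this illustration and not the nn package itself.

```python
import math


class Module:
    """Hypothetical base class with the three functions from the talk."""

    def updateOutput(self, x):
        raise NotImplementedError

    def updateGradInput(self, x, grad_output):
        raise NotImplementedError

    def accGradParameters(self, x, grad_output):
        pass    # non-parametric modules have nothing to accumulate


class Tanh(Module):
    """Non-parametric: it only transforms its input."""

    def updateOutput(self, x):
        self.output = [math.tanh(v) for v in x]
        return self.output

    def updateGradInput(self, x, grad_output):
        # d tanh(x) / dx = 1 - tanh(x)^2, chained with grad_output.
        return [g * (1.0 - o * o) for g, o in zip(grad_output, self.output)]


class Scale(Module):
    """Parametric: y = w * x with a single learnable weight w."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def updateOutput(self, x):
        return [self.w * v for v in x]

    def updateGradInput(self, x, grad_output):
        return [self.w * g for g in grad_output]

    def accGradParameters(self, x, grad_output):
        # Gradients are accumulated (+=), not overwritten.
        self.grad_w += sum(g * v for g, v in zip(grad_output, x))


scale = Scale(2.0)
x = [1.0, 2.0]
y = scale.updateOutput(x)
grad_in = scale.updateGradInput(x, [1.0, 1.0])
scale.accGradParameters(x, [1.0, 1.0])
scale.accGradParameters(x, [1.0, 1.0])    # second call adds to the first

tanh = Tanh()
tanh_out = tanh.updateOutput([0.0])
tanh_grad = tanh.updateGradInput([0.0], [5.0])
print(y, grad_in, scale.grad_w, tanh_out, tanh_grad)
```

The parametric `Scale` module accumulates a gradient for its weight, while `Tanh` inherits the empty `accGradParameters` because it has no parameters.
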
So for a layer like max pooling you don't need to define
accGradParameters, so you can just leave it as is, but if you do have
parameters in your module you would want to define this function, which
computes the gradients with respect to the parameters that you have in
your module. Also part of the nn package are loss functions, like mean
squared error loss, negative log-likelihood loss, margin loss and so
on. The loss functions have a similar interface: they have an
updateOutput and an updateGradInput. Since loss functions are not
parametric, you don't have accGradParameters there.
Coming to the containers: the nn package has several containers. The
most common, which you saw earlier, is Sequential. The modules in the
white box there take the input, say an input of four channels, and it
just sends the input through them in order and then spits the output
out. Then there's the Concat container, which takes the input and sends
the same input to the two pipes that it has. Concat basically creates a
pipe for every module added to it. So what this structure here actually
is: this whole thing is a Concat which has two Sequentials, and the
Sequentials themselves have four layers. So the Concat module here has
two pipes; the same input goes through each of these Concat pipes, and
then it has separate outputs that are then concatenated together to
give you a single output. Then there's also the Parallel container.
Let's say you're given an input of two channels and you have two
Sequentials that are added to it as two pipes. It gives each of these
channels to each of these separate pipes, and then it gets outputs that
it concatenates together and sends to the next layer. As you've seen
already in these cases, a container can have other containers inside
it; you can basically compose these things in a very natural way. You
can compose complicated networks like ResNet or GoogLeNet just using
these three containers. You can actually create the weird structure
that GoogLeNet is in very few lines of code.
Getting to the CUDA backend of the neural network package: as I showed
earlier, using CUDA for Torch tensors was very natural, you had to
change one line. Similarly, the nn package is equally natural to use.
If you have a model that you defined, to actually transfer the model to
CUDA all you have to do is call :cuda() on the model, and it
automatically now sits on the GPU and it expects its inputs to be CUDA
tensors that also sit on the GPU. And now this model's forward, which
computes updateOutput, is done on the GPU. So it's very easy and very
natural to use; you never feel like you're doing something special for
the GPU. Next comes the nngraph package. I only have one slide on this
because there's not much to the nngraph package, but it's very, very
powerful. The nngraph package introduces composing neural networks in a
different way: instead of composing them in terms of containers and
modules, all you have to do is chain modules one after the other. An
example is probably the best way to showcase this.
In this example, let's say you want to create an nngraph that is a
two-layer MLP with the tanh non-linearity. What you do is you create
some dummy input layer, just as a best practice; this is the actual
layer, nn.Identity, open-close bracket. nngraph basically overloads the
call operator, so that the second bracket tells you what it is
connected to. The input here is not connected to anything else; it is
the first layer in your neural network, so it just has an empty
bracket. Now coming to the next part: you create the first hidden
layer, where you have a tanh layer that is connected to a linear layer
that is connected to input, which is this identity layer. This whole
thing is created in one shot, and this is the first hidden layer. Then
you create the next linear layer, which connects to h1 here, which is
the first hidden layer, and that gives you the output.
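
The idea of overloading the call operator to record connections can be imitated in plain Python. This is a loose sketch, not nngraph: the classes `Layer` and `Node`, the helper `chain`, the scalar "linear" functions and the restriction to a single parent per node are all assumptions made to keep the example short.

```python
import math


class Node:
    """A layer placed in the graph, together with the nodes feeding it."""

    def __init__(self, name, fn, parents):
        self.name, self.fn, self.parents = name, fn, parents

    def forward(self, x):
        # A node without parents receives the network input directly.
        if not self.parents:
            return self.fn(x)
        return self.fn(self.parents[0].forward(x))


class Layer:
    """Calling a layer with its parent nodes returns a connected Node."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def __call__(self, *parents):
        return Node(self.name, self.fn, list(parents))


def chain(node):
    """List layer names from the input node to the given node."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parents[0] if node.parents else None
    return names[::-1]


# Empty call: the input node is not connected to anything.
inp = Layer("identity", lambda v: v)()
# tanh connected to a linear layer connected to the input, in one shot.
h1 = Layer("tanh", math.tanh)(Layer("linear1", lambda v: 2.0 * v + 1.0)(inp))
# The next linear layer connects to h1 and gives the output.
out = Layer("linear2", lambda v: 3.0 * v)(h1)

result = out.forward(0.0)
order = chain(out)
print(order, result)
```

The second pair of brackets in each line plays the role described in the talk: it states what the freshly created layer is connected to.
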
Then, when you want to create your nngraph, you just define the input
and the output modules that you want your nngraph to map to. So you
create what we call an nn.gModule, short for graph module, where in the
first set of parentheses you give all the inputs to your neural
network, and in the second set of parentheses you give all the outputs
you want from the neural network. And the MLP is created, and you use
it exactly how you used the previous nn modules. It has the same
interface, everything is the same; all it does is it basically looks at
what's connected to what, and it creates a computation graph there.
There's not much else to nngraph, to be honest. It has some useful
things: using graphviz you can actually create a visualization of your
graph, where, if you have a very complex graph, it would be useful to
see what's going on in the graph. So you create an SVG file that shows
the structure of your graph, with descriptions of each layer in your
graph and how they're connected. You also have a mode where, if you
have an error at runtime in your neural network, nngraph can
automatically spit out an SVG file with the whole graph structure, and
mark with a different color the node where the error occurred, in case
you want to visually see which of your neural network modules actually
failed, that is, had some runtime error. Apart from that, nngraph is
very basic, and very useful for creating complicated things like weird
LSTMs or other recurrent modules. Then I come to the
optim package. The optim package is written in a way where it knows
nothing about neural networks. optim basically just has a bunch of
optimization algorithms, including non-gradient-based algorithms like
line search algorithms. It basically wants a function y = f(x, w),
where w are the parameters of your system and x is the input. Actually,
it doesn't even care about x, the input; it just wants y = f(w). In
this small example here I'm just showing the interface that optim
takes. You can have a config that defines all the parameters of your
optimization. And for each of your training samples you can create a
function that computes f(x). Then you can pass that to optim.sgd, in
this example where you're doing stochastic gradient descent: you pass
the function that computes f(x), you pass x, which is the parameters of
your system that you're trying to optimize, and the configuration, and
optim.sgd will run on this function. It's decoupled from neural
networks for a very good reason: we wanted to write the optim package
to be very generic, like a black-box optimizer that you can just plug
into other places as well. The optim package has a wide range of
algorithms implemented: your standard stochastic gradient descent,
averaged SGD, L-BFGS, conjugate gradients, Adadelta, Adamax, RMSProp,
FISTA with line search (this is an interesting one), Nesterov's SGD,
Rprop, and most recently CMA-ES. I haven't figured out what the full
form is, but some guy contributed this very recently. My favorites here
are SGD and Adam and RMSProp; they're kind of nice. Everything else
I've only used in passing. So how does the optim package
work for neural networks itself? In the nn package we have a powerful
function called getParameters. Your network has several modules, and
each of them can be parameterized. Let's say you had three
convolutional layers; each of them has its own parameters that map to
separate memory regions, since they called their own mallocs, and
they're just sitting in different parts of memory. What getParameters
does is, when you call it, it maps all of the parameters of your
current neural network onto a single contiguous storage, and then
remaps the tensors of each of these layers onto that storage using the
offsets and the strides. What that gives you is a single vector that
you can pass to your optimization package, and the optimization package
doesn't have to know whether it's a neural network or anything else. It
just wants a vector of parameters that it wants to optimize. So the nn
package has this call, getParameters, that will do that for you: it
will remap all your parameters to a single vector that you can then
pass into the optim package. This is probably the only hairy detail in
the whole nn and optim thing, but it's a very important detail, and
several people have shot themselves in the foot in the past using this.
Okay, now let's
actually look at how the same example I've given is going to map to a
neural network. You want to define this function f(x), where x is the
parameters of your network. You want to compute the neural network
gradients, df/dx, which are the gradients with respect to the weights,
and return them, and then the SGD step is done after that. So as an
example here, let me call my function feval. It basically computes
f(x, w). It selects a training example; it could load the training
example, let's say select the training example at random, but over here
it actually takes the next training example in this table called data.
The inputs are sample[1] and the targets are sample[2]. Inputs are the
inputs to your neural network; targets are what you want the output to
be, or what your loss function expects in order to compute the loss.
You first zero the gradients with respect to your weights, because if
you had a previous optimization step the gradients are sitting there,
accumulated already. So you just zero the gradients. The gradients are
accumulated in all of the neural networks to accommodate batch methods
when you can't compute the batch in one shot, so you just compute a
large batch sample by sample and the gradients get accumulated there.
This is very useful when you're doing memory-hungry methods of
optimization, especially. So what you do here, after you reset the
gradients, is you call criterion:forward, where the criterion is your
loss function, on the result of model:forward(inputs);
model:forward(inputs) returns the output. So the loss function takes
the output of your network and the target that you want it to be, and
it computes a loss. Then you call model:backward(inputs, gradients).
The model's backward, which computes both updateGradInput and
accGradParameters in one shot, takes the inputs and the gradients with
respect to the output. So the gradients with respect to the output are
given from the backward call of your loss function, and those are
passed in as the second parameter, with inputs as the first parameter.
That basically will accumulate into dl_dx the gradients with respect to
the weights, and then you return the loss that you computed and dl_dx,
which is the gradients with respect to the weights. That closure that
you just defined is called feval. Right, so you define your SGD
parameters: in this case, because you're doing SGD, learning rate,
learning rate decay, which is how much the learning rate has to drop
off per sample, weight decay, momentum. And for all epochs in your
training loop, for all the mini-batches you have, you just call
optim.sgd with that function, x, which is the parameters of your
network, and the SGD configuration parameters, which specify learning
rate and so on. And that's it. Of the return values here, one of them
is the loss; you just accumulate that and print it out to make sure
that your model is going down in loss. If your model goes up in loss,
that's not nice. That's the optim package.
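
The closure-based interface can be sketched in plain Python as below. This is a simplified stand-in and not the optim package: the function `sgd`, the config keys, the tiny dataset and the two-parameter linear model are assumptions for illustration. What it keeps from the talk is the shape of the loop: a closure that picks the next sample and returns the loss and the gradient, a flat parameter vector, and an optimizer that knows nothing about the model.

```python
def sgd(feval, x, config):
    """One SGD step; feval(x) returns (loss, gradient). Updates x in place."""
    loss, grad = feval(x)
    n = config.get("eval_counter", 0)
    decay = config.get("learning_rate_decay", 0.0)
    lr = config["learning_rate"] / (1.0 + n * decay)
    for i in range(len(x)):
        x[i] -= lr * grad[i]
    config["eval_counter"] = n + 1
    return x, loss


# Toy data generated from y = 2 * x + 1, as (input, target) pairs.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
params = [0.0, 0.0]              # flat parameter vector: [w, b]
config = {"learning_rate": 0.05}
state = {"next": 0}              # which sample feval takes next


def feval(x):
    """Take the next sample, return its loss and the gradient w.r.t. x."""
    inp, target = data[state["next"] % len(data)]
    state["next"] += 1
    w, b = x
    output = w * inp + b                  # "model forward"
    diff = output - target
    loss = 0.5 * diff * diff              # "criterion forward"
    grad = [diff * inp, diff]             # gradients w.r.t. w and b
    return loss, grad


epoch_losses = []
for epoch in range(500):
    total = 0.0
    for _ in data:
        _, loss = sgd(feval, params, config)
        total += loss
    epoch_losses.append(total)

print(params, epoch_losses[0], epoch_losses[-1])
```

Because `sgd` only sees a function and a vector, the same loop would work for any model whose parameters can be presented as one flat vector, which is the role the talk assigns to getParameters.
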
It might be a little dense, but it's really, really powerful, and if
you don't understand it, at the end of lecture two or three we will be
pointing you guys to links to three notebooks that you can go home and
work on in your own time. They will have comments; they will take you
through the basic examples of how to do things. And lastly, the threads
package.
So don't laugh at the next slide; there's a small thing there that's
funny. It's mostly an accident. We created the threads package, and at
some point I was writing example code for myself on how to do data
loading using the threads package, and I called those data threads
donkeys. I first open-sourced it, and I never actually looked into why
I called them donkeys, but many people in the Torch community now
actually call data loading threads donkeys. So the examples here are
just screenshots from my example, so they might have a variable called
donkeys. So basically the way the threads package works is it creates
thread pools. You can submit arbitrary functions to this thread pool,
and that function will get executed in one of the threads in that
thread pool, and you can also specify a return callback that executes
in the main thread once the thread finishes its computation. The way
you create these threads is actually very simple: you just ask for as
many threads as you want. You have some initialization functions that
have to be run when the thread is initialized; this can be loading the
functions that you will call later, and so on.
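
The pattern of submitting a job to a pool and running its callback back in the main thread can be sketched with the Python standard library. This is an analogy, not the threads package: the class `Pool` and its methods `add_job`, `synchronize` and `shutdown` are hypothetical names, and here the callbacks run when the main thread calls `synchronize`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Pool:
    """Toy thread pool whose callbacks run in the thread that synchronizes."""

    def __init__(self, n_threads, init=None):
        # `init` is run once in every worker thread when it starts.
        self._executor = ThreadPoolExecutor(max_workers=n_threads,
                                            initializer=init)
        self._pending = []

    def add_job(self, job, callback=None):
        """Run job() in a worker; remember the callback for later."""
        self._pending.append((self._executor.submit(job), callback))

    def synchronize(self):
        """Wait for all jobs, calling each callback in the calling thread."""
        pending, self._pending = self._pending, []
        for future, callback in pending:
            result = future.result()      # blocks until the job is done
            if callback is not None:
                callback(result)

    def shutdown(self):
        self._executor.shutdown()


main_id = threading.get_ident()
loaded = []              # sample ids, in the order callbacks ran
worker_ids = []          # thread ids the jobs ran on
callback_ids = []        # thread ids the callbacks ran on


def make_job(sample_id):
    # Stand-in for "load and preprocess one mini-batch".
    return lambda: (sample_id, threading.get_ident())


def on_loaded(result):
    sample_id, worker_id = result
    loaded.append(sample_id)
    worker_ids.append(worker_id)
    callback_ids.append(threading.get_ident())


pool = Pool(2)
for i in range(4):
    pool.add_job(make_job(i), on_loaded)
pool.synchronize()
pool.shutdown()
print(loaded)
```

In this sketch the jobs run on worker threads while every callback runs on the main thread, which is where a training step would consume the loaded batch.
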
there is a mode called shared serialise
which is very powerful in the threads
package what this does is it shares all
the ten serious between threads between
the main thread and all the worker
threads. And this this this is really
powerful because one when you're
returning cancers from your thread
spread bore to your main threat you
don't have to amend copy. It's all very
seamless you don't have to this
serialisation or D sterilisation. And
if you want to do Hogwild training, you
can basically just create a thread
pool, pass in your neural network to
each of your threads, and the network
will automatically be shared among all
threads, and you can write your
training inside your thread and it'll
be asynchronous Hogwild. And it's very
fast, with, like, zero overhead: you
don't have to collect the parameters to
a parameter server, do the update and
send them back, and so on. Um, this is
the creating of the threads, and the
slide here showcases how you use the
threads. There's one function that's
the most important: it's called addjob.
The addjob function takes an arbitrary
closure that you can define. Um, you
just define a computation that you want
to do, and that's the first argument,
and the second argument is a callback
that is run in the main thread once you
finish doing this computation in the
data thread. In this case, for example,
what I did was: the data thread is
basically loading a training sample of
batch size, and that function here
returns inputs and labels, and it
returns inputs and labels back to the
main thread. And in the main thread,
the closure that I defined separately,
the function here, has the inputs and
the labels, sitting on the CPU, that
come in, and these are just some data
logging how much time it's taking to
load the data, and so on. And you
see here the inputs CPU and labels CPU,
sitting on the CPU; they're float
tensors. The inputs and the labels here
are CUDA tensors. I copy over the
contents of the float tensors to the
CUDA tensors, to transfer them to the
GPU. I define my feval, which is: zero
the gradients with respect to the
parameters; forward the inputs to the
model, get the outputs; forward the
outputs and the labels to the loss
function, get the error; and then
compute the gradients with respect to
the outputs of the neural network and
pass them into the neural network
itself. And then return the loss and
the grad parameters of the neural
network. And this is defined elsewhere:
when you call getParameters on your
neural network, the parameters and the
grad parameters are returned. And then
finally I call optim.sgd here; you can
replace SGD with your favourite
algorithm for optimisation. So that's
it, and basically we went through,
piece by piece, a complete training
loop for almost all cases of neural
networks, except, like, you know, if
you do weird stuff like adversarial
training or something; for almost all
supervised cases at least. Um, so we
went through all the examples, and the
last slide there was basically what
we're gonna cover in the next lecture.
After Yoshua Bengio's generative
models, we will do a complete example.
It's only a hundred and sixty
lines-ish, so don't feel intimidated by
a complete example. We will do a
complete example of using nn, optim and
threads for image generation. We will
briefly look at the autograd package
and how to use it. And then I will
finally talk about torchnet, which is a
small helper framework that is sitting
on top of torch, nn and all these
things, to abstract away common
patterns that you do, like, for
example, data loading, data
augmentation. All this stuff is
basically code that you copy-paste from
one script to another, and torchnet
kind of implements these for you in a
nice way, and I'll briefly talk about
that. That's the end of this session,
and if you have questions I'll take
them now. I did promise you this
lecture was gonna be boring. There's a
question there.
No? Okay. So I just wanted to know how
easy it is to use the layers you
created through nngraph, and to mix
them with either Sequential or Parallel
or Concat. So it's extremely natural
when you create the gModule in the
nngraph package. So once you create...
wait, no, it's the next one. Okay, once
you create the gModule here, this thing
now can be added; it's a standard
layer, it can be added to containers
and so on. Okay. When you say it's a
standard layer, are the parameters
distinct if you add it multiple times,
or are the parameters shared? If you
add the same thing multiple times, the
parameters are not distinct. Okay. But
also the state is not distinct, and
usually you wouldn't wanna do that.
What you can do is you can call clone
on this thing, and that will create a
replica. Okay, thank
you. All right, and thanks for the
talk. What do you normally do for
testing and debugging? So before, like,
I used to just use this debugger called
MobDebug. It's open source; it's
installable via the package manager.
Um, these days I'm using the
fb.debugger package, that's also open
source. Um, and usually the fb.debugger
package has a mode where, if you hit an
error, it will automatically go into
the debugger, and you can then, like,
see what's wrong, for example. And
that's very useful. And for testing, I
just write unit tests for, like, all of
my layers as well. Okay, thanks, thanks
a lot.
So, I mean, I'm a Theano user, and what
I find cool is the transparency between
the CPU and GPU. You know, you can run
your code on CPU, just check for
dimensions and stuff, and then, if you
have a heavy workload, ship it to the
server. Is there anything like this,
where it just works, or do you have a
bunch of if statements? Um, you don't
have to do an if statement. You can
write one single function called cast
that will, like, cast your whole
network to the GPU. Or we have another
function called
torch.setdefaulttensortype, with which
you can set the default tensor type
that you do operations in, and you can
first check it on CPU by setting the
default tensor type to, for example,
FloatTensor. And then you can switch it
to GPU, and then, like, all the
declarations that you do with
torch.Tensor, like by default in the
neural network, they get created with
the GPU tensor type. Thanks.
But what about the memory movement, the
transfer from the CPU to the GPU? I
mean, do you leave that responsibility
to the users? Yes. Okay. It's, as I
showed in the last example... yeah,
here it is. Yeah, so as I showed here,
the CPU tensors are not automatically
transferred. Like, if you give a CPU
tensor as an input to your GPU module,
for example, it will just error out.
You have to transfer them yourself to
the GPU. There's no ambiguity; there
are no, like, debugging issues there.
Okay, so, no more questions? We're
already done, so again...
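
The job-plus-callback pattern described
for the threads package in this talk (a
closure that runs on a worker thread,
and a callback that runs in the main
thread once the job is done) can be
sketched roughly as follows. This is a
Python analog built on the standard
library, not the Torch threads API:
the names JobPool, add_job, synchronize,
load_batch and on_batch are hypothetical
and exist only for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class JobPool:
    """Hypothetical helper: jobs run on worker threads, callbacks on the caller's thread."""

    def __init__(self, n_threads):
        self._executor = ThreadPoolExecutor(max_workers=n_threads)
        self._pending = {}  # maps each submitted future to its callback

    def add_job(self, job, callback):
        # The job (first argument) is executed by one of the worker threads.
        future = self._executor.submit(job)
        self._pending[future] = callback

    def synchronize(self):
        # Called from the main thread: wait for jobs and run each callback here,
        # passing it whatever the job returned.
        pending, self._pending = self._pending, {}
        for future in as_completed(pending):
            callback = pending[future]
            callback(*future.result())

    def close(self):
        self._executor.shutdown(wait=True)


def load_batch(index, batch_size=4):
    # Stand-in for data loading: returns (inputs, labels) for one batch.
    inputs = [float(index * batch_size + i) for i in range(batch_size)]
    labels = [int(x) % 2 for x in inputs]
    return inputs, labels


main_thread = threading.current_thread()
seen = []


def on_batch(inputs, labels):
    # Runs in the main thread; a real training step would go here.
    seen.append((inputs, labels, threading.current_thread() is main_thread))


pool = JobPool(n_threads=2)
for i in range(3):
    pool.add_job(lambda i=i: load_batch(i), on_batch)
pool.synchronize()
pool.close()
```

Running the callback on the caller's
thread, rather than on the worker, is
the design choice that keeps the
training step single-threaded while
data loading happens in the background.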

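The training step walked through in the
talk (zero the gradients, forward the
inputs, forward the outputs and labels
through the loss, backpropagate, return
the loss and the gradients, then hand
that closure to an SGD routine) can be
sketched as below. This is a minimal
Python stand-in under stated
assumptions, not optim.sgd itself: the
model is a toy one-weight linear
function on plain Python floats, there
is no GPU transfer, and the names
make_feval and sgd_step are
hypothetical.

```python
def make_feval(inputs, labels):
    """Build a closure that returns (loss, gradients) for one batch."""

    def feval(params):
        w, b = params
        grad_w, grad_b = 0.0, 0.0        # zero the gradient accumulators
        loss = 0.0
        n = len(inputs)
        for x, y in zip(inputs, labels):
            out = w * x + b              # forward the input through the model
            err = out - y
            loss += err * err / n        # forward output and label through the loss (MSE)
            d_out = 2.0 * err / n        # gradient of the loss w.r.t. the output
            grad_w += d_out * x          # backward through the model
            grad_b += d_out
        return loss, [grad_w, grad_b]

    return feval


def sgd_step(feval, params, learning_rate):
    """One plain SGD update; returns the new parameters and the loss before the update."""
    loss, grads = feval(params)
    new_params = [p - learning_rate * g for p, g in zip(params, grads)]
    return new_params, loss


inputs = [0.0, 1.0, 2.0, 3.0]
labels = [1.0, 3.0, 5.0, 7.0]            # generated by y = 2x + 1
params = [0.0, 0.0]
feval = make_feval(inputs, labels)

losses = []
for _ in range(200):
    params, loss = sgd_step(feval, params, learning_rate=0.05)
    losses.append(loss)                  # accumulate the loss to check it goes down
```

Keeping the loss and gradient
computation inside one closure means
the update rule in sgd_step could be
swapped for another optimiser without
touching the model code.
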
Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
