Note: this content has been automatically generated.
Okay, so we are starting the last session [inaudible], about
TensorFlow again. [inaudible]
Okay, so I guess I'll start the third talk. Just as usual, feel
free to interrupt me if there are any questions; we will have
time for questions at the end of this talk and also at the panel.
And please shout out if there are technical problems, if you
can't hear me or can't see something on the
slide. So let's start talking about
neural networks in TensorFlow, because the example that we have
had so far was very good as a pedagogical example, but you
wouldn't really use that in practice, and I'm going to show you
some examples of what you can actually use at home or at work
with TensorFlow.
So if you're here you probably know what neural networks are,
and especially deep neural networks. They usually look like
this. The idea is this hierarchical structure: you always have
this input layer into which you feed the input, in this case the
picture of, I hope I have that right, [inaudible]. Okay, so
obviously I'm as good as the network. Perfect. So now, once you
have the input, you always propagate information forward, and
then during learning the gradients flow backwards. So this is
kind of the thirty-second introduction
to neural networks, and now we're here
to discuss how neural networks can be used in TensorFlow. You
have three options, depending on how flexible you want to be and
how much work you want to put into this. You might be able to
use already trained models; this is very easy, and for this you
don't actually have to spend that much computational power. Or
you might be able to retrain one layer of an already trained
model; we will look at that also in a bit. Or you can use
high-level APIs built on top of TensorFlow. Or you can define the
entire neural network directly in TensorFlow ops, as we saw with
the linear regression example. For each of these I will go into
detail, and I actually have a notebook to see exactly what they
entail, so that you get a feel for which particular one you would
want to use. So if you want to use already trained models you
only have to do two steps: one, you load the graph of the trained
model, and two, you start
doing inference. So the steps look
exactly like this. It's not that much code, as you can see. The
first part just loads the model from a path that is given. The
only thing that is new here is that we have import_graph_def.
Before, in the example in the last lecture, I always defined the
graph and used it in the same program, right? But in this case I
don't define the graph in this program; I actually just use it
here. So I load what someone else has defined, and we are able to
load it easily. Now we want to use this graph, and here again two
things are different, for the same reason: I don't have the code
that defines the graph here, but I have the graph itself. As we
saw in the last lecture, when you want to run operations on the
graph with TensorFlow you have to initialise a session; this has
not changed. But now, instead of calling session.run with the
variable that you have defined in the same program, you have to
ask the graph to give you the tensor for a particular name.
Tensors and operations are associated with a particular name that
the programmer can define. And here you can actually query: I
want the softmax, because this is what you use to get the
probability distribution out of a supervised model, for example,
which is what we will look at right now. And
then in session.run, again, for the placeholder, this is the same
feeding syntax that we used before, when you have to tell
TensorFlow what it should run through the graph. Here again we
can't say "this is the placeholder that we had before", because
that's not defined in this program, but we can give the name of
the placeholder and just call session.run as before, and this
does exactly what you would expect. So this is very, very short
and works quite well. I actually have a short example for this.
[inaudible] This example works with ImageNet. These are very big
networks that are trained to recognise object categories from
images. And with TensorFlow there's actually a very small script
that downloads the model for you and does inference, and we're
going to look at some examples so that you also get an intuition
for how well it works and
how easy it is to use. So let me start
running it. This image is the stock image that comes by default
when you want to run the example; I'm going to try that one
first. So right now I'm running the code, and it is actually
downloading the model and then running the inference, and we
will see what the results are. As we expect, the returned labels
are giant panda, panda, panda bear and so on. Looks right; I'm
not a zoologist, but this looks like a panda to me. Now let's
move to another image. The previous one is probably part of the
test set of ImageNet; this one is a picture that I personally
took. And I'm going to show you exactly how to do the same thing
for this picture. In this case, with TensorFlow, you just have
to specify the image file as
an argument, and I'm running it again.
So the output of the classification is tabby, tabby cat. And if
you're like me, before running this you didn't know what that
is, but this is actually a tabby cat because of the stripes, so
this is correct. But you might actually want [inaudible] to
clarify that a little bit. So this also works. I'm now also
going to show you another example, again my own picture. It's
not that I really like the Golden Gate Bridge, but this is
something that the network has not seen before, right? It's not
part of any dataset as of yet. So I'm just going to start
running this, and this is an example where the network makes a
mistake, but some sort of reasonable mistake, I would claim. So
right now it's running the inference, and it's loading the graph
exactly like in the code that I showed before. And I think this
is "pier". It's not actually a pier; the second suggestion is a
suspension bridge, which is correct, but the first suggestion is
not really right. So as you can see, it kind of gets the gist of
it, but it's
not really correct. Now how hard is
it to do this? You can clearly see that you can use this right
away: we're doing it live, it takes very little time, and I
didn't have to train the model. But how hard is it to actually
write this classify_image script? Obviously this is part of
TensorFlow and it is open source, so you can have a look at it.
This file is two hundred lines, which contain downloading the
model, unpacking the model, doing inference, and a nice
transformation from the probabilities that the model gives you
to the English labels, because the model does not give you the
strings, right? The model gives you some tensor back. There is
also some boilerplate for imports and comments, and
the entire thing is two hundred lines.
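The "probabilities to English labels" step mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not the code of the script shown in the talk: the function name `top_k_labels`, the label list and the probability values are all made up for the example.

```python
def top_k_labels(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


# Hypothetical label list and model output, for illustration only.
labels = ["tabby cat", "giant panda", "pier", "suspension bridge"]
probabilities = [0.05, 0.80, 0.05, 0.10]

for label, p in top_k_labels(probabilities, labels, k=2):
    print(f"{label} (score = {p:.2f})")
```

The model only returns a vector of numbers; the mapping from index to human-readable string has to be kept next to the model, which is part of what the two hundred lines take care of.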
So it's very, very easy to do this with the already trained
models. [inaudible] Right, so you want to do that in the case
where the labels that you want are exactly the labels that come
with ImageNet, right? That is, if you want to classify some of
the labels that are already there, or if you don't have your own
dataset. But let's imagine that you want to classify flower
species, or you are really passionate about finding out whether
in a picture there's a dog or a cat. In that case you might not
want to just use the model as is; you might want to tweak it a
little bit. The good news is that you actually don't have to
retrain the entire model, or train a whole one only with your
data. You can take advantage of the features that the big model
learned using the ImageNet dataset and just tweak the last
layer, so just tweak the classification layer of the model, so
that you learn the classes that you want to see. This is also
very easy, and I hope it's very clear why you would want to do
this: because it saves you a lot of computational power. Again,
you are not doing all the computationally intensive task of
learning "this is kind of the distribution of natural images,
this is what I should expect to see as input"; you are just
tweaking the last part that does the classification.
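The idea of keeping the learned features fixed and training only the final classification layer can be sketched as follows. This is a toy stand-in in plain Python, not the TensorFlow retraining example from the talk: `frozen_features` is a hypothetical placeholder for the pretrained layers, and the data and hyperparameters are invented.

```python
import math


def frozen_features(x):
    # Stand-in for the pretrained layers: fixed, never updated in training.
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_last_layer(inputs, labels, steps=200, learning_rate=0.5):
    """Fit only the final (logistic) layer on top of the frozen features."""
    features = [frozen_features(x) for x in inputs]  # computed once
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for f, y in zip(features, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            error = p - y  # gradient of the cross-entropy w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += error * f[j]
            grad_b += error
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias


def predict(x, weights, bias):
    f = frozen_features(x)
    score = sum(w * v for w, v in zip(weights, f)) + bias
    return 1 if score >= 0.0 else 0


# Made-up two-class data: label 1 when the first value is larger.
inputs = [(2.0, 1.0), (3.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
labels = [1, 1, 0, 0]
weights, bias = train_last_layer(inputs, labels)
print([predict(x, weights, bias) for x in inputs])
```

Because the features are computed once and never updated, each training step only touches a handful of parameters, which is where the saving in computation comes from.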
And there's also an example of how to do this, but basically it's
one command line with TensorFlow. So if you have an image
classification problem, you can just use your data and retrain
the last layer of an Inception model, and do it in one line;
that's pretty cool. Something I want to mention here: there's a
lot of documentation for all of these, so in case you decide on
one of these three methods that I'm explaining now, there are
plenty of resources online for you to look at. Now let's see the
second use case: you've looked at this, but you think that using
an already trained network is not really for you. You have a
very specific use case, so you want to try something else. And
you say: I want to build my own network in TensorFlow, but I
want to do it very fast and not spend a lot of time on it, or I
don't really want to tweak the accuracy to the zero point zero
one. Well, then you can actually use TF Learn, which is a
high-level API for TensorFlow. It is part of TensorFlow, so when
you install TensorFlow you also get TF Learn.
And it implements the scikit-learn API. Those of you familiar
with Python might know that this is a very popular Python API
for machine learning, which implements a lot of models,
including SVMs, random forests and so on. And TF Learn, with the
scikit-learn API, allows you to create this computational graph
that I was talking about before for neural networks, for example
feed-forward networks or convolutional neural networks, in one
line. So this is pretty cool and very easy to use. It looks
exactly like this. This classifier holds the computational
graph that we saw before. So now I'm
actually going to show you an example of this, a live example
that looks at digits; there's no machine learning tutorial
without putting up MNIST, right? So I'm going to start running
the code and I'm going to explain to you, one bit at a time,
what I'm doing. The first thing, the usual thing, is that we do
all the necessary imports. And here I'm actually getting the
data and doing some preprocessing. I wanted to make a point of
using the scikit-learn API to get the MNIST data here, because
we're operating under the assumption that we like the
scikit-learn API and want to use that. I'm downloading the MNIST
original data, and this one comes with each pixel going from
zero to two five five, which is not really great for neural
networks; it is better if the values are scaled. So in this case
I'm dividing each pixel by two five five so that they are
between zero and one, and then I'm splitting the data into
training and testing. So this is pretty straightforward. In case
someone here is not familiar with this, this is what it looks
like: twenty-eight by twenty-eight images. I can also run it a
few times to see multiple examples; these are just
random images from the training set.
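The preprocessing just described (scale pixels from the 0 to 255 range into 0 to 1, then split into training and test sets) can be sketched in plain Python. The function names, the 20% test fraction and the tiny made-up "images" are assumptions for illustration; the notebook in the talk used the scikit-learn API for this.

```python
import random


def scale_pixels(images):
    """Map integer pixel values in [0, 255] to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in image] for image in images]


def train_test_split(images, labels, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, images), pick(train_idx, labels),
            pick(test_idx, images), pick(test_idx, labels))


# Tiny made-up "images" of 4 pixels each, instead of 28 x 28 MNIST digits.
raw = [[0, 64, 128, 255] for _ in range(10)]
targets = list(range(10))
x_train, y_train, x_test, y_test = train_test_split(scale_pixels(raw), targets)
print(len(x_train), len(x_test))
```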
And this is the code that does the creation of the graph. This
is four lines, and that is because I wanted to be trendy and not
use the default optimiser, so actually you can remove three
lines here. So you can do it in one line if you want. And I'm
just going to go ahead and create this model, and now I'm going
to train the model. It's going to take around one or two
minutes, in which you have time to ask me questions. So now I'm
just training the model. This is how long it takes, just for
reference, to do everything that you need, including
downloading the data,
processing it, visualising it a little bit.
And so the difficult part is four lines, and now the model is
training. And while it's doing that you have the option to ask a
couple of questions. [Audience question, inaudible] Not for this
example; I have that for another example coming along.
[Audience question, inaudible] Sorry, so you don't need to
handle sessions any more when you use the scikit-learn API; no,
it does that for you. It does that under the hood. So this first
part creates the classifier, but the second part deals with
creating the execution environment. [Audience question] And what
if I want to run this on a GPU? I think there are options to
specify this, yes. I mean, no one does things without being able
to run on the GPU any more. So come on, learn how to recognise
the digits faster, you machine. Anyway, it's going to be done
very soon, and then I'm actually going to show you a bit of what
kind of mistakes it makes, because I
think that's pretty interesting.
So, any other questions? [Audience question] Maybe one: what is
the model? This is a completely fair point. This is just a
feed-forward classifier with two layers and a hundred hidden
units each [inaudible], but you can also equally easily define a
convnet. This is just a feed-forward neural network with the
ReLU activation. And it's done. Now I'm going to run and see how
well it did. I had the result ready because I ran it before, but
the point is that it's pretty good. It's actually making only
two mistakes in a hundred examples, after training for about the
time it takes to ask a couple of questions. So it's very easy to
use and gives relatively good results. Obviously this is not
state of the art or anything, but I just wanted to give a clear
example. And now let's look at how well it does on a random
image in the test set. I'm just going to do it from here, so
that you also get an idea of how easy it is later to look at
these results and visualise things, because, again, as we are in
Python you can use all your favourite plotting libraries. So
let's get a random test example and plot it. This one is
correct: it says it's a one. And I mean, you should expect this:
if it's wrong two out of a hundred times, when I pick one at
random it's probably going to be correct, right? So here we see
an example of a correct classification. But what if I want to
see one where it does something wrong, to see whether it makes
mistakes on very trivial examples or on
tricky examples? I'm going to
do this in a not so efficient way, but I'm just going to check
where these two vectors are equal, and I'm going to get the
index of the first one where they're not. The index is around
one one four four. So I'm going to call this the index of a
mistake, right? So here it looks at where the classifier does
not agree with what we know to be the actual values. And I'm
also going to plot that, and what it predicted. So now we know
there should be a mistake. Yeah, so probably this should be a two
and the network says it's a one.
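The check described above (compare the predicted and the true labels, and take the first position where they differ) can be sketched in plain Python. The function names and the short label vectors are made up for illustration and are not the notebook's code.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def first_mistake_index(predicted, actual):
    """Index of the first disagreement, or None if all predictions match."""
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return i
    return None


# Made-up predictions and labels; the last digit is a 2 classified as a 1.
predicted = [7, 2, 1, 0, 1]
actual = [7, 2, 1, 0, 2]

idx = first_mistake_index(predicted, actual)
print(idx, predicted[idx], actual[idx], accuracy(predicted, actual))
```

With the index in hand, the corresponding test image can be plotted next to the predicted and the true label to see what kind of example the model gets wrong.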
This is a mistake, but it doesn't really look like a two to me.
In any case, you can get an intuition of where the model does
well and where the model does not, and you can do this again in
four lines of code. Now suppose that with this I still didn't
convince you; suppose you are really interested in starting
things from scratch, or perhaps you are a researcher who wants
to come up with new ideas and new models. Then you actually may
want to go, in TensorFlow, to the lowest level possible, or to
mix some of the lower-level parts with some of the higher-level
ones. TensorFlow has a lot of support for existing activation
functions, which you probably know: ReLU, sigmoid, tanh. Someone
asked before about cost functions; there are also different
normalisation techniques, and embeddings. So all these things
that are now widely used are already there. If you want to do
this it's more complex, so it's not going to be three lines of
code, but it gives you a lot more flexibility. It depends again
on the spectrum; you always have to think about where you are on
the spectrum: how much time do I want to spend on this, how much
do I care about improving the model, do I want to potentially
define a new layer type? And this is
ideal for researchers. Now
I actually have an example to go through this too, for
completeness, and, surprise, it is also MNIST digits. I'm going
to clear the outputs so that you actually trust that I'm running
everything live. Okay, so everything disappears, there's no
output. Great. So, same idea: I'm going to get the dataset, this
time from TensorFlow, because I'm no longer using the
scikit-learn API, and the APIs are a bit different, but
TensorFlow also gives you this dataset. You see it downloaded
it and extracted it. Again we can visualise a couple of
examples, but by now you already kind of have an idea of what
this looks like. And now we get to the part about defining the
graph, right? Before, defining the graph was a couple of lines
of code in which we defined the optimiser, and one line for
actually defining the feed-forward network classifier. In this
case, this is the graph definition. We have to choose how many
layers we want to define. We have to choose the batch size and
the layer sizes, so we're using, as before, a hundred neurons
per layer for two layers. The input size is given by the
dataset, so twenty-eight times twenty-eight equals seven hundred
eighty-four, and the number of classes is ten, because that's
how many digits, how many labels, we want to be able to
classify. And now this shouldn't be a
surprise, given the talk that you just heard before: we have to
define two placeholders. This is a supervised learning setting,
so we have the examples and the labels associated with them. In
this case X will stand for the images, so something like this,
and Y will stand for the associated labels, so for example in
this case the three. So this is the placeholder definition. And
now, if you want to define the graph, you actually have to
define the weights and the biases, very similar to the linear
regression example. For neural networks the weights are matrices
and the biases are a vector. The output is defined as you
expect: there's an activation function which is applied to a
matrix multiplication between the previous layer values and the
weights, and you add the biases, so this is just the standard
neural network formula. And because I'm doing this in a loop
(and why am I doing this in a loop? because I don't want to
copy-paste the code twice in case I change my mind and use three
layers), I have to keep track of what my previous layer sizes
and values are. So this defines the hidden layers and how you do
the computation from the input layer to the first hidden layer,
and from the first hidden layer to the second hidden layer.
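The loop over hidden layers described above, where each layer applies the activation to "previous values times weights plus biases", can be sketched without TensorFlow. This plain-Python version only illustrates the bookkeeping of previous layer sizes and values; the function names, the Gaussian initialisation and the all-constant input are assumptions, not the notebook's code.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def init_layers(input_size, layer_sizes, seed=0):
    """Create one (weights, biases) pair per hidden layer."""
    rng = random.Random(seed)
    layers = []
    previous_size = input_size  # track the size feeding into each layer
    for size in layer_sizes:
        weights = [[rng.gauss(0.0, 0.1) for _ in range(size)]
                   for _ in range(previous_size)]
        biases = [0.0] * size
        layers.append((weights, biases))
        previous_size = size
    return layers


def forward(x, layers):
    """Apply relu(previous @ weights + biases) for each layer in turn."""
    previous = x  # track the values feeding into each layer
    for weights, biases in layers:
        pre_activation = [
            sum(previous[i] * weights[i][j] for i in range(len(previous)))
            + biases[j]
            for j in range(len(biases))
        ]
        previous = relu(pre_activation)
    return previous


# Sizes from the talk: 784 inputs, two hidden layers of 100 units.
layers = init_layers(784, [100, 100])
hidden = forward([0.5] * 784, layers)
print(len(hidden))
```

Changing `[100, 100]` to a list of three sizes adds a third hidden layer without touching the rest of the code, which is the reason for writing it as a loop.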
But how about the output? Here we want to use the softmax to get
the probability distribution over the possible digits. So again
we define the weights, also a matrix, the biases, and the
logits. It is very similar, a matrix multiplication plus an
addition, but here we don't have the ReLU, because we don't use
this activation function directly. And here comes the trick: for
numerical stability you don't use softmax and then
cross-entropy. There's a function that does softmax
cross-entropy with logits, which just takes the logits, to avoid
the numerical instability that comes with softmax. You could
just use softmax, but you might have suffered because of this:
you might see a lot of NaNs coming your way. And again you have
to define the optimiser; this is exactly the same three lines as
we saw before. So this does the same thing, but it does it
explicitly. It is exactly the same thing; maybe the
initialisation that they use by default is a bit different from
the one I have here, but the concept is exactly the same. So now
I have actually run the code which creates this graph.
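The numerical-stability trick mentioned above can be illustrated in plain Python. The sketch below is not TensorFlow's implementation; it only shows the standard log-sum-exp formulation, in which the cross-entropy is computed directly from the logits so that the softmax probabilities are never formed explicitly. The function names are made up for this example.

```python
import math


def naive_softmax_cross_entropy(logits, label):
    """Softmax followed by cross-entropy; breaks down for large logits."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[label] / sum(exps))


def stable_softmax_cross_entropy(logits, label):
    """Same quantity computed from the logits with the log-sum-exp trick."""
    max_logit = max(logits)
    # Subtracting the maximum keeps every exponent at or below zero.
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


# For moderate logits both versions agree.
print(naive_softmax_cross_entropy([1.0, 2.0, 3.0], 2))
print(stable_softmax_cross_entropy([1.0, 2.0, 3.0], 2))

# For large logits only the stable version still returns a value.
print(stable_softmax_cross_entropy([1000.0, 0.0, 0.0], 1))
```

In plain Python the naive version raises an `OverflowError` on the large logits; with floating-point arrays the same situation typically shows up as `inf` and `nan` values instead, which is the symptom described in the talk.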
Again, this just creates the computational graph; you have to
have in mind the same picture that we saw when we looked at
TensorBoard [inaudible]; it is the same kind of thing for a
neural network. And now, as before, whenever I want to start
doing some computations I have to create a session. That's
created. And now again some waiting time, because I'm going to
go through some examples and actually start training. So the
training is happening right now. Someone asked about
TensorBoard, so I have here the code that would just create the
summary writer. I did not add any specific summaries that we
want to see during training, so we will not see the accuracy
going up or down, but we can actually visualise the graph, so
exactly this graph that we are creating here. At the end I will
show you how we can look at it. This is also taking a little
while, but I just want to show that it is the same idea, the
same principle. So, any other questions in the
meantime, while this other network is learning how to classify
digits? [Audience question] Is TensorBoard actually live, or do
you visualise only after you have finished training? It depends
how you want to do it. TensorBoard just reads from the logs, so
if you launch it at the beginning you can look at it as the
training happens, yes. In this case I didn't do this, because
I'm actually not recording any of the accuracy; we already saw
an example of that. But I just want to show also how easy it is:
this is just one line here and one line here, and [inaudible] I
have the bash example here. So it's done training. Great.
[Audience question] Sorry, one question: I saw that you were
using a for loop inside [inaudible]; is it okay to do this?
Because I know that in Theano [inaudible]. Well, you have
another function there. Here you can also do so. Theano has the
scan function, so you can do this, or you can also use the
equivalent of scan in TensorFlow. [Audience question] Is it the
same in terms of performance? I don't think this has the same
problems that Theano
does. Okay, so now training is done, and
here I'm also computing the precision myself, without taking
advantage of the other functions. And it also does pretty well:
it's right also around ninety-eight times out of a hundred. So
within a couple of minutes we managed to train two different
networks to do digit recognition, and now I'm going to start
TensorBoard. [Audience question] Sorry, one stupid question: are
you training on your laptop? Yes, the training is done on my
laptop, on the CPU. [inaudible] Okay, this is actually a very
important point: this did not run on the GPU, so if you run on
the GPU it's going to be faster. Because I didn't add any scalar
data we won't see that; we already saw an example. But here you
have some helpful messages in case you're stuck when you're
setting up your scalar values. But we can see the graph, so this
is what the graph looks like for a network, and I'm just going
to zoom in. Okay, so you can see exactly: you have a ReLU, you
have the gradient associated, you always have the matrix
multiplication, and you also see your placeholder variables. So
you get a very good idea of what you actually created when you
defined your graph. This is very useful, especially in the case
where you're defining the graph yourself rather than using a
higher-level API. So that is enough now
of feed-forward networks. We saw the ideas and the principles
that you can use to figure out what's good for you. Obviously
this doesn't only apply to MNIST digits, and it doesn't only
apply to feed-forward networks; it's just the general idea of
where you fit into this landscape and how far you want to go.
But you also might care about the support for state-of-the-art
models in TensorFlow, so I'm
going to talk a little bit about that now.
So you might be aware of the sequence-to-sequence models, which
are very much used for translation these days. There was a
paper by Oriol Vinyals [inaudible], and papers in two thousand
fourteen and two thousand fifteen from Yoshua Bengio's group,
that presented results on how to use this kind of model for
translation. The idea is pretty simple: you have an encoder,
which for translation is usually an LSTM or another RNN, that
encodes the input sentence, and you have the decoder that
decodes the sentence into the target language. In this case
we're translating from French to English, which might be helpful
if you're here today. So this is again a very useful and
relatively new model, and I want to show how much it takes to
implement that in principle: actually one line. This is the
basic one: you just have to say, I want the graph for a basic
RNN sequence-to-sequence model, these are the inputs that you
feed to the encoder, these are the inputs that you feed to the
decoder, and I want to use this kind of cell. The cell is the
kind of RNN cell that you want to use. And this is it; this is
all that you need to do to get this kind of model. This model
also has a lot of other variants: for example, you might want to
use the one with embeddings [inaudible], or you might want to
use attention, and that's also only one
line. So it's pretty straightforward.
And for RNN cells: they kind of look like this, but they can
also easily be created using one line, so you can say, I want
the basic LSTM cell of this size, and feed that into the
sequence-to-sequence model. So that was a bit about recurrent
neural models, but how about the Inception architecture, also
called GoogLeNet, which is also available? You can actually get
the trained model, as we saw before, but you can also look at
the code. Recently we open-sourced the code for dependency
parsing as well, which is the task where you want to find out,
in the language, what the action is, how certain things
[inaudible] correspond to each other, what the object is, what
the prepositions are. You can actually do that quite easily; you
can do that with your data, or there's already a trained model
for English available, called Parsey McParseface. So you can use
that if you want to look at English, but the code is also
available, so you can just start playing with it and understand
more about language. And these were just a couple of examples
that are specifically provided by TensorFlow [inaudible]; there
are more, such as autoencoders [inaudible], that are also
present. But I also want to mention a
couple of really great community examples, in the hope that you
will also contribute and see how nice it is. Again, coming back
to this point: if you have a community you can really push the
field forward. So these are a couple of examples, and I'm going
to go through each of them and discuss a little bit what they
do. Again, these are all provided by contributors; they are not
part of TensorFlow. You might be aware of deep Q-networks; they
are a way to integrate the deep learning framework into
reinforcement learning. If you want to use that with TensorFlow,
there is a repository available, so it's also very easy to start
with that. If you want to play around with the neural art
examples: this has been very popular, also helping to popularise
deep learning outside of the machine learning community, because
it's very intuitive. You have a picture and you have a painting,
you input them both to the model, and the model returns the
picture kind of looking like that painting. So if you want to do
this in TensorFlow it's also definitely possible. Char-RNN, the
idea of feeding a recurrent neural network one character at a
time, is also possible to do in TensorFlow. Also Keras, another
contribution you might know: it's a high-level deep learning
library that used to work with Theano as the backend and now
also supports TensorFlow. Neural caption generation: this is an
implementation of the Show and Tell paper, if you are familiar
with it. The basic idea is that you ask the neural network to
describe the picture that you give it, so it's a combination of
a convolutional network as the encoder and an RNN as the
decoder. So you can also use this with TensorFlow.
English-to-Chinese translation is also available in TensorFlow,
if you're interested. And this kind of sums
up the examples that I wanted to show. So, as
you can see, they're very diverse, going from images to
translation to art examples. The community is really interested
in bringing these things to TensorFlow, and you can also help
with that. And I'm going to talk a bit about another aspect,
potentially for more advanced users, but it might be very
important: how to create your own operation. TensorFlow has a
lot of things and is very flexible, but maybe you want to do
something else, right? And I'm going to show a little bit of the
way to do that. But firstly, when should you create your own
operation? The first obvious case is when you want to do
something and you can't do it by composition of already existing
operations. The second use case is when there are ops that do
what you want, so you can combine them to achieve your goal, but
you want to speed up the computation. You might know a better
way, instead of calling one op after the other, to speed up the
code if you just do it in one op. Or you might want a
memory-efficient implementation, which can also be made better
by combining operations. Or you can have a more numerically
stable implementation, and we already saw this with the softmax
cross-entropy example, right? You can have the softmax op, you
can have the cross-entropy op, and you can apply them
sequentially, but for numerical stability reasons it's better to
just
call them in one operation. So these
are the steps to create your own operation in TensorFlow. Some
of them are optional; again it depends on your use case and what
you want to do with your operation. But usually it kind of goes
like this. You register your operation in a C++ file, so you
have to tell TensorFlow: I am registering my operation, and this
is what it looks like. You have to implement it, obviously; the
implementation of an operation is called the kernel, and if you
want your kernel to run on multiple devices you might have to
implement multiple kernels. Optionally, if you want your op to
be used in Python, you can use the loaders provided by
TensorFlow and create the Python wrapper, also in one line. And
you can write the function to compute the gradients of your op:
if you want to use the differentiation that comes with
TensorFlow, that is, if you want to integrate your operation in
training a neural network, then you need to define the
gradients of the output of your op with respect to your inputs.
And if you want to benefit from shape inference you can also
write the function that describes the input and output shapes.
And obviously always, always test your
code. So now let's go back to
the steps. First you register the op; it looks a bit like this.
I have my own operation, I don't have a good name for it, so I'm
going to name it my own op. This is the input, it's going to be
[inaudible] int32, and this is the output. This is a very simple
step. Then, you know what you want to do, so here you just have
to define the implementation of the operation that you have
decided you want to implement. It comes with a standard API that
you have to use, but it's very simple: you just take the input
from the OpKernelContext object provided and you allocate the
output. The computation itself is something that should be easy,
or relatively straightforward, because you know what you want to
do. And then you have to register your kernel. What we did
before was to register the operation. But now we have to say
that this operation has this kernel registered to it, so this is
the CPU implementation associated with
my operation. And then you can build
your kernel using your favourite
compiler, or with the build system that
comes with TensorFlow, which is Bazel.
Now, if you want to make your operation
run on the GPU, you actually have to
define the CUDA implementation of it
and register it, so this is the
slightly trickier part: you actually
have to write CUDA code if you want
your operation to run on the GPU. If
you want to use the operation in
Python, you don't have to do much.
Basically you just take the .so file
that was built from the C++ code and
load it, and now you have a module in
which you can call the op that you
defined, so this is pretty
straightforward. And you might also
want, again, to write the function to
compute the gradients: if you want to
apply your operation in a neural
network setting, you want to do this.
This is also simple, but it is up to
you; it is not something that
TensorFlow can help with. If you're
defining a new operation, you have to
know what the gradient of the output
with respect to each of the inputs is,
and from there you can just let
TensorFlow do its magic, and you can
combine your op with all the other
operations. And please, if you have a
really strange need for an operation
and you have implemented it, just send
a pull request so that other people can
also benefit. But in principle these
are the steps.
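
As a rough illustration of the last few steps (a fused, numerically
stable op, a hand-written gradient, and a test), here is a minimal
sketch in plain Python. It does not use TensorFlow's C++ op API, and
all function names are hypothetical, chosen only for this example. It
computes softmax cross-entropy in one step using the log-sum-exp
shift, defines the gradient of the loss with respect to each input,
and checks that gradient against finite differences.

```python
import math


def softmax_cross_entropy(logits, label):
    """Fused, numerically stable loss: -log(softmax(logits)[label])."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def softmax_cross_entropy_grad(logits, label):
    """Gradient w.r.t. each logit: softmax(logits) - one_hot(label)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if i == label else 0.0)
            for i, e in enumerate(exps)]


def numerical_grad(f, xs, eps=1e-6):
    """Central finite differences, used only to test the gradient."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


logits = [2.0, -1.0, 0.5]
analytic = softmax_cross_entropy_grad(logits, 0)
numeric = numerical_grad(lambda z: softmax_cross_entropy(z, 0), logits)

# "Always test your code": the two gradients should agree closely.
assert all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric))

# Logits this large would overflow a naive exp(1000.0); the shifted
# version stays finite.
big_loss = softmax_cross_entropy([1000.0, 0.0], 1)
```

Applying a separate softmax and then a separate log would hit an
overflow on the last input; doing both in one function is what makes
the shift possible.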
And the most difficult part is to know
what you want. Once you know what you
want, you just have to implement it and
compute the gradients for that
operation. Oh, and something that I
want to stress: TensorFlow is a general
machine learning framework, actually
more of a general computation
framework. So it can be used for
problems that require differentiation,
optimisation or linear algebra
computation.
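
To make the "general computation framework" point a bit more
concrete, here is a small sketch of an optimisation problem that has
nothing to do with deep learning: fitting a line by gradient descent.
It is plain Python with hand-derived gradients, so it is not
TensorFlow code; in TensorFlow the gradients would come from automatic
differentiation instead. The function name fit_line, the learning
rate, the step count and the data are all assumptions made for
illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Hand-derived gradients of mean((w*x + b - y)**2).
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
# w ends up close to 2 and b close to 1
```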
But do keep in mind that it is made
with deep learning in mind, so most of
the API support and the feature
requests that will be addressed are
actually aimed at deep learning. But
you can use it for other problems as
well, and actually there are
differential equation solvers in
TensorFlow; if you want to play with
that a little bit, there is a tutorial
available online. I'm not going to go
through this right now. So this brings
me to the conclusion of this talk and
the general idea of the TensorFlow
talks. You're probably here because you
want to find out what's the best thing
for you, right? And this again depends
a lot on your use case, and on how much
time you want to spend on this. If you
actually don't know anything about deep
learning and you want to play around a
little bit, you can train neural
networks in your browser. There is
playground.tensorflow.org, and I'll
show you a little bit. With this you
can actually train very simple neural
networks, just to get an intuition of
how this looks. You can increase the
number of neurons or decrease them,
both here; you can change the
activation function and the
regularisation; you can change the
problem type; and you can start
learning and see how your model is
converging. So if you're not that
familiar with deep learning, this might
be a nice way to spend thirty minutes
or so. But this was just a toy example,
right? So if you want to
start using machine learning for
real-life problems, you have multiple
options, including using cloud-based
APIs, which is something that I have
not discussed before. As in the neural
network case, the more complicated you
go, the more flexible it is, but you
need to spend more time and effort to
learn the framework and to learn about
the quirks of the models, right? So it
depends where you are on the scale. Now
let's look a bit at this. If you don't
even want to deal with TensorFlow, you
can just use some cloud-based API to
get the outputs that you want. There's
the Google Translate API for
translation, and APIs for speech and
for vision, and you can also do
sentiment analysis on text, so there
are plenty of options available without
getting your hands dirty with the
actual machine learning frameworks. So
let's look a bit at
how it would look with the Vision API.
If you ask the Vision API what's in the
picture, it can tell you. For example,
here clearly there are a lot of people
running and there's a marathon, so it
can tell you that, and it can also give
the scores associated with it. Or you
can find out what emotions people are
displaying; here it looks like someone
is joyful. And you can also find out
what the text in the picture is and
what language that text is in. So
again, you don't have to write the
model yourself, or even load
already-trained models. The second
option, also very simple, is using a
pretrained model with TensorFlow. We've
already looked into this one, so I'm
not going to spend much time on it. The
third and fourth options were training
your models, or creating and producing
your own models, and for this you have
multiple options. So
you can either run the TensorFlow open
source release on your physical
machines or on your virtual machines in
a cloud environment, or you can use the
Cloud Machine Learning API, which
allows you to use TensorFlow but also a
lot of the other developer tools that
Google provides. So again, you have to
think about where you want to be. If
you have your own physical machines,
then you are probably here. Depending
on your use case, if you are already
using a lot of the Google developer
tools, then you probably want to go
here. Now, if you want to develop
your own machine learning model, then
as you saw, it's relatively easy to
define the computational graph. It's
very flexible, and once you define it,
you create a session and you start
training. This is an example of how to
use this for robotics, for making robot
arms pick up things; you can have a
look if you're interested. It obviously
uses TensorFlow. And just to include a
bit of advertisement: we recently
announced Google's Europe research
centre, based in Zurich. We are
encouraging people to apply as software
engineers or research scientists, and
there are also a lot of internship
positions available, so if you're
interested just check out the website.
And that's pretty much it from me.
Thank you very much for listening, and
I hope you learned something from this.
Maybe we can move directly to the
panel, so maybe you can keep your
questions for one minute, the time it
takes us to sit down. Yes, okay. So
maybe we should move directly to
questions. Any questions? Yes. So, just
two
quick questions. Is it straightforward
to create a siamese or a triplet
network using TensorFlow, as easy as
was shown in the examples? And another
question: can I create an operation at
the Python level? One logistical thing:
can you please stand up when you ask
the question, because I don't know...
oh, okay, now I know in which direction
to look. So the first question: is it
easy to create a siamese network or a
triplet network, I mean, is it as
simple as what you showed in your
examples? Well, it depends... basically
yes, the short answer is yes. And what
was the second question? Can I create
operations at the Python level? You
showed some examples in C++ and then
the bindings afterwards. The operations
you have to register in C++, so there
is some C++ part you definitely have to
write. If you want the CPU
implementation, the kernel has to be...
okay. Hi. Can you give us
some insights on how you usually scale
the inference with TensorFlow? I mean,
I'm not asking for all the details, I'm
just asking what technology to use in
order to scale the serving of
TensorFlow. I know there's TensorFlow
Serving itself, but what do you combine
it with in order to achieve a highly
scalable system? Again, I'm not going
to discuss Google-internal information.
If you want to see how to serve
TensorFlow, TensorFlow Serving is
exactly where to look. Yeah, but
TensorFlow Serving out of the box is
pretty sequential. I mean, if one tries
it with the tutorial-based version, it
is sequential: you can submit one query
at a time. I think my question is that
you probably do something else in order
to make it scale. That I'm not going to
discuss, okay? Just look at what
TensorFlow Serving supports; that's the
open source version. Yeah, this
concerns one
of your slides, about the cloud service
for machine learning. If I put data
there, what actually happens to the
data? Do you actually guarantee that
the service it's going to be sent to
respects, for example, privacy
policies? Okay. So I actually have a
question. If I'm correct, you are
running benchmarks, and there were some
speed differences between the various
frameworks at some point. I don't know
where we are at the moment: are there
still speed differences, and are they
due to something fundamental in the
design choices, or something which is
going to improve, or what is the
situation? So, when TensorFlow started
there was a speed
difference, but now TensorFlow is
actually doing equally well as the
other frameworks, except for Neon,
which is a bit faster than the rest.
But the problem with my benchmarks is
that they only cover convnets, so they
only cover the use cases that people
have for convnets. What we want to know
is how the frameworks support recurrent
nets, convnets, different things like
speech models and so on. So I am
working with the TensorFlow guys, the
Theano and Chainer guys, everyone in
the community, to release a new set of
benchmarks that will capture more
uniformly the use cases of all the
researchers out there. Then we will
know how all the frameworks perform in
the general use cases as well. But just
for convnets, TensorFlow is actually
doing as well as the other frameworks,
except for Neon. So at
the moment, putting aside the
benchmarks a little, you are not aware
of major differences in the design that
would impact performance in some way or
another? From my perspective, the one
issue is scaling up: being able to take
advantage of really large clusters or
lots of GPUs. Is that the case with
Torch? What is your view? Yes, I think
in terms of the design,
there are basically two philosophies at
this moment among all frameworks. One
is the whole idea of writing your
neural network as a computational graph
and then giving it to an execution
engine that will distribute it
appropriately and so on. And there is
another philosophy, which is that
almost all of the distributed
computation that you do with neural
networks in general can be expressed in
terms of computations that the
high-performance computing groups have
been doing for many years, which is
called the MPI class of collectives. So
Torch, at this moment at least, with
its distributed package, for example,
takes the approach where you just use
MPI, which does the distribution for
you, and it's abstracted away, and
there is no separate execution engine
and so on. And TensorFlow, Caffe,
Caffe2, Chainer, these frameworks take
the other approach, which is that you
have an execution engine and the
computation graph, and you would want
to do it in a more generalised manner.
does in to be a very good distribution
framework in that that's really
important nowadays that doesn't mean
that to there's not a lot of focus on
making it great on one machine so
there's a as the he said there's been a
lot of improvement since they need to
launch and there's still lot of work on
this so the in is definitely tool not
only focus on the disputed side it's
great to have and it's very useful but
the you should be able to have great
performances of thousands of labels on
one machine so we are time you're so
you've done okay sorry I cannot start
it I don't know I it if it's really a
question or more comment or something I
was working on but it was new and it
works for also several decades already
than I do to say I really like this
workshop I want us to say thank you for
organising this and I think was a quite
great success and it's quite hard
receptive audience and I realised also
that they're people from many
communities here some of them using the
library some of them very new to neural
networks. And it's very hard to find
something which is good for everyone.
And also for me there was some more
information which was new for me so it
was quite nice as thing and it's
difficult to decide what's to really
cover because it's called deep learning
methods and tools and how much to speak
about depending how much of a mess
that's how multiple tools we spoke you
mean about messes tools and
technologies the deep learning aspect
in general was a bit covered in
intonational presentation. But that was
only a very brief glimpse of the
history, and in my opinion it was a
little bit too brief. If you want to
cover history, it should maybe be a bit
more. It seems at the moment as if it
all started around two thousand six,
but there were a lot of people doing
deep learning before, and there are
names, just to mention Ivakhnenko, who
is considered a father of deep
learning, and of course there are
others; some other names would have
been nice to mention as well. And maybe
a suggestion, since you are putting the
slides online: just give a small page
or something on the history of deep
learning as well. Okay.
Well, thank you for the conference, or
workshop. I have a question on the
tools. Do you expect that there will
come more graphical tools for
non-engineers, so that they can be
involved more in deep learning, and not
just programmers? Just graphical tools
to create your networks? I mean more
graphical input. You can have that, but
this is inherently, in my opinion, a
programming task. If you have to debug
it at some point, you still have to
look at the code, right? So you still
need someone who is familiar with some
high-level language such as Lua or
Python. If I understand the question
correctly, if you have a graphical
interface, it's harder to debug. Maybe
I'm too much of a software engineer;
it's harder for me to see how that
would work in my daily work. Well, I
assume you might be referring to
something like Simulink, maybe, where
you can plug in graphical things and
then you run them, and double-click on
one of the things and change the code.
Is that what you are thinking about? I
mean more getting started more quickly.
I mean, writing code is, of course,
very flexible, but it's a very tedious
way of working if you know the
different things that you want to do
and you just want to take a menu and
select certain options. Then you can
have the variability that you need,
maybe for certain problems, where you
can have a hierarchy of graphical
choices. So, from my interactions with
many people in the community, there are
a few projects in the works in this
direction. I think
defining a graphical tool that is very
effective for a new field is somewhat
of a matter of testing a few prototypes
and converging to something that works
for most people. There is a project in
the Torch community to put together
neural networks graphically, and it's
being done by a UI PhD student who is
also interested in neural networks. And
there's something built on top of Caffe
that is similar, by NVIDIA, actually,
called NVIDIA DIGITS, where with just a
few drop-down menus you can train a
neural network. I think getting the
power of defining your most complex
networks in a graphical way is somewhat
hard, at least from our perspective as
programmers, because the number of
choices you can take at each step is so
large that it's very hard to express
graphically. But I think as more UI
researchers come into the loop, there
will probably eventually be some
solution that would be effective for
most people. No more questions? Okay,
then maybe we can stop here. Thanks
again for your

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
