Note: this content has been automatically generated.
Okay, so we will start. This is the last day of the school. Yesterday you heard about another framework, so by now you know the situation: every framework has its strengths and weaknesses. Today's talks will present TensorFlow, and you will be able to ask questions afterwards.

Great. First, as a technical test: can everybody hear me, and can everybody hear me well? Okay, perfect. If at any point there is a technical difficulty, please let me know, because I can't hear what you hear. Then let's get started. Now, about
questions: feel free to interrupt me at any time if there's a question that you think is very relevant to the slide. We will also have time for questions at the end of each talk, and there's the panel at the end of the day, but it's really important that you get your questions answered, because that's why we're here. So, yeah, it seems part of my microphone just fell off, but it still works. Okay, let's just continue like this. A little bit about me: I'm a software engineer at Google Research. I've been at Google for around two years now, after I graduated from Imperial College London, and today I want to talk to you a bit about TensorFlow, and specifically about the trade-offs it involves. Yeah, test, test, test. Yes? Okay, perfect. So
I'm going to talk specifically about TensorFlow, which is the deep learning framework built by Google, and the talks are going to be structured a little differently. The first talk is about the core principles behind TensorFlow: specifically, what we want from a deep learning framework, and how TensorFlow actually meets a lot of these requirements. In the second talk we're going to go through a concrete example of how to use TensorFlow for something relatively simple, linear regression, but we're also going to look at some really nice things that TensorFlow gives you, such as distributed training, how to use the GPU, and how to use some of the nice visualization tools that we have. In the third talk we're going to focus specifically on deep learning: neural networks, state-of-the-art models, community contributions, and so on.
First, what is TensorFlow? TensorFlow is our standard software for general machine learning, and it's great for deep learning in particular. It's open source, on GitHub, so you can check it out; it was first released in November 2015 with a very flexible license, so you can use it as you please. Now I'm going to show you the official video first, so that you get a very high-level overview of what TensorFlow is about:

"For deep learning over the last few years, what was an initial research project has grown: we've since collaborated with about fifty different teams to put these systems in real products, across a really wide spectrum of work. Today it's used heavily in our speech recognition systems, in the new Photos product, in email, and more. TensorFlow is the machine learning library that we use at Google, both for artificial intelligence research and for deploying production models. These models are really powerful at various kinds of perceptual and language understanding: they make it so that computers can actually see, can actually understand what is in an image or in a short video clip, and that enables all kinds of powerful products. Machine learning is the secret sauce for the products of tomorrow. It no longer makes sense to have separate tools for researchers in machine learning and for people who are developing real products: there should really be one set of tools that researchers can use to try out their crazy ideas, and if those ideas work, they can move them directly into products without having to rewrite anything. The research side also benefits, because engineering brings its insights about existing problems back into the research, and researchers' advances feed into real products and the features users want. One goal of TensorFlow is to allow collaboration and communication between researchers: it allows a researcher in one location to develop an idea and then just send code that someone else can use on the other side of the world, which makes working together a lot easier. By releasing this as open source, we really hope the external community joins that effort. We expect developers to be able to do a lot more than they can do today. We think we have the best machine learning infrastructure in the world, and we want everyone to share it."

So I guess
that gives a very high-level overview of what the aim of TensorFlow is, and it also touches upon a lot of yesterday's discussion, in the panel, about which frameworks are good for research, which frameworks are good for development, and so on. What I really want to stress, and we will look at this in more detail in this talk, is that TensorFlow aims to be a tool for everyone: the aim is to bridge the gap between researchers, developers, data scientists, and so on. This talk focuses on two things: first, why does Google care about machine learning, you might wonder; and second, what makes a good machine learning framework. We're going to go through this, the second point especially, in detail. So first:
why does Google care about machine learning, and specifically deep learning? Well, deep learning comes with this really nice promise of universal machine learning: the idea is that you can use a similar set of algorithms to do speech recognition, query understanding, text to speech, and whatever else you might want to do, and you don't have to do the feature selection yourself. That is a very nice promise. And the advantage of deep learning is that, apart from making this promise, it actually delivers: a very nice promise that worked no better than the alternatives wouldn't be that useful. So when you think about deep learning now, it's currently state of the art in speech recognition, image recognition, machine translation, and a lot of other applications. At Google we've seen very big growth in the use of deep learning, from very little at the beginning of 2012 to more than two thousand directories containing a model description file in the Google source code repository. That's a lot of code, and a lot of models. Where are they used? Well, in a lot of products, it turns out, and I hope you can see some of your favourite products here: the Google keyboard, Inbox, Gmail, Drive, YouTube. All these products are now better because of machine learning. You also probably know about AlphaGo, which achieved a breakthrough in Go that was thought not to be possible for years to come, and this is another showcase of how interested Google is in pushing the field forward.
Now, going back to some of the products I mentioned on the products slide, I'm going to go into a bit more detail on some of them. You might know Inbox, an email client provided by Google; in November 2015 it launched this feature called Smart Reply. The idea is very simple: when you send me an email, "Hey, do you want to go for dinner tomorrow?", there is a small set of very likely answers that I might give: "Sure, why not", or "How about today", or "How about lunch instead". And you can see how machine learning is a really good fit for this task, because from a lot of examples you can learn what the possible answers to an incoming email are, and Inbox does this with the Smart Reply feature. Smart Reply has been very well received, even though the idea was initially launched as an April Fools' joke. It has also matured a lot: in February 2016 it was reported that more than ten percent of mobile Inbox replies use Smart Reply, which makes sense, because if I'm trying to catch the tram, I don't want to start typing; I just press a button and it's done. So that's really great. Now, another product that
makes a lot of sense with machine learning is Google Play Music. Given your listening history and the types of music you like, it can recommend playlists, or other channels that are similar, and so on. One of the relatively new additions to Google Photos is the ability to query your photos. If you're like me, when you travel you probably take literally thousands of photos, and when you come home and want to show your parents, "Hey, look at these really nice cherry blossoms", it would take you hours to scroll through all of them. Well, with this, you can just find the cherry blossom photos by querying in Google Photos, and that saves you a lot of time. And if you're travelling, you might be travelling to a country where you don't really know the language of the inhabitants, so you might want to use the Translate feature that allows you to take a picture of a particular sign, for example, and have it translated; this combines computer vision and translation to provide a better user experience.
That covers why Google cares about machine learning and how our products have become better because of it; now let's talk a bit about TensorFlow, which is the engine that powers a lot of these features. So first of all, why build TensorFlow in the first place? Google already had a deep learning system: it was called DistBelief, and it was really great for scalability and production training, but it was not as flexible as researchers would have wanted. So there was again this trade-off between research and production. Having already tried something once really allowed us to simplify the problem and to learn from previous mistakes. And in order to figure out what we want from a machine learning system, and from a deep learning system in particular, we have to think about who uses such a system, because different use cases have different requirements. As you can probably imagine, researchers, developers, and data scientists all want to use such a framework, and they all have different goals in mind. Researchers want to iterate quickly: they want to specify their new crazy idea and be able to see whether it works or not, whether on MNIST or on something much bigger such as ImageNet. Developers want to take these ideas and quickly put them into products, without having to wait for a year and without having to port code written in a research system. And data scientists just want to tweak the ideas that researchers have, on their own data sets, to get maximum performance. So with
this in mind, this is what we think one would want from such a system. First, ease of expression, for all the crazy ideas you might have. Scalability: you want to be able to run your experiments pretty quickly. Portability, which is especially important for developers: you want to be able to run on a variety of platforms quite easily. Reproducibility, so that researchers around the world can collaborate: they can share code, they can share models. And production readiness: again, the idea of going really quickly from research to real products. I'm going to go through all of these, taking them one at a time, and actually show you how TensorFlow meets each of them. Let's start with ease of expression. The
architecture behind TensorFlow is very flexible, and the core idea is the computational graph, something that is similar to other frameworks: you always specify your computation as a directed acyclic graph, and optimisations are then added on top of that. The general procedure when you work with TensorFlow is to define the graph in a high-level language, because you don't necessarily want to deal with memory management and so on; you just want to be able to say, "Hey, this is how my model should look." The graph is then compiled and optimised, and then executed, in parts or fully, on the available devices, which might be CPUs, GPUs, or whatever other device you might want to use. The core of TensorFlow, the core execution system, is in C++, and this allows it to be very efficient, very good in terms of speed. But there are different front ends for how you want to specify the computation: you can specify your computational graph in Python or C++ today, but if you really like Java, you can actually add another front end fairly easily. Another important point when talking about ease of expression is the interface; again, different people want different kinds of interfaces. If I'm a researcher, I want to be able to specify my models down to the matrix multiplication level: I want to be able to say that this tensor gets multiplied with that tensor, and that I apply this operation on them. But you might also want to use higher-level APIs: you don't want to go down to the matrix multiplication level for a CNN or for a deep neural network, because you would reuse the same code again and again. TensorFlow allows you to go both ways depending on your use case, which is very useful.
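The graph-then-execute idea above can be sketched in a few lines of plain Python (a conceptual toy, not the actual TensorFlow API): nodes only describe the computation, and nothing runs until you explicitly execute the graph.

```python
# Minimal sketch of a dataflow graph: build first, execute later.
# This mimics the *idea* behind TensorFlow's model, not its real API.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # function computing this node's value
        self.inputs = inputs  # upstream nodes

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

def run(node):
    # Evaluate recursively; a real engine would optimise the graph,
    # place ops on devices, and run independent parts in parallel.
    args = [run(n) for n in node.inputs]
    return node.op(*args)

# y = (2 + 3) * 4; nothing is computed until run() is called
y = mul(add(constant(2.0), constant(3.0)), constant(4.0))
print(run(y))  # 20.0
```

The separation between building `y` and calling `run` is exactly what lets a real engine compile, optimise, and partition the graph across devices before anything executes.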
Now, about scalability. You are probably already aware of this: when you run experiments, they take a lot of time, and this can easily become cumbersome. If an experiment takes a couple of minutes or hours, that's pretty great: I can start my experiment and I quickly get this feedback loop, my idea is good and I'm going to pursue it, or my idea doesn't really work, or I have a bug and I want to debug it a little bit. This is great for research: you get a good feedback loop. If experiments take a couple of days, that's horrible; at this point you probably start trying multiple ideas in parallel and seeing how each of them goes, because you have to wait a couple of days each time. If you go to weeks, you can see how progress slows down: you can only try out your best ideas. And if things take more than a month, it's probably not really worth trying. So how does TensorFlow allow you to run experiments quickly? Well, you can use GPUs, you can use multiple cores and multiple GPU cards, and you can also distribute training over multiple machines: if you have a cluster in your lab, you can use it to decrease your experiment time. Now, when you want to distribute computation, you always have to take communication overhead into account: if I distribute computation over two machines but only get a ten percent speed improvement, that's not really great, because I'm using a lot of computational power for very little gain. In TensorFlow there are two solutions in particular, which I'm going to name here, that are used to avoid this communication overhead. The first one is to exploit model parallelism; especially in deep learning, we have a couple of models that are pretty well suited for that. The second is to exploit data parallelism, because our training sets can be split into parts and processed at the same time. So let's look at each of these. So,
for model parallelism, how do you do it? Well, you can use instruction parallelism within a single core, which is pretty much free. When you want to do this across cores, you use thread parallelism, which is almost free, unless you have to go across sockets. Across devices, if you go between multiple GPUs, you are often limited by PCIe bandwidth, and across machines you are very often limited by network bandwidth and latency. With this in mind, let's look at how model parallelism actually works for a network like a convolutional neural network. The idea behind convolutional neural networks is that you have this image that goes through layers (this is the input layer, layer one, layer two) and then you get a final representation of the image at the end. And you have these kernels, also called local receptive fields, that get applied to each patch of the image, so you can see them moving around. The way you can split this model onto multiple machines, or multiple cores, is by partitioning, so that the parts of each layer that communicate a lot with each other sit on the same machine. You want to avoid putting one part of the model on one machine and another part on another machine when those two parts have to communicate all the time, because then you end up with a lot of overhead. But if we partition like this, we minimise the network traffic, because when we compute the values of the neurons in a layer, we more or less always look at the ones in the same partition, on the same machine, apart from the ones at the boundaries. So because a convolution kernel crosses the partition boundary, you can't completely avoid communication, but you can minimise it.
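The boundary argument above can be made concrete with a toy NumPy sketch (a hypothetical two-worker setup, not TensorFlow code): splitting a one-dimensional convolution between two workers means only the few input values at the partition boundary ever have to cross the "network".

```python
import numpy as np

def conv1d(x, k):
    # "valid" 1-D convolution with a length-3 kernel
    return np.array([x[i:i+3] @ k for i in range(len(x) - 2)])

x = np.arange(10.0)               # full input signal
k = np.array([1.0, 0.0, -1.0])    # shared kernel (the parameters)

full = conv1d(x, k)               # what a single machine would compute

# Model parallelism: worker A owns x[:5], worker B owns x[5:].
# A computes outputs 0..4; its last windows reach into B's partition,
# so B sends A just two boundary values (the "halo"); that is the
# only cross-machine traffic. B's outputs 5..7 are fully local.
halo = x[5:7]
out_a = conv1d(np.concatenate([x[:5], halo]), k)
out_b = conv1d(x[5:], k)

np.testing.assert_allclose(np.concatenate([out_a, out_b]), full)
```

The same reasoning extends to 2-D convolutions: each machine exchanges only a thin border of activations with its neighbours, while everything else stays local.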
The difference with data parallelism is that we exploit the training data: we usually use mini-batches anyway, and the idea behind data parallelism is, how about we train on multiple batches at the same time, with examples seen by different model replicas simultaneously? So the idea is that I don't have one model; I have multiple model replicas that copy the parameters, each of them does its specific computation, and then they tell a parameter server, which keeps the gold standard for what the parameter values should be, how to update the parameters. This is how it looks: you have multiple model replicas, the data goes in here in parallel, each replica sees different examples, and when a model replica has finished computing its update (for example, in the case of neural networks trained with gradient descent, a gradient of the loss function), it sends this update to the parameter server: "Okay, please update the parameters like this," and so do all the other replicas. Now, when you think about this picture, you have to understand that there are two ways to do these updates. One way: this replica tells the parameter server to update the parameters, that replica does the same, and so on. But you can also combine these updates into one single update. If we want to be as close as possible to the original algorithm, standard gradient descent, we want to do the update synchronously: we wait for all replicas to finish, we combine their updates together, and we apply them only once at the parameter server. So this is how it looks: this model replica has computed its update, this one has, and this one has; they get combined, and then only one update gets sent to the parameter server. This is actually equivalent to having an N-times-larger batch size: the training that you do is exactly as if you had one model with an N-times-larger batch. The pro is that you have no gradient staleness: the model replicas are never operating on stale gradients. But the con of this approach is that if one machine fails, the other machines have to wait for this one to recover.
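The claim that synchronous training is equivalent to an N-times-larger batch can be checked directly with a small NumPy sketch (a hypothetical mean-squared-error model, not TensorFlow code): averaging the gradients of two replicas, each working on half the data, gives exactly the full-batch gradient, assuming equal shard sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))      # one "large" batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)

def grad(Xb, yb, w):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Single worker, full batch:
g_full = grad(X, y, w)

# Two replicas, each seeing half the batch; the parameter server
# averages their gradients and applies ONE combined update:
g_a = grad(X[:4], y[:4], w)
g_b = grad(X[4:], y[4:], w)
g_sync = (g_a + g_b) / 2

np.testing.assert_allclose(g_sync, g_full)  # identical update
```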
You can also do asynchronous updates. Here the difference is that each model replica can update the parameter server independently. But as you can imagine, the problem is that one model replica might send an update with respect to parameters that are no longer there, because another replica has modified them in the meantime, so it's not really the same algorithm as what we usually run. On the other hand, it's relatively fault-tolerant, and in practice it works, if you don't push it with too many replicas. But for both kinds of updates, synchronous and asynchronous, you really want the model computation to be large enough that it's worth sending the parameters over the network.
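To make the staleness problem from the asynchronous case concrete, here is a toy illustration (plain Python, a contrived one-parameter example): replica B applies a gradient computed against parameters the server has already moved past.

```python
# Parameter server state: a single scalar parameter, loss = w**2.
w = 1.0
lr = 0.1

# Both replicas pull the SAME copy of w...
w_a = w
w_b = w

# ...replica A finishes first and the server applies its update:
grad_a = 2 * w_a          # gradient of w**2 at the pulled value
w -= lr * grad_a          # server now holds w = 0.8

# Replica B's gradient was computed at w = 1.0, but it is applied
# on top of w = 0.8: a "stale" gradient.
grad_b = 2 * w_b          # still 2.0, based on the old parameters
w -= lr * grad_b

print(f"{w:.2f}")  # 0.60, while a sequential pass would give 0.64
```

With few replicas the drift is small and training still converges in practice, which matches the observation above that asynchrony works as long as you don't push it too far.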
We saw here that each model replica has a copy of the parameters, and the parameter server sends the parameters over the network to the replicas every time there is an update. If you send these parameters all the time, and there are a lot of them, you waste a lot of time just shipping parameters over the network. So the idea is to strike a balance between the computation that a replica does with one set of parameters, before needing an update, and the number of updates that you do; how this plays out depends on the kind of model. For very dense models you can get a ten-to-forty-times speedup with fifty replicas, while sparse models, which have fewer parameters, support many more replicas, even up to one thousand. In terms of models, certain models reuse each parameter many times. For example, convolutional networks apply the kernel, the local receptive field, to all possible patches of the image, which means the same parameters are used a lot before you need an update; that makes them good candidates for data parallelism. The same goes for recurrent models. Recurrent models are very much used for sequences, which is what they're built for, so if I want to do some language modelling, for example, and I feed the network "Mihaela is giving a talk about TensorFlow", either one word at a time or one character at a time, the model uses the same parameters for each input until I'm done with the sentence. So that makes them good candidates for this kind of data parallelism, because they do a lot of computation before they need new parameters. So now, let's look at some numbers.
Here is how much this helps. This plot shows ImageNet Inception synchronous training; you probably know from yesterday what the Inception model is, a very big architecture, here trained on ImageNet. You can see the time in hours versus the obtained precision, on one GPU, ten GPUs, and fifty GPUs. If we look at one GPU versus fifty GPUs and fix the precision, say my model has to reach 0.5 precision@1 before I can go home for the day: if I use one GPU, I have to stay for three days; if I use fifty GPUs, I can already go home after 2.6 hours. That's a very big difference, around thirty times. Note that it's not linear: I increased the number of GPUs fifty times but still only get thirty times, so there is an overhead; but I still get a massive improvement. Now, if we look at ten GPUs versus fifty GPUs at different accuracy levels, at 0.6 and at 0.65, you see around a four-times speedup going from ten GPUs to fifty GPUs, which is pretty good. And this is how the graph looks when you increase the number of workers, plotted against how many examples per second the model can see: if you use a hundred workers you get a fifty-six-times speedup versus one worker, and if you use sixteen workers you get a fifteen-times speedup versus one. So again, you don't actually get a hundred-times speedup from a hundred workers, and it's clear that there is some overhead, but you can still speed things up considerably. And data parallelism is not only great in theory; it's actually very much used. This ImageNet Inception training actually used fifty GPUs; Smart Reply, the Inbox feature I talked about a bit earlier, uses sixteen replicas to train the model, each with multiple GPUs; and the state-of-the-art language model on the One Billion Word benchmark uses both data and model parallelism across many GPUs. So this is all very much used in practice.
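As a quick sanity check on the scaling figures above, parallel efficiency is just speedup divided by worker count (simple arithmetic on the numbers quoted, nothing TensorFlow-specific):

```python
# Speedup figures quoted above: workers -> measured speedup.
speedups = {16: 15, 100: 56}

# Parallel efficiency = speedup / number of workers.
efficiency = {n: s / n for n, s in speedups.items()}

for n in sorted(efficiency):
    print(f"{n} workers: {speedups[n]}x speedup, "
          f"{efficiency[n]:.0%} parallel efficiency")
# 16 workers retain about 94% efficiency; at 100 workers the
# communication overhead brings efficiency down to 56%.
```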
Now, this was a lot about multiple devices, but how about single-device TensorFlow performance? I put this here because it's related to my previous slides. When TensorFlow was initially released in November 2015, it definitely had some speed issues, but it has improved and continues to improve; you can see from these numbers that it's getting quite good, although there is definitely still a lot of work to do in this respect. Now, about
portability. As I said before, it's very important to have a machine learning framework that runs on a variety of platforms, because that decreases the time between researchers coming up with ideas and the moment you have a productionised model, and it also saves a lot of developer time, because developers don't have to port code from one architecture to another. TensorFlow works on CPUs, GPUs, phones, distributed systems, and even custom machine learning hardware, so it's very flexible in that way. If you're interested, there are a lot of tutorials out there on how to use TensorFlow on both Android and iOS. Here are some screenshots of using pre-trained ImageNet models on Android; you don't have to train your own model to do image recognition on a phone. The same goes for iOS: if you want your phone to tell you that it is looking at ice cream with chocolate sauce, you can build an app that will show you that.
Now, how about reproducibility? TensorFlow is open source, as I said, under the flexible Apache 2.0 license. This is very important for us, because we think it really helps push machine learning research forward: researchers can now publish code for new algorithms in TensorFlow, and they can create repositories for trained models, which really helps make research papers reproducible. How about external adoption of TensorFlow? If we look on GitHub at most of the deep learning frameworks we are familiar with, TensorFlow had twenty-seven thousand stars when I created these slides, and around ten thousand forks, so it is much more popular than the other frameworks in terms of GitHub stars and forks, even though it launched only in November 2015. Also in terms of external adoption: in the seventy-two hours after launch there were more than fifty thousand installs, and more than five hundred thousand since November 2015. And despite being launched only in November 2015, it was the most forked new repository created on GitHub in 2015.
We think that's pretty good. Another strong point of TensorFlow is tutorials and documentation. It's very hard to start with any framework, especially if you're a beginner in machine learning: if you don't know much about machine learning, or deep learning in particular, you have to learn the framework and you also have to learn how to deal with machine learning itself. TensorFlow has a really wide variety of tutorials out there, and it caters to both needs. If you are already very familiar with deep learning, you can follow the expert MNIST tutorial, which skips a lot of the deep learning details and just shows how to use TensorFlow; or you can use the intro MNIST tutorial, which goes into the details of how the model actually works. And of course, if you want to find out even more about the internals of TensorFlow, there is an excellent white paper, released in 2015, that talks a lot about the internal computation engine, and even about the optimisations performed by TensorFlow. I definitely recommend it. Now, about
production readiness. It's very important these days, especially with deep learning advancing so fast, to be able to integrate these new models and breakthroughs into products, to actually make them available and useful to the people who use their phones or laptops every day. With TensorFlow it's actually very easy to train models in Python, which is ideal because it's very high level, and then developers can serve them from C++, which of course is very efficient and much better suited for production code. And because trained models can be reused, developers don't have to train models themselves: they can just use the ones that the researchers trained. As a concrete example, going back to Smart Reply in Inbox: in four months it went from a deep learning research project to a launched product that you can all use on your phone now. So definitely, having this short iteration cycle, and having the same tool used by everyone, helps a lot with moving much faster.
In conclusion for this first part (I think I rushed a bit because I was talking fast): machine learning is definitely changing the world. It's changing how we use our phones, how we use our computers, and how we think about which problems we can solve or not; a lot of problems that we thought we would not be able to solve are right now becoming easier and easier to crack. And the nice part is that you can be part of it. When you think about solving a problem, you should actually ask yourself: should I use machine learning for this? Can I use machine learning for this? There are a lot of tools out there, including TensorFlow, that are free, that have a lot of tutorials and a lot of documentation, and they can really help you get started. I think this is the take-home message, especially for those of you who are not already in this mindset: it's very easy to get started, and it's very easy to make an impact these days with all these available tools. So with that, I will take questions if you have some, and then we'll continue with the second talk.
you for the top. So I wanted to know is
is there any or other or this framework
does not open source that using the
will was not open source here I think
the difficult questions here I rather
not comment I would just say that it's
yeah yes so you see that things might
take more to get out there because
they're very high standards to make
things open source so for example the
distributed training was not in the
first open source please but it got
there now right. So that's that's what
I can say things are are getting up oh
Oh, thanks for the talk. This is not a question about the internals of Google: are there any projects where you tried TensorFlow and then decided not to use it?

Again, I'd rather not comment, but I don't think TensorFlow has any specific limitations. It's not like there are known problems with it; it is definitely very much used, and if there are problems, I'm sure that people are going to fix them. So I'd be very surprised, but again, it always comes down to asking.

Speaking about open source, maybe one question: there are a lot of contributions from external developers?

Yes, there are plenty of contributions, and we will go through this later as well. To the core repository there are plenty of external contributions, and also feature requests. The idea is: if you want something, don't just assume it's not there, just ask for it. For example, the way you specify your cluster for the distributed computation, which I'll talk about in the second talk, is a bit cumbersome today, so we are actually asking people what they want to see. So it's not only that we accept contributions: if you look at the GitHub repository there are actually a lot of very interesting ones, and people are collaborating, even meeting together, not just over computers but meeting to code together, and the patches get integrated into the repository. So definitely.
And do you know the algorithms for stochastic gradient descent that are used in the distributed version, the synchronous one and the asynchronous one? I mean, I have some candidates in mind, maybe for the synchronous case, and Downpour for the asynchronous one.

Well, you can specify the optimiser that you want to use; it's just the way the weight updates get applied that is different. So it's not that when you choose to do Downpour it will fix the algorithm for you, because when you build the computational graph you specify the optimiser. It's just how the updates get applied at the parameter server that changes.

But there's a constraint on that: depending on whether the executors can communicate with each other, there's a limitation on which kind of distributed algorithm you can actually apply to get the stochastic gradient right. Downpour, for example, is famous for the fact that it is not only asynchronous but that executors can communicate between themselves, which gives Downpour an edge in some cases. Whereas if you have a centralised parameter server with executors talking only to it and not to each other, that is something you would have to specify.

Let me actually go back. So I think it's less about the optimiser; the question is more about how you update these parameters. The optimiser will run here [on the workers], and then the optimiser tells you the updates that need to be made. In the asynchronous case these updates get applied in the order in which they arrive at the parameter server, and in the synchronous case they get combined in the usual way.
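To make the difference concrete, here is a toy, pure-Python sketch of a parameter server applying worker gradients. This is not TensorFlow code; every name in it is invented for illustration.

```python
# Toy sketch of a parameter server; hypothetical names, not the TensorFlow API.

class ParameterServer:
    def __init__(self, params, lr=0.1):
        self.params = list(params)
        self.lr = lr

    def apply_async(self, grads):
        """Asynchronous: each worker's gradient is applied as it arrives."""
        for g in grads:  # arrival order matters
            self.params = [p - self.lr * gi for p, gi in zip(self.params, g)]

    def apply_sync(self, grads):
        """Synchronous: wait for all workers, average the gradients, apply once."""
        n = len(grads)
        avg = [sum(gs) / n for gs in zip(*grads)]
        self.params = [p - self.lr * a for p, a in zip(self.params, avg)]

# Two workers, each holding a gradient for a two-parameter model.
worker_grads = [[1.0, 2.0], [3.0, 4.0]]

ps = ParameterServer([0.0, 0.0])
ps.apply_sync(worker_grads)  # one combined step
print([round(p, 6) for p in ps.params])  # → [-0.2, -0.3]
```

With `apply_async` the same two gradients would have been applied one after the other, giving a larger total step; in either case the optimiser that produced the gradients is whatever was specified when the graph was built.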
For the talk, I'm actually wondering: you mentioned C++ and Python, but what is needed to bring it all onto Android? Do you need to rewrite the forward pass also in Java?

So for Android there is an example that you can have a look at. There is some JNI code that you have to deal with, so you do have to work with Java and with this JNI layer, but there are actually examples online and it's not that much code. I've had a look at it; it's not that much, but you definitely have to deal with this way of calling C++ from Java.

Thanks for being with us today. If there is enough time, I would like to know how you use TensorFlow, or whether you are developing it: what are you doing with it in your job at Google, actually?

Okay, so I am not developing TensorFlow, but I am a successful and happy user. I specifically work with recurrent neural networks for NLP-related tasks, and this is what I've been using TensorFlow for, quite successfully, for some time now. I think the thing about it is that it depends a lot on where you come from. I knew Python from before, so that made it very easy to adopt TensorFlow. But because we have all these tutorials, I was actually surprised, looking at some of them, to find that you can easily get into this even if you don't know machine learning or a prior framework. So it's not only about switching from one framework to another; it's about encouraging people who currently don't use deep learning at all to start doing it, because, as we will see in some examples, it is actually surprisingly easy to use and gives a very good user experience. And because it is in Python, for the part that you use mainly for training and experimentation, you can easily integrate it with the things that you already use for Python data analysis. I'm that type of person, and I really like that I can use with TensorFlow the visualisation tools that I have been used to for the last five years. So I definitely see the Python part as a plus.
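As a rough illustration of that integration point, here is a stdlib-only toy; the generator below is invented and merely stands in for a real training loop.

```python
# Because the experimentation loop is ordinary Python, per-step metrics can
# flow straight into everyday analysis tools. Purely illustrative names.
from statistics import mean

def fake_training_steps():
    """Stand-in for a real training loop yielding a loss per step."""
    for step in range(5):
        yield 1.0 / (step + 1)  # pretend the loss decays

losses = list(fake_training_steps())
print(round(mean(losses), 4))  # → 0.4567
```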
It was also discussed yesterday: the license is Apache 2, yes, so it's kind of business friendly. Do you keep track of which companies are actually adopting TensorFlow in products? Or maybe you do not have a way of tracking that?

I'm not aware of anything for keeping track of it. But the point is to just put it out there and give as much support as possible,
so that people start using it.

Hi! I was wondering whether TensorFlow is parallelised on a single machine over multiple cores out of the box, or is it something that needs special configuration?

I think this is something that you have to configure yourself, yes.

For example, from my previous experience with Caffe, it's simply a matter of using a normal library for parallelisation, because you just launch things on multiple cores and they actually run in parallel; you get the speed improvement directly, like with one line.

By using OpenMP, you mean?

Well, I don't remember the library's name.

Okay, so I don't have an example of specifically how to use multiple processes, but if you have multiple GPUs on the same machine, for example, that is pretty easy. I'll show an example of how to distribute your graph: if you have two GPU cards or two CPU cards, that is very easy to do, basically one line.
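As a generic sketch of the "one line" library-level parallelism the questioner describes (plain Python standard library here, not TensorFlow's own mechanism):

```python
# Generic multicore parallelism with the standard library; this illustrates
# the questioner's point and is not how TensorFlow distributes work.
from multiprocessing import Pool

def expensive(x):
    """Stand-in for per-shard work, e.g. preprocessing a slice of data."""
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # spread the work over four cores
        results = pool.map(expensive, [10, 100, 1000])
    print(results)  # → [285, 328350, 332833500]
```

In TensorFlow itself the analogous "one line" is pinning part of the graph to a device, roughly `with tf.device('/gpu:0'): ...` around the graph-building code, as mentioned for the two-GPU case.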
Okay, so maybe we can go to the coffee break now, which is always a bit short anyway. Thank you again.

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
