Note: this content has been automatically generated.
are, as I said, the financial services
industry, so I have to see players, also
media and entertainment, and all
the telecoms are very interested in this
now. And also medical applications,
Siemens for example. I will have
a video coming in a minute, which
hopefully will work. CUDA 8 is
designed specifically for Pascal. I
mean, this is essentially everything, as
it says, that you need to accelerate your
applications. I'm not sure if
this is going to work. OK, I'm trying to
run this but it does not want to. OK, so
just to give you a bit more
insight into the DGX-1: obviously
this is a server in itself, but housing
eight Tesla P100s on board, and it will
basically give you
three times higher workloads with
the HBM2 stacked memory, and with the
page migration engine and NVLink
you have this virtually unlimited
memory space. So that's highlighting
the NVLink cube mesh and the
InfiniBand itself. So the unit itself is a
3U box; it actually has four power
supplies, with sixteen hundred
watts each. So it's quite a hefty machine,
but it is 3U, as I said. And what
we've done is, it's not just a hardware
unit: we also use containerisation to
include every piece of software that
you would effectively need. And that
obviously comes with CUDA 8, which
is everything on board that you
actually need, but also, let me see, OK,
so optimised for Pascal, the unified
memory I've mentioned, all the libraries
that you'd actually need. It also includes
nvGRAPH, as I mentioned earlier, and
all the basics such as
cuBLAS and cuSPARSE, and any developer
tools that you'd actually need. And the
way we do that is by using Docker,
or actually our own fork of Docker,
NVIDIA Docker. Initially it would
be DIGITS, cuDNN and then obviously CUDA,
and including Theano, Torch and
Caffe, so our own fork of Caffe. The
reason we have our own fork is because
we do software development a whole lot
quicker than the main branch
of Caffe, which we then integrate into.
We also have, initially,
TensorFlow, CNTK, Keras, Chainer
and MXNet, so we incorporate a lot
of the different frameworks in
containers
themselves. But because it's containers
you can essentially just tick a box and
say I'd like this, I'd like this and
I'd like this, and then it all just goes
on board as a bespoke system for you.
And obviously this is very
secure, as secure as cloud software
is today. There is an NVIDIA Docker
repository which you manage access for,
and when we have updates we essentially
send you an email and you click and
download. And we're also bringing in
some of our other NVIDIA software, things
like Iray, which is for rendering.
These are our core products
for graphics themselves, and all
those will be incorporated should
customers need them. And you get the
option of whether or not you want the
cloud-based, maintained, supported version
or you can just have a standalone
box, and it's
yours to run, and you do
your own updates, et cetera. So
essentially you would log
into an area called compute.nvidia.com
and you manage all your
software that way. You no
longer need to mess about with all the
different dependencies that are required
for the very specific deep learning
software that is out there, and
the hosting side is tending to be quite
popular. So that software
stack also comes with very simple
packages to include. So as
I said, obviously you've got the
GPU-optimised Linux, you've got the drivers
themselves, and on top of that
you've got our software, which again is
all optimised, DIGITS, and the actual
frameworks themselves, which are again
all GPU-optimised. So things are just
speeding up and speeding up and
speeding up. There are two separate modes
for the actual software side. I won't
dwell on this too much, because I
think you're more interested in the
research than the actual purchase, but
essentially that area that I mentioned,
the container hosting, allows
things like scheduling locally using
the scheduler, and
cloud cluster management. And this is
where we can handle all the updates and
security, and there are a lot of updates.
And I know, I'm watching this progress
in real time, and it really is incredible
just how quickly things are moving. So
it is a way to handle the updates, and
this is probably the best way. I'm
seeing a lot of people using things
like Docker to create
environments for research groups,
separate environments where perhaps
one of you is working on a CUDA 6
environment with the various things
that go with that, and somebody else is
testing on CUDA 8. It's a very
simple way of doing it. And in the
meantime the Tesla cards are still
available. So these provide the mixed
precision and are still very much in
demand for the accuracy side,
for mixed precision and double
precision, and again they will be around
for quite a long time. They're not
going anywhere, because it's really
about increasing productivity. And
there is also something that you
need to take into account: the whole
world out there, business and
industry, is actually trying to catch
up. Whatever level
of deep learning understanding
you are at
right now, everybody in this
room, you are probably two or three
steps ahead of a lot of businesses out
there. So it's all about
increasing the actual productivity.
It's about jumping on
this bullet train that is deep learning
and actually holding on and keeping up
with it, so it is a big thing to take
on. And simulations run on CPUs
today,
especially in big data centres,
can mean days, or days to
weeks, and you generally don't have the
visualisation capability while doing
that. If you switch to GPUs you actually
bring that simulation down to
hours, if not days, and you're
able to actually work with
visualisation and find out how it's going
while you're doing it. So it really does
make a big difference. I've included
that simply because HPC itself,
using GPUs, is really for
those big challenges. As I
mentioned earlier, I've just come back
from the first week of a six-week
programme using GPUs and deep learning
for asteroid detection, so we are now
able to address some of
life's really massive challenges. And
the work that you're doing, and are
hopefully about to learn how to do even
better, will allow us to create
solutions, or at least gain more
understanding, a lot quicker for some of
these grand challenges. The M40, as
I've mentioned a couple of times, is
currently available now, as opposed to
Pascal, which will be later this year
for the wider
public. The M40 is the single-precision
card. All the cards obviously have
both capabilities, but it was
developed for single precision, which is
really what deep learning needs, as
opposed to double. But it's also edging
up towards the larger scale and
distributed, and so we added
another twelve gigabytes, so there are now
twenty-four gigabytes on board. So that's
an over-three-thousand-core GPU that's
available today. And I mentioned
earlier the M4, so this is the
inference side. Deep
learning really has two main workloads,
which are the compute-intensive
training and then the inference. So
this is deploying to a mobile phone or
onto intelligent video, onto
small cameras for example. So we
essentially split the workloads and for a
Oh, so it is wonderful to be here.
I don't know how
anybody gets any work done in this
place, because it's so amazing outside,
first of all. There are a lot of
acronyms here, so I
apologise for that. This is a hardware and
software update, but I'm actually a deep
learning solutions architect for
NVIDIA, hence the DLSA. I actually have
the longest title that I know about,
because I'm also community manager
for EMEA, that is
Europe, the Middle East and Africa. But
deep learning is everywhere, as we know,
so that means that I have
an amazing job: I get to fly all over
the world, but I also get to see deep
learning and to advise on the
application of deep learning across the
board, in every kind of application. I've
had some really strange
applications, but I've also had some
very impactful applications that I'm
lucky enough to be involved in. And
because it really is everywhere, it is
altering the way that companies operate,
it is altering the way that we provide
services to our customers, and it's
also everywhere in the sense of, for
example, the cloud: a large number
of applications, a huge number of
startups popping up everywhere,
especially with AWS, another acronym, so
Amazon Web Services. A lot of
startups try to begin there because
it's just easier to actually implement.
And so there are a lot of areas:
we have image,
speech, language, sentiment analysis for
example, but also in medicine and
biology. That was my initial research:
using deep learning to find minute
features in histology images. So
prediction, even just a binary
classification of whether or not there
was a presence of a problem,
so that we could then flag that image to
go up to the human. I think a
really big part of deep learning is
human-assisted AI. So this is ourselves
being augmented by the amazing
capability of deep learning. But it's also
everywhere in media and entertainment,
as was actually shown to you
earlier, and you can go from just
driving into bollards, say, to being
able to drive down the road; given
enough data you can teach a machine.
I don't know if you can see the
sticker, but I've actually just come back
from starting off week one of a project
with NASA and the SETI Institute. So
deep learning is being very powerful in
the sense of tackling a terrible problem,
actually, by using deep learning
to detect, but also ultimately to
mitigate, asteroids, and potentially
hazardous asteroids. So that's probably
the best application that I've been
involved in, very exciting. Autonomous
machines and robotics are also ramping up
very much, and that was
actually an internship that I was lucky
enough to be involved in, turning
actual real-world individual robots into
virtual ones, putting them into a games
engine so you can then do simulations.
And it turns out, obviously, this is
everywhere that NVIDIA was already
involved in, because we grew up around
gaming, we grew up programming gaming
environments. So I'm seeing this
alignment, especially with things like VR:
everything's now coming together, and
it's great to be in this
position. Prior to the week over
in California with NASA, I was
actually at ICML, along with
a lot of people in this room. And
then just a few of the predictions that
colleagues and I came up with
from ICML and also CVPR;
there are a lot of conferences going on
about deep learning. So NLP:
definitely the best is yet to
come. There is a lot of work going
on there, as was actually shown just
now with the demo.
Life's going to get increasingly
easier the more we're able to just
speak to the gadgets
that we carry around with us all
the time. VR, as I mentioned, in
medical but also in drug discovery,
and meetings themselves: perhaps at some
point it'll be a virtual me standing
here and I won't have to make a five
thousand three hundred mile journey.
It's going to make life a whole lot
easier in that sense. I'm also seeing
mass deployment in the FSI, the
financial services industry, where
we are using the ability of deep
learning to take all the data that we
produce, and all the historical data that
these financial houses already hold,
and gain insight, and that
then leads to better products for us,
better services. As I mentioned,
reinforcement learning with robotics is
huge. And cyber security is really just
getting going now, with lots of
applications there; for any of you in the
audience particularly interested in that
side, there's probably a lack of skills,
actually, but a lot of movement now. And
also extremely low power budgets. So
Google's TPU, and also neuromorphic
computing. I've been following
neuromorphic now
for about two years and it's very
exciting. I'm also having discussions
with people about evolutionary
algorithms, and although I only did
perhaps a couple of modules, and many of
you know best about this, it's very,
very exciting what you can actually do
with genetic algorithms to speed up, or
even perhaps find, the correct
weights initially, and then
releasing those straight into the neural
networks themselves. Framework
universalisation: this is something
that I'm sure others can
go into in a bit more detail
tomorrow, but it's bringing
together this large-scale distributed
deep learning, and since we have deep
learning in the cloud, this is a very
big issue.
But again, we're still working, we
still have challenges, we're not there
yet. And I think it will be just an
evolutionary path and we'll keep on
learning, just as evolution is.
Unsupervised learning, and how do you
teach a machine common sense? It's
a big question, and I think we will
probably be trying to answer that for a
long time to come. There's always that
problem with generalisation, but I think,
you know, at the moment there are
certainly shockwaves from Tesla's
crash. And we have to get over hurdles
like that. It wasn't that long ago that
Virgin Galactic had their crash, and that
again set back things in the space
programme. Space is hard, deep learning
is hard. You know, we will get there, but
there will be bumps in the road. And also
anthropomorphic bias.
The best way to sort of
illustrate this is Boston Dynamics,
where in every single fantastic
video that they share of their
robots, they are kicking the robot,
or making it run round corners and
throwing banana peels in front of it. And
it's that sense that when we're dealing
with AI, with robotics,
with perhaps human form with
AI, we think of them as humans, and
that tends to put in serious biases
on the theoretical side of
things, as well as, you know, the obvious
effects. And then the singularity:
well, I'm not sure about twenty
twenty-five, which is the latest, but it
certainly makes for a very challenging
and exciting field. So we all know
about AlphaGo, and the
only reason I'm putting the slide up,
as everybody's heard about this, is
because of the actual impressive
progress that we're making, but also
because it uses an ensemble of, in this
case, convolutional neural networks,
reinforcement learning and Monte
Carlo tree search. Because the thing
about Go is that we did think it was
an intractable problem for
computers, and actually it is, and so
you actually cut and take a
branch of the actual tree, and this
is the Monte Carlo tree search. There are
ways to overcome challenges, and I
think that's what we're really good at
as humans. You tend to forget
that artificial intelligence is created
by us. So the interesting thing
with AlphaGo is putting all those
things together, and being able to
use AI to teach AI, to teach us.
All that together again brings
us back to the human-assisted
side. And, you know, my favourite is match
four and Lee Sedol's move seventy-eight,
where he knew that the AI would be
tricked by plagiarism, or copying an
actual strategy. So NVIDIA started out in
the gaming world, with hardware and GPUs,
and then we realised that we needed
to provide the software for it, and then
we were hit, just as everybody else was,
with this deep learning explosion. And so
we had to become a platform company
that enabled everyone, and what we did
was we worked, and still work, very
closely with everybody in the field.
We try to align everything we're
doing, and we talk and we work alongside
researchers to make sure that what
we're doing fits with what you're
doing, so we're all in alignment.
Basically we have to just do our best
to keep up with this revolution, and
it's also part of my job to
do that, to keep up with the field, and
it really is unprecedented. I think it
was February twenty fifteen when I
first downloaded Torch, and then the
DIGITS DevBox was released, and we
were on cuDNN 1. cuDNN is a
library of accelerated primitives, and
we're now on 5, actually 5.1;
I'm going to give you an update on that
later. And it really is unprecedented in
the software world just how quickly
all this is happening, and for those of
you who are in active deep learning
research, I'm sure you'll agree with me.
So we had to transition to becoming a
deep learning platform company. And so
how that looks, on a simplistic level, is
hardware along the bottom. The slide
probably needs updating now with the
latest DGX-1, but we've got our
actual DevBox here, which is still a
very popular unit because it's
literally a plug-and-play appliance,
and that has four Titan Xs in it.
And Tesla is our production side, so these
are highly qualified and well capable of
the intense training that's needed.
And Tesla is the deployment side,
whereas, you know, you could very simply
use a consumer card to do testing. We
also have Drive PX, which is the
automotive side. We now have an entire
side of the business based
on that, on self-driving cars and
automotive, and also our embedded side,
and I'll give a few more slides on that
later. On top of that lies the NVIDIA
Deep Learning SDK, but that comes with
everything else that NVIDIA does, which
is graphics and a whole lot
more. And then of course we have all the
frameworks, and there are very, very
many of them. These all provide
building blocks that essentially make
it easier for you. It means
that you don't have to be a ninja
coder; it's literally taking parts,
building layers and neural networks and
just calling functions. It really just
makes things a lot simpler. So that's
just a basic overview of the
platform itself, but the NVIDIA
SDK, as it shows here, includes all
game development, virtual reality,
autonomous machines and deep learning, so
there's a whole lot of
expertise, decades of expertise, that
has actually gone into this, as well as
the specialist deep learning SDK. And
that's all online, where you can just
access it in one place, with
the NVIDIA SDK all in one place. So:
powerful tools and libraries for
accelerating pretty much every part
of deep learning. And
this is a collaborative effort with
world-leading research and
commercial organisations worldwide.
They depend on this library,
they add to this library and they work
with us to make sure that every
single part of that SDK is the
best that it can be and is able to
adapt very quickly to all changes in
hardware, which is a slightly
slower pipeline than the actual
software, but it all has to align, and
has to align very quickly and rapidly,
with the huge progress that is
going on in deep learning. So,
every single major framework: it has
to be said that NVIDIA is run
by humans, and unfortunately we need to
sleep and eat, and there are only a
finite number of actual software
developers, so we can't cover every
single framework. There are forty or fifty
of them. And that comes down to
basically the fact that there are very
many programming languages and there
are very, very many applications. But we
started with Caffe
and now cover the
majority of these in the central
section here that you can see,
across the board, for as many different
applications as possible. We do need to
catch up a little on speech.
Along the bottom, the SDK includes a whole
variety of things that I'm actually going
to go into in slightly more detail. But it
is vital that the majority of this
work is performed in parallel. Obviously
training is a huge
computational challenge, and it is getting
harder the more data we're able to
actually throw in.
But the SDK brings in
all those already accelerated parts
of the code that you actually need,
to make life a lot easier for you. So
developing applications that
benefit from deep learning is
slightly different from traditional
software development. Software
engineers have to carefully craft
lots and lots of source code to
cover every possible input that the
application itself can receive.
At the core of a deep learning
application, much of that source code is
replaced by a neural
network. To build a deep learning
application you would first design,
train and validate a neural network to
perform a specific task. And that
could be anything: identifying vehicles
in an image, reading a speed limit sign,
for example, as the car's whizzing past,
or translating English to Chinese.
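The design, train and validate loop described here can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a single-neuron perceptron and a made-up toy dataset (both are hypothetical and not from the talk); a real application would use one of the frameworks mentioned and far more data.

```python
# Toy "design, train, validate" loop with a single-neuron perceptron.
# The data, learning rate and epoch count are illustrative assumptions.

def predict(weights, bias, point):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else 0

def train(samples, epochs=10, lr=1.0):
    """Perceptron rule: adjust the weights on every misclassified sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, label in samples:
            error = label - predict(weights, bias, point)
            weights = [w + lr * error * x for w, x in zip(weights, point)]
            bias += lr * error
    return weights, bias

def accuracy(weights, bias, samples):
    """Fraction of samples classified correctly."""
    hits = sum(predict(weights, bias, p) == y for p, y in samples)
    return hits / len(samples)

# Design: two input features, one binary decision.
train_set = [((0, 0), 0), ((4, 4), 1), ((1, 1), 0), ((5, 5), 1)]
# Validate on points that were not used for training.
valid_set = [((1, 0), 0), ((0, 1), 0), ((4, 5), 1), ((5, 4), 1)]

weights, bias = train(train_set)
print("validation accuracy:", accuracy(weights, bias, valid_set))
```

The point of the sketch is the shape of the workflow: the decision rule is learned from examples rather than written by hand, and it is only trusted after it has been checked on held-out data.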
So we also have DIGITS, and this
actually came about from CES,
I think in twenty fifteen, when we were
actually working on a self-driving car
demo. It was my colleague Luke who
developed it and is still developing on
it. So the progress
on it is slow, in the sense that
it's actually quite a small team. But
it's very
visual, so it does help you
if, for example, you're
just starting out, or if you need
something that is a lot more
visual and has a lot more
visual input. It provides a
graphical, workflow-based environment,
essentially, and it's just a really good
GUI. It allows you to design, to
train and to validate the actual network.
Obviously there is no one-size-fits-all
neural network; you have to
develop it around your own application
and your own type of input data.
And what we've also developed, and
I've got a couple of slides on this, is
GIE, the GPU Inference Engine, and this is
part of the actual workflow itself
within deep learning: you would
go from DIGITS, to create the actual
neural network, and then feed that in to
the actual inference side to provide even
more performance. So the trained neural
networks are essentially integrated
into a software application, which
feeds them new inputs to analyse
or infer on, based on the
actual training. And they can then
be deployed either on cloud servers, on a
mobile phone, or wherever you're working:
drones, automobiles, et cetera. So the
amount of time and power it
actually takes to complete inference
tasks is the most important
consideration. There is a whole lot of
work that goes into the deep learning
side, on the training side, but
ultimately what we're trying to do is
provide a solution to customers. So
that's like the demo that was shown, where
you could potentially ask
questions and get feedback.
Inference is a really important part.
You have to understand that
research is one thing, but that then has
to translate into the commercial world
and be productised,
hence the amount of startups
that I'm seeing at the moment and
also provide support for. It
determines both the quality of
the user experience and the cost of
deploying the application: how much
performance you can actually get from
the inference side. And so having an
energy-efficient and certainly
high-throughput application is
absolutely vital, and so we've been
listening to the people that we've been
working with and have therefore put
together GIE to help. This
essentially automatically optimises
the trained networks that you have
for run-time performance. We have
figures of sixteen times more
performance per watt on the M4.
I'll explain what the M4 is later,
but this is a specific card, about
a third of the size of the Maxwell
M40 card, specifically for
inference, because it only takes
between fifty and seventy-five watts to
actually power, as opposed to about two
hundred plus for average cards.
Cloud providers are therefore able to
more efficiently process images and
video in hyperscale data centre
environments, which is one of the major
parts of what we do.
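To make the performance-per-watt argument concrete, here is a small arithmetic sketch. The power figures are the ones quoted in the talk (up to 75 W for the inference card, roughly 200 W for a typical card); the throughput value is a hypothetical placeholder, not a measured number.

```python
# Performance per watt = throughput / power draw.
# Power figures come from the talk; the throughput is a made-up placeholder.

def perf_per_watt(images_per_second, watts):
    """Throughput delivered for each watt consumed."""
    return images_per_second / watts

HYPOTHETICAL_THROUGHPUT = 600.0  # images/s, assumed equal for both cards

low_power = perf_per_watt(HYPOTHETICAL_THROUGHPUT, 75.0)   # upper M4 figure
typical = perf_per_watt(HYPOTHETICAL_THROUGHPUT, 200.0)    # "two hundred plus"

print(f"low-power card: {low_power:.1f} images/s/W")
print(f"typical card:   {typical:.1f} images/s/W")
print(f"advantage:      {low_power / typical:.2f}x")
```

At equal throughput the power gap alone accounts for a factor of between 200/75 and 200/50; any larger efficiency figure would also need higher throughput from an optimised inference engine.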
But also, when you think of the
amount of work that's now going into
automotive and embedded deployments,
it's really essential to have a
powerful, high-performance but
low-power way to do inference,
maximising throughput, which is more
often than not the bottleneck in any
pipeline. But we also have to train with
popular frameworks and deploy to
production level, having more
accurate models but with lower latency,
and so leveraging FP16 and
being able to get those responses fast.
Nobody wants an application on a
mobile phone that is really cool but
takes three or four seconds to respond.
That sounds like hardly anything, but
when you're actually
using your phone you
don't want to be waiting for that
long. So hence we have GIE. So the
deep learning software itself: DIGITS is
already now on its iteration number
four, again fairly unprecedented. We have
GIE, and cuDNN, as I mentioned.
This is CUDA for deep neural
networks. Does anybody not know what CUDA
is? OK, I'm just checking. And so CUDA
for deep neural networks is now
at 5.1, and
5.1 is basically additional
performance for ResNets, that's
residual nets, and also VGG-style
networks. DIGITS 4 is bringing
in additions so you can essentially
perform hyperparameter sweeps of
learning rate and batch sizes
for improved accuracy, and it is also
bringing in object detection. And
this is to work in line, in the pipeline,
with GIE but also GRE, sorry about all
the acronyms, so that's our GPU REST
Engine, which I'm not going to provide too
much detail on here, but again it's just
another tool to aid deployment.
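A hyperparameter sweep of the kind mentioned for DIGITS 4 amounts to trying every combination on a grid and keeping the best one. The sketch below is plain Python and not the DIGITS interface; `validation_score` is a hypothetical stand-in for a full training run, and the grid values are assumptions.

```python
from itertools import product

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for 'train a model, return validation accuracy'.

    It simply peaks at learning_rate=0.01 and batch_size=64 so the sweep
    has something to find; a real sweep would train a network here.
    """
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000

def grid_sweep(learning_rates, batch_sizes):
    """Evaluate every (learning rate, batch size) pair and return the best."""
    results = {
        (lr, bs): validation_score(lr, bs)
        for lr, bs in product(learning_rates, batch_sizes)
    }
    best = max(results, key=results.get)
    return best, results

best, results = grid_sweep([0.1, 0.01, 0.001], [32, 64, 128])
print("best (learning rate, batch size):", best)
```

Each grid point is an independent training job, which is why sweeps pair naturally with multi-GPU machines: the jobs can simply be spread across the available devices.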
You essentially take in data
management, and the options are there
in DIGITS to make things simple for
you, into DIGITS and then straight
into deployment, so it's
just an end-to-end, hopefully
seamless, production line. Creating
models and network design: there is the
full visualisation that you actually get.
As it says, you can process, you can
configure and monitor the progress,
especially with multi-GPU. When
DIGITS was first brought out
it was single-GPU, and it very quickly
then went to multi-GPU, which is why we
brought out the DIGITS DevBox,
which had four Titan Xs, so it was
designed to go for four. With the latest
iteration, the DGX-1, we're now
looking at eight, and many people are
using multiples, so twenty-eight
and more GPUs. And you have the
capability to stop, start and just do
multiple jobs in training, and
keep an eye on what's going on
in your prototyping. So DIGITS
is bringing in being able to track
datasets, results and trained neural
networks. You can analyse the accuracy and
the loss in real time, all while you're
actually putting together the networks
you're using. And object detection is a
brand-new thing we brought in in time for
ICML. And this slide is showing just a
few of the applications. We've been
working with Kespry for quite a long
time; Kespry are a drone manufacturer.
The top left is indicating a commercial
use that we're working on, where we're
essentially using the aerial data so
that clients can use these
drones to calculate distances, sizes and
volumes, and essentially be able to
classify the different types of vehicles
that are driving around the actual
quarry area. But also,
you can see here, we're doing a
whole lot of work with intelligent
video analytics and surveillance,
and we work with a lot of people:
Herta Security, one of the big names
for crowd surveillance, and I'm
personally involved with a London
operation called Facewatch, which is
a bit of a take on Facebook.
Essentially what they're doing is
using intelligent video analysis,
and that's incorporated into the existing
infrastructure of cameras throughout
the actual city, working with
the police, so that the second a
crime takes place people can report it
on a mobile phone app, and then that
gets uploaded straight away. You have
instant facial detection going from the
cloud to the actual database of
criminals, and it's really
speeding things up. It's also things
like safety on
train platforms, and a host of other
things, including medical diagnostics. So,
for example, GE Healthcare's new
ViosWorks is just an excellent
diagnostic tool to help diagnose heart
disease, and there are many, many other
similar approaches in the medical world.
cuDNN 5.1: OK, so this is now
introducing, and the point
one is to add, support for
ResNets and VGG, but cuDNN 5
itself brought in support for LSTM and
GRU, that's gated recurrent units, a type
of recurrent neural network, and also
basic recurrent neural networks. And it is
also optimised for Pascal, which is our
latest chip architecture, and for FP16,
so that's lower-bit
precision, which means less memory
usage and increased performance.
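The memory saving from FP16 is easy to quantify: each value takes two bytes instead of four. A minimal sketch using Python's standard `struct` module, where the layer size is a hypothetical example:

```python
import struct

# 'e' is IEEE 754 half precision (FP16), 'f' is single precision (FP32).
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")

def tensor_megabytes(num_values, bytes_per_value):
    """Storage needed for a tensor of the given size, in MB (10**6 bytes)."""
    return num_values * bytes_per_value / 1e6

# Hypothetical layer with 25 million weights.
num_weights = 25_000_000
print("FP32:", tensor_megabytes(num_weights, FP32_BYTES), "MB")
print("FP16:", tensor_megabytes(num_weights, FP16_BYTES), "MB")

# The price is precision: FP16 keeps roughly three decimal digits.
roundtrip = struct.unpack("<e", struct.pack("<e", 0.1))[0]
print("0.1 stored as FP16:", roundtrip)
```

Halving the bytes per value also halves the memory traffic needed to move the same tensor, which is one reason lower precision tends to help performance as well as capacity.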
For popular convolutions like three by
three filters, and for
3D convolutions themselves, performance
is also very much improved with the
libraries in cuDNN 5, so that is
pretty much a speed-up in any domain
using volumetric data, which includes a
lot of medical applications, so a great
move forward. And there are essentially
faster forward and backward
convolutions, and also support for
Winograd, which is a very optimised
algorithm. I've mentioned VGG
and ResNet, so that's the point
one side, where effectively we're
looking at a five point nine times
speed-up on that, but 5.1 also
gives almost another triple
faster training on the M40.
Those numbers are versus
cuDNN 4. OK, you'll
get a copy of these slides, so don't try
to memorise this. But this is
great: we have an online blog called
Parallel Forall, which my colleague
insists has to be of the highest
technical content, so it's no light read,
but if you can, actually check out
that URL there for optimising
recurrent neural networks with
cuDNN 5, to give you a
bit more explanation. But as
iteration n of a given layer only
depends on iteration n minus one
of that layer and
iteration n of the previous layer, it is
therefore possible to start on a layer
before you've even finished on the
previous layer. This is actually a really
powerful thing when you talk
about using GPUs: if there are two
layers, there is essentially twice as much
parallelism available. And these are the
updates that colleagues are working on
with cuDNN, in so far as using GPUs to
get the best performance out of
recurrent neural networks, which
often have to expose much more
parallelism.
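The dependency rule described here can be turned into a schedule. In the sketch below (plain Python with hypothetical function names; it illustrates the idea and is not how cuDNN is implemented), cell (layer, step) needs cell (layer, step - 1) and cell (layer - 1, step), so all cells with the same layer + step are independent and could run at once.

```python
def wavefront_schedule(num_layers, num_steps):
    """Group RNN cells (layer, step) into waves that can run concurrently.

    A cell depends on the same layer at the previous step and on the
    previous layer at the same step, so cells with equal layer + step
    have no dependencies on each other.
    """
    waves = []
    for wave in range(num_layers + num_steps - 1):
        cells = [
            (layer, wave - layer)
            for layer in range(num_layers)
            if 0 <= wave - layer < num_steps
        ]
        waves.append(cells)
    return waves

def is_valid(waves):
    """Check that every cell's dependencies finish in an earlier wave."""
    done = set()
    for cells in waves:
        for layer, step in cells:
            if step > 0 and (layer, step - 1) not in done:
                return False
            if layer > 0 and (layer - 1, step) not in done:
                return False
        done.update(cells)
    return True

waves = wavefront_schedule(num_layers=2, num_steps=4)
for index, cells in enumerate(waves):
    print(f"wave {index}: {cells}")
print("cells:", 2 * 4, "waves:", len(waves))
```

With two layers, every wave except the first and last contains two cells, which is the "twice as much parallelism" from the talk; deeper networks widen the waves further.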
That is the work we have been doing
with cuDNN 5. NCCL is a package that's
out there on GitHub, so feel free to
use it. It is a work in progress, but
it is essential for anyone who's more
familiar with MPI as opposed to CUDA.
This is, as it says, patterned after
MPI collectives and accelerated
specifically for multi-GPU. Multi-GPU
means that you have to work on that
data flow and on bringing down
latencies, so that's really good.
Again, it's open source on GitHub; you
can go and check that out.
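
To make "collectives" a little more
concrete, here is a pure-Python
simulation of a ring all-reduce, one
well-known way of implementing the
all-reduce collective across several
devices. It is a sketch only: it does
not use NCCL's API and makes no claim
about how NCCL is implemented
internally. The name `ring_allreduce`
and the one-number-per-chunk layout are
assumptions made for the example.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over n ranks.

    buffers[r] is rank r's local data, split into n chunks (one number
    per chunk here). Every rank ends up with the element-wise sum.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]
    assert all(len(b) == n for b in data), "need one chunk per rank"

    # Phase 1, reduce-scatter: after n - 1 steps rank r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        messages = [((r + 1) % n, (r - step) % n, data[r][(r - step) % n])
                    for r in range(n)]
        for dst, chunk, value in messages:
            data[dst][chunk] += value

    # Phase 2, all-gather: pass the completed chunks around the ring.
    for step in range(n - 1):
        messages = [((r + 1) % n, (r + 1 - step) % n,
                     data[r][(r + 1 - step) % n])
                    for r in range(n)]
        for dst, chunk, value in messages:
            data[dst][chunk] = value

    return data


if __name__ == "__main__":
    print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

Each rank only ever talks to its
neighbour, so the data sent per rank
stays roughly constant as ranks are
added, which is why this pattern is
popular for multi-GPU work.
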
nvGRAPH: this is actually available
through early access. I don't know if
everybody's aware, and I mentioned this
at the start, but if you apply at
developer.nvidia.com we can provide
early access to a lot of the software
that we push out, so you're then part
of the process of testing before it is
actually sent out to production. You
can either get in touch with me or
literally just go online to
developer.nvidia.com for that.
nvGRAPH, I believe, is still early
access at the moment, but it is very
much needed, in the sense that as we
get more and more data, big graph
analytics is becoming a huge deal and
we need to be able to cope with that
with GPUs at the same time. I believe
some of the figures were two and a
half billion edges being computed on a
Maxwell M40 in about fifty
milliseconds an iteration. So you can
get hold of that; it's via early
access, I believe, but again, another
great piece of work. cuSPARSE: this is
primarily for NLP, natural language
processing; that's where you'd be
using cuSPARSE. Again, you can just
look it up on developer.nvidia.com. It
essentially takes dense matrices by
sparse vectors. OpenACC: this is
really more for large industry, where
you have, say, a C code that you
cannot just take and re-parallelise
the whole lot. You can take sections
of the code and simply use these
pragmas to surround and parallelise
certain sections of it. This is really
useful, and there's a whole lot of
people putting this in play to speed
up parts of code. Again, that's open
source. So
this, I suppose, is the question, if
you don't mind, that I'm leaving up to
Soumith, and I'll leave it to you as to
who has, or who you think has, the
largest market share. I would
currently say that it's Google's
TensorFlow and then Facebook's Torch,
which is a Facebook consortium with
Twitter and DeepMind et cetera. It's
still a two-horse race, but everybody
has their preferences when it comes to
frameworks. And really the big
question is: what is the preferred
method of GPU allocation when it comes
to using applications? These
frameworks are there for you, to aid
you with building blocks, but you
really have to bring into play how you
use GPUs. So my question to Soumith is:
can we introduce heuristics into
frameworks like Torch, Caffe or
TensorFlow so that we can
automatically configure them, perhaps
bringing in concepts from, I don't know
if anybody's familiar with it, the "one
weird trick" paper from Alex
Krizhevsky, where you essentially use
data parallelism in the convolutional
layers and then model parallelism in
the fully connected layers? Maybe we
can start using AI, and perhaps even
some heuristics, to make that
automatic configuration easy. Is that
something that we can work on?
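
The simplest version of such a
heuristic is just the rule quoted from
the paper: data parallelism for
convolutional layers, model parallelism
for fully connected ones. The sketch
below is purely hypothetical; the
`Layer` type, the `kind` strings and
`choose_parallelism` are invented for
illustration and are not part of Torch,
Caffe or TensorFlow.

```python
from collections import namedtuple

# Hypothetical layer description: a name plus "conv", "fc" or other.
Layer = namedtuple("Layer", ["name", "kind"])


def choose_parallelism(layers):
    """Pick a parallelisation strategy per layer.

    Rule as described in the talk: convolutional layers use data
    parallelism, fully connected layers use model parallelism.
    Anything else falls back to data parallelism (an assumption).
    """
    plan = {}
    for layer in layers:
        if layer.kind == "fc":
            plan[layer.name] = "model"
        else:
            plan[layer.name] = "data"
    return plan


if __name__ == "__main__":
    network = [
        Layer("conv1", "conv"),
        Layer("conv2", "conv"),
        Layer("fc1", "fc"),
        Layer("fc2", "fc"),
    ]
    for name, strategy in choose_parallelism(network).items():
        print(name, strategy)
```

A real framework would need far more
than a type check, for example
parameter counts and activation sizes,
which is the open question being put
to the audience.
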
Who has the largest market share really
does come down to usability, I think,
but also, and I think Soumith would
agree with me, it's support as well.
That's a big deal: a framework needs
to be able to provide support to the
user. There are, as I said earlier,
many; this is just a sample of how many
different ones there are. It really
comes down to the application that
you're working on and whatever
language you're familiar with. To be
honest, I'd never used Lua, and I
picked up Torch within about a week;
it's very simple. And, as you can see,
you can pick: if you're very familiar
with MATLAB, then Andrea Vedaldi at
Oxford has created MatConvNet for you,
and that is also a little bit
GPU-friendly. It's really down to your
choice. But I'm seeing this quite a
lot at the moment. And
there was the Spark Summit recently in
San Francisco, and Databricks, which is
Spark's framework, shall we say, are
basically going to bring GPU instances
into Databricks very soon. So this is
on Spark, and also TensorFlow now, with
TensorFrames. This is again different:
very large-scale distributed deep
learning, and it's nice to bring in.
I'm seeing a lot of Spark and a lot of
talk about Spark. In fact, someone (I
forget his name) gave a talk at the
Spark Summit in San Francisco, and at
one point he made a comment: there
were so many questions about deep
learning itself that he asked, you
know, has anybody got any questions
about Spark? Because it was really all
about deep learning and how to
actually integrate it. So what's nice
is there's a lot happening there. So,
just to give a quick
update, allowing for time, on the deep
learning hardware. As you know, at GTC
2016, our GPU Technology Conference in
April, we introduced Pascal, or its
codename, P100. This is the
culmination of about three years of
investment and research and
development. It essentially packs
roughly five, ten or twenty teraflops
in double, single or half precision
respectively, and it's actually built
to go alongside CUDA 8, which isn't
yet fully out. It uses
chip-on-wafer-on-substrate technology,
so HBM2, high-bandwidth memory, stacked
memory. It is currently four-by-four
stacked memory; it will be eight by
four eventually, so that's 32 gigs on
board. And Pascal essentially
unifies processor and data in a single
package to deliver really
unprecedented compute efficiency.
About a terabyte per second is
approachable with NVLink technology, so
this is like a hybrid; I've got a
brief video to show you to explain it
a little bit more. And for scaling
across multiple GPUs there is
basically a fivefold increase in the
actual interconnect bandwidth. But of
course fast hardware is nothing
without the optimised software, and so,
working alongside DL developers, we've
essentially brought everything
together onto this one card.
Unfortunately Pascal isn't widely
available right now, because we've had
some major supercomputer clients that
have really made this card a premium.
We're hoping that by the end of the
year it will be available to the wider
audience. In the meantime we've got
Maxwell M40 cards, which now have 24
gigabytes on board. But Pascal, in a
nutshell, as it says here: atomic
memory operations are really important
in parallel programming. They
essentially allow concurrent threads
to correctly perform
read-modify-write operations on shared
data structures.
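
As a CPU-side analogy only (Python
threads and a lock, not GPU hardware
atomics), the sketch below shows what
an indivisible read-modify-write buys
you: many threads update one shared
counter and no update is lost. The
class and function names are made up
for the example; `add` returns the
previous value, in the style of CUDA's
`atomicAdd`.

```python
import threading


class AtomicCounter:
    """Counter whose add() is an indivisible read-modify-write."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # The lock makes read, modify and write a single step, so no
        # other thread can slip in between the read and the write.
        with self._lock:
            old = self._value
            self._value = old + amount
            return old

    def load(self):
        with self._lock:
            return self._value


def run_workers(n_threads=8, adds_per_thread=1000):
    """Have several threads increment one shared counter."""
    counter = AtomicCounter()

    def worker():
        for _ in range(adds_per_thread):
            counter.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.load()


if __name__ == "__main__":
    print(run_workers())
```
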
RDMA in GPUDirect was actually
introduced back with, I think, the
GK110. At the moment we have, just to
summarise, pre-Kepler, then Kepler,
Maxwell and now Pascal architectures.
The RDMA feature was actually
introduced way back then, and it
essentially allows third-party
devices, so InfiniBand for example, or
SSDs, to directly access the memory on
multiple GPUs within the same system,
eliminating unnecessary memory copies.
That obviously then dramatically
lowers CPU overhead and decreases
latency across the board, which is
really vital. So you'll
get these slides anyway, so I'm not
going to harp on too much about that,
but doubling the bandwidth of GPUDirect
is just very important to many use
cases, especially deep learning,
because deep learning machines have a
very high ratio of GPUs to CPUs: in
some cases eight, in some cases a
hundred and twenty-eight GPUs to the
one CPU. So it is very important for
the GPUs to interact quickly without
falling back to the CPU for data
transfer. It really does make a big
difference. And these are just some of
the reasons why, unfortunately, Pascal
isn't that widely available at the
moment: they are currently powering
the CORAL systems, Summit and Sierra,
so Lawrence Livermore and so on. The
world is essentially aiming for
exascale computing. I think Lawrence
Livermore is working at more than
about a hundred petaflops. This is
also IBM POWER plus GPUs, all
interconnected with
NVLink. So the P100, just to go
slightly more into it, and HBM2: it
essentially includes multiple vertical
stacks of memory dies. There is a
passive silicon interposer that
connects the stacks and the GPU die. It
connects to four HBM2 stacks; it will
be eight in the next iteration, so 16
gigs, soon to be 32. And there are two
512-bit memory controllers that
connect to each of those HBM2 stacks,
which gives you overall a very
effective 4096-bit HBM2 memory
interface, which basically means it's
quick.
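
The figures quoted for the memory
system multiply out as follows. This is
just the arithmetic from the talk; the
4 GB per stack value is an assumption
derived from 16 GB across four stacks,
and the function names are made up for
the example.

```python
def hbm2_interface_bits(stacks, controllers_per_stack=2,
                        bits_per_controller=512):
    """Total memory interface width in bits."""
    return stacks * controllers_per_stack * bits_per_controller


def hbm2_capacity_gb(stacks, gb_per_stack=4):
    """Total capacity, assuming 4 GB per stack (16 GB over 4 stacks)."""
    return stacks * gb_per_stack


if __name__ == "__main__":
    # Four stacks, two 512-bit controllers per stack.
    print("interface width:", hbm2_interface_bits(4), "bits")
    print("capacity today:", hbm2_capacity_gb(4), "GB")
    # Eight stacks, as described for the next iteration.
    print("capacity with eight stacks:", hbm2_capacity_gb(8), "GB")
```
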
And to avoid network scaling issues,
GPU compute tasks can now be
interrupted at the instruction level,
so the context is essentially swapped
to the GPU DRAM. Compute preemption
means that when you have long-running
or badly behaving applications, they no
longer monopolise the actual system,
especially if it's sharing the GPU for
computation and display, which a lot
of applications can; obviously you can
use the compute and the visualisation
with the same card. It also permits
interactive debugging of compute
kernels, at a much lower granularity
than before. And it also extends
unified virtual memory and page
migration. This is all Pascal; this is
all part of the actual chip itself. It
extends GPU addressing capabilities:
it's now large enough to cover the
48-bit virtual address spaces of modern
CPUs as well as the GPU's own memory,
obviously. And the unified memory
allows the program to access the full
address spaces of all the CPUs and GPUs
in the system as a single virtual
address space, so this is to some
extent unlimited by the physical
memory size. You effectively have an
unlimited amount of memory.
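
A quick back-of-the-envelope check of
those address-space sizes. The 48-bit
CPU figure is from the talk; the 49-bit
unified address width and the 16 GiB
per GPU used below are assumptions
added for the example (49 bits is the
figure usually quoted for Pascal).

```python
GIB = 2 ** 30
TIB = 2 ** 40


def address_space_bytes(bits):
    """Number of bytes addressable with the given address width."""
    return 2 ** bits


cpu_space = address_space_bytes(48)      # from the talk
unified_space = address_space_bytes(49)  # assumption, not stated in the talk
gpu_memory = 16 * GIB                    # assumption: one 16 GiB card


def fits_in_unified_space(num_gpus):
    """Can the CPU address space plus all GPU memory be covered?"""
    return cpu_space + num_gpus * gpu_memory <= unified_space


if __name__ == "__main__":
    print("48-bit space:", cpu_space // TIB, "TiB")
    print("49-bit space:", unified_space // TIB, "TiB")
    print("CPU space plus 8 GPUs fits:", fits_in_unified_space(8))
```
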
I'm going to run through this; I don't
think you need it all. Moving swiftly
on. Okay: CUDA 6 (we're about to bring
out CUDA 8, right) actually first
introduced unified memory, essentially
creating a pool of managed memory
that's shared between the CPU and the
GPU, bridging the divide. It is
accessible using a single pointer, so
that makes it a bit more fun. You've
got simpler programming and memory
models, and unified memory itself
lowers the bar for entry to parallel
programming. Then we introduced, at GTC
this year in April, the DGX-1. I
believe Jensen also gave some of them
away to pioneering research labs, and I
believe Stanford has already got
theirs. This is essentially a hundred
and seventy teraflops, that's FP16, and
about seven hundred and sixty-eight
gigabytes per second aggregate
bandwidth, which you could equate to
about two hundred and fifty servers in
the data centre. And I do have a video.
But just
a couple more slides give you a little
bit more depth on the topology for the
eight Pascals in the DGX-1 in this
case, and a bit more detail on the
cube-mesh topology that we've used.
There is actually no need to use PCIe
for the GPU-to-GPU communications. Your
primary concerns when you're arranging
PCIe are to maximise host-to-device
and network bandwidth, depending on
the application, and also to ensure
that the four GPUs in each quad (so
that's zero to three and four to
seven, the two quads) are attached to
the same CPU. This then gives
basically a logical subdivision of the
eight-GPU cube mesh into these two
quads. And you have NVLink, fully
featured, which gives full connection
for the four P100s in each of those
two quads, all tied together by four
additional NVLinks.
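
The topology as described (two fully
connected quads of four GPUs, tied
together by four extra links) can be
written down as a small graph and
checked. This is a sketch under
assumptions: the talk does not say
which GPUs the four cross links join,
so pairing GPU i with GPU i + 4 is a
guess, and all names are invented for
the example.

```python
from collections import deque
from itertools import combinations


def build_links():
    """Return the set of GPU-to-GPU links for the eight-GPU mesh."""
    quads = [(0, 1, 2, 3), (4, 5, 6, 7)]
    links = set()
    for quad in quads:
        # Full connection inside each quad: 6 links per quad.
        links.update(combinations(quad, 2))
    # Four additional links between the quads (pairing is assumed).
    links.update(zip(*quads))
    return links


def adjacency(links):
    """Map each GPU to the set of GPUs it is directly linked to."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def max_hops(adj):
    """Largest number of links between any two GPUs (BFS from each)."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    adj = adjacency(build_links())
    print("links per GPU:", sorted(len(n) for n in adj.values()))
    print("max hops between GPUs:", max_hops(adj))
```

Under this pairing every GPU uses four
links, and any GPU can reach any other
in at most two hops without touching
PCIe.
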
Overall, this is just a whole lot of
interconnect improvement that has gone
on across the board to really lower
the latency, because the one thing
about deep learning is that there are
a lot of data transfers in the
training, when you're actually
teaching these machines to learn
specific tasks. There's a whole lot
more detail online, and I'll send you
the links too. So really DGX-1 is
primarily a unit for HPDA, as it's
known, which is high-performance data
analytics. There are a lot of
different businesses and customers now
with this requirement: to work through
all the data that they actually have
in house and to extract insight, gain
insight, further and further and much
deeper than we were ever capable of
before. So it has essentially got
seven terabytes of SSD on board, and
then it can obviously be attached to
an NFS file system. And then the
typical use cases I'm seeing there


Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
