Note: this content has been automatically generated.
00:00:00
Oh, it is wonderful to be here — and my first comment is really that I don't know how anybody gets any work done in this place, because it's so amazing outside. First of all, there are a lot of acronyms here, so I apologise for that. There are hardware and software updates coming, but I'm actually a Deep Learning Solutions Architect for NVIDIA, and the DLSAs actually have the longest title that I know about, because I'm also community manager, here for EMEA — Europe, the Middle East and Africa — but deep learning is everywhere, as we know.
00:00:42
That means that I'm sort of lucky — it's an amazing job: I get to fly all over the world, but I also get to see deep learning, and to advise on the application of deep learning, across the board, in every kind of application. I've had some really strange applications, but I've also had some very impactful applications that I'm lucky enough to be involved in. Because it really is everywhere: it is altering the way that companies operate, it is altering the way that we provide services to our customers, and it's also everywhere in the sense of, for example, the cloud. A large number of applications, a huge number of startups, popping up everywhere — especially with AWS, another acronym, Amazon Web Services — a lot of startups begin there, because it's just easier to actually implement.
00:01:40
And so there's a lot out there: we have image, speech, language and sentiment analysis, for example, but also everything in medicine and biology. That was my initial research — using deep learning to find minute features in histology images. So prediction: even just a binary classification of whether or not there was the presence of a problem, so that we could then flag that image to go up to a human. I think a really big part of deep learning is human-assisted AI — this is ourselves being augmented by the amazing capability of deep learning.
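None of the speaker's actual models appear in the talk, but the idea just described — a binary classifier whose positive predictions are flagged for human review — can be sketched on synthetic data. The single "feature" below stands in for whatever a real network would extract from a histology image; the data, learning rate and iteration count are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for an image feature: abnormal tissue scores higher on average.
healthy = rng.normal(0.0, 1.0, 200)
abnormal = rng.normal(3.0, 1.0, 200)
x = np.concatenate([healthy, abnormal])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = problem present

# Logistic regression trained by plain gradient descent.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

p = 1 / (1 + np.exp(-(w * x + b)))
flagged = p > 0.5            # images routed up to the human expert
accuracy = np.mean(flagged == y)
```

The point is the workflow, not the model: the classifier only decides *which* images a pathologist should look at.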
00:02:24
But it's also everywhere in media, and — as was actually shown to you earlier — in driving: given enough data, you can teach the machine to drive. I've actually just come back from starting off week one of a project with NASA and the SETI Institute. Deep learning is proving very powerful there, in the sense that what was an intractable problem is being tackled by using deep learning to detect — and ultimately to mitigate — asteroids, potentially hazardous asteroids. So that's probably the best application that I've been involved in.
00:03:13
Very exciting. Autonomous machines and robotics are also ramping up very much, and that was actually an internship that I was lucky enough to be involved in: taking the actual characteristics of individual robots and putting them into a game engine, so you can then do simulations. And it turns out, obviously, this is everywhere that NVIDIA was already involved, because we grew up around gaming — we program gaming environments. So I'm seeing this alignment, especially with things like VR; everything's now coming together, and it's great to be in this position.
00:03:50
Just prior to this I spent the week over in California — I was actually at ICML, along with a lot of people in this room. And here are just a few of the predictions that colleagues and I came up with from ICML, and also CVPR — there are a lot of conferences going on about deep learning. So, NLP: we know the best is yet to come. There is a lot of work going on there, and as you were actually shown just now with the text-to-speech demo, life is going to get increasingly easier the more we're able to just speak to the gadgets that we carry around with us all the time.
00:04:35
We are, as I mentioned, in medical, but also in drug discovery. Meetings themselves, perhaps: at some point it'll be a virtual me standing here, and I won't have to make a five-thousand-three-hundred-mile journey. It's going to make life a whole lot easier in that sense, but we're also seeing mass deployment in the financial services industry, where we're using the ability of deep learning to take all the data that we produce, and all the historical data that these financial houses already hold, and gain insight — and that then leads to better products for us, better services. As I mentioned, reinforcement learning with robotics is huge.
00:05:23
And cybersecurity is really just getting going now — lots of applications there. For any of you in the audience particularly interested in that side, there's probably a lack of skills, actually, but a lot of movement now.
00:05:36
And also extremely low power budgets: Google's own TPU, and also neuromorphic computing, which I've now been following for about two years — it's very exciting. I'm also having discussions with people about evolutionary algorithms, and although I only did perhaps a couple of modules — you know best about this — it's very, very exciting what you can actually do with genetic algorithms: to speed things up, even perhaps to find the correct weights initially, and then release them straight into the neural networks themselves.
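As a toy illustration of that idea — using a genetic algorithm to find a good weight before any gradient-based training — here is a minimal sketch that evolves a single weight `w` to fit `y = 3x`. The target function, population size and mutation scale are all invented for the example, not anything from the talk:

```python
import random

random.seed(0)
xs = [0.5, 1.0, 2.0, 3.0]
ys = [3 * x for x in xs]          # the "correct" weight is w = 3

def loss(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

# Evolve: keep the 5 fittest, mutate each into 3 offspring.
pop = [random.uniform(-5, 5) for _ in range(20)]
for _ in range(100):
    pop.sort(key=loss)
    parents = pop[:5]                               # elitism: best survive
    pop = parents + [p + random.gauss(0, 0.3)       # Gaussian mutation
                     for p in parents for _ in range(3)]

best = min(pop, key=loss)         # a good initial weight for further training
```

In a real pipeline the evolved value would only be a starting point, handed to backpropagation rather than used as the final model.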
00:06:10
Framework virtualisation is something that I'm sure can be gone into in a bit more detail tomorrow, but it's bringing together this large-scale, distributed deep learning — and since we have deep learning in the cloud, this is a very big issue. But again, we're still working on it; we still have challenges, we're not there yet, and I think it will just be an evolutionary path and we'll keep on learning, just as evolution does.
00:06:37
Unsupervised learning — and how do you teach a machine common sense? It's a big question, and we'll probably be trying to answer that for a long time to come; I always raise that problem generally. And, you know, at the moment there are still shockwaves from the Tesla crash, and we have to get over hurdles like that. It wasn't that long ago that the Virgin Galactic craft crashed, and that again set things back on the space program. Space is hard; deep learning is hard. You know we will get there, but there will be bumps in the road.
00:07:14
And also anthropomorphic bias — that one's close to me. The best way to illustrate this is Boston Dynamics, and every single fantastic video that they share of their robots: the kicking of the robot, making it run round corners, throwing banana peels in front of them. It's that sense that, when we're dealing with AI wound in with robotics — perhaps a human form with AI — we think of them as humans, and that tends to put serious biases into the theoretical side of things, as well as the obvious practical effects. And as for the singularity — well, I'm not sure about twenty twenty-five, which is the latest prediction, but it certainly makes for a very challenging and exciting field.
00:08:02
So, we all know about AlphaGo, and the only reason I'm putting the slide up — everybody's heard about this — is because of the actually impressive progress that we're making: using an ensemble of, in this case, convolutional neural networks, reinforcement learning, and Monte Carlo tree search. The thing about Go is that we did think it was an intractable problem for computers — and actually it is, so you cut, and take a branch of the actual tree, and that is Monte Carlo tree search. There are always ways to overcome challenges, and I think that's what we're really good at as humans — you tend to forget that artificial intelligence is created by us. So the interesting thing about AlphaGo is putting all those things together, and being able to use AI to teach AI — that again brings us back to the human-assisted side. And, you know, there was match four, and Lee Sedol's move seventy-eight: he knew that the AI could be tricked by plagiarism — by copying an actual strategy.
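AlphaGo's actual pipeline (policy and value networks plus full Monte Carlo tree search) is far beyond a slide, but the core trick of "cutting a branch of the tree" — estimating a move's value by sampling random playouts instead of expanding the whole game tree — can be shown on a toy game. This sketch uses Nim with 5 stones (take 1–3 per turn, the player who takes the last stone wins); the game, playout count and seed are illustrative assumptions, not AlphaGo's code:

```python
import random

random.seed(1)

def random_playout(stones, to_move):
    # Finish the game with both players moving uniformly at random.
    # The player who takes the last stone wins; returns the winner (0 or 1).
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return to_move
        to_move = 1 - to_move

def win_rate(stones, move, playouts=2000):
    # Estimate player 0's winning chance after opening with `move`.
    wins = 0
    for _ in range(playouts):
        if stones - move == 0:
            wins += 1                                   # immediate win
        elif random_playout(stones - move, to_move=1) == 0:
            wins += 1
    return wins / playouts

rates = {m: win_rate(5, m) for m in (1, 2, 3)}
best = max(rates, key=rates.get)   # taking 1 leaves 4, a lost position for the opponent
```

Even with purely random playouts, the sampled win rates single out the optimal opening (take 1), which is the same "evaluate by simulation, not exhaustion" idea that makes Go tractable.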
00:09:25
So, NVIDIA started out in the gaming world, in hardware and GPUs, and then we realised that we needed to provide the software for it — and then we were hit, just as everybody else was, by this deep learning explosion. And so we had to become a platform company that enabled everyone, and what we did was: we worked, and still work, very closely with everybody in the field. We try to align everything we're doing; we talk, and we work alongside researchers, to make sure that what we're doing fits with what you're doing, so we're all in alignment. Basically, we have to just do our best to keep up with this revolution — it's also part of my job to do that, to keep up with the field — and it really is unprecedented. I think it was February twenty fifteen when I first downloaded the tools, and then Facebook's fbcunn was released, and we were at cuDNN version one. cuDNN is a library of accelerated primitives, and we're now at five — actually five point one; I'm going to give you more on that later — and it really is unprecedented in the software world just how quickly all this is moving; those of you who are active in deep learning research will agree with me.
00:10:40
So we had to transition to becoming a deep learning platform company. How that looks, on a simplistic level, is hardware along the bottom. The slide probably needs updating now with the latest DGX-1, but we've got our actual DevBox here, which is still a very popular unit because it's literally a plug-in-and-play appliance, and that has four Titan Xs in it. And Tesla is our production line: these are highly qualified cards, well capable of the intense training that's needed. Tesla is also the deployment side, whereas, you know, you could very simply use a consumer card to do testing. We also have Drive PX, which is the automotive side — we now have an entire side of the business based on self-driving cars and automotive — and also our embedded side, and I'll give a few more slides on that later. On top of that lies the NVIDIA Deep Learning SDK, and that comes with everything else that NVIDIA does, which is graphics and a whole lot more. And then of course we have all the frameworks — and there are very, very many of them — and these all provide building blocks that essentially make it easier for you. It means that you don't have to be an engine coder: it's literally taking parts, building layers in neural networks, and just calling functions. It really makes things a lot simpler.
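That "building blocks" style — stacking layers by calling functions rather than writing engine-level code — looks roughly like this framework-agnostic sketch in plain NumPy. The layer sizes and initialisation are arbitrary; in a real framework these same calls would dispatch to accelerated GPU kernels:

```python
import numpy as np

def dense(x, w, b):
    # Fully connected layer: one building block, one function call.
    return x @ w + b

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)      # arbitrary layer shapes
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(2, 4))                        # a batch of two inputs
probs = softmax(dense(relu(dense(x, w1, b1)), w2, b2))
```

The whole "network" is just composed function calls — which is exactly why the frameworks can swap the underlying implementations for accelerated ones without the user changing anything.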
00:12:16
So that's just a basic overview of the platform itself, but the NVIDIA SDK, as it shows here, includes game development, virtual reality, autonomous machines, deep learning — there's a whole lot of expertise that has actually gone into this, as well as the specialist Deep Learning SDK, and that's all online: you can access it in one place, all within the NVIDIA SDK. So: powerful tools and libraries for accelerating pretty much every part of deep learning. And this is a collaborative effort with the world-leading research and commercial organisations, worldwide: they depend on this library, they add to this library, and they work with us to make sure that every single part of it is the best that it can be, and is able to adapt very quickly to all the changes — even in hardware, which is a slightly slower pipeline than the actual software, but it all has to align, very quickly and rapidly, with the huge progress that's going on in deep learning.
00:13:36
So, every single major framework. It has to be said that they are written and run by humans, and unfortunately we need to sleep and eat, and there are only a finite number of actual software developers, so we can't cover every single framework — there are forty, fifty of them. That comes down to basically the fact that there are very many programming languages and very, very many applications. But we started with Caffe, and now cover the majority of this central section here that you can see, across the board, for as many different applications as possible — we do need to catch up a little on speech. Along the bottom, the SDK includes a whole variety of things that I'm actually going to go into in slightly more detail, but it is vital that the majority of this work is performed in parallel: obviously training is a huge computational challenge, and it's getting harder the more data we're able to actually throw at it, as I'm sure you're aware. But the SDK brings in all those already-accelerated parts of the code that you actually need, to make life a lot easier for you.
00:15:06
So, developing applications that benefit from deep learning is slightly different from traditional software development. Software engineers have to carefully craft lots and lots of source code, to cover every possible input the application itself can receive. At the core of a deep learning application, much of that source code is replaced by the weights of the neural network. And to build a deep learning application, you would first design, train and validate a neural network to perform a specific task — and that could be anything: identifying vehicles in an image, reading a speed-limit sign as the car is whizzing past, or translating English to Chinese.
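The design–train–validate loop just described can be miniaturised: hold out part of the data, fit on the rest, and check the held-out accuracy before trusting the model. The nearest-centroid "model" and the synthetic two-class data below are stand-ins chosen for brevity, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes of 2-D points (e.g. "vehicle" vs "not vehicle").
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

idx = rng.permutation(200)
train, val = idx[:150], idx[150:]      # hold out 50 samples for validation

# "Train": one centroid per class, computed on the training split only.
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

def predict(pts):
    # Classify each point by its nearest class centroid.
    d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# "Validate": accuracy on data the model never saw during training.
val_acc = (predict(X[val]) == y[val]).mean()
```

Only after the validation score is acceptable would the model be deployed — which is the step the inference side of the workflow then accelerates.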
00:15:57
So we also have DIGITS, and this actually came about from CES — I think in twenty fifteen — where we were actually working on a self-driving car demo. It was my colleague who developed it, and he is still developing it, so the progress on it is slow in the sense that it's actually him and then a small team. But it is highly visual, so it does help you if, for example, you're just starting out, or where you need something with a lot more visual input. It provides a graphical, workflow-based environment, essentially — it's just a really good GUI. It allows you to design, to train, and to validate the actual network; obviously there is no one-size-fits-all neural network — you have to develop it around your own application and your own type of input data. And what we've also developed — I've got a couple of slides on this — is the GPU inference engine, and this is part of the actual workflow itself: within deep learning deployment, you would go from DIGITS, to create the actual neural network, and then feed it into the actual inference side to provide even more performance from the trained neural network.
00:35:37
…memory — it will be eight-high eventually, so that's thirty-two gigs on board. And Pascal essentially unifies everything in a single package to deliver really unprecedented compute efficiency: about a terabyte per second is approachable by the memory technology, so this is like a hybrid. I've got a brief video to show you, to explain it a little bit more. And for scaling across multi-GPU, NVLink is basically a fivefold increase in the actual interconnect bandwidth. But of course, fast hardware is nothing without the optimised software, and so working alongside DL developers is what has essentially brought everything together onto this one card.
00:36:28
Unfortunately, Pascal isn't widely available right now, because we've had some major supercomputer clients that have, you know, really taken the cards at a premium; we're hoping that by the end of the year it will be available to the wider audience. In the meantime we've got the Maxwell M40 cards, which now have twenty-four gigabytes on board.
00:36:56
But Pascal, in a nutshell — as it says here: atomic memory operations, really important in parallel programming; they essentially allow concurrent threads to correctly perform read-modify-write operations on shared data structures. RDMA actually came out with GPUDirect, which was introduced back — you know, I think it was with the GK110 — so at the moment that's Kepler; so it's Kepler, Maxwell, and now we have the Pascal architectures. The RDMA feature was actually introduced way back then, and it essentially allows third-party devices — InfiniBand adapters, for example, or SSDs — to directly access the memory on multiple GPUs, all within the same system, eliminating unnecessary memory copies, and obviously then dramatically lowering CPU overhead and decreasing latency across the board, which is really vital.
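The point of the atomic read-modify-write operations mentioned above can be illustrated off-GPU: many threads incrementing one shared counter must make the read, the add and the write indivisible, or updates get lost. This CPU-side Python sketch emulates the atomic with a lock (on the GPU these are hardware instructions, not locks — the lock here is just the illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:        # makes read -> add -> write one indivisible step
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the "atomic" section, all 8 * 10000 increments survive.
```

Without that protection, two threads can read the same old value and one increment vanishes — exactly the hazard hardware atomics eliminate for thousands of GPU threads at once.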
00:38:07
You'll get these slides, so no need to hop onto your phones for that — but doubling the bandwidth of GPUDirect is just very important to many, many use cases, especially deep learning, because deep learning machines have a very high ratio of GPUs to CPUs: in some cases eight, in some cases a hundred and twenty-eight GPUs to the one CPU. So it's very important for the GPUs to be able to communicate quickly, without falling back to the CPU for each data transfer — it really does make a big difference.
00:38:51
And these are just some of the reasons why, unfortunately, Pascal isn't that widely available at the moment: they're currently powering the CORAL systems, Summit and Sierra — so Lawrence Livermore — and the world is essentially aiming for exascale computing. I think Lawrence Livermore is working at more than about a hundred petaflops. But this is also IBM POWER plus GPU, all interconnected with NVLink.
00:39:25
So, the P100 — just to go slightly more into it. HBM2 essentially comprises multiple vertical stacks of memory dies; a passive silicon interposer connects the stacks and the GPU die. It connects to four HBM2 stacks — it will be eight in the next iteration, so sixteen gigs of HBM2 becomes thirty-two — and there are two 512-bit memory controllers connected to each of those HBM2 stacks, which gives you overall a very effective 4096-bit HBM2 memory interface. Which basically means it's quick.
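The interface arithmetic just described can be checked directly: four stacks, two 512-bit controllers per stack. The per-pin data rate below is an assumed figure for illustration (roughly what first-generation HBM2 shipped at), used only to show how a bus width turns into bandwidth:

```python
stacks = 4                  # HBM2 stacks connected to the GPU die
controllers_per_stack = 2   # 512-bit memory controllers per stack
bits_per_controller = 512

bus_width = stacks * controllers_per_stack * bits_per_controller
data_rate_gbps = 1.4        # assumed effective Gb/s per pin (illustrative)
bandwidth_gbs = bus_width * data_rate_gbps / 8   # bits -> bytes

print(bus_width, round(bandwidth_gbs))  # 4096 717
```

So the 4096-bit interface, at a modest per-pin rate, already lands in the hundreds-of-gigabytes-per-second range — which is why "it's quick".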
00:40:12
And to avoid scaling issues, GPU compute tasks can now be interrupted at the instruction level: the context is essentially swapped to the GPU's DRAM. Compute preemption means that when you have long-running or badly behaving applications, they no longer monopolise the actual system — especially if the GPU is being shared between computation and display, which a lot of applications count on; obviously you can do the compute and the visualisation with the same card. It also permits interactive debugging of compute kernels, at a much finer granularity than before.
00:41:00
And it also extends to unified virtual memory and page migration — this is all Pascal, this is all part of the actual chip itself. It extends the GPU's addressing capabilities so that it's now large enough to cover the forty-eight-bit virtual address spaces of modern CPUs, as well as the GPU's own memory, obviously. And the Pascal memory system allows the program to access the full address spaces of all the CPUs and GPUs in the system as a single virtual address space. So this is, to some extent, no longer limited by the physical memory size of one device — you effectively have an unlimited amount of memory.
00:41:43
I'm going to run through this — I don't think you need it all, so moving swiftly on. OK: this goes back to CUDA — whereabouts was it? — CUDA 6, which actually first introduced unified memory, essentially creating a pool of managed memory shared between the CPU and the GPU, bridging the divide. It's accessible using a single pointer, which makes it a bit more fun: you get a simpler programming model and memory model, and unified memory also lowers the bar of entry to parallel programming.
00:42:34
So then, at GTC this year in April, we introduced the DGX-1. I believe Jensen also gave a number of them away to pioneering research labs, and I believe Stanford's already got one. This is essentially a hundred and seventy teraflops — that's FP16 — and about seven hundred and sixty-eight gigabytes per second aggregate bandwidth, which you could equate to about two hundred and fifty servers in the data centre.
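The "FP16" in that 170-teraflops figure matters because half precision halves both the memory and the bandwidth needed per value. A quick NumPy check of the storage side (the array size here is arbitrary, chosen only to make the byte counts round):

```python
import numpy as np

rng = np.random.default_rng(0)
weights32 = rng.normal(size=(1024, 1024)).astype(np.float32)
weights16 = weights32.astype(np.float16)   # half precision: half the bytes

print(weights32.nbytes // 2**20, weights16.nbytes // 2**20)  # 4 2  (MiB)

# The price: reduced precision. For values of this magnitude the
# per-element rounding error stays small.
max_err = float(np.abs(weights32 - weights16.astype(np.float32)).max())
```

Twice as many weights fit in the same memory, and twice as many move per unit of bandwidth — which is exactly why FP16 throughput is the headline number for training hardware.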
00:43:14
I do have a video, but just a couple more slides first to give you a little bit more depth on that — apologies for the eye test there. This is how Pascal sits in the DGX-1, and in this case a bit more detail on the cube-mesh topology that we've used: there's actually no need to use PCIe for the GPU-to-GPU communications. Your primary concerns when you're arranging the PCIe and InfiniBand for this mesh are to maximise host-to-device and network bandwidth, depending on the application, and to ensure that the four GPUs of each quad — there are two quads — are attached to the same CPU. This then gives you basically a logical subdivision of the eight-GPU cube mesh into those two quads.
00:44:17
And you have NVLink, fully featured, which gives a full connection for the P100s in each of those two quads, all tied together by four additional NVLinks. Overall, there's just a whole lot of interconnect improvement gone in across the board, to really lower the latency — because the one thing about deep learning is that there is a lot of data transfer in the training, when you're actually teaching these machines to learn specific tasks. This is all in a whole lot more detail online, and I'll send you the links too.
00:44:56
So really, the DGX-1 is primarily a unit for HPDA, as it's known of course — high-performance data analytics — and there are a lot of different businesses and customers now with this requirement: to work through all the data that they actually have in house, and to extract insight, gain insight, further and further — much deeper than they were capable of before. It's essentially got seven terabytes of SSD storage on board for batching, and then obviously it can be attached to an external file system. And the typical use cases: as I said, there are the financial services industry players, also media and entertainment — all the Telecoms are very interested in this now — and also medical applications; Siemens, for example.
00:45:58
I will have a video coming in a minute — hopefully it will work — but CUDA 8 is designed specifically for Pascal: this is essentially everything, as it says, that you need to accelerate AI applications. I'm not sure if this is going to — OK, trying to run this, but it does not want to — OK.
00:46:38
So — OK, just to give you a bit more insight into the DGX: obviously this is a server itself, but housing eight Teslas on board, and it will basically give you three times higher workloads with the HBM2 stacked memory; and with the page migration and NVLink, you have this virtually unlimited memory space.
00:47:31
That's highlighting the cube mesh, and then the InfiniBand itself. So, the unit itself: it actually has four power supplies, at sixteen hundred watts each, so it's quite a hefty machine — but it is 3U in size.
00:47:59
And what we've done is — it's not just a hardware unit: we also use containerisation to include every piece of software that you would effectively need. That obviously comes with the CUDA suite — everything on board that you actually need — but also, let me see, OK: the optimised Pascal unified memory I mentioned, all the libraries that you'd actually need; it also includes nvGRAPH, as I mentioned earlier, and all the basics such as cuBLAS and cuSPARSE, and any developer tools that you'd actually need. And the way we do that is by using Docker — actually our own fork of Docker — and a managed repository. Initially it would be DIGITS, cuDNN, and then obviously CUDA, including Theano and Torch, and our fork of Caffe — the reason we have our own fork is because we do software development a whole lot quicker than the main branch of Caffe, which then integrates it back. But we also have, in addition, TensorFlow, CNTK, Keras, Chainer and MXNet — we incorporate a lot of the different frameworks in containers themselves. And because it's containers, you can essentially just tick a box and say "I'd like this, and this, and this", and then it all just goes on board as a bespoke system for you.
00:49:35
And obviously this is as secure as cloud software is today: a Docker repository which we manage access for, and when we have updates, we essentially send you an email and you click and download. We're also bundling in again some of our other NVIDIA software — things like Iray, which is a renderer; you know, a product we do for graphics — and all of those will be incorporated should customers need them. And you get the option of whether or not you want the cloud-based, maintained, supported version, or you can just have a standalone box: we come down and install it, and it's yours to run — you do your own updates, etcetera.
00:50:19
So essentially, you would log into an area called compute.nvidia.com, and you manage all your software that way — you no longer need to mess about with all the different dependencies that are required for the very specific deep learning software that's out there — and hosting sites like that are tending to be quite popular. So that software stack also comes with very simple packaging, end to end.
00:50:51
a set it obviously you go cheap you
00:50:52
optimise limits you've got the drivers
00:50:55
themselves and we don't have then
00:50:56
you've call software which again is all
00:50:59
optimise digits. And the actual
00:51:03
frameworks themselves that are again
00:51:05
all GP optimised. So things are just
00:51:07
speeding up and speeding up and
00:51:09
speeding up. Um two separate modes for
00:51:12
the actual software side is I said a
00:51:15
moment wellness too much because I
00:51:18
think you're more interested in the
00:51:20
research than the actual purchase but
00:51:23
essentially that area that I mentioned
00:51:25
the container hosting allows. And
00:51:28
things like scheduling locally using
00:51:29
sauces the the scheduler. And clap
00:51:32
close to management and and this is
00:51:34
where we can handle all the updates and
00:51:36
security and there is a lot of updates.
00:51:39
And I'm watching this rate of progress, and it really is incredible just how quickly things are moving, so it is a way to handle the updates, and this is probably the best way. I'm seeing a lot of people using things like Docker to create elements to run groups in separate environments, where perhaps one of you is working in a CUDA 7.5 environment with various things on it, and somebody else is testing on CUDA 8. It's a very simple way of doing it.
00:52:11
In the meantime, the Tesla cards are still available. These provide mixed precision and are still very much in demand for the accuracy side, of mixed precision and double precision, and they will be around for quite a long time; they're not going anywhere, because it's really about increasing productivity.
00:52:38
There is also something you need to take into account: the whole world of business and industry out there is actually trying to catch up. Whatever level of deep learning understanding you're at right now, honestly, everybody in this room, you are probably two or three steps ahead of the broader business world. So it really is all about increasing the actual productivity that we have. It's about jumping on this bullet train that is deep learning, actually holding on, and keeping up with it, so it is a big thing to take on. Simulations run on CPUs today, especially with big datasets, can mean days, days to weeks, and you generally don't have visualisation capability while doing that. If you switch to GPUs, you can actually bring that simulation down to
00:17:29
networks, and essentially integrate them into a software application, then feed new inputs to analyse, as inference, based on the actual training. They can then be deployed either on a server, on a mobile phone, or wherever you're working: drones, automobiles, et cetera. So the amount of time it actually takes to complete inference tasks is the most important consideration. There is a whole lot of work that goes into the deep learning side, the training side, but ultimately what we're trying to do is provide a solution to customers; hence the demo that was shown, where you can play music and then potentially ask questions and get feedback. The inference side is a really important part. You have to understand that research is one thing, but it then has to translate into the commercial world and be productised, hence the number of startups and businesses we're seeing at the moment, and it also needs support to be provided for it. It determines both the quality of the user experience and the cost of deploying the application: how much performance you can actually get from the inference side. So having an energy-efficient, high-throughput application is absolutely vital, and so we've been listening to the people we've been working with and have therefore put
00:18:58
together GIE to help. This essentially automatically optimises the trained networks that you have for runtime performance. We have figures of sixteen times more performance per watt on the M4; I'll explain what the M4 is in a moment, but this is a specific card, about a third of the size of the Maxwell M40 card, built specifically for inference, because it takes only between fifty and seventy-five watts to power, as opposed to two hundred plus for average cards.
00:19:36
Cloud providers are therefore able to more efficiently process images and video in hyperscale data centre environments, which is a major part of what we do. But also, when you think of the amount of work now going into automotive and embedded deployments, it's really essential to have a powerful, high-performance but low-power way to do inference,
00:19:59
maximising throughput, which is more often than not the bottleneck in any pipeline. But you also have to train with popular frameworks and deploy to production level, having more accurate models but with lower latency, so leveraging FP16 and being able to get those responses fast. Nobody wants an application on a mobile phone that is really cool but takes three or four seconds to respond. That sounds like hardly anything, but when you're actually using your phone for something, you don't want to be waiting that long. So hence we have GIE.
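As a rough illustration of the FP16 point, here is a pure-Python sketch using the standard library's half-precision packing (this is illustrative only, not the GIE API):

```python
import struct

# FP16 ('e') packs each value into 2 bytes, FP32 ('f') into 4, so a
# tensor held in half precision needs half the memory and half the
# bandwidth to move, which is where much of the inference speedup lives.
half = struct.pack('e', 0.5)
single = struct.pack('f', 0.5)
print(len(half), len(single))  # 2 4

# The trade-off is precision: FP16 has a 10-bit mantissa, so you keep
# roughly 3 decimal digits instead of FP32's ~7.
third = struct.unpack('e', struct.pack('e', 1.0 / 3.0))[0]
print(abs(third - 1.0 / 3.0) < 1e-3)  # True
```

For many trained networks those lost digits are below the model's own noise floor, which is why inference can often run at FP16 with no measurable accuracy drop.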
00:20:44
On the deep learning software itself: DIGITS is already at iteration number four, again fairly unprecedented; we have GIE, and cuDNN, and as I mentioned there's CUDA. cuDNN is the CUDA Deep Neural Network library; does anybody not know what CUDA is? Okay, just checking. So cuDNN is now at 5.1, and 5.1 basically adds additional performance for ResNet, that's residual networks, and also VGG-style
00:21:16
networks. So DIGITS 4 is bringing in additions so you can essentially perform hyperparameter sweeps over learning rate and batch size for improved accuracy, and we're also bringing in object detection. This is to work in line in the pipeline with GIE, sorry about all the acronyms, that's our GPU Inference Engine, which I won't provide too much detail on, but again it's just another tool to aid deployment.
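The hyperparameter sweep idea can be sketched in a few lines: score the same network across a grid of learning rates and batch sizes and keep the best combination. The scoring function below is a made-up stand-in for "train and measure accuracy", not DIGITS itself:

```python
import itertools

def validation_accuracy(lr, batch_size):
    """Stand-in for 'train the network and measure accuracy'.
    Purely illustrative: constructed to peak at lr=0.01, batch_size=64."""
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 64) / 1000

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [32, 64, 128]

# Sweep the full grid and keep the best combination, which is all a
# hyperparameter sweep does; a tool like DIGITS automates and
# parallelises exactly this loop across GPUs.
results = {(lr, bs): validation_accuracy(lr, bs)
           for lr, bs in itertools.product(learning_rates, batch_sizes)}
best = max(results, key=results.get)
print(len(results), best)  # 9 (0.01, 64)
```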
00:21:58
You essentially take data management (the options are there in DIGITS to make things simple for you) into DIGITS and then straight into deployment, so it's an end-to-end, hopefully seamless, production line.
00:22:15
Creating models, network design: there's full visualisation that you actually get. As it says, you can process, you can configure, and you can monitor progress, especially with multi-GPU. When DIGITS first came out it was single-GPU; it very quickly went multi-GPU, which is why we brought out the DIGITS DevBox, which had four Titan Xs, so it was designed to go to four. With the latest iteration, the DGX-1, we're now looking at eight, and many people are using multiples, 128-plus GPUs, and you have the capability to stop, start and run multiple jobs in training, and keep an eye on what's going on in your prototyping. So, as I said, DIGITS 4 is bringing in being able to track datasets and results, train neural networks, and analyse the accuracy and the loss in real time, all while you're actually putting the networks together. Object detection is a brand-new thing we brought in in time for ICML, and this slide is showing just a few of the applications we've been working with; we've worked for quite a long time with Caterpillar, the world-renowned manufacturer.
00:23:39
The top left is indicating a commercial use case we're working on, where we're essentially using aerial data so that clients can calculate distances, sizes and volumes, and classify the different types of vehicles driving around the actual quarry area. You can also see here that we're doing a whole lot of work with intelligent video analytics and surveillance. We work with a lot of people, Herta Security being one of the big names for AI crowd surveillance, and I'm personally involved with a London operation called Facewatch, which is a bit of a take on Facebook. Essentially, they're using intelligent video analysis, incorporated into the existing infrastructure of cameras throughout the city. I'm working with them so that the second a crime takes place, people can report it on a mobile phone; it gets uploaded straight away, and you have instant facial detection running from the cloud against the actual database of criminals. It's really speeding things up. It's also things like safety on train platforms, and a host of other things including medical diagnostics.
00:25:10
For example, GE Healthcare's new ViosWorks is just an excellent diagnostic tool to help diagnose heart disease, and there are many, many other similar approaches in the medical world.
00:25:22
cuDNN 5.1, okay, so this is now introducing: the point-one actually adds support for ResNets and VGG. cuDNN 5 itself brought in support for LSTMs and gated recurrent units, as well as standard recurrent neural networks, and it's actually optimised for Pascal, which is the latest chip architecture, and offers FP16. That's lower precision, which means lower memory usage and increased performance. For popular convolutions like three-by-three filters, and for 3D convolutions themselves, performance is also very much improved with the cuDNN 5 libraries, and that helps pretty much anyone in any domain using volumetric data, which includes a lot of medical applications, so a great move forward. This essentially gives faster forward and backward convolutions, and also support for Winograd, which is a very optimised convolution algorithm. I've mentioned VGG and ResNet: that's the point-one side, which effectively, when we look at it, is a 5.9x speedup, and 5.1 also gives almost another 3x faster training on the M40; those numbers are versus cuDNN 4.
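To see why Winograd helps, here is its 1D building block F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of the direct method's 6. The 2D 3x3 case that cuDNN targets nests this same trick. This is a sketch of the textbook transform, not cuDNN's code:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap correlation using 4 multiplies
    (direct computation needs 6). d holds 4 inputs, g holds 3 filter taps."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g):
    """Reference sliding-window correlation, 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
print(winograd_f23(d, g))  # [4.5, 6.0]
print(direct(d, g))        # [4.5, 6.0]
```

The filter-side factors can be precomputed once per filter, so in a convolution layer the per-tile saving in multiplications is what shows up as the speedup on 3x3 filters.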
00:26:59
You'll get a copy of these slides, so don't worry about memorising this, but this is great: we have an online blog, Parallel Forall, which my colleague insists has to be of the highest technical content, so do go and check out that URL, the post on optimising recurrent neural networks with cuDNN 5, to get a bit more explanation.
00:27:26
Iteration n of a given layer only depends on iteration n-1 of that layer and iteration n of the previous layer. It is therefore possible to start on later layers before you've even finished the previous layer, and this is actually a really powerful thing when you talk about using GPUs: with two layers, there is essentially twice as much parallelism available. There are also updates that colleagues have been working on in cuDNN 5 for using GPUs to get the best performance out of recurrent neural networks, which often have to expose much more parallelism. So that's the work we've been doing with cuDNN 5.
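That dependency argument can be sketched as a scheduling exercise: cell (layer, step) needs (layer, step-1) and (layer-1, step), so every anti-diagonal of the layer-by-time grid forms an independent "wavefront". Illustrative only; cuDNN's internals are more involved:

```python
# Cell (layer, step) of a deep RNN depends on (layer, step-1) and
# (layer-1, step). Cells sharing the same layer+step sum have no
# dependency on each other, so each such group can run concurrently.
def wavefronts(num_layers, num_steps):
    return [[(l, t) for l in range(num_layers) for t in range(num_steps)
             if l + t == s]
            for s in range(num_layers + num_steps - 1)]

waves = wavefronts(num_layers=2, num_steps=4)
for i, wave in enumerate(waves):
    print(i, wave)
# 0 [(0, 0)]
# 1 [(0, 1), (1, 0)]
# 2 [(0, 2), (1, 1)]
# 3 [(0, 3), (1, 2)]
# 4 [(1, 3)]
# Two layers give wavefronts of width 2: twice the parallelism of a
# single layer, and 8 cells finish in 5 sequential steps instead of 8.
```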
00:28:15
NCCL is a brand-new package; it's out there on GitHub, and feel free to use it. It is a work in progress, but it is essential for anyone who's more familiar with MPI as opposed to CUDA. This is, as it says, patterned after MPI's collectives, accelerated specifically for multi-GPU, which means you don't have to work on that data flow yourself, and it brings down latencies. So that's really good; again it's open source, on GitHub, and you can go and check it out.
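The collective at the heart of multi-GPU training is all-reduce: every rank ends up with the elementwise sum of all ranks' buffers. Below is a toy simulation of the classic ring algorithm (reduce-scatter then all-gather), which keeps per-rank bandwidth constant as ranks are added; a sketch of the idea, not NCCL's implementation:

```python
import copy

def ring_allreduce(buffers):
    """Simulated ring all-reduce (sum). buffers[r] is rank r's data,
    split into n chunks, one chunk per rank. Reduce-scatter circulates
    partial sums for n-1 steps; all-gather circulates the completed
    chunks for n-1 more. Every rank ends with the full sum."""
    n = len(buffers)
    bufs = copy.deepcopy(buffers)
    # Reduce-scatter: at each step, rank r sends chunk (r - step) % n
    # to its ring neighbour, which accumulates it.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, bufs[r][(r - step) % n])
                 for r in range(n)]
        for r, c, chunk in sends:
            dst = (r + 1) % n
            bufs[dst][c] = [a + b for a, b in zip(bufs[dst][c], chunk)]
    # All-gather: circulate each fully reduced chunk around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][(r + 1 - step) % n])
                 for r in range(n)]
        for r, c, chunk in sends:
            bufs[(r + 1) % n][c] = chunk
    return bufs

# Three simulated "GPUs", each holding a vector split into 3 chunks.
ranks = [[[1, 1], [2, 2], [3, 3]],
         [[10, 10], [20, 20], [30, 30]],
         [[100, 100], [200, 200], [300, 300]]]
out = ring_allreduce(ranks)
print(out[0])  # [[111, 111], [222, 222], [333, 333]] on every rank
```

Each rank only ever talks to its neighbour, so total traffic per rank stays near 2x the buffer size regardless of how many GPUs join the ring.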
00:28:50
nvGRAPH: this is actually available through early access. I don't know if everybody is aware of this provision, but if you apply at developer.nvidia.com we can provide early access to a lot of the software that we push out, so you can be part of the testing process before it's actually sent out to production. You can either get in touch with me or literally just go online to developer.nvidia.com for that. nvGRAPH, I believe, is still early access at the moment, but it's very much needed in the sense that, as we get more and more data, graph analytics is becoming a huge deal and we need to be able to cope with it on GPUs. I believe some of the figures were around two and a half billion edges being computed on Maxwell M40s at about fifty milliseconds an iteration. So you can get hold of that, it's in early access I believe, and it's another great piece of work.
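For scale, "edges per iteration" refers to algorithms like PageRank, where each iteration is one sweep over the whole edge list; that sweep is exactly the loop a GPU graph library parallelises. A toy CPU version (not the nvGRAPH API):

```python
def pagerank(edges, n, damping=0.85, iters=50):
    """Plain power-iteration PageRank over an edge list. One iteration
    is one sweep over all edges, which is the unit of work a GPU graph
    library spreads across thousands of threads."""
    rank = [1.0 / n] * n
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for src, dst in edges:          # the per-iteration edge sweep
            new[dst] += damping * rank[src] / out_deg[src]
        rank = new
    return rank

# Tiny 3-node cycle: symmetric, so every node converges to 1/3.
edges = [(0, 1), (1, 2), (2, 0)]
r = pagerank(edges, 3)
print([round(x, 3) for x in r])  # [0.333, 0.333, 0.333]
```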
00:30:03
cuSPARSE is primarily for NLP, natural language processing; that's where you'd be using it, and again you can just look it up on developer.nvidia.com. It essentially multiplies dense matrices by sparse vectors.
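Sparse-times-dense is the bread and butter here: store only the nonzeros, for example in CSR form, so the work scales with them rather than with the full matrix. A minimal CSR matrix-vector product, illustrative rather than the cuSPARSE API:

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x with A in CSR form: only nonzeros are stored, so the
    work is proportional to nnz rather than rows * cols."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y

# A = [[10,  0,  0],
#      [ 0, 20,  0],
#      [30,  0, 40]]  -> only 4 of 9 entries are stored
values  = [10, 20, 30, 40]
col_idx = [0, 1, 0, 2]
row_ptr = [0, 1, 2, 4]
print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))  # [10, 40, 150]
```

In NLP workloads like bag-of-words or one-hot features, matrices are overwhelmingly zero, which is why this representation pays off so heavily.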
00:30:22
OpenACC: this is really more for large industry, where you have legacy C code that you cannot just take and parallelise wholesale. Instead, you can take sections of the code and simply use pragmas to surround and parallelise certain sections. This is actually really, really useful, and there are a whole lot of people putting it into play to speed up parts of their code. Again, that's open source.
00:31:00
So this is, I suppose, a question I don't mind leaving up to you: who do you think has the largest market share? I would currently say Google's TensorFlow, and then Torch, which is a Facebook consortium with Twitter and DeepMind et cetera; it's still a two-horse race, but everybody has their preferences when it comes to frameworks. And really the big question is: what is the preferred method of GPU allocation when it comes to applications? These frameworks are there for you, to aid you with building blocks, but you really have to bring into play how you use GPUs.
00:31:52
So my question to you is: can we introduce heuristics into frameworks like, say, Caffe or TensorFlow, so that they can automatically configure themselves? Perhaps bringing in concepts like the 'one weird trick' paper from Alex Krizhevsky, if anybody's familiar with it, where essentially you use data parallelism in the convolutional layers and then model parallelism in the fully connected layers. Maybe we can start using AI, and perhaps even some heuristics, to make that automatic configuration easy for us. Is that something we could work on?
00:32:32
for is that something that we can work
00:32:34
of who's got the log the largest market
00:32:36
share really does come down to
00:32:38
usability I think but also I think can
00:32:42
sing miss would agree with mates it's
00:32:44
support as well. And that's a big deal
00:32:46
it needs to be able to provide support
00:32:48
to use a user there are as I said
00:32:52
earlier many only this is just a sample
00:32:54
of how many different applications is
00:32:59
it really comes down to the application
00:33:01
that you working on and I'm whatever
00:33:04
language you familiar with but to be
00:33:05
honest I'd never use lower and and I'm
00:33:09
picked up talks within about a week
00:33:11
it's very simple. Or say you're very familiar with MATLAB: then Andrea Vedaldi at Oxford has created MatConvNet for you, though it is a little bit limited on the GPU front. It's really down to your choice. But there's something I'm seeing quite a lot at the moment.
00:33:37
There was the Spark Summit recently in San Francisco, and Databricks, which is, shall we say, the commercial side of the Spark framework, are basically going to bring GPU instances into Databricks very soon. So that's Spark, and also TensorFlow now, with TensorFrames. This again is for very large-scale distributed deep learning, and it's nice to see it coming in; I'm seeing a lot of Spark, and a lot of talk about Spark. In fact, I forget his name, but someone gave a talk at that Spark summit in San Francisco, and at one point he made a comment: there were so many questions about deep learning itself that he asked whether anybody was going to ask a question about Spark, because it was really all about deep learning and how to actually integrate it. So there's a lot happening there. Just to give a quick
00:34:48
update, as I'm running short on time, on the deep learning hardware. As you know, at GTC 16, our GPU Technology Conference in April, we introduced Pascal, or I should say the codename P100, and this is the culmination of about three billion dollars' worth of investment in research and development. It essentially packs roughly five, ten or twenty teraflops in double, single and half precision respectively, and it's actually built to go alongside CUDA 8, which isn't yet fully out. It uses chip-on-wafer-on-substrate technology, with HBM2 high-bandwidth stacked memory, currently in four-by-four stacks
00:53:34
two hours, if not days. And you're able to actually work within it and have visualisation of how it's going while you're doing it, so it really does make a big difference. I've included that slide simply because it's about using GPUs for those really big challenges. As I mentioned earlier, I've just come back from the first week of a six-week program using GPUs and deep learning for asteroid detection; deep learning is now being able to address some of life's really massive challenges. The work that you're doing, and hopefully are about to learn how to do even better, will allow us to create solutions, or at least gain understanding a lot quicker, for some of these grand challenges.
00:54:32
The M40, as I've mentioned a couple of times, is currently available now, as opposed to Pascal, which will come later this year to the wider public. It's the single-precision card; all the cards obviously have capability for both, but it was developed for single precision, which is really what deep learning needs, as opposed to double. It's also edging up towards the larger-scale, distributed side, so we added another twelve gigabytes; there are now twenty-four gigabytes on board. So that's an over-three-thousand-core GPU that's available today.
00:55:18
And I mentioned the M4 earlier: this is the inference side. Deep learning really has two main workloads, the compute-intensive training and then the inference, which is deploying to a mobile phone, or into intelligent video, onto small cameras for example. So we essentially split the workloads, and for…



Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
