OK, so we'll start the session with one more participant: Mihaela Rosca from Google. She will be talking more about TensorFlow tomorrow, so we can ask questions about the frameworks as well. Also, there is a bunch of questions that were asked via the website, so I may sometimes pick one of those, like this first one.
So, what would be the best chances for other machine learning techniques: what about decision trees, for instance?

You mean for cases where you cannot use gradients?

Yes, exactly: something for when you are not allowed to have gradients, when backprop is off the table.
OK, is the microphone on? I'm trying to understand the problem and relate to it. I think there is a general principle behind backprop, which is trying to find ways to do credit assignment; you can think of what's going on in some reinforcement learning settings in the same terms. Backprop is great, it's working quite well, but maybe there are more general principles that could be applied, principles that would still work when the changes we care about are not infinitesimal, which is one of the weaknesses of backprop. I don't want to talk more about it here, but for me deep learning is not backprop. Deep learning is about learning representations, distributed representations, learning good representations, and backprop is the best tool we have right now, but I hope we can find something better.
Actually, for a lot of NLP tasks deep learning is not a very good tool. Usually just a bag of words followed by some SVM is faster, with the same performance or better, and it scales very nicely. Word2vec is not deep learning either, it's used everywhere, and it's really effective. And K-means is not deep learning, it's shallow, but it is representation learning, just not distributed representations. Things like that I agree make sense: K-means and all these clustering algorithms that we use every day, followed by some kind of SVM, are still the most effective in a lot of cases.
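As a hedged illustration of that kind of baseline (toy texts and labels of my own choosing, using scikit-learn), a bag-of-words representation followed by a linear SVM takes only a few lines:

```python
# Minimal sketch of the bag-of-words + linear SVM baseline mentioned above
# (toy data; not a claim about any particular benchmark).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved the acting", "boring and slow"]
labels = [1, 0, 1, 0]  # toy sentiment labels

# Bag-of-words (unigrams and bigrams) feeding a linear SVM.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["slow but great acting"]))
```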
So here's an example where K-means is very useful: sometimes you want to make quick, hard decisions. For example, we're working on scaling memory networks, and you want something like hashing or clustering that can quickly find some sort of good candidate answers. If we use the normal soft attention, the computation is too expensive, but there are other algorithms that can help us there.
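A minimal sketch of that idea, under my own assumptions (random vectors standing in for the memory, scikit-learn's K-means as the clustering step): cluster the memory once, then score only the entries in the query's cluster instead of doing soft attention over everything.

```python
# Toy sketch: restrict attention to one K-means cluster instead of doing
# soft attention over the full memory (all data here is random placeholder data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
memory = rng.normal(size=(10000, 64))   # memory slots
query = rng.normal(size=(64,))

km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(memory)
cluster_id = km.predict(query[None, :])[0]
candidates = np.where(km.labels_ == cluster_id)[0]

# Score only the pre-selected candidates, then normalise over them.
scores = memory[candidates] @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()
answer = weights @ memory[candidates]   # weighted read over roughly 1/50 of the memory
```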
Now, trees. I wrote a paper about how trees are bad because they generalise locally, and so they can essentially be killed by the curse of dimensionality. When you do a forest, or any kind of combination of trees such as boosting, you actually go one level deeper and you get a nice kind of distributed representation, because if you think about it, each tree deals with one aspect of the problem, and it is the composition of the leaves you have selected, one per tree, that is a representation of your data. So it is actually a pretty powerful representation. The problem is that right now it's not clear how to go beyond these two levels, and also, except for boosting, it's not clear how to train these things jointly, for example. So it's not something that is at the centre of attention.
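To make the "composition of selected leaves" point concrete, here is a hedged sketch (toy data and my own choice of model, not the setup from the paper being mentioned): for each input, the tuple of leaf indices it reaches across the trees of a random forest is exactly the kind of distributed representation being described.

```python
# Sketch: the per-tree leaf index of each sample, taken jointly across all trees,
# acts as a learned representation (toy data; illustration only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# leaf_ids[i, t] = index of the leaf that sample i falls into in tree t.
leaf_ids = forest.apply(X)
print(leaf_ids.shape)   # (500, 100): one "symbol" per tree, composed across trees
```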
So if it were possible to give trees this ability to extract a certain type of information and pass it on to another layer of trees, some sort of tree of trees, could you do that?

You could, but I think you would first need some way to optimise all the trees jointly, and it's not clear how you could do that. So this is an example where there are things we would like to optimise but where backprop cannot be used to optimise them. Trees are greedy things with greedy algorithms, and you can't generalise to trees on trees on trees, because we don't have any greedy algorithms for that.
Maybe questions from the room?

I was wondering about frameworks. Today we saw a nice diagram in the slides where TensorFlow sits closer to production and Torch closer to research. Everybody here is doing research, but there is a lot of hope that it can be turned into production, into actual applications, and it is difficult to change frameworks as you go from research to production. Is there a recipe for going through the whole cycle, from research to production, with one framework, or is it something very personal and subjective?
So the question is whether there is a framework that researchers, data scientists and developers can all use. I think we will see a lot more about this tomorrow, but this is the point of TensorFlow: it was built with this in mind, because there are a lot of very good researchers and a lot of good developers who want to put these research ideas into production. And the field of deep learning right now is moving so fast that if you have two different systems, you end up with your ideas in production being completely out of date. So this is what TensorFlow actually aims to do, to be the system that researchers and developers share, and this is what's happening right now: researchers use TensorFlow, and the same models are then put into production rather easily.

Other questions? OK, then I will go through my list.

Oh, yes, hello.
My question concerns unsupervised learning and understanding what the features are: how will we evaluate unsupervised learning? If we have images, then with a generative model we can look at the images we generate; but if we have other types of data that don't have a visual interpretation, how do we go about assessing whether we have in fact learned a good generative model? And secondly, how could we use this to perform some kind of clustering or understanding of the data?
It's a very good question. Even for images it's not completely satisfactory to only look at what is generated, and there are some nice discussions about whether good generation even necessarily means we have good features, in the sense of using them for a particular task. Even if you stick with generation, it's not clear what the right measures are that we should use to know that we have a good generator. So there are lots of open problems about how we evaluate unsupervised learning in general; this is really an area where lots of papers are being written these days, and we don't know what the right answers are. The classical answer to your question is simply to use unsupervised learning as a helper for some supervised learning task. So you could do semi-supervised learning, you could do unsupervised pre-training, which is a form of semi-supervised learning, and things like transfer learning with unsupervised learning have been done before. You basically define another task which hopefully can be helped by using the features, or the regulariser, or whatever comes out of the unsupervised learner. That is not completely satisfactory, because it measures some aspects and maybe not others, but it's what we have now.
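As a hedged sketch of that helper-task style of evaluation (toy digits data, with PCA standing in for whatever unsupervised learner is being evaluated), one common recipe is to fit the features without labels and then check how well a simple supervised probe does on top of them:

```python
# Sketch: evaluate unsupervised features by how much they help a downstream
# supervised task (toy data; PCA is a stand-in for the unsupervised learner).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

features = PCA(n_components=16).fit(X_tr)   # "unsupervised" stage, no labels used
probe = LogisticRegression(max_iter=1000).fit(features.transform(X_tr), y_tr)
print("probe accuracy:", probe.score(features.transform(X_te), y_te))
```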
An ideal answer to your question, from a conceptual point of view, would be something like not a single task but a very rich and wide family of tasks. Let me give you a concrete example: I could have a task which is something like visual question answering. You are given an image, somebody asks a question in natural language, and you have to answer. If the questions are really completely open, and you don't know in advance what kinds of questions will come, then presumably, if you're doing a good job, if you're able to answer any semantic-understanding question about the image expressed in natural language, that presumably means you have extracted the right information from the image. So I think we can conceive of very broad tasks that get at all the aspects of the data, because if I throw away some aspect of the image, somebody may come up with a question that I'm not going to be able to answer, if my features threw away the thing the question asks about. There is still some bias in this particular example, because obviously humans are not going to ask whether pixel (321, 176) is greater than the one next to it; that's not the kind of question you are going to get.
Yes, I have a question. Many of the people I speak with come from signal processing; they have been working with image processing and related things for a long time, and they are usually very sceptical of deep learning because it is seen as a black box. In classical signal processing you have very good knowledge of the lower levels, and there are very good models of them, with line and edge detectors, wavelet structures and things like that. The problem has always been the semantic part. Couldn't we inherit the lower levels and the knowledge from all the research that has been done there into deep learning, and concentrate more on the semantic side?
I think that's already what happened. A lot of the early research with convolutional nets, especially in the period where we used a lot of unsupervised learning, was actually focused on exactly that: the evaluation metric was how much it looked like we were getting Gabor filters, and whether the filters looked like what you would expect from sensible signal processing. That was a kind of qualitative evaluation; that's the first part of the answer. The second part is that this is not where people focus right now. The design of the architecture is no longer about how to do the first two layers; that's not where the action is. The action is precisely where you're pointing: the semantic aspect, thinking about objects, thinking about scenes, about high-level relationships and so on.
But still, many deep learning pipelines are very much about creating data and augmenting the data with different geometric transformations, and that is something that is already handled solidly at the lower levels if you have proper models, so I think you would want to take a step above that.
Yes, people have tried that. The reason we're using these convnets with data augmentation is that before, people were doing exactly what you say: they were taking handcrafted features, which were invariant to those kinds of things, and sticking some machine learning on top, including neural nets. But it turns out that it works better if you learn the whole thing end to end. So what you're suggesting is something we should do, and it is something we have done; maybe we can do it better, but it has been tried, and it's exactly where we come from.
Right. At Facebook we do have some research going on on this side, where we clearly want to learn from all of the research in signal processing. Last year we published a paper on complex-valued convolutional networks, basically inspired by wavelet packet transforms. They don't work as well, but regardless, I think it's important to understand why applying traditional signal processing methods directly just doesn't work as well. We also have collaborations with NYU professors on this, and there's a lot of ongoing work, but at this moment we don't see anything promising enough to be excited about.
One question, actually, about the architectures: is there a successful way to integrate the notion of time into them, some successful application like time series and prediction?

Yes: recurrent nets, LSTMs. They come in many forms, they are designed exactly for that, and they are working beautifully well.

Then I have some more advanced questions. I've seen some nice ideas, and there were investigations of the capacity of these networks around 2015, but in general, do you have research directions beyond the things that already exist?
I don't know. Personally, I would be interested in understanding more about the structure of the dynamics, how the eigenspectrum of the Jacobian changes, both during training and along the sequence.

Another interesting question is what information is preserved in the state. If you think about what a recurrent net does, it reads a sequence, and at any point it has a vector state that is a function of what it has read; of course it has to throw away some information and it keeps some information. So we could use some kind of monitoring device, maybe, to try to figure out what a particular recurrent net is remembering and what it is forgetting about the input. I think it's a great question, and more could be done to understand what's going on inside; it would maybe help us design better recurrent nets as well, better architectures.
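As a hedged illustration of the first point (a toy vanilla tanh RNN with random weights, entirely my own construction), the one-step Jacobian of the hidden state can be written down and its spectrum inspected directly:

```python
# Sketch: spectrum of the state-to-state Jacobian of a toy tanh RNN,
#   dh_t/dh_{t-1} = diag(1 - h_t^2) @ W_hh
# (random weights and inputs; illustration only).
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input, T = 64, 8, 50
W_hh = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_input))

h = np.zeros(n_hidden)
for t in range(T):
    x = rng.normal(size=n_input)
    h = np.tanh(W_hh @ h + W_xh @ x)
    jac = np.diag(1.0 - h ** 2) @ W_hh          # one-step Jacobian at time t
    radius = np.abs(np.linalg.eigvals(jac)).max()
    if t % 10 == 0:
        print(f"t={t:2d}  spectral radius={radius:.3f}")
```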
At some point you were saying that, for Facebook, ImageNet is a small dataset. So what will the future of the interaction between academia and the companies be? There is a bit of a feeling that soon, if not already, we will not be playing in the same category; we go to conferences and we have to wonder what we can still compete on.
That's a good question. There are actually larger public datasets, like the hundred-million-image dataset, which is a hundred times larger than ImageNet, but it has weak labels, and that's something we've been really interested in. It is true that our internal datasets are larger by orders of magnitude, but we also don't publish any research on them: we play a level game, in the sense that we play the game the academics are playing, working on the datasets that are public and that have published work around them.
It's not only a question of data, it's also a question of computation power. I have discussed this before: people who are not at private companies are at a disadvantage for getting good results, especially when you play the benchmark-chasing game. If you can do a grid search with a thousand GPUs and we do a grid search with two GPUs, it's over. So again, what would be the way of dealing with this?
This is something I think about a lot, because now and then I go and visit research labs, and some of my interns come in, and there is a disconnect there between industry and academia. Some labs, for example Yoshua's, are rich enough or get donations for a lot of GPUs, but in most labs I know there are one or two GPUs on which you do most of your research, and there is no clear answer. At Facebook we are trying to bridge this disconnect: we are donating machines to many research labs in Europe, and it's an ongoing programme. That's one way to bridge the gap. But if you ask how you bridge the income gap in the real world, how you address the disparity between the rich and the poor, I think it's a hard problem, and it's a hard question to answer here as well.
So here's a simple suggestion: make your tax returns public. In other words, declare in your paper how many GPUs you are using. I think it should become a habit, and reviewers would take it into account in their judgement, because you can't compare two papers when you know that one used one or two GPUs and the other used a hundred for the same job.
That's what I wanted to get at. It's not that simple, because somebody could claim they only used two GPUs; right, people can lie. But do you think it would make sense, and I have suggested this to a few people without much interest so far, for people to have to declare the amount of FLOPs they burnt for the paper, including the grid search, including everything, so as to have at least a rough estimate? Would that be helpful?
No, I think it makes no sense at all, because doing better research is not a function of the FLOPs you burn; we could burn a billion FLOPs and still publish a bad paper. It's not enough to have computers.

But it can help; it can give you that little edge for beating the benchmark. So I think there is another way we could both agree on, which is something I talked about yesterday: change the focus of the evaluation from purely the numbers to something more about the ideas. I know it's harder. It's much easier to say that you're far from the benchmark, reject, than to actually think about whether this is really interesting, what we feel about the idea, whether it is reasonable and whether it is something that could have an impact if it works. The evaluation is harder to do that way.
Here's another thing I think is going to happen naturally: a segregation of the tasks, or of the research goals. One of the main reasons for doing the kind of research we're doing in my lab, with unsupervised learning and understanding the models, is that you don't need a million examples to do it: you can develop new ideas and test them on small datasets. In fact, most new ideas fail on MNIST, and then you don't need to go very far to know that they don't work. So I think we'll see some research topics that are going to be explored more by academia, and some research topics, the ones that require things like producing the state of the art on some difficult computer vision task, focused more on industrial labs. It's going to be sad, but I think that's where it might be going.
So the other alternative is that we all go and join those two people sitting over there.
I just want to say, a bit like what Yoshua said, that I think of it less as a competition and more as a symbiotic relationship. Academia and industry can complement each other, and from Google's side there are hundreds of grants given to research labs every year, there are visiting scientists who come and work with us for a few months, and open-sourcing tools helps as well. The idea is to help the community, not only to help ourselves.
OK, that maybe twisted the question a bit, but I understand that the Facebook and Google answer is that the solution is Facebook and Google funding the research. There are so many things to discuss here. You wanted to add something?
First, yes, we actually totally agree, specifically about donating to universities. Even though we are not at that scale of R&D, we usually donate one or two units; we have been doing this mostly in the US, but we are open to cooperation here as well. Going to a few other aspects that should be mentioned at this conference: there are also other approaches coming out of the machine learning world, things like the work with Rice University on accelerating these algorithms, GraphLab in the US, and quite a few other companies working in that space, which take a different route to being able to handle bigger datasets. So that's another interesting direction for what you're looking at.
So, two things. NVIDIA, and I'm sure AMD to some extent as well, gives out a lot of GPUs; we've got a hardware academic grant programme, so if anyone is interested, just go online and put in a proposal, and we tend to approve them. I'm actually giving away quite a few cards at the moment. So there is that.
But the other thing is that we all keep assuming that deep learning is going to continue the way it is going, which is really, really intensive training and then your inference. There are already researchers looking to flip that entire workload. I won't mention names, but I'm talking to people who are working with things like expectation maximisation, going more towards the biological side; this space comes back to attention models and things like that, where you're recognising the features before you even go into the training. This flipping of the workload means that you don't need as many GPUs, and you can also have massive, massive datasets, because you're not doing this intensive training. And that's just one thing: deep learning is not going to remain as it currently is. As far as I know, only about one or two people are actually looking at this, because the majority of people just assume deep learning is this huge training phase and then the inference. But if those people make headway, that will change the field for GPU use.
But there is maybe another solution that will come to us from the hardware side. There are lots of companies trying to compete with NVIDIA and build the next generation of neural net chips. This could give us a hundred-fold speed-up in the next couple of years, and it could level the playing field if these chips are sold in commodity products and they're cheap. That would hopefully make things much easier for research. It's a possibility that I hope will happen.
I think we have put the speakers on the spot there. Actually, something interesting that you mentioned: if I were to commit today's neural networks to hardware, by the time I had it ready it would be obsolete.

I don't think so. I think a lot of the building blocks will still be there.

I agree with you, that's exactly it: we need to identify the building blocks, make them available and make them programmable.
The other thing is that hardware turnaround is very slow: Pascal, for example, was about three years of R&D before it could be brought to market. That is very, very slow compared to the advances being made in software, which come really quickly. And it's not so much the Neon framework, it's the actual chip that Nervana is developing; I think they have to get it out next year. Basically they use GPUs right now to do the simulation, but when it comes out it will be a dedicated chip, everyone is watching it quite keenly, and it could be used for training, not just for inference. But the point is that the software advances, you have to realise, are going to happen a lot quicker than the hardware. We're still talking about next year before Nervana brings the chip out, and in the meantime there are people doing tests, even just on CPUs at the moment, claiming two-hundred-times speed-ups using genetic algorithms with neural networks. There are going to be way more advances in software even before the hardware comes in.
Sorry, I'm very curious to hear...

Speak as loud as you can.

I'm speaking as loud as I can. Honestly, I'm really curious to hear Google's position on this and what Google is doing with the new TPU.

So the question is what Google is doing with the new TPU.
Obviously it's not a secret that Google uses a lot of deep learning, and speeding up training is very important, so GPUs are still very much in use, and TPUs, as you probably know, have already been used in production for about a year. RankBrain uses them; it is one of the components of search ranking. It works on the search queries, not the very simple ones, as you can imagine, and part of it is a neural network that helps with the ranking, and that uses TPUs, for example. So we definitely see neural-network-specific, or machine-learning-specific, hardware helping. But again, the focus is not hardware versus software: they both matter, and it's not one against the other; rather, you push on both at the same time. As Alison said, we want the hardware that enables us to do the best possible research.
Cool. I had a more general question; it relates more to Yoshua's presentation. Is there some sort of intuition behind why GANs work better than variational autoencoders? Variational autoencoders have a nice, elegant formulation, but GANs simply perform better in practice. Is it because variational autoencoders can't scale? I find that hard to believe.
It's a good question, and I think different researchers may have different opinions about this. What happens with variational autoencoders is that they tend, as I said, to lose too much information about the input in their latent representation, by adding too much noise somehow. What happens then is that the decoder sees the same representation being associated with different x's, so it is trying to do a one-to-many mapping, and it does it with a deterministic function followed by some Gaussian noise. So the mean of that Gaussian ends up being somewhere in the middle of the many images that correspond to the same latent representation, roughly speaking. That is why you get blurred images: the average of a bunch of images is a kind of blurry image. The GAN doesn't have this issue at all, it can produce very, very sharp images, but it has other issues: it may miss modes, it may give zero probability to things that should happen in the world. Of course, when you generate samples you don't necessarily see this; there can be a whole part of the world that is missing. So it looks nice, but if you were to compute the log-likelihood you could get an infinitely bad log-likelihood. So they each have their advantages and disadvantages.
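A minimal way to make the averaging argument explicit, under the usual assumption of a Gaussian decoder with fixed variance (my own formalisation, not something stated on the panel): the reconstruction term reduces to a squared error, and for a given latent code the best unconstrained decoder mean is the conditional average of the inputs mapped to that code, which is exactly a blurry image.

```latex
% Gaussian decoder p_\theta(x \mid z) = N(x; \mu_\theta(z), \sigma^2 I), fixed \sigma:
\mathbb{E}_{q(z \mid x)}\!\big[-\log p_\theta(x \mid z)\big]
  \;\propto\; \mathbb{E}_{q(z \mid x)}\!\big[\lVert x - \mu_\theta(z)\rVert^2\big] + \text{const.}
% Minimising the aggregate squared error over an unconstrained decoder gives
\mu^\star(z) \;=\; \mathbb{E}\!\left[\,x \mid z\,\right],
% the average of all inputs whose noisy encodings overlap at z: a blurred image.
```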
Maybe another general question. There is a bit of a feeling that the AlphaGo result was a surprise even for people in the field. So what would be, according to each of you, something clearly defined, let's say pretty clearly defined, that you do not expect to happen within ten years, and that you would be very surprised to see happen sooner? Or is that too much of a prospective question? OK, two years then. Two years.
I would say StarCraft: if it gets solved within two years, that would be very, very impressive. With AlphaGo, I think there was a sentiment once the initial paper came out that Go would be solved, because the initial results were really promising and it was just a matter of building on them. With StarCraft there are very hard problems in it. First, for example, doing simulation inside the model: AlphaGo has the advantage of having the simulator, it can predict different moves and then check whether they are valid or not. That is not applicable either to StarCraft or to the real world, so you basically have to do planning in a latent space. Another thing is that the action spaces are much larger, which means we need a system that can carry out hierarchical actions really effectively, or even infer the hierarchy of actions automatically. If something like that happens in two years, it would be amazing and surprising.
For me, natural language understanding. I don't know what the benchmark would be; one of course is the Turing test, but the problem with the Turing test is that it is not just about natural language understanding, it is also about knowing everything about the world that humans typically know. But you could imagine a Turing test geared to a particular domain, like being able to answer technical questions about Linux or Ubuntu, for example; there is a dataset for this, though it is too small. Doing that as well as a human: I think that is not impossible, but I doubt we will have it in two years, and if we do, it will be a great success.
I think I would be very surprised if we got machine learning algorithms to generalise as well as we do. It is also a bit related to transfer learning: if you show a two-year-old two pictures of a T-rex, then the next day, when a toy T-rex is lying around the house, they will say that it is a T-rex. Two examples are enough for a child because, as Yoshua said in his talk, we understand things about the world and are easily able to generalise; right now we are very, very far from that.
You should read some of the papers that came out recently using the Omniglot dataset, where it looks like we are able to do a fairly good job with one, two or three examples, using various one-shot learning techniques. I don't think this solves the problem at all, but there has been some recent progress, so we could see more of that in the next few years. Of course, the magic comes from the fact that you have already seen hundreds of other similar things, in this case similar alphabets, and then you can generalise to a new alphabet from a few examples of how its specific characters are written.
But we shouldn't exclude that, because the child also has knowledge about the world; you shouldn't assume that the learning has to come from nothing. We just want more generalisation.

So within the next two years, roughly, I don't think it will happen, but possibly within the next five.
What I really would like to see, and this is a strange thing for someone from NVIDIA to say, is neuromorphic chips and the developments there, really bringing down the power budget while also ramping up the capability. I don't know exactly where that curve of power and capability is going to peak next, but neuromorphic chips are probably very, very important for getting to AGI, and I think getting to AGI is a really important thing, ignoring all the scary stuff and what could go wrong and so on; we need to get to that kind of capability. And I suppose, if you go away from deep learning for one second, the other thing would be the space programme pushing this, getting past the current limits: with the flights planned over the next few years, that kind of focus is really going to move things forward, though it is a different sector.
I'll just add to that: the Xbox that comes out at Christmas is a teraflop-class machine of the kind that would have been on the Top 500 list a few years ago. And right now the US Department of Energy has its exascale computing programme, aiming to get to an exaflop system around 2023 or so. But again, we need to bring those FLOPs to everyone, hopefully.

Any more questions?
Yes, sorry, I'm asking many questions, but this one is maybe for the hardware producers. I'm a fan of recurrent neural networks, and they might be used less not because they are not good but because they are not really fast. Some labs have good implementations, but I wonder: are there initiatives coming soon on your side for such architectures?
I know, yes. From what I know, they brought out that LSTM variant precisely because there wasn't any decent way to parallelise the original on GPUs, and this is exactly the case in point. A lot of this is where we started with cuDNN: cuDNN 5 is obviously now offering RNN support. It is only a first version at the moment, but we are working on it. And what really surprises me, and it does all the time, is how quiet that group is about what they do; that paper probably should have been pushed around a whole lot more, because it was purposely designed so that you could use it with GPUs, and it is really just another very elegant solution, because it handles the full concept of the actual volumetric data. If you haven't read the PyraMiD-LSTM paper, take a look at it; it is useful. And it did drive us: to be honest, when I first started at NVIDIA I did say, why are we not covering RNNs? But a year ago there wasn't that much activity with RNNs, especially in the wider field, anyway. So I was pushing that we need to do more, but again it takes time; we've only got a finite number of people, so we'll get there. But that work was purposefully built so that you could implement it on GPUs, because they had been using GPUs and have been big proponents of them for a long time.
time I don't know what yours while
you're saying that are intense and use
that much my lab they're used all over
the place. I mean some you know variant
of L already oh I mean if you're in the
via research later for and the
publication means CNN and so the much
more here's switching conference you go
if you go to CP or maybe you don't see
that much but if you go to Lena
language related conferences that
enables you different picture but these
But the one I mentioned, the multidimensional one.

Oh, the multidimensional LSTM, yes.

Yes, well, it's just that I latched onto that one; that paper is Stollenga's. And there is a lot more: look at the work on 3D volumetric data on GPUs, there are quite a lot of different variants out there now. He also gave a talk at the GTC conference, and I think part of what is going on there stems from that one paper. The volumetric side is really ramping up now, obviously because of the medical applications, but again, we are right at the beginning of this, as you get to 3D.
and "'kay" so maybe one one more
question and then we can stuff here so
the question for the a framework people
so it's it's one of the question that
you got quite a lot of votes on the
website was what are the and the
inevitable trade offs in a framework so
we you you you showed this kind of
gradient between and destroy and work
too when we see that a sex to be to one
side to be changes are on maybe like it
when question which was asked at the
beginning it ye it's I was loose and
nobody the right of so it's impossible
to have the best of both worlds or is
it simply that we do not have yet come
with the the right the right overall
thing or what we call them so I think
regard regarding the initial trade off
question something that clearly comes
to mind is what's the right level of
abstraction. So ideally you want to
have the things always being composed
of different operations and have the
operations and everything being very
modular but sometimes if you do that
you have your call this lower right
because you can't optimise for example
if you spend your time actually writing
your into and program that say in C or
C plus plus you can really optimise the
bare or to put that same for speed
memory usage and so on but if you want
to have a composition allergy then you
trade off a little bit of the the speed
and also this comes up with numerical
stability. So as we know soft max is
not really that remote "'cause" stable
stable peaceful max and then cross
entropy so for example intensive you
have soft max with cross entropy on one
not ideal for generalisation but
sometimes you you have to combine these
operation so I think this is something
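As a hedged sketch of the numerical issue being described (pure NumPy, toy logits of my own choosing, not any framework's actual kernel): computing the softmax and then taking its log can overflow or hit log(0), while the fused log-softmax form subtracts the maximum and stays stable.

```python
# Naive softmax followed by log vs. a fused, numerically stable
# log-softmax + cross-entropy (toy logits; illustration only).
import numpy as np

logits = np.array([1000.0, -1000.0, 3.0])   # extreme values to provoke overflow
target = 2                                  # index of the correct class

# Naive: exp overflows to inf, so the probabilities and the loss break down.
naive_probs = np.exp(logits) / np.exp(logits).sum()
print("naive loss:", -np.log(naive_probs[target]))

# Fused and stable: log softmax(x)_i = x_i - max(x) - log(sum(exp(x - max(x)))).
shifted = logits - logits.max()
log_probs = shifted - np.log(np.exp(shifted).sum())
print("stable loss:", -log_probs[target])
```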
So I think this is one of the hard questions when you design a deep learning framework: where do you stop, what level of compositionality do you want to go with. For example, TensorFlow, compared to DistBelief, the first deep learning system at Google, is more compositional, and it turns out that if you do it right you can even make it faster: it is faster than the first system and also more modular. But overall, this is the first trade-off that comes to mind.
I think this trade-off of course exists, but there are some tools that can really improve things on both fronts, and one of them is a very old one: it's called a compiler. A compiler allows you to have a lot of flexibility and modularity, but once you have specified the computation, you can use the compiler's intelligence, which could even use machine learning, to make it efficient, instead of having a human hand-design everything. Theano was designed to try to make it easy to put in compiler technology, but I think we could do a lot better if we put professional compiler writers onto these kinds of things; hopefully TensorFlow will get there. I think this is a direction where we could have both ease of design and flexibility on one hand, and efficiency, an efficient and production-ready implementation, on the other.
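As a toy, hedged illustration of that compiler idea (entirely my own miniature example, not any real framework's pass): once the computation is written as a graph of small modular ops, a simple rewrite pass can fuse the softmax-then-cross-entropy pattern discussed earlier into a single stable op, without the user giving up modularity.

```python
# Toy "compiler pass": the user composes modular ops, and a rewrite rule fuses
# the (softmax -> cross_entropy) pattern into one numerically stable kernel.
graph = ["matmul", "add_bias", "softmax", "cross_entropy"]

def fuse_softmax_xent(ops):
    fused, i = [], 0
    while i < len(ops):
        if ops[i:i + 2] == ["softmax", "cross_entropy"]:
            fused.append("softmax_cross_entropy_fused")  # stable combined op
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

print(fuse_softmax_xent(graph))
# ['matmul', 'add_bias', 'softmax_cross_entropy_fused']
```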
I think they both make good points. To come back to your question, whether we always have to make these trade-offs between research and production, between speed and flexibility: in an ideal world we don't, and you should be able to write the most flexible and the fastest thing at the same time. But the research on compilers, on graph placement, and basically a whole set of systems research is not there yet to build such a tool, and that's why people do these things by hand. When that research catches up, and I'm sure it will, I think we will move closer towards a unified system that does both things really well.

OK, so unless there is another question... OK, then maybe that's enough.

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
