Note: this content has been automatically generated.
Yeah, it's a nice question, and you're
right: we need nonlinearity in the
system, otherwise the whole thing would
be linear. Well, why does the ReLU
work? We don't really know. In 2010 we
started playing with it, and we got
this result that piecewise-linear
units like rectifiers really work a lot
better for training deep nets, not
just two or three layers but four,
five, six, seven, eight. Initially we
thought it would be a problem because
the ReLU is flat in places, and the
fear was that the derivative
information would not flow. Actually
what happens is that it does flow,
because there is always a significant
proportion, like half of the units,
which are on. So information does
flow. One hypothesis that we played
with is that maybe it works better
because there is a kind of symmetry
breaking going on: when you train on a
given example, only half of the units
specialise for that example and the
other ones don't try to fit that
example, whereas with tanh units all
of the units get a signal that says
you should do something to get the
error down for that example. The other
obvious thing is that maybe the linear
part is really important, because what
it means is that for any particular
example there is a neighbourhood
around that example where it is just a
linear mapping between inputs and
outputs. And we understand a lot about
linear mappings obtained by
multiplying matrices together; there
is this beautiful paper, for example,
by Andrew Saxe analysing this
phenomenon, where you understand how
convergence proceeds. But of course
here it's more complicated, because
the linear mapping you get is
different depending on which subset of
units is active each time.
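
To make the point about locally linear
mappings concrete, here is a minimal
sketch in Python. It assumes a
hypothetical two-layer ReLU network
without biases and with random weights;
all function names are illustrative and
not taken from the talk. For a given
input, the network output equals one
matrix, W2 * diag(mask) * W1, applied to
that input, where the mask records which
hidden units are active. A different
input will generally switch on a
different subset of units and therefore
use a different matrix.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(w1, w2, x):
    """Two-layer ReLU network without biases: y = W2 * relu(W1 * x)."""
    return matvec(w2, relu(matvec(w1, x)))


def active_mask(w1, x):
    """1.0 for hidden units that are switched on for input x, else 0.0."""
    return [1.0 if h > 0.0 else 0.0 for h in matvec(w1, x)]


def effective_matrix(w1, w2, mask):
    """The single matrix W2 * diag(mask) * W1 that the net applies near x."""
    masked_w1 = [[m * a for a in row] for m, row in zip(mask, w1)]
    n_in = len(w1[0])
    return [[sum(w2[i][k] * masked_w1[k][j] for k in range(len(w1)))
             for j in range(n_in)] for i in range(len(w2))]


# Assumed toy sizes: 3 inputs, 6 hidden units, 2 outputs.
rng = random.Random(0)
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
w2 = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(2)]

x_a = [0.5, -1.0, 2.0]
x_b = [-2.0, 0.3, -0.7]
for x in (x_a, x_b):
    mask = active_mask(w1, x)
    linear_out = matvec(effective_matrix(w1, w2, mask), x)
    net_out = forward(w1, w2, x)
    print("active units:", mask)
    print("max difference:",
          max(abs(a - b) for a, b in zip(linear_out, net_out)))
```

The printed difference is only
floating-point rounding: inside the
region where the set of active units
stays the same, the network is exactly
that linear map.
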
So there's a lot we don't understand.
Yeah, we have some ideas, but it's
something I think the next generation
should be investigating: why this is
working. People have also modified the
ReLU to work even better, for example
instead of having a zero slope,
actually having a little bit of slope.
So, you know, it's not the end of the
story.

Okay, yeah,
hello. I think everybody is very
curious about hyperparameter tuning,
such as the number of hidden layers
and the number of units you need in
each layer, especially for
convolutional neural networks,
depending on the amount of training
data and on the difficulty of the
task. So do you have any intuition
about how to tune these, depending on
the data and the difficulty?

That's another
area where more work needs to be done.
Right now we think of the problem of
finding hyperparameters as an
optimisation problem itself, where we
are optimising validation error. So we
can actually use tools from
optimisation to search for good
hyperparameters. Now this is a bit
difficult, because each step of trying
a particular hyperparameter
configuration is pretty expensive: it
involves training the whole network.
But actually this is how we do it. We
go and try many configurations, and
based on the results we try out other
configurations. It's really an
optimisation process, usually with a
human in the loop. But you could also
do it more automatically. There's a
lot of work on what's called Bayesian
optimisation, or derivative-free
optimisation, where you use a
completely automatic way of proposing
the next configuration of the
hyperparameters to try. These are
fairly complex methods, and in fact
neural nets are starting to be used to
help with that optimisation. But in
practice there's a very, very simple
way to do the optimisation, which is
called random search: basically you
just launch twenty or thirty different
configurations and you see what
happens, and if you don't have good
enough results you launch more. You
have to be a little bit smart about
the set of values to be tried for each
of the hyperparameters, based on prior
experience, but it's not really
complicated once you get used to doing
this. A lot of practitioners get a
sense, a sort of intuitive sense, of
what works and what doesn't work. And
they will manually explore, because
you may not have access to, you know,
three GPUs. And so it becomes more
like an art, but in principle it could
all be automated; maybe it's just a
matter of computational resources.

So I go
actually to the questions put online,
to select some. One is related to
this, which is whether there is
intensive work on networks being able
to automatically, you know, adapt
their structure to the data by
themselves. There has been, you know,
cascade correlation, right? So why
isn't it a more intensively researched
topic?

I think it's been looked at. There is
a paper in 2014 called Hypergrad, from
Ryan Adams's group, that poses the
problem of searching over
hyperparameters as making them
differentiable, using some kind of
momentum.
But these things take so much
computational resources; searching
through, building models around
hyperparameters and configurations of
hyperparameters, is expensive enough
that the more serious research there
has been pretty slow. And also, on the
note of how you find better
hyperparameters, I've seen both sides:
not having enough resources at NYU as
a grad student, and also at Facebook,
where we have almost unlimited
resources. At NYU we used to call this
method GSO, grad student optimisation.
The grad student essentially, over a
couple of years, will form an internal
model of what works and what doesn't.
I think Bayesian optimisation tries to
do this as well: it tries to build a
model from past experience, past
experiments that are done, that it can
use to draw better hyperparameter
samples and explore better. But just
fresh out of ICML, just last week or
so, there have been claims that
Bayesian optimisation for searching
for hyperparameters isn't really
needed, and that just doing a bit more
random search is actually equally
effective. It's very, very empirical at
this point like you don't have a it's
valuation the four hypothetical is Okay
I just have a quick comment, because
the previous answers both touched on
using GPUs. Actually, two things. I
remember seeing a paper, I think using
Caffe out of Berkeley, that again very
nicely speaks about doing
hyperparameter optimisation. But the
point I'd like to make is also that
the holy grail of machine learning
would be to enable you to do all those
searches more closely in parallel,
using a framework like Facebook's.

I just wanna say that
it also depends a lot on the kind of
problem you're working on. If training
takes two hours, then you can, you
know, use Bayesian optimisation or
random search.
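
The random search mentioned here can be
sketched in a few lines of Python. This
is a minimal illustration under stated
assumptions: `validation_error` is a
hypothetical stand-in for actually
training a model and measuring its
validation error (a toy function, so
the sketch runs instantly), and the
sampled ranges are illustrative, not
values recommended by the speakers.

```python
import math
import random


def validation_error(learning_rate, hidden_units):
    """Hypothetical stand-in for 'train the model, return validation error'.

    A real run would train a network here. This toy bowl has its minimum
    at learning_rate=1e-2 and hidden_units=256.
    """
    return ((math.log10(learning_rate) + 2.0) ** 2
            + (math.log2(hidden_units) - 8.0) ** 2)


def sample_config(rng):
    """Draw one configuration; the learning rate is sampled on a log scale."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, 0),
        "hidden_units": rng.choice([32, 64, 128, 256, 512, 1024]),
    }


def random_search(n_trials, seed=0):
    """Try n_trials independent random configurations, keep the best one."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        error = validation_error(**config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


best_config, best_error = random_search(n_trials=30)
print(best_config, best_error)
```

Because the trials are independent,
they can all be launched at the same
time, which is why the approach only
fits when a single training run is
cheap enough to repeat many times.
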
"'cause" you could lounge many
experiments. But if training takes two
weeks, like training on ImageNet, then
that's not gonna cut it, and that's
why there is this GSO.

Okay, my question. Thank you for the
talks. I would like to ask a somewhat
speculative question: can you describe
a link between probabilistic graphical
models and deep learning, and can they
benefit from each other?

There's a lot of links. You know, the
first ways that we discovered to train
deep networks were based on using
unsupervised probabilistic models for
learning representations, like RBMs.
That was, say, 2006 to about 2010,
2012. Now there's a lot of research in
unsupervised learning, because that's
something we don't know how to do yet,
and we know that it's gonna be very
important for AI. A lot of that
research is strongly motivated by the
probabilistic interpretation of what
unsupervised learning is about: it's
about capturing the joint distribution
of random variables. So a lot of the
work that's been done in graphical
models, especially with latent
variables, especially variational
methods, or even sampling methods,
Monte Carlo methods, all of these come
up as useful tools, at least for some
of the algorithms. There are other
algorithms, like the GAN for example,
where we completely bypass any kind of
probabilistic interpretation, but if
you actually want to analyse the
algorithm then it becomes important
again to think about notions from
probability.

So I would, sure,
because I'm going to go on mixing in
some questions from the board with
questions from the room, if that's
okay. There's one general question,
which was the most popular one on the
board: what is a task that deep
learning can solve best, and what can
it solve worst? What would be the
worst thing to apply deep learning to?

That's a loaded question. I think deep
learning can solve, I
mean, if you look at the neural
networks of modern times, they're
correlation learners, right? They
can't do causal inference, they can't
do advanced reasoning. So anything
that can basically be solved with
correlations and simple mechanics like
counting, I think, can be solved with
the deep learning of today. I don't
know what deep learning is anyway; the
field is moving fast enough that the
goalposts keep changing. But as of
today, if you take convnets and
recurrent nets, LSTMs, they do strong
correlation statistics and counting,
and that's about it. A lot of tasks
that couldn't be solved in the past
apparently can be solved just with
these properties. And if you want
something beyond that, where you
actually need to reason with very
little data or so, that's something
that deep learning methods as of today
cannot do, and what a bunch of us are
actually working towards getting to.

Yeah, I agree with that. It depends on
your definition of what deep learning
is. There's today's deep learning, and
there is what researchers are working
on now, which really is trying to
address these questions of reasoning
and even causality. It's really,
really important, because if we wanna
reach AI we need machines that really
understand the world, and that means
forming a kind of causal explanation
of what is going on. Now, for me that
could be very much part of deep
learning. What's special about deep
learning isn't that we use backprop,
isn't that we use neural nets in their
current form; what's special is the
idea of learning representations, and
learning multiple levels of
representation that correspond to
different levels of abstraction. What
we need to do is to inject more of
these ideas into the algorithms,
including the work going on with
reinforcement learning, in order to
approach these goals. But that's still
kind of at the frontier. One thing I
could add is that deep learning, even
in its future form, is clearly not
gonna be useful when you want to learn
something like a random function, a
function that doesn't have any
particular structure. The reason deep
learning is working well is that the
kinds of tasks we are trying to learn
with it have this kind of
compositional structure that is being
taken advantage of by these neural
nets.

Hi, I
have a more practical question. The
discussion of numbers and efficiency
today was basically about the server.
I was wondering how far we are from
embedded, real-time mobile deep
learning. So, in terms of, for
example, how many images per second,
in the case of ImageNet, can be
classified on a mobile? I'm not
talking about training, but in terms
of real-time prediction, where are we
now, at what efficiency and what
performance?

Yeah, I mean, I can't
specify actual numbers. I mean, I
would've thought, you know, that's
your area, right, the mobile side. But
what I'm seeing is it edging very,
very rapidly to very, very low power.
You know, people are talking about
invisible devices. So whatever the
numbers are now, the really amazing
part about it is that they're gonna
mean nothing in probably six months'
time. It's kind of like the iPhone 5:
you bring out the next iPhone and then
it's already history, because we're
developing so fast. So don't get stuck
on the numbers too much, simply
because it really is moving that
quick. But inference obviously is the
really important part of it, and how
quick you can do it. I think Andrew Ng
of Baidu would definitely agree with
me that it matters a lot, and they've
been able to use it. Whether it will
be an actual mobile phone that we use,
or whether it will be something else:
Bell Labs is working on a
matchbox-sized unit that would, say,
attach to this jacket, and it would
keep all your data local, so there's
no sort of to and fro from the actual
cloud, and you've got the learning and
inference going on in this
matchbox-sized unit. Now you can
understand what kind of very low power
budget that would need. I mean, we're
at credit-card size at the moment;
that's something about matchbox size
with all the capability for learning
and inference. But again, don't get
confused with the actual training
side; we can do all that on the
server. It's inference that's being
brought on board. I don't know where
we'll be, but it's exciting to
actually sit and watch
it happen.

But yeah, I'd like to call this out
actually, a bit: as hardware makers,
you know, this field is evolving so
rapidly. And we were speaking earlier:
if I commit something to hardware
today, by the time it gets ready it
may, you know, be obsolete. Now, as to
special-purpose devices, there's a lot
of exciting development. One example
is, I think, from Dublin, a startup
called Movidius. They have a very,
very low-power device, seven grams,
the whole package is seven grams, in
the form of a USB stick, and you can
train your network with TensorFlow,
for example, and take that onto it,
for example for image recognition.
There is also the work of a professor
in the USA; they actually have a
startup called Alchemy that uses
cortical columns, a slightly different
style of neural network, to make it
extremely, extremely low power. So,
you know, a lot of this is moving
quite rapidly.

So I worked on deep learning
for mobile for two years, so I have a
little bit more practical numbers
there. The main thing on mobile when
you're running deep learning models is
how fast your phone runs out of
battery. And the speed of actually
running the model is limited by the
memory bandwidth: it's not actually
the amount of compute you do, but how
fast you can pump in images and read
the various layers from memory and so
on. On this note, there's a lot of
research coming out recently on
one-bit networks, eight-bit networks
and so on, where they're trying to
basically give you almost the same
accuracy, maybe slightly lower
accuracy, as the models that you train
in full float precision, but they will
give you about eight times or
thirty-two times more throughput.
About a year and a half ago, when I
actually remember the numbers, we did
about two to three FPS of, like, an
AlexNet for example, on mobile CPUs.
Still, on iPhones and such, CPUs are
actually faster for deep learning
applications than GPUs. As of today,
if you for example do XNOR-Nets, which
are just one-bit XNOR operations all
the way, you might actually get ten,
fifteen FPS, but I don't think anyone
has published hard numbers on that
side. But if you want specialised
chips, NVIDIA actually makes one of
the best chips there, the TX1. It's
rated at ten watts and it gives you
one teraflop of compute, so that's
really, really efficient if you want
to do, like, robotics applications and
stuff.

Yeah,
thanks. So my question is about, as
Yoshua was mentioning in his
presentation, 2011, with the
rectifiers. So there's an optimisation
problem for deeper networks even with
rectifiers, it just happens later: the
deeper it is, the more nonlinearities
are composed, the harder it is. Is
that right?

That's also true with recurrent nets,
by the way, and other complex,
reasoning-based systems like memory
nets or neural Turing machines. So
what happened is we were taking
advantage of the ability of
unsupervised learning to extract
fairly good representations, at least
as an initial starting point for
supervised learning. And that worked
quite well, and it continues to be
actually something useful if you wanna
do semi-supervised learning and you
only have access to a small number of
labelled examples, or maybe you do
transfer learning and you have a small
number of labelled examples for a new
category. So the combination of
supervised learning and unsupervised
learning is something that is actually
coming back; there are several methods
that have been proposed that seem to
be doing a fairly interesting job. And
I think we're gonna see more of that
as we expand the reach of
applicability of neural nets to
domains where we might have only a few
examples of a new category. But of
course that wouldn't generalise unless
you have other sources of information,
so lots of unlabelled data or lots of
other tasks, and unsupervised learning
methods can be useful there. So why is
it that with the rectifier the
optimisation gets to be easier? We
don't completely understand, as I said
earlier. There was somebody in the
back who raised their hand
first.

Oh yeah, my name's Martin; I'm a
computational materials physicist at
EPFL. So, you know, we try to apply
some of these things, and there's
always this back and forth: you know,
"you don't have enough data", and we
say, well, for us we've got loads of
data. And there is, you know, this
kind of thinking, for example in
linguistics, the so-called poverty of
the stimulus: that humans are able to
learn despite, you know, a severe
absence of stimulus, especially in
infancy. So do you think there are
things that can be learned or used
from biology to try and drastically
reduce the amount of data that's
needed? Perhaps, for example, the
question of whether neurons are only
capable of intrinsically learning one
of the five possible types of
linguistic grammar. You know, by
making these concessions, or by other
means, do you think we can reduce the
amount of data that we actually need
to do reasonable work?

And what I'm seeing is, I mean,
there are two sides to AI, aren't
there: one is the computer science
side, the other is the neuroscience
side. And I've been following for
quite a few years companies like
Numenta, who are approaching AI, or
rather AGI, so artificial general
intelligence, from the neuroscience
side. I think it's really important to
do that. Yann LeCun would say, you
know, he has the quote about how
Boeing doesn't use feathers, and that
there's only so far that you can take
that biological route. But it's worth
taking the advice, and I say that on
the back of various conversations
about using genetic algorithms, for
example, instead of random search, for
finding the best weights, optimal
routes and things like that.
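
As a rough illustration of what using a
genetic algorithm to search for weights
can look like, here is a minimal sketch
in Python. Everything in it is an
assumption made for illustration, not
something described by the speaker: the
toy task (fitting y = 3x - 1 with a
two-weight linear model), the population
size, and the particular selection,
crossover and mutation rules.

```python
import random


def fitness(weights):
    """Negative squared error of y = w0 * x + w1 on toy data from y = 3x - 1.

    Hypothetical task chosen for illustration; higher fitness is better.
    """
    data = [(x, 3.0 * x - 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    return -sum((weights[0] * x + weights[1] - y) ** 2 for x, y in data)


def evolve(generations=100, population_size=20, seed=0):
    """Evolve weight vectors; returns the best one and the best-fitness history."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5), rng.uniform(-5, 5)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: keep the fitter half unchanged.
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mum, dad = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mum, dad)]
            # Mutation: small Gaussian noise on every weight.
            child = [w + rng.gauss(0.0, 0.1) for w in child]
            children.append(child)
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best_weights, history = evolve()
print(best_weights, history[0], history[-1])
```

Since the fitter half survives
unchanged, the best fitness in `history`
never gets worse from one generation to
the next. Note that no gradient is used
anywhere, which is both the appeal of
the approach and the reason it can take
a long time.
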
Because evolution is one way that we
know works. It's very simplistic, and
it may take a long time, but, you
know, when we think about how we learn
as children, when we're first born the
first thing that we do is we learn
edge detection, and then as the eyes
develop we gain the ability to see
colour, we gain the ability to put the
edges together, the way that deep
learning does. But we also say that
we're able to, for example, understand
the concept of a car without seeing a
thousand cars, you know, and these
huge data sets. But you also have to
take into account that the average
human will spend a quarter of their
life in school, and then, you know, is
still continually learning. So really,
I don't know; it's the argument again
of unsupervised versus supervised
learning. But I think following the
biological route will help. It'll be
another way, it'll give us, you know,
another insight. And also on the
neuroscience side, I do believe that
we'll learn a great deal about
ourselves and how we operate: learning
how we learn helps us to learn how to
teach machines, or how to show
machines how to learn. And I think
that's very vital.

So that,
bridging the gap between neuroscience
and deep learning, is actually one of
my pet projects for the last year or
two, and I think it's a really
interesting, exciting direction.
There's a lot we don't understand
about how the brain solves the
learning problem at a large scale. We
have lots of concepts in deep
learning, but we haven't solved the
unsupervised learning problem really,
and so I think we can maybe get a lot
of inspiration from how humans do it.
Regarding your question about learning
from very few examples, in fact a lot
of the research in deep learning is
motivated by this very question. How
do we do transfer learning,
unsupervised learning: all of these
approaches are trying to answer the
question, if we're given a new task,
can we learn from very few examples?
But the general answer to this is,
again, there's no free lunch, and the
only way we can make that work is
because the learner has already
discovered a lot about the world. I
think the reason the human is able to
learn language from very little data,
compared to the massive amounts of
data we're using now for, say, machine
translation, is because that child is
also observing everything else in the
world, not just, you know, hearing
utterances. That child is
understanding sort of intuitive
physics, that child is understanding
social relationships, and building a
causal model and an explanatory model
of how the world works. Then basically
what happens, I think, is using
language to, you know, put names on
things, but these are already things
you understand. That's why a child
will be able to recognise a new, say,
animal from a single picture: because
the child has already formed the
notions of animals and mammals and,
you know, having legs and eating and
so on. That's how I see it.

Thank you. So I
would like to get back a little bit to
the biology bit, and especially to
what you said before, that for now we
can capture little more than
correlations, and that the next step
beyond correlation would be causality.
Last year at ICML you presented a
little bit of your work about a more
biologically plausible way to train
networks, and spike-timing-dependent
plasticity, right? And I would like to
know if you think this might be a way
to get more causation-like results.
And the other one: I would like to
know if at Facebook you guys approach
this problem at all, and if yes, how?

Right, so what I talked about
last year at ICML is something that
has continued in my lab, and we made a
lot of progress and we put out a few
arXiv papers, if you're interested. A
lot of our focus has been on just
making a biologically plausible
version of backprop, because, you
know, backprop is really, as I said,
the workhorse of the successes we have
had recently, and it's kind of crazy
that we don't have a reasonable theory
of how brains could even do that. But
I'm also very interested in a
biologically plausible version of some
form of unsupervised learning, and the
things we're exploring now you can
think of as more like Boltzmann
machines, energy-based models, except
I'm trying to throw away the energy
altogether, because it imposes too
many constraints. And yeah, I'm very
optimistic that we'll soon have
biologically plausible, at least at
some superficial level, methods for
both supervised and unsupervised
learning. That doesn't answer the
question about causality, though; this
is a totally different bag of tricks,
and there is not enough work in that
direction, even from a purely machine
learning point of view.
So on the biological aspects, at
Facebook at least, we're not directly
doing any research; Yoshua has his pet
project. But on the causality front we
are actually very, very interested in
building causal models. We recently
published a paper on doing causal
inference in visual scenes: basically,
if there's a car in the scene and
there's a bridge, trying to, for
example, ask the question of whether
the car is there because of the bridge
or the bridge is there because of the
car. These are the questions that some
of our researchers, Léon Bottou, David
Lopez-Paz, these guys, are really
interested in, and the model that we
built is called the Neural Causation
Coefficient. We think we have some
very interesting initial results. We
basically train the model on synthetic
distributions to just, basically,
given two variables, tell whether one
variable causes the other variable:
whether A causes B, or B causes A, or
neither, or they cause each other, and
so on. And it actually generalises:
trained on these synthetic
distributions just to do this causal
analysis, it generalises to doing
causal analysis in natural images,
with the features extracted from
residual network models pretrained on
ImageNet, for example. Yeah, you can
probably read a bit more about that.

Okay, thank you.

Thanks. So there's a
tendency now that the networks become
more and more complex: they get deeper
and deeper, and also there are new
types of layers being introduced, like
dropout layers, new kinds of
nonlinearities, layer normalisation
and so on. And so, in a sense, the
machines are learning more and more,
but humans somehow learn less and less
about the model: the more complex the
models become, the less insight we get
from these models. So is there any
systematic way of somehow getting to
know what the net knows and what it
doesn't know? Visualising the
activations works very well in some
cases, like when you have visual
recognition, where you can just see
what the net sees. But is there a
general approach to understand what
the net knows and what not, when you
have this for another problem?

There
is a systematic recipe. It's called the
scientific method. Okay, we are in
front of a phenomenon, a learning
algorithm that, you know, is trained;
we can see what happens while it's
training, we can see what happens
after it's trained, and we're trying
to figure out what's going on, what
ingredients matter, why, and build
theories around it. I mean, that's
something that the machine learning
community doesn't do enough. A lot of
the work, unfortunately, is basically
tinkering, and not really trying to
figure out, you know, why it's
working, or even when it's working. If
you build a system which has like ten
different ingredients and you beat the
benchmark, great, you get a paper, but
have we learned something? So,
fortunately, there are people who
really focus on not just getting
better numbers but trying to
understand what is going on, or at
least which parts are more important.
It's easy, when you get into sort of
the engineering mode of building
something that works well, to forget
about the why, but the why question is
really central to answering your
question, and there should be more of
it. But there is some; I mean, I think
good papers are trying to also deal
with the why.

So, your
initial statement: I would say, first
of all, models are becoming simpler,
not more complicated. I mean, I'm not
going to go into the whole recurrent
net scenario and everything, but if
you just look at convnets: a few years
ago, or around then, convnets had way
more types of normalisation, many more
types of pooling, many types of
nonlinearities, and we used
second-order methods to optimise
things and so on. Right now all you do
is three-by-three convolutions, stack
up a few layers, add some max pooling,
and that's about it; every convnet
used today is pretty much built out of
this one recipe, right? Sure, you have
ResNets that recently appeared, but
come on, that's just one nice identity
connection. In terms of how we
analyse: I think, as Yoshua said, the
deep learning community as of today
does suffer from not doing the
scientific method. It's basically just
beating whatever benchmark there is,
like squeezing out half a percent or
so, and then you have a new state of
the art. I think the community will
just mature over time, but until then
it is what it is.

And there's a
whole lot of progress being made
simply because of the commercial
value; you have to sort of recognise
that. And yes, Google, and yes,
Facebook, and yes, Baidu have all got
these products, and this is what I
said about inference being really,
really vital. They have products, so
they have, you know, millions and
millions, if not billions, of
customers, and they have to get this
right. So yes, a lot of research is
concentrated on getting the quickest,
getting, you know, the cheapest in
terms of energy, in terms of compute
cost, in terms of passing it on. But
there is a large amount of research
going on, possibly not as much, and I
think this is what we get a bad rap
for, not doing enough theory. There's
a lot more commercial reason to put
the resources into enhancing the
benchmarks than there is for theory,
but there are a lot of people working
on the theory side of it. You know,
I'm obviously biased in saying check
out our chief scientist and his PhD
student Song's paper on pruning, but
there's also a whole lot if you delve
into arXiv. And, you know, if
anybody's not really familiar with
arXiv, you can search through it and
put in "theory" and things like that;
there is a lot going on. It just kind
of gets so overwhelmed by the massive
amount that's going on on the
commercial side.

So maybe one question related to this,
which is, I think, extremely
important, especially because I'm from
the computer vision side, where this
phenomenon of tuning things, getting a
small improvement and submitting a
CVPR paper, is something we see. So
one question was how the value of a
new architecture should be judged. If
we say that we should not be just
happy because we do one percent
better, then how should we, as a
reviewer of a CVPR paper with a one
percent gain, judge this architecture
against that architecture?

So I'm involved in a lot
of conference organisation and things
like that, and workshops. And I think
in general there's too much value put
on, you know, pure performance, and
not enough value put on having an
interesting story, an interesting
hypothesis, an interesting theory from
which we can learn something that
could go beyond this particular task,
this particular benchmark; and it's
even more interesting if that is more
novel, more original. The danger if we
don't do that is that what may happen
to our field is what happened, say, in
the speech recognition community for
more than a decade, where people don't
explore that much; they just take the
existing ideas and tweak them and make
little variations. If we do that, then
we lose the ability to discover very
different approaches, and so we need
to encourage more exploration, things
that are quite different but somehow
maybe, you know, a good solution
because we have this, you know,
appealing theory or hypothesis behind
it. That's what I would like to see
more, and I would like to see
reviewers put more weight on that
originality and those interesting
ideas rather than just the numbers.

I would
ideally want reviewers to put more
emphasis on originality than on how you
got this number, but the reality is
that if you submit a paper to CVPR it
will get rejected if you're not state
of the art. I think a uniform set of
harder benchmarks, if they're
established, would be really helpful,
like if you can judge an architecture
by how well it generalises across a
large variety of tasks. Generalisation
is what we care about when we build any
of these models, right. If you give the
choice to the researchers on how to
write the story of generalisation, they
will write it exactly to what their
findings are. They will basically say
"my model generalises really well to
this, this and this dataset", and those
three datasets are probably similar in
nature anyway. So as a field, for
example the computer vision field,
there's a lot of good work being done
to establish harder and harder sets of
benchmarks every year, like the COCO
challenge or Visual Genome. But yeah,
it has its biases. Basically, having
these harder and harder benchmark
settings is the right way, and that
would be good.

I suppose I want to
use this as a shout-out to you all to
really make an effort to go outside of
the box. I mentioned when I was talking
that I'm in conversations with people
using genetic algorithms, and this
particular developer is, you know, just
an extremely skilled, talented guy.
He's been working on software
development, on chips and a whole host
of different things for about
twenty-five years, and this particular
thing, using genetic algorithms, he's
spent about a decade on. And he's never
really pushed it because he constantly
got flak; it was just totally
different. And there are still barriers
for these new ideas. I suppose the
shout-out is to be brave and think
outside of the box. You know,
the program that I've just left after
its first week, with NASA and SETI,
the driving force there is to be
creative and think outside the box.
There are lots and lots of people, as
has actually been said, just
incrementing on top of existing
knowledge and making it better, and
that's great, but we need people to
throw in, you know, that evolutionary
chance, that mutation every round, and
I think that will help us get a lot
quicker to where we're going. So for
example, look into genetic algorithms
with neural networks and, you know, be
bold enough to do that and think
outside the box.
just just one oh I need to do is just
one comment because I I mentioned
earlier that if we go to premature like
on TV harbours some structure they have
today they said if you would is
evolving. However you know estimation
all nature helps is that the same way
that just said that the way the machine
many works because you have this
structure likely we do have some I'm
number of basic math operations, or
mathematical operations, that, you
know, turn out to be the foundation of
these things. So I'd encourage people
to, you know, think outside the box,
but using the boxes that we have.

Thank you
for your presentation. In the context
of self-driving cars, it is frightening
to trust behaviours learned by a
learning system. So my question is
this: is there any way to put
guarantees around the predictions,
besides very good results on testing
sets?

You will
never get any guarantee from a machine
learning system, period. And you don't
get any guarantees from a human, by the
way. You are judging a human by, you
know, their behaviour; same thing with
a machine learning system. And we can
make that test much more extensive, by
the way. It's not just that you pass a
twenty-minute driving test; you
actually have cars run for hundreds of
thousands of miles or whatever it is.
Now, that being said, I wouldn't trust
the current systems to drive my car; I
think there's a lot that's missing. But
I have a lot of confidence as well that
they are going to get better, and
substantially better, in the next two
years. But we have to be empirical;
there's no other way.

Hi, maybe a
question to the hardware developers. So
we're currently using, like, open
source operating systems and open
source frameworks, and in the middle
there is kind of this black box called
CUDA. Is there a chance we see NVIDIA
moving into more of a position where we
can actually have access to the source
code of these things too? I mean, we
could also help make them better with
pull requests and stuff.

And that
discussion comes up a lot, and I can't
speak for, you know, the chip teams,
and certainly I can't speak for Jensen.
But we recently had the machine
learning summit, where we invited a
whole lot of researchers in; this was
in January at headquarters in Santa
Clara. And that question came up again,
and they did agree to at least try to
open up parts of cuDNN and, you know,
the kernels that we really need, that
we were holding quite tightly. They
agreed to that. How much further they
would agree to go, I'm afraid I can't
speak to, but it is a valid question
that we're listening to and that people
are considering. But again, I can't
speak for those who will make that
ultimate
decision.

At AMD, I have to say, we actually do
take an approach which is open source
and open systems. In my talk on Friday
I'll explain that all our code is open
source, including the instruction set
of our GPUs; you can program in
assembly if you care to. So the idea is
to make it open, and there are several
reasons for that. One is to benefit
from the community and ecosystem;
speaking here, you know, there's a lot
of development to happen. So being open
and open source is definitely one way
to go.

So as an open
source developer who has to deal with
closed source systems from NVIDIA every
day, I feel you. I think one
interesting thing there is that Windows
and Linux were somewhat like that as
well. All the latest and greatest
drivers and technologies and everything
used to be written for Windows, and
Linux was given secondary, like six
months to a year later, stuff. It only
takes a single person to write a single
implementation that is on par with the
closed source system to basically break
the value of the closed source system,
like for example how Scott Gray at
Nervana wrote CUDA kernels for
convolution and stuff as fast as what
NVIDIA had. And that causes a lot of
openness to be forced upon them by the
community. So I think if there's one
thing, it's up to us as open source
developers to really push companies to
not see the value anymore in
closed-sourcing their software. From a
purely market-driven perspective, that
seems to be the only thing that would
open
things up.

Thanks. So it was brought up a few
times, the inspiration from biology.
As we know, real brain networks are
heavily recurrent. But the very
successful applications of deep
learning, for example, are mostly not
recurrent. From your point of view, is
it a matter of practical difficulty,
that it hasn't been tried, or do you
think that even if we tried, with
current neural networks there might not
be a significant advantage to the
recurrent connections?

Yes, one of the
things I've been working on is using
the recurrent connections as part of
the computation, and I don't mean to
process sequences but just to process
even a single example, and using those
recurrent connections both for
computation and for credit assignment,
which is what backprop is doing. So
that's precisely the thing I was
talking about earlier that tries to
bridge the gap between deep learning
and neuroscience; that's one of the
ingredients. There's this one question
where we have absolutely no idea how
brains could handle it, which is
something like backprop through time.
In other words, I think we have some
reasonable ideas of how brains could
implement backprop in the sense of a
single static input, where the
recurrence essentially propagates the
equivalent of gradients. But the thing
we really have no idea about is how you
could do backprop through time, where
you have a sequence of states. It
really doesn't sound very biologically
plausible that you would have to store
the whole sequence, which might be your
whole day, right, and then somehow, you
know, play it back backwards while
computing the gradients or their
equivalent. That's a totally open
question as far as I'm
concerned you when this question
because we were supposed to finish at
fifty nine three twenty six lives of
feeling people are your to that you
train so Thank you. And they have a
really mean questions related to that
but maybe: so there was a question of
which problems can be solved and which
not, which is not easy to answer. And I
mean, a lot of it is engineering, just
benchmarking and so on. And I really
like the opinion that we should analyse
things and find out what's happening.
But what I'm searching for, and maybe
you have the answer, is a broad
analysis of the problems in general: of
the properties of the problems, of the
data, of the quantity and distribution
of the data and whatsoever, and of the
possible networks, not only CNNs and
autoencoders but also the others,
multidimensional recurrent networks and
whatsoever, knowing which can be
applied when, and why or why not. Is
there any very nice deep analysis of
that?

I mentioned on a slide a book which is
supposed to help dealing with that
question, right. So once you understand
the building blocks, a lot of these
different architectures start making
sense, and then you can start
considering your problem. For example,
recently I was looking at how we can
use these things to model the 3D
structure of proteins, right, so it
doesn't really fit the things we
normally do. But you can reuse a lot of
the ideas that have been around, the
algorithms that we know, and compose
them in new ways once you really, you
know, make sense in your mind of what
they're doing and why. So yeah, I can't
give you an answer, of course, in one
sentence for all these things, but
that's the kind of thing that you get
by reading a book or by following the
literature and trying to make sense of
these methods: not just, you know, how
can I implement them, but why. For
example, why do we have bidirectional
recurrent nets? Well, there's a good
reason; it makes sense. Why do we use
attention? Well, there's a good reason:
it helps us deal with a particular
issue. Once you understand these things
you can, you know, be creative and
apply them in new settings.

And I think it also comes back
to what I said earlier. So again, I
think the question is, you know, what's
the potential for having automatic
configuration of, say, what types of
parallelism to use at which points in
the layers, but also which types of
layers? I mean, I know it's a lot of
hard work, but theoretically, what's
involved so that we can do that?

There's a lot involved.
There are very hard problems in the
world of compilers that need to be
solved before you can get to, like,
automatic parallelisation. And the
specific problems in compilers are not
specific to compilers; they come from
graph theory in general, and they also
apply to searching for ideal neural
network architectures and stuff like
that. As of today humans do it by hand,
but you could probably build some kind
of deep Q-network for that, which
predicts which parallelism to use, and
it's not actually implausible at
this stage.

Yeah, I don't think we're far from it,
right? I wouldn't say we're not far
from it. I would say it wouldn't have
been plausible to talk about two years
ago. Now it's plausible. Is it
possible? I don't know.

And yeah, I think, to your
question, what Yoshua said was pretty
much spot on: read and understand the
textbook, and that would
be the first I'm not really happy was
set on soon yeah I mean I wrote a book
yes of course I I read many books and
me purposely may be I have some quite
good I yeah but typically these people
who only use deep learning, they know
their problem, and they don't want to
have to dive into all the specifics of
the neural networks and understand
everything in order to finally map that
to their problems. I mean more an
analysis of the problems, and then
finding which architecture works.
The other way around.

Yeah, I think Soumith was talking about
using machine learning and
reinforcement learning, you know, to
try to do these kinds of things, but
right now it's a research goal; it's
not something we know how to do. And
the result of this is that people who
have that expertise are, you know, in
big demand from industry, and they can
earn a lot of money. So until we can
automate that job, which seems pretty
far off, I think we're going to have to
go the hard way and, you know, have
engineers learn about the underlying
science sufficiently to combine these
building blocks together. And that
being said, I think something positive
is happening with the software tools
and the progress with hardware, which
makes it easier for people without
necessarily a strong mathematical
background or a lot of expertise in
using neural nets to tinker. So, you
know, once you understand a few basic
ideas from machine learning and from
neural nets, and if you have good
software tools which are well
organised, with lots of existing
examples of different types of
architectures, modular architectures,
and efficient hardware, you can
actually do a lot by just playing
around with the Lego blocks. And, you
know, for a simple five- or six-hour
time investment anybody can go through
online courses and get a grasp of it.
But I suppose exactly how does each and
every application gain from deep
learning, I'll be quite happy if we
don't quite so that with a I just yeah
because then not the average all so

Okay, so maybe we can thank the
speakers again. Tomorrow we will have
Soumith presenting Torch, basically for
three hours, starting up
yeah exactly so so so we use for I
don't in talk on your show we give a
more modern talk about defenders you

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
