Note: this content has been automatically generated.
00:00:00
Yeah and you him this might I see
00:00:09
question but you're right questions we
00:00:22
need nonlinearities in the system,
00:00:24
otherwise the whole thing would be
00:00:25
linear. Well, why does it work? We don't
00:00:32
know. You know, in two thousand ten we
00:00:36
started playing with ReLUs and we got this
00:00:39
result that, in place of tanh-like or
00:00:43
sigmoid-like nonlinearities, it really works a lot
00:00:45
better for training deep nets, like,
00:00:47
you know, not two or three layers but four,
00:00:50
five, six, seven, eight. Initially we
00:00:54
thought it would be a problem because
00:00:56
the ReLU has flat places, and the
00:01:00
thought was that this would block the derivative
00:01:04
information. Actually what happens is
00:01:06
that it does flow, because there's
00:01:09
always a significant proportion, like
00:01:11
half of the units, which are active. So
00:01:16
information does flow. One hypothesis that
00:01:19
we played with is that maybe it works better
00:01:22
because there's a kind of symmetry
00:01:24
breaking going on, and so when you
00:01:25
train, only a few, or say half, of the units
00:01:29
specialise for that example and the
00:01:30
other ones don't try to go for that example,
00:01:32
whereas with tanh units all of the units
00:01:35
get a signal that, oh, you should do
00:01:37
something to get the error down for
00:01:38
that example. The other obvious thing is
00:01:41
that maybe the linear part is really
00:01:42
important, because what it means is that
00:01:45
for any particular example there's a
00:01:47
region around that example where it's
00:01:49
just a linear mapping between inputs
00:01:51
and outputs. And we understand a lot
00:01:53
about linear mappings obtained by
00:01:56
multiplying matrices together.
00:01:59
There's this beautiful paper, for example, by
00:02:00
Andrew Saxe analysing this phenomenon,
00:02:02
and you understand how convergence
00:02:05
proceeds. But of course here it's more
00:02:06
complicated, because the linear mapping
00:02:08
you get is different depending on which
00:02:11
subset of units are active each time.
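The point about a rectifier network being a linear mapping around each example can be sketched in a few lines. This is only an illustration under assumptions that are not from the talk: a two-layer network without biases, random weights, and the hypothetical helper names `matvec`, `relu_net` and `local_linear_map`. For a given input, the active hidden units define a 0/1 mask, and the network output equals one matrix product `W2 * diag(mask) * W1` applied to that input; a different input can select a different mask and therefore a different matrix.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu_net(W1, W2, x):
    """Two-layer rectifier network without biases: W2 * relu(W1 * x)."""
    hidden = [max(0.0, a) for a in matvec(W1, x)]
    return matvec(W2, hidden)

def local_linear_map(W1, W2, x):
    """Return (M, mask), where M = W2 * diag(mask) * W1 is the matrix the
    network applies around x and mask marks the hidden units active at x."""
    mask = [1.0 if a > 0 else 0.0 for a in matvec(W1, x)]
    M = [[sum(row2[j] * mask[j] * W1[j][k] for j in range(len(mask)))
          for k in range(len(x))]
         for row2 in W2]
    return M, mask

rng = random.Random(0)
W1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]  # 3 inputs -> 6 hidden units
W2 = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(2)]  # 6 hidden units -> 2 outputs
x = [rng.gauss(0, 1) for _ in range(3)]

M, mask = local_linear_map(W1, W2, x)
y_net = relu_net(W1, W2, x)   # output of the nonlinear network
y_lin = matvec(M, x)          # same output, computed as one matrix product

# A different input (here -x) activates a different subset of units,
# so the linear map that applies there is a different matrix.
M_neg, mask_neg = local_linear_map(W1, W2, [-v for v in x])
```

Without biases, negating the input flips the sign of every pre-activation, so the two masks are exact complements of each other; that is a property of this toy setup, not of rectifier networks in general.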
00:02:14
So there's a lot we don't understand
00:02:16
yet. We have some ideas, but it's
00:02:18
something I think the next generation
00:02:20
should be investigating: you know, why
00:02:22
it is working. People have also modified
00:02:25
the ReLU to work even better, like, you know,
00:02:28
instead of having a zero, actually having a
00:02:30
little bit of slope. So, you know, it's
00:02:34
not the end of the story. Okay yeah
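As a small illustration of the two remarks above (roughly half of the units pass gradient, and the variant with a little bit of slope instead of a zero), here is a minimal sketch. The slope value 0.01, the standard normal pre-activations and all function names are assumptions chosen for the example, not values given by the speaker.

```python
import random

def relu(a):
    """Rectifier: zero for negative inputs, identity otherwise."""
    return max(0.0, a)

def relu_grad(a):
    """Derivative of the rectifier: 1 on the active side, 0 on the flat side."""
    return 1.0 if a > 0 else 0.0

def leaky_relu(a, slope=0.01):
    """Rectifier variant with a small slope on the negative side."""
    return a if a > 0 else slope * a

def leaky_relu_grad(a, slope=0.01):
    """Derivative of the leaky variant: never exactly zero."""
    return 1.0 if a > 0 else slope

# With zero-mean pre-activations, about half of the units are active,
# so about half of them let the derivative information through.
rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10000)]
active_fraction = sum(relu_grad(a) for a in pre_activations) / len(pre_activations)
```

The leaky variant keeps a nonzero derivative even for inactive units, which is the "little bit of slope" mentioned above.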
00:02:46
Hello. I think everybody is very
00:03:19
curious about hyperparameter
00:03:21
tuning, such as the number of
00:03:23
hidden layers and the number of units
00:03:26
you need in each layer,
00:03:28
especially for a convolutional neural
00:03:30
network, depending on the training data,
00:03:34
the amount of training data, and
00:03:36
the difficulty of
00:03:39
the task. So do you have any
00:03:41
intuition about the tuning, like
00:03:44
the best values depending on the data
00:03:46
and the difficulty? That's another
00:03:54
area where more work needs to be done.
00:03:57
Right now we think of the problem of
00:04:01
finding hyperparameters as an
00:04:03
optimisation problem itself, where we're
00:04:06
optimising validation error. So we can
00:04:09
actually use tools from optimisation,
00:04:12
you know, to search for good hyperparameters.
00:04:14
Now this is a bit difficult because
00:04:18
each step of trying a particular
00:04:23
hyperparameter configuration is pretty
00:04:24
expensive, because it involves training
00:04:26
the whole network. But actually this is how we
00:04:29
do it. We go and try many
00:04:31
configurations, and based on the results
00:04:33
we try out other configurations. And it's
00:04:36
really an optimisation process, usually
00:04:38
with the human in the loop. But you
00:04:40
could also do it more
00:04:42
automatically. So there's a lot of work
00:04:45
on what's called Bayesian optimisation,
00:04:46
or hyperparameter optimisation, where you use a
00:04:48
completely automatic way of proposing
00:04:51
the next configuration of the hyperparameters
00:04:53
to try. These are fairly complex
00:04:56
methods, and in fact neural nets are
00:04:58
starting to be used to help with
00:04:59
that optimisation. But in practice there's
00:05:03
a very, very simple way to do the
00:05:05
optimisation, which is called random
00:05:07
search. Basically you just launch twenty
00:05:12
or thirty different configurations. And
00:05:15
you see what happens and if you don't
00:05:18
have good enough results you launch
00:05:20
more. And you have to be a little
00:05:25
bit smart about the set of values to be
00:05:28
tried for each of the hyperparameters, based
00:05:29
on prior experience, but it's not really
00:05:31
complicated. Once you get used to doing
00:05:35
this, a lot of practitioners get a
00:05:36
sense of it, a sort of intuitive sense
00:05:38
of what works and what doesn't work. And
00:05:41
they will manually explore, because you
00:05:44
may not have access to, you know, three
00:05:45
GPUs. Um, and so it becomes more like
00:05:50
an art, but in principle it could
00:05:52
all be automated; it's just a matter of
00:05:53
computational resources. So so I go
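The random search described above can be sketched as follows. Everything specific here is an assumption made for illustration: the three hyperparameters, their ranges, and in particular `validation_error`, which is a hypothetical stand-in for the expensive step of training a network and measuring its validation error.

```python
import math
import random

def sample_config(rng):
    """Draw one hyperparameter configuration at random."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),          # log-uniform over [1e-5, 1e-1]
        "num_layers": rng.randint(2, 8),
        "hidden_units": rng.choice([128, 256, 512, 1024]),
    }

def validation_error(config):
    """Hypothetical stand-in for 'train the whole network, then measure
    validation error'. In practice this call is the expensive part."""
    lr_term = (math.log10(config["learning_rate"]) + 3) ** 2
    depth_term = 0.1 * abs(config["num_layers"] - 5)
    return lr_term + depth_term

def random_search(n_trials, seed=0):
    """Try n_trials random configurations and return (best_error, best_config)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((validation_error(config), config))
    return min(trials, key=lambda trial: trial[0])

best_error, best_config = random_search(30)   # "launch twenty or thirty configurations"
```

If the results are not good enough, launching more trials simply means calling `random_search` with a larger `n_trials`; the trials are independent, so they can also run in parallel. Sampling the learning rate on a log scale is one example of being "a little bit smart" about the set of values to try.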
00:05:57
actually with the put online to select
00:06:00
some questions on the the words one
00:06:02
only to to these which is the the new
00:06:06
intensive work around the prison able
00:06:09
to automatic you know just to survive
00:06:11
today to itself so they have been you
00:06:13
know to cascade correlation right. And
00:06:15
your little ones welcomes why where is
00:06:18
on the way isn't it more and
00:06:20
transceiver research looked at I think
00:06:26
it's been looked at. There is a paper in
00:06:30
two thousand fourteen called Hypergrad,
00:06:33
from Ryan Adams's group, that
00:06:38
proposes the idea of searching over
00:06:41
hyperparameters by making them
00:06:44
differentiable, using some kind of momentum.
00:06:47
But these things take so much
00:06:50
computational resources. Like, searching
00:06:53
through, building models around hyper
00:06:56
parameters and configurations of
00:06:59
parameters are expensive enough that
00:07:02
research that's more serious has been
00:07:04
pretty slow. And also on the note of,
00:07:08
like, how do you find
00:07:12
better hyperparameters: um, I saw both
00:07:16
sides, kind of having not enough
00:07:19
resources at NYU as a grad
00:07:21
student, um, and also at Facebook, where we
00:07:24
have almost unlimited resources. Um, at
00:07:29
NYU we used to call this method GSO,
00:07:34
grad student optimisation. Um, where
00:07:38
the grad student, essentially, over a
00:07:41
couple of years will form an internal
00:07:44
model of what works and what doesn't. I
00:07:48
think Bayesian optimisation tries to
00:07:51
do this as well: it tries to build a
00:07:52
model from, like, past experiences, past
00:07:56
experiments that are done, that it can
00:08:00
use to draw better
00:08:04
hyperparameter samples, um, and
00:08:06
explore better. But just fresh out of
00:08:09
ICML, just last week or so, um, there
00:08:13
have been claims that Bayesian
00:08:16
optimisation for searching for hyper
00:08:19
parameters is not really a win, and
00:08:20
just doing a bit more random
00:08:22
search is actually equally effective. Um,
00:08:26
it's yeah it's very very empirical at
00:08:30
this point like you don't have a it's
00:08:33
valuation the four hypothetical is Okay
00:08:41
I just just a quick comment "'cause"
00:08:42
that producer or both at asleep using
00:08:44
GP use an L for ED actually two things
00:08:47
I remember seeing with paper I think is
00:08:49
going to fire cafe out of berkeley. I
00:08:51
get in one very nicely we speak about
00:08:53
doing character to isolation but I
00:08:55
think I'd like to make is also quite
00:08:57
words like always keel out of machine
00:08:59
learning that would be to enable you to
00:09:02
do all those searches of more more
00:09:03
closely parallel using a framework like
00:09:05
face books very I just wanna say that
00:09:13
it also depends a lot on the kind of
00:09:15
problem you're working on. So if
00:09:17
training takes two hours then you can
00:09:20
use, you know, you can use
00:09:23
Bayesian optimisation or random search,
00:09:26
'cause you could launch many
00:09:28
experiments. But if training takes two
00:09:30
weeks, like training for ImageNet, then
00:09:34
you know that's not gonna cut it, and
00:09:36
that's why this GS so oh my questions
00:09:44
Okay, thank you for the talks.
00:09:48
I would like to ask a somewhat
00:09:51
speculative question: can
00:09:54
you describe a link between probabilistic
00:09:55
graphical models and deep learning,
00:09:58
or can they benefit from each other?
00:10:00
There's a lot of links. You know, the
00:10:06
first ways that we discovered to train
00:10:08
deep networks were based on using
00:10:11
specialised probabilistic models for
00:10:14
learning representations, like RBMs. Um,
00:10:18
and so that was you know say two
00:10:22
thousand six to about two thousand ten, two
00:10:24
thousand twelve. Um, now there's a lot
00:10:28
of research in unsupervised learning
00:10:30
because that's something we don't know
00:10:32
how to do yet, and we know that it's
00:10:33
gonna be very important for AI, and a
00:10:36
lot of that research is strongly
00:10:39
motivated by the probabilistic
00:10:42
interpretation of what unsupervised
00:10:43
learning is about: it's about capturing
00:10:45
the joint distribution of random
00:10:47
variables. So a lot of the work that's
00:10:49
been done in graphical models,
00:10:53
especially with latent variables, like
00:10:56
especially variational methods, or
00:10:58
even, yeah, sampling
00:11:00
methods, Monte Carlo methods, all of
00:11:02
these, you know, come up as useful tools,
00:11:06
at least for some of the algorithms.
00:11:08
There are other algorithms, like the GAN,
00:11:11
for example, where we completely bypass
00:11:13
any kind of probabilistic
00:11:14
interpretation, but if you actually want
00:11:16
to analyse the algorithm it becomes
00:11:18
important again to think about notions
00:11:22
from probability. So I would sure
00:11:32
because I'm going to go and mixing some
00:11:33
questions from the board questions one
00:11:35
room so if it's okay so there's one to
00:11:38
general question which was the one
00:11:39
doing most so you ring by the postal
00:11:42
workers was what are what could be a
00:11:46
non troubles from the application those
00:11:48
people from the so you get to speak
00:11:50
what is a task that deep learning
00:11:51
can solve the best, and the worst:
00:11:54
what would be the worst thing to
00:11:56
apply deep learning to? That's a loaded
00:12:01
question. I think deep learning can solve, I
00:12:09
mean, if you look at neural networks of
00:12:12
modern times, they're correlation learners,
00:12:14
right? Like, they can't do causal inference,
00:12:17
they can't do advanced reasoning. So
00:12:21
anything that can basically be solved with
00:12:24
correlations and, like, simple mechanics
00:12:29
like counting, I think can be solved
00:12:33
with the deep learning of today. I don't
00:12:36
know what deep learning is anyways, like,
00:12:39
the field is moving fast
00:12:41
enough that the goalposts keep
00:12:43
changing. But as of today, if you take
00:12:46
convnets and recurrent nets, LSTMs, they do
00:12:50
strong correlation statistics and
00:12:53
counting and that's about it. Um, and a
00:12:57
lot of tasks that couldn't be solved in
00:12:59
the past apparently can be solved just
00:13:02
because of these properties. Um, and if
00:13:06
you want something beyond that, where
00:13:08
you actually need to reason with very
00:13:11
little data or so, that's something that
00:13:17
deep learning methods as of today
00:13:18
cannot do, and what a bunch of us are
00:13:22
actually working towards getting to.
00:13:25
Yeah, I agree with that. It depends on
00:13:29
your definition of what deep learning
00:13:30
is. So there's today's deep learning and
00:13:33
there is what researchers are working
00:13:34
on now, which really is trying to
00:13:37
address these questions of reasoning
00:13:39
and even causality. And it's really,
00:13:42
really important, because if we wanna
00:13:45
reach AI we need machines that
00:13:47
really understand the world. And that
00:13:50
means forming a kind of causal
00:13:52
explanation of what is going on. Um, now
00:13:57
for me that could be very much part of
00:13:59
deep learning. What's
00:14:01
special about deep learning isn't that
00:14:02
we use backprop, isn't that we use
00:14:04
neural nets in their current form. What's
00:14:06
special is the idea of, you know,
00:14:09
learning representations and learning
00:14:11
multiple levels of representation
00:14:13
that correspond to different levels of
00:14:15
abstraction. What we need to do is to
00:14:16
inject more of these ideas into the
00:14:19
algorithms, including the work going on
00:14:21
with reinforcement learning, in
00:14:23
order to approach these goals. But
00:14:25
that's still kind of the frontier. One
00:14:28
other thing I could add is that deep
00:14:30
learning, even in its
00:14:32
future form, is clearly not gonna be
00:14:34
useful when you want to learn
00:14:36
something like a random function, a
00:14:37
function that doesn't have any
00:14:38
particular structure. The reason
00:14:39
deep learning is working well is because
00:14:42
the kinds of tasks we're trying to learn
00:14:43
with it, um, have this kind of
00:14:46
compositional structure that, you know,
00:14:48
is being taken advantage
00:14:51
of by these, you know, nets. Um, hi, I
00:14:58
have a more practical question. The
00:15:01
discussion of numbers and efficiency
00:15:04
today was basically
00:15:07
on the server. I was wondering how far
00:15:10
ahead, or how far we are from,
00:15:14
embedded, real-time, mobile deep learning.
00:15:18
So in terms of how many, for example,
00:15:21
images per second, in the case of
00:15:24
ImageNet, can be classified on a
00:15:27
mobile. I'm not talking about training but
00:15:29
in terms of real-time predictions: how
00:15:31
far are we now, with what efficiency and
00:15:36
what performance? Yeah, I mean, I can't
00:15:43
specify actual numbers I mean I
00:15:45
would've thought since then you know I
00:15:47
mean that's that's your button right is
00:15:49
the mobile side. But what
00:15:54
I'm seeing is it edging very, very rapidly
00:15:58
to very, very low power. Um, you know,
00:16:01
people are talking about invisible
00:16:04
devices. So whatever the numbers are now,
00:16:09
the really amazing part about it
00:16:13
is that's gonna mean nothing in
00:16:15
probably six months' time. It is kind of
00:16:18
like, I call it the iPhone theory, you know:
00:16:20
it's like you bring out the next
00:16:22
iPhone and then it's already gone,
00:16:25
history, you know, because we're
00:16:26
developing. So don't get
00:16:33
stuck on the numbers too much, simply
00:16:34
because it really is moving that quick.
00:16:37
But inference, obviously, that's
00:16:40
the really important part of it, and
00:16:43
how quick you can do it. And I think
00:16:45
Andrew Ng of Baidu would
00:16:48
definitely agree with me that it's
00:16:50
smoke while a lot you know and and
00:16:53
they've been able to use a whether it
00:16:55
will be an actual mobile phone that we
00:16:57
use, or whether it will be, you know, Bell
00:17:00
Labs is working on a matchbox-sized
00:17:03
unit that would, you know, say,
00:17:05
attach to this jacket, but it would
00:17:08
keep all your data local, so
00:17:10
there's no sort of to and from the
00:17:12
actual cloud. And you've got the
00:17:14
learning and inference going on in
00:17:16
this matchbox-sized unit. Now
00:17:19
you can understand what kind of very
00:17:21
low power budget that would be for I
00:17:23
mean you know where we were credit card
00:17:25
size of the moment that something about
00:17:27
matchbox with all the capability to
00:17:30
learning. And and inference but again
00:17:33
don't get confused with the actual
00:17:34
training side you know we can do all
00:17:36
that in the server. And but it's
00:17:38
inferences bringing the on board I
00:17:41
don't know where you be but it's it's
00:17:43
an exciting to actually sit and watch
00:17:46
it happen. But yeah I like to call to
00:17:50
ES out actually bit as harbour makers
00:17:53
you know this field is evolving so
00:17:54
rapidly. And we were speaking earlier
00:17:57
if I commit something to hardware today,
00:17:59
by the time it gets ready it would, you know, be
00:18:01
obsolete. Now, to mention special-purpose
00:18:04
devices, there's a lot of exciting
00:18:06
development, especially what I think
00:18:08
is from Dublin, a startup called Movidius.
00:18:10
Indeed they have a very, very
00:18:13
low low power and all seven grams you
00:18:16
for insane in the whole package seven
00:18:18
grams if you have a USB stick you can
00:18:20
train your network with cancer for all
00:18:22
it is take that into account for
00:18:24
example congestion mission also work of
00:18:26
a professor we could be passed yet that
00:18:28
matt is between the USA the do you have
00:18:32
actually I just startup costs alchemy
00:18:35
that they you what equal cortical
00:18:37
columns it slightly different style in
00:18:38
your networks to make it extremely
00:18:40
extremely low power in that survey you
00:18:42
know a lot of this is moving quite
00:18:44
rapidly. So I worked on deep learning
00:18:49
for mobile for two years, so I have a
00:18:52
few more practical numbers
00:18:54
there. Um, the main thing on
00:18:57
mobile when you're running deep learning models
00:19:00
is how fast your phone runs out of
00:19:03
battery. And the speed of the actual
00:19:07
running of the model is limited by the
00:19:10
memory bandwidth. It's not actually the
00:19:12
amount of compute you do, but
00:19:14
how fast you can pump in images and
00:19:18
read the various layers in memory and so
00:19:21
on. Um on this note there's a lot of
00:19:25
research that's coming out recently on
00:19:28
one-bit networks and, like, eight-bit
00:19:30
networks and so on, where they're trying
00:19:33
to basically give you almost the same
00:19:36
accuracy, maybe slightly lower accuracy,
00:19:38
as the models that you train in full
00:19:41
float precision. But they will give you
00:19:44
about eight times or thirty two times
00:19:46
more throughput. Um about a year and a
00:19:50
half ago, when I actually remember the
00:19:52
numbers, we did about two to three FPS
00:19:58
of, like, a little net, for example, on
00:20:01
mobile, on mobile CPUs. Still, on
00:20:04
iPhones and stuff, CPUs are
00:20:07
actually still faster for deep learning
00:20:09
applications than GPUs. Um, as of
00:20:12
today, like, if you for example do
00:20:15
XNOR-Nets, or just one-bit XNOR
00:20:18
operations everywhere, you might
00:20:21
actually get, like, ten, fifteen FPS,
00:20:23
but I don't think anyone published hard
00:20:28
numbers on that side. But if you want
00:20:30
specialised chips, NVIDIA actually
00:20:33
makes one of the best chips there, the TX1.
00:20:35
It's rated at ten watts, and it
00:20:38
gives you one teraflop of
00:20:40
compute. So that's, like, really, really
00:20:44
efficient if you want to do, like,
00:20:46
robotics applications and stuff yeah
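To make the one-bit idea concrete, here is a minimal sketch of a binarised dot product. It is not the method of any specific paper: the sign convention, the bit-packing helper names and the sample vectors are assumptions for illustration. Once weights and activations are reduced to +1/-1 and packed one bit per value, a dot product becomes a bitwise comparison plus a bit count, and each value occupies one bit instead of a 32-bit float, which is where a figure like thirty-two times less memory traffic comes from.

```python
def binarize(values):
    """Map real values to signs: +1 for values >= 0, -1 otherwise."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack +1/-1 signs into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def xnor_dot(a_word, b_word, n):
    """Dot product of two length-n +1/-1 vectors given their packed bits.
    Positions where the bits agree contribute +1, the others -1."""
    disagreements = bin(a_word ^ b_word).count("1")
    return n - 2 * disagreements

activations = [0.3, -1.2, 0.8, -0.1, 2.0, -0.5, 0.0, 1.1]
weights = [-0.7, -0.2, 0.5, 0.9, 1.5, 0.4, -0.3, 0.6]

a_signs, w_signs = binarize(activations), binarize(weights)
reference = sum(a * w for a, w in zip(a_signs, w_signs))    # ordinary dot product of the signs
fast = xnor_dot(pack_bits(a_signs), pack_bits(w_signs), len(a_signs))
```

The sketch counts disagreements with XOR; counting agreements with XNOR gives the same result, since agreements plus disagreements equal the vector length.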
00:20:48
thanks yeah so so and ask my question
00:21:01
is about and so I speak as a years show
00:21:05
was mentioning in this presentation
00:21:09
eleven yeah what it's rectify fighters.
00:21:27
So there's an optimisation problem for
00:21:32
deeper networks even with rectifiers;
00:21:34
it just happens later. The
00:21:36
deeper it is, the more nonlinearities are
00:21:38
composed and the harder it is, right? That's
00:21:42
also true with recurrent nets, by the
00:21:43
way, and with complex reasoning-based
00:21:47
systems like memory nets or
00:21:50
neural Turing machines. Um, so what
00:21:53
happened is we were taking advantage
00:21:57
of the ability of unsupervised learning
00:21:59
to extract fairly good representations
00:22:02
at least as an initial starting
00:22:05
point for supervised learning. And that
00:22:08
worked quite well, and it continues to be
00:22:11
actually something useful if you wanna
00:22:13
do semi-supervised learning and you
00:22:15
only have access to a small number of
00:22:17
labelled examples, or maybe you do transfer
00:22:18
learning and you have a small number of
00:22:20
labelled examples for a new category.
00:22:23
So the combination of supervised
00:22:25
learning and unsupervised learning is
00:22:27
something that is actually coming back.
00:22:28
There are several methods that have been
00:22:31
proposed that seem to be doing a fairly
00:22:34
interesting job. And I think we're
00:22:36
gonna see more of that as we are
00:22:39
expanding the reach of applicability of
00:22:41
neural nets to domains where we might
00:22:44
have only a few examples of a new
00:22:46
category. But of course that wouldn't
00:22:48
generalise unless you have other
00:22:49
sources of information, so lots of
00:22:51
unlabelled data or lots of other tasks.
00:22:54
And unsupervised learning can be
00:22:56
useful there. So why is it that with
00:22:59
the rectifier the optimisation gets
00:23:01
to be easier? We don't completely
00:23:03
understand, as I said earlier. There was
00:23:16
somebody in the back raise their hand
00:23:18
first so oh yeah my name's martin in
00:23:29
computational materials physicist at
00:23:31
EPFL so you know we try to attract
00:23:35
assumption on it things and there's
00:23:38
always this back and forth you know you
00:23:40
don't have enough data; we say, well, for
00:23:42
us we've got loads of data. And
00:23:45
it is, you know, this kind of thinking
00:23:47
that, for example, in linguistics
00:23:49
there's the sort of term, the
00:23:52
poverty of the stimulus: the fact that humans
00:23:54
are able to learn, um, despite, you know,
00:23:57
a severe absence of stimulus, especially
00:24:00
in infancy. So do you think there
00:24:04
are things that can be learned or used
00:24:06
from biology to try and reduce the
00:24:10
amount of data that's needed?
00:24:11
perhaps for example the quest six euros
00:24:14
are only capable of intrinsically
00:24:16
learning one of the five possible types
00:24:18
fling was the grammar you know by
00:24:19
making these concessions or other means
00:24:22
to think we can reduced amount of data
00:24:24
that we actually need to do reasonable
00:24:26
work. and what I'm saying is it. I mean
00:24:34
there are two sides to AI:
00:24:36
one is the computer science side, the
00:24:39
other's the neuroscience side. And I've
00:24:42
been following for quite a few years
00:24:44
companies like Numenta, who are
00:24:47
approaching AI, or
00:24:50
rather AGI, so artificial general
00:24:52
intelligence, from the neuroscience side. I
00:24:56
think it's really important to do that.
00:24:58
Um, Yann LeCun would say, you
00:25:01
know, he has the quote about
00:25:04
how Boeing doesn't use feathers, you know, and
00:25:06
that this on the self phone that you
00:25:07
can take that that biological roots you
00:25:10
know we but it's it's taken the advice
00:25:13
and and I say that on the back of
00:25:15
various conversations about using
00:25:18
genetic algorithms for example instead
00:25:22
of random search you know for finding
00:25:26
the find the the the best weights to
00:25:28
optimal roots and things like that.
00:25:31
Because evolution and is one way that
00:25:37
we know is very simplistic it may take
00:25:40
a long time but you know when we think
00:25:42
about how we live as as children we
00:25:46
when we first bone you know the the
00:25:48
first thing that that we do is we live
00:25:50
in edge detection and then as the eyes
00:25:52
develop we we gain the ability to sink
00:25:54
alone we gain the ability to put the I
00:25:56
just together the way that the learning
00:25:59
those but we also say that's we're able
00:26:03
to for example on the sun the concept
00:26:05
of a call without seeing a thousand
00:26:07
calls you know and and and these huge
00:26:10
data sets but you also have to take
00:26:12
into account that the average human
00:26:14
we'll spend a quarter of the life in
00:26:16
school. And then you know still
00:26:19
continually learning so is really I
00:26:23
don't know it's the argument a space
00:26:24
again with them unsupervised versus
00:26:26
supervised learning but I think
00:26:28
following the biological route will
00:26:31
help it'll be another way it'll give us
00:26:34
you know another inside and also the
00:26:36
neuroscience. I do believe that
00:26:38
we learn a great deal about ourselves
00:26:41
and how we operate by following it;
00:26:45
learning how we learn helps us to learn
00:26:49
how to teach machines, or how to show
00:26:52
machines how to learn. And I think
00:26:54
that's very vital. So
00:26:57
bridging the gap between neuroscience and
00:27:01
deep learning is actually one of my pet
00:27:03
projects for the last year or two, and I
00:27:06
think it's a really interesting,
00:27:08
exciting direction. There's a lot we
00:27:10
don't understand about how the brain
00:27:12
solves the learning problem
00:27:14
at a large scale. Um, and we
00:27:17
have lots of concepts in deep learning, but
00:27:19
we're still kind of,
00:27:23
we haven't solved the unsupervised learning
00:27:24
problem, really. And so I think we can
00:27:27
maybe get a lot of inspiration from how
00:27:29
humans do it. Uh regarding your
00:27:31
question about learning from very
00:27:33
few examples: in fact a lot of the
00:27:35
research in deep learning is motivated by
00:27:37
this very question. Um, how do we do
00:27:40
transfer learning, unsupervised learning:
00:27:42
all of these approaches are trying to
00:27:45
answer the question: if we're given a
00:27:48
new task, can we learn from very few
00:27:50
examples? But the general answer
00:27:54
to this is, again, there's no
00:27:55
free lunch, and the only way we can make
00:27:57
that work is because the learner
00:27:59
has already discovered a lot about the
00:28:01
world. I
00:28:03
think the reason the human is able to
00:28:04
learn language from relatively little
00:28:08
data, compared to the massive
00:28:10
amounts of data we're using now for, say,
00:28:11
machine translation, is because that
00:28:14
child is also observing everything
00:28:18
else in the world, not just, you know,
00:28:19
hearing utterances. That child is
00:28:22
understanding a sort of intuitive physics,
00:28:24
that child is understanding social
00:28:27
relationships and building a causal
00:28:31
model and an explanatory model of how
00:28:33
the world works. Then basically what
00:28:36
happens, I think, is using language to,
00:28:40
you know, put names on things, but
00:28:42
these are things you already understand.
00:28:43
That's why a child will be able to
00:28:45
recognise a new, say, animal from a
00:28:48
single picture. Because the child has
00:28:51
already formed the notions of
00:28:53
animals and mammals and you know having
00:28:55
legs and eating and so on. That's
00:28:58
that's how I see it my thank you. So I
00:29:23
I would like to get back a little bit
00:29:25
about the biology but it and especially
00:29:28
what you say before now we can catch a
00:29:32
little bit more of course well it it
00:29:33
and then the next step of correlational
00:29:35
so I could problems that elected
00:29:37
relations in the next would because I
00:29:39
need a yeah I would like to do a little
00:29:41
bit last yet they see a menu for the
00:29:44
process ventral a little bit of your
00:29:46
work about a more biologically
00:29:48
plausible way to write create networks
00:29:54
and dependent side that's dependent
00:29:57
plasticity right right and I would like
00:29:58
to know a little bit if you think this
00:30:01
might be a way to get a more
00:30:04
correlation like results. And the only
00:30:07
one and then on that it would like to
00:30:08
know if at face book you guys I
00:30:10
approach it is a problem at all and if
00:30:13
yes, how? Right, so what I talked about
00:30:17
last year at ICML is something that
00:30:19
has continued in my lab, and we made a
00:30:21
lot of progress and we put out a few
00:30:23
arXiv papers, if you're interested. A
00:30:26
lot of our focus has been on just
00:30:27
making a biologically plausible version
00:30:29
of backprop, because, you know, backprop
00:30:32
is really, as I said, the workhorse of
00:30:34
the successes we have had recently, and
00:30:36
it's kind of crazy that we don't have a
00:30:38
reasonable theory of how brains could
00:30:39
even do that. But I'm also very
00:30:43
interested in a biologically plausible
00:30:45
version of some form of unsupervised
00:30:48
learning, and the things we're
00:30:50
exploring now you can think of as more
00:30:52
like Boltzmann machines, energy-based models,
00:30:55
except I'm trying to throw
00:30:56
away the energy altogether because it
00:30:59
imposes too many constraints.
00:31:01
And yeah, I'm very optimistic
00:31:06
that we'll soon have biologically
00:31:11
plausible, at least at some superficial
00:31:13
level, methods for both supervised and
00:31:17
unsupervised learning. That doesn't
00:31:18
answer the question about causality
00:31:19
though; this is a totally different
00:31:21
battle tricks and there is not enough
00:31:23
work in that direction from pure even
00:31:25
truly machine learning point of view.
00:31:26
So on the biological aspects, at Facebook
00:31:34
at least, we're not directly doing
00:31:38
any research like Yoshua has with his pet
00:31:43
project. Um, but on the causality front
00:31:47
we are actually very, very interested in
00:31:50
building causal models. We recently
00:31:53
published a paper on doing causal
00:31:57
inference in visual scenes. Um,
00:32:01
basically trying to understand, if
00:32:04
there's a car in the scene and there's
00:32:08
a bridge, trying to, for example, ask the
00:32:11
question of, like, is the car there
00:32:13
because of the bridge, or is the bridge
00:32:15
there because of the car? Um, these are the
00:32:17
questions that some of our researchers,
00:32:20
Léon Bottou, David Lopez-Paz, these
00:32:23
guys, are really interested in, and the
00:32:27
model that they built is called the Neural
00:32:30
Causation Coefficient. Um, and we think
00:32:36
we have some very interesting initial
00:32:38
results. So we basically train the model on
00:32:42
synthetic distributions to
00:32:45
just basically, given two variables, tell
00:32:52
whether one variable causes the other
00:32:55
variable, like whether A causes B, or B causes
00:32:57
A, or neither, or A and B cause each
00:33:00
other and so on. And it actually
00:33:03
generalises: trained on these synthetic
00:33:07
distributions just to do this causal
00:33:09
analysis, it generalises to doing
00:33:14
causal analysis on natural images
00:33:17
with the features extracted from
00:33:19
residual network models trained
00:33:22
on ImageNet, for example. Yeah, you can
00:33:25
probably read a bit more on that. Okay, thank
00:33:28
you. Thanks. So there's a
00:33:36
tendency now that the networks
00:33:39
become more and more complex, so they
00:33:41
get deeper and deeper,
00:33:43
and also there are new types of layer
00:33:45
being introduced, like
00:33:48
dropout layers, new
00:33:49
kinds of nonlinearities, there is
00:33:52
layer normalisation and so on. And so
00:33:57
in a sense machines are learning
00:33:59
more and more, but humans somehow
00:34:01
learn less and less about the model. So
00:34:03
the more complex the models become, the
00:34:06
less insight we get from the model.
00:34:08
So is there any systematic way of
00:34:10
somehow getting to know what the network
00:34:14
knows and what it doesn't know? So
00:34:17
probably looking at the weights or the
00:34:18
activations works very well in some
00:34:20
cases, like when you have visual
00:34:23
recognition, where you can just see them.
00:34:26
I wasn't the noses and this is but one
00:34:29
general approach to understand what the
00:34:31
net knows and what not when you have this
00:34:33
generally sorry another problem there
00:34:39
is a systematic recipe. It's called the
00:34:43
scientific method okay we we have we
00:34:47
are in front of a phenomenon, a
00:34:49
learning algorithm that, you know, is
00:34:51
trained, and we can see what
00:34:54
happens while training, we can see what
00:34:55
happens after it's trained, and we're trying
00:34:57
to figure out what's going on,
00:35:00
what ingredients matter, why; build
00:35:03
theories around it. I mean that's
00:35:07
something that the machine learning
00:35:08
community doesn't do enough. A lot of
00:35:11
the work unfortunately is basically
00:35:12
tinkering, and not really trying to
00:35:16
figure out, you know, why it's working, even
00:35:18
when it is working. If you build a system
00:35:20
which has like ten different
00:35:22
ingredients, um, and you beat the
00:35:25
benchmark, great, you get a paper, but
00:35:29
have we learned something? So
00:35:31
fortunately I mean there are people
00:35:33
who really focus on not just getting better
00:35:36
numbers but trying to understand what
00:35:38
is going on or at least which parts
00:35:40
are more important. So it's easy when
00:35:41
you get into sort of the engineering
00:35:43
mode of building something that works
00:35:44
well to forget about why, but the why
00:35:47
question is really central to answering
00:35:48
a question and there should be
00:35:51
more of it, but there is. I
00:35:54
mean I think good papers are trying to
00:35:56
also deal with the why. So on your
00:36:01
initial statement, I would say first of
00:36:04
all models are becoming simpler not
00:36:07
more complicated expect that I mean I
00:36:10
mean go and no into the role recurrent
00:36:13
net scenario but that engine everything
00:36:15
but if you just look at convnets,
00:36:17
three twenty dramatic comments or
00:36:20
around any tell is convnets had way
00:36:23
more types of normalisation, many more
00:36:26
types of pooling, many types of
00:36:29
nonlinearities, and we used second order
00:36:32
methods to optimise things and so on,
00:36:34
and right now all you do is three by
00:36:36
three convolutions, stack up a few
00:36:39
layers, and, like, max pooling, and
00:36:42
that's about it, and like every convnet
00:36:44
used today is pretty much built out of
00:36:47
this one recipe, right. Um sure
00:36:52
you have ResNets that recently
00:36:53
appeared but, come on, you know,
00:36:56
there's just like one nice identity
00:36:58
connection. Um in terms of
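Editor's sketch of the "one nice identity connection" mentioned here: a residual block adds its input back onto the output of a small stack of layers. To stay short and dependency-free, the sketch uses plain dense layers on Python lists instead of the three-by-three convolutions the speaker describes; the function names and the two-layer shape of the branch are assumptions for illustration.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(values, weights_1, weights_2):
    """Compute y = x + F(x), where F is two linear layers with a ReLU between.

    The `x +` term is the identity (skip) connection: if F outputs zeros,
    the block simply passes its input through unchanged.
    """
    hidden = relu(linear(weights_1, values))
    branch = linear(weights_2, hidden)
    return [x + f for x, f in zip(values, branch)]
```

With all-zero weights the block reduces to the identity map, which is one common way to explain why stacking many such blocks does not make optimisation harder in the way stacking plain layers can.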
00:37:03
like how do we analyse, I think, as
00:37:06
Yoshua said, the deep learning
00:37:10
community as of today does suffer from
00:37:13
not doing the scientific method. Um
00:37:17
it's basically just beating whatever
00:37:20
benchmark there is like squeeze out
00:37:23
half a percent on so art and then you
00:37:25
have new state of the yeah it would
00:37:29
just I think the community would just
00:37:30
mature over time but until that it is
00:37:34
ooh and that there is an that there's a
00:37:40
whole lot of progress been made simply
00:37:43
because of the commercial value. You
00:37:45
have to sort of recognise that and yes
00:37:51
Google and yes Facebook and yes try
00:37:53
to you know have all got these products
00:37:55
and you know this is what I said about
00:37:57
inference being really, really vital.
00:38:00
And and they have product so they they
00:38:01
have you know millions and millions and
00:38:03
if not billions of customers and
00:38:06
they have to get this right. So yes a
00:38:08
lot of research is concentrated on
00:38:11
getting the quickest getting and you
00:38:13
know the the the cheapest in terms of
00:38:16
energy, in terms of compute cost,
00:38:20
in terms of passing it over to but
00:38:23
there is a large amount of research
00:38:26
going on possibly not as much and and I
00:38:29
think this is what we get a bad rap for
00:38:31
not doing enough they re there's a lot
00:38:33
more commercial reason to to put the
00:38:36
sound get respect to enhance the
00:38:38
benchmarking than there is theory, but
00:38:40
there is a lot of people
00:38:42
working on the theory side of it. Um you
00:38:45
know I'm obviously biased in saying
00:38:47
check out our chief scientist and his
00:38:51
PhD student Song's paper on pruning, but there's
00:38:56
also a whole lot if you delve into
00:38:58
arXiv. And you know if anybody's
00:39:01
not really familiar with arXiv you
00:39:03
know you can search through and put in
00:39:05
there and things like that that there
00:39:07
is a lot going on. It just kind of gets
00:39:09
so overwhelmed by the massive amount
00:39:13
that's going on on the commercial side
00:39:15
so so maybe one question very two twos
00:39:18
is that kind of is for I think
00:39:22
extremely important specie because I'm
00:39:24
from the computer vision sign want to
00:39:27
this phenomenon of tuning of the things
00:39:29
getting a proposal something in the CVPR
00:39:32
paper is something we see so one
00:39:35
question was how we should reckoning
00:39:37
value for new architectural be judged.
00:39:39
So yeah so we when we have say that we
00:39:43
should not be just happy because we do
00:39:44
one percent better than when should we
00:39:47
as as over your of the century paper
00:39:50
with one one but growing should we
00:39:52
judge this architecture with is
00:39:54
architecture. So I'm involved in a lot
00:39:58
of conference organisation and things
00:40:00
like that and workshops. And I think in
00:40:03
general there is not enough there's too
00:40:05
much value put on you know. Q or
00:40:09
performance, and not enough value put on
00:40:12
having an interesting story, an
00:40:15
interesting hypothesis, an interesting
00:40:18
theory that from which we can learn
00:40:23
that something that could go beyond
00:40:24
this particular task this particular
00:40:26
benchmark and and it's even more
00:40:30
interesting if that view is more novel,
00:40:34
more original. So the danger if we
00:40:38
don't do that is that it may happen to
00:40:42
our field what happened say in the
00:40:44
speech recognition community for more
00:40:47
than a decade where people don't
00:40:49
explore that much they just take the
00:40:51
existing ideas and tweak them and
00:40:53
make little variations. And if you if
00:40:56
we do that then we lose the ability to
00:40:59
rediscover very different approaches
00:41:02
and so we need more to encourage more
00:41:05
exploration things that that are quite
00:41:08
different but but somehow maybe you
00:41:11
know a good solution because we have
00:41:12
this true then you know and appealing
00:41:15
your your hypothesis behind it. That's
00:41:18
what I would like to see more, and I
00:41:20
would like to see reviewers put more
00:41:22
weight on that originality and
00:41:25
on interesting ideas rather
00:41:28
than just the numbers. I would
00:41:35
ideally want reviewers to put more
00:41:38
emphasis on originality than, like, how
00:41:42
you got this number, but the reality is
00:41:44
that if you submit a paper to CVPR it
00:41:47
will get rejected if you're not state
00:41:49
of the art. Um I think a uniform set of
00:41:55
harder benchmarks, if they're
00:41:58
established. And that would be really
00:42:00
helpful like if you can judge an
00:42:05
architecture by how well it
00:42:07
generalises across a large variety of
00:42:10
tasks generalisation is what we care
00:42:13
about when we build any of these models
00:42:15
right. So, like, if you give the
00:42:19
choice to the researchers on how to
00:42:22
write the story of generalisation they
00:42:25
will write it exactly to what their
00:42:27
findings are like they will basically
00:42:29
say my model really generalises well
00:42:32
to this, this and this dataset, and
00:42:34
those three datasets are probably like
00:42:36
similar in nature anyway. So as a field,
00:42:40
for example the computer vision field,
00:42:42
there's a lot of good work being done
00:42:44
to establish harder and harder sets
00:42:48
of benchmarks every year, like the COCO
00:42:49
challenge, Visual Genome, but yeah it
00:42:53
has its biases. Like, basically,
00:42:55
having these harder and harder
00:43:00
benchmark setting is the right way and
00:43:01
like that would be like good but and
00:43:06
getting about so I I suppose I wanna
00:43:08
use this as a shout-out to you all
00:43:10
to really make an effort to go outside
00:43:14
of the box yeah mentions what I was
00:43:17
talking that I'm in conversations with
00:43:19
people using genetic algorithms and
00:43:23
this particular developer is you know
00:43:26
just an extremely skilled, talented guy;
00:43:28
he's been working on software
00:43:30
development on chips and a whole host
00:43:35
of different things and for about
00:43:36
twenty five years this particular thing
00:43:39
this using genetic algorithms he spent
00:43:41
about a decade on it. And he's never
00:43:44
really pushed it because you constantly
00:43:46
got got flack really fees it was kind
00:43:49
of like not it just totally different.
00:43:51
And we're still getting barriers
00:43:55
for these new ideas. I suppose the
00:43:57
shout-out is to be brave and think
00:43:59
outside of the box you know they they
00:44:01
the proton that I've I've just left
00:44:03
after its first week with napster
00:44:06
setting is to really the driving forces
00:44:09
to be creative and think outside the
00:44:11
box there are lots and lots of people
00:44:14
is is not actually said just
00:44:16
incrementing on top of existing
00:44:18
knowledge and and make it better and
00:44:20
that's great but we need people to
00:44:22
throw in you know that that
00:44:24
evolutionary chance that population
00:44:26
every round and thing that that will
00:44:29
help us a lot quicker get to where
00:44:31
we're going. So for example look into
00:44:34
genetic algorithms with the neural
00:44:36
networks and you know people old enough
00:44:38
to do that I think outside the box and
00:44:43
just just one oh I need to do is just
00:44:46
one comment because I I mentioned
00:44:48
earlier that if we go to premature like
00:44:51
on TV harbours some structure they have
00:44:53
today they said if you would is
00:44:54
evolving. However you know estimation
00:44:57
all nature helps is that the same way
00:44:59
that just said that the way the machine
00:45:01
many works because you have this
00:45:02
structure likely we do have some I'm
00:45:05
number basic math operations or
00:45:07
mathematical operations that you you
00:45:09
know turn out to be the foundation of
00:45:11
these you know aspect expect to put buy
00:45:13
those things. So it's a decode people
00:45:15
you know think outside the box but
00:45:17
using the boxes that we have thank you
00:45:26
for your presentation the context of
00:45:29
self-driving cars, it's frightening to trust
00:45:32
behaviours learn by earning as coming.
00:45:38
And then so my question is this one is
00:45:43
anyway you're right around the
00:45:47
prediction that were besides a very
00:45:51
cool results testing sets you will
00:46:03
never get any guarantee from a machine
00:46:05
learning system period and you don't
00:46:09
get any guarantees from a human either. By the
00:46:10
way you are judging a human by you know
00:46:16
their behaviour same thing with a
00:46:18
machine learning system and we can make
00:46:22
that test much more extensive by the
00:46:25
way is not just you you pass a twenty
00:46:28
minute you know driving test you
00:46:31
actually have cars run for hundreds of
00:46:35
thousands of miles or whatever it is
00:46:36
now that being said I wouldn't trust
00:46:42
the current systems to drive my car I
00:46:45
think there's a lot that's missing. But
00:46:48
I have a lot of confidence as well that
00:46:50
you know you need to get better and
00:46:52
substantially better the next two
00:46:53
years. But we have to be empirical
00:46:55
there's no other way hi maybe the
00:47:09
question to the hardware developers. So
00:47:11
we're currently using like open source
00:47:16
operating systems and open source
00:47:17
frameworks and in the middle there is
00:47:19
kind of this black box called CUDA. Is
00:47:23
there a chance we see NVIDIA moving
00:47:25
into more of a, like, position where we can
00:47:29
actually have access to the source code
00:47:31
of these things too? I mean we can also
00:47:34
help make them better with pull
00:47:37
requests and stuff. And that, the
00:47:39
discussion comes up a lot and and I
00:47:41
can't speak for you know the the the
00:47:46
chip teams and and certainly I can't
00:47:49
speak for Jensen. Um but we recently
00:47:52
had the machine learning summit where
00:47:54
we invited a whole lot of researchers
00:47:57
in, this is January, at headquarters
00:48:01
in Santa Clara, and that question
00:48:03
came up again and they did agree to at
00:48:06
least try to open up the parts of cuDNN,
00:48:11
you know, the same
00:48:13
kernels that we really need that we
00:48:15
were holding quite tightly, and they
00:48:18
agree to that how much of a future they
00:48:20
would agree to do I I'm afraid I can't
00:48:23
speak but it is a valid question that
00:48:25
the we're listening to and then people
00:48:27
are considering but again I can't speak
00:48:31
for those who will make that ultimate
00:48:33
decision. But we hate you think it I
00:48:39
have to say this in actually no within
00:48:41
do we actually do take an approach
00:48:43
which is open source and open systems, and
00:48:46
actually in my keynote talk on Friday I'll
00:48:48
explain that. Well, all our code is open
00:48:51
source including the instruction set of our
00:48:53
GPUs, and you can program in assembly if
00:48:55
you care to so you know they do this is
00:48:58
to make it open again and several
00:49:00
reasons for that to know what some ways
00:49:02
to benefit from the small the comedian
00:49:05
assisted in speaking here you know
00:49:07
there's a lot of developments to
00:49:08
happenings to to happen. So they've
00:49:10
usually been you open an open source is
00:49:12
definitely one way to go. So as an open
00:49:18
source developer who has to deal with
00:49:21
closed-source systems from NVIDIA every
00:49:24
day, I feel you. I think one
00:49:30
interesting thing there is Windows and
00:49:34
Linux were somewhat like that as well:
00:49:36
all the latest and greatest drivers and
00:49:39
technologies and everything used to be
00:49:42
written for Windows, and like Linux was
00:49:44
given secondary, like six months, a year
00:49:47
later, stuff. Um it only takes a single
00:49:52
person to write a single
00:49:54
implementation that is on par with the
00:49:57
closed-source system to basically break the
00:50:01
value in the closed-source system, like for
00:50:03
example how Scott Gray at Nervana wrote
00:50:08
CUDA kernels for convolution and
00:50:11
stuff as fast as what NVIDIA had.
00:50:14
And that causes a lot of openness to
00:50:18
be forced upon, um, by the community. So
00:50:22
I think like if there's one thing, it's
00:50:24
as open source developers to really
00:50:27
push companies to not see the value
00:50:31
anymore in closed-sourcing their software.
00:50:34
It seems like from like a purely market
00:50:37
driven perspective that seems to be
00:50:39
like the only thing that would open
00:50:41
things up things I one brought up a few
00:50:55
times the closer as it fine ah thanks
00:51:02
so it was brought up a few times the
00:51:04
inspirations from biology as we know
00:51:06
from the real brain networks, they're
00:51:11
heavily recurrent. But the very
00:51:15
successful cases of applications of
00:51:17
deep learning, for example, they're mostly
00:51:19
record. I mean they're like expressing
00:51:21
or point I've you is it a matter of
00:51:24
practical difficulty that you haven't
00:51:27
that it hasn't been tried or you think
00:51:30
that even if we tried, like, recurrent
00:51:33
neural networks, there might not be
00:51:35
significant advantages to
00:51:37
the recurrent connections? Yes, one of the
00:51:44
things I've been working on is using
00:51:47
the recurrent connections as part of
00:51:49
the computation, and I don't mean to
00:51:52
process sequences but just to
00:51:53
even process a single example, and using
00:51:57
those recurrent connections both for
00:51:58
computation and for credit assignment,
00:52:01
what backprop is doing. Um so that's
00:52:04
precisely the thing I was talking about
00:52:06
earlier, that tries to bridge the gap
00:52:10
between deep learning and
00:52:12
neuroscience; that's one of the ingredients.
00:52:14
Um there's this one question
00:52:17
which we, yeah, have absolutely no idea
00:52:21
how brains could handle, it is something
00:52:24
like backprop through time. In other
00:52:26
words, I think we have some reasonable
00:52:29
ideas of how brains could implement
00:52:31
backprop in the sense of a single
00:52:34
static input, and the recurrence
00:52:37
essentially propagates the equivalent
00:52:40
of gradients. But the thing we
00:52:42
really have no idea is how to do the
00:52:44
equivalent of backprop through time, where you
00:52:46
have a sequence of states. And now it
00:52:49
really doesn't sound very biologically
00:52:52
plausible that you would have to store
00:52:53
the whole sequence, which might be your
00:52:54
whole day, right, and then somehow, you
00:52:58
know, play it back backwards, you know, and
00:53:02
compute the gradients or their
00:53:04
equivalent. That's something, it's a
00:53:07
totally open question as far as I'm
00:53:08
concerned you when this question
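Editor's sketch of why backprop through time needs the whole sequence of states, as the speaker points out: the backward pass walks the stored states in reverse order. The example is a deliberately tiny scalar recurrent unit, h_t = tanh(w * h_{t-1} + x_t), with the loss taken to be the final state; the model, the loss, and the function names are assumptions chosen for illustration.

```python
import math


def forward(w, inputs, h0=0.0):
    """Run the recurrence and keep every state, because the backward pass needs them."""
    states = [h0]
    for x in inputs:
        states.append(math.tanh(w * states[-1] + x))
    return states


def bptt_grad(w, inputs, h0=0.0):
    """Gradient of the final state with respect to w, via backprop through time."""
    states = forward(w, inputs, h0)
    grad_w = 0.0
    grad_h = 1.0  # derivative of the loss (the final state) w.r.t. the final state
    # Replay the stored sequence backwards, from the last step to the first.
    for t in range(len(inputs), 0, -1):
        grad_pre = grad_h * (1.0 - states[t] ** 2)  # through tanh
        grad_w += grad_pre * states[t - 1]          # direct effect of w at step t
        grad_h = grad_pre * w                       # pass gradient to the previous state
    return grad_w
```

The list `states` grows with the length of the sequence, which is the storage requirement the speaker finds hard to reconcile with biology.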
00:53:15
because we were supposed to finish at
00:53:17
fifty nine three twenty six lives of
00:53:19
feeling people are your to that you
00:53:21
train so Thank you. And they have a
00:53:26
really mean questions related to that
00:53:29
but maybe so there was a question which
00:53:31
problems can be solved and which not
00:53:33
which is well the easy to answer and
00:53:37
also I like I mean the problem is quite
00:53:40
it's to do engineering or in general
00:53:42
engineering just benchmarking and so
00:53:45
on. And I really like the opinion
00:53:48
analyse the things and find out what's
00:53:50
happening. But I'm searching for and
00:53:52
then maybe you have the onset is and
00:53:55
brought analysis of gender the the
00:53:58
problems there so radical in the
00:54:03
properties of the problems of the data
00:54:05
of the quantity of the distribution of
00:54:08
the data and whatsoever. And the
00:54:11
possible networks, not only CNNs and
00:54:14
autoencoders but also the others, yeah
00:54:16
man multidimensional recurrent networks
00:54:19
and whatsoever knowing when can be
00:54:22
applied want. Why. Why not is there any
00:54:27
very nice deep analysis of that amount
00:54:31
which you would slide with a book which
00:54:34
was supposed to help dealing with that
00:54:36
question right so once you understand
00:54:40
the building blocks a lot of these
00:54:43
different architectures start making
00:54:45
sense. And and then you can sing
00:54:48
considering your problem for example
00:54:51
recently I was looking at how can we
00:54:53
use these things to model three
00:54:56
structural proteins right so it doesn't
00:54:57
really fit the things we normally do.
00:55:00
But but you can we use a lot of the
00:55:02
ideas that have been around the the
00:55:05
algorithms that we know and and compose
00:55:08
them in new ways once you really you
00:55:09
know make sense in your mind about what
00:55:11
they're doing and why. So yeah I'm not
00:55:15
I I can't give you didn't answer of
00:55:16
course in in one sentence all these
00:55:19
things but that's the kind of thing
00:55:21
that you get by Reading a book or by
00:55:23
following the literature in trying to
00:55:24
make sense of these methods are just
00:55:26
you know how can I implement them but
00:55:29
no why you know why is this for example
00:55:33
why we have bidirectional recurrent nets:
00:55:34
there's this good reason, it makes sense.
00:55:37
Um so why do we use attention? Well
00:55:41
there's this good reason, I mean it
00:55:43
helps us deal with a particular
00:55:44
issue. Um once you understand these
00:55:47
things you can, you know, be
00:55:48
creative and apply them in new
00:55:50
settings and I think it also comes back
00:55:54
to what I said well there and so again
00:55:57
I assume I think the the question and
00:55:59
using I say do a like you know what's
00:56:03
the potential effects for each having
00:56:07
them automatic configuration of say
00:56:10
what types parallelism to use of which
00:56:12
points in in the layers but also which
00:56:16
types of layers I mean what I know it's
00:56:20
a lot of hardware but theoretically I
00:56:21
mean what's what's involved. So that we
00:56:23
can do that there's a lot involved
00:56:30
there are very hard problems in the
00:56:33
world of compilers that need to be solved
00:56:36
before you can get to, like, automatic
00:56:39
parallelisation. And that actually ties
00:56:41
very much into the specific
00:56:43
problems in compilers that are not
00:56:45
specific to compilers but they are from
00:56:47
graph theory in general, and they also
00:56:52
apply to searching for ideal neural
00:56:55
network architectures and, like, stuff
00:56:58
like that. Um as of today humans do it
00:57:02
by hand but you could probably build
00:57:06
some kind of deep Q-network for that, that
00:57:11
predicts which parallelism to use,
00:57:15
and it's not actually implausible at
00:57:19
this stage of very yeah I don't think
00:57:23
we're file format right I I think it's
00:57:24
I was that far. I wouldn't say we're
00:57:27
not five everywhere not far from it I
00:57:30
would say it wouldn't have been a
00:57:33
plausible talk about two years ago. Now
00:57:38
it's plausible. It's not possible I
00:57:44
don't know and yeah I think do your
00:57:47
question like what you actually said
00:57:50
was pretty bad spot on read and
00:57:52
understand the textbook and that would
00:57:54
be the first I'm not really happy was
00:57:59
set on soon yeah I mean I wrote a book
00:58:02
yes of course I I read many books and
00:58:05
me purposely may be I have some quite
00:58:07
good I yeah but typically these people
00:58:11
who only use deep learning they know
00:58:12
their problem. And they don't want to
00:58:15
have to dive into all the specifics of
00:58:18
the neural networks and understand
00:58:20
everything in order to map that finally
00:58:22
to the problems. I mean Marilyn
00:58:24
analysis of the problems and finding
00:58:26
then which article architecture works.
00:58:29
The other way around yeah I I think so
00:58:35
ms was talking about using machine
00:58:38
learning and reinforcement learning you
00:58:41
know to to try to do these kinds of
00:58:43
things but it's quite now what right
00:58:45
now it's quite you know like it's it's
00:58:48
research goal is not something we know
00:58:49
how to do and and the the the result of
00:58:54
this is that people who have that
00:58:56
expertise are you know in big demand
00:59:00
from industry and they can earn a lot
00:59:03
of money so until we can automate that
00:59:07
job which seems pretty far off. I think
00:59:10
we're gonna have to go the hard way and
00:59:12
and and you know have engineers learn
00:59:15
about the the the underlying science
00:59:18
sufficiently to you know combine these
00:59:20
building blocks together. And that
00:59:22
being said I think something positives
00:59:23
happening with the software tools and
00:59:27
and the progress with hardware which
00:59:30
makes it easier for people without
00:59:34
necessarily a strong
00:59:35
mathematical background or a lot of
00:59:38
expertise in using neural nets to
00:59:40
tinker. So, you know, once you understand
00:59:43
a few basic ideas from machine learning
00:59:46
and from neural nets, you can, and
00:59:48
if you have a good software tools which
00:59:50
you know are well organised with lots
00:59:51
of existing examples of you know
00:59:53
different types of architectures and
00:59:55
then module architectures and and
00:59:57
efficient a hardware you can actually
00:59:59
do a lot by just you know playing
01:00:02
around with the Lego blocks and
01:00:05
you know for simple five six our time
01:00:08
investment anybody can go through
01:00:11
online courses and and get a grasp of
01:00:14
it but I suppose exactly how does each
01:00:18
and every application gain from deep
01:00:21
learning, I'll be quite happy if we
01:00:24
don't quite so that with a I just yeah
01:00:26
because then not the average all so
01:00:28
okay so maybe you can things the
01:00:33
speakers again on the I will tomorrow
01:00:43
we will have Soumith presenting Torch
01:00:46
basically for three hours starting up
01:00:49
yeah exactly so so so we use for I
01:00:54
don't in talk on your show we give a
01:00:55
more modern talk about defenders you

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
