Note: this content has been automatically generated.
00:00:00
OK, so we'll start the session
00:00:22
with one more participant, so midnight
00:00:25
are all skies formal and she will be
00:00:27
talking to more about that on soft so
00:00:30
on so we can ask questions about
00:00:31
frameworks both so Bout top shown
00:00:34
console to user disown around right no
00:00:39
she's you know okay so that's why a day
00:00:46
is that there is a bunch of questions that were
00:00:48
asked via the website, so I may from time to time,
00:00:52
like today, ask one of those
00:00:54
one because it's it's okay yeah I one
00:00:57
so what would be the most... what would be the
00:01:00
best challengers among other machine
00:01:02
learning techniques? So what about
00:01:04
decision trees, for instance? Yeah, you
00:01:07
sure okay if you if you where we we we
00:01:09
can you could yes exactly so something
00:01:14
when you you are not allowed to have
00:01:16
gradients on the back pop on offence ah
00:01:21
OK, so, the microphone... I'm trying to
00:01:27
understand backprop and generalise it,
00:01:31
and I think there is a general
00:01:34
principle behind backprop, which is
00:01:36
trying to find ways to do credit
00:01:38
assignment. You can think about what's
00:01:40
going on and some are L reports betting
00:01:42
incumbents. Backprop is great, I mean,
00:01:48
it's working quite well, but
00:01:50
maybe there are some more general
00:01:52
principles that could be applied, that
00:01:55
would work even when the changes we
00:01:59
care about are not infinitesimal, which
00:02:00
is one of the weaknesses of
00:02:02
backprop. So yeah, I don't wanna talk more
00:02:08
about it, but for me, deep learning is
00:02:12
not backprop; deep learning is about
00:02:14
learning representations,
00:02:19
distributed representations, learning good
00:02:21
representations, and backprop is the
00:02:23
best we have now, but I hope we can find
00:02:25
something better. So actually, for
00:02:34
a lot of NLP tasks,
00:02:37
deep learning is not a very good tool.
00:02:41
Usually just bag of words followed
00:02:45
by some SVM is faster, with
00:02:51
either the same performance or better,
00:02:54
and it scales very nicely. And also word2vec
00:02:56
is not deep learning either, and
00:02:59
it's used everywhere. And it's
00:03:02
really effective. OK,
00:03:10
and also k-means is... So word2vec is
00:03:19
not deep learning, it's shallow,
00:03:22
but it's representation learning, and
00:03:24
it's just distributed representations. But
00:03:27
yeah it's Monty obviously I mean how
00:03:30
much make alone would keep saying where
00:03:33
do back is not defined things that I
00:03:35
agree, makes sense. Then, like, k-means, then,
00:03:39
like, you know, all these clustering
00:03:40
algorithms that we use every day followed
00:03:42
by some kind of SVM or something like that:
00:03:45
these are still the most effective in,
00:03:47
like, a lot of cases that I... I don't
00:03:50
think... There we go. So here's an example
00:03:53
where k-means is very useful: sometimes
00:03:56
you want to take quick hard decisions.
00:04:00
For example, we're working on
00:04:03
scaling memory nets, and you want
00:04:05
something like hashing or clustering
00:04:08
that can quickly, you know, find some
00:04:12
sort of good answers. If we use the
00:04:17
normal soft attention, it's just computationally
00:04:21
too expensive, but there are other
00:04:23
algorithms that can help us there.
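The trade-off described here, soft attention over every memory slot versus a quick hard decision from clustering, can be sketched roughly as follows. This is only an illustrative toy, not the memory-net system the speaker is working on: the two centroids are hypothetical stand-ins for what a k-means run would produce, and all function names are made up for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, slots):
    # Scores the query against every slot, so the cost grows with len(slots).
    weights = softmax([dot(query, s) for s in slots])
    return [sum(w * s[i] for w, s in zip(weights, slots)) for i in range(len(query))]

def nearest(vec, centroids):
    # Hard decision: index of the centroid with the largest dot product.
    return max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))

def build_buckets(memory, centroids):
    # Done once, ahead of time: assign every memory slot to one bucket.
    buckets = {c: [] for c in range(len(centroids))}
    for idx, slot in enumerate(memory):
        buckets[nearest(slot, centroids)].append(idx)
    return buckets

def clustered_attention(query, memory, centroids, buckets):
    # Attend only within the query's bucket; fall back to all slots if it is empty.
    chosen = buckets[nearest(query, centroids)] or list(range(len(memory)))
    return soft_attention(query, [memory[i] for i in chosen]), chosen

memory = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, e.g. from a k-means run
buckets = build_buckets(memory, centroids)
readout, used = clustered_attention([0.0, 2.0], memory, centroids, buckets)
print(buckets, used)
```

With many slots, the query is compared against the centroids and one bucket rather than the whole memory, at the price of possibly missing a relevant slot that landed in another bucket.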
00:04:29
As for trees: I wrote a paper about how
00:04:33
trees are bad because they generalise
00:04:41
locally, and so they can essentially be
00:04:43
killed by the curse of dimensionality.
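As a toy illustration of local generalisation, and of what changes once several trees are combined, the sketch below compares how many input regions a single depth-1 tree can tell apart with how many the tuple of leaf indices from three such trees can. The stumps, features and thresholds are hypothetical and are not taken from the paper mentioned.

```python
from itertools import product

def stump_leaf(x, feature, threshold):
    """A depth-1 tree: return the index (0 or 1) of the leaf that x falls into."""
    return 0 if x[feature] <= threshold else 1

# Three hypothetical stumps, each splitting on a different input feature.
stumps = [(0, 0.5), (1, 0.5), (2, 0.5)]

def forest_code(x):
    """The leaf selected in each tree, taken together as one code for x."""
    return tuple(stump_leaf(x, f, t) for f, t in stumps)

# The eight corners of the unit cube as test inputs.
points = list(product([0.0, 1.0], repeat=3))

single_tree_regions = {stump_leaf(x, *stumps[0]) for x in points}
forest_regions = {forest_code(x) for x in points}

print(len(single_tree_regions), len(forest_regions))  # 2 8
```

One tree with two leaves separates only two regions, while three two-leaf trees read jointly separate eight, which is the sense in which the combination of selected leaves acts as a richer representation of the input.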
00:04:46
Now when you do a forest or any kind of
00:04:50
combination of trees, like boosting, you
00:04:54
actually go deeper by one level and you
00:04:56
get some nice kind of distributed
00:04:58
representation, because if you think
00:05:00
about it, each tree is tuned to, sort of,
00:05:04
one aspect of the problem, and
00:05:08
it's the composition of all the
00:05:11
leaves that you've selected for each
00:05:14
tree which is a representation of your
00:05:16
data. So it's actually a pretty powerful
00:05:18
representation. The problem is, right now,
00:05:21
it's not clear how to go
00:05:23
beyond these two levels, and also, except
00:05:26
for boosting, it's not clear
00:05:30
how to train these things jointly, for
00:05:31
example. But yeah, so it's not
00:05:35
something that's the that's a centre of
00:05:37
so if your topic up users press easy to
00:05:42
drink to bring into trees this ability
00:05:45
to to extract a certain type of
00:05:48
information that which is trying to
00:05:50
that to another little trees which is
00:05:53
to duty some sort of the tree you could
00:05:57
do that but I think you first you would
00:06:01
need to do some kind of yeah and if
00:06:05
it's not you know how you could
00:06:07
optimise jointly all the trees. So
00:06:09
here's an example where, you know, there
00:06:10
are things we'd like to optimise, but
00:06:13
backprop can't be used to optimise
00:06:14
them. And so trees are kind of greedy
00:06:16
things that have greedy algorithms, but
00:06:19
you can't
00:06:22
generalise to, you know, trees on trees
00:06:24
on trees, because we don't have any greedy
00:06:26
algorithms for that. Maybe questions
00:06:31
in the room? I was wondering
00:06:45
whether there is, I mean, when it comes
00:06:48
to frameworks... So today we saw a
00:06:51
nice going by and in that in that
00:06:54
slides, where we see that TensorFlow
00:06:57
is closer to
00:06:59
production, while Torch is closer to
00:07:03
research is there I don't see before
00:07:05
going, because everybody's doing
00:07:08
research, but there's a
00:07:13
lot of hope that it could be
00:07:14
turned into production, into
00:07:16
actual systems, and it is difficult
00:07:19
to change frameworks as we go from
00:07:22
research to production. Is there a
00:07:25
recipe for going the whole cycle,
00:07:28
from research to production, with one
00:07:32
framework that everybody could be on, or
00:07:36
is it something very preferential
00:07:38
or subjective? So the question is: is there
00:07:41
a framework that researchers, data
00:07:43
scientists and developers can use? I
00:07:46
think we will see a lot more about this
00:07:48
tomorrow, but this is the point of
00:07:50
TensorFlow, and it was built with this
00:07:52
in mind, because there are
00:07:54
a lot of very good researchers and a
00:07:56
lot of developers that want to put
00:07:57
these research ideas into production.
00:08:00
And the field of deep learning
00:08:02
right now is moving so fast that if you
00:08:04
have two different systems, you end up
00:08:07
with your ideas in production being
00:08:08
completely out of date. So this is what
00:08:11
it actually aims to do: to be the system
00:08:13
that researchers use, and this is what's
00:08:14
happening right now: researchers use
00:08:16
TensorFlow, and the same models are
00:08:18
being put into production as well,
00:08:20
easily. Other questions? OK, then I
00:08:33
will go through my... Oh, yeah. Hello, my
00:08:39
question is about
00:08:41
unsupervised learning and
00:08:44
understanding what are the features
00:08:45
that would be identified by
00:08:48
unsupervised learning. Because if you
00:08:50
have images and we have, like,
00:08:52
a generative model, we can see the
00:08:55
images that we generate, but if we have
00:08:56
other types of data which don't have a
00:08:59
visual interpretation, how do we go
00:09:01
about assessing the quality
00:09:05
of the generative model? And
00:09:08
secondly, how could we use this to
00:09:10
perform some kind of clustering or
00:09:12
understanding of the data? It's a very
00:09:15
good question. Even for images, it's not
00:09:19
completely satisfactory to only
00:09:21
look at what is generated, and there are
00:09:25
some nice discussions about whether,
00:09:28
even, you know, having good generation
00:09:31
doesn't necessarily mean we have good
00:09:33
features, in the sense of using
00:09:35
them for a particular task. It's not clear
00:09:40
what's the right... even if you stick with
00:09:43
generation, it's not clear what are the
00:09:44
right measures we should use to know
00:09:47
that we have a good generator. So there are a
00:09:50
lot of open problems about, you
00:09:51
know, how do we evaluate unsupervised
00:09:53
learning in general. This is really a
00:09:55
field where papers are being written
00:09:57
these days, and, you know, we don't
00:09:59
know what the right answers are. The
00:10:01
classical answers to the question are
00:10:03
simply to take the unsupervised
00:10:06
learning as a helper for some
00:10:10
supervised learning task, right? So you
00:10:12
could do semi-supervised learning, you
00:10:13
could do, well, pre-training,
00:10:15
which is a form of semi-supervised
00:10:16
learning. Things like transfer
00:10:21
learning with unsupervised learning have
00:10:22
been done before. So you basically
00:10:25
define another task which hopefully
00:10:27
would be helped by using the
00:10:30
features or the regulariser or
00:10:33
whatever is coming from unsupervised learning. So
00:10:35
that's kind of not completely
00:10:38
satisfactory, because it may measure
00:10:39
some aspects and maybe not other
00:10:41
aspects, but yeah, that's what we have
00:10:43
now. So an ideal answer to your
00:10:48
question, from a conceptual point of
00:10:50
view, would be something like not a
00:10:53
single task but a very rich, wide
00:10:57
family of tasks. So if I can define... so
00:11:02
let me give you a concrete example.
00:11:03
Let's say that I could have a task
00:11:10
which is something like visual question
00:11:13
answering. So you're
00:11:15
given an image and somebody asks a
00:11:17
question in natural language and you have
00:11:18
to answer. And if the questions are
00:11:21
really completely open, and you
00:11:24
know the kinds of questions, then
00:11:26
presumably, if you're doing a good job,
00:11:28
if you're able to solve any, you know,
00:11:31
semantic understanding question about
00:11:33
the image expressed in natural language,
00:11:36
well, this, you know, presumably means
00:11:38
you've extracted the right information
00:11:39
from the image. So I think we can
00:11:44
conceive of very broad tasks that go at
00:11:49
all of the aspects of the data, because
00:11:51
if I throw away some aspect of the
00:11:53
image, somebody may come up with a
00:11:55
question that I'm not gonna be able to
00:11:57
answer, right, if my features throw away things
00:12:01
that the question could ask about.
00:12:04
Now, this is biased still in this particular
00:12:06
example, because obviously humans are
00:12:08
not gonna ask, you know, is pixel
00:12:11
three twenty-one, seventy-six greater
00:12:15
than pixel... but that's not the
00:12:17
kind of question you're gonna get. Yes, I
00:12:32
have a question. Many people I
00:12:36
speak with who come from signal
00:12:39
processing have been working with
00:12:41
image processing and different things
00:12:47
before, and they usually are
00:12:52
very sceptical of deep learning because
00:12:55
it serves as a black box and
00:12:57
everything is within it. And which is
00:13:02
about yeah yeah if if features but what
00:13:07
usually, in classical
00:13:11
signal processing, you have very good
00:13:14
knowledge of the lower levels. And
00:13:17
there are very good models of that,
00:13:20
also with edge and line detectors, the
00:13:25
wavelet structures and things like
00:13:28
that. But the problem has been the
00:13:32
semantic gap. But couldn't we
00:13:36
inherit the lower levels and
00:13:39
the knowledge from all the research that
00:13:42
has been done there into deep
00:13:45
learning, and concentrate more on the
00:13:48
semantic solutions? I think that's
00:13:53
already what's happened. A lot of the
00:13:56
early research with convolutional nets,
00:13:59
especially the period where we used
00:14:01
a lot of unsupervised learning, was actually
00:14:03
focusing on that: the evaluation metric
00:14:07
was, does it look like we're getting
00:14:09
Gabor filters, does it look
00:14:13
like filters that you would expect
00:14:14
using, you know, sensible signal
00:14:16
processing? And that was, of course, a
00:14:19
qualitative evaluation. That's my first part of
00:14:23
the answer. The second part is, it's not
00:14:24
where people focus right now. People
00:14:26
focus, you know... the design of the
00:14:27
architecture isn't anymore about how to
00:14:29
do the first two layers. That's not where
00:14:32
the action is. The action is precisely
00:14:34
where you're talking about: the semantic
00:14:36
aspect, thinking about objects, thinking
00:14:38
about scenes and, you know, high-level
00:14:41
relationships and so on. But still,
00:14:44
many deep learning tasks are
00:14:49
very much about creating data and
00:14:52
augmenting the data with different
00:14:55
geometric transformations. And that is, you know,
00:14:59
already solved in the lower levels if
00:15:02
you have proper models, and so I think
00:15:05
you want to take a step above that. Yeah,
00:15:08
people have tried that. So the reason
00:15:10
we're using these complex with
00:15:12
different nations is because before
00:15:15
people were doing exactly what you say:
00:15:17
they were taking handcrafted features
00:15:20
which were invariant to kinds of
00:15:22
things and sticking some machine
00:15:23
learning on top, including neural nets. But
00:15:26
it turns out that it works better if
00:15:28
you learn the whole thing end to end. And so
00:15:32
what you're suggesting is, you know,
00:15:34
something we should do, is something we
00:15:36
have done. And maybe we can do it
00:15:38
better, but it has been tried.
00:15:40
It's exactly where we come from.
00:15:43
Right. So at Facebook we do
00:15:49
have some research going on on the side
00:15:53
where we want to clearly learn
00:15:55
from all of the research in signal
00:15:59
processing. Last year we published a
00:16:03
paper on complex-valued
00:16:05
ConvNets, which are basically inspired
00:16:08
by wavelet packet transforms. They
00:16:12
don't work as well as... well, yeah.
00:16:15
But regardless, I think it's important
00:16:18
to understand why applying
00:16:23
traditional signal processing methods
00:16:25
directly just doesn't work as well.
00:16:29
We do have collaborations with
00:16:34
NYU professors, for example David
00:16:36
here man. There's a lot of ongoing work,
00:16:40
but at this moment we don't see
00:16:42
anything promising enough to be excited
00:16:46
about. One question, actually, about
00:16:59
the batteries is there a successful way
00:17:02
to integrate the notion of time into it,
00:17:05
like some successful application,
00:17:08
like time series and predicting that?
00:17:10
Yes, three letters:
00:17:20
R, N, N. Recurrent nets, and there
00:17:26
are many forms. They're just designed
00:17:29
exactly for that. And they're working
00:17:31
beautifully well. I have some more
00:17:37
advanced questions to this island I've
00:17:41
seen. a nice idea of by interpolation I
00:17:49
that And there are and for an and yeah
00:18:04
cool Marcus investigations and capacity
00:18:08
into thousand fifteen is correct yeah
00:18:13
but in general how okay yes quite well
00:18:17
do you have some wrestlers beyond the
00:18:26
things that already exists I don't know
00:18:30
I think personally I would be
00:18:34
interested in understanding more the
00:18:37
structure of the dynamics: how the
00:18:46
eigenspectrum of the Jacobian changes,
00:18:49
both during training and along the
00:18:52
sequence. Another interesting question
00:18:56
is what information is preserved in the
00:18:59
state. So if you think about what a
00:19:02
recurrent net does, it reads a
00:19:05
sequence and at any point it has a
00:19:09
state vector that is a function of that,
00:19:12
and of course it has to throw away some
00:19:14
information and it's gonna keep
00:19:15
some information. So we could use some
00:19:18
kind of monitoring devices, maybe, to try
00:19:20
to figure out what a particular
00:19:22
recurrent net is remembering and what
00:19:24
it's forgetting about the input. But
00:19:29
yeah, I think it's a great question,
00:19:31
and more could be done to understand
00:19:34
what's going on, and it would maybe help
00:19:36
us design better recurrent nets as
00:19:38
well, better architectures. So at some
00:19:47
point you were saying that for Facebook,
00:19:51
ImageNet is a small dataset. So
00:19:54
what will the future be of the
00:19:56
interaction between the academics
00:19:58
and the companies? Because there is
00:20:00
a bit of a feeling that we will soon, if
00:20:04
not already, not be in the same
00:20:06
category. So we go to conferences and
00:20:09
you have to wonder about...
00:20:12
That's a good question. There
00:20:17
are actually larger datasets, like the Flickr
00:20:22
hundred-million dataset. It's a hundred
00:20:26
times larger than ImageNet. But
00:20:28
it has weak labels, and that's something
00:20:33
we've been really interested in. And it
00:20:37
is true, our datasets are larger by
00:20:41
magnitudes, but we also don't publish
00:20:43
any research on our datasets. We play a
00:20:47
level game, in the sense that we play
00:20:49
the game that the academics are playing:
00:20:52
we work on the datasets that are
00:20:54
public, and there's work published
00:20:57
around it. It's not only a question
00:20:59
of data; it's also a question of
00:21:01
computation power. I discuss
00:21:03
sometimes with people... Private companies
00:21:07
are at the heart of
00:21:10
getting good results, especially when
00:21:12
you play the ranking game. For
00:21:15
instance, if you guys do grid search with
00:21:18
one thousand GPUs and we do grid search
00:21:21
with two GPUs, it's over.
00:21:25
So again, what would be the way of
00:21:29
dealing with this? This is something
00:21:31
that I didn't think about a lot because
00:21:35
when I go visit research labs,
00:21:38
and also some of my interns come in, and
00:21:42
there's a disconnect there
00:21:45
between industry and academia. I mean,
00:21:48
some labs, for example Yoshua's lab,
00:21:50
are, you know, rich enough, or,
00:21:56
you know, they get donations of,
00:21:59
like, a lot of GPUs, but in most
00:22:01
labs I know there's like one or two
00:22:03
GPUs where you do most of the
00:22:05
research. And there is no clear answer.
00:22:10
At Facebook we are trying to
00:22:12
bridge this disconnect: we are donating
00:22:15
machines to many research labs in
00:22:18
Europe, and it's an ongoing program.
00:22:23
That's one way to bridge the gap. But if
00:22:26
you ask how do you bridge
00:22:31
the income gap in the real
00:22:33
world, the
00:22:38
disparity between the rich and the
00:22:39
poor, I think it's a hard problem.
00:22:43
And it's a
00:22:48
hard question as well to answer.
00:22:52
So here's a simple
00:22:53
suggestion: you know, make your tax
00:22:58
returns public. In other words, you know,
00:23:01
declare in your paper how many GPUs
00:23:03
you're using. We actually... we talk about
00:23:07
how many GPUs... I think it's something
00:23:10
that should become a habit,
00:23:13
and that reviewers would take that
00:23:14
into account in their judgement, because
00:23:17
you can't compare two papers where, you
00:23:19
know, one has one or two GPUs and the other
00:23:21
has a hundred for the same job. So this
00:23:25
is what I wanted to... Of course it's not as
00:23:27
simple, because somebody could fake that
00:23:29
they only have two GPUs, right? But
00:23:31
people can I but that was some do do
00:23:33
you think it would make sense? I suggested
00:23:34
it already to a few people. I was hoping
00:23:36
to... but right now it doesn't seem
00:23:39
to be so interesting to others. But
00:23:42
would it make sense that people have to
00:23:43
declare the amount of flops they burnt
00:23:46
for the paper, including the grid search,
00:23:48
including everything, so as to have a rough,
00:23:50
even a very rough, estimate? That
00:23:52
would kind of be helpful. No, I think it
00:23:59
makes no sense at all, because doing
00:24:06
better research is not a function of
00:24:08
the flops you burn. I mean, we could
00:24:11
burn a billion flops and still publish a
00:24:15
bad paper. It's not enough to have
00:24:17
computers, but it can help to give
00:24:19
you the little edge for, you know,
00:24:22
beating the benchmark. So I think there
00:24:25
is another way that we could both agree on,
00:24:30
which is something I talked about
00:24:31
yesterday: change the focus of the
00:24:35
evaluation from purely the numbers to
00:24:38
something more about the ideas. And I
00:24:40
know it's harder. It's much easier to say, well,
00:24:42
you're far from the benchmark, reject, than to
00:24:46
try to actually think about: oh, is this
00:24:48
really interesting? Like, you know,
00:24:51
what do we feel about this idea, is it
00:24:54
reasonable, and is this something
00:24:56
that could have an impact if it works?
00:24:58
Which is harder to do as an evaluation. But
00:25:02
here's another thing. I
00:25:04
think what's gonna happen naturally is
00:25:07
a segregation of the tasks, of the
00:25:10
research goals, right? So one of the
00:25:14
main reasons for doing the kind of
00:25:16
research we're doing in my lab, with
00:25:18
unsupervised learning and generative models, is that you
00:25:20
don't need to have a million examples
00:25:22
to do that. You can develop new
00:25:25
ideas and test them on small datasets.
00:25:28
In fact, most new ideas fail on MNIST,
00:25:33
and so you don't need to go very far to
00:25:34
know that it doesn't work.
00:25:38
So I think we'll see some sort of
00:25:40
research topics that are gonna be more
00:25:43
explored by academia, and some research
00:25:46
topics that require doing things like,
00:25:49
you know, producing the state of the art
00:25:51
in some difficult computer
00:25:53
vision task focused more in
00:25:56
industrial labs. It's gonna be sad, but
00:25:59
I think that's where it might be going.
00:26:01
So the other alternative is we come in
00:26:05
those two people sitting there to to
00:26:08
make that kind of attitude is So I just
00:26:14
wanna say that I think I bit also what
00:26:17
the yours was that it's I think less as
00:26:19
a competition, more as a symbiotic
00:26:21
relationship. Academia and industry
00:26:24
can complement each other, and from
00:26:26
Google's side there are hundreds of
00:26:28
grants given to research labs every
00:26:32
year, visiting scientists that
00:26:34
come and work for a few months, and
00:26:38
also open-sourcing TensorFlow helps
00:26:40
as well, right? I mean, the idea is to
00:26:42
help the community, not only help
00:26:44
ourselves. Yeah, but OK, that may have turned
00:26:46
a bit to put it together but I I I
00:26:48
understand that's the phrase book
00:26:50
Conger sees a solution has faced book
00:26:52
on Google funding the results but I
00:26:55
mean we there are so many things to
00:26:57
discuss here you wanted to add
00:27:00
something okay can okay Of first yeah
00:27:04
well, we actually totally agree,
00:27:06
specifically with donating to
00:27:07
universities, even though, you know, we're
00:27:09
not Intel rain D so we actually usually
00:27:12
one or two you use actually have been
00:27:14
doing these mostly in the US but
00:27:16
however you know we're open to
00:27:17
corporation you're also going to being
00:27:20
a few other aspects and I should be in
00:27:22
here in this conference and there's
00:27:23
another node centric approach even you
00:27:26
S rises anymore tightly you were
00:27:28
there's also these came out machine
00:27:29
learning you think your own things like
00:27:32
how do approach to park actual I'll
00:27:34
talk about "'em" or have done some work
00:27:36
with the rice university which she you
00:27:38
know each accelerate practise practise
00:27:40
and you use in that again takes you
00:27:42
will for dirty to be able to handle
00:27:44
bigger datasets. So again there's one
00:27:46
interesting direction to to what you're
00:27:48
looking at the down thinking also for
00:27:51
this there's tired I was like a graph
00:27:52
lab in the US to know there's a company
00:27:54
called actually about comedies are both
00:27:56
know probably this came out you know a
00:28:00
holiday to many companies that also
00:28:02
work in that space. And so, two things:
00:28:07
Nvidia, and I'm sure AMD to some
00:28:10
extent as well, give out a lot of GPUs.
00:28:13
We've got a hardware academic grant, so
00:28:16
if there is anyone... you just go online
00:28:18
and put a proposal through, and we tend to
00:28:21
I'm actually I'm actually we now for
00:28:23
giving away way too remote look nine
00:28:25
sales. So there is that, but the
00:28:29
other thing is, we're all kind of just
00:28:31
assuming that deep learning is gonna
00:28:33
continue the way it's going,
00:28:37
which is really, really intensive
00:28:39
training, and then you have your
00:28:41
inference. And there are already
00:28:43
researchers looking to flip that entire
00:28:45
workload. I won't mention names,
00:28:48
but I mean, I'm talking to people
00:28:50
who are bringing in things like
00:28:54
expectation minimisation, and where
00:28:56
you're going more the biological
00:28:59
way, and this comes back to
00:29:01
attention models, things like that, where
00:29:03
you're recognising the features before
00:29:05
you even go to the training.
00:29:08
And this flipping of the workload means
00:29:09
that you don't need as many GPUs, and
00:29:12
you can also have massive, massive data
00:29:15
sets because you're not doing this
00:29:17
intensive training.
00:29:19
And that's just one thing: deep
00:29:22
learning is not gonna remain as it
00:29:24
currently is. We're not there, and there
00:29:26
are, as far as I'm concerned, as far as
00:29:29
I know, only about one or two people
00:29:31
actually looking at this, because the
00:29:33
majority of people just assume deep learning
00:29:36
is this huge training and then the
00:29:38
inference. But that's gonna change
00:29:42
the field you have in Macy's these
00:29:44
people make at way that that will
00:29:45
change the field for GPUs. But
00:29:48
there's maybe another solution that will
00:29:53
come to us from hardware guys. So there
00:30:01
are lots of companies who are trying to
00:30:03
compete with Nvidia and build the
00:30:06
next generation of neural net chips.
00:30:10
This could give us a hundredfold
00:30:13
speed-up in the next couple of years. And it
00:30:17
could level the playing field if
00:30:20
these chips are also sold in commodity
00:30:24
products. And they're gonna be cheap.
00:30:27
And it's gonna make it hopefully much
00:30:29
easier for research. That's a
00:30:31
possibility that I hope will happen.
00:30:33
yeah I think we both on the speaker
00:30:36
that actually some interesting because
00:30:37
what's something that you mentioned
00:30:39
like, if I were to commit to hardware
00:30:41
today's neural networks, by the time I had
00:30:44
it ready it would be obsolete. So we have
00:30:46
extra... Yeah, I don't think
00:30:48
so. I think a lot of the
00:30:50
building blocks will be there, actually.
00:30:53
But I agree with you, that's
00:30:54
exactly it, and we need to identify the
00:30:56
building blocks, making those
00:30:58
available and making those programmable.
00:30:59
yeah that's that's actually that's but
00:31:03
the other thing is that the differences
00:31:05
in hardware very reminds you know for
00:31:08
example, Pascal was like three years of
00:31:11
R&D, and that's what it took to
00:31:13
bring it to market. So that's very,
00:31:15
very slow compared to the advances that
00:31:18
you make in software, and I'd have to
00:31:20
get on is numbers you know on a really
00:31:23
quick. And it's not so much the
00:31:25
neon framework, it's the actual
00:31:28
chip that they're developing, even though
00:31:29
it's still... OK, I think we have to get
00:31:32
it out next year. Yeah, basically
00:31:34
they use FPGAs right now to do the
00:31:36
the simulation but you know when it
00:31:38
comes out I mean this is a dedicated
00:31:39
chip so you know where well where
00:31:41
everyone and all tabs are actually
00:31:43
working this got grey and yeah be it
00:31:45
could be used for training, not just for
00:31:47
inference. Yeah, but the point is that
00:31:51
the software advances, I think you
00:31:54
have to realise, are gonna
00:31:55
happen a lot quicker than hardware. You
00:31:58
know, I mean, we're still talking next year
00:31:59
before Nervana brings the chip
00:32:02
out. And in the meantime, you know, there are
00:32:04
people doing tests only on CPU at
00:32:07
the moment, claiming two hundred
00:32:09
times speed-up using genetic
00:32:11
algorithms and neural networks, and
00:32:15
there are gonna be way more advances in
00:32:17
software even before the
00:32:18
hardware comes in. Sorry, I'm very
00:32:24
curious to hear a less busy numbers and
00:32:26
well to do. So I'm speaking as loud as
00:32:30
I can honestly I'm really curious to
00:32:34
hear cast position on this and what
00:32:36
Google is doing with the TPU. The
00:32:41
question is: curious to see what
00:32:44
Google's doing with the new TPU. Yeah,
00:32:47
so obviously it's not a secret that
00:32:49
Google uses a lot of deep learning, and
00:32:53
speeding up the training is very
00:32:57
important. So GPUs are obviously
00:32:59
still very much in use, and TPUs are,
00:33:04
as you probably know, already used by
00:33:07
products for one year. So
00:33:10
RankBrain uses TPUs, and RankBrain is one of
00:33:12
the search out. So it's cranking out
00:33:17
where they're just not the very simple
00:33:19
base as you can imagine and part of it
00:33:22
is a neural network that helps with the
00:33:25
ranking, and that uses TPUs, for example.
00:33:27
So we definitely see neural network
00:33:32
or machine-learning-specific
00:33:34
hardware helping. But again, it's not a
00:33:37
focus on hardware versus software; they're
00:33:39
both advancing, and it's not
00:33:41
facing a bit rather it's getting at at
00:33:45
that thing at the same time and says
00:33:47
as researchers, we want to have the
00:33:49
hardware to enable us to do the best
00:33:52
possible research. Cool. I had a
00:33:58
general question ah yeah yeah it's it's
00:34:05
more search a word stuff presentation.
00:34:07
So is there some sort of intuition
00:34:10
behind why GANs work better than
00:34:12
variational autoencoders? Because
00:34:15
variational autoencoders have a nice,
00:34:16
elegant formulation, but GANs simply
00:34:19
before in the past "'cause" they can't
00:34:20
because variational autoencoders
00:34:22
can't scale this? So far, I believe, GANs
00:34:25
can scale. So it's a good question. I think
00:34:27
different researchers may have
00:34:28
different opinions about this. What
00:34:32
happens with variational autoencoders is
00:34:34
that they tend, as I said, to lose
00:34:38
too much information about the input in
00:34:40
their latent representation by adding
00:34:42
too much noise somehow. And even if you
00:34:47
just... yeah. And then what happens is
00:34:52
that the decoder sees the same
00:34:57
representation being associated to
00:34:59
different x's, right? So it's
00:35:04
trying to do a one-to-many mapping, and
00:35:08
it does it by having a deterministic
00:35:11
function followed by some Gaussian
00:35:14
noise. So what you're getting is
00:35:16
that the mean of that Gaussian is going
00:35:19
to be somehow in the middle of many
00:35:22
images that correspond to the same
00:35:26
latent representation, roughly speaking.
00:35:28
So what happens is that you get
00:35:29
blurred images, because
00:35:32
the average of a bunch of images is
00:35:35
kind of a blurry image. Whereas a GAN
00:35:40
doesn't have this issue at all: it can
00:35:44
produce very, very sharp images. But it
00:35:46
has other issues: it may miss modes,
00:35:49
that is, it may give zero probability
00:35:52
to things that should happen in the
00:35:54
world. Of course, when you generate
00:35:56
samples you don't necessarily see
00:35:58
that there is a whole world that is missing.
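A small illustration of the two failure modes just described may help. This is only a toy sketch under stated assumptions, not anything shown by the speaker: the 4-pixel "images", the two-outcome data set, and the helper names `mean_image`, `squared_error` and `log_likelihood` are all hypothetical. It shows that the single output with the lowest squared error over several sharp images sharing one latent code is their (blurry) average, and that a model assigning zero probability to something that occurs in the data has a log-likelihood of minus infinity.

```python
import math

# Two sharp 4-pixel "images" that, by assumption, share one latent code.
images = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]

def mean_image(imgs):
    """Pixel-wise average: the mean of a fixed-variance Gaussian fit to imgs."""
    n = len(imgs)
    return [sum(pixels) / n for pixels in zip(*imgs)]

def squared_error(pred, imgs):
    """Total squared error of one predicted image against all target images."""
    return sum((p - x) ** 2 for img in imgs for p, x in zip(pred, img))

# The average is grey everywhere (blurry), yet it has a lower squared
# error than either of the sharp images.
blurry = mean_image(images)

def log_likelihood(model_probs, data):
    """Log-likelihood of discrete data under a table of outcome probabilities."""
    total = 0.0
    for x in data:
        p = model_probs.get(x, 0.0)
        if p == 0.0:
            return -math.inf  # one impossible data point ruins the whole score
        total += math.log(p)
    return total

data = ["A", "B", "A", "B"]
full_model = {"A": 0.5, "B": 0.5}   # covers both modes
mode_dropping_model = {"A": 1.0}    # perfect-looking samples, but never produces "B"
```

Samples drawn from `mode_dropping_model` all look fine, which is why the missing mode is not visible from samples alone; only the likelihood computation exposes it.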
00:36:00
So it looks nice, but if you were to
00:36:05
compute the log-likelihood of a GAN you'd
00:36:07
get infinitely bad log-likelihood. So yeah,
00:36:12
they have their advantages and
00:36:14
disadvantages. So maybe again a
00:36:24
general question. So there is a bit the
00:36:26
feeling that the AlphaGo result was a bit
00:36:29
of a surprise, even for the people in
00:36:31
the field. So what would be,
00:36:35
according to each of you, something
00:36:37
which is as clearly defined,
00:36:39
because this is pretty clearly
00:36:41
defined, let's say, that you
00:36:45
don't expect to happen before ten years,
00:36:47
and if it was happening before ten
00:36:49
years you would be very surprised? Well,
00:36:52
is this just too much of a prospective
00:36:54
question? Is that okay? Before, okay, two
00:37:00
years, two years. I would say
00:37:06
StarCraft. If it gets solved
00:37:10
within two years, that
00:37:14
would be very, very impressive. So
00:37:17
with AlphaGo I think there was
00:37:19
a sentiment, a year before AlphaGo,
00:37:24
when the initial paper came out, that
00:37:29
AlphaGo, I mean that Go, will be solved,
00:37:33
because the initial results were really
00:37:35
promising, just what that's the building
00:37:38
this about it. With StarCraft, I think
00:37:45
there are very hard problems in it to
00:37:49
solve first, for example doing
00:37:53
simulation inside the model.
00:37:56
AlphaGo has an advantage of having
00:38:00
the simulator: it can predict different
00:38:02
moves and then see if they're valid or
00:38:05
not. That's not applicable either to
00:38:09
StarCraft or to the real world, but
00:38:12
basically doing planning in this latent
00:38:14
space. And another thing is also the
00:38:16
action spaces are much larger, which
00:38:18
means we need a system that can
00:38:23
do hierarchical actions really
00:38:25
effectively, or even infer the
00:38:28
hierarchy of actions automatically. And
00:38:30
I would say if in two years
00:38:33
something like this happens that would
00:38:35
be amazing and surprising. Natural
00:38:41
language understanding. Yeah, I don't
00:38:44
know what's the benchmark. There is one
00:38:49
of course, the Turing test, but the
00:38:52
problem with the Turing test is that
00:38:55
it's not just about natural language
00:38:56
understanding, it's also about full AI,
00:38:59
so you have to understand, you know,
00:39:02
everything about the world that humans
00:39:03
typically know about. But you could
00:39:05
imagine a Turing test geared at a
00:39:08
particular domain, like being able to
00:39:13
answer technical questions about Linux
00:39:15
or Ubuntu, for example. There's a data
00:39:18
set for this, too small, but doing it
00:39:22
as well as a human, I think that's
00:39:24
something that's not impossible, but I
00:39:28
doubt that we'll have it in two years.
00:39:30
But if we do it would be a great
00:39:32
success. I think I would be very
00:39:38
surprised if we get machine
00:39:40
learning algorithms to generalise as
00:39:42
well as we do. So, also a bit related to
00:39:44
transfer learning: if I show a
00:39:46
two-year-old two pictures of a T. rex, the next
00:39:49
day they will run around the house
00:39:51
and will say "this is a T. rex, this is not
00:39:53
a T. rex", right? We are very far from
00:39:56
that now. I'm not at liberty to guy.
00:40:02
But no, the idea is just two
00:40:08
examples is enough for a child,
00:40:10
because as Yoshua said in his talk, right,
00:40:12
we understand things about the world
00:40:14
and we are easily able to generalise, and
00:40:16
right now we're very, very far from that.
00:40:18
You should read some papers that came out
00:40:22
recently using the Omniglot dataset,
00:40:26
where it looks like, you know,
00:40:29
we're able to do a fairly good job with
00:40:32
one or two or three examples using sort
00:40:36
of different one-shot
00:40:38
learning techniques. I don't think
00:40:40
it's solved this problem at all, right,
00:40:42
but there's been some recent progress.
00:40:44
So we could see more of that
00:40:47
next. And of course the magic comes from
00:40:49
the fact that you've already seen
00:40:52
hundreds of other similar, in this case
00:40:57
similar alphabets, and then you can
00:41:00
generalise to a new alphabet with,
00:41:03
you know, new ways of writing specific
00:41:06
characters. But you shouldn't exclude this,
00:41:10
because also the child has the
00:41:11
knowledge about the world, so you
00:41:13
shouldn't assume that learning will
00:41:15
come from nothing, right? We just want
00:41:17
more generalisation. So within the next
00:41:24
two years or roughly, I don't think it
00:41:28
will happen, but possibly within the
00:41:31
next five. But I really
00:41:34
would like to see, and this is a strange
00:41:36
thing to say from NVIDIA,
00:41:39
neuromorphic chips and
00:41:40
developments in that, and that's really
00:41:43
bringing down the power budget but
00:41:46
also ramping up the capability. Um, I
00:41:50
don't exactly know where this Moore's
00:41:53
law and capability is going to sort
00:41:57
of peak and get its next, sort of, second
00:42:01
wind from. I guess that neuromorphic
00:42:04
chips are probably very, very important
00:42:07
for getting to AGI, and I think, you
00:42:11
know, getting to AGI is a really
00:42:13
important thing, ignoring all the scary
00:42:17
stuff and what could go wrong and such,
00:42:20
but we need to get that kind of
00:42:21
capability. And I suppose if
00:42:24
you go away from deep learning for one
00:42:27
second, the other thing would be the
00:42:31
space program pushing this to, you know,
00:42:33
to the, or getting past
00:42:37
the limits, you know, things
00:42:39
like you know you know most things.
00:42:40
Well next year will fly twenty eighty
00:42:47
to to get a know that that kind of
00:42:50
focus is is gonna really yeah take this
00:42:54
feels it's a difference different
00:42:57
sectors I think I actually just
00:43:02
summations visit those speak if brought
00:43:04
their crops are no the the reason for
00:43:07
having it, actually: the Xbox comes out at
00:43:09
Christmas, a teraflop-type machine
00:43:12
that would have been on the Top500
00:43:13
list a few years ago. And
00:43:16
actually right now the US
00:43:17
Department of Energy is having this
00:43:19
exascale program, and needs to be able to have
00:43:22
an exaflop system in twenty
00:43:24
thirty two or so. But again we need to
00:43:27
bring those flops to there hopefully
00:43:28
system no questions and you know yeah
00:43:41
But yeah, sorry, I'm asking many
00:43:45
questions, but this time maybe to the
00:43:47
hardware producers. I'm a fan of recurrent
00:43:51
neural networks, and especially multidimensional ones.
00:44:00
They are not used much, not because they're not good
00:44:02
but because they're not really fast.
00:44:04
There are, in some labs, good
00:44:07
implementations, but I wonder if there is some,
00:44:10
I don't know, native support
00:44:14
coming soon for such architectures.
00:44:17
and so I know you can very well. And
00:44:22
from what I know he brought out
00:44:26
the PyraMiD-LSTM simply because there
00:44:29
wasn't any decent way to parallelise with
00:44:33
GPUs, and this is like the pretty case,
00:44:35
it's a lot of where we started
00:44:37
doing, and cuDNN 5 is
00:44:38
obviously now offering, I mean it's only
00:44:41
single or and then at the moment. But
00:44:43
we are working on it. But that paper,
00:44:46
And what really surprises me and and it
00:44:48
does all the time actually that it's
00:44:52
yeah is is so quiet about what they do
00:44:54
you know that it is but it it which is
00:44:56
surprising you know when we know what
00:44:59
your is like right but that that type
00:45:02
is probably should have been pushed
00:45:03
around that a whole lot more because
00:45:05
it was purposely so that you
00:45:07
could use it with GPUs, and
00:45:09
it's really just another very elegant
00:45:13
solution because of this whole
00:45:16
concept of the actual volumetric data.
00:45:20
And then, you know, if you haven't
00:45:21
read the PyraMiD-LSTM paper, just take
00:45:24
a look at it. It's useful. But
00:45:26
it did drive us, and to be honest when I
00:45:30
first started at NVIDIA I did say, you
00:45:33
know, why are we not covering RNNs?
00:45:35
But to be honest I think a year ago
00:45:39
there wasn't that much activity with
00:45:41
RNNs, especially in the
00:45:42
wider field anyway. So I was pushing,
00:45:44
you know, we need to do more, but
00:45:46
again it takes time, we've only got a
00:45:48
finite number of people, so we'll get there.
00:45:50
But it was purposefully so that you
00:45:53
could implement on GPUs, 'cause they've
00:45:55
been using GPUs and have been a
00:45:57
big proponent of those for a long
00:45:59
time. I don't know why
00:46:07
you're saying that RNNs aren't used
00:46:08
that much. In my lab they're used all over
00:46:12
the place, I mean some, you know, variant
00:46:15
of LSTM. I mean, if you're in the
00:46:20
vision research literature and the
00:46:23
publications, it means CNNs and so on, much
00:46:25
more. It depends which conference you go to:
00:46:28
if you go to CVPR maybe you don't see
00:46:30
that much, but if you go to
00:46:31
language related conferences that
00:46:33
gives you a different picture. But these
00:46:34
are the ones I mentioned, I mentioned the
00:46:36
multidimensional. Oh, the
00:46:37
multidimensional yes oh yeah well it's
00:46:47
because you I was only lasted that and
00:46:49
that they were that paper is stalling
00:46:51
and and and you can so the same then
00:46:55
there's but a lot more check out you
00:46:59
know three D volumetric data for GPAXS
00:47:03
to the the is quite a lot different
00:47:04
type is that that are out there now so
00:47:09
fast a military and he gave a paper
00:47:12
at the GTC conference, and I think that
00:47:15
what is going on base he's so the jump
00:47:18
strangest also because of that one
00:47:19
paper. But there is quite a lot; 3D
00:47:21
volumetric is really ramping up now
00:47:23
because of, obviously, the medical
00:47:25
applications. Um, but again, you
00:47:28
know, we're right at the beginning of this,
00:47:29
like when you get to 3D.
00:47:31
Okay, so maybe one more
00:47:43
question and then we can stop here. So
00:47:45
the question for the framework people,
00:47:48
it's one of the questions that
00:47:51
got quite a lot of votes on the
00:47:53
website, was: what are the
00:47:56
inevitable trade-offs in a framework? So
00:47:59
you showed this kind of
00:48:02
gradient between and destroy and work
00:48:04
too when we see that a sex to be to one
00:48:07
side to be changes are on maybe like it
00:48:10
when question which was asked at the
00:48:11
beginning it ye it's I was loose and
00:48:16
nobody the right of. So is it impossible
00:48:17
to have the best of both worlds, or is
00:48:20
it simply that we have not yet come up
00:48:24
with the right overall
00:48:26
thing, or what we call them? So I think
00:48:33
regarding the initial trade-off
00:48:36
question, something that clearly comes
00:48:39
to mind is what's the right level of
00:48:41
abstraction. So ideally you want to
00:48:44
have things always being composed
00:48:48
of different operations and have the
00:48:50
operations and everything being very
00:48:52
modular, but sometimes if you do that
00:48:55
you have your code slower, right,
00:48:58
because you can't optimise. For example
00:48:59
if you spend your time actually writing
00:49:01
your end-to-end program, let's say in C or
00:49:04
C++, you can really optimise the
00:49:06
bare or to put that same for speed,
00:49:09
memory usage and so on. But if you want
00:49:11
to have compositionality then you
00:49:15
trade off a little bit of the speed.
00:49:17
And also this comes up with numerical
00:49:21
stability. So as we know softmax is
00:49:24
not really that numerically stable
00:49:26
if you do softmax and then cross
00:49:28
entropy. So for example in TensorFlow you
00:49:29
have softmax with cross entropy in one op,
00:49:32
not ideal for generalisation, but
00:49:34
sometimes you have to combine these
00:49:37
operations. So I think this is something
00:49:39
that when you design a deep learning
00:49:41
framework, this is one of the hard
00:49:42
questions: where do you stop, where
00:49:44
is the level of compositionality
00:49:46
that you want to go with? And for
00:49:48
example for TensorFlow, compared
00:49:51
to DistBelief, the previous system,
00:49:54
well, it's more compositional, and it
00:49:56
turns out that if you do it right you
00:49:58
can even make it faster. So it's faster
00:49:59
than the previous system and also more
00:50:03
modular. But I think overall
00:50:06
this is the first thing that comes to
00:50:07
mind. I think of course this
00:50:13
trade-off exists, but there are some
00:50:15
tools to really improve on both fronts,
00:50:19
and one of them is a very old one: it's
00:50:21
called the compiler. And the compiler
00:50:24
allows you to have a lot of flexibility
00:50:26
and modularity, but, you know, once
00:50:30
you've specified the computation you
00:50:31
can use the compiler's intelligence,
00:50:36
which could use machine learning,
00:50:39
you know, to make it efficient.
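The softmax-with-cross-entropy case raised in this discussion is a concrete instance of the trade-off: composing the two operations separately is more modular but can fail numerically, while a combined version (written by hand, or in principle produced by a compiler) stays stable. The following is a minimal standard-library sketch under stated assumptions. It is not TensorFlow's implementation, and the function names `naive_loss` and `fused_loss` are hypothetical; the fused version uses the usual log-sum-exp shift by the largest logit.

```python
import math

def softmax(logits):
    """Plain softmax; math.exp overflows when a logit is large."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-probability of the target class; fails if it is 0.0."""
    return -math.log(probs[target])

def naive_loss(logits, target):
    """Composed version: two modular operations applied one after the other."""
    return cross_entropy(softmax(logits), target)

def fused_loss(logits, target):
    """Combined version: log-sum-exp with the maximum logit subtracted first.

    Mathematically equal to naive_loss, but every exponent is <= 0, so
    exp cannot overflow, and no log of a tiny probability is ever taken.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]
```

For moderate logits such as `[2.0, 1.0, 0.1]` the two functions agree. With logits such as `[1000.0, 0.0]` the composed version raises `OverflowError` inside `math.exp`, and with `[-1000.0, 0.0]` and target `0` it raises `ValueError` because the target probability underflows to zero, while `fused_loss` returns a finite value in both cases.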
00:50:41
Instead of having a human design it, right?
00:50:43
Theano, you know, has been designed to try to,
00:50:48
you know, make it easy for putting in
00:50:50
compiler technology, but, you know, now I
00:50:55
think we could do a lot better if we
00:50:56
put in like professional compiler
00:50:57
writers to do these kinds of things.
00:51:00
Hopefully TensorFlow will get there.
00:51:02
But I think this is a direction where
00:51:04
we could have both ease of, you know,
00:51:09
design, flexibility, and efficiency, and,
00:51:14
you know, efficient implementation and
00:51:15
production-ready. I think the two of them
00:51:22
made good points, and there's a
00:51:24
common theme there, where we are not,
00:51:30
like, I mean your question was: do we
00:51:33
always have to make these trade-offs
00:51:35
between research and production,
00:51:38
like fast and flexible? In an ideal
00:51:42
world we don't; you actually should be
00:51:45
able to write the most flexible, most
00:51:48
fast thing. But yeah, we have
00:51:54
not yet, the research on
00:51:58
compilers and graph placement and
00:52:02
basically a bunch of systems research is
00:52:04
still not there yet to build such a
00:52:07
tool, and that's why people do these
00:52:10
things by hand. Um, when that research
00:52:14
catches up, and I'm sure it will catch
00:52:16
up, I think we will move closer
00:52:20
towards like a unified system that does both
00:52:23
things really well. Okay, so any other
00:52:30
question? Okay, so maybe that's enough.

Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
