Yeah, I see your question — and you're right, we need nonlinearity in the system, otherwise the whole thing would be linear. But why does the rectifier work so well? We don't really know. Around 2010 we started experimenting with rectified linear units and got this result that rectifiers really work a lot better for training deep nets — that's not two or three years ago now, but four, five, six, seven, eight. Initially we thought the flat part would be a problem, the thought being that if the derivative is zero there, no gradient information flows. What actually happens is that it doesn't slow things down, because there is always a significant proportion of the units — about half — that are active, so information does flow. One hypothesis we played with is that maybe it works better because there's a kind of symmetry breaking going on: when you train, only a few units — say half — specialise for a given example and the other ones don't try to go for that example, whereas with tanh units all of the units get a signal saying "you should do something to get the error down for that example." The other obvious hypothesis is that maybe the piecewise-linear part is really important, because what it means is that, for any particular example, there is a neighbourhood around that example where the network is just a linear mapping from inputs to outputs. And we understand a lot about linear mappings obtained by multiplying matrices together — there is a beautiful paper, for example, by Andrew Saxe analysing this phenomenon, where you can understand how convergence proceeds — but of course here it's trickier, because the linear mapping you get is different depending on which subset of units is active each time. So there's a lot we don't understand. We have some ideas, but it's something I think the next generation should be investigating: why is it working? People have also modified rectifiers to work even better — for example, instead of being exactly zero on the left, having a little bit of slope — so it's not the end of the story.
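
As an illustration of the piecewise-linear point above, here is a minimal numpy sketch (toy sizes, untrained random weights — purely hypothetical) showing that, for a fixed input, a ReLU network reduces to one specific linear map determined by which units happen to be active:

```python
import numpy as np

# Minimal sketch: a two-layer ReLU net is piecewise linear. For a fixed input x,
# the set of active units defines a mask, and in a neighbourhood of x the network
# acts as the single linear map  W2 @ diag(mask) @ W1.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))   # hidden weights (hypothetical sizes)
W2 = rng.standard_normal((2, 8))   # output weights

def forward(x):
    pre = W1 @ x
    mask = (pre > 0).astype(float)  # which units are active for this example
    return W2 @ (mask * pre), mask

x = rng.standard_normal(4)
y, mask = forward(x)

# Equivalent local linear map for this example's active set:
W_local = W2 @ (mask[:, None] * W1)
assert np.allclose(W_local @ x, y)
print("active units:", int(mask.sum()), "of", mask.size)
```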

Hello. I think everybody is very curious about the hyperparameters — things like the number of hidden layers and the number of units you need in each layer, especially in convolutional networks, depending on the amount of training data and on the difficulty of the task. Do you have any intuition about how to pick them, depending on the data and the difficulty?

That's another area where more work needs to be done. Right now we think of the problem of finding hyperparameters as an optimisation problem itself, where we are optimising validation error. So we can actually use tools from optimisation to search for good hyperparameters. This is a bit difficult, because each step — trying a particular hyperparameter configuration — is pretty expensive, since it involves training the whole network; but actually this is how we do it: we try many configurations and, based on the results, we try more configurations. It's really an optimisation process, usually with a human in the loop. But you could also do it more automatically: there's a lot of work on what's called Bayesian optimisation, or gradient-based hyperparameter optimisation, where a completely automatic procedure proposes the next configuration to try. These are fairly complex methods, and in fact neural nets are starting to be used to help with that optimisation. In practice, though, there is a very simple way to do the optimisation, which is called random search: basically you just launch twenty or thirty different configurations, you see what happens, and if you don't have good enough results you launch more. You have to be a little bit smart about the set of values to be tried for each of the hyperparameters, based on prior experience, but it's not really complicated once you get used to doing it. A lot of practitioners get a good intuitive sense of what works and what doesn't, and they will explore manually, because you may not have access to a lot of GPUs. So it becomes more like an art, but in principle it could all be automated — maybe it's just a matter of computational resources.
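
A minimal sketch of the random-search procedure just described; the search space, the number of trials and the train_and_evaluate placeholder are illustrative assumptions, not a prescription — in practice each evaluation is a full training run scored on validation data:

```python
import random

# Random search over a small, hypothetical hyperparameter space.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "num_hidden_layers": lambda: random.randint(2, 8),
    "units_per_layer": lambda: random.choice([64, 128, 256, 512]),
}

def train_and_evaluate(config):
    # Placeholder standing in for "train the whole network, measure validation score".
    return random.random()

trials = []
for _ in range(30):                # "launch twenty or thirty configurations"
    config = {name: sample() for name, sample in SEARCH_SPACE.items()}
    trials.append((train_and_evaluate(config), config))

best_score, best_config = max(trials, key=lambda t: t[0])
print(best_score, best_config)
```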

Let me actually go to the questions that were put online and select some of them. One of the most upvoted ones asks about networks that are able to configure or grow themselves automatically — there has been work on this, for example cascade correlation. Why isn't there more research in that direction?

I think it has been looked at. There is a paper from 2014 called hypergrad, from Ryan Adams's group, that proposes the idea of searching over hyperparameters by making them differentiable, using some kind of momentum trick. But these things take so much computational resource — searching through, or building models around, hyperparameter configurations is expensive enough that research that takes it seriously has been pretty slow. Also, on the note of how you find better hyperparameters: I've been on both sides — not having enough resources as a grad student, and being at Facebook, where we have almost unlimited resources. At NYU we used to call this method GSO: grad student optimisation. The grad students essentially, over a couple of years, form an internal model of what works and what doesn't. I think Bayesian optimisation tries to do the same thing: it tries to build a model from the experiments that have been done, which it can use to draw better hyperparameter samples and explore better. But fresh out of ICML, just last week or so, there have been claims that Bayesian optimisation for hyperparameter search isn't really better — just doing a bit more random search is equally effective. It's very, very empirical at this point; we don't really have a good way of evaluating hyperparameter search itself.

Just a quick comment, since both presenters talked about doing sweeps using GPUs. Two things: I remember seeing a paper — I think it was FireCaffe, out of Berkeley — that describes this kind of parallelisation very nicely. And the point I'd like to make is that scale-out machine learning is also quite important: it would enable you to do all those searches much more in parallel, using a framework like Facebook's, for example.

I just want to say that it also depends a lot on the kind of problem you're working on. If training takes two hours, then you can use Bayesian optimisation or random search, because you can launch many experiments. But if training takes two weeks — like training on ImageNet — then that's not going to cut it, and that's why this grad student optimisation exists.

Okay. Thank you for all the talks. I would like to ask a somewhat speculative question: can you describe a link between probabilistic graphical models and deep learning — can they benefit from each other?

There are a lot of links. The first ways that we discovered to train deep networks were based on using specialised probabilistic models that learn representations, like RBMs — that was, say, 2006 to about 2010 or 2012. Now there's a lot of research in unsupervised learning, because that's something we don't know how to do yet and we know it's going to be very important for AI, and a lot of that research is strongly motivated by the probabilistic interpretation of what unsupervised learning is about: capturing the joint distribution of random variables. So a lot of the work that's been done in graphical models — especially with latent variables, especially variational methods, or even sampling methods, Monte Carlo methods — all of these come up as useful tools, at least for some of the algorithms. There are other algorithms, like GANs for example, where we completely bypass any kind of probabilistic interpretation; but if you actually want to analyse the algorithm, it becomes important again to think about notions from probability.
because I'm going to go and mixing some
questions from the board questions one
room so if it's okay so there's one to
general question which was the one
doing most so you ring by the postal
workers was what are what could be a
non troubles from the application those
people from the so you get to speak
what does a task that dip running
conserve the best some can sorta worse
what would be the the worst thing to
apply pruning too that's a loaded

That's a loaded question. What can deep learning solve? If you look at the neural networks of modern times, they are correlation learners: they can't do causal inference, they can't do advanced reasoning. So anything that can basically be solved with correlations, plus some simple mechanics on top of that, can be solved with the deep learning of today. I don't even know what deep learning is any more — the field is moving fast enough that the goalposts keep changing — but as of today, if you take convnets and recurrent nets, LSTMs, they do strong correlation statistics and counting, and that's about it. A lot of tasks that couldn't be solved in the past apparently can be solved with just those properties. If you want something beyond that — where you actually need to reason from very little data — that's something deep learning methods of today cannot do, and it's what a bunch of us are actually working towards.

Yeah, I agree with that. It depends on your definition of what deep learning is. There is today's deep learning, and there is what researchers are working on now, which really is trying to address these questions of reasoning and even causality. That's really important, because if we want to reach AI we will need machines that really understand the world, and that means forming a kind of causal explanation of what is going on. Now, for me, that could very much be part of deep learning. What's special about deep learning isn't that we use backprop, or that we use neural nets in their current form; what's special is the idea of learning representations, and learning multiple levels of representation that correspond to different levels of abstraction. What we need to do is inject more of these ideas into the algorithms — including the work going on in reinforcement learning — in order to approach these goals. But that's still kind of the frontier. One thing I could add is that deep learning, even in its future form, is clearly not going to be useful when you want to learn something like a random function — a function that doesn't have any particular structure. The reason deep learning is working well is that the kinds of tasks we are trying to learn with it have a compositional structure that these nets can take advantage of.

Hi, I have a more practical question about the numbers and efficiency discussed today, which were basically about the server side. I was wondering how far we are from embedded, real-time, mobile deep learning. In terms of, for example, how many images per second for ImageNet classification can be processed on a mobile device — I'm not talking about training, but about real-time predictions — where are we now in efficiency and performance?

I can't give you actual numbers, but what I'm seeing is that things are edging very, very rapidly towards very, very low power — people are talking about wearable, almost invisible devices. So whatever the numbers are now, the really amazing part is that they are going to mean nothing in probably six months' time. It's a bit like the iPhone: you bring out the next one and the previous one is already history, because we are developing that fast. So don't get stuck on the numbers too much, simply because it really is moving that quickly. But inference obviously is the really important part of it — how quickly you can do it — and I think Andrew Ng and Baidu would definitely agree with me that it has moved on a lot. Whether it will be an actual mobile phone that we use, or something else — Bell Labs, for instance, is working on a matchbox-sized unit that you would, say, wear on your jacket; it would keep all your data local, so there is no to-and-fro with the cloud, and you have the learning and the inference going on in this matchbox-sized unit. You can imagine what kind of very low power budget that implies: we are at credit-card size at the moment, going to something about matchbox size with all the capability for learning and inference. But again, don't confuse this with the actual training side — we can do all that in the server; it's the inference we are bringing on board. I don't know exactly where we'll end up, but it's exciting to sit and watch it happen.

I'd like to call out one thing — as the hardware makers know, this field is evolving so rapidly; we were saying earlier that if you commit something to hardware today, by the time it is ready it may already be obsolete. Now, on machine-specific, special-purpose devices, there is a lot of exciting development. One example of what I think is promising is a startup called Movidius: they have a very, very low-power part — the whole package is seven grams, in the form factor of a USB stick — and you can run your network, built with TensorFlow for example, on it, with the hardware constraints taken into account, for instance for convolutions. There is also academic work on efficient inference chips, and another startup that calls what they build cortical columns — a slightly different style of neural network — to make it extremely low power. A lot of this space is moving quite rapidly. I worked on deep learning for mobile for two years, so I have slightly more practical numbers. The main thing on mobile, when you're running deep learning models, is how fast your phone runs out of battery. And the speed of actually running the model is limited by memory bandwidth: it's not the amount of compute you do, but how fast you can pump in images and read and write the intermediate layers to and from memory. On this note, there is a lot of research coming out recently on one-bit networks, n-bit networks and so on, where they try to give you almost the same accuracy — maybe slightly lower — as the models you train in full float precision, but with about eight times or thirty-two times more throughput. About a year and a half ago, when I last remember the numbers, we got about two to three frames per second for a typical convnet on mobile CPUs — on iPhones and the like; CPUs there are actually still faster than the GPUs for deep learning workloads. As of today, if you do, say, XNOR-Net-style one-bit operations everywhere, you might get ten or fifteen frames per second, I guess, but I don't think anyone has published hard numbers on that. And if you want specialised chips, NVIDIA actually makes one of the best ones, the TX1: it is rated at about ten watts and gives you about a teraflop of compute, so it's really efficient if you want to do, say, robotics applications and such.
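
To make the one-bit idea concrete, here is a small numpy sketch of XNOR-Net-style weight binarisation (toy layer, random weights, no training — all sizes are hypothetical): the float weights are replaced by their sign plus a per-output scale, which is what allows the heavy multiply-accumulates to become bit operations:

```python
import numpy as np

# Replace float weights by sign(W) plus a per-output-channel scale alpha,
# and compare the binary-weight output against the full-precision one.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))              # hypothetical float weights of one layer
x = rng.standard_normal(64)

alpha = np.abs(W).mean(axis=1, keepdims=True)  # per-output-channel scale
W_bin = np.sign(W)                             # weights constrained to {-1, +1}

y_float = W @ x
y_bin = (alpha * W_bin) @ x                    # binary weights + scale approximation

rel_err = np.linalg.norm(y_float - y_bin) / np.linalg.norm(y_float)
print(f"relative error of the binary-weight approximation: {rel_err:.2f}")
```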

Thanks. My question is about something Yoshua mentioned in his presentation about rectifiers. There is an optimisation problem for deeper networks, and even with rectifiers it just happens later: the deeper the network, the more nonlinearities are composed, the harder it gets — is that right?

That's also true of recurrent nets, by the way, and of complex reasoning-based systems like memory networks or neural Turing machines. So what happened is that we were taking advantage of the ability of unsupervised learning to extract fairly good representations, at least as an initial starting point for supervised learning. That worked quite well, and it continues to be useful if you want to do semi-supervised learning and only have access to a small number of labelled examples, or if you do transfer learning and have a small number of labelled examples for a new category. So the combination of supervised and unsupervised learning is actually coming back; several methods have been proposed that seem to be doing a fairly interesting job, and I think we're going to see more of that as we expand the reach of neural nets to domains where we might have only a few examples per category. Of course that won't generalise unless you have other sources of information — lots of unlabelled data, or lots of other tasks — and that's where unsupervised learning can be useful. As for why the optimisation gets easier with rectifiers: we don't completely understand it, as I said earlier.

There was somebody at the back who raised their hand first.

Yes — my name's Martin, I'm a computational materials physicist at EPFL. We try to extract structure from our data too, and there's always this back and forth: people say you don't have enough data — well, for us, we've got loads of data. But there's this line of thinking, for example in linguistics: the "poverty of the stimulus" argument, that humans are able to learn despite a near absence of stimulus, especially in infancy. So do you think there are things that can be learned, or borrowed, from biology to try and reduce the amount of data that's needed — perhaps, for example, the idea that children are intrinsically capable of learning only a few possible types of linguistic grammar? By making these kinds of concessions, or by other means, do you think we can reduce the amount of data that we actually need to do reasonable work?

What I'd say is that there are two sides to AI: one is the computer science side, the other is the neuroscience side. I've been following, for quite a few years, companies like Numenta that are approaching AI — or rather AGI, artificial general intelligence — from the neuroscience side, and I think it's really important to do that. Yann LeCun would say — he has that quote about how a Boeing doesn't flap its wings — that you don't have to take the biological route; but I'd say take the advice anyway. And I say that on the back of various conversations about using genetic algorithms, for example, instead of random search to find the best weights, the optimal routes, and things like that. Because evolution is one way that we know works — it's very simplistic and it may take a long time — but think about how we learn as children: when we're first born, the first thing we do is learn edge detection, and then as the eyes develop we gain the ability to see colour, we gain the ability to put the edges together, much the way deep learning does. We also see that we're able, for example, to understand the concept of a cow without seeing a thousand cows, without these huge data sets. Then again, you have to take into account that the average human will spend a quarter of their life in school, and is still continually learning after that. So — it's the unsupervised-versus-supervised argument again — but I think following the biological route will help; it's another way in, it will give us another insight. And on the neuroscience side, I do believe that we will learn a great deal about ourselves and how we operate: by learning how we learn, we learn how to teach machines, or how to show machines how to learn. I think that's vital.

Bridging the gap between neuroscience and deep learning is actually one of my pet projects of the last year or two, and I think it's a really interesting, exciting direction. There is a lot we don't understand about how the brain solves the learning problem at that large scale; we have lots of concepts in deep learning, but we still haven't really solved the unsupervised learning problem, and so I think we can get a lot of inspiration from how humans do it. Regarding your question about learning from little data: in fact a lot of the research in deep learning is motivated by this very question. Transfer learning, unsupervised learning — all of these approaches are trying to answer the question: given a new task, can we learn from very few examples? But the general answer is, again, that there is no free lunch, and the only way we can make that work is if the learner has already discovered a lot about the world. I think the reason a human is able to learn language from comparatively little data — compared to the massive amounts we are now using for, say, machine translation — is that the child is also observing everything else in the world, not just hearing utterances: the child is acquiring a sort of intuitive physics, is understanding social relationships, and is building a causal, explanatory model of how the world works. Then what happens, I think, is that language is used to put names on things, but these are things the child already understands. That's why a child is able to recognise a new animal from a single picture: the child has already formed the notions of animals and mammals, of having legs, eating, and so on. That's how I see it.

Thank you.

I would like to get back a little bit to the biology, and especially to what you said before: that right now we can capture correlations, and the next step is causal relations. Last year you presented a little bit of your work on a more biologically plausible way to train networks, with spike-timing-dependent plasticity, and I would like to know whether you think this might be a way to get more causal, rather than purely correlational, results. And then, related to that, I would like to know whether at Facebook you approach this problem at all, and if yes, how.

So, what I talked about last year is something that has continued in my lab, and we've made a lot of progress; we've put out a few arXiv papers, if you're interested. A lot of our focus has been on just making a biologically plausible version of backprop, because backprop really is, as I said, the workhorse of the successes we've had recently, and it seems crazy that we don't have a reasonable theory of how brains could even do that. But I'm also very interested in a biologically plausible version of some form of unsupervised learning, and the things we're exploring now you can think of as more like Boltzmann machines, energy-based models — except that I'm trying to throw away the energy altogether, because it imposes too many constraints. I'm very optimistic that we will soon have biologically plausible — at least at some superficial level — methods for both supervised and unsupervised learning. That doesn't answer the question about causality, though; that's a totally different bag of tricks, and there is not enough work in that direction even from a purely machine learning point of view.

On the biological aspects: at Facebook, at least, we're not directly doing any research — Yann has his pet projects. But on the causality front we are very, very interested in building causal models. We recently published a paper on doing causal inference in visual scenes: basically, if there's a car in the scene and there's a bridge, trying to answer the question of whether the car is there because of the bridge, or the bridge is there because of the car. These are questions that some of our researchers — Léon Bottou and David Lopez-Paz — are really interested in, and the model they built is called the neural causation coefficient. We think we have some very interesting initial results: we train the model on synthetic distributions so that, given two variables, it tells whether one caused the other — whether A caused B, or B caused A, or neither, or both — and it actually generalises. Trained on these synthetic distributions to do causal analysis, it generalises to doing causal analysis on natural images, using features extracted from a residual-network model trained on ImageNet, for example. You can read more about it in the paper.
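
A rough sketch of the idea behind an NCC-style classifier as described here — embed each observed (x, y) sample, average the embeddings, classify the direction. The sizes and the random "weights" below are placeholders, not the published model:

```python
import numpy as np

# Sketch of a neural-causation-coefficient-style scorer: featurise each pair,
# average over samples, then map to a probability of "x causes y".
rng = np.random.default_rng(0)
W_embed = rng.standard_normal((32, 2))   # pair embedding (hypothetical, untrained)
w_clf = rng.standard_normal(32)          # direction classifier (hypothetical)

def ncc_score(x, y):
    pairs = np.stack([x, y], axis=1)              # (n, 2) observed samples
    h = np.maximum(W_embed @ pairs.T, 0.0)        # embed each pair, ReLU
    phi = h.mean(axis=1)                          # average over samples
    return 1.0 / (1.0 + np.exp(-w_clf @ phi))     # P(x causes y), sketch only

x = rng.standard_normal(500)
y = np.tanh(x) + 0.1 * rng.standard_normal(500)   # toy pair where x -> y
print("score:", ncc_score(x, y))
```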

Thanks. There's a tendency for networks to become more and more complex: they grow deeper and deeper, and new types of layers keep being introduced — dropout layers, new kinds of nonlinearities, layer normalisation, and so on. So it seems that as the machines learn more and more, we humans somehow learn less and less about the model: the more complex the models become, the less insight we get from them. Is there any systematic way of getting to know what the network knows and what it doesn't? Visualising activations works very well in some cases, like visual recognition, where you can just see what the neurons respond to, but is there one general approach to understanding what the network knows and what it doesn't?

There is a systematic recipe: it's called the scientific method. We are in front of a phenomenon — a learning algorithm that is trained — and we can observe what happens while it's training, we can observe what happens after it's trained, and we can try to figure out what's going on, which ingredients matter and why, and build theories around that. That's something the machine learning community doesn't do enough. A lot of the work, unfortunately, is basically tinkering, not really trying to figure out why it's working, or even whether it's working: if you build a system that has ten different ingredients and you beat the benchmark — great, you get a paper — but have we learned something? Fortunately there are people who focus not just on getting better numbers but on trying to understand what is going on, or at least which parts matter most. It's easy, when you get into the engineering mode of building something that works well, to forget about the why, but the why question is really central, and there should be more of it. There is some, though; I think the good papers do also try to deal with the why.

To your initial statement, I would say that, first of all, models are becoming simpler, not more complicated. I won't go into the whole recurrent-nets-with-attention-and-everything scenario, but if you just look at convnets: three or four years ago convnets had way more types of normalisation, many more types of pooling, many types of nonlinearities, and we used second-order methods to optimise them, and so on. Right now all you do is stack up a few layers of three-by-three convolutions with max pooling, and that's about it — pretty much every convnet you use today is built out of this one recipe. Sure, ResNets appeared recently, but come on — that's just one nice identity connection. In terms of how we analyse things: I think, as Yoshua said, the deep learning community as of today does suffer from not applying the scientific method; it's largely about beating whatever benchmark there is — squeeze out another half a percent and you have a new state of the art. I think the community will just mature over time, but until then that's where we are.
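
For concreteness, a minimal PyTorch sketch of the "one recipe" just described — stacks of 3x3 convolutions with ReLU and occasional max pooling. The sizes are arbitrary, and this is an illustration, not any specific published architecture:

```python
import torch
import torch.nn as nn

# One repeated building block: two 3x3 convolutions, ReLU, then max pooling.
def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

net = nn.Sequential(
    conv_block(3, 32),
    conv_block(32, 64),
    conv_block(64, 128),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 10),          # e.g. 10 output classes (hypothetical)
)

x = torch.randn(1, 3, 64, 64)    # one hypothetical RGB image
print(net(x).shape)              # torch.Size([1, 10])
```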

And there has been a whole lot of progress made simply because of the commercial value — you have to recognise that. Yes, Google, and yes, Facebook: they have all these products — this is what I said about inference being really, really vital — and they have millions and millions, if not billions, of customers, and they have to get this right. So yes, a lot of research is concentrated on getting the quickest and the cheapest — in terms of energy, in terms of compute cost, in terms of serving it to the user — and there is a large amount of more fundamental research going on, possibly not as much, and I think that's what we get a bad rap for. There is a lot more commercial reason to put the resources into pushing the benchmarks than into theory, but there are a lot of people working on the theory side of it. I'm obviously biased — check out our chief scientist and his PhD student Song Han's work on pruning — but there's also a whole lot more if you delve into arXiv. If anybody's not familiar with arXiv, you can search through it and put in terms and so on; there is a lot going on. It just tends to get overwhelmed by the massive amount happening on the commercial side.

So maybe one more question on something I think is extremely important, especially as I come from the computer vision side: this phenomenon of tuning, of proposing something in a CVPR paper because it gains a little, is something we see all the time. One of the submitted questions was: how should the value of a new architecture be judged? When should we not simply be happy because it does one percent better — as reviewers of, say, a CVPR paper with a one percent improvement, how should we judge one architecture against another?

I'm involved in a lot of conference organisation, workshops and things like that, and I think that in general too much value is put on raw performance and not enough on having an interesting story — an interesting hypothesis, an interesting theory from which we can learn something that goes beyond this particular task, this particular benchmark — and it's even more interesting if the idea is more novel, more original. The danger, if we don't do that, is that what happened in the speech recognition community for more than a decade could happen to our field: people stopped exploring very much; they just took the existing ideas and tweaked them, made little variations. If we do that, we lose the ability to discover very different approaches. So we need to encourage more exploration — things that are quite different but that might be good solutions because there is an intuition, an appealing hypothesis behind them. That's what I would like to see more of, and I would like to see reviewers put more weight on that originality and on interesting ideas, rather than just on the numbers.

I would ideally want reviewers to put more emphasis on originality than on how you got this number, but the reality is that if you submit a paper to CVPR it will get rejected if you're not state of the art. I think a uniform set of harder benchmarks, if they were established, would be really helpful — if you could judge an architecture by how well it generalises across a large variety of tasks. Generalisation is what we care about when we build any of these models, right? But if you give researchers the choice of how to write the story of generalisation, they will write it exactly to fit their findings: they will say "my model generalises really well to this, this and this data set" — and those three data sets are probably similar in nature anyway. As a field — the computer vision field, for example — there's a lot of good work being done to establish harder and harder benchmarks every year, like the COCO challenge or Visual Question Answering, and that has its pluses. Basically, setting these harder and harder benchmarks is the right way to go, and that would be good.

I suppose I want to use this as a shout-out to you all to really make an effort to go outside the box. Yoshua mentioned what I was talking about earlier: I'm in conversations with people using genetic algorithms, and this particular developer is an extremely skilled, talented guy. He's been working on software development, on chips, on a whole host of different things for about twenty-five years, and on this particular thing, using genetic algorithms, he spent about a decade. He never really pushed it, because he constantly got flak for it — it was just totally different — and he is still hitting barriers with these new ideas. So I suppose the shout-out is: be brave and think outside the box. The programme I've just left, after its first week, really has as its driving force being creative and thinking outside the box. There are lots and lots of people, as was said, just incrementing on top of existing knowledge and making it better, and that's great, but we need people to throw in that evolutionary chance, that mutation, into the population every round, and that will help us get where we're going a lot quicker. So, for example, look into genetic algorithms with neural networks — be bold enough to do that and think outside the box.
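
As a toy illustration of the genetic-algorithm idea mentioned here, applied to hyperparameter search — the population size, mutation scheme and the fitness placeholder are all hypothetical choices, not a recipe:

```python
import random

# A tiny genetic algorithm over a hypothetical hyperparameter space:
# keep the fittest half of the population, mutate survivors to refill it.
SEARCH_SPACE = {"lr": [0.1, 0.01, 0.001], "layers": [2, 4, 8], "units": [64, 128, 256]}

def random_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(cfg):
    # Placeholder: in practice this would train a model and return validation accuracy.
    return -abs(cfg["lr"] - 0.01) - abs(cfg["layers"] - 4) / 10 - abs(cfg["units"] - 128) / 1000

def mutate(cfg):
    child = dict(cfg)
    key = random.choice(list(SEARCH_SPACE))        # mutate one "gene" at random
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

population = [random_config() for _ in range(10)]
for generation in range(5):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                     # keep the fittest half
    population = survivors + [mutate(random.choice(survivors)) for _ in range(5)]

print("best config found:", max(population, key=fitness))
```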

Just one comment, because I mentioned earlier that if we commit prematurely to hardware some structure we have today, the field will have evolved past it. Nevertheless, evolution in nature helps us in the same way: the reason machine learning works the way it does is that there is some structure — we do have a small number of basic mathematical operations that turn out to be the foundation of all of these architectures. So I'd encourage people to think outside the box, but using the boxes that we have.

Thank you for your presentations. In the context of self-driving cars, isn't it frightening to trust behaviours learned by a learning machine? My question is: is there any way to put guarantees around the predictions, beyond very good results on test sets?

You will never get any guarantee from a machine learning system, period. And you don't get any guarantees from humans either, by the way: you judge a human by their behaviour, and it's the same thing with a machine learning system. We can make that test much more extensive, by the way — it's not just that the car passes a twenty-minute driving test; you actually have cars run for thousands and thousands of miles, or whatever it is now. That being said, I wouldn't trust the current systems to drive my car; I think there's a lot that's missing. But I also have a lot of confidence that they are going to get better — substantially better — over the next two years. We have to be empirical; there's no other way.

Hi — maybe a question for the hardware developers. We're currently using open-source operating systems and open-source frameworks, and in the middle there's this black box called CUDA. Is there a chance we'll see NVIDIA moving into a more open position, where we can actually have access to the source code of these things, so that we can also help make them better with pull requests and so on?

That discussion comes up a lot, and I can't speak for the chip teams, and I certainly can't speak for Jensen. But we recently held a machine learning summit — this was in January, at headquarters in Santa Clara — where we invited a whole lot of researchers, and that question came up again, and they did agree to at least try to open up parts of cuDNN — the kernels that we really need, which were being held quite tightly. How much, and how far into the future they would agree to go, I'm afraid I can't say. It is a valid question that we are listening to and that people are considering, but again, I can't speak for those who will make the ultimate decision.

I have to say that where I work we actually do take an approach that is open source and open systems; in my talk on Friday I'll explain that. All our code is open source, including the instruction set — you can program it in assembly if you care to. The decision to make it open was taken for several reasons, one of which is to benefit from the community: as has been said here, there is a lot of development still to happen, so being open — open source — is definitely one way to go.

As an open-source developer who has to deal with closed-source systems from NVIDIA every day — I feel you. One interesting thing here: Windows and Linux used to be somewhat like this as well; all the latest and greatest drivers and technologies were written for Windows, and Linux was treated as secondary, getting things six months or a year later. It only takes a single person writing a single implementation that is on par with the closed system to break the value of the closed system — for example, Scott Gray at Nervana wrote CUDA kernels for convolutions and so on that are as fast as what NVIDIA had, and that forces a lot of openness, driven by the community. So if there's one thing, it's for us as open-source developers to really push companies to the point where they no longer see value in keeping their software closed. From a purely market-driven perspective, that seems to be the only thing that will open things up.

Thanks. The inspiration from biology was brought up a few times. As we know from real brains, the networks there are heavily recurrent, but the very successful applications of deep learning are mostly feedforward. From your point of view, is it a matter of practical difficulty — it just hasn't been tried — or do you think that even if we tried, with current neural networks there might not be a significant advantage to making the connections recurrent?
things I've been working on is using
the recurrent connections as part of
the computation for and I don't mean to
three to process sequences but just to
even process a single example in using
those we contractions both for
computation. And for credit assignment
what backdrop is is doing. Um so that's
precisely the thing I was talking about
oh here the tries to bridge the gap to
between declining and and and your
science that's one of the ingredients.
Um there's this one question we've
which we yeah have absolutely no idea
how brains could handle it is something
like back problem through time in other
words. I think we have some reasonable
ideas of how brains could implement
backdrop it in the sense of a single
static input. And and the recurrence
essentially propagates the equivalent
of gradients. But the the thing we
really have no idea is how to do it you
could go to back up your time where you
have a sequence of states. And now it's
really doesn't sound very logically
possible that you would have to store
the whole sequence which might be your
whole day right. And then somehow you
know playback backwards you know and
computing the greens are already
equivalent that's something it's
totally open question as far as I'm
concerned you when this question
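
The memory issue just described can be seen in a minimal numpy sketch of backprop through time (toy sizes, a simple quadratic loss on the final state): the forward pass has to store every hidden state so the backward pass can revisit them in reverse order:

```python
import numpy as np

# Toy RNN, loss = 0.5 * ||h_T||^2, gradient of Wh computed by BPTT.
rng = np.random.default_rng(0)
T, d = 6, 3
Wh = rng.standard_normal((d, d)) * 0.5
Wx = rng.standard_normal((d, d)) * 0.5
xs = rng.standard_normal((T, d))

# Forward: keep all T hidden states in memory (this is the costly part).
hs = [np.zeros(d)]
for t in range(T):
    hs.append(np.tanh(Wh @ hs[-1] + Wx @ xs[t]))

# Backward: walk the stored states in reverse, accumulating gradients w.r.t. Wh.
dWh = np.zeros_like(Wh)
dh = hs[-1]                                # dL/dh_T for the quadratic loss
for t in reversed(range(T)):
    dpre = dh * (1.0 - hs[t + 1] ** 2)     # backprop through tanh
    dWh += np.outer(dpre, hs[t])
    dh = Wh.T @ dpre                        # pass gradient to the previous state

print("stored states:", len(hs), "gradient norm:", np.linalg.norm(dWh))
```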

We'll take one last question, because we were supposed to finish already and some people have a train to catch. So, thank you.

I have one last question, related to that. Earlier there was the question of which problems can be solved and which not, which is not so easy to answer. My concern is that a lot of this is engineering, or in general just benchmarking, and I really like the attitude of analysing things and finding out what's happening. What I'm searching for — and maybe you have the answer — is a broad analysis of the problems: the properties of the problems, of the data, of the quantity and distribution of the data and so on; and of the possible networks — not only CNNs and autoencoders but also the others, multidimensional recurrent networks and so on — knowing when which one can be applied, and why or why not. Is there any really nice, deep analysis of that, or would you point me to the book, which is supposed to help with exactly that question?

Right — once you understand the building blocks, a lot of these different architectures start making sense, and then you can start thinking about your problem. For example, recently I was looking at how we could use these things to model the three-dimensional structure of proteins. It doesn't really fit the things we normally do, but you can reuse a lot of the ideas that have been around, the algorithms we know, and compose them in new ways — once you have really made sense, in your mind, of what they're doing and why. So no, I can't give you the answer for all these things in one sentence, but that's the kind of understanding you get by reading a book, or by following the literature and trying to make sense of these methods — not just "how can I implement them" but "why". Why do we have bidirectional recurrent nets? There's a good reason; it makes sense. Why do we use attention? There's a good reason — it helps us deal with a particular issue. Once you understand these things, you can be creative and apply them in new settings.
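
As a small illustration of the attention point above, here is a numpy sketch of dot-product attention (arbitrary toy shapes): instead of squeezing a whole input sequence into one fixed-size vector, the model takes a weighted average of the encoder states, with weights that depend on the current query:

```python
import numpy as np

# Dot-product attention over a toy sequence of encoder states.
rng = np.random.default_rng(0)
T, d = 5, 4
encoder_states = rng.standard_normal((T, d))   # one state per input position
query = rng.standard_normal(d)                 # e.g. the current decoder state

scores = encoder_states @ query                # similarity to each position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # softmax over positions
context = weights @ encoder_states             # attended summary vector

print("attention weights:", np.round(weights, 2), "sum:", weights.sum())
```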

And I think it also comes back to what I said earlier — so, again, the question is: what is the potential for having automatic configuration of, say, which types of parallelism to use at which points in the layers, and also which types of layers? I know it needs a lot of hardware, but theoretically, what is involved before we can do that?

There's a lot involved. There are very hard problems in the world of compilers that need to be solved before you can get to automatic parallelisation, and the specific problems are not really specific to compilers — they come from graph theory in general, and they also apply to searching for ideal neural network architectures and things like that. As of today humans do it by hand, but you could probably build some kind of deep Q-network that predicts which parallelism to use; that is not actually implausible at this stage. I wouldn't say we're far from it — I would say it wouldn't even have been a plausible thing to talk about two years ago; now it's plausible. Whether it's possible yet, I don't know.

And to your earlier question: what Yoshua said is actually pretty much spot on — read and understand the textbook; that would be the first step.

I'm not really happy with that answer.

Well, I did write a book.

Yes, of course, and I have read many books — maybe I even have some quite good ones. But typically the people who just use deep learning know their problem, and they don't want to have to dive into all the specifics of neural networks and understand everything in order to finally map all of that onto their problem. They want an analysis of the problem first, and then to find which architecture works — the other way around.

Yeah — I think Soumith was talking about using machine learning and reinforcement learning to try to do these kinds of things, but right now that's a research goal; it's not something we know how to do. And the result is that people who have that expertise are in big demand from industry and can earn a lot of money. So until we can automate that job, which seems pretty far off, I think we're going to have to go the hard way and have engineers learn about the underlying science sufficiently to combine these building blocks together. That being said, I think something positive is happening with the software tools, and with the progress in hardware, which makes it easier for people without a strong mathematical background, or a lot of expertise with neural nets, to tinker. Once you understand a few basic ideas from machine learning and from neural nets, and if you have good software tools that are well organised — with lots of existing examples of different types of architectures, modular architectures — and efficient hardware, you can actually do a lot just by playing with the Lego blocks. And for, say, a five or six hour investment of time, anybody can go through the online courses and get a grasp of it.

But I suppose what I was asking is exactly how each and every application gains from deep learning. Well, I'll be quite happy if we don't solve that one with AI just yet.

Okay — so maybe you can thank the speakers again. Tomorrow Soumith will be presenting Torch, basically, for three hours, starting in the morning — yes, exactly — and then Yoshua will give a talk on deep generative models.



Conference program

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
4 July 2016 · 3:20 p.m.
Day 1 - Questions and Answers
Panel
4 July 2016 · 4:16 p.m.
Torch 1
Soumith Chintala, Facebook
5 July 2016 · 10:02 a.m.
Torch 2
Soumith Chintala, Facebook
5 July 2016 · 11:21 a.m.
Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
5 July 2016 · 1:59 p.m.
Torch 3
Soumith Chintala, Facebook
5 July 2016 · 3:28 p.m.
Day 2 - Questions and Answers
Panel
5 July 2016 · 4:21 p.m.
TensorFlow 1
Mihaela Rosca, Google
6 July 2016 · 10 a.m.
TensorFlow 2
Mihaela Rosca, Google
6 July 2016 · 11:19 a.m.
