Erik Meijer, Facebook

Thursday, 7 June 2018 · 2:29 p.m. · 59m 46s · 181 views


00:00:00

So yeah, at Facebook I run

00:00:05

a group that does three things. One

00:00:08

is programming languages for data science and machine learning; that's what I will talk about today.

00:00:14

Then we do two other things: we're using machine learning to make our developers more productive,

00:00:20

and the third thing we do is use machine learning to make

00:00:23

our systems more efficient. And depending on my speaking speed —

00:00:28

okay, let me just put my

00:00:38

timer there — I might have some time for some extra slides.

00:00:43

So here's how I see myself.

00:00:47

We saw a great presentation this morning about formal methods,

00:00:52

but the grumpy cat in me was thinking: oh, the first

00:00:56

example was from Tony Hoare in 1971, about a search algorithm,

00:01:02

and the second example was from 2018, about a search algorithm.

00:01:08

So is it still the case that after so many decades of programming

00:01:14

language research and formal methods, we still have trouble writing code?

00:01:20

Now, a search method is something really nice: you know exactly what you want to

00:01:25

do. You have some values and you want to find a particular one.

00:01:29

But what about more interesting problems, like the face recognition we all have on our phones,

00:01:36

or the spam filtering that we all rely on? How

00:01:40

would you specify those and prove them correct?

00:01:45

So what I want to talk to you

00:01:46

about today is maybe another way of programming, where

00:01:52

we don't use intensional specifications — intensions in the

00:01:56

sense of a formula that describes what we want —

00:01:59

but where we are really lazy — and

00:02:04

programmers should be lazy. So what if we just

00:02:08

use examples, and use those to write the code for us?

00:02:12

I think that would make this grumpy cat even happier.

00:02:17

Now here's, I think, the root of the problem. The root of the problem

00:02:22

is that the abstractions that we use for programming are the

00:02:26

wrong ones, and we always have this Sisyphus-

00:02:31

like problem: the abstractions that we want and the

00:02:34

abstractions that we have are far apart,

00:02:39

and we keep struggling to move that stone up the hill, and then once we get it

00:02:45

almost to the top of the hill, the

00:02:48

stone rolls back down. Who here uses JavaScript?

00:02:53

Ah, not as many people as I thought. But with JavaScript this

00:02:57

is really bad, right? Every week there's a new JavaScript framework

00:03:01

— you've just mastered one, and then the next one comes, and you have to

00:03:09

relearn it. Now of course, Tony Hoare didn't only have that first example

00:03:14

of formal verification; he was also one of the first to identify this problem,

00:03:20

in 1972, where he described this famous diagram

00:03:24

of abstract data types: there's this function at the top, an f from A to B —

00:03:30

that's the thing you want — but you have to implement it

00:03:33

in a more concrete domain,

00:03:39

from X to Y. How to make that diagram commute — that is really

00:03:44

what computer science is all about, and that is where we spend a lot of our energy.

00:03:51

But as I said, our abstractions are

00:03:55

bad. The way we write code

00:03:59

is in terms of mutable edge-labelled graphs. If you were to inspect the heap

00:04:04

of a Java, Haskell, C#, or JavaScript program,

00:04:08

what does the data look like? It's a mutable edge-labelled graph.

00:04:15

And then we're trying to write functions, but they're

00:04:19

not really functions, right? These are side-effecting

00:04:23

functions that allow arbitrary nesting. So mathematically,

00:04:28

getting a grasp of that structure is really

00:04:31

hard, and I think we shot ourselves in the foot by starting with this.

00:04:38

Basically, the machines that we know how to build are good at one thing and one thing

00:04:44

only: they can move data from one address in memory to another address in memory.

00:04:52

And if you google "one-instruction machine" you will see that this is actually

00:04:58

all you need: one instruction, namely move data from one place to the other,

00:05:03

and you can do all of computing with that, which by itself is pretty remarkable.

00:05:11

But the thing is, that is ultimately relying on mutation,

00:05:18

right? Because we're moving data from this memory location to that memory location, overwriting

00:05:24

the other one. So

00:05:28

you need only this one instruction, and it's easy to emulate more

00:05:32

complicated instructions on top of it. But the essence of computing that

00:05:37

we have — the essence of the von Neumann machine — is

00:05:41

mutation, by moving things from one address to the other,

00:05:45

and that is the thing that we are really fighting.

00:05:50

And here's the thing — you know the princess and the pea?

00:05:55

And the pea, that's the von Neumann pea, and then we can

00:06:00

add layers and layers of abstraction on top of it,

00:06:04

but like the princess, we still feel that darn

00:06:11

pea. Right? No matter how many layers of abstraction we put on top of our underlying machines,

00:06:18

you still feel it. And in some sense you have to feel it — this is called mechanical sympathy. We

00:06:24

cannot pretend and say, oh, we're doing declarative programming, because

00:06:28

ultimately if you want to make your code fast

00:06:31

you have to understand that the thing underneath is a physical machine

00:06:36

and it's a physical machine that can move things from one memory location

00:06:39

to the other. What's even worse is that it's observable that

00:06:44

moving from one memory location to another might be slower

00:06:48

than moving from some other memory location to another

00:06:51

one, because there's a whole cache hierarchy. So anyone that really wants to write efficient code

00:06:58

will have to deal with the underlying concrete mess of the machine, and

00:07:04

no layer of abstraction will help you remove that. Right?

00:07:09

And this is the pain that I've been trying to solve my

00:07:12

whole life, and I must say, maybe I've given up.

00:07:18

I don't think I can solve this pain anymore in the traditional sense.

00:07:24

I don't think adding more layers of abstraction will help; I don't think

00:07:29

adding more formal methods will help. We have to go to a very different form

00:07:34

of computation. Now let's look at another form of computation —

00:07:41

a computational model that has been extremely, extremely successful —

00:07:46

and that's relational databases. I'm pretty sure that maybe not that many people use JavaScript, but a

00:07:53

lot of people use relational databases. What is so beautiful

00:07:57

about relational databases is that they leverage

00:08:02

really beautiful and simple mathematics.

00:08:06

Alright, so relational algebra is very old,

00:08:11

and then Codd in 1970

00:08:14

said, hey, I see all this mathematics — I can take sets, take the

00:08:20

union of sets, all these simple operations — I can build an algebra on top.

00:08:27

And suddenly this thing turns into a multi-billion-dollar industry, and

00:08:31

a lot of our economy runs on it.

00:08:36

But here's the thing: the reason databases are so successful is that

00:08:42

they started — and this is my personal opinion —

00:08:46

with a very clean mathematical abstraction,

00:08:50

and that allows you to build query optimizers and things like that.

00:08:55

Plus, the implementation maps quite closely to the underlying hardware. The

00:09:01

reason is that tables are not complicated — they don't have nested things,

00:09:05

they're just flat collections of base types — and still very powerful.

00:09:12

Now here's the problem, as everyone that has ever

00:09:15

tried to use databases from regular programming languages has noticed:

00:09:20

the data model of a database is very different from these mutable edge-labelled graphs

00:09:27

that our normal imperative programs live in.

00:09:32

Already in the early eighties, David

00:09:37

Maier noticed this

00:09:41

and called it the object-relational impedance mismatch,

00:09:46

and I think this is still an issue —

00:09:49

people still write papers about it,

00:09:53

they write O/R mappers, they design new type

00:09:57

systems to make these things usable. So again, this is another

00:10:01

instance of the problem where there's the abstraction that we want,

00:10:05

which is a relational database, and there's the abstraction that we have —

00:10:10

mutable edge-labelled graphs — and we have to bridge that gap.

00:10:14

But here's the thing: it's the mathematics that you cannot beat, right?

00:10:19

the fact that this is based on really simple beautiful mathematics

00:10:24

is what makes these databases unbeatable. And I know — I have the

00:10:28

scars on my back, the bullet holes in my back,

00:10:31

because at one point, when I was young Erik,

00:10:36

the goal of my life was to eliminate SQL from this planet.

00:10:41

Okay, I wanted to get rid of SQL, and so that's when I did this thing,

00:10:47

when everybody else was doing NoSQL — I don't know if anybody here even remembers NoSQL —

00:10:54

where the goal was to make the categorical dual of SQL; I tried to call it CoSQL,

00:11:01

but it didn't work at all. So now everybody's talking about NewSQL,

00:11:06

and even if you look at things like Spark, which started out with

00:11:10

flatMap and all those great things — now the preferred way to program

00:11:15

Spark is using data frames, in a relational way.

00:11:20

So I've given up on trying to beat SQL and get rid of it,

00:11:27

and I just looked back and asked: what is it that makes it so powerful

00:11:31

and so versatile? And that must be the mathematics — you cannot beat mathematics.

00:11:37

And so, if you cannot beat it, join it.

00:11:42

So let's use mathematics wherever possible.

00:11:48

And of course, you should always look to abstract from the mathematics.

00:11:56

If you look at relational algebra, it's a beautiful mathematical theory, but the question

00:12:03

I always ask myself — and I think everybody here should ask themselves —

00:12:07

is: is that an instance of a more general concept?

00:12:11

Because maybe that more general concept has several instances

00:12:17

that you can use, or maybe it's easier

00:12:21

or more powerful to implement that more general concept. And that is true here:

00:12:27

relational algebra arises as an instance of a more general structure

00:12:32

in category theory. And now, again,

00:12:38

this is where I think the simplicity of relational algebra

00:12:43

won, because when people hear "category theory" they

00:12:47

run around screaming. But I

00:12:50

still think that it's really good, whenever you see something,

00:12:55

to try to abstract and see: is this an instance of something more general?

00:13:00

And I think that is also what Martin has been doing with DOT —

00:13:04

always trying to find the essence. You can always peel more layers

00:13:09

from the onion, and as you peel a layer you start to cry, because that's

00:13:13

what happens when you handle onions — but it's crying for a good cause.

00:13:19

Alright, so that's where we are today. It's a very grim picture:

00:13:28

we are stuck with this weird computational model of imperative computations

00:13:34

over these weird mutable graphs. And what happens is that,

00:13:40

I think because we don't find a way forward — because we're

00:13:44

limited by our actual hardware —

00:13:49

what you see is this explosion of programming languages —

00:13:54

programming languages and frameworks, I should say. For the web

00:13:58

there's JavaScript, with hundreds of frameworks — every week there's a new one.

00:14:04

And if you look at mobile, there are also operating systems and

00:14:10

languages where every day there's a new one.

00:14:14

On the server side — we were talking over the break about Rust,

00:14:19

about how Rust is trying to put lipstick on the imperative

00:14:25

pig — which is great, but that's not bringing

00:14:29

progress, right? We're not fundamentally changing something.

00:14:34

So I think the world of Software 1.0 is doomed.

00:14:40

That's why I like the title of this event, "The Future of Software".

00:14:44

The future of software is not what you see on the screen here; the future of software is something different. We have

00:14:51

tried for fifty, sixty years to make this work. We cannot make it work — I'm convinced we cannot.

00:14:59

So this is where Software 2.0 arises, because after 1.0 comes 2.0.

00:15:06

In Software 2.0 there are no humans involved, no formal specifications involved.

00:15:14

Doesn't that sound great? No humans, no formal

00:15:19

specifications — just data. You just throw data at this thing

00:15:23

and it will train itself, and it will work.

00:15:29

Now let's see if that's just a fairy tale or not, because we

00:15:35

just saw that regular programming is no fairy tale — but maybe this is.

00:15:42

So here's the thing — this is what I'm going to try to convince you about today.

00:15:47

We know our databases: a database is a magic gadget where you give it code —

00:15:56

some people call it a query, but really that's code, right? You give it code, it hands you back data.

00:16:03

Now, the one mathematical trick that I've used my whole career,

00:16:07

and that has served me well, is duality: you flip things around. So can we do the opposite?

00:16:12

Can we build a magic device where we give it data and it gives us back code?

00:16:21

Wouldn't that be amazing? Well, since it's just the dual,

00:16:27

it should work, because if we can do the thing

00:16:31

on the left, we should be able to do the thing on the right, correct? So that's Programming 2.0:

00:16:37

we're going to build this magic device that turns data into code.

00:16:44

Programming 1.0 was humans turning coffee into code;

00:16:50

Programming 2.0 is machines that turn data into

00:16:56

models. Of course people use fancy names, just like "queries", but a

00:16:59

model is just regular code.

00:17:04

And I should also add, it's not just coffee, right?

00:17:08

For all the grad students here it's probably pizza more than coffee.

00:17:12

Now, this was mentioned briefly in the introduction

00:17:18

this morning: there is a big difference between

00:17:21

the old world of code and the new world of models. So maybe it's a good

00:17:26

thing to distinguish between these two terms. Code

00:17:32

is deterministic and discrete: when you have a Boolean, it's

00:17:36

discrete — it has two values, true or false.

00:17:41

Well, that's very restrictive. If you look at your spam

00:17:45

filter, something is spam, you know, but maybe it's

00:17:50

not really spam — it's kind of a little bit spam, or a lot of spam. So it's a Boolean,

00:17:55

but it's true with, like,

00:17:57

eighty-five percent, and false with fifteen percent,

00:18:02

right? Or it can even be some range — it

00:18:08

can be, I don't know,

00:18:15

how good something is — some number between zero and a hundred.

00:18:22

It can be anything in between, and

00:18:27

it has some kind of distribution. So the big difference between code and models is that

00:18:33

models have uncertainty in them, and they're often continuous.

00:18:38

Those are two things that we have really banned from our traditional programming

00:18:43

languages for a long time, and a lot of developers are afraid of uncertainty,

00:18:49

because uncertainty means sloppiness. Now, you see me,

00:18:53

right? I love sloppiness. I mean,

00:18:57

the essence of happiness

00:19:01

is sloppiness — to be free.

00:19:04

And our values want to be free: they don't want to be discrete, they want to take

00:19:09

different values with some probability. But here's the best thing:

00:19:16

there's a lot of really ancient math from the

00:19:20

sixteen, seventeen, and eighteen hundreds that

00:19:24

we can use to make this all work, and that makes me really excited, right? Because remember:

00:19:29

you cannot beat math. So we're going to use this old math —

00:19:35

calculus, which was invented in the sixteen hundreds, and Bayes' rule,

00:19:40

from seventeen-sixty-something — we're going to use that to build the new wave of programming.

00:19:48

So I'm going to start with supervised learning, and here's what

00:19:53

we want to do with supervised learning. We want to use,

00:19:56

again, mathematics — simple and old mathematics. I don't want to invent

00:20:02

new mathematics like we had to do for computer science.

00:20:06

The old mathematicians — I don't know what they had in the

00:20:10

water, or maybe it was the fungus in their grain,

00:20:14

but they were much smarter than us these days, right? So I want to use

00:20:19

ancient mathematics, and I want to use that to

00:20:25

train those functions by looking at the difference between the training data

00:20:30

and the value that the function computes, and then use that delta —

00:20:33

try to minimise it, and do that by

00:20:37

updating — mutating — the function. So I'm

00:20:42

still doing mutation, right? I'm still doing imperative stuff,

00:20:47

but I'm only doing that when I'm training the function, when I'm shaping the function. Think of the function as

00:20:52

being made of clay or something — I'm moulding the function. But once it's done,

00:20:59

it's a pure function. So I've also nicely separated the imperative

00:21:04

part and the

00:21:09

purely functional part. Now let's do a very, very simple example, because I want to show you that this is really

00:21:17

trivial math — everybody here has done this

00:21:21

math in kindergarten. Okay, so it's much easier than pre- and postconditions

00:21:26

and separation logic and Hoare triples and whatever complicated mathematics

00:21:32

that we had to invent — all these artificial mathematics. No:

00:21:38

for machine learning we can use really simple math. So let's look at

00:21:42

linear regression. Probably in physics class you've done some measurements,

00:21:47

like speed versus distance or whatever, and you get a whole bunch of points, and what

00:21:53

you have to do is find a line through those points.

00:21:58

So this function there is defined by y = ax + b;

00:22:05

what I need to find is a and b, because those are the two values that determine

00:22:10

that line. You see that red stuff there? That's the error, because these points do not lie exactly on the line.

00:22:18

So what this magic device is going to do: you feed it these points

00:22:24

and it will tell you a and b, and now you have learned this function.

00:22:29

Isn't that great? No programming required — or maybe only once, to build that magic device.
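Not from the slides, but for this specific line-fitting case the "magic device" can be sketched in a few lines with the classic closed-form least-squares formulas (a hypothetical `fit_line` helper, assuming exact arithmetic on clean data):

```python
# A minimal sketch of the "magic device" for the line case: given sample
# points (x, y), ordinary least squares returns the a and b of y = a*x + b
# that minimise the summed squared error.

def fit_line(points):
    """Return (a, b) minimising sum((y - (a*x + b))**2) over the points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Points generated from y = 2x + 1 are recovered exactly.
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
```

You feed it points, it hands you back a and b — data in, code out.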

00:22:36

Some people say this is just curve fitting. I'm happy with that —

00:22:42

I'll take curve fitting any day of the week

00:22:45

over proving theorems or fighting with the type system. Just give me curve fitting.

00:22:53

So, the trick that this thing uses is something

00:22:59

called backpropagation, and backpropagation relies on taking derivatives.

00:23:05

But before that, let's look a little bit at what these functions look like.

00:23:11

In order to do backpropagation, our functions need to be differentiable,

00:23:18

and in order for a function to be differentiable it has to

00:23:21

be a function that takes real numbers and returns real numbers.

00:23:25

I don't know if you remember from calculus — the function has to

00:23:28

be continuous; epsilon-delta — you might remember those definitions.

00:23:34

So all these functions that we write are functions that take a

00:23:39

tuple of real numbers and return a tuple of real numbers,

00:23:44

and each function is defined as a composition of smaller functions.

00:23:50

Now, via the chain rule we know that when

00:23:56

two functions are differentiable, their composition is also differentiable, so that works out nicely.

00:24:01

And then you can define the error function — there are many ways to do that.

00:24:07

But notice that here, just like in the case of relational databases,

00:24:13

we do have a beautiful mathematical theory, but the functions here are functions that take

00:24:20

tuples of real numbers, whereas our imperative code takes graphs.

00:24:28

So also for machine learning we have this abstraction problem, where we have to find

00:24:33

encoders and decoders that go from our domain objects into

00:24:38

the values that the machine learning model operates on.

00:24:42

I'm pretty sure the PhD students here that do machine learning

00:24:46

know how hard it is to find efficient encodings of, you

00:24:52

know, say, trees into these tuples of real numbers. It is

00:24:57

an unsolved problem: how do you efficiently encode values such that you can feed them into a

00:25:03

machine learning thing? But anyway, what we want to do is: given this training data,

00:25:09

find the a and b that minimise the error, and the error is defined as this sum

00:25:15

of the actual value minus the value

00:25:19

of the function applied to your training data.

00:25:26

This thing uses the derivatives of this stuff.

00:25:31

You don't have to understand this — I'm just writing it down. The way I do these derivatives, by the way:

00:25:36

I go to Wolfram Alpha, I type in the function, and Wolfram Alpha tells me the derivative. So even there you can

00:25:42

be very lazy. Then, in order to train this, you start with random a and b,

00:25:50

and then for each value in

00:25:53

your training set you just

00:25:58

update the weights a and b using this formula, and

00:26:02

you just repeat until you run out of training data.

00:26:06

Isn't that beautiful? So I'm using mutation: I just run this thing,

00:26:11

I mutate these values a and b until I find them,

00:26:15

and then I give them to you, and from then on you have a pure function that just computes

00:26:19

y from x based on these a and b.
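The training loop just described can be sketched directly (the learning rate and epoch count here are illustrative assumptions, not values from the talk):

```python
# A sketch of the loop: start with random a and b, then for each training
# pair nudge a and b against the gradient of the squared error
# (a*x + b - y)**2. Mutation happens only while training; the result
# (a, b) then defines the pure function y = a*x + b.
import random

def train(data, lr=0.01, epochs=200):
    a, b = random.random(), random.random()
    for _ in range(epochs):
        for x, y in data:
            err = (a * x + b) - y     # signed error on this example
            a -= lr * 2 * err * x     # d(err**2)/da = 2*err*x
            b -= lr * 2 * err         # d(err**2)/db = 2*err
    return a, b

# Clean data from y = 2x + 1; the loop converges to a ~ 2, b ~ 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
a, b = train(data)
```

Once `train` returns, the imperative part is over and `lambda x: a * x + b` is the learned pure function.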

00:26:24

Okay, so now you might ask yourself: Erik, you hand-waved over the derivatives —

00:26:29

how does this all work in a little bit more detail? So let me give you

00:26:34

some details here, which I think are very beautiful.

00:26:40

So why do we use derivatives? Well, as I said, we want to minimise this error function,

00:26:45

and maybe you remember from high school that you can use derivatives to

00:26:50

find the minimum of a function. So if you have this quadratic function here

00:26:54

and you want to find its minimum, you look where the derivative is zero, right? That's a good way to find it.

00:27:01

And here's the derivative — you learn

00:27:05

this with symbolic differentiation, or you can define the derivative

00:27:09

using this limit where epsilon goes to zero. You should all

00:27:15

remember that from high school, correct? Now the question is how to find that derivative,

00:27:21

and how to find it in an efficient way. And this is truly

00:27:27

stunning — the way you do this is amazing.
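Before the efficient trick, the limit definition just mentioned can already be checked numerically — take a small epsilon instead of the limit (the function `3*x**2 + 4` is the one used as the running example in this talk):

```python
# The limit definition of the derivative, approximated numerically:
# for a small eps, (f(x + eps) - f(x)) / eps approaches f'(x).

def numeric_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x)) / eps

f = lambda x: 3 * x**2 + 4   # the talk's running example; f'(x) = 6x
d = numeric_derivative(f, 2.0)   # close to 6 * 2 = 12
```

This works but is slow and imprecise when you have millions of parameters, which is why the trick that follows matters.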

00:27:31

And how do we do that? By using more math, of course.

00:27:37

Okay, and this is the trick. People here remember complex numbers?

00:27:45

You use complex numbers if you do physics or electronics or computer graphics.

00:27:51

So a complex number is some mathematical

00:27:56

thingy that somebody invented: a + bi. So

00:27:59

it's a pair of two numbers, a and b, where

00:28:05

i squared equals minus one.

00:28:08

You can ask yourself, why is i squared minus one? Well,

00:28:13

why not, right? So now let's ask another question: what if we

00:28:19

do the same trick — we define a pair of numbers a and b,

00:28:25

but we're not calling the second one i, we're calling it epsilon,

00:28:31

and then we say epsilon squared equals zero. Why not, right? I mean,

00:28:39

we can define our own rules. You can also ask yourself:

00:28:43

what if we do the same thing where instead of i

00:28:46

we take j, and we say j squared equals one?

00:28:50

So the choices for the square are minus one, zero, or one, and all three choices,

00:28:56

strangely enough, have practical value — those last ones are called hyperbolic numbers;

00:29:02

you can look them up, and people find uses for them.

00:29:05

But this is the amazing thing: we take complex numbers, which we've known for —

00:29:08

for how long? — and instead of i squared equals minus one we say

00:29:14

epsilon squared equals zero. And what do we get from this? We get

00:29:20

automatic differentiation: by just using dual numbers instead of normal numbers,

00:29:26

we can compute, in one go, a function and its derivative.

00:29:32

It's just mind-blowing. Now, if you want to prove this, it uses Taylor expansion —

00:29:39

but I think if you go look

00:29:42

up Taylor expansion, that is maybe even more mind-blowing.

00:29:45

I don't know how somebody can come up with the Taylor expansion

00:29:48

formula — it's just crazy, with

00:29:53

factorials in some infinite summation. But anyway, if you want to prove this,

00:29:58

you can do it with Taylor expansion. So let's look at an example.

00:30:01

Right, so here's our function f, defined as

00:30:06

three x squared plus four — that's our f.

00:30:12

Let's feed it something — and again, the reason I like mathematics is my brain is the size of a peanut,

00:30:18

and I like to just crank the handle, right? I just like to do these

00:30:21

computations without thinking. So let's feed it a + b·epsilon:

00:30:28

with x = a + b·epsilon, f becomes three (a + b·epsilon) squared plus four —

00:30:34

right, just substitute. Now I'm going to

00:30:38

evaluate (a + b·epsilon) squared: (a

00:30:42

+ b·epsilon) squared is a squared plus 2ab·epsilon plus b squared epsilon squared — that's what you see there.

00:30:47

um no i distribute that three over does some and and a simplified little bit

00:30:55

and now we knew that um the derivative of have was um

00:31:01

and what was it it's a six a and a right or six x.

00:31:08

but here you see the the first term here three a squared plus four that's half of a

00:31:13

six eight that's the derivative of ever apply to eight times be upset on

00:31:18

and then hey because epsilon squared was zero that last year falls out

00:31:24

so if i apply this function to a dual number and i just compute

00:31:29

with it i get back a pair of the function apply to the

00:31:34

argument and the derivative and multiplied by the rest of the thing

00:31:40

isn't that pure magic i mean you won't believe me right

00:31:43

but this works for any function any differential function

00:31:48

and you know this is how you get now

00:31:50

implement uh and computing derivatives and using

00:31:56

operator overloading because we just overload all the operators would this be these dual numbers

00:32:04

Now, what's even more beautiful

00:32:09

is that if I define a function that lifts a normal function to a function over dual numbers,

00:32:15

that lifting function is a functor. Now what does that mean?

00:32:19

A functor is something that distributes over composition and

00:32:23

preserves identities — it's like a crazy thing from category theory, but if something is a functor it means

00:32:30

it's good, right? So this thing is good. And if you remember the chain rule:

00:32:37

the chain rule for differentiation is really ugly, but the chain rule for

00:32:41

these dual numbers is really beautiful, because it's just distribution over composition.

00:32:47

So these dual numbers are my new best friends.
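That "chain rule for free" can be seen concretely. In this sketch a dual number is a plain (value, derivative) pair, and composing two lifted functions automatically accumulates f'(g(a)) · g'(a) in the derivative slot (the functions g and f here are my own illustrative picks):

```python
# Dual-number arithmetic on (value, derivative) pairs: composing lifted
# functions distributes the derivative via the chain rule automatically.

def d_mul(u, v):            # product rule baked into multiplication
    return (u[0] * v[0], u[0] * v[1] + u[1] * v[0])

def d_add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def g(x):                   # g(x) = x*x,   g'(x) = 2x
    return d_mul(x, x)

def f(x):                   # f(x) = x + 1, f'(x) = 1
    return d_add(x, (1, 0))

# (f . g)(x) = x^2 + 1, derivative 2x; at x = 3 that is (10, 6) —
# no explicit chain rule anywhere, just function composition.
val, deriv = f(g((3.0, 1.0)))
```

The chain rule never appears in the code; it falls out of plain composition, which is exactly the functor property.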

00:32:55

So we can build this magic machine that takes data and turns it into code:

00:33:01

we give this thing training data, we separate the output and the input, we pass the input

00:33:07

to this composition of differentiable functions,

00:33:12

we have the parameters to this function, a and b,

00:33:16

then we take the

00:33:22

derivative of the error function, and based on that we update

00:33:27

the parameters, and we just repeat that for a long time. So

00:33:31

really the only thing we need here is functions that are differentiable,

00:33:36

and we know how to do differentiation using dual numbers.

00:33:41

Beautiful, easy — no quantifiers, no first-order logic, just simple high-

00:33:48

school mathematics. That's the kind of stuff that I like.

00:33:52

and now what about neural networks well i don't i think neural

00:33:57

networks are a little bit got like the name is hyped

00:34:01

because it's got supposedly go like modelled on whatever your owns a

00:34:06

we have in our brain but i think the really

00:34:09

it in the neural network the function this composition of differential functions is

00:34:14

of the special form or even activation function and they you do take

00:34:18

the product of the some of the weights with the input

00:34:23

and really why people do this i think is because it involves a lot of multiplication zen additions

00:34:29

and we know in our graphics cards we got got like to that

00:34:33

very efficiently so i think the whole reason that deep learning

00:34:38

and neural nets is is popular or or that we pick these functions

00:34:43

it's because of should g. p. use for me has nothing to do we

00:34:46

don't have like matrix multiplication in our school and but it works
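The "special form" just described, a weighted sum followed by an activation, can be sketched directly. This is a generic illustration, not code from the talk; plain Python lists here, but on a GPU the same thing is one matrix multiply.

```python
import math

# One layer of the special form above: for each output unit, take the sum
# of the products of its weights with the inputs, add a bias, and apply an
# activation function (sigmoid here).
def layer(weights, biases, inputs):
    outputs = []
    for row, bias in zip(weights, biases):
        s = sum(w * x for w, x in zip(row, inputs)) + bias  # weighted sum
        outputs.append(1.0 / (1.0 + math.exp(-s)))          # activation
    return outputs

# A layer with one output unit and two inputs.
out = layer([[0.0, 0.0]], [0.0], [1.0, 2.0])
```

Since every operation is a multiply, an add, or a pointwise activation, the whole layer is differentiable, so it slots straight into the training machine above.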

00:34:53

Alright, so what is differentiable programming? Well, differentiable programming is

00:34:59

anything where you write the program using differentiable functions, where these differentiable functions

00:35:04

don't have to be the layers of a neural net

00:35:07

but can be more complicated. One of the presentations, the posters, used tree

00:35:13

LSTMs. I don't know who that was; is that person still in the room?

00:35:17

Yes. So that is a form of a more complicated differentiable

00:35:22

function. Good, right? So this is the guy, you should talk to him.

00:35:27

He is the future: he's doing differentiable programming, and he's doing it over more complicated structures

00:35:32

than just arrays of doubles; he's doing it over trees.

00:35:40

So, what's the future of programming? Number one: use differentiable programming to define models.

00:35:47

Okay, good. And now, as I said, always ask yourself:

00:35:54

can you abstract, can you make this better? Well,

00:35:58

people these days use Python a lot, and

00:36:01

Python is untyped, and it's strings and whatever,

00:36:06

so I think even for the people that want to stay in the old world

00:36:09

of programming languages, there is still a lot of stuff to do here:

00:36:12

you make sure that we don't get stuck in the world of Python and

00:36:17

that world. And the other thing I should say is that really,

00:36:24

this world of programming 2.0, these are all functional programs,

00:36:28

and so people like Martin here have been telling everybody, like, his whole life, you know, functional

00:36:34

programming, the future is functional programming. Well, functional programming has won,

00:36:41

except it's called differentiable programming. Alright. And so there's a lot of papers that

00:36:48

you can get; if you have the slide you can use this URL.

00:36:52

There's even this one, this is, you know, from ex-EPFL,

00:36:55

Tiark I think it is, and it is all done in Scala. Alright,

00:37:03

so I think I pushed a little bit under the carpet that these models only approximate the function.

00:37:11

But of course this is true for all functions, right? I said that in the

00:37:15

beginning: even if we write in Haskell a function or something,

00:37:19

it's not a real mathematical function, it's something that has side effects. And these models

00:37:25

are also functions that have side effects,

00:37:30

except that this side effect is represented by a probability distribution.

00:37:38

So, of course, now we see monads, right?

00:37:43

Well, you can't have a talk about the future of programming without monads,

00:37:47

and so this probability distribution thing happens to be a monad,

00:37:52

or happens to be, I don't know, I think this is what, you know, whoever intended it to be, right?

00:37:57

If it's good then it's a monad, and before it can be a monad it has to be a functor.

00:38:04

So what is the probability monad? Think of it as a little database of values

00:38:10

of type A, where each of these values has a probability associated with it.

00:38:16

Okay, so that's my intuition for a probability distribution: it's like a database,

00:38:22

but each row of the database has some value, some weight, with it.

00:38:27

And then, to have a monad, you always have to have

00:38:30

a bind, but statisticians call that conditional distributions. So
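The "little database" intuition can be sketched concretely. This is an illustrative sketch, not code from the talk: a distribution is a dict from values to weights, and bind takes a conditional distribution (a function from a value to a distribution) and flattens it, multiplying the probabilities along each path.

```python
# A minimal probability monad as a little database: dict value -> weight.
def ret(a):
    # The monadic return: a point distribution.
    return {a: 1.0}

def bind(dist, cond):
    # Bind: for each row (a, p) and each row (b, q) of cond(a),
    # accumulate p*q for b. This is exactly a conditional distribution.
    out = {}
    for a, p in dist.items():
        for b, q in cond(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

# Example: flip a fair coin; on heads switch to a biased coin, on tails
# flip the fair one again.
fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.9, "T": 0.1}
second = bind(fair, lambda side: biased if side == "H" else fair)
```

Here `second["H"]` is 0.5·0.9 + 0.5·0.5 = 0.7: the database rows distribute through the conditional and the weights multiply, which is the bind the next paragraphs call P(B|A).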

00:38:37

this weird pair of brackets, B bar A, means the probability of B given A.

00:38:45

But that's exactly a function, right? A function from

00:38:48

A to B, you can say that's a B given an A.

00:38:53

So a conditional distribution, don't get fooled, is just a function

00:38:57

that returns a probability distribution. And this thing here

00:39:03

is the most beautiful theorem, I think, ever invented.

00:39:07

And this is from seventeen sixty-three. This

00:39:10

magician Bayes was the world's first functional programmer; he must have been an alien,

00:39:17

because this theorem tells you how to take a function, there on the top,

00:39:22

so it's a function from A to a probability distribution of B,

00:39:27

and turn that into a function that, given a B, returns a probability distribution of A.

00:39:33

So Bayes' theorem tells you how to invert a monadic function.

00:39:39

And at that time nobody knew monads, nobody knew functional programming, but there you have it:

00:39:45

he came up with this remarkable theorem that shows how you can invert a monadic function.

00:39:52

And a lot of inference is based on Bayes' rule, where you can invert this function. It's amazing. So,

00:39:59

with all of this, with probability distributions, you

00:40:04

can do a lot of statistics using Bayes' rule.

00:40:07

And as I said, a probability distribution is a little bit like a database, so let's look at

00:40:12

it with a simple query language. Let's look at a probability distribution over playing cards.

00:40:18

Right, so I can say: what is the probability of a card where the rank is queen?

00:40:26

You see there's a predicate in there, and this gives me that the

00:40:30

value of that thing is, I get, four out of fifty-four.

00:40:34

So think of this thing, really, I said this P is a database, and I

00:40:38

stick in a query, and that gives me the answer, this value.

00:40:43

Or I can say: give me the probability of the fact that

00:40:47

this thing is a heart, given that the rank is queen.

00:40:53

Well, there's four queens, there's one queen of hearts, so this is one out of four.

00:40:59

And then I can say: give me the probability that something is a queen and

00:41:04

a heart, and there's only one out of the fifty-four.
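The three card queries above can be run over a distribution-as-database. This is an illustrative sketch; it assumes a 54-card deck with two jokers, to match the fifty-four in the talk.

```python
from fractions import Fraction

# The distribution as a database: one row per card, each with weight 1/54.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
rows = [((r, s), Fraction(1, 54)) for r in ranks for s in suits]
rows += [(("joker", i), Fraction(1, 54)) for i in (1, 2)]

def prob(rows, pred):
    # P(pred): sum the weights of the matching rows.
    return sum(p for card, p in rows if pred(card))

def given(rows, pred):
    # Conditioning: keep the matching rows and renormalise their weights.
    total = prob(rows, pred)
    return [(card, p / total) for card, p in rows if pred(card)]

is_queen = lambda c: c[0] == "Q"
is_heart = lambda c: c[1] == "hearts"
```

With these two operators, the three queries are `prob(rows, is_queen)` (4/54), `prob(given(rows, is_queen), is_heart)` (1/4), and `prob(rows, lambda c: is_queen(c) and is_heart(c))` (1/54), exactly the numbers in the talk.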

00:41:08

So really, don't get intimidated by these probability distributions; think of these as databases

00:41:14

with a very simple query language and you will be fine. Now,

00:41:18

I think a lot of the literature on statistics is

00:41:21

overly complicated; if you see it in a computer science way

00:41:26

it's really simple. Here's another example.

00:41:32

But I want to close off by showing you how we can use

00:41:39

machine-learned models to write code. So there's a famous quote by Jeff Dean from Google that

00:41:44

says: if Google were written from scratch today, half of it would be learned.

00:41:50

What that means is that half of it would be done with these beautiful machine-learned algorithms,

00:41:56

but still half would be that old pesky imperative code.

00:42:02

So, well, I've not solved all your problems yet; half of

00:42:07

your code will still be imperative code. So I'm giving here an example how that will work.

00:42:12

Say I want to build this killer app. I heard that the app

00:42:17

stores make, like, whatever, a hundred billion dollars of revenue; I want to get in there.

00:42:21

So I'm going to write an app that will do the following: you take a snap,

00:42:27

a picture of an animal, and it will tell you whether this animal is pettable.

00:42:32

Okay, whether it can be petted. And of course if you have a meat-eating plant, I don't know if that's an animal, but

00:42:39

it looks vicious actually, I wouldn't pet it; or if you have a crocodile, I wouldn't pet it.

00:42:45

But if you have, like, a chicklet, you can pet it. Okay, so this is the

00:42:48

app that we're going to build; this is going to make me filthy rich.

00:42:53

And the average, uh, the average wealth here will be a hundred million dollars; this

00:42:59

app will do that for me, for us I should say, because our average goes up.

00:43:06

And so the first thing I do is I train a neural net to take in the picture, and it

00:43:12

will give me a description of the animal. So I feed in a picture of a pig and it will say pig

00:43:19

with probability point eight, Erik Meijer with probability

00:43:23

point two, or something like that. And

00:43:27

so that's the animal detector. Now I need to have a text analyser, which is

00:43:31

a different model, to which I give a description of the animal,

00:43:37

and from that it will learn, or it will understand, because it can

00:43:41

do natural language processing, whether this animal is safe to pet.

00:43:45

And then of course I need to have a way where I can get from the description of an animal

00:43:50

to the text. And how do I do that? Well, we're using a web service call

00:43:55

to Wikipedia. But that web service call will return a future,

00:44:01

right, because it's a call over the web. So now we have a program

00:44:05

that requires probability distributions and futures. So, in order to glue these together:

00:44:13

I take a picture, I pass it to my CNN, and I get the probability distribution of animal.

00:44:19

I want to push that animal into the Wikipedia web service call, but oops, the types don't match;

00:44:25

I get a type error there. And then I get a future of text that I want to put in this other,

00:44:31

the text, um, natural language thing, but again it is a future and I need text.

00:44:39

So how do I do this? How can I write code that kind of combines

00:44:44

machine-learned models with web service calls, and turn those into apps?

00:44:51

So for that we need a second thing, and that's called

00:44:54

probabilistic programming. So probabilistic programming is old-fashioned

00:44:59

imperative programming, except where we add probability distributions as

00:45:05

first-class effects, first-class values.

00:45:08

So maybe some of you know that there's a Scala library for

00:45:12

that, but JavaScript has support for that, Dart has support for that,

00:45:15

C# has support for that, for async-await: if you have a future you can await the future.

00:45:21

So a probabilistic programming language is nothing else than one where you can sample from a distribution;

00:45:26

it takes these probability distributions and turns them into first-class effects,

00:45:32

which means that we can now write code that does this.

00:45:37

So here is what this code looks like. And so what this thing returns

00:45:42

is a future of a probability distribution, and then

00:45:48

we will still have to kind of collapse that, but that's a separate problem.

00:45:52

And so what we can do is we can say: given the picture, we feed it into

00:45:58

the neural net, we get back a probability distribution, we

00:46:02

sample from the distribution, which gives us an animal,

00:46:05

we pass that into Wikipedia, we await that value to get the value from the future,

00:46:12

and then we take that text, pass it into the other model, which gives the probability distribution, and then that

00:46:17

tells us whether this thing is pettable. And the implementation of this,

00:46:26

of the sample and things, what it does is it uses Bayes' rule under the covers

00:46:32

and it multiplies the probabilities. So basically it runs this

00:46:35

computation several times; of course it tries to

00:46:39

cache as much as possible, including this await: if you're going to look up the same

00:46:46

animal twice in Wikipedia, it only does it once. But basically it runs, like, a simulation of this program.
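The enumerate-and-multiply execution just described can be sketched as follows. Everything here is a hypothetical stand-in (the detector, the "web call", the analyser are all made up); instead of sampling many runs, the sketch enumerates every path and multiplies the probabilities along it, with the web call memoised so the same animal is looked up only once.

```python
import functools

# Hypothetical animal detector: picture -> distribution over animals.
detector = {"pig": 0.8, "erik": 0.2}

@functools.lru_cache(maxsize=None)   # look each animal up only once
def lookup(animal):
    # Stands in for the web service call returning a description.
    return {"pig": "a friendly farm animal",
            "erik": "a human, do not pet"}[animal]

# Hypothetical text analyser: description -> distribution over verdicts.
analyser = lambda text: ({"pettable": 0.9, "not": 0.1}
                         if "friendly" in text else
                         {"pettable": 0.1, "not": 0.9})

def run(detector):
    # Enumerate every path through the pipeline, multiplying the
    # probabilities along it, instead of sampling runs one by one.
    out = {}
    for animal, p in detector.items():
        for verdict, q in analyser(lookup(animal)).items():
            out[verdict] = out.get(verdict, 0.0) + p * q
    return out

verdict = run(detector)
```

Here `verdict["pettable"]` is 0.8·0.9 + 0.2·0.1 = 0.74, the same number a sampling interpreter would converge to by running the program many times.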

00:46:53

And at Facebook we built this thing on top of PHP, which is, you might think I'm maybe

00:47:01

crazy, but again, that's what we do. But I think there was a very old paper from many years ago

00:47:07

where there was a probabilistic programming language, also in Scala; it's everywhere. And I think this is what we need

00:47:14

to have as the second thing for the future. So the first thing was: use machine learning to define models,

00:47:21

and then, in order to compose these models into actual

00:47:23

code, we need probabilistic programming. Okay, so

00:47:30

the programming of the future is neural nets plus probabilities.

00:47:37

And, I don't know, how much time do I have? Ah, I'm over time, but I want to show you one more

00:47:45

small thing, because I'm addicted to duality. So I want to show you

00:47:50

one more application of duality, another way

00:47:54

to show how deeply, mathematically, intimately connected

00:47:59

probabilistic programming and neural nets are.

00:48:02

So in these neural nets, what we're trying to do is minimise the loss function,

00:48:08

right? We were using the derivative to find the minimum of this

00:48:11

loss function and using that to update our weights.

00:48:17

What if, instead of minimising a loss function, we try to maximise a probability?

00:48:23

Okay, minimising something, maximising the other: this is another example of

00:48:27

duality; sure, let's use that here. And as I said,

00:48:31

you can look at a probability distribution as a little bit of a small database, so we

00:48:36

can write queries. And this is another way to write this linear regression,

00:48:44

but not using machine learning, not using training and updating

00:48:48

weights, but using probability distributions. So what we do

00:48:52

is we guess a and b. So we take a and b from

00:48:55

some distribution, usually a Gaussian, but I just guess them at random.

00:49:00

Then I define this function f(x) to be a·x plus b.

00:49:05

Then we have to model the noise, the error, the red things that were in the graph.

00:49:11

And then I just jam in the training data, and I try to find

00:49:18

all the a's and b's such that this thing is minimised.

00:49:25

And then I select... actually, this is a function that returns a probability distribution of functions,

00:49:31

and if you google for Bayesian linear regression, this is what you find.
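The guess-and-weight recipe just described can be sketched as plain importance sampling. This is an illustrative sketch with made-up data and priors, not the program on the slide: guess a and b from Gaussians, weight each guess by how well f(x) = a·x + b explains the data under Gaussian noise, and treat the weighted guesses as a distribution over functions.

```python
import math
import random

random.seed(0)
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]  # roughly y = 2x + 1

def likelihood(a, b, noise=0.5):
    # How well f(x) = a*x + b explains the data: product of Gaussian
    # densities of the residuals (unnormalised).
    return math.prod(
        math.exp(-((a * x + b - y) ** 2) / (2 * noise ** 2))
        for x, y in data)

# Guess a and b at random from a broad Gaussian prior, then weight.
samples = [(random.gauss(0, 3), random.gauss(0, 3)) for _ in range(20000)]
weights = [likelihood(a, b) for a, b in samples]
total = sum(weights)

# Posterior mean of the slope a over the weighted guesses.
mean_a = sum(a * w for (a, _), w in zip(samples, weights)) / total
```

Instead of one point estimate, the weighted samples are a whole distribution over (a, b), i.e. over functions, which is the mind-blowing bit noted just below.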

00:49:35

But this is also kind of really nice, right? I have a function

00:49:39

that returns a probability distribution of functions; that's pretty mind-blowing.

00:49:45

And now, you might not believe me that this is actual executable code. If you

00:49:52

go on the web, there's a language called WebPPL,

00:49:56

and if you write this program here, which looks very much like this program,

00:50:01

and you run it, it will tell you that the value for

00:50:06

b, or a, is, like, eight, but with this distribution, where

00:50:11

the machine-learned program would just give you one value here.

00:50:14

It's like: I'm uncertain about it, so it's this distribution.

00:50:19

Alright. Now, I think we saw a couple of posters about the challenges of machine learning.

00:50:26

So there are things like: given the model, you might be

00:50:29

able to extract the original training data, which might leak information,

00:50:34

and there are many unsolved problems.

00:50:38

But this is to be expected, right? For the normal functional programs, for imperative programs,

00:50:46

we have had fifty years to solve a lot of these problems; for these machine-learned things

00:50:55

we still have a lot of other problems. So it's like: how do we build

00:50:58

them, how do we train them, how do you put a model under version control,

00:51:03

or really, should you put the data into version control? It's a whole

00:51:08

bunch of things: how do you debug these models, how do you understand what they're doing?

00:51:12

And so I tried to paint a kind of rosy picture, but

00:51:17

this is, like, you know, a full employment theorem for

00:51:20

us computer scientists, because all those problems are unsolved. So

00:51:24

that's why this little person here is lying on the floor crying.

00:51:30

So, my last slide: the singularity is near. Soon,

00:51:38

computers will be able to program themselves without our help,

00:51:42

but I hope that this snake won't bite itself and kill itself before that happens, and we get the second

00:51:48

AI winter. But so far I'm optimistic, and I'm really

00:51:56

looking forward to the future where we just pump data into this thing and our code comes out.

00:52:01

And that would be awesome, because then we as humans can spend more time, you know,

00:52:08

drinking beer or hanging out, making music, whatever, and the machines do the work. Thank you so much.


So, thank you for a very interesting talk.

00:52:44

I was interested in this question of percentages: what

00:52:48

part of the software infrastructure might be learned through

00:52:51

training. So where are we now? Could you give some sort of intuition for what

00:52:58

are, in concrete applications, what can we expect to train, and what

00:53:03

software functionality do we expect to have to write manually? Yes, so that's a good question.

00:53:10

So if you look at, for example, your mobile phone, right: when you use the camera,

00:53:15

it shows the faces in the crowd, in the picture, as you shoot your photo.

00:53:22

So that is kind of an example

00:53:25

where, you know, that little square box that recognises the face

00:53:31

is learned, but it's embedded into a traditional UI. That kind of shows that.

00:53:37

And other examples are, think of things like, you know,

00:53:43

location: I was at a talk at Lyft.

00:53:47

So the thing is, when you have Lyft or Uber

00:53:50

and you ask for a ride,

00:53:54

it will tell you the estimated time of arrival. Now, of course,

00:53:59

if the estimated time of arrival says five minutes and it takes ten minutes, the customer is very unhappy.

00:54:05

And so that's an example where they are

00:54:08

using a model to predict

00:54:12

the estimated time of arrival, but again it's embedded into kind

00:54:17

of a regular Android or iOS app. So

00:54:20

I would say, well, I'm not sure that we're at fifty-fifty

00:54:23

yet, but I think, you know, this is quickly increasing.

00:54:38

So, I also want to ask something

00:54:40

about probabilistic programs versus deterministic programs.

00:54:44

It is my understanding that a deterministic program always gives the same outputs,

00:54:51

but it isn't exactly like that here, is it? Because in a probabilistic

00:54:55

program the outcome is more like a sample,

00:55:00

and, you know, sometimes the outcome may not at

00:55:04

all match what you expect; it's not exactly

00:55:10

deterministic in the same simple way. So in the

00:55:14

probabilistic case the outcome of the meaning is often different, so is

00:55:18

there a gap? Or am I thinking about this too much, because it is

00:55:24

just, you know, that the probabilistic outcome should all be based

00:55:28

on your data, and it may or may not happen? Okay, so,

00:55:34

let me go back to this one here. So here's an example

00:55:41

where, when you compute

00:55:43

the parameters of this model using probabilistic programming,

00:55:48

it will give you this distribution. So it will tell you the value of, say, b

00:55:52

in a·x plus b is eight, but

00:55:57

you're not a hundred percent certain.

00:56:00

Right? Whereas in the other case, where we used deep learning, you're trying to,

00:56:04

as you said, minimise the error, so you take the thing with the maximum probability.

00:56:09

So this is what people often do: they pick the value with the highest

00:56:13

probability if they want to go from a probabilistic model to a deterministic model. But

00:56:19

what you can see is, this is a very simple distribution, but you might have distributions that

00:56:23

have maybe two values with the same probability, and then how do you pick those?

00:56:29

But I think it's not that much different than with any monad, where you have to have an unsafe operation to

00:56:35

get out of the monad. So if you're in the IO monad or the state monad,

00:56:40

you have to have an unsafe way to get out, and to really go out of the

00:56:44

probability monad is unsafe. That would be my intuition for that, I think.
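The "unsafe way out" just described is often a MAP (maximum a posteriori) pick. This is a tiny illustrative sketch, not code from the talk: collapse a distribution to its single most probable value, which is lossy and, as noted in the answer, ambiguous when two values tie.

```python
# The unsafe exit from the probability monad: pick the value with the
# highest probability (MAP). Lossy: the uncertainty is thrown away.
def unsafe_map(dist):
    return max(dist.items(), key=lambda kv: kv[1])[0]

# A hypothetical posterior over the parameter from the slide.
posterior = {7.9: 0.2, 8.0: 0.5, 8.1: 0.3}
best = unsafe_map(posterior)  # collapses the distribution to one value
```

This mirrors `unsafePerformIO` for the IO monad: a deliberate, clearly-marked escape hatch rather than a principled monadic operation.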


00:56:58

Okay, we have a question right over there.

00:57:05

Yeah, I have a question about... at the present moment we have some concern

00:57:12

about not understanding why the models are making the decisions they are making. And so in this approach

00:57:17

that you're suggesting, or even for the regular model-building process, I wonder,

00:57:23

is that going to help, or actually hurt, us trying to understand what it's doing?

00:57:28

Oh, excellent question. So, again, I just want to point here, this little person:

00:57:34

there are a lot of tears, and he's in a lot of agony. So, yeah,

00:57:41

understanding what happens, why these things happen,

00:57:46

is kind of a big open problem, I think. And

00:57:51

now people are trying to say, like, oh, maybe we should use a different model, decision trees or something, because they are

00:57:57

kind of explainable. I'm not sure that that's the case; decision

00:58:01

trees are giant, long nested conditionals. And

00:58:07

there are other ways that people are trying to do this, by letting you look inside at

00:58:13

what happens in between the layers, and so on. But yeah, this is a big open problem, and again,

00:58:19

this is what I mean by this, right: I hope this thing doesn't bite itself

00:58:24

and accidentally kill itself because of these problems. But on the other hand,

00:58:33

and maybe this is like a non-answer, but if we build a really complicated distributed system, we also

00:58:40

really have a hard time understanding how it works,

00:58:45

or debugging it, or understanding it. And so in that sense I

00:58:50

think it's a little bit... I'm

00:58:53

not trying to blow you off, but I do think that for sufficiently complex imperative code

00:59:00

you will also have a hard time understanding exactly what it

00:59:04

does. If I ask you to explain to me the Linux kernel,

00:59:09

that will be a hard thing; or this bug they discovered, that Spectre: you will

00:59:13

have a hard time doing that, rather than just reading through it.

