Erik Meijer, Facebook

Thursday, 7 June 2018 · 2:29 p.m. · 59m 46s · 120 views


So, at Facebook I run a group that does three things. One is programming languages for data science and machine learning — that's what I will talk about today. Then we do two other things: we're using machine learning to make our developers more productive, and the third thing we do is use machine learning to make our systems more efficient. And depending on my speaking speed — let me just put my timer there — I might have some time for some extra slides.

So here's how I see myself. We saw a great presentation this morning about formal methods, but the grumpy cat in me was thinking: the first example was from Tony Hoare in 1971, about a search algorithm, and the second example was from 2018, about a search algorithm. So is it still the case that, after so many decades of programming language research and formal methods, we still have trouble writing code? And a search method is something really nice: you know exactly what you want to do — you have some values and you want to find a particular one. But what about more interesting problems, like the face recognition that we all have on our phones, or the spam filtering that we all rely on? How would you specify those and prove them correct?

So what I want to talk to you about today is maybe another way of programming, where we don't use intensional specifications — intensions in the sense of a formula that describes what we want — but where we are really lazy, in a second-order way. Programmers should be lazy, so what if we just use examples, and use those to write the code for us? I think that would make this grumpy cat even happier.

Now here's what I think is the root of the problem: the abstractions that we use for programming are the wrong ones. We always have this Sisyphus problem, where the abstractions that we want and the abstractions that we have are far apart, and we keep struggling to push that stone up the hill, and once we get close to the top, the stone rolls back down. Who here uses JavaScript? Ah, not as many people as I thought — but with JavaScript this is really bad: every week there's a new JavaScript framework; you've just mastered one, and then the next one comes, and you have to relearn everything.

Now of course Tony Hoare didn't only give that first example of formal verification; he was also one of the earliest to identify this problem, in 1972, when he described his famous diagram for abstract data types. There's a function at the top, f from A to B — that's the thing you want — but you have to implement it in a more concrete domain, from X to Y. How to make that diagram commute is really what computer science is all about, and that is where we spend a lot of our energy.

But as I said, our abstractions are bad. The way we write code is in terms of mutable edge-labelled graphs: if you inspect the heap of a Java or C# or JavaScript program, the data it represents is a mutable edge-labelled graph. And then we try to write functions over it — but they're not really functions, right? They're side-effecting procedures that allow arbitrary nesting, so mathematically, getting a grasp of that structure is not easy at all. I think we shot ourselves in the foot by starting with this.

Basically, the machines that we know how to build are good at one thing and one thing only: they can move data around, from one address in memory to another address in memory. If you google "one instruction machine" you will see that this is actually all you need — one instruction, namely move data from one place to the other — and you can do all of computing with that, which by itself is pretty remarkable. But the thing to see is that this is ultimately relying on mutation, because we're moving data from this memory location to that memory location, overwriting what was there. You need only this one instruction, and it's easy to emulate more complicated instructions on top of it, but the essence of computing — the essence of the von Neumann machine — is mutation, moving things from one address to the other. And that is the thing we are really fighting.

And here's the thing — you know the story of the princess and the pea. The pea is that von Neumann machine, and we can add layers and layers of abstraction on top of it, but like the princess, we still feel that darn pea. No matter how many layers of abstraction we put on top of our underlying machines, you still feel it. And in some sense you have to feel it — this is called mechanical sympathy. We cannot pretend to say "oh, we're doing declarative programming," because ultimately, if you want to make your code fast, you have to understand that the thing underneath is a physical machine — a physical machine that moves things from one memory location to the other. What's even worse is that it's observable: moving from one memory location to another might be slower than moving from some other memory location to another, because there's a whole cache hierarchy. So anyone who really wants to write efficient code will have to deal with the underlying concrete mess of the machine, and no layer of abstraction will remove that.

This is the pain that I've been trying to solve my whole life, and I must say, maybe I've given up. I don't think I can solve this pain anymore in the traditional sense. I don't think adding more layers of abstraction will help, I don't think adding more formal methods will help; we have to go to a very different form of computation. Now let's look at another computational model, one that has been extremely, extremely successful: relational databases. I'm pretty sure that maybe not that many people here use JavaScript, but a lot of people use relational databases. What's so beautiful about relational databases is that they leverage really beautiful and simple mathematics. The relational algebra is very old, and Codd in 1970 said: hey, I see all this mathematics — I can intersect sets, take the union of sets, such simple operations — and I can build an algebra on top of it. And suddenly this thing turned into a multi-billion-dollar industry, and a lot of our economy runs on it. But here's the thing — and this is my personal opinion — the reason databases are so successful is that they started with a very clean mathematical abstraction. That allows you to build query optimizers and things like that. Plus, the implementation maps quite closely to the underlying hardware, and the reason is that tables are not complicated: they don't have nesting, they're just flat things of base types — and still very powerful.

Now here's the problem, as everyone who has ever tried to use databases from regular programming languages has noticed: the data model of a database is very different from the mutable edge-labelled graphs that our normal imperative programs live in. Already in the early eighties, David Maier noticed this; it's called the object-relational impedance mismatch. People still write papers about it, they write O/R mappers, they design type systems, to try to make it all usable. So again this is another instance of the same problem: there's the abstraction that we want, which is the relational database, and there's the abstraction that we have, mutable edge-labelled graphs, and we have to bridge that gap.

But here's the thing: it's the mathematics that you cannot beat. The fact that this is based on really simple, beautiful mathematics is what makes these databases unbeatable. And I know — I have the scars on my back, the bullet holes in my back — because at one point, when I was young Erik, the goal of my life was to eliminate SQL from this planet. I wanted to get rid of SQL, and that's when I did my thing, and everybody else was doing NoSQL — I don't know if anybody here even remembers NoSQL. Then I showed that the goal was really to make the categorical dual of SQL; I tried to call it coSQL. That didn't work at all, and now everybody's talking about NewSQL. Even if you look at things like Spark, which started out with flatMap and all those great things: now the preferred way to program Spark is using data frames, in a relational way. So I've given up on trying to beat SQL and get rid of it. I just looked back and asked: what is it that makes it so powerful and so versatile? It must be the mathematics — you cannot beat mathematics. And if you cannot beat them, join them. So let's use mathematics wherever possible.

And of course you should always try to abstract from the mathematics. If you look at the relational algebra, it's a beautiful mathematical theory, but the question I always ask myself — and I think everybody here should ask themselves — is: is this an instance of a more general concept? Because maybe that more general concept has several instances that you can use, or maybe it's easier or more powerful to implement the more general concept. And that is true here: the relational algebra is an instance of a Kleisli category, in category theory. Now, again, this is where I think the simplicity of the relational algebra won, because when people hear "category theory" they run away screaming. But I still think it's really good, whenever you see something, to try to abstract and see whether it is an instance of something more general. And I think that is also what Martin has been doing with Dotty: always trying to find the essence. You can always peel more layers from the onion, and as you peel a layer you start to cry — because that's what happens when you peel onions — but it's crying for a good cause.

All right, so that's where we are today, and it's a very grim picture. We are stuck with this weird computational model of imperative computations over these weird mutable graphs. And what happens — I think because we don't find a way forward, because we're limited by our actual hardware — is that you see this explosion of programming languages, and frameworks I should say. For the web there's JavaScript with hundreds of frameworks; every week there's a new one. If you look at mobile, there are also various operating systems and languages, and every day there's a new one. On the server side — we were talking over the break about Rust — well, Rust is trying to put lipstick on the imperative pig, which is great, but that does not bring progress, right? We're not fundamentally changing anything. So I think the world of software — Software 1.0 — is doomed. That's why I like the title of this talk, the future of software: the future of software is not what you see on the screen here; the future of software is something different. We have tried for fifty, sixty years to make this work. We cannot make it work — I'm convinced we cannot.

So this is where Software 2.0 arises, because after 1.0 comes 2.0. In Software 2.0 there are no humans involved and no formal specifications involved. Doesn't that sound great? No humans, no formal specifications, just data — you just throw data at this thing, and it will train itself, and it will work. Now let's see whether that is just a fairy tale or not, because we just saw that regular programming is no fairy tale, but maybe this is.

So here's the thing — this is what I'm going to try to convince you of today. We know our databases: a database is a magic gadget where you give it code — some people call it a query, but really that's code — and it hands you back data. Now, the one mathematical trick that I have used my whole career, and that has served me well, is duality, where you flip things around. So can we do the opposite? Can we build a magic device where we give it data and it gives us back code? Wouldn't that be amazing? And since it's just the dual, it should work: if we can do the thing on the left, we should be able to do the thing on the right, correct? So that's Programming 2.0. We're going to build this magic device that turns data into code. Programming 1.0 was humans turning coffee into code; Programming 2.0 is machines turning data into models. Of course people use fancy names like "models," but a model is just regular code. And I should add, it's not just coffee — the grad students here probably turn pizza into code more than coffee.

Now, this was mentioned briefly in the introduction this morning: there is a big difference between the old world of code and the new world of models, so maybe it's a good idea to distinguish between these two terms. Code is deterministic and discrete. When you have a boolean, it's discrete: it has two values, true or false. Well, that's very restrictive. If you look at your spam filter: either something is spam or it's not — but maybe it's not really spam, it's kind of a little bit spam, or a lot of spam. So it's a boolean, but it's true with, you know, eighty-five percent and false with fifteen percent. Or it can even be some range — it can be, I don't know, how good something is, some number between zero and a hundred — it can be anything in between, and it has some kind of distribution. So the big difference between code and models is that models have uncertainty in them, and they're often continuous.

Those are two things that we have banned from our traditional programming languages for a long time, and a lot of developers are afraid of uncertainty, because uncertainty means sloppiness. Now, you see me, right? I love sloppiness. I mean, the essence of happiness is sloppiness — to be free. And our values want to be free: they don't want to be discrete, they want to take different values with some probability. But here's the best thing: there's a lot of really ancient math, from the sixteen, seventeen and eighteen hundreds, that we can use to make this all work, and that makes me really excited — because remember, you cannot beat math. So we're going to use this old math — the calculus that was invented in the sixteen hundreds, and Bayes' rule from seventeen-sixty-something — to build the new wave of programming.

So I'm going to start with supervised learning, and here's what we want to do. We want to use, again, simple and old mathematics — I don't want to invent new mathematics like we had to for computer science. The old mathematicians — I don't know what they had in the water, or maybe it was the fungus in their grain, but they were much smarter than us these days. So I want to use ancient mathematics to learn these functions, by looking at the difference between the training data and the value that the function computes, and then trying to minimize that delta by updating — mutating — the function. So I'm still doing mutation, I'm still doing imperative stuff, but I'm only doing it while I'm training the function, while I'm shaping it. Think of the function as being made of clay: I'm molding the function, but once it's done, it's a pure function. So I've also nicely separated the imperative part and the purely functional part. I know this is a very, very simple example, but I want to show you that this is really trivial math — everybody here has done this math since kindergarten. That's much easier than pre- and postconditions and separation logic and Hoare triples and weakest preconditions and whatever complicated mathematics — we had to invent all that artificial mathematics, but for machine learning we can use really simple math.

So next, let's look at linear regression. Probably in physics class you did some measurements — speed versus distance or whatever — and you get a whole bunch of points, and what you have to do is find a line through those points. So if the function there is defined by y = ax + b, what I need to find is a and b, because those are the two values that determine that line. You see that red stuff there? That's the error, because the points do not lie exactly on the line. So what this magic device is going to do is: you feed it these points, and it will tell you a and b, and now you have learned this function. Isn't that great? No programming required — or maybe only once, to build the magic device. And some people say, "you know, this is just curve fitting." I'm happy with that. I'll take curve fitting any day of the week over, you know, proving theorems or fighting with the type system. Just give me curve fitting.

The trick this thing uses is something called back propagation, and back propagation relies on taking derivatives. But before that, let's look a little bit at what these functions look like. In order to do back propagation, our functions need to be differentiable, and in order for a function to be differentiable, it has to be a function that takes real numbers and returns real numbers. I don't know if you remember from calculus: the functions have to be continuous, epsilon-delta — you might remember those definitions. So all these functions that we write are functions that take a tuple of real numbers and return a tuple of real numbers, and such a function is defined as a composition of smaller functions. Via the chain rule we know that if two functions are differentiable, their composition is also differentiable, so that works out nicely. And then you can define the error function — there are many ways to do that.

But notice that here, just as in the case of relational databases, we do have a beautiful mathematical theory, yet the functions are functions that take tuples of real numbers, whereas our imperative code takes graphs. So for machine learning, too, we have this abstraction problem, where we have to find encoders and decoders that go from our domain objects into the values that the machine learning algorithm operates on. I'm pretty sure the PhD students here who do machine learning know how hard it is to find efficient encodings of, say, trees into these tuples of real numbers. It is an unsolved problem how to efficiently encode values such that you can feed them into machine learning.

But anyway, what we want to do, given this training data, is find the a and b that minimize the error, and the error is defined as this sum of the actual value minus the value of the function applied to your training data. This thing uses the derivatives of that — you don't even have to understand it; I'm just writing down the derivatives. The way I do these derivatives: I go to Wolfram Alpha, I type in the function, and Wolfram Alpha tells me the derivative — so even there you can be very lazy. Then, in order to train this, you start with a and b random, and for each value in your training set you update the weights — a and b — using this formula, and you just repeat until you run out of training data. Isn't that beautiful? I'm using mutation: I just run this thing, mutating these values a and b until I find them, and then I give them to you, and from then on you have a pure function that just computes y from x based on these a and b.
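The training loop he describes — start with random a and b, then for each training pair nudge the parameters against the derivative of the squared error — can be sketched in a few lines. This is a hypothetical minimal version; the function name, learning rate, and epoch count are my own choices, not from the talk:

```python
import random

def train(data, lr=0.01, epochs=1000):
    """Fit y = a*x + b by stochastic gradient descent on the squared error."""
    a, b = random.random(), random.random()   # start with random parameters
    for _ in range(epochs):
        for x, y in data:
            err = (a * x + b) - y             # delta: prediction minus label
            # derivatives of err**2 w.r.t. a and b, via the chain rule
            a -= lr * 2 * err * x
            b -= lr * 2 * err
    return a, b        # from here on, x -> a*x + b is a pure function

# points sampled (noise-free) from the line y = 2x + 1
a, b = train([(0, 1), (1, 3), (2, 5), (3, 7)])
```

After training, `a` and `b` are close to 2 and 1, and the learned function is just ordinary pure code.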

Okay, so now you might ask: Erik, you hand-waved your way through this — how does this all work in a little bit more detail? So here are some details, which I think are very beautiful. Why do we use derivatives? Well, as I said, we want to minimize this error function, and maybe you remember from high school that you can use derivatives to find the minimum of a function: if you have this parabola here and you want to find its minimum, you look where the derivative is zero — that's a good way to find it. And here is the derivative. You learned this with symbolic differentiation, or you can define the derivative using this limit where epsilon goes to zero — you should remember that from high school, correct? Now the question is: how do you find that derivative, and how do you find it in an efficient way? This is truly stunning — the way you do this is amazing — and how do we do it? By using more math, of course.

Okay, and this is the trick. Who here remembers complex numbers? You need complex numbers if you do physics or electronics or computer graphics. A complex number is some mathematical thingy that somebody invented: a + bi. It's a pair of two numbers, a and b, where i squared equals minus one. You can ask yourself: why is i squared minus one? Well, why not? So now let's ask another question: what if we did the same trick — we define a pair of numbers a and b, but we don't call the second one i, we call it epsilon, and we say epsilon squared equals zero. Why not, right? We can define our own rules. You can also ask: what about doing the same thing where instead of i or epsilon we take j, and we say j squared equals one? The only good choices turn out to be minus one, zero, and one, and strangely enough all three choices have practical value. The last ones are called hyperbolic numbers; you can look it up — people have found uses for them.

But this is the amazing thing: we take complex numbers, which we have known forever, and instead of i squared equals minus one, we say epsilon squared equals zero. And what do we get from this? For free, we get automatic differentiation. Just by using dual numbers instead of normal numbers, we can compute, in one go, a function and its derivative. It's just mind-blowing. Now, if you want to prove this, it uses Taylor expansion — and I think if you look up the Taylor expansion formula, that may blow your mind even more. I don't know how somebody could come up with the Taylor expansion formula; it's just crazy, with factorials in some infinite summation. But anyway, if you want to prove this, you can use it. So let's look at an example.

So here's our function f, defined as f(x) = 3x² + 4. Let's feed it a dual number — and again, the reason I like mathematics is that my brain is the size of a peanut and I like to just crank the handle, to do these computations without thinking. So let's feed it a + bε. With x = a + bε, f becomes 3(a + bε)² + 4 — just substitute. Now I evaluate: (a + bε)² is a² + 2abε + b²ε². Then I distribute the 3 over the sum and simplify a little. Now, we knew that the derivative of f was 6x, so 6a at a. And here you see it: the first term, 3a² + 4, that's f of a; and then 6a — the derivative of f applied to a — times bε; and because epsilon squared is zero, that last term falls out. So if I apply this function to a dual number and just compute with it, I get back a pair: the function applied to the argument, and the derivative multiplied by the rest. Isn't that pure magic? You won't believe me, right? But this works for any differentiable function. And this is how you can implement computing derivatives: using operator overloading, because we just overload all the operators for these dual numbers.
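The operator-overloading idea can be sketched as a tiny dual-number class. Only `+` and `*` are overloaded here — enough for the talk's example f(x) = 3x² + 4 — and the class name and layout are my own illustration, not code from the talk:

```python
class Dual:
    """a + b*eps with eps**2 == 0: carries a value and its derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 eps)(a2 + b2 eps) = a1*a2 + (a1*b2 + b1*a2) eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def f(x):                 # f(x) = 3x^2 + 4, the example from the talk
    return 3 * x * x + 4

r = f(Dual(5, 1))          # feed a + 1*eps to get f(a) and f'(a) in one go
# r.a == 79 (= 3*25 + 4) and r.b == 30 (= 6*5, the derivative at 5)
```

Feeding `Dual(a, 1)` through unmodified arithmetic yields both f(a) and f′(a) — exactly the "one go" the talk describes.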

And what's even more beautiful is that if I define the function that lifts a normal function to a function over dual numbers, that lifting is a functor. Now what does that mean? A functor is something that distributes over composition and preserves identities — it sounds like a crazy thing from category theory, but if something is a functor, it means it's good, it's well-behaved. And if you remember the chain rule: the chain rule for differentiation is really ugly, but the chain rule for these dual numbers is really beautiful, because it's just distribution over composition. So these dual numbers are my new best friends.

So we can build this magic machine that takes data in and turns it into code. We give this thing training data; we separate the output and the input; we pass the input to this composition of differentiable functions; we have the parameters of this function, a and b; then we compute the derivative of the error function, and based on that we update the parameters, and we just repeat that for a long time. So really, the only thing we need here is functions that are differentiable, and we know how to do differentiation using dual numbers. Beautiful, easy. No quantifiers, no first-order logic, just simple high-school mathematics — that's the kind of stuff that I like.

Now, what about neural networks? Well, I think with neural networks the name is a bit hyped, because it's supposedly modelled on whatever neurons we have in our brain. But really, in a neural network, the function — this composition of differentiable functions — is of a special form: there's an activation function, and you take the sum of the products of the weights with the inputs. And really, why people do this, I think, is because it involves a lot of multiplications and additions, and we know our graphics cards can do that very efficiently. So I think the whole reason that deep learning and neural nets are popular — that we pick these particular functions — is because of GPUs. For me it has nothing to do with the brain; we don't have matrix multiplication in our skulls. But it works.
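That special form — an activation function applied to the weighted sum of the inputs — might look like this for a single neuron. The sigmoid is just one common activation choice, not one the talk commits to, and the function name is my own:

```python
import math

def neuron(weights, bias, xs):
    """One neural-net unit: activation(sum of weights*inputs + bias)."""
    s = sum(w * x for w, x in zip(weights, xs)) + bias   # the multiply-adds GPUs love
    return 1 / (1 + math.exp(-s))                        # sigmoid activation

y = neuron([0.5, -0.25], 0.0, [2.0, 4.0])  # weighted sum is 0.5*2 - 0.25*4 = 0
# sigmoid(0) == 0.5
```

Since everything here is built from additions, multiplications, and a smooth activation, the whole thing stays differentiable — which is what the training machinery needs.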

All right, so what is differentiable programming? Differentiable programming is anything where you write the program using differentiable functions, where these differentiable functions don't have to be the dense layers of a neural net but can be more complicated. One of the poster presentations used tree LSTMs — I don't know who that was; is that person still in the room? Yes — so that is a form of more complicated differentiable function. This is the guy you should talk to: he is the future, he's doing differentiable programming, and he's doing it over more complicated structures than just arrays of doubles — he's doing it over trees.

So, the future of programming, number one: use differentiable programming to define models. And now, as I said, always ask yourself: can you abstract, can you make this better? People these days use Python, and a lot of that is untyped, strings and whatever. So I think even for the people who want to stay in the old world of programming languages there is still a lot of work to do here, to make sure that we don't get stuck in the world of Python. And the other thing I should say is that this world of Programming 2.0 is really all about functional programming. People like Martin have been telling everybody, their whole lives: functional programming is the future. Well, functional programming won — except it's called differentiable programming. And there are a lot of papers; if you have the slides you can follow the references. There's even work from EPFL, I think, all done in Scala.

All right, so I swept a little bit under the carpet that these models only approximate the function. But of course this is true for all our functions — I said that at the beginning: even when we write a function in a regular language, it's not a real mathematical function, it's something that has side effects. These models are also functions that have side effects, except that this side effect is represented by a probability distribution. So we saw functors, and now we see monads — well, you cannot have a talk about the future of programming without monads. And this probability distribution thing happens to be a monad — or happens to be, I don't know, I think this is what whoever designed it intended it to be: if it's good, then it's a monad. And before it can be a monad, it has to be a functor.

so what is the probability monad? think of it as a little database of values of type a, where each of these values has a probability associated with it

okay, so that's my intuition for a probability distribution: it's like a database, but each row of the database has some value, some weight, associated with it
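
the database intuition above can be sketched in a few lines of Python. this is a minimal sketch; the `Dist` class and its methods are my own naming, not something from the talk's slides:

```python
# A probability distribution as a little "database": rows are values,
# each row carries a weight (its probability).
from collections import defaultdict

class Dist:
    def __init__(self, rows):
        # rows: list of (value, weight) pairs; merge duplicates, then normalize
        acc = defaultdict(float)
        for value, w in rows:
            acc[value] += w
        total = sum(acc.values())
        self.rows = {v: w / total for v, w in acc.items()}

    def prob(self, predicate):
        # "query the database": total weight of the rows matching the predicate
        return sum(p for v, p in self.rows.items() if predicate(v))

coin = Dist([("heads", 0.5), ("tails", 0.5)])
print(coin.prob(lambda v: v == "heads"))  # 0.5
```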

and then, since we have a monad, you have to have a bind, and in statistics bind is called a conditional distribution

so this P with a bar in brackets, P(b|a), means the probability of b given a

but that's exactly a function, right? a function from a to b: you can say that's a b given an a

so a conditional distribution, don't get fooled, is just a function that returns a probability distribution. and this thing here is the most beautiful theorem i think ever invented

this is from seventeen sixty-three. this magician Bayes was the world's first functional programmer; he must have been an alien

because this theorem tells you how to take a function, there on the top, a function from a to a probability distribution of b, and turn it into a function that, given a b, returns a probability distribution of a

so Bayes' theorem tells you how to invert a monadic function

and at the time nobody knew monads, nobody knew functional programming, but nonetheless he came up with this remarkable theorem that shows how you can invert a monadic function

and a lot of inference is based on Bayes' rule, where you can run this function backwards; it's amazing
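
the "inverting a monadic function" view of Bayes' rule can be sketched like this, representing a distribution as a plain dict from value to probability. the `invert` helper and the spam/ham numbers are illustrative, not from the talk:

```python
# Bayes' rule: given a prior over a and a conditional f : a -> Dist b,
# produce the flipped conditional b -> Dist a.
def invert(prior, f):
    # prior: dict a -> P(a); f(a): dict b -> P(b|a)
    def posterior(b):
        joint = {a: pa * f(a).get(b, 0.0) for a, pa in prior.items()}  # P(a, b)
        evidence = sum(joint.values())                                  # P(b)
        return {a: p / evidence for a, p in joint.items()}              # P(a|b)
    return posterior

# toy example: is a message spam, given that it contains the word "free"?
prior = {"spam": 0.3, "ham": 0.7}
likelihood = lambda a: ({"free": 0.8, "no-free": 0.2} if a == "spam"
                        else {"free": 0.1, "no-free": 0.9})
print(invert(prior, likelihood)("free"))  # P(spam | "free") ≈ 0.774
```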

and with all of this, with probability distributions, you can do a lot of statistics using Bayes' rule

and as i said, a probability distribution is a little bit like a database, so let's see what a simple query language over it looks like. let's look at a probability distribution over playing cards

right, so i can ask: what is the probability of a card where the rank is queen?

you see there's a predicate in there, and this gives me that the value of that thing is four out of fifty-four

so think of this thing really as a database: i say the distribution is a database, i stick in a query, and it gives me this value as the answer

or i can ask for the probability that the card is a heart given that the rank is queen

well, there are four queens and there's one queen of hearts, so this is one out of four

and then i can ask for the probability that something is a queen and a heart, and there's only one of those out of the fifty-four

so really, don't get intimidated by these probability distributions; think of them as databases with a very simple query language and you will be fine
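
the card queries above can be run literally against a 54-card deck (52 cards plus two jokers, which matches the 4/54 and 1/54 in the talk); the helper names here are my own:

```python
# A distribution over playing cards, queried like a tiny database.
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for r, s in product(ranks, suits)] + [("joker", None)] * 2

def P(pred, cards):
    # probability of a predicate under the uniform distribution over cards
    return sum(1 for c in cards if pred(c)) / len(cards)

def P_given(pred, cond, cards):
    # conditional probability: restrict the "database", then query it
    sub = [c for c in cards if cond(c)]
    return sum(1 for c in sub if pred(c)) / len(sub)

print(P(lambda c: c[0] == "Q", deck))                                    # 4/54
print(P_given(lambda c: c[1] == "hearts", lambda c: c[0] == "Q", deck))  # 1/4
print(P(lambda c: c[0] == "Q" and c[1] == "hearts", deck))               # 1/54
```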

now, i think a lot of the literature on statistics is overly complicated; if you see it in a computer science way it is really simple. here is another example

but now i want to close off by showing you how we can use machine-learned models to write code. there's a famous quote by Jeff Dean from Google that says if Google were written from scratch today, half of it would be learned

what that means is that half of it would be done with these beautiful machine-learned algorithms, but still half would be that whole pesky imperative code

so i've not solved all your problems yet; half of your code will still be imperative code. i'm giving here an example of how that will work

say i want to build this killer app. i heard that the app stores make whatever, a hundred billion dollars of revenue; i want to get in there

so i'm going to write an app that will do the following: you take a snap, a picture of an animal, and it will tell you whether this animal is pettable

okay, whether it can be petted. and of course if you have a meat-eating plant, i don't know if that's an animal, but it looks vicious actually, i wouldn't pet it; or if you have a crocodile, i wouldn't pet it

but if you have a chick, you can pet it. okay, so this is the app that we're going to build, and it is going to make me filthy rich

there will be a hundred million dollars that this app will make for me... for us, i should say, because our average goes up

so the first thing i do is i train a neural net that takes in the picture and gives me a description of the animal. i feed in a picture of a pig and it will say pig with probability 0.8, Erik Meijer with probability 0.2, or something like that

so that's the animal detector. now i need to have a text analyzer, which is a different model, to which i give a description of the animal

and from that it will learn, or it will understand, because it can do natural language processing, whether this animal is safe to pet

and then of course i need a way to get from the description of an animal to the text, and how do i do that? well, we use a web service call to Wikipedia. but that web service call will return a future

right, because it's a call over the web. so now we have a program that requires probability distributions and futures, and we need to glue those together

i take a picture, i pass it to my neural net, and i get a probability distribution of animals

i want to push that animal into the Wikipedia web service call, but oops, the types don't match; i get a type error there. and then i get a future of text that i want to put into this other model, the natural language thing, but again it is a future and i need text

so how do i do this? how can i write code that combines machine-learned models with web service calls and turns those into apps?

for that we need a second thing, and that's called probabilistic programming. probabilistic programming is old-fashioned imperative programming, except we have probability distributions as first-class effects, first-class values

maybe some of you know that there's a Scala library for that; and JavaScript has support for this, Dart has support for this, C# has support for this, for async/await: if you have a future, you can await the future

a probabilistic programming language is nothing else but a language where you can sample from a distribution

it takes these probability distributions and turns them into first-class effects, which means that we can now write code that does this

so here is what this code looks like. what this thing returns is a future of a probability distribution, and then we will still have to collapse that, but that's a separate problem

so what we can do is: given the picture, we feed it into the neural net and we get back a probability distribution; we sample from the distribution, which gives us an animal

we pipe that into Wikipedia, and we await that value to get the value out of the future

and then we take that text and feed it into the other model, which gives us the probability distribution that tells us whether the thing is pettable. and the implementation of this, of sample and friends, uses Bayes' rule under the covers

and it multiplies the probabilities, so basically it runs this computation several times. of course it tries to cache as much as possible, including this await: if you're going to look up the same animal twice in Wikipedia, it only does it once. but basically it runs, say, a simulation of this program
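
the pipeline just described can be sketched in Python with `asyncio`, where `sample` collapses a distribution to a value and `await` collapses a future. every model and web call here is a hard-coded stand-in, not a real API:

```python
# Mixing the two effects of the pettable app: distributions and futures.
import asyncio
import random

def animal_detector(picture):
    # stand-in for the neural net: returns a distribution over animals
    return {"pig": 0.8, "erik": 0.2}

async def fetch_description(animal):
    # stand-in for the Wikipedia web service call: returns a future of text
    await asyncio.sleep(0)
    return f"some encyclopedia text about a {animal}"

def text_analyzer(text):
    # stand-in for the NLP model: distribution over "safe to pet"
    return {True: 0.9, False: 0.1} if "pig" in text else {True: 0.2, False: 0.8}

def sample(dist):
    # collapse a distribution to one value, proportionally to its weights
    values, weights = zip(*dist.items())
    return random.choices(values, weights)[0]

async def pettable(picture):
    animal = sample(animal_detector(picture))  # Dist[str]   -> str
    text = await fetch_description(animal)     # Future[str] -> str
    return sample(text_analyzer(text))         # Dist[bool]  -> bool

print(asyncio.run(pettable("pig.jpg")))
```

a real probabilistic runtime would not sample just once; it would run this simulation many times and weight the outcomes, as described above.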

and at Facebook we built this thing on top of PHP, which you might think is crazy, but again, that's what we do. and i think there was a very old paper from many years ago where there was a probabilistic programming language, also in Scala; it's everywhere. and i think this is what we need to have as the second thing for the future. so the first thing was: use machine learning to define models

and then, in order to compose these models into actual code, we need probabilistic programming. okay, so the programming language of the future is neural nets plus probabilities

and i don't know how much time i have... i'm over time, but i want to show you one more small thing, because i'm addicted to duality. i want to show you one more application of duality, another way to show how deeply, mathematically, intimately connected probabilistic programming and neural nets are

so in these neural nets what we're trying to do is minimise a loss function: we use the derivative to find the minimum of this loss function and use that to update our weights

what if, instead of minimising a loss function, we try to maximise a probability?

okay: minimising something, maximising the other thing. this is another example of duality, so let's use that here. and as i said,

you can look at probability distributions as small databases, so we can write queries. and this is another way to write linear regression

but not using machine learning, not using training and updating weights, but using probability distributions. so what we do is we guess a and b: we take a and b from some distribution, usually a Gaussian, but i just guess them at random

then i define this function f(x) = a*x + b

then we have to model the noise, the error, the scatter of the points that we saw in the graph

and then i just feed in the training data, and i try to find all the a's and b's such that this error is minimised

and then i... actually, this is a function that returns a probability distribution of functions

and if you google for Bayesian linear regression, this is what you will find
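
the guess-and-weight idea can be sketched with naive importance sampling: guess (a, b) at random, weight each guess by how well f(x) = a*x + b explains the data under Gaussian noise, and report the posterior mean. the data and prior ranges here are made up for illustration; a real probabilistic language like WebPPL does this with proper inference algorithms:

```python
# Bayesian linear regression by brute force: guess parameters, weight by fit.
import math
import random

data = [(0, 1.1), (1, 3.0), (2, 4.9), (3, 7.2)]   # roughly y = 2x + 1

def log_likelihood(a, b, sigma=0.5):
    # Gaussian noise model: how well does f(x) = a*x + b explain the data?
    return sum(-((y - (a * x + b)) ** 2) / (2 * sigma ** 2) for x, y in data)

random.seed(0)
guesses = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20000)]
weights = [math.exp(log_likelihood(a, b)) for a, b in guesses]
total = sum(weights)

# posterior mean: average the guesses, weighted by how well each one fits
a_mean = sum(w * a for (a, _), w in zip(guesses, weights)) / total
b_mean = sum(w * b for (_, b), w in zip(guesses, weights)) / total
print(a_mean, b_mean)   # close to a = 2, b = 1, with residual uncertainty
```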

but this is also kind of really nice, right? i have a function that returns a probability distribution of functions; that's pretty mind-blowing

and now you might not believe me that this is actual executable code. if you go on the web there's a language called WebPPL

and if you write this program here, which looks very much like that one, and you run it, it will tell you that the value for a is just about eight, but with this distribution, where the machine-learned program would just give you one value here

it's saying: i'm uncertain about it, so here is the distribution

alright. now, i think we saw a couple of posters about the challenges of machine learning

so there are things like: given the model, you might be able to extract the original training data, which might leak information; and there are many unsolved problems

but this is to be expected, right? for the normal software engineering process, for imperative programs, we have had fifty years to solve a lot of these problems, and for these machine-learned things we still have a lot of open problems: how do we build them, how do we train them, how do you put a model under version control

or really, should you put the data under version control? there's a whole bunch of things: how do you debug these models, how do you understand what they're doing

and so i tried to paint a kind of rosy picture, but this is full employment for us computer scientists, because all those problems are unsolved. so that's why this little person here is lying on the floor crying

so, my last slide: the end is near. soon computers will be able to program themselves without our help

but i hope that this snake won't bite itself before that happens and we get a second AI winter. so far i'm optimistic, and i'm really looking forward to the future where we just pump data into this thing and our code comes out

and that would be awesome, because then we as humans can spend more time, you know, drinking beer or hanging out, making music, whatever, and the machines do the work. thank you so much


so thank you for a fascinating talk. i was interested in this percentage: what part of the software infrastructure might be learned

so where are we now? could you give some sort of intuition for what, in concrete applications, we can expect to train, and what software functionality we expect to have to write manually?

yes, so that's a good question.

if you look at, for example, your mobile phone: when you use the camera, it shows the faces in the picture as you shoot your photo

so that is an example where that little square box that recognises the face is learned, but it's embedded into a traditional UI, and that kind of shows the mix

other examples: think of things like location. there was a talk at Lyft about this

the thing is, when you have Lyft or Uber and you ask for a ride, it will tell you the estimated time of arrival. now of course, if the estimated time of arrival says five minutes and it takes ten minutes, the customer is very unhappy

so that's an example where they are using a learned model to predict the estimated time of arrival, but again it's embedded into kind of a regular Android or iOS app

so i would say i'm not sure that we're at fifty-fifty yet, but i think this is quickly increasing

i also want to ask about nondeterministic programs versus probabilistic programs. it is my understanding that nondeterminism is the easier notion, but it isn't exactly the same, is it? because in a probabilistic program the choice is more like tossing some coins

sometimes an outcome may not be reached at all, so it's not exactly the same as nondeterminism; in the probabilistic case the outcomes come with weights

so how do you actually get a concrete value out? in the end it should all be based on your data: an outcome may or may not happen

okay, so let me go back to this slide here. here's an example where, when you compute the parameters of this model using probabilistic programming, it gives you this distribution: it tells you that the expected value of a in a*x + b is eight, but you're not a hundred percent certain

whereas in the other case, where we used deep learning, you're trying to minimise the error, so you take the thing with the maximum probability

so this is what people often do: they pick the value with the highest probability if they want to go from a probabilistic model to a deterministic model. but what you can see is that this is a very simple distribution; you might have distributions that have maybe two values with the same probability, and then how do you pick?

but i think it's not that much different from any monad, where you have to have an unsafe operation to get out of the monad. if you're in the IO monad or the state monad, you have to have an unsafe way to get out, and to really get out of the probability monad is likewise unsafe. that would be my intuition for that, i think
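
the "unsafe exit" described in that answer, taking the value with the highest probability, can be sketched like this; note how ties and the rest of the distribution are simply thrown away:

```python
# Collapsing a whole distribution to one value (the "MAP" pick) loses information.
dist = {"queen": 0.4, "king": 0.4, "ace": 0.2}

def unsafe_map(d):
    # picks the single most probable value; with a tie, Python's max just
    # returns whichever maximum it meets first, an arbitrary choice
    return max(d, key=d.get)

print(unsafe_map(dist))  # "queen", even though "king" is equally likely
```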


okay, i have a question right over here. i have a question about the fact that at the present moment we have some concern about not understanding why the models are behaving the way they are. so in this approach that you're suggesting, where we even remove the person from the model-building process, i wonder: is that going to help, or actually hurt, us trying to understand what the model does?

so, excellent question. that's again... i just want to point here at this little person: there are a lot of tears, he's in a lot of agony. so yes, understanding why these things behave the way they do is a big open problem, i think

now, people are trying to say: maybe we should use different models, like decision trees or something, because they are kind of explainable. i'm not sure that's the case; decision trees are giant, long, nested conditionals

there are other ways people are trying to do this, by letting you look inside at what happens in between the layers, and so on. but yes, this is a big open problem, and again, this is what i meant by this picture, right: hopefully this thing doesn't bite itself and accidentally kill itself because of these problems

but on the other hand, and maybe this is a non-answer, if we build a really complicated distributed system, we also have a really hard time understanding how it works, or debugging it

so in that sense, i'm not trying to blow you off, but i do think that for sufficiently complex imperative code you will also have a hard time understanding exactly what it does. if i ask you to explain the Linux kernel to me, that will be a hard thing; you will have a hard time to do that

