Note: this content has been automatically generated.
So yeah, at Facebook I run a group that does three things. One is programming languages for data science and machine learning; that's what I will talk about today. Then we do two other things: we use machine learning to make our developers more productive, and we use machine learning to make our systems more efficient. Depending on my speaking speed (let me just put my timer there), I might have some time for some extra slides.
So here's how I see myself. We saw a great presentation this morning about formal methods, but the grumpy cat in me was thinking: the first example was from Tony Hoare in 1971, about a search algorithm, and the second example was from 2018, also about a search algorithm. So is it still the case that, after so many decades of programming language research and formal methods, we still have trouble writing code? And a search method is something that's really nice: you know exactly what you want it to do. You have some values and you want to find a particular one. But what about more interesting problems, like the face recognition we all have on our phones, or the spam filtering we all rely on? How would you specify those and prove them correct?
So what I want to talk to you about today is maybe another way of programming, where we don't use intensional specifications, specifications in the sense of a formula that describes what we want. We're really lazy; as the saying goes, programmers should be lazy. So what if we just use examples and let those write the code for us? I think that would make this grumpy cat even happier.
Now here's, I think, the root of the problem: the abstractions that we use for programming are the wrong ones. We always have this Sisyphean problem: the abstractions that we want and the abstractions that we have are far apart, and we keep struggling to push that stone up the hill, and once we get close to the top, the stone rolls back down. Who here uses JavaScript? Ah, not as many people as I thought. But if you use JavaScript, you know this is really bad, right? Every week there's a new JavaScript framework: you have just mastered one, then the next one comes, and you have to relearn everything. Now, of course, Tony Hoare didn't only have this first example of formal verification; he was also one of the first people to identify this problem, in 1972, when he described this famous diagram of abstract data types. There's a function at the top, f from A to B; that's the thing you want, but you have to implement it in a more concrete domain, from X to Y. How to make that diagram commute is really what computer science is all about, and that is where we spend a lot of our energy.
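To make the commuting-diagram idea concrete, here is a minimal Python sketch. The domains are an assumption chosen for illustration: the abstract domain is finite sets, the concrete domain is Python lists, and `abstraction`, `abstract_insert`, `concrete_insert`, and `commutes` are hypothetical names. The diagram commutes when running the concrete operation and then abstracting gives the same result as abstracting first and then running the abstract operation.

```python
# Abstract domain: finite sets. Concrete domain: Python lists (duplicates and order allowed).

def abstraction(xs: list[int]) -> frozenset[int]:
    # The abstraction function: which abstract value does this concrete value represent?
    return frozenset(xs)


def abstract_insert(s: frozenset[int], x: int) -> frozenset[int]:
    # The operation we want (top of the diagram): insert into a set.
    return s | {x}


def concrete_insert(xs: list[int], x: int) -> list[int]:
    # The operation we implement (bottom of the diagram): prepend to a list.
    return [x] + xs


def commutes(xs: list[int], x: int) -> bool:
    # Both paths around the square must agree.
    return abstraction(concrete_insert(xs, x)) == abstract_insert(abstraction(xs), x)
```

The list may hold duplicates and its order is arbitrary; the abstraction function forgets both, which is exactly the gap between the abstraction we want and the representation we have.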
But as I said, our abstractions are bad. The way we write code is in terms of mutable, edge-labelled graphs: if you inspect the heap of a Java or JavaScript program, the data it represents is a mutable, edge-labelled graph. Then we try to write functions over those graphs, but they're not really functions, right? They are side-effecting functions that allow arbitrary nesting, so getting a mathematical grasp of that structure is really hard. I think we shot ourselves in the foot by starting with this.
Basically, the machines that we know how to build are good at one thing and one thing only: they can move data around from one address in memory to another address in memory. If you google for "one-instruction machine", you will see that this is actually all you need: one instruction, namely move data from one place to another, and you can do all of computing with that, which by itself is pretty remarkable. But the thing I want to stress is that ultimately this relies on mutation, right? Because we're moving some data from this memory location to that memory location, overwriting the other one. You need only this one instruction, and it's easy to emulate more complicated instructions on top of it, but the essence of computing that we have, the essence of the von Neumann machine, is mutation by moving things from one address to another. And that is the thing that we are really fighting.
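A toy illustration of the move-only idea, as a sketch under stated assumptions: this hypothetical `MoveMachine`, with memory-mapped adder addresses `add_a`, `add_b`, and `add_out`, is loosely inspired by transport-triggered designs and is not any real instruction set. Its only instruction copies a value from one address to another, overwriting the destination; addition is emulated by moving operands into the adder's input registers and moving its output back out.

```python
# Hypothetical toy "move-only" machine. Not a real instruction set:
# the only instruction is MOVE src -> dst.
ADD_A, ADD_B, ADD_OUT = "add_a", "add_b", "add_out"  # memory-mapped adder (assumption)


class MoveMachine:
    def __init__(self, memory: dict[str, int]):
        self.memory = dict(memory)

    def read(self, addr: str) -> int:
        # Reading the adder's output address yields the sum of its two input registers.
        if addr == ADD_OUT:
            return self.memory[ADD_A] + self.memory[ADD_B]
        return self.memory[addr]

    def move(self, src: str, dst: str) -> None:
        # The single instruction: copy a value and overwrite whatever was stored at dst.
        self.memory[dst] = self.read(src)

    def run(self, program: list[tuple[str, str]]) -> None:
        for src, dst in program:
            self.move(src, dst)


# Emulating "z := x + y" with three moves.
m = MoveMachine({"x": 3, "y": 4, "z": 0, ADD_A: 0, ADD_B: 0})
m.run([("x", ADD_A), ("y", ADD_B), (ADD_OUT, "z")])
```

Every step of the program is a destructive write, which is the mutation at the heart of the machine.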
And here's the thing: remember the princess and the pea. The pea is what I call the von Neumann pea, and we add layers and layers of abstraction on top of it, like the princess's mattresses, but we still feel that darn pea, right? No matter how many layers of abstraction we put on top of our underlying machines, you still feel it, and in some sense you have to feel it; this is called mechanical sympathy. We cannot pretend and say, "Oh, we're doing declarative programming," because ultimately, if you want to make your code fast, you have to understand that the thing underneath is a physical machine, and it's a physical machine that moves things from one memory location to another. What's even worse is that this is observable: moving from one memory location to another might be slower than moving from some other memory location to another, because there's a whole cache hierarchy. So anyone who really wants to write efficient code has to deal with the underlying concrete mess of the machine, and no layer of abstraction will help you remove that, right?
This is the pain that I've been trying to solve my whole life, and I must say, maybe I've given up. I don't think I can solve this pain anymore in the traditional sense. I don't think adding more layers of abstraction will help, and I don't think adding more formal methods will help; we have to go to a very different form of computation. So let's look at another computational model, one that has been extremely, extremely successful: relational databases. I'm pretty sure that maybe not that many people here use JavaScript, but a lot of people use relational databases. What's so beautiful about relational databases is that they leverage really beautiful and simple mathematics.
Relational algebra is very old. Codd, in 1970, said: "Hey, I see all this mathematics. I can take the union of sets; with simple operations like that, I can build an algebra on top." And suddenly this turned into a multi-billion-dollar industry, and a lot of our economy runs on it. But here's the thing: the reason databases are so successful, and this is my personal opinion, is that they started with a very clean mathematical abstraction, and that allows you to build query optimizers and things like that. Plus, the implementation maps quite closely to the underlying hardware, and the reason is that tables are not complicated: they don't have nested things, they're just flat collections of base types, and they're still very powerful.
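As a rough sketch of how little machinery relational algebra needs, assume a relation is a Python set of flat tuples of base types; selection, projection, union, and an equi-join then become plain set comprehensions. The function names and the employee/department data are hypothetical, and a real database adds schemas, a query optimizer, and much more.

```python
# A relation is modelled as a set of flat tuples of base types (an assumption for this sketch).
Relation = set[tuple]


def select(r: Relation, pred) -> Relation:
    # Keep the rows that satisfy the predicate.
    return {t for t in r if pred(t)}


def project(r: Relation, *cols: int) -> Relation:
    # Keep only the given column positions; duplicates collapse because this is a set.
    return {tuple(t[c] for c in cols) for t in r}


def union(r: Relation, s: Relation) -> Relation:
    return r | s


def join(r: Relation, s: Relation, i: int, j: int) -> Relation:
    # Equi-join: concatenate rows whose column i (in r) equals column j (in s).
    return {t + u for t in r for u in s if t[i] == u[j]}


employees = {(1, "Ada", 10), (2, "Alan", 20), (3, "Grace", 10)}  # (id, name, dept_id)
departments = {(10, "Research"), (20, "Engineering")}            # (dept_id, dept_name)

research_names = project(
    select(join(employees, departments, 2, 0), lambda t: t[4] == "Research"), 1)
```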
Now here's the problem, as everyone who has ever tried to use databases from regular programming languages has noticed: the data model of databases is very different from the mutable, edge-labelled graphs of our normal imperative programs. Already in the early eighties, Dave Maier noticed this and called it the impedance mismatch, the object-relational impedance mismatch. I think people still write papers about it, and they build O/R mappers or design new type systems to make all this more usable. So again, this is another instance of the problem where there's the abstraction that we want, which is a relational database, and there's the abstraction that we have, mutable, edge-labelled graphs, and we have to bridge that gap.
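The gap shows up even in a small hand-written mapping of the kind O/R mappers automate. In this sketch (hypothetical `Customer` and `Order` classes and helper names), a nested object graph is flattened into two flat tables linked by a foreign key, and then reassembled by what is effectively a join.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    amount: float


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list[Order] = field(default_factory=list)  # nested, mutable structure


def to_tables(customers: list[Customer]):
    # Flatten the object graph into two tables of base types, linked by customer_id.
    customer_rows = [(c.customer_id, c.name) for c in customers]
    order_rows = [(o.order_id, c.customer_id, o.amount) for c in customers for o in c.orders]
    return customer_rows, order_rows


def from_tables(customer_rows, order_rows) -> list[Customer]:
    # Rebuild the graph by re-attaching each order row to its customer (a join by key).
    by_id = {cid: Customer(cid, name) for cid, name in customer_rows}
    for oid, cid, amount in order_rows:
        by_id[cid].orders.append(Order(oid, amount))
    return list(by_id.values())


alice = Customer(1, "Alice", [Order(100, 9.5), Order(101, 20.0)])
rows = to_tables([alice])
```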
But here's the thing: it's the mathematics that you cannot beat. The fact that this is based on really simple, beautiful mathematics is what makes these databases unbeatable, and I know, because I have the scars on my back, the bullet holes in my back. At one point, when I was young, the goal of my life was to eliminate SQL from this planet. I wanted to get rid of SQL, and that's when I did LINQ. Then everybody else was doing NoSQL; I don't know if anybody here even remembers NoSQL. Then I showed that the goal was to make the categorical dual of SQL, and I tried to call it coSQL, but that didn't work at all; now everybody's talking about NewSQL. And even if you look at things such as Spark, which started out with flatMap and all those great things, now the preferred way to program Spark is using DataFrames, in a relational way. So I've given up on trying to beat SQL and get rid of it, and I just looked back and asked: what is it that makes it so powerful and so versatile? It must be the mathematics; you cannot beat mathematics. And if you can't beat them, join them. So let's use mathematics wherever possible. And of course, you should always try to abstract from the mathematics.
If you look at the relational algebra, it's a beautiful mathematical theory, but the question I always ask myself, and I think everybody here should ask themselves, is: is this an instance of a more general concept? Because maybe that more general concept has several instances that you can use, or maybe it's easier or more powerful to implement the more general concept. And that is true here: relational algebra turns out to be an instance of a more general concept from category theory. Now again, this is where I think the simplicity of relational algebra wins, because when people hear "category theory", they run away screaming. But I still think it's really good, whenever you see something, to try to abstract and ask whether it's an instance of something more general. I think that is also what Martin has been doing with Dotty: always trying to find the essence. You can always peel more layers from the onion, and as you peel off layers you start to cry, because that's what happens when you handle onions, but it's crying for a good cause.
All right, so that's where we are today. It's a very grim picture: we are stuck with this weird computational model of imperative computations over weird mutable graphs. Because we don't find a way forward within this model, since we're limited by our actual hardware, what you see, I think, is an explosion of programming languages, and of frameworks, I should say. For the web there's JavaScript, with hundreds of frameworks; every week there's a new one. If you look at mobile, there are also a bunch of operating systems and languages; every day there's a new one. On the server side, we were talking over the break about Rust, which is trying to put lipstick on the imperative pig. That's great, but it's not bringing progress, right? We're not fundamentally changing anything.
So I think the world of Software 1.0 is doomed, or dead, and that's why I like the title of this day: the future of software. The future of software is not what you see on the screen here; the future of software is something different. We have tried for fifty, sixty years to make this work, and we cannot make it work; I'm convinced we cannot. So this is where Software 2.0 arises, because Software 1.0 comes to an end. In Software 2.0 there are no humans involved and no formal specifications involved. Doesn't that sound great? No humans, no formal specifications, just data: you just throw data at this thing, and it will train itself, and it will work. Now let's see whether that's just a fairy tale or not, because we just saw that regular programming is no fairy tale, but maybe this one is.
So here's what I'm going to try to convince you of today. We know databases: a database is a magic gadget where you give it code (some people call it a query, but really that's code), and it hands you back data. Now, the one mathematical trick that I've used my whole career, and that has served me well, is duality: flip things around. So can we do the opposite? Can we build a magic device where we give it data and it gives us back code? Wouldn't that be amazing? And since it's just the dual, it should work, because if we can do the thing on the left, we should be able to do the thing on the right, correct? So that's Programming 2.0: we're going to build this magic device that turns data into code. Programming 1.0 was humans turning coffee into code; Programming 2.0 is machines turning data into models. Of course people use fancy names like "models", but a model is just regular code.
And I should also add that it's not just coffee, right? For all the grad students here it's probably more pizza than coffee. Now, as was mentioned briefly in the introduction this morning, there is a big difference between the old world of code and the new world of models, so maybe it's good to distinguish between these two terms. Code is deterministic and discrete: when you have a Boolean, it's discrete; it has two values, true or false.
Well, that's very restrictive. If you look at your spam filter, something is spam, you know, but maybe it's not really spam; it's kind of a little bit spam, or a lot of spam. So it's a Boolean, but it's true with, say, eighty-five percent probability and false with fifteen percent. Or it can be some range: a real number, like a score between zero and a hundred, which can be anything in between and has some kind of distribution. So the big difference between code and models is that models have uncertainty in them, and they're often continuous. Those are two things that we have really banned from our traditional programming languages for a long time, and a lot of developers are afraid of uncertainty, because uncertainty means sloppiness. Now, you can see me, right? I love sloppiness. The essence of happiness is sloppiness, being free. And our values want to be free too: they don't want to be discrete, they want to take different values with some probability. But here's the best thing: there's a lot of really ancient maths from the sixteen hundreds, seventeen hundreds, and eighteen hundreds that we can use to make this all work, and that makes me really excited, right? Because remember: you cannot beat maths. So we're going to use this old maths, calculus, which was invented in the sixteen hundreds, and Bayes' rule, from the 1760s or so, to build the new wave of programming.
So I'm going to start with supervised learning, and here's what we want to do with supervised learning. We want to use, again, simple, old mathematics; I don't want to invent new mathematics like we had to do for computer science. The old mathematicians, I don't know what they had in the water, or maybe it was the fungus in their grain, but they were much smarter than we are these days, right? So I want to use ancient mathematics to train these functions by looking at the difference between the training data and the value that the function computes, and use that delta: I try to minimise it, and I do that by updating, mutating, the function. So I'm still doing mutation, right? I'm still doing imperative stuff, but only while I'm training the function, while I'm shaping it. Think of the function as being made of clay: I'm molding it, but once it's done, it's a pure function. So I've also nicely separated the imperative part from the purely functional part. Now, this is a very, very simple example, because I want to show you that this is really trivial math that everybody here has done since kindergarten. That's much easier than weakest preconditions and separation logic and Hoare triples and whatever complicated mathematics we had to invent for Software 1.0, all these artificial mathematics. No, for machine learning we can use really simple maths. So let's look at linear regression. Probably in physics class you took some measurements, like speed versus distance or whatever, so you get a whole bunch of points, and what you have to do is find a line through those points. If the function is defined by y = ax + b, what I need to find is a and b, because those are the two values that determine the line. You see that red stuff there? That's the error, because the points are not aligned on this line. So what this magic device is going to do is: you feed it these points, and it will tell you a and b, and now you have learned this function. Isn't that great? No programming required, or maybe only once, to build that magic device.
Some people say this is just curve fitting. I'm happy with that; I'll take curve fitting any day of the week over proving theorems or fighting with the type system. Just give me curve fitting.
And the trick that this thing uses is something called backpropagation, and backpropagation relies on taking derivatives. But before that, let's look a little bit at what these functions look like. In order to do backpropagation, our functions need to be differentiable, and in order for a function to be differentiable, it has to be a function that takes real numbers and returns real numbers. I don't know if you remember from calculus: the functions have to be continuous, with those epsilon-delta definitions you might remember. So all the functions that we write take a tuple of real numbers and return a tuple of real numbers, and such a function is defined as a composition of smaller functions. By the chain rule, we know that if two functions are differentiable, their composition is also differentiable, so that works out nicely.
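For reference, the chain rule in its standard textbook form (not reproduced from the talk's slides): the first line is for scalar functions, the second for functions on tuples of reals, where J denotes the Jacobian matrix.

```latex
(g \circ f)'(x) = g'\bigl(f(x)\bigr)\, f'(x)
\qquad\qquad
J_{g \circ f}(x) = J_g\bigl(f(x)\bigr)\, J_f(x)
```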
Then you define the error function; there are many ways to do that. But notice that, just like in the case of relational databases, we do have a beautiful mathematical theory, but the functions here take tuples of real numbers, whereas our imperative code takes graphs. So for machine learning we also have this abstraction problem, where we have to find encoders and decoders that go from our domain objects into the values that the machine learning model operates on. I'm pretty sure the PhD students here who do machine learning know how hard it is to find efficient encodings that reduce things into tuples of real numbers; how to efficiently encode values such that you can feed them into a machine learning model is still an unsolved problem. But anyway, what we want to do, given this training data, is find a and b that minimise the error, where the error is defined as the sum, over the training data, of the squared difference between the actual value and the value of the function applied to the input.
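Assuming the usual squared error (the slide itself is not reproduced in this transcript), the error for y = ax + b over training pairs (x_i, y_i), its partial derivatives, and a per-example update step are shown below. The learning rate η is our own notation, not the talk's.

```latex
E(a, b) = \sum_i \bigl(y_i - (a x_i + b)\bigr)^2

\frac{\partial E}{\partial a} = -2 \sum_i x_i \bigl(y_i - (a x_i + b)\bigr),
\qquad
\frac{\partial E}{\partial b} = -2 \sum_i \bigl(y_i - (a x_i + b)\bigr)

a \leftarrow a + 2\eta\, x_i \bigl(y_i - (a x_i + b)\bigr),
\qquad
b \leftarrow b + 2\eta\, \bigl(y_i - (a x_i + b)\bigr)
```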
This uses the derivatives of the error. You don't have to understand this; I'm just writing it down. By the way, the way I do these derivatives: I go to Wolfram Alpha, type in the function, and Wolfram Alpha tells me the derivative, so even there you can be very lazy. Then, in order to train this, you start with random values for a and b, and for each x and y value in your training set you update the weights a and b using this formula, and you just repeat until you run out of training data. Isn't that beautiful? I'm using mutation: I just run this thing, updating these values a and b until I find them, and then I give them to you, and from then on you have a pure function that just computes y from x based on these a and b.
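Here is a minimal Python sketch of the training loop just described. It assumes squared error, a fixed learning rate `lr`, and a fixed number of passes over the data; `train_linear` and its parameters are hypothetical names. Training mutates `a` and `b`, and what comes out is a pure function.

```python
import random


def train_linear(data: list[tuple[float, float]], lr: float = 0.05,
                 epochs: int = 500, seed: int = 0):
    rng = random.Random(seed)
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start with random a and b
    for _ in range(epochs):
        for x, y in data:
            err = y - (a * x + b)  # actual value minus predicted value
            # Gradient step on this example's squared error (y - (a*x + b))**2.
            a += lr * 2 * err * x
            b += lr * 2 * err

    # The imperative phase is over; the learned model is a pure function of x.
    def model(x: float) -> float:
        return a * x + b

    return model, a, b


points = [(x, 3.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
model, a, b = train_linear(points)
```

With noise-free points on y = 3x + 1 the updates converge to a ≈ 3 and b ≈ 1. With real measurements, a and b would only approximate the best-fitting line.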
Okay, so now you might ask yourself: "Erik, you hand-waved your way through this; how does it all work in a little bit more detail?" So here are some details, which I think are very beautiful. Why do we use derivatives? Well, as I said, we want to minimise this error function, and maybe you remember from high school that you can use derivatives to find the minimum of a function: if you have this parabola here and you want to find its minimum, you look where the derivative is zero, right? That's a good way to find it. And here's the derivative: you learned it through symbolic differentiation, or you can define the derivative as a limit as epsilon goes to zero; you should all remember that from high school, correct? Now the question is how we find that derivative, and how we find it in an efficient way, and this is truly stunning; the way you do this is amazing. And how do we do that? By using more maths, of course.
Okay, and this is the trick. Do people here remember complex numbers? You've seen them if you do physics or electronics or computer graphics. A complex number is some mathematical thingy that somebody invented, written as a + bi: it's a pair of two numbers a and b, where i² = -1. You can ask yourself: why is i² equal to -1? Well, why not, right? So now let's ask another question: what if we played the same trick? We define a pair of numbers a and b, but instead of calling the second unit i, we call it epsilon, and we say that ε² = 0. Why not, right? We can define our own rules. You can also ask yourself: what if we do the same thing, but instead of i we take j and say j² = 1? The only interesting choices are minus one, zero, and one, and all three, strangely enough, have practical value; the last ones are called hyperbolic numbers. You can go look them up; people find uses for them.
But this is the amazing thing: we take complex numbers, which we have known for so long, and instead of i² = -1 we say ε² = 0. And what do we get from this? Automatic differentiation. Just by using dual numbers instead of normal numbers, we can compute a function and its derivative in one go. It's just mind-blowing. If you want to prove this, it uses Taylor expansion.
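A sketch of the Taylor-expansion argument, assuming f is analytic around a (so it equals its Taylor series there). Every term from the second power of ε onward vanishes because ε² = 0:

```latex
f(a + b\varepsilon)
  = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(b\varepsilon)^n
  = f(a) + f'(a)\, b\, \varepsilon
  \qquad \text{since } \varepsilon^n = 0 \text{ for } n \ge 2
```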
And if you go look up Taylor expansion on Wikipedia, that may blow your mind even more; I don't know how somebody came up with the Taylor expansion formula. It's just crazy, with factorials in some infinite summation. But anyway, you don't need the proof to use dual numbers, so let's look at an example.
So here's our function f, defined as f(x) = 3x² + 4. And again, the reason I like mathematics is that my brain is the size of a pea, and I like to just crank the handle; I like to do these computations without thinking. So let's feed it a + bε. With x = a + bε, f becomes 3(a + bε)² + 4, right? Just substitute. Now I'm going to evaluate: (a + bε)² is a² + 2abε + b²ε², so that's what you see there. Then I distribute the 3 over the sum and simplify a little bit. We know that the derivative of f is 6x, right? And here you see the first term, 3a² + 4: that's f(a). Then 6a: that's the derivative of f applied to a, times bε. And then, because ε² is zero, that last term falls out. So if I apply this function to a dual number and just compute with it, I get back a pair: the function applied to the argument, and the derivative at that argument multiplied by b.
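The same computation written out as equations, with f(x) = 3x² + 4 and f'(x) = 6x:

```latex
\begin{aligned}
f(a + b\varepsilon) &= 3(a + b\varepsilon)^2 + 4 \\
  &= 3\,(a^2 + 2ab\,\varepsilon + b^2\varepsilon^2) + 4 \\
  &= (3a^2 + 4) + 6ab\,\varepsilon + 3b^2\varepsilon^2 \\
  &= f(a) + f'(a)\, b\, \varepsilon \qquad (\varepsilon^2 = 0,\ f'(a) = 6a)
\end{aligned}
```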
Isn't that pure magic? I mean, you won't believe me, right? But this works for any differentiable function, and this is how you implement computing derivatives: with operator overloading, because we just overload all the operators to work on these dual numbers.
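Here is a minimal Python sketch of dual numbers via operator overloading. It supports only addition and multiplication; the class `Dual` and the helper `derivative` are hypothetical names, and a real automatic-differentiation library covers many more operations. Seeding the ε part of the input with 1 makes the ε part of the result equal to the derivative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + b*eps, with the rule eps**2 == 0."""
    a: float  # value part
    b: float  # epsilon part

    @staticmethod
    def lift(x) -> "Dual":
        # Plain numbers are constants: their epsilon part is zero.
        return x if isinstance(x, Dual) else Dual(float(x), 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.a + o.a, self.b + o.b)

    __radd__ = __add__

    def __mul__(self, other):
        # (a1 + b1 eps)(a2 + b2 eps) = a1*a2 + (a1*b2 + b1*a2) eps + b1*b2 eps**2, and eps**2 = 0.
        o = Dual.lift(other)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)

    __rmul__ = __mul__


def derivative(f, x: float) -> tuple[float, float]:
    # Evaluate f at x + 1*eps; the result is f(x) + f'(x)*eps.
    out = f(Dual(x, 1.0))
    return out.a, out.b


f = lambda x: 3 * x * x + 4
```

For example, `derivative(f, 2.0)` returns `(16.0, 12.0)`, that is, f(2) = 16 and f'(2) = 12.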
Now, what's even more beautiful is that if I define the function that lifts a normal function to a function over dual numbers, that lifting is a functor. What does that mean? A functor is something that distributes over composition and preserves identities. It sounds like a crazy thing from category theory, but if something is a functor, it means it's good, so this thing is good. And if you remember the chain rule: the chain rule for ordinary differentiation is really ugly, but the chain rule for these dual numbers is really beautiful, because it's just distribution over composition.
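To sketch the functor view in Python (all names hypothetical): `lift` turns a primitive function plus its known derivative into a function on dual numbers, and composing lifted functions applies the chain rule automatically. The identity function lifts to the identity, and the lifted composite is the composite of the lifts.

```python
import math
from typing import Callable, NamedTuple


class Dual(NamedTuple):
    a: float  # value
    b: float  # epsilon part (tangent)


def lift(f: Callable[[float], float], df: Callable[[float], float]) -> Callable[[Dual], Dual]:
    # Lift a primitive f, given its derivative df, to dual numbers:
    # f(a + b*eps) = f(a) + f'(a) * b * eps.
    return lambda d: Dual(f(d.a), df(d.a) * d.b)


def compose(g, f):
    return lambda x: g(f(x))


sin_d = lift(math.sin, math.cos)
exp_d = lift(math.exp, math.exp)

# Composing lifted functions: the epsilon part comes out as cos(exp(x)) * exp(x),
# which is exactly the chain rule for sin(exp(x)).
h_d = compose(sin_d, exp_d)
value, slope = h_d(Dual(0.5, 1.0))
```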
So these dual numbers are my new best friends. And so we can build this magic machine that takes data and turns it into code. We give this thing training data, we separate the output from the input, and we pass the input to this composition of differentiable functions, with parameters a and b. Then we compare against the expected output, take the derivative of the error function, update the parameters based on that, and just repeat that for a long time. So really, the only thing we need here is functions that are differentiable, and we know how to do differentiation using dual numbers. Beautiful and easy: no formal proofs, no first-order logic, just simple high-school mathematics. That's the kind of stuff that I like.
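Putting the pieces together, here is a sketch that trains y = ax + b using dual numbers to compute the derivatives. It assumes squared error and full-batch gradient descent with a fixed learning rate, and all names are hypothetical. Because dual numbers implement forward-mode differentiation, the sketch makes one pass per parameter. Backpropagation, mentioned earlier in the talk, is the reverse-mode technique that gets all partial derivatives in a single backward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    a: float  # value
    b: float  # epsilon part; eps**2 == 0

    def __add__(self, o): return Dual(self.a + o.a, self.b + o.b)
    def __sub__(self, o): return Dual(self.a - o.a, self.b - o.b)
    def __mul__(self, o): return Dual(self.a * o.a, self.a * o.b + self.b * o.a)


def const(x: float) -> Dual:
    return Dual(x, 0.0)


def error(a: Dual, b: Dual, data) -> Dual:
    # Squared error of the model y = a*x + b, computed entirely on dual numbers.
    total = const(0.0)
    for x, y in data:
        r = const(y) - (a * const(x) + b)
        total = total + r * r
    return total


def train(data, lr: float = 0.02, steps: int = 1000):
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Forward mode: seed one parameter's epsilon part with 1 to get dE/d(that parameter).
        da = error(Dual(a, 1.0), const(b), data).b
        db = error(const(a), Dual(b, 1.0), data).b
        a, b = a - lr * da, b - lr * db  # gradient-descent update
    return a, b


points = [(x, 3.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
a, b = train(points)
```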
Now, what about neural networks? I think the name "neural networks" is a little bit hyped, because they're supposedly modelled on whatever neurons we have in our brain. But really, in a neural network, the function, this composition of differentiable functions, has a special form: you apply an activation function to the sum of the products of the weights with the inputs. And why people do this, I think, is that it involves a lot of multiplications and additions, and we know our graphics cards can do those very efficiently. So I think the whole reason that deep learning and neural nets are popular, and that we pick these functions, is GPUs; it has nothing to do with the brain. We don't have matrix multiplication in our skull. But it works.
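Here is a sketch of one neural-network layer in plain Python, under the usual assumptions: each output is an activation function (here the logistic sigmoid) applied to the sum of weights times inputs, plus a bias term. The bias is a standard addition that the talk does not mention, and `dense_layer` is a hypothetical name.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(weights: list[list[float]], biases: list[float],
                inputs: list[float]) -> list[float]:
    # Each output neuron: activation(sum of weight * input, plus a bias).
    # Overall this is a matrix-vector product followed by an elementwise function,
    # the multiply-and-add workload that GPUs are built for.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]


out = dense_layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [2.0, 2.0])
```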
All right, so what is differentiable programming? Differentiable programming is anything where you write the program using differentiable functions, and these differentiable functions don't have to be the layers of a neural net; they can be more complicated. One of the posters used Tree-LSTMs. I don't know who that was; is that person still in the room? Yes. So that is a more complicated differentiable function. Good, right? This is the guy you should talk to: he is the future. He's doing differentiable programming, and he's doing it over more complicated structures than just arrays of doubles; he's doing it over trees and graphs. So what's the future of programming? Number one: use differentiable programming to define models.
Okay, good. Now, as I said, always ask yourself: can you abstract, can you make this better? People these days use Python a lot for this, untyped, with strings and whatever. So I think even for the people who want to stay in the old world of programming languages, there is still a lot of work to do here, to make sure that we don't get stuck in the world of Python. The other thing I should say is that this world of Programming 2.0 is all functional programming. People like Martin have been telling everybody their whole lives: functional programming, functional programming. Well, this is functional programming as well, except it's called differentiable programming. There are a lot of papers on this; if you have the slides, you can use these URLs. There's even one from EPFL, which I think is all done in Scala.
Now, I pushed a little bit under the carpet that these models only approximate the function. But of course this is true for all functions, right? I said that in the beginning: even if we write a function in our code, it's not the real mathematical function; it's something that has side effects. These models are also functions that have side effects, except that the side effect is represented by a probability distribution. We saw functors, and now we see monads, right? Well, there is no talk about the future of programming without monads. This probability distribution thing happens to be a monad, or rather, I think this is what whoever designed it intended it to be: if it's good, then it's a monad. And before something can be a monad, it has to be a functor.
So what is the probability monad? Think of it as a little database of values of type A, where each of these values has a probability associated with it. That's my intuition for a probability distribution: it's like a database, but each row of the database has a value and a weight.
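Here is a minimal sketch of the "little database" intuition, together with the bind operation that the talk turns to next. It assumes a distribution is a Python dict from values to exact `Fraction` weights, and `pure` and `bind` are hypothetical names for the monad's return and bind. The continuation passed to `bind` is a conditional distribution, that is, a function from a value to a distribution.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Hashable

Dist = dict  # value -> weight: each entry is one "row" of the little database


def pure(x) -> Dist:
    # return/unit: a distribution with a single row of weight 1.
    return {x: Fraction(1)}


def bind(d: Dist, k: Callable[[Hashable], Dist]) -> Dist:
    # For each row (a, p), run the conditional distribution k(a) and scale its weights by p;
    # rows that produce the same value are merged by adding their weights.
    out = defaultdict(Fraction)
    for a, p in d.items():
        for b, q in k(a).items():
            out[b] += p * q
    return dict(out)


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def heads_count(first):
    # Hypothetical experiment: on heads, flip once more and count heads; on tails, stop.
    if first == "H":
        return bind(coin, lambda second: pure(1 + (second == "H")))
    return pure(0)


heads = bind(coin, heads_count)
```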
And whenever we have a monad, we have to have a bind, but statisticians call that conditional distributions. Here, P(B | A) means the probability of B given A. But that's exactly a function, right? A function from A to B you can read as a B given an A. So a conditional distribution, don't get fooled, is just a function that returns a probability distribution. And this thing here is, I think, the most beautiful theorem ever invented, and it's from 1763. The mathematician Bayes was the world's first functional programmer; he must have been an alien, because this theorem tells you how to take the function there on the top, a function from A to a probability distribution of B, and turn it into a function that, given a B, returns a probability distribution of A. So Bayes' theorem tells you how to invert a monadic function.
And at that time nobody knew monads and nobody knew functional programming, but he came up with this remarkable theorem that shows how you can invert a monadic function. A lot of inference is based on Bayes' rule, where you infer this inverse function; it's amazing. With probability distributions, you can do a lot of statistics using Bayes' rule.
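Bayes' rule, P(A | B) = P(B | A) P(A) / P(B), can be written as exactly the inversion described here: given a prior over A and a monadic function from A to distributions over B, produce a function from B to distributions over A. This Python sketch assumes finite distributions stored as plain dicts; the function name `invert`, the pig/crocodile detector, and its probabilities are made up for illustration.

```python
from fractions import Fraction
from typing import Callable, Dict, Hashable, TypeVar

A = TypeVar("A", bound=Hashable)
B = TypeVar("B", bound=Hashable)

# Distributions are plain dicts from values to probabilities.


def invert(
    prior: Dict[A, Fraction],
    likelihood: Callable[[A], Dict[B, Fraction]],
) -> Callable[[B], Dict[A, Fraction]]:
    """Bayes' rule as a higher-order function: turn the monadic function
    A -> Dist B (plus a prior on A) into the inverse B -> Dist A."""

    def posterior(b: B) -> Dict[A, Fraction]:
        # Unnormalised posterior: P(b | a) * P(a) for every a in the prior.
        joint = {a: likelihood(a).get(b, Fraction(0)) * pa for a, pa in prior.items()}
        evidence = sum(joint.values())  # P(b), the normalising constant
        return {a: w / evidence for a, w in joint.items() if w}

    return posterior


# Hypothetical detector: which label it reports, given the true animal.
prior = {"pig": Fraction(3, 4), "crocodile": Fraction(1, 4)}


def detector(animal: str) -> Dict[str, Fraction]:
    if animal == "pig":
        return {"pig": Fraction(8, 10), "crocodile": Fraction(2, 10)}
    return {"pig": Fraction(1, 10), "crocodile": Fraction(9, 10)}


which_animal = invert(prior, detector)
print(which_animal("pig"))  # {'pig': Fraction(24, 25), 'crocodile': Fraction(1, 25)}
```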
And as I said, a probability distribution is a little bit like a database with a simple query language. So let's look at a probability distribution over playing cards. I can ask: what is the probability of a card whose rank is queen? You see there's a predicate in there, and the value I get is four out of fifty-four. So really think of it this way: the distribution is a database, I stick in a query, and that gives me this value as the answer. Or I can ask for the probability that the card is a heart, given that the rank is queen. Well, there are four queens and one queen of hearts, so that is one out of four. And then I can ask for the probability that a card is a queen and a heart, and there is only one out of the fifty-four.
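The card queries above can be reproduced in a few lines of Python by treating the distribution as a list of weighted rows and a query as a predicate. A standard deck has 52 cards; to match the "out of fifty-four" quoted in the talk, this sketch assumes two jokers are included, which is a guess about the slide rather than something the transcript confirms. The helper names `probability` and `conditional` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Optional, Tuple

Card = Tuple[str, Optional[str]]  # (rank, suit); jokers have no suit
Row = Tuple[Card, Fraction]       # one weighted row of the "database"

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king", "ace"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Assumption: 52 standard cards plus two jokers, to match "out of fifty-four".
cards: List[Card] = list(product(RANKS, SUITS)) + [("joker", None), ("joker", None)]
deck: List[Row] = [(card, Fraction(1, len(cards))) for card in cards]


def probability(dist: List[Row], query: Callable[[Card], bool]) -> Fraction:
    """P(query): add up the weights of the rows that satisfy the predicate."""
    return sum((w for card, w in dist if query(card)), Fraction(0))


def conditional(dist: List[Row], query: Callable[[Card], bool],
                given: Callable[[Card], bool]) -> Fraction:
    """P(query | given) = P(query and given) / P(given)."""
    both = probability(dist, lambda c: query(c) and given(c))
    return both / probability(dist, given)


def is_queen(card: Card) -> bool:
    return card[0] == "queen"


def is_heart(card: Card) -> bool:
    return card[1] == "hearts"


print(probability(deck, is_queen))                               # 2/27, i.e. 4/54
print(conditional(deck, is_heart, is_queen))                     # 1/4
print(probability(deck, lambda c: is_queen(c) and is_heart(c)))  # 1/54
```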
So really, don't get intimidated by probability distributions. Think of them as databases with a very simple query language and you will be fine. I think a lot of the literature on statistics is overly complicated; if you see it in a computer science way, it is really simple. Alright, here's another example.
But I want to close off by showing you how we can use machine-learned models to write code. There's a famous quote by Jeff Dean from Google that says if Google were written from scratch today, half of it would be learned. What that means is that half of it would be done with these beautiful machine-learned algorithms, but the other half would still be that pesky imperative code. So I haven't solved all your problems yet: half of your code will still be imperative code. Let me give you an example of how that will work.
Say I want to build a killer app. I heard that the app stores make something like a hundred billion dollars of revenue, and I want to get in there. So I'm going to write an app that does the following: you snap a picture of an animal and it tells you whether this animal is pettable, okay, whether you can pet it. Of course, if you have a meat-eating plant (I don't know if that's an animal), it looks vicious, and I wouldn't pet it. Or if you have a crocodile, I wouldn't pet it. But if you have a chick, you can pet it. So this is the app that we're going to build, and it's going to make me filthy rich. The average... well, the average will be a hundred million dollars here; this app will do that for me, or for us, I should say, because our average goes up.
So the first thing I do is train a neural net that takes in the picture and gives me a description of the animal. I feed in a picture of a pig and it will say pig with probability 0.8, and Erik Meijer with probability 0.2, or something like that. So that's the animal detector. Now I need a text analyzer, which is a different model: I give it a description of the animal, and from that it will understand, because it can do natural language processing, whether this animal is safe to pet.
And then of course I need a way to get from the name of the animal to a text about it. How do I do that? Well, using a web service call to Wikipedia. But that web service call will return a future, because it's a call over the web. So now we have a program that requires probability distributions and futures. To wire these together: I take a picture, I pass it to my AI, and I get a probability distribution of animals. I want to push that animal into the Wikipedia web service call, but oops, the types don't match: I get a type error there. And then I get a future of text that I want to put into this other natural language model, but again it is a future and I need text. So how can I write code that combines machine-learned models with web service calls, and turn those into apps?
For that we need a second thing, and that's called probabilistic programming. Probabilistic programming is old-fashioned imperative programming, except that we have probability distributions as first-class effects, or first-class values. Maybe some of you know async/await: there's a Scala library for it, JavaScript has support for it, Dart has support for it, C# has support for it. If you have a future, you can await the future. Probabilistic programming is nothing else than that, except that you sample from a distribution. It takes these probability distributions and turns them into first-class effects, which means that we can now write code that does this.
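One way to see "sample as a first-class effect, like await" is with Python generators: the program `yield`s a distribution wherever it wants a sample, much as an async function awaits a future, and a handler decides how to resume it. The handler here, `enumerate_program` (a hypothetical name), performs exact inference by trying every outcome and multiplying probabilities along each path. That is a swapped-in technique that only suits small finite programs; it is not the sampling-based implementation the talk describes.

```python
from fractions import Fraction
from typing import Callable, Dict, Generator, Hashable, List

Dist = Dict[Hashable, Fraction]
# A probabilistic program yields distributions (its sample points) and returns a value.
Program = Callable[[], Generator[Dist, Hashable, Hashable]]


def enumerate_program(program: Program) -> Dist:
    """Exact inference by enumeration: resume every `yield dist` with each
    possible value, multiplying probabilities along the path. Python generators
    cannot be copied, so every path is replayed from the start."""
    result: Dict[Hashable, Fraction] = {}

    def explore(choices: List[Hashable], weight: Fraction) -> None:
        gen = program()
        try:
            dist = next(gen)              # run to the first sample point
            for choice in choices:        # replay the choices made so far
                dist = gen.send(choice)
        except StopIteration as finished:  # the program returned a value
            result[finished.value] = result.get(finished.value, Fraction(0)) + weight
            return
        for value, p in dist.items():     # branch on the next sample point
            explore(choices + [value], weight * p)

    explore([], Fraction(1))
    return result


def coin() -> Dist:
    return {True: Fraction(1, 2), False: Fraction(1, 2)}


def both_heads():
    first = yield coin()    # reads like `first = await ...`, but samples instead
    second = yield coin()
    return first and second


print(enumerate_program(both_heads))  # {True: Fraction(1, 4), False: Fraction(3, 4)}
```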
So here's what this code looks like. What this thing returns is a future of a probability distribution; we still have to collapse that, but that's a separate problem. Given the picture, we feed it into the neural net and get back a probability distribution. We sample from the distribution, which gives us an animal. We pass that into Wikipedia and await that value to get it out of the future. Then we take that text and pass it into the other model, which gives a probability distribution that tells us whether this thing is pettable. In the implementation of sample and these things, it uses Bayes' rule under the covers and multiplies the probabilities. So basically it runs this computation several times. Of course it tries to cache as much as possible, including the await: if you try to look up the same animal on Wikipedia twice, it only does it once. But basically it runs a Monte Carlo simulation of this program.
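Here is a hedged Python sketch of the pettable-animal program under stated assumptions: the three components (`detect_animal`, `fetch_wikipedia`, `text_says_pettable`) are hypothetical stubs with made-up probabilities, asyncio coroutines stand in for futures, and inference is plain Monte Carlo: run the program many times, draw a value at each `sample` call, and count the answers. A small cache ensures each animal is looked up only once, mirroring the caching of the web call mentioned in the talk. This is not the implementation used at Facebook.

```python
import asyncio
import random
from collections import Counter
from typing import Dict, Hashable

# --- Hypothetical stand-ins for the three components; numbers are made up. ---

def detect_animal(picture: str) -> Dict[str, float]:
    """Image model stub: picture -> distribution over animal names."""
    return {"pig": 0.8, "crocodile": 0.2}


web_calls: Counter = Counter()  # how many real web calls were made per animal


async def fetch_wikipedia(animal: str) -> str:
    """Web service stub: returns an awaitable, standing in for a future."""
    web_calls[animal] += 1
    await asyncio.sleep(0)  # pretend network I/O
    return "friendly farm animal" if animal == "pig" else "large predatory reptile"


def text_says_pettable(text: str) -> Dict[bool, float]:
    """Text model stub: description -> distribution over 'safe to pet'."""
    return {True: 0.9, False: 0.1} if "friendly" in text else {True: 0.05, False: 0.95}


# --- The probabilistic program and a Monte Carlo runner. ---

def sample(dist: Dict[Hashable, float], rng: random.Random) -> Hashable:
    """The sample effect: draw one value according to its weight."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values])[0]


_cache: Dict[str, str] = {}


async def lookup(animal: str) -> str:
    # Each animal is fetched once, however many simulation runs ask for it.
    if animal not in _cache:
        _cache[animal] = await fetch_wikipedia(animal)
    return _cache[animal]


async def pettable(picture: str, rng: random.Random) -> bool:
    animal = sample(detect_animal(picture), rng)   # sample from the image model
    text = await lookup(animal)                    # await the web call
    return sample(text_says_pettable(text), rng)   # sample from the text model


async def estimate(picture: str, runs: int = 10_000, seed: int = 0) -> float:
    """Run the program many times and return the fraction of True answers."""
    rng = random.Random(seed)
    answers = [await pettable(picture, rng) for _ in range(runs)]
    return sum(answers) / runs


p = asyncio.run(estimate("photo.jpg"))
print(round(p, 2), dict(web_calls))
```

With these made-up numbers the exact answer is 0.8 × 0.9 + 0.2 × 0.05 = 0.73, so the estimate should land close to that, while each animal triggers only one web call.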
At Facebook we built this thing on top of PHP, which you may think is crazy, but again, that's what we do. And I think there was a paper from many years ago with a probabilistic programming language also in Scala; it's everywhere. I think this is what we need as the second thing for the future. The first thing was to use machine learning to define models, and then, in order to compose these models into actual code, we need probabilistic programming. So the programming of the future is neural nets plus probabilities.
I don't know how much time I have. Ah, I'm over time, but I want to show you one more small thing, because I'm addicted to duality: one more application of duality, to show how deeply and intimately connected, mathematically, probabilistic programming and neural nets are.
In these neural nets, what we're trying to do is minimise the loss function: we use the derivative to find the minimum of this loss function and use that to update our weights.
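As a baseline for the duality that follows, here is a minimal Python sketch of fitting y ≈ ax + b by gradient descent on the mean squared error: compute the derivative of the loss with respect to each weight and step downhill. The data, learning rate, and step count are made-up illustrative values.

```python
from typing import List, Tuple


def fit_line(data: List[Tuple[float, float]], lr: float = 0.01,
             steps: int = 5000) -> Tuple[float, float]:
    """Fit y ~ a*x + b by gradient descent on the mean squared error loss."""
    a, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of L(a, b) = (1/n) * sum((a*x + b - y)^2).
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in data) / n
        # Update the weights by stepping against the gradient.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


data = [(x, 8 * x + 1) for x in range(-3, 4)]  # noise-free points on y = 8x + 1
a, b = fit_line(data)
print(round(a, 3), round(b, 3))  # prints 8.0 1.0
```

The result is a single best pair (a, b); there is no notion of how uncertain those values are.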
What if, instead of minimising the loss function, we try to maximise a probability? Minimising one thing, maximising the other: this is another example of duality, so let's use that here. And as I said, you can look at probability distributions as little databases that we can query. So this is another way to write linear regression: not using machine learning, not training and updating weights, but using probability distributions.
What we do is guess a and b: we take a and b from some distribution, usually a Gaussian, so I just guess them at random. Then I define this function f(x) = ax + b. Then we have to model the noise, the error, the red things that were in the graph. And then I just plug in the training data and try to find all the a's and b's such that this error is minimised. And then I select... actually, this is a function that returns a probability distribution of functions, and if you Google for Bayesian linear regression, this is what you find. This is also really nice, right? I have a function that returns a probability distribution of functions; that's pretty mind-blowing.
Now, you might not believe me that this is actual executable code. If you go on the web, there's a language called WebPPL. If you write this program here, which looks very much like this one, and you run it, it will tell you that the value of a is about eight, but with this distribution. Where the machine-learned program would just give you one value, this one says: I'm uncertain about it, so here is the distribution.
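The WebPPL program itself is not reproduced in the transcript, so here is a Python stand-in under stated assumptions: Gaussian priors with standard deviation 10 on a and b, Gaussian noise with standard deviation 1, and a small made-up data set scattered around y = 8x + 1. Inference uses likelihood weighting (importance sampling): guess a and b from the prior, weight each guessed line by how well it explains the data, and keep the weighted lines as a distribution over functions. This is a deliberately simple, swapped-in technique; WebPPL provides several inference algorithms of its own.

```python
import math
import random
from typing import List, Tuple

Line = Tuple[float, float]            # (a, b) for f(x) = a*x + b
Posterior = List[Tuple[float, Line]]  # weighted samples: a distribution of functions


def bayesian_linear_regression(
    data: List[Tuple[float, float]],
    prior_sd: float = 10.0,
    noise_sd: float = 1.0,
    draws: int = 100_000,
    seed: int = 0,
) -> Posterior:
    """Likelihood weighting: guess (a, b) from a Gaussian prior, then weight each
    guessed line by how well it explains the data under Gaussian noise."""
    rng = random.Random(seed)
    lines: List[Line] = []
    log_weights: List[float] = []
    for _ in range(draws):
        a, b = rng.gauss(0.0, prior_sd), rng.gauss(0.0, prior_sd)  # guess a and b
        # Gaussian noise model in log space; constant terms cancel on normalising.
        log_w = -0.5 * sum(((y - (a * x + b)) / noise_sd) ** 2 for x, y in data)
        lines.append((a, b))
        log_weights.append(log_w)
    top = max(log_weights)  # subtract the maximum so exp() does not underflow to zero
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [(w / total, line) for w, line in zip(weights, lines)]


def mean_and_sd(posterior: Posterior, index: int) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a (index 0) or b (index 1)."""
    mean = sum(w * line[index] for w, line in posterior)
    var = sum(w * (line[index] - mean) ** 2 for w, line in posterior)
    return mean, math.sqrt(var)


# Made-up observations scattered around y = 8x + 1.
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
data = [(x, 8 * x + 1 + e) for x, e in zip(range(-3, 4), noise)]

posterior = bayesian_linear_regression(data)
a_mean, a_sd = mean_and_sd(posterior, 0)
print(f"a is about {a_mean:.2f} +/- {a_sd:.2f}")
```

Instead of one value for a, the result is a weighted collection of lines, so the spread of a can be read off directly.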
Now, I think we saw a couple of posters about the challenges of machine learning. For example, given the model, you might be able to extract the original training data, which might leak information, and there are many unsolved problems. But this is to be expected. For normal software, for imperative programs, we have had fifty years to solve a lot of these problems, but with these machine-learned things we still have a lot of open problems: how do we build them, how do we train them, how do you put a model under version control (really, you should put the data into version control), how do you debug these models, how do you understand what they're doing.
So I tried to paint a rosy picture, but this is a full employment theorem for us computer scientists, because all those problems are unsolved. That's why this little person here is lying on the floor crying. So, my last slide: the end is near. Soon computers will be able to program themselves without our help, but I hope that this snake won't bite itself and kill itself before that happens and we get the second AI winter. So far I'm optimistic, and I'm really looking forward to the future where we just pump data into this thing and our code comes out. That would be awesome, because then we as humans can spend more time drinking beer, hanging out, making music, whatever, and the machines do the work. Thank you so much.
Any questions? If not, then...
So, thank you for an inspiring talk.
I was interested in this idea that some percentage of the software infrastructure might be learned rather than written. So where are we now? Could you give some intuition, with concrete applications, for what we can expect to train, and what software functionality we expect to have to write manually?
Yes, that's a good question.
If you look at your mobile phone, for example: when you use the camera, it shows the faces in the picture as you shoot your photo. That is an example where the little square box that recognises the face is learned, but it's embedded into a traditional UI that shows it. Other examples are things like location. I was at a talk from Lyft: when you use Lyft or Uber and you ask for a ride, it will tell you the estimated time of arrival. Of course, if the estimated time of arrival says five minutes and it takes ten minutes, the customer is very unhappy. So that's an example where they use a model to predict the estimated time of arrival, but again it's embedded into a regular Android or iOS app. So I'm not sure that we're at fifty-fifty yet, but I think this is quickly increasing.
I also want to ask about probabilistic programming versus deterministic programming. My understanding is that they are not exactly alike, because in probabilistic programming the result is more like a set of possible outcomes: sometimes an outcome happens, but it may not happen at all. It's not simply a mistake; in the probabilistic case these outcomes should all be based on your data, and they may or may not happen. Is that okay?
Let me go back to this one here. Here's an example where you compute the parameters of this model using probabilistic programming: it gives you this distribution. It tells you that the value of a, in ax + b, is about eight, but you're not a hundred percent certain. Whereas in the other case, with deep learning, you try to minimise the error, so you take the value with the maximum probability. That's what people often do: they pick the value with the highest probability when they want to go from a probabilistic model to a deterministic model. But this is a very simple distribution; you might have distributions with two values that have the same probability, and then how do you pick? I think it's not that much different from any monad where you need an unsafe operation to get out of the monad. If you're in the IO monad or the state monad, you need an unsafe way to get out, and really, going out of the probability monad is unsafe too. That would be my intuition for that.
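The "unsafe" way out of the probability monad can be sketched in Python as taking the mode of a finite distribution and refusing to choose when several values tie, which is exactly the ambiguity raised in this answer. The names `most_likely` and `unsafe_collapse` are hypothetical, and raising an error on ties is one design choice among several (others pick arbitrarily or return every mode).

```python
from fractions import Fraction
from typing import Dict, Hashable, List


def most_likely(dist: Dict[Hashable, Fraction]) -> List[Hashable]:
    """Every value that shares the highest probability (the mode or modes)."""
    best = max(dist.values())
    return [value for value, p in dist.items() if p == best]


def unsafe_collapse(dist: Dict[Hashable, Fraction]) -> Hashable:
    """Leave the probability monad by committing to a single value.
    Raises instead of silently picking when the top probability is shared."""
    modes = most_likely(dist)
    if len(modes) > 1:
        raise ValueError(f"ambiguous: {modes} all have probability {dist[modes[0]]}")
    return modes[0]


print(unsafe_collapse({"pig": Fraction(8, 10), "Erik Meijer": Fraction(2, 10)}))  # pig
try:
    unsafe_collapse({True: Fraction(1, 2), False: Fraction(1, 2)})
except ValueError as err:
    print(err)
```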
Okay, I have a question right here. At the present moment we have some concern about not understanding why the models are making the decisions they make. So with the approach you're suggesting, or even for the research on the model-building process, I wonder: is that going to help, or actually hurt, in trying to understand what they're doing?
Oh, excellent question. Again, I just want to point here at this little person, who is in tears and in a lot of agony. Understanding why these things happen is a big open problem, I think. Some people are saying maybe we should use different models, decision trees or something, because they are explainable. I'm not sure that's the case; decision trees are just giant, long nested conditionals. People are trying other ways too, where you can look inside at what happens in between the layers and so on. But yes, this is a big open problem, and this is what I meant with this slide: hopefully this thing doesn't bite itself and accidentally kill itself because of these problems. On the other hand, and maybe this is a non-answer, if we build a really complicated distributed system, we also have a really hard time understanding how it works, or debugging it. So in that sense, and I'm not trying to blow you off, I do think that for sufficiently complex imperative code you will also have a hard time understanding exactly what it does. If I asked you to explain the Linux kernel to me, that would be hard, or the BIOS or the boot sector; you would have a hard time doing that just by reading through it.

Machine Learning: Alchemy for the Modern Computer Scientist
Erik Meijer, Facebook
7 June 2018 · 2:29 p.m.
