Embed code
Note: this content has been automatically generated.
00:00:00
fighting me um so yeah so i faced look i i'm
00:00:05
a runner group that does three things uh one
00:00:08
is a programming languages uh for data size and machine learning that's what i will talk about today
00:00:14
um and then we do to other things were using machine learning to make our developers more productive
00:00:20
and the third thing we do is we use machine learning to make
00:00:23
our systems more efficient and depending on my uh speaking speed
00:00:28
uh okay like you know might and be able to let me just put my um
00:00:38
yeah timer there so i i might have some time for some extra sites
00:00:43
um so here's what i've been cut like i see myself so
00:00:47
we saw a great presentation this morning about that formal methods
00:00:52
um but the uh current be kept in me was thinking oh oh the first
00:00:56
example was from tony whore in nineteen seventy one about h. search algorithm
00:01:02
and the second example was in twenty eighteen about a search algorithm and
00:01:08
so is this still the case that after so many decades of programming
00:01:14
language research formal methods that we still have trouble to ride got
00:01:20
and now a search method is something that's got really nice you know exactly what you want to
00:01:25
do you have like you know some values and you want to find a particular one
00:01:29
but what about more interesting problems like face recognition that we all have on our phones
00:01:36
or spam filtering and that we all rely on how
00:01:40
would you guys specify those improve done correct and
00:01:45
and so what i want to go to talk to you
00:01:46
about today is and maybe another way of programming were
00:01:52
read don't to use intentional specifications intentions messages in the
00:01:56
sense of a formula that describes what we want
00:01:59
but we're really are really easy that was also like in the second order like
00:02:04
programmer should be lazy so what if we just like you know try to get
00:02:08
like use examples and use that to kind of writer code for us
00:02:12
i think that would make this grumpy get even happier with
00:02:17
now here's i think the root of the problem the root of the problem
00:02:22
is that the abstractions that we use for programming are the
00:02:26
wrong once and and we always have this guy
00:02:31
like problem that's the abstractions that we want and the
00:02:34
abstractions that we have are far apart and
00:02:39
and we keep struggling cat like to move that stone up the wall and then like you know once we got like
00:02:45
up the the close up big i've held and the the
00:02:48
stone runs back and who here uh uses java script
00:02:53
ah not as many people as i thought but i get javascript this
00:02:57
is really bad right every week there's a new javascript framework
00:03:01
where you have to get like you know you just mastered it and then you know that the next one comes and then you have to and
00:03:09
re learn it now of course don't your didn't only guy like have this first example
00:03:14
of a formal verification he was also a very early person that identify this problem
00:03:20
in nineteen seventy two where he got like describe this famous diagram
00:03:24
of abstract data types where there's this functional the top a do a f. from a to b.
00:03:30
that's the thing you want but you have to implement it
00:03:33
in and uh uh and a more concrete domain
00:03:39
from x. to y. and so how to make a dire ram commute that is really the got like
00:03:44
what computer sciences are all about and that is where we spend a lot of our energy
00:03:51
and but as i said like are abstractions are
00:03:55
bad so the way we uh write code
00:03:59
is in terms of mutable actually labelled graph so if you would get like inspect the heap
00:04:04
of a java a house or star are or javascript program
00:04:08
what's the what's the data represents it's a new double edged labelled grass
00:04:15
um and then we're trying to write functions uh but they're
00:04:19
not really functions right these are also side effecting
00:04:23
uh functions that allow arbitrary nesting so mathematically to get
00:04:28
a get a grasp of that structure is really
00:04:31
easy and i think we shot ourselves in the foot by cab like you know starting with this
00:04:38
basically uh the machines that we know how to build are good and one thing and one thing
00:04:44
only they can move data around from one address in memory to another address in memory
00:04:52
and if you google for one instruction machine you will see that you know or this is actually
00:04:58
all you need to you need one instruction namely move data from one side to the other
00:05:03
and and you can do all of computing with that which by itself is pretty remarkable and
00:05:11
but the thing i second your that is ultimately get like it or relying on mutation
00:05:18
right because we're moving some some data from this memory location to that memory locations over writing
00:05:24
the other one so everything and you you can get like you know
00:05:28
you need only this one instruction but it's easy to emulate more
00:05:32
complicated instructions on top of the the the the essence of computing that
00:05:37
we have like the essence of the following my machine is
00:05:41
mutation by moving things from one addressed to the other
00:05:45
and that is the thing that really that we are fighting
00:05:50
and and here's the thing you know the the the the princess and the pea
00:05:55
and so dippy that's the i coloured from alignments b. and then we can get
00:06:00
like in our add layers and layers of abstraction on top of this
00:06:04
and like you know and the princess but the we still feel that darn
00:06:11
be right no matter how many layers of abstraction we put on top of our our underlying machines
00:06:18
you still feel it and in some sense you have to feel it this is called mechanical sympathy so we
00:06:24
cannot pretends to say oh we're doing declarative programming because
00:06:28
ultimately if you want to make your code fast
00:06:31
you have to understand that the thing underneath is a physical machine
00:06:36
and it's a physical machine that can move things from one member location
00:06:39
to the other what's even worse is that it's observer ball that's
00:06:44
moving from one memory location to another might be slower
00:06:48
than moving from some other member location to another
00:06:51
one because there's a whole grab like cash hierarchy so anyone that really was a right efficient goat
00:06:58
we'll have to deal with the underlying um concrete mess of the machines and
00:07:04
no layer of abstraction will help you to get like remove that right
00:07:09
and and this is the pain that i've been trying to get sold my
00:07:12
whole life and i think i must say maybe i've given up
00:07:18
i don't think i can solve this pain anymore in the traditional sense and
00:07:24
i don't think that like you know adding more layers of objectionable help i don't think
00:07:29
adding more formal methods will help we have to kind of go to it very different for
00:07:34
of computation with no that's look at another form of computational
00:07:41
another computational model that has been extremely extremely successful
00:07:46
and that's relational databases yeah i'm pretty sure that and maybe not that many people use javascript but a
00:07:53
lot of people use relational databases was so beautiful
00:07:57
about relational databases is that it will averages
00:08:02
really beautiful and simple mathematics
00:08:06
alright so it's relational algebra is very old
00:08:11
and that caught in nineteen seventy
00:08:14
say hey i see all this got mathematics is like you know i could i've said second take the
00:08:20
union of said second august simple operations who i can build like you know algebra on top
00:08:27
and certainly this thing turns into a multi billion dollar industry and and
00:08:31
you know a lot of our economy runs on it and
00:08:36
but here's the thing the reason it databases are shoes so successful is that
00:08:42
it started with a clean and this is my personal opinion
00:08:46
and it started with a very clean mathematical abstraction
00:08:50
and that allows you know you to build query up the miser send things like that
00:08:55
plus that the the implementation maps quite closely to the underlying harder and the
00:09:01
reason is the tables are not complicated they don't have nice things
00:09:05
and they're just flat things of base types um and still very powerful
00:09:12
now here's the problem as everyone that has got like ever
00:09:15
try to use databases from regular programming languages noticed
00:09:20
the data model for database is very different from these you double edged labelled graphs
00:09:27
and that our programs are normal imperative programs eleven
00:09:32
and all ready in um with the early eighties um dave
00:09:37
meyer got like you know i notice notice that
00:09:41
any goals that the impedance relational object relational impedance mismatch
00:09:46
and am i think this is still a a
00:09:49
people still still write papers about it and
00:09:53
but the right there all our map for big i've like design you type
00:09:57
systems to make these all the useful so again this is another
00:10:01
instance of this problem where there's the abstraction that we want to
00:10:05
which is a relational database there's the abstraction that we have
00:10:10
you double edged labelled graphs and we have to got like you know bridge that gap
00:10:14
and but here's the thing it's the mathematics that you cannot beat right
00:10:19
the fact that this is based on really simple beautiful mathematics
00:10:24
is what makes these databases unbeatable and i know i've the
00:10:28
scars on my back the bullet holes on my back
00:10:31
because i to write right at one point when i was young eric
00:10:36
the goal of my life was to eliminate sequel from this planet
00:10:41
okay i wanted to get rid of sequel and and so that's when i did they think uh
00:10:47
and everybody else was was doing no sequel i don't have anybody you're even remembers no sequel
00:10:54
then i showed that most the goal was to make the the categorical do all sequels i try to collect go sequel
00:11:01
but didn't work at all so now everybody's talking about new sequel
00:11:06
and even if you look at things such park who started out like you know it
00:11:10
flat map and all those great things now the preferred way to programme
00:11:15
spark is using data frames and are in a relational way
00:11:20
and so i've given up on trying to get like beads sequel and get rid of it
00:11:27
and i just looked back and said what is it that makes it was so powerful
00:11:31
and so um versatile and that must be the mathematics you you cannot be mathematics
00:11:37
and and so or if you cannot be that join it
00:11:42
so that's got like you know so that's use mathematics wherever possible
00:11:48
and and of course you should always gotta look too abstract from the mathematics so
00:11:56
and if you look at the relational algebra and it's a beautiful mathematical theory but the question
00:12:03
i always ask myself but i think everybody here should ask themselves
00:12:07
is is that an instance of a more general concept
00:12:11
um because maybe that more general concept as several instances
00:12:17
that you can use or maybe it's easier
00:12:21
or more powerful to implement that more general concept and that is true you know like
00:12:27
in the everyday shells rising instance of of applies league a category
00:12:32
um in in and category theory and now again
00:12:38
this is where i think the the simplicity of the regular like a relational algebra
00:12:43
a warm because people when they hear category teary they
00:12:47
got like a round screening and but i
00:12:50
still think that you know that it's really good to get like whenever you see something
00:12:55
try to abstract and see got like is this an instance of something more general
00:13:00
and and i think that is also what martin has been doing with dorothy got like you
00:13:04
know always trying to find there's in essence always say you can always peel more layers
00:13:09
from the onion and as you peel it later you start to cry because that's
00:13:13
what happens when you can handle onions but it's crying for good cost
00:13:19
alright so that's where we are today it's a very grim picture and
00:13:28
we are stock with this got like weird computational model of imperative computations
00:13:34
over these weird mutable graphs and and what happens is that
00:13:40
i think because we don't find a way forward in this thing because we're
00:13:44
limited by the they got like you know our our actual hardware
00:13:49
and what you see is that people got like the c. d.'s explosion of programming languages and
00:13:54
programming languages and frameworks i should say so it's like you know for the web
00:13:58
uh there's javascript with like hundreds of of frameworks every week there's a new wall
00:14:04
and if you look for more while there's also like you know free berkshire operating systems
00:14:10
uh languages that you know every day there's a gap like you know new wall
00:14:14
on the server side yeah we were talking over over the break about frost
00:14:19
well about this ross that's trying to get put the diaper lipstick on the cat like impaired this
00:14:25
big and which is great but that's not bring
00:14:29
a progress right we're not fundamentally changing something
00:14:34
so i think the world of software so for one point or is doomed
00:14:40
or do so that's why i like the title of of this data future software
00:14:44
the future software is not and what you see on the screen here the future software something different we have
00:14:51
tried for fifty sixty years to make this work we cannot make it work i'm convinced we cannot
00:14:59
so was there where their software to bordeaux arise because after one comes to an
00:15:06
and and so for two point oh there's no humans involved there's no a formal specifications involved
00:15:14
it's great that doesn't that sound great no humans no formal
00:15:19
specifications just data you just throw data at this thing
00:15:23
and it will train itself um and it will work and
00:15:29
now let's see if that can you know if that's just a fairy tale or not because we
00:15:35
just saw that like regular program is no fairy tale but maybe this is a very clear
00:15:42
so here's the thing this is what i'm going to try to get like convince you about uh today
00:15:47
and we know our databases database is a is a magic gadget where you give it codes
00:15:56
some people colour the query but really that's codes right you give it code it hands you back data
00:16:03
not a one trick the one mathematical trick that i used my whole career
00:16:07
and that serve me well it's duality worries flip things around so can we do the opposite
00:16:12
can be building magic device where we give it data and it gives it back got
00:16:21
would would that be amazing well and since it's just to do well
00:16:27
it's you work and because if we can do the thing
00:16:31
on the left we should be able to the thing on the right correct so that's got like programming two point oh
00:16:37
so we're going to get like build this magic device that thursday tack into got
00:16:44
and so programming one point o. was humans turning coffee into gold
00:16:50
um programming to bordeaux is machines that their data into
00:16:56
models of course people use fancy names is like queries but
00:16:59
model is just like it or is just regular code
00:17:04
and i should also got like ad is not just coffee right like
00:17:08
all the grad students here's like probably pizza more than coffee
00:17:12
and now didn't mention this briefly in his introduction
00:17:18
this morning there is a big difference between
00:17:21
the old world of code and a new world of model so maybe it's a good
00:17:26
the thing to get like it or distinguish between these two these two terms a coat
00:17:32
is deterministic and discrete when you have a boolean it's
00:17:36
discrete it has two values true or false
00:17:41
well that's that's very restrictive if you look at your spam
00:17:45
filter there something is span you know but maybe it's
00:17:50
not really span is kinda like a little that spam or a lot of spam so it's a boolean
00:17:55
but it's it's like you know it's true would like you
00:17:57
know eighty five percent and then falls fifteen percent
00:18:02
right and or even like you know it can be you know some range it
00:18:08
can be get like a are and i don't know the the um
00:18:15
how good a a sauna yes that's got like some number between zero and a hundred
00:18:22
um but you know it can be anything in between and like with as small as possible and
00:18:27
it has some kind of distribution so the big difference between go down baubles is that
00:18:33
models are uncertain have uncertainty and them and they're often continuous
00:18:38
and and so those are two things that we have really bands from our traditional programming
00:18:43
languages for a long time and a lot of developers are afraid of uncertainty
00:18:49
um because uncertainly means sloppiness now you didn't you see me
00:18:53
right sick you know i'm i love sloppiness i mean
00:18:57
i that's they got like you know the essence of happiness is
00:19:01
is like sloppiness and like you know to be free
00:19:04
and you know our fell use want to be free they don't want to be discreet they want to get take
00:19:09
you know different values with some probability and but here's the best thing
00:19:16
there's a lot of all really engine mad from the
00:19:20
sixty for some seventeen hundreds and eighteen hundreds that
00:19:24
we can use to make this all work and that makes me really exciting right because remember
00:19:29
you cannot beat maths so we're going to use this all mad from that you know
00:19:35
calculus off that was invented in sixteen hundred and bayes rule
00:19:40
and from seventeen sixty or something ever going to use that to build the new a wave of program
00:19:48
so and i'm going to start with supervised learning and and here's what
00:19:53
we want to do with supervised learning we want to use
00:19:56
again mathematics simple and all the mathematics i don't want to invent
00:20:02
new mathematics like we had to do for computer science
00:20:06
like mathematicians the old mathematicians i don't know what it was what they had in the
00:20:10
water or i don't know maybe it was the the fungus in their grain
00:20:14
but they were much smarter than than us these days right so i want to use
00:20:19
ancient mathematics and and i want to get use that to get like you know to
00:20:25
during those functions by looking at the difference of the training data
00:20:30
and the value that this function computes and they use that dealt
00:20:33
that okay i've tried to minimise that and do that by
00:20:37
minimising get by kind of updating mutating that function so i'm
00:20:42
still do invitation right so i'm still doing comparative stuff
00:20:47
but i'm only doing that when i'm training dysfunction when i'm shaping dysfunction think of that function as
00:20:52
being made of clay or something so i've got like holding that function but once it's done
00:20:59
it's a pure function so i also nicely separated big of imperative
00:21:04
parts and you have like you know i'll be and
00:21:09
purely functional part i know that's the very very simple example because i want to show you that this is really
00:21:17
to reveal math that everybody here in kindergarten has done this
00:21:21
matt okay so that's much easier than three opposed conditions
00:21:26
and separation logic and more triples and whatever complicated mathematics
00:21:32
reached a weekly i don't know lots like we have to gather invent all these artificial mathematics no
00:21:38
for machine learning we can use really simple that next so let's look at
00:21:42
the nerd redirection and is probably in physics class you don't some measurements
00:21:47
like speed versus distance or whatever as you get a whole bunch of wines and and what
00:21:53
you have to do is you have to to find a line through those points
00:21:58
so if this function there is defined by y. equals a. x. plus b.
00:22:05
what i need to find this a and b. because those are the two values that determined
00:22:10
at that line you see that read stuff there that's the error because you get these points are not aligned on this line
00:22:18
and and so what this magic device is going to do you feed at this point
00:22:24
and it will tell you a and b. and now you have learned this function
00:22:29
isn't that great and no programming required or maybe only wants to get built that magic device
00:22:36
and and some people say you know this is just curve fitting i'm happy with that
00:22:42
i'm happy i i'll take care of it they got any day of the week
00:22:45
over you know whatever proving to your rooms are fighting with the type system just get weaker fitting
00:22:53
so oh and the trick that this thing uses is something
00:22:59
called back propagation and back propagation relies on taking derivatives
00:23:05
but before that that's got like look a little bit at but like you know how these functions it look like
00:23:11
so in order to do uh and back propagation our functions need to be differential well
00:23:18
and in order for function to death be differential it has to
00:23:21
be a function that takes real numbers and returns real numbers
00:23:25
i don't know if you remember from calculus you know the functions have to
00:23:28
be continuously capsule delta i you know the you might remember those definitions
00:23:34
and so all these functions that we ride our functions that take a a
00:23:39
topple and and double of real numbers i return and double of of real numbers
00:23:44
and that function is defined as a compositional smaller functions
00:23:50
now and feed the chain rule we know that but
00:23:56
two functions are different trouble the competition is also differential so that works out nicely
00:24:01
um and then you can define the error function that you there's many ways to do that
00:24:07
but notice here that we are yet just like in the um case of relational databases
00:24:13
we do have a beautiful mathematical theory but the functions here are functions that take
00:24:20
doubles of real numbers where are imperative goats takes grass
00:24:28
so also for machine learning we have this abstraction problem where we have to find
00:24:33
and coders and decoders that go from like you know our domain objects into
00:24:38
they get like you know the the values that the machine or an elder the pentagon
00:24:42
i'm pretty sure that the the p. h. d. students here that do machine learning
00:24:46
and they they know how hard it is to go find efficient encoding so off like you
00:24:52
know say to reduce into these got like doubles of of real number should is it
00:24:57
an unsolved problem how do you officially encodes values um such that you can feed into
00:25:03
machine another thing but anyway what we want to do is given this training data
00:25:09
and we want to find a and b. that minimise the error and the air is defined as this song
00:25:15
off they get like you know the actual value minus they get like value
00:25:19
of this uh function applied to your to your uh training data
00:25:26
um this thing uses the derivatives of the staff
00:25:31
you don't have to give understand this i'm just writing this down the way i do these derivative right away
00:25:36
i go to will from all five i type in the function and will from awful tell me the derivative so it's even there you can
00:25:42
be very lazy and and then in order to train this what you do is you guys start with a and b. read them
00:25:50
and then you go for each have value and be a your
00:25:53
uh your training set you just got like you know um
00:25:58
updates the weights or a and b. using this format i i
00:26:02
you just repeat until you run out of your training data
00:26:06
isn't that beautiful so i'm using mutation and i just run this thing
00:26:11
i'm you they these values a and b. until i find them
00:26:15
and then i give them to you and then from then on you have a pure function that just computes
00:26:19
why from x. based on these a and b. a.
00:26:24
okay so now you might ask yourself eric you hands waved your waiter days
00:26:29
how does this all work and in a little bit more detail so that they got like if you
00:26:34
some details here which i think it's very beautiful that um
00:26:40
so why do we use derivatives well as i said we want to minimise this error function
00:26:45
and maybe you remember from high school that you know you can use derivatives to
00:26:50
find the minimum of a function so if you have this project function here
00:26:54
and you want want to find it men and then you gotta look where the derivative is zero right that's a good way to find it
00:27:01
um and here's the got like you know the derivative you learn
00:27:05
this would symbolic differentiation or you can define the derivative
00:27:09
using this thing with actually longer epsilon goes to zero and so you should get like all
00:27:15
remember that from my school correct now the question is how do you find that derivative
00:27:21
and how do we find that inefficient way and this is truly
00:27:27
stunning this the way you do days is is amazing
00:27:31
and and how do we do that by using more mats of course
00:27:37
okay and this is the trick people here remember complex numbers
00:27:45
ideas complex numbers if you do physics or electronics or computer graphics
00:27:51
saw a cat like complex number is some mathematical
00:27:56
thingy that's somebody invented this as o. a. plus b. i. so
00:27:59
it's a pair of two numbers a and b. where
00:28:05
i square it was minus one
00:28:08
you can ask yourself why is i square minus one well
00:28:13
why not right and so now that's to ask another question what if we
00:28:19
didn't say trick we define a pair of numbers a and b.
00:28:25
and just a guess disintegrate we're not calling it i recalling actually long for they got second one
00:28:31
and then we say abstinence where it was zero why not right i mean
00:28:39
we can we can define our rules you can also ask yourself
00:28:43
what about if we do the same thing every instead of like i
00:28:46
and actually take j. i. we say j. squared equals one
00:28:50
good choice to the only great numbers or minus one zero i want all three choices
00:28:56
strangely enough have practical value city otherwise cold hyperbolic numbers you
00:29:02
can go to lead and people find uses for it
00:29:05
but this is the amazing thing so we take complex numbers that we've no for
00:29:08
for how long instead of i square kick was minus one we say
00:29:14
epsilon squared equals zero and what do we get from theirs for because we get
00:29:20
out the metric differentiation but just using dual numbers instead of normal numbers
00:29:26
we can compute in one go if a function and its derivative
00:29:32
it's just mind blowing and now if you want to prove of this it uses taylor expansion
00:29:39
but i think if you look at if you if you go to the
00:29:42
big database for taylor expansion that breed maybe more might blow and
00:29:45
i don't know how somebody can come up with the calf taylor expansion
00:29:48
formula and that's got like just crazy with like you know and
00:29:53
factorial is in some infinite summation but anyway if you want to prove this
00:29:58
you can use these are numbers so that's look at an example
00:30:01
right so here's our function f. that was defined as
00:30:06
an three x. squared plus four that's our ass
00:30:12
and that feed it and again like the reason i like mathematics is my brain is the size of a pea not
00:30:18
and i like to just crank the handle right i just like to do these
00:30:21
computations without thinking so that's his feet as this a plus b. epsilon
00:30:28
so x. equals a. plus b. f.'s also that becomes to re a. plus b. epsilon squared plus four
00:30:34
right just substitute now i'm going to go out like you
00:30:38
know um evaluate a plus b. apps lawn so a
00:30:42
plus b. squared is is squared plus to a. b. plus b. squared so that's what you see there
00:30:47
um no i distribute that three over does some and and a simplified little bit
00:30:55
and now we knew that um the derivative of have was um
00:31:01
and what was it it's a six a and a right or six x.
00:31:08
but here you see the the first term here three a squared plus four that's half of a
00:31:13
six eight that's the derivative of ever apply to eight times be upset on
00:31:18
and then hey because epsilon squared was zero that last year falls out
00:31:24
so if i apply this function to a dual number and i just compute
00:31:29
with it i get back a pair of the function apply to the
00:31:34
argument and the derivative and multiplied by the rest of the thing
00:31:40
isn't that pure magic i mean you won't believe me right
00:31:43
but this works for any function any differential function
00:31:48
and you know this is how you get now
00:31:50
implement uh and computing derivatives and using
00:31:56
operator overloading because we just overload all the operators would this be these dual numbers
00:32:04
and now what's even more beautiful
00:32:09
is that if i can't like the finder function that lives a normal function to a function over dual numbers
00:32:15
that function as a front door and now what does that mean
00:32:19
a phone to relevant or something that distributes over composition and
00:32:23
maintains the identities is like a crazy thing from category theory but if something is a font or it means that it's
00:32:30
it's good it's it's due to the end right so this thing is good and if you remember the chain rule
00:32:37
the chain rule for differentiation is really ugly but the chain role for
00:32:41
these dual numbers is really beautiful because it's just distribution over composition
00:32:47
so these dual numbers are my new best friends and i do it so oh
00:32:55
we can build this magic machine that takes data i and turns it into coat
00:33:01
by giving this think training data we separate the output in the input we pass the input
00:33:07
to this got like you know composition of differential functions
00:33:12
uh we got like a have the parameters to this you can function a. m. b.
00:33:16
then we compare the gap like you know the idea and
00:33:22
derivative of the error function and based on that we update and
00:33:27
the the uh parameters and we just repeat that a long time so
00:33:31
really the only thing that we need here is functions that are differential
00:33:36
and we know how to do differentiation using dual numbers
00:33:41
beautiful easy i know all want the fires no first order logic just simple high
00:33:48
school mathematics that's got like you know the kind of stuff that i like
00:33:52
and now what about neural networks well i don't i think neural
00:33:57
networks are a little bit got like the name is hyped
00:34:01
because it's got supposedly go like modelled on whatever your owns a
00:34:06
we have in our brain but i think the really
00:34:09
it in the neural network the function this composition of differential functions is
00:34:14
of the special form or even activation function and they you do take
00:34:18
the product of the some of the weights with the input
00:34:23
and really why people do this i think is because it involves a lot of multiplication zen additions
00:34:29
and we know in our graphics cards we got got like to that
00:34:33
very efficiently so i think the whole reason that deep learning
00:34:38
and neural nets is is popular or or that we pick these functions
00:34:43
it's because of should g. p. use for me has nothing to do we
00:34:46
don't have like matrix multiplication in our school and but it works
00:34:53
alright so what is differential programming well different about programming is
00:34:59
anything where you write the program using differential functions where these differential functions
00:35:04
don't have to be that have layers of a neural net
00:35:07
but can be more complicated one of the uh presentations the posters used to three l. s.
00:35:13
d. m.s i don't know who that was is that person still in the room
00:35:17
yes so that is a form of like you know in more complicated a differential
00:35:22
function good right so your this is the guy you should talk to him
00:35:27
he is the future he's do differential programming and he's doing get over more complicated structures
00:35:32
then get like just and the arrays of doubles he's doing it over trees and
00:35:40
the job so what's the future of programming number one this use differential programming to define models
00:35:47
okay good and now as i said always ask yourself
00:35:54
can you extract a you make this better well
00:35:58
people these days use light on and a lot of that
00:36:01
by dies and typed and and and strings and whatever
00:36:06
so i think even for the people that want to stay in the old world
00:36:09
of programming languages there are still a lot of stuff to do here
00:36:12
you got like you know make sure that we don't get stock in the world of quite on and and
00:36:17
a dog world and and the other thing i should say is that really
00:36:24
the this world of programming to point those are all the functional programming
00:36:28
and so people like martin has been we've been telling everybody here like his whole life you know functional
00:36:34
programming for you thought about artists hike that the functional programming that's well functional program as well
00:36:41
except it's called differential program and uh alright and so there's like a lot of papers that
00:36:48
you can get like if you have the slide you can use this e. r.
00:36:52
they're sick even this this is that you know from x. a. e. p. f. l. the
00:36:55
ark i think it is easy to use is here all done in skyline and alright
00:37:03
so i think i i pushed a little bit under the carpet that these models are got like it only approximate the function
00:37:11
and but of course this is true for all functions right i said that in the
00:37:15
beginning if even if we ride in a it's got a function or something
00:37:19
it's not the real mathematical function it's something that has side effects and these models
00:37:25
in that um if you are also functions that uh have side effects
00:37:30
and accept that this side effect is represented by a probability distribution
00:37:38
so oh we so from cars and now we see more that's right
00:37:43
well there's a talk about the future of programming without more nets
00:37:47
and so this probability distribution thing happens to be about that
00:37:52
or happens to be i don't know i think this is what like you know whoever got intended it to be right
00:37:57
if it's good then it's about that and before can be more now that has to be a front or and
00:38:04
so what is the probability monad think of it as like a little database um of values
00:38:10
of type a a for each of these values has a probability associated with it
00:38:16
okay so that's that's that my intuition for a probability distribution it's like a database
00:38:22
but each role of the database has some value some weight with it
00:38:27
and then you have to always have we have a more that you have to have
00:38:30
a bind um but to be um said decisions goal that um conditional distributions so
00:38:37
his me which pair brackets be a bar a means the probability of be given a
00:38:45
but that's exactly function right function is like something a function from
00:38:48
a to b. you can say that's a be given a
00:38:53
so a conditional distribution don't get fooled is just a function
00:38:57
that returns a probability distribution with and this thing here
00:39:03
is the most beautiful theorem i think ever invented
00:39:07
and this is from seventy sixty three this
00:39:10
you know magician base was the world's first functional programmer he must have been alien
00:39:17
because this theorem tells you how to take a function there on the top
00:39:22
so it's a function from eight to a probability distribution of b.
00:39:27
and turn that into a function that given a be readership robot edition of a
00:39:33
so the bayes theorem tells you how to invert a magnetic function
00:39:39
and in that it's nobody knew mon x. nobody knew functional programming but there's no it
00:39:45
yeah i came up with this remarkable theorem that shows how you can convert a magnetic function
00:39:52
and a lot of inference is based on bayes rule where you can like read in for this function it's amazing so this
00:39:59
this uh and all of this um with probability distributions you
00:40:04
can do a lot of statistics using bayes rule
00:40:07
and as i said like a probability distribution is a little bit like a database so let's look at
00:40:12
and where where this simple query language so let's look at a probability distribution over playing cards
00:40:18
right so i can say and what is the probability of a card or the rank is queen
00:40:26
you see there's a a a predicate in there and this gives me that the
00:40:30
value of that thing is i get off for out of fifty four
00:40:34
so think of this thing really i say dippy is a database and i
00:40:38
stick in a query and that gives me the answer is this value
00:40:43
or i can say give me the probability of the fact that
00:40:47
this thing is the heart given that the rank is we
00:40:53
well there's four queens there's one queen of hearts or this is like you know one out of four
00:40:59
and then i can say give me the probability that something is the queen and
00:41:04
a hard and that's got like you know there's only one out of the fifty four
00:41:08
so really don't get intimidated by this probability distributions think of these as databases
00:41:14
with a very simple query language and you will be fine no
00:41:18
i think a lot of the literature on statistics is is
00:41:21
overly complicated if you see it in a computer science way
00:41:26
is really simple i read years another got like example
00:41:32
but that's good like you know i want to do close off by showing you how we can use
00:41:39
um machines aren't models to write got so there's a famous quote by jeff the from google that
00:41:44
says like if google would have written from scratch today half of it would be learnt
00:41:50
but what that means is that half of it would be done with these got beautiful machine lowered algorithms
00:41:56
but still half would be that whole pesky imperative got
00:42:02
so well i've not solve your problems yet but you still have to get half of
00:42:07
your code will still be imperative goes i'm giving here example how that will work
00:42:12
say i want to build this killer apps i heard that like you know the the apps
00:42:17
stores like whatever hundred billion dollars of revenue i want to get get in there
00:42:21
so i'm going to write that have that will do the following you take of snap
00:42:27
a picture of an animal and it will tell you whether this animal respectable
00:42:32
okay buddy again petted and of course if you have a a meat eating plant i don't know if that's an animal that
00:42:39
it looks fishes you actually i wouldn't bet it or if you have a crocodile i wouldn't bet it
00:42:45
but if you have like a chick let you get back to it okay so this is the
00:42:48
apt that we're going to build that this is going to go make me filthy rich
00:42:53
and that the average of the seller uh the average well there will be a hundred million dollars here that
00:42:59
is apt will do that for me uh for us i should say because uh our average goes up
00:43:06
and so the first thing is is is what i do is i train a neural net to take in the picture and it
00:43:12
will give me a description of the animal so i feed in a picture of a big and it will say big
00:43:19
with a probability point eight eric meyer with probability
00:43:23
point two or something like that and
00:43:27
so that's the animal detector now i need to have a text and allies are which is
00:43:31
a different model which i give it a a and a description of the animal
00:43:37
and from that it will learn or it will understand because it can
00:43:41
do natural language processing better this animal is safe to pat
00:43:45
and then of course i need to get like you know or have a way where i can get from the description of an animal
00:43:50
to the text and how do i do that well we're using a web servers goal
00:43:55
to be could be yeah but that web service call will return a future
00:44:01
right because it's small go all over the web is also now we have a program
00:44:05
that requires probability distributions and futures so in order to know well there's together
00:44:13
i think a picture i busted or my a. c. and and i get the probability distribution animal
00:44:19
i want to push that animal and today we could be their web servers goal but hopes the types don't match
00:44:25
i get the type error there and then i get a future attacks that i want to put in this other
00:44:31
and the text um natural language thing but again it is a future and i need text
00:44:39
so how do i do this how can i write code that kind of combines
00:44:44
machines or models bit back service calls and and turn those into apps
00:44:51
so for that we need a second thing and that's cold
00:44:54
probabilistic programming so progress to program and is old fashioned
00:44:59
imperative programming except where we had probability distributions as
00:45:05
first class a factor first large values
00:45:08
so maybe some of you know that there's this gal library for
00:45:12
that but javascript to support for that darkness aboard for that's
00:45:15
the sharpness of or that for a single weight if you have a future you can get like a way to future
00:45:21
so probabilistic programming languages nothing else where you can sample from a distribution so
00:45:26
it it takes these probability distributions and turns them into first class affects
00:45:32
which means that we can now write code that does this
00:45:37
so here uh with this code look like and so what this thing readers
00:45:42
is a future of a probability the distribution of that um and then
00:45:48
we will still have to kind of collapse that but that's got like you know a separate problem
00:45:52
and so what we can do is we can say given the picture we feed identity
00:45:58
and they're all that we get back a probability distribution we
00:46:02
sample from the distribution which gives us an animal
00:46:05
replies that into we could be the ah we await that value to you getting value from the future
00:46:12
and then we take that text box and the other model which gives the probability distribution and then we
00:46:17
can like that tells us whether there's things but the ball and the implementation of this and um
00:46:26
of the sample and thinks that what it does it uses base rules under the covers
00:46:32
and it multiplies the probability so basically it got runs this
00:46:35
computation several times of course it tries to and
00:46:39
cash the um as much as possible including this uh wait if you're gonna trying to get like look up the saying i'm an
00:46:46
animal twice a week to be it only does it wants yeah but basically run saying well the garlic simulation of this program
00:46:53
and at face book we build this thing on top of speech b. and which is and you get i think maybe
00:47:01
crazy but again like that's what we do but i think there was a very old paper from many years ago
00:47:07
where there was a probabilistic programming language also in skyline it's everywhere and i think this is what we need
00:47:14
uh have as the second thing for the for the future so the first thing was use machine learning to define models
00:47:21
and then in order to compose these models into actual
00:47:23
code we need probabilistic program with okay so i'll
00:47:30
programming the program of the future is neural nets plus probabilities
00:47:37
and i i don't know what's how much time do i have ah i'm i'm over time but i want to show you one more
00:47:45
small thing because i'm addicted to duality so i want to show you
00:47:50
one more application of duality another way to get like you know
00:47:54
to show how deeply mathematically intimately connected
00:47:59
probabilistic programming and neural nets are
00:48:02
so in these neural nets what we're trying to do is we're trying to minimise the loss function
00:48:08
right we were using the derivative to find the minimum of this
00:48:11
got was function and use that to update our weights
00:48:17
what if instead of minimise it was function we try to maximise a probability
00:48:23
okay minimising something maximising the other picture this is another example of
00:48:27
duality sure let's use that here and as i said
00:48:31
you can look at probability distribution as a little bit of big small databases so that we
00:48:36
can write the query so and this is another way to ride is linear regression
00:48:44
but not using and machine learning not using training and updating
00:48:48
weights but using probability distribution so what we do
00:48:52
is we guess a and b. so we take a and b. from
00:48:55
some distribution usually a gaussian but i just guess them at random
00:49:00
then id find this function have to be a h. plus b.
00:49:05
then we have to model the noise the error that the the the the rats things that were in the graph
00:49:11
and then i just got like you know jack in the training data and i can try to get like find
00:49:18
all the a.'s and b.'s search that like you know um this thing is is minimised
00:49:25
and then i select the actually this is a function that returns a probability distribution of functions
00:49:31
and if you go go for basie and enter regression this is what you find
00:49:35
but this is also kind of really nice right i have a function
00:49:39
that returns a probability distribution of functions that's pretty mind blowing
00:49:45
and now you might not believe me that this is actual executable code uh if you
00:49:52
go on the web there's a language called by people wrap p. p. l.
00:49:56
um and if you write this program here which looks very much like this program
00:50:01
and you run it it will tell you that and the value for
00:50:06
b. or a is just like eight but with this distribution where
00:50:11
the machine art program would just give you like one value here
00:50:14
just like i'm uncertain about it so it's like this distribution
00:50:19
alright now i think we saw a couple of posters um about the challenges of machine learning
00:50:26
so there's things that like you know given the model you might be
00:50:29
able to extract the original training data which might leak information
00:50:34
and there's many kept like you know unsolved problems and
00:50:38
but this is to be expected right so for the normal so function in process for imperative programs to be
00:50:46
we have had fifty years to guess solve a lot of these problems for and for this machine learn things
00:50:55
we're still uh we have a lot of other problems so it's like how do we build
00:50:58
them how do we train them how do you put a model under version control
00:51:03
well really should put the data into version control how do you get like a it's a whole
00:51:08
bunch of things how do you debug these models how do you understand what they're doing
00:51:12
and so i got like try to paint a kind of rosy picture but
00:51:17
that this is like you know full employment to your room for
00:51:20
for us computer scientists because all those problems are and soul so
00:51:24
that's why this little person hears laying on the floor crying
00:51:30
so oh my last slide this in desert is near soon
00:51:38
computers will be able to programme themselves without our help
00:51:42
but i hope that this snake won't bite itself ankle itself before that happens and we get the seconds
00:51:48
and a i went there but so far i'm optimistic and and i really got like you know i'm
00:51:56
looking forward to the future where we just pump data into this thing and our code comes out
00:52:01
and uh that would be awesome because then we as humans can spend more time you know
00:52:08
drinking beer or hanging out making music whatever and that the machines do the work thank you so much
00:52:22
i
00:52:26
three and four persons
00:52:36
uh_huh
00:52:39
if not then huh so thank you for a a in fig talk
00:52:44
there was a interested about this uh oh percentages of arts are
00:52:48
part of the software infrastructure might be learned through art
00:52:51
a return so where are we now could you give some sort of intuition for uh oh what a
00:52:58
of course and concrete applications what what can we expect to train what are
00:53:03
of softer functional did we expect to have to write manually or two holes for yes so that's a good question so
00:53:10
if you look at for example your mobile phone right so like you know there you can like when user camera
00:53:15
it's got like shows like you know the faces in the crowd like you know in the picture as your suit your photo
00:53:22
so that is kind of like you know a a in an example
00:53:25
where you know dad got little square box that recognises the face
00:53:31
is learns but it's got like i bet it into got like you know traditional or yeah it to you why that kind of shows that
00:53:37
and other examples are and think of things like you know where
00:53:43
location and i was at the talk at at and left
00:53:47
so the thing is when you have like a left or you better
00:53:50
and and you you uh ask for a for a a ride
00:53:54
it will tell you the estimated time of arrival now of course
00:53:59
if the estimated time of arrival says five minutes and it takes ten minutes that the customers very unhappy
00:54:05
and so that's got like you know an example where they are
00:54:08
using the get like you know i'm a model together predict
00:54:12
and the estimated time of arrival but again it's embedded into kind
00:54:17
of a regular and draw it or iowa zap so
00:54:20
i would say well i i'm not sure that we're at fifty fifty
00:54:23
yet but i think you know this is quickly and increasing
00:54:38
so oh also i i want to ask some
00:54:40
problems the programming the tombs program one company
00:54:44
it is my understanding between is the easiest stops it'll probably stay one t. v.s oh
00:54:51
but it isn't exactly like that isn't because in probably is the
00:54:55
problem can i'm sticking is more like just some old bits
00:55:00
uh_huh sort and you know sometimes of the whole comes but may not at
00:55:04
all reached between his teeth on e. mail it it's it's not exactly
00:55:10
let the terms the cell simple mistake but between these these one
00:55:14
case probabilistic these kind of collins meetings often bigelow so is
00:55:18
there a lot and we didn't cool still easy actually getting calls always thinking too much because it was a little
00:55:24
you know it is just um you know probably think is gonna eat is should all be based
00:55:28
on your data you know he may or may not happen it's okay so it was um
00:55:34
uh let me get go back to this one here so here's got like you know an example of
00:55:41
like you know where when you can't like computes
00:55:43
the parameters of this model using probabilistic programming
00:55:48
it will give you this distribution so it will tell you the value of like you know it be
00:55:52
a. e. of a expose b. is eight but it's
00:55:57
got like you know you're not hundred percent certain
00:56:00
right where in the other case we used people earning you're trying to get like you know
00:56:04
you say under to minimise the error so you take the thing with the maximum probability
00:56:09
so this is one of the people often do is they pick the value with the highest
00:56:13
probability if they want to go over probabilistic model to would like it deterministic model but
00:56:19
what you can see is like a this is a very simple distribution but you might have distributions that
00:56:23
have like you know maybe two values with the same probabilities are then how do you pick those
00:56:29
and but i think it's not that much different than with any monad where you have to have like an unsafe operation to
00:56:35
get out of the mounted so like you know it if you're in the in the aisle more that order state monad
00:56:40
you have to have it and save way to get out and to really do go out of the
00:56:44
probability monad it's unsafe that that would be got like my intuition for that i think so
00:56:53
it
00:56:56
okay
00:56:58
okay i have a question of ah right on it
00:57:05
yeah um i don't know question about um that at the present moment we have some concern now
00:57:12
or not understanding why the models are meeting at this meeting and so in this ah approach
00:57:17
that you're suggesting or even for the rehearsals on the model building process i wonder
00:57:23
is that going to help or or actually heard is trying to do i understand what he's
00:57:28
oh so excellent question so that's again like you know i i just want to point here this this little person
00:57:34
there is a lot of tears and he's in a lot of agony and and so yeah so and
00:57:41
understanding what happens is kind of like you know why these things happen is
00:57:46
is kind of like you know a big over problem i think and
00:57:51
no people are trying to say is like oh maybe we should uses get different got a mobile decision trees or something because they are
00:57:57
kind of explainable i'm not sure that that's the case uh decisions
00:58:01
reuse it got giants like you know long nested conditional and
00:58:07
there are other ways to get that people are trying to get like you know do this by giving you can't look inside and
00:58:13
what happens in between the layers and so on and but yeah this is is a big open problem and again like
00:58:19
this is what i meant i mean by this right okay hold this thing doesn't quite itself
00:58:24
and got like you know accidentally kills itself because of these problems and but on the other hand
00:58:33
and maybe this is like a non answer but if we built a really complicated distributed system we also
00:58:40
really have a hard time to understand like you know how it works or like you know
00:58:45
or debug it or understand it and so in that sense i
00:58:50
think it's a little bit i i can ah i'm
00:58:53
not kind of like trying to get like blow you off but i do think that for sufficiently complex imperative goat
00:59:00
um you will have a hard time also to get like you know to understand exactly what is
00:59:04
do if i if i ask you to go explain to me the the linux kernel ah
00:59:09
that will be hard thing or this got like a buyer discovered that sector you will
00:59:13
have a hard time to do that so you'd rather than to read through
00:59:20
yeah
00:59:29
will it cause they record it sorry um

Share this talk: 


Conference program

Welcome address
Andreas Mortensen, Vice President for Research, EPFL
7 June 2018 · 9:49 a.m.
Introduction
Jim Larus, Dean of IC School, EPFL
7 June 2018 · 10 a.m.
The Young Software Engineer’s Guide to Using Formal Methods
K. Rustan M. Leino, Amazon
7 June 2018 · 10:16 a.m.
Safely Disrupting Computer Networks with Software
Katerina Argyraki, EPFL
7 June 2018 · 11:25 a.m.
Short IC Research Presentation 2: Gamified Rehabilitation with Tangible Robots
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 12:15 p.m.
Short IC Research Presentation 3: kickoff.ai
Lucas Maystre, Victor Kristof, EPFL (LCA)
7 June 2018 · 12:19 p.m.
Short IC Research Presentation 5: CleanM
Stella Giannakopoulo, EPFL (DIAS)
7 June 2018 · 12:25 p.m.
Short IC Research Presentation 6: Understanding Cities through Data
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:27 p.m.
Short IC Research Presentation 7: Datagrowth and application trends
Matthias Olma, EPFL (DIAS)
7 June 2018 · 12:31 p.m.
Short IC Research Presentation 8: Point Cloud, a new source of knowledge
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 12:34 p.m.
Short IC Research Presentation 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:37 p.m.
20s pitch 1: Cost and Energy Efficient Data Management
Utku Sirin, (DIAS)
7 June 2018 · 2:20 p.m.
20s pitch 2: Gamification of Rehabilitation
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 2:21 p.m.
20s pitch 4: Neural Network Guided Expression Transformation
Romain Edelmann, EPFL (LARA)
7 June 2018 · 2:21 p.m.
20s pitch 5: Unified, High Performance Data Cleaning
Stella Giannakopoulo, EPFL (DIAS)
7 June 2018 · 2:21 p.m.
20s pitch 6: Interactive Exploration of Urban Data with GPUs
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 7: Interactive Data Exploration
Matthias Olma, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 8: Efficient Point Cloud Processing
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 2:23 p.m.
20s pitch 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:24 p.m.
20s pitch 10: RaaSS Reliability as a Software Service
Maaz Mohiuddlin, LCA2, IC-EPFL
7 June 2018 · 2:24 p.m.
20s pitch 11: Adversarial Machine Learning in Byzantium
El Mahdi El Mhamdi, EPFL (LPD)
7 June 2018 · 2:24 p.m.
Machine Learning: Alchemy for the Modern Computer Scientist
Erik Meijer, Facebook
7 June 2018 · 2:29 p.m.

Recommended talks

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
4 July 2016 · 2:01 p.m.
How technology is opening up new potential for democracy, participation and collaboration
Gareth Morlais, Welsh Government, Cardiff, Wales
19 Jan. 2016 · 11:06 a.m.