Transcriptions

Note: this content has been automatically generated.
00:00:00
Welcome, everybody. I guess there are a few more people coming in, but we can already get started.
00:00:06
So this is really exciting for me, to stand
00:00:09
here in 2019 for the tenth edition of
00:00:13
Scala Days, because back then in 2010, when we had the first Scala Days
00:00:18
here at EPFL, I was just a young PhD student in Martin's lab, trying to
00:00:24
make my first research contributions, get my first papers published, and so on. So for this talk
00:00:30
I want to give you a little bit of an overview of some of the old work we did, and how this has actually
00:00:36
been very, very important for some of the more recent work we did
00:00:40
on deep learning and data processing accelerators.
00:00:45
So let's go back even one year further, to
00:00:49
2009, so really ten years ago.
00:00:52
I had just been in Martin's lab for about one year and done my first work
00:00:58
on the Scala compiler and on the language, and I managed to get my first paper
00:01:04
published at ICFP, which is an excellent conference. So this was super exciting for me.
00:01:11
So this is the paper, and I'm just trying to distill the
00:01:16
key ideas. Callbacks are pervasive in many programming models:
00:01:23
take anything like JavaScript, or other systems where you have asynchronous
00:01:29
computation. Often you have things like this setTimeout call here, so you
00:01:34
write a statement: I want to do something, and then I want to wait.
00:01:37
But I can't just say "wait"; I have to say, okay,
00:01:40
after you're done waiting, call back this function and then continue. And you can
00:01:44
imagine if you have multiple of these things, you get deep nesting and so on.
00:01:48
So that's not very pretty, and we were looking for ideas
00:01:51
for how to actually bring that back into a direct style.
00:01:56
What we figured out in this paper is that we can leverage earlier work from the PL community,
00:02:02
which is based on continuation-passing style. There are these control
00:02:07
operators, one of them is called shift, like you see here on the slide,
00:02:12
that have been studied theoretically and implemented in some languages using various means,
00:02:18
and we can use that to essentially capture the rest of the
00:02:22
computation. So we can capture everything that follows a sleep call
00:02:27
by implementing sleep to use this shift operator here; the
00:02:32
function that becomes an argument to shift essentially captures everything that follows.
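For reference, a minimal sketch of what such a shift-based sleep can look like, using the (since deprecated) Scala continuations plugin; the timer plumbing here is illustrative, not from the talk:

```scala
import scala.util.continuations._
import java.util.{Timer, TimerTask}

object SleepDemo {
  private val timer = new Timer()

  // sleep captures "everything that follows" as the continuation k
  // and resumes it when the timeout fires -- no visible callbacks.
  def sleep(ms: Long): Unit @cps[Unit] = shift { (k: Unit => Unit) =>
    timer.schedule(new TimerTask { def run(): Unit = k(()) }, ms)
  }

  def main(args: Array[String]): Unit = reset {
    println("do something")
    sleep(1000)              // reads like blocking code,
    println("then continue") // but runs as a callback underneath
  }
}
```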
00:02:40
Now, how do we do that if we want to
00:02:42
implement this in a language like Scala? Well, the idea is relatively simple:
00:02:47
we just track in the type system where we are using these control operators.
00:02:53
And if we do that, then we can handle things like the setTimeout, and we
00:02:57
can do even more complicated things. Basically, anywhere we have callbacks we can hide them
00:03:03
by using this type-driven transformation. Here's an example; this
00:03:07
was actually from my Scala Days talk in 2011,
00:03:10
where we showed that we can build a library of suspendable collections,
00:03:15
where we can fire off asynchronous requests in parallel,
00:03:20
and then once this for-comprehension returns, we're basically synchronizing all the
00:03:24
callbacks, and once we have all the results
00:03:27
of the different product queries here, we proceed with doing what
00:03:32
comes after that. So I challenge all of you: implement this
00:03:35
using callbacks directly and get it right. It's at least going to be like one or two hundred lines, likely more.
00:03:41
A relatively simple idea, very powerful, and I'm going to show you
00:03:46
in the second part of the talk how this matters for neural networks.
00:03:53
So fast forward to 2010. This was around the time of the first
00:03:56
Scala Days. I actually managed to get my second paper accepted, and this is also,
00:04:03
I like to think, a very neat idea: again using Scala's type system and introducing
00:04:08
a kind of symbolic type that we called Rep[T], and we can use that
00:04:13
to essentially build a system of symbolic data types.
00:04:19
You can use implicit classes or something to say, well, whenever I have a
00:04:22
Rep[Double], I implement all the operations I have on doubles using operator overloading,
00:04:29
and then I can implement them in such a way that I'm creating a new computation node, like an expression,
00:04:35
so this plus here on the slide,
00:04:39
and I can insert that into a computation graph,
00:04:44
and then essentially I can record all operations on these symbolic double objects
00:04:48
in a uniform format that I can then process.
00:04:53
The benefit now is that, based on this Rep[Double] thing, I
00:04:57
can build other data types. I can build a data type of complex numbers, for example,
00:05:02
and the net effect is that any computation on complex
00:05:05
numbers will not be reflected in the computation graph
00:05:10
as complex numbers, but only as operations on the double components.
00:05:14
So I'm removing an entire layer of abstraction here.
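To make this concrete, here is a minimal sketch of the Rep idea (simplified; the real LMS framework is more elaborate): overloaded operators record expression nodes instead of computing values, and a Complex type built on top of them leaves no trace in the recorded graph:

```scala
// Expression nodes recorded in the computation graph.
sealed trait Exp
case class Sym(id: Int) extends Exp
case class Plus(a: Exp, b: Exp) extends Exp
case class Minus(a: Exp, b: Exp) extends Exp
case class Times(a: Exp, b: Exp) extends Exp

// A symbolic double: operations build nodes rather than compute.
class Rep(val e: Exp) {
  def +(that: Rep) = new Rep(Plus(e, that.e))
  def -(that: Rep) = new Rep(Minus(e, that.e))
  def *(that: Rep) = new Rep(Times(e, that.e))
}

// Complex numbers defined on top of Rep: multiplication expands into
// plain double-level Plus/Minus/Times nodes, so the Complex abstraction
// itself never appears in the graph.
case class Complex(re: Rep, im: Rep) {
  def +(that: Complex) = Complex(re + that.re, im + that.im)
  def *(that: Complex) = Complex(re * that.re - im * that.im,
                                 re * that.im + im * that.re)
}
```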
00:05:18
So the slogan here is "abstraction without regret". If we have complex numbers, we might want to do something like the fast
00:05:25
Fourier transform, and if you take out the
00:05:28
textbooks and look for efficient implementations, it's all very complicated.
00:05:32
And here is really just the textbook Cooley-Tukey recurrence, implemented as a recursive function in Scala,
00:05:38
and with some additional rewrites on the graph level, what this produces is the
00:05:43
butterfly computation graph that you will find in textbooks for low-level
00:05:49
computation. And you will notice that there are no operations that would allocate
00:05:53
temporary data structures, and all traces of the complex numbers have been completely removed.
00:06:01
So, paper number three. And this is actually the fun volcano
00:06:05
story about Scala Days 2010: this third paper was actually written at
00:06:11
Scala Days, when our collaborators from Stanford were stuck in our EPFL meeting room. So we had
00:06:16
nothing better to do, so we said, well, let's write a paper. And this is another key
00:06:22
idea here. With this Rep[T] approach and operator overloading, just recording operations,
00:06:29
there are of course certain limits to what you can do. For example, if you
00:06:33
look at this little piece of code here, you have variables, you have conditionals, and so
00:06:37
on, and these are not things that are traditionally amenable to operator overloading.
00:06:45
So the idea of this paper is that we
00:06:48
can just redefine all of this as method calls. Scala
00:06:53
did this from the beginning for for-comprehensions and other things, so we said, well,
00:06:57
why not just apply this to other language constructs as well?
00:07:01
And so we did that: in a
00:07:05
branch of the Scala compiler we redefine things like variable declarations as method calls, and
00:07:12
if-condition-else as a primitive becomes a method call to a custom ifThenElse, and so on.
00:07:17
And if you do that, then you can overload all of these things,
00:07:21
and you can do symbolic computation with Rep types on essentially arbitrary fragments of Scala.
00:07:27
The benefit here is that you can still mix
00:07:30
present-stage computation with the computation that you're lifting
00:07:35
into a computation graph, so it's more powerful than just
00:07:38
quoting an AST and then trying to analyze it.
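As a rough sketch of the virtualization idea (not the actual compiler branch), the rewritten built-ins can then be given staged overloads:

```scala
// Sketch: the modified compiler rewrites built-ins into method calls, e.g.
//   if (c) a else b   ~>   __ifThenElse(c, a, b)
// and a DSL author supplies an overload for staged values.
trait StagedIfExample {
  trait Rep[T]
  case class IfNode[T](c: Rep[Boolean],
                       thenp: () => Rep[T],
                       elsep: () => Rep[T]) extends Rep[T]

  // Instead of branching now, record a conditional node in the graph.
  def __ifThenElse[T](c: Rep[Boolean], thenp: => Rep[T], elsep: => Rep[T]): Rep[T] =
    IfNode(c, () => thenp, () => elsep)
}
```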
00:07:44
Okay, so since those days, I graduated at some point, did
00:07:48
other research, and since 2014 I am a professor myself at Purdue University in the
00:07:54
US. Most people don't know where that is: we're in
00:07:59
the beautiful college town of West Lafayette, right in the middle between
00:08:03
Chicago and Indianapolis, and we have a pretty active programming languages research group
00:08:08
there, in total something like ten faculty and a commensurate number of students.
00:08:13
A lot of our focus is
00:08:15
actually on PL ideas applied to AI and security,
00:08:21
and this is what I want to talk about in the rest of
00:08:24
my presentation: how these core PL
00:08:28
ideas apply to machine learning and data processing.
00:08:32
One of the key takeaways is that machine learning and AI
00:08:37
is really both a data management and a PL problem. It's
00:08:41
not only about model construction; it's really about doing things at scale.
00:08:48
In fact, most of the time we need to combine different systems and
00:08:52
different means of computation. Let's imagine you have some data coming in from various sources;
00:08:58
most likely you will want to do some ETL-
00:09:00
style processing in your data center using tools like Spark or Flink,
00:09:06
and then maybe you want to train a model using TensorFlow, or maybe you just want to use some pre-trained
00:09:11
model as a classifier, even on your mobile devices. So the
00:09:15
key question is: how do we make this whole pipeline efficient?
00:09:20
We can start by looking at individual components and asking, okay, what are the bottlenecks?
00:09:26
So let's take a look at Spark first. This is
00:09:29
actually a SQL query from one of the standard benchmarks, one of the
00:09:32
simplest ones, and we can use Spark SQL to run
00:09:36
it, and it'll take about 24 seconds on a
00:09:42
two gigabyte dataset. So we can ask: is this fast or slow, and can we improve
00:09:46
it a little bit? And one way to find out is to just write the same thing in C.
00:09:52
So a question to all of you: if Spark takes
00:09:56
24 seconds, how long is the C program going to take? Anybody have an idea?
00:10:03
Two seconds? I hear two seconds, okay. Anyone below that? Twelve?
00:10:11
Twenty?
00:10:13
Okay, it's about 1.2 seconds. So you can see that Spark is 20x slower than handwritten
00:10:20
C if we run it on a single core, and this gives us some kind of upper bound on
00:10:27
the performance we can get, because if it's slow even on a single
00:10:31
core, there's no way we can really catch up just by using more resources.
00:10:36
So we can do some profiling and figure out where the time is actually spent, and what we find is that of all the
00:10:42
time that Spark spends on this little query, only twenty percent is actually doing proper computation; the
00:10:48
remaining eighty percent is just overhead, because it needs to convert data between different formats and so on.
00:10:56
So what we did is we looked at the Spark architecture, and there are multiple different layers here.
00:11:05
Basically, what we did is we replaced all
00:11:12
the slow layers with our own query engine; the system is called Flare,
00:11:18
and I want to show you a little bit of how it is
00:11:20
actually built. We're taking query plans from the Catalyst optimizer
00:11:24
and then running them through our own pipeline, which generates native code that essentially
00:11:28
runs exactly the kind of C code I showed you; this is the essence of the system.
00:11:33
So how do we build a tool like this? We're
00:11:36
taking query plans, which are essentially relational algebra, so let's think about how we can implement that.
00:11:42
The simplest implementation really is an interpreter for this relational algebra,
00:11:46
and it all starts with this record
00:11:52
class: we have a set of fields, and we have a schema that tells us what
00:11:57
the names of the fields are, and then we can do field lookup and so on.
00:12:03
Then we have a function that we call execOp, which takes a query operator
00:12:08
as an argument, and a callback that is
00:12:11
invoked for every record that this operator produces.
00:12:15
An operator can be things like scans, filters, projections, or joins, and so
00:12:20
on. Basically, for each of these operators we need to define what it means.
00:12:24
For the scan operator, in this case, we just load the CSV file and invoke the callback for every
00:12:30
line that we read, parsed as a record. If we run a filter, then we have to call out to our parent
00:12:37
operator, which produces a stream of records, and then inside the callback we just evaluate
00:12:42
the predicate, and if it is true we pass the record on; otherwise we just do nothing.
00:12:47
Same thing for projections, where we just shuffle the fields around, and we can also implement joins and so on.
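A minimal sketch of such an interpreter, following the names used in the talk (details simplified):

```scala
// Records carry field values plus a schema for lookup by name.
case class Record(fields: Vector[String], schema: Vector[String]) {
  def apply(name: String): String = fields(schema.indexOf(name))
}

sealed trait Op
case class Scan(file: String, schema: Vector[String]) extends Op
case class Filter(pred: Record => Boolean, parent: Op) extends Op
case class Project(out: Vector[String], parent: Op) extends Op

// execOp takes an operator and a callback invoked once per produced record.
def execOp(o: Op)(callback: Record => Unit): Unit = o match {
  case Scan(file, schema) =>
    scala.io.Source.fromFile(file).getLines().foreach { line =>
      callback(Record(line.split(",").toVector, schema))
    }
  case Filter(pred, parent) =>
    execOp(parent) { rec => if (pred(rec)) callback(rec) } // pass on or drop
  case Project(out, parent) =>
    execOp(parent) { rec => callback(Record(out.map(rec(_)), out)) }
}
```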
00:12:54
So with this little interpreter, we can think about how
00:12:57
we turn it into a code generator, because we want native code.
00:13:03
And here's where we go back to the literature, to
00:13:07
the 2010 paper I was talking about earlier,
00:13:11
which introduced the Lightweight Modular Staging, or LMS, framework,
00:13:15
where we have this hierarchy of Rep types, which allows us
00:13:17
to essentially record operations that are done on certain data types.
00:13:23
So what does the Rep[T] construction give us? We just replace
00:13:28
the components of the record class that I showed you.
00:13:33
Instead of saying all fields are a collection of strings,
00:13:36
this now becomes a collection of Rep[String]s, so that
00:13:40
means we're recording all the operations that are done on the individual
00:13:44
record fields, but at the same time completely
00:13:46
removing the overall record abstraction. So in the generated code there
00:13:50
won't be any records, in the same way as there were no complex numbers in the FFT case.
00:13:57
And the rest of the code can essentially remain unchanged,
00:14:01
because we have sort of pushed the code generation into the leaves of the data structure.
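Sketched, the change is as small as it sounds (assuming a Rep[T] type as in the earlier sketch):

```scala
// Staged variant: field values are now Rep[String], so operations on
// individual fields are recorded for code generation, while the Record
// wrapper exists only at staging time and leaves no trace in the output.
case class Record(fields: Vector[Rep[String]], schema: Vector[String]) {
  def apply(name: String): Rep[String] = fields(schema.indexOf(name))
}
```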
00:14:07
So this is the implementation of a hash join, and this is the actual
00:14:10
code we can run. There's a hash map helper class,
00:14:14
and then we call execOp on the left argument of the join, put everything
00:14:19
into the hash map, and then traverse the other side of the join, look up all the
00:14:24
elements in the hash map, and produce a combined tuple. So this is a straightforward implementation of the hash join.
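A sketch of that join in the unstaged interpreter from above (the staged version swaps in the Rep-based collection classes mentioned below):

```scala
import scala.collection.mutable

case class Join(keyL: String, keyR: String, left: Op, right: Op) extends Op

def execJoin(j: Join)(callback: Record => Unit): Unit = {
  val hm = mutable.HashMap.empty[String, List[Record]]
  // Build phase: insert every record from the left side, keyed by keyL.
  execOp(j.left) { rec =>
    hm(rec(j.keyL)) = rec :: hm.getOrElse(rec(j.keyL), Nil)
  }
  // Probe phase: look up matches for each right record, emit combined tuples.
  execOp(j.right) { rec =>
    for (l <- hm.getOrElse(rec(j.keyR), Nil))
      callback(Record(l.fields ++ rec.fields, l.schema ++ rec.schema))
  }
}
```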
00:14:30
And it will generate this kind of low-level C code, where you can see that the
00:14:34
hash table lookup code has actually been generated as a for loop
00:14:40
which really only uses low-level operations, so this is going to run very, very fast.
00:14:49
Of course, a lot of the magic is hidden in this hash map helper class, but even that
00:14:53
is relatively straightforward: we can define a hierarchy
00:14:56
of collection classes which use these staged Rep types underneath.
00:15:00
The key component here is a staged ArrayBuffer, which is a component of the hash
00:15:05
map class, and then we can even have inheritance and all these nice features, so you can
00:15:11
program against it as if it were really just a library, but underneath it will generate C code that is quite efficient.
00:15:19
So how do you use a system like this? This is all
00:15:22
reusing the standard Spark front end, so basically where we would normally
00:15:27
call something like spark.sql, and then get a DataFrame or give it a SQL string,
00:15:32
and then call .show on it, we just have an API that kind of mirrors that,
00:15:37
where we can say flare of a DataFrame object and then .show, and
00:15:43
very transparently it will run many times faster.
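In code, the usage described looks roughly like this (a sketch; the exact Flare entry point is not shown in the transcript, so it is left as a comment):

```scala
import org.apache.spark.sql.SparkSession

object FlareUsage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("demo").getOrCreate()
    val df = spark.sql("select count(*) from lineitem where l_quantity < 24")
    df.show()           // standard Spark execution path
    // flare(df).show() // Flare's mirrored API as described: same plan,
    //                  // compiled to native code, many times faster
  }
}
```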
00:15:47
Okay, of course I showed you only one simple example here,
00:15:51
but we can run the full set of benchmarks.
00:15:55
This one is still single-core, but you can see, when we're comparing Postgres
00:15:59
and Spark and Flare and HyPer, which is kind of the state of the art in
00:16:04
database query compilation from the database community, that there are really these two kinds of systems:
00:16:10
the slow ones, which are Spark and Postgres, and then the fast ones, which are HyPer and
00:16:14
Flare. And note that this is a log scale, so this is not just
00:16:18
thirty percent faster; these are multiples, orders of magnitude.
00:16:23
And since generating code is relatively easy in this setting,
00:16:29
we can push code generation also into data loading, for example.
00:16:32
So we can implement optimized loaders for CSV files and Apache Parquet,
00:16:37
and again this is log scale, so you can see that we get something like three
00:16:40
orders of magnitude end-to-end speedup if we optimize query execution and also the data loading. And
00:16:46
I showed you hash maps, but realistic databases also need other kinds of index structures,
00:16:52
for example string dictionaries, to make lookups more efficient. We
00:16:56
can do that too, and again we're sort of
00:17:00
beating or matching the performance of the best systems from the database community.
00:17:05
Of course, we also want to parallelize, so we can parallelize on a single large machine,
00:17:11
and this is interesting, because you can actually see that Spark appears to have very good scalability,
00:17:17
but this is largely due to internal overhead that trivially
00:17:21
parallelizes. You can see that Flare actually
00:17:27
starts to hit scaling limits
00:17:32
there. You can see the bar going down, which seems like it's
00:17:38
a bad thing, but it actually means that we're operating at the hardware capabilities of the machine. Here
00:17:44
we're using multiple CPU sockets, and going beyond one CPU socket means that we're getting NUMA effects,
00:17:50
and if we don't optimize for this, then we would expect to see a slowdown. Spark
00:17:55
doesn't show the slowdown because it just continues scaling; there's so much overhead.
00:18:01
But this also indicates that for large-scale
00:18:04
machines we need to do additional optimizations, in particular
00:18:07
things like thread pinning and partitioning of
00:18:11
data structures in a way that makes use of multiple memory regions.
00:18:15
When we do that, we can actually leverage the aggregate memory bandwidth in a more efficient way,
00:18:20
and for something like 72 cores on a single machine, across four
00:18:24
CPU sockets, we get speedups of 56x, which I believe is pretty good.
00:18:30
Of course, we also need to go distributed; after all, we want to run stuff on clusters.
00:18:34
I don't have performance numbers for this yet, because this is still work in progress, but we are actually pushing
00:18:40
the code generation ideas all the way into the Hadoop
00:18:43
file system to eliminate multiple layers of overhead in those cases.
00:18:50
Similarly, I showed you results relating to Spark;
00:18:54
since we have a pretty generic relational engine, we can also
00:18:57
apply that to Apache Flink as a front end.
00:19:00
Flink works slightly differently, it has different kinds of operators
00:19:04
and so on, but basically you can see the same kind of speedups,
00:19:07
although it's interesting to note that even in the default configuration,
00:19:13
Flink here appears to be something like eighty percent or
00:19:17
more, maybe a little bit more, faster than Spark by default.
00:19:24
So this is a system we've built with students over the last couple of years.
00:19:28
It's not open source currently, but we have a private beta program running, so
00:19:32
if your company or your group is interested in
00:19:36
running this, and maybe using it to accelerate your Spark or Flink
00:19:41
workloads, then there's a website, there's a link to sign up,
00:19:45
and you can also talk to me after the talk.
00:19:50
Good. So this was data processing in various ways; I also promised you
00:19:55
some results on machine learning. One of the key things now is that we want to
00:20:01
integrate Spark with systems like TensorFlow and have these end-to-end pipelines.
00:20:06
Here's an example: maybe I want to fire off a SQL query using Spark SQL,
00:20:12
and then I want to use a trained model as a UDF. I want to be able to say, okay,
00:20:18
select something where the class assigned is this cluster, and this
00:20:23
classifier is actually a model that has been trained using TensorFlow.
00:20:28
So we did this, and it's actually
00:20:33
quite simple: TensorFlow has this XLA
00:20:36
compiler, which can generate native code, and we can generate native code from
00:20:42
Spark query plans, so we can just link them together; we just need to be clear about
00:20:46
what the interfaces are, and then we have an end-to-end pipeline that includes both the classifier and the underlying data processing.
00:20:54
And this is all Python, because in some sense we don't really care what the front end is;
00:20:59
we just need a query plan and a TensorFlow model, and then we can do all of that.
00:21:04
So the one thing we provide here is this flare.udf.tf_compile,
00:21:10
a function that will register a TensorFlow model as a UDF,
00:21:14
have it compiled by XLA, and then load it into Flare.
00:21:18
And if we do that, there are again speed-
00:21:21
ups of multiple orders of magnitude, because the default UDF execution path is
00:21:25
just really, really slow: you need to cross the Python and
00:21:29
JVM boundaries basically for every tuple for which you're invoking the classifier.
00:21:35
Now of course, the question is, well, TensorFlow
00:21:39
has its limitations, and in particular it doesn't do very well with recursive models. So
00:21:46
we built a system that is called Lantern, which is really based on the idea that we want
00:21:52
to have full-scale differentiable programming, and this
00:21:55
is something that many people, including
00:21:59
Yann LeCun, a Turing Award winner and also the recipient of
00:22:02
an honorary doctorate from EPFL, have been arguing for:
00:22:06
that the traditional view of neural networks, which is this organization into layers,
00:22:11
is kind of outdated. And if you look around at proposed neural network architectures —
00:22:18
this is a figure from last year, so this year there would be more — you can see that
00:22:25
there are still layers, but there are all these shortcut connections:
00:22:28
residual connections in ResNets, Inception modules, recursive neural networks,
00:22:34
and so on. So how can we make use of this
00:22:39
larger variety of different architectures in a way that is
00:22:41
still efficient while being more expressive than what we have?
00:22:46
So the basic idea boils down to this: we have a function that takes
00:22:50
some argument x, produces a result y, and is parameterized by a set of weights.
00:22:57
In the training process we want to optimize these weights
00:23:02
to minimize the observed error on a set of training data, basically a set of (x, y) pairs.
00:23:08
The standard way to do this in machine learning or
00:23:10
deep learning is this algorithm called stochastic gradient
00:23:15
descent: we compute a gradient and then we
00:23:18
move in little steps in the direction that minimizes the error.
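For reference, the update rule being described is the standard one (standard notation, not from the slides):

```latex
w_{t+1} = w_t - \eta \, \nabla_w \mathcal{L}(w_t)
```

where η is the step size ("little steps") and ∇w L is the gradient of the observed error with respect to the weights.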
00:23:23
But that of course means that we need to compute gradients, or derivatives, for essentially
00:23:29
arbitrary code, if we want to take this idea seriously. Everybody learns how to
00:23:34
do this for individual functions in high school; these are standard rules you can
00:23:40
look up in any math textbook. But how to do this on real program code, which includes
00:23:45
loops and functions and mutable state,
00:23:49
is not really so trivial.
00:23:55
Actually, simple implementations are relatively straightforward.
00:24:00
There's an idea called forward-mode automatic differentiation,
00:24:04
which is based on the idea that we can just compute
00:24:08
the normal value and the tangent, or derivative, at the same time.
00:24:13
So we introduce a class Num that pairs a value x with
00:24:17
its tangent d, and then we overload all the operations
00:24:21
like plus and times using basically the standard differentiation rules.
00:24:28
And if we do that, then we're able to compute derivatives for essentially arbitrary code. We still have to provide a
00:24:35
first-class gradient operator here, which takes a function from Num to Num and produces a function from
00:24:41
Double to Double, initializing the argument with a derivative of
00:24:45
one, because that's what the rules say.
00:24:49
We can implement this in like five lines of Scala and actually run it. We
00:24:54
can verify that if we write a function like the one on the slide here, 2x plus x cubed, and compute the gradient,
00:25:01
then we can run a set of tests and verify that this
00:25:04
is actually the derivative we would have computed by hand.
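Those few lines look roughly like this (a sketch of forward-mode AD as described, slightly expanded with a test):

```scala
// Pair each value x with its tangent d and overload the arithmetic.
class Num(val x: Double, val d: Double) {
  def +(that: Num) = new Num(x + that.x, d + that.d)              // sum rule
  def *(that: Num) = new Num(x * that.x, d * that.x + x * that.d) // product rule
}

// First-class gradient operator: seed the argument's tangent with 1.0.
def grad(f: Num => Num)(x: Double): Double =
  f(new Num(x, 1.0)).d

// d/dx (2x + x^3) = 2 + 3x^2, e.g. 29 at x = 3
assert(grad(x => new Num(2, 0) * x + x * x * x)(3) == 29.0)
```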
00:25:10
Now, the problem with this forward-mode automatic differentiation approach is that it's inefficient if we have
00:25:16
a large number of derivatives, or partial derivatives, we need to compute at the same time.
00:25:21
If you have a neural network model, you might have like a thousand or a million different parameters,
00:25:27
and if you want to compute the gradient with respect to all of these parameters, then
00:25:33
it's not enough to just have one tangent here; basically
00:25:36
you would have a million of them in this Num class.
00:25:40
So for every computation there are like a million
00:25:44
elementary operations you need to do to compute the derivatives.
00:25:48
A more efficient approach, which is actually what is used in practice, is
00:25:53
the dual to forward mode; this is called reverse-mode automatic differentiation,
00:25:58
and specific versions of that are exactly what is called backpropagation.
00:26:04
The idea here is that essentially we execute the program in a forward pass
00:26:09
to compute the normal values, and then we go back,
00:26:11
trace each step of the execution backwards, and propagate the gradients.
00:26:17
And we can break this down again to each individual operation here:
00:26:23
for every operation we can specify what is the forward component
00:26:28
and what is the backward component, and then you can go through and do this one by one.
00:26:33
We proceed stepwise, and at
00:26:36
every step there is kind of a "rest of the computation" remaining,
00:26:41
and the rest gets shorter and shorter as we go along, until it essentially wraps around.
00:26:50
The language I've been using, calling this "the rest of the computation", is
00:26:53
kind of suggestive. If you know a little bit about
00:26:58
PL theory, then it's easy to see
00:27:02
that the rest of the computation is actually something that's called
00:27:05
a continuation, in many ways, and we kind of saw that before, in the beginning of the talk.
00:27:11
So the idea here is that we can model this rest of the computation as a callback.
00:27:17
So let's do that: let's define a class Num
00:27:21
that now has a double for the normal computation and then a mutable variable d,
00:27:27
which is used to accumulate gradient updates. Then we overload plus and times as in the forward setting,
00:27:33
but now we have an additional parameter, a callback that is to be
00:27:39
invoked after we execute the forward part. Inside the implementation of plus and times
00:27:45
we create new Num objects here and
00:27:47
invoke our continuation with this new Num object as a parameter,
00:27:52
and while the rest of the computation
00:27:55
executes, it will accumulate the gradient updates in the
00:27:59
d field, and after that we have to propagate this gradient update to the two inputs.
00:28:08
And this is really the essence of reverse-mode AD, and we can implement
00:28:13
a full system that does this just by operator overloading. Here's the gradient
00:28:20
operator; again it's a first-class gradient operator, and
00:28:25
you can see the signature is a little bit more complicated, because we have these callbacks.
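A sketch of this callback-based reverse mode (close to the Lantern formulation, simplified here):

```scala
// x is the forward value; d accumulates the gradient in the backward pass.
class NumR(val x: Double, var d: Double) {
  def +(that: NumR)(k: NumR => Unit): Unit = {
    val y = new NumR(x + that.x, 0.0)
    k(y)            // run "the rest of the computation" first...
    this.d += y.d   // ...then propagate the gradient to both inputs
    that.d += y.d
  }
  def *(that: NumR)(k: NumR => Unit): Unit = {
    val y = new NumR(x * that.x, 0.0)
    k(y)
    this.d += that.x * y.d
    that.d += this.x * y.d
  }
}

// The gradient operator seeds the final result's gradient with 1.0.
def grad(f: NumR => (NumR => Unit) => Unit)(x: Double): Double = {
  val z = new NumR(x, 0.0)
  f(z)(r => r.d = 1.0)
  z.d
}

// Even x*x already needs explicit nesting: grad(x => k => (x * x)(k))(3.0) == 6.0
```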
00:28:31
This also means that if we actually want to use it, it's
00:28:36
really, really painful, because for every operation, plus or times,
00:28:40
we can't just nest these expressions; everything has to
00:28:45
get a callback to tell us what should happen after that.
00:28:49
But here again we go back to the literature and take our old paper from
00:28:56
2009, which tells us how we can hide these
00:28:59
callbacks in a generic way using the type system.
00:29:05
Here's how this works: instead of using the callbacks in an
00:29:09
explicit way, we use this shift operator in the implementation of plus and times,
00:29:14
and this gives us access to the continuation that is known
00:29:19
to the compiler at that point, and we're back to a nice syntax.
00:29:26
We have these type annotations here, which essentially tell us that this is an
00:29:29
expression that is actually differentiable, so we can compute gradients for it, and
00:29:35
with that we can write 2x plus x cubed again
00:29:39
without all that callback hell in a manual way.
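A sketch of the shift-based version, again using the Scala continuations plugin (following the Lantern paper's formulation):

```scala
import scala.util.continuations._

class NumR(val x: Double, var d: Double) {
  def +(that: NumR): NumR @cps[Unit] = shift { (k: NumR => Unit) =>
    val y = new NumR(x + that.x, 0.0)
    k(y)           // forward: run the rest of the computation
    this.d += y.d  // backward: accumulate gradients on return
    that.d += y.d
  }
  def *(that: NumR): NumR @cps[Unit] = shift { (k: NumR => Unit) =>
    val y = new NumR(x * that.x, 0.0)
    k(y)
    this.d += that.x * y.d
    that.d += this.x * y.d
  }
}

def grad(f: NumR => NumR @cps[Unit])(x: Double): Double = {
  val z = new NumR(x, 0.0)
  reset { f(z).d = 1.0 }  // seed the output gradient
  z.d
}

// Direct style again: grad(x => new NumR(2, 0) * x + x * x * x)(3) == 29.0
```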
00:29:44
Okay.
00:29:46
So with that we have something like a PyTorch-style define-by-
00:29:52
run system, which is just executing things as we go along.
00:29:57
So what about a system like TensorFlow, where we have an explicit graph construction step,
00:30:02
but maybe unlike TensorFlow in the sense that it's more syntactically pleasant to use?
00:30:09
We take the second paper, which we've already seen, which introduced this Rep type
00:30:14
idea, and it's the same story here: just like with complex numbers or with the query interpreter,
00:30:21
we just have to insert these Rep types in key places.
00:30:24
There's a mutable version of Rep[T] called Var[T],
00:30:28
and if we do that, and basically change nothing
00:30:31
else in our definition of the differentiable number type,
00:30:35
then we now have a system that generates computation graphs, just like TensorFlow,
00:30:41
but it's actually much nicer to use, because we have the language virtualization capabilities
00:30:47
that allow us to overload variables and while loops and so on. And from this code
00:30:54
we can actually generate low-level C code that implements the gradient computation
00:31:00
using recursion and loops and so on. I don't expect
00:31:05
you to read all the generated code here, but one thing that's interesting is that
00:31:09
the while loop on the left side, which we can view as a tail-recursive function,
00:31:15
becomes a proper recursive function in the generated code. The intuitive understanding here is that
00:31:21
we're executing the loop twice: executing the loop in the
00:31:25
forward direction to compute the regular values, and then executing the loop backwards
00:31:30
to compute the gradient updates. And this is transparently
00:31:33
handled by mapping everything to function calls and returns.
00:31:39
If you know a little bit about reverse-mode AD: standard formulations have this notion of a tape,
00:31:45
which records intermediate values and operations that need to be traced back. So
00:31:49
we're mapping this tape data structure essentially to the call stack, and this
00:31:52
of course has efficiency benefits, because we don't have to allocate data ahead of
00:31:57
time, and it gets cleaned up when we return from the function call.
00:32:02
The question of course is, well, how does this perform in practice?
00:32:06
We built this system called Lantern, and we can run benchmarks
00:32:10
comparing to PyTorch and TensorFlow, and what you can see is that on the CPU
00:32:16
we're quite a bit faster than these
00:32:20
other systems on many benchmarks, or matching the performance on others.
00:32:24
Of course, you also want to do GPU training, and
00:32:29
there again we're in the same ballpark,
00:32:32
if not better, on standard models like ResNet-50
00:32:36
and also DeepSpeech2. And I'm just seeing that the DeepSpeech2 one, this is actually an
00:32:42
old graph; at this point we've optimized things and we're actually faster than PyTorch.
00:32:47
Basically, what we're doing here is using the same differentiable computing
00:32:54
methods, but we're using the GPU kernels
00:32:57
that are provided by cuDNN and other frameworks.
00:33:00
So the underlying substrate is the same, but we still get speedups in many cases, because we're
00:33:05
generating the code to tie these kernels together, instead of going back to a slow interpreter.
00:33:12
Okay, so one of the drawbacks still is that what we're doing is all in Scala,
00:33:18
and of course Scala is a nice language to implement things in, but people in AI really love to use Python.
00:33:26
The question is, with the kind of things we've been doing in Scala, can we actually apply the same ideas back to Python?
00:33:32
And the answer is: we can. We go back to this
00:33:35
paper, which introduced this idea of making language built-ins
00:33:39
overloadable, and essentially the same thing is possible in Python as well.
00:33:45
So we have a prototype system which is
00:33:48
called Snek that does this in a relatively straightforward way.
00:33:53
You can see here that if you put an @lms annotation on a piece of Python code,
00:33:58
it will rewrite things in a syntactic way into method calls.
00:34:03
This is a bit more complicated than in Scala, because we have to
00:34:06
deal with return statements and break and continue and so on, which Python programmers love to use,
00:34:13
so we have to take care of all these things, and scoping rules, and so on.
00:34:17
Based on that, we actually collaborated with a team at Google
00:34:22
and built a more sophisticated system which is called AutoGraph. This is actually part of the TensorFlow
00:34:27
2.0 release, and there's also a paper which appeared at SysML this year.
00:34:34
Just to give you a little idea of what you can do: now in
00:34:37
Python you can write a recursive model, like a TreeRNN,
00:34:43
and using AutoGraph this will be rewritten into
00:34:48
function calls for all the language constructs; those can then be overloaded to actually generate
00:34:53
some intermediate form that we can then ship
00:34:55
off to Scala and use our existing Lantern code generation strategy
00:35:01
to generate low-level C++ or GPU code. This
00:35:06
is the one on the right, and again you can see that
00:35:08
this has become a recursive function that uses continuations
00:35:14
as the way to manage these gradient updates, and all the uses of continuations
00:35:19
are highlighted here; basically we're
00:35:22
using C++ lambdas to implement these nested functions.
00:35:27
Good, to sum up. I think the main take-
00:35:33
away from me is: PL research matters.
00:35:37
Fundamental ideas like continuations and multi-stage programming really can have
00:35:43
a big impact in data, in AI, in security, and in other fields.
00:35:48
I showed you a glimpse of some of the work that we've been doing here, and
00:35:54
if you're interested in using Flare or anything else, I'd love to
00:35:58
talk to you, and otherwise I'm happy to take your questions. Thank you.
00:36:13
Please raise your hand if you have any questions, so I can come to you. Yes, please.
00:36:25
So I was wondering, first, is this a drop-in replacement for the Spark library, and do
00:36:31
you have to change your code to use it, or can you extend it as well?
00:36:38
So, it's a drop-in replacement for the Spark DataFrame API.
00:36:43
If you insist on using RDDs, then
00:36:46
that's not something we work on; that's another level of abstraction. But we work on
00:36:49
everything that is DataFrames, and if you have data sources
00:36:53
that we know about, like CSV or Parquet, and potentially others,
00:36:58
then we can also generate native code for those. If there's a data source that we have no knowledge about,
00:37:03
then again, that's not something we can analyze and accelerate in any way.
00:37:13
More questions?
00:37:23
How do you deal with the perturbation problem in forward-mode
00:37:29
differentiation, when there are nested derivatives?
00:37:34
Yeah, excellent question. So what I showed you here —
00:37:40
let me go back —
00:37:44
this is essentially assuming that you're only computing one
00:37:50
gradient at a time. If you were to compute a
00:37:53
gradient inside a gradient, then there's a problem that's called perturbation confusion,
00:37:58
where you essentially have to do some bookkeeping:
00:38:02
am I computing the gradient with respect to x, or with respect to y?
00:38:06
And the way this is typically done is by introducing a tag field in this Num class,
00:38:11
which essentially gives you a unique identifier, and then you can mix and match and see, okay, this
00:38:17
is the variable I'm differentiating with respect to, or the other one. I'm not showing
00:38:20
this on the slides, but it's basically a tagging approach. Great question.
00:38:33
Any more questions?
