Transcriptions

Note: this content has been automatically generated.
00:00:02
Awesome.
00:00:03
So, it's a big venue and a lot of people are still going to be filing
00:00:06
in, but we'll get started (or maybe they won't, but that's fine). Welcome.
00:00:13
I'm going to be talking about massively parallel distributed Scala compilation, and you. And I
00:00:19
want to really stick to the "and you" part, because we think that this is relevant to
00:00:24
absolutely everyone who builds Scala, or builds code in general; it's not just stuff that Twitter is doing. So
00:00:30
if you don't come away with that, ask questions at the end and help me clarify it.
00:00:35
A little bit about myself: I became the tech lead of Twitter's build team about a year ago,
00:00:41
and I hope to trade that hat with one of the other excellent folks on the team so
00:00:44
that I can code a little bit more. But our Scala usage is still very much going up:
00:00:50
we have more than ten million lines of handwritten Scala
00:00:53
code and many millions of lines of generated Scala.
00:00:57
That doesn't really change the ratios between the codebases that we have;
00:01:03
they've stayed pretty steady, and Scala continues to grow at a rate faster than
00:01:06
most of the other languages we have. But what is the trajectory of building code,
00:01:13
and in particular of building Scala code? We think the world should contain a
00:01:19
lot more Scala code, which would be great,
00:01:24
and builds should be a lot faster. But that represents a little bit of a
00:01:27
paradox, right? Because Scala code takes a while to compile. That's fine, I'll just say it:
00:01:34
the rule of thumb is that Scala takes something like 10x the time of Java to compile, right?
00:01:39
But there's a long tradition of high-level languages taking longer than their lower-level counterparts to compile:
00:01:45
C and C++, and then things like Haskell take a long time, and Rust, providing a whole bunch of abstractions that
00:01:53
cost you a little bit in terms of compile time. But we'd like them all to be faster.
00:01:58
So the status quo of builds today in open source is that in CI,
00:02:03
maybe it's Travis or Drone or something, you have some amount of caching,
00:02:07
and CI times probably float up into the fifteen-minute range. There are,
00:02:15
well, there are amounts of time that people are comfortable waiting, and you sort
00:02:19
of have these natural plateaus. On laptops you might have cold builds that take minutes,
00:02:26
and you don't really have an expectation, if you are checking something out, that you
00:02:31
won't have to build it; you expect that you will have to build it. And
00:02:36
it's probably the case that the project sizes that are built in open source
00:02:40
are latently constrained by this, because the amount of
00:02:44
infrastructure you just get out of the box isn't really there.
00:02:48
And so if you wanted to build significantly
00:02:53
more code and still have your CI take fifteen minutes, you'd have to build a lot of
00:02:57
infrastructure to make that happen. And if you wanted your cold builds on a laptop to continue
00:03:04
at this comparable level as the project got larger and larger, you'd have to build a lot of infrastructure.
00:03:10
And that probably puts a cap on how large projects get. At Twitter, we've built some of that infrastructure,
00:03:17
so although we have many millions of lines of code, the submit-to-master time is still about fifteen minutes,
00:03:24
and in some cases lower. We have parallel builds and distributed caching,
00:03:30
in the sense that when you're building that monorepo, and it
00:03:33
contains something like thirty thousand targets, or projects,
00:03:41
the worst-case build is something like forty-five minutes, across seven-hundred-ish machines. And then
00:03:49
on laptops, whole builds mostly hit the cache: "cold" in the sense that I have not personally built something
00:03:55
that someone else, in particular CI, has built.
00:04:00
But completely missing the cache might mean a much longer build.
00:04:05
When we completely miss the cache, we've changed something completely fundamental locally
00:04:10
that has never been built in CI. And building from source is
00:04:15
a feature: you want to be able to change something very fundamental in one step
00:04:20
and then see the effect on your codebase. It's extremely powerful from an integration perspective to have everything
00:04:27
tested like that. You can make tiny
00:04:29
changes whose effects impact projects dozens of hops away,
00:04:36
but it means that you have to have very good caching. So currently it's a manual step to get CI involved:
00:04:43
you run a command on your box, or under Jenkins, and that basically
00:04:47
tells the code review system that you'd like to either try out,
00:04:52
that is, do a dry run of landing something, or post it and simultaneously do a dry run of landing it
00:04:58
without actually landing it. And that manual step is also the
00:05:04
case in open source, going back a little bit, in that you will eventually learn...
00:05:11
Oh, sorry: the recent hardware refresh that is finishing rolling out has six-
00:05:16
core MacBooks, which are nice. But the downsides of the builds today
00:05:23
are that you probably don't want to carry more than six cores on your back, but also, in theory, you
00:05:29
could get a lot more performance if you could carry more than six cores with you, and if they were commodity.
00:05:35
If you have distributed caching, only builds that
00:05:39
run in CI probably populate it. So,
00:05:44
yeah, and then caches are also likely uncorrelated: if you're using something like
00:05:47
Travis or Drone, each shard might have its own cache, and it
00:05:51
writes to the thing, and then at the end uploads it, and
00:05:55
downloads it at the next run of that shard, for example.
00:05:59
And getting more fine-grained than that can be hard to do correctly.
00:06:05
And the last is that it's a manual step to get more machines involved, so users sort of learn
00:06:10
where that boundary is, where it is reasonable: "I don't want to wait five minutes for these tests."
00:06:16
And I've learned what not to do, because I
00:06:20
accidentally used the instructions, which were a little old, that said
00:06:25
"build the whole codebase and run the entire test suite every time you want to do something,"
00:06:28
and that slowly grew up to ten minutes' worth of time.
00:06:32
So they slowly learn: okay, that's a thing I can run locally; that's a thing I can't run locally.
00:06:39
And that's unfortunate. So what we'd like to do is improve this workflow transparently.
00:06:45
I say "transparently" because it's definitely the case that remote dev machines are this very
00:06:50
interesting thing, and in particular the VS Code split mode announced recently is very interesting,
00:06:57
because it means that if you have a pet box somewhere, or are willing to pay
00:07:04
for long-running VMs that you connect to, you can have a lot more cores.
00:07:10
But we'd like to do better, because that means changing your workflow to some degree, and also
00:07:16
having a pet machine somewhere that's hosting your code. So we'd like to head toward a world where
00:07:24
the explicit remote steps happen only where necessary. Eventually you're going to post your code review; maybe that's still a remote step,
00:07:30
but in the meantime it would be nice if you did not have to do
00:07:33
anything to verify anything beyond what your usual workflow does to verify that it's correct.
00:07:38
And so projects get bigger, because every time you make things faster, projects catch up by growing larger.
00:07:47
So this is another talk in a continuing story.
00:07:52
Most recently, Eugene Burmako
00:07:57
gave a talk about Rsc, and
00:08:04
that will play a part here. Twitter also, I think, has another talk at this conference about
00:08:10
the build graph; I'm not going to talk about that, I'm going to talk about another component of it all.
00:08:17
And then also, I previously discussed how to rebuild Scala from source, and that's relevant here.
00:08:25
So one thing I want to say, though, is that
00:08:28
what I'm presenting is a very long-running project, and
00:08:33
we're on the cusp of having excellent results from it, but
00:08:39
these are preliminary results, so sort of stay tuned for the rest of it.
00:08:45
So, the project logo: what is clearly pictured is actually
00:08:49
a bird, and it is the most numerous bird in the world:
00:08:54
in fact, the red-billed quelea. And in Africa they
00:09:00
destroy a field so quickly that farmers basically have to stand out
00:09:04
there and manually whack them with sticks to get them off the
00:09:07
crops. The farmers will fell trees at night, because this flock...
00:09:12
you have heard of locusts, but these, in parallel, do something very quickly:
00:09:20
they just destroy the whole field. So, fundamentally, we want to change how we invoke compilers
00:09:26
in order to unlock additional parallelism. And what I'm going to do is talk a little bit about
00:09:32
the outlining that Rsc does, which is a key component
00:09:36
of this, and then also, using that foundation, executing the compiles remotely
00:09:43
and transparently. Right out of the box, it shouldn't say "this is a remote tool":
00:09:48
you shouldn't notice the fact that the execution was not on my laptop.
00:09:53
And initially this is on the JVM, and only compiles, but there's no reason this can't extend to all the
00:09:58
other processes in your pipeline; there's prior art in that
00:10:02
at Google and other places. So, what is outlining?
00:10:07
Outlining, in this case, is implemented by Rsc,
00:10:13
and it is what I'm calling external outlining:
00:10:17
basically, the compiler extracts interfaces, or headers, from your code, and
00:10:24
the "external" part is what makes it external, right: you run a tool
00:10:29
on your source files; it emits an artifact that can be
00:10:33
persisted and then passed between machines, and that allows you to launch dependent compiles
00:10:42
before you've finished compiling the dependency. That's also known as pipelined compilation. So
00:10:50
I'm going to straight-up use Eugene's slides for this section.
00:10:57
Given this example: you have a target A that takes two seconds to compile, a target B
00:11:02
that takes one and a half, and a target C that takes one second,
00:11:07
and finally a target D that takes half a second. And we have dependencies between them:
00:11:14
B and C both depend on A, so they can't start until it has completed,
00:11:19
and D depends on C, and can't start until C has completed
00:11:23
compiling. Right, so these are the compilers running and then completing.
00:11:28
This is referred to as one-phase compilation, because there's no header extraction happening here.
00:11:34
But if you extract headers, and imagine that extracting headers takes about
00:11:39
fifty percent of the amount of time it would take to compile something,
00:11:43
you see this graph just get shorter, and
00:11:51
effectively the compile of B and C only depends on the outline of A:
00:11:56
you don't have to finish compiling A before you can start compiling B and C.
00:12:01
Additionally, the outlining of C only depends on the outlining of A,
00:12:07
which means that the compilation of D can start much sooner.
00:12:16
So that's interesting. More interesting is that, if outlining is
00:12:19
significantly faster, it can speed up your entire build:
00:12:25
from four and a half seconds down to two seconds. It's a hypothetical, but it's really interesting.
00:12:31
The other thing that's interesting is that we're allowing more things to happen in parallel:
00:12:35
at the end here, all four of the compiles are actually happening almost entirely in parallel.
00:12:42
I believe I'm using that word correctly. Okay, so outlining:
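The pipelined schedule described above can be sketched in a few lines of Python. The target timings and the fifty-percent outlining ratio follow the hypothetical in the talk; the model assumes unlimited parallel workers, so the exact figures are illustrative rather than the slide's.

```python
# Sketch of pipelined compilation: a compile may start as soon as its
# dependencies are *outlined*, rather than fully compiled.
COMPILE = {"A": 2.0, "B": 1.5, "C": 1.0, "D": 0.5}  # assumed compile times (s)
DEPS = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"]}
OUTLINE_RATIO = 0.5  # outlining assumed to cost ~50% of a full compile

def one_phase_finish(target, memo={}):
    # One-phase: a compile starts only after every dependency fully compiles.
    if target not in memo:
        start = max((one_phase_finish(d) for d in DEPS[target]), default=0.0)
        memo[target] = start + COMPILE[target]
    return memo[target]

def outline_finish(target, memo={}):
    # An outline only needs the outlines of its dependencies.
    if target not in memo:
        start = max((outline_finish(d) for d in DEPS[target]), default=0.0)
        memo[target] = start + COMPILE[target] * OUTLINE_RATIO
    return memo[target]

def pipelined_finish(target):
    # Pipelined: a compile starts as soon as its dependencies are outlined.
    start = max((outline_finish(d) for d in DEPS[target]), default=0.0)
    return start + COMPILE[target]

one_phase = max(one_phase_finish(t) for t in COMPILE)
pipelined = max(pipelined_finish(t) for t in COMPILE)
print(one_phase, pipelined)  # prints: 3.5 2.5
```

Even on this tiny graph, the build's critical path shrinks, because B, C, and D all begin before A's full compile has finished.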
00:12:47
what's the prior art? It's like extracting headers, so who else does this? Well,
00:12:51
C and C++, of course, because people write handwritten headers, and it
00:12:56
is the case that a C or C++ compilation proceeds by
00:13:02
walking through all the headers, and is then able to compile all the files in parallel. And that's where,
00:13:09
for me, the connection to Scala compilation clicked, after talking to somebody about how C++
00:13:15
compilation works. But Google has also done a little bit in this area:
00:13:20
Turbine is an open-source header compiler for Java, and
00:13:24
so it's analogous to what Rsc does for Scala.
00:13:29
It emits tiny jars that contain just enough to fool the compiler that depends
00:13:37
on you into thinking that you've already been compiled. And Rsc does a similar thing,
00:13:42
emitting just the Scala signatures rather than full classfiles. And then scalac,
00:13:48
also recently (this is mostly the last year and a half),
00:13:51
has had a series of steps in the direction of
00:13:54
outlining: build pipelining and outline typing are very interesting.
00:13:59
So, outline typing:
00:14:02
it's a similar situation, a pipeline
00:14:08
based on using just the first few phases of the compiler, and then using the symbol table
00:14:16
from one compile as an input to the other compiles.
00:14:22
And Martin Odersky had an implementation of that,
00:14:27
and then I heard just yesterday that Jason Zaugg has
00:14:30
been working on this for scalac, so that's really exciting.
00:14:34
At the end I'll also talk about why I think that is particularly exciting.
00:14:39
But equally, we're using Rsc; the other component of this is remote execution, right? We have this new
00:14:47
parallelism, but how do we take advantage of it? We're not carrying more than six cores on our backs,
00:14:52
and shouldn't. So the remote execution strategy is to
00:14:56
have clients that talk to an API.
00:15:01
We can go through it a little bit. In this model, we're using a standard: Bazel has a
00:15:08
remote execution API standard, second version, that's implemented by
00:15:13
four or five different clients, including one I'll talk about in a second,
00:15:18
and four different servers, again one of which I'll talk about in a second.
00:15:23
It's roughly: invoke this unix process in the context of this content-
00:15:28
addressable store digest, a unique identifier for a collection of files,
00:15:34
and give me back the digest of the outputs, again a unique identifier for a collection of files.
00:15:39
You can sort of think about the inputs and outputs as the files, except significantly more efficiently stored.
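The digests described here can be illustrated with a minimal content-addressable store. This is a sketch under assumed details: it fingerprints a flat byte blob with sha256, whereas the real API hashes a serialized directory structure so that whole file trees get a single digest.

```python
# Minimal content-addressable store: blobs are keyed purely by a hash of
# their content, so identical inputs dedupe for free and any machine that
# knows a digest can fetch the bytes behind it.
import hashlib

class ContentAddressableStore:
    def __init__(self):
        self._blobs = {}

    def store(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def fetch(self, digest: str) -> bytes:
        return self._blobs[digest]

cas = ContentAddressableStore()
key = cas.store(b"object Hello { def main(args: Array[String]): Unit = () }")
assert cas.fetch(key).startswith(b"object Hello")
```

A remote execution request then only needs to carry digests (of the inputs, the command, and eventually the outputs), not the files themselves.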
00:15:47
So, going back to the diagram: a client would use this API,
00:15:51
which talks back and forth with the content-addressable storage, which is the rough
00:15:57
equivalent of S3 (if you have something very tiny, maybe you use a different database for the very tiny stuff),
00:16:02
and then poll workers, and the workers are all going to invoke unix processes.
00:16:08
Basically, in this case we're invoking Rsc and Zinc,
00:16:13
where Zinc is the incremental compiler
00:16:15
library for Scala
00:16:21
and is what Twitter uses in general. So the client in this case
00:16:27
is Pants. In our case, we overhauled
00:16:32
Pants to support snapshotting of inputs, basically building those content-addressable
00:16:36
digests of the inputs as just a native kind of assumption, so that
00:16:42
the execution of Pants relies on their existence.
00:16:47
We have about ten percent of Pants implemented in Rust now, and it's mostly this layer.
00:16:53
When you snapshot those inputs, you put them in a local LMDB database,
00:16:59
and when you need to send them somewhere, you take them from that database and upload them. And
00:17:07
the API is implemented as a gRPC API, and all of this is open source, so
00:17:14
if somebody wanted to go and begin using this code today, they could. But also Bazel,
00:17:21
as demonstrated by the name of the API, implements this API as a client. So in theory this could also be Bazel.
00:17:29
And then servers: Twitter internally has a system
00:17:32
called Scoot that implements remote execution. It
00:17:38
basically has two modes. In the past we've used it exclusively as: get
00:17:42
a git workspace, and then run whatever you'd like, probably a build tool.
00:17:49
But getting a git workspace is a relatively expensive operation, especially at Twitter,
00:17:55
and in general that's going to be maybe a gigabyte or something of stuff.
00:18:00
This alternate mode, where it's basically just hosting the remote execution API,
00:18:07
is dealing with tiny processes invoked on exactly some small set of files relevant to the run.
00:18:16
But this could also be one of the other implementations of the backend of this API, right?
00:18:22
Google has something called RBE, in alpha now, which is very interesting, though they won't
00:18:29
quite commit to whether it will definitely become a product; it might. And then there are
00:18:35
open-source implementations, such as Buildfarm and BuildGrid, so there are options.
00:18:41
The API is stateless and hermetic, so the tools that you run inside of it
00:18:46
have to declare everything they're going to use, and have to declare what they want to capture on the other side. And
00:18:54
so generally that means you bring your own binaries, because you don't want to assume anything about the execution environment.
00:19:01
All of the workers have little local caches of the CAS contents,
00:19:06
so they won't have to go to the network to actually get, for example, the compiler.
00:19:11
For the JVM, you can also sort of break the fourth wall
00:19:17
and use whatever you know about your execution environment, whether it's Dockerized
00:19:23
or not, for that matter, to sort of get a JDK, if you want to.
00:19:32
But statelessness and hermeticity make performance a little bit more challenging, because the JVM likes warm-up.
00:19:40
So, how do we get around that?
00:19:43
Graal comes back in. One of our teammates in particular
00:19:49
(and I'll talk about the entire team later on)
00:19:53
worked on making native images for Rsc and Zinc, and,
00:19:58
a little bit earlier, had great success. It does have overhead,
00:20:03
but Graal native-image compilation is a pretty
00:20:08
nascent technology, and so we expect that to get better.
00:20:11
But also, the amount of parallelism this unlocks makes that a non-issue for us right now.
00:20:19
There's a native-image agent
00:20:22
that you can run on your application to generate the config that you need
00:20:26
to use native image, so that's important and helpful. So, where are we now? This system is working
00:20:33
as of the last two weeks, and so we have an example of a real internal library.
00:20:39
We could point it at a whole bunch of other ones, but the numbers I'm showing here are from this library.
00:20:45
I was surprised by how much generated code it contains: it contains 4.5 million generated lines of
00:20:50
Scala, three hundred thousand handwritten lines of Scala, and then 75k and two million lines of Java.
00:20:57
So obviously we've been profiling the heck out of this thing in the last few
00:21:01
weeks; we've been using distributed tracing, and we've been optimising.
00:21:08
Before starting the project, we simulated what we expected to see, and that's part of why
00:21:14
we have been so excited about this project: it has a huge amount of promise.
00:21:20
And so what we've been doing is pushing down toward that hypothetical lower bound that we've seen.
00:21:27
So what are we doing? We've been optimising I/O in and out of the
00:21:33
compiler: using jars helps quite a bit, because you can reduce round trips to the network.
00:21:38
It already helps to use jars locally, because interacting with syscalls is very expensive, but
00:21:44
additionally, when you're interacting with, effectively, a network file system, reducing I/Os is significant.
00:21:52
We're also avoiding materialising outputs: this whole
00:21:55
API actually allows for never downloading
00:22:00
the output of a compile if you don't want to. If all you needed was
00:22:03
"hey, there were no errors", that's a tiny little output at the very end of your compile,
00:22:08
and it's just standard out; it's none of the class files. And that's all you need to download.
00:22:14
But even if you do download it, you don't actually have to put it on the file system; you just have to put it in your local database.
00:22:21
And file systems are still slow. Apple tried, and did a great job, but still.
00:22:29
Okay, so we're also optimising by adding batching. I'm actually going to
00:22:37
go ahead and use another of Eugene's slides. So,
00:22:42
what you can do is: if you extract headers for a particular target, a set of files,
00:22:50
you can compile all the files in that target in parallel, because, as far as each is concerned, the others have already been compiled.
00:22:58
Right, it's a weird time-loop type thing, but it means that if you have ten files,
00:23:03
you outline them in a fraction of the time it takes to compile them, and you can then compile all of them in parallel.
00:23:10
You'd literally invoke separate compilers, and invoking separate compilers
00:23:15
is what this is all about. The overhead of doing this
00:23:20
is that compilation now depends on outlining, right? So if outlining takes some amount of
00:23:26
time, you have to wait for the outline to finish before you can begin the compilation,
00:23:31
whereas otherwise compilation only depends on the outlining of dependencies.
00:23:36
The other thing is that you have to invoke the compiler multiple times, right?
00:23:39
JVM compilers are used to
00:23:44
being kept warm, and to compiling large batches of things in general,
00:23:50
so if you give them smaller inputs, you're paying more of that overhead cost
00:23:54
without as much work to amortise it across. So, looking at this again:
00:23:58
if you break each of those targets basically in half, you pay some overhead
00:24:03
cost to compile them, but you get even more parallelism:
00:24:07
you basically double the parallelism, with twenty percent overhead,
00:24:11
and get down to 1.4 seconds, I think it was. Thanks again are due to Eugene for the slides.
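The trade-off described here can be put in a toy model. The roughly fifty-percent outlining cost and twenty-percent per-invocation overhead are the talk's estimates; this models a single target in isolation, whereas the quoted 1.4 seconds came from the whole dependency graph, which overlaps these costs across targets.

```python
# Toy model of intra-target batching: the target must wait for its own
# outline, then its batches compile in parallel, each paying invocation
# overhead. Without batching, a pipelined compile just takes its full time.
def batched_wall_time(compile_secs, batches, overhead=0.2, outline_ratio=0.5):
    outline = compile_secs * outline_ratio                 # own outline first
    per_batch = (compile_secs / batches) * (1 + overhead)  # run in parallel
    return outline + per_batch

# Unbatched, this 2.0s target just takes 2.0s.
print(batched_wall_time(2.0, 2))  # 2.2: two batches don't pay off alone
print(batched_wall_time(2.0, 4))  # 1.6: enough batches beat the 2.0s compile
```

The single-target crossover point is late, which matches the talk's point: the real win shows up across the whole graph, where batching feeds otherwise-idle cores.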
00:24:19
What else are we doing to optimise? You don't have to run everything remotely, so figuring out what you want to run
00:24:26
remotely is an important piece of this.
00:24:30
And if you just cherry-pick individual things that you
00:24:34
want to run remotely, you...
00:24:39
sorry, if you just cherry-pick individual things, you can say: this one doesn't take very long, I know
00:24:45
it doesn't take very long, so run it locally; or: it has high remote overhead for some reason.
00:24:52
We haven't done any of that for our demo... not demo, but
00:24:55
for the results, although we think it will be very important.
00:25:02
And finally, of course, I've talked about the ratio between the
00:25:05
compilation time and the outlining time; that's critical as well. So
00:25:10
we've been doing a lot of profiling (one of our teammates has been
00:25:12
doing a lot of profiling) and optimising of Rsc,
00:25:17
and getting that ratio down is super helpful.
00:25:24
Okay, so, as I said, we've been doing distributed profiling.
00:25:29
Mind the video compression; we're not leaking any secrets. What you're seeing here
00:25:35
is a trace of fifteen seconds' worth of a build, and
00:25:42
what you'll notice is that in the beginning of this video
00:25:47
things are pretty parallel: this is what a hundred cores' worth of parallelism looks like, a
00:25:55
whole bunch of compiles happening in parallel. But what you'll also notice is that... and we have this one, two, three...
00:26:02
the long tail goes off the end, and I'm going to need to
00:26:10
skip some of the video. The long tail swoops off to the end
00:26:15
here: there is a long tail. We've started to look at what the long tail is,
00:26:20
and we have a pretty good idea that it relates to cycles between Java and Scala,
00:26:25
because of what we haven't done yet, which is to handle the alternation between Java and Scala:
00:26:30
we are not currently outlining Java code, and that means that if a Java compile
00:26:36
sits between a Scala compile and another Scala compile, the Scala has to wait for
00:26:43
Java, and Java has to wait for the full compilation of the Scala.
00:26:51
We think that's it, but we will confirm that hypothesis beforehand, before continuing to implement it.
00:26:59
But what is the result? So, again,
00:27:04
doubling down on the caveat: this is really new,
00:27:08
and this is today's result, but by tomorrow
00:27:12
it'll be history already. Performance has doubled in the
00:27:17
last week; part of that was that we had our entire team in London, which is awesome. I'm very excited.
00:27:24
So, cold compiles: cold compiles using Nailgun, which is
00:27:29
a little server that keeps a Zinc instance warm.
00:27:34
On four cores, which is what this laptop has, an actual real-world box, it takes six hundred twenty-eight seconds;
00:27:40
eight cores, which is apparently not something you can carry on your
00:27:43
back (and we don't have the six-core number), is four hundred thirty.
00:27:48
The interesting thing is that at sixteen cores you're a little bit slower.
00:27:54
I could fill out the graph a little bit more, but you've reached the limits of parallelism
00:27:59
for this target, right, this set of seven hundred projects that we're compiling.
00:28:06
The graph shape without outlining... sorry, and this is without outlining, right, Zinc only:
00:28:13
the graph without outlining means that you can only extract so much parallelism,
00:28:18
and so even if we were to give people bigger boxes,
00:28:22
we would not be able to utilise them. So, Rsc and Zinc:
00:28:28
using thirty-two cores is just about as fast as the
00:28:32
eight cores, right? So that's not wonderfully exciting,
00:28:36
but using a hundred twenty-eight cores, it's faster than eight, so it's faster than something that you can carry on your back.
00:28:42
This is the beginning of something very, very exciting,
00:28:48
and it might be weeks away from being massively more exciting, so stay tuned.
00:28:55
The other thing, though, is that benchmarking cold compiles doesn't actually show all the benefits of remoting.
00:29:00
We were talking about workflows, and remote execution allows
00:29:05
for this sort of transparent thing where, because the compiles are happening remotely
00:29:10
in a hermetic environment that basically can't possibly be polluted,
00:29:16
you can put the outputs of those compiles in caches immediately, so
00:29:22
all of everybody's outputs can be cached.
00:29:27
And that means that if I merge
00:29:31
master into my branch, and I had significant changes in my branch,
00:29:35
I'm going to see a cold compile, I'm going to populate the cache, and everyone
00:29:39
else using my branch is populated. I didn't have any sort of manual step
00:29:43
to basically say "I'm about to share this branch; should I warm it up for people?",
00:29:48
which is a thing that people sort of have to do at Twitter right now.
00:29:53
Also, I sort of referenced not needing to download things: if you assume that tests are remoted as well,
00:30:00
which is very, very useful, because tests are embarrassingly parallel without any sort
00:30:05
of outlining: if you have a hundred tests you can run them all, in batches,
00:30:09
on remote machines. This means that (and you actually get the uniformity I was talking about) you can run as much
00:30:17
of the repository as you'd like without really thinking
00:30:23
too much about it. So, what are the next steps for us?
00:30:29
We're going to continue optimising, because we're on the cusp of having excellent results.
00:30:35
We will probably outline Java. Our simulations didn't show it being a massive speed-up,
00:30:42
but it's looking a little bit like, in reality, it's more significant than we thought. And
00:30:49
most likely that will be either Rsc just adding
00:30:53
support for parsing Java, such that it can produce an outline
00:30:57
for the Java code, or using Turbine, which, again, is Google's Java header compiler.
00:31:05
We will also potentially do dynamic batching, because one of the things we've seen is that having a big
00:31:11
knob in terms of what the maximum number of files to compile at once is,
00:31:18
is a little bit limited, in the sense that if you encounter
00:31:21
any large target near the beginning, where we already have all the parallelism,
00:31:25
we don't want more parallelism, right? So you want to do something close
00:31:28
to "fork before join": if I have extra parallelism, break it up; otherwise don't.
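The "fork before join" heuristic could be sketched like this. The function name, the `max_batch` threshold, and the idle-core check are all hypothetical stand-ins, not the real scheduler's API.

```python
# Only split a target's files into smaller compile batches when there are
# idle cores to absorb the extra parallelism; otherwise compile it whole
# and avoid the per-invocation JVM overhead entirely.
def plan_batches(files, idle_cores, max_batch=50):
    if idle_cores <= 1 or len(files) <= max_batch:
        return [files]  # parallelism saturated, or target already small
    wanted = -(-len(files) // max_batch)   # ceil: batches needed at max size
    batches = min(idle_cores, wanted)      # never fork past the idle cores
    size = -(-len(files) // batches)       # ceil: files per batch
    return [files[i:i + size] for i in range(0, len(files), size)]

print(len(plan_batches(list(range(120)), idle_cores=8)))  # 3 batches of 40
print(len(plan_batches(list(range(120)), idle_cores=1)))  # 1: don't split
```

The key property is that the split decision is made at scheduling time, using the cluster's current load, rather than from a single static knob.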
00:31:34
Also, speculation is very important. Speculation refers to
00:31:39
making two attempts at something and then taking the best
00:31:41
of them, and in this context that might look like waiting,
00:31:47
oh, on the order of fifty to two hundred milliseconds after starting something in
00:31:52
your more-desired location, and then starting it in your less-desired location,
00:31:57
in hopes that the less-desired location ends up being faster than your more-desired location.
00:32:02
And so for local versus remote, that might be: if I'm on
00:32:05
a shoddy internet connection, I might speculate by starting locally
00:32:12
and then only kicking over to the remote if my local compilation is
00:32:16
taking longer than some amount of time. And generally you tune speculation
00:32:22
by percentiles: so, "I've made it past my
00:32:26
fiftieth or ninetieth percentile; I want to try remoting something,"
00:32:30
or vice versa: if you're running in CI, maybe you want to speculate remotely, and only start compiling something locally if it takes too long.
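That delayed second attempt can be sketched with a thread pool. The delay value and the two callables are stand-ins; a real implementation would also cancel the losing attempt instead of letting it run to completion.

```python
# Speculation: start the preferred execution, and only if it has not
# finished within `delay`, also start the fallback; return whichever
# completes first.
import concurrent.futures
import time

def speculate(preferred, fallback, delay=0.1):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(preferred)
        done, _ = concurrent.futures.wait([first], timeout=delay)
        if done:
            return first.result()  # preferred beat the speculation delay
        second = pool.submit(fallback)
        done, _ = concurrent.futures.wait(
            [first, second], return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

slow_local = lambda: (time.sleep(0.5), "local")[1]
print(speculate(slow_local, lambda: "remote", delay=0.05))  # prints: remote
```

Tuning `delay` against a percentile of historical run times gives the behaviour described above: only speculate once a run has outlived, say, its p50 or p90.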
00:32:38
Finally, input fetching. For this running of Unix processes in a cluster,
00:32:45
we have encountered things about the API that we believe can
00:32:50
be improved to allow for more prefetching, so you spend less time on startup.
00:32:54
It's already pretty good: we're spending significantly less time starting up processes than we are running them
00:33:01
in most cases, but there are other cases where the process is pretty short-lived relative to what it needs,
00:33:07
and so we could always go faster there. And the other thing is,
00:33:11
it's possible to make the fetching of the things the process needs to run
00:33:15
async with the process actually running, if you're using a
00:33:20
FUSE filesystem, because you can be backfilling the things that it might request
00:33:25
as it starts, and possibly, you know, beat it,
00:33:30
so that none of those filesystem operations actually end up blocking.
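A toy model of that backfilling idea. The `download` function and the store are invented for illustration; in reality this would sit behind a FUSE filesystem:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Toy model: `prefetch` kicks off downloads for all declared inputs as
// the process starts; a `read` only blocks if its own download has not
// landed yet, so most reads win the race and never block.
final class PrefetchingStore(download: String => String) {
  private val inFlight = new ConcurrentHashMap[String, Future[String]]()
  def prefetch(paths: Seq[String]): Unit =
    paths.foreach(p => inFlight.computeIfAbsent(p, q => Future(download(q))))
  def read(path: String): String =
    Await.result(inFlight.computeIfAbsent(path, q => Future(download(q))), 10.seconds)
}
```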
00:33:35
So, for the next steps for the project post-optimizing, or maybe
00:33:42
hand in hand with it: shipping remoting in CI.
00:33:48
Shipping remoting in CI is also an important step toward shipping with Rsc,
00:33:55
and there are a few more steps involved in actually rolling Rsc out.
00:34:01
If you've seen past talks about Rsc: Rsc
00:34:03
requires explicit return types on public members,
00:34:08
which is not a thing that the Scala language currently requires. It's a
00:34:12
convention that a lot of people who write Scala
00:34:16
follow, in the sense that you should have an explicit return type if you're going
00:34:19
to have a public API, but requiring it is a step.
00:34:26
There is a Scalafix rewrite; it's very reasonable, and it's automatic
00:34:31
for most of the cases. So we fully expect that,
00:34:35
as we're able to demonstrate the huge benefit of this work, it will be straightforward to adopt.
00:34:43
And then also, enabling remoting on laptops: we're going to ship in CI first, but on laptops
00:34:48
you additionally need that speculation I was talking about, because particularly in shoddy network environments,
00:34:55
or completely offline cases, you don't want to rely on the server being there.
00:35:03
And then, perhaps more controversially, we think that it might be possible to remove Zinc from this equation. So,
00:35:10
a little bit more about Zinc, because I didn't really lay that
00:35:13
out: Zinc is extracted from sbt, and sbt uses Zinc
00:35:18
as its incremental compilation library. So when
00:35:23
sbt recompiles things, it uses
00:35:26
an analysis file that tracks the dependencies between files
00:35:31
to know which files it needs to recompile. So say you have a project that might contain twenty-ish files:
00:35:37
if the Zinc analysis shows that only one of those twenty files is affected by some edit,
00:35:44
it will basically invoke the compiler with exactly one file.
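The shape of that invalidation is roughly a transitive closure over a reverse-dependency graph. This is a heavy simplification of what Zinc's analysis actually tracks:

```scala
// Simplified sketch of Zinc-style invalidation: given a reverse
// dependency map (file -> files that depend on it), an edit forces
// recompiling the edited files plus their transitive dependents.
object Invalidation {
  def invalidated(dependents: Map[String, Set[String]], edited: Set[String]): Set[String] = {
    @annotation.tailrec
    def go(frontier: Set[String], acc: Set[String]): Set[String] = {
      val next = frontier.flatMap(f => dependents.getOrElse(f, Set.empty[String])) -- acc
      if (next.isEmpty) acc else go(next, acc ++ next)
    }
    go(edited, edited)
  }
}
```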
00:35:51
This is a little bit fragile: it can lead to
00:35:55
under-compilation (it can, and it has), and over the years a lot of semantics
00:35:59
have been introduced there, and compilers change, and so it's a
00:36:04
moving target. And so we want to see whether external outlining
00:36:12
is a better answer to the question of how do
00:36:14
I make incremental Scala compiles fast.
00:36:20
So: I talked about sub-target compilation. The other thing is that
00:36:23
the analysis file contains enough information to know whether
00:36:28
the public API of something has changed since the
00:36:32
last build. If the public API has changed,
00:36:35
you have to rebuild all the things that touch that public API, and outlines are the public API
00:36:43
in a serialized format, right? So that one is really truly handled:
00:36:49
I have a cache key that indicates that I depend on this outline and not the whole compilation, and so
00:36:55
I don't need to recompile unless the public members have changed. So that
00:37:01
falls out automatically. Sub-target compilation: we can't do one file out of twenty, probably,
00:37:07
but if you apply batching, and basically partition the sets of
00:37:11
files as I was talking about, with batching and dynamic batching,
00:37:16
you can compile just the partitions that have changed:
00:37:21
maybe just some of ten partitions of two hundred files each.
00:37:25
Your incremental recompile will take at most some bounded amount of time,
00:37:32
and you can also parallelize it, so that's exciting.
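A sketch combining outline digests as cache keys with partition-level recompilation. All names here are illustrative, not the actual implementation:

```scala
import java.security.MessageDigest

// Illustrative sketch: each partition's cache key mixes the digests of
// its own sources with the digests of the *outlines* (public APIs) it
// depends on, so edits to method bodies elsewhere don't invalidate it.
object PartitionedIncremental {
  private def sha256(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8")).map("%02x".format(_)).mkString

  def cacheKey(sources: Seq[String], depOutlines: Seq[String]): String =
    sha256((sources ++ depOutlines).map(sha256).mkString("\n"))

  // Recompile only the partitions whose key differs from the last build;
  // the unchanged ones are cache hits, and the rest can run in parallel.
  def toRecompile(keys: Map[String, String], previous: Map[String, String]): Set[String] =
    keys.collect { case (p, k) if !previous.get(p).contains(k) => p }.toSet
}
```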
00:37:37
So we'll see whether that works out. I'm interested in talking with more folks in the community about that one, and
00:37:47
it's sort of a post-shipping piece of work.
00:37:47
So, the rest of the trajectory. I talked about the trajectory
00:37:52
for builds; this applies everywhere. And the reason I think it applies everywhere
00:37:57
is that there are a lot of open source
00:38:02
code bases, and compile times are going to continue to go up.
00:38:07
I think that all build tools should support remote execution,
00:38:10
and I think that all compilers should support external outlining
00:38:15
as defined before. It should basically be on the compiler writer
00:38:20
to have an API, or even just a CLI interface, that says: please outline
00:38:28
this file, don't fully compile it, and give me back a file that I can use.
00:38:33
It should just be the de facto standard. And open
00:38:38
source users should be able to bring their own remote execution credentials. So,
00:38:42
if you come along to a large project, you have an account
00:38:48
with somebody, or you have a self-hosted remote execution cluster, but probably
00:38:52
just an account with somebody, because it's the year of the cloud. But
00:38:57
that open source project may not trust you enough to give you
00:39:00
access to their cluster machines, but you could bring your own. Why not?
00:39:05
Maybe they give you read-only access to their cache, and you populate your own cache with writes, right?
00:39:12
And also, I think all organisations should transparently
00:39:15
share a remoting cluster, rather than necessarily having
00:39:19
dedicated boxes. So the idea that every one of your
00:39:24
developers has a laptop and then a dedicated box, or even a cluster of VMs:
00:39:31
that will not get you the best performance possible with remoting. There's some lower bound
00:39:36
depending on how efficient remoting is, and we'll get there. But maybe, if you're an organisation of ten people,
00:39:42
that's a cluster of ten machines, so that when you are
00:39:46
not compiling something, somebody else could be using it to compile something,
00:39:49
right? And so you basically get better performance beyond some threshold of utilization, and it will be utilized pretty well.
00:39:58
So we think that everyone can sort of help prepare the community for that future.
00:40:03
And there are multiple open source implementations of remote execution clients, but there's definitely room for more.
00:40:10
At a high level it's a very basic API;
00:40:15
it takes some time to implement, but there are reference implementations in Java and Rust, and
00:40:20
I think others as well. And it should be integrated with basically all build tools:
00:40:26
we should have transparent support for doing this if I bring my credentials along.
00:40:31
sbt, Maven, Bloop, Mill, and various others should probably
00:40:35
have integrations. And then the servers exist, and also the Google
00:40:43
RBE service is a thing, so even if you didn't want to host your own cluster,
00:40:49
you could do this. And I do hope that somebody will take
00:40:54
one or the other of the open source server implementations and host it,
00:40:58
to help give Google's service some competition. And then also, there are multiple nascent
00:41:06
implementations of outlining; I sort of referenced a few of them earlier. We think that polishing
00:41:14
and externalizing it, importantly, making the outline a thing that the compiler emits
00:41:19
somewhere to disk, is important. If you also have a way to use it in memory, that's fine,
00:41:26
but making sure that you emit it is critical for this model,
00:41:30
for the shared-nothing sort of model. So
00:41:35
we'd like to basically make external outlining, for parallelism and change detection,
00:41:39
a de facto standard. There are a whole bunch of language communities in which these are the standard.
00:41:46
So, in conclusion: I did relatively little of the work going on
00:41:50
here. I'm incredibly proud of everyone who's been involved.
00:41:55
This is the work of [several team members, names unclear],
00:42:01
and in particular I want to call out Andrew as having spent a huge amount of time on this.
00:42:07
The rest of the team has been coming up to speed on all of their efforts, and it's awesome.
00:42:14
On that note, we're hiring, and we have seven positions
00:42:18
within Engineering Effectiveness, which is Twitter's developer productivity team.
00:42:24
Some of them are in London, and the one that you'd guess as well, so
00:42:28
nearby. And we will likely be making a push to improve IDE support later this year:
00:42:36
Ólafur, of Metals fame, is joining us in London, which is super awesome, and we're very excited about that.
00:42:43
No promises that we will switch to Metals, but it's looking pretty shiny.
00:42:51
And we have tons more hiring within the platform org at Twitter.
00:42:55
We invest in this team to make the lives of all of our Twitter developers awesome,
00:43:02
and we write open source code to help the community in general.
00:43:10
thank you
00:43:19
[inaudible audience question]
00:43:43
So you said... [partially inaudible question about the speedup achieved via remote execution]
00:44:00
So, I didn't put the number on the slides, but without any overheads, simulations say
00:44:06
this can be something like sixteen times as fast. There are a huge number of different overheads,
00:44:12
and we basically started optimizing in the last, like,
00:44:15
two weeks. So the timing was not
00:44:23
completely ideal, because I think the previous Friday
00:44:27
the system began working, and so this is just
00:44:30
extremely nascent. The simulations show significantly better speedups;
00:44:37
the overhead you mention is probably not the largest overhead right now, and we have a whole bunch more.
00:44:45
yeah
00:44:49
question
00:44:53
Do you have plans on open-sourcing it when it becomes more mature, maybe?
00:45:04
So, all of the components involved are already open source.
00:45:10
The ones I mentioned: well, Pants has been open
00:45:16
source for six years, Rsc has been open source for two years,
00:45:22
Scoot is open source, but they're not really pushing to build a community right now.
00:45:27
There are a few other open source servers that are pushing for
00:45:32
communities, but in reality there's nothing stopping people from doing this today.
00:45:37
I was really serious when I said that I think that getting
00:45:41
a lot more people involved in the whole general concept of outlining would be great.
00:45:47
[partially inaudible exchange with the audience]
00:46:06
We're definitely happy to answer questions. I didn't mention:
00:46:10
we have Daniel and
00:46:12
a few others here, and they all contributed as well, so thank you.
00:46:16
[partially inaudible] I have some questions as well. In my experimentation, requiring explicit return types globally seemed to actually slow things down slightly.
00:46:27
But how do you
00:46:33
ensure that developers scope a project's public API appropriately, as you had
00:46:38
described, and keep the public API [inaudible]?
00:46:42
Yeah, these will be public, right. The convention will definitely sort of grow
00:46:48
out of how it feels to do this, I think. I expect that
00:46:54
being forced to add a return type on something that's public will be a good thing
00:46:59
in terms of forcing you to consider whether you needed it to
00:47:01
be public, right? It will probably lead to more private things,
00:47:08
and so, probably, by default it means you will need
00:47:12
the private keyword in more places; you're sort of trading off the private keyword versus the return type.
00:47:19
We hadn't observed a slowdown from adding return types, so that...
00:47:25
Well, I set it globally, and it seemed to make things actually
00:47:29
slightly slower. Yeah, interesting, okay. But interestingly, my
00:47:33
talk tomorrow will talk about doing this, doing
00:47:36
these annotations generically as well. Yeah, so I think we'll see.
00:47:45
Maybe there's a last question?
00:47:51
There are no more questions, so you can go to lunch.
00:47:59
[inaudible closing announcements]

Massively Parallel Distributed Scala Compilation... And You!
Stu Hood, Twitter
June 12, 2019 · 12:16 p.m.