Transcriptions

Note: this content has been automatically generated.
00:00:00
okay uh can everyone hear this i assume you can hear me i can hear my own voice so that probably means yes okay
00:00:06
good afternoon everyone my name's aish fenton i'm a director of machine
00:00:10
learning research at netflix and i'm jeremy i work with aish on recommendation systems
00:00:16
so today we want to talk to you about why we love scala for doing machine learning at netflix
00:00:22
um but first i want to reveal the dirty secret of machine learning research
00:00:27
now you might think of machine learning research as something like this
00:00:32
we're working in a very highly complex environment where trained scientists
00:00:36
are working with precision but the reality is more like this
00:00:42
a lot of what we do requires a lot of iterative work a lot of trial and error
00:00:47
ah we're often trying things really fast and unfortunately that results in a lot
00:00:52
of spaghetti code and a lot of mess in our engineering setups
00:00:58
so why is this the case
00:01:01
so there's a number of things that make engineering quite challenging for machine learning
00:01:07
they're not particular to just machine learning but i'd say that they're very prevalent
00:01:12
um so number one i'll call out is that numbers have meaning in machine learning
00:01:17
we're working with mathematical objects so when i say numbers i mean things like
00:01:23
that they operate unless into my but the types that we typically use out in it
00:01:31
programming languages that we use ah these don't really have any meaning
00:01:35
they're things like primitives so floats and doubles and things like that
00:01:40
so i guess the canonical example of this is if we're working in a
00:01:44
particular uh domain such as log space uh when you add two
00:01:50
log numbers together in log space that does not equal just the sum of the two logs that's a different thing
00:01:56
so this is a number that's imbued with some um semantics that
00:02:02
tend to get lost with the somewhat simple types we use of just primitive types
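
Editor's sketch of the log-space point, under stated assumptions: `LogProb` and its methods are hypothetical names, not code from the talk. It wraps a `Double` holding the natural log of a probability, so that multiplication becomes addition of logs while addition has to go through log-sum-exp.

```scala
// Hypothetical wrapper (not from the talk): a probability stored as its natural log.
// A plain case class is used for simplicity, so it allocates; it is not zero-cost.
final case class LogProb(log: Double) {
  // Multiplying probabilities corresponds to adding their logs.
  def *(that: LogProb): LogProb = LogProb(log + that.log)

  // Adding probabilities is NOT adding the logs; it needs log-sum-exp.
  def +(that: LogProb): LogProb = {
    val hi = math.max(log, that.log)
    val lo = math.min(log, that.log)
    if (hi == Double.NegativeInfinity) LogProb(hi) // both probabilities are zero
    else LogProb(hi + math.log1p(math.exp(lo - hi)))
  }

  def toProb: Double = math.exp(log)
}

object LogProb {
  def fromProb(p: Double): LogProb = LogProb(math.log(p))
}
```

With a bare `Double` nothing stops a caller from writing `a + b` on two logs and silently computing a product; giving the number its own type puts the semantics back where the compiler can see them.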
00:02:07
the second one i'd call out is that mathematics is obviously a very rich domain
00:02:12
with a lot of structure and arguably the whole point of mathematics is that we can make
00:02:17
high-level powerful generalisations about relationships between things and the structure of things
00:02:22
but when we express this in code especially fast code that gets completely lost
00:02:29
so for those of you that uh i think i'm talking to the right audience
00:02:33
here when i talk about abstract algebra and you know the beauty of all that and
00:02:38
some of the higher-level abstractions and insights you can gain um between um things um
00:02:46
that is all present in the mathematics but it does tend to get lost once we're operating in code
00:02:52
so the reason is that we have to write really
00:02:55
fast code ah so we are always trading off performance for
00:03:01
ah maintainability and it creates a huge mess so um this was something i randomly copied and
00:03:08
pasted um from a numeric algebra library uh this is what a lot of the code that we write
00:03:13
would look like ah it's very procedural uh it's
00:03:19
very hard to read um but it is fast
00:03:25
and then the fourth and last thing i'll call out and i'd say this is really the dirty secret of machine learning is that we
00:03:30
actually spend eighty percent of the time just plumbing things together a lot of what we do is manipulating data
00:03:37
finding new data sources new signals trying to plumb them through the algorithms we spend a lot of
00:03:42
time doing this and it's really great on water that box so i think when you think would be
00:03:49
i'd say that will be yes elsewhere without the machine learning we tend to think
00:03:53
of ourselves writing the numerical code the the deep learning model uh but really that
00:03:58
is probably where we spend just a tiny fraction of our time and a
00:04:01
lot of what our job is is really around everything that will plug into that system
00:04:07
so that kinda sets the scene but now let's talk about why we
00:04:11
think that scala actually helps with a lot of this and why that's
00:04:15
you know fundamentally an advantage of scala for doing
00:04:18
machine learning so jeremy's got i guess some examples
00:04:23
okay i'm gonna kind of stick close to my notes here because this is a terrifying number
00:04:27
of people um so uh the strategy i wanna talk about is uh to make the type system work for you
00:04:34
and uh in order to do that we're gonna try encoding some more meaning uh in the types that we use
00:04:40
uh we have two examples here uh uh the scheme
00:04:45
schema types uh is about uh assigning meaning to uh
00:04:50
to things that are normally not typed
00:04:53
um and reducing some boilerplate in the process
00:04:57
um and then we'll talk about uh uh this idea of symbolic functions and one of these
00:05:02
we actually use and one of them is just a a a fun sort of uh
00:05:07
a side project and i'll let you guess uh which is which uh but both of these are made possible by implicits which are kind of
00:05:13
uh as engineers our hook into the into the type system um so if you hate implicits this is probably
00:05:19
not uh the talk for you um so the first thing
00:05:24
uh that i'll talk about is is the schema types um
00:05:28
and the the motivation here is uh when we're working with data transformation
00:05:32
pipelines uh it would be really nice to have some type safety for example
00:05:36
uh i have some some function aggregate scores it takes a data frame returns a data frame
00:05:41
well that's great but it says nothing about what the input needs to have and says nothing about what the output's gonna look like
00:05:47
um so that's just a recipe for exceptions uh when somebody who didn't write that function tries to use it
00:05:53
um now spark has these uh uh typed data sets where uh
00:06:00
it has a a type parameter that's like a case class but we can't
00:06:05
be expected to encode every intermediate shape of our data into
00:06:09
a case class that would just be an explosion of case classes
00:06:12
and we also uh at netflix we have some
00:06:16
complex nested schemas uh uh arrays of structs and so forth
00:06:20
uh where it would just be uh unworkable to use case classes everywhere so
00:06:24
we have schema types as kind of a relatively straightforward middle ground
00:06:29
what we do is we just define some columns that we
00:06:32
care about um and these are just sort of abstract bits of meaning
00:06:38
uh that that uh all it represents is an idea of what some column
00:06:43
means um and then we'll we'll create a data set to wrap a data frame
00:06:49
this is not spark's data set um naming things is hard uh but uh
00:06:56
we're just gonna give it a phantom type that describes what its schema is
00:06:59
um it's just gonna wrap a data frame and and it has this validate method
00:07:04
uh which will check at run time that the column that you
00:07:07
said is there is there and and uh do some other checks
00:07:11
at run time and then at at compile time it's gonna actually track
00:07:15
the type we have a few overloads of that i didn't show them all but
00:07:18
uh uh you you could use your imagination there so here's an example of what that looks like um
00:07:24
so uh if i wrap a data frame in
00:07:29
this data set then i can validate these columns and
00:07:33
say what the physical column name is now it's checked at run time and at compile time i have some nice
00:07:38
uh uh type information i can carry around uh with that with the data set um and
00:07:45
rather than encoding a bunch of case classes of every possible shape
00:07:49
of data i could just sort of mix and match ad hoc here
00:07:52
um uh to to represent what's actually in that in that data frame
00:07:57
we'll also have this wrapper store some information about what's been validated um that we'll use later on um
00:08:06
so now the func this function can be a little more descriptive we could say uh i mean i need a data set with country
00:08:11
and profile id and score and then i'll give you back whatever
00:08:15
you gave me but i'll add some aggregate score to it for example
00:08:19
um and with that intersection type the compiler does
00:08:23
all the work at compile time we don't even need shapeless for this and i i mean i
00:08:28
love shapeless uh but it's it's not even necessary here because uh if you pass in something that
00:08:34
uh that's missing one of the requirements then the compiler will just tell you that's not a that's not
00:08:41
gonna work uh but you can also pass in something that has more than needed and that'll be just fine
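
Editor's sketch of the schema-types idea, under stated assumptions: this is not the Netflix code. `Frame` is a hypothetical stand-in for a Spark data frame (just a set of physical column names), and every name below (`DataSet`, `validate`, `physicalName`, `Pipeline`, `SchemaExample`) is made up for illustration. It shows a phantom type that grows by one intersection member per validated column, and a function that states its requirements as an upper bound.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for a Spark DataFrame: only the physical column names.
final case class Frame(columns: Set[String])

// Columns are "abstract bits of meaning", represented purely as types.
trait Country
trait ProfileId
trait Score
trait AggregateScore

// Wrapper whose phantom type S records which logical columns were validated.
final case class DataSet[S](frame: Frame, physical: Map[Class[_], String]) {
  // Run time: check the physical column exists. Compile time: add C to the type.
  def validate[C](physicalName: String)(implicit tag: ClassTag[C]): DataSet[S with C] = {
    require(frame.columns.contains(physicalName), s"missing column: $physicalName")
    DataSet[S with C](frame, physical.updated(tag.runtimeClass, physicalName))
  }

  // Only compiles when C is part of the validated schema S.
  def physicalName[C](implicit ev: S <:< C, tag: ClassTag[C]): String =
    physical(tag.runtimeClass)
}

object Pipeline {
  // Needs at least Country, ProfileId and Score; returns whatever came in,
  // plus AggregateScore. No hard-coded physical column names.
  def aggregateScores[S <: Country with ProfileId with Score](
      ds: DataSet[S]): DataSet[S with AggregateScore] = {
    val aggCol = ds.physicalName[Score] + "_agg"
    ds.copy(frame = Frame(ds.frame.columns + aggCol)).validate[AggregateScore](aggCol)
  }
}

object SchemaExample {
  val raw: Frame = Frame(Set("country_code", "profile", "score", "extra"))
  val ds = DataSet[Any](raw, Map.empty)
    .validate[Country]("country_code")
    .validate[ProfileId]("profile")
    .validate[Score]("score")
  val out = Pipeline.aggregateScores(ds)
}
```

Dropping one of the `validate` calls in `SchemaExample` should make the call to `aggregateScores` fail to compile, while validating extra columns is harmless; that is the behaviour described above.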
00:08:48
and you can have some uh uh additional information in those those uh
00:08:53
sort of column ideas also we could have some logic for example to validate the data type in spark
00:08:58
we can have an uh another scala type that says maybe what our domain
00:09:02
level data type is um and maybe some logic to uh to move between those
00:09:08
and then our a. p. i.s will be able to use that information to uh to take that one validate call uh
00:09:16
and just do more useful work with it right so there's no magic here we still have to to to
00:09:22
explicitly validate what we think is is there in the data set uh we're not uh
00:09:27
hitting a data warehouse at compile time or anything to to uh magically infer things but
00:09:33
uh rather than trying to do magic with it uh we're just gonna try to do more work from that one call
00:09:42
and we've also removed the hard coded physical column names that were in that function before
00:09:48
uh and it could just ask the data set what's the physical column that that represents the profile id
00:09:55
because uh that was provided if you remember when
00:09:57
we validated that that schema the user provided that physical column
00:10:04
and for functions like this we actually have a data type for them so it's called transform function it has two
00:10:10
type parameters the input schema that's required and the output schema uh
00:10:14
that will be the result and we have the the uh compositional
00:10:20
methods that you'd expect uh for example andThen it takes another transform function
00:10:25
that uh has its own input and output schema and uh uh gives you back another
00:10:31
transform function that's uh the composition of them and we have this this type class here you'll
00:10:37
notice in the composition so this is a dependent type class uh that computes what the what the resulting
00:10:44
input and output schema are so it's not as straightforward as just uh mashing them together uh uh
00:10:50
and uh like here's an example of that so if i have one that takes an a. and a b. and outputs a c. and i have
00:10:57
another that requires a b. and a c. and a d. and outputs an e. and an f.
00:11:01
um then you you can notice that uh the resulting composition
00:11:06
doesn't require c. even though the second one requires c. and the reason is because the first
00:11:11
one produces that um so we have a a type class that will compute that at compile time
00:11:19
uh and we can also uh infer some some transforms um
00:11:25
so if you use the adapted method instead of the apply method
00:11:30
um then we can actually go out and search for uh more transforms that we need
00:11:35
to get some of the columns that you might be missing um and this helps us
00:11:39
uh reduce a lot of boilerplate but uh like keep uh small modular pieces
00:11:46
um without making the user uh sort of copy and paste a whole bunch of things
00:11:51
that they use every time 'cause the compiler can kind of do that copy pasting for you
00:11:55
and it works something like this this is all very simplified i know it's a lot of walls of code um but i'm
00:12:00
just gonna try to to uh get through everything and maybe we'll have time at the end uh for for questions um
00:12:07
so uh if we just have some type class column provider it could say uh that
00:12:15
this column b. uh can be provided given uh columns a. and c. we can
00:12:20
derive b. m. also drive you at the same time um and uh and then
00:12:26
if i call adapted then uh i don't even have to think about where
00:12:29
that that b. came from um and we actually do this recursively so um
00:12:36
uh if if there's a bunch of levels of of requirements and and uh
00:12:41
produced columns uh that the implicit mechanism in scala uh is perfect for resolving
00:12:49
uh resolving those things recursively um and that just reduces
00:12:53
a lot of boilerplate uh uh when you're composing large uh pipelines
00:12:58
uh of data transforms and of course if they can't satisfy all requirements
00:13:03
then you get a compiler error it's not a nice compiler error but it's a compiler error
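
Editor's sketch of recursive implicit resolution, heavily simplified and under stated assumptions: the talk's column provider works on schema types, whereas this version works on plain values, and `Row`, `ColumnProvider`, `get` and the columns `A`, `B`, `C` are all hypothetical. The point is only that asking for one provider makes the compiler assemble the chain of providers it depends on.

```scala
// Hypothetical row type: physical column name -> value.
final case class Row(values: Map[String, Double])

// Hypothetical logical columns, used only as type-level labels.
trait A
trait B
trait C

// Evidence that a value for logical column Col can be produced from a row.
trait ColumnProvider[Col] {
  def value(row: Row): Double
}

object ColumnProvider {
  // Base case: A is read directly from the data.
  implicit val provideA: ColumnProvider[A] = new ColumnProvider[A] {
    def value(row: Row): Double = row.values("a")
  }

  // B can be derived once A is available.
  implicit def provideB(implicit a: ColumnProvider[A]): ColumnProvider[B] =
    new ColumnProvider[B] {
      def value(row: Row): Double = a.value(row) * 10.0
    }

  // C needs A and B; the compiler resolves B (and through it A) recursively.
  implicit def provideC(implicit a: ColumnProvider[A], b: ColumnProvider[B]): ColumnProvider[C] =
    new ColumnProvider[C] {
      def value(row: Row): Double = a.value(row) + b.value(row)
    }

  def get[Col](row: Row)(implicit p: ColumnProvider[Col]): Double = p.value(row)
}
```

If no chain of providers exists for a requested column, the call to `get` is rejected at compile time with a missing-implicit error, which matches the "not a nice compiler error, but a compiler error" remark.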
00:13:08
um and uh uh there's some extra compositional flavours uh for example you could have
00:13:14
a state transform function which also emits a value in addition to the the transformed data set
00:13:20
um and that can lead to some really nice expressions of uh
00:13:25
large transformation pipelines um and this is just sort of a contrived example
00:13:29
of that i could say first i'm gonna stratify and that's going to
00:13:34
uh transform the data set and also give me some stratification statistics
00:13:39
uh and now generate features then go ahead and train a model and that's going to
00:13:44
give me back the trained model and also maybe score the that
00:13:48
the data set so that's a transformation on the data set
00:13:52
and then i'll do some ranking and then i'll run some test metrics uh
00:13:57
uh which will just give back uh an idea of how good that model is um
00:14:05
and then i can output all those at the end uh and then i would have a one composed transform
00:14:11
function that does all these things and outputs those those uh interesting pieces of data at the end um so that that
00:14:20
it's it's a pretty simple idea um but it's it's
00:14:25
very helpful for uh you know tracking
00:14:30
uh tracking the types and schemas throughout the entire data transformation pipeline without
00:14:36
introducing a lot of burden uh on the person who's using it so
00:14:39
if we can take this and wrap it in a nice a. p. i. uh and make it kind of seamless uh then uh it works pretty well
00:14:48
alright so this is the second thing i want to talk about is uh is this idea of symbolic functions
00:14:55
uh and and the motivating thing here is uh we like
00:15:00
polymorphic functions for for math right so here's an example in spire
00:15:05
um i could define logistic function like this and uh i just need an implicit trig
00:15:11
which is uh the thing that lets me do um a ex uh exponential and uh
00:15:21
and it actually also has to be a field but you can only fit so much
00:15:24
on the slide um and this is great works on anything with the trig algebra
00:15:29
but it doesn't read that well and there are also performance implications here uh with regard to
00:15:34
boxing so you have to be real careful about uh when you use type parameters um
00:15:41
you you kind of have an explosion of specialised things uh or you could choose to have boxing code instead
00:15:48
um and we really can't afford that in in a tight loop where we're doing a lot of uh number crunching
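
Editor's sketch of a polymorphic logistic function, under stated assumptions: to stay self-contained it does not use Spire. The `Field` and `Trig` traits below are hypothetical, minimal stand-ins that only borrow the names of the algebra mentioned above and are not Spire's actual definitions.

```scala
// Hypothetical minimal stand-in for a field algebra (not Spire's real trait).
trait Field[A] {
  def one: A
  def plus(x: A, y: A): A
  def negate(x: A): A
  def div(x: A, y: A): A
}

object Field {
  implicit val doubleField: Field[Double] = new Field[Double] {
    def one: Double = 1.0
    def plus(x: Double, y: Double): Double = x + y
    def negate(x: Double): Double = -x
    def div(x: Double, y: Double): Double = x / y
  }
}

// Hypothetical minimal stand-in for the "thing that lets me do exponential".
trait Trig[A] {
  def exp(x: A): A
}

object Trig {
  implicit val doubleTrig: Trig[Double] = new Trig[Double] {
    def exp(x: Double): Double = math.exp(x)
  }
}

object Logistic {
  // 1 / (1 + e^(-x)) for any A with Field and Trig instances.
  // A erases to Object, so Double arguments are boxed on every call
  // unless the method and type classes are specialised.
  def logistic[A](x: A)(implicit f: Field[A], t: Trig[A]): A =
    f.div(f.one, f.plus(f.one, t.exp(f.negate(x))))
}
```

The body reads worse than the formula it encodes and pays for boxing when used with primitives, which are the two drawbacks raised above.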
00:15:53
um so uh this is a fun idea for dealing with
00:15:56
this and uh just sort of bear with me um i
00:16:02
i i mentioned this idea to to martin at scala exchange
00:16:06
and he looked a little bit horrified uh and i think that's probably a common reaction but
00:16:11
bear with me i think it's i think it's fun uh at the end of the day
00:16:14
um so we'll start with an expression d. s. l. and this is probably a pretty familiar uh idea
00:16:20
uh but the the difference you'll notice is that the these methods don't return
00:16:27
uh expression d. s. l. they return uh more specific things
00:16:31
and so we're actually gonna build up a a recursive type
00:16:35
instead of uh of forgetting about anything that's happening and we have a
00:16:40
you know case classes for each of the operations that we're gonna support and we have like an argument
00:16:44
case class as well that's gonna represent some abstract argument um
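
Editor's sketch of an expression DSL whose operators keep the precise type, under stated assumptions: all names (`ExprDsl`, `X`, `Const`, `Plus`, `Times`, `ExprOps`) are hypothetical, there is a single argument `X`, and constants are plain `Double`s rather than literal types, so this captures less in the type than the version described in the talk.

```scala
object ExprDsl {
  sealed trait Expr

  // The single abstract argument of the function being described.
  case object X extends Expr

  final case class Const(value: Double) extends Expr
  final case class Plus[A <: Expr, B <: Expr](a: A, b: B) extends Expr
  final case class Times[A <: Expr, B <: Expr](a: A, b: B) extends Expr

  // Operators return Plus[A, B] / Times[A, B], not just Expr,
  // so the static type mirrors the shape of the expression tree.
  implicit class ExprOps[A <: Expr](private val self: A) {
    def +[B <: Expr](that: B): Plus[A, B] = Plus(self, that)
    def *[B <: Expr](that: B): Times[A, B] = Times(self, that)
  }

  // The type annotation is the whole tree: x + (x * 2.0).
  val expr: Plus[X.type, Times[X.type, Const]] = X + X * Const(2.0)
}
```

Because nothing is widened to `Expr`, implicit rules can later pattern-match on the type itself, which is what the simplification and derivative steps rely on.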
00:16:49
and uh there there's two overloads of each of these uh methods and the reason is uh if we
00:16:56
pass a constant to one of these operators we're gonna remember the the constant
00:17:01
literal type of that constant as well and here's a oh and then uh
00:17:07
once we have that we'll create this symbolic function one uh
00:17:12
and that has a a type parameter that's a a higher kinded type parameter
00:17:18
um and if you apply it with some type thing we'll just get that
00:17:23
thing filled in to the expression so it doesn't seem all that useful
00:17:26
right now but bear with me um and the constructor for that uh
00:17:30
uh it looks a little gnarly but you really just pass a function from
00:17:35
an argument to something and then we're gonna figure out what that something means with
00:17:38
this uh apply type class right here um and all that does is recurse
00:17:42
through and find arg zero uh in that type and then make a hole there instead
00:17:47
um so it's kinda like eta expanding that that that type out um
00:17:53
and uh here here's what that looks like so uh now i define logistic
00:17:58
as a symbolic function one i'll say x. to one over one plus e. to the minus x.
00:18:05
and and here's what that type looks like uh so it's kind of an interesting
00:18:10
type um but what it means is given some t. this function computes
00:18:17
the literal type one plus uh exponentiation of or uh exponential of
00:18:24
literal minus one times that t. to the power of literal minus one
00:18:30
um and since that's such a gnarly type i also made
00:18:33
a pretty printer for that with uh fancy unicode uh
00:18:38
super scripts and stuff so uh so the entire a. s. t. of the function is is encoded into the type
00:18:45
um which might seem completely insane but really what
00:18:48
better uh type for a function than exactly everything it does
00:18:52
right i mean this this is the purest function you can get you literally cannot perform side effects here uh
00:19:01
and now we can do some fun things with implicits so let's take a
00:19:03
look uh at what that could be so maybe we could teach the compiler algebra
00:19:10
so i'll just create a a a a type class called simplify a dependent type class
00:19:15
um given an expression it will apply a bunch of rules that are
00:19:19
defined down here and here's some examples of them uh for example
00:19:25
a. plus zero is a. a. times one is a. a. times zero is zero
00:19:30
a. to the one is a. a. to the zero is one et cetera et cetera uh and really you could have as many of these as you want um
00:19:39
and uh we'll go back and add this to our constructor for symbolic function one so we're gonna
00:19:46
we're gonna take that uh result type from the function you passed in first we'll simplify it and then we'll apply that
00:19:52
uh and make this the the function out of out of the result of that and here's what that looks like
00:19:57
um so if i pass it x. to x. times one plus zero it
00:20:01
just gives back x. to x. which is good so it simplifies well the trivial stuff
00:20:05
um and it's really only limited by how many rules you're willing to write right and how
00:20:10
how how long you're willing to wait for the compiler but it's actually relatively fast um
00:20:16
so can we do more with that maybe we'll teach the the compiler some calculus i'll make another
00:20:22
uh dependent type function derivative it'll take an expression and an x. and will give us the the
00:20:27
derivative of the expression with with respect to x. and it's just again just the rules that you learned uh in grade school
00:20:34
you know the derivative of x. with respect to x. is one the derivative of any
00:20:39
constant is zero you know the sum uh the derivative of a. plus b. is
00:20:44
the derivative of a. plus the derivative of b. and so forth uh you know we'll
00:20:49
have other things here like our our trig functions and log and
00:20:52
exponential and the chain rule and so forth um and it's all pretty straightforward um
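
Editor's sketch of the simplification and derivative rules, under a loud assumption: the talk applies these rules at the type level through implicit resolution at compile time, whereas this is an ordinary run-time analogue over a value-level tree. All names (`Symbolic`, `X`, `Const`, `Plus`, `Times`, `Exp`, `simplify`, `derivative`) are hypothetical, and only a handful of rules are included.

```scala
object Symbolic {
  sealed trait Expr
  case object X extends Expr // the variable we differentiate with respect to
  final case class Const(value: Double) extends Expr
  final case class Plus(a: Expr, b: Expr) extends Expr
  final case class Times(a: Expr, b: Expr) extends Expr
  final case class Exp(a: Expr) extends Expr

  // Rewrite rules: a + 0 = a, a * 1 = a, a * 0 = 0, plus constant folding.
  def simplify(e: Expr): Expr = e match {
    case Plus(a, b) => (simplify(a), simplify(b)) match {
      case (Const(0.0), s)      => s
      case (s, Const(0.0))      => s
      case (Const(x), Const(y)) => Const(x + y)
      case (sa, sb)             => Plus(sa, sb)
    }
    case Times(a, b) => (simplify(a), simplify(b)) match {
      case (Const(0.0), _) | (_, Const(0.0)) => Const(0.0)
      case (Const(1.0), s)      => s
      case (s, Const(1.0))      => s
      case (Const(x), Const(y)) => Const(x * y)
      case (sa, sb)             => Times(sa, sb)
    }
    case Exp(a) => Exp(simplify(a))
    case other  => other
  }

  // Differentiation rules with respect to X.
  def derivative(e: Expr): Expr = e match {
    case X           => Const(1.0)
    case Const(_)    => Const(0.0)
    case Plus(a, b)  => Plus(derivative(a), derivative(b))                       // sum rule
    case Times(a, b) => Plus(Times(derivative(a), b), Times(a, derivative(b)))   // product rule
    case Exp(a)      => Times(Exp(a), derivative(a))                             // chain rule
  }
}
```

As in the talk's example, `x * 1 + 0` simplifies back to `x`, and the derivative of `exp(-1 * x)` simplifies to `exp(-1 * x) * -1`. The type-level version does the same rewriting, but each rule is an implicit instance selected by the compiler.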
00:21:01
uh but we actually need a couple macros here um i tried uh
00:21:04
i really tried to reduce the macros as much as possible but uh
00:21:09
for some of the stuff we need them um and this is just a simple uh uh type class
00:21:14
that's going to take two int types or literal int
00:21:17
types and i take it everybody uh is familiar with
00:21:22
uh literal types now i remember them being mentioned in the keynote yesterday uh but for example the number one has the type
00:21:29
int but also has the type one um and so we're we're gonna to we're gonna use those a lot um
00:21:36
uh and so this type class will just uh tell us
00:21:40
if we have two literal ints what's the difference between them
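
Editor's sketch of literal types, assuming Scala 2.13 or later: `LiteralTypes` and `Pow` are hypothetical names. It shows that `1` has the singleton type `1` as well as `Int`, and that `ValueOf` and `valueOf` recover the value from the type. The macro-based type class that computes the difference of two literal ints is not shown in the talk and is not reconstructed here.

```scala
object LiteralTypes {
  // The literal 1 has the singleton type 1 ...
  val one: 1 = 1
  // ... and is still an ordinary Int.
  val asInt: Int = one

  // valueOf summons the single inhabitant of a literal type.
  val three: Int = valueOf[3]

  // Hypothetical type-level tag: the exponent lives only in the type
  // parameter and is read back through the implicit ValueOf instance.
  final class Pow[P <: Int](implicit p: ValueOf[P]) {
    def exponent: Int = p.value
  }

  val squared = new Pow[2]
}
```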
00:21:45
um and here's how you uh uh i didn't actually show that a macro implementation but i think this kind of thing will
00:21:50
be easier uh in dotty um and so using that we could we could implement the power rule here's what that would look like
00:21:57
um yeah the derivative of x. to the p. if p. is some integer is
00:22:04
uh just p. times x. to the p. minus one is what that means
00:22:08
um okay so put this together uh we can
00:22:13
uh put a derivative method on the symbolic function
00:22:17
and uh what it's gonna do is it's going to first
00:22:21
apply our higher kinded type parameter there with
00:22:24
arg zero and again that's the literal type zero um
00:22:28
which which would just mean like the first argument uh assuming we wanna generalise this to functions
00:22:34
of more than one argument so first we're gonna apply it with uh with that first argument
00:22:39
um then we're gonna take the derivative of that resulting expression with respect
00:22:43
to the to the argument uh and put it in this type parameter d.
00:22:49
and then we're gonna simplify the result of that and put the simplified
00:22:53
version in this type parameter d. s. and then finally we're gonna expand that
00:22:57
d. s. and uh get back another symbolic function that's the derivative of of
00:23:02
the one created okay so here's what that looks like and it works um uh
00:23:09
define my logistic function here and then i asked for the
00:23:11
derivative and there it is it figured it out um you'll notice that uh
00:23:17
i have two negatives there being multiplied and they didn't cancel out
00:23:20
because i've forgot to write a a simplification rule for that um
00:23:25
uh so i should go back and do that um but i did find the derivative and it's correct
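
For reference (an editor's worked derivation, not taken from the slides), applying the power rule and the chain rule to the logistic function shows where the two uncancelled negatives come from:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}} = \left(1 + e^{-x}\right)^{-1}

\frac{d\sigma}{dx}
  = (-1)\left(1 + e^{-x}\right)^{-2} \cdot e^{-x} \cdot (-1)
  = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
```

The first factor of $-1$ comes from the power rule on the outer exponent and the second from differentiating $-x$ inside the exponential; a rule such as $(-1)\cdot(-1) = 1$ is what would cancel them.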
00:23:31
um but how am i gonna use this thing right it's it's just uh
00:23:36
uh expression tree uh at the type level if i if i have the symbolic
00:23:41
function value its apply doesn't do anything except a oops its apply doesn't do anything except
00:23:49
give me back another expression tree um which is great 'cause then i can call symbolic functions from within other
00:23:54
symbolic functions and uh and have the whole thing sort of uh inlined uh uh in the result type um
00:24:02
but i also probably wanna be able to actually use these things without uh evaluating a bunch
00:24:06
of layers uh of of objects um so that's where reify comes in we're gonna turn
00:24:14
the the symbolic function in the back into sort of real code uh with some more help from literal types
00:24:21
and uh a couple more macros here um i i'm
00:24:24
not gonna go into to their implementation but i'll just uh
00:24:28
name those type classes and you can use your imagination as to what they do and it's it's all pretty straightforward and i
00:24:35
i will post the the p. o. c. code uh uh later today if you wanna see how it all works
00:24:41
um but here's kind of what that looks like so we have a dependent type class again called
00:24:45
reify it takes an expression and some t. that we want to pass a pass into it and uh
00:24:54
but the instances of that are are pretty straightforward
00:24:57
so uh the first one here are that's just saying that um uh
00:25:02
the first arg arg zero is gonna expand to be x.
00:25:07
zero in code and the constant is just it's gonna be itself
00:25:11
uh in code and uh to multiply two things together
00:25:15
if they're uh uh the reason for any val numeric there
00:25:19
is to say that this is for primitive a numeric um 'cause i know all those have a have a times operator
00:25:27
uh so i can just uh mash them together with the times operator between and i've got valid code for that expression
00:25:34
um so that's kind of like our our algebra uh but all it does is with this
00:25:39
that i can reify uh an expression over a
00:25:42
given type uh so for every expression operator that we
00:25:47
wanna use we have to have a reify instance for for any given type that we wanna use
00:25:53
um but we can make that be different for different types uh uh which could be interesting here's an example of what
00:25:58
that looks like if i reify this expression over doubles i
00:26:02
get this code uh which looks like pretty much the uh
00:26:07
you know pretty much the quickest way you could probably do that in in in scala
00:26:15
okay and then we're gonna like we need one more macro to take that code and and you
00:26:20
can guess what that macro's gonna do it's gonna take the code and it's going to actually emit a a function
00:26:25
in scala that inlines that code in the function and
00:26:29
the result is gonna be a specialised function over doubles that uh
00:26:34
does whatever was in in that symbolic function and again i'm not gonna uh
00:26:39
put up yet another wall of code for that macro implementation but it's
00:26:42
pretty it's pretty straightforward and here's kind of what it looks like all put together
00:26:48
um so this is a repl session and this is two thirteen so i can have uh
00:26:52
literal types there so i define my logistic function i find its derivative and then i'm gonna say
00:26:57
to double operator which uh which will reify that into a specialised java util function uh
00:27:04
a double operator and then i um i spit out the disassembly so we can see that yes indeed
00:27:10
it's specialised uh does pretty much what you expect it to do uh and it should be uh relatively uh performant
00:27:19
okay so uh just as a reminder this all happened at compile time uh
00:27:25
in in contrast with with some uh there's other staging mechanisms kind of like
00:27:29
this that are partially at compile time partially at run time so this was all
00:27:33
at compile time uh that we took that symbolic function we found its derivative and then
00:27:39
we turned the derivative into a specialised function over doubles at compile time and again i
00:27:44
if i implemented instances for that i could uh use that for other types as well
00:27:50
um and we can fairly trivially extend that to symbolic functions of more than one argument so that's why we
00:27:56
had that uh int type parameter uh in the arg uh expression d. s. l. that was so that we could uh
00:28:04
you know have arg zero or one or two et cetera um uh
00:28:08
it's a little more work but we could also extend this to functions over vectors
00:28:12
um the the calculus is a little more complicated but uh it might be worth it to get uh
00:28:19
you know specialised gradient computations materialised at compile time
00:28:23
uh i think that would be pretty cool anyway um and it also gives us
00:28:26
an opportunity to optimise based on what the type is that we're working with
00:28:30
uh so you could imagine if we had a matrix type
00:28:33
for example um our our reify instances for that could
00:28:37
uh look ahead into the expression tree to figure out hey
00:28:41
what's the optimal series of blas operations uh to to do this
00:28:45
computation um so but there's a lot of opportunities there and uh
00:28:50
i i is this a zero cost abstraction i mean not exactly
00:28:55
uh there there was that uh object hanging around that represented the expression tree but it turns out you don't actually
00:29:00
even need that value uh all you need is this type as long as the compiler has the type
00:29:06
uh you can materialise the function out of thin air basically um
00:29:10
yeah so so it's kind of fun and i will post the code of this uh
00:29:14
to this u. r. l. um later i haven't yet 'cause i was frantically uh
00:29:19
scrambling to to finish this presentation uh but i promise to to do that it's a little
00:29:24
bit rough but uh you know it's fine uh and and the takeaway that i want
00:29:29
from this is that uh people think about what problems
00:29:34
you could solve by moving more information into the type system
00:29:38
uh because uh a scala program is really two programs so there's a program that happens at compile
00:29:43
time and the result of that program is a program that happens at run time and the more
00:29:48
uh computation you can do once at compile time the less computation
00:29:51
you have to do many times at run time um so uh
00:29:57
go out and uh you know another takeaway is whatever crazy use case you have that uh
00:30:06
that scala wasn't even thinking about when it was designed um
00:30:11
you can you can leverage the type system to uh to make some of those things a reality
00:30:18
um and then wrap it in a nice a. p. i. and it'll blend seamlessly with all your other uh scala components
00:30:24
uh so that's that's why i love scala it gives you the power
00:30:27
to to to do these things um uh it's not always pretty but uh
00:30:34
you know the power's there and uh i think that's pretty unique to scala
00:30:41
alright so uh just a little uh peek at some
00:30:45
of the things that we're looking forward to in scala three um
00:30:50
uh that address some of the problems that aish mentioned uh so opaque types
00:30:55
uh those are gonna be huge for us because we can give more meaning
00:31:00
uh uh to our data without any performance penalty
00:31:05
um so that's that's going to be huge for us i i mean i don't know i don't care what martin
00:31:10
odersky said that's gonna be a a game changer i think
00:31:15
um and uh uh there's also some other interesting features that uh
00:31:21
you know maybe other people have touched on the uh principled structural types uh you know maybe that will will
00:31:28
end up being a better way to do that kind of uh ad hoc schemas that we that we were uh
00:31:34
trying to accomplish with those phantom types earlier on so that that could be
00:31:37
interesting um the the metaprogramming story looks a lot cleaner uh in scala three
00:31:43
um so a lot of the ugly macros that i did use uh for uh
00:31:48
for my uh harebrained scheme earlier uh
00:31:52
will look much nicer uh much more comprehensible uh in scala three and uh the the
00:32:01
cleaner dependent types uh meaning that you know an implicit
00:32:05
can refer to the the output of the previous implicit
00:32:10
uh will let us throw away a bunch of those extra type parameters
00:32:14
that we had hanging around so if you remember uh i had like a
00:32:18
you know an r. and a d. and a d. s. and each implicit uh was subsequently filling in one of those
00:32:24
um you don't really need those anymore uh because of that uh machinery so i'm looking forward to that
00:32:32
it's just more ways the type system can work for you which means more reasons to love scala
00:32:38
um i'll turn it back over to aish now and he'll talk about some of the things that uh
00:32:44
are on our wish list yeah this is very much a wish list
00:32:47
and we did some brainstorming about trying to answer the question well if scala's
00:32:51
so great for machine learning why don't you see a lot of the
00:32:54
machine learning researchers and data scientists using it uh especially compared to python and
00:33:00
really a lot of it came down to uh some key tools that are missing so uh
00:33:07
we uh really put together a wish list of things that we
00:33:10
we should build and i think that would really help adoption in the machine learning community
00:33:15
so um one of the big ones is a great notebook environment
00:33:18
in scala uh so as researchers we spend a lot of time
00:33:22
um working in jupyter and tools like that um to do this kind
00:33:28
of visual repl type environment is just a very natural way for us to
00:33:34
do our work ah because a lot of what we're trying is is very
00:33:37
iterative in nature and we need to visualise it too um there there's some
00:33:43
projects being worked on at the moment um i think there's some really interesting things with metals being looked at
00:33:49
with being combined with jupyter so that's promising and there's some other projects
00:33:54
that are on the horizon as well in this space but uh i think at the moment we're just really looking forward to those maturing
00:34:01
a simple one but absolutely required we need a good plotting
00:34:07
library uh there needs to be a way to visualise what we do
00:34:11
uh in the python world they use matplotlib uh i'd say that's a
00:34:15
pretty ugly library the a. p. i. is pretty awful but it's comprehensive
00:34:21
uh and in scala there just isn't anything
00:34:24
that's got great comprehensive visualisation so
00:34:30
um numpy is an interesting one so when i reflect back about why has python
00:34:38
been so successful in machine learning uh one of the things that occurred to us was
00:34:44
numpy so if you're not familiar with numpy
00:34:46
numpy is a multi dimensional array library um it
00:34:54
came out early in python and uh became the de facto standard so if
00:34:57
you want to have an array with multiple dimensions of data um
00:35:03
that was just the go to tool and because of that it became the lingua franca of
00:35:08
data science and machine learning uh and it gave us a way that we could
00:35:14
uh in python have many libraries speaking the same language with the same types essentially
00:35:19
uh and in the java j. v. m. scala community uh
00:35:23
we're still missing this the one true multi
00:35:28
dimensional array library that we can all use and communicate in the same way where you can rely on that type
00:35:35
ah so i think that that's missing there are things like breeze
00:35:39
um but that has some performance implications and last time i checked that wasn't actively developed
00:35:45
um there's numerous java libraries that wrap
00:35:49
low level primitives like blas
00:35:52
and they are also very good but uh it's just extremely fragmented
00:35:56
in this space so it makes it hard to publish a library where you
00:36:00
have an a. p. i. that requires a multi dimensional array
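To make the "one shared array type" idea concrete, here is a minimal sketch of a dense row-major multi-dimensional array in Scala. It is not Breeze's API or any existing library's; `NDArray`, `shape` and `strides` are hypothetical names, and a real library would add views, broadcasting and native-backed storage.

```scala
// Hypothetical minimal n-dimensional array: a flat buffer plus a shape.
final class NDArray(val shape: Vector[Int], val data: Array[Double]) {
  require(shape.product == data.length, "shape does not match data length")

  // Row-major strides: the last axis is contiguous in memory.
  val strides: Vector[Int] = shape.scanRight(1)(_ * _).tail

  // Read one element by its multi-dimensional index.
  def apply(idx: Int*): Double = {
    require(idx.length == shape.length, "wrong number of indices")
    data(idx.zip(strides).map { case (i, s) => i * s }.sum)
  }
}
```

If libraries agreed on a single type along these lines, an API could accept and return it directly instead of converting between library-specific array classes.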
00:36:05
ah and the last one is the need for a good story
00:36:08
around wrapping c. plus plus um unfortunately we can't get
00:36:13
away from c. plus plus um maybe in the future we will
00:36:17
be able to but at the moment there's still just so much stuff
00:36:20
written in c. plus plus that we need to wrap there needs to be a good story
00:36:24
for doing that easily there's some great stuff on the horizon with that um we're looking forward to
00:36:29
graalvm i've still got some question marks on whether it's really going to solve the
00:36:34
whole problem there but that looks really interesting uh there's also a number of j. e. p.s in progress with java
00:36:40
uh that try to kind of redo j. n. i. from the ground up to make it
00:36:45
more compatible and easier to pass data back and forth
00:36:50
uh from the j. v. m. into the native um substrate but that's still
00:36:54
on the rise and and it's the town friends sullivan nolan and in the is
00:37:00
now we have to support that too so that's still a missing piece of the puzzle
00:37:05
but i'd say the largest missing thing is you guys um
00:37:11
we need the scala community to take an interest in machine learning and
00:37:16
get excited about building some of these tools and tackling some of these problems um
00:37:22
i'm a firm believer that the engineering talent working in
00:37:26
the scala community is some of the best engineering talent available um so
00:37:32
i think if i can do anything ah i wanna get you excited about machine learning ah and about
00:37:38
data science and get you excited about building some of these tools and tackling some of these challenges
00:37:44
and if i can do that that's really successful maybe i'm sorry
00:37:50
yeah on that i think we're done i don't know if we have time for questions
00:37:55
actually we rushed so quickly through um it looks like we do have several minutes for questions yeah
00:38:06
oh
00:38:20
oh
00:38:23
yeah
00:38:31
it's in the list of things that you mentioned we need you dimensional so they use the
00:38:38
the frame you to your room would you view those because you have this special replacement what you
00:38:44
do you think spark is good enough or do you think we need something uh
00:38:49
um i do think spark does a good job
00:38:52
um so i guess in the python world people use pandas um
00:38:57
i think spark so there is one thing that's good in spark i think the
00:39:02
data frame a. p. i. and the tooling around that is actually really good
00:39:05
in spark ah it's very intuitive and very easy to pick up ah but i
00:39:09
wish they had an abstraction that meant you could also use it locally
00:39:14
on data that didn't have to be distributed but i didn't put that at the
00:39:20
top of my list because i do think spark does a good job okay
00:39:36
ah you designed a solution to
00:39:42
in a sense check schemas at compile time for data frames
00:39:47
have you ever tried frameless and what are your reasons
00:39:52
i have my own reasons not to use datasets
00:39:56
but what are your reasons not to use datasets
00:40:02
um that's a good question uh yes i'm intimately
00:40:05
familiar with frameless um i think uh
00:40:11
there's a little bit more machinery that frameless requires um
00:40:19
and a lot of the consumers of our a. p. i.
00:40:26
are not scala experts and when
00:40:30
they see something uh that requires all these
00:40:35
dozens of implicit arguments to prove things about the code
00:40:42
um it can be a little intimidating i think that's one of the reasons we don't
00:40:49
really use frameless i think it's a really
00:40:51
cool project um but uh there's some other shortcomings
00:40:57
and we just need like a little bit more of a relaxed middle ground i think um
00:41:02
and uh the thing that we came up with solves that pretty well
00:41:08
because it's flexible enough to give us the safety that
00:41:14
we need and a little extra information uh from that but uh
00:41:20
still have escape hatches sort of um for
00:41:28
when the user needs to say trust me you know we have escape hatches for that to make
00:41:32
sense i think ah yeah so the question is what about datasets so the spark
00:41:49
dataset a. p. i. is what you're referring to um yeah
00:41:55
i kinda covered the motivation around not using that but
00:42:00
it would just be an explosion of case classes to address every
00:42:05
combination of columns that might appear at some point in a transformation pipeline
00:42:11
yeah a lot of the workflow of machine learning is we iteratively build up
00:42:16
the schema over time um so it's not like it's transforming from a to b
00:42:22
it's much more of a gradual process between the two
00:42:29
just a quick question um i saw you called your library baudrillard
00:42:35
i just wonder where that name comes from um
00:42:40
naming things is hard uh but i guess the idea was um
00:42:45
that uh you know is it a program or is it a type uh
00:42:49
uh maybe at some point at some level what's the difference um and
00:42:54
the reason i named it after baudrillard was the uh
00:43:00
you know a sort of parallel to the idea of are we
00:43:03
a universe or are we a simulation and maybe at some point what's the difference
00:43:10
it's the
00:43:15
oh you just predicament you're built and i'm want to stay
00:43:19
now um could you talk more about your experience verifying
00:43:24
that it's correct and debugging uh so when it comes to
00:43:29
macros or implicits or just the whole design um
00:43:35
well implicits are pretty trustworthy um
00:43:43
if it's not gonna be correct it probably won't compile with implicits um
00:43:49
uh assuming that you've defined everything correctly uh in the rules that you made
00:43:54
um so that's less of a worry if you get the wrong results then you can usually
00:44:00
poke through the rules and figure out why um macros are
00:44:04
uh kind of a difficult thing to
00:44:11
become an expert in because the a. p. i.s are
00:44:15
not very well documented and uh there's a lot of internal
00:44:21
quirks that you just have to encounter before you kind of
00:44:26
learn your way around and uh
00:44:31
one thing that helps a lot i think is in intellij you can actually connect to an
00:44:38
s. b. t. session where your code is running and then actually put breakpoints inside the macro and see what's happening
00:44:45
um that really helped me at least in learning kind of
00:44:48
the ropes of the macro a. p. i. but uh fortunately that's all going away
00:44:53
and the macro system in scala three is completely redesigned and looks a lot cleaner and
00:44:59
has more parity with actual code um so it should be a better story around that does that answer your question
00:45:09
oh oh oh sorry ah like buildings types ah depends too likely ah
00:45:20
if i understood the details you identify columns using
00:45:23
types that means having at most one column of each type in a frame
00:45:28
ah but i think it's natural to have
00:45:30
several columns of one type sometimes using joins for instance
00:45:35
right um and that's why the columns
00:45:39
the columns that we were expressing in our schema type
00:45:43
are more about uh the meaning of the column rather than like a data type so those weren't
00:45:49
data types those were like i have a column that represents the profile id or the video
00:45:56
or the features et cetera and uh it's
00:46:02
not as common to have two columns that mean the same thing
00:46:05
um but have different values um so uh yeah
00:46:13
i mean that means that if you did have that
00:46:15
situation you could just create a new idea you know if i have two columns that are both profile i. d.s
00:46:22
what do they each mean and make a more refined concept about
00:46:25
what that meaning is so that it was about encoding little bits of meaning
00:46:29
uh into the type system does that answer your question or not really okay well we'll talk about it afterwards
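As a rough illustration of columns typed by meaning rather than by storage type, the sketch below uses phantom tags. It is not the speakers' actual schema encoding, which this excerpt does not show; `Column`, `ViewerProfileId`, `FriendProfileId` and `pairs` are hypothetical names.

```scala
object MeaningfulColumns {
  // Phantom "meaning" tags: never instantiated, they exist only at the type level.
  sealed trait VideoId
  sealed trait ProfileId
  // Two columns that are both profile IDs get more refined meanings.
  sealed trait ViewerProfileId extends ProfileId
  sealed trait FriendProfileId extends ProfileId

  // A column pairs a meaning tag with an ordinary storage type.
  final case class Column[Meaning, A](name: String, values: Vector[A])

  // Accepts only a viewer column and a friend column, in that order;
  // swapping the arguments or passing two viewer columns fails to compile.
  def pairs(
      viewer: Column[ViewerProfileId, Long],
      friend: Column[FriendProfileId, Long]
  ): Vector[(Long, Long)] =
    viewer.values.zip(friend.values)
}
```

Both columns are stored as `Long`, so the storage type alone could not tell them apart; the refined tags carry the difference in meaning.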
00:46:41
hi pablo here okay ah last summer i met with my
00:46:45
colleague who was working at the time in a department at netflix
00:46:49
and now we were talking about vertical condo development and in that regard
00:46:53
um i have a question what is the very next step
00:46:58
in combining machine learning and scala at netflix ah i'm talking about more perspective less
00:47:03
technology more kind of strategic if i may ask this question of course ah yeah sure um
00:47:10
oh yeah i wouldn't say we have anything kind of as grandiose as a strategy uh i
00:47:16
would say that we're quite happy with what scala provides us um we use a mixture of
00:47:21
python for all of the modelling work because the
00:47:24
ecosystem's so rich there um but scala for everything else around
00:47:28
that um and we're quite happy with that combination so
00:47:34
yeah i wouldn't say we have a strategic
00:47:37
uh initiative or anything to try and change that
00:47:41
personally i'm a big fan
00:47:45
of scala so i would love to see it get richer and more adopted in this area um
00:47:52
but i'd say our focus is a little bit more tactical as
00:47:56
we come across problems we just try to solve them and put resources behind them
00:48:07
hello oh um i was curious about your
00:48:11
symbolic type so if i understood correctly that
00:48:15
that was kind of an interesting experiment but also felt
00:48:19
like less valuable because well it was difficult to express your gradients
00:48:24
and then you know you can use uh i guess like
00:48:27
a library to make the machine learning aspect easier so
00:48:31
like how far did you take it or really why did you not try to take
00:48:35
it further okay um so i'll spoil it and say that one was the research project
00:48:42
not the one that we actually use in case anybody didn't guess that um
00:48:47
but uh so why didn't i take it further um i mean
00:48:56
so in versions of scala prior to two thirteen uh
00:49:03
it was very difficult to express some of the things that that kind of relies on
00:49:08
um so i had implemented it before and it was just kinda ugly
00:49:14
um and it wasn't until two thirteen came around that i really kind of cleaned that up a little bit i
00:49:18
guess that was the cleaned up version that i was displaying if you can believe that um but uh
00:49:25
yeah why not take it further uh i don't think there's any good reason like
00:49:29
i kind of think it's a cool and useful thing um but uh you know maybe
00:49:35
it's still a little bit strange to do like a version of spark
00:49:42
codegen but in the type system um and it's probably not the ideal way to
00:49:48
to do that at compile time but uh you know the the reason i haven't done
00:49:53
more with that i think is a 'cause i'm not sure anybody would use something like
00:50:00
that in real life i guess is the honest answer i can kind of give
00:50:05
you my input on that as someone that would be more the user of all that
00:50:09
um so i think the gap would be um support for g. p. u.s and that
00:50:16
kind of extra level of effort and work that needs to go into
00:50:22
really highly optimising it for computation on g. p. u.s or
00:50:27
vectorised code all that type of thing um so what we mostly use for
00:50:33
this gradient type work at the moment is things like
00:50:37
tensorflow or pytorch uh i think as languages they actually
00:50:41
have some deficiencies in that the
00:50:46
language should be able to express these things nicely but
00:50:50
there's been big investments and people have written highly optimised low level g. p. u. and c. plus plus
00:50:56
code to make it really fast and uh if scala got that then uh i think starting from scala would
00:51:02
be a much better place to start from because we can do the things jeremy
00:51:06
um showed us but still there needs to be that work
00:51:09
of someone to go in and really write that performance optimised code
00:51:13
and i think another reason is that uh that obviously i mean
00:51:18
both of those things that i showed rely on type inference
00:51:22
like you wouldn't want somebody having to type out one of those
00:51:24
big schema types or god forbid type out the type
00:51:29
of that function um and uh i kind of blame intellij 'cause they have a default setting
00:51:36
that hates type inference anywhere you use it it says that your code is wrong or it says that uh
00:51:44
you know your style is wrong or or whatever so um that's a perception that kind of blocks interesting things
00:51:53
hi besides using scala to
00:51:57
implement e. t. l. pipelines do you also use
00:52:01
scala to evaluate your models in live online systems
00:52:07
in general do you think it makes sense to strive to reuse
00:52:11
some code between the two worlds like offline training and online evaluation
00:52:18
right yeah i'll take the first stab at that ah yes absolutely in fact i'd say that's one of
00:52:22
the big advantages for us of using scala so at netflix
00:52:30
the teams that we partner with mostly run java in production
00:52:35
um we have a very homogeneous environment in the backend off
00:52:40
line though in terms of modelling really with modelling it's sort of
00:52:44
anything goes it depends on the scientist so we need a way to
00:52:47
be able to share these offline components with online teams
00:52:52
uh and that was one of the initial motivations for us about
00:52:57
really getting into scala was that we felt that if we could express a lot of the
00:53:03
feature engineering work that we do um through a scala
00:53:07
a. p. i. that would be a much more expressive and b
00:53:11
it would be something that we could run offline but then deploy in a jar and the online
00:53:18
java systems would run the same code that we're using off
00:53:21
line to produce the models and to do the research work
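The reuse described here can be sketched as a pure feature-encoding function packaged without any Spark or service dependencies, so the same jar can be called from an offline job and from an online JVM service. Everything below is an assumption for illustration: `WatchFeatures`, `WatchEvent`, `Pipeline` and the two features are hypothetical, not Netflix code.

```scala
// Hypothetical feature-engineering code shared by offline and online callers.
object WatchFeatures {
  final case class WatchEvent(profileId: Long, watchSeconds: Double)

  // Pure function: the same event always yields the same feature vector.
  def encode(e: WatchEvent): Vector[Double] =
    Vector(
      math.log1p(e.watchSeconds),               // compress long watch times
      if (e.watchSeconds >= 60.0) 1.0 else 0.0  // "watched at least a minute" flag
    )
}

object Pipeline {
  // Offline: build a training matrix from a batch of logged events.
  def trainingMatrix(events: Seq[WatchFeatures.WatchEvent]): Seq[Vector[Double]] =
    events.map(WatchFeatures.encode)

  // Online: score a single request with exactly the same encoding.
  def score(weights: Vector[Double], e: WatchFeatures.WatchEvent): Double =
    weights.zip(WatchFeatures.encode(e)).map { case (w, x) => w * x }.sum
}
```

Because both paths call `WatchFeatures.encode`, a change to a feature definition reaches training and serving together, which avoids the two drifting apart.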
00:53:29
that's that's a robot okay okay well after for the command chair

Why Netflix ❤'s Scala for Machine Learning
Jeremy Smith & Aish, Netflix
June 12, 2019 · 12:15 p.m.