Transcriptions

Note: this content has been automatically generated.
00:00:00
Alright, thank you. It's a real honor to be here for the ten-year anniversary of the Scala Days conference. When I was brainstorming with Martin about what to talk about, I offered: hey, I could talk a lot more about how Spark and Scala work together, how great and essential Scala is to Spark. And Martin said: why don't you talk about something else? The rest of the conference is already full of Scala stories and Scala tech. So, let's talk about something else.
00:00:34
So I'm going to talk to you about three different things. First, for those of you who have never been to a Spark Summit conference and don't know how Spark started, I'll tell the story of how Spark was first created. Then, since a lot has changed since Spark was created, I'll talk about what has changed and what new problems we have seen among our customers and Spark users. And finally, two open source projects that were created in response to those problems: Delta and MLflow.
00:01:01
First, a little more about myself. I did my PhD at UC Berkeley in the AMPLab. The thing I'm most proud of is that I actually deleted more code than I added in Spark, once you consider all the work combined. I think the day my net contribution reached zero, I popped a bottle of champagne: I'd made it.
00:01:23
So here are the three things we'll be talking about; let's get started. Spark is also around ten years old by this point. It started as a very simple academic prototype around ten years ago, and the reason we started it was because of this guy up there.
00:01:42
Ten years ago, Lester was a PhD student at the UC Berkeley AMPLab, doing machine learning, advised by Michael Jordan. And Lester back then found out about one thing: the Netflix Prize. How many of you remember what this was? Right, about a third of you put up your hands.
00:02:00
Netflix back then decided: hey, we can create a challenge. They anonymized their data sets, the movie rating data sets from their users, and put them out in a public contest: whoever could come up with the best recommendation algorithm for movies would win a million dollars. A million dollars was a lot of money for Lester back then; he was making about two thousand dollars a month as a PhD student, unlike the PhD students at EPFL, who are making a lot more.
00:02:29
So Lester decided to join the contest. But he quickly realized: hey, this was the first data set he'd had to work with that was larger than the memory and disk space on his laptop, and he needed some solution just to process all that user movie rating data. He looked around, and there were not a lot of good solutions back then: they either worked really well on a single node but not in a distributed setting, or they worked well in a distributed setting but were very inefficient to prototype with, very slow to iterate.
00:03:01
So he talked to the student next door, Matei, the original creator of Spark, and told him: hey, I think if you gave me a couple of primitives, I could leverage them to prototype my machine learning algorithms far more quickly. One of the keys with machine learning is that you have to be doing a lot of experiments, and the rate of iteration matters a lot more than the result you get at any particular point in time.
00:03:26
So Matei said sure. I believe the first version he showed was a quick prototype, and then they decided: hey, this could really work, let's use a more proper language. He actually picked up Scala; I think Spark was one of the first projects Matei used Scala for. After about a week there were only six hundred lines of code. This was the very, very first version of Spark, and of course it looked very different from what Spark is today.
00:03:55
But you could call it the first unified thing, in the sense that Spark had two functionalities. One: there was a way to process data, because a large part of machine learning is about how to process and prepare data so it's ready for machine learning models to train on. Two: you could actually use Spark to run machine learning algorithms, especially distributed ones. And Spark just grew from there.
00:04:24
Now, what happened to Lester is usually what people care about. This is the actual leaderboard of the Netflix Prize from back then, and you can see the top two places essentially tied; Lester's team submitted their solution about twenty minutes later than the first-place team, so they lost the million dollars. And here's a picture of the other team happily accepting the check. If Spark had been invented twenty minutes earlier, Lester would be a million dollars richer.
00:05:00
Now, a lot has changed. We started Databricks, the company, in 2013, about six years ago, and we've been working with a lot of real customers in production. The question we've always tried to answer is: what are people doing with data, and what are the biggest hurdles they're facing?
00:05:15
There were a couple of assumptions when we started Databricks, and I think a very important one was: hey, Spark can do everything, and that's all you need. It can do machine learning and train the models for you; it can do data prep for you; so let's focus on that and forget about everything else. But as we worked more and more with customers, we started to realize: Spark is pretty powerful, but there are a lot of surrounding pieces that are hard, call it data engineering and the software engineering lifecycle, that are not solved by Spark itself. Some of it involves a completely different paradigm from what Spark touches, so it's not even appropriate to extend Spark to cover those use cases.
00:05:56
One of the misconceptions about machine learning in the industry, when you talk to people who haven't actually done it, is that they think machine learning is: I go download some library like TensorFlow or PyTorch, apply the latest neural net architecture, feed some data into it, and boom, I get great results. But the reality is there's a lot more to machine learning than just the machine learning itself, and this, I think, is probably the best illustration of that.
00:06:25
I'm taking this out of a Google NIPS paper from 2015, "Hidden Technical Debt in Machine Learning Systems." What this chart shows is that Google did an analysis of all the machine learning applications they run. Each box represents a specific module of a machine learning application, and the size of the box indicates roughly the amount of code written for that part of the system, so it's a proxy for the complexity of the component.
00:06:54
As you can see, or maybe you can't, in the middle there is a little black box that says "ML code," and everywhere else are a lot of other boxes: configuration, data collection, verification, serving infrastructure, resource management, all of which are not specific to machine learning. But they all need to be done. As a matter of fact, when you build real-world machine learning applications, you spend most of your time on those.
00:07:21
Even more interesting: the people who really understand the black box are typically called data scientists or machine learning engineers, while the people who understand the rest of the boxes are what we call data engineers or software engineers. These two personas use very different technologies; they often sit in different places, and in many enterprises they don't even have the same reporting chain. One of my favorite questions when I visit customers is to ask the data scientists: do you know where the data engineers sit in this building? And often the response is: no, I occasionally talk to them on Slack. The tech stacks they use are also very different, which creates a lot of challenges in building the overall system.
00:08:05
So a lot of what we've been doing at Databricks, after realizing that, is figuring out how to bridge the gap between the different personas: how do we actually make all of this work by enabling people to collaborate? With that, I'll be talking about two separate projects. The first one is Delta, which is about scalable, reliable data lakes; the key to Delta is its focus on making your data ready for analytics, which can involve a lot of data massaging. The second is MLflow, which focuses more on the lifecycle management of machine learning.
00:08:39
Let's get started. I don't know how many of you are data engineers here, but Scala has been extremely essential to data engineers; it has become basically the de facto programming language of data engineering, thanks to Spark, Kafka, and other frameworks. And one of the big changes in the industry in the past decade has been the transition to, or rather the addition of, the data lake.
00:09:04
In addition to the old-school data warehouses, the concept of the data lake is pretty simple. You collect all your data, everything you have, structured and unstructured: sensor data, images, tables, transactions, logs. You dump everything into this data lake, which is typically a distributed file system like HDFS or some object store in the cloud like S3. And then you can run all kinds of downstream analytics use cases on it, from fancy machine learning to business intelligence and data science, and the world is perfect. That's the picture the industry has painted.
00:09:38
When we started Databricks, one of the assumptions we went in with was: let's focus on compute, which is Spark; as for storage, the data lakes, I think that's a solved problem. We actually believed this picture, until we started building data lakes ourselves and realized a few problems.
00:09:59
What follows actually mirrors the journey we had to go through at Databricks in building our internal data lake, a data lake about Databricks itself. What we want to do is collect all the events coming in from our services; I believe they're on the order of tens of terabytes a day by now.
00:10:17
and then there's a few things want to do one is we want
00:10:20
we pour our metrics in no time to can see what exceptions ordering
00:10:25
um across all of our manage bar clusters want to see how
00:10:28
people are using this bar clusters how you were using sparky the eyes
00:10:32
um and so streaming on alex and the same time was on the dome all those into
00:10:37
the lake so we can do historic analysis for example we want our like word to decide hey
00:10:43
what um eighty i can we break in the future for say the next version of spark when dallas how often do people you'd
00:10:49
be on a guy isn't it got almost nobody uses is causing all okay she read remove it right and then we also have
00:10:57
so the um machine learning algorithms running to predict for example what
00:11:01
the behaviour of different uh customers and users in terms of usage
00:11:07
Now, the first thing we built was the real-time part. Once we dump all the events into Kafka, it's not too difficult to get Spark working: you write a streaming job with Spark Structured Streaming, your events come in, and a few seconds later they show up on a dashboard somewhere. It's great.
00:11:27
The problem is that you also want to analyze and store those events, and your dashboards can't just look at the last day; in many cases they also need to look at what happened in the past, for example to show historical trends. So we applied this thing called the lambda architecture, which many of you might have heard of. What it does is bifurcate your pipelines: you have one pipeline for real-time data, and one pipeline for batch, which is basically the offline, historical data.
00:11:59
Now, any time you bifurcate anything in software engineering, the architects will tell you something is wrong with this picture, because you have to write everything twice. But it's not too bad, because Spark helps here in many ways: the Spark API allows you to express one set of programs and run it either in a batch fashion or in a streaming fashion, without much reconfiguration.
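To make that concrete, here is a minimal sketch of that unification, assuming a hypothetical event schema, S3 paths, and sinks: the same transformation function is applied once to a batch DataFrame and once to a streaming one.

```python
# A sketch of Spark's batch/streaming unification, with made-up paths and schema.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("unified-pipeline").getOrCreate()

schema = StructType([
    StructField("timestamp", TimestampType()),
    StructField("event_type", StringType()),
])

def summarize(events):
    # One set of logic, written once against the DataFrame API.
    return (events
            .withColumn("date", F.to_date("timestamp"))
            .groupBy("date", "event_type")
            .count())

# Batch: run the logic over the historical files.
batch = summarize(spark.read.schema(schema).json("s3://bucket/events/"))
batch.write.mode("overwrite").parquet("s3://bucket/daily_counts/")

# Streaming: the exact same function over a live stream of the same schema.
stream = summarize(spark.readStream.schema(schema).json("s3://bucket/events/"))
(stream.writeStream
       .outputMode("complete")
       .format("memory")
       .queryName("daily_counts")
       .start())
```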
00:12:25
Then the other thing we wanted to do was the reporting. The problem with streaming data is that as events come in, in real time, and you want to write them out with low latency, you start writing data to your data lake very quickly, in our case to S3. As part of that, you create the small-files problem: small files beget more small files, and when you turn around and run your jobs, the runtime is dominated by just the metadata operations over all those small files, even just listing them.
00:12:59
The other problem is that the number of pipelines grows. I'm showing you just one simple diagram, but in reality there are different data engineers writing different pipelines, sometimes collecting from different data sources. We would actually have one program writing data with a specific schema, and another program, written three months later, maybe by a different team, assuming a slightly different schema, writing to the same destination. It gets pretty messy. So we added validation: let's make sure we have validation jobs, which of course you have to add in both places. And the other big problem: we write software with bugs. Everybody does.
00:13:40
What happens if there's some failure, sometimes not even because of a bug, but because a machine went down in the middle of processing? The way most of the industry has tackled this is to partition the data into disjoint chunks, for example by date, and whenever a failed job has partially written a specific set of data, say today's, you fix it by overwriting the entire day of data. That's one more piece of logic added to the application. Then you reprocess today's data; and sometimes you only find out about the failure maybe three days later, so you replace all three days of data and reprocess them.
00:14:25
Then the other one is mutation: say I realize one of my customers changed their name, and I want to update that and have it reflected in all my records. The standard process is, again, partition replacement: sometimes you partition the data by customer as well, and then you replace that whole customer's data. And sometimes you can't even do that; for example, if you get too many customers, you'd have to rewrite everything.
00:14:56
It's all very difficult. And the last one: hey, I have downstream jobs, maybe even doing real-time querying on the very data I'm rewriting. Every time I reprocess, the reprocessing requires deleting the existing data, and if there happens to be a job reading the data while it's being deleted, that job will fail. So now you even have to come up with scheduling conventions: maintenance only happens at, say, 3 a.m. San Francisco time. Then you add a European office, and you've ruined everybody's day.
00:15:33
So, one complexity after another. We realized we had this team of maybe five rock-solid data engineers working not on data problems but on low-level distributed systems problems and concurrency problems, and that's because the underlying data lake doesn't provide them sufficient guarantees. They were distracted by, I think, three or four big things; let me go into them in a bit more detail.
00:15:59
the first there's no item a c. d. um provided by down the line depicts the store system
00:16:05
just takes whatever uh it gets if you have a
00:16:09
partial right coming from a job that she fell the sources
00:16:13
and understand any of it because it's so so but it should be the file system doesn't have any higher level semantics
00:16:20
and we have actually partially written data is very difficult to trust the data system and the other was no quality enforcement
00:16:28
you could add these writing garbage you have data coming from priests like the difference schema
00:16:33
um and that's very difficult to actually uh make sure the downstream jobs are correct
00:16:39
and last is this knoll isolation which means when you have a right job is
00:16:44
the meeting data has to do with reprocessing the re job would just fell right
00:16:51
Let me show you how real and how widespread these problems are. I went through a lot of my email and the support tickets we've received from customers, and took screenshots of them, anonymized. There are a lot of problems with data lakes, despite these being ten-year-old, mature technologies. Here are some examples. One: a simple DataFrame loading command blocked on metadata operations; when we profiled it, the query itself finished in one or two seconds, but Spark spent about a minute just listing files on S3.
00:17:29
Then there's the FileNotFoundException, which is very difficult to get to the bottom of. This is what I was talking about: we have a read job reading while a write job deletes the data. "Different files have conflicting schemas": you might have programs written at different times assuming different schemas, and we see a lot of this coming in. And "too many small files," one of the classics: we've spent endless engineering time helping customers concatenate small files. How do I take a thousand small files and concatenate them, how do I control them, and so on. At some point this class of issues took up maybe half of our engineering support tickets.
00:18:07
So, after realizing that all of these problems have very little to do with Spark or anything on the compute side, and that it's really the underlying storage that doesn't provide the right guarantees, we started a new open source project called Delta. Before I tell you how Delta works, let me show you what the picture looks like when you go from the earlier, pretty complicated architecture to Delta.
00:18:40
Here's the picture we get with Delta. You have all your events coming in from all your different sources, and you dump them into a single Delta table. Typically, this first table is what we call the bronze table, and it just stores the raw events.
00:18:57
Then you incrementally refine it: you create basically a pipeline of Delta tables, each connected by some ETL job that takes you from one to the next. The diagram shows exactly three, but you might have pipelines of twenty; I've seen thirty, because everyone has different business logic. The way it works is that the first table typically holds the raw events, with virtually no parsing and no application logic, so it becomes an archive, and its retention is as long as whatever you can fit in HDFS or S3.
00:19:31
Then you incrementally refine the data; its quality gets higher and higher as you go through this Delta pipeline, and at some point you have data that's completely ready for analytics, whether that's machine learning, streaming, or a dashboard. What Delta provides is a few guarantees. It has full ACID transactions: you actually get atomicity, you get isolation, and you get basically serializable transactions. And it's open source, powered by Spark.
00:20:08
Let me just walk through all of that. The bronze table is as simple as possible to write to; the whole point of it is to store all of the raw data. Then, in the intermediate tables, maybe we do some parsing, for example JSON parsing, so now you've structured the JSON into different fields and cleaned up some of the garbage. And often, for the business-level aggregates, people roll up the data. For example, you might be getting event data at microsecond granularity, but for the purposes of your end use cases you don't care about microseconds, so the idea is to roll the data up to every second, or even every day. All of that you can do with Spark.
00:20:58
One of the keys here: you might look at this and say, okay, now you have a pipeline, and maybe it even bifurcates, so what's the big deal? Well, the thing is, if your business logic changes, let's say the way you parse dates changes, likely because you realize you're getting datetimes in a new format, it's actually very easy to reprocess all that data, because all of it is stored in the single bronze Delta table, with retention going back as far as you want. All you need to do is write maybe a batch job (the Spark API lets you write it either as batch or as streaming) to rewrite your downstream tables, and then just restart the streaming job from a specific point in time.
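As an illustration, here is a hedged sketch of one such refinement step, streaming from a raw bronze table into a cleaner silver table; the table paths and the JSON payload layout are invented for the example.

```python
# A sketch of one bronze-to-silver step in a Delta pipeline: stream from the
# raw table, parse, clean, and write out. Paths and schema are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.readStream.format("delta").load("/delta/events_bronze")

silver = (bronze
          # Parse the raw JSON payload into typed columns...
          .select(F.from_json("payload",
                              "ts TIMESTAMP, user STRING, action STRING").alias("e"))
          .select("e.*")
          # ...and drop records that failed to parse.
          .where(F.col("ts").isNotNull()))

(silver.writeStream
       .format("delta")
       .option("checkpointLocation", "/delta/_checkpoints/silver")
       .start("/delta/events_silver"))
```

If the parsing logic later changes, the same transformation can be run as a batch job over the full bronze history to rebuild the silver table, and the streaming query restarted from that point on.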
00:21:49
So it's pretty powerful, and it makes a lot of things very simple. After we rolled out Delta and our customers started using it, we started seeing very few of those support tickets. At that point: how does it work? We're at a technical conference, and people care about exactly how things work, no matter what.
00:22:09
The way Delta works is actually pretty simple: we apply a lot of the old-school database techniques to this new setting, with some tweaks. The idea is that we keep a write-ahead transaction log for the data. If you look at how any Delta table is stored on disk, you'll see there are data files, typically partitioned, for example by date or by country, and those data files are stored in Parquet, the most common columnar format for data.
00:22:39
for a data and then um you have the transaction walk in addition
00:22:43
to just storing a bunch of all files transaction log some label the
00:22:47
monotonically increasing it was a zero dodgy someone dodgy designs keep going up
00:22:52
and the table was spacey defined by a set of actions in the transaction and the different
00:22:57
actions here are uh it's gonna this of them you have adding a file removing a file
00:23:04
um and there's one other thing is a lot different maybe get into here
00:23:08
um but once you have actually for example really in all the transaction logs effectively have
00:23:14
the latest snapshot um of the table you know what other valid files in the table
00:23:22
If you want to change the table, you just keep appending new transaction log files to it. For example, the first transaction log, 000000.json, might say: add one.parquet, add two.parquet. But let's say you realize one.parquet and two.parquet are too small and reading them isn't efficient, so you run a compaction that combines them into a single bigger file, three.parquet. You record that in your next transaction log, 000001.json, and what it describes is: remove one.parquet, remove two.parquet, add three.parquet, which is just rewriting files one and two together as file three. Simple.
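To make that concrete, the two commits could look roughly like this on disk. This is heavily simplified: real Delta log actions carry more fields, such as file sizes, timestamps, and partition values.

```
_delta_log/00000000000000000000.json      # first commit: two data files
{"add": {"path": "one.parquet", "dataChange": true}}
{"add": {"path": "two.parquet", "dataChange": true}}

_delta_log/00000000000000000001.json      # compaction: same data, fewer files
{"remove": {"path": "one.parquet", "dataChange": false}}
{"remove": {"path": "two.parquet", "dataChange": false}}
{"add": {"path": "three.parquet", "dataChange": false}}
```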
00:24:05
Now, to actually enforce this and give you transaction guarantees, we need to agree on the ordering of changes when there are multiple writers. For example, user one writes 000000.json, and then both user one and user two try to write 000001.json; one of them has to lose. Say user two races ahead and wins, so user two's commit claims that version. But in many cases there's no real conflict, so user one can just retry, and the software does this automatically for the user (you shouldn't have to worry about it) and commits its changes as the next version.
00:24:45
So how does Delta actually resolve conflicts? Two transactions might conflict; they might, for example, both compact the same two files and write them out, which would duplicate data. The resolution is also pretty simple. If somebody else commits before you (in this case, user two tries to commit and realizes the version is already taken), all it needs to do is read that .json file and check what was changed. If user one's transaction didn't touch anything that user two's transaction read or wrote, user two can just commit again at the next version, with no failure visible to the end user's job. But if there's a real conflict, for example both deleted the same file, that probably means an actual application-logic conflict. Then we need to fail the job and tell the user: somebody else has done something that conflicts with what you're doing, so please retry. Up until this point, this is textbook material: if you've ever taken a database internals class, you know exactly what's happening here.
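For intuition, here is a toy sketch of that optimistic commit loop. It is not Delta's actual implementation; it just shows the shape of the protocol, using exclusive file creation as the "only one writer wins a version" primitive and a caller-supplied conflict check.

```python
# Toy optimistic-concurrency commit loop in the spirit of the talk.
import json
import os

def try_commit(log_dir, version, actions):
    """Atomically create <version>.json; fail if someone else already has."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL: creation succeeds for exactly one writer.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True

def commit(log_dir, read_version, actions, conflicts_with):
    """Retry at increasing versions until we win or hit a real conflict."""
    version = read_version + 1
    while not try_commit(log_dir, version, actions):
        # Someone beat us to this version: read their commit and check
        # whether it logically conflicts with what we read and wrote.
        with open(os.path.join(log_dir, f"{version:020d}.json")) as f:
            theirs = [json.loads(line) for line in f]
        if conflicts_with(theirs, actions):
            raise RuntimeError("conflicting concurrent transaction; fail the job")
        version += 1  # no logical conflict: re-commit at the next version
    return version
```

On a local file system, O_CREAT | O_EXCL gives that mutual exclusion directly; on object stores like S3, which lack an atomic put-if-absent, a separate coordination mechanism is needed for the same "only one writer can create version N" guarantee.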
00:25:49
One of the big changes with big data is: if you have a streaming job committing maybe every second, or even every hundred milliseconds, writing to your transaction log, you end up with a lot of JSON files. That would turn the too-many-small-files problem into a too-much-metadata problem.
00:26:09
So we thought about this a lot, and we decided: we actually have a very scalable engine for processing large amounts of data, so when the metadata itself gets too large, why don't we treat it exactly as data? Metadata is not a special class in the system. What we do is, every once in a while, take all the JSON files, read them using Spark itself, and checkpoint them into Parquet format, which is extremely scalable and much higher-throughput to read. From then on, we read that checkpoint directly using Spark itself; all the metadata becomes just normal data. This is how we can handle billions of files in a single table, and have literally hundred-petabyte tables, because the metadata is no longer the bottleneck.
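In the same spirit, here is a toy sketch of the "metadata is just data" idea, folding the JSON commits into a Parquet snapshot with Spark itself. It deliberately glosses over commit ordering and the precise checkpoint schema the real implementation uses.

```python
# Toy sketch: treat the transaction log as data and checkpoint it with Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

log = spark.read.json("/delta/table/_delta_log/*.json")      # every commit

adds = log.where("add IS NOT NULL").select("add.*")          # files added
removes = log.where("remove IS NOT NULL").select("remove.path")

# Live snapshot = files that were added and never removed.
snapshot = adds.join(removes, "path", "left_anti")

(snapshot.write.mode("overwrite")
         .parquet("/delta/table/_delta_log/_checkpoint.parquet"))
```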
00:27:01
I don't have enough time to go into the different use cases of Delta here. The project is basically one year old; it's been in production for a year, and Databricks open-sourced it recently, about two months ago. Every month it now processes exabytes of data on the Databricks platform alone, and that number is increasing very quickly. The thing is production-ready; it solved a lot of our big data engineering problems. And we decided it shouldn't just be for Databricks customers, so we created the open source version to make sure it works for everybody.
00:27:34
So that was Delta, basically the data engineering piece of analytics and machine learning. The next thing I'll talk about is MLflow, which is about machine learning lifecycle management. Let's take a more machine-learning-centric view of data pipelines, or just look at machine learning pipelines.
00:27:57
I'm sure there are charts breaking this down into many different components, but let's simplify it to three basic steps; this is the typical process machine learning engineers and data scientists go through. First, data prep: you prepare the data. A lot of this is done in Spark, and a lot of it on Delta. Second, based on that data, you build models.
00:28:20
and one of the things as i uh talked about earlier is
00:28:23
hey the thing about machine earnings you gotta be experimenting a lot
00:28:27
to get to the best result is not about the best result a particular snapshot in time it's about iterate iterations
00:28:34
do you have to be a lot of things ever to be doing a lot of experiments and then last but not least once you have something you actually
00:28:39
kind of happy with you have to deploy in production or you can just do
00:28:42
a model and say hey i'm done with it you gonna use them or somehow
00:28:47
There are very disparate technologies throughout this entire stack, technologies not really designed to work with each other, and, as I said earlier, it also involves different personas: different engineers and data scientists. They need to work together, but there was really no tool to make it easier for them to work together. So there needed to be a way to standardize across these three steps, and this is why we started the open source MLflow project, with three separate components: Tracking, Projects, and Models. I'll go into each one of them.
00:29:24
But before I explain MLflow, let's look at the before-and-after picture. Here's what a very simple Python model-training script might look like. A lot of data scientists use Python, which, by the way, has also become the language most data engineers use. What they do is just print the different numbers at every iteration; it's a form of printf debugging: here's the set of hyperparameters for my machine learning model and for my data, and here's the accuracy I get on my test data set. And once I'm done, I dump the model somewhere with Python pickle, so I can use it later in a different program.
00:30:15
So the result you get is whatever ends up in standard out, and now different questions come up. What if I change my input data? The output doesn't describe which data the parameters belong to, and a machine learning model is produced by the combination of code, parameters, and data: when your data set changes, you get a different model. What about tuning the other parameters? Maybe I should put them in a spreadsheet. And what if the library I depend on gets upgraded, and they fixed a bug, which actually caused a regression in my model? Over a span of months, I might change my program quite a bit.
00:30:56
So what happens? I've got this log file, but what exactly happened, and when? Some people, a lot of people, use Excel; spreadsheets are really good at tracking things, so a lot of us do that. And funny things happen. I remember a very similar situation I ran into in college, taking physics labs. I was working in a small team doing experiments, so we started tracking the experiments in Excel, and of course we shared it with the colleagues running those experiments. So I'd send them one file, then v2, then v3, then final, final-for-real-this-time, final-final-final. It became a mess.
00:31:41
The other problem: now let's say you, the data scientist, have got a model you're happy with, and you want to deploy it to production. Data scientists usually don't know all the production engineering stuff. They don't know what an SLA really means or how to achieve it; they don't know what Kubernetes or containers are. So you ask the production engineer for help.
00:32:03
And then this sort of conversation happens; we've seen it among our customers. "Here's something I trained, can you please deploy it?" "Here's something I trained with scikit-learn." "I trained this with TensorFlow." And the production engineers say: "I have no idea what you're talking about. I'm used to writing Java or Scala code, and I'm really good at deploying Java application servers. All this Python, scikit-learn, Spark stuff, I don't know what it is." And even more interesting, someone will say: "I read about something interesting on arXiv; could you also deploy this for me?"
00:32:39
No. So here's the cool thing with MLflow. You're able to write a very simple program; it's the same program as before, except instead of just printing, you import mlflow, use its API, and actually log the parameters and metrics. All it does is call MLflow's API, which stores them in a tracking database.
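Here is a hedged sketch of that "after" picture: the same kind of training script, now logging to MLflow Tracking. The model, data set, and parameter choices are made up for the example.

```python
# The "after" picture: log params, metrics, and the model to MLflow Tracking.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    n_estimators, max_depth = 100, 6
    mlflow.log_param("n_estimators", n_estimators)   # instead of print()
    mlflow.log_param("max_depth", max_depth)

    model = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth)
    model.fit(X_train, y_train)

    mlflow.log_metric("r2", r2_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")         # instead of pickle.dump()
```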
00:33:00
When you launch MLflow, the tracking component gives you a UI for your experiments. Here's a very simple screenshot: it visualizes and charts all the experiments, and you can go in and look at the output of each experiment, what you got out of it. It even tracks your code via the git commit, and it also integrates with Delta.
00:33:24
Let me mention one other piece of functionality Delta gives you that I didn't talk about: time travel. Because we save the transaction logs, you can go back and refer to any specific version of the data in the past. By integrating that with MLflow, we let you basically reproduce your model end to end: the version of the data, your code itself down to the git commit hash, and all the parameters are tracked, so you can now reproduce everything.
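Delta exposes that time travel as a simple read option; a small sketch, with a placeholder path and version:

```python
# Read a Delta table as it was at an earlier version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

training_data_v5 = (spark.read.format("delta")
                    .option("versionAsOf", 5)        # or "timestampAsOf"
                    .load("/delta/events_silver"))
```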
00:33:52
You can also compare the different versions of the model, because that's usually what you want to visualize in a big experiment.
00:34:01
The next one is the Projects component. MLflow offers a standard spec that basically lets you define a project with its code, dependencies, and configuration, and then you can run that project in all sorts of different places: for example locally, or on a Spark cluster, all with just a single line of command. Here's what a project looks like: you basically have a YAML file, and because a lot of machine learning dependencies are in Python, we let you define the environment using Conda and list your Python entry points there. This becomes a standard container of sorts; think of it almost like a Maven-style spec for machine learning, and you can run it pretty much anywhere.
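For example, once a repository contains an MLproject file, it can be launched in one line, either with the `mlflow run` CLI or programmatically; here is a sketch with a hypothetical repository URI and parameter.

```python
# Launch an MLflow project in one line; `mlflow run <uri>` is the CLI twin.
import mlflow

mlflow.projects.run(
    uri="https://github.com/example/my-ml-project",  # repo with an MLproject file
    parameters={"alpha": 0.5},                       # hypothetical parameter
)
```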
00:34:54
The last piece of MLflow is the spec for Models, and an API for running them. What it does is really just provide a simple API that says: here are different types of models, and here's how I can execute them. For example, for a TensorFlow-specific model, I can just run it with TensorFlow itself, or turn it into a generic model that runs as an arbitrary Python function. And you can deploy an arbitrary Python function directly in Spark itself, so you can parallelize inference across a cluster of machines, even if you trained your model with TensorFlow on one single node.
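A sketch of that last point: loading a logged model as a generic Python function and applying it as a Spark UDF for cluster-wide inference. The model URI and the feature column names are placeholders.

```python
# Score a logged model in parallel across a cluster, regardless of the
# framework that trained it.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

predict = mlflow.pyfunc.spark_udf(spark, "runs:/<run_id>/model")

scored = (spark.read.parquet("/data/features")
          .withColumn("prediction", predict("feature1", "feature2")))
```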
00:35:36
So now, with this standard spec, all the production engineer needs to understand is: here's a command line to run. I'm given an MLflow project, and I can just run it across all the different environments I have.
00:36:03
These three components, Tracking, Projects, and Models, are really there to make life easier for data scientists, and also for all the supporting staff, including production engineers and data engineers, because those are the things we saw data scientists and their supporting teams really struggling with every day, and this is how we can make their lives better. MLflow is about a year old; we open-sourced it a year ago, at the last Spark Summit in San Francisco, and it has already gotten a hundred-plus contributors. I looked up the PyPI download index last night, and it actually clocks more than half a million downloads a month. Now, a lot of those downloads are probably CI systems fetching the package over and over, but nonetheless that's pretty impressive for an only one-year-old project. It's solving some real problems for data scientists that were historically overlooked, because everybody focused so much on just how to build the machine learning piece, without looking at the surrounding infrastructure.
00:37:07
Just to wrap up the talk: Spark was created (and I'm a huge fan of unification) to really try to unify big data analytics, and it took the approach of building the compute part of this picture. As we got more and more involved with different customers and users, we realized there are a lot of other pieces; even the ones we thought were solved problems are not really solved, and people are struggling with them. A big part of the responsibility of this industry, and of Databricks in particular, is to keep understanding the problems users and customers run into, and to provide higher-level solutions so they don't have to spend as much time fiddling with the infrastructure and can focus on their domain-specific problems. This requires tools that actually unify a lot of these pieces, and it puts the onus on us to unify different languages and tech stacks, and to go on from there.
00:38:04
The two specific projects I talked about: one is Delta Lake, which is all about making data ready to support the downstream analytics; and the second is MLflow, which is all about managing the lifecycle of machine learning projects. That's all I have. I can't be up here without telling you that we're also hiring: we have offices in Amsterdam and San Francisco, plus a smaller one as well, and we've also opened up remote opportunities. And I think I can take a few questions with the rest of the time.
00:38:52
Okay, if there are no questions, that means I explained everything perfectly. I think there's one over there.
00:39:03
[Audience] Yeah, I had a question about Delta. You mentioned that there's a sort of compaction step with regard to the metadata files, is that right? Is there any sort of Delta compaction for the data files? Is that something you need to run as an out-of-band process, or does it just come for free when you use Delta?
00:39:28
Yeah, so that's a very good question. It's not something that's done automatically for you right now, although we're experimenting with that. It's actually very easy to do the compaction yourself: you just schedule a regular job, say every three minutes, and the job is literally one line; Spark reads the files and rewrites them, and does it all in a single transaction.
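That one-liner looks roughly like this; a hedged sketch where the path, the partition filter, and the target file count are placeholders.

```python
# Out-of-band compaction: rewrite one partition into fewer, larger files
# inside a single Delta transaction.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.read.format("delta")
      .load("/delta/events_bronze")
      .where("date = '2019-06-12'")                    # compact one partition
      .repartition(16)                                 # target file count
      .write.format("delta")
      .option("dataChange", "false")                   # pure rearrangement
      .option("replaceWhere", "date = '2019-06-12'")   # replace only that slice
      .mode("overwrite")
      .save("/delta/events_bronze"))
```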
00:39:53
We are also experimenting with doing that automatically, directly as part of the writes. Part of the challenge is that compaction jobs, depending on data volumes, can be very expensive, so some of our customers explicitly don't want us to compact automatically for them, because they worry about the cost if they don't control it themselves. It is something we're actively looking at, but I suspect it will be something you can opt in to, automatic compaction, with a way to opt out, or maybe on by default with an opt-out, just to cover the different use cases.
00:40:40
Could you repeat your question? Ah, the question is to compare Delta with Iceberg. I think Delta is much more production-ready, and it probably tries to solve a few more problems. Delta has been running in production for a long time, and, more uniquely, one of the big things we solve is actually supporting streaming, and sort of incremental computation, which I don't think Iceberg does; that came from a lot of our real-time requirements, where people want to see data in a more real-time fashion. That's one big difference, but the underlying...
