Transcriptions

Note: this content has been automatically generated.
00:00:02
Hi, I'm super excited to be here. Thank you very much for
00:00:04
coming, thank you for waking up early and getting over your hangovers, perhaps.
00:00:08
I really appreciate it. I'm super excited to be here; I actually missed two planes, or
00:00:12
rather, one was cancelled and one was delayed, so I almost didn't make it.
00:00:16
So I'm extra excited because of this. Thank you for coming. Before I begin,
00:00:21
some acknowledgements: all the work that I will be presenting
00:00:26
here has obviously not been done just by me; it has been done primarily by my students,
00:00:30
the smart people that are actually doing all this cool stuff, and by my collaborators,
00:00:35
who have influenced the way I think and helped with all of this. So
00:00:38
this talk would not be possible without them, and a big thank you to them.
00:00:45
Heather was right, and thank you for the introduction: I do come here as an outsider, right?
00:00:51
I spend my time at Carnegie Mellon University in Pittsburgh; this is what our campus looks like.
00:00:57
I spend my time teaching, and mentoring in research
00:01:00
our graduate students, when I'm not chasing funding like everyone else.
00:01:04
But that means that, if you're noticing here, we have not one but
00:01:11
two ivory towers for all the academics to spend their time in.
00:01:17
This means that I'm a little bit of an outsider, both to the Scala community as well
00:01:22
as to the work that many of you are actually doing on the ground, building cool things.
00:01:28
But at the same time, I did spend quite a
00:01:30
bit of time studying how people build open source software and
00:01:35
how these communities work and function, so I would really like to share some of these insights with you today.
00:01:41
Hopefully they will serve as an interesting point of comparison for the Scala community.
00:01:45
But really, I'm here to learn from all of you about how the
00:01:49
Scala community operates and works, and what challenges the Scala community faces. Okay.
00:01:56
Before we dive in, can I ask you to
00:02:01
also introduce yourselves a little bit? So I'm gonna do this with a quiz,
00:02:06
and let's see how well the internet works here.
00:02:12
So if you would indulge me and take your
00:02:15
mobile phones out, assuming you're connected to Wi-Fi somehow,
00:02:20
could you tell me a little bit about what your engagement with open source is?
00:02:26
It's just for me to get a better flavour for who the audience is.
00:02:31
So yeah, go to this website, menti.com, and type in the
00:02:37
code you see there. Ah, wow, cool. I've actually never done this before; I'm quite impressed.
00:02:49
Cool. Whoever [inaudible] deserves the credit for that.
00:03:07
Yeah, love it. Right.
00:03:12
Oh.
00:03:13
So this means I'm in the right crowd here; it's exactly what I was hoping for. I was hoping
00:03:18
for a mix of people that are busy building open source software and people that are
00:03:24
busy using open source software, because we're seeing interesting dynamics from both stakeholders.
00:03:31
Okay, this is great. So please hold onto your phones and to this
00:03:37
web page; we'll be using it a couple more times throughout the talk, if you don't mind.
00:03:45
Okay, so I wanna talk to you today about
00:03:49
open source software and how I see it changing in the last two years.
00:03:55
Obviously, I don't need to tell you about how important open
00:03:58
source software is. Some of you may have read this great report
00:04:01
that Nadia Eghbal wrote a couple years ago, talking
00:04:04
about how open source software is the digital infrastructure that our world relies
00:04:09
on, in much the same way that roads and bridges are the physical
00:04:13
infrastructure that our economy is based on. And pretty much anyone
00:04:17
and everyone uses open source software to some extent.
00:04:21
I'm sure you know this. I'm sure you also know that
00:04:24
open source software needs a steady supply of effort for it to be sustainable:
00:04:30
it needs people doing things and continuing to do things over time, because all
00:04:36
kinds of bad things will happen otherwise. You might remember the whole left-pad
00:04:41
incident. Does anybody know about this? [inaudible]
00:04:46
Eleven lines of code that took half the internet down for a few hours.
00:04:51
You might remember the Heartbleed incident that happened in OpenSSL some years ago.
00:04:56
At the time, OpenSSL was maintained by a single full-time person,
00:05:01
and it's such a critical piece of infrastructure for our whole economy.
00:05:07
Okay, so what I'm gonna argue for this talk is that
00:05:12
creating sustainable open source communities is hard. It has always been hard, but
00:05:18
I'm gonna argue it's maybe even harder today than it used to be,
00:05:25
because of the ways in which open source software has changed in the last few years.
00:05:32
Much like the typical academic, I will over-promise and under-deliver, so I'm gonna
00:05:40
raise more problems than I will provide solutions for, just as a heads-up. But to
00:05:46
compensate a little bit for the inevitable disappointment that you might have, here's
00:05:52
an exclusive photo of [inaudible] that I took myself in Pittsburgh when he was
00:05:58
younger. Notice how long the [inaudible] gets; I don't know how this is physically possible.
00:06:04
Okay, so how has open source software changed? Let me tell you some of the ways that I see it
00:06:10
as having changed. One is that, obviously, GitHub deserves a lot of credit for
00:06:14
standardizing the practice, for standardizing the way people work
00:06:20
through their UI and the pull request model, which made it easier for people to
00:06:23
contribute and so on. Now we all speak the same language, the GitHub workflow language.
00:06:28
And this has resulted in a lot more production.
00:06:32
These numbers are probably old by now: there's literally over a hundred
00:06:38
million open source repositories on GitHub, and there are millions more on the competitors.
00:06:43
Some of you might remember this big move when Microsoft announced they were buying GitHub: people were
00:06:49
migrating en masse to GitLab, so GitLab is now also a big competitor in this space.
00:06:55
Okay, so there's a lot more open source now than we used to have.
00:07:00
A big change is also really this high
00:07:00
level of transparency, and this is maybe not obvious
00:07:03
at first glance, but it's really, I'd argue, one of the fundamental ways in which open source has changed. Together with the
00:07:08
advent of GitHub came the fact that now all open source
00:07:14
programmers have these social-media-like profile pages
00:07:17
online that aggregate all kinds of information about them, and similarly, open source
00:07:23
projects have these profile pages that aggregate all kinds of information about them.
00:07:28
And there's been a lot of research showing how all this high level of
00:07:33
transparency into what people are doing and how projects are being developed,
00:07:38
how this high level of transparency, has been changing pretty much everything. It has been changing
00:07:43
the way people collaborate and the way they write code; it has been changing recruiting and hiring.
00:07:48
These profiles have come to act as, or to replace, traditional résumés for many of us.
00:08:01
Another change here is the fact that these
00:08:01
projects are not really developed in isolation;
00:08:08
rather, this is a very complex, organic ecosystem
00:08:11
of interdependencies and interconnections between projects and between people.
00:08:15
There are technical dependencies and so on between libraries; there are
00:08:20
social dependencies, with people contributing to multiple projects at the same time.
00:08:24
All kinds of relationships are being formed between these
00:08:30
stakeholders. This is a very organic, very complex
00:08:34
socio-technical ecosystem. It's not just code that lives
00:08:38
there somewhere; it's this big socio-technical thing.
00:08:42
And as you've seen, this can be quite brittle, right? Take the left-pad example,
00:08:46
coming back to it from a minute ago: you could see how very easily a small
00:08:52
change somewhere in the ecosystem can perturb pretty much everything else and have a huge impact.
00:09:07
Another change is this increasing level of
00:09:07
commercialization and professionalization, excuse me.
00:09:12
It used to be the case that open source
00:09:21
software was built and maintained by these communities of volunteers,
00:09:23
and that's pretty much gone these days, as far as I can tell. There's a lot,
00:09:29
a lot of commercial involvement; there are lots of open source projects that are started by companies
00:09:35
and made open source; there are all kinds of startups operating in this space, and so on.
00:09:41
This is a snapshot from the big survey that GitHub ran a couple years ago;
00:09:46
some of you might remember this. It shows that a lot of people that contribute to
00:09:51
open source are doing it as part of their jobs, and
00:09:56
the small poll we ran a couple minutes ago showed the same thing again.
00:10:07
So, interestingly, has anybody heard of the Equifax scandal?
00:10:07
A few more of you than left-pad, right? Okay, good. So
00:10:14
Equifax is a company in the United States that
00:10:17
calculates the credit scores of Americans, which affect their ability
00:10:22
to get credit from banks and so on.
00:10:28
Equifax was hacked a year or two ago, I don't remember exactly,
00:10:32
and as part of that, the personal information of about a hundred and
00:10:37
fifty million Americans, that's about half the population of the United States, the personal information of about
00:10:42
a hundred and fifty million people, was stolen. And interestingly,
00:10:49
Equifax blamed all of this on an open
00:10:55
source project, Apache Struts, which is where the vulnerability was,
00:11:00
even though they never contributed to
00:11:06
open source themselves nor supported open source development,
00:11:10
and they actually never reacted to all the
00:11:13
warnings and requests to update their out-of-date
00:11:18
versions of Struts to remove this vulnerability. So there's a
00:11:21
lot of expectation, a lot of pressure, from users of open source
00:11:26
that the infrastructure be of high quality, be very reliable, be very secure, and so on and so forth.
00:11:32
This is just one of many examples of the increasing pressure that's being put on open source maintainers.
00:11:46
You also see this in the
00:11:46
smaller, day-to-day interactions between users and maintainers,
00:11:51
because again, with GitHub it's so much easier now
00:11:56
to contribute, to report issues, to submit pull requests; anyone can do it. There are obviously growing volumes
00:12:00
of these things; there's a lot more of them. This is a snapshot I took
00:12:06
yesterday, oops, let me go back, a snapshot I took yesterday of
00:12:15
the Scala GitHub page, showing close to two thousand open issues.
00:12:22
I guess pretty much at any point in time there are thousands of
00:12:28
these issues open. Now, I don't know if there's any plan at all
00:12:31
of ever closing them, or whether it's just the backlog; I don't know exactly what the situation is, but
00:12:35
that's lots of requests that somebody has to look at, decide something about, and do something with.
00:12:40
There's also social pressure to respond to your contributors as a maintainer,
00:12:46
because if you don't, you risk losing them: they go away and never come back. There's research on this.
00:12:52
And there are all kinds of very interesting,
00:12:57
arguably toxic, interactions that are visible
00:13:02
in these open source communities. I'm pasting here two quotes;
00:13:05
we have been collecting these, and there are many examples like these:
00:13:10
often very rude, very entitled types of requests that users are making
00:13:14
of open source maintainers.
00:13:27
Okay, so this brings me to my next quiz. I'd like to get a
00:13:27
feel for how prevalent these things are in the Scala community.
00:13:31
Can you tell me, if you ever feel,
00:13:39
as maintainers or contributors to open source, those
00:13:44
among you that are doing this, could you tell me if you ever feel overwhelmed by the amount of
00:13:48
feature requests or issues or bug reports and so on that
00:13:53
users are producing on your projects?
00:13:57
Interesting.
00:14:08
So this is worse than I thought: about half of you are
00:14:14
saying that this is an issue, that there's a lot of demand
00:14:25
and, arguably, stress being caused by these high volumes of requests.
00:14:32
Thank you, thank you so much.
00:14:39
So similarly, do you feel that this is
00:14:47
sustainable? Are these interactions healthy, do you think? Sorry,
00:14:50
[inaudible]: do you feel that these are healthy interactions?
00:14:59
The thing to think about is whether you see this continuing
00:15:06
without people burning out, without you burning out, without users going away.
00:15:11
[pause]
00:15:36
Right, so this mirrors to a large extent what we saw earlier. There's still
00:15:43
a lot we could do to improve, it seems. Related to this:
00:15:48
how do you prioritise these things when you get all these requests from your users?
00:15:54
How do you prioritise them? I'm assuming, with almost two thousand
00:16:00
open issues in Scala right now, somebody
00:16:04
has to prioritise even looking at these, let alone
00:16:07
closing them and dealing with them. How do you do this? I would love to hear how people
00:16:14
actually do this, and whether there are any best practices that we could learn from as a community,
00:16:18
that others could learn from you, in terms of
00:16:22
how to deal with this. Uh-huh. Random.
00:16:31
High risk first, oh, that's a nice way. So
00:16:36
I guess that means, whatever you perceive: if this issue affects
00:16:40
lots of people, or a lot of people care about fixing it, it gets higher priority. Interesting. What does "go to sleep" mean?
00:16:50
Whoever answered that, maybe you could help clarify.
00:17:00
How do you know what's critical, though? Is that obvious from just the surface-level
00:17:08
report? Can you always tell? Oh, okay, thank you. I'll come back to this
00:17:22
later and share a summary of all of these things with you after the talk.
00:17:29
One more thing that I see changing, though I don't know to what extent it is changing, but I'm
00:17:33
gonna argue in a second that it has, as opposed to it always being this way, is the
00:17:39
really low demographic diversity in these open source communities.
00:17:44
On the one hand, you have this low barrier to entry, at least in theory:
00:17:48
on paper, anyone can contribute to any open source project; it's all out there on
00:17:53
the web somewhere, on GitHub usually. On the other hand, there's very, very low
00:17:59
demographic diversity; in particular, I'm talking about women as one of the under-represented minorities.
00:18:05
There are around five percent women in these open
00:18:08
source communities, either GitHub or Stack Overflow,
00:18:12
which is really surprising, because industry reports about twenty percent or
00:18:16
so women in technical jobs or in software engineering.
00:18:22
And this is very strange, because if you ask
00:18:26
open source contributors and maintainers about what their perception is,
00:18:30
you get answers like these; we actually asked people, this is from a study we did a few years ago.
00:18:35
People perceive that demographic identity is irrelevant, and it's
00:18:39
all about the code: "The code sees no colour
00:18:43
or gender" is one of the quotes we got from open source maintainers. So how is it that
00:18:50
there are just so few minorities in general, and women in particular, contributing to these communities?
00:18:56
Just as an aside, regardless of what your
00:19:01
philosophy or political orientation is on this issue,
00:19:05
there is evidence that teams that are more
00:19:09
diverse, in particular software teams that are more diverse,
00:19:12
are more productive. We ourselves, in our group, found some of
00:19:16
this evidence a few years ago in a study where we saw that,
00:19:20
holding other confounds fixed, teams that are more diverse with respect to
00:19:26
gender and tenure, or experience, tend to write code faster
00:19:32
than teams that are less diverse, again holding other variables constant.
00:19:37
There's also this idea that by making anything more
00:19:44
inclusive for one particular minority, you make it better for everyone else.
00:19:48
I stole this slide from Anita Sarma and Margaret Burnett; they're faculty members at Oregon
00:19:54
State University in the US. The classical example is these curb cuts. Okay, these little curb cuts
00:20:01
were developed for people in wheelchairs, historically, but they've
00:20:07
come to serve a very wide range of users:
00:20:12
people on bikes, people with strollers, people with walking
00:20:16
aids, people with luggage, and so on and so forth. So
00:20:19
really, by making our tools more inclusive to one particular
00:20:22
minority, we end up making them better for everyone else.
00:20:27
Okay, so what we do, what my group does, what
00:20:31
I've been doing over the last two years, is looking at
00:20:35
data that comes from these open source communities, and we have the luxury
00:20:40
of having access to large quantities of this data.
00:20:45
We've been looking at this to try to learn about how
00:20:50
these things are changing and what effects these changes have had,
00:20:54
and trying to understand whether there's anything we could do to reduce or reverse some of these negative effects.
00:20:59
And I wanna share a few examples of our research with you today,
00:21:03
just to give you a flavour of the kinds of things we're finding.
00:21:06
I'm gonna start with an example of transparency:
00:21:11
we've been looking at these repository badges. But before that, I wanna
00:21:16
ask you one more time, if you will indulge me: could you please tell me
00:21:20
what non-code contributions do you see,
00:21:24
or do you do yourselves, in open source projects?
00:21:29
So what is there besides just writing code? What other kinds of contributions
00:21:35
do you see, or do you value, in open source communities besides code?
00:21:44
[pause] I'm gonna leave this on, hopefully.
00:22:06
Oh, great, thank you. Documentation is a big deal.
00:22:13
Marketing, publicity, [inaudible], support, Gitter support, very interesting, Stack Overflow.
00:22:24
That's great, thank you so much. Documentation is really a big deal.
00:22:33
Judging, too. Okay.
00:22:36
The reason I'm asking you this is because of
00:22:41
one particular new kind of contribution that we're seeing people make, which is
00:22:45
this beautification work, if you will, that's happening around open source
00:22:49
projects: adding sparkle, I think, to open source projects.
00:22:55
We talked already about how all these communities are very transparent, how you get
00:23:00
this first-hand view into what everyone and anyone
00:23:03
is doing. One particular kind of transparency
00:23:07
we call signals: these little visible cues that are
00:23:11
present on these web pages, like the number of stars, the number of
00:23:15
commits, and so on, the number of followers people have. We call these signals.
00:23:20
One particular kind of signal that we're seeing pop up more and more
00:23:24
is these repository badges, and some of you may have seen these,
00:23:29
usually embedded in README pages that are displayed on the homepage of an open source repository.
00:23:36
And it's very interesting, because these signals extend
00:23:41
by a lot the range of things that
00:23:44
open source maintainers can choose to make transparent, beyond the few
00:23:48
things that are available by default in the GitHub user interface.
00:23:52
So we've been studying what effect
00:23:56
just signalling, displaying these pieces of information, has
00:24:01
on how users or contributors perceive these open source projects.
00:24:07
I won't go into detail, I'm happy to discuss this afterwards if you're interested, but just to give you a flavour:
00:24:13
based on the git history, the version control
00:24:17
history of these projects, we can detect when a particular badge has been introduced.
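The talk doesn't show how badge introduction is detected, so what follows is only a minimal sketch under stated assumptions. It assumes you already have the README's history as (commit date, README text) pairs, for example extracted with `git log -p -- README.md`, and it uses a hypothetical regular expression for a dependency-manager badge; a real study would need a curated list of badge image patterns per badge type.

```python
from __future__ import annotations

import re
from datetime import date
from typing import Optional

# Hypothetical pattern: a Markdown image whose URL mentions "dependenc...".
# Real badge detection would match known badge services, not a keyword.
DEPENDENCY_BADGE = re.compile(r"!\[[^\]]*\]\(https?://[^)]*dependenc[^)]*\)", re.IGNORECASE)


def badge_introduction(snapshots: list[tuple[date, str]], pattern: re.Pattern) -> Optional[date]:
    """Return the date of the earliest README snapshot that contains the badge.

    `snapshots` holds (commit date, README text) pairs; they are sorted by date
    here, so the input order does not matter. Returns None if the badge never appears.
    """
    for commit_date, readme in sorted(snapshots, key=lambda s: s[0]):
        if pattern.search(readme):
            return commit_date
    return None


if __name__ == "__main__":
    # Toy README history (hypothetical project, hypothetical URLs).
    history = [
        (date(2017, 1, 5), "# mylib\nA small library."),
        (date(2017, 3, 2), "# mylib\n[![deps](https://example.org/dependencies/mylib.svg)](https://example.org)"),
    ]
    print(badge_introduction(history, DEPENDENCY_BADGE))  # 2017-03-02
```

Taking the first appearance is a simplification: a badge that is removed and later re-added would need extra handling.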
00:24:22
For example, here we're looking at a dependency manager badge, and we can detect
00:24:26
when it was introduced, and we can plot, relative to this introduction time,
00:24:32
different outcome measures, for example how up
00:24:35
to date people's dependencies were relative to when
00:24:39
the badge was introduced. So this is one particular package; lower on the y-axis is
00:24:44
better here: the fresher the dependencies, the better. And you can see how,
00:24:49
once they started using this, once they started signalling with this dependency manager
00:24:54
badge, people kept up with their dependencies much better and kept updating them.
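One simple way to quantify how up to date a package's dependencies are, assumed here for illustration and not necessarily the measure used in the study, is to count, for each dependency, how many newer releases existed at a given date, then average over the package's dependencies. The package names and release dates below are hypothetical toy data; lower values mean fresher dependencies, matching the "lower is better" reading of the plot.

```python
from __future__ import annotations

from datetime import date


def versions_behind(pinned: str, releases: list[tuple[str, date]], as_of: date) -> int:
    """Number of releases published after `pinned` and on or before `as_of`.

    `releases` is one dependency's release history as (version, release date)
    pairs, assumed to be in chronological order.
    """
    published = [version for version, released in releases if released <= as_of]
    if pinned not in published:
        raise ValueError(f"{pinned} was not released by {as_of}")
    return len(published) - 1 - published.index(pinned)


def mean_dependency_lag(deps: dict[str, str],
                        release_db: dict[str, list[tuple[str, date]]],
                        as_of: date) -> float:
    """Average number of releases behind across a package's dependencies (0 = fully up to date)."""
    lags = [versions_behind(version, release_db[name], as_of) for name, version in deps.items()]
    return sum(lags) / len(lags) if lags else 0.0


if __name__ == "__main__":
    # Hypothetical release histories for two made-up dependencies.
    release_db = {
        "lib-a": [("1.0.0", date(2016, 1, 1)), ("1.1.0", date(2016, 3, 1)), ("1.2.0", date(2016, 6, 1))],
        "lib-b": [("2.0.0", date(2017, 6, 1)), ("2.1.0", date(2017, 8, 1))],
    }
    deps = {"lib-a": "1.0.0", "lib-b": "2.1.0"}
    print(mean_dependency_lag(deps, release_db, date(2017, 9, 1)))  # 1.0
```

Computing this value month by month, before and after a badge appears, gives the kind of series the talk plots.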
00:24:58
We can do this over many projects, and we can overlay these different graphs,
00:25:03
which gives us a bunch of these distributions of values, which we can then
00:25:08
analyze statistically to reason about trends, and changes in these trends, relative
00:25:14
to the intervention. And this gives us a very robust way of making
00:25:19
almost causal arguments about the value, or the effect, of these particular interventions.
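The talk doesn't name the statistical model, but one standard way to analyze a trend before and after an intervention is an interrupted time series: a segmented regression $y_t = \beta_0 + \beta_1 t + \beta_2\,\mathrm{post}_t + \beta_3 (t - t_0)\,\mathrm{post}_t$, where $\mathrm{post}_t = 1$ from the intervention month $t_0$ onward. The sketch below fits that model for a single monthly series with plain least squares. It is a generic illustration of the technique, not the study's model, which pools many projects and controls for confounds; `t0` is assumed to be the month the badge first appears.

```python
from __future__ import annotations


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares: solve the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def interrupted_time_series(y: list[float], t0: int) -> dict[str, float]:
    """Fit y_t = b0 + b1*t + b2*post + b3*(t - t0)*post, with post = 1 for t >= t0."""
    X = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        X.append([1.0, float(t), post, (t - t0) * post])
    b0, b1, b2, b3 = ols(X, y)
    return {"intercept": b0, "pre_trend": b1, "level_change": b2, "trend_change": b3}


if __name__ == "__main__":
    # Toy monthly dependency-lag series: flat before month 6, then a drop and a downward slope.
    lag = [10.0] * 6 + [6.0, 5.5, 5.0, 4.5, 4.0, 3.5]
    print(interrupted_time_series(lag, t0=6))
```

On this toy series the fit recovers a level change of -4 and a trend change of -0.5 per month, up to floating-point rounding: an immediate improvement that keeps improving, the same shape as the "fresher immediately and staying fresh" result.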
00:25:26
I'll give you two examples. Here we
00:25:28
looked at how people's dependencies become fresher,
00:25:34
if at all, after people start displaying these
00:25:38
dependency manager badges, and you can see a very strong
00:25:41
effect here. You can see that not only
00:25:46
do these dependencies become fresher immediately, but they also
00:25:49
remain fresh over time. This is over a sample
00:25:53
of many tens of thousands of packages from npm,
00:25:57
so a really, really big sample. Similarly, we looked at the value of
00:26:03
these continuous integration and code coverage badges, and we found something really, really interesting.
00:26:10
We found that users, contributors to these open
00:26:15
source packages, once these testing-related badges are being displayed,
00:26:23
are much more likely to
00:26:28
add unit tests together with their pull request contributions.
00:26:33
They do this just naturally, because the maintainers are signalling
00:26:39
that they care about code quality and that they care about testing. What I'm showing on the y-axis is, for
00:26:45
every month, the fraction of all pull requests that have been
00:26:49
submitted to those projects in that particular month containing test cases.
00:26:55
You can see how, after they started signalling that they care about testing and code quality,
00:27:01
people started submitting test cases much more, and they kept doing this over time. Really interesting.
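To make the y-axis measure concrete: for each month, the share of pull requests that include test changes. This sketch assumes you already have, for each pull request, the month it was submitted and the paths of the files it changed, and it uses a simple file-name heuristic to decide what counts as a test file. That heuristic is an assumption for illustration; the study's own classification may differ.

```python
from __future__ import annotations

from collections import defaultdict


def is_test_file(path: str) -> bool:
    """Heuristic (an assumption): test directories or common test file-name conventions."""
    dirs, _, filename = path.rpartition("/")
    stem = filename.rsplit(".", 1)[0]
    in_test_dir = any(d.lower() in ("test", "tests", "spec") for d in dirs.split("/"))
    return (in_test_dir
            or stem.lower().startswith("test_")                           # test_parser.py
            or stem.lower().endswith(("_test", "_spec", ".test", ".spec"))  # foo_test.go, foo.test.js
            or stem.endswith(("Test", "Spec", "Suite")))                  # FooSpec.scala (case-sensitive)


def monthly_test_fraction(pull_requests: list[tuple[str, list[str]]]) -> dict[str, float]:
    """Map each month ("YYYY-MM") to the fraction of its pull requests touching a test file."""
    totals: dict[str, int] = defaultdict(int)
    with_tests: dict[str, int] = defaultdict(int)
    for month, paths in pull_requests:
        totals[month] += 1
        if any(is_test_file(p) for p in paths):
            with_tests[month] += 1
    return {month: with_tests[month] / totals[month] for month in sorted(totals)}


if __name__ == "__main__":
    # Toy pull requests (hypothetical paths).
    prs = [
        ("2017-01", ["src/main/scala/Foo.scala"]),
        ("2017-01", ["README.md"]),
        ("2017-02", ["src/main/scala/Foo.scala", "src/test/scala/FooSpec.scala"]),
        ("2017-02", ["lib/parser.py", "tests/test_parser.py"]),
        ("2017-02", ["docs/index.md"]),
    ]
    print(monthly_test_fraction(prs))
```

For the toy data, January has no pull requests with tests (0.0) and February has two out of three. The case-sensitive suffix check is deliberate, so that a file such as `Latest.scala` is not mistaken for a test.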
00:27:08
So what about Scala? I looked at this a little bit yesterday, and all I could find is
00:27:15
a CI badge on the Scala README page.
00:27:19
So before you get too excited and start adding all kinds of badges to
00:27:25
Scala, a couple of takeaways. First of all, we are seeing that
00:27:31
it's not enough to just put any badges there
00:27:34
for these effects to happen; it really matters
00:27:39
whether the badge is doing anything interesting or not.
00:27:43
So in this case, we're seeing that badges that have some kind of underlying
00:27:48
analysis, for example continuous integration, code
00:27:51
coverage, what have you, dependency managers,
00:27:55
something that has some kind of analysis in the background computing
00:27:58
the value that is to be displayed, have much stronger effects
00:28:03
than the badges that just display some random information.
00:28:07
So this means, for example, if you have a choice between support badges like these:
00:28:13
the green one, the "join" badge, just
00:28:18
links to the channel where people can go and ask questions; the other one, the red one,
00:28:24
not only does that, but also shows in real time
00:28:27
how many maintainers are active on that particular Slack channel.
00:28:32
So this one has a stronger effect in terms of attracting contributors,
00:28:38
because it's harder to fake; it also requires more effort on the part of the maintainer. It's sort
00:28:44
of preferable to use ones that have some kind of underlying analysis over ones that just link to things.
00:28:51
Also, don't add too many. We're seeing an interesting
00:28:54
effect here where, if you add too many badges to these projects,
00:28:59
it takes away from the perceived quality and the perceived value. So here I'm showing you
00:29:05
the popularity of these packages, the number of downloads that they
00:29:09
get, as a function of how many different badges they have,
00:29:13
controlling for other confounds, and you can see that there's a magic
00:29:17
number around five or so where you get the most bang for your buck.
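Relating download counts to the number of badges first requires counting the badges in each README. A rough sketch follows. It assumes badges are Markdown images served from a known set of badge hosts; the host list is illustrative and incomplete, and the study's actual detection rules may differ. The regression against downloads, with its controls, is not shown.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Illustrative, incomplete set of hosts that serve README badge images.
BADGE_HOSTS = {"img.shields.io", "travis-ci.org", "codecov.io", "coveralls.io", "badges.gitter.im"}

# Markdown image syntax ![alt](url ...); the group captures the URL.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")


def count_badges(readme: str) -> int:
    """Count Markdown images whose URL is served by one of BADGE_HOSTS."""
    return sum(1 for url in MARKDOWN_IMAGE.findall(readme)
               if urlparse(url).hostname in BADGE_HOSTS)


if __name__ == "__main__":
    # Toy README with two badges and one ordinary image (hypothetical org/proj URLs).
    readme = (
        "[![Build](https://travis-ci.org/org/proj.svg?branch=master)](https://travis-ci.org/org/proj)\n"
        "[![Coverage](https://codecov.io/gh/org/proj/branch/master/graph/badge.svg)](https://codecov.io/gh/org/proj)\n"
        "![logo](docs/logo.png)\n"
    )
    print(count_badges(readme))  # 2
```

Matching on the image host rather than on the alt text avoids counting ordinary images such as logos, but badges rendered through HTML `<img>` tags would be missed.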
00:29:23
What else would be useful? For example, we're doing some research at the moment looking
00:29:30
at what kinds of signals potential contributors are looking
00:29:35
for when deciding to contribute to your package, and
00:29:39
one thing that comes up, both in interviews with contributors as well as in
00:29:45
models over large data sets, is that people care about the tone of a particular community.
00:29:51
People care about how friendly a particular community appears to be.
00:29:57
So perhaps there's some room to automatically compute some of this and display
00:30:03
sentiment analysis or politeness scores, I dunno, what have you, something
00:30:07
that tells you something about how welcoming a particular community is.
00:30:12
Similarly, we're seeing that just small things like
00:30:16
explicitly asking for contributions, explicitly asking for help,
00:30:21
go a long way towards enticing people to contribute to your project.
00:30:26
So small things can make a big difference. Okay, so that was transparency.
00:30:34
Another thing: we have been looking at the
00:30:40
ecosystem of Python packages that are part of the
00:30:43
PyPI package manager, and we're trying to understand
00:30:49
what predicts, what explains, which of these packages will
00:30:53
become abandoned. So here I wanna ask you again:
00:30:56
how do you screen libraries when deciding which ones to adopt? How
00:31:01
do you screen them to make sure that they'll be maintained in
00:31:04
the future? What kinds of things do you look for? [inaudible]
00:31:23
A little bit about what we found, with two examples. First, we looked at
00:31:29
the impact of transitive dependencies. Things like
00:31:34
direct dependencies have become more visible,
00:31:37
more salient: there's infrastructure now, like libraries.io, for example, or
00:31:43
GitHub itself, that directly displays direct dependencies, but there's very little that's
00:31:50
visible in terms of transitive dependencies. Arguably, the impact of left-pad, when that happened,
00:31:56
was largely because of these transitive dependencies:
00:31:59
things that depended on things that depended on left-pad,
00:32:02
rather than the things that depended on left-pad directly. So
00:32:06
here, one argument could be that if you have
00:32:12
some problems upstream, if you need help, if you need
00:32:15
effort, then the more transitive dependencies you have,
00:32:20
the bigger the pool of potential contributors that would come in and help. Well, what do you think?
00:32:25
We're finding exactly the opposite. We modelled the
00:32:29
chances of packages becoming dormant here, and we're seeing that
00:32:34
the more transitive downstream dependencies there are, again controlling for
00:32:39
many other confounding factors, the more transitive dependencies there are,
00:32:44
the more likely the upstream package is to
00:32:47
become abandoned. And one possible explanation
00:32:53
we're learning from the interviews with the Python maintainers
00:32:57
is that these downstream users, the far-removed ones,
00:33:02
are just as likely to complain about issues,
00:33:07
but they are much less likely to actually do something about it and help.
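To make "transitive downstream dependencies" concrete: given a map from each package to its direct dependencies, the packages that depend on a given package, directly or through a chain, can be collected with a breadth-first search over the reversed graph. The package names below are hypothetical, and a real analysis over PyPI would also have to resolve version constraints, which this sketch ignores.

```python
from __future__ import annotations

from collections import deque


def transitive_dependents(depends_on: dict[str, set[str]], package: str) -> set[str]:
    """All packages that depend on `package` directly or through a chain of dependencies.

    `depends_on` maps each package to the set of packages it directly depends on.
    """
    # Reverse the edges: for each package, who depends on it directly?
    dependents: dict[str, set[str]] = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first search over the reversed graph.
    seen: set[str] = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    seen.discard(package)  # a dependency cycle could lead back to the package itself
    return seen


if __name__ == "__main__":
    # Hypothetical graph in the spirit of left-pad: "pad" has one direct dependent, four in total.
    graph = {"app": {"web"}, "web": {"strutil"}, "cli": {"strutil"}, "strutil": {"pad"}, "pad": set()}
    print(sorted(transitive_dependents(graph, "pad")))  # ['app', 'cli', 'strutil', 'web']
```

The gap between direct dependents (one here) and transitive dependents (four) is exactly the part of the ecosystem that dependency views showing only direct dependencies leave invisible.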
00:33:13
It's interesting how, you know, the farther you are from something, the less you care about
00:33:19
it, it seems. Another thing we looked at in the study is
00:33:23
the impact of commercial involvement in these open
00:33:27
source packages. So here we model
00:33:31
whether packages have a lot of contributions from people with
00:33:37
explicit commercial affiliations, as opposed to not, and what impact that has on the
00:33:44
chances of the package becoming abandoned, again controlling for all kinds of confounding factors. What we're seeing is
00:33:51
something worrying again: these packages with high levels of commercial involvement
00:33:57
appear to be less sustainable than the ones with lower levels of commercial involvement,
00:34:04
contrary to what you might otherwise believe, or at least to what we believed going into the study.
00:34:09
And one explanation that the maintainers we talked to gave is that,
00:34:15
while this high level of commercial involvement comes with more
00:34:19
resources and a lot of help in the short term,
00:34:23
it's also a lot more unstable: this support can be withdrawn and go away at any time.
00:34:30
So there's a lot more risk that comes with it. Interesting. So the takeaway here is that
00:34:39
whatever happens in any particular project
00:34:43
is really not just a function of
00:34:47
the local community, but a function of the much broader ecosystem
00:34:51
of interconnections and interdependencies between that project and everything else.
00:34:57
Right, so we should be looking at the entire ecosystem, rather than at individual packages.
00:35:05
finally, i want to talk to you a little bit about a study
00:35:09
we did very recently looking at how open source contributors
00:35:17
remain engaged and active in these open source communities over time.
00:35:22
so this is the last question that i would like to ask you:
00:35:33
were there any events or people
00:35:36
that encouraged you to get involved and
00:35:41
stay involved in open source? as opposed to the things that might turn you away,
00:35:47
was there anything in particular that really encouraged
00:35:51
you to stick around and to become more active?
00:35:54
was there a person who helped you in that sense? what kind of
00:35:58
role did this person play? or was it an event that helped you,
00:36:03
was it a community event like this one? what was
00:36:06
it that made, if anything, a big transformation in
00:36:11
your engagement with open source? and i'll tell you why i am asking this.
00:36:22
here we're coming back to this issue of diversity.
00:36:29
we're seeing that women on GitHub disengage earlier than men do.
00:36:34
for example, here you can see, sorry,
00:36:39
you can see that, say, after twelve months
00:36:43
about seventy percent of men are still active but only about
00:36:48
sixty percent of women. and i apologise for this binary
00:36:53
gender split here; we had to do this
00:36:58
because otherwise the analysis would have been computationally intractable,
00:37:02
but we're obviously aware that gender is much more nuanced
00:37:08
than just binary; we just couldn't compute with more than two categories.
00:37:16
a lot of this could be cultural, and there have
00:37:19
been many, many studies arguing this about open source culture.
00:37:24
for example, i'm quoting: "sexist
00:37:27
behaviour in FLOSS is as constant as
00:37:32
it is extreme." this is from a study a few years ago.
00:37:36
there's a study that we did a few years ago as well; a quote, again, from our participants:
00:37:41
"i've used a fake GitHub handle so that people would assume i was male."
00:37:48
by the way, before you ask, gender is not explicitly recorded on GitHub;
00:37:54
we know that, but people
00:37:58
do infer it very easily and very often
00:38:02
from things like your name and your profile picture, again because of this high level of
00:38:07
transparency into who you are in open source. so this is a ranking of features
00:38:14
that the open source contributors we surveyed recognise about each other,
00:38:19
and gender is the second most
00:38:22
visible attribute here, after programming skills.
00:38:27
very surprising and interesting. so people really can tell this even if it's not explicitly recorded.
00:38:34
another interesting study from a couple of years ago: pull request acceptance rates are lower
00:38:40
when gender is apparent as opposed to not easily observable,
00:38:46
meaning people don't have easily identifiable names or profile pictures.
00:38:52
for both men and women, acceptance rates are lower. very interesting.
00:38:57
you could also argue that the platform
00:39:00
itself perhaps at times creates the wrong incentives.
00:39:05
does anybody remember the "longest streak" feature on GitHub? okay. in case you
00:39:12
don't: this was a feature that was displayed on people's profile pages;
00:39:19
it showed the number of consecutive days
00:39:23
that a GitHub user had been committing
00:39:26
to some repositories hosted there. this person was very proud to have
00:39:32
committed for a whole year, every single day, without
00:39:37
any interruption. this caused a huge backlash in the community,
00:39:40
you might remember, with people complaining that this really
00:39:45
creates the wrong incentives: it motivates people to avoid taking breaks
00:39:50
and stepping back, and can be harmful to the well-being of contributors and
00:39:54
therefore harmful to open source as a whole. this was the manifesto that started the entire
00:39:59
movement against the longest streak feature, which has since been removed from GitHub
00:40:04
profile pages. so people disengage for all kinds of reasons.
00:40:10
we looked at some sociological reasons that might explain why people stick around or
00:40:17
disengage from these open source communities, and it turns out there's a really rich and interesting
00:40:23
theory, social capital theory, that people have been using in sociology for a long time.
00:40:30
it talks, roughly, about two kinds of social capital: there's a
00:40:35
bonding kind and a bridging kind. the bonding kind is about
00:40:40
close ties and community building, about feeling embedded in a particular
00:40:44
community, and this causes a higher willingness to continue to contribute.
00:40:50
the bridging kind is about getting access to new opportunities
00:40:54
and new ideas and so on, bridging across communities,
00:40:57
and this provides people new opportunities to continue their engagement.
00:41:04
so that's all fine, but the theory also says that
00:41:10
when you have very unbalanced groups, like we do, for example, in terms of
00:41:15
gender in these open source communities, when you have such high imbalance along any dimension really,
00:41:21
there's the risk of echo chambers forming and exclusion happening, where
00:41:27
the majority group ends up excluding the minority group, whatever that may be.
00:41:31
so here, because of the lower representation of women, there's a
00:41:34
higher risk for them to be excluded from these male-dominated communities;
00:41:40
that's what the theory would predict. at the
00:41:43
same time, the theory explains or predicts that if
00:41:49
minorities are instead part of teams that are diverse, not demographically, because we
00:41:56
know that's not the case, but diverse with respect to other attributes,
00:42:00
like the kinds of ideas they have or the kinds of information they have available, if
00:42:05
minorities are part of these more informationally diverse
00:42:09
groups, then that should provide them with higher
00:42:14
engagement, longer-term engagement, and this
00:42:18
should especially benefit people who are minorities, like
00:42:22
women. so we did this really interesting study; i can tell you the details offline, but we looked at a sample
00:42:30
of about sixty thousand open source contributors from different communities, and
00:42:34
we compiled the sample in such a way that it's balanced,
00:42:38
men and women half and half. we used an automated gender inference technique here,
00:42:44
based on people's names, which works reasonably well; i could tell you the details of that,
00:42:49
but again, this is why we had to make this assumption of binary gender; we
00:42:54
couldn't do anything more sophisticated. and then we ran
00:42:57
survival analysis on top of this: we tried to model
00:43:00
what factors explain people's disengagement from open source
00:43:06
as a whole. and we're seeing, across the board,
00:43:10
exactly what the theory would predict: the more social capital people build,
00:43:17
the higher their engagement. so this is again
00:43:22
the same kind of survival curve, showing the probability of
00:43:26
still being engaged in open source, in any project really, over time,
00:43:31
and here i'm showing you two curves, one for
00:43:35
people with high social capital and the second for people with low social capital,
00:43:40
and this holds for many different ways in which we measured social capital:
00:43:44
we're seeing the same effect time and again, that
00:43:47
people who build more social capital are more successful long term.
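The survival curves described here plot the share of contributors who are still active as time passes. The talk does not say which estimator or model the study used, so the following is only a reference point: a minimal Python sketch of the Kaplan-Meier estimator, a standard non-parametric way to draw such a curve. It assumes durations measured in months, with `disengaged=True` when a contributor's disengagement was observed and `False` when they were still active at the end of the observation window (censored). The function names `kaplan_meier` and `survival_at` and the example data are hypothetical, not taken from the study.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], disengaged: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate (hypothetical helper, not the study's code).

    durations:  months each contributor was observed (assumed unit)
    disengaged: True if disengagement was observed at that time,
                False if the contributor was still active when observation ended
    Returns (time, S(time)) at every time where at least one disengagement occurred.
    """
    if len(durations) != len(disengaged):
        raise ValueError("durations and disengaged must have the same length")
    events = Counter(t for t, d in zip(durations, disengaged) if d)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying active at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both disengaged and censored contributors drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the step function: last estimate at or before t (1.0 before any event)."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value


# Made-up example: eight contributors, durations in months.
example_curve = kaplan_meier(
    [3, 5, 5, 8, 12, 12, 15, 20],
    [True, True, False, True, False, True, False, False],
)
```

To compare groups, such as contributors with high versus low social capital, one would compute one curve per group and read both at the same horizon, for example `survival_at(curve, 12)` for the twelve-month mark; with the made-up data above, the estimate at month 12 is about 0.45. Censored contributors stay in the risk set until their observation ends rather than being counted as either active or disengaged.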
00:43:53
at the same time, we're seeing this really interesting interaction effect
00:43:57
between gender and these measures of informational diversity in these teams.
00:44:02
i'm showing you here curves of the difference in
00:44:07
survival probability between contributors with high
00:44:11
and low programming language diversity.
00:44:15
i'm using programming language as a proxy for the kinds of information
00:44:20
or technologies or ideas that these teams might have access to.
00:44:25
the idea is that if everybody speaks the same programming language in those groups,
00:44:31
people are less informationally diverse than if people have a lot of
00:44:35
experience with a lot of programming languages from other projects they worked on before.
00:44:39
two things to note here. first of all, as you'd expect,
00:44:46
the curves here are all positive, so the more
00:44:52
of this informational diversity there is, the more people
00:44:56
survive. this holds for both men and women.
00:45:00
but second, and interestingly, it seems women benefit more from
00:45:04
this than men do, again exactly as the theory would predict.
00:45:08
so here we're seeing that there's no difference in survival
00:45:12
probability between men and women in roughly the first year,
00:45:16
regardless of how much of this informational diversity there
00:45:21
is, but later on there's a much higher
00:45:28
increase in survival probability for women who are exposed to these informationally diverse
00:45:34
teams, compared to men. women benefit more from this informational diversity than men do.
00:45:41
so that does provide a very clear way for
00:45:48
us going forward in trying to increase diversity
00:45:52
and retention in these communities: by investing in
00:45:56
ways that could help people build more social capital and by fostering these sorts
00:46:02
of cross-technology collaborations and so on, where people get access to
00:46:08
these different ideas and these different technologies and they get a chance to
00:46:13
learn about new things and continue their engagement.
00:46:17
so just to wrap up: i argued that
00:46:24
it's hard to create sustainable open source communities; it has always been hard,
00:46:29
but perhaps it's particularly hard these days because of
00:46:33
the ways in which open source software has changed.
00:46:37
i showed you some of these changes: we talked about high transparency,
00:46:42
we talked about high levels of demand and the stress that they
00:46:46
put on maintainers, and we talked about the really low demographic diversity
00:46:52
that's present in open source communities. i gave you some examples from our research group
00:46:57
of studies that try to understand these problems better
00:47:02
and then try to propose some solutions to these problems.
00:47:07
but really, we're only beginning to scratch the
00:47:11
surface of even understanding these problems, let alone solving them,
00:47:16
and this is something that goes well beyond what any
00:47:19
single research group could do. so really, i'm here to ask
00:47:24
for your help; we really need your help to
00:47:29
understand these problems better and even begin to solve them. so
00:47:34
please, talk to me,
00:47:37
talk to my colleagues about the sustainability challenges that
00:47:42
your particular open source projects are facing; we would love to
00:47:47
think about those problems together with you and try to solve them.
00:47:52
so that's all, and thank you very much.
00:48:05
any questions? there's one over here.
00:48:11
i have a hard time seeing you because of the lights.
00:48:21
anybody else?
00:48:23
okay, so one question over here, and then i
00:48:27
think we can talk afterwards. actually, talking about
00:48:33
inclusion: i know it's a big thing,
00:48:37
and i was wondering
00:48:42
how does this exclusion work? because i'm not
00:48:45
really aware of the community excluding people, and i'm
00:48:49
not aware of me excluding minorities, so i'm
00:48:52
just curious whether you also looked into how these
00:48:56
people are being excluded. yeah, thank you, that's a good question.
00:49:00
i think it's a couple of things.
00:49:03
one is that, historically, in these platforms there has been
00:49:08
a higher emphasis put on code
00:49:11
contributions over other kinds of contributions.
00:49:14
this is the reason why i was asking the audience what other non-code
00:49:19
contributions you do, see, and value in open source.
00:49:24
so this isn't about men versus women or
00:49:27
about any particular minority versus any particular majority,
00:49:31
but it's about catering to different information processing styles and styles of contributing and so on.
00:49:37
there's a very wide range in any
00:49:40
particular demographic group of preferences for these things, but really
00:49:46
i think we should be welcoming more diverse kinds of contributions and valuing
00:49:50
them equally. that's one thing. second, there's this implicit culture:
00:49:58
we're not explicitly discriminating,
00:50:02
or hardly ever, as a community or
00:50:05
as individuals; we're hardly ever doing this explicitly, but there's this implicit culture
00:50:10
that carried over, i'm not sure exactly from where, but
00:50:14
the perception is there. our studies, other people's studies, and the big GitHub
00:50:20
open source survey with thousands of people have shown time and again that there is this perception that
00:50:27
some of these communities and some of these mailing lists are just not welcoming.
00:50:32
maybe Linus Torvalds is one example of a kind of
00:50:37
interaction style that is not welcoming, and again, not about women versus
00:50:43
men but to anyone really; there are probably only a few people who
00:50:50
like this kind of interaction and engagement and seek it out.
00:50:56
one more question, and then we'll probably change sessions.
00:51:03
you talk about sustainability, but
00:51:08
i mean, we haven't really acknowledged
00:51:12
the massive hidden subsidy for open source that comes from the fact that
00:51:18
it takes skilled people with the time and space to contribute,
00:51:23
people who don't have college fees to pay or elderly parents to look after, and
00:51:30
who don't pay rent. could you say a few words about the economic
00:51:36
basis, and how we can actually improve on this? yeah, thank
00:51:40
you so much; i think that ties perfectly to the previous question,
00:51:44
and i forgot to thank you for asking that one.
00:51:47
i think you're right; i think that's absolutely part of the problem, and
00:51:51
it's only the privileged people, if i can say
00:51:54
that, who can afford to do open source
00:51:59
right now, because most other people are just busy
00:52:03
having real jobs and whatnot, and supporting their families, and
00:52:07
putting themselves through school, so they don't have the time or the energy to do this as a hobby. it's the
00:52:12
people who already have the means to support themselves and their families
00:52:16
who can afford to put in this extra time and the hobby
00:52:19
and passion into building open source. i think you're absolutely right; i don't know how to fix it, but i think you're right.
00:52:25
okay, thank you, professor.



Conference Program

Welcome!
June 11, 2019 · 5:03 p.m.
1574 views
A Tour of Scala 3
Martin Odersky, Professor EPFL, Co-founder Lightbend
June 11, 2019 · 5:15 p.m.
8333 views
A story of unification: from Apache Spark to MLflow
Reynold Xin, Databricks
June 12, 2019 · 9:15 a.m.
1267 views
In Types We Trust
Bill Venners, Artima, Inc
June 12, 2019 · 10:15 a.m.
1569 views
Creating Native iOS and Android Apps in Scala without tears
Zahari Dichev, Bullet.io
June 12, 2019 · 10:16 a.m.
2231 views
Techniques for Teaching Scala
Noel Welsh, Inner Product and Underscore
June 12, 2019 · 10:17 a.m.
1294 views
Future-proofing Scala: the TASTY intermediate representation
Guillaume Martres, student at EPFL
June 12, 2019 · 10:18 a.m.
1156 views
Metals: rich code editing for Scala in VS Code, Vim, Emacs and beyond
Ólafur Páll Geirsson, Scala Center
June 12, 2019 · 11:15 a.m.
4695 views
Akka Streams to the Extreme
Heiko Seeberger, independent consultant
June 12, 2019 · 11:16 a.m.
1552 views
Scala First: Lessons from 3 student generations
Bjorn Regnell, Lund Univ., Sweden.
June 12, 2019 · 11:17 a.m.
577 views
Cellular Automata: How to become an artist with a few lines
Maciej Gorywoda, Wire, Berlin
June 12, 2019 · 11:18 a.m.
386 views
Why Netflix ❤'s Scala for Machine Learning
Jeremy Smith & Aish, Netflix
June 12, 2019 · 12:15 p.m.
5025 views
Massively Parallel Distributed Scala Compilation... And You!
Stu Hood, Twitter
June 12, 2019 · 12:16 p.m.
958 views
Polymorphism in Scala
Petra Bierleutgeb
June 12, 2019 · 12:17 p.m.
1113 views
sbt core concepts
Eugene Yokota, Scala Team at Lightbend
June 12, 2019 · 12:18 p.m.
1655 views
Double your performance: Scala's missing optimizing compiler
Li Haoyi, author Ammonite, Mill, FastParse, uPickle, and many more.
June 12, 2019 · 2:30 p.m.
837 views
Making Our Future Better
Viktor Klang, Lightbend
June 12, 2019 · 2:31 p.m.
1682 views
Testing in the postapocalyptic future
Daniel Westheide, INNOQ
June 12, 2019 · 2:32 p.m.
498 views
Context Buddy: the tool that knows your code better than you
Krzysztof Romanowski, sphere.it conference
June 12, 2019 · 2:33 p.m.
393 views
The Shape(less) of Type Class Derivation in Scala 3
Miles Sabin, Underscore Consulting
June 12, 2019 · 3:30 p.m.
2321 views
Refactor all the things!
Daniela Sfregola, organizer of the London Scala User Group meetup
June 12, 2019 · 3:31 p.m.
514 views
Integrating Developer Experiences - Build Server Protocol
Justin Kaeser, IntelliJ Scala
June 12, 2019 · 3:32 p.m.
551 views
Managing an Akka Cluster on Kubernetes
Markus Jura, MOIA
June 12, 2019 · 3:33 p.m.
735 views
Serverless Scala - Functions as SuperDuperMicroServices
Josh Suereth, Donna Malayeri & James Ward, Author of Scala In Depth; Google; Google
June 12, 2019 · 4:45 p.m.
936 views
How are we going to migrate to Scala 3.0, aka Dotty?
Lukas Rytz, Lightbend
June 12, 2019 · 4:46 p.m.
709 views
Concurrent programming in 2019: Akka, Monix or ZIO?
Adam Warski, co-founder of SoftwareMill
June 12, 2019 · 4:47 p.m.
1974 views
ScalaJS and Typescript: an unlikely romance
Jeremy Hughes, Lightbend
June 12, 2019 · 4:48 p.m.
1376 views
Pure Functional Database Programming, without JDBC
Rob Norris
June 12, 2019 · 5:45 p.m.
6374 views
Why you need to be reviewing open source code
Gris Cuevas Zambrano & Holden Karau, Google Cloud
June 12, 2019 · 5:46 p.m.
484 views
Develop seamless web services with Mu
Oli Makhasoeva, 47 Degrees
June 12, 2019 · 5:47 p.m.
785 views
Implementing the Scala 2.13 collections
Stefan Zeiger, Lightbend
June 12, 2019 · 5:48 p.m.
810 views
Introduction to day 2
June 13, 2019 · 9:10 a.m.
250 views
Sustaining open source digital infrastructure
Bogdan Vasilescu, Assistant Professor at Carnegie Mellon University's School of Computer Science, USA
June 13, 2019 · 9:16 a.m.
374 views
Building a Better Scala Community
Kelley Robinson, Developer Evangelist at Twilio
June 13, 2019 · 10:15 a.m.
245 views
Run Scala Faster with GraalVM on any Platform
Vojin Jovanovic, Oracle
June 13, 2019 · 10:16 a.m.
1340 views
ScalaClean - full program static analysis at scale
Rory Graves
June 13, 2019 · 10:17 a.m.
463 views
Flare & Lantern: Accelerators for Spark and Deep Learning
Tiark Rompf, Assistant Professor at Purdue University
June 13, 2019 · 10:18 a.m.
380 views
Metaprogramming in Dotty
Nicolas Stucki, Ph.D. student at LAMP
June 13, 2019 · 11:15 a.m.
1250 views
Fast, Simple Concurrency with Scala Native
Richard Whaling, data engineer based in Chicago
June 13, 2019 · 11:16 a.m.
624 views
Pick your number type with Spire
Denis Rosset, postdoctoral researcher at Perimeter Institute
June 13, 2019 · 11:17 a.m.
245 views
Scala.js and WebAssembly, a tale of the dangers of the sea
Sébastien Doeraene, Executive director of the Scala Center
June 13, 2019 · 11:18 a.m.
661 views
Performance tuning Twitter services with Graal and ML
Chris Thalinger, Twitter
June 13, 2019 · 12:15 p.m.
2003 views
Supporting the Scala Ecosystem: Stories from the Line
Justin Pihony, Lightbend
June 13, 2019 · 12:16 p.m.
163 views
Compiling to preserve our privacy
Manohar Jonnalagedda and Jakob Odersky, Inpher
June 13, 2019 · 12:17 p.m.
301 views
Building Scala with Bazel
Natan Silnitsky, wix.com
June 13, 2019 · 12:18 p.m.
565 views
Asynchronous streams in direct style with and without macros
Philipp Haller, KTH Royal Institute of Technology in Stockholm
June 13, 2019 · 3:45 p.m.
304 views
Interactive Computing with Jupyter and Almond
Sören Brunk, USU Software AG
June 13, 2019 · 3:46 p.m.
681 views
Scala best practices I wish someone'd told me about
Nicolas Rinaudo, CTO of Besedo
June 13, 2019 · 3:47 p.m.
2702 views
High performance Privacy By Design using Matryoshka & Spark
Wiem Zine El Abidine and Olivier Girardot, Scala Backend Developer at MOIA / co-founder of Lateral Thoughts
June 13, 2019 · 3:48 p.m.
753 views
Immutable Sequential Maps – Keeping order while hashed
Odd Möller
June 13, 2019 · 4:45 p.m.
276 views
All the fancy things flexible dependency management can do
Alexandre Archambault, engineer at the Scala Center
June 13, 2019 · 4:46 p.m.
389 views
ScalaWebTest - integration testing made easy
Dani Rey, Unic AG
June 13, 2019 · 4:47 p.m.
468 views
Mellite: An Integrated Development Environment for Sound
Hanns Holger Rutz, Institute of Electronic Music and Acoustics (IEM), Graz
June 13, 2019 · 4:48 p.m.
213 views
Closing panel
Panel
June 13, 2019 · 5:54 p.m.
400 views
