Note: this content has been automatically generated.
Thank you very much for the introduction. I will present a little bit about the VISCERAL project and Evaluation-as-a-Service. That is more of a concept that came out of the project, some reflections on the work that we have done. And as we work in medical imaging, I always have plenty of nice pictures to show.
When I started preparing for this, with the topic being reproducibility, I was looking around at what actually exists on reproducibility, and I think it is in line with the keynote this morning from Pierre about open science, et cetera. There are actually quite a few initiatives. First, there is the Center for Open Science in the US, a non-profit organisation that tries to really look at open, transparent, reproducible science, because everybody should have an interest in this. And there are many other initiatives. There is a group at Stanford University around John Ioannidis, who has written a couple of articles very critical about the reproducibility of medical research, one of them called "Why Most Published Research Findings Are False". He also has a meta-research centre, the Meta-Research Innovation Center, and I saw that they now even have a Stanford Center for Reproducible Neuroscience. So I think many universities around the world really invest in this domain, and it shows that in the long run it is something that will make research better and stronger.
But with respect to what we saw this morning, there are also critical voices. There was an article in Nature, "The risks of the replication drive", and I think something we need to take into account is the experimental sciences in particular. In the computational sciences we can very easily reproduce the results of experiments, but if we are running medication tests, these kinds of things, small changes in the experimental conditions can lead to incorrect results. So if we try to reproduce medical results and we actually want as an outcome that they are not reproducible, that is very easy to reach: just make small changes, modify things a little bit. So there is also a risk in that. And sometimes it is just impossible to reproduce exactly the same conditions, and I think we need to take that into account as well when we talk about reproducibility.
This is something that I looked into because the scientific environment is very competitive. As we see here, everybody wants to publish, we want to get funding, and quite often we are sort of fighting against each other. Sometimes it is good if we just sit back, look at each other and ask how we can do that better, and how we can use this to advance science. I think this is something that I would like to see come out of this meeting as well: with two institutions very close to each other, I would like to see more of these things where we are actually working together to get results. In terms of competitions, we often talk about scientific competition; I would rather call it coopetition, a mix between everybody wanting to get good results and also looking into cooperation, really working together to get results. This is a citation from 1968: that was Richard Hamming when he received the Turing Award. He said: Newton said, "I stood on the shoulders of giants"; today we stand on each other's feet. Perhaps the central problem we face in all of computer science is how we are to get to the situation where we build on top of the work of others rather than redoing so much of it in a trivially different way. That was in 1968, fifty years ago, and very often I feel a little bit like that: we are making things slightly differently, but we all have pressure, we need to publish, so we are modifying things a little bit, and much of it is trivially different.
When I started my PhD in image retrieval, content-based image retrieval, I started reading papers and realised I actually had no idea whether these algorithms were good or not. I reprogrammed several algorithms from other people after reading their papers; they did not work on my data, and it was impossible to compare them to other programs. So I started working on evaluation, making data sets available, pushing people to compare. Everybody said, "It's great, that's what we need to do", but almost no one in the end came and compared results, so I was a little bit frustrated. That is when we created, I have a couple of names here, I think about fifty scientific challenges over the years. It is quite interesting to see, because then everybody has exactly the same environment, the same data, the same experimental setup to compare results. Over the years we have had several results, and as we have also seen this morning, people pushed for making code available and making data available, which is supported by bibliometric analysis: if the code is available, papers tend to get more citations, and in particular data papers, papers making data available, usually get many more citations than papers that don't. It does not happen within a year or two, but rather over a longer period of time, say five to ten years.
There are also commercial platforms like Kaggle that have made available data sets or scientific environments, many of them with prize money, so people can participate in this. In our scientific challenges we do not have prize money for people; it is more of a scientific goal. One of the things that struck me at some point is the statement that "the plumbing needed to move data is an unavoidable part of practicing data science", so moving data was seen as something important. But one of the things that we realised over the years is: yes, we want to share data, we want to have large data sets, we want to have them well annotated. And here, this is the paper of the group that I mentioned in the beginning, and they said that very often results are on very small data sets, and the more competitive the domain, the more wrong are the findings, because people do not take the time to analyse properly but get results out very quickly because there is pressure to publish.
um but um there's also n. i. h. that really pushes people
to make data sets available out of projects every n. i.
h. funded presentation institutes of all in h. funded project has to make the data available the end of the project
the problem is you can make it available in in many ways that make it unusable so that's what quite often have often happens
so it's not well annotated sometimes a important parts and missing and how the data were created so that there are many problems
So over the years we realised, with the challenges we have organised, that there are many, many challenges with organising challenges. One is extremely large data sets. As Kaggle said, the plumbing, shipping the data around, is unavoidable. I have just downloaded five terabytes from the National Library of Medicine with the PubMed data set. It took about three weeks for five million zip files, and then another two weeks to unzip them, so it is not necessarily very convenient to do that. And there is always the aspect of where you are: we have pretty fast lines in Switzerland, but if you happen to be at, I don't know, an Algerian university and you want to do that, it will take you two years to download the data, so it is just not possible. I have also had hard disks: there is the National Lung Screening Trial from the US, that is another three terabytes, I think. So I sent a hard disk there, they send you the hard disk back, and the hard disk was broken when it arrived. We managed to reconstruct most of it, but there are problems with that.
Another problem: I work on medical data, and medical data is confidential in general, so it is hard to distribute. Small data sets can often be anonymised; we can check them manually, in particular text documents and images. But even there, there is always a small risk that something was not seen. With prostheses, we realised they have a unique identification number, so if there is a prosthesis somewhere in a radiographic image you can actually read the number and you can re-identify the person. So there are always small risks that we might not see, that we only realise afterwards, and if you have a small problem and you multiply it by a billion, you actually have a pretty big problem. So we need to look at how we can deal with it.
The same holds in many other domains. Enterprise search: can you make available the email archives of companies? The investigative domain: can police officers make available data on terrorists, on screening and things like that? GPS data: telephone companies also have a lot of GPS data from people, and you can actually see where the people live, so you cannot really distribute the data to researchers, even though they would really like to use a platform like Kaggle or anything similar to actually run things on.
Another problem is quickly changing data. In some domains we have new data arriving constantly, and we would like to evaluate the algorithms always on the latest data set. If we create a test data set, we generate it, we make it available on the platform, people work on it, we get results and evaluate them: that is nine months to a year. In some domains that is quite a lot.
And then, that is what I mentioned with Kaggle: as soon as you have prize money, cheating is actually a problem. Can people just annotate the test data and then train with that to get better results? In a purely scientific setting we never worried about it in most of the domains, in most of the campaigns we had, because if somebody publishes a paper and it is not reproducible, at some point people might realise it because they cannot reproduce exactly the results. But there is now a competition on lung cancer with, I think, a million dollars of prize money, so I think there are incentives in this case to get the optimal results, to get the best possible results.
Another problem is that groups with a lot of computing power have advantages. You can run more complex models, you can run more training, so potentially you can get better results. Can we actually normalise that, can we make things really comparable?
So for us the question was how to get rid of that, and we were looking into cloud computing. This whole project started, I think, in 2012, and at that time the cloud came up. So we said: okay, we want to make a competition where the participants actually get a virtual machine. They can work on a small data set, so they can adapt their own algorithms, their training systems, and these are segmentation systems or retrieval systems. And then for testing we take over the virtual machines, we cut the access, and we run them on a large data set.
So I think, from what you have seen this morning with the BEAT platform, many of the objectives are actually quite similar to what is in BEAT. Ours is not as integrated, but I think that is also why I asked the question about which domain it is for. In medical imaging people use totally different tools, and that is very hard to integrate. With a virtual machine they can use their favourite environment: they can use MATLAB if they want to, they can use Windows or Linux machines and then install what they need. So this was a little bit the idea, and this is what we had in the project proposal, I think in 2010 or 2011, when we wrote the proposal.
And then we started annotating data. The first competition we ran was in medical image segmentation; that was the cover picture that you have seen. We had twenty organs that were annotated in eighty images for testing, and then we had another forty for training and another forty images that we then used for testing. Some of the organs are not visible in all modalities, so we can only annotate what is actually visible in the data.
One thing, just as an example: these are people with these organs annotated. You might think that organ segmentation is a pretty simple problem, but the basic thing is that humans are pretty different when we look at that. Some have big livers, some small livers, small lungs, big lungs, so there is a lot of variability. And when we want to treat medical data, when we actually want to make sure that we are comparing the same things, we need to look into these kinds of things and do it automatically.
These are some of the results. We have marked what the radiologists annotated, and then we had the different participant algorithms, so this is what we can compare, in three dimensions. You can also see that sometimes the automatic algorithms totally fail, and sometimes the organs are really difficult to annotate in the data. Another thing we realised is that even humans have a lot of variation, so everything was annotated by several humans. We then ran comparisons, so we also have a baseline of what we consider a maximum performance. Some of the organs are actually at human performance: if the agreement was the same as that between humans, that was considered for us as being an optimal segmentation algorithm.
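A minimal sketch of this comparison, assuming overlap is measured with the Dice coefficient (the talk does not name the metric) and using made-up toy masks. The helper names `dice` and `reaches_human_level` are hypothetical and only serve the illustration.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat lists of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size == 0 else 2.0 * intersection / size


def reaches_human_level(algorithm, annotator_1, annotator_2):
    """True if the algorithm agrees with annotator 1 at least as well
    as the two human annotators agree with each other."""
    human_agreement = dice(annotator_1, annotator_2)
    algorithm_agreement = dice(algorithm, annotator_1)
    return algorithm_agreement >= human_agreement


# Toy 1-D "volumes": two radiologists and one algorithm (illustrative data only).
annotator_1 = [0, 1, 1, 1, 1, 0, 0, 0]
annotator_2 = [0, 0, 1, 1, 1, 1, 0, 0]
algorithm = [0, 1, 1, 1, 1, 1, 0, 0]

human = dice(annotator_1, annotator_2)
algo = dice(algorithm, annotator_1)
print(human, algo, reaches_human_level(algorithm, annotator_1, annotator_2))
```

Treating inter-annotator agreement as the ceiling means an algorithm is not penalised for differences that the human annotators themselves do not agree on.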
Another task we ran within the VISCERAL project was lesion detection, so detecting small lesions in the bones, the liver, the brain, the lung and in lymph nodes. We made data sets available that people could then run on. Unfortunately this is a very hard task, and for this task we had basically nobody who participated in the competition.
Another part was similar case retrieval. The idea was that we give a case with a region of interest marked, so we mark which organ is actually interesting, and we do an automatic segmentation. We have the radiology reports, but text documents are hard to actually distribute, because anonymisation is not as easy as with images or structured data. So we extracted semantic terms from these documents, and then the participants had to find similar cases in a database of, I think, three thousand or four thousand volumes. That was the third task that we ran within the project.
Another advantage of having the code of people is that we can create something that we call the silver corpus. This is an example: we took the algorithms and ran them on new data, because we have a hundred and twenty volumes that are manually annotated and a few thousand volumes where we have no annotations. So by using several algorithms and simple label fusion we can actually create something that is fairly similar to what a human might annotate. Like that we can create additional training data that can then be released, and that we can then use for the algorithms to train again and to run on the test data. So hopefully, with an iterative but automatic approach, we can improve the algorithms, and it can also serve to really get annotations, even if they are not perfect, for a very large number of cases.
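The label fusion idea can be sketched as a per-voxel majority vote. This is only one simple fusion rule, chosen here as an assumption; the talk says "simple label fusion" without specifying the method. The function name `fuse_labels`, the organ ids and the toy outputs are hypothetical.

```python
from collections import Counter


def fuse_labels(segmentations):
    """Per-voxel majority vote over several label maps (flat lists of organ ids)."""
    if not segmentations:
        raise ValueError("need at least one segmentation")
    length = len(segmentations[0])
    if any(len(seg) != length for seg in segmentations):
        raise ValueError("all segmentations must cover the same voxels")
    fused = []
    for votes in zip(*segmentations):
        counts = Counter(votes)
        best = max(counts.values())
        # Break ties deterministically by taking the smallest label id.
        fused.append(min(label for label, n in counts.items() if n == best))
    return fused


# Outputs of three hypothetical participant algorithms on one unannotated volume
# (0 = background, 1 = liver, 2 = kidney; values are made up).
algo_a = [0, 1, 1, 2, 2, 0]
algo_b = [0, 1, 1, 1, 2, 0]
algo_c = [0, 0, 1, 2, 2, 2]

silver = fuse_labels([algo_a, algo_b, algo_c])
print(silver)
```

The fused map can then be stored as a "silver" annotation for the volume, with the caveat from the talk that such annotations are not perfect.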
This also allows us, for example, to choose for manual annotation those cases where the participant algorithms have different results. Even if all of the participant algorithms are wrong on a case, we only get more information in terms of ranking the algorithms from the cases where they actually differ. So maybe the ground truth is not the most important part, but really finding the cases that allow us to separate the algorithms.
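One way to make this selection concrete, as a sketch under assumptions: score each unannotated case by the mean pairwise fraction of voxels on which the algorithms differ, then send the highest-scoring cases to the annotators. The disagreement measure, the function names and the toy data are all illustrative choices, not the procedure used in the project.

```python
from itertools import combinations


def disagreement(outputs):
    """Mean fraction of voxels on which pairs of algorithms differ for one case."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for seg_a, seg_b in pairs:
        differing = sum(1 for a, b in zip(seg_a, seg_b) if a != b)
        total += differing / len(seg_a)
    return total / len(pairs)


def cases_to_annotate(outputs_per_case, budget):
    """Return the `budget` case ids with the highest disagreement."""
    ranked = sorted(
        outputs_per_case,
        key=lambda case: disagreement(outputs_per_case[case]),
        reverse=True,
    )
    return ranked[:budget]


# Three hypothetical algorithms on three unannotated cases (made-up label maps).
outputs_per_case = {
    "case_01": [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]],  # full agreement
    "case_02": [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]],  # strong disagreement
    "case_03": [[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 0]],  # mild disagreement
}
selected = cases_to_annotate(outputs_per_case, budget=2)
print(selected)
```

Cases on which all algorithms produce the same output, such as `case_01` here, cannot change the ranking, so annotating them first would spend manual effort without separating the systems.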
This is pretty much what we had at the end of the project, so this got a lot more complex. We still have virtual machines for the participants, with a registration system, so people can register automatically; when they sign the licence agreement, they get a virtual machine assigned. We have an analysis system: when they have trained their system and they say, "Okay, now I submit my system", the link is cut, the analysis system runs on the test data, and then the analysis system submits the results to us organisers and to the participants.
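The workflow just described can be modelled in a few lines. This is a toy sketch, not the VISCERAL infrastructure: the class names, the `accuracy` metric, the thresholding "system" and the test data are all hypothetical, and a real deployment would manage actual virtual machines rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Participant:
    name: str
    vm_id: Optional[str] = None
    has_vm_access: bool = False
    scores: Dict[str, float] = field(default_factory=dict)


class EvaluationService:
    """Toy organiser side; the hidden test set never leaves this object."""

    def __init__(self, test_cases, ground_truth, metric):
        self._test_cases = test_cases
        self._ground_truth = ground_truth
        self._metric = metric
        self._next_vm = 1

    def register(self, name, licence_signed):
        if not licence_signed:
            raise PermissionError("licence agreement must be signed first")
        participant = Participant(name, vm_id=f"vm-{self._next_vm}", has_vm_access=True)
        self._next_vm += 1
        return participant

    def submit(self, participant, system):
        # Cut the participant's access before the system touches the test data.
        participant.has_vm_access = False
        for case_id, image in self._test_cases.items():
            prediction = system(image)
            truth = self._ground_truth[case_id]
            participant.scores[case_id] = self._metric(prediction, truth)
        # Only scores are returned, never images or annotations.
        return dict(participant.scores)


def accuracy(prediction, truth):
    return sum(p == t for p, t in zip(prediction, truth)) / len(truth)


service = EvaluationService(
    test_cases={"t1": [3, 9, 1, 7], "t2": [8, 2, 6, 4]},
    ground_truth={"t1": [0, 1, 0, 1], "t2": [1, 0, 1, 0]},
    metric=accuracy,
)
team = service.register("team_a", licence_signed=True)
scores = service.submit(team, lambda image: [1 if v > 5 else 0 for v in image])
print(team.vm_id, team.has_vm_access, scores)
```

The point of the design is the direction of movement: the submitted system is brought to the test data, and the participant only ever receives evaluated scores.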
Also, all of the annotation actually happens in the system, so we do not need to move data any more: all of the data remains in the same place, and all the annotations happen there. With time we also realised that virtual machines are actually not as portable as we thought; moving them from the Azure cloud to the Amazon cloud is non-trivial. That is why we actually moved towards Docker, and I am really happy that BEAT is going towards Docker too, because Docker is really a lightweight approach to move code around, and code actually gets increasingly mobile. In this respect we had several projects in the US, and we are also discussing this in Switzerland. Like this we can not only work on data that is in the cloud, which is most often anonymised data; we can move the code further on, we can move it to hospitals, so we can work directly on live data. We do not have the problem of having to anonymise the data or having to move the data out, because only the code sees the data. No researcher actually sees the data; the researchers only get evaluated results back.
With the National Cancer Institute we have also worked to look into how we can generalise this, so that they can run their data challenges in a distributed way as well, potentially sending the code to several institutions, where the results are evaluated and then aggregated for further analysis. This is just an overview: Docker is a lot lighter than virtual machines, it is much more mobile, it is easy to move around, and it also avoids overhead for groups that would otherwise need to reinstall the whole software stack, because they can just use a local Docker container and move that over.
There are a couple of other lessons we learned from this project. Cloud space is actually pretty expensive. Initially, fortunately, thank you Microsoft, they gave us, I think, a hundred and fifty thousand dollars in Azure computing resources over the years. So we had the expense of running things and evaluating them, but if you keep virtual machines open without doing anything, it is quite expensive, so we developed methods to actually shut them down, to stop them, because that costs. In terms of sustainability, also with the BEAT platform: if you have people running things on the platform, you are responsible for making it sustainable, so we are looking into what the possibilities are to control the cost and maybe also to limit the amount of computation that somebody can take. Installation is additional work. We also had requests at the time for GPUs: in 2012 the Azure cloud did not have GPUs. Now they have instances with GPUs, so we could do that more easily, but at the time we couldn't, so some groups that wanted to use deep learning at the time were pretty much excluded from this.
Also, when we run competitions, or coopetitions, it actually takes quite a while. When we ran the project for the first time there were five participants, then twelve and seventeen. Now, three years after the project finished, I think we have over two hundred groups that are registered on the registration system and have run their code on the data. That also shows that we really need to think long term if we want to have an impact, and I think that is really important when we think about open science: it is not immediate feedback. You will not get gratification for your data or your code within a year; it is really within a span of about five years that you see whether people pick it up, whether people use it.
We also realised that troubleshooting the participants' submissions is actually a problem, because people cannot see what happens on the test data. Sometimes the systems failed, with out-of-memory errors for example, and with a small number of participants that is fine, because we can go there, check it out and manually supply feedback. But for a larger number you need something more automatic, just like the BEAT platform, for example.
But I think we have already solved quite a few of the problems. We can make large data sets available, so we do not have the problem of shipping: working with tens of terabytes is not a problem, because we bring the algorithms towards the data, not the other way around. We can run on confidential data. We can always use the latest data set. In our data set the ground truth sometimes changed, because somebody said, "Oh, I saw that there is an incorrect segmentation", so we modified the data and ran everything again. If we have the code, it is trivial to do. If we have more cases, we can rerun it again, and like that we can also see how stable the ranking is, because there is variability: it depends on the topic, it depends on the exact images. Very often the differences between the best systems are actually within the error, so quite often there is no statistically significant difference.
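Because the organisers hold the code and per-case scores, the stability of a ranking can be checked by resampling the test cases. The sketch below uses a paired bootstrap as an illustrative choice; the talk does not say which statistical procedure was used. The function name and all scores are invented.

```python
import random


def bootstrap_win_rate(scores_a, scores_b, n_resamples=2000, seed=0):
    """Fraction of resampled test sets on which system A has the higher mean score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per test case")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample test cases with replacement, keeping the pairing per case.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Hypothetical per-case scores of three systems on the same eight test cases.
system_a = [0.90, 0.85, 0.92, 0.70, 0.88, 0.95, 0.60, 0.91]
system_b = [0.89, 0.87, 0.90, 0.74, 0.86, 0.93, 0.66, 0.90]
system_c = [0.60, 0.55, 0.62, 0.40, 0.58, 0.65, 0.30, 0.61]

a_vs_b = bootstrap_win_rate(system_a, system_b)
a_vs_c = bootstrap_win_rate(system_a, system_c)
print(a_vs_b, a_vs_c)
```

A win rate near 0 or 1 suggests the order of two systems is stable across test sets; a value in between, as for the two close systems here, suggests their difference lies within the variability of the cases.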
We can reproduce things because these are virtual machines: even if we are working on different hardware, everything is virtual, so it is relatively easy to reproduce. And we can also avoid cheating, because nobody sees the test data; nobody can annotate it, nobody can optimise manually on the test data by looking at it. We also removed bias. We have the computation time, so we know how long it took, because everybody got the same virtual machine. We can see that some of the algorithms ran for two minutes and others ran for twenty-four hours for exactly the same thing, so there are huge differences, in the order of a factor of a hundred, in that respect. But still, reuse of components is not trivial. Making people collaborate more on components was something that we found quite hard, and really pushing people to work together. I think those were the two parts that were difficult.
Then, based on the outcomes of this, we were thinking: if we push this further, what are the problems with making this more general? We organised one workshop in Sierre in March 2015 and then another one in Boston in November 2015, with very different stakeholders: from industry, with people from Intel and from Microsoft, from funding organisations, from academia, user groups, et cetera, to look at what the existing approaches are and how we can deal with it. There is a white paper that we put on arXiv, so it is openly accessible. It is not peer reviewed, but people can have a look at that and at the different ways of doing this, using APIs or using executable code.
Looking at the different stakeholders, we can see that, for example, the National Cancer Institute pushed towards the organisation of a challenge, the Digital Mammography DREAM Challenge. It is organised by Sage Bionetworks, which is an American company in Seattle, and they run scientific challenges. They picked up exactly this model: they use Docker containers, people have a limited amount of computation available, and it is run on IBM infrastructures and Amazon infrastructures. They have a million-dollar prize, so if you are interested in that, it is quite interesting. And they have what is called a community phase: initially everybody submits results separately, but in the end they really want people to collaborate, to work together to get the best results. So the best possibility, very likely, to win one of the prizes is to have good results early but then team up with others, which I quite liked in that respect.
There are business models. There is a company called Zebra Medical Vision: they also make data available, with Docker containers, and the idea is that if algorithms work well, they can commercialise the results and then share the benefits. I think it is an interesting model, and it shows that there is also the possibility of a business model. Microsoft Azure says that they want to make a billion available in resources, so every two months, I think, they have a deadline for project submissions if you need computing power; it is relatively easy to get. And there is a lot of discussion in the literature on the various stakeholders, like politics. In the US it has now changed a little bit, but Biden definitely pushed quite strongly towards making things available, and with the Cancer Moonshot the request was to make data available, to share data. Those were the things that researchers were pushed towards.
Something from Intel: they have what they call the Collaborative Cancer Cloud that they put up, and the model is really that hospitals will have computing infrastructures in the future, so we can actually move the code there and work directly on the data, and the data does not need to leave the institution. There is a risk, particularly in the US, where many data sets can be bought, sometimes also data that are not protected, so you can actually match different data sources, from voter records to different registries, to really identify patients. By keeping the data in the institutions you remove these kinds of problems.
Institutional support: we had the National Cancer Institute participating, and they are really pushing for running challenges and making things available. This is one of the projects where I am involved, the Quantitative Imaging Network of the National Cancer Institute, where they have a platform where every evaluation of data in this context is to be run by Docker containers. The containers are kept available and it is run in the cloud, to really make sure that everything is absolutely reproducible, from data creation on. They are working on image data creation and feature extraction, because even standard features, like features based on co-occurrence matrices for texture extraction, had extremely different results. They had seven hospitals participating in one of those tests, and they realised that even very basic features, depending on the installation that people were using and the way they were normalising, did not correlate at all.
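To illustrate how a preprocessing choice alone can change a "standard" texture feature, the sketch below computes the contrast of a grey-level co-occurrence matrix for the same toy image with two different grey-level binnings. The binning difference is an assumed example of a site-specific setting, not a finding reported in the talk; the function names and the image are made up.

```python
def quantise(image, levels, max_value=255):
    """Map grey values in [0, max_value] to integer bins 0..levels-1."""
    return [
        [min(value * levels // (max_value + 1), levels - 1) for value in row]
        for row in image
    ]


def glcm_contrast(image, levels):
    """Contrast of the horizontal (offset 0,1) grey-level co-occurrence matrix."""
    binned = quantise(image, levels)
    counts = {}
    total = 0
    for row in binned:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] = counts.get((left, right), 0) + 1
            total += 1
    # Contrast = sum over pairs of p(i, j) * (i - j)^2.
    return sum(n / total * (i - j) ** 2 for (i, j), n in counts.items())


# Same toy image, two hypothetical site configurations that differ only in binning.
image = [
    [10, 40, 200, 220],
    [15, 60, 180, 250],
    [12, 90, 160, 240],
    [20, 30, 210, 230],
]
site_a = glcm_contrast(image, levels=8)
site_b = glcm_contrast(image, levels=64)
print(site_a, site_b)
```

The two values differ by well over an order of magnitude although the input image is identical, which is why fixing the whole pipeline in a container, including such parameters, matters for comparing quantitative features across sites.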
I think it is really important to look at what we start with, because even the image creation was very different: even the same machine installed in different hospitals did not deliver the same image. They were physically shipping a phantom, of which they took images. So if the images are not the same with the same parameters, we need to look at that, because if we extract quantitative features, everything is based on what is done before. In terms of reproducibility, that is actually really important.
As I mentioned, this is the political part of my talk. When we look at Evaluation-as-a-Service as a concept, we really need to look at the stakeholders. We have funding institutions, like the Swiss National Science Foundation, we have companies who have their part in it, we have scientists, and everybody has different constraints. We have people with a problem, clinicians in our case, who have a problem that they would like to solve and who would like to use data science tools to help them. How can we put these different interests together? Because I think everybody would have an interest in a system that would be more efficient and more effective in the end. There is more on this in the white paper.
Okay, five minutes. For me, one of the messages from today is really that many things are centred around data, and it was mentioned this morning that data sets are citable. Nature now has a journal called Scientific Data, Nature Scientific Data, which only publishes data sets. It is an open access journal; if you have a nice data set, submit it there and you will expose it to the community. They have a very good review process and they really give very good feedback on things. I think it is important to make data available in meaningful ways. We have one of our data sets published there, and they were actually downloading the data, they were reading our instructions, and they even corrected the instructions on how to use the data, so I was quite impressed. They really take it seriously that the data are made available in a meaningful way, so people can actually rely on it and reuse it. I think we also need to look into how we can share infrastructures. One of the difficult parts is really how to make it sustainable: how can we make a platform like BEAT sustainable, independent from project funding? If you depend on project funding, it is not easy to have something that is actually independent from that.
Our code will definitely become more portable; Docker really helps with that. I think we also need to look into public-private partnerships: how can we get infrastructure providers in here? Now, with the EPFL and ETH data science centre, maybe there will be a scientific cloud for Switzerland that would be easy to use; that is something to look into. But if we really want long-term reproducibility and sharing of tools, we need to look at how we can create an infrastructure that is flexible enough, or maybe different infrastructures for different domains, because it would be much more efficient to do that. It just does not make sense that so many people develop in different directions.
I think data science really requires an infrastructure. Everybody is working in one way or another on artificial intelligence, so everything revolves around data. In the medical field I think we need to work on routine data. Currently, whenever we produce data for a clinical trial, we have perfect image quality and everything is very controlled. Routine data is not like that: you do not want to have the best image, you want to have the lowest radiation dose for your patients, so you are trying to limit things to a low dose and the quality is not the same. So if we want it to be usable in a real setting, we need to look at how we can do that. We need to look at ways to actually limit the amount of manual annotation: we can use active learning, we can use weakly annotated data to learn, and we might get much, much larger data sets from hospitals that can actually allow us to do that. And then we need to share infrastructures, I think. In most domains we want to build decision support; for me that is in the medical field, but in many other fields it really can be similar too. There is a lot more available on the different web pages, and don't hesitate to contact me if you have any questions.


Conference program

Sébastien Marcel, Senior Researcher, IDIAP, Director of the Swiss Center for Biometrics Research and Testing
24 March 2017 · 9:17 a.m.
Keynote - Reproducibility and Open Science @ EPFL
Pierre Vandergheynst, EPFL VP for Education
24 March 2017 · 9:20 a.m.
Q&A: Keynote - Reproducibility and Open Science @ EPFL
Pierre Vandergheynst, EPFL VP for Education
24 March 2017 · 9:54 a.m.
VISCERAL and Evaluation-as-a-Service
Henning Müller, prof. HES-SO Valais-Wallis (unité e-health)
24 March 2017 · 11:35 a.m.
Q&A - VISCERAL and Evaluation-as-a-Service
Henning Müller, prof. HES-SO Valais-Wallis (unité e-health)
24 March 2017 · 12:07 p.m.
