Note: this content has been automatically generated.
00:00:00
Thank you very much for the kind introduction. I will present a little bit about the VISCERAL project and about evaluation as a service; it is more of a concept that came out of the project, some of the reflections based on the work that we have done. And since we work in medical imaging, I have plenty of nice pictures to show.
00:00:21
When I started preparing for this, with the topic being reproducibility, I looked around at what actually exists on reproducibility, and I think it is in line with the remarks this morning about open science: there are actually quite a few initiatives. There is the Center for Open Science in the US, a non-profit organisation that really looks at open, transparent, reproducible science, because everybody should have an interest in this, and there are many, many other initiatives. There is a group at Stanford University around John Ioannidis that has written a couple of articles very critical of the reproducibility of medical research, one of them called "Why Most Published Research Findings Are False". He also has a meta-research centre, the Meta-Research Innovation Center, and I saw that Stanford even has a Center for Reproducible Neuroscience now. So many universities around the world really invest in this domain,
00:01:21
and it shows that, in the long run, this is something that will make research better and stronger. But, with respect to what we saw this morning, there are also critical voices: there was an article in Nature called "The risks of the replication drive". Something we need to take into account, particularly in the experimental sciences: in the computational sciences we can very easily reproduce the results of experiments, but if we are running, say, medication tests, small changes in the experimental conditions can lead to different results. So if we want to reproduce medical results, it is actually very easy to reach an outcome where they do not reproduce: just make small changes, modify things a little bit. There is a risk in that, but sometimes it is simply impossible to reproduce exactly the same conditions, and I think we need to take that into account when we talk about reproducibility.
00:02:18
This is something I looked into because the scientific environment is very competitive. Everybody wants to publish, we want to get funding, and quite often we end up fighting against each other. Sometimes it is good if we just sit back, look at each other, and ask how we can do better, and how we can use this to advance science. That is something I would like to see come out of this meeting, with two institutions very close to each other: more of these things where we actually work together to get results. In terms of competitions, what is often called scientific competition I would rather call "coopetition": a mix where everybody wants to get good results, but we also look into actual cooperation, really working together. There is a quote from 1968, from Richard Hamming when he received the Turing Award. He said: Newton said, "I stood on the shoulders of giants"; today we stand on each other's feet. Perhaps the central problem we face in all of computer science is how we are to get to the situation where we build on top of the work of others rather than redoing so much of it in a trivially different way. That was 1968, fifty years ago, and very often I feel a little bit like that: we all make things slightly differently, we all have the pressure to publish, so we modify things a little bit, but much of it is only trivially different.
00:04:01
When I started my PhD in content-based image retrieval, I started reading papers and realised I actually had no idea whether these algorithms were good or not. I re-implemented several algorithms from other people's papers; they did not work on my data, and it was impossible to compare to other approaches. So I started working on evaluation, making data sets available and pushing people to compare. Everybody said it was great, that this is what we need to do, but in the end almost no one came and compared results, which was a little bit frustrating. That is when we created what are by now, I think, about fifty scientific challenges over the years. It is quite interesting, because then everybody has exactly the same environment, the same data, the same experimental setup to compare results. Over the years we have seen, as this morning, that people pushed towards making code and data available, so we did a bibliometric analysis: papers whose code is available tend to get more citations, and in particular papers that make data available usually get many more citations than papers that do not. It does not happen within a year or two, but rather over a longer period of time, say five to ten years.
00:05:24
There are also commercial platforms like Kaggle that have made data sets and scientific environments available, often with prize money, so plenty of people participate. In our scientific challenges we have no prize money; it is more of a scientific goal. One of the things that struck me at some point is the quote that "the plumbing needed to move data is an unavoidable part of practicing data science". Moving data was seen as something important, but one of the things we realised over the years is: yes, we want to share data, we want the power of large data sets, and we want them well annotated. Here is the paper of the group I mentioned at the beginning: very often results are based on very small data sets, and the more competitive the domain, the more wrong the findings, because under pressure people do not take the time to analyse properly but want results very quickly. There is also the NIH, which really pushes people to make data sets available out of projects: every NIH-funded project has to make its data available at the end of the project. The problem is that you can make data available in many ways that make it unusable, and that is quite often what happens: it is not well annotated, and sometimes important parts are missing, such as how the data were created. So there are many problems.
00:06:53
Over the years we realised that with the challenges we organised there are many, many challenges in organising a challenge. One is extremely large data sets. As Kyle said, the plumbing around shipping the data is unavoidable. For many data sets it is painful: I just downloaded five terabytes from the National Library of Medicine, a data set of five million zip files; it took three weeks to download and then another two weeks to unzip, so it is messy and not very convenient. And there is always a difference depending on where you are: we have pretty fast lines in Switzerland, but if you happen to be at, say, a university in Algeria and want to do that, it will take you two years to download the data; it is just not possible. I have also received hard disks: the National Lung Screening Trial data from the US is about three terabytes, I think, so they sent the hard disks, and when they arrived the disks were broken. We managed to reconstruct most of it, but there are problems with that too.
00:07:54
Another problem: I work on medical data, and medical data is in general confidential, so it is hard to distribute. Small data sets can often be anonymised; we can check them manually, particularly text documents, and images can be checked, but there is always a small risk that something was not seen. Prostheses, for example, have unique identification numbers; if there is a prosthesis somewhere in a radiographic image, you can actually read the number and re-identify the person. There are always small risks that we might not see and only realise afterwards, and if you have a small problem and multiply it by a large number of cases, you actually have a pretty big problem. So we need to look at how we can deal with that. There are other domains too: can enterprise search make the email archives of companies available? Can the police make data on terrorists available for the investigative domain, for screening and things like that? Or GPS data: telephone companies have a lot of GPS data from people, and you can see where people live, so you cannot really distribute the data to researchers, even though they would really like to put it on Kaggle or anything similar to run things on.
00:09:05
Another problem is quickly changing data. In some domains new data arrive constantly and we would like to evaluate the algorithms on the latest data sets. If we create a test data set, generate the ground truth, make it available on a platform, let people work on it and get the results evaluated, that takes nine months to a year, and in some domains that is quite a lot. And then, as I mentioned with Kaggle, as soon as there is prize money, cheating actually becomes a problem: can people just annotate the test data and then train with it to get better results? In a purely scientific setting we never worried about this in most of the campaigns we ran, because if a result published in a paper is not reproducible, at some point people might realise it when they cannot reproduce the results. But if there is a competition like the one on lung cancer with a million dollars of prize money, I think there are incentives in that case to get the best possible results by any means. Another problem is that groups with a lot of computing power have advantages: you can run more complex models and more training, so potentially you get better results. Can we actually normalise that? Can we make things really comparable?
00:10:23
This was one idea to get rid of these problems, so we looked into cloud computing. When this whole project started, I think in 2012, cloud computing was just coming up, so we said: OK, we want to run a competition where the participants actually get a virtual machine. They can work on a small data set, develop their algorithms and train their systems, segmentation systems or retrieval systems, and then for testing we take over the virtual machines, cut off the access, and run them on a large data set. Some of what you have seen this morning with the BEAT platform has objectives that are actually quite similar to what we had in mind; it is not as integrated, and that is why I asked the question about which domain it is for. Medical imaging people use totally different tools, and with a virtual machine they can use their favourite environment: they can use MATLAB if they want, they can use Windows or Linux machines, and install whatever they need. So this was the idea; this is what we had in the project proposal, I think in 2010 or 2011, when we wrote the proposal.
00:11:37
Then we started annotating data. The first competition we ran was on medical image segmentation; that was the cover picture you have seen. We had twenty organs annotated in eighty images: forty for training and another forty images that were then used for testing. Some of the organs are not visible in all modalities, so we could only annotate what is actually visible in the data. Just as an example, these are people with these organs annotated. You might think that organ segmentation is a pretty simple problem, but humans are actually quite different when you look at it: some have big livers, some small livers, small lungs, big lungs, so there is a lot of variability. When we want to treat medical data, and we actually want to make sure that we are comparing the same things, we need to look into these kinds of things automatically.
00:12:35
These are some of the results: in grey we have marked what the radiologists annotated, and then we have the different participant algorithms, so this is what we can compare, in three dimensions. You can see that sometimes the automatic algorithms totally fail; sometimes the organs are really difficult to annotate in the data. Another thing we realised is that even humans show a lot of variation, so everything was annotated by several humans. We then ran comparisons, so we also have a baseline that we consider the maximum performance, and for some of the organs the algorithms are actually at human performance: if the agreement is the same as between humans, we considered that an optimal segmentation result.
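Agreement between an automatic segmentation and a radiologist's annotation (or between two radiologists) is typically scored with an overlap measure such as the Dice coefficient. A minimal sketch; the voxel sets and numbers are invented for illustration, not taken from the talk:

```python
def dice(seg_a, seg_b):
    """Dice overlap between two segmentations given as sets of voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# hypothetical 2D example: radiologist annotation vs. participant algorithm
manual = {(1, 1), (1, 2), (2, 1), (2, 2)}
auto = {(1, 1), (1, 2), (2, 1), (3, 3)}
print(dice(manual, auto))  # 0.75: 3 shared voxels, 4 voxels in each mask
```

Comparing two human annotators with the same measure gives the "human performance" ceiling mentioned above.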
00:13:27
Another task we ran within the VISCERAL project was lesion detection: detecting small lesions in bones, the liver, the brain, the lung and in lymph nodes. Again we made data sets available that people could then run on; unfortunately this is a very hard task, and for this task we had basically nobody who participated in the competition. Another part was similar case retrieval: the idea was that we give a case with a region of interest marked, indicating which organ is actually interesting, we do automatic segmentation, and we have the radiology reports. But text documents are hard to distribute, because anonymisation is not as easy as with images or structured data, so we extracted semantic terms from these documents, and then the participants had to find similar cases in a database of, I think, three or four thousand volumes. That was the third task we ran within the project.
00:14:33
Another advantage of having the code of the participants is that we can create something we call the silver corpus. As an example: we ran the algorithms on new data, because besides the 120 volumes that are manually annotated, we have a few thousand volumes for which we have no annotations. By using several algorithms and simple label fusion, we can create something that is fairly similar to what a human might annotate. Like that, we can create additional training data, which can then be released and used to train the algorithms again and run on the test data; hopefully, with an iterative but automatic approach, we can improve the algorithms. It can also serve to get annotations, even if they are not perfect, for a very large number of cases.
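Simple label fusion of the kind mentioned here can be a per-voxel majority vote over the participant algorithms' outputs. A sketch with made-up labels; the function and label names are illustrative, not the project's actual code:

```python
from collections import Counter

def fuse_labels(algorithm_outputs):
    """Majority-vote label fusion: for each voxel, keep the label that
    most algorithms agree on, producing a 'silver' annotation."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*algorithm_outputs)]

# three hypothetical algorithm outputs over five voxels
alg1 = ["liver", "liver", "bg", "bg", "liver"]
alg2 = ["liver", "bg", "bg", "bg", "liver"]
alg3 = ["liver", "liver", "liver", "bg", "bg"]
print(fuse_labels([alg1, alg2, alg3]))
# ['liver', 'liver', 'bg', 'bg', 'liver']
```

More elaborate fusion schemes weight each algorithm by its score on the manually annotated volumes; the majority vote is the simplest variant.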
00:15:31
This also allows us, for example, to choose for manual annotation those cases where the participant algorithms have different results: even if all of the participant algorithms are wrong, we gain the most information for ranking the algorithms from the cases where they actually differ. So maybe random choice is not the best strategy; it is really about finding the cases that allow us to separate the approaches.
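Picking the cases where the participant algorithms disagree most, rather than at random, can be sketched as follows (case names and labels are invented for illustration):

```python
def disagreement(outputs):
    """Fraction of voxels on which the algorithms do not all agree."""
    voxels = list(zip(*outputs))
    return sum(len(set(v)) > 1 for v in voxels) / len(voxels)

# per-case outputs of two hypothetical algorithms over two voxels
cases = {
    "case_a": [["bg", "liver"], ["bg", "liver"]],     # full agreement
    "case_b": [["bg", "liver"], ["liver", "liver"]],  # disagree on 1 of 2 voxels
}
# send the most contested cases to the annotators first
by_disagreement = sorted(cases, key=lambda c: disagreement(cases[c]), reverse=True)
print(by_disagreement)  # ['case_b', 'case_a']
```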
00:15:59
In the end, this is pretty much what we had at the end of the project; it got a lot more complex. We still have virtual machines for the participants, with a registration system, so people can register automatically; when they sign the license agreement, they get a virtual machine assigned. We have an analysis system: once participants have trained and say "OK, now I submit my system", the link is cut and the analysis system runs on the test data; it then submits the results to us organisers and to the participants. Also, all of the annotation actually happens inside the system, so we do not need to move data anymore.
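The organiser-side flow, in which the participant's sealed system is run on hidden test data and only aggregate scores come back, can be sketched like this; all names and the toy metric are hypothetical, not the project's actual API:

```python
def evaluate_submission(predict, test_cases, metric):
    """Run sealed participant code on hidden test data; participants
    never see the cases, only the aggregate score returned here."""
    scores = [metric(predict(inputs), truth) for inputs, truth in test_cases]
    return sum(scores) / len(scores)

# toy example: the 'system' predicts a label, the metric is exact-match accuracy
hidden_test = [("img1", "liver"), ("img2", "lung"), ("img3", "liver")]
submitted_system = lambda image_id: "liver"
accuracy = lambda pred, truth: 1.0 if pred == truth else 0.0
print(evaluate_submission(submitted_system, hidden_test, accuracy))  # 2 of 3 correct
```

In the real setup, `predict` is the participant's virtual machine (later Docker container), not a Python function, but the information flow is the same.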
00:16:36
All of the data remains in the same place, and all the annotations happen there. With time we also realised that virtual machines are actually not as portable as we thought: moving them from an Azure cloud to an Amazon cloud is non-trivial. That is why we moved towards Docker, and I am really happy, as with the BEAT platform, that this is the direction, because Docker is really a lightweight approach. Code actually gets increasingly mobile in this respect. We had several projects in the US, and we are also discussing this in Switzerland: we cannot only work on data that is in the cloud, which is most often anonymised data; we can actually move the code further, to hospitals, so we can work directly on live data. We avoid the problem of having to anonymise the data and move it out, because only the code sees the data; no researcher actually sees it, the researchers only get evaluated results back. With the National Cancer Institute we have also worked on how to generalise this, so they can run their data challenges in a distributed way, potentially sending the code to several institutions, where the results are evaluated and then aggregated. Docker is just an overlay, so it is really a lot lighter than virtual machines, much more mobile and easy to move around, and it also avoids overhead for groups that would otherwise need to reinstall the whole software stack, because they can just use a local Docker container and move that over.
00:18:17
There are a couple of other lessons we learned from this project. Cloud space is actually pretty expensive; fortunately, thank you Microsoft, they gave us, I think, 150,000 dollars in cloud computing resources over the years. So we had the expenses of running things and evaluating them, and if you leave the virtual machines running without doing anything, it costs quite a lot, so we developed methods to actually shut them down, to stop them, because of the cost and in terms of sustainability. Also, as with the BEAT platform, if you have people running things on your platform, you are responsible for making it sustainable, so we are looking into possibilities to control the cost and maybe also to limit the amount of computation somebody can take. Every installation is additional work. We also had requests for GPUs; in 2012 there were no GPUs in the Azure cloud. Now they have them, so today we could more easily do that, but at the time we could not, so the groups that wanted to use deep learning were pretty much excluded from this.
00:19:26
Also, when we run competitions, or coopetitions, it actually takes quite a while. When we ran the challenge for the first time, we had five participants, then twelve and seventeen; now, I think three years after the project finished, we have over two hundred groups registered on the registration system who have run the code on the data. That also shows that we really need to think long term if we want to have an impact, and I think that is really important when we think about open science: you will not get gratification for your data or your code within a year; it is really over a span of around five years that you see whether people pick it up and use it. We also realised that troubleshooting the participants' submissions is a problem, because people cannot see what happens on the test data. Sometimes the systems failed with out-of-memory errors; with a small number of participants that is fine, because we can go and check it manually and supply feedback, but for larger numbers you need something more automatic, just like the BEAT platform, for example.
00:20:36
But I think we already solved quite a few of the problems. We can make large data sets available without the shipping problem: tens of terabytes are not an issue, because we bring the algorithms to the data and not the other way around. We can run on confidential data. We can always use the latest data sets: sometimes the ground truth changes, because somebody says they saw an incorrect segmentation, so we modify the data and run everything again; if we have the code, that is trivial to do. If we have more cases, we can re-run everything, and like that we can also see how stable the ranking is, because there is variability: it depends on the topic, it depends on the exact images, and very often the differences between the best systems are actually within the error, so quite often there is no statistically significant difference.
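Whether the gap between the two best systems is within the error can be probed by bootstrap resampling of the test cases: if the ranking flips in a large fraction of resamples, the difference is not stable. A rough sketch with invented per-case scores:

```python
import random

def rank_flip_rate(scores_a, scores_b, n_resamples=2000, seed=0):
    """Fraction of bootstrap resamples of the test cases in which
    system B's total score beats system A's."""
    rng = random.Random(seed)
    n = len(scores_a)
    flips = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        if sum(scores_b[i] for i in idx) > sum(scores_a[i] for i in idx):
            flips += 1
    return flips / n_resamples

# hypothetical per-case Dice scores of the two best-ranked systems
system_a = [0.81, 0.79, 0.84, 0.78, 0.80]
system_b = [0.80, 0.82, 0.79, 0.81, 0.77]
# a flip rate well above 0 means the ranking is not stable
print(rank_flip_rate(system_a, system_b))
```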
00:21:30
We can reproduce things because these are virtual machines: even if we are working on different hardware, it is relatively easy to reproduce, because everything is virtual. We can also avoid cheating, because nobody sees the test data; nobody can annotate it, nobody can optimise manually on the test data. And we can measure computation time, so we know how long everything took; everybody got the same virtual machine, so we can see that some of the algorithms run for two minutes and others run for twenty-four hours for exactly the same task. There are huge differences, on the order of a factor of a hundred, in that respect.
00:22:13
Still, reuse of components is not trivial, and making people collaborate more on components was something we found quite hard; really pushing people to work together. I think those were the two parts that remained difficult.
00:22:29
Based on the outcomes of this, we then thought about how we can push this further: what are the problems in making this more general and simpler? We organised one workshop here in March 2015 and another one in Boston in November 2015, with very different stakeholders: from industry, with people from Intel and Microsoft, from funding organisations, from the academic side, user groups, et cetera, to look at what the existing approaches are and how we can deal with this. There is a white paper that we put on arXiv, so it is openly accessible; it is not peer-reviewed, but people can have a look at it. It covers the different ways of doing this, using APIs or using executable code, and the different stakeholders we can look at. We can see, for example, that the National Cancer Institute pushed towards the organisation of a challenge, the Digital Mammography DREAM Challenge. It is organised by Sage Bionetworks, an American organisation in Seattle that runs scientific challenges, and they picked up exactly this model: they use Docker containers, participants have a limited amount of computation available, and it runs on IBM and Amazon infrastructures. They have a million-dollar prize, so if you are interested, it is quite interesting. And they have what is called a community phase: initially everybody submits results separately, but in the end they really want people to collaborate, to work together to get the best results; the teams most likely to win one of the prizes have good results early on but then team up with others, which I quite liked in that respect.
00:24:19
There are also business models: there is a medical vision company that also makes data available in Docker containers, and the idea is that if the algorithms can commercialise results, the benefits are then shared. I think it is an interesting model, and it shows that there is also the possibility of a business model here. Microsoft also showed that they want to make resources available: every two months, I think, they have a deadline for project submissions, and the computing power is relatively easy to get. And there is a lot of discussion among the various stakeholders, including politics. The situation in the US has now changed a little bit, but there was definitely a push towards making things available, with the Cancer Moonshot for example: making data available, sharing data. Those were things that researchers pushed for.
00:25:14
Similarly for Intel: they have what is called the Collaborative Cancer Cloud that they set up, and the model is really that hospitals will have computing infrastructures in the future, so the code can be moved there to work directly on the data. Then the institution does not run the risk; particularly in the US, where many data sets can be bought, including data that are not specially protected, you can match different data sources, from voter records to different registries, to actually re-identify patients. By keeping the data inside the institutions, you remove these kinds of problems.
00:25:48
On institutional support: we have the National Cancer Institute participating, and they are really pushing for running challenges and making things available. This is one of the projects I am involved in, the Quantitative Imaging Network of the National Cancer Institute, where the plan is that every evaluation of data in this context is run via Docker containers; the containers are kept available and are run in the cloud, to really make sure that everything is absolutely reproducible, from data creation (they are working on image data creation) to feature extraction. Because even standard features, like features based on co-occurrence matrices for texture characterisation, gave extremely different results: they had seven hospitals participating in one of these tests, and they realised that even very, very basic features did not correlate at all, depending on the installation people were using and on whether and how they were normalising.
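The kind of discrepancy described here is easy to produce: even a basic co-occurrence (GLCM) feature such as contrast changes value depending on whether the matrix is normalised. A small pure-Python sketch with a toy image and a single offset; real installations differ in many more ways:

```python
def cooccurrence(img, dx=1, dy=0):
    """Grey-level co-occurrence counts for one pixel offset."""
    counts = {}
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                pair = (img[y][x], img[y2][x2])
                counts[pair] = counts.get(pair, 0) + 1
    return counts

def contrast(counts, normalise):
    """GLCM contrast; without normalisation the value scales with image size."""
    total = sum(counts.values()) if normalise else 1
    return sum(n * (i - j) ** 2 for (i, j), n in counts.items()) / total

img = [[0, 0, 1],
       [0, 2, 1],
       [2, 2, 2]]
print(contrast(cooccurrence(img), normalise=True))   # 1.0
print(contrast(cooccurrence(img), normalise=False))  # 6.0 -- same image, same feature
```

Two sites computing "contrast" with different conventions report values on different scales, so their feature sets cannot correlate unless the convention is fixed.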
00:26:48
I think it is really important to look at these things, at what we start with, because even the image creation was very different: the same machine installed in two different hospitals did not deliver the same image, and they were physically shipping a phantom of which they took images. If the images are not the same with the same parameters, then when we extract quantitative features, everything builds on what was done before; in terms of reproducibility, that is really important.
00:27:16
As I mentioned, and this is the political part of my talk, when we look at evaluation as a service as a concept, we really need to look at the stakeholders. We have funding institutions, like the National Science Foundation; we have companies, who have their part in it; we have scientists; everybody has different constraints. And we have the people with a problem, clinicians in our case, who would like to use data science tools to help them. How can we put these different interests together? Because I think everybody would have an interest in a system that would be more efficient and more effective in the end. There is more on this in the white paper.
00:28:05
Okay, five minutes. So for me, one of the messages
00:28:14
from today is really that many things are centred around data, and it was mentioned this morning that
00:28:22
data sets are a citable item. Nature now has a journal called
00:28:26
Scientific Data, Nature Scientific Data, which publishes only data sets.
00:28:30
It's an open access journal; if you have a nice data set and submit it there, you expose it to the community.
00:28:37
They have a very good review process and they really give very, very good feedback on things,
00:28:43
and I think it's important to make data available in meaningful ways. We have one of our data sets
00:28:47
published there, and they were actually downloading the data and reading our instructions, even the instructions on how
00:28:53
to use the data, and they corrected things there. So I was quite impressed; they really take
00:28:59
it seriously that the data is made available in a meaningful way, so people can actually
00:29:05
rely on it and reuse it. I think we also need to look into how we can share
00:29:09
infrastructures. One of the difficult parts is really how to make them sustainable:
00:29:14
for a platform like BEAT, how can we make it sustainable, independent from
00:29:18
our project funding? Because I think when
00:29:21
you depend on project funding, it's not easy to have something that is actually totally independent from it.
00:29:29
And our code will definitely become more portable; Docker really helps with that. And I think we also need
00:29:37
to look into public-private partnerships: how can we get infrastructure providers in here? Now, with the
00:29:43
EPFL and ETH data science centre, maybe there will be a scientific cloud for Switzerland that
00:29:48
would be easy to use; that's something to look into. But if we really want reproducibility,
00:29:55
long-term reproducibility and sharing of tools, we need to look at how we can create an infrastructure
00:30:00
that is flexible enough, or maybe different infrastructures for different domains,
00:30:04
because it would be much more efficient to do that.
00:30:09
It just doesn't make sense how much people develop in two different directions.
00:30:15
Well, I think data science really requires an infrastructure, because
00:30:20
everybody is working in one way or another on artificial intelligence, so
00:30:23
everything revolves around data. In the medical field, I think we need to work on routine data. Currently, whenever we
00:30:29
produce data for a clinical trial, we have perfect image quality and everything is very controlled. Routine data is not
00:30:36
like that: you don't want to have the best image, you want to have the lowest radiation dose for your patients, so you're
00:30:41
trying to limit things to a low dose, and the quality is not the same. So if we want tools to be usable
00:30:47
in a real setting, we need to look at how we can easily do that. We need to look at ways
00:30:53
to actually limit the amount of manual annotation, so we can use
00:30:57
active learning and we can use weakly annotated data to learn,
00:31:01
and we might get much, much larger data sets from hospitals that can actually allow us to do that.
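A minimal sketch of the uncertainty-sampling flavour of active learning mentioned here (the nearest-centroid classifier and the synthetic one-dimensional "feature" are illustrative assumptions, not anything from the talk): the model repeatedly asks a human for the label it's least sure about, so far fewer manual annotations are needed than labelling everything.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D "imaging feature": two overlapping classes, 100 samples each.
X = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

labelled = [0, 1, 100, 101]              # tiny seed set: 2 labels per class
unlabelled = [i for i in range(len(X)) if i not in labelled]

def prob_class1(x, X_lab, y_lab):
    """Nearest-centroid classifier; distance ratio as a pseudo-probability."""
    c0 = X_lab[y_lab == 0].mean()
    c1 = X_lab[y_lab == 1].mean()
    d0, d1 = abs(x - c0), abs(x - c1)
    return d0 / (d0 + d1 + 1e-12)

for _ in range(20):                      # simulate 20 annotation requests
    X_lab, y_lab = X[labelled], y[labelled]
    probs = np.array([prob_class1(X[i], X_lab, y_lab) for i in unlabelled])
    # Uncertainty sampling: ask for the label the model is least sure about,
    # i.e. the sample whose predicted probability is closest to 0.5.
    pick = unlabelled[int(np.argmin(np.abs(probs - 0.5)))]
    labelled.append(pick)                # the human annotator supplies y[pick]
    unlabelled.remove(pick)

X_lab, y_lab = X[labelled], y[labelled]
preds = np.array([prob_class1(x, X_lab, y_lab) > 0.5 for x in X]).astype(int)
accuracy = (preds == y).mean()           # reasonable accuracy from 24 labels
```

The design point is the query loop, not the classifier: any model that can express uncertainty can sit inside it, and the annotation budget (20 queries here) is what gets controlled.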
00:31:08
And then we need to share infrastructures, I think, really in most domains. We
00:31:14
want to build decision support from needs in the medical field, but in many other fields it can really help somebody too.
00:31:20
There's a lot more available on the different web pages, and
00:31:25
don't hesitate to contact me if you have any questions.

Conference program

Welcome
Sébastien Marcel, Senior Researcher, IDIAP, Director of the Swiss Center for Biometrics Research and Testing
24 March 2017 · 9:17 a.m.
Keynote - Reproducibility and Open Science @ EPFL
Pierre Vandergheynst, EPFL VP for Education
24 March 2017 · 9:20 a.m.
Q&A: Keynote - Reproducibility and Open Science @ EPFL
Pierre Vandergheynst, EPFL VP for Education
24 March 2017 · 9:54 a.m.
VISCERAL and Evaluation-as-a-Service
Henning Müller, prof. HES-SO Valais-Wallis (unité e-health)
24 March 2017 · 11:35 a.m.
Q&A - VISCERAL and Evaluation-as-a-Service
Henning Müller, prof. HES-SO Valais-Wallis (unité e-health)
24 March 2017 · 12:07 p.m.
