Transcriptions

Note: this content has been automatically generated.
00:00:01
So, thank you for the invitation. Thank you, I'm very happy to speak in
00:00:05
front of you. What Julie asked me to do was to
00:00:10
speak about big data, and more precisely big data in the health domain,
00:00:17
and I'm also going to give you some elements on the
00:00:25
huge topic that is data science, just to
00:00:29
see if some things ring bells for
00:00:33
you in this topic. So first, when we are talking about big data,
00:00:40
we are always between myth and reality. Often, when we hear about big data,
00:00:47
we can think that big data is going to solve all our problems, like a
00:00:53
magic flying unicorn, you know: big data would solve everything in the room.
00:00:58
But in fact... So here I am just taking some of
00:01:04
the chapter titles on this topic, and
00:01:08
you can see that we are always between myth and reality:
00:01:13
a dilemma; big data and revolutionary applications, but with a
00:01:18
critical bottleneck; the wisdom of the crowd; finding all the needles in a haystack;
00:01:26
noisy, incompatible data. So we see there are many problems inside big data,
00:01:32
and what you also see here is that it's a new way of doing science, and maybe
00:01:37
it's what became, step by step, data science. I will talk about that later.
00:01:44
So,
00:01:46
that was about big data in general; now let's look at big data in health.
00:01:54
Okay, let's have a look at some examples where health care and
00:02:00
medicine make use of big data. So here
00:02:04
are some extracts from the article "12 Examples of Big
00:02:08
Data Analytics in Healthcare That Can Save People",
00:02:12
published last year in Business Intelligence. So big data
00:02:18
can arise from the use of electronic health
00:02:22
records, or from smart watches and other
00:02:26
electronic devices related to the Internet of Things.
00:02:32
It can also be used for real-time alerting, for instant care,
00:02:36
and to enhance patients' engagement in their own health, by
00:02:40
monitoring blood pressure, heart rate and so on.
00:02:45
What we can also see in these examples
00:02:50
is research: research is searching extensively to cure cancer.
00:02:55
So gathering genomic data, gathering
00:02:58
a lot of information about patients and about the population,
00:03:02
of course involves big data issues, and can also help
00:03:08
to get better information on cancer and other diseases.
00:03:13
Another aspect in these examples is related to health economics:
00:03:18
reducing fraud and enhancing security, and also telemedicine
00:03:23
and precision medicine, to be able to treat people with the
00:03:29
optimal doses of drugs;
00:03:34
integrating medical imaging for broader
00:03:37
diagnosis, so integrating data,
00:03:40
integrating heterogeneous data (I will talk about that later as well);
00:03:46
and the last example, preventing unnecessary ER visits,
00:03:51
that is, visits to emergency services. It's a hot topic in France
00:03:55
now: we know that emergency services are overwhelmed by many
00:04:00
people, so big data could also be considered as a way to improve this.
00:04:08
So the main trends in these examples are three or four.
00:04:13
First, improving the experience of individual patients, to improve their health.
00:04:20
The second one is more global,
00:04:23
over a population, so that population health could be improved,
00:04:27
for instance by defining optimal vaccination strategies.
00:04:34
And, yes,
00:04:37
the third trend is general cost, once again for health economics.
00:04:46
And the conclusion given in this article is that
00:04:49
big data considerations in health
00:04:53
have the potential to save money and, most importantly, to save people's lives.
00:05:01
There are already tools for the early identification of at-risk
00:05:07
individual patients, as I said, and of socio-economic groups,
00:05:10
and for taking preventive action, because prevention is better than cure.
00:05:15
Okay, so these are just some examples, a sketch of
00:05:19
how big data can be used in a health context.
00:05:24
So now, as on the previous slide,
00:05:28
where I mentioned a new way of doing science, I'm going to talk about
00:05:32
this new way of doing science, and talk a little about data science.
00:05:38
Data science is something rather new that we hear about very frequently,
00:05:45
and in this conference by [name unclear] last year, in December,
00:05:51
there is a kind of tentative definition of data science:
00:05:56
there is not yet a consensus on what exactly constitutes data science, but
00:06:05
it can be seen as the study of the generalizable extraction of knowledge from data.
00:06:12
Sometimes we can find "D2K",
00:06:16
data to knowledge. So data science can be viewed as the science that
00:06:23
gathers everything needed to extract knowledge from data.
00:06:30
Data science clearly has an interdisciplinary nature and
00:06:35
requires substantial collaborative efforts. Regarding interdisciplinarity,
00:06:43
okay, it can look like this:
00:06:46
if I consider, I mean, statisticians,
00:06:49
they speak like this, with formulas and things like that;
00:06:53
if I'm talking with biologists, maybe they are talking about genomic
00:06:59
information, with DNA, RNA, from
00:07:02
genomics, transcriptomics, or maybe nucleotides and so on;
00:07:06
if I'm talking with [unclear], these words can come into the
00:07:10
discussion; and if you are involved in a collaboration with [unclear],
00:07:16
maybe these will also be used in the discussion. So at first
00:07:21
sight it can be difficult to see how we are going to collaborate all together,
00:07:28
but it's a collaboration between people from every domain
00:07:34
that is necessary to address real-world questions.
00:07:41
So here is an extract from a book by
00:07:44
Jordan Ellenberg, who is a professor in
00:07:47
the US, a professor of mathematics: How Not
00:07:51
to Be Wrong: The Hidden Maths of Everyday Life.
00:07:56
And here the main idea is that mathematics
00:08:01
is not only computation. Mathematics is not only
00:08:05
"I have to apply the method, I have to
00:08:09
run some algorithm". No, mathematics is also understanding
00:08:14
a real-world question and how to address it with mathematics, with a mathematical model or something like that.
00:08:24
And the sentence here is a bit provocative: "it's only
00:08:29
after you start to formulate this question that you take out the calculator,
00:08:34
but at that point the real mental work is already finished."
00:08:39
Okay, so it's not just computation, "I pick a calculator and it
00:08:43
will be okay", "I take the software and ask for the method to be applied".
00:08:49
So,
00:08:52
when we are facing data and we want to extract knowledge from data,
00:08:59
here is a kind of tentative roadmap of a strategy
00:09:04
to go through to try to extract knowledge from data.
00:09:09
First, in the upper right corner, there is this cartoon;
00:09:15
I couldn't find its original source, but what is written is
00:09:21
"get all the information you can, we'll think of a use for it later",
00:09:27
and this is not a good way to face data; that is a problem.
00:09:35
If we gather a lot of information, a lot of data, we have
00:09:38
to keep in mind that we are also gathering a lot of noise,
00:09:44
and maybe we are not improving the signal-to-noise ratio.
00:09:49
So, essentially, the roadmap can be
00:09:52
expressed in seven points.
00:09:56
The first one could be: clearly state the specific question, a real-world question.
00:10:03
Then think about the methods, mathematical or statistical
00:10:08
methods, that we will use to analyze the future data.
00:10:13
Then state the design of experiments: how many patients I have
00:10:17
to enroll in my trial, you know, and so on. So
00:10:22
getting the data is not the beginning of the story: it's the fourth point in this roadmap.
00:10:28
Data are collected after we have thought about the whole problem.
00:10:34
Then, after analyzing the data, interpret the results and
00:10:37
answer the question that was asked in the first point.
00:10:42
And we also have to think about this sentence, which comes from an article:
00:10:47
[unclear]:
00:10:51
"Our best conclusion is often a refined question."
00:10:57
And then we are in a kind of loop, a cycle that we go through:
00:11:02
again, clearly state the specific question, and so on.
00:11:07
This is rather close to existing models
00:11:11
for research; one of them
00:11:14
is called PPDAC, for Problem, Plan, Data, Analysis, Conclusion,
00:11:20
and another one, in quality control, sometimes represented as a wheel, the Deming wheel or
00:11:26
something like that, with Plan, Do, Check and Act, can be seen as a kind of
00:11:33
variation on this.
00:11:36
So
00:11:39
now I'm going to
00:11:42
focus on just two points of what we have to
00:11:46
do with the data. These
00:11:49
two subjects are: first, missing values, and second, tidy data.
00:11:57
So, what about missing values? First, I do
00:12:02
like this quote from Gertrude Cox, an
00:12:05
American statistician of the twentieth century: "The best thing to do with missing values is not to have any."
00:12:13
Okay, it's very good advice. In fact, as
00:12:18
it is rephrased by Alvira Swalin
00:12:23
in the article "How to Handle Missing Data" [unclear]:
00:12:29
"Firstly, understand that there is no good way to
00:12:33
deal with missing data." Okay, so we have a hole in our data,
00:12:38
it's a problem, and we try some things, but
00:12:42
there is no good way, no optimal way, to deal with missing data.
00:12:47
But in fact, what can we do?
00:12:52
So the main idea here, at the top
00:12:55
left: "before jumping to the methods of data imputation",
00:12:59
so if I want to fill the hole, "we have to understand the reason why data goes missing".
00:13:06
So, looking here at the reasons why data can be missing, [unclear]:
00:13:11
we can have data that are missing
00:13:18
completely at random, missing at random, or missing not at random.
00:13:25
So briefly, missing completely at random: I have a
00:13:31
hole in my data, I don't know why, and the fact that the
00:13:34
data is missing is not related to any information regarding the value itself
00:13:40
nor to any other variable. Okay, so that's missing completely at random. Then missing at random;
00:13:49
I'm going to talk first about the last one, missing not at random. The classical example to understand
00:13:55
this is: if I ask everybody in this room "How much money do you earn per year?",
00:14:04
maybe some people wouldn't like to answer, and,
00:14:08
roughly, the cliché is that maybe people who earn
00:14:13
a lot of money wouldn't like to answer this question. So if I have a missing value, it's because the value is high.
00:14:23
And missing at random: it is missing at random when I have a missing value that
00:14:30
is not related to the variable "how much money do you earn", but maybe the
00:14:36
fact that people did not answer is related to something else.
00:14:40
Another cliché: maybe young people
00:14:46
don't want to talk about money, or old people don't want to talk about money.
00:14:53
So when you have a missing value in the question "how much money do you earn", it's because people are older
00:14:59
or younger; it's related to another variable in the data set. Well, you know, it's not always
00:15:09
easy to identify which configuration we are in. So
00:15:14
if it's missing completely at random, okay, I
00:15:18
can just impute, rather randomly, some
00:15:22
value taken from the data set;
00:15:26
otherwise, we have to think of better and better methods to impute.
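The three mechanisms can be made concrete with a small simulation. This is a sketch under loudly stated assumptions: the income-age model, the 30%/60%/10% missingness rates and the 50 000 threshold are invented for illustration and do not come from the talk. It compares the mean of the incomes we actually observe with the true mean under each mechanism.

```python
import random
import statistics


def simulate_population(n, seed):
    """Hypothetical model (assumption): yearly income grows with age, plus noise."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 65) for _ in range(n)]
    incomes = [1000 * age + rng.gauss(0, 5000) for age in ages]
    return ages, incomes


def missing_probability(mechanism, age, income):
    """Probability that the income answer is missing (illustrative rates)."""
    if mechanism == "MCAR":
        return 0.3  # unrelated to anything in the data
    if mechanism == "MAR":
        return 0.6 if age > 50 else 0.1  # depends on another, observed variable
    if mechanism == "MNAR":
        return 0.6 if income > 50000 else 0.1  # depends on the missing value itself
    raise ValueError(f"unknown mechanism: {mechanism}")


def bias_of_observed_mean(mechanism, n=20000, seed=0):
    """Mean of the observed incomes minus the true mean of all incomes."""
    ages, incomes = simulate_population(n, seed)
    rng = random.Random(seed + 1)
    observed = [
        income
        for age, income in zip(ages, incomes)
        if rng.random() >= missing_probability(mechanism, age, income)
    ]
    return statistics.mean(observed) - statistics.mean(incomes)


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, round(bias_of_observed_mean(mechanism)))
```

In this setup the MCAR bias stays close to zero, while MAR and MNAR both give a clearly lower observed mean, because high earners (directly, or through their age) answer less often. Under MAR, age is observed, so the bias can in principle be corrected by conditioning on it, for example by imputing within age groups; under MNAR, the information driving the missingness is precisely what is missing.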
00:15:33
So here is a diagram on handling missing data, and,
00:15:37
depending on the configuration, we can decide to delete
00:15:42
the information related to an observation or to a variable, so a row or a column,
00:15:49
or to go through imputation:
00:15:54
okay, I can replace a missing value by the mean
00:15:57
of the column, or something else; it depends on the problem.
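The two options in the diagram, deletion and imputation, can be sketched in a few lines of plain Python. The toy patient records and the function names are hypothetical, and `None` stands for a missing value.

```python
import statistics

# Hypothetical toy records; None marks a missing value.
patients = [
    {"sex": "F", "age": 34, "weight": 70.0},
    {"sex": "M", "age": 51, "weight": None},
    {"sex": "F", "age": None, "weight": 82.0},
    {"sex": "M", "age": 45, "weight": 64.0},
]


def drop_incomplete_rows(rows):
    """Listwise deletion: keep only observations with no missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_incomplete_columns(rows):
    """Drop every variable that has at least one missing value."""
    complete = [col for col in rows[0] if all(row[col] is not None for row in rows)]
    return [{col: row[col] for col in complete} for row in rows]


def impute_mean(rows, column):
    """Replace missing entries of one column by the mean of its observed values."""
    mean = statistics.mean(row[column] for row in rows if row[column] is not None)
    # Build new dicts so the original records are left untouched.
    return [
        {**row, column: mean if row[column] is None else row[column]} for row in rows
    ]


print(drop_incomplete_rows(patients))
print(drop_incomplete_columns(patients))
print(impute_mean(patients, "weight"))
```

Here mean imputation fills the missing weight with 72.0, the mean of 70.0, 82.0 and 64.0. It keeps every row, but it also shrinks the column's variance and can weaken its correlations with other variables, one more reason why there is no good way to deal with missing data.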
00:16:03
The second point I would like to focus on is tidy data,
00:16:10
because in this article by Hadley Wickham,
00:16:13
"Tidy Data", in the Journal of Statistical Software,
00:16:17
there is this sentence: "It is often said that 80% of
00:16:21
data analysis is spent on the process of cleaning and preparing the data."
00:16:26
Very frequently, when I receive data sets, it's an Excel XLS file, for instance,
00:16:33
and I have to admit I have to rework it to be able to run a statistical method on it.
00:16:39
Just having a look at the data, for
00:16:42
instance, checking whether I am
00:16:46
using the French version of Excel, because it uses commas instead of points as the decimal separator;
00:16:53
I have to check whether
00:16:55
merged cells were used, which can be
00:16:59
misinterpreted by the statistical software, and so on.
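The decimal-comma issue can be handled at import time. Here is a minimal sketch using only Python's standard `csv` module. It assumes the file is a semicolon-separated text export (the usual CSV layout in locales where the comma is the decimal separator) rather than the binary Excel file itself. The column names are hypothetical.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse a semicolon-separated export that uses a decimal comma.

    Empty cells become None; cells that parse as numbers become floats;
    everything else is kept as a string.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=";"):
        parsed = {}
        for key, value in record.items():
            if value is None or value.strip() == "":
                parsed[key] = None
                continue
            try:
                # Swap the decimal comma for a point before converting.
                parsed[key] = float(value.replace(",", "."))
            except ValueError:
                parsed[key] = value
        rows.append(parsed)
    return rows


# Hypothetical export: ';' separates fields, ',' marks decimals.
export = "patient;weight_kg;height_m\nP1;70,5;1,72\nP2;;1,80\n"
print(read_semicolon_csv(export))
```

If pandas is available, `pandas.read_csv(path, sep=";", decimal=",")` covers the same case.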
00:17:05
So here,
00:17:07
okay,
00:17:10
we can state
00:17:12
the principles of tidy
00:17:16
data sets. First, each variable forms a column,
00:17:21
so one variable, one column: age, sex,
00:17:29
date of birth and so on.
00:17:31
Each observation forms a row: an observation, a patient, or
00:17:37
an individual,
00:17:39
so a patient or individual to whom I'm going to ask many questions.
00:17:45
And each type of observational unit forms a
00:17:47
table. This is quite obvious for people involved
00:17:52
in database management: it's the way we have to handle data sets.
00:18:00
In the same article by Hadley Wickham, we can read that
00:18:05
"messy data is any other arrangement of the data". So just think about gathering
00:18:13
all my data in one table.
00:18:18
So, well,
00:18:21
here in the upper right corner is the
00:18:26
logo of R; R is a statistical software, a free software,
00:18:33
and in this software has been developed
00:18:39
what is called the tidyverse, the universe of tidy data,
00:18:44
and many packages are being developed to
00:18:52
improve the way of doing data analysis; for instance, we have packages
00:18:58
to reshape data sets to make them tidy in a more efficient way.
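To illustrate the kind of reshaping these packages do, here is a hand-written Python function that turns a "wide" table, with one column per visit, into a tidy one with one row per measurement. It is a hypothetical helper named after tidyr's `pivot_longer()`, whose `names_to` and `values_to` arguments it imitates. It is not the tidyr implementation, and the blood-pressure data are invented.

```python
def pivot_longer(rows, id_columns, names_to, values_to):
    """Turn one column per measurement occasion into one row per measurement."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            # Copy the identifying columns, then record which column the value came from.
            record = {col: row[col] for col in id_columns}
            record[names_to] = column
            record[values_to] = value
            tidy.append(record)
    return tidy


# Hypothetical messy table: the visit (a variable) is hidden in the column headers.
wide = [
    {"patient": "P1", "visit_1": 120, "visit_2": 118},
    {"patient": "P2", "visit_1": 135, "visit_2": 130},
]
for record in pivot_longer(wide, ["patient"], "visit", "systolic_bp"):
    print(record)
```

In the result, the visit, which was hidden in the column headers, becomes a variable with its own column, and each row is one observation. In pandas, `DataFrame.melt(id_vars=..., var_name=..., value_name=...)` performs the same reshaping.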
00:19:07
So, to finish my talk, I will
00:19:12
just mention a kind of method that can be used when
00:19:19
facing big data.
00:19:23
I won't give you more
00:19:26
details than that, but let's mention sparse methods.
00:19:31
Sparse methods are a family of statistical and mathematical methods
00:19:37
that enable us to select the most relevant variables,
00:19:42
removing noisy or redundant variables, using
00:19:48
penalized versions of methods, or using [unclear].
00:19:53
So if you have many variables measured on some patients, some individuals,
00:20:01
and you run such a method, it will focus on the most relevant
00:20:05
variables, depending on the criterion you want to optimize:
00:20:11
do you want to optimize the discrimination between several groups, groups of patients?
00:20:18
Do you want to optimize the variability you would like to see on your graphical display? And so on.
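The talk does not name a specific sparse method. The lasso is one widely used example: an L1 penalty on the coefficients drives many of them to exactly zero. The sketch below fits it by cyclic coordinate descent with NumPy, minimizing (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1. The simulated data (100 patients, 20 variables, only 3 of them relevant) and lam = 0.3 are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores b_j.
            rho = X[:, j] @ resid / n + col_scale[j] * beta[j]
            new = soft_threshold(rho, lam) / col_scale[j]
            resid += X[:, j] * (beta[j] - new)  # keep resid equal to y - X @ beta
            beta[j] = new
    return beta


# Hypothetical data: 100 patients, 20 variables, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(100)

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
print("selected variables:", np.flatnonzero(beta_hat))
```

With this penalty the three relevant variables should be kept and the irrelevant ones set to zero, or nearly all of them; a larger `lam` selects fewer variables. scikit-learn's `sklearn.linear_model.Lasso` minimizes the same objective, with `alpha` playing the role of `lam` (it also fits an intercept by default).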
00:20:25
Another kind of method: on the first
00:20:30
slide, actually, you know, there was "noisy, incompatible data",
00:20:37
and kernel methods can be viewed as a way to address that,
00:20:44
because using kernel methods
00:20:47
we will be able to integrate different types of data,
00:20:51
represented as numerical features, as networks or, for instance,
00:20:56
as phylogenetic trees and so on: using kernels,
00:21:01
you can see that everything can be sent into the same mathematical
00:21:06
space. And once again, in the context of health,
00:21:12
for one patient I can have textual information, information as text,
00:21:19
I can have images, I can have numerical variables,
00:21:24
so to integrate everything, the use of kernel methods can be useful.
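One simple way to put heterogeneous data into "the same mathematical space" is to compute one kernel matrix per data type and combine them. The sketch below assumes two hypothetical sources for four patients. Standardized lab values go through a Gaussian (RBF) kernel, word counts from clinical notes go through a normalized linear kernel, and the two are combined with fixed, hand-picked weights. A nonnegative weighted sum of positive semidefinite kernels is again positive semidefinite, so the result can be fed to any kernel method. Multiple kernel learning approaches learn the weights instead. Kernels for trees or networks also exist but are not shown here.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel between rows of a numeric feature matrix."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative rounding errors


def cosine_kernel(counts):
    """Normalized linear kernel between rows of a word-count matrix."""
    K = counts @ counts.T
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def combine_kernels(kernels, weights):
    """Nonnegative weighted sum of kernel matrices (itself a valid kernel)."""
    return sum(w * K for w, K in zip(weights, kernels))


# Hypothetical data for 4 patients: standardized lab values, and word counts
# extracted from clinical notes (columns = a small fixed vocabulary).
labs = np.array([[0.2, -1.0], [0.1, -0.8], [1.5, 0.3], [-0.7, 1.2]])
notes = np.array([[2, 0, 1], [1, 0, 1], [0, 3, 0], [0, 1, 2]], dtype=float)

K = combine_kernels([rbf_kernel(labs, gamma=0.5), cosine_kernel(notes)], [0.5, 0.5])
print(np.round(K, 3))
```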
00:21:30
So, at the end of my talk, just going back to the title slide, and
00:21:37
if you have any questions, I will be happy to answer them as
00:21:42
well as I am able to, so
00:21:45
thank you very much.

Big Data with Health Data
Sébastien Déjean
Sept. 5, 2019 · 9:20 a.m.