Transcriptions

Note: this content has been automatically generated.
00:00:00
thanks very much a bunch of your turtles were patient so yes that that would be a lot of
00:00:05
conflicts probably in what i'm going to tell you now and it's not my aim to confuse you but
00:00:09
i would just like to talk about how big data exists in genetics and
00:00:14
how this can help or how this can maybe help in the future to probably cure diseases
00:00:20
the structure will be very simple first i will tell you how
00:00:23
these genetic data are used to discover loci in your genome in your
00:00:27
DNA sequence that are associated with diseases mostly i will talk about common diseases and i will show a couple of examples
00:00:34
and i will tell you then what we learn from these associations what we learn about
00:00:39
individual genes the action and the mechanism of individual genes
00:00:43
and their role in diseases but also what we can learn
00:00:46
about the entire genetic architecture because
00:00:50
as i had in the title these are complex diseases
00:00:53
why complex the reason is that there are probably dozens of different reasons that lead to these diseases and these are the
00:01:00
different causes that we want to identify in order to predict them to understand them and to be able to treat them eventually
00:01:06
so the architecture will be very important for this and i will also mention
00:01:10
some pitfalls because it's very important not to misinterpret these results
00:01:14
and it's a very hot topic in the current literature whether these genetic tests that we can now provide
00:01:22
should be used in clinics to identify individuals at high risk especially very early on
00:01:27
and if i have time i also want to show how these genetic data can help us to
00:01:33
distinguish between causes consequences and confounding to really find true causes and
00:01:39
establish the extent of the causal effect on certain diseases
00:01:44
so it might be very cryptic to you i hope you get more later so this is the genome of three people
00:01:51
at every position in the genome we have three billion base pairs three point two billion
00:01:55
roughly one in every thousand base pairs is changing across people in the population
00:01:59
so the rest of the genome the part which is the same for everyone is
00:02:02
pretty boring what we are interested in is the part of the genome which is different
00:02:06
at every location we inherited an allele from our mother and our father so we carry two of
00:02:11
these alleles so most typically we can have three different genotypes at each position so somebody
00:02:18
can be AA at this position AG or GG three different possible genotypes
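
The three genotypes described here are usually turned into a number, the count of one chosen allele, before any statistics are run. A minimal sketch of that coding follows; the function name `allele_count` and the choice of "A" as the counted allele are illustrative assumptions, not something stated in the talk.

```python
def allele_count(genotype: str, counted_allele: str) -> int:
    """Return how many copies (0, 1 or 2) of counted_allele a genotype carries."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    return genotype.count(counted_allele)


# Hypothetical genotypes of four people at one position in the genome.
genotypes = ["AA", "AG", "GG", "GA"]
dosages = [allele_count(g, "A") for g in genotypes]
print(dosages)
```

Coding genotypes as 0, 1 or 2 copies is what lets a trait be regressed on "how many alleles you carry" later on.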
00:02:24
and we want to relate these genotypes to whether carrying more alleles of one kind leads to a certain disease
00:02:30
or certain traits so what we are dealing with very simply is a large amount of genetic data so these are typically
00:02:37
tens or at least millions of markers in the genome that are
00:02:40
variable across the population across people very frequently and
00:02:44
actually nowadays they are measured in hundreds of thousands of people because what we learned is that if you just
00:02:49
have a few thousand this would not be enough to really understand what's going on behind
00:02:54
and i will tell you the reason for it but luckily these things
00:02:57
exist unfortunately not in switzerland because we are lagging at least ten years behind all
00:03:01
the european countries mostly northern european countries i hope we will catch up one
00:03:05
day but if you look around at the UK denmark sweden finland
00:03:10
estonia all these countries are now gathering biobanks of this kind of size
00:03:17
and the key here is not just the genetic data if you have enough money you can do this it is not a problem
00:03:22
the problem here is can you get high quality clinical and environmental
00:03:26
data for these people to be able to really clearly understand diseases
00:03:31
so what do we do once we have the genetic data and some outcome i would
00:03:34
just take for starters body mass index it has been already mentioned
00:03:38
as an important risk factor for arthritis for body mass index we ask what are
00:03:45
the genetic markers in the genome that can predict whether you will become obese or not in your life
00:03:50
and at what age so we take the body mass index of individuals and we ask whether the number of
00:03:57
alleles you carry at a certain position in your genome also increases your body mass index this is a real example
00:04:02
this is a small dataset but that's what we have here about six thousand people
00:04:07
where you have full genome-wide genotyping and this is one particular variant in the gene called FTO
00:04:13
and you can classify the individuals in this cohort in this population by how
00:04:18
many A alleles somebody carries at this very location in the genome
00:04:21
and you can measure their body mass index and if you squint enough you see a slight slope there and what it means is that
00:04:27
every additional allele will increase your weight by a kilogram
00:04:31
on average over your life so this is not very
00:04:36
large maybe for you but this is what's happening and this is the allele with the
00:04:39
largest effect on body mass index and this is typically why it is a complex disease because
00:04:45
such alleles will not have a large effect there will be many many hundreds of alleles
00:04:50
each of them with a really tiny effect we can do this association so this
00:04:54
is just linear regression you want to estimate the slope
00:04:57
how much the trait changes with each additional copy of a
00:05:02
part of the DNA sequence you can do it not just for one marker
00:05:06
but you can do it for tens of millions of markers in your genome and that's what you plot here this is called a manhattan plot
00:05:12
where each dot is one of these ten million markers in the genome
00:05:16
the x axis shows on which chromosome it lies and where it lies within the chromosome
00:05:20
the y axis shows essentially the strength of the association how strongly it is changing the trait
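
The per-marker regression behind such a plot can be sketched on simulated data. Everything below is an illustrative assumption rather than the speaker's pipeline: five markers instead of millions, a made-up effect of 1.0 trait units per allele at the first marker, and a normal approximation for the p-value. The value usually plotted on the y axis is the minus log10 of that p-value.

```python
import math
import random


def slope_and_strength(dosages, trait):
    """Least-squares slope of trait on allele count, plus -log10(p) for that slope."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, trait))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    z = slope / std_err
    # Two-sided p-value under a normal approximation (reasonable for large n).
    p = math.erfc(abs(z) / math.sqrt(2))
    return slope, -math.log10(max(p, 1e-300))


random.seed(1)
n_people, n_markers, allele_freq = 5000, 5, 0.4

# Each person carries 0, 1 or 2 copies of the counted allele at each marker.
markers = [
    [sum(random.random() < allele_freq for _ in range(2)) for _ in range(n_people)]
    for _ in range(n_markers)
]
# Only marker 0 truly changes the simulated trait; the others are pure noise.
trait = [25 + 1.0 * markers[0][i] + random.gauss(0, 4) for i in range(n_people)]

results = [slope_and_strength(m, trait) for m in markers]
for index, (slope, strength) in enumerate(results):
    print(f"marker {index}: slope={slope:.2f}  -log10(p)={strength:.1f}")
```

Plotting the second value against genomic position for every marker gives the skyline shape that the plot is named after.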
00:05:26
and FTO this variant here is really the top one so you can really see this is the
00:05:31
common variant with the largest effect on body mass index okay so that's the start that's
00:05:36
what was done ten years ago how much can we move forward from this
00:05:40
we can run associations for many traits with increasing sample size so this is a study which
00:05:45
we published about two years ago on human height it's a very simple trait
00:05:51
here each dot again is a genetic marker and on the x axis we show
00:05:56
the frequency in the population and on the y axis we show their effect on height
00:06:01
and you see that essentially very frequent markers cannot have a large effect on height because they would have been selected against
00:06:08
and their frequency would be pushed down and these are the ones here these are the rare variants
00:06:13
present in one in a thousand people so probably not one person in this room
00:06:18
but they change height by two centimetres some of them making you taller some making you shorter
00:06:24
so now we have the technology to go down to even one in a thousand people and to make a
00:06:30
collection of all genetic markers what is very astonishing is that if you take all these markers together
00:06:36
they still explain only about thirty percent of height variability it's a small fraction
00:06:41
height is eighty percent heritable but we explain thirty percent instead of eighty percent
00:06:46
and something where we're much more successful is traits that are closer to the molecular level
00:06:54
to the DNA itself for example if you look at your uric acid levels
00:06:58
which is a very important marker too high levels lead to gout if we run
00:07:03
association scans that's how the genome looks in terms of contribution to your uric acid levels
00:07:08
and the strongest association is in GLUT9 also called SLC2A9
00:07:13
this is a very common variant but still with a large effect the single variant especially in
00:07:17
women explains about six percent of the variation of these uric acid levels
00:07:21
so we have many many of these association peaks but these are
00:07:25
common variants what we can do is we can go deep inside
00:07:28
and sequence these genes and try to find rare variants maybe the
00:07:32
rare variants would be more important and indeed what turns out is that
00:07:36
in two of the most important genes that showed up in these association scans
00:07:41
there are rare variants inside them with a much much larger effect on uric acid levels than the common variants
00:07:47
so we see that rare and common variants together contribute to traits we still don't have enough samples actually
00:07:53
to explain everything we would probably need at least a hundred million people collected
00:07:57
and fully genome sequenced to be able to understand everything
00:08:01
but what do we do with these for example in this study of osteoarthritis it's a very impressive sample
00:08:06
size thirty thousand patients have been collected with genome data so here that is not the problem
00:08:11
the problem is not whether we have enough data the problem is
00:08:13
the small effects each individual genetic marker makes a tiny contribution
00:08:18
here the studies are already good enough that we can very accurately estimate
00:08:22
that this is sixteen percent heritable so it means that
00:08:26
with genetics we will only ever be able to explain sixteen percent of who will develop the disease and who will not
00:08:33
and such a large study has still discovered only a few dozen maybe
00:08:38
twenty loci nine of which were new and it was published last year so
00:08:44
so far just by looking at the associations we haven't necessarily learned that much
00:08:47
but what is interesting is that these can be used and i will explain later
00:08:51
how we can estimate effects for example of body mass index how much is
00:08:54
the causal effect of body mass index on osteoarthritis
00:08:58
and it shows that one kilogram per square metre of weight gain which is roughly about three
00:09:03
kilograms for an average person increases by ten percent your odds
00:09:07
of developing the disease and this is really causal so if in theory you could lose weight that's how much you would
00:09:15
decrease your chances of developing the disease what is very interesting in these studies is that if we zoom in
00:09:21
on those manhattan plots and we look at the genomic context of the genes lying near these association signals
00:09:27
there are always many many genes but the signals are not always falling into
00:09:30
genes so we don't necessarily understand straight away which genes are implicated in these diseases
00:09:36
so here i just pulled up a few regions in the genome where we have several gene
00:09:42
candidates and then what you can ask for example what they look at in the study is
00:09:46
which genes among these have different expression levels in patients versus controls so that they can tell us more
00:09:52
these are potentially very interesting biomarkers which might actually be causal preceding the disease
00:09:58
something which is just happening concurrently is not very informative because it will not necessarily tell you
00:10:04
it will not allow you to predict long term ahead and also it will
00:10:07
be probably useless for treatment because it's just a side effect of
00:10:11
the disease or a consequence of the disease itself it's very important to distinguish between
00:10:15
the causes of the disease and the consequences or just something which is correlated with the disease
00:10:21
so here many of the genes that are in these loci discovered
00:10:25
seem to have very significantly different expression levels in patients and controls
00:10:30
what can be done going forward is that we can then look at some of these genes that are emerging from this osteoarthritis study
00:10:36
and then see which drugs are targeting these genes and hope that
00:10:41
maybe this can guide us to which treatment could be useful
00:10:43
and also if i know that my patient has mutations or polymorphisms in these genes that are predisposing
00:10:49
maybe such a drug would be more efficient for that patient i mean this
00:10:53
is not working currently at large scale we probably need to understand a lot more
00:10:57
but there are several very positive examples where it's known and where these discoveries of
00:11:01
the genes led to the discovery of the proper medication
00:11:06
so as you can see many of them are in phase two and phase three trials it's always a big risk
00:11:13
to develop such drugs because we do not know in advance whether it will work or not but we have
00:11:17
some good indications at least for the genes which could be worth targeting
00:11:24
how can we learn a bit more not just about individual genes maybe from
00:11:27
individual genes we can move to drugs if we have a good enough understanding of
00:11:31
the mechanism of action of the drugs which we often lack we often don't know the target of the drug or we just know a few targets
00:11:37
there can be tons of off-target effects which lead to side effects these
00:11:40
also have to be taken into account with these genetic studies
00:11:44
if i know now that the drug is actually targeting another gene which is not the most relevant i can
00:11:48
look up what could be the impact of suppressing the expression level of this gene on certain diseases
00:11:55
so the genetic architecture this is a bit of a complicated slide so maybe it's best if you don't even look at it and just listen to me
00:12:01
so what we can learn with these studies is you can estimate the heritability of the
00:12:04
trait so how much the genome itself the DNA sequence contributes to it
00:12:09
so that's basically looking at this on height and BMI
00:12:12
as model traits and it seems that from
00:12:17
the data we have if we just had more samples we could explain almost
00:12:21
thirty percent of the heritability and for height it is approaching fifty five percent
00:12:25
so it's all getting more promising just with more data so we could build better predictors
00:12:30
you can already build pretty good predictors for height and BMI and many diseases
00:12:34
but still we probably need more data prediction is one thing and understanding biology is another
00:12:40
what we can also estimate i will probably skip this part but you can also estimate the polygenicity
00:12:46
what it means here is what fraction of
00:12:49
the genome is probably involved in the modulation of the disease
00:12:54
as you can see for height it's roughly five percent of the genome
00:12:59
so it means that every twentieth gene in the genome is somehow
00:13:03
taking an active part in determining how tall you will become
00:13:07
while for BMI it's much much larger almost ten percent of your genome is probably playing a
00:13:11
role in determining your BMI so it is extremely polygenic there will not be the gene for anything
00:13:17
if you have a very rare disease one concrete patient may have just a
00:13:21
single mutation in a single gene but as soon as we move to more common diseases
00:13:26
it can be much more complicated and the same thing even for rare diseases they can be caused by many many variants
00:13:33
so we can understand how many genes contribute how much can
00:13:37
they contribute to explain we can also estimate the
00:13:40
correlation between how frequent these markers are and how large their effect is
00:13:45
essentially there's a strong negative correlation between them it means they are strongly under negative selection
00:13:51
and often actually in the clinics the most preferred application would be just let's
00:13:56
predict the disease so we have all the genotypes which are very easy to measure
00:14:01
and this is what happens for a couple of common traits for example this is type 2 diabetes
00:14:06
again a study from last year taking together several hundreds of markers in the genome trying to predict
00:14:12
basically you can group individuals here into forty different bins
00:14:16
based on their genetic risk so this group here is
00:14:19
the highest genetic risk group this is the lowest genetic risk group and then you can go to an external study
00:14:25
and ask what's the type 2 diabetes prevalence in the group with the highest genetic risk
00:14:30
and you can see it is far more than in the low genetic risk group and at least triple compared to the population average so that
00:14:37
can be very informative because we're born with our genome so at birth we could give this information to people we
00:14:42
don't need to wait much longer and we don't need any clinical follow-up at least for the extreme groups
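
The grouping described here can be sketched as a weighted sum of allele counts followed by binning. The sketch below runs on simulated people; the marker weights, the allele frequency of 0.3, the logistic disease model and the use of ten bins instead of forty are all assumptions made for illustration, and `polygenic_score` and `prevalence_by_bin` are hypothetical helper names.

```python
import math
import random
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of allele counts across markers."""
    return sum(d * w for d, w in zip(dosages, weights))


def prevalence_by_bin(scores, is_case, n_bins):
    """Sort people by score, cut into equal bins, return disease prevalence per bin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    prevalences = []
    for b in range(n_bins):
        members = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        prevalences.append(sum(is_case[i] for i in members) / len(members))
    return prevalences


random.seed(2)
n_people, n_markers = 4000, 100
weights = [random.gauss(0, 0.1) for _ in range(n_markers)]  # assumed per-allele effects
people = [
    [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n_markers)]
    for _ in range(n_people)
]
scores = [polygenic_score(person, weights) for person in people]

# Simulated disease status: risk rises with the standardised score (logistic model).
mean_s, sd_s = statistics.mean(scores), statistics.pstdev(scores)
is_case = [random.random() < 1 / (1 + math.exp(2 - (s - mean_s) / sd_s)) for s in scores]

bins = prevalence_by_bin(scores, is_case, 10)
print([round(p, 2) for p in bins])
```

In a real analysis the weights would come from one study and the prevalence check from an independent one, which is the "external study" step mentioned in the talk.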
00:14:48
the same for coronary artery disease if you look at the top percentile
00:14:51
in terms of genetic risk so based on just the DNA sequence
00:14:56
we can tell that it is again a fivefold higher risk compared to the population average so these are
00:15:02
identifying only a subset only one in a hundred people but we can give them valuable information probably
00:15:10
there are more sensitive traits like education level or in this case the GCSE scores
00:15:16
so here again this is a polygenic risk score for educational attainment which also sounds
00:15:22
crazy how can genetics tell you how long you will stay in education
00:15:26
actually it does and genetics already now explains about ten percent of this
00:15:31
but you have to be very careful how to interpret these results you can still group people like the top ten percent based on their genome
00:15:38
and indeed they achieve better in this test but does it mean that these people should have a different education that these
00:15:43
people should have a higher level education and that we should personalise our education system probably absolutely not
00:15:49
these are thoughts that have been put forward in this review which i completely disagree with and the reason for
00:15:55
this is that many of these genetic risk scores that are predicting the disease are not necessarily meaningful biologically
00:16:02
this is a map of the UK and the colour is according
00:16:06
to the risk score the polygenic score for educational attainment
00:16:10
so the darker the red the higher your education will be that is the
00:16:15
genetically predicted education level and if you look at the map these are all clustered
00:16:21
and they're clustered around cities and typically these ones here which are
00:16:27
within a black boundary these are the coal mining areas
00:16:31
so obviously if you build a predictor there is a population stratification going on so the people who are from
00:16:37
north and from south they have slight allele frequency differences they carry
00:16:41
more or less often by chance just by migration
00:16:45
certain alleles and we can predict if you're from here
00:16:50
and you can predict from the genetics very accurately whether you're from here or from there you can build these predictors
00:16:56
and you can tell from your genome where you are from this was shown already ten years ago
00:17:00
so do we predict anything more than actually just where you are from and whether
00:17:04
you have moved away from a coal mine already or you live in the city
00:17:08
actually probably half of this genetic predictor is just predicting where you're from
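
How a marker can "predict" an outcome purely through geography can be shown with a small simulation. In this sketch, which is an assumption-laden toy and not the analysis from the talk, two hypothetical regions differ in allele frequency and, for non-genetic reasons, in average years of education. The marker has no effect on anyone, yet the pooled regression slope is clearly positive while the slopes within each region stay near zero.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def simulate_region(n, allele_freq, mean_years):
    """Allele counts and years of education; the two are independent within a region."""
    dosages = [sum(random.random() < allele_freq for _ in range(2)) for _ in range(n)]
    years = [random.gauss(mean_years, 2.0) for _ in range(n)]
    return dosages, years


random.seed(3)
# Assumed values: region B has both a higher allele frequency and more schooling.
region_a = simulate_region(5000, 0.3, 12.0)
region_b = simulate_region(5000, 0.5, 14.0)

pooled_slope = ols_slope(region_a[0] + region_b[0], region_a[1] + region_b[1])
within_slopes = [ols_slope(*region_a), ols_slope(*region_b)]

print(f"pooled slope: {pooled_slope:.2f}")
print("within-region slopes:", [round(s, 2) for s in within_slopes])
```

The pooled slope here reflects only where people live, which is why such predictors can carry over poorly to other populations.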
00:17:13
so this is just predicting your socioeconomic status this is not
00:17:16
because of biology because these genes would change brain structure these are sort of peripheral genes peripheral polymorphisms which have
00:17:22
nothing to do actually with the mechanism of the disease so it's a huge danger to use this for anything it just helps prejudice
00:17:30
but still these are good predictors in the UK once you move to another country or you want to do the
00:17:36
same on another continent this will not work it will work much much worse
00:17:41
so it is very very dangerous currently to try to apply these polygenic
00:17:46
risk scores and predict on other continents in other nations
00:17:51
the same for height height prediction sounds great and you know that northern europeans are taller than southern europeans there are
00:17:57
probably at least ten thousand alleles that also have a frequency
00:18:02
difference much more frequent in the north than in the south so these alleles will
00:18:06
be good to predict height but they are just predicting where you're from again in
00:18:09
europe and that's been shown if you compare latitude with the polygenic score
00:18:14
actually part of this polygenic score that has been derived for height is predicting again where you're from north or south
00:18:21
not just the biology at work so these are good
00:18:24
predictors we can build stronger and stronger predictors the upper limit is always
00:18:28
the heritability of the given trait which on average for most traits would be around twenty percent
00:18:32
so this is just one part of the picture but it's very easy to measure and you can get
00:18:37
it eventually we'll have many samples but the interpretation of these will still be a different issue
00:18:43
i would like to use the final few minutes to move a bit away from genetics because genetics is
00:18:47
just one small piece of the puzzle of course we are interested in transcriptomics we are interested in metabolomics
00:18:52
which are also going to help us with prediction and to explain whether we respond to therapy or not
00:18:57
and also there are of course environmental factors like physical activity and so on
00:19:02
actually genetics can even help with these ones and this is something called mendelian
00:19:08
randomisation studies so what we do is mimicking a randomised controlled trial
00:19:13
we want to predict if you change something if you change the expression of a gene what impact it will have on the disease
00:19:20
the way we do it is that we know that gene expression is heritable on average it's
00:19:23
again about twenty five percent each gene has about twenty five percent heritability so there are
00:19:29
many known genetic markers that are changing expression levels of genes so what i can do if i want to for example
00:19:36
estimate whether you will develop osteoarthritis
00:19:42
what i can do is i have a genetic marker that i know changes the expression of
00:19:46
the gene i can now split everybody in this auditorium according to whether you
00:19:50
carry this marker or you don't carry it those who carry the marker will have increased expression
00:19:55
from birth we know this from very large studies that have been done on gene expression levels
00:20:00
when i look at these people i will not measure their expression levels because i
00:20:04
know from much larger studies that of course this group will have a higher expression level
00:20:08
what i want to know is what fraction of those people developed a particular
00:20:13
disease like osteoarthritis or obesity or diabetes
00:20:17
and the prevalence difference will tell me the causal effect
00:20:21
of the gene expression level on that outcome
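
The logic described here is often written as a ratio: the marker's effect on the outcome divided by the marker's effect on the exposure. Below is a minimal simulated sketch of that ratio estimate. It uses a continuous outcome rather than disease prevalence, a single marker, and made-up effect sizes, all of which are simplifying assumptions; a hidden confounder is included so that the naive regression of outcome on expression is visibly biased.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


random.seed(4)
n_people = 20000
true_causal_effect = 0.3  # assumed effect of expression on the outcome

marker, expression, outcome = [], [], []
for _ in range(n_people):
    g = sum(random.random() < 0.4 for _ in range(2))   # allele count, fixed from birth
    u = random.gauss(0, 1)                              # unmeasured confounder
    x = 0.5 * g + u + random.gauss(0, 1)                # gene expression
    y = true_causal_effect * x + u + random.gauss(0, 1) # outcome
    marker.append(g)
    expression.append(x)
    outcome.append(y)

# Naive estimate: biased because the confounder drives both expression and outcome.
naive_estimate = ols_slope(expression, outcome)
# Ratio estimate: marker-outcome slope divided by marker-expression slope.
ratio_estimate = ols_slope(marker, outcome) / ols_slope(marker, expression)

print(f"naive: {naive_estimate:.2f}  ratio: {ratio_estimate:.2f}  truth: {true_causal_effect}")
```

In the setting of the talk the marker-expression slope would be taken from a separate large expression study, so expression never has to be measured in the people whose disease status is analysed.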
00:20:25
and it's very important because then i would like to design a drug which is targeting this gene and it will have
00:20:30
the opposite effect and it will suppress the expression of the gene so as to prevent the disease
00:20:36
of course this is an idealistic scenario because each drug will not just target one gene
00:20:41
and it would not be only one gene whose expression is causal for the disease
00:20:44
but we build different statistical models to try to model it and
00:20:49
to find the optimal drugs or at least to prioritise and reposition them
00:20:54
so just to show an example for rheumatoid arthritis we have
00:20:58
these association signals but we can now map the actual causal
00:21:02
link between the genes in the region and we can identify genes that are actually even common
00:21:07
causes involved in rheumatoid arthritis which makes a lot of sense because these are shared with allergy
00:21:12
but we can prioritise those genes and we can now hopefully design treatments for those
00:21:17
and the very last slide is that we can do it for lifespan it was a fun study to do we can try to predict how long you will live
00:21:23
again genetics probably explains ten percent of it what is more interesting is what other factors are
00:21:28
preventing you from living longer not surprisingly if you're developing diseases you live shorter
00:21:33
but thanks to the genetics we can use the genetic markers to estimate the effect of smoking of course smoking is the
00:21:38
biggest killer that will shorten your lifespan the most what extends your lifespan the most is going to university
00:21:43
and again it's probably not because you spend time at university you may believe you wasted at least five years of your life
00:21:49
actually you gain this back because each year you spend at university extends your lifespan by a year
00:21:54
so the years are gained back but we can also use it for body mass
00:21:58
index each kilogram you gain shortens your life by two months
00:22:02
so the genetic studies can now really put a number on these causal
00:22:06
estimates and we will not just get correlates of something but
00:22:10
we will really get the causes and obviously the correlation is always much larger but the causal
00:22:15
effect is somewhat smaller so these studies can be really useful to tease out this causality
00:22:21
so that's all i wanted to say and i would like to thank my group
00:22:23
and very nice collaborators these studies always require several countries and several research groups
00:22:29
to put together all the data we have and work together toward the same goal so it's a
00:22:35
great pleasure to participate in these big
00:22:37
collaborations thanks so much
00:22:38
for your attention
00:22:45
put questions if any
00:22:50
the person who's running the we have moved to ask a question no i'm kidding 'cause it's okay
00:22:58
a question so actually it should not be big data it should
00:23:02
be huge data because there is so much diversity that's yeah
00:23:06
so we need the last thing to be really a hacker rate what
00:23:10
where you can display a double on just changing things on the
00:23:15
two small sample size and then you change the world wheeze new
00:23:18
interface didn't to really make captain in the proper way right
00:23:22
yeah so what is big is always relative it changes over time so currently now we have these amounts
00:23:27
of data which we need to delve into what's important is since we have millions of variables you need
00:23:31
hundreds of thousands of samples to do anything accurate and a good example shown previously was that if
00:23:37
you have a small sample size the risk of overfitting is very large
00:23:40
and you will have too much confidence in fitting data which is similar to yours
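
The overfitting risk with many variables and few samples is easy to reproduce. In the sketch below every predictor is pure noise, so none of them is truly related to the outcome; the sample sizes and the number of predictors are arbitrary assumptions. Picking the best-looking predictor in a small training set still yields a sizeable correlation, which then vanishes in new data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


random.seed(5)
n_train, n_test, n_predictors = 50, 500, 2000

outcome_train = [random.gauss(0, 1) for _ in range(n_train)]
outcome_test = [random.gauss(0, 1) for _ in range(n_test)]

best_train_r, best_test_r = 0.0, 0.0
for _ in range(n_predictors):
    # Each predictor is random noise, unrelated to the outcome by construction.
    x_train = [random.gauss(0, 1) for _ in range(n_train)]
    x_test = [random.gauss(0, 1) for _ in range(n_test)]
    r = pearson(x_train, outcome_train)
    if abs(r) > abs(best_train_r):
        best_train_r = r
        best_test_r = pearson(x_test, outcome_test)

print(f"best predictor, training data: r = {best_train_r:.2f}")
print(f"same predictor, new data:      r = {best_test_r:.2f}")
```

With millions of genetic markers the same selection effect is far stronger, which is one reason why very large samples and independent validation are needed.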
00:23:44
but if it comes from another MRI machine will these prediction methods
00:23:48
or statistical methods work as well that's the question i guess the curve on the
00:23:53
response for example you were talking about BMI well every
00:23:57
loss of one unit of BMI would you say you gain that much life but i guess that
00:24:01
at one point if you lose too much you will lose life again so it's really a U curve i'd say yes
00:24:07
and then it's not like a linear interpretation so the interpretation of these also
00:24:12
have to be very careful the way that just couldn't reactions about the problem
00:24:16
that would just rebounds back and so yeah so we we have the another the power to
00:24:20
actually do the local estimation so you know that for BMI from
00:24:24
twenty to twenty five we know the causal effect and moving down
00:24:27
was indeed it for the the curve like this and it has been just reasonable
00:24:31
you made the point that genetic associations can be misleading and that maybe some
00:24:36
of the genes that are associated with an outcome are maybe not involved
00:24:40
at all but have you excluded nature nurture effects so that the
00:24:44
gene does play a role but only through the interaction with environmental factors
00:24:50
yeah that's a good point i wish i had that slide here well it's a supplementary so
00:24:55
these genetic effects for example for education it has also been shown that one third of these effects are not directly going through you
00:25:01
and changing something for your brain structure or functionality of your brain
00:25:05
but it's actually that you share alleles with your parents and it's your parental care
00:25:10
which also depends on genetics so it's how your parents raised you
00:25:15
basically to encourage you to go to school so one third of the effect is acting indirectly
00:25:20
this is very difficult to find without having parental genotype data or sequence data
00:25:26
and gene environment interactions are also a very important point which my group is actively researching
00:25:31
the difficulty with that is the scale issue if you look at BMI if you have a lot of genetic
00:25:38
exposure so many many genetic risk alleles predisposing you to be obese
00:25:43
and you also have an obesogenic environment
00:25:46
the combination of the two will not have just a linear effect adding one and two it will have an interaction effect an extra burden for these people
00:25:53
but if you change BMI and look at log BMI just transform the trait
00:25:58
most of these interactions disappear it's always scale
00:26:01
dependent what's very important is whether BMI is the important
00:26:05
cause or whether log BMI is something which is better at predicting disease
00:26:10
so without that context it's difficult to tell whether a gene environment interaction is important or not
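
The scale dependence mentioned in this answer can be made concrete with four numbers. The sketch assumes, purely for illustration, that genetic risk and an obesogenic environment each multiply BMI by a fixed factor (1.10 and 1.20, starting from 22); these values are invented. On the raw scale the two exposures together add more than the sum of their separate effects, while on the log scale the extra term is zero.

```python
import math


def interaction_contrast(y00, y10, y01, y11):
    """Difference between the joint effect and the sum of the two separate effects."""
    return y11 - y10 - y01 + y00


# Assumed multiplicative model: each exposure scales BMI by a fixed factor.
baseline, genetic_factor, environment_factor = 22.0, 1.10, 1.20
bmi = {
    (g, e): baseline * genetic_factor ** g * environment_factor ** e
    for g in (0, 1)
    for e in (0, 1)
}

cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
raw_interaction = interaction_contrast(*(bmi[c] for c in cells))
log_interaction = interaction_contrast(*(math.log(bmi[c]) for c in cells))

print(f"interaction on the BMI scale:     {raw_interaction:.2f}")
print(f"interaction on the log BMI scale: {log_interaction:.2f}")
```

Whether the interaction "exists" therefore depends on which scale is taken to be the biologically relevant one, which is the point made in the answer.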
00:26:18
thank you very much
00:26:24
hi no before moving to the lunchtime uh we have
00:26:29
uh and not be tied talk where will the
00:26:32
inlet to uh introduced to so that's a senator from virginia you talk about a ah health
00:26:38
and uh i hope that would use the opener you apply for the


Conference program

Demystification the digital health world
Thomas Hügle
1 Feb. 2019 · 9:41 a.m.
249 views
Digital lessons learned from musculoskeletal radiology
Patrick Omoumi
1 Feb. 2019 · 11:37 a.m.