Transcriptions

Note: this content has been automatically generated.
00:00:00
…the 3D shape of neurons.
00:00:04
And the context is that, having acquired a specimen,
00:00:10
a neural tissue sample from a mouse brain,
00:00:17
a neuroscientist would like to verify a
00:00:21
hypothesis about the propagation of
00:00:25
impulses in the neural tissue.
00:00:30
So basically, a physical experiment can be performed
00:00:33
where we attach electrodes to the specimen,
00:00:37
excite some neurons, and observe the propagation of the impulses.
00:00:43
But we would also like to have a virtual model of this neural network
00:00:50
that we could use in a simulation setup.
00:00:58
Or imagine we want to see
00:01:02
how certain neurodegenerative diseases
00:01:05
influence the topology
00:01:09
of the neural tissue.
00:01:12
Or maybe we simply want to verify the hypothesis that
00:01:18
learning consists in creating new connections between neurons.
00:01:24
So in each of these scenarios, we need to
00:01:29
construct a model of the neural tissue
00:01:32
from 3D observations. Normally, the observation will be a microscopy image:
00:01:37
a 3D stack of images of the neural tissue.
00:01:42
What the image looks
00:01:47
like is depicted on the left,
00:01:52
and the model that would be
00:01:56
of use for simulations takes the form of a graph;
00:01:59
it is overlaid in green on the right side of the slide.
00:02:04
We usually perform such reconstructions in two steps.
00:02:11
First, we have a method for segmenting the
00:02:15
volume into
00:02:17
axons, dendrites, and voxels that belong to neither class.
00:02:23
Then, on top of that, we construct a graph.
00:02:28
The details of this method are not essential here, but in general
00:02:33
it consists in creating an over-complete graph: we subsample
00:02:37
the space, putting graph nodes more or less evenly everywhere,
00:02:42
we connect nodes that lie
00:02:45
within a certain distance of each other with edges,
00:02:49
and there is an optimisation scheme that enables us to
00:02:54
discard the edges that do not represent real neural connections.
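The speaker says the details are not essential, but as a minimal illustrative sketch, assuming the segmentation yields an array of 3D foreground coordinates (the names spacing and radius, the greedy subsampling, and the use of SciPy's cKDTree are assumptions of this sketch, and the final edge-discarding optimisation is only indicated by a comment):

    import numpy as np
    from scipy.spatial import cKDTree

    def overcomplete_graph(points, spacing, radius):
        """Build an over-complete graph over segmented foreground points:
        subsample nodes roughly evenly, then connect all pairs of nodes
        that lie within `radius` of each other."""
        nodes = []
        for p in points:  # greedy subsampling: keep points at least `spacing` apart
            if all(np.linalg.norm(p - q) >= spacing for q in nodes):
                nodes.append(p)
        nodes = np.asarray(nodes)
        # Edges between all node pairs closer than `radius`; a subsequent
        # optimisation (not shown) discards the edges that do not
        # correspond to real neural connections.
        edges = sorted(cKDTree(nodes).query_pairs(r=radius))
        return nodes, edges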
00:03:00
This talk is focused on the segmentation part of this pipeline,
00:03:04
because there are some interesting problems to be solved there.
00:03:09
Most often, the segmentation is performed
00:03:14
by a convolutional neural network.
00:03:18
The disadvantage of this approach is
00:03:22
that to get a highly
00:03:25
performant network, we need a lot of training data,
00:03:30
and the data needs to come with annotations.
00:03:35
These annotations are of three-dimensional volumes,
00:03:40
so as you can expect they are costly to produce
00:03:43
first because annotating in three dimensions is
00:03:49
time-consuming, and second because
00:03:53
only trained experts can actually
00:03:57
perform this annotation correctly.
00:04:03
We were thinking about how to address this problem, and in this context
00:04:10
we were looking at a software interface designed
00:04:13
for annotating these dendrites and axons.
00:04:18
Of course, it is a computer interface, so what the user sees
00:04:21
on the screen is basically a 2D image.
00:04:25
The way these images are created is called a maximum intensity projection:
00:04:31
basically, you take your volume,
00:04:36
the user has the opportunity to rotate it around the three coordinate axes,
00:04:43
and what appears on the screen is an
00:04:47
image that is constructed by taking the maximum
00:04:51
along each ray that crosses the volume and is perpendicular to the computer screen.
00:04:57
The pixel from which this
00:05:02
ray has been shot contains the maximum value that the ray crosses.
00:05:08
Because these images have the nice property that the
00:05:11
structures of interest are brighter than the background,
00:05:14
you get to see all the important connections in an image like that.
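As a minimal sketch, assuming the volume is stored as a 3D NumPy array and the viewing direction is aligned with a coordinate axis, the projection reduces to a per-ray maximum:

    import numpy as np

    def max_intensity_projection(volume, axis=0):
        """For each pixel of the output image, keep the maximum value
        along the ray that crosses the volume perpendicular to the
        screen (here, along one coordinate axis)."""
        return volume.max(axis=axis)

    volume = np.random.rand(64, 64, 64)  # stand-in for an image stack
    mips = [max_intensity_projection(volume, a) for a in range(3)]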
00:05:20
Now, to annotate such a volume, of course,
00:05:24
you need to click at the 2D coordinates on the
00:05:29
computer screen, and one of two things will happen:
00:05:32
either you adjust the depth manually,
00:05:37
because there is a depth coordinate that the
00:05:40
position of your mouse does not correspond to,
00:05:43
or the depth will be selected automatically. In the first case, the process is
00:05:51
time-consuming, and it is really not easy to navigate a 3D
00:05:55
volume using a 2D interface; in the second case,
00:05:58
the depth selection is often based on heuristics that are prone to failure.
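One plausible heuristic of this kind (an assumption of this sketch, not necessarily what the actual interface does) is to place the click at the depth where the ray through the clicked pixel is brightest:

    import numpy as np

    def auto_depth(volume, y, x):
        """Guess the depth of a click at pixel (y, x) of the projection
        along axis 0: pick the depth of maximum intensity on the ray.
        Bright clutter elsewhere on the ray makes this fail."""
        return int(np.argmax(volume[:, y, x]))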
00:06:06
So the question we asked ourselves is: instead
00:06:11
of collecting these 3D annotations, which are costly,
00:06:18
can't we instead use annotations of
00:06:24
the image that the user sees on the screen to train the neural network?
00:06:28
So basically, let's say we take the three maximum intensity projections depicted in the
00:06:35
top left corner of the slide, we annotate
00:06:38
them, and we use them to train our network.
00:06:43
This annotation is a lot less
00:06:48
costly and hopefully more practical. The problem now consists
00:06:55
in formulating a loss function that can accommodate, on one side, the
00:07:01
volumetric prediction and, on the other side, annotations of its projections.
00:07:07
That can be addressed in a very simple way: basically, we can project
00:07:14
the volumetric prediction in the same way as the input
00:07:19
images were projected for annotation, and compare the projections.
00:07:28
And voilà: the projection of the prediction is performed, as I said, according to the same
00:07:34
principle as the projection of the volume when it is
00:07:38
annotated. In consequence, the
00:07:43
loss function has the following property: there is
00:07:49
one cost computed for each pixel of
00:07:54
each of the projections, and if a pixel is labelled as background,
00:08:00
the loss actually depends on the largest element
00:08:05
of the column, row, or tube of the prediction that
00:08:12
projects to this pixel. The nice property here
00:08:18
is that for a background pixel, we actually want to
00:08:25
minimise the largest value in the mentioned column,
00:08:30
and that corresponds to minimising an upper bound on the whole column.
00:08:36
So you can say that a background pixel is actually constraining
00:08:41
a whole row or column of voxels in the prediction.
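A minimal sketch of such a loss in PyTorch, assuming the network outputs per-voxel foreground probabilities, the annotation is a binary mask of an axis-aligned maximum intensity projection, and the projections are compared with cross-entropy (the function name and the cross-entropy choice are assumptions of this sketch, not necessarily the exact formulation from the talk):

    import torch
    import torch.nn.functional as F

    def projected_loss(pred_volume, annotation_2d, axis):
        """pred_volume: (D, H, W) tensor of foreground probabilities.
        annotation_2d: {0, 1} annotation of the maximum intensity
        projection of the input, taken along `axis`."""
        # Project the prediction the same way the annotated image was
        # projected: keep the maximum along each ray.
        pred_2d = pred_volume.max(dim=axis).values
        # At a background pixel this penalises the largest value of the
        # corresponding column of voxels, i.e. an upper bound on the
        # whole column; at a foreground pixel it pulls the maximum up.
        return F.binary_cross_entropy(pred_2d, annotation_2d.float())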
00:08:48
And there is a connection between this and
00:08:52
a classical method of 3D reconstruction called space carving,
00:08:56
where we have a number of cameras,
00:09:00
we segment the images of the scene into foreground and background
00:09:04
and then the reconstruction process consists, in abstract terms, of shooting
00:09:10
rays from each of the cameras; if a ray passes through
00:09:15
a background pixel, we remove all the voxels the ray has passed through.
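A toy carving step with axis-aligned orthographic "cameras" (an assumption of this sketch; real space carving uses calibrated perspective cameras) could look like this:

    import numpy as np

    def carve(occupancy, foreground_mask, axis):
        """One carving step: every ray that passes through a background
        pixel of this view has all of its voxels removed from the boolean
        occupancy grid; rays through foreground pixels are left alone."""
        return occupancy & np.expand_dims(foreground_mask, axis=axis)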
00:09:23
So this is the basic concept of
00:09:27
this loss function. However, it has one problem.
00:09:33
Typically, the volumes that we would like to annotate should be large,
00:09:39
because this is more time-efficient, and also
00:09:42
the topology of the neural network is
00:09:46
better seen in large volumes. However, normally,
00:09:52
when training a neural network, we are only able to
00:09:55
forward through the network a
00:10:01
cube of limited size, due to a constraint
00:10:06
on memory size. Now imagine that we have annotated the volume
00:10:11
presented in the slide, but we can only
00:10:15
crop and forward through the neural network
00:10:19
the cube marked in red.
00:10:23
The problem is that, even though we can crop
00:10:30
the maximum intensity projection and the corresponding annotations to the corresponding size,
00:10:37
the crop will still contain images
00:10:43
of structures that come from
00:10:49
outside of the cropped volume. So here you
00:10:53
see these bright dendrites:
00:10:56
even though they lie outside of this volume, they still appear in the projection.
00:11:02
And the same applies to the annotations, of course; they will be affected too. So this is an
00:11:08
annotation that does not correspond to our training volume. We found out that you can actually address this problem,
00:11:15
at least partially, again by drawing from the field of
00:11:21
3D reconstruction and space carving, and using a
00:11:25
construct called the visual hull.
00:11:34
I'm afraid this is not really well visible on the screen, but
00:11:40
at the top left you have a volume with three
00:11:46
foreground voxels. Then in the middle there is
00:11:50
an annotation; there are three, yeah,
00:11:54
the foreground voxels are in the corners of the cube, but it is completely invisible on the screen, I am sorry for that.
00:12:00
And then there are the projections of the cube, with annotations of the projections;
00:12:06
so basically, each of these voxels is visible in each of the projections.
00:12:14
Thank you very much.
00:12:19
The visual hull is basically an intersection of
00:12:24
these annotations, back-projected into 3D.
00:12:28
As you see, it contains all the original
00:12:32
foreground voxels, but it can also contain additional voxels; it is a superset of all the
00:12:39
volumes that could explain these annotations. But it has a nice property:
00:12:46
if a voxel is marked as foreground in the visual hull, it
00:12:51
means it has been marked as foreground in all the annotations.
00:12:58
We can use this property by observing that if we just cut out half
00:13:02
of this cube, the left half with just one foreground voxel,
00:13:06
and we crop the annotations accordingly, we will have
00:13:11
two annotations with just a single foreground voxel marked,
00:13:17
and a third one with all three of them marked. However, we can
00:13:21
construct a visual hull from the cropped
00:13:27
annotations, obtaining only one foreground voxel, because the
00:13:33
other two are not consistent in 3D
00:13:37
with the annotation crops. That enables us to
00:13:41
prune some false positive annotations.
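A sketch of this pruning, under the same axis-aligned assumptions as the earlier sketches, for a crop of shape (D, H, W) with boolean annotation masks of its three projections:

    import numpy as np

    def prune_annotations(m0, m1, m2):
        """m0, m1, m2: annotations of the projections along axes 0, 1, 2,
        with shapes (H, W), (D, W) and (D, H) respectively."""
        # Visual hull of the crop: a voxel survives only if it is marked
        # foreground in all three annotations (intersection of the
        # back-projected masks).
        hull = m0[None, :, :] & m1[:, None, :] & m2[:, :, None]
        # Re-projecting the hull keeps only annotated pixels that some
        # hull voxel actually projects to; pixels caused by structures
        # outside the crop are pruned as inconsistent.
        return hull.any(axis=0), hull.any(axis=1), hull.any(axis=2)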
00:13:45
However (and this is a separate
00:13:48
image), in cases where
00:13:53
occlusions cause a voxel that does not
00:13:58
exist in reality to be retained in the visual hull,
00:14:03
we cannot really recover
00:14:07
the correct annotations.
00:14:15
So, we have addressed the problems presented so far,
00:14:20
but now the basic question to ask is whether
00:14:25
it is really better to annotate in 2D than in 3D.
00:14:32
To shed light on the answer to this question, we ran a
00:14:38
small user study with fifteen users, who were asked to annotate
00:14:42
microscopy volumes, both in 2D and in 3D.
00:14:49
You see the results in this plot: the time of the 3D annotation is on the
00:14:53
x axis, and the time of the 2D annotation on the y axis,
00:14:57
and each point corresponds to a volume which has been annotated once in 2D and once in 3D,
00:15:04
in random order and by different people. You see that, on average,
00:15:09
annotating in 2D tends to be faster than annotating in 3D.
00:15:15
It does not hold for every volume, but overall, 2D annotation tends to be faster than 3D annotation.
00:15:22
In total, if you sum the annotation times, it took eight
00:15:26
working hours to annotate all
00:15:31
the volumes in 3D, and five hours to annotate them in 2D.
00:15:39
To sum up what the users said: they
00:15:43
liked annotating in 2D; however,
00:15:48
to some of the users, the 3D
00:15:52
view seemed to be useful for disambiguating
00:15:55
whether the pixels they were looking at actually belong to a neurite or are just noise.
00:16:00
Aside from that, the other conclusion of the study was that
00:16:06
a very important factor is the interface, which should be
00:16:11
intuitive, simple, and fast; basically, that is the game changer.
00:16:18
And we evaluated the method
00:16:21
experimentally, just to verify whether
00:16:25
the supervision, which is
00:16:31
less strict in terms of the
00:16:33
consistency with the projections, leads to results
00:16:38
that are much worse than the
00:16:41
performance of networks trained with
00:16:46
true 3D annotations. And indeed, on a dataset of
00:16:53
confocal... no, actually, two-photon microscopy images of
00:16:59
a mouse brain, we got even better results when training on the
00:17:05
maximum intensity projection annotations than
00:17:09
when training on the true 3D annotations. Why this is
00:17:13
so is not completely clear to me, but I guess
00:17:17
we should not draw far-reaching conclusions from a single experiment, but we
00:17:21
can safely say that the method performs well. What is also interesting is
00:17:26
that the performance does not degrade catastrophically when the number of annotated
00:17:31
projections is decreased.
00:17:45
Yes, yes, so I mean,
00:17:52
this is an important point: indeed, the 2D
00:17:56
annotations here were obtained by projecting the 3D annotations,
00:18:01
which is in some sense cheating, because when one is
00:18:06
annotating in 2D, one would get a bit different annotations.
00:18:10
However, since evaluation is in 3D, according
00:18:15
to the annotations that come with the dataset,
00:18:18
it somehow makes sense to consistently use the same annotations in 2D.
00:18:23
However, for this dataset we
00:18:26
also ran an experiment where we annotated the projections again,
00:18:31
without looking at the original 3D annotations, as well as we could,
00:18:36
and the performance dropped by one point five
00:18:41
IoU percentage points, so not much.
00:18:44
As for the baselines used here: actually,
00:18:48
the basic baseline is annotating slices
00:18:52
of the volume, because you can argue that you can put as much effort
00:18:55
into annotating slices as into annotating projections, and maybe you will be
00:19:00
as good or even better. And the answer is that you are slightly better if you annotate projections than
00:19:06
if you annotate slices. And these two are existing methods that
00:19:12
I will not describe in detail. We did the same series
00:19:17
of experiments with a dataset of confocal microscopy images of
00:19:24
cells, and again, the network trained
00:19:28
on projection annotations performs not much worse than the one trained on 3D annotations.
00:19:34
And we did the same for a dataset of
00:19:38
magnetic resonance angiography images of brain vasculature,
00:19:43
with the same conclusion: the performance of a network trained on projection
00:19:49
annotations is acceptable. To conclude the whole presentation,
00:19:56
I would say that we managed to considerably lower
00:20:02
the 3D annotation effort
00:20:07
without really compromising performance.
00:20:13
um and as a commentary without but uh the the the this method
00:20:19
is actually quite unique, in that it can only be applied to data which
00:20:23
shows well in maximum intensity projections,
00:20:28
and which is sparse enough that
00:20:30
you can use the property that a background pixel constrains a whole column.
