
Transcriptions

Note: this content has been automatically generated.
00:00:00
Okay, welcome everybody. I'm going to say some words about e-therapy.
00:00:07
Tonight I'll present two examples of disorders, then say something about
00:00:15
speech technology, go on to some aspects of designing e-therapy,
00:00:21
and then summarise. Of course I can't cover everything, and I'm very thankful for the talk
00:00:27
right before, which said a lot about therapy; the question here is: how can speech technology help?
00:00:34
So, just for clarification: we have
00:00:41
planning, respiration, phonation and articulation. When we talk about dysfunction
00:00:45
of the neurological process, we talk about language
00:00:47
disorders; when we talk about dysfunction of the phonation, that is a voice disorder;
00:00:53
and finally, for dysfunction of the articulation, we speak about speech disorders.
00:00:58
Now, I picked two different examples, so to speak
00:01:03
on the different ends of these disorders.
00:01:07
The first one is sigmatism. It's a
00:01:12
phonetic disorder, a mispronunciation of the /s/, and the
00:01:15
intelligibility is not very strongly compromised; it's
00:01:20
still normal until the age of five,
00:01:24
or during the second dentition, and it comes from wrong positioning
00:01:28
of the tip of the tongue and the teeth.
00:01:34
So that would be the right way to pronounce the German word "Zebra", and you have it in the dental
00:01:43
and lateral variants.
00:01:44
So,
00:01:46
the thing is: what can e-therapy even do in this situation? Of course, what we need is,
00:01:52
we would tell the person to say a certain word, so that
00:02:00
Oh,
00:02:03
I'm sorry, I took that out before. Well, let's give it a try:
00:02:15
(plays audio examples) so yeah, you hear the slight differences. So basically,
00:02:25
what would the e-therapy do? Well, first of all, we
00:02:28
need to have the person say the word: we tell them, say that
00:02:31
word, and the person says the word, so you need a speech recogniser;
00:02:35
and the speech recogniser already knows what was supposed to be said, so you need the
00:02:41
time alignment; and then you need to go into the place where you say: this is where I'm going
00:02:47
to look, and ask: is the /s/ there, is there an improvement, is that correct, and so on.
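The pipeline just described (recognise the prompted word, time-align it, then inspect the target phone) can be sketched as follows. This is a minimal sketch: the alignment tuple format, the word "Zebra", the scores and the 0.5 threshold are illustrative assumptions, not the speaker's actual system.

```python
# Sketch of the e-therapy check described above: the recogniser already
# knows the prompted word, a forced alignment locates each phone, and we
# inspect only the phone under therapy. The alignment format
# (phone, start_sec, end_sec, acoustic_score) and the 0.5 threshold are
# illustrative assumptions.

def check_target_phone(alignment, target="s", threshold=0.5):
    """Return (found, ok): was the target phone realised, and well enough?"""
    for phone, start, end, score in alignment:
        if phone == target:
            return True, score >= threshold
    return False, False

# Hypothetical alignment of the prompted word "Zebra" /ts e b r a/,
# as a forced aligner might emit it.
alignment = [
    ("ts", 0.00, 0.12, 0.81),
    ("e",  0.12, 0.25, 0.92),
    ("b",  0.25, 0.31, 0.88),
    ("r",  0.31, 0.40, 0.74),
    ("a",  0.40, 0.55, 0.95),
]
found, ok = check_target_phone(alignment, target="ts")
print(found, ok)
```

In a real system the alignment would come from the recogniser's forced-alignment pass against the known prompt, and the score threshold would be tuned per exercise.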
00:02:53
Now let me look at another, very different one:
00:02:59
an example of a different disorder. (Plays a speech sample of a patient.)
00:03:38
What can speech recognition do here? It's a very,
00:03:42
very much more difficult problem, because you don't really ask whether it was pronounced
00:03:49
correctly, whether the articulation was in order; the problem is: did he say a word
00:03:53
at all? And that gives you, as a speech recognition system, the problem:
00:03:58
did he say a word out of my lexicon, or is it something out of vocabulary?
00:04:03
I need a very different approach to the technology if I want to
00:04:07
use that for a patient with something like that.
00:04:12
And, you know, depending on the disorders
00:04:16
that we have (voice, articulation, semantics, morphology, pragmatics),
00:04:21
some of them might be affected and others not, and that has
00:04:26
an impact on the speech technology that you're going to use for the therapy in that case.
00:04:31
Because in some cases it suffices to say: okay, I'm going to
00:04:37
tell you, this is the exercise that you have to do, repeat these words.
00:04:43
And in other situations it doesn't matter so much, for instance if you want
00:04:47
to do therapy on things like memory skills or so;
00:04:51
in that case, memory-skill exercises place a very different requirement on word recognition.
00:05:00
And if we look at what speech technology can do in the medical domain:
00:05:06
well, for diagnosis, "how intelligible is the patient" would be a holistic impression.
00:05:13
Okay, if I just want to find out how intelligible, then I need an evaluation measure; if
00:05:18
I want to find out whether, say, the palate is paralysed, that's a very distinct task,
00:05:23
and in that case I very often need a different measure. But
00:05:27
for intelligibility we have shown, and many groups have shown, that
00:05:31
a speech recogniser can replace the human listener, the naive listener: you just give
00:05:39
the person a text, he reads it, and you just see how many words were recognised correctly,
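The measure just described (have the patient read a known text and count how many words the recogniser gets right) can be sketched with a standard word-level Levenshtein alignment; the example sentences are invented, using the familiar "North Wind and the Sun" opening mentioned later in the talk.

```python
# Sketch of the intelligibility measure described above: the patient
# reads a known text, a recogniser transcribes it, and we count how many
# reference words come out correctly via a word-level edit distance.

def word_accuracy(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return 1.0 - d[len(ref)][len(hyp)] / len(ref)

ref = "the north wind and the sun were disputing"
hyp = "the north wind on the sun were disputing"   # recogniser output
print(word_accuracy(ref, hyp))
```

A higher word accuracy of the naive recogniser then stands in for higher intelligibility of the speaker.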
00:05:42
and it's pretty good at that. Now, if
00:05:46
you want to analyse, say, nasality, for that aspect you might need
00:05:50
not only "can I recognise it, how intelligible is it"; I also need phonological features, I need
00:05:57
features that give me phonological realisations, trained on plus/minus nasal.
00:06:03
Okay, so that's the diagnosis. Then there is therapy
00:06:09
control: has the situation of the patient improved during therapy?
00:06:13
Well, in some way that's the same thing as diagnosis, except now I need a user model, where you
00:06:21
say: okay, you started here, and this is where he's going. But otherwise,
00:06:25
apart from that, it is the same thing as diagnosis:
00:06:29
you diagnose the person at a
00:06:32
certain moment, and then three weeks later, and another three weeks later,
00:06:37
with the same thing, and then you have the user model to say: okay, now I can detect a change.
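The incremental user model just described can be sketched as a per-patient session history with a change test on top. The scores and the simple "two standard deviations from baseline" criterion are illustrative assumptions, not the speaker's actual model.

```python
# Sketch of the incremental user model: store one evaluation score per
# session (e.g. every three weeks, same test) and report whether the
# latest score has moved away from the starting point.

class UserModel:
    def __init__(self, patient_id):
        self.patient_id = patient_id
        self.scores = []                     # one evaluation score per session

    def add_session(self, score):
        self.scores.append(score)

    def change_detected(self, k=2.0):
        """Has the latest score moved k baseline-std away from session one?"""
        if len(self.scores) < 3:
            return False
        baseline = self.scores[0]
        spread = _std(self.scores[:-1])
        return abs(self.scores[-1] - baseline) > k * max(spread, 1e-9)

def _std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Hypothetical intelligibility scores from four sessions of one patient.
um = UserModel("patient-001")
for score in [55.0, 56.0, 54.5, 63.0]:
    um.add_session(score)
print(um.change_detected())
```

The same structure serves therapy control (one patient over time) and, with two groups of such histories, comparing therapy methods.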
00:06:44
Of course, when you have different therapy methods, you can say which therapy
00:06:48
method leads to the best result for a group of patients, which is
00:06:52
again, in some way, therapy control, just now with two groups.
00:06:58
Monitoring: is there a change in the patient's situation? In a way, the same
00:07:04
underlying technology; you just need to tweak the parameters, because
00:07:08
here you're looking for mild, for minute changes, whereas in the therapy you're looking for "okay, I can see a strong change".
00:07:16
Screening: does the person show early signs of, for instance, a depression, yes or no?
00:07:22
Screening is a little bit of an outlier here, because
00:07:25
here you look much more at controls versus patients. And
00:07:30
finally, that's what we're talking about here: computer-assisted therapy.
00:07:33
Did the patient perform the exercise correctly? Of course that has a lot to do
00:07:39
with diagnosis, and people call the underlying technologies the same, but
00:07:44
what is new here, what is really different, is that the patient matters much more,
00:07:50
and what you say to the patient matters: you know, I
00:07:57
have to encourage the person to keep on going, with gaming effects or so, you know;
00:08:02
the motivation is something which is much more important, and out of
00:08:06
that we see that the graphical user interface is
00:08:10
much more important than in a situation of, say, therapy control.
00:08:15
In therapy control I have to graphically show
00:08:20
the therapist: okay, you know, the person was here, now he is
00:08:25
here; with the patient, that's on a whole different level.
00:08:32
So let's say a little bit about speech technology.
00:08:35
I will address word and phoneme recognition,
00:08:40
acoustic speaker modelling, prosodic analysis and visualisation,
00:08:44
and I will say a little bit here and there; you can vary the technology: I can
00:08:49
swap this net for that classifier, it doesn't really
00:08:54
matter that much; I mean, the idea is the underlying approach.
00:09:00
And I like this figure, which is from
00:09:06
Michael Picheny, who was at IBM:
00:09:10
how, from 1992, when Switchboard was
00:09:13
introduced, over thirty years the recognition error
00:09:18
changed. Now, this is a recognition problem that is,
00:09:23
I'd say, well known; it's difficult, and the recognition rate is still far removed from the
00:09:30
medical domain, but still. And granted that this is a logarithmic scale:
00:09:36
what we see is, we started thirty years ago with about eighty percent error,
00:09:41
and then it improved fast with adaptation of the hidden
00:09:46
Markov models and improved hidden Markov model training, and it got down to about
00:09:52
twelve percent; and now, in the last three or four years, the impact of deep learning really got
00:09:58
us down to five percent. And granted, the recognition rate on other
00:10:04
tasks, that we're not as familiar with, that weren't worked on that hard for thirty years, where not as
00:10:10
many people have worked, might be a little bit worse;
00:10:16
but on other tasks too it did bring us closer to human performance. So
00:10:21
what we have right now is technology that really is close to
00:10:26
human performance, and that we can start really thinking about using.
00:10:32
And, you know, for speech technology, when you have a speech or phoneme recogniser:
00:10:36
there are end-to-end systems by now, but still the majority is hidden-Markov-
00:10:41
model-based off-the-shelf technology; you have hidden Markov models,
00:10:47
and in the medical domain it might be helpful to be able to adapt
00:10:54
them with a small amount of acoustic and language-model training data;
00:11:01
and typically you take mel-cepstrum coefficients and energy and their derivatives.
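The front end just named (mel-cepstrum coefficients plus energy and their derivatives) can be sketched in plain NumPy; all parameter values here (25 ms windows, 10 ms shift, 24 filters, 12 cepstra) are common defaults, not the speaker's configuration.

```python
import numpy as np

# Sketch of a standard MFCC front end: framing, power spectrum, mel
# filterbank, log, DCT, plus log-energy and first-order deltas.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_with_deltas(signal, sr=16000, n_filt=24, n_ceps=12):
    n_win, n_hop, n_fft = int(0.025 * sr), int(0.010 * sr), 512
    frames = np.array([signal[s:s + n_win] * np.hamming(n_win)
                       for s in range(0, len(signal) - n_win + 1, n_hop)])
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2            # power spectrum
    # Triangular mel filterbank between 0 Hz and Nyquist
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II decorrelates the filterbank outputs -> cepstral coefficients
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), 2 * n + 1) / (2 * n_filt))
    ceps = logmel @ dct.T
    energy = np.log(spec.sum(axis=1) + 1e-10)[:, None]
    feats = np.hstack([ceps, energy])
    # First-order derivatives ("deltas"), zero-padded at the start
    delta = np.vstack([np.zeros((1, feats.shape[1])), np.diff(feats, axis=0)])
    return np.hstack([feats, delta])

t = np.arange(16000) / 16000.0
feats = mfcc_with_deltas(np.sin(2 * np.pi * 220 * t))         # 1 s test tone
print(feats.shape)
```

Each 10 ms frame thus becomes a 26-dimensional vector (12 cepstra + energy, each with a delta), which is the kind of observation stream the hidden Markov models above consume.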
00:11:07
But the good thing is, there's really good word recognition out there that you can use.
00:11:16
For example, one of our partners, the EML, the
00:11:20
European Media Lab: if you really want to use a recogniser,
00:11:24
some of you PhD students might think about spending some time there; they
00:11:29
will allow you to use a speech recognition system and to adapt the language model.
00:11:37
Well,
00:11:39
the possibilities are: you either train
00:11:44
your own, or you use an off-the-shelf
00:11:46
recogniser, and you might want, or have, to do some adaptation of the acoustic modelling.
00:11:54
If in the therapy you need to learn to do some of the acoustic modelling, it might be
00:12:01
beneficial to have access to the acoustic model, which in this case might mean it is good to
00:12:09
train a standard recogniser, using Kaldi, and then take some
00:12:15
of your pathologic speech samples and adapt with those, which
00:12:21
means that your speech recogniser now goes from a naive listener,
00:12:29
who is trained on a human speech model, for instance for German or for Dutch,
00:12:36
to a more expert listener, who will be trained like
00:12:40
people who already have heard some of the pathology that you're looking for.
00:12:46
And typically you use standard speaker adaptation for that.
00:12:52
Now, language modelling depends on the kind of spoken text. If you have a known text,
00:12:58
you might want to restrict the vocabulary, a restricted vocabulary, because
00:13:04
you know he's always reading "The North Wind and the Sun",
00:13:07
you know he's saying isolated words, you know he's always reading sentences;
00:13:10
so you might as well restrict your vocabulary
00:13:19
to the words of that test and still see how good the recognition is.
00:13:24
If you have an unknown text, on the other hand, you can
00:13:31
replace one of the human raters by a medium-size-vocabulary
00:13:36
recognition system in that language; a medium size will do the job.
00:13:42
You might, in that case, just use a very stupid language
00:13:46
model, because the language model tells you what sequence of words
00:13:51
comes in what order; and in that case, if you're looking at
00:13:55
the acoustics, you want to put the emphasis on how he realises the sentence
00:14:00
if I give him the sentence to read. So you might switch the language model.
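The "stupid language model" point can be sketched like this: a recogniser scores each candidate as acoustic log-score plus language-model log-probability, so a uniform LM over a restricted vocabulary makes the acoustics alone decide. The candidate words and all scores are invented for illustration.

```python
from math import log

# Sketch: decoding as argmax of acoustic + language-model log-score.
# With a uniform LM the LM term is constant, so only the acoustics count.

def decode(candidates, acoustic, lm):
    """Pick the candidate maximising acoustic + LM log-score."""
    return max(candidates, key=lambda w: acoustic[w] + log(lm[w]))

vocab = ["zebra", "sebra", "thebra"]          # tiny restricted vocabulary
acoustic = {"zebra": -4.0, "sebra": -3.5, "thebra": -6.0}   # invented scores

uniform_lm = {w: 1.0 / len(vocab) for w in vocab}
print(decode(vocab, acoustic, uniform_lm))    # acoustics alone decide

biased_lm = {"zebra": 0.98, "sebra": 0.01, "thebra": 0.01}
print(decode(vocab, acoustic, biased_lm))     # a strong LM can override
```

This is exactly why, when you care about how the sentence was realised acoustically, you weaken the language model rather than let it paper over the patient's deviations.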
00:14:06
If, on the other hand, you want to see how a person with dementia is saying
00:14:11
these words, of course the language model will be very important, and you can't do that. So spontaneous speech
00:14:17
means basically you want to see how the person produces the speech:
00:14:23
what are the words, what is the underlying structure,
00:14:26
what is the underlying semantics; in that case you want to be as
00:14:31
error-free as possible and then do an analysis based on the transcription.
00:14:37
So, based on what you do in your therapy, it has an impact on
00:14:43
how we're going to design the phoneme or word recognition part.
00:14:50
Now,
00:14:52
another thing is that you might not want to go and use word recognition, but rather look at the
00:14:59
acoustics directly. In this case you model the acoustic space of speakers: you can
00:15:07
model an acoustic space, where I have all the people who speak English, then I
00:15:12
have some non-natives, some women, some men; I can model the variation.
00:15:17
So the space represents the multidimensional characteristics of voice and speech,
00:15:24
and the degree of pathology varies in this acoustic space; it's one dimension in the acoustics.
00:15:31
And if I keep certain other ones constant, so I only look at
00:15:36
young people, and I only look at, say, young controls and others,
00:15:43
then I can say: okay, if they
00:15:45
all say the same thing, then one of the variations in the acoustic space will be according to the pathology,
00:15:52
and then i want to try to find characteristics of the degree of the speech or speeches or
00:15:58
and so as i said you know one when i hear it is taken from speaker recognition systems
00:16:05
and and you can replace now the u. b. m. with the
00:16:09
neural network but basically they all do the same idea
00:16:13
they say i'm gonna model the space and i'm gonna model the deviation
00:16:17
whether that is an eye vector or a change in a universal background model it doesn't matter you know
00:16:24
um so you you one of the acoustics for instance back out
00:16:28
mixture model you train a universal background model with normal speakers
00:16:33
your training gaussian mixture model with a have a lot six pieces and then you transform
00:16:40
this model into a vector and then you do a classification or regression
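The GMM-UBM supervector idea just outlined can be sketched with scikit-learn and synthetic data: train a UBM on "normal" frames, MAP-adapt the mixture means to each speaker, and stack the adapted means into one fixed-dimensional vector. The relevance factor, mixture count and data are conventional or invented choices, not the speaker's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch of the GMM-UBM supervector: UBM on normal speakers, relevance-
# MAP adaptation of the means per speaker, means stacked into a vector.

rng = np.random.default_rng(0)

def map_adapt_means(ubm, X, r=16.0):
    """Relevance-MAP adaptation of the UBM means to speaker data X."""
    post = ubm.predict_proba(X)              # responsibilities, (T, K)
    n_k = post.sum(axis=0)                   # soft counts per mixture
    ex_k = post.T @ X                        # first-order statistics
    alpha = (n_k / (n_k + r))[:, None]       # adaptation weights
    mu_hat = ex_k / np.maximum(n_k, 1e-8)[:, None]
    return alpha * mu_hat + (1 - alpha) * ubm.means_

def supervector(ubm, X):
    return map_adapt_means(ubm, X).ravel()   # stack adapted means

# "Normal" frames: the UBM training pool (synthetic 5-dim features).
normal = rng.normal(0.0, 1.0, size=(2000, 5))
ubm = GaussianMixture(n_components=8, random_state=0).fit(normal)

# One healthy and one "pathological" speaker (shifted acoustics).
healthy = rng.normal(0.0, 1.0, size=(300, 5))
patho = rng.normal(1.5, 1.0, size=(300, 5))

sv_h, sv_p = supervector(ubm, healthy), supervector(ubm, patho)
print(sv_h.shape)                            # 8 mixtures x 5 dims = 40
```

The pathological speaker's supervector sits further from the UBM means than the healthy one's, which is exactly the deviation the classifier or regressor then exploits.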
00:16:46
So, say this is my feature dimension:
00:17:00
okay, I start training a system, and basically the variation here is
00:17:07
the acoustic variation of the underlying phones, because that's the strongest deviation.
00:17:14
So now I have a model: I have my mean vectors and my deviations, my covariance matrices.
00:17:22
Then I just take some pathology (could be one person, or a pathology pool)
00:17:31
and I train the adaptation: how do these underlying phonemes now deviate?
00:17:36
Then I have this model, and I can transform it, just taking all the mean
00:17:40
vectors, or all the covariance matrices, and putting them into one big vector.
00:17:45
So now I ask the person to speak something, and out of a certain utterance
00:17:51
I have a fixed-dimension vector, and now I can use that
00:17:58
to look at the differences between my groups. Now, either I have my supervectors and I have two
00:18:04
groups, like the pathology you want to detect and probably a control group, and I want to classify:
00:18:11
in this case I pick some classifier, which will train me two subspaces; I have a new
00:18:19
person, I just look what type he is, and I classify him based on that.
00:18:19
The other thing is: notice that the dimensions, the
00:18:27
dimensions of my supervector space, are in this case only the axes,
00:18:31
and the pathology is the variable.
00:18:36
And now what I want to estimate is not to classify healthy or not; I want to classify how strong the degree is.
00:18:45
Again I train my supervectors, but now I train some kind of regression: I
00:18:53
create a supervector for the patient's speech, and I estimate the degree of pathology.
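The regression step can be sketched as follows: severity is assumed to shift the supervectors along one direction of the space, and a support vector regressor learns to read the degree back off. The data is synthetic and the "expert severity scores" are invented.

```python
import numpy as np
from sklearn.svm import SVR

# Sketch: estimate the *degree* of pathology from supervectors, instead
# of a healthy/pathological classification. Synthetic data: severity
# moves the vectors along one direction, plus noise.

rng = np.random.default_rng(1)
dim = 40
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

severity = rng.uniform(0, 1, size=200)            # invented expert scores
X = severity[:, None] * direction + rng.normal(scale=0.1, size=(200, dim))

reg = SVR(kernel="linear").fit(X[:150], severity[:150])
pred = reg.predict(X[150:])
rho = np.corrcoef(pred, severity[150:])[0, 1]     # agreement with "experts"
print(round(rho, 2))
```

The same trained regressor, applied to the same patient week after week, is what turns the acoustic model into a therapy-control measure.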
00:19:01
Okay, now this is the underlying technology that you use to say:
00:19:06
did you perform it correctly, has it improved from the last session to the new one?
00:19:11
Basically, you measure the degree. Well, what do you need then for
00:19:16
the e-therapy? Well, you need your user model to say:
00:19:22
the last time you performed these exercises like that, now we've improved; and
00:19:29
the underlying technology is a regression that estimates the degree for the week.
00:19:37
Um,
00:19:39
things that we do in this direction
00:19:43
are that we calculate acoustic or prosodic
00:19:48
or phonological features, and these are the underlying dimensions for our supervector.
00:19:55
So, prosody is rhythm, intonation and stress-related attributes; we
00:20:02
compute these on word level, across several words, across
00:20:06
syllable nuclei, in which case we need
00:20:09
automatic speech recognition. What we also can do is just say, well, the hell with it, we'll just
00:20:16
measure them over ten-millisecond frames and then compute functionals over that.
00:20:21
So in that case we have something like the mean F
00:20:25
zero and the standard deviation, and
00:20:31
we can have local features, like parts of before and after segments;
00:20:35
we can then calculate those functionals, like mean, standard deviation, maximum, minimum.
00:20:42
We can have global features, so independent of what the person
00:20:48
says I'm going to have something like the jitter and the shimmer,
00:20:52
or the voiced/unvoiced characteristics. I get about two hundred features per
00:20:58
utterance, and then I take the functionals, and I get about
00:21:03
four, five, six thousand features; you get functionals like the position of the third quartile,
00:21:10
the first quartile, standard deviations, kurtosis and so on; you can get a huge feature vector.
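The "measure every ten milliseconds, then compute functionals" idea can be sketched for an F0 track; the synthetic track, the zero-means-unvoiced convention, and the simplified jitter-like measure are illustrative assumptions (real F0 would come from a pitch tracker).

```python
import numpy as np

# Sketch: utterance-level functionals over a frame-wise F0 track
# (0 = unvoiced frame): mean, spread, extremes, a jitter-like
# frame-to-frame perturbation, and the voiced fraction.

def f0_functionals(f0):
    voiced = f0[f0 > 0]
    diffs = np.abs(np.diff(voiced))
    return {
        "mean_f0": voiced.mean(),
        "std_f0": voiced.std(),
        "min_f0": voiced.min(),
        "max_f0": voiced.max(),
        # mean absolute frame-to-frame change relative to mean F0
        # (a simplified, jitter-like perturbation measure)
        "jitter_like": diffs.mean() / voiced.mean(),
        "voiced_fraction": len(voiced) / len(f0),
    }

t = np.arange(200) * 0.010                       # 2 s of 10 ms frames
f0 = 120 + 10 * np.sin(2 * np.pi * 0.5 * t)      # slowly varying pitch
f0[::7] = 0.0                                    # some unvoiced frames
feats = f0_functionals(f0)
print(sorted(feats))
```

Stacking such functionals over many low-level descriptors (F0, energy, spectral measures, voicing) is how the roughly two hundred per-frame features become the several-thousand-dimensional utterance vector mentioned above.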
00:21:17
And if you look at one of our partners, the University of Augsburg, you cannot go
00:21:23
wrong: there you can get a feature vector that will do the job for you.
00:21:28
It's a six-thousand-dimensional feature vector; it's a good start, and then you can look at which
00:21:35
part of this vector is helping. Okay, but it's a good start: you can download openSMILE.
00:21:42
It's been used for ten years in the
00:21:48
paralinguistic challenges and always produced competitive results, and then
00:21:53
the work begins, because then you have to say
00:21:56
which feature really shows me something in my situation, and can then be used
00:22:01
as an indicator that you did that correctly, or that there's an improvement.
00:22:07
And another one: if you go to this platform here, you can
00:22:13
get phonological features; granted, they were trained on English, but they work amazingly
00:22:18
well even in other languages, and if you do
00:22:23
have good data, you can retrain the whole thing on your own data.
00:22:30
Again, what you basically get is: you can transform a certain part of the speech signal into
00:22:37
a feature vector that says how nasalised was that sound, how /s/-like was that sound,
00:22:43
and then you create functionals on top and say how strong the nasalisation is over the complete utterance.
00:22:51
So, basically, what do we use for evaluation? Well,
00:22:56
word accuracy and word correctness for intelligibility; we have
00:23:01
calculated features based on the acoustic models and on the prosodic models;
00:23:07
we can use correlations like Spearman or Pearson,
00:23:13
based on these calculated features or on the word accuracy and
00:23:16
word correctness, and compare with human listeners;
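The comparison with human listeners can be sketched with plain NumPy: correlate a machine measure (per-speaker word accuracy) with listener ratings using Pearson, and Spearman as Pearson on the ranks. The scores below are made-up illustration values, not data from the talk.

```python
import numpy as np

# Sketch: validate a machine intelligibility measure against human
# listener ratings with Pearson and Spearman correlation coefficients.

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Spearman = Pearson computed on the ranks of the data
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(np.asarray(x)), rank(np.asarray(y)))

word_accuracy = [92.0, 85.5, 70.2, 55.0, 40.3, 88.1]   # machine, per speaker
listener_score = [4.8, 4.1, 3.2, 2.0, 1.5, 4.5]        # human ratings

print(round(pearson(word_accuracy, listener_score), 2))
print(round(spearman(word_accuracy, listener_score), 2))
```

A high correlation on held-out speakers is what justifies replacing the naive human listener with the recogniser in the first place.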
00:23:20
we can classify based on these calculated features, or interpret which of
00:23:26
these features are relevant, which of these features were the most relevant.
00:23:31
Now, the other thing,
00:23:35
or the last thing, I want to get to within this section is
00:23:40
how to visualise. You know, if a therapist asks "why do you
00:23:44
say that's the problem, have they improved enough, tell me",
00:23:47
well, "because I can look at the variation in the seventh mel-cepstrum coefficient": he's not going to believe that.
00:23:54
Okay, so basically visualisation is important, but keep in mind the visualisation
00:23:59
can be either for the therapist or for the patient.
00:24:07
So basically, when we transform the speaker into a vector, this speaker
00:24:14
can be seen as a point in a very high-dimensional space.
00:24:18
Okay, I have a huge vector; it has mel-cepstrum features, prosodic features, whatever.
00:24:24
I find out, okay, this is an important one; but even if I reduce
00:24:27
the dimensions, I still have a hundred-dimensional feature space.
00:24:32
That's pretty hard to argue with. So basically, what helps a lot,
00:24:36
what I can recommend, is you transform that into a lower-dimensional space,
00:24:41
and then you explain it; you say, okay, I need a lot of dimensions
00:24:47
to get into that, but now, if I try to reduce it to two
00:24:51
with certain techniques, then I can show you a group of people, and
00:24:57
where my patient is. Okay, and I did that as an example.
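The projection idea can be sketched with plain PCA via the SVD (the talk leaves the exact dimensionality-reduction technique open); the speaker vectors are synthetic, with one group shifted to stand in for a pathology.

```python
import numpy as np

# Sketch: learn a 2-D projection of high-dimensional speaker vectors
# from reference groups, then drop a new patient into the same map.

rng = np.random.default_rng(2)
young = rng.normal(0.0, 1.0, size=(50, 100))      # reference group
patho = rng.normal(2.0, 1.0, size=(50, 100))      # pathological group

X = np.vstack([young, patho])
mean = X.mean(axis=0)
# PCA: top-2 right singular vectors of the centred data
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
project = lambda v: (v - mean) @ vt[:2].T

Y = project(X)                                     # 2-D map of both groups
new_patient = rng.normal(2.0, 1.0, size=100)       # projected into the map
p = project(new_patient)
print(Y.shape, p.shape)
```

Plotting `Y` with one colour per group, plus the point `p`, gives exactly the kind of map a therapist can read: here are the groups, here is your patient, here is where the therapy moved him.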
00:25:03
We started with a young reference group, and we took some
00:25:13
mel-cepstrum coefficients, because with those both voice and speech are characterised. Now,
00:25:21
it's interesting to see: okay, here are the males, here are the females; the dimension nicely separates the two groups.
00:25:24
So now we added
00:25:29
an old reference group, and in that case only male speakers.
00:25:33
So now you still have that
00:25:35
subspace,
00:25:42
and then the space for the new speakers: how do they come out? Well, the acoustic difference
00:25:50
gives them a new subspace that is close to the male subspace, because they're all elderly men. Okay,
00:25:55
now I place people with a voice pathology, in this case
00:26:01
speakers after a laryngectomy, so
00:26:06
of course most of them are elderly,
00:26:13
and in this case they were all male; that's why we added the old control speakers. And where do they land? Well, they cannot
00:26:22
be right in it, but somewhere close to them, which is very nice: yeah, that's where my laryngectomees come out.
00:26:22
Now I add some chronically hoarse
00:26:26
people; well, you know, I don't expect them to be here, I expect them somewhere here, and that's exactly what happens.
00:26:34
See: here are the laryngeal voices, here the elderly speakers,
00:26:41
here the young speakers, here your chronically hoarse speakers, nicely separating the male from the female.
00:26:49
And if I have a transformation like that, and I project a new patient into it,
00:26:55
I can show that to somebody who doesn't know anything about mel-cepstrum coefficients;
00:27:01
I can still say, or tell them: okay, this is where you started, and this is where the therapy brought you.
00:27:08
Okay. So basically, you need to find a way to transform
00:27:17
a speaker's state, or the speaker space, which is very high-dimensional, into something low-dimensional,
00:27:23
and then you say: okay, this is speaker group one, this is speaker group two, this is speaker group three;
00:27:29
this is where you started, this is where you were after three months, after six months.
00:27:34
So this visualisation is very, very important, because you cannot argue
00:27:39
with certain features, unless you have a very good feature for exactly one exercise.
00:27:45
When you say, okay, say that isolated vowel, say it as long as you
00:27:50
can, and then you say, okay, your phonation time was five seconds, six seconds, seven: fine.
00:27:56
But very often you have a high-dimensional space, in which case visualisation is important.
00:28:05
So, when we have an exercise with isolated vowels, we need phoneme recognition, and we compute
00:28:12
acoustic and prosodic features, like, you know, jitter or shimmer. When we have unknown text,
00:28:20
technology can give us a naive listener: how intelligible is that
00:28:27
unknown text to a listener (of course, not to the speaker). When we have
00:28:31
known text, we kind of can simulate an expert listener;
00:28:35
we need word recognition, we need acoustic speaker modelling, we need prosodic analysis.
00:28:42
And when we have spontaneous speech, we might need prosodic analysis,
00:28:47
as we saw in the dementia showcase: you know, how long the pauses are is very, very important;
00:28:53
we also need a syntactic-semantic analysis, based on the word recognition.
00:28:58
So word recognition is only the tool; you then look at the
00:29:01
words, and you simulate: okay, assume that I have
00:29:06
sixty, seventy, eighty, ninety percent correct.
00:29:12
In my opinion it's still unsolved how good the word recognition needs to be; for instance, for
00:29:17
dementia detection we don't need a hundred percent word accuracy.
00:29:21
I think, you know, I
00:29:25
don't have the data, but I would predict that
00:29:28
somewhere at eighty-five percent you can stop,
00:29:32
because of those fifteen percent... don't try to go to a hundred percent, you know,
00:29:38
don't strive for word recognition that doesn't make errors, and then still assume "I have a hundred percent".
00:29:46
On the other hand, you know, if you have...
00:29:51
and that leads me to the next subject:
00:29:55
if you have a two-class problem, like "did the patient
00:30:01
perform something correctly", it's a two-class problem, and you have eighty-five
00:30:06
percent, that's pretty good, or eighty percent; but eighty percent means in one
00:30:10
out of five practice exercises you tell the person the wrong thing,
00:30:15
and that is wrong,
00:30:17
because the patient will notice. So it's much more important how you do the
00:30:22
intervention. Okay, so, you know, if you look
00:30:28
at what a teacher does: what does a good teacher do? He immediately adapts to the level of the person,
00:30:34
and then he makes corrections in a certain amount. Like, we had a
00:30:41
language teaching tool, and we asked people, the teachers, to say: when would you intervene?
00:30:47
And
00:30:49
they listened to the material and they said, okay, I would not accept this pronunciation; and the agreement was horrible.
00:30:56
What was interesting, though: every single teacher, on average
00:31:01
over everything, got something like five percent,
00:31:07
which means, you know, on average my teacher will interrupt and say "no,
00:31:11
that was not correct" in one out of twenty cases.
00:31:16
Okay, so the five percent mark was the same when you look at all the teachers, which means
00:31:23
you can let a wrong pronunciation pass sometimes and
00:31:28
still be encouraging; and that's another thing: you
00:31:32
don't want to do the same thing for the therapist; what you tell the therapist, so the visualisation, has to be different.
00:31:40
Okay, so let me come to some aspects of when we design e-therapy. We designed, a couple of years ago, a tool.
00:31:46
It had to be child-appropriate,
00:31:51
according to the age; so, you
00:31:56
know, the children loved animals, so all
00:32:00
the words that we trained had something to do with animals; it was that kind of
00:32:07
easy interface, and it was done together with the therapists.
00:32:14
And of course, you know, you need a child-appropriate response;
00:32:19
we played around a lot with smileys, you know.
00:32:24
And the speech technology we used was phoneme and word recognition: basically we took
00:32:30
the word (did you say the word?), then we went to the phoneme:
00:32:34
how good was the phoneme? And some phonological knowledge.
00:32:40
What we're currently working on is a new e-therapy tool,
00:32:46
and I'm just going to show some of the work we're planning; we're currently implementing it.
00:32:53
What we say is: it's got to be big enough, so we don't train on the smartphone, we use a tablet.
00:33:00
And what we're planning to do is make it simple for the people:
00:33:04
so we have the training, which first fetches the data, and the upload then goes onto the server;
00:33:13
and the training always tells us: where are you in the
00:33:17
training, what is the next exercise? And then let's start.
00:33:23
And then, you know, you have a display where the therapist
00:33:30
gives you the exercise, and then you read it.
00:33:35
And we have a whole battery of exercises, and what we believe is that it's got to be good
00:33:43
for the therapist and for the patient, so the therapist should be able to say: okay, I can personalise it.
00:33:50
So, in order to personalise it, you say: okay, we have a whole big battery
00:33:54
of therapy exercises, so you have something like type-token ratio, articulation,
00:34:00
lip closure, or sustained vowel (how long?), perturbation, lip shaping and so on.
00:34:05
So in this case we have a camera and a microphone,
00:34:10
and the therapist says: okay, do this exercise; and
00:34:14
then you see what you are supposed to do, and you do it.
00:34:21
The patient can view certain aspects, and like I said, in
00:34:26
that case it's got to be very simple, like the duration of the vowel;
00:34:30
so it's got to be something where you say: okay, you know, you get that now for
00:34:36
seven weeks, and every week you get a bit better; you started here, and now, very good.
00:34:43
So the patient can view the results and compare the performance
00:34:47
over time, and it helps to keep them motivated. Okay.
00:34:53
The results are also visible, both for the physician and the speech therapist;
00:35:00
and what we plan to do, for privacy reasons, is do the evaluation only on the device.
00:35:09
Um, so,
00:35:12
it's important to keep the motivation high; it's much more important to keep
00:35:18
the motivation high than to say "you got this wrong, listen to me, let me repeat".
00:35:22
So when you set the threshold, you should rather say
00:35:27
"okay, good, let's continue", instead of saying "repeat it" three, four, five
00:35:32
times, because then the person says "I can't do it", and that's it.
00:35:37
It's also important to give the feedback: you know, you're doing well, you're getting better.
00:35:43
Now, for the therapist it's also important to monitor the patient
00:35:47
over time, but also to project them into this patient space,
00:35:50
as I indicated before: where is he, where was he, is he making progress,
00:35:54
is this improvement good after three months, or is he,
00:36:01
compared to the other people, much slower in the improvement, or is he more constant?
00:36:08
And so you have the results for two
00:36:12
different evaluations: for the patient, the results relative to the previous
00:36:17
results of the same patient; and for the therapist, you take into account also the patient group.
00:36:25
um
00:36:27
i think it important thing is also the the the the training and have to have some
00:36:33
some uh uh update information so the the so it it it should be able you should be able to contact
00:36:40
the therapist the therapist should be able to download after he
00:36:45
sees a result download the exercises and and it should
00:36:49
also have an alarm from some function hey you didn't do your exercise if you wanna do it now
00:36:55
so new patients have to be registered easily, so the interface for the therapist:
00:37:02
very often therapists are not very keen on technology, so it should be easy to register
00:37:09
and it should be easy to download the training files, so we
00:37:14
have a hierarchy of exercises, and what we're planning to do
00:37:19
is to say: okay, now I want to practise this, I start with this exercise,
00:37:25
let me individualise the plan for the next four or five weeks, for the words
00:37:32
so let me summarise: the disorders affect different linguistic levels,
00:37:39
speech technology can guide through therapy, for the e-therapy we need an incremental
00:37:45
user model, which we don't need for diagnosis and don't need
00:37:49
that much for therapy control, so the incremental user model has to update
00:37:55
itself after each week or each exercise, the patient has to be encouraged,
00:38:02
too many corrections are discouraging, so rather have,
00:38:07
you know, remember the faces: the one that
00:38:09
smiled, the one that didn't smile, and the one in between, "uh-huh, but okay, let's keep on"
00:38:16
third, it has to have the possibility to individualise the therapy, so
00:38:21
I don't believe in a stand-alone therapy that you buy and that's it and you do it,
00:38:26
that's not going to work, you need somebody to look over it and individualise it, and the performance analysis for
00:38:34
the patient has to be different from that for the therapist, and with that I thank you for
00:38:40
listening
00:39:11
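The incremental user model the summary calls for, one that updates itself after every exercise, could be kept as a per-sound running score. This is a minimal sketch under assumed details (the exponential-average update, the score scale, and all names are mine, not the speaker's):

```python
class IncrementalUserModel:
    """Minimal sketch: a running skill estimate per trained item
    (e.g. per phoneme), updated after each completed exercise."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # weight given to the newest result
        self.skill = {}      # item -> running score in [0, 1]

    def update(self, item, score):
        # Exponential moving average: old estimate decays, new result mixes in.
        old = self.skill.get(item, score)
        self.skill[item] = (1 - self.alpha) * old + self.alpha * score

    def weakest(self):
        # The item to schedule next: where the patient scores worst.
        return min(self.skill, key=self.skill.get)
```

Such a model supports both uses mentioned in the talk: picking the next exercise for the patient and showing the therapist where progress is slow.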
oh absolutely
00:39:18
I agree with you, but the thing is, I can do something with eighty percent, even if I have only eighty percent
00:39:26
on the top-five problem, because the only thing I need to do is have them do an exercise ten times
00:39:32
and then my eighty percent will say whether he's doing it right or wrong
00:39:37
okay so
00:39:39
even with eighty percent, okay, I should not give
00:39:43
the individual feedback too often, I mean, I agree with
00:39:47
you that, you know, if it's very clear, that's what I said, the teacher will correct;
00:39:54
in the trial on language learning they corrected on average five percent, some with three percent,
00:39:59
some with eight percent, but on average, you know, if
00:40:02
you look at the curve, it was all around there
00:40:06
for twenty teachers, we had twenty teachers, and they all had the correction rate around five percent over all the students
00:40:13
so, you know, if you're sure, then intervene, but the tendency, even
00:40:19
with the eighty percent, you can give a good tendency, because,
00:40:22
you know, it's getting pretty good: I mean, if you do it ten times, that means in eight out of
00:40:28
ten cases he's doing the right thing, and then you're much more sure about
00:40:33
what you do, and then the therapist gets the feedback: he
00:40:36
didn't do that well enough, that's the one where he still has problems,
00:40:40
where he got the worst score, the therapist can control it and then say, okay, let's keep on doing that,
00:40:47
and I think that's an important aspect, because otherwise you tend to discourage them and then they stop
00:41:12
and what is the feedback, in effect? Yeah, I think
00:41:18
what we need to do is have not a two-class problem,
00:41:23
but break that into: I'm definitely going to say "that was wrong, let me
00:41:28
play it again", then something like "hmm, could you please repeat",
00:41:34
and the other one, "good, let's go on", and we need to do that, but I think, you
00:41:40
know, you play around with users, or say, I'm going to
00:41:47
be a severe teacher or a soft teacher, you know,
00:41:53
and where you shift these thresholds, when do you make that decision, on what do you
00:41:58
make that decision, that is something that we should do
00:42:14
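The three-way split described here, "definitely wrong, replay it" / "please repeat" / "good, let's go on", with thresholds that shift between a severe and a soft teacher, could be sketched as follows. The threshold values and profile names are made-up illustrations, not numbers from the talk.

```python
# Hypothetical (low, high) score thresholds per teacher profile:
# below low -> wrong, between -> ask to repeat, above high -> good.
PROFILES = {"severe": (0.5, 0.8), "soft": (0.3, 0.6)}

def three_way_feedback(score, profile="soft"):
    """Map a classifier score in [0, 1] to one of three feedback messages."""
    low, high = PROFILES[profile]
    if score < low:
        return "That was wrong, let me play it again."
    if score < high:
        return "Hmm, could you please repeat?"
    return "Good, let's go on!"
```

Shifting from "soft" to "severe" moves both thresholds up, so the same utterance can pass for one profile and trigger a repeat for the other.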
the therapist has to do that, I don't think the e-therapy can do that, a human is
00:42:22
so much better at it, at any time and at their place: if I know this is a person who will
00:42:28
give up easily, I'd rather take the soft teacher,
00:42:33
and if I know this guy really, really wants to get better, you know, I can give him my hard teacher;
00:42:39
I think, you know, I would never touch that right now with a machine, I think the therapist has to,
00:42:46
I mean, basically, what's the difference between a therapist and
00:42:51
machine learning? The machine learning is trained on the seen data,
00:42:55
it doesn't know that this is a person who will give up easily,
00:43:00
a therapist, you know, uses all his common sense and evaluates the person
00:43:05
in many different dimensions, and these dimensions are not known to the statistical model,
00:43:10
so don't leave that to the machine, give that to the teacher, you know
00:43:28
you
00:43:35
that
00:43:36
Matthew, I agree, but that's big data analysis, and that's
00:43:41
not something the machine can do, because that information is not there for the machine to learn
00:43:59
yep, in our therapy system what we do is:
00:44:07
the therapist does the exercise and it's stored,
00:44:12
and then, you know, in the case of a correction it says "let me show you again" and then it shows you again

Conference Program

Introduction to Phonetics and Speech
Rob van Son, Amsterdam
Sept. 24, 2018 · 9:02 a.m.
Dysarthria
Marc de Bodt, Antwerp
Sept. 24, 2018 · 9:45 a.m.
Children’s speech: development, pathologies and processing
Alberto Abad, Lisbon
Sept. 24, 2018 · 2 p.m.
Speech after Treatment for Head and Neck Cancers
Michiel van den Brekel, Amsterdam
Sept. 24, 2018 · 2:45 p.m.
Speech therapy
Marc de Bodt, Antwerp
Sept. 25, 2018 · 11 a.m.
eTherapy
Elmar Nöth, Erlangen-Nürnberg
Sept. 25, 2018 · 11:45 a.m.
Assessment in speech disorders
Virginie Woisard, Toulouse
Sept. 25, 2018 · 2:45 p.m.
