Transcriptions

Note: this content has been automatically generated.
00:00:00
so everyone is tomorrow night and you're surfacing and i work in
00:00:04
your stride in portugal is been under the supervision of called after that
00:00:09
and i will give you a bit of a recap of what i'm doing here
00:00:16
so the first thing is when you have a speech to text application
00:00:20
usually it works pretty well with deep learning on adults
00:00:26
but the problem is when you use the same kind of model
00:00:30
and the same kind of system on children it's not working well
00:00:35
mainly because children are not like adults you have much
00:00:39
more variability in speech in terms of frequency and in terms of rate
00:00:44
and also their vocabulary they're currently learning how
00:00:49
to talk so they don't have the same vocabulary as an adult
00:00:55
and if you add to that the fact that the children have pathological speech it's much worse
00:01:01
so it's really important to have a kind of therapy for these children
00:01:04
in order to be integrated in society during adulthood
00:01:10
and to do so they do therapy with therapists
00:01:14
my goal in this project is to bring to therapists
00:01:18
and to the children some tools that can do therapy for example at home
00:01:23
and so the title of my project is
00:01:26
developing speech therapy games for children with speech disorder
00:01:31
so to do so we divide the problem in three important points the first one is obvious you
00:01:37
cannot work with children's speech with speech disorders if you don't understand it if you don't have a good system
00:01:43
for normal children with no disorder
00:01:47
so we first investigate speech processing
00:01:51
and automatic speech recognition for normal and healthy children
00:01:55
my second point is obviously to switch to pathological speech
00:02:01
and the third one is to create tools using
00:02:06
gamification for these children in order to do therapy at home
00:02:11
so during this year i mainly investigated the first point but
00:02:16
i also tried a few things on the last point but i will explain it
00:02:21
and for now we will first focus on the first point
00:02:25
so our method is to use the state of the art as you know
00:02:31
the computer vision state of the art using transfer learning so adaptation
00:02:36
and we want to do this adaptation in the acoustic model
00:02:40
to do so we know that we have good systems for adults so we use as a
00:02:45
reference model an adult acoustic model and we want
00:02:49
to adapt it to children
00:02:54
so i will explain later with the experiments we
00:02:59
also want to use new features more than MFCC
00:03:02
because children are different and obviously have more variability so
00:03:07
we want to investigate speaker information speaker embeddings in order to maybe
00:03:12
compensate in the acoustic model for
00:03:16
this variability so i just show a very simple
00:03:24
a very simple acoustic model using a DNN so here you have the
00:03:31
input layers as you can see there are two boxes the first one on the left
00:03:36
is the one with adults and the one on the right is the children one
00:03:41
so the idea is to just give this acoustic model different databases that is to say the
00:03:48
one with adults first we train it and secondly the
00:03:52
one with the children and the great doubt i'll just actors
00:03:58
and to do that we first train with all the adult data
00:04:04
and after that we train again with our children data but we reduce the number of
00:04:09
epochs and obviously also the learning rate this is a common approach in
00:04:15
computer vision so we are just doing the same thing and it brings better results
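
The two-stage recipe described here (train on adult data, then train again on children data with fewer epochs and a smaller learning rate) can be sketched on a toy problem. This is only an illustration under stated assumptions: the model is a plain linear regressor rather than the DNN acoustic model from the talk, the data is synthetic, and every name and hyperparameter value below is hypothetical.

```python
import random

def predict(params, x):
    """Linear model: dot(weights, x) + bias."""
    w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def make_data(n, weights, bias, seed):
    """Synthetic (features, target) pairs; a stand-in for real speech data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in weights]
        data.append((x, predict((weights, bias), x)))
    return data

def mse(params, data):
    """Mean squared error of the model on a data set."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def train(params, data, epochs, lr):
    """Plain SGD on squared error; returns an updated copy of params."""
    w, b = list(params[0]), params[1]
    for _ in range(epochs):
        for x, y in data:
            err = predict((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Assumption: "adult" and "child" data follow related but different mappings,
# and the child set is much smaller than the adult set.
adult_data = make_data(500, [1.0, -2.0, 0.5], 0.0, seed=0)
child_data = make_data(20, [1.2, -1.8, 0.7], 0.3, seed=1)
child_test = make_data(200, [1.2, -1.8, 0.7], 0.3, seed=2)

# Stage 1: train from scratch on the large adult set.
adult_model = train(([0.0, 0.0, 0.0], 0.0), adult_data, epochs=20, lr=0.05)

# Stage 2: continue from the adult model on the child set,
# with fewer epochs and a lower learning rate.
adapted_model = train(adult_model, child_data, epochs=3, lr=0.01)

adult_on_child = mse(adult_model, child_test)
adapted_on_child = mse(adapted_model, child_test)
print(adult_on_child, adapted_on_child)
```

In this toy setting the adapted model has a lower error on held-out child data than the adult-only model, which mirrors the tendency reported in the talk without reproducing its actual numbers.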
00:04:21
so in terms of data i use portuguese data so i use children and adults obviously
00:04:28
so the details you can read them there is plenty of
00:04:33
information but i want to talk a little bit more about my
00:04:38
children data because it will impact my results that i will show you after
00:04:42
so if you look at my children data it's really
00:04:47
everything in it is read speech so sorry twenty
00:04:52
since they are children there are plenty of mistakes we can say
00:04:56
and some of them for example twelve percent are just repetitions of words
00:05:01
you also have false starts mispronunciations substitutions
00:05:04
of words insertions deletions and mismatched pronunciations
00:05:09
so maybe it's a bit confusing so i took some examples from my data set
00:05:14
so you don't need to know how to speak portuguese to understand how it works but
00:05:19
for example you have a pause inside a word which is instead of saying
00:05:25
sorry for my portuguese intense few i'm right it just safety and after that to silence and who
00:05:35
and you also have mispronunciations so here you have the
00:05:38
normal way to say the word and how it is substituted
00:05:44
so there are also other kinds of errors the most common one is a
00:05:48
false start so mistakes that are corrected immediately so here we have
00:05:54
the correct word and this is what is pronounced
00:05:58
and obviously you can have more complex examples for example here as you can see there are plenty of things
00:06:07
got some get which is a good some word you have
00:06:10
some a repetition you have some a a an active donation also
00:06:17
and some insertion and if you are the everything uh you cannot woke up one of the uh even if
00:06:23
i woke up what little working i will love it's complex because you windsor should when dunes penn station uh
00:06:30
so what did you say is not for your word so you'd have to do transcription i get is just from them and
00:06:37
i want to work with that so here are the results i got so far with this data set so
00:06:45
so with that we can see a kind of tendency which is what i want to show you
00:06:51
so i use MFCC and i use i-vectors this is the part of my problem
00:06:56
and for the experiments in the first one i train my
00:06:59
model only on adult data this is the first row
00:07:04
after that i train only using my children data this is
00:07:07
my second row and my transfer learning approach is my last row
00:07:12
and as you can see the best result for the children
00:07:17
is only with my transfer learning approach and the best one is with i-vectors
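
The talk does not say how the speaker information is fed to the acoustic model. One common option, assumed here purely for illustration, is to append the same speaker-level vector to every frame-level feature vector before it enters the network. The function name and the toy values below are hypothetical, and real MFCC frames and i-vectors would have many more dimensions.

```python
def append_speaker_embedding(frames, embedding):
    """Concatenate one speaker-level vector onto every frame-level vector."""
    return [list(frame) + list(embedding) for frame in frames]

# Toy values: 2 frames of 3 coefficients each, and a 2-dimensional speaker vector.
mfcc_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
i_vector = [1.0, -1.0]

augmented = append_speaker_embedding(mfcc_frames, i_vector)
print(augmented)
```

Each augmented frame keeps its original coefficients and gains the speaker dimensions, so the input layer of the model would need to grow by the size of the speaker vector.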
00:07:24
so it's not so good uh i'm aware of that so
00:07:27
we decided to switch to another language to another
00:07:31
data set we decided to switch to english because we found this my same struggled database
00:07:37
which is not read speech anymore but really simple speech or
00:07:43
spoken speech and also we are interested in this data set because it seems
00:07:48
to be a very large data set it contains almost five hundred hours
00:07:53
a lot of utterances and approximately two hundred
00:07:57
hours that have been manually transcribed so no possible error
00:08:03
so it would be much more interesting to work with this data set and what
00:08:07
we want to do is to incorporate in addition to MFCC
00:08:12
new features for example we want to try with
00:08:16
x-vectors which are an alternative to i-vectors
00:08:21
we want to add new features speaker information and also why not prosody
00:08:28
we want to try different things and using this database which is
00:08:34
quite large we can try many more experiments for example we can try
00:08:38
transfer learning with a certain amount of time for example than a war
00:08:43
and we can also try to increase this time to see how the transfer learning works
00:08:50
and so this is for my future work so what i did this year
00:08:54
it's mainly i had some courses at my university that are mandatory so i followed
00:09:00
a four of them uh so i already talk uh it's been about street it should turn any machine running
00:09:06
and rushes to bitch and during since uh it's been
00:09:10
i finished interdisciplinary research and advance to begin entertainment systems
00:09:16
and i will just come back to the last one because it allowed me to discover
00:09:23
how to make games because my project's name is
00:09:26
therapy games for children so i created this little game
00:09:31
which is voice activated the pitch controls the action of the player if you do a low pitch it just
00:09:37
runs if you do a high pitch it will jump and the energy
00:09:40
options pitch of your voice uh it gets a bit altogether action
00:09:45
well it was a first try i coded it from scratch it's probably not good but it works
00:09:50
and it could be used as pitch and
00:09:54
energy training for a disorder for example
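
The control scheme described for the game (a low pitch makes the character run, a high pitch makes it jump) can be sketched as follows. This is not the speaker's code: the autocorrelation pitch estimator, the sample rate, both thresholds, and the use of energy as a simple silence gate are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000        # Hz, assumed
PITCH_THRESHOLD = 200.0   # Hz, hypothetical boundary between "low" and "high"
ENERGY_THRESHOLD = 0.01   # RMS level below which the frame counts as silence (hypothetical)

def rms(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def estimate_pitch(frame, fmin=60.0, fmax=500.0):
    """Pick the lag with the largest autocorrelation and convert it to Hz."""
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag

def choose_action(frame):
    """Map one audio frame to a game action."""
    if rms(frame) < ENERGY_THRESHOLD:
        return "idle"   # too quiet: treat as no voice input
    return "jump" if estimate_pitch(frame) >= PITCH_THRESHOLD else "run"

def tone(freq, n=800):
    """Synthetic sine wave standing in for a sustained voiced sound."""
    return [math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

print(choose_action(tone(120.0)))   # low-pitched input
print(choose_action(tone(400.0)))   # high-pitched input
print(choose_action([0.0] * 800))   # silence
```

A real game would read frames from a microphone and would probably need a more robust pitch tracker, since children's voices and disordered speech are much less regular than a sine wave.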
00:09:59
and i don't think i would do anything what it came up
00:10:03
and in terms of conferences i want to attend interspeech
00:10:08
i will in turn uh and the speech writers is most
00:10:12
um i also this uh some attends a sparse uh which is
00:10:18
a conference that i wrote a blog post about so you can read about
00:10:24
it if you are interested i was presenting
00:10:27
some work basically my master's work but
00:10:31
the interesting thing about this conference was that there were plenty
00:10:34
of posters and lectures about deep learning and speech and also
00:10:40
yeah and also know much more might mention a less a problem uh one uh our proxy musician or thing
00:10:49
uh_huh the last thing i want to talk is uh as worship that i will do news
00:10:54
but that that want to uh i don't i wanted to buttons uh it's been that mentioning school
00:11:01
unfortunately i wasn't able to attend the machine learning school because i had
00:11:07
an accident with my foot and broken glass
00:11:11
it happens and i will probably try to go next year
00:11:16
i will try and as secondments i want
00:11:19
to do a secondment at antwerp university
00:11:24
to learn more about therapy and how therapists work with children what kind of exercises
00:11:30
they do in order to know what they want from my work because i am supposed to give tools to them
00:11:35
this is normally in november december this year and next year
00:11:39
i want to go to therapy box in the united kingdom in order to
00:11:43
bring my knowledge on children's automatic speech recognition but in an industrial context
00:11:50
thank you for your attention
