Transcriptions

Note: this content has been automatically generated.
00:00:05
Alright. Good morning, everyone. I am Mathew Magimai Doss.
00:00:11
I am a permanent researcher at Idiap,
00:00:14
working in speech and audio processing.
00:00:17
I am going to present the research activities of our group at Idiap.
00:00:24
As Hervé pointed out, ours is a pretty large group; here is a picture of it.
00:00:30
We have many kinds of people: PhD students, postdocs, interns, and so on, and
00:00:37
the picture keeps evolving; all the time there are people
00:00:45
coming in and going out. This photo is from one of our lunch meetings, which happens once every month.
00:00:54
So today I am basically going to talk about the research activities of the group at Idiap,
00:01:01
which lie at the intersection of three main fields,
00:01:06
which are signal processing, machine learning, and linguistics.
00:01:13
And there are so many research problems here that I cannot enumerate them all, but
00:01:19
the main thing is that this research is funded by the Swiss National Science Foundation,
00:01:25
the Hasler Foundation, the European Union, as well as InnoSuisse,
00:01:31
and we also get funding from other sources as well.
00:01:38
So, here is the plan.
00:01:42
The primary purpose of speech, if you ask why we humans have it, is
00:01:47
first of all that we want to communicate with each other.
00:01:51
And when we want to communicate, we know that the two
00:01:55
good modes of communication are the spoken word and the written word.
00:02:01
Along this line, there are tasks where you want
00:02:05
to go from one mode of communication to another mode of communication.
00:02:09
One of them is automatic speech recognition, going from the spoken word to the written word.
00:02:15
The other is text-to-speech synthesis, where you want to go from the written mode to the spoken mode of communication.
00:02:22
The speech signal also contains information about the person's identity, and sometimes we are
00:02:28
very interested in finding this identity from the speech signal;
00:02:32
for instance, suppose you want to use your voice as the authentication mechanism for your computer.
00:02:40
Then there are problems related to speech communication which we
00:02:43
want to automatically assess without having a human in the loop.
00:02:47
For example, if you have speech transmitted over a channel, or
00:02:52
if you have synthesised speech, you want a system that can tell automatically whether that
00:02:57
speech is intelligible to the listener, and so on.
00:03:03
I am now going to go through each of these topics and briefly
00:03:09
present the research activities that are happening in each of them.
00:03:18
So speech recognition, as I said, is the task of converting speech into text.
00:03:24
Now, this problem is very old; it is not a new one. There has been
00:03:29
quite some success in recent years, but there are still some challenges.
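As an aside for the reader: progress on the speech-to-text task is conventionally measured by word error rate (WER). The sketch below is not from the talk, just the standard edit-distance formulation of the metric.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub / del / ins
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("big dogs can be dangerous", "big dog can be dangerous"))  # 0.2
```

One substitution out of five reference words gives a WER of 20%.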
00:03:34
As usual, most of the systems that have been developed are for majority languages such as English,
00:03:40
French, or German, but there are a lot of minority languages,
00:03:45
for instance, for which such systems simply do not exist.
00:03:49
Furthermore, to develop these systems we need a lot of resources: we need
00:03:55
a large amount of transcribed speech data to train the system,
00:04:03
and we need a phonetic lexicon,
00:04:05
and these kinds of resources are not always available for all languages.
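To make the second resource concrete: a pronunciation lexicon maps words to phoneme sequences, and out-of-vocabulary words need some fallback. The entries and the letter-based fallback below are toy stand-ins, not a real lexicon or grapheme-to-phoneme model.

```python
# Toy pronunciation lexicon: word -> phoneme sequence (invented entries).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "data":   ["D", "EY", "T", "AH"],
}

def phonemize(word, lexicon=LEXICON):
    """Look a word up; a real system would back off to a trained
    grapheme-to-phoneme model for unknown words -- here we just spell it out."""
    if word in lexicon:
        return lexicon[word]
    return list(word.upper())  # naive letter-as-phoneme fallback

print(phonemize("speech"))  # ['S', 'P', 'IY', 'CH']
print(phonemize("idiap"))   # ['I', 'D', 'I', 'A', 'P']
```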
00:04:11
So speech recognition for languages with limited resources is one challenge Idiap is looking at.
00:04:19
Another challenge, a very old problem that is still being pursued, is robust speech recognition,
00:04:27
and this problem has become especially prominent
00:04:30
for distant speech recognition, now that we
00:04:35
have a lot of home devices with which you want to speak and interact.
00:04:42
In this kind of system we have quite some challenges. For instance,
00:04:55
let me play an example.
00:05:00
[plays audio] "Big dogs can be dangerous."
00:05:07
That sample came from a clean condition. The moment
00:05:11
you are speaking in a far-field condition, the speech will be degraded. [plays sample]
00:05:21
And in addition to that, in the home you may also have reverberation. [plays sample]
00:05:30
Now, we know how to do a lot of things with clean speech,
00:05:37
but the moment you go to reverberant and noisy speech, there is still a lot that remains to be done.
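The degradations described here, reverberation plus additive noise, are commonly simulated when training robust recognisers. Below is a minimal sketch using a synthetic signal and a toy exponentially decaying impulse response; it is illustrative only, not the group's actual setup.

```python
import numpy as np

def degrade(clean: np.ndarray, rir: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Convolve with a room impulse response, then add noise at a target SNR."""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    # Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db.
    p_sig = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s stand-in for speech
rir = np.exp(-np.arange(800) / 200.0)                       # toy decaying "room echo"
noisy = degrade(clean, rir, rng.normal(size=16000), snr_db=5.0)
```

Real pipelines use measured room impulse responses and recorded noise, but the mixing arithmetic is the same.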
00:05:45
Another line of research
00:05:51
is moving towards what is called atypical speech.
00:05:57
Most of the systems we have are developed for native speakers of the language;
00:06:03
furthermore, the systems are developed for adult speakers, and
00:06:07
we know that the speech characteristics of adults and children are different.
00:06:15
And then there are cases where, due to disease, your speech production can get impaired.
00:06:24
In such cases, speech recognition systems may not perform as
00:06:29
well as we want. For example, impaired speech can sound like this: [plays sample]
00:06:37
If you want to recognise impaired speech, it may be quite difficult; recognising such
00:06:43
atypical speech is still an open problem.
00:06:50
Now, some of the use cases of the speech recognition research being carried out
00:06:55
at Idiap: one of them is the SARAL project.
00:07:00
In this project the basic idea is that
00:07:04
there is a lot of information in foreign-language speech content,
00:07:09
and we want to make this information accessible, both the information in
00:07:15
text and in speech, through a query given in English.
00:07:20
Such a system would primarily be used by the intelligence community. So what happens is:
00:07:27
you perform speech recognition, combined with text
00:07:31
coming from text documents; for the speech you then perform
00:07:34
machine translation, and then some kind of multilingual
00:07:38
semantic analysis, and you index the data.
00:07:43
Then, when the intelligence community wants to query this database, they can give a
00:07:48
query in English; it will be analysed, and the relevant information will be retrieved and summarised.
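The pipeline just described, speech recognition, then translation, then multilingual indexing and English queries, can be sketched with toy stand-in components. Every function and data item below is hypothetical, shown only to make the data flow visible end to end.

```python
# Stand-in "components": real ASR and MT systems are replaced by table lookups.
SPEECH_DB = {"clip1.wav": "<foreign sentence A>", "clip2.wav": "<foreign sentence B>"}
TRANSLATIONS = {"<foreign sentence A>": "floods hit the capital",
                "<foreign sentence B>": "markets rose sharply"}

def asr(audio: str) -> str:        # stand-in: audio "file" -> source-language text
    return SPEECH_DB[audio]

def translate(text: str) -> str:   # stand-in: source language -> English
    return TRANSLATIONS.get(text, text)

# Index: English word -> clips whose translated transcript contains it.
index: dict[str, set[str]] = {}
for clip in SPEECH_DB:
    english = translate(asr(clip))
    for word in english.split():
        index.setdefault(word, set()).add(clip)

def query(english_terms: str) -> set[str]:
    """English query -> speech clips that match any query term."""
    hits = [index.get(t, set()) for t in english_terms.split()]
    return set().union(*hits) if hits else set()

print(query("floods"))  # {'clip1.wav'}
```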
00:07:54
Now, the thing is that the intelligence community does not work on just one language;
00:07:59
they may want to move to a new language quickly.
00:08:03
Today we have developed a speech recognition system for, say, Bulgarian,
00:08:08
and similarly tomorrow there may be a new language to handle; the systems have to be rapidly developed.
00:08:16
Another use case is the media, which keeps evolving and is one
00:08:21
of the biggest areas we can address.
00:08:27
The goal in the SUMMA project is to build a
00:08:30
full chain of speech recognition, translation, summarisation, and story clustering.
00:08:37
This can then be used by news
00:08:42
organisations like the BBC, Deutsche Welle, and the Latvian news agency, and
00:08:47
you will be able to learn more about this work at the poster session today.
00:08:55
Another use case we have is the case where
00:08:59
we want speech recognition to assist air traffic controllers.
00:09:05
There we want the system to be automatically improving:
00:09:09
it should automatically learn new lexicons, it
00:09:13
should automatically modify the
00:09:17
recogniser's search space, and so on; these kinds of things we want to happen automatically. Here is
00:09:24
a video showing the situation. [plays video]
00:09:33
The system predicts the next possible controller commands,
00:09:37
which constrain the speech recogniser's search space,
00:09:49
as the controller speaks into the microphone.
00:09:58
so
00:10:09
Essentially, an assistant system predicts the set of controller commands which are possible
00:10:14
in the current situation, which dramatically reduces the search space of the speech recogniser.
00:10:21
This enables fantastic command recognition error rates of less than two percent.
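One hedged reading of this idea, not necessarily the project's actual method, is that a context predictor supplies the commands plausible in the current traffic situation, and the recogniser's scored hypotheses are restricted to that set. The callsigns, commands, and scores below are invented for illustration.

```python
def best_command(scored_hyps: dict[str, float], plausible: set[str]) -> str:
    """Pick the highest-scoring hypothesis among contextually valid commands."""
    valid = {cmd: s for cmd, s in scored_hyps.items() if cmd in plausible}
    pool = valid or scored_hyps  # fall back if the context filters everything out
    return max(pool, key=pool.get)

hyps = {
    "DLH42 descend flight level 80": 0.41,
    "DLH42 descend flight level 90": 0.38,
    "KLM17 descend flight level 80": 0.21,
}
# A (hypothetical) context predictor says only these commands make sense now:
context = {"DLH42 descend flight level 90", "DLH42 turn left heading 210"}
print(best_command(hyps, context))  # DLH42 descend flight level 90
```

Without the context constraint the recogniser would have picked the top-scoring but implausible command.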
00:10:27
But the catch with speech recognition is that the models need a lot
00:10:37
of transcribed data. Here, semi-supervised machine learning is the solution:
00:10:44
air traffic control operations generate a lot of audio on a daily basis, and
00:10:54
with semi-supervised machine learning algorithms
00:10:59
the speech models can be improved day by day.
00:11:04
The new data arrives without human transcriptions, so the models have to
00:11:09
learn from it on their own, improving with each new batch of data.
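A common realisation of this kind of day-by-day improvement is self-training: decode the untranscribed audio, keep only confident outputs, and fold them into the training set. The sketch below is a toy stand-in with a fake decoder, not the deployed system.

```python
# Stand-in recogniser: in reality this would be the current acoustic model.
FAKE_DECODES = {
    "mon.wav": ("descend flight level 80", 0.95),
    "tue.wav": ("??? garbled ???", 0.40),
}

def decode(audio: str) -> tuple[str, float]:
    """Returns (hypothesis, confidence) for an audio 'file'."""
    return FAKE_DECODES[audio]

def self_train_step(train_set: list[str], new_audio: list[str], threshold: float = 0.9) -> list[str]:
    """One day's update: add confident automatic transcripts to the data."""
    for audio in new_audio:
        hyp, conf = decode(audio)
        if conf >= threshold:        # low-confidence decodes are discarded
            train_set.append(hyp)
    return train_set                 # a real system would now re-train the model

data = self_train_step(["turn left heading 210"], ["mon.wav", "tue.wav"])
print(data)  # ['turn left heading 210', 'descend flight level 80']
```

The confidence threshold is the key design choice: too low and transcription errors pollute the training set, too high and little new data gets used.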
00:11:16
This solution is currently being tested with the Vienna and Prague approach controls, and
00:11:21
the basic idea is that if you can reduce the controller's load, you can
00:11:26
actually improve the efficiency of the approach operations.
00:11:33
Moving on to the next research area: text-to-speech synthesis.
00:11:38
Here the idea is that, given text, we want to produce
00:11:43
speech, and the speech has to be as intelligible and natural as possible.
00:11:50
Often we can develop a system for one speaker's voice, maybe with very good quality; so
00:11:55
one of the challenges is how we could quickly adapt to new speakers.
00:12:02
Second, these systems, again like speech recognition systems, are predominantly developed
00:12:07
for majority languages, certain majority
00:12:10
languages such as English or French; now you want them to go to new languages.
00:12:19
The other thing is that, as humans, when we speak,
00:12:25
a plan is first formed in the brain, and this message is communicated
00:12:29
to the motor system, and
00:12:33
the motor system sends nerve impulses to basically control our vocal folds
00:12:39
and thus changes the way we speak, the way the vocal tract is shaped, and so on.
00:12:46
How to incorporate this kind of production model into text-to-speech
00:12:51
systems is one of the research directions here,
00:12:55
and, based on that, how we can make systems that are more expressive.
00:13:00
Some of the use cases are announcement systems, spoken dialogue systems, and
00:13:05
assistive systems for the visually impaired, like speaking devices.
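To make the text-to-waveform flow concrete, here is a caricature of a modular synthesis pipeline (text to phonemes to audio). The lexicon, the pitch table, and the one-tone-per-phoneme "vocoder" are all toy assumptions, nothing like a production system.

```python
import numpy as np

# Toy front end: word -> phonemes, phoneme -> a single pitch (invented values).
LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
PITCH = {"HH": 0.0, "AH": 220.0, "L": 200.0, "OW": 180.0,
         "W": 200.0, "ER": 210.0, "D": 0.0}

def phonemize(text: str) -> list[str]:
    return [p for w in text.lower().split() for p in LEXICON.get(w, [])]

def synthesize(text: str, sr: int = 16000, dur: float = 0.08) -> np.ndarray:
    """Concatenate a fixed-length tone per phoneme (a caricature of a vocoder)."""
    n = int(sr * dur)
    t = np.arange(n) / sr
    return np.concatenate([np.sin(2 * np.pi * PITCH[p] * t) for p in phonemize(text)])

wave = synthesize("hello world")
print(wave.shape)  # (10240,) -- 8 phonemes * 0.08 s * 16 kHz
```

Real systems replace each toy stage with learned models (grapheme-to-phoneme, prosody prediction, neural vocoder), but the stage boundaries are the same.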
00:13:13
The next area of research is automatic speaker recognition. In speaker recognition
00:13:20
we have two kinds of problems. In the first, we have a speech sample,
00:13:26
and from a database of enrolled people we want to identify who the speaker is.
00:13:34
The other use case is where you have a speech sample together with a claim
00:13:38
of identity, and you want to accept or reject this identity claim.
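The two tasks just described are typically implemented by comparing speaker embeddings. The sketch below uses made-up 2-d embeddings and cosine scoring (real systems use high-dimensional embeddings learned from data) to show the difference: identification picks the closest enrolled speaker, verification thresholds the claimed speaker's score.

```python
import math

# Toy enrolled-speaker embeddings (hypothetical 2-d vectors).
ENROLLED = {"alice": (0.9, 0.1), "bob": (0.1, 0.9)}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def identify(probe):
    """Identification: which enrolled speaker is closest to the probe?"""
    return max(ENROLLED, key=lambda s: cosine(ENROLLED[s], probe))

def verify(probe, claimed, threshold=0.8):
    """Verification: accept the claimed identity only above a score threshold."""
    return cosine(ENROLLED[claimed], probe) >= threshold

probe = (0.85, 0.2)               # embedding of the unknown speech sample
print(identify(probe))            # alice
print(verify(probe, "bob"))       # False
```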
00:13:44
Now, this research actually overlaps with the Biometrics Security and Privacy group,
00:13:49
with whom we work in close collaboration.
00:13:53
The challenges in these systems: the first element is how
00:13:58
the system can be robust to environment and channel variation;
00:14:02
how we can reduce the amount of data needed to create speaker models.
00:14:09
These systems are often developed using data from one language;
00:14:15
they may not perform as well if you apply the model to a speaker speaking a
00:14:21
new language. So how do we make the system language independent,
00:14:28
and how do we make speaker recognition reliable enough to be
00:14:32
used legally as forensic evidence?
00:14:38
One application along that line is forensic speaker recognition. There we have an offender,
00:14:46
and we have a speech sample from the offender and a speech sample from the suspect.
00:14:53
Now the goal is to tell whether they come from the same person or not.
00:14:59
The difficulty lies in the fact that we do not want an innocent person to
00:15:04
be punished, so we need reference populations: we match
00:15:10
and compare the score between the suspect's speech and the offender's
00:15:15
speech against those references, and we calibrate the score to make the decision, in order to
00:15:20
make sure that an innocent person is not getting punished.
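Forensic speaker comparison is usually framed as a likelihood ratio: how much more probable the observed comparison score is under the same-speaker hypothesis than under the different-speaker hypothesis, with both score distributions estimated from reference populations. The Gaussian parameters below are invented purely for illustration.

```python
import math

def gauss(x: float, mean: float, std: float) -> float:
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mu=2.0, same_sd=1.0, diff_mu=-1.0, diff_sd=1.0):
    """LR = P(score | same speaker) / P(score | different speakers).
    The four parameters stand in for distributions that would be
    estimated from reference populations, not these made-up values."""
    return gauss(score, same_mu, same_sd) / gauss(score, diff_mu, diff_sd)

lr = likelihood_ratio(1.5)
print(round(lr, 2))  # 20.09 -- the evidence favours "same speaker" about 20:1
```

Reporting an LR rather than a hard yes/no is exactly what protects the innocent suspect: the strength of the evidence is quantified and left for the court to weigh.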
00:15:25
Now, this kind of system is currently being used by Interpol
00:15:30
and, for example, the Portuguese law enforcement agency,
00:15:36
and a company provides the product.
00:15:43
The last area is automatic speech assessment.
00:15:50
This is a rather broad category of problems pertaining to speech communication, traditionally assessed with a human in the loop.
00:15:58
For example, in speech coding and transmission systems, one of the main requirements is that when you
00:16:04
transmit speech, on the other side, at the receiver, the speech has to be intelligible;
00:16:10
it has to be of good quality. And nowadays, with
00:16:16
online services like Voice over IP, we can ask questions about quality of service.
00:16:22
For example, here is a speech sample. [plays sample]
00:16:29
This is the kind of input that goes into a telephony system, and after the codec and channel
00:16:35
it sounds like this. [plays sample]
00:16:39
So we want to know whether the product you are
00:16:44
developing is giving intelligible speech of good quality.
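One simple automatic proxy (an assumption on my part, not the method described in the talk) is to run a recogniser on the received speech and score its output against the intended prompt; the closer the match, the more intelligible the channel output presumably was.

```python
import difflib

def intelligibility_proxy(prompt: str, recognised: str) -> float:
    """Crude proxy in [0, 1]: word-sequence similarity between what was meant
    to be said and what a recogniser heard on the receiver side."""
    return difflib.SequenceMatcher(None, prompt.split(), recognised.split()).ratio()

# Hypothetical recogniser outputs for clean vs. heavily coded speech:
clean_out = intelligibility_proxy("please confirm the booking",
                                  "please confirm the booking")
coded_out = intelligibility_proxy("please confirm the booking",
                                  "lease confirm the cooking")
print(clean_out)               # 1.0
print(clean_out > coded_out)   # True
```

Standards-based quality measures exist for this problem; the point here is only the shape of the automatic, human-free assessment loop.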
00:16:49
Similarly, the problem can come up in language learning, where the learner
00:16:55
is evaluated by a language expert, and the expert can judge
00:17:01
the degree of, let us say, fluency, the pronunciation, and this kind
00:17:05
of thing; that is what you want to evaluate automatically.
00:17:09
Similarly, in the clinical domain, there are speech and language experts
00:17:14
who sometimes want to determine whether a sample is normal
00:17:19
or pathological speech, and the type of pathology. For example,
00:17:25
here is a control (healthy) speech sample. [plays sample]
00:17:32
Then there is the case where you develop a condition in which you
00:17:37
are not able to control the muscles that
00:17:41
move the vocal tract and the vocal folds. In that condition, what happens?
00:17:47
Here is how such a condition sounds. [plays sample]
00:17:54
And here is one in a milder condition. [plays sample]
00:18:03
So in this case the clinician wants to tell whether it is,
00:18:08
let us say, dysarthria or some other pathology.
00:18:12
And such a condition can come about for different reasons:
00:18:16
it can be due to damage of the nerves,
00:18:19
or it can also be developmental; so sometimes you also want to know what the cause of the problem was.
00:18:26
Now, having a human in the loop has some drawbacks:
00:18:30
it is costly in terms of time and money;
00:18:35
it needs different expertise, as I said: for language learning you really need a
00:18:38
language expert, and for the clinical setting you need a clinician;
00:18:43
and it is also difficult to reproduce the assessments.
00:18:48
So the best solution is to work towards automatic prediction using
00:18:53
machine learning methods, signal processing methods, that kind of thing.
00:18:58
In this we face three key challenges.
00:19:01
One is how we can develop models that are sensitive to
00:19:05
specific impairments given limited data;
00:19:09
we will always have a data problem because, for example, if
00:19:15
you want to go and collect clinical data today,
00:19:18
there are not many resources available. The second is exploiting existing
00:19:23
speech technology components to solve these problems.
00:19:27
And human assessment itself has variability; the third challenge is how to handle this variability. So these are the key challenges.


Conference program

Introduction by Hervé Bourlard
BOURLARD, Hervé, Idiap Director, EPFL Full Professor
29 Aug. 2018 · 9:03 a.m.
Presentation of the «Speech & Audio Processing» research group
MAGIMAI DOSS, Mathew, Idiap Senior Researcher
29 Aug. 2018 · 9:22 a.m.
Presentation of the «Robot Learning & Interaction» research group
CALINON, Sylvain, Idiap Senior Researcher
29 Aug. 2018 · 9:43 a.m.
Presentation of the «Machine Learning» research group
FLEURET, François, Idiap Senior Researcher, EPFL Maître d'enseignement et de recherche
29 Aug. 2018 · 10:04 a.m.
Presentation of the «Uncertainty Quantification and Optimal Design» research group
GINSBOURGER, David, Idiap Senior Researcher, Bern Titular Professor
29 Aug. 2018 · 11:05 a.m.
Presentation of the «Perception and Activity Understanding» research group
ODOBEZ, Jean-Marc, Idiap Senior Researcher, EPFL Maître d'enseignement et de recherche
29 Aug. 2018 · 11:24 a.m.
Presentation of the «Computational Bioimaging» research group
LIEBLING, Michael, Idiap Senior Researcher, UC Santa Barbara Adjunct Professor
29 Aug. 2018 · 11:45 a.m.
Presentation of the «Natural Language Understanding» research group
HENDERSON, James, Idiap Senior Researcher
29 Aug. 2018 · 2:03 p.m.
Presentation of the «Biometrics Security and Privacy» research group
MARCEL, Sébastien, Idiap Senior Researcher
29 Aug. 2018 · 2:19 p.m.
Presentation of the «Biosignal Processing» research group
RABELLO DOS ANJOS, André, Idiap Researcher
29 Aug. 2018 · 2:43 p.m.
Presentation of the «Social Computing» research group
GATICA-PEREZ, Daniel, Idiap Senior Researcher, EPFL Adjunct Professor
29 Aug. 2018 · 2:59 p.m.
