Transcriptions

Note: this content has been automatically generated.
00:00:01
for for summer
00:00:06
i'm not
00:00:11
one uh_huh
00:00:14
yeah
00:00:22
oh
00:00:26
uh_huh
00:00:43
right
00:00:47
yeah
00:00:56
oh
00:01:01
oh okay
00:01:13
oh
00:01:15
oh
00:01:21
yeah
00:01:27
oh
00:01:29
ah
00:01:30
yeah
00:01:32
oh
00:01:33
oh
00:01:37
uh_huh
00:01:39
yeah
00:01:41
yeah
00:01:45
her
00:01:48
or
00:01:50
thanks thanks for joining me for this um talk this is very
00:01:56
informal um please stop me if you have any questions
00:02:01
very much an overview of what we've been doing um so there are
00:02:06
lots lots and lots of references but i'm really going to
00:02:10
try to give you a picture of what we've been doing in south africa i'm going to talk a little bit about our group
00:02:17
where we are and what we do about the work that we've done in south africa on south african
00:02:22
languages um some of the challenges we faced and what we've been doing to get all the resources
00:02:28
and then just touch on the work that we've been doing on the babel project under-resourced means something different to us
00:02:35
um the number of hours given here i think in the first
00:02:40
yeah it was something like a hundred thousand that was considered
00:02:43
under-resourced well we sort of think of um ten hours as sort of the point where we've got something we can work with
00:02:49
so um that's really a and ah and i mourn is what i'd like to talk about
00:02:58
so first of all given that we're here i should show you where
00:03:02
we are so um i think we know how probably way around they
00:03:10
and our university is situated here you see three
00:03:16
dots because there are three campuses um and
00:03:20
it really is a multilingual university as well also given in three different languages
00:03:25
um and those dots are about two hundred kilometres apart so it's not
00:03:30
the very um no clash campers and we even for the
00:03:35
our research lab sits here which is a thousand four hundred kilometres from the nearest
00:03:40
campus uh oh you kilometres models when no none of multiple of um
00:03:47
so in a very small town called hermanus which i'm somewhat yeah if it if it um
00:03:55
which happens to be a really quiet place to do really focus research and we found that
00:04:02
i'll 'cause i'm right this appreciate that and um we've
00:04:08
ya we really welcome visitors to our shores
00:04:11
when it's cold in winter here in the northern hemisphere we have sunshine so um
00:04:19
oh we do um we do really invite collaboration
00:04:23
it's only a small group our group consists of um five
00:04:27
senior researchers and then students who come and go
00:04:31
um what about so focused um all the senior researchers are all at the base in hermanus
00:04:36
is students all of it by sometimes come and visit us for awhile off um
00:04:42
and
00:04:44
this really is ah focus to work on speech technology for the least missiles
00:04:50
and as um some of you have mentioned earlier south
00:04:54
africa really is multilingual there are eleven official languages officially recognised
00:04:59
um i had one student who could speak all eleven at varying levels
00:05:04
of proficiency but most people speak two to four which is um
00:05:10
so the
00:05:13
amount of code switching happening and the amount of language
00:05:16
mixing is just um phenomenal um our group works
00:05:21
from basic research through technology development to applications i'll show you how that all fits together a bit later but in
00:05:27
the small uh um field our focus really when we
00:05:30
get to applications is on speech recognition and um
00:05:35
our basic research is all very much on pattern recognition and speech processing we try to
00:05:40
provide an environment where these young people pass through really work on projects with us
00:05:45
learn from working on a project and linking up with
00:05:50
international researchers we really see that as part of our role as you see there's a long way to go
00:05:55
from where we are to where most of the speech research yeah and that really is happening in the mall
00:06:02
um
00:06:05
and then for senior researchers we try to provide an environment
00:06:10
where distractions are minimised without committee meetings
00:06:13
people can really focus on the research and um that is why
00:06:17
it's a very nice environment for someone taking a sabbatical
00:06:23
okay so how does it all fit together on the application side um
00:06:28
most of our work currently is actually focused around this project which is a um speech transcription
00:06:34
project for parliament which is similar to something that you've been doing here as well um
00:06:39
some time ago already um we're sponsored by government to create this speech transcription
00:06:44
platform it's in its final stages we're doing uh um we did our uh first
00:06:50
full usability test with the
00:06:53
um parliamentary transcription unit this july um
00:06:58
and in a way that
00:07:01
it takes in many of the other bits and pieces that b. g. m. guys into that platform
00:07:09
a big part of our work has really been using techniques that exist technologies
00:07:14
that exist and creating the resources to make it work in parliament
00:07:18
so there are loads of um uh databases and corpora that we've collected all of these
00:07:24
are made available as open content free of charge if you
00:07:28
ever want to use any of this it's available there
00:07:31
um these are specialist pronunciation dictionaries playing with what
00:07:36
happens if people say pronounce proper names
00:07:39
across languages i'm i'm the lube adversaries to sit in on what happens to those plots rations um
00:07:46
this one was the lwazi corpus lwazi was really where we started when we did that in two
00:07:52
thousand and seven to nine there really were no resources to get started with in these
00:07:59
there were a few proprietary um resources but if you wanted to download something and
00:08:03
get going text dictionary speech it just wasn't there so we built an initial
00:08:10
um corpus for all the languages and also bootstrapped the
00:08:14
pronunciation lexicons got text corpora to get going
00:08:18
and then later we extended that to proper corpora that really
00:08:22
can now uh support application set those all sixty two hundred hours
00:08:26
um this was a telephony channel this is nice for and um you can do a lot with that
00:08:35
um so a lot of our speech technology development
00:08:40
is also supportive of the um resource collection
00:08:44
we created a number of corpus collection applications i'll show you some of those
00:08:49
uh a bit later in the presentation um practical tools to assist you in creating corpora
00:08:57
on the speech processing side we're using stuff that's available the one thing that's difficult
00:09:03
for under-resourced languages is pronunciation modelling so much of our work is focused on that as well
00:09:09
how to do code switching graphemic systems what you do in under-resourced environments multilingual proper nouns
00:09:15
i'll touch on some of those just to get a feel for the type of work that we do
00:09:22
and then our main projects at the moment um this speech transcription platform
00:09:27
i mentioned that before babel is one that's finished but
00:09:31
um as i said i'll touch on that and then we've just um
00:09:35
finished completing the beatles collection when you've multilingual voice which is
00:09:39
which is a new thing we have the interest the spectrum back to base here and that will also be released free
00:09:44
of charge all the was open content all the data for back so what we have they all different voices
00:09:52
producing phonetically balanced sentences different languages that all all somewhat compact
00:09:58
so it's a very interesting um resource to work with especially feuding permitted t. v.
00:10:04
eight and you want to create different reactions of voice speaking different languages
00:10:13
okay so back to be off um
00:10:19
on what we do you first of all i'm making to do you shoot their languages of south africa
00:10:26
those are all the um official ones um all spoken
00:10:31
these officially recognised that's the number of speakers
00:10:36
um who list it as home language it is very difficult to identify what a home language
00:10:40
is in many instances and these show the families that they come from
00:10:45
so um basically right at the top is about twelve million speakers and ndebele here at the bottom at about
00:10:51
um one million and you'll see that english although it's widely spoken really is
00:10:58
not the most um prominent language when it comes to home language because
00:11:05
most of the languages are from the southern bantu family um basically
00:11:10
splitting up into these two groups nguni and sotho-tswana with some other
00:11:14
different ones in between and then english and afrikaans that stand out as the two germanic languages
00:11:21
this also means there's a lot of interesting analysis with regard to language families and how we
00:11:27
can then across that that is possible to increase the guy with um with that specific language make
00:11:37
so the nchlt corpus the big one that we created we basically used smart
00:11:43
phone based software for this so people would go out with smart phones with this uh small app
00:11:49
and there a prompt is provided the person records it it gets
00:11:53
stored and from that we create our corpora and
00:12:00
it works it's cheap it's an easy way to collect data it works really well um
00:12:07
so what can go wrong if you've done actual resource collection out
00:12:13
in the field you know that these are some of the technical problems that happen
00:12:18
people um mispronounce the word they repeat what they have to say they're tired
00:12:24
they start to um and ah like i'm doing at the moment
00:12:28
and then one that was a bit more unexpected to us was that
00:12:35
often people really struggled reading these prompts so in many of the languages
00:12:40
people really are not used to reading aloud in their own language
00:12:46
reading is done in english speaking is done in a um in your
00:12:52
language that you're really comfortable with but reading aloud is not um
00:12:58
as common so in our later work we actually um
00:13:07
when we created the prompts we used children's books to get the prompts from to
00:13:12
make it more easily readable and that um improved the quality of the data
00:13:19
so the other problems the normal ones background speech for us lots
00:13:22
of cows chickens and goats this one was um
00:13:27
i read a lot of problem
00:13:30
um and bane
00:13:32
especially when the things got a bit see long people rushing to finish so ah there's one example i wanted
00:13:40
to play which is an error that is not one of the ones that i've just mentioned um
00:13:46
the
00:13:48
student this um i have to say that phrase
00:13:53
and i'll play you
00:13:56
what got recorded
00:14:11
this can now and stuff
00:14:19
i think it keeps going on for a long time
00:14:27
so some errors are expected some unexpected but in the end
00:14:33
it's important to have quick and simple tools that you can use preferably out in the field while you're in your small village
00:14:39
collecting your data so that um you know that the data is of good enough quality that you don't have to go back
00:14:47
now um
00:14:51
mainly confidence scoring algorithms politicking aldrin sixes
00:14:57
many of the posterior based word based algorithms really have some reliance on a language
00:15:03
model and it's very difficult to get that to work if your language model is really small
00:15:08
so um that's why we went for phone based scoring
00:15:13
um the basic idea is very standard lots of people do that you compare your phone alignment
00:15:18
and your phone hypothesis you see whether they match and at some threshold you're okay
00:15:23
um the p. d. p. algorithm that we've been using um adds a few
00:15:29
nuances to that that make it really really um effective
00:15:35
we simplify the phone set so that you don't have all this issue with the tones in
00:15:39
african languages that sometimes match and sometimes don't um we have a um
00:15:44
the normal free phone decode that we use when we create the hypothesis string
00:15:49
but for the reference string we created a very specialised garbage model i'll show that on the next slide
00:15:54
we train our scoring matrix to estimate there is we make sure
00:15:59
that we're just for that i'm seeing target that you cannot
00:16:03
so easily get directly from the matrix we normalise by the number of phones and we call that p. d. p.
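Editor's sketch of the phone based scoring idea described above, under stated assumptions: the reference phone string (from forced alignment) and the hypothesis phone string (from free phone decoding) are aligned by dynamic programming, scored with a substitution matrix, and normalised by the number of phones. The function names, the flat default scores, the gap penalty, the normalisation by the longer string, and the threshold are all hypothetical illustrations and not the speaker's actual PDP implementation.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical type: maps (reference phone, hypothesis phone) to a score.
ScoreMatrix = Dict[Tuple[str, str], float]


def pdp_score(
    ref: Sequence[str],
    hyp: Sequence[str],
    scores: Optional[ScoreMatrix] = None,
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Globally align two phone strings and return a length-normalised score.

    With scores=None this is a flat matrix (edit-distance-like scoring);
    passing a trained matrix lets confusable phone pairs cost less.
    """

    def sub(a: str, b: str) -> float:
        if scores is not None and (a, b) in scores:
            return scores[(a, b)]
        return match if a == b else mismatch

    n, m = len(ref), len(hyp)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + sub(ref[i - 1], hyp[j - 1]),  # substitute or match
                best[i - 1][j] + gap,  # phone missing from the hypothesis
                best[i][j - 1] + gap,  # extra phone in the hypothesis
            )
    # Assumption: normalise by the longer of the two phone strings.
    return best[n][m] / max(n, m, 1)


def accept(ref: Sequence[str], hyp: Sequence[str], threshold: float = 0.5,
           scores: Optional[ScoreMatrix] = None) -> bool:
    """Keep an utterance when its normalised score reaches the threshold."""
    return pdp_score(ref, hyp, scores) >= threshold
```

With the flat defaults, identical strings score 1.0 and each substituted phone pulls the score down; a trained matrix would simply replace the flat match and mismatch values for specific phone pairs.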
00:16:10
evaluating that uh against a number of other word based um
00:16:16
algorithms gave us very useful results but evaluating is not as simple uh this is the
00:16:21
garbage model for such a chair that if it's so it's the
00:16:26
one of these is very similar to the eyes to guys it's be model
00:16:29
except that it lost in this case you absorb people speech yeah
00:16:34
set in our action game yeah view movable that's a beginnings that is that
00:16:40
your normal i'm making states but then you're really allowed to jump that
00:16:45
we can go through it you don't have to go through it you can jump across this whole thing so that
00:16:52
when you're aligning you're trying to find those pieces of speech that are
00:16:56
useful and you try to jump over anything else that is there so um
00:17:04
getting back to the evaluation protocol you always have to think about what you want
00:17:10
typically you really want to
00:17:14
accept everything that's correct and reject everything that's wrong
00:17:19
it's so almost matched or wrong little bad audio deleted with that goes
00:17:24
that comes in that goes out and not your typical evaluation protocol
00:17:29
you could out it would be much more lenient and so i well exact match you want to do these things
00:17:35
you really do not want some of these might be useful some of these might not see really done okay
00:17:41
or you could have the session evaluation protocol which is actually the one that we've been aiming
00:17:47
for where these are really all accepted because everything is useful and that really all gets rejected
00:17:55
and using the p. d. p. scores um this is an example of the d. e. t. curve that we could get um
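Editor's sketch of how the choice of evaluation protocol changes the error rates behind such a curve. Each utterance carries a confidence score and a manual label, and a protocol is just the set of labels that count as acceptable. The label names ("exact", "almost", "wrong") and the two protocol sets are hypothetical placeholders, since the exact categories on the slide are not recoverable from the transcript.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical protocols: which manual labels should have been accepted.
STRICT: Set[str] = {"exact"}
LENIENT: Set[str] = {"exact", "almost"}


def det_points(
    items: Iterable[Tuple[float, str]],
    acceptable: Set[str],
    thresholds: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_accept_rate, false_reject_rate) triples.

    items are (score, label) pairs; an utterance is accepted when its
    score is at or above the threshold.
    """
    items = list(items)
    good = [score for score, label in items if label in acceptable]
    bad = [score for score, label in items if label not in acceptable]
    points = []
    for t in thresholds:
        false_reject = sum(score < t for score in good) / max(len(good), 1)
        false_accept = sum(score >= t for score in bad) / max(len(bad), 1)
        points.append((t, false_accept, false_reject))
    return points
```

Sweeping the threshold over the same scored data under different protocols gives different curves, which is why the protocol has to be fixed before confidence scoring methods are compared.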
00:18:03
i just wanted to show you the difference that's part of the whole phone based dynamic programming scores the p. d. p. scores
00:18:09
is what difference it makes to train your own scoring matrix
00:18:14
on your training data so all the training happens on your training data
00:18:17
you're trying to clean your training data so those lines are the um
00:18:24
using just a normal flat matrix you're just scoring like levenshtein distance here you're now using
00:18:28
a trained matrix and you see how much nicer the p. d. p. um performs
00:18:36
to cancel scheme um some of the other languages really had a lot of background noise all problems in
00:18:42
pronunciation um he is the delay and you see that
00:18:47
you do not get the same quality um
00:18:52
oh when you applied to to take me
00:18:56
but that's allowed us to really take all our data score everything
00:19:00
and select the actually two hundred hours that are really
00:19:05
considered to be um consistent according to the scores so this is
00:19:10
what our final corpus looked like what we actually did is we
00:19:14
packaged the really clean data as one set
00:19:19
and these as additional parts it depends what you wanna work with
00:19:23
if you want something really clean patch wanna work with that if you want more of these here is actually interested in that
00:19:29
um the other visions of a cool place right um almost with were uh schools so that you know what you think
00:19:42
okay any questions here which i keep going
00:19:50
so some thoughts on pronunciation modelling
00:19:56
uh you've all seen pictures like this where you try and see
00:20:00
the difference between when you building a system this one
00:20:06
i think this was forty hours now so this is where we go from zero to ten twenty thirty forty hours
00:20:13
what happens if you have um systems that are purely graphemic the worst performing one the red line
00:20:20
um uh grapheme to phoneme converted somewhere in the middle
00:20:25
and the neural phone based results right at the bottom the best that you
00:20:29
could do with this corpus um so in one of our studies
00:20:35
i took this picture and really tried to understand what's
00:20:38
happening inside based on um word categories
00:20:43
so you see if you look at the tops of the
00:20:48
categories of words that are observed in the training data
00:20:53
your really generic words actually did pretty well that's who we're right on that's a very small would you like
00:20:59
but this is now for the graphemic system just the graphemic system and you understand why
00:21:05
spelled out characters really there's not much that a graphemic
00:21:10
system can do with that um acronyms perform poorly um foreign words
00:21:16
really do not do well proper names the system struggles spelling errors
00:21:21
um these become more we usable in a graphemic system but still
00:21:26
it's the normal standard words the system does really really well so
00:21:32
actually if you are this more than one category applies this means it might have been uh
00:21:38
and then with the team foreign or proper name that was spelled out something
00:21:41
like that so it was just it was difficult to classify um
00:21:46
now this was just the graphemic system now if you
00:21:50
compare the three different systems you see that
00:21:55
green means it does the best on that specific category yellow in between red
00:22:01
where it's not you can see what's really happening so your graphemic system um
00:22:10
actually does pretty okay on a um many of the
00:22:14
categories but fails spectacularly on some of the others
00:22:19
where your gold dictionary um does better even
00:22:24
on some of the strange ones the and you could be with we can i'm
00:22:28
gonna come back to that because that seems like an unexpected result that um
00:22:34
i just do it so i gave a graph even iced tea to p.
00:22:39
e. and the goal dick that's really every single we're getting to
00:22:44
the g. d. p. one was trying on generic with the plot but not so i'm not many anything
00:22:51
so what is happening yeah some things on lace
00:22:55
some things are really just very difficult to recognise
00:23:01
spelled out characters are really um short is like in h. d. um
00:23:07
and some things are really easy to recognise
00:23:12
so what we did in this work was to really um identify these
00:23:18
categories prior to system development and regularise the spelling and put it back into the
00:23:24
graphemic system so you'll still have a graphemic system but all the strange words you respell
00:23:29
so you just fix the spelling um because you're really only trying to fix these things
00:23:35
and not these ones that are really doing pretty well
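Editor's sketch of the respelling idea for one category only, spelled out letter sequences, so that a graphemic system sees ordinary looking spelling. This is a simple rule based stand-in, not the speaker's method (the talk goes on to describe a joint sequence model for this step). The letter name table, the "b.b.c." token convention, and the function names are hypothetical.

```python
import re

# Hypothetical English-style respellings of letter names, for illustration only.
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee", "e": "ee", "f": "ef",
    "g": "jee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "el",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "kew", "r": "ar",
    "s": "es", "t": "tee", "u": "you", "v": "vee", "w": "doubleyou",
    "x": "eks", "y": "why", "z": "zed",
}

# Assumed convention: spelled-out tokens look like "b.b.c." (letter plus dot).
SPELLED_OUT = re.compile(r"(?:[a-z]\.)+")


def respell(token: str) -> str:
    """Respell a spelled-out token letter by letter; leave other words as-is."""
    lowered = token.lower()
    if SPELLED_OUT.fullmatch(lowered):
        return " ".join(LETTER_NAMES[ch] for ch in lowered if ch.isalpha())
    return token


def respell_lexicon(words):
    """Map each word in a word list to the spelling the graphemic system should see."""
    return {word: respell(word) for word in words}
```

Only the tokens identified as belonging to a problem category are touched; generic words, which the graphemic system already handles well, pass through unchanged.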
00:23:42
and here are some examples um
00:23:46
how words could be respelled back into the graphemic system of the language that we've been
00:23:52
working for in this case um uh that actually looks to me like a a english one
00:24:00
so this system we try no we tried and tested different
00:24:05
techniques but in the end a simple joint sequence model
00:24:08
worked very well we just used a second order one because we did not want
00:24:13
to catch all the detail we just wanted to capture the broad patterns and um
00:24:20
change these strange words into something that the system could work with um
00:24:27
here is what happens to our word error rate if we
00:24:31
actually transliterate one category at a time that was the graphemic system the red one to start with
00:24:37
then we just transliterate foreign words proper names spelled out words everything that we can
00:24:44
think of to transliterate that was the um that was how close we got to the phonemic system which is
00:24:51
the last place if it been building anti phonemic system um
00:24:55
you know which which identity problems you know what
00:24:59
spelled character looks like you know the problem and looks like in the big identified in foreign with
00:25:05
foreign which you know like to but we also use chase him buys language identification
00:25:10
for word identification we found works really really well if you've got word lists obviously
00:25:16
but again for these languages we often do not even have proper word lists um
00:25:23
some other languages like fillies if you'd native so these with all these all words that just keep on
00:25:30
um you just keep on adding to the cadbury if you hold the stakes the cabbages keep some graphic
00:25:36
so this is something that we've been doing assistance for all the
00:25:40
great female find the strange with translate right payment you ask
00:25:44
almost as good as i'm getting the whole five by system yes
00:25:54
um that one was the way this stick it it's the written this i'm well the spelled out
00:26:02
where it lays there written the signs followed with something like b. b. c. um
00:26:07
that acronym is something like oh i'm thinking also that within one but at us as a be
00:26:12
inside that you don't side is at stuff as ideas i you save as single way
00:26:17
so it's also capitalised it's also it looks the same as a spell out
00:26:22
with but it's pronounced as a wood instead of i spelled out late
00:26:27
yeah well enough yet
00:26:35
so
00:26:40
it's
00:26:43
it's
00:26:47
there was
00:26:53
um
00:26:56
yes
00:26:57
it says
00:27:01
itself out enough to last often that if
00:27:05
i'll just say he's okay with me
00:27:10
work
00:27:11
so but the the the categories that you brought to be handled outside a good thing
00:27:17
i would actually
00:27:19
um thinks they
00:27:22
i would expect you to he
00:27:24
to be good on acronyms and that and spell that were
00:27:30
unless it's a lexus bottled was really all um is it would
00:27:34
it would you put it it hadn't been identified as well
00:27:40
so that's like b. b. c. but thirty to be with try to the book
00:27:44
from
00:27:46
um
00:27:48
okay
00:27:51
so another thing that is also with i'm noticing here is that on that i'm not in it
00:28:00
that is really not much difference a man's these and on foreign words as well um
00:28:09
with different systems do they really
00:28:13
there is also not as different as you would have expected you really expect and buckled it to be really excellent
00:28:21
so one of the things we looked at with proper names is really
00:28:25
what happens when people are pronouncing foreign names pronouncing words that um
00:28:34
the speaker might not be exactly sure how to pronounce or each language group might have
00:28:40
its own way the gold dictionary is the
00:28:43
english pronunciation of this word if there is but actually within language groups there are completely
00:28:49
different pronunciations that at the can standardise that it's not included in our culture
00:28:56
so what we looked at this was it um an
00:29:00
evaluation of uh a lot of proper nines specifically
00:29:05
pretty used um by speakers off topic on english data
00:29:09
and zillions of nineteen is that all afrikaans
00:29:13
in these citizens it so the speakers or in the um the
00:29:18
the languages all the rows and the columns honesty is
00:29:23
i'm a business are it's because all the rose their names all the comments yeah
00:29:30
so what we doing his thing seeing an english this and would
00:29:36
pronouncing english name cindy i percent of the time
00:29:40
i got a great with great isn't now
00:29:44
also something that we've established from the data so what we did is
00:29:48
we looked at within language group utterances so that is
00:29:54
a person speaking x pronouncing a name from x we're looking at what those um
00:30:02
what those names sound like we had it um phonetically transcribed
00:30:07
then we say what is the most common pronunciation that comes from this set and we say okay that's the answer
00:30:13
this deficits sees but isn't exciting this would pronounces it like that
00:30:18
so this word correct means something very specific in this
00:30:25
um in this type of it means that i think that that would
00:30:29
then from the data how close the speakers normally get to
00:30:36
that single pronunciation that we select and within a language it's really high um
00:30:44
some of the um this city differences is because um to do as
00:30:49
it as it involves system with its high and low um
00:30:52
visions of and the end of which people sometimes produce and
00:30:58
sometimes not sometimes like it transcribed correctly and sometimes not
00:31:03
so that is um mostly the reason for this discrepancy
00:31:07
now oh what happens is people who um
00:31:14
um it's speak of a construct decide english with the id much will
00:31:19
put uh especially the traps i see to get even with an
00:31:24
english people cannot pronounce it with what's in the in this database
00:31:27
so you get a feel for how much lately um people
00:31:32
approximately two pronunciation the one that image looking here
00:31:37
now gets measured by speakers of other languages
00:31:41
what was interesting when we say okay let's forget about what is correct according to the
00:31:47
um first language speakers let's see how consistent pronunciations are within the second
00:31:53
language group we saw that in all cases people did a lot better
00:31:58
so here there was much more consistency in how afrikaans speakers
00:32:04
pronounces it says see two nouns then in the ability to
00:32:07
approximate the true which just sees you can really
00:32:13
base this information in here fact is also valuable even though it's
00:32:19
not great in those scenes of what our once people
00:32:24
so um basically just the i'm not gonna go through the detail
00:32:29
of this but the whole debate is that even though um
00:32:34
the getting
00:32:37
the speaker specific pronunciation um all but
00:32:43
it's very it's very close to this one if it is the the sign speaker out a the um
00:32:51
if you're trying for should g. t. p. system not try to
00:32:55
to pieces that um to try and approximate that ideal pronunciation
00:33:00
doesn't do that well but it actually does a lot paper with a proxy when we may shit i'll speak is against this
00:33:10
um you could be pronunciation so the g. d. p. is really
00:33:16
predicting some of the here is that it's this because i'm making crossing
00:33:22
which brought us to the concept of meeting that language abysmal were coming on that
00:33:26
that really when people speak is the language rice he is a language that would be spoken
00:33:32
the origin of that with that they speaking and it's the language that they thinking of when i speak like
00:33:39
trying to produce an english we're trying to use to see to it that often
00:33:43
find decent and i'm in the g. t. p. system that the mascot
00:33:49
so we're now modelling those three as parameters when we do g. to p.
00:33:53
um word language speaker language and intended language and trying to pull apart these different factors
00:34:01
so that's been a way to get you
00:34:05
all these ideal pronunciations that we then can transliterate
00:34:10
into our graphemic system in order to deal with
00:34:14
pretty complex phenomena
00:34:18
okay so this is very short i just wanted to as an aside on code switching um
00:34:27
wanted to give you a feel for just how big the problem is we had a um
00:34:32
a student of mine really looked into uh uh the type of data that's usually
00:34:38
available over the radio and um we created a radio broadcast corpus
00:34:44
where we really just tried to see how frequently code
00:34:48
switching happened now the first thing to notice is that
00:34:53
it's often difficult to say what is an english word um there's uh obviously this continuum
00:35:00
between really just using english lit inside you'll see the d. and c. intends
00:35:06
to um which is not usually used to just pop it in to something that is things
00:35:11
become support all the languages rely on it to something that as being shy range um
00:35:19
so that it can attest change is actually committed the so
00:35:23
here are some examples these were english words that
00:35:27
that have been um that have become part of the
00:35:31
language using some of these um um modifiers now
00:35:37
that was modifies can be used with was anyway so you have all the
00:35:41
seine modified with which also once identifies age and she bought of it
00:35:47
in uh english fashion and part of it as a this is actually separated
00:35:53
but that was actually a smaller part i do have uh and uh uh i
00:35:58
went show it now but it is the analysis of often does happen um
00:36:03
but most of comes when we fit in with these guys waiting is real
00:36:08
english but this happens in between enough to just be in a sense
00:36:12
so this was really the frequency of code switching we saw um
00:36:18
these with different roles on the different um news brought cost
00:36:23
and wayne act is performed in actual sit they replied on this and the buckets four percent
00:36:29
um uh oh that was the you know that was the time right sure the um
00:36:34
case when i find in forty percent of the time that was english and
00:36:38
um and we say it was english if a phrase by our definition of a
00:36:43
phrase contained an english word and the number of those foreign words per
00:36:51
same thins is this like here so um this is up to twenty with direct a
00:36:57
very small number in pain and pinky between five and ten it becomes very significant
00:37:03
yeah i'm easy that hard to think of the fact that up to two hundred and fifty sentences with that
00:37:10
just with right
00:37:14
so that's a lot of code switching that needs to be modelled um
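Editor's sketch of the two counts discussed above: the fraction of sentences that contain at least one embedded language word, and a histogram of how many such words each switched sentence contains. It assumes every word already carries a language tag; the tag values ("eng", "mat") and placeholder words are hypothetical, and the transcript itself notes that deciding what counts as an English word is the hard part.

```python
from collections import Counter
from typing import List, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]  # (word, language tag) pairs


def code_switch_stats(
    sentences: List[TaggedSentence], embedded: str = "eng"
) -> Tuple[float, Counter]:
    """Return (fraction of switched sentences, histogram of embedded-word counts)."""
    per_sentence = [
        sum(1 for _, lang in sentence if lang == embedded) for sentence in sentences
    ]
    switched = sum(1 for n in per_sentence if n > 0)
    fraction = switched / len(sentences) if sentences else 0.0
    histogram = Counter(n for n in per_sentence if n > 0)
    return fraction, histogram
```

Binning the histogram keys (for example one word, two to five, five to ten) would give the kind of per sentence breakdown the speaker reads off the slide.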
00:37:19
one of the things we soul though when we tried to actually and allies there's um
00:37:26
we owe the the the study that they didn't how their english dolls really op
00:37:31
analysis in this environment specifically look at all the funds but the the balls
00:37:36
really all the interesting ones um and then we had a transcribers mock is this
00:37:42
that we'll finish fun is this one of this that the fans and
00:37:46
i'll into transcriber agreement as to what people didn't agree and actually when we
00:37:52
looked at this i'm not sure if this is as possible um
00:37:57
what i've plotted here is for some of the um samples i of
00:38:02
the uh if one any t. the two formants and what the
00:38:05
tag unfold so the water tag uh was it yet p. based
00:38:08
one um so it's like to direct show off the um
00:38:13
of the target fine um if we really think it's an all all
00:38:19
the um what this this fund this is for the tree um
00:38:26
and you'll see that on the late side this the manual pack on the right
00:38:30
side is the um what'd packets version it all here is here with
00:38:35
a really agree they eyes very clearly around here that's very simple but in
00:38:41
the applies is actually with the fish well what a around here that
00:38:46
um it much more difficult to identify and here but even that was become um
00:38:54
overlapping and yes some of them as well so they owe us some patterns
00:39:02
and it's something you can he's that it's not clear that thing and um
00:39:07
with that we trust the guy go with really trust individuals
00:39:11
difficult decide it's not um it's not clearly more problem
00:39:17
but using this with it um i play around with years but i just included this lot because of um
00:39:23
of days and he says i'm not going to go into the detail but we did something that is
00:39:29
somewhat similar to trying to um incorporate acoustic info into the into the process but
00:39:35
in a in a much laptop fashion then your very nice framework so um
00:39:41
basically we modified the lexicons to you classes infer from that that k. u. e.
00:39:46
that was it i to you again do the forced alignment audio a hypothesis
00:39:51
seconds what's supposed to be you see what there's nothing candidates off
00:39:54
if you have those might be candidates you just push them all in as possible the we and we see what the tag things which one is space
00:40:01
and from that you can really generate additional variants and we did
00:40:06
get some improvements there and this is with its accuracy
00:40:11
sees a it's it's very very like yeah and because it's for subset of these would sweep is really the problem um
00:40:19
and using these additional variants we could get some improvement but is
00:40:23
meetings with them so it might be interesting to see what
00:40:27
we can combine from i'll make it with the stuff that you've been doing
00:40:34
okay
00:40:37
and um
00:40:42
so last three um
00:40:47
back to babel so this um for those of you
00:40:53
who are not familiar with it um was a um
00:40:58
four and a half year project um sponsored by iarpa that was run as a
00:41:03
challenge where we really tried to solve the um spoken term detection problem for under-resourced languages
00:41:09
we were part of the babelon team and our group
00:41:12
that we screwed really focused on um the pronunciation
00:41:18
trying to see what we can do with limited resources in this environment
00:41:22
or just to put it in a bit more context
00:41:26
these with if you have four languages each year we got this bunch
00:41:30
of languages and started fall in began five and going six
00:41:34
every time at the end we had one surprise language we had to build the system for and in the eighteen
00:41:39
that whole system from when you get the data until you submit your results had to be done within a week
00:41:45
so a lot of the focus was on techniques that are language independent that
00:41:50
you pay before and you need to be sent it back to you um
00:41:54
see what happens now this is these two osteoporosis light so i've not added this tight side
00:42:00
action because he's supposed to see meeting so at that like the on still um
00:42:06
back at least on mine so this is just for babelon for the whole team um the
00:42:12
program target initially when the project started the program target was an
00:42:16
atwv of point three that's the actual term weighted value
00:42:20
for those that are not familiar with spoken term detection it's a really convoluted way to get at
00:42:26
how well your spoken term detection works but just think of it as a value between zero and one where one is
00:42:32
um so when the program started it was set at point three and then after
00:42:37
year one the results were so good that grey um up it's up and sixes okay
00:42:42
and in the final here we could get back on all the systems and makes it it go direct with um really tricky
00:42:50
so um that's the context what i really wanted to show here because this i thought was quite interesting
00:42:57
from the beginning of the program um many different um speech
00:43:02
recognition techniques were developed and these were the ones
00:43:05
where um the babelon team got its gains so from the baseline on this one um
00:43:14
adding each of these added
00:43:17
um and that those points within the city just on the other one um
00:43:23
and the youth and different techniques with what company which
00:43:30
means that the will of one wasn't was out
00:43:33
one of that haven't not that they a lotta other techniques that are not qantas flight because
00:43:37
i didn't i wouldn't um the ones we not to tie these things really worked um
00:43:44
those are the um bottleneck features this is where dnns got into the picture
00:43:49
um there were lots of different um keyword spotting techniques from
00:43:53
b. b. in very spatial arts and that open passion
00:43:57
so um you would've seen this before as well you take your whole database and you change it in some way
00:44:04
um in this case um the speed was changed see you had it typed copy
00:44:09
of the whole thing of its level but force that we add noise
00:44:12
the double back the um is in a lot of different um
00:44:19
there are lots of different coding format in its its meaning me see lots of different um background sound
00:44:24
it's a silent portions from the corpus was taken from different call prof and
00:44:28
it to others and in all of this run into the pot
00:44:32
um the multilingual bottleneck features really made a huge difference because this was now getting to the
00:44:37
end of the process with the last wins um that we could find in places
00:44:42
so that is we just try not only data to detect the only different languages in many languages
00:44:48
as you have i think oft about eleven twelve when started dropping um
00:44:54
but twelve twelve either of those languages to try to bottleneck features out your plight
00:44:59
so um that but gives the when and makes it very quick once you start to try on
00:45:06
both your system you have your features already because that's inspect whatever to try another one
00:45:12
this was weight factor um
00:45:15
and that was web data automatically queries that went out and got lots of data to try and improve the language model
00:45:22
um joint alignment decoding when you actually work on the hypothesis label when you combine will stop the
00:45:27
two common action at um by combining mysterious rather than systems um i mean stop word models
00:45:36
that caught up to ten point four out of a cat
00:45:43
so that's where the different gains come from i'm
00:45:47
not going to talk about any of this
00:45:51
so one interesting one from there the sub-word models that
00:45:54
we worked on which is um automatic syllabification
00:45:59
which is a very simple technique and actually made quite a difference to the system
00:46:06
so um basically um we were responsible to generate some of the
00:46:12
g2p maps with lots of complicated
00:46:16
pronunciation modelling and it made very little difference you could get a whole point here
00:46:21
a point there but um in the in the the wins were very very slim um
00:46:30
whereas the syllabification was quite a simple process it was based on assigning consonants and
00:46:36
vowels to all the words and then splitting into syllables based on uh automatic detection of
00:46:42
the class the consonants by starting from what's available beginning what about it looking
00:46:46
been really much of consonants or you could split it in different ways
00:46:50
also while we so um this looks good but we want to
00:46:54
pull 'em syllables what it more units to work with
00:46:58
so we thought it just we classify not twelve this constant is just to get the good chance so this is a very
00:47:05
brain-dead algorithm and that's why i wanted to show the results because they're quite interesting
00:47:12
so um the top one is two different reactions of the algorithm
00:47:18
doesn't make much difference but here we start um
00:47:22
increasing the syllable length and suddenly our performance um and this is
00:47:27
both in and out of vocabulary keeps increasing
00:47:31
and if you look at the uh i'm changing the number of syllables you see that
00:47:36
every time we increase these are the sign petitions we know increasing the syllable linking ace
00:47:41
all syllables of getting one mole your syllables is the right she's given between the syllables and the number of words
00:47:47
in the vocabulary now um you see we're getting to the whole word system you know we're finding a
00:47:55
somewhat in a good way to get a whole word system and if you look at the result
00:48:00
and you see the um whole word system that's the pink one
00:48:05
these are a lot of the different languages that's the best um whole word system for in vocabulary
00:48:10
expect that could be the best you know um there's
00:48:14
there's five approaches that um in many cases
00:48:22
um out of vocabulary
00:48:26
it does even beat
00:48:28
these ones on now the one that ms that's suitable but it's
00:48:35
so um that's still not the most interesting result because these are
00:48:39
somewhat comparable only to combine everything it does like that
00:48:42
the um what mall themes these can fry mighty that
00:48:47
were uh i'm sure we'll for change right
00:48:50
um how's with the syllables and the combination with canned beans system
00:48:57
so yes actually the interesting part um while i see this all really um
00:49:04
it approximate elder performance but it still gives you something to work with that's not
00:49:07
always something a bit smaller but more chunky that um i realise useful um
00:49:15
but the interesting thing is that syllables are a lot easier to extract on the last
00:49:20
day to see its way um the last artistic to say many and uh
00:49:24
or maybe that's just a match people very very short time off the youth on the deck that
00:49:28
all the preparation and family whips it you not to syllable the file these results as well
00:49:34
and trying the morphological um prices on a text are which means that um
00:49:41
these are the um in vocabulary this is out of vocabulary
00:49:45
um results for morphemes and syllables and you see that these
00:49:50
um out of vocabulary could to to pay pay to but very similar um in
00:49:56
vocabulary it's a lot better because you get the bigger chunks um but then
00:50:02
when we got to a million weird remote phoenix systems were not it wasn't possible to developing
00:50:09
which means that i'm an nice going with was um in the final system georgian with uh
00:50:16
mm evaluation so that was a surprise but but dropped on it and then in the final
00:50:20
system the syllables really could help because they were just so quick and simple to build
00:50:27
and
00:50:30
i think in conclusion um just in the whole field of under-resourced languages there are
00:50:38
many challenges many issues still to be resolved many things
00:50:42
that are still not working well still not understood
00:50:46
um the south african environment is really an interesting one we have a very um
00:50:53
interesting language makes integrating seat of resources that's been just thought with you don't
00:50:58
have to go out and we i they're all pro before that
00:51:02
and there is more info if anyone's interested
00:51:08
i think that's a rough ah ha
00:51:24
so yep yeah oh
00:51:31
huh
00:51:41
oh
00:51:43
okay
00:51:45
oh
00:51:49
stuff it was itself with you guys system having thing
00:51:54
the many different support units that the product
00:52:01
yeah
00:52:04
and you would see that if the uh oh i think i mention it they who once the um
00:52:12
once you get him to a fairly lot with corpus it was seven out of the capri words in
00:52:17
you will spur content depiction pacing in in g. you end up with a book every system
00:52:25
uh_huh
00:52:29
oh
00:52:32
'cause that's if that s. f. plus
00:52:40
i think the actual final system um which was pull it apart plummet
00:52:45
being really pretty good it was to null it's probably going to
00:52:54
sure
00:52:56
whoa
00:53:01
still
00:53:04
so it's off
00:53:12
yeah
00:53:19
it's
00:53:21
um so the
00:53:25
there was some lamb specific with that was that that fact provided very very little guy
00:53:31
so the multilingual aspect of the babel project was first of all there
00:53:37
the um the the techniques really should be applicable across
00:53:43
all these languages that i i to be very not choosing to specifically
00:53:47
but the wins from the multilingual using the multilingual
00:53:52
data really came from the multilingual features
00:53:55
they will lots of different things that would be if they're especially um
00:53:59
my beauty um but the the big when he yeah uh
00:54:06
they it was really there um multilingual speech
00:54:13
and that's bottleneck is
00:54:19
oh
00:54:24
losses sold with it and it has this um
00:54:30
ah got the says
00:54:35
yeah oh yes
00:54:39
yeah i'll have this be honest and
00:54:44
i'm not sure what we came to me and i remained it has a strange all
00:54:50
all that will serve as transvaal i'm elated to sound relationship that we found difficult
00:54:59
the t. not find the sign
00:55:04
the the big thing with this um a program was that we were not allowed to use the big
00:55:12
so uh they will lexicons i'm right at the start and then they got right now um
00:55:18
and that means that any time there is some um
00:55:24
discrepancy it is very difficult to fix that
00:55:29
um but on it this specifically i think i'll have to skip that on so
00:55:34
i'm actually not sure and i left to go look mine redid mine out
00:55:44
so if
00:55:52
uh_huh
00:55:54
that the um but it's still um but the standard
00:55:59
is that we stand so once a um
00:56:05
so part of the process of creating text resources would be
00:56:10
to verify um the the spelling of them
00:56:14
since often the official standard spelling might not necessarily be the one
00:56:18
that is used in the data that we work with
00:56:22
there are for example um some of the languages that would also be cross border the
00:56:29
sesotho from south africa and sesotho from lesotho would have slightly different spellings
00:56:36
and if you just get sesotho text you don't know which one it is

Conference program

Multilingual speech recognition in under-resourced environments
Marelie Davel, North-West University, South Africa
2 June 2017 · 11:05 a.m.
