Transcriptions

Note: this content has been automatically generated.
00:00:00
You get bragging rights over everyone else in the room, so...
00:00:05
uh_huh
00:00:08
uh_huh
00:00:16
uh_huh
00:00:20
uh_huh
00:00:31
yeah
00:00:33
uh_huh
00:00:39
So it's because it's not only about picking the right answers, it's also about picking the right answers as quickly as you
00:00:43
can; there's this other time aspect it takes into account when you're actually, um,
00:00:48
putting your answers in. It's all multiple choice, and we'll just start once we've got everyone sort of enrolled.
00:01:06
uh_huh
00:01:13
uh_huh
00:01:17
i hope i go well
00:01:21
huh
00:01:23
yeah
00:01:25
yeah
00:01:34
uh_huh
00:01:38
uh_huh
00:01:41
okay there are all he wants to be route
00:01:45
i uh i just lost a player oh wait a minute
00:02:07
right
00:02:20
yeah
00:02:22
uh_huh
00:02:25
uh_huh
00:02:52
Yeah, it goes mostly around that right there. It is also a model; the
00:02:56
most general model we use anyway is the source-filter model when we talk about
00:03:00
speech production. We're modelling speech as a series of linearly separable filters:
00:03:05
we're modelling the voiced and unvoiced excitation separately, with the voiced excitation
00:03:10
essentially shaped by a low-pass filter. Then
00:03:12
we have the vocal tract as a band-pass filter, then we have a high-pass filter for the lips, and
00:03:16
that's what's essentially shaping the speech production in the source-filter model. The source is always the air going from the lungs
00:03:22
through the vocal folds, and the filter is everything from the vocal folds through to your lips.
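To make the source-filter idea concrete, here is a minimal numpy/scipy sketch (an illustration, not code from the talk): a voiced impulse-train source or an unvoiced noise source is pushed through a single resonant filter standing in for the vocal tract.

```python
# Minimal source-filter sketch: a voiced source (impulse train at the pitch period)
# or an unvoiced source (white noise) passed through a resonant "vocal tract" filter.
import numpy as np
from scipy.signal import lfilter

fs = 16000                      # sample rate (Hz), assumed
f0 = 120                        # pitch of the voiced excitation (Hz)
n = fs // 2                     # half a second of samples

# Source: impulse train for voiced speech, white noise for unvoiced.
voiced = np.zeros(n)
voiced[::fs // f0] = 1.0
unvoiced = np.random.randn(n)

# Filter: a single resonance (second-order all-pole) standing in for the
# full vocal tract; real systems use higher-order filters per sound.
fc, bw = 700, 130               # resonance centre frequency and bandwidth (Hz)
r = np.exp(-np.pi * bw / fs)
a = [1, -2 * r * np.cos(2 * np.pi * fc / fs), r ** 2]

voiced_speech = lfilter([1.0], a, voiced)
unvoiced_speech = lfilter([1.0], a, unvoiced)
```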
00:03:33
uh_huh
00:03:37
mm
00:04:09
Yeah, so that's the big problem with rectangular windows: they rarely start or stop at zero-crossing points, so
00:04:16
they always introduce some sort of discontinuity, some high-frequency noise, into the windowing operation.
00:04:22
That's generally why we use this idea of tapered windows, a Hamming or a Gaussian shape:
00:04:26
they really dampen the high frequencies that can come about through the windowing operation itself.
00:04:31
And this is the assumption we make with windowing: we're always presuming that the actual signal
00:04:36
changes very quickly, but the feature itself, something like pitch or energy, changes
00:04:42
a little bit more slowly compared to what we would get looking at the raw data itself.
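A small numpy sketch of that point about tapered windows (illustrative; the 210 Hz tone and frame length are arbitrary choices): the Hamming window pulls the frame edges towards zero, so far less energy leaks away from the tone than with a rectangular window.

```python
# Rectangular vs. tapered (Hamming) window on a single frame.
import numpy as np

fs = 16000
t = np.arange(400) / fs                       # one 25 ms frame
frame = np.sin(2 * np.pi * 210 * t)           # a tone that does not end on a zero crossing

rect_spec = np.abs(np.fft.rfft(frame))                         # implicit rectangular window
hamm_spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))

# Fraction of spectral energy far away from the tone: the tapered window
# leaks far less energy into distant (high-frequency) bins.
peak = np.argmax(rect_spec)
leak_rect = rect_spec[peak + 10:].sum() / rect_spec.sum()
leak_hamm = hamm_spec[peak + 10:].sum() / hamm_spec.sum()
print(f"far-from-tone energy fraction: rect={leak_rect:.4f}, hamming={leak_hamm:.4f}")
```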
00:04:56
uh_huh
00:05:25
The short-time energy function really has nothing to do with pitch; short-time energy...
00:05:29
Pretty much all the rest of them work: sub-harmonic summation is probably the best one I
00:05:33
talked about, and the autocorrelation function and the average magnitude difference function you mentioned work in the same way.
00:05:43
huh
00:05:46
Yeah, it's of no use really:
00:05:50
you're not going to get pitch directly from a short-term energy function, no.
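A minimal autocorrelation-based F0 estimator, as a sketch of the kind of function that does work for pitch (the search range and the synthetic test tone are arbitrary choices, not from the talk):

```python
# Autocorrelation-based F0 estimate for one voiced frame.
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Pick the autocorrelation peak inside the plausible pitch-lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs                 # 40 ms frame
frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
print(estimate_f0(frame, fs))                      # ~150 Hz
```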
00:05:58
Yeah, okay, I could have worded the question slightly better; the general assumption there was fine. I knew this would happen, by
00:06:04
the way, that you'd argue with some of my answers, so that's fine; that's sort of also why I did it.
00:06:16
uh_huh
00:06:45
uh_huh
00:06:48
I tried to make that one sound like there were plenty of correct answers. Essentially, with that one,
00:06:52
jitter is always measuring pitch-period variation, you know, and
00:06:55
shimmer, essentially, amplitude variation, I mean, in the computations themselves.
00:06:59
Jitter and shimmer are just reflections, as I said, of the vocal folds not really opening
00:07:04
and shutting fully: there's some sort of, well, some sort of noise
00:07:07
to do with the incoordination or the folds not shutting properly, and that is
00:07:11
what we're trying to pick up with the jitter and shimmer measures themselves.
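As a rough illustration of what local jitter measures (a simplified sketch, not the exact Praat/openSMILE formula set): the mean absolute difference between neighbouring pitch periods, normalised by the mean period; shimmer is the analogous measure on cycle amplitudes.

```python
# Local jitter from consecutive pitch periods (simplified, illustrative).
import numpy as np

def local_jitter(periods):
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Periods in seconds for a roughly 100 Hz voice with small cycle-to-cycle wobble.
periods = [0.0100, 0.0103, 0.0099, 0.0102, 0.0098, 0.0101]
print(f"jitter (local) = {100 * local_jitter(periods):.2f}%")   # a few percent here
```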
00:07:26
This one people constantly get wrong in my classes and in my exams, so no pressure.
00:07:52
Yep, cool, we got around that, um. I don't know what the dominant peaks in the
00:07:56
glottal spectrum would be; I don't think there is a dominant peak in the glottal spectrum.
00:08:00
Yeah, the dominant peaks of the vocal tract spectrum are always the formant frequencies, so we've got those sort of, you know,
00:08:05
key points, and they really reflect a lot of the linguistics, the sound, what is
00:08:09
being said: the particular sound is reflected in the change of the formant frequencies.
00:08:30
Choice one is the most correct answer.
00:08:42
in terms of what i talked about this morning anyway
00:08:45
The techniques I talked about.
00:08:50
Yep, linear predictive coding is always the way we try to find the vocal tract parameters.
00:08:55
I mean, there is vocal tract information embedded in the mel-frequency cepstral coefficients themselves, but we couldn't
00:09:00
exactly identify a formant frequency or a sort of filter coefficient from them.
00:09:05
In the short-time magnitude spectrum, again, the information is embedded in it; it's just probably not the best representation
00:09:12
we want, because we have a lot of the harmonic information also present when looking at the, um, short-time magnitude spectrum here.
00:09:18
So: linear predictive coding, and setting the right sort of dimensionality of the linear predictive coding. As we saw in the examples,
00:09:24
about twelve or fourteen really, and that gives us a nice shape where we start to see the
00:09:28
formant frequencies but don't really get too much interaction from the actual harmonics themselves.
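A sketch of that point, assuming librosa is available (the synthetic single-resonance frame is an assumption for illustration): the short-time magnitude spectrum still carries the pitch harmonics, while the order-12 LPC envelope is smooth and peaks near the resonance.

```python
# Short-time magnitude spectrum vs. smooth order-12 LPC envelope.
import numpy as np
from scipy.signal import freqz, lfilter
import librosa

fs = 8000
n = int(0.03 * fs)
source = np.zeros(n); source[::fs // 100] = 1.0            # 100 Hz glottal-style impulse train
r = np.exp(-np.pi * 120 / fs)
vocal_tract = [1, -2 * r * np.cos(2 * np.pi * 700 / fs), r ** 2]   # one resonance near 700 Hz
frame = lfilter([1.0], vocal_tract, source) * np.hamming(n)

fft_mag = np.abs(np.fft.rfft(frame, 512))     # magnitude spectrum: envelope plus harmonic fine structure
a = librosa.lpc(frame, order=12)              # all-pole fit (librosa uses Burg's method)
w, h = freqz([1.0], a, worN=257, fs=fs)
print(w[np.argmax(np.abs(h))])                # envelope peak lands close to 700 Hz
```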
00:09:39
uh_huh
00:09:42
uh_huh
00:10:10
Yep, it's always a sum of past samples. Here we have past and future samples, and
00:10:16
that makes no real sense: we're always looking sort of backwards. It's not really that, um,
00:10:20
no, it's a weighted sum; I guess you could call it a linear function, but that's not really how we express it. The
00:10:25
core concept is that we approximate the current sample as a weighted linear sum of past samples.
00:10:30
When we put this all together, we find that when we sort of try to
00:10:34
solve the system of equations, the autocorrelation function gives us a Toeplitz matrix,
00:10:40
and then there are well-defined methods we can use to solve for the different filter parameters themselves.
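Spelling out the autocorrelation method under those assumptions (a numpy/scipy sketch, using scipy's Levinson-based Toeplitz solver rather than a hand-written Levinson-Durbin recursion):

```python
# Autocorrelation-method LPC: predict s[n] from p past samples, build the
# normal equations (a Toeplitz system), and solve for the coefficients.
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def lpc_autocorrelation(frame, p=12):
    n = len(frame)
    ac = np.correlate(frame, frame, mode="full")[n - 1:n + p]   # r[0] ... r[p]
    # Normal equations R a = r, where R is the p x p Toeplitz matrix built from r[0..p-1].
    R = toeplitz(ac[:p])                       # shown explicitly; solve_toeplitz avoids building it
    a = solve_toeplitz(ac[:p], ac[1:p + 1])    # Levinson-style solve exploiting the structure
    return np.concatenate([[1.0], -a]), R

frame = np.random.randn(400)                   # stand-in for a windowed speech frame
coeffs, R = lpc_autocorrelation(frame, p=12)
print(coeffs.shape, R.shape)                   # (13,), (12, 12)
```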
00:10:53
yeah
00:10:55
So this one, I already briefly mentioned that.
00:11:24
So yeah, it's the idea of a trade-off when we're doing
00:11:28
short-time, um, spectral analysis. We've got this sort of idea
00:11:31
of windowing the signal, and when doing that, the window-size parameter controls both
00:11:35
the time resolution and the frequency resolution.
00:11:40
So essentially, if we choose a very, very long window, then by the properties of the Fourier transform we get very, very good
00:11:46
frequency resolution, but obviously in choosing a long window we sort of
00:11:50
start to lose temporal resolution. If we choose very short windows
00:11:53
and hop along, I mean, we're getting very, very good temporal resolution, but then we're starting to lose
00:11:58
frequency information: we're putting less information into the actual Fourier transform algorithm, so we're getting less resolution there.
00:12:04
So yeah, we've always got this trade-off; it's impossible to get both good time and good frequency resolution.
00:12:10
Most of the time we don't really care about this: twenty-five millisecond frames overlapped by ten does the job.
00:12:16
Maybe slightly longer frames, as I said, maybe a forty millisecond frame, if we're looking at something like pitch extraction,
00:12:22
just to sort of get a little bit more frequency information, but that's about all we might do.
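The trade-off in a few lines of scipy (window lengths chosen arbitrarily for illustration): the frequency-bin spacing and the time step move in opposite directions as the window grows.

```python
# Time-frequency trade-off with two STFT window lengths.
import numpy as np
from scipy.signal import stft

fs = 16000
x = np.random.randn(fs)  # one second of a stand-in signal

for win_ms in (25, 100):
    nperseg = int(fs * win_ms / 1000)
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    # Longer window -> finer frequency grid (smaller df) but coarser time grid (larger dt).
    print(f"{win_ms} ms window: df = {f[1] - f[0]:.1f} Hz, dt = {(t[1] - t[0]) * 1000:.1f} ms")
```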
00:13:01
Yep, that's cool, we got around that, right. So it's exactly that: the log spectrum after the filtering with the mel filter bank,
00:13:08
we then decorrelate the data, sort of compress it, using the DCT,
00:13:13
the discrete cosine transform. This gives us normally something around twelve
00:13:17
main coefficients, to which we can append something like, um, log energy or the zeroth coefficient; we take the deltas and
00:13:23
the delta-delta coefficients, and we get this very nice, cool, free-for-all, all-purpose feature, I guess.
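The same pipeline via librosa, assuming it is installed (frame and hop sizes are the usual 25 ms / 10 ms values mentioned earlier; the random signal is just a stand-in for real speech):

```python
# MFCC pipeline: mel filter bank on the power spectrum, log, DCT to ~13
# coefficients, then delta and delta-delta appended.
import numpy as np
import librosa

fs = 16000
y = np.random.randn(fs).astype(np.float32)        # stand-in for one second of speech

mfcc = librosa.feature.mfcc(y=y, sr=fs, n_mfcc=13,
                            n_fft=400, hop_length=160)   # 25 ms frames, 10 ms hop
d1 = librosa.feature.delta(mfcc)                  # first-order deltas
d2 = librosa.feature.delta(mfcc, order=2)         # delta-deltas
features = np.vstack([mfcc, d1, d2])              # 39-dimensional frame features
print(features.shape)                             # (39, n_frames)
```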
00:13:36
he
00:13:41
oh
00:13:43
if if you
00:13:47
that's the semantics and say yes
00:13:52
yeah
00:13:58
Yeah, I knew that there'd be some others that would argue with that too. Yeah, it's
00:14:06
a rough idea of the general steps, though, the whole thing, if you remember. Ah,
00:14:11
yeah, we could argue it; I'd have to take it either way, it's the same, yes.
00:14:17
But that's the standard default.
00:14:24
again in terms of the notes i talked about
00:14:56
Yeah, I don't know if something called the spectral activation point exists, and I'm sorry if
00:15:00
it does; I think I made that one up, I'm not going to make that call, uh.
00:15:04
If it does exist, I apologise, but that's just the chance we take.
00:15:07
Generally, spectral gradients, spectral energies, the spectral roll-off point are the main,
00:15:11
within openSMILE anyway, spectral information that we sort of collect,
00:15:15
and we use them to sort of collapse the spectral information down.
00:15:25
uh_huh
00:15:54
Yep, so we're just summarising the set of LLDs. So we extract a lot of, uh, LLDs, one per frame;
00:16:00
each is really just one sort of, one feature representation per frame. To get one for the utterance, or a chunk of the utterance,
00:16:06
we do this sort of summarisation using the functionals, so by the time we get to a sort of specified length of time
00:16:12
we're getting one representation. You'll get to extract these, the
00:16:15
eGeMAPS and the ComParE feature sets, this afternoon in the tutorial.
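A minimal stand-in for that summarisation step (the choice of functionals here is an assumption for illustration; eGeMAPS and ComParE use a much larger openSMILE functional set):

```python
# Collapsing frame-level LLDs into one fixed-length vector with simple functionals.
import numpy as np

def apply_functionals(llds):
    """llds: array of shape (n_frames, n_llds) -> one summary vector per utterance."""
    funcs = [np.mean, np.std, np.min, np.max,
             lambda x, axis: np.percentile(x, 25, axis=axis),
             lambda x, axis: np.percentile(x, 75, axis=axis)]
    return np.concatenate([f(llds, axis=0) for f in funcs])

llds = np.random.randn(300, 40)        # e.g. 3 s of 40-dimensional frame features
utterance_vector = apply_functionals(llds)
print(utterance_vector.shape)          # (240,) = 40 LLDs x 6 functionals
```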
00:16:54
Yep, so, not going back over the others: it's always about the codebook generation.
00:16:59
We always take a sub-sample of the training data, we build the codebook with
00:17:03
that, and we're essentially doing vector quantisation: we're looking at, ah, a frame-level
00:17:07
feature, finding the Euclidean distance to the nearest word, or the nearest group of
00:17:11
words, and essentially forming a histogram of sort of occurrences of these words themselves.
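A sketch of that codebook / vector-quantisation step with scikit-learn (codebook size, feature dimensionality and the random data are arbitrary choices):

```python
# Codebook generation plus a "bag of audio words" histogram per utterance.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_frames = rng.normal(size=(5000, 40))        # sub-sample of frame-level training features
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(train_frames)

def bag_of_words(frames, codebook):
    """Assign each frame to its nearest codeword (Euclidean) and count occurrences."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()                      # normalised histogram per utterance

utterance = rng.normal(size=(300, 40))
print(bag_of_words(utterance, codebook).shape)    # (64,)
```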
00:17:22
okay
00:17:39
yeah
00:17:47
In terms of what I talked about, it's convolutional neural networks: we're essentially
00:17:51
learning these different filtering operations in the convolutional neural networks themselves.
00:17:55
I imagine you could feed raw audio into a recurrent neural network, but I don't imagine it's going to learn anything particularly useful.
00:18:01
When you're doing this end-to-end learning, what we generally do is use the convolutional neural
00:18:05
network to learn the feature representation, and then we sort of put a recurrent
00:18:08
neural network at the back of this to sort of capture any sort of temporal
00:18:12
model, any sort of temporal expectations, any temporal dependencies within the actual data itself.
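A toy PyTorch sketch of that arrangement, assuming torch is available (layer sizes are arbitrary): a convolutional front end learns the filtering and downsamples with max pooling, and a GRU at the back captures the temporal dependencies.

```python
# CNN front-end (learned filtering + max pooling) feeding a recurrent back-end.
import torch
import torch.nn as nn

class ConvRecModel(nn.Module):
    def __init__(self, n_mels=40, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),  # learned "filtering"
            nn.ReLU(),
            nn.MaxPool1d(2),                                   # downsample in time
        )
        self.rnn = nn.GRU(64, 32, batch_first=True)            # temporal dependencies
        self.out = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, n_mels, n_frames)
        h = self.conv(x)                     # (batch, 64, n_frames // 2)
        h, _ = self.rnn(h.transpose(1, 2))   # (batch, n_frames // 2, 32)
        return self.out(h[:, -1])            # classify from the last time step

logits = ConvRecModel()(torch.randn(8, 40, 100))
print(logits.shape)                          # torch.Size([8, 2])
```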
00:18:53
Yep, convolutional and max pooling layers, cool.
00:18:59
What does max pooling do?
00:19:32
Yeah, so it's the downsampling: it's always downsampling the output of the convolutional layers themselves.
00:19:37
Remapping the output is something we'd probably use a softmax for; I wouldn't
00:19:41
use a max pooling operation to actually map something into a probability distribution.
00:19:48
Okay, so switching from speech now into machine learning.
00:19:55
yeah
00:20:06
You could, but there's one answer that's more correct than anything else.
00:20:17
The others: looking at performing robust predictions, when we're doing machine learning, essentially we don't want to just
00:20:22
learn what's in the training set, we want to learn something more, we want to learn
00:20:26
a more generalised model, so that when we build something for, say, speech pathology detection
00:20:30
in the lab, we can use it in a doctor's surgery somewhere, essentially anywhere.
00:20:35
So this idea of sort of robust predictions is correct, yeah, of course:
00:20:40
we would like to build something with machine learning that does that, so the others are almost correct answers themselves.
00:21:12
but
00:21:22
Yep, so it's always important to remember we're always training to reduce training error but testing to reduce generalisation error,
00:21:29
to make sure that we've got the most robust, most generalisable model that we possibly can.
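A scikit-learn sketch of that distinction (toy data; the unconstrained decision tree is chosen because it memorises the training set easily):

```python
# Training error vs. generalisation (held-out) error on a toy problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # unconstrained tree memorises easily
print("training accuracy:", clf.score(X_tr, y_tr))             # ~1.0
print("held-out accuracy:", clf.score(X_te, y_te))             # noticeably lower
```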
00:22:04
Yep, so supervised learning is always learning a model from
00:22:07
labelled data, so we can make some sort of prediction.
00:22:14
uh_huh
00:22:47
Yeah, again, generalisation error is always associated with the test set itself.
00:22:52
A subset of the training data is still the training data; it's still something the model has actually
00:22:56
seen. We're really interested in generalisation error, which is, uh, on something the model has not seen itself.
00:23:21
that should be how much
00:23:41
So yeah, we're always looking at using the bias, really, to
00:23:46
reflect the errors caused by using an unsuitable model, but really it's
00:23:50
technically defined in terms of the predictions and how they differ from the actual values.
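For reference, the textbook definition being alluded to, in my notation rather than the slides':

```latex
\operatorname{Bias}\bigl[\hat{f}(x)\bigr] \;=\; \mathbb{E}\bigl[\hat{f}(x)\bigr] \;-\; f(x)
```

where the expectation is taken over training sets used to fit the predictor, and f is the true target function.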
00:24:31
Yeah, okay, in terms of what I sort of talked about; again this one was open in the questions,
00:24:35
but essentially the most correct answer from the lecture today was really increasing
00:24:39
model complexity. Bias errors are caused, quite generally, when we have
00:24:43
a model that's essentially too simple for the actual task we want: we've made some assumptions that are not very good ones,
00:24:50
we're just making very, very simple generalisation rules. But if we start to increase the complexity, add more parameters, put
00:24:56
more information in, we can really sort of pull the errors down and decrease the actual, um,
00:25:00
bias errors by increasing the model complexity itself.
00:25:05
Um, decreasing the training data isn't, you know, going to do that.
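A scikit-learn sketch of that point (the sine target and the polynomial degrees are arbitrary choices): raising the model complexity pulls the training error of an initially too-simple model down.

```python
# Increasing model complexity (polynomial degree) drives training error down on a curved target.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x).ravel() + 0.1 * rng.normal(size=200)

for degree in (1, 3, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x, y)
    print(degree, mean_squared_error(y, model.predict(x)))   # training error falls as degree rises
```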
00:25:13
uh_huh
00:25:20
uh_huh
00:25:31
uh_huh
00:25:43
So the variance is high there; it's showing, um, when we sort of make little changes
00:25:49
which really change the overall sort of decision that's learned by the model itself. So
00:25:54
it's a hallmark of sort of overfitting: it's a really convoluted sort of decision function that
00:25:59
we've got, so it's always sort of over-fitted to the data, and that's where we sort of have high variance.
00:26:05
um
00:26:35
Yep, cool, I think there's a person that got a hundred percent, so well done, class.
00:26:41
uh_huh uh_huh uh_huh
00:27:15
Yep, with discriminative models we're learning the parameters of the model essentially directly from the training data itself.
00:27:22
Estimating parameters using Bayes' rule: I sort of made that up, as a lot of the time it sounds like it could be correct.
00:27:27
Quadratic optimisation is one form that we actually use when doing discriminative models, but for the purposes here,
00:27:34
estimating parameters directly from the training data is the most correct answer that we actually have up there.
00:28:16
Yep, generative models model the joint probability between the labels and the features;
00:28:20
that's why it's sort of a probability distribution over the data that we're learning when doing generative models.
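In scikit-learn terms, one way to see the contrast (toy data; neither model is from the talk): Gaussian naive Bayes fits class-conditional densities, i.e. a generative model of the data, while logistic regression fits p(y | x) directly.

```python
# Generative (Gaussian naive Bayes) vs. discriminative (logistic regression) classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```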
00:28:46
uh_huh
00:28:59
Yep, so we're generally combining multiple classifiers, normally sort of weak classifiers; we never really care
00:29:04
as long as they're sort of performing above chance level. We try to combine
00:29:08
them in such a way that we're actually trying to reduce the errors that could be associated with weak classifiers that might be sort of overfitting.
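A bagging sketch along those lines with scikit-learn (dataset and sizes arbitrary): many overfit-prone trees, each well above chance on its own, are averaged so that their individual errors tend to cancel.

```python
# Combining many weak/overfit-prone learners into one ensemble (bagging).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=2)

single = DecisionTreeClassifier(random_state=2)              # unconstrained, overfit-prone
bagged = BaggingClassifier(single, n_estimators=50, random_state=2)

# The ensemble usually generalises at least as well as the single overfit-prone tree.
print("single tree:", cross_val_score(single, X, y, cv=5).mean())
print("bagged     :", cross_val_score(bagged, X, y, cv=5).mean())
```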
00:29:40
Yep, they're defined as the samples that sit closest to the decision boundaries
00:29:44
themselves. If we moved this value back, uh, we would alter the decision boundary
00:29:48
itself. And outliers are normally ones that sit furthest from the decision boundary.
00:30:30
There's always underfitting with low complexity values, not just for SVMs
00:30:33
but for everything: underfitting at low complexity, overfitting at high complexity.
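A quick scikit-learn illustration of both points (data and C values arbitrary): the fitted SVC exposes exactly the samples nearest the boundary as `support_vectors_`, and C moves the model between low and high complexity.

```python
# Support vectors and the complexity parameter C in a soft-margin SVM.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=3)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    # Low C -> smoother, simpler boundary (tends to underfit, keeps many support vectors);
    # high C -> more complex boundary (risk of overfitting).
    print(f"C={C}: {clf.support_vectors_.shape[0]} support vectors")
```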
00:31:03
Yep, the EM algorithm. Cool.
00:31:06
and last question
00:31:42
I made this one up; I quite literally just pulled together terms that I thought really
00:31:45
sounded fancy. Again, the simple one here was mapping a sequence of observations
00:31:50
to a sequence of labels; that's essentially what we use hidden Markov models for.
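A sketch of that observation-sequence-to-label-sequence mapping, assuming the hmmlearn package is available (the two-regime toy data is an assumption for illustration):

```python
# Decoding a hidden-state label sequence from an observation sequence with an HMM.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(4)
# Toy observation sequence: 1-D features drawn from two regimes glued together.
obs = np.concatenate([rng.normal(0.0, 1.0, 100),
                      rng.normal(5.0, 1.0, 100)]).reshape(-1, 1)

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=50, random_state=4)
model.fit(obs)                         # EM (Baum-Welch) training
states = model.predict(obs)            # Viterbi decoding: one label per observation
print(states[:5], states[-5:])         # the two halves land in different states
```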
00:31:57
so is julie and
00:32:05
enjoy your chocolate
00:32:08
yeah
00:32:09
Cool, that ends the lectures for this morning. Uh, I hope you guys learned something.
