Transcriptions

Note: this content has been automatically generated.
00:00:00
you get bragging rights over everyone else in the room, so
00:00:05
uh_huh
00:00:08
uh_huh
00:00:16
uh_huh
00:00:20
uh_huh
00:00:31
yeah
00:00:33
uh_huh
00:00:39
so it's not just about picking the right answers, it's also about picking the right answers as quickly as you
00:00:43
can; there's this other time aspect it takes into account when you're actually doing the, um,
00:00:48
putting your answers in. It's all multiple choice, and we'll just enter once everyone's sort of been enrolled
00:01:06
uh_huh
00:01:13
uh_huh
00:01:17
i hope i go well
00:01:21
huh
00:01:23
yeah
00:01:25
yeah
00:01:34
uh_huh
00:01:38
uh_huh
00:01:41
okay there are all he wants to be route
00:01:45
i uh i just lost a player oh wait a minute
00:02:07
right
00:02:20
yeah
00:02:22
uh_huh
00:02:25
uh_huh
00:02:52
yeah, it goes mostly around that, that right there; it's always about the model we're always using:
00:02:56
the most general model we use anyway is the source-filter model; when we talk about
00:03:00
speech production, our modelling says that speech is a series of linearly separable filters:
00:03:05
we're modelling the voiced and unvoiced excitation separately; the voiced excitation
00:03:10
acts as, sort of, a low-pass filter; then
00:03:12
we have the vocal tract as sort of a band-pass filter, then we have a high-pass filter for the lips, and
00:03:16
that's what's essentially shaping the speech production in the source-filter model. The source is always, yeah, air going from the lungs
00:03:22
through the vocal folds, and the filter is everything from the vocal folds through to your lips
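As a toy illustration of that source-filter idea (a sketch added here, not from the talk; the pitch, resonance frequency and bandwidth values below are made up), the source can be an impulse train at the pitch period and one "filter" stage a single two-pole resonance:

```python
import math

def impulse_train(n_samples, pitch_hz, sr=16000):
    """Voiced source: a pulse every pitch period, a crude stand-in for
    airflow from the lungs chopped up by the vocal folds."""
    period = int(sr / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, freq_hz, bandwidth_hz, sr=16000):
    """One two-pole resonance, acting as a single band-pass 'formant'
    stage of the vocal-tract filter."""
    r = math.exp(-math.pi * bandwidth_hz / sr)
    theta = 2.0 * math.pi * freq_hz / sr
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# Source (100 Hz pitch) passed through one vocal-tract resonance:
source = impulse_train(800, pitch_hz=100)
speech = resonator(source, freq_hz=500, bandwidth_hz=80)
```

A full synthesiser would cascade several resonators (one per formant) and add a noise source for unvoiced sounds; this only shows the basic source-then-filter structure.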
00:03:33
uh_huh
00:03:37
mm
00:04:09
yeah, so that's the big problem with rectangular windows: if they don't really start or stop at the zero-crossing points,
00:04:16
they always introduce some sort of discontinuity, some high-frequency noise, into the windowing operation.
00:04:22
That's generally why we use this sort of idea of tapered windows, something like a Hamming or a Gaussian shape;
00:04:26
that's really to dampen the high frequencies that can come out through the windowing operation itself.
00:04:31
And this is the assumption that we make with windowing: we're always presuming that the actual signal
00:04:36
changes very quickly in terms of its shape, but that the feature itself, something like pitch or energy, changes
00:04:42
a little bit more slowly compared to when we're looking at the raw data itself
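A minimal sketch of that tapering idea (added for illustration; the 440 Hz tone and the 400-sample frame are arbitrary choices): a Hamming window pulls the frame edges toward zero, so cutting the frame out of the signal introduces much less of a discontinuity than a rectangular window does:

```python
import math

def hamming(n):
    """Tapered window: smoothly approaches zero at both edges."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]

n = 400  # one 25 ms frame at 16 kHz
tone = [math.sin(2.0 * math.pi * 440.0 * i / 16000.0) for i in range(n)]

rect = [s * 1.0 for s in tone]                     # rectangular: abrupt edges
taper = [s * w for s, w in zip(tone, hamming(n))]  # tapered: damped edges
# The tapered frame is near zero at its boundary, so extracting it adds
# far less high-frequency discontinuity than the rectangular cut.
```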
00:04:56
uh_huh
00:05:25
the short-time energy function really has nothing to do with pitch; short-time energy,
00:05:29
pretty much alone of the rest of them. Harmonic summation is probably the best one I
00:05:33
talked about; the autocorrelation function and the average magnitude difference function you mentioned work on the same principle, uh
00:05:43
huh
00:05:46
yeah, it's, it's really no use:
00:05:50
you're not gonna get pitch directly from a short-time energy function, no
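The autocorrelation approach mentioned here can be sketched roughly like this (an illustrative toy added to the transcript, not the lecture's code; the 60-400 Hz search range and the 200 Hz test tone are assumptions). A periodic frame correlates strongly with itself one pitch period later, so the best lag gives the pitch:

```python
import math

def autocorr_pitch(frame, sr, fmin=60, fmax=400):
    """Pick the lag (within a plausible pitch range) that maximises the
    autocorrelation; a periodic frame matches itself one period later."""
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag

sr = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * i / sr) for i in range(400)]
pitch = autocorr_pitch(frame, sr)  # should land near 200 Hz
```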
00:05:58
yeah okay, I could have worded the question slightly better; the general assumption there was fine. I knew this would happen, by
00:06:04
the way, that you'd sort of challenge some of my answers, so that's fine; I sort of thought you would, and that's also why I did it
00:06:16
uh_huh
00:06:45
uh_huh
00:06:48
i tried to make that one sound like there were plenty of correct answers; essentially that one also:
00:06:52
with jitter we're always measuring pitch period variations, you know, and
00:06:55
shimmer is essentially measuring the energy perturbations itself;
00:06:59
with both measures, they're just reflections, as I said, uh, of the vocal folds not really opening
00:07:04
and shutting fully. Whether it's some sort of, well, some sort of noise
00:07:07
to do with some sort of incoordination, or not shutting properly, that's
00:07:11
what we're trying to pick up with the jitter and shimmer measures themselves
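Local jitter, as described, can be computed roughly as the mean cycle-to-cycle period difference relative to the mean period (a sketch added here; the period values are made up):

```python
def local_jitter(periods_ms):
    """Mean absolute difference between consecutive pitch periods,
    relative to the mean period (cycle-to-cycle pitch variation)."""
    diffs = [abs(a - b) for a, b in zip(periods_ms, periods_ms[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_ms) / len(periods_ms))

steady = [5.0, 5.0, 5.0, 5.0]  # perfectly regular vocal-fold cycles
shaky = [5.0, 5.4, 4.7, 5.3]   # irregular opening and closing
```

Shimmer would be computed the same way over per-cycle peak amplitudes instead of periods.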
00:07:26
this is one people constantly get wrong in my classes and my exams, so no pressure
00:07:52
yep, cool, we got that one right, so, um. I don't know what the dominant peaks in the
00:07:56
glottal spectrum would be; I don't think there is a dominant peak in the glottal spectrum.
00:08:00
Yeah, the dominant peaks of the vocal tract spectrum are always the formant frequencies, so we've got those sort of, you know,
00:08:05
key points, and they really reflect a lot of the linguistics: the sound, what is
00:08:09
being said, the particular sound, is reflected in the change of the formant frequencies
00:08:30
choice one is the most correct answer,
00:08:42
in terms of what i talked about this morning anyway,
00:08:45
the techniques i talked about
00:08:50
yep, linear predictive coding; it's always the way we try to find the vocal tract parameters.
00:08:55
I mean, there is vocal tract information embedded in mel-frequency cepstral coefficients themselves, but we wouldn't
00:09:00
exactly identify a formant frequency or that sort of filter coefficient from them.
00:09:05
The short-time magnitude spectrum, again, the information is embedded in it; it's just probably not the best representation
00:09:12
to use, 'cause we have a lot of the harmonic information also present when looking at the, um, short-time magnitude spectrum there.
00:09:18
So, linear predictive coding, and setting the right sort of dimensionality for the linear predictive coding; as we saw in some of the examples,
00:09:24
about twelve or fourteen really, as that gives us a nice shape where we start to see the
00:09:28
formant frequencies but don't really get too much interaction from the actual harmonics themselves
00:09:39
uh_huh
00:09:42
uh_huh
00:10:10
yep, it's always a sum of past samples here; if we had past and future samples
00:10:16
it makes no real sense, we're always looking sort of backwards; it's not really that, um,
00:10:20
well, it's a linear function, so it's not really all that expressive really;
00:10:25
the core concept is we approximate the current sample as a weighted linear sum of past samples.
00:10:30
When we put this all together, we sort of find that when we try to
00:10:34
solve the system of equations, the autocorrelation function gives us a Toeplitz matrix,
00:10:40
and then there are these standard methods we can use to solve for the different filter parameters themselves
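A rough sketch of that pipeline (added for illustration; the Levinson-Durbin recursion is one standard way to solve the Toeplitz system, and the AR(1)-style test signal below is made up):

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag]; in the LPC normal
    equations these form a Toeplitz matrix."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz system for LPC coefficients a (a[0] = 1),
    predicting x[n] ~ -sum(a[k] * x[n - k] for k = 1..order)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        a = [a[j] + k * a[i - j] for j in range(i + 1)] + a[i + 1:]
        err *= 1.0 - k * k
    return a, err

# Made-up test signal: impulse response of x[n] = 0.9 * x[n-1],
# so an order-12 fit should recover a[1] close to -0.9.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 12), 12)
```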
00:10:53
yeah
00:10:55
so the b. main already briefly mention that
00:11:24
so yeah, it's this sort of idea of trading off when we're doing those
00:11:28
short-time, um, spectral analyses; we've got this sort of idea
00:11:31
of windowing size, and when windowing that signal the window parameter controls both
00:11:35
the time resolution and the frequency resolution.
00:11:40
So essentially, if we choose a very very long window and do the, the Fourier transform, we get very very good
00:11:46
frequency resolution, but obviously in choosing a long window we then sort of
00:11:50
start to lose the temporal resolution; whereas with very short windows
00:11:53
and a high hop rate, I mean, we're getting very very good temporal resolution, but then we're starting to lose
00:11:58
the frequency information: we're putting less information into the actual Fourier transform algorithm, so we're getting less resolution there.
00:12:04
So, yeah, we've always got this trade-off; it's impossible to get both good time and good frequency resolution.
00:12:10
Most of the time we don't really care about this: twenty-five millisecond frames overlapped by ten does the job.
00:12:16
Maybe slightly longer frames, as I said, maybe a forty millisecond frame, if we're looking at something like pitch extraction,
00:12:22
just to sort of get a little bit more frequency information; then we might do that
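The 25 ms window / 10 ms hop framing mentioned here can be sketched as follows (an added toy example; the one-second signal at 16 kHz is an assumption):

```python
def frame_indices(n_samples, sr, frame_ms=25, hop_ms=10):
    """(start, end) sample indices of overlapping analysis frames:
    25 ms windows hopped by 10 ms, the usual speech default."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop)]

frames = frame_indices(n_samples=16000, sr=16000)  # one second of audio
# Each frame is 400 samples long and starts 160 samples after the last.
```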
00:13:01
yep, that's cool, we got that right. So it's exactly the log spectrum after the, the filtering with the mel filter bank;
00:13:08
we then decorrelate the data, sort of compress the data, using the DC
00:13:13
T, the discrete cosine transform; this gives us normally something around, sort of, twelve
00:13:17
main coefficients, to which we often append something like, um, log energy or the zeroth coefficient; we take the deltas, uh, and
00:13:23
the delta-delta coefficients, and we get this very nice, sort of, all-purpose feature set, I guess
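A sketch of the DCT step described here (added illustration; the 26 filterbank channels and the stand-in log energies are assumptions, and only the decorrelation step is shown, not the full MFCC chain):

```python
import math

def dct_ii(values, n_out):
    """DCT-II: decorrelates the log mel-filterbank energies; keeping
    roughly the first twelve outputs gives the usual MFCCs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

# Stand-in log energies for a 26-channel mel filter bank (made up):
log_mel = [math.log(1.0 + ch) for ch in range(26)]
mfcc = dct_ii(log_mel, 13)  # coefficient 0 tracks overall log energy
```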
00:13:36
he
00:13:41
oh
00:13:43
if if you
00:13:47
that's the semantics and say yes
00:13:52
yeah
00:13:58
yeah, I knew there'd be some others with that argument too; yeah, it's
00:14:06
a rough idea of the general steps, though; the whole thing was there to remember, ah,
00:14:11
yeah, we could argue; I'd have to take it either way, it's the same, yes,
00:14:17
but that's the standard default
00:14:24
again in terms of the notes i talked about
00:14:56
yeah, I don't know whether something called a "spectral activation point" does exist, and I'm sorry if
00:15:00
it does; I did just make that up, I'm not gonna make those claims, uh,
00:15:04
if it does exist I apologise, but that was just the chance answer.
00:15:07
Generally, spectral gradients, spectral energies and spectral roll-off points are the main,
00:15:11
within openSMILE anyway, spectral information that we sort of collect,
00:15:15
and sort of, using those, we're sort of collapsing the information down
00:15:25
uh_huh
00:15:54
yep, so we're just summarising the set of LLDs; so we extract a lot of, uh, LLDs from each frame of
00:16:00
speech, and this is really just one, one sort of, one feature representation of the utterance, or a chunk of the utterance.
00:16:06
So we do this sort of summarisation using the functionals; by the time we're at a sort of specified length of time,
00:16:12
we're getting one representation. We're gonna extract the LLDs in the
00:16:15
sort of eGeMAPS and the ComParE formats this afternoon in the tutorial
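The functionals idea can be sketched like this (a toy added here; the two LLDs and the mean/standard-deviation functionals are just example choices, far fewer than feature sets like eGeMAPS or ComParE actually use):

```python
from statistics import mean, stdev

def apply_functionals(lld_frames):
    """Collapse variable-length frame-level LLDs into one fixed-length
    vector by applying functionals (here just mean and std deviation)."""
    feats = []
    for series in zip(*lld_frames):  # one time series per LLD
        feats.extend([mean(series), stdev(series)])
    return feats

# Three frames, two made-up LLDs each (say pitch in Hz and energy):
frames = [[200.0, 0.5], [210.0, 0.6], [190.0, 0.4]]
utterance_vector = apply_functionals(frames)  # 2 LLDs x 2 functionals
```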
00:16:54
yep, so, going back over this: the first step is always the codebook generation;
00:16:59
we always take a sort of sub-sample of the training data, we build the codebook with
00:17:03
that, and then we're essentially doing vector quantisation: so we're looking at a frame-level
00:17:07
feature and finding, by Euclidean distance, the nearest word, or the nearest group of
00:17:11
words, and then essentially forming a histogram of the sort of occurrences of these words themselves
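A minimal vector-quantisation sketch of that histogram step (added illustration; the two-word codebook and the frame values are made up, and real systems would learn the codebook from the training sub-sample with something like k-means):

```python
def bag_of_audio_words(frames, codebook):
    """Assign each frame-level feature vector to its nearest codeword
    (squared Euclidean distance) and build a histogram of counts."""
    hist = [0] * len(codebook)
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, word))
                 for word in codebook]
        hist[dists.index(min(dists))] += 1
    return hist

codebook = [[0.0, 0.0], [1.0, 1.0]]  # toy codebook of two "words"
frames = [[0.1, 0.2], [0.9, 1.1], [1.2, 0.8], [0.0, 0.1]]
hist = bag_of_audio_words(frames, codebook)
```

The resulting fixed-length histogram is the utterance-level feature, regardless of how many frames went in.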
00:17:22
okay
00:17:39
yeah
00:17:47
it is what i talked about: with convolutional neural networks we're essentially
00:17:51
learning these different filtering operations in the convolutional neural networks themselves.
00:17:55
I imagine you could feed raw audio into a recurrent neural network, but I don't imagine it's gonna learn anything particularly useful
00:18:01
when you're doing that. In the end, what we generally do is use the convolutional neural
00:18:05
network to learn the feature representation, and then we sort of put a recurrent
00:18:08
neural network at the back of this to sort of capture any sort of temporal
00:18:12
model, any sort of temporal expectations, any temporal dependencies within the actual data itself
00:18:53
yep, convolutional and max pooling layers, cool.
00:18:59
what does max pooling do?
00:19:32
yeah, so it's the down-sampling; it's always down-sampling the output of the convolutional layers itself.
00:19:37
For remapping the output to something, we'd probably use a softmax; I wouldn't
00:19:41
use a max pooling operation to map something into a probability distribution
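The contrast described here, pooling down-samples while softmax maps scores to a probability distribution, can be sketched with toy values (added for illustration):

```python
import math

def max_pool_1d(values, pool=2):
    """Max pooling: keep only the largest value in each window,
    down-sampling the convolutional layer's output."""
    return [max(values[i:i + pool]) for i in range(0, len(values), pool)]

def softmax(values):
    """Softmax, by contrast, is what maps scores to a probability
    distribution (max pooling's outputs need not sum to one)."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

feature_map = [0.1, 0.7, 0.3, 0.2, 0.9, 0.4]
pooled = max_pool_1d(feature_map)  # -> [0.7, 0.3, 0.9]
probs = softmax(pooled)            # sums to 1.0
```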
00:19:48
okay, so switching from speech now into machine learning
00:19:55
yeah
00:20:06
you could, but there's one answer that's more correct than anything else
00:20:17
if we were just looking at performing our best predictions, when we're doing machine learning essentially we'd only just
00:20:22
learn what's in the training set; we wanna learn something more, we wanna learn
00:20:26
a more generalised, robust model, so that when we build something to do speech pathology detection
00:20:30
in the lab, we can use it in the doctor's surgery somewhere else, essentially, you know.
00:20:35
So this idea of sort of making the best predictions, it is sort of correct, yeah; we want accuracy of course,
00:20:40
we'd like to build something in machine learning that does that, so yeah, some of those are almost correct answers themselves
00:21:12
but
00:21:22
yep, so it's always important to remember: we're always training to reduce training error, but testing to reduce generalisation error,
00:21:29
to make sure that we've got the most robust, most generalisable model that we possibly can
00:22:04
yep, so supervised learning is always learning a model, a function, from
00:22:07
labelled data, so we can make some sort of prediction itself
00:22:14
uh_huh
00:22:47
yeah, so again, generalisation error is always associated with the test set itself.
00:22:52
A subset of the training data is still the training data; it's still something the model's actually
00:22:56
seen. We're really interested in generalisation error, which is, uh, on something the model's not seen itself
00:23:21
that should be how much
00:23:41
so yeah, we're always looking at using the bias error, really, to
00:23:46
reflect the errors caused by using an unsuitable model, but really it's
00:23:50
technically defined by the predictions and how they differ from the actual values
00:24:31
yeah okay, in terms of what I sort of talked about; again, this one is open, one of those questions,
00:24:35
but essentially the most correct answer from the lecture today was really "increasing
00:24:39
model complexity". The bias error is caused, quite generally, when we have
00:24:43
a model that's essentially too simple for the actual task we want; we've made some assumptions, and they're not very good ones,
00:24:50
we're just making very very simple generalisation rules. But if we start to increase the complexity, add more parameters, put
00:24:56
more information in, we can really sort of pull the errors down and decrease the actual errors
00:25:00
caused by the actual, um, bias errors, by increasing the model complexity itself.
00:25:05
Um, decreasing the training data, that won't, you know, do that
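A tiny worked example of that bias point (added here, with made-up data): fitting a model that is too simple (a constant) to data generated by a line leaves a large training error, and adding one notch of complexity removes it:

```python
def fit_constant(ys):
    """'Too simple' a model: always predict the mean, whatever the input."""
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_line(xs, ys):
    """One notch more complexity: a least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x, m=slope, b=my - slope * mx: m * x + b

def train_mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # the true rule is a line

bias_heavy = train_mse(fit_constant(ys), xs, ys)  # large: model too simple
richer = train_mse(fit_line(xs, ys), xs, ys)      # ~0: complexity added
```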
00:25:13
uh_huh
00:25:20
uh_huh
00:25:31
uh_huh
00:25:43
so the answer is "highly sensitive to changes in the training data", um: when we sort of make little changes
00:25:49
which really change the overall sort of decision that's learned by the model itself, that's
00:25:54
typical of sort of overfitting; it's always this really convoluted sort of decision function that
00:25:59
we've got, so it's always sort of over-fitted to the data that we sort of have, high variance
00:26:05
um
00:26:35
yep, cool, I think on that one we got a hundred percent, so well done, class
00:26:41
uh_huh uh_huh uh_huh
00:27:15
yep, the discriminative models are really learning the parameters of the model essentially directly from the training data itself.
00:27:22
"Estimating parameters using Bayes' rule": I sort of made that one up, because it's said a bunch of times and it sounded like it could be correct.
00:27:27
Genetic optimisation is one form that we can actually use when doing discriminative models, but for the purposes here,
00:27:34
estimating parameters directly from the training data is the most correct answer that we actually have up there
00:28:16
yep, so generative models model the joint probability between the labels and the features; that
00:28:20
way we get sort of a probability distribution of the data. That's what generative models are doing
00:28:46
uh_huh
00:28:59
yep, so we're generally combining multiple classifiers, normally sort of weak classifiers; we never really care
00:29:04
if on their own they sort of perform just above chance level; we try to combine
00:29:08
them in such a way that we're actually trying to reduce the errors that could be associated with weak classifiers that might be sort of overfitting
00:29:40
yep, the support vectors are defined as the samples that sit closest to the decision boundaries
00:29:44
themselves; if we move these samples about, uh, we highly alter the decision boundary
00:29:48
itself, and outliers are normally ones sitting furthest from the decision boundary
00:30:30
the rule is always: underfitting, low complexity; and that's not just for SVMs,
00:30:33
that's for everything: underfitting, low complexity; overfitting, high complexity
00:31:03
yep, the EM algorithm, cool
00:31:06
and last question
00:31:42
i made this one hard; I just pulled together terms that I thought really
00:31:45
sounded fancy. Again, the simple one here was "mapping a sequence of observations
00:31:50
to a sequence of labels"; that's essentially what we sort of use, um, hidden Markov models for
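The observation-to-label mapping an HMM performs can be sketched with Viterbi decoding (a toy added here; the voiced/unvoiced states and all the probabilities are made up):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden label sequence for a sequence of observations
    under an HMM, computed with log probabilities."""
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev, best = max(
                ((p, v[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda t: t[1])
            scores[s] = best + math.log(emit_p[s][o])
            ptr[s] = prev
        v.append(scores)
        back.append(ptr)
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy HMM: hidden classes 'V' (voiced) / 'U' (unvoiced); observations
# are coarse energy levels. Every number below is invented.
states = ["V", "U"]
start_p = {"V": 0.5, "U": 0.5}
trans_p = {"V": {"V": 0.8, "U": 0.2}, "U": {"V": 0.2, "U": 0.8}}
emit_p = {"V": {"hi": 0.9, "lo": 0.1}, "U": {"hi": 0.2, "lo": 0.8}}

path = viterbi(["hi", "hi", "lo"], states, start_p, trans_p, emit_p)
```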
00:31:57
so is julie and
00:32:05
enjoy your chocolate
00:32:08
yeah
00:32:09
cool, that ends the lectures for this morning; it's, uh, I hope you guys learned something



Conference program

ML for speech classification, detection and regression (part 1)
Nick Cummins, Universität Augsburg
13 Feb. 2019 · 9:04 a.m.
ML for speech classification, detection and regression (part 2)
Nick Cummins, Universität Augsburg
13 Feb. 2019 · 10:59 a.m.
Quiz
Nick Cummins, Universität Augsburg
13 Feb. 2019 · 11:59 a.m.