Transcriptions

Note: this content has been automatically generated.
00:00:02
Okay, I'm taking a bit too long to set up the presentation. So, good morning everyone; I am going to review my first year. I will focus on the learning outcomes and on what I have done in this first year, so I will summarise my main research activities, including the secondment I have done at Oxford. I will also mention what additional activities I have participated in during this year, reflect on what I have learned so far with respect to the learning outcomes, and try to give some direction on what is going to happen in the future.
00:00:53
First, let me remind everyone what my project is about, because I am sure no one remembers: it is to synthesise oral cancer speech samples. From a black-box point of view, we have some pre-operative speech samples of patients recorded before an operation, and we want to simulate, that is, to synthesise, how they will sound after the operation.
00:01:24
That sounds good so far, but the problem is that there is not much oral cancer speech data available, and the oral cancer speech data we do have has high variance because of the different treatment modalities. There are also different tumour sizes in oral cancer, and different places where the tumour can occur, which all cause large differences in how the cancer patient's speech sounds after the operation.
00:01:58
There is also the additional issue that a black-box model is not ideal: there should be some systematic articulatory analysis of what is realised when someone with a particular problem is speaking, not just some sort of transformation from one acoustic scenario to another acoustic scenario.
00:02:27
So the first question that I have investigated is: can we use articulatory representations to synthesise speech? What that really means is that the idea was to train a neural network using paired articulation data and speech data, and as a first step to synthesise the speech from the articulation data.
00:02:55
Normally, if we want to synthesise speech from articulation, we provide the articulation information to a statistical model, which is a neural network or a Gaussian mixture model. In our case, I used a simple vocoder to decompose the speech into various parameters: the fundamental frequency, the MFCCs, and the band aperiodicities. I then trained a neural network to predict the MFCCs of the speech in order to synthesise the speech itself. We retain the pitch and the band aperiodicities; we do not predict those, because what we are interested in is how the articulation changes and how the articulation can be predicted in different scenarios.
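As a rough sketch of that pipeline, here is a toy version in which a linear least-squares map stands in for the neural network, mapping hypothetical articulator coordinates to vocoder spectral features; the pitch and band aperiodicities would simply be copied through to the vocoder. All dimensions and data here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired training data: articulator positions and the
# corresponding vocoder spectral features (e.g. MFCCs) per frame.
X = rng.normal(size=(200, 6))      # articulation: 200 frames x 6 sensor coordinates
W_true = rng.normal(size=(6, 13))
Y = X @ W_true                     # spectral features: 200 frames x 13 coefficients

# Stand-in for the neural network: a linear least-squares mapping.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At synthesis time only the spectral envelope is predicted; the pitch (F0)
# and the band aperiodicities are retained from the reference and would be
# passed straight to the vocoder together with Y_pred.
Y_pred = X @ W
print(np.allclose(Y_pred, Y))      # True on this noiseless toy data
```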
00:03:58
That is fine for healthy speech, but how do we actually make pathological speech? That is a very valid question, and something I am still investigating. What I used as a naive approach to synthesise pathological speech was to think about how we can slow down, or limit, the velocity of the speech in the articulatory domain. For that, I simply take the time derivative of the articulatory trajectory and threshold it, which limits the velocity of the articulators in the articulation data.
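A minimal sketch of this velocity-limiting idea (my reconstruction from the description, not the actual code used): differentiate the trajectory, clip the per-frame velocity, and reintegrate:

```python
import numpy as np

def limit_articulator_velocity(traj, v_max):
    """Clip the frame-to-frame velocity of an articulatory trajectory.

    traj: (T, D) array of articulator positions per frame.
    v_max: maximum allowed absolute position change per frame.
    """
    vel = np.diff(traj, axis=0)              # time derivative: per-frame velocity
    vel = np.clip(vel, -v_max, v_max)        # threshold the velocity
    # Reintegrate from the first frame to get the slowed-down trajectory.
    return np.concatenate([traj[:1], traj[:1] + np.cumsum(vel, axis=0)])

traj = np.arange(5.0).reshape(-1, 1)          # ramp moving 1 unit per frame
slow = limit_articulator_velocity(traj, 0.5)  # now moves at most 0.5 per frame
print(slow.ravel())                           # [0.  0.5 1.  1.5 2. ]
```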
00:04:41
What was very interesting to see is that the errors add up: there is a sort of domino effect in synthesising such speech. There is the error of the model which tries to predict the speech from the articulation, and there is also the limitation of the vocoder, and together these degrade the overall speech quality.
00:05:20
An additional psychoacoustic point that I think we had not considered is that synthesised pathological speech has to be even more natural than natural speech to be perceived as pathological rather than simply as bad computer speech. If the computer-synthesised speech is not natural enough, it just sounds like a computer making errors in the production, a kind of smoothing effect, which is sometimes evident, for example, with hidden Markov models and with my naive method.
00:06:01
So such a naive model is obviously not a good model for doing this, and that was to be expected. What is surprising is that while it cannot produce a consistent pathology, it still shows, according to a speech-language pathologist, some aspects of pathological speech in some of the samples.
00:06:25
Another problem that I have continuously met during my PhD is that there is no established, published objective evaluation method for pathological speech. Even for natural speech we only have measures like the mean opinion score or the mel cepstral distortion to evaluate how good the synthesised speech is; for pathological speech, the only thing we can do is to compare it with other pathological speech. And we do not have a reference, because in pathological speech synthesis the idea is to have a generative model rather than something that compares against existing examples.
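As a reference point, here is a minimal sketch of the mel cepstral distortion just mentioned, using the common dB formulation over paired MFCC sequences with the energy coefficient excluded (the toy data is made up):

```python
import numpy as np

def mel_cepstral_distortion(c_ref, c_syn):
    """Frame-averaged mel cepstral distortion in dB (c0/energy excluded)."""
    diff = c_ref[:, 1:] - c_syn[:, 1:]
    per_frame = (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum(diff**2, axis=1))
    return float(per_frame.mean())

c_ref = np.zeros((5, 13))
c_syn = np.zeros((5, 13))
c_syn[:, 1] = 1.0                  # constant error in a single coefficient
print(round(mel_cepstral_distortion(c_ref, c_syn), 3))   # 6.142
```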
00:07:11
The project that I did for two months during my PhD was a secondment at Oxford Wave Research. My project was about deciding whether a given speech sample is fake, also called spoofed, or natural, which is also called bona fide in the jargon of the field. Such a system is called a spoofing countermeasure, and in order to develop it I used the ASVspoof 2019 benchmark, which is a benchmark released for this year's Interspeech conference.
00:08:00
In order to understand what is going on, it is good to know what the purpose of such a system really is: to protect automatic speaker verification against attacks, and there are really two cases of that. One type of attack is when somebody comes next to you, tries to record you while you are speaking, and plays it back into a speaker verification system, which might be, say, a self-checkout machine at the supermarket asking for your age, or rather for your identity; that would be a fairly natural scenario, but it is just an example. Another is when you are logging into a bank and have to repeat an exact sentence; for that, an attacker would use something like voice conversion or speech synthesis. The latter is called the logical access scenario and the former the physical access scenario in spoofed speech detection.
00:09:01
What I built is basically a method for detecting spoofed speech, using good combinations of existing techniques plus some of my own intuition. What we use for the audio features is either log spectrograms or constant-Q transforms, which are a really interesting kind of feature if you have never heard of them; they have mainly been used for musical signal processing. The other one is just a simple log spectrogram, not a mel spectrogram, interestingly.
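To make the simpler of those two features concrete, here is a hand-rolled log power spectrogram in plain numpy (a sketch, not the actual feature extractor; a constant-Q transform would instead use geometrically spaced frequency bins, which is why it is popular for music):

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=128, eps=1e-10):
    """Log power spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)               # shape: (n_frames, n_fft // 2 + 1)

fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(8000) / fs)   # 0.5 s of a 1 kHz tone
S = log_spectrogram(x)
print(S.shape, S[0].argmax() * fs // 512)    # peak lands in the 1000 Hz bin
```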
00:09:48
The model I have used is a variant of the ResNet combined with a Gaussian mixture model, similar to the x-vector architecture. What is really interesting about the x-vector-style part of the architecture is that it is basically learning features, sort of like an autoencoder, if you are familiar with that: it embeds the input in a compressed space. Such embeddings can be useful for speaker verification, which is what x-vectors were designed for, but here these kinds of embeddings are used for spoof detection, and they are fed to a Gaussian mixture model.
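A toy sketch of that last scoring step (my illustration with random stand-in embeddings; a single diagonal Gaussian per class plays the role of the GMM, and a real system would use the ResNet embeddings instead):

```python
import numpy as np

def fit_gaussian(E):
    """Fit one diagonal Gaussian (a 1-component GMM) to a set of embeddings."""
    return E.mean(axis=0), E.var(axis=0) + 1e-6

def log_likelihood(e, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (e - mean) ** 2 / var)

rng = np.random.default_rng(1)
emb_bona  = rng.normal(0.0, 1.0, size=(500, 32))  # stand-in bona fide embeddings
emb_spoof = rng.normal(2.0, 1.0, size=(500, 32))  # stand-in spoof embeddings

g_bona, g_spoof = fit_gaussian(emb_bona), fit_gaussian(emb_spoof)

def score(e):
    """Log-likelihood ratio: positive means the model favours bona fide."""
    return log_likelihood(e, *g_bona) - log_likelihood(e, *g_spoof)

print(score(np.zeros(32)) > 0, score(np.full(32, 2.0)) < 0)   # True True
```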
00:10:37
There are some lessons from this that I think may be interesting to share. There is the widely discussed deepfake phenomenon: spoofed speech, that is, speech cloned from samples that are already available online. We know that these samples can be convincing, but they are actually not too difficult to detect, because in most cases the spoofing algorithm optimises for perceived speech quality. If you use some other feature which focuses on, for example, low-frequency bands that are not well perceived by human hearing, you can still detect that something is off. So I think one of the messages of this project is: do not panic over the apparition of these spoofing tools, because in practical applications we can still detect whether they are real speech.
00:11:43
One of the surprising things is what works as features: it turns out that phase spectra are useful features here. That is something people tend to forget about, because these days we mostly just use magnitude spectrograms, but phase spectra exist and they can be used for spoofed speech detection.
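For completeness, phase spectra of this kind are easy to extract; this sketch just takes the angle of the short-time Fourier transform, the part a magnitude spectrogram throws away (numpy-only, illustrative):

```python
import numpy as np

def phase_spectrogram(x, n_fft=512, hop=128):
    """Frame-wise phase spectra in radians, i.e. the part that a magnitude
    spectrogram discards."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop + n_fft] * window for i in range(n_frames)])
    return np.angle(np.fft.rfft(frames, axis=1))   # values in (-pi, pi]

x = np.random.default_rng(2).normal(size=4000)     # stand-in audio signal
P = phase_spectrogram(x)
print(P.shape, bool(np.all(np.abs(P) <= np.pi)))   # (28, 257) True
```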
00:12:03
Another thing is that in ASVspoof 2019 a simulated replay database was used, and many people found during the challenge that it is not really representative of actual replay attacks, so in some sense it is not a good benchmark. Also, one really interesting finding, in the middle of the deep-learning era, is that a simple Gaussian mixture model turned out to be very competitive in the challenge, so GMMs still seem to have a place in the deep-learning era.
00:12:39
I would also like to talk quickly about my other activities. I started blogging and have already written two articles, one of them about the summer school experience I had in Norway. Another one is a post with a list of the speech articles I find interesting; I do not know whether anybody else finds them interesting, but it is there for other people to see. I plan to write an article about my secondment research, an article about this event, and, if I have the time, also one about Interspeech. I also plan an article targeted at a general audience about what is really going on with these widespread voice conversion tools, because I think there is a general misunderstanding in the media about these things, and we should talk to ordinary people about what the real dangers of these voice conversion tools are and how they can protect themselves against them.
00:13:41
Other than that, I have done quite a lot of courses. I counted the ECTS credits for my entire PhD and I seem to have twenty-three of them, so I think I am progressing quite well with the studies during the PhD. This September I will also teach a speech synthesis and recognition course at the University of Amsterdam, in the BA Artificial Intelligence programme. We are also going to collect some actual electromagnetic articulography data together with speech recordings of oral cancer speakers, to better understand oral cancer speech and validate our models with it.
00:14:31
Finally, regarding my future plans: I plan to do a second secondment, sort of a follow-up of the first one, but this time focusing on translating the models into a product pipeline. An additional direction may be to study oral cancer pathologies using explainable machine learning, and otherwise to just continue my research. Thank you for listening; I am happy to answer any questions.

ESR11 : First year review
Bence Halpern
Sept. 5, 2019 · 11:20 a.m.