Note: this content has been automatically generated.
00:00:00
the Natural Language Understanding group here at Idiap.
00:00:06
This group is studying natural language understanding, which in my opinion
00:00:11
is the greatest challenge in AI, because
00:00:14
essentially anything you can think, or almost anything you can think,
00:00:19
can be said in natural language, and you can understand
00:00:24
that natural language. So it essentially is a form of mind reading:
00:00:30
we can read each other's minds; we can communicate thoughts directly into
00:00:35
another person's head just by making sounds. That's incredible.
00:00:42
But there are all kinds of things that we don't know, that we don't understand, about this process,
00:00:48
in particular things like: what is the structure of human
00:00:51
thought? If we can really communicate
00:00:55
any idea in language, then that tells us a lot about what ideas we're capable
00:01:01
of having, because those are the things that can be expressed in language.
00:01:07
How do we understand language? How can we compute
00:01:10
this thought from just hearing a sequence of sounds?
00:01:14
How can we decide that if I make this sequence of sounds,
00:01:18
that person is going to have this thought in their head?
00:01:22
And my argument is that we're the only animals that can do this. Other animals certainly do communicate,
00:01:29
but not with anything like this structure and complexity
00:01:34
of structure that humans are able to achieve.
00:01:37
So is that because animals don't have some, you
00:01:43
know, innate human ability to speak and thereby
00:01:48
communicate our thoughts more effectively? Or is it just that their thoughts themselves are much simpler:
00:01:54
we have more complicated thoughts, and that's why we make more complicated communications?
00:02:01
We don't know that.
00:02:04
So this is the fascinating problem for AI.
00:02:10
In our group we're looking at understanding natural language through
00:02:16
natural language processing (NLP), focusing on representation learning. So
00:02:23
we work on deep neural network architectures for representation learning, applied
00:02:28
to multiple different tasks, in particular focusing on machine translation,
00:02:35
information access and classification, and semantic entailment.
00:02:44
So, the group: I'm the head, and I'm new here;
00:02:48
there are also a postdoc and PhD
00:02:52
students [names unclear]. We also work closely with Andrei
00:02:56
Popescu-Belis, who was the head of the NLP group before I arrived.
00:03:02
We work closely with people in the Speech and Audio Processing group,
00:03:07
always with shared projects, and I have collaborations [unclear] in Geneva,
00:03:15
with a former student, and with my former employer.
00:03:22
So our current research topics are,
00:03:27
as I said: representation learning, which
00:03:30
is pervasive; machine translation; information access and classification; and textual entailment.
00:03:38
We've made a number of contributions in this area; we cite
00:03:45
here a lot of the places where we've published this work.
00:03:51
I won't go into the whole
00:03:54
list, but I will
00:03:57
go into more detail on some of these contributions.
00:04:04
So, just as an overview of our work on representation learning,
00:04:10
we've done a lot of work
00:04:14
that's related to attention-based representations:
00:04:19
attention over the different meanings that a
00:04:24
word could have; attention over the previous
00:04:30
words in the decoder when you're generating a sequence of words;
00:04:36
hierarchical attention, which I'll describe in a moment;
00:04:43
and bag-of-vector embeddings, which are non-parametric embeddings.
00:04:50
Output representation learning: these are
00:04:56
ways to generalise to
00:05:00
new outputs by taking advantage of the descriptions of those outputs;
00:05:06
we'll talk in particular about label-aware text classification.
00:05:13
And work on entailment vectors: representing entailment in a vector space
00:05:19
for the semantics of words, and looking
00:05:23
at compositional semantics. So we'll talk
00:05:27
a bit about how to apply those representations to textual entailment.
00:05:35
So, for machine translation,
00:05:40
I'll present just two pieces of work: sense-aware
00:05:45
neural machine translation, and the
00:05:50
self-attentive residual decoder for neural machine translation.
00:05:58
So neural machine translation works by
00:06:04
running a neural network over the input sentence, computing a representation,
00:06:10
and then conditioning on that representation to generate the
00:06:14
sentence in the output, the target language.
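The encode-then-generate pipeline just described can be sketched as follows (a toy numpy illustration with made-up dimensions, random "trained" weights, and greedy decoding; not the actual systems discussed in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
V_src, V_tgt, d = 10, 8, 16           # toy vocabulary sizes and hidden size

# random stand-ins for trained parameters (illustration only)
E_src = rng.normal(size=(V_src, d))   # source word embeddings
E_tgt = rng.normal(size=(V_tgt, d))   # target word embeddings
W_enc = rng.normal(size=(d, d)) * 0.1
W_dec = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(d, V_tgt)) * 0.1

def encode(src_ids):
    """Run a simple RNN over the source sentence; the final state
    is the representation the decoder conditions on."""
    h = np.zeros(d)
    for i in src_ids:
        h = np.tanh(E_src[i] + W_enc @ h)
    return h

def decode(h, max_len=5):
    """Greedily generate target words, conditioned on the source encoding."""
    out, s = [], h
    for _ in range(max_len):
        s = np.tanh(W_dec @ s)
        w = int(np.argmax(s @ W_out))  # most likely next target word
        out.append(w)
        s = s + E_tgt[w]               # feed the chosen word back in
    return out

translation = decode(encode([1, 4, 7]))
print(translation)                     # a list of 5 target word ids
```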
00:06:20
00:06:24
One of the problems is that, for the input sentence,
00:06:28
you often have words that are ambiguous, or
00:06:32
at least in the target language there is more than one way in which you could translate them.
00:06:39
So if you condition on ambiguous
00:06:44
representations, representations of words that are ambiguous,
00:06:48
you quite quickly get into problems because of the combinatorics of all these ambiguities.
00:06:56
Also, you end up with models that are very biased towards the
00:06:59
most frequent senses, which might not apply for a given case.
00:07:06
So the solution is to explicitly add a word sense
00:07:12
disambiguation component to the encoder.
00:07:16
You first look at your input text and decide
00:07:20
what sense each word
00:07:23
should have, at least for the nouns and verbs, which are the ambiguous ones,
00:07:29
and then condition on those senses when you
00:07:33
compute your representation of the input sentence.
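The sense-disambiguation idea can be sketched like this (the tiny lexicon, the similarity-based WSD stand-in, and the averaging encoder are all illustrative assumptions, not the actual components):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# toy lexicon: each word maps to one or more sense vectors (assumed data)
senses = {
    "bank":  [rng.normal(size=d), rng.normal(size=d)],  # river bank / money bank
    "river": [rng.normal(size=d)],
    "the":   [rng.normal(size=d)],
}

def disambiguate(word, context_vec):
    """Pick the sense vector most similar to the context
    (a stand-in for a real word sense disambiguation component)."""
    return max(senses[word], key=lambda s: s @ context_vec)

def encode(sentence):
    """Encode the sentence from sense-disambiguated vectors."""
    # crude context: average over every sense of every word
    ctx = np.mean([s for w in sentence for s in senses[w]], axis=0)
    chosen = [disambiguate(w, ctx) for w in sentence]
    return np.mean(chosen, axis=0)

rep = encode(["the", "river", "bank"])
print(rep.shape)   # (8,)
```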
00:07:38
00:07:43
So that's one piece of work. The second one: once you've conditioned on
00:07:48
the representation of the input sentence, you need to generate
00:07:54
the translation of that sentence as your output sentence.
00:07:59
And this, if you're doing it with a recurrent neural network, has a problem:
00:08:04
it's very biased. The recurrent neural network, when it's
00:08:08
trying to predict the next word it should generate, is
00:08:12
very biased towards looking only at the previous few words that it has already generated,
00:08:18
when in fact lots of previous words, or previous words that are far away, may be very relevant.
00:08:24
So the idea is to add an attention mechanism there, so that you can decide
00:08:30
which of the previous words you need to pay
00:08:33
attention to, to generate the next word.
00:08:38
And in particular, this is an attention mechanism that
00:08:43
looks at the specific words; that's why it's called a residual connection,
00:08:50
rather than looking at the hidden representations in the previous steps.
00:08:56
And that improves the quality of the translation
00:09:03
output. And if you look at those attention patterns,
00:09:07
you see something that looks very much like a syntactic structure:
00:09:13
it's learning a model of how to generate the output, and that model is capturing
00:09:19
some information about the syntax of the sentence that it is generating.
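A minimal sketch of attention over previously generated words, with the "residual" connection going to the word embeddings themselves rather than to decoder hidden states (the dimensions and inputs are made up for illustration):

```python
import numpy as np

def attend_previous(query, prev_word_embs):
    """Softmax attention over the embeddings of previously generated
    words (the residual connection: we look at the words themselves,
    not at the decoder's hidden states)."""
    scores = np.array([query @ e for e in prev_word_embs])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context vector = weighted sum of previous word embeddings
    ctx = sum(w * e for w, e in zip(weights, prev_word_embs))
    return weights, ctx

rng = np.random.default_rng(2)
prev = [rng.normal(size=4) for _ in range(6)]   # 6 words generated so far
weights, ctx = attend_previous(rng.normal(size=4), prev)
print(weights.round(2), ctx.shape)
```

Inspecting `weights` over a whole sentence is what yields the syntax-like attention patterns mentioned above.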
00:09:27
Now, in indexing and
00:09:30
classification, so information access problems,
00:09:36
one of
00:09:40
the problems is that we need to
00:09:45
deal with multiple languages for solving these tasks,
00:09:52
and often we do not have enough data for all
00:09:56
our languages, so we need to be able to somehow share
00:09:59
parameters, share information learned about one language with another language.
00:10:05
And here the solution is to have a hierarchical attention mechanism:
00:10:11
the first layer, for each individual sentence, decides which words to look
00:10:17
at, which words to focus on, when computing the representation of that sentence;
00:10:23
the second level says which sentences we should really focus
00:10:28
on when computing the representation of the entire document,
00:10:31
for doing our retrieval or categorisation of documents.
00:10:39
And the multilingual part has to do with which parts of this model
00:10:46
you share. Do you share the representations, so that everything
00:10:51
will be encoded in the same way, the same mapping from words to encodings?
00:10:57
Or do you share the attention mechanisms, so that
00:11:02
once you've got the encoding
00:11:04
at one level, you apply attention the same way to compute the encoding at the next level?
00:11:09
So you might want to share the encoding, you might want to
00:11:14
share the attention, or you might want to share both.
00:11:19
And so we've done experiments on what's the appropriate
00:11:24
level of sharing for multilingual text classification.
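The two-level attention can be sketched like this (the pooling function and the stand-in word encoder are illustrative assumptions). Sharing then amounts to reusing `u_word`/`u_sent` (the attention), or `embed` (the encodings), or both, across languages:

```python
import numpy as np

def attention_pool(vectors, u):
    """Pool a set of vectors into one, weighting each by softmax
    similarity to a learned query vector u."""
    scores = vectors @ u
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ vectors

def encode_document(doc, embed, u_word, u_sent):
    """Word-level attention within each sentence, then
    sentence-level attention over the whole document."""
    sent_reps = np.stack([attention_pool(embed(s), u_word) for s in doc])
    return attention_pool(sent_reps, u_sent)

rng = np.random.default_rng(3)
d = 8
u_word, u_sent = rng.normal(size=d), rng.normal(size=d)  # shared or per-language
embed = lambda sent: rng.normal(size=(len(sent), d))     # stand-in word encoder

doc = [["a", "b", "c"], ["d", "e"]]
rep = encode_document(doc, embed, u_word, u_sent)
print(rep.shape)   # (8,)
```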
00:11:33
Another piece of work in this area is a label-aware
00:11:39
decoder for classification, for deciding which class something is in.
00:11:45
Here you're in a setting where the classes have labels, so there is a piece of text
00:11:52
telling you what your output should be, you know,
00:11:57
a piece of text describing each of the possible outputs you're allowed to choose.
00:12:04
So these texts get encoded in the same way, using the same word embeddings, the
00:12:10
vectors that represent words, as the input text that you're trying to classify.
00:12:16
Then you compute a joint space between these two representations, and
00:12:22
then train a classifier in that joint space.
00:12:27
And that leverages the semantics of your labels themselves, which is
00:12:33
very useful in many cases, especially if you have a very large number
00:12:38
of classes, or you have some classes that you have no training data
00:12:42
for; but if you have a label, you can still use this method.
00:12:48
So this particular solution, this approach
00:12:53
to solving this problem, actually not only
00:12:57
handles unseen labels and rare labels, it even improves
00:13:02
in the places where you do have enough data.
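A minimal sketch of the idea (hand-made two-dimensional word embeddings, purely for illustration): the same averaging encoder is applied to the input text and to each label's textual description, and classification is a similarity in that shared space, so even a class with no training examples can be scored. In the real system the joint space and embeddings are learned, not hand-made:

```python
import numpy as np

# tiny hand-made word embeddings (illustrative; learned in practice,
# and shared by the input text and the label descriptions)
word_emb = {
    "elections": np.array([0.0, 1.0]), "government": np.array([0.0, 0.9]),
    "football":  np.array([1.0, 0.0]), "match":      np.array([0.9, 0.1]),
    "news":      np.array([0.1, 0.1]), "about":      np.array([0.0, 0.0]),
}

def embed_text(text):
    """Average word embeddings; the same encoder handles inputs and labels."""
    return np.mean([word_emb[w] for w in text.split()], axis=0)

# each class is described by a piece of text, so a class with no
# training data still gets a usable representation
labels = {"sport": "football match", "politics": "elections government"}

def classify(text):
    x = embed_text(text)  # joint space (a learned projection in practice)
    return max(labels, key=lambda name: x @ embed_text(labels[name]))

print(classify("news about elections"))   # -> politics
```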
00:13:12
Now, the other thing we're working on is textual entailment.
00:13:17
So textual entailment is information inclusion: you want to
00:13:22
be able to say that this statement about health care
00:13:26
includes the information in this simpler, more abstract
00:13:31
piece of information. So it's also a kind of abstraction:
00:13:35
this is an abstraction of both of these two different statements, even though they're different.
00:13:42
It's a fundamental concept, textual entailment, for the semantics of language.
00:13:47
And as you can imagine, it's useful for doing opinion summarisation:
00:13:52
if you have very diverse opinions, everybody's opinion is different,
00:13:56
but maybe you can abstract out some consensus opinions
00:13:59
that large numbers of people agree on, and summarise based on those.
00:14:07
So what we've done is develop
00:14:12
models of textual entailment
00:14:15
that are based on two ideas. First of all, to deal with the
00:14:24
problem that the
00:14:29
sentences can be very long: if you want to say that
00:14:36
the information in the entailed part
00:14:41
is a subset of the information in the entailing part,
00:14:44
it may just be that it's a subset of the words, or a subset of the semantic entities,
00:14:51
in the two representations. So we have
00:14:57
bag-of-vector representations for each part, and then
00:15:02
we assume that for every vector in the entailed
00:15:07
bag, there exists at least one vector in the entailing bag
00:15:12
such that the one vector entails the other.
00:15:17
But that leaves open the question of how you represent entailment between vectors,
00:15:22
so we have a whole line of work on how to model
00:15:25
entailment in a vector space that can then answer that question.
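The bag-level condition ("every entailed vector must be entailed by some entailing vector") can be written as a min over the entailed bag of a max over the entailing bag. Here cosine similarity is a stand-in for the pairwise score; the actual work uses a dedicated vector-space entailment operator:

```python
import numpy as np

def bag_entailment(premise_bag, hypothesis_bag, pair_score):
    """The entailed (hypothesis) bag must be covered: every vector in it
    needs at least one vector in the entailing (premise) bag that scores
    highly against it, hence max over the premise, min over the hypothesis."""
    return min(max(pair_score(p, h) for p in premise_bag)
               for h in hypothesis_bag)

def cos(p, h):
    """Stand-in pairwise score (cosine similarity)."""
    return p @ h / (np.linalg.norm(p) * np.linalg.norm(h))

premise   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
covered   = [np.array([0.9, 0.1])]    # close to a premise vector
uncovered = [np.array([-1.0, 0.0])]   # far from every premise vector
print(bag_entailment(premise, covered, cos) >
      bag_entailment(premise, uncovered, cos))   # -> True
```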
00:15:33
Right, so this is what I just said: every vector
00:15:37
in the entailed sentence has to be entailed by some vector
00:15:41
in the entailing sentence. And we have
00:15:47
a theoretical framework for measuring entailment in a vector space.
00:15:55
So that framework lets us represent
00:16:02
the propositions that can be known
00:16:07
about something
00:16:14
as bits in the vector, and then there's a
00:16:18
prior that captures the constraints over the whole vector.
00:16:24
The framework tells us a way to measure the entailment
00:16:28
between two given vectors, and how, for a given vector,
00:16:32
to infer other vectors that it entails or is entailed by.
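One illustrative instantiation of such an operator, treating each dimension as the probability that a "bit" of knowledge is known (a simplified sketch, not the exact operator from the framework):

```python
import numpy as np

def entails(x, y):
    """Log-probability that x entails y, treating each dimension as the
    probability that a feature (a bit of knowledge) is known.  x entails
    y when every feature known in y is also known in x, so each dimension
    contributes P(not (known in y and unknown in x)).
    NOTE: an illustrative simplification, not the framework's exact operator."""
    return np.sum(np.log(1.0 - y * (1.0 - x) + 1e-12))

# toy feature vectors: 'cat' carries all of 'animal's features plus one more
animal = np.array([0.9, 0.9, 0.0, 0.0])
cat    = np.array([0.9, 0.9, 0.9, 0.0])

print(entails(cat, animal) > entails(animal, cat))   # -> True
```

With these toy vectors the score for cat entailing animal is higher than the reverse, which is the hyponymy direction the talk returns to below.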
00:16:39
We use this framework to define a distributional semantic model.
00:16:44
So distributional semantics says that you can
00:16:52
tell the meaning of a word by looking at the distribution of words
00:16:56
that that word co-occurs with in texts: just by looking at very large
00:17:01
amounts of text, you can infer the meaning of a word.
00:17:06
We propose this entailment-based model of
00:17:11
why the meaning of a word tells
00:17:14
you something about the meaning of the other words that are in the same context,
00:17:20
by assuming there's some pseudo-phrase such that
00:17:25
unifying the two word vectors together gives you some coherent
00:17:30
semantics; so it reflects the sense in which these
00:17:36
two words are both playing a role in some coherent larger semantics.
00:17:45
So by representing that in the entailment framework,
00:17:49
we can train models that give us state-of-the-art
00:17:53
results on predicting hyponymy:
00:17:57
so, is the word 'cat'
00:18:01
a hyponym of the word 'animal'?
00:18:10
So that's just an overview of the things we're working on: working
00:18:14
on understanding natural language through NLP and representation learning,
00:18:22
looking at deep learning architectures for doing that representation learning,
00:18:27
machine translation, information access,
00:18:32
and semantic entailment problems.

Presentation of the «Natural Language Understanding» research group
HENDERSON, James, Idiap Senior Researcher
29 Aug. 2018 · 2:03 p.m.