Note: this content has been automatically generated.
00:00:00
the Natural Language Understanding group here at Idiap.
00:00:06
This group is studying natural language understanding, which in my opinion
00:00:11
is the greatest challenge in AI, because
00:00:14
essentially anything you can think, or almost anything you can think,
00:00:19
can be said in natural language, and you can understand
00:00:24
that natural language. So it essentially is a form of mind reading:
00:00:30
we can read each other's minds; we can communicate thoughts directly into
00:00:35
another person's head just by making sounds. That's incredible.
00:00:42
But there are all kinds of things that we don't know, that we don't understand, about this process,
00:00:48
in particular things like: what is the structure of human
00:00:51
thought? If we can really communicate
00:00:55
any idea in language, then that tells us a lot about what ideas we're capable
00:01:01
of having, because those are the things that can be expressed in language.
00:01:07
How do we understand language? How can we compute
00:01:10
this thought from just hearing a sequence of sounds?
00:01:14
How can we decide that if I make this sequence of sounds,
00:01:18
that person is going to have this thought in their head?
00:01:22
And my argument is that we're the only animals that can do this. Other animals certainly do communicate,
00:01:29
but not with anything like this structure and complexity
00:01:34
of structure that humans are able to achieve.
00:01:37
So is that because animals don't have some, you
00:01:43
know, innate human ability to speak and thereby
00:01:48
communicate our thoughts more effectively? Or is it just that their thoughts themselves are much simpler:
00:01:54
we have more complicated thoughts, and that's why we make more complicated communications?
00:02:01
We don't know that.
00:02:04
So this is the fascinating problem for AI.
00:02:10
In our group we're looking at understanding natural language through
00:02:16
natural language processing (NLP), focusing on representation learning. So
00:02:23
we work on deep neural network architectures for representation learning, applied
00:02:28
to multiple different tasks, in particular focusing on machine translation,
00:02:35
information access and classification, and semantic entailment.
00:02:44
So, the group: I'm the head, and I'm new here;
00:02:48
there are also a postdoc and PhD
00:02:52
students [names unclear]. We also work closely with Andrei
00:02:56
Popescu-Belis, who was the head of the NLP group before I arrived.
00:03:02
We work closely with people in the Speech and Audio Processing group,
00:03:07
always with shared projects, and I have collaborations [unclear] in Geneva,
00:03:15
with a former student, and with my former employer.
00:03:22
So our current research topics are,
00:03:27
as I said: representation learning, which
00:03:30
is pervasive; machine translation; information access and classification; and textual entailment.
00:03:38
We've made a number of contributions in this area; we cite
00:03:45
here a lot of the places where we've published this work.
00:03:51
I won't go into the whole
00:03:54
list, but I will
00:03:57
go into more detail on some of these contributions.
00:04:04
So, just as an overview of our work on representation learning,
00:04:10
we've done a lot of work
00:04:14
that's related to attention-based representations:
00:04:19
attention over the different meanings that a
00:04:24
word could have; attention over the previous
00:04:30
words in the decoder when you're generating a sequence of words;
00:04:36
hierarchical attention, which I'll describe in a moment;
00:04:43
and bag-of-vector embeddings, which are non-parametric embeddings.
00:04:50
Output representation learning: these are
00:04:56
ways to generalise to
00:05:00
new outputs by taking advantage of the descriptions of those outputs;
00:05:06
we'll talk in particular about label-aware text classification.
00:05:13
And work on entailment vectors: representing entailment in a vector space
00:05:19
for the semantics of words, and looking
00:05:23
at compositional semantics. So we'll talk
00:05:27
a bit about how to apply those representations to textual entailment.
00:05:35
So, for machine translation,
00:05:40
I'll present just two pieces of work: sense-aware
00:05:45
neural machine translation, and the
00:05:50
self-attentive residual decoder for neural machine translation.
00:05:58
So neural machine translation works by
00:06:04
running a neural network over the input sentence, computing a representation,
00:06:10
and then conditioning on that representation to generate the
00:06:14
sentence in the output, the target language.
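The encode-then-generate pipeline just described can be sketched as follows (a toy numpy illustration with made-up dimensions, random "trained" weights, and greedy decoding; not the actual systems discussed in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
V_src, V_tgt, d = 10, 8, 16           # toy vocabulary sizes and hidden size

# random stand-ins for trained parameters (illustration only)
E_src = rng.normal(size=(V_src, d))   # source word embeddings
E_tgt = rng.normal(size=(V_tgt, d))   # target word embeddings
W_enc = rng.normal(size=(d, d)) * 0.1
W_dec = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(d, V_tgt)) * 0.1

def encode(src_ids):
    """Run a simple RNN over the source sentence; the final state
    is the representation the decoder conditions on."""
    h = np.zeros(d)
    for i in src_ids:
        h = np.tanh(E_src[i] + W_enc @ h)
    return h

def decode(h, max_len=5):
    """Greedily generate target words, conditioned on the source encoding."""
    out, s = [], h
    for _ in range(max_len):
        s = np.tanh(W_dec @ s)
        w = int(np.argmax(s @ W_out))  # most likely next target word
        out.append(w)
        s = s + E_tgt[w]               # feed the chosen word back in
    return out

translation = decode(encode([1, 4, 7]))
print(translation)                     # a list of 5 target word ids
```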
00:06:20
00:06:24
One of the problems is that, for the input sentence,
00:06:28
you often have words that are ambiguous, or
00:06:32
at least in the target language there is more than one way in which you could translate them.
00:06:39
So if you condition on ambiguous
00:06:44
representations, representations of words that are ambiguous,
00:06:48
you quite quickly get into problems because of the combinatorics of all these ambiguities.
00:06:56
Also, you end up with models that are very biased towards the
00:06:59
most frequent senses, which might not apply for a given case.
00:07:06
So the solution is to explicitly add a word sense
00:07:12
disambiguation component to the encoder.
00:07:16
You first look at your input text and decide
00:07:20
what sense each word
00:07:23
should have, at least for the nouns and verbs, which are the ambiguous ones,
00:07:29
and then condition on those senses when you
00:07:33
compute your representation of the input sentence.
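The sense-disambiguation idea can be sketched like this (the tiny lexicon, the similarity-based WSD stand-in, and the averaging encoder are all illustrative assumptions, not the actual components):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# toy lexicon: each word maps to one or more sense vectors (assumed data)
senses = {
    "bank":  [rng.normal(size=d), rng.normal(size=d)],  # river bank / money bank
    "river": [rng.normal(size=d)],
    "the":   [rng.normal(size=d)],
}

def disambiguate(word, context_vec):
    """Pick the sense vector most similar to the context
    (a stand-in for a real word sense disambiguation component)."""
    return max(senses[word], key=lambda s: s @ context_vec)

def encode(sentence):
    """Encode the sentence from sense-disambiguated vectors."""
    # crude context: average over every sense of every word
    ctx = np.mean([s for w in sentence for s in senses[w]], axis=0)
    chosen = [disambiguate(w, ctx) for w in sentence]
    return np.mean(chosen, axis=0)

rep = encode(["the", "river", "bank"])
print(rep.shape)   # (8,)
```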
00:07:38
00:07:43
So that's one piece of work. The second one: once you've conditioned on
00:07:48
the representation of the input sentence, you need to generate
00:07:54
the translation of that sentence as your output sentence.
00:07:59
And this, if you're doing it with a recurrent neural network, has a problem:
00:08:04
it's very biased. The recurrent neural network, when it's
00:08:08
trying to predict the next word it should generate, is
00:08:12
very biased towards looking only at the previous few words that it has already generated,
00:08:18
when in fact lots of previous words, or previous words that are far away, may be very relevant.
00:08:24
So the idea is to add an attention mechanism there, so that you can decide
00:08:30
which of the previous words you need to pay
00:08:33
attention to, to generate the next word.
00:08:38
And in particular, this is an attention mechanism that
00:08:43
looks at the specific words; that's why it's called a residual connection,
00:08:50
rather than looking at the hidden representations in the previous steps.
00:08:56
And that improves the quality of the translation
00:09:03
output. And if you look at those attention patterns,
00:09:07
you see something that looks very much like a syntactic structure:
00:09:13
it's learning a model of how to generate the output, and that model is capturing
00:09:19
some information about the syntax of the sentence that it is generating.
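A minimal sketch of attention over previously generated words, with the "residual" connection going to the word embeddings themselves rather than to decoder hidden states (the dimensions and inputs are made up for illustration):

```python
import numpy as np

def attend_previous(query, prev_word_embs):
    """Softmax attention over the embeddings of previously generated
    words (the residual connection: we look at the words themselves,
    not at the decoder's hidden states)."""
    scores = np.array([query @ e for e in prev_word_embs])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context vector = weighted sum of previous word embeddings
    ctx = sum(w * e for w, e in zip(weights, prev_word_embs))
    return weights, ctx

rng = np.random.default_rng(2)
prev = [rng.normal(size=4) for _ in range(6)]   # 6 words generated so far
weights, ctx = attend_previous(rng.normal(size=4), prev)
print(weights.round(2), ctx.shape)
```

Inspecting `weights` over a whole sentence is what yields the syntax-like attention patterns mentioned above.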
00:09:27
Now, in indexing and
00:09:30
classification, so information access problems,
00:09:36
one of
00:09:40
the problems is that we need to
00:09:45
deal with multiple languages for solving these tasks,
00:09:52
and often we do not have enough data for all
00:09:56
our languages, so we need to be able to somehow share
00:09:59
parameters, share information learned about one language with another language.
00:10:05
And here the solution is to have a hierarchical attention mechanism:
00:10:11
the first layer, for each individual sentence, decides which words to look
00:10:17
at, which words to focus on, when computing the representation of that sentence;
00:10:23
the second level says which sentences we should really focus
00:10:28
on when computing the representation of the entire document,
00:10:31
for doing our retrieval or categorisation of documents.
00:10:39
And the multilingual part has to do with which parts of this model
00:10:46
you share. Do you share the representations, so that everything
00:10:51
will be encoded in the same way, the same mapping from words to encodings?
00:10:57
Or do you share the attention mechanisms, so that
00:11:02
once you've got the encoding
00:11:04
at one level, you apply attention the same way to compute the encoding at the next level?
00:11:09
So you might want to share the encoding, you might want to
00:11:14
share the attention, or you might want to share both.
00:11:19
And so we've done experiments on what's the appropriate
00:11:24
level of sharing for multilingual text classification.
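The two-level attention can be sketched like this (the pooling function and the stand-in word encoder are illustrative assumptions). Sharing then amounts to reusing `u_word`/`u_sent` (the attention), or `embed` (the encodings), or both, across languages:

```python
import numpy as np

def attention_pool(vectors, u):
    """Pool a set of vectors into one, weighting each by softmax
    similarity to a learned query vector u."""
    scores = vectors @ u
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ vectors

def encode_document(doc, embed, u_word, u_sent):
    """Word-level attention within each sentence, then
    sentence-level attention over the whole document."""
    sent_reps = np.stack([attention_pool(embed(s), u_word) for s in doc])
    return attention_pool(sent_reps, u_sent)

rng = np.random.default_rng(3)
d = 8
u_word, u_sent = rng.normal(size=d), rng.normal(size=d)  # shared or per-language
embed = lambda sent: rng.normal(size=(len(sent), d))     # stand-in word encoder

doc = [["a", "b", "c"], ["d", "e"]]
rep = encode_document(doc, embed, u_word, u_sent)
print(rep.shape)   # (8,)
```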
00:11:33
Another piece of work in this area is a label-aware
00:11:39
decoder for classification, for deciding which class something is in.
00:11:45
Here you're in a setting where the classes have labels, so there is a piece of text
00:11:52
telling you what your output should be, you know,
00:11:57
a piece of text describing each of the possible outputs you're allowed to choose.
00:12:04
So these texts get encoded in the same way, using the same word embeddings, the
00:12:10
vectors that represent words, as the input text that you're trying to classify.
00:12:16
Then you compute a joint space between these two representations, and
00:12:22
then train a classifier in that joint space.
00:12:27
And that leverages the semantics of your labels themselves, which is
00:12:33
very useful in many cases, especially if you have a very large number
00:12:38
of classes, or you have some classes that you have no training data
00:12:42
for; but if you have a label, you can still use this method.
00:12:48
So this particular solution, this approach
00:12:53
to solving this problem, actually not only
00:12:57
handles unseen labels and rare labels, it even improves
00:13:02
in the places where you do have enough data.
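A minimal sketch of the idea (hand-made two-dimensional word embeddings, purely for illustration): the same averaging encoder is applied to the input text and to each label's textual description, and classification is a similarity in that shared space, so even a class with no training examples can be scored. In the real system the joint space and embeddings are learned, not hand-made:

```python
import numpy as np

# tiny hand-made word embeddings (illustrative; learned in practice,
# and shared by the input text and the label descriptions)
word_emb = {
    "elections": np.array([0.0, 1.0]), "government": np.array([0.0, 0.9]),
    "football":  np.array([1.0, 0.0]), "match":      np.array([0.9, 0.1]),
    "news":      np.array([0.1, 0.1]), "about":      np.array([0.0, 0.0]),
}

def embed_text(text):
    """Average word embeddings; the same encoder handles inputs and labels."""
    return np.mean([word_emb[w] for w in text.split()], axis=0)

# each class is described by a piece of text, so a class with no
# training data still gets a usable representation
labels = {"sport": "football match", "politics": "elections government"}

def classify(text):
    x = embed_text(text)  # joint space (a learned projection in practice)
    return max(labels, key=lambda name: x @ embed_text(labels[name]))

print(classify("news about elections"))   # -> politics
```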
00:13:12
Now, the other thing we're working on is textual entailment.
00:13:17
So textual entailment is information inclusion: you want to
00:13:22
be able to say that this statement about health care
00:13:26
includes the information in this simpler, more abstract
00:13:31
piece of information. So it's also a kind of abstraction:
00:13:35
this is an abstraction of both of these two different statements, even though they're different.
00:13:42
It's a fundamental concept, textual entailment, for the semantics of language.
00:13:47
And as you can imagine, it's useful for doing opinion summarisation:
00:13:52
if you have very diverse opinions, everybody's opinion is different,
00:13:56
but maybe you can abstract out some consensus opinions
00:13:59
that large numbers of people agree on, and summarise based on those.
00:14:07
So what we've done is develop
00:14:12
models of textual entailment
00:14:15
that are based on two ideas. First of all, to deal with the
00:14:24
problem that the
00:14:29
sentences can be very long: if you want to say that
00:14:36
the information in the entailed part
00:14:41
is a subset of the information in the entailing part,
00:14:44
it may just be that it's a subset of the words, or a subset of the semantic entities,
00:14:51
in the two representations. So we have
00:14:57
bag-of-vector representations for each part, and then
00:15:02
we assume that for every vector in the entailed
00:15:07
bag, there exists at least one vector in the entailing bag
00:15:12
such that the one vector entails the other.
00:15:17
But that leaves open the question of how you represent entailment between vectors,
00:15:22
so we have a whole line of work on how to model
00:15:25
entailment in a vector space that can then answer that question.
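The bag-level condition ("every entailed vector must be entailed by some entailing vector") can be written as a min over the entailed bag of a max over the entailing bag. Here cosine similarity is a stand-in for the pairwise score; the actual work uses a dedicated vector-space entailment operator:

```python
import numpy as np

def bag_entailment(premise_bag, hypothesis_bag, pair_score):
    """The entailed (hypothesis) bag must be covered: every vector in it
    needs at least one vector in the entailing (premise) bag that scores
    highly against it, hence max over the premise, min over the hypothesis."""
    return min(max(pair_score(p, h) for p in premise_bag)
               for h in hypothesis_bag)

def cos(p, h):
    """Stand-in pairwise score (cosine similarity)."""
    return p @ h / (np.linalg.norm(p) * np.linalg.norm(h))

premise   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
covered   = [np.array([0.9, 0.1])]    # close to a premise vector
uncovered = [np.array([-1.0, 0.0])]   # far from every premise vector
print(bag_entailment(premise, covered, cos) >
      bag_entailment(premise, uncovered, cos))   # -> True
```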
00:15:33
Right, so this is what I just said: every vector
00:15:37
in the entailed sentence has to be entailed by some vector
00:15:41
in the entailing sentence. And we have
00:15:47
a theoretical framework for measuring entailment in a vector space.
00:15:55
So that framework lets us represent
00:16:02
the propositions that can be known
00:16:07
about something
00:16:14
as bits in the vector, and then there's a
00:16:18
prior that captures the constraints over the whole vector.
00:16:24
The framework tells us a way to measure the entailment
00:16:28
between two given vectors, and how, for a given vector,
00:16:32
to infer other vectors that it entails or is entailed by.
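One illustrative instantiation of such an operator, treating each dimension as the probability that a "bit" of knowledge is known (a simplified sketch, not the exact operator from the framework):

```python
import numpy as np

def entails(x, y):
    """Log-probability that x entails y, treating each dimension as the
    probability that a feature (a bit of knowledge) is known.  x entails
    y when every feature known in y is also known in x, so each dimension
    contributes P(not (known in y and unknown in x)).
    NOTE: an illustrative simplification, not the framework's exact operator."""
    return np.sum(np.log(1.0 - y * (1.0 - x) + 1e-12))

# toy feature vectors: 'cat' carries all of 'animal's features plus one more
animal = np.array([0.9, 0.9, 0.0, 0.0])
cat    = np.array([0.9, 0.9, 0.9, 0.0])

print(entails(cat, animal) > entails(animal, cat))   # -> True
```

With these toy vectors the score for cat entailing animal is higher than the reverse, which is the hyponymy direction the talk returns to below.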
00:16:39
We use this framework to define a distributional semantic model.
00:16:44
So distributional semantics says that you can
00:16:52
tell the meaning of a word by looking at the distribution of words
00:16:56
that that word co-occurs with in texts: just by looking at very large
00:17:01
amounts of text, you can infer the meaning of a word.
00:17:06
We propose this entailment-based model of
00:17:11
why the meaning of a word tells
00:17:14
you something about the meaning of the other words that are in the same context,
00:17:20
by assuming there's some pseudo-phrase such that
00:17:25
unifying the two word vectors together gives you some coherent
00:17:30
semantics; so it reflects the sense in which these
00:17:36
two words are both playing a role in some coherent larger semantics.
00:17:45
So by representing that in the entailment framework,
00:17:49
we can train models that give us state-of-the-art
00:17:53
results on predicting hyponymy:
00:17:57
so, is the word 'cat'
00:18:01
a hyponym of the word 'animal'?
00:18:10
So that's just an overview of the things we're working on: working
00:18:14
on understanding natural language through NLP and representation learning,
00:18:22
looking at deep learning architectures for doing that representation learning,
00:18:27
machine translation, information access,
00:18:32
and semantic entailment problems.

Presentation of the «Natural Language Understanding» research group
HENDERSON, James, Idiap Senior Researcher
29 Aug. 2018 · 2:03 p.m.