Note: this content has been automatically generated.
This is the natural language understanding group. We study natural language understanding, which in my opinion is the greatest challenge in AI, because essentially anything you can think, or almost anything you can think, can be said in natural language, and you can understand that natural language. So it is essentially a form of mind reading: we can read each other's minds; we can communicate thoughts directly into another person's head just by making sounds. That's incredible.
But there are all kinds of things we don't understand about this process. In particular: what is the structure of human thought? If we can really communicate any idea in language, then that tells us a lot about what ideas we are capable of having, because those are the things that can be expressed in language. How do we understand language? How can we compute a thought from just hearing a sequence of sounds? How can we decide that if I make this sequence of sounds, then that person is going to have this thought in their head?
And, I would argue, only humans can do this. Other animals certainly do communicate, but not with anything like the structure, and complexity of structure, that humans are capable of. Is that because animals lack some innate human ability to speak and thereby communicate thoughts more effectively? Or is it just that their thoughts themselves are much simpler, and we have more complicated thoughts, and that's why we produce more complicated communications? We don't know.
So this is a fascinating problem for AI. In our group we look at understanding natural language through natural language processing (NLP), focusing on representation learning. We work on deep neural network architectures for representation learning, applied to several different tasks, in particular machine translation, information access and classification, and semantic entailment.
I am the head of the group; the others are postdocs and PhD students. We also work closely with Andrei Popescu-Belis, who was the head of the NLP group before I arrived, and with people in the Speech and Audio Processing group, through shared projects. We also have collaborations in Geneva, with a former PhD student, and with my former employer.
Our current research topics are, as I said, representation learning (which is pervasive), machine translation, information access and classification, and textual entailment. We have made a number of contributions in this area, and the slides cite a number of places where we have published this work. I won't go into the whole list, but I will go into more detail on some of these contributions.
As an overview of our work on representation learning: we have done a lot of work related to attention-based representations. That includes attention over the different meanings a word could have, attention over the previous words in the decoder when generating a sequence of words, hierarchical attention, which I will describe in a moment, and bag-of-vector embeddings, which are non-parametric embeddings. We have also worked on output representation learning: ways to generalise to new outputs by taking advantage of the descriptions of those outputs; I will talk in particular about label-aware text classification. And we work on entailment vectors, representing entailment in a vector space, for the semantics of words and for compositional semantics; I will talk a bit about how we apply those representations to textual entailment.
For machine translation, I will present two pieces of work: sense-aware neural machine translation, and the self-attentive residual decoder for neural machine translation. Neural machine translation works by running a neural network over the input sentence, computing a representation, and then conditioning on that representation to generate the output sentence in the target language.
One of the problems is that the input sentence often contains words that are ambiguous, or at least there is more than one way to translate them into the target language. If you condition on ambiguous representations of words, you quite quickly get into problems because of the combinatorics of all these ambiguities. You also end up with models that are heavily biased towards the most frequent senses, which might not apply in a given case. The solution is to explicitly add a word sense disambiguation component to the encoder: you first look at your input text and decide what sense each word should have, at least for the nouns and verbs, which are the ambiguous ones, and then condition on those senses when you compute your representation of the input sentence.
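To make the idea concrete, here is a minimal numpy sketch, not the actual system: the function name, dimensions, and the way the sense mixture is combined with the word embedding are illustrative assumptions. The encoder's input for an ambiguous word combines the word embedding with an attention-weighted mixture of its candidate sense vectors, using the sentence context as the query.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def sense_aware_embedding(word_vec, sense_vecs, context_vec):
    """Disambiguate a word by attending over its candidate sense vectors,
    using a context vector as the attention query, then condition the
    encoder on both the word embedding and the sense mixture."""
    scores = sense_vecs @ context_vec             # one score per sense
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over senses
    sense_mix = weights @ sense_vecs              # weighted sense vector
    return np.concatenate([word_vec, sense_mix])  # encoder input

word = rng.normal(size=DIM)          # embedding of an ambiguous word, e.g. "bank"
senses = rng.normal(size=(3, DIM))   # embeddings of its three candidate senses
context = rng.normal(size=DIM)       # summary of the surrounding words

enc_input = sense_aware_embedding(word, senses, context)
print(enc_input.shape)  # (16,)
```

In this sketch a soft mixture stands in for a hard sense choice; a real system might instead pick the top-scoring sense outright.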
That is one piece of work. The second: once you have conditioned on the representation of the input sentence, you need to generate its translation as your output sentence. If you do this with a recurrent neural network, there is a problem: when the network is trying to predict the next word to generate, it is heavily biased towards looking only at the previous few words it has already generated, when in fact many previous words, including words that are far away, may be very relevant. So the idea is to add an attention mechanism there, so that the decoder can decide which of the previous words it needs to pay attention to in order to generate the next word. In particular, this attention mechanism looks at the specific words themselves, which is why it is called a residual connection, rather than looking at the hidden representations from the previous steps. This improves the quality of the translation output, and if you look at the attention patterns, you see something that looks very much like syntactic structure: the model of how to generate the output is capturing some information about the syntax of the sentence it is generating.
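A minimal sketch of this residual self-attention, assuming (for illustration only) dot-product attention and simple concatenation: the decoder attends directly over the embeddings of the words generated so far, rather than over its own hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8

def self_attentive_residual(prev_word_embs, hidden_state):
    """Attend over the embeddings of all previously generated words
    (a residual connection to the words themselves, not to the RNN's
    hidden states) and return the input for predicting the next word."""
    scores = prev_word_embs @ hidden_state    # one score per previous word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over previous words
    context = weights @ prev_word_embs        # attention-weighted word context
    return np.concatenate([hidden_state, context])

prev_words = rng.normal(size=(5, DIM))  # embeddings of the 5 words generated so far
h_t = rng.normal(size=DIM)              # current decoder hidden state

pred_input = self_attentive_residual(prev_words, h_t)
print(pred_input.shape)  # (16,)
```

Because the attention reads the word embeddings directly, distant but relevant words contribute to the prediction with no recurrent bottleneck in between.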
In the area of indexing and classification, that is, information access problems, the challenge is that we need to handle multiple languages when solving these tasks, and often we do not have enough data for all our languages. So we need to somehow share parameters, sharing information learned about one language with another language. Here the solution is a hierarchical attention mechanism. The first layer decides, for each individual sentence, which words to focus on when computing the representation of that sentence. The second level decides which sentences to focus on when computing the representation of the entire document, for retrieval or categorisation of documents. The multilingual part has to do with which parts of this model you share across languages. Do you share the representations, so that words are encoded in the same way, with the same mapping from words to encodings? Or do you share the attention mechanisms, that is, once you have the encodings at one level, the way you apply attention to compute the encoding at the next level? You might want to share the encodings, you might want to share the attention, or you might want to share both. We have run experiments on what the appropriate level of sharing is for multilingual text classification.
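A toy numpy sketch of the two levels (the learned attention queries here are random placeholders; real models compute them with neural layers): word-level attention summarises each sentence, sentence-level attention summarises the document, and sharing `q_word`, `q_sent`, or the word embeddings across languages corresponds to the sharing choices just described.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8

def attend(vectors, query):
    """Softmax attention: weight each vector by its score against the query."""
    scores = vectors @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ vectors

q_word = rng.normal(size=DIM)   # stand-in for a learned word-level query
q_sent = rng.normal(size=DIM)   # stand-in for a learned sentence-level query

# A document of 3 sentences, each a matrix of word embeddings.
doc = [rng.normal(size=(n, DIM)) for n in (4, 6, 3)]

sent_vecs = np.stack([attend(s, q_word) for s in doc])  # one vector per sentence
doc_vec = attend(sent_vecs, q_sent)                     # one vector per document
print(doc_vec.shape)  # (8,)
```

In the multilingual setting, each language could keep its own copy of any of these components, or reuse a shared one; the experiments mentioned above compare such configurations.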
Another piece of work in this area is a label-aware decoder for classification, for deciding which class something is in. Here you are in a setting where the classes have labels: there is a piece of text describing each of the possible outputs you are allowed to choose. These texts get encoded in the same way, using the same word embeddings (vectors that represent words) as the input text you are trying to classify. You then compute a joint space between these two representations and train a classifier in that joint space. This leverages the semantics of the labels themselves, which is very useful in many cases, especially if you have a very large number of classes, or classes for which you have no training data but do have a label: you can still use this method. This particular approach not only handles unseen and rare labels, it even improves on the cases where you do have enough data.
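A hedged sketch of the joint-space idea (the mean-of-embeddings encoder and the bilinear map are stand-ins, not the actual model): labels are encoded with the same word embeddings as documents, so the classifier can score classes whose labels it has never seen in training.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 8

def encode(word_embs):
    """Encode a piece of text as the mean of its word embeddings
    (a simple stand-in for the real encoder)."""
    return word_embs.mean(axis=0)

def classify(doc_vec, label_vecs, W):
    """Score each class by bilinear compatibility between the document
    and its label description in a joint space; pick the best class."""
    scores = label_vecs @ (W @ doc_vec)
    return int(np.argmax(scores))

# Label descriptions are encoded with the SAME word embeddings as inputs.
labels = [rng.normal(size=(3, DIM)) for _ in range(4)]  # 4 label texts
label_vecs = np.stack([encode(lbl) for lbl in labels])
W = rng.normal(size=(DIM, DIM))                         # learned joint-space map

doc = rng.normal(size=(10, DIM))                        # input text to classify
pred = classify(encode(doc), label_vecs, W)
print(pred)
```

Adding a class at test time just means encoding its label text and appending a row to `label_vecs`; no retraining of per-class parameters is required in this sketch.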
The other thing we are working on is textual entailment. Textual entailment is information inclusion: you want to be able to say that this statement about health care includes the information in this simpler, more abstract piece of information. So it is also a kind of abstraction: the entailed statement is an abstraction of both of these two different statements, even though they are different. Textual entailment is a fundamental concept for the semantics of language, and, as you can imagine, it is useful for opinion summarisation: even if opinions are very diverse and everybody's opinion is different, maybe you can abstract out some consensus opinions that large numbers of people agree on, and summarise based on those.
What we have done is develop models of textual entailment based on two ideas. The first deals with the problem that the sentences can be very long. If you want to say that the information in the entailed part is a subset of the information in the entailing part, it may just be that it is a subset of the words, or a subset of the semantic entities, in the two representations. So we have bag-of-vector representations for each part, and we assume that for every vector in the entailed bag there exists at least one vector in the entailing bag such that the one vector entails the other. But that leaves open the question of how you represent entailment between vectors, so we have a whole line of work on how to model entailment in a vector space, to answer that question.
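The bag-level decision can be sketched as a min-max aggregation; the vector-level score used below is a deliberately crude stand-in (coordinate-wise containment), since the real vector-space measure is the subject of the line of work just mentioned.

```python
import numpy as np

def bag_entails(entailing_bag, entailed_bag, vec_entails):
    """Score how well one bag of vectors entails another: every vector
    in the entailed bag must be entailed by at least one vector in the
    entailing bag, hence a min over entailed vectors of a max over
    entailing vectors."""
    return min(
        max(vec_entails(p, h) for p in entailing_bag)
        for h in entailed_bag
    )

def toy_vec_entails(x, y):
    """Crude placeholder: x entails y if y's coordinates never exceed x's."""
    return float(np.all(y <= x))

premise = [np.array([1, 1, 0]), np.array([0, 1, 1])]  # entailing bag
hypothesis = [np.array([1, 0, 0])]                    # entailed bag
print(bag_entails(premise, hypothesis, toy_vec_entails))  # 1.0
print(bag_entails(hypothesis, premise, toy_vec_entails))  # 0.0
```

Note the asymmetry: the aggregation only asks that the entailed bag's content be covered, so swapping the bags can change the answer.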
So, as I just said, every vector in the entailed sentence has to be entailed by some vector in the entailing sentence, and we have a theoretical framework for measuring entailment in a vector space. That framework lets us represent propositions that can be known about something as bits in the vector, with a prior that captures the constraints over the whole vector. The framework gives us a way to measure the entailment between two given vectors and, for a given vector, to find other vectors that it entails or is entailed by.
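One concrete, deliberately simplified way to realise "propositions as bits in a vector", assumed here for illustration rather than taken from the actual framework: each dimension holds the probability that one feature is known, and x entails y to the degree that every feature known in y is also known in x.

```python
import numpy as np

def entails(x, y):
    """Probability that everything known in y is also known in x, when
    each dimension is the probability that a binary feature ("bit") is
    known: a dimension fails only if y knows it and x does not."""
    return float(np.prod(1.0 - y * (1.0 - x)))

# More specific words have more features known.
animal = np.array([0.9, 0.1, 0.1])  # abstract: few features known
cat    = np.array([0.9, 0.9, 0.1])  # specific: adds features to "animal"

print(entails(cat, animal))  # high: "cat" entails "animal"
print(entails(animal, cat))  # low:  "animal" does not entail "cat"
```

The asymmetry of this score is what makes it usable for hypernymy-style questions, where similarity measures like cosine cannot tell the two directions apart.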
We use this framework to define a distributional semantic model. Distributional semantics says that you can tell the meaning of a word by looking at the distribution of words it co-occurs with in text: just by looking at very large amounts of text, you can infer the meaning of a word. We propose an entailment-based model of why the meaning of a word tells you something about the meanings of the other words in the same context: we assume there is some pseudo-phrase such that unifying the two vectors gives you a coherent semantics, reflecting the sense in which the two words are both playing a role in some coherent larger semantics. By representing that in our entailment framework, we can train models that give us state-of-the-art results on predicting hypernymy: for example, is the word "cat" a hyponym of the word "animal"?
So that is just an overview of the things we are working on: understanding natural language through NLP and representation learning, looking at deep learning architectures for doing that representation learning, and applying them to machine translation, information access, and semantic entailment problems.
