Transcriptions

Note: this content has been automatically generated.
00:00:00
Okay. So, as they were saying, in this talk I
00:00:06
will move the discussion from
00:00:09
concrete applications all the way up to abstraction,
00:00:14
really in the spirit of trying to test
00:00:17
what the limitations of these models are. We talk a lot about
00:00:21
what these models can do, but what are the limits?
00:00:24
How far can we push these models forward?
00:00:30
So one question that we can ask is this:
00:00:34
we have seen that these models can
00:00:37
perform surprising reasoning for a large number of applications, but
00:00:44
what happens when we start testing these models on
00:00:49
more abstract forms of inference, the ones that
00:00:52
represent the core of what human
00:00:57
intelligence is able to do? So what we can ask is:
00:01:02
are language models able to acquire this sort of abstract concepts,
00:01:08
manipulate general rules, and systematically apply
00:01:12
those rules to specific problems?
00:01:16
So, as I said before, abstract reasoning is really a
00:01:21
core feature of human intelligence, and it supports generalisation:
00:01:25
through abstract reasoning we can work out a general solution
00:01:30
and apply the same solution to a potentially infinite set of instances.
00:01:37
It supports fast and efficient learning, it supports adaptation to novel and unexpected
00:01:42
situations, and it is fundamentally important for
00:01:45
scientific inference, theory formation and explanation.
00:01:50
It also supports analogical reasoning, creativity and imagination.
00:01:56
So why is it important to evaluate
00:02:00
abstract reasoning in the context of large language models?
00:02:04
Well, despite the fact that the term "abstract reasoning" contains the word "abstract",
00:02:12
the ability to perform abstract
00:02:14
reasoning can have a direct impact on concrete applications.
00:02:19
It can affect the reliability and robustness
00:02:24
of the models, and therefore
00:02:27
it becomes particularly important to assess those capabilities as we
00:02:32
apply these models more and more to critical domains. We can use it
00:02:38
to evaluate the models
00:02:41
and to identify biases in their reasoning,
00:02:45
and as such it can help us to
00:02:48
form, or at least try to form, a global understanding
00:02:52
of how those models actually operate. So, scientifically
00:02:57
speaking, this becomes a really interesting question, given
00:03:02
recent evidence that large language models can
00:03:07
somehow acquire some sort of emergent abilities
00:03:11
as we scale the size of the models. As you can see in
00:03:15
this graph, which comes from a study in recent work
00:03:22
that shows how language models can progressively improve
00:03:28
on target tasks involving some form of reasoning,
00:03:33
including question answering but also mathematical operations. So the
00:03:38
natural question then is: have large language models acquired
00:03:42
those general, systematic reasoning capabilities? Of course,
00:03:51
it is not trivial to inspect the reasoning of a language model,
00:03:56
mainly because we don't have direct access to the
00:04:00
internal mechanisms that the model uses. However,
00:04:04
one nice feature, which has become especially
00:04:07
important with the more recent generative approaches,
00:04:12
is that we can leverage prompting and try to do some systematic analysis
00:04:18
to verify the generated output and explanation. In
00:04:22
this context it is particularly important not to focus on a single example,
00:04:28
because we have seen, as was also mentioned earlier,
00:04:33
that the model can give the correct answer for
00:04:37
a large set of examples. So it is really
00:04:40
important to do some intervention on and manipulation of the problem
00:04:45
to check whether the model has actually learned the rules of the task that we want to solve.
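
The idea of manipulating the problem rather than relying on a single example can be sketched as a small prompt generator. This is only an illustration under stated assumptions: the template wording, the function names and the constants are hypothetical and are not the prompts used in the talk.

```python
from itertools import product

# Hypothetical prompt template; the wording and the placeholders are
# assumptions made for illustration, not the prompts used in the talk.
TEMPLATE = "Differentiate {function}(x + {constant}) with respect to x, step by step."


def make_variants(functions, constants):
    """Return one prompt per (function, constant) pair."""
    return [
        TEMPLATE.format(function=f, constant=c)
        for f, c in product(functions, constants)
    ]


variants = make_variants(["sin", "cos", "exp"], [0, 3, 105])
for prompt in variants:
    print(prompt)
```

Each variant changes one aspect of the problem at a time, so a failure can be traced back to the specific manipulation that caused it.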
00:04:53
So, following this spirit, I took a critical
00:04:59
perspective here, which means I really tried myself to
00:05:04
break the models and show what the limitations are
00:05:08
in this context. For the remainder of the talk I will give
00:05:12
some concrete examples. I don't have quantitative results here, but I will try
00:05:18
to give a more global understanding of how the model behaves, through examples.
00:05:23
I will do this by going from more abstract tasks, such as mathematical reasoning,
00:05:29
to more concrete tasks that deal directly with natural
00:05:33
language, such as commonsense and spatial reasoning. So let's
00:05:38
start with an example of mathematical reasoning.
00:05:42
As was mentioned in an earlier presentation,
00:05:45
there is a lot of interest in trying to apply those large
00:05:49
language models to scientific inference tasks, and therefore
00:05:54
it is particularly important to test what the mathematical abilities of those models are. The
00:06:01
way we can do this is to ask the models to do mathematical derivations.
00:06:07
So here is an example of a differentiation, where we ask the model to differentiate
00:06:13
a function with respect to x. Using the methodology that I
00:06:19
described before, we can start from a relatively simple prompt,
00:06:25
and as you can see the language model can actually generate
00:06:30
the correct mathematical derivation and also compute the correct output. So it shows
00:06:36
an ability to apply rules of differentiation, in this case the chain rule,
00:06:42
and it can also carry out the correct calculation. So of course, at
00:06:48
first impression this can suggest that language models
00:06:52
can actually perform this quite abstract formal reasoning. Therefore
00:06:58
it is important to start manipulating the prompt.
00:07:04
The way we can do this is to gradually introduce complexity.
00:07:09
In this example I simply added three, a constant, to x.
00:07:15
But of course this results in a more complex derivation, because now
00:07:20
the model doesn't have to apply only a single
00:07:23
differentiation rule; it has to combine different rules,
00:07:27
and of course the calculation itself becomes more complex.
00:07:32
Surprisingly, the model can be robust
00:07:36
to a large set of examples. I really tried to prompt
00:07:40
the model with several constants, and you can see that the model
00:07:48
can correctly combine multiple
00:07:51
abstract rules together and also perform the correct calculation.
00:07:59
This holds, though, for a limited set of input examples.
00:08:06
Here is the same kind of differentiation, now with a larger constant, one hundred and five.
00:08:13
You can see that the derivation is correct.
00:08:18
One of the things that we want to mention is that
00:08:22
mathematical reasoning is also appealing for evaluation, because
00:08:28
we know that mathematical derivations have to follow really rigid rules, so we can
00:08:35
somehow build some automatic mechanism for
00:08:38
verification. So this potentially allows for a more systematic evaluation.
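
One way to picture such an automatic verification mechanism is a numerical check of a claimed derivative. The sketch below is an assumption-laden illustration, not the mechanism used by the speaker: the target function `sin(x + 3)`, the helper names and the tolerances are all hypothetical, and a central finite difference stands in for a full symbolic checker.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def derivative_matches(f, claimed, points, tol=1e-4):
    """Check a claimed derivative against the numerical estimate at each point."""
    return all(
        math.isclose(numerical_derivative(f, x), claimed(x), rel_tol=tol, abs_tol=tol)
        for x in points
    )


# Hypothetical target: f(x) = sin(x + 3); the chain rule gives cos(x + 3).
def f(x):
    return math.sin(x + 3)


def correct(x):
    return math.cos(x + 3)


def wrong(x):
    # A plausible-looking but incorrect final step.
    return math.cos(x) + 3


points = [-2.0, -0.5, 0.0, 1.0, 2.5]
print(derivative_matches(f, correct, points))
print(derivative_matches(f, wrong, points))
```

A check like this only tests the final expression at sample points; it says nothing about whether the intermediate steps of the explanation are valid.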
00:08:44
But that doesn't mean that transformers can generalise to any arbitrary numbers:
00:08:50
if we try hard enough, we can see that the model
00:08:56
cannot generalise the calculation
00:08:59
to arbitrary numbers. But again,
00:09:05
the surprising part is that the model can still reliably generate the correct explanation.
00:09:11
If you look at the output of the model here, we can see that
00:09:15
the model is able to apply the correct rules, but it somehow fails to perform the correct calculation.
00:09:22
So that means that we need to be careful: the model is not a calculator, and
00:09:29
it cannot generalise yet to any infinite
00:09:34
set of input examples.
00:09:40
We can observe something similar in other types of
00:09:45
derivation outside mathematics; for example, this is a derivation in physics,
00:09:51
and again we can see that the model can reliably generate
00:09:55
the correct derivation steps but then fails in the final calculation.
00:10:03
However, while the model is robust,
00:10:08
or rather, what we have seen is that the model can be robust on really specific
00:10:12
problems, as I will show next,
00:10:16
the model can sometimes fail to generalise
00:10:20
on the derivation as well. For example, here this
00:10:24
is an example of the differentiation of tan x,
00:10:28
where you can see that the model is
00:10:31
able to generate the correct derivation steps up to a certain point,
00:10:36
where the model starts hallucinating and producing the wrong inference steps.
00:10:44
Okay, so let's move towards more concrete, natural language problems.
00:10:50
In this second example I tested the model on spatial reasoning,
00:10:57
which means testing the ability of the model to keep track
00:11:02
of spatial relations such as "on top of". You can do
00:11:06
this by setting up some
00:11:10
inference tasks that
00:11:14
are related to data structures such as stacks. So you
00:11:21
can ask the model to keep track of the state of the stacks,
00:11:25
perform some operations, and then ask some questions about that state.
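
To score answers on a task like this, the ground truth can be computed with a small simulator. The following is a minimal sketch under assumptions: the block names, the single `move` operation and the question about what lies on top of a block are hypothetical examples, not the task definition used in the talk.

```python
def apply_operations(stacks, operations):
    """Apply ("move", source, target) steps; the last list item is the top."""
    state = {name: list(items) for name, items in stacks.items()}
    for op, source, target in operations:
        if op != "move":
            raise ValueError(f"unknown operation: {op}")
        if not state[source]:
            raise ValueError(f"stack {source} is empty")
        state[target].append(state[source].pop())
    return state


def on_top_of(state, stack, item):
    """Return the item directly on top of `item`, or None if it is the top."""
    items = state[stack]
    index = items.index(item)
    return items[index + 1] if index + 1 < len(items) else None


# Hypothetical task: two stacks of coloured blocks, listed bottom to top.
initial = {"A": ["red", "green", "blue"], "B": ["yellow"]}
final = apply_operations(initial, [("move", "A", "B"), ("move", "A", "B")])
print(final)
print(on_top_of(final, "B", "yellow"))
```

Because the simulator is deterministic, the same task can be regenerated with larger stacks or a different presentation order and the expected answer stays available for comparison.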
00:11:32
Even here we observe some interesting
00:11:37
behaviour of the model.
00:11:41
In fact the model, even for this kind of problem, is able
00:11:45
to reliably keep track of the
00:11:48
state of discrete structures and also
00:11:54
generate the correct inferences. However,
00:12:00
as we start manipulating the input,
00:12:03
for example, in this case, reversing the order of the stack,
00:12:07
so that instead of asking the model to inspect the stack from top to bottom we do the reverse,
00:12:14
the model starts exhibiting some limitations. In this case you
00:12:19
can see that the model is able to output the correct answer,
00:12:24
but there are steps in the explanation where the
00:12:28
model is not able to keep track of the operations
00:12:32
that are required to derive the correct solution. So
00:12:36
in some cases the output can be correct but the explanation can be wrong.
00:12:43
And again we can do the same type of generalisation tests,
00:12:47
similarly to what we did before, as shown in the next example:
00:12:52
increasing the size of the stack, or changing the type
00:12:55
of operation, to inspect whether the model is able to
00:13:00
somehow abstract and generalise the rules that are required to solve the task.
00:13:05
And again, if we put in enough effort we are somehow able
00:13:11
to break the model and see that
00:13:16
the model is not able to generalise to any arbitrary input.
00:13:22
Finally, we can move towards more natural
00:13:27
language problems, in particular commonsense reasoning. Here I show, as an example,
00:13:35
moral reasoning with ChatGPT.
00:13:41
What we can do is, given a statement, ask ChatGPT
00:13:47
whether this statement is morally acceptable or not, and ask the model to explain
00:13:53
its reasoning step by step. The statements in input
00:13:56
here are taken from a dataset which is called ETHICS,
00:14:01
which was presented at ICLR in 2021 and which was
00:14:06
actually designed to test the alignment of language models with human values.
00:14:12
So why is it important to test the models on this?
00:14:16
There is a practical reason, of course; there are practical implications.
00:14:20
But in the context of abstract reasoning this can tell us whether the model is able
00:14:26
to connect really concrete statements about the
00:14:31
world to abstract moral principles. So some
00:14:36
form of abstraction is also required in this kind of reasoning. So,
00:14:44
I have to admit, the model is really, really robust on this kind of
00:14:48
inference. I really tried myself to break the model and it was not easy.
00:14:55
ChatGPT can consistently generate commonsense explanations; this is
00:14:59
more or less what was said before.
00:15:04
But not only that: it can also connect the
00:15:08
concrete statements in the
00:15:13
example to abstract moral principles, so there is some sort of
00:15:18
abstract reasoning going on in the context of commonsense reasoning.
00:15:23
However, if we push a bit further
00:15:28
and also try the model on relatively simpler tasks,
00:15:34
we can observe some of the limitations. This was quite surprising for me.
00:15:41
In this example I tried to test whether
00:15:49
ChatGPT can consistently identify the meaning of sentences.
00:15:56
In this example specifically I asked
00:16:00
ChatGPT: "Remember that dogs are animals.
00:16:03
Can I replace dogs with animals in the following sentence while
00:16:07
preserving its meaning?" And the sentence is "All dogs have four legs."
00:16:12
The first answer of the model was: "Yes, you can replace dogs with
00:16:17
animals in the sentence while preserving its meaning.
00:16:21
The new sentence will be 'All animals have four legs.'
00:16:25
The sentence is still true, as the original statement indicates
00:16:28
that dogs, which are a type of animal, have four legs.
00:16:32
Therefore, if all dogs are animals, it must also be true that all animals
00:16:37
have four legs. The meaning of the sentence is preserved by the substitution."
00:16:44
This of course is wrong; it is a basic mistake on the side of ChatGPT.
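
Why the substitution is invalid can be shown with a toy model of sets: a universal statement about a subset does not carry over to its superset. The individuals and leg counts below are hypothetical data invented for the illustration.

```python
# Hypothetical toy world: each individual has a set of kinds and a leg count.
WORLD = [
    {"name": "Rex", "kinds": {"dog", "animal"}, "legs": 4},
    {"name": "Luna", "kinds": {"dog", "animal"}, "legs": 4},
    {"name": "Tweety", "kinds": {"bird", "animal"}, "legs": 2},
]


def all_have_four_legs(kind):
    """Evaluate 'All <kind> have four legs' over the toy world."""
    return all(x["legs"] == 4 for x in WORLD if kind in x["kinds"])


def is_subset(narrow, broad):
    """True if every individual of kind `narrow` is also of kind `broad`."""
    return all(broad in x["kinds"] for x in WORLD if narrow in x["kinds"])


print(is_subset("dog", "animal"))    # dogs are animals
print(all_have_four_legs("dog"))     # the original sentence
print(all_have_four_legs("animal"))  # the sentence after substitution
```

A single two-legged animal is enough to make the substituted sentence false while the original stays true, so the two sentences cannot have the same meaning.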
00:16:52
I have to admit that this doesn't happen in all cases, so there is some sort of variability
00:16:59
when asking the same question to the model. What I observed
00:17:03
is that ChatGPT has a
00:17:07
non-deterministic way of generating answers
00:17:11
that can make it sometimes contradictory.
00:17:15
But in case the model is wrong, you can give feedback to it,
00:17:20
asking something that suggests that the model was wrong, such as "Are you sure?", and the
00:17:25
model is immediately able to correct itself and give
00:17:29
the correct explanation. For me this was
00:17:33
quite surprising but confusing as well, because it shows that
00:17:38
the model lacks some sort of robust
00:17:43
world model, but at the same time it is able to learn
00:17:48
some sort of working knowledge, even to perform rather abstract tasks. So, in summary,
00:17:56
yes, ChatGPT is able to put together and combine solutions for
00:18:03
known problems and for a limited set of instances, but
00:18:07
there are cases, as in science and mathematical reasoning,
00:18:12
where the model is not able to
00:18:18
systematically generalise the correct solution to
00:18:21
any arbitrary input. In particular, the model struggles
00:18:25
to deal with numbers and with abstract concepts, including sets and relations.
00:18:31
Moreover, the model can occasionally fail on relatively simple problems,
00:18:37
as we have seen with tan x, but also in the case of commonsense reasoning
00:18:41
with the relation between dogs and animals. Therefore ChatGPT can give the impression
00:18:49
of acquiring and manipulating rules in the abstract sense
00:18:53
without careful inspection, so it is really important to test
00:18:57
the models systematically to really see this kind of limitation.
00:19:02
Finally, it is important to note that when the
00:19:07
models fail, they fail in a really nice way,
00:19:12
because the model is still able to generate a correct
00:19:17
or plausible explanation. This is of course
00:19:23
an interesting feature of the model, because it means that
00:19:26
it can somehow recombine solutions even if it is not able to perform the right calculations.
00:19:34
This means that we could somehow combine the model with
00:19:37
other types of systems; for example, in the context of mathematics,
00:19:42
there can be a combination between ChatGPT and a symbolic engine that is able to perform calculations.
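
The division of labour suggested here, where the language model proposes the steps and an exact engine does the arithmetic, might look roughly like the sketch below. It is an assumption, not a description of any existing integration: the `evaluate` and `check_claim` helpers are hypothetical, and the expression and claimed values stand in for text that a model might produce.

```python
import ast
import operator
from fractions import Fraction

# Supported operators, mapped to exact implementations.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression):
    """Evaluate an integer arithmetic expression exactly with Fractions."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def check_claim(expression, claimed_value):
    """Compare a value claimed in an explanation with the exact result."""
    return evaluate(expression) == Fraction(claimed_value)


# Hypothetical final step of a derivation and two hypothetical model answers.
step = "3 * (105 + 2) ** 2"
print(evaluate(step))
print(check_claim(step, 34347))
print(check_claim(step, 34437))
```

Only whitelisted arithmetic nodes are evaluated, so arbitrary text from a model is rejected with an error instead of being executed.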
00:19:48
However, in cases in which we don't know the solution and we don't know the rules
00:19:54
underlying the task, this can make things quite
00:19:57
complicated, especially when it comes to
00:20:02
understanding and inspecting the explanation provided by the model.

Conference Program

The Evolution of Large Language Models that led to ChatGPT (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 8:34 a.m.
664 views
Understanding Transformers
James Henderson, Idiap Research Institute
March 10, 2023 · 8:46 a.m.
369 views
Inference using Large Language Models (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:19 a.m.
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:45 a.m.
ChatGPT for Digital Marketing
Floris Keijser, N98 Digital Marketing
March 10, 2023 · 9:58 a.m.
Biomedical Inference & Large Language Models
Oskar Wysocki, University of Manchester
March 10, 2023 · 10:19 a.m.
Abstract Reasoning
Marco Valentino, Idiap Research Institute
March 10, 2023 · 10:38 a.m.
120 views
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 10:58 a.m.
Round Table: Risks & Broader Societal Impact (Legal, Educational and Labor)
Lonneke van der Plas, Idiap Research Institute
March 10, 2023 · 2:07 p.m.
