Transcriptions

Note: this content has been automatically generated.
00:00:00
um thank you for the presentations this is actually a question for marco if you combined
00:00:07
abstract reasoning with common sense would the model perform better for example
00:00:12
it doesn't really make sense to have a t. v. on top of a cat
00:00:15
or something like that so if you asked a question that makes more sense like
00:00:20
if you have a box and put something in the box and then you take it out or something like that do you think it would perform better so uh i
00:00:27
didn't try these examples directly but um as far as i know there were
00:00:32
attempts to show something like that in the context of spatial reasoning
00:00:37
and it seems that the model is actually more robust when uh you present um
00:00:44
you present it with more plausible examples like the ones you were mentioning so if you give the model
00:00:49
up to provide a model with the a and b. reply it's possible or their uh
00:00:55
in for example incapable on top a respecting what is more plausible in the real world
00:01:02
then the model uh can uh uh perform better but of them bought dan of
00:01:08
in the spirit of the talk what we wanted to detect was the
00:01:12
ability to abstract those concepts right so uh as humans we are able to apply
00:01:18
these um abstract concepts to any
00:01:23
possible situation right and it seems that
00:01:26
ChatGPT in that sense does struggle when we gradually move away
00:01:31
from the from things that are observed in in the real world
00:01:39
um for example floors for your very enthusiastic program grisham in
00:01:43
your excitement how was can make all your work much easier um
00:01:49
every good telecommunication provider very good reason constant exchanger for general public about
00:01:55
i'm from trips phone call us a unwanted emails
00:01:59
um can you describe produce you your audience to the
00:02:04
phrases you say carver excite m. m. m. m. m.
00:02:08
well right now we flair about your presentation you may be from emotions where can describe from from
00:02:17
uh and and with marketing it's difficult because marketing has a bad image i guess
00:02:23
yeah but i always say when i have clients who say like i don't wanna do like remarketing or something
00:02:30
i always say like there's a small line between spam and being relevant
00:02:36
and shows i've just accept willing to give somebody much correct when
00:02:40
you when you know way many talks about some to you talked before
00:02:44
you're exactly invitation if somebody's asking about investment again which i get a lot i just read nearly black
00:02:50
'cause i think when machines everything gets smarter hiding people weren't
00:02:54
sure accepted because as you as you see with ChatGPT
00:02:58
it gives smart answers relevant answers and it's the same thing for us as
00:03:01
long as long as our messaging and our talks are relevant to the audience
00:03:05
i hope to get some and some happy yeah people that's it
00:03:10
i i put up a thank you very much for your talks
00:03:17
i i just have a feeling that we are trying to
00:03:22
over extrapolate what an LLM is
00:03:26
in my opinion an LLM is it's a large language model so we are modelling language
00:03:32
and in other words it's like having autocompletion on steroids
00:03:38
so it's really uh only modelling language and we're not trying to model
00:03:45
the logic behind the reasoning or other
00:03:49
things these are in my opinion by products of
00:03:55
having such a large language model but at the end we cannot ask the model
00:04:02
to go further than it is intended to i mean the the language
00:04:09
is uh the mike that is fine it and the reasoning is not
00:04:14
completely incorporated in the language itself so perhaps it's
00:04:20
it's over optimistic to hope that everything could be
00:04:25
obtained from these models these kind of reasoning or this kind
00:04:29
of uh effects i i hope so and i'm also surprised by how much
00:04:35
we are able to draw from that i have been conversing with ChatGPT and equivalents
00:04:43
and it's surprising all the time but at the end
00:04:48
we have to be aware of the limits of uh what we really more built and we are really up thing
00:04:55
thank you for acting it thanks for the observation
00:05:00
i think you're absolutely right but uh we also need
00:05:05
to um you know um they care about the um
00:05:12
and the thing in what thought the evidence right so
00:05:15
and uh we cannot ignore the fact that um
00:05:19
by just processing language in the way language models do
00:05:24
there are some sort of reasoning capabilities emerging in these models right so
00:05:29
at least as scientists we need to take it into account
00:05:33
and inspect uh those sort of reasoning capabilities uh um carefully so
00:05:41
of course this is not only an engineering question
00:05:46
all way that the resulting in the obstruct about these kinda match from logic what was that
00:05:52
but is also a profound philosophical epistemological question so i
00:05:57
think um from the application side of course we need to be
00:06:01
careful and uh um if you're able to show that the
00:06:06
model can systematically fail on this kind of reasoning then maybe
00:06:12
also from the application side the the the the and the noose apartments and all will be more careful
00:06:19
yeah in in the application of these models so i think there is a double side to this the application
00:06:25
oh side in which of course we need to be careful we need to go
00:06:30
but but in order to be careful we need to first understand
00:06:33
what the limitations are right so and this involves also inspecting reasoning capabilities
00:06:40
and something even super we've but the the model is not expected to do
00:06:45
of by just modelling language up and if you want what ended up to be
00:06:55
but i i i guess i'm finished finished up the chain um uh i. e. net quite
00:07:06
you know i was thinking about it and i was surprised ChatGPT is able to do
00:07:13
um answering proof questions so let's say a proof of blah blah blah blah blah
00:07:21
it does the proof really really well but then after when you ask it what's one plus one
00:07:25
um sometimes it doesn't know that answer and and so i wonder if this
00:07:28
is just the nature of the data it's trained on like stack overflow
00:07:32
typically you won't have one plus one equals two written in there but you
00:07:36
will have these proofs written there so that was kind of
00:07:41
the comment and if you guys have any thoughts on that but the question that i have um i think it's to oskar
00:07:46
your presentation ah was like how do you actually evaluate these
00:07:52
um large language models and there's a couple concerns that i have here one is the models are changing
00:07:57
pretty rapidly and you need like a new evaluation scheme for each time that a new model emerges lay
00:08:03
and yet this clearly isn't scalable or sustainable and so like i was wondering if you considered
00:08:09
using these large language models as a discriminator so sort of like this constitutional AI approach from Anthropic
00:08:15
where you both ask it to generate the data and then you also ask it to
00:08:20
correct to see how good this data is at answering this question which
00:08:24
could be another like angle or frame to make this more scalable thank you
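The generate-then-judge idea raised in this question can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate` and `judge` are hypothetical callables standing in for calls to a language model, and the `threshold` value is arbitrary. None of these names come from the talks.

```python
from typing import Callable, List, Tuple

def generate_and_filter(
    questions: List[str],
    generate: Callable[[str], str],      # hypothetical: returns a model answer
    judge: Callable[[str, str], float],  # hypothetical: returns a score in [0, 1]
    threshold: float = 0.5,              # arbitrary cut-off, an assumption
) -> List[Tuple[str, str, float]]:
    """Keep only the generated answers that the judge scores at or above threshold."""
    kept = []
    for question in questions:
        answer = generate(question)
        score = judge(question, answer)
        if score >= threshold:
            kept.append((question, answer, score))
    return kept

# Stand-in functions so the sketch runs without any model access.
def toy_generate(question: str) -> str:
    return "two" if question == "what is one plus one" else ""

def toy_judge(question: str, answer: str) -> float:
    return 1.0 if answer else 0.0

kept = generate_and_filter(
    ["what is one plus one", "prove something"], toy_generate, toy_judge
)
```

In practice the judge would itself need to be validated against human labels, which is the scalability concern the question points at.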
00:08:30
yeah thank you that's a very relevant question so uh and it's an open question i
00:08:34
don't know we we don't have yet uh like an evaluation framework for that
00:08:39
as i mentioned the evaluation of the generated text is really
00:08:42
uh really difficult especially in the biomedical domain
00:08:47
um so here we just limited ourselves to uh
00:08:51
to a data set which contains a a well known
00:08:54
relations and we are evaluating what is well known in the
00:08:58
knowledge base so uh that was kind of a manual
00:09:02
evaluation uh but you know it's an open question how to do
00:09:06
it automatically and as you said there is a ongoing progress
00:09:11
uh_huh buyer typically large was announced i mean it was available
00:09:15
uh a few weeks ago so another model to include
00:09:18
there is another LLM um which we want to evaluate
00:09:22
but i it's it's it it progresses really fast um but it
00:09:29
it there is a need for for for this evaluation and we need
00:09:32
to know the limits of the model um so i i i don't
00:09:37
have an answer to that it's an ongoing uh work i guess
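The kind of evaluation described in this answer, checking generated relations against what is already well known in a knowledge base, could look roughly like the sketch below. It is an assumption-laden illustration rather than the speaker's actual procedure: the triple format, the `score_against_kb` name and the placeholder entities (`drug_a`, `gene_x`, and so on) are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), a hypothetical format

def normalize(triple: Triple) -> Triple:
    """Lower-case and trim each part so trivial spelling variants still match."""
    return tuple(part.strip().lower() for part in triple)

def score_against_kb(predicted: Iterable[Triple], known: Iterable[Triple]) -> Dict:
    """Compare model-generated triples with the well known ones in a knowledge base."""
    pred = {normalize(t) for t in predicted}
    gold = {normalize(t) for t in known}
    hits = pred & gold
    return {
        "precision": len(hits) / len(pred) if pred else 0.0,
        "recall": len(hits) / len(gold) if gold else 0.0,
        # Not in the knowledge base: either wrong or simply not yet recorded.
        "unsupported": sorted(pred - gold),
    }

# Placeholder data only, not real biomedical facts.
known = [("drug_a", "inhibits", "gene_x"), ("drug_b", "targets", "gene_y")]
predicted = [("Drug_A", "inhibits", "Gene_X"), ("drug_c", "inhibits", "gene_z")]
report = score_against_kb(predicted, known)
```

Triples that land in `unsupported` still need a human to decide whether they are errors or genuinely new relations, which is why this only covers the well known part of the problem.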
00:09:42
yeah it's been it's yes yeah i think everything that oskar said is uh
00:09:48
fully valid right so if we go in very specific domains you have this problem
00:09:53
uh at the same time you have large evaluation benchmarks
00:09:57
so BIG-bench is the you know the canonical one
00:10:01
i think you have i don't know how many institutions uh maybe hundreds of institutions involved on that data set
00:10:08
so that's a good way to evaluate some of the properties that you have and maybe
00:10:13
uh there will be similar initiatives in terms of more specific type
00:10:18
of domain knowledge uh that will emerge soon but but there is
00:10:23
clearly this need is not addressed by BIG-bench uh
00:10:27
as far as we know uh i don't think there is something
00:10:30
mature on that of course there are many biomedical data sets uh evaluation uh benchmarks
00:10:38
uh but uh i i don't think they are evaluating the features uh
00:10:42
that one is looking for for example in in oskar's work maybe adding up
00:10:47
to to your point on the on the proof i i think it's when
00:10:52
you have the evidence right uh um when you are joining facts
00:10:58
uh and making some substitutions uh and simple abstractions so
00:11:04
i think language models function quite nicely in that setting when
00:11:09
you go to something that's more operational definition to find something
00:11:12
that's that's more operational i i think you have that limitation
00:11:17
uh that you described right so i think i think it's unexpected property of large language models that they can
00:11:22
do a complicated proof uh as long as you have
00:11:25
some some evidence and uh yeah in the original corpus
00:11:30
uh in even with purpose some part of this proves that yeah in the
00:11:34
liver complex reasoning but the simplest operational uh is not designed disconnect with the
00:11:39
comment before these models are indeed they are not intended to to be
00:11:44
an end to end reasoning framework but we're trying to understand how far they can go
00:11:49
yes they are so popular uh yeah and what you need to inject symbolically
00:11:54
and how do you make this bridge between the neural and the symbolic i i think that's the real challenge
00:11:59
that and the paradigm that one is expecting a family
00:12:05
uh yeah but maybe just a lot on the numerical variation
00:12:10
i didn't get a medical condition these do even if it's if it's simple example
00:12:15
well lots of reasoning it's a fundamental want to shoe weight them
00:12:19
the dish on the one that might be and i don't think the
00:12:23
easy to perform a penal code uh you i don't think i'm
00:12:27
is just a matter of bought at the in bringing the data model
00:12:31
uh because before mean uh these can you know simple simple 'cause he put small corporation
00:12:37
requires the ability to abstract the rules and apply those rules to a
00:12:42
potentially infinite set of numbers so you can add numbers in the training set but
00:12:49
then you can always find numbers that are larger than the numbers in the training set for example
00:12:56
and then the question is whether the model can actually extrapolate those
00:13:00
rules uh and of course this is an open question but uh
00:13:04
um there is some discussion uh between scientists some argue that
00:13:09
you can solve this problem by just scaling up the model
00:13:12
other people more on the symbolic spectrum argue that this is
00:13:16
not a quantity problem so we need to add something more
00:13:21
on top of the way in which models currently work
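The extrapolation question discussed in this answer, whether a model has abstracted the addition rule or only handles numbers of the size it has seen, can be probed with a test along the following lines. This is a minimal sketch under assumptions: `lookup_only_model` is a made-up stand-in that deliberately fails on large operands, and the prompt format and digit ranges are arbitrary choices, not anything reported in the talks.

```python
import random
from typing import Callable, List, Tuple

Item = Tuple[str, int]  # (prompt, expected sum)

def make_addition_items(n_items: int, min_digits: int, max_digits: int, seed: int = 0) -> List[Item]:
    """Build addition prompts whose operands have between min_digits and max_digits digits."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        digits = rng.randint(min_digits, max_digits)
        low, high = 10 ** (digits - 1), 10 ** digits - 1
        a, b = rng.randint(low, high), rng.randint(low, high)
        items.append((f"{a} + {b} =", a + b))
    return items

def accuracy(answer_fn: Callable[[str], str], items: List[Item]) -> float:
    """Fraction of prompts for which answer_fn returns exactly the expected integer."""
    correct = 0
    for prompt, expected in items:
        try:
            if int(answer_fn(prompt).strip()) == expected:
                correct += 1
        except ValueError:
            pass  # a non-numeric reply counts as wrong
    return correct / len(items) if items else 0.0

def lookup_only_model(prompt: str) -> str:
    """Hypothetical stand-in: right on small operands, gives up on large ones."""
    a, _, b, _ = prompt.split()
    a, b = int(a), int(b)
    return str(a + b) if a < 1000 and b < 1000 else "unknown"

seen_range = make_addition_items(50, 1, 3)    # operand sizes assumed to be familiar
larger_range = make_addition_items(50, 6, 8)  # operand sizes assumed to be unfamiliar
in_range_acc = accuracy(lookup_only_model, seen_range)
out_of_range_acc = accuracy(lookup_only_model, larger_range)
```

A gap between the two accuracies is what one would read as a failure to extrapolate the rule. Replacing `lookup_only_model` with a call to a real model is left open here.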
00:13:27
i i uh i have a question a question for oskar
00:13:33
uh i found your presentation very interesting in the sense that you turn uh we
00:13:38
that's a fine shortcuts and literature review by using the GPT models now my
00:13:43
question is i used some of these GPT models for asking some let's say uh oh
00:13:50
question for example and by our marketing direction and
00:13:54
so on it was very fascinating but then came the hallucination
00:13:58
like the model begins hallucinating it gives you
00:14:01
an article that doesn't exist or authors that don't exist
00:14:06
and at the end my question is would it be doable would that would that
00:14:11
even exist already models like ChatGPT but
00:14:14
only for scientific literature review with some actual checking
00:14:21
um yeah so what you observed ah i have the same
00:14:24
experience that there is hallucination and it's really hard to detect
00:14:29
and i think that the next step i don't think there is a um so there are models dedicated
00:14:36
i guess for that but the performance is not
00:14:39
um hum satisfactory but um as andre mentioned about this
00:14:44
InstructGPT i think that's the next step uh to
00:14:48
really use the the the state of the art ChatGPT
00:14:52
and convert it into a this InstructGPT and this will be used as a model which will be
00:15:00
tailored for this task uh but um to guarantee that there would be no
00:15:06
hallucination um is difficult i guess still there will have to be some errors and the question is
00:15:15
uh how we evaluate this and after the evaluation what is the performance right so um again again i
00:15:21
don't have the answer because i have the same challenges as as you have it's still an open question
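One simple form of the checking asked about in this exchange is to look up every cited title in a trusted bibliographic index and flag the ones that cannot be found. The sketch below assumes such an index is available as a plain dictionary. `check_citations`, the index contents and the identifiers are hypothetical placeholders, and exact title matching is a simplification.

```python
import re
from typing import Dict, List

def normalize_title(title: str) -> str:
    """Lower-case and collapse punctuation so minor formatting differences still match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def check_citations(cited_titles: List[str], trusted_index: Dict[str, str]) -> Dict[str, List[str]]:
    """Split cited titles into those found in the trusted index and those that are not."""
    index = {normalize_title(title): ident for title, ident in trusted_index.items()}
    report: Dict[str, List[str]] = {"verified": [], "unverified": []}
    for title in cited_titles:
        bucket = "verified" if normalize_title(title) in index else "unverified"
        report[bucket].append(title)
    return report

# Placeholder index and citations, not real publications.
trusted_index = {"Example Paper One": "id-001", "Example Paper Two": "id-002"}
cited = ["example paper one.", "A Paper That Is Not Indexed"]
report = check_citations(cited, trusted_index)
```

An unverified title is only a candidate hallucination, since the index may be incomplete, so the flagged items would still need manual review.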
00:15:30
uh what ways policy where people are part of top or ah

Conference Program

The Evolution of Large Language Models that led to ChatGPT (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 8:34 a.m.
664 views
Understanding Transformers
James Henderson, Idiap Research Institute
March 10, 2023 · 8:46 a.m.
369 views
Inference using Large Language Models (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:19 a.m.
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:45 a.m.
ChatGPT for Digital Marketing
Floris Keijser, N98 Digital Marketing
March 10, 2023 · 9:58 a.m.
Biomedical Inference & Large Language Models
Oskar Wysocki, University of Manchester
March 10, 2023 · 10:19 a.m.
Abstract Reasoning
Marco Valentino, Idiap Research Institute
March 10, 2023 · 10:38 a.m.
120 views
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 10:58 a.m.
Round Table: Risks & Broader Societal Impact (Legal, Educational and Labor)
Lonneke van der Plas, Idiap Research Institute
March 10, 2023 · 2:07 p.m.
