
Transcriptions

Note: this content has been automatically generated.
00:00:01
Okay, so let's talk a little bit about getting closer to ChatGPT. Just before we start: the slides, the way I prepared them, are going to be available online. I have more slides than I have time, so I'm going to skip some, but this is just to give you some pointers that you can access later. It will be a bit of an unorthodox PowerPoint. What I'm doing today follows up immediately from James's talk: first getting a kind of behavioural understanding of what things like ChatGPT typically deliver, and then going into concepts which are specific to ChatGPT-type architectures. So we're going to talk about prompting and about InstructGPT, which is the model that evolved into ChatGPT, and then we'll talk about some sort of inference, including scientific inference: can we apply these models to deliver inference?

Okay, so as James explained, the essence of the learning and training that you have in these models is next-token prediction. There are other types of pre-training strategies, so it's more nuanced than that, but it's a good approximation. What you get after you apply all these methods is quite a sophisticated linguistic behaviour. Just to use one example:
00:01:34
I asked ChatGPT to write a poem exalting the achievements of Hervé Bourlard, our previous research director, and this is what it came up with. I'll read it in case you cannot see it: "In the world of AI and speech, a visionary did reach new heights, innovations grand, with skill and talent in his hand. Hervé Bourlard, earning renown, with accolades and praise abound, a master of speech recognition, pushing the limits with each ambition, from the edge of Martigny..." and so on. Not bad, right?

So then I was trying to find some prompts for which you really cannot find direct references on Wikipedia, things which are a bit more exotic. So: write the biography of Idiap in the style of the Book of Genesis. This is what ChatGPT came up with; probably most of you did similar experiments: "In the beginning there was a group of visionaries who saw the potential of artificial intelligence and its ability to revolutionise the world. So they gathered together in a small town in the Swiss Alps and founded the Idiap Research Institute, and the leaders of Idiap led its research and innovation... and they saw that it was good." You can see the whole text; it's super coherent, with really no perceivable inconsistencies. Okay, so a lot happened behind the scenes; let's try to disentangle this a little bit more.
00:03:25
The first thing is this prompting of these models, which we're going to talk about: the fact that you can ask something, instruct the model to do something. It is really doing a very good job of identifying and aligning elements of intent, like the style of the Book of Genesis. It's interesting, because you cannot really define what it means to write like the Book of Genesis; that's not easy to pin down, right? It finds the person or entity which is the target of the prompt and then puts that specific style on top of the story. It's identifying essential attributes and events of the target character, the biography and the events around the institute. It's organising them into a narrative structure, and the narrative's facts are super consistent as well. So it's a story that conforms to a target style, and in the end it has perfect fluency, meaning semantic coherence. This is usually what you get, and it comes out of those quite dry mechanisms that James just explained. How is that possible? Let's think of other examples.

Write a short story about Alan Turing talking to ChatGPT in the style of Bukowski. I'm not going to read this one, and ChatGPT probably censored it a little bit, because it is Bukowski, but it really gets the sense of the Bukowski style in a short piece. It's not a literary piece, it's more superficial than that, absolutely, but it gets some sense of the style. It puts that into a story that matches the structures that we have, for example, in narrative theory; so it's a story with all its components as well. You will have access to the examples and can read them later.
00:05:19
And then something else along the line of exotic prompts, trying to put ChatGPT on the spot: have it argue about abstract ethical values over a concrete biography. Probably nobody asked this question before: did the life of Bernie Madoff, who was a scammer in the US, embody Nietzschean values? The argument that this is not the case is quite clear: it takes a position, it's not circling around the problem, it's taking a stance. The answer is no, and it explains why, and Nietzsche's philosophy is brought forward on the points which are relevant to the crimes of Bernie Madoff. Impressive, right?

So, just listing the points: it identifies the characterisation attributes, targets an ethical framework, relates these attributes of the character to the framework, in this case contrasting them, and computes a conclusion which stays relevant through the whole argument, and then again structures the argument in a very fluent way. Okay, so the picture is that, behind the scenes,
00:06:44
there's a latent space, as James just told you about, and the mechanism behind this construction is delivering something that I don't believe people thought we would reach so fast. It's somehow going from the instruction, which is the description of the intent, and aligning itself with latent variables in this space which describe author style, narrative, explanation and argumentation styles, dialogue, and so on and so forth, at quite a semantic level: discourse level, semantic level, conceptual level. In the end it's putting more or less all of this together at a kind of inference time; there is some sort of inference going on. But we cannot really explain exactly what is happening there. People working in this area do not have a theory, a proper formal theory, to describe the behaviour of these things, so in some sense it's a mystery.
00:07:46
So let's try to understand some of these advances. (Ah, this is the PDF version; this slide should have been removed.) The first work that started to think in terms of applying transformers not in the sense of fine-tuning, where you have a pre-trained model that you try to specialise over task-specific examples, but instead using prompts with few-shot examples at the inference time of the model, was proposed by Brown et al. in 2020. That's essentially the idea: instead of building a large dataset that you annotate, you describe the task that you want, you give a couple of examples, and the model does the rest. That's the notion of prompting, which is a keyword that's very important in ChatGPT-type discussions. So you have a prompt, a task description, and then you have a couple of examples; you may give no example, you may give one example, or you may give multiple examples, and this is essentially the jargon used to describe the three cases: zero-shot, one-shot and few-shot.
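The zero-, one- and few-shot prompt formats just described amount to plain string assembly. A minimal sketch, assuming an illustrative translation task and `Input:`/`Output:` field labels (neither is from the talk or prescribed by the GPT-3 paper):

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a GPT-3-style prompt: a task description, a few
    demonstration pairs, and the new input left for the model to
    complete. Zero examples gives a zero-shot prompt, one gives
    one-shot, several give few-shot."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
print(prompt)
```

The assembled string is fed to the model as-is, and the model's continuation after the final `Output:` is taken as the answer.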
00:09:00
What was observed at the time was that models with a larger parameter space, e.g. GPT-3, have this ability of responding to prompting; they have this ability to do few-shot learning from the context alone, not necessarily requiring the fine-tuning phase as before. This was quite unexpected behaviour, because the model is understanding intent and doing these alignments behind the scenes in a quite efficient manner, and this is the foundation behind a lot of what evolved to be ChatGPT.

Some people have attempted, or are attempting, because this is quite recent work, to understand what is going on. There is some recent work that looks into large language models as a kind of mixture of experts; it tries to conceptualise them like this, so that, for example, you have units specialised in some specific concepts, the concepts that I mentioned, or maybe something more specific, say biographical text, and the model's behaviour emerges as a mixture aligned with the intent, and so on. And there are other works which, for example, focus on this in-context learning; another way to describe what prompting is doing is as Bayesian inference. So there are some attempts to formalise the behaviour of these models in terms of other inference paradigms. I'm not going to have time to go into this, but I'd just like to leave a pointer, because I think it's quite interesting work that's trying to organise this space a little. Right now there is a disconnect: these models behave in a very sophisticated way, and we cannot really explain why; these people are attempting to do exactly that. I'll skip the details on this.
00:10:54
Okay, so another evolution in the concept of prompting is something called chain-of-thought. Especially for tasks which are stepwise, procedural in some form, where you want to calculate something or you want to develop a set of reasoning steps that leads to a final result, you can decompose your problem into what is called a chain-of-thought prompt. In the example that you have here, you have a problem with a question and then you have the answer; then, when you pose the next question, in chain-of-thought prompting you articulate, via the prompting mechanism, the underlying reasoning and the decomposition that you have behind the problem. For many problems that's a way for you to guide the model in the direction of the inference that you want. So if you're thinking about models of inference, for example in a scientific domain where you need more control, this can be a useful mechanism to control that type of behaviour.
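A minimal sketch of the chain-of-thought format just described: the demonstration's answer spells out the intermediate reasoning steps, nudging the model to do the same for the new question. The arithmetic word problems below are illustrative, in the style of the chain-of-thought literature, not taken from the talk's slides:

```python
# One worked demonstration whose answer articulates the reasoning steps.
cot_demonstration = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

# The new question: the model is expected to continue in the same
# step-by-step style before stating its final answer.
new_question = (
    "Q: The cafeteria had 23 apples. They used 20 for lunch and bought "
    "6 more. How many apples do they have?\n"
    "A:"
)

chain_of_thought_prompt = cot_demonstration + "\n" + new_question
print(chain_of_thought_prompt)
```

Compared to a plain few-shot prompt, the only change is that the demonstration's answer contains the reasoning, not just the result.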
00:11:58
So, that was a quick reference to chain-of-thought prompting; again, I'm not going to that level of detail. Okay, so let's go to InstructGPT. That's essentially the foundational model behind ChatGPT, and it started by focusing on really understanding the user intent, this prompting data that you have from the user, and trying to improve the behaviour of the model with human feedback. So essentially what they do is collect a dataset of human demonstrations, examples of people asking for these kinds of things, and then they fine-tune GPT-3 using supervised learning, so they try to learn that task first. Then they refine the supervised model using a reinforcement learning paradigm. In the end they found that, compared to baseline GPT models with a larger parameter space, InstructGPT can deliver explanations and answers which are fluent and perceived by end users to be of higher quality. So that's essentially the pitch for it. Now going a little bit
00:13:16
more into the details: as I mentioned, we start with GPT-3 and build a supervised fine-tuned model using prompts, i.e. pairs of questions and answers, and that's the annotation; they used human annotators, together with some input from users, to create the prompts. In this work I think they used around 10k prompt pairs for this phase. Not massive scale, but considerable.

Then they use this fine-tuned model to build a reward model. If you want to apply reinforcement learning, you need a model of reward, something that says whether an output is good or not good. They could have used only the final dataset, but they wanted to leverage annotation at an intermediate phase as well. So you take the fine-tuned model and, essentially, the set of answers that you have for each prompt: assuming you have K answers per prompt (K between 4 and 9 in their case), you have a mechanism to rank-order them, and that's the kind of annotation here, the ordering of the answers. You can say "this answer is better than that one in my opinion"; they have a special interface which supports that type of annotation. From that you get pairs of comparisons; each pair essentially represents "I prefer this answer in contrast to that one". So then they have this set of pairwise comparisons as a dataset.
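The step from one human ranking over K answers to the pairwise comparisons can be sketched as follows; the answer strings are placeholders, and the point is that a single ranked list yields K*(K-1)/2 training pairs:

```python
from itertools import combinations

def ranking_to_pairs(ranked_answers):
    """Convert one human ranking (best answer first) over K completions
    into the K*(K-1)/2 (preferred, rejected) pairs used to train the
    reward model. combinations() preserves input order, so the first
    element of each pair is always the higher-ranked one."""
    return list(combinations(ranked_answers, 2))

# Illustrative ranking of K=4 completions for one prompt, best first.
pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c", "answer_d"])
print(len(pairs))
```

This is why ranking is a labelling-efficient scheme: one pass over K answers produces quadratically many supervised comparisons.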
00:14:58
They use that dataset to build the reward model. The way they do that: they input the prompt and the answer (the completion), and they train the model to output the reward at the end, which is just a number. They have a loss function that essentially does that; you can look into its components.
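The loss in question is a pairwise ranking loss: the reward model should score the preferred completion above the rejected one. A scalar sketch of the -log(sigmoid(r_w - r_l)) term (real training averages this over all pairs in a batch and backpropagates through the reward model):

```python
import math

def pairwise_reward_loss(r_preferred, r_rejected):
    """-log(sigmoid(r_w - r_l)): small when the reward model scores the
    human-preferred completion well above the rejected one, large when
    the ordering is inverted."""
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss drops as the preferred completion's reward pulls ahead.
print(pairwise_reward_loss(2.0, 0.0))  # ~0.127
print(pairwise_reward_loss(0.0, 2.0))  # ~2.127
```

Minimising this pushes the scalar rewards apart in the direction the annotators chose, which is all the reinforcement learning stage needs.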
00:15:24
They're essentially using these pairwise comparisons to build that reward function, learning a way to assign a reward to new data; on top of the comparison dataset they are learning the reward function. Again, in the interest of time, I'll just keep that as a reference, but the idea is that essentially they're training something that says: this is good, this is not, this is a better answer. And then you're ready to apply reinforcement learning: you have your final dataset, and they use a method called proximal policy optimisation (PPO) in that process; that's how they actually apply the reinforcement learning. At each step, or rather at the final answer, they compute the reward, and then they use the resulting gradients to update the model via reinforcement learning. The technical details of this are maybe a bit beyond the scope of our event here, but essentially what we're doing is taking the parameters of the original model, the fine-tuned model based on GPT-3, and updating them with a reinforcement learning method that uses this intermediate reward model as a feedback mechanism.
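PPO's core ingredient is a clipped surrogate objective that keeps the updated policy close to the old one. A single-sample sketch, per Schulman et al. (2017); note that InstructGPT additionally penalises KL divergence from the supervised fine-tuned model, which is omitted here:

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one action: the probability
    ratio between the new and old policy is clipped to
    [1 - eps, 1 + eps], so a single update cannot move the policy
    too far from the one that generated the data."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# A large positive update is capped by the clipping term.
print(ppo_clipped_objective(-0.5, -1.5, advantage=1.0))
```

In the RLHF setting, the advantage is derived from the reward model's score for the generated completion, so this objective is exactly where the learned reward feeds back into the policy's gradients.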
00:17:02
So that's the general architectural framework behind it. They evaluated it on tasks as one usually does in natural language processing, and they found that, compared to the baseline, which is GPT-3, the proposed model, InstructGPT, lowers hallucinations somewhat, follows the user's intent better with respect to explicit constraints, and has better properties overall compared to GPT-3. So this is not ChatGPT itself, but essentially the framework behind it; ChatGPT is not far from this work. We don't know exactly what is behind ChatGPT from the point of view of data collection and infrastructure, but I think the expectation is that they're using GPT-3.5 and, of course, a more aggressive data collection setup; architecturally speaking, I don't think it is too far from what we described here. Okay, so I'd just like
00:18:19
to leave a very quick set of pointers, as the slides are going to be online. Many people have started attempting to use these large language models to do scientific inference, and this is really going to be a set of pointers. The work that's most outstanding on that side is Galactica, which was published last November. They spent a lot of time essentially trying to tokenise and parse elements of scientific discourse that are not really natural language: molecular structures, mathematical formulae, and so on. That's essentially what they did behind the scenes: they collected a corpus which is scientific, and they built robust recognisers to address the scientific specificity of this corpus. Then they experimented with that on many different types of inference tasks. Again, I'm not going to talk about the numbers, but they claim improvements really in terms of scientific inference tasks, and it's a really thorough piece of work; I recommend looking into it.

There are two messages here. First, they achieved significant improvements over the existing baselines by following this type of strategy. At the same time, if you look at the absolute numbers, they are quite low on many of these inference tasks, so this is not really safe for use in the real world; it's an emerging area. One should not apply this for medical decision making at this point, but it's definitely something one should invest time
00:20:12
in. There are some specific tasks they focus on that I think are very interesting, such as trying to predict citations; even though the models are still weak at citations, they are probing and improving them for that capability. But because most of these models were delivered in an unrestricted way, to the general public, there was quite a pushback, because they maybe didn't take enough care with the narrative that this should not be used as an end-user product. Still, it's a superb piece of work, even though it took a huge hit in the context of public criticism.

As further pointers, there are specific models which were developed in a similar style, collecting specific corpora and building specific parsers, for code completion; Codex is the best-known model in that direction. It doesn't change too much on, let's say, the core learning framework side, but it has this investment in infrastructure to collect the data, and of course there is a lot of empirical effort going on behind it; once again, I'm just leaving it as a pointer here. Such specialised models do exist, and by doing this they can really outperform a general natural-language model on these tasks.

Then there is even extrapolation beyond that: people have been experimenting with these models to do what is called autoformalisation, going from text to a kind of formal mathematical form which can be accepted, for example, by a theorem prover. This is a bit more far-fetched, but quite an exciting research area as well.
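To make the autoformalisation task concrete, here is a hand-written illustrative pairing (not from the talk or from any particular system) of an informal statement with a formal counterpart a prover could check, in Lean-style syntax:

```lean
-- Informal input: "The sum of two even integers is even."
-- A formal output that a theorem prover can check:
theorem sum_of_evens_even (a b : Int)
    (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
    ∃ n, a + b = 2 * n := by
  obtain ⟨k, hk⟩ := ha
  obtain ⟨m, hm⟩ := hb
  exact ⟨k + m, by rw [hk, hm, mul_add]⟩
```

Autoformalisation asks a language model to produce the second form from the first, so that the prover, not the model, carries the burden of correctness.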
00:22:11
Okay, so that's it from the state-of-the-art side; I just wanted to give a general portfolio of what people are doing. I'll skip, in the interest of time, to the final conclusions.

Thinking about these models, we looked at some of their capacities, especially with prompting, and what they're achieving now is really outstanding, and unexpected in many senses. But of course we need to be critical about when and where these models should be applied, and there are different perspectives on what it is that they are delivering. There's this recent article from Noam Chomsky, "The False Promise of ChatGPT", and there are some fair points in it: what these models deliver does not have the elegance of human reasoning; despite looking sophisticated, it's not proper thought. I think one should integrate that critical perspective with the major capabilities that we have here. One piece that I found very balanced, which I'd again like to leave as a reference, articulates well what these models can and cannot do. I'm not going through it, but you will have access to it later and I would recommend it; it's a very balanced perspective on what these models can deliver, and it stays on the safe side; you could even push a bit further on the critical side. (Oh, okay, this is a formatting problem.) Okay, just to conclude:
00:23:50
I think most of the works that we covered here are two years old or less; a lot of the references that I cited are from 2022. Large language models, essentially with human feedback assisting them, identify and control stylistic, narrative, conceptual and semantic elements for generating text; that's outstanding, and the discourse is fluent and coherent. They produce narrative and argumentative text, and some sort of reasoning is clearly happening at a certain point. So there is reasoning, but major gaps remain in this type of discourse, in particular in domains like science: there is a problem with factuality, and we're going to see some examples of that in the next section, so we should keep that awareness. There will probably be fast progress in that direction as well; the community is aware of this, but one always needs to be critical about its current use. What's important is that billions are already committed to this space, so for many of the problems that we're seeing now, we're going to see fast movement towards something that addresses them. So I think it will be a very...



Conference Program

The Evolution of Large Language Models that led to ChatGPT (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 8:34 a.m.
Understanding Transformers
James Henderson, Idiap Research Institute
March 10, 2023 · 8:46 a.m.
Inference using Large Language Models (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:19 a.m.
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:45 a.m.
ChatGPT for Digital Marketing
Floris Keijser, N98 Digital Marketing
March 10, 2023 · 9:58 a.m.
Biomedical Inference & Large Language Models
Oskar Wysocki, University of Manchester
March 10, 2023 · 10:19 a.m.
Abstract Reasoning
Marco Valentino, Idiap Research Institute
March 10, 2023 · 10:38 a.m.
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 10:58 a.m.
Round Table: Risks & Broader Societal Impact (Legal, Educational and Labor)
Lonneke van der Plas, Idiap Research Institute
March 10, 2023 · 2:07 p.m.
