Transcriptions

Note: this content has been automatically generated.
00:00:00
Thanks. So, my talk is connected to Julian's and Andreas's talks from this morning. You see, computer scientists like this topic, and they think they are addressing, or at least trying to address, it. My talk is on hallucinations. First, briefly, who is talking to you: I have been a lecturer and researcher at the Lucerne University of Applied Sciences and Arts (HSLU) for the last one and a half years, and before that I did my PhD and postdoc at Idiap. My thesis topic was summarisation, and as we will see later, summarisation was also one of the earliest fields where the topic of hallucinations came up.
00:00:49
Okay, so I want to talk a bit today about what hallucinations are, so that we all have some common ground; then quickly go over hypotheses for why language models, LLMs for short, hallucinate; then how we can detect them; and then maybe give a bit of an outlook on how we can avoid hallucinations. As a definition of what hallucinations are: we are talking about outputs of language models that are not supported by some given knowledge source. Even if they are factually correct, if they are not supported by this knowledge source, that would be a hallucination.
00:01:33
I mention this because it is slightly different from the question of factuality, where we are really asking about truthfulness: true or false. But as Andreas said this morning, even in factuality we are moving more towards checking with respect to a given knowledge source, and I have seen more and more researchers come to that conclusion. In the last couple of years, people used to say that there are clearly true facts, mostly thinking of laws-of-physics kinds of things, where it is not really debatable. But right now, and especially with today's topic, there is not always common ground on what is true and what is false. So we are trying to answer the question more with respect to a specific knowledge source.
00:02:22
Why do language models even hallucinate, why do they come up with stuff? I really like the explanation of John Schulman from OpenAI, who explained this in the context of reinforcement learning, where they have a term called behaviour cloning, which for most people in machine learning is supervised learning; it is the same thing, the terms just have different origins. Let me summarise his slide a bit. We start out with the model we are training: it knows some facts with a certain probability, and it does not know some other facts yet; that is its state during training. If, in this state, we train it to predict a fact that is unknown to the model, it will learn to hallucinate. It learns, in effect: this is something I did not know and would not have predicted, but in training I am being told I should predict it, so apparently I sometimes need to come up with stuff. The other direction is also a problem. The obvious fix would be to just say "I don't know" when something is not known, but if we train the model to say "I don't know" for a fact that it actually knows, then we are training it to withhold information from us, which is also a problem. So what we would really need is to know, at any given point in time, what the model knows, so that we can give it a proper training signal; otherwise we will always train it to hallucinate. And knowing that exactly is rather elusive.
00:04:14
I have painted the picture a bit black and white on this slide; of course there is always a probability attached to each of those facts. But at least with the way we are training models right now, we are never going to get completely rid of hallucinations.
00:04:37
As I said before, summarisation was one of the earliest fields in natural language processing where this showed up. That is because hallucinations already exist in the training data: the biggest training datasets we have for summarisation contain hallucinations. These datasets are usually scraped automatically from the internet, typically from newspapers, because newspapers are a huge source of pairs of an article and a short summary, and that is exactly the setting we have in summarisation, where we want to extract the important information into a much smaller piece of text. Newspapers were publicly accessible on the internet, so they were a source from which researchers could collect a lot of data on which to train summarisation models. And of course, when newspaper authors or editors create summaries of news articles, they use their background knowledge about the topic they are writing about, and do not just look at the article and create the summary from that. So sometimes they add facts to the summary that are not actually present in the article.
00:05:58
The worst example of this in summarisation is the XSum dataset, where the name stands for extreme summarisation: seventy-five percent of the examples contain at least one hallucination, according to a 2020 study. So if models are trained on this dataset, they have to hallucinate. With seventy-five percent of the examples containing hallucinations, they will learn to hallucinate just from the input-output pairs they are given. Now you are thinking: okay, that is stupid, but why do we create datasets with so many hallucinations?
00:06:35
I want to show you one example of how that can happen. XSum was scraped from the BBC News website, and a BBC News page is built up like this: you have the title, then always a leading initial sentence, a one-sentence sort of summary printed in bold at the top of the page, and then the rest of the article follows below. It is meant to be read as one article. But people said: this is a one-sentence summary, and we want extreme summarisation, summarising an entire article in a single sentence, so we can use this to build a dataset. The bold sentence becomes the summary that we are going to predict from the article below it. And naturally, some of the facts mentioned in that initial sentence never appear again in the rest of the article. That is why we ended up with a dataset that has so many hallucinations in it; it is just a byproduct of automatic data collection.
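To make the construction concrete, here is a tiny illustration (not the actual XSum pipeline; the page text and the splitting heuristic are assumptions) of how such a page can be turned into an input-output pair:

```python
# Illustration only (not the actual XSum pipeline): treat the bold lead sentence
# of a scraped news page as the "summary" and the remaining paragraphs as the
# "document". Facts in the lead may never reappear in the document.
def build_example(page_text: str) -> dict:
    paragraphs = [p.strip() for p in page_text.split("\n") if p.strip()]
    title, lead, *body = paragraphs
    return {"title": title, "summary": lead, "document": " ".join(body)}

page = """Storm hits coastal town
Hundreds of residents were evacuated overnight as the storm made landfall.
Emergency services worked through the night.
Officials said a damage assessment would begin at first light."""

example = build_example(page)
print(example["summary"])    # the one-sentence bold lead
print(example["document"])   # the lead's facts may not be supported here
```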
00:07:40
Okay, that brings me to the next step: can we actually detect when language models hallucinate? One thought that came to our minds was that they should be less certain when they output facts that are hallucinated than when they output facts that are true. And indeed, when we look at the model's uncertainty while predicting the next token (that is how models build their summaries, decoding token after token), we can find exactly that. Here I have one example of a summary, and the bars show uncertainty: I look at the decoding probabilities and compute their entropy, which gives me a measure of uncertainty. And the largest uncertainty here is exactly where the section with the hallucinated fact starts.
00:08:36
the basis of our work so james henderson then i added this
00:08:39
work uh two years ago uh on a specific methods to detect hallucinations
00:08:46
in where we looked at one side the attention patterns and the other side also decoding probabilities of the model
00:08:52
so we kind of looked at how the model really comes up with its predictions and and
00:08:57
how it kind of builds these summaries to try to identify the choices that it makes that seem
00:09:03
yeah a bit less certain or a bit more prone to hallucinations exactly um picture off
00:09:11
their attention patterns here uh varies be uh just an example here b. c. d.'s diagonal pattern
00:09:18
that corresponds to when the model copies uh some input text
00:09:22
in which case we would for example be more certain that it wouldn't be hallucination
00:09:27
so this is one possible add white folks
00:09:30
or gloss box yeah lose initially detection method
00:09:34
so uh recording it's white boxer gloss box because we're looking into the model
00:09:39
as opposed to black box every just seeing the input and output
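A rough sketch of how such attention patterns can be inspected during generation; this is an illustrative heuristic, not the published method, and the model name and the "diagonal means copying" rule are assumptions:

```python
# Illustrative heuristic: track where the decoder's cross-attention peaks on the
# source at each generation step. A peak that moves forward one token at a time
# looks like copying; jumps or diffuse attention are weaker evidence that the
# output is grounded in the source.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-xsum"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

inputs = tokenizer("Replace this with the source article.", return_tensors="pt",
                   truncation=True)
out = model.generate(**inputs, max_new_tokens=40,
                     output_attentions=True, return_dict_in_generate=True)

prev_peak = -1
for step, step_attn in enumerate(out.cross_attentions):
    # Last decoder layer, first batch item, averaged over heads, last query pos.
    attn = step_attn[-1][0].mean(dim=0)[-1]          # shape: [source_length]
    peak = int(attn.argmax())
    copy_like = peak in (prev_peak, prev_peak + 1)   # crude "diagonal" test
    token = tokenizer.decode([int(out.sequences[0, step + 1])])
    print(f"{token!r:>12}  source position {peak:4d}  copy-like={copy_like}")
    prev_peak = peak
```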
00:09:44
Another interesting method in this category checks the contributions of the input tokens to the output: when the model generates an output, how much did each input token contribute to that prediction? From that you can also find cases where none of the input tokens contributed very much, which suggests the model is not really taking that information from the input source text but has rather hallucinated it. So those are interesting white-box, or glass-box, hallucination detection methods.
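A simple way to approximate such contribution scores is occlusion; the sketch below uses sentence-level removal rather than the token-level attributions referenced in the talk, and the model name is an assumed example:

```python
# Sketch: estimate each source sentence's contribution to a generated summary by
# removing it and measuring the drop in the summary's log-likelihood. If no part
# of the source contributes much, the content may not come from the source at all.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-xsum"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

def summary_logprob(source: str, summary: str) -> float:
    enc = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(text_target=summary, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss   # mean NLL per summary token
    return -loss.item() * labels.shape[1]         # approximate total log-prob

sentences = [
    "The mayor opened the new bridge on Monday.",
    "Construction took three years and cost 40 million francs.",
    "Local businesses expect more visitors.",
]
summary = "A new bridge costing 40 million francs opened on Monday."

full_score = summary_logprob(" ".join(sentences), summary)
for i, sent in enumerate(sentences):
    ablated = " ".join(s for j, s in enumerate(sentences) if j != i)
    contribution = full_score - summary_logprob(ablated, summary)
    print(f"sentence {i}: contribution {contribution:+.2f}")
```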
00:10:23
and then there's a a lot of work on black box hallucination detection methods when you just say
00:10:29
okay maybe uh i don't have access to the
00:10:31
model like for example in captivity being gemini right
00:10:36
we don't have facts we cannot look at the the models internal attention patterns and and the probabilities
00:10:42
so what we have is we have the input to have the outputs
00:10:45
can we detect from that all ready if the model has solution like that
00:10:49
so those are the black box lose nation detection methods and and
00:10:53
i mean this is now very much into the defect verification area
00:10:57
am i think even he's been here is very similar to something you should uh in
00:11:02
the morning um but uh exactly so there are different approaches also so different kind of categories
00:11:08
uh of what we can do here um and uh one uh
00:11:13
so i'm i'm just um yeah so one is to for example use
00:11:18
a natural language inference model so that's model that can say if
00:11:21
a piece of text can become clue that from another piece of text
00:11:26
so we can use that for example to to say can be really concludes and to the summary
00:11:31
from the dust article uh in the case of summarisation right um
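A minimal sketch of such an entailment check, assuming a publicly available NLI checkpoint (the model choice is an example, not necessarily the one used in the cited work):

```python
# Sketch: use a natural language inference model to check whether a summary is
# entailed by the source article. A low entailment probability is a signal for a
# possible hallucination.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nli_name = "roberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name).eval()

def entailment_prob(premise: str, hypothesis: str) -> float:
    enc = nli_tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli_model(**enc).logits, dim=-1)[0]
    # Label order for this checkpoint: contradiction, neutral, entailment.
    return probs[2].item()

article = "The city council approved the new budget on Tuesday after a long debate."
summary = "The budget was approved on Tuesday."
print(f"entailment probability: {entailment_prob(article, summary):.2f}")
```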
00:11:36
There are also question generation and question answering based methods, where we try to automatically generate questions (and in the morning we heard that this part works quite well with ChatGPT): generate questions about the summary, and then try to answer them from the article. If we cannot answer a question, then maybe that information has been added and is not actually coming from the input text.
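A sketch of the answering side of this check, with hand-written questions standing in for automatically generated ones; the QA model and the score threshold are assumptions:

```python
# Sketch: answer questions about the summary using only the article. Questions
# the article cannot answer confidently point to information that may have been
# added. Here the questions are written by hand; in practice they would be
# generated automatically.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

article = "The city council approved the new budget on Tuesday after a long debate."
questions_about_summary = [
    "When was the budget approved?",   # answerable from the article
    "Who proposed the budget?",        # not stated in the article
]

for question in questions_about_summary:
    result = qa(question=question, context=article)
    supported = result["score"] > 0.3  # arbitrary threshold for the sketch
    print(f"{question!r}: answer={result['answer']!r} "
          f"score={result['score']:.2f} supported={supported}")
```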
00:12:05
Then I think the more recent, very promising direction, which is pictured here on the right, is where we break up longer texts into smaller parts, sometimes called atomic facts. We can use large language models to do this, and researchers have studied that it works pretty well. We break up a longer claim like this one; the claim they used is by Biden, stated in a speech on August thirty-first: when I was vice president, violent crime fell fifteen percent in this country; the murder rate now is up twenty-six percent across the nation this year under Donald Trump. These are rather long sentences, pretty hard to fact-check all at once, but it becomes a bit easier when we break them up into individual statements. You can see here that they broke it up into two literal questions: did the violent crime rate fall fifteen percent during Biden's vice presidency, and did the murder rate in 2020 increase by twenty-six percent from 2019? And then there are implied questions as well, which I thought was smart of the researchers to also look at: is Biden comparing crime rates from the same time interval (because otherwise the comparison would maybe be unfair), and are the violent crime rate and the murder rate directly comparable, or does one include much more? Very interesting work, and I have seen the more recent work go in that direction, breaking up these rather tricky, complicated claims into atomic facts.
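A sketch of the decomposition-then-verification idea: the cited work decomposes claims into literal and implied questions, while this simplified version produces atomic statements and scores them with the NLI check from earlier. `ask_llm` is a hypothetical stand-in for an instruction-following LLM, and the prompt wording is an assumption:

```python
# Sketch: break a complex claim into short, self-contained statements with an
# instruction-following LLM, then score each statement against the evidence,
# e.g. with entailment_prob from the NLI sketch above.
from typing import Callable, List, Tuple

def decompose_claim(claim: str, ask_llm: Callable[[str], str]) -> List[str]:
    prompt = ("Break the following claim into a list of short, self-contained "
              "factual statements, one per line:\n\n" + claim)
    lines = ask_llm(prompt).splitlines()
    return [line.strip("-* ").strip() for line in lines if line.strip()]

def verify_claim(claim: str, evidence: str, ask_llm: Callable[[str], str],
                 entailment_prob: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    # One entailment score per atomic statement; low scores flag the weak parts.
    return [(fact, entailment_prob(evidence, fact))
            for fact in decompose_claim(claim, ask_llm)]
```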
00:13:54
Okay, so: ways to mitigate this. How can we reduce or avoid hallucinations? There are again a bunch of different methods. I will make these slides available, so you will have access to the links and pointers if you are interested, or maybe if you are trying to tackle a similar problem. One way in which we can make the models hallucinate less is to introduce an intermediate planning step, similar to the chain-of-thought idea: if we split the task up into multiple smaller tasks, models can perform better, and they do not have to do as much implicit work, which means a lower chance of hallucination.
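A sketch of such an intermediate step for summarisation; `ask_llm` again stands in for whatever LLM API is available, and the prompts are assumptions rather than the methods cited on the slide:

```python
# Sketch: summarise in two steps. First extract the key facts from the article,
# then write the summary only from those facts, so less is done implicitly.
def summarise_with_plan(article: str, ask_llm) -> str:
    facts = ask_llm(
        "List the key facts stated in the following article, one per line. "
        "Do not add anything that is not in the article.\n\n" + article
    )
    return ask_llm(
        "Write a one-sentence summary using ONLY the facts listed below.\n\n" + facts
    )
```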
00:14:42
So there is work in that area. Then there is work on directly fine-tuning models to reduce hallucinations, for example work from this year that applies preference optimisation with either a retrieval-based or a model-based source of factuality. We can also post-process the outputs and rerank them towards the more trustworthy ones: generate multiple outputs and then pick the one that is most trustworthy.
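A sketch of that sample-and-rerank idea, reusing `entailment_prob` from the earlier NLI sketch as the trustworthiness score; the generator checkpoint and the sampling settings are assumptions:

```python
# Sketch: sample several candidate summaries and keep the one the NLI model
# considers best supported by the source.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

gen_name = "facebook/bart-large-xsum"  # assumption
gen_tokenizer = AutoTokenizer.from_pretrained(gen_name)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(gen_name).eval()

def rerank_summaries(article: str, num_candidates: int = 5) -> str:
    enc = gen_tokenizer(article, return_tensors="pt", truncation=True)
    outputs = gen_model.generate(**enc, do_sample=True, top_p=0.9,
                                 num_return_sequences=num_candidates,
                                 max_new_tokens=60)
    candidates = gen_tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Pick the candidate with the highest entailment probability w.r.t. the source.
    return max(candidates, key=lambda cand: entailment_prob(article, cand))
```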
00:15:13
Or we can add references to the outputs and link them back to the source of information: this retrieval-augmented generation idea that we also heard about in the morning. So, combining citations with retrieval-augmented generation, or forcing models to stick more closely to a reference, which is what I am actually also currently working on.
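A sketch of that grounding idea, with `retrieve` and `ask_llm` as hypothetical helpers; the prompt and the bracketed citation format are assumptions:

```python
# Sketch: retrieve a few passages, number them, and instruct the model to answer
# only from them and to cite passage numbers.
from typing import Callable, List

def grounded_answer(question: str,
                    retrieve: Callable[[str, int], List[str]],
                    ask_llm: Callable[[str], str]) -> str:
    passages = retrieve(question, 3)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = ("Answer the question using ONLY the passages below, and cite the "
              "passage numbers in brackets. If the passages do not contain the "
              f"answer, say so.\n\n{context}\n\nQuestion: {question}")
    return ask_llm(prompt)
```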
00:15:44
To conclude, there is also the question of whether we should actually get rid of hallucinations completely. I would say that some applications really require that we strictly adhere to the facts, and there we absolutely do not want hallucinations. But on the other hand, with ChatGPT we also see a lot of benefit on the creative side, and in those cases creativity and hallucination may go hand in hand, in which case we maybe do not even want to get rid of them.


Conference Program

Opening and introduction
Prof. Lonneke van der Plas, Group Leader at Idiap, Computation, Cognition & Language
Feb. 21, 2024 · 9 a.m.
Democracy in the Time of AI: The Duty of the Media to Illuminate, Not Obscure
Sara Ibrahim, Online Editor & Journalist for the public service SWI swissinfo.ch, the international unit of the Swiss Broadcasting Corporation
Feb. 21, 2024 · 9:15 a.m.
AI in the federal administration and public trust: the role of the Competence Network for AI
Dr Kerstin Johansson Baker, Head of CNAI Unit, Swiss Federal Statistical Office
Feb. 21, 2024 · 9:30 a.m.
Automated Fact-checking: an NLP perspective
Prof. Andreas Vlachos, University Cambridge
Feb. 21, 2024 · 9:45 a.m.
DemoSquare: Democratize democracy with AI
Dr. Victor Kristof, Co-founder & CEO of DemoSquare
Feb. 21, 2024 · 10 a.m.
Claim verification from visual language on the web
Julian Eisenschlos, AI Research @ Google DeepMind
Feb. 21, 2024 · 11:45 a.m.
Generative AI and Threats to Democracy: What Political Psychology Can Tell Us
Dr Ashley Thornton, Geneva Graduate Institute
Feb. 21, 2024 · noon
Morning panel
Feb. 21, 2024 · 12:15 p.m.
AI and democracy: a legal perspective
Philippe Gilliéron, Attorney-at-Law, Wilhelm Gilliéron avocats
Feb. 21, 2024 · 2:30 p.m.
Smartvote: the present and future of democracy-supporting tools
Dr. Daniel Schwarz, co-founder Smartvote and leader of Digital Democracy research group at IPST, Bern University of Applied Sciences (BFH)
Feb. 21, 2024 · 2:45 p.m.
Is Democracy ready for the Age of AI?
Dr. Georges Kotrotsios, Technology advisor, and former VP of CSEM
Feb. 21, 2024 · 3 p.m.
Fantastic hallucinations and how to find them
Dr Andreas Marfurt, Lucerne University of Applied Sciences and Arts (HSLU)
Feb. 21, 2024 · 3:15 p.m.
LOCO and DONALD: topic-matched corpora for studying misinformation language
Dr Alessandro Miani, University of Bristol
Feb. 21, 2024 · 3:30 p.m.
Afternoon panel
Feb. 21, 2024 · 3:45 p.m.