Transcriptions

Note: this content has been automatically generated.
00:00:01
Thank you. Actually, that's already my first slide, so I can just move on and go straight to
00:00:06
the point. So I started my PhD in two thousand
00:00:09
eighteen on conspiracy theories, which were not so popular as they were after COVID.
00:00:14
And as a psychologist, I used to read a lot of stuff
00:00:17
to know what conspiracy theories are, and we know a lot, yeah.
00:00:23
We know a lot about the background, the
00:00:27
psychological background of people who believe in conspiracy theories. We also know how they spread
00:00:32
and how they affect the beliefs of other people, and
00:00:38
what the negative consequences are. What we
00:00:42
didn't know much about at that time was the actual text. So what are conspiracy theories, right?
00:00:48
So we decided to build a corpus. There are some problems if we build a corpus of conspiracy
00:00:54
theories. Someone has to annotate the text, okay, and even if we
00:00:57
do, it's expensive and the sample size is not enough.
00:01:02
We don't know what a conspiracy theory is. Okay, I mean, we have an intuition, but it's not
00:01:10
so, you know, so easy. And then, which are the criteria
00:01:16
to classify conspiracy theories? Should it be based on the quality,
00:01:20
interactive elements, the language? Okay, there was another problem:
00:01:25
where do we look for text? In social media, there was
00:01:29
some research, there is some research in that case,
00:01:33
but there's a problem: texts in social
00:01:37
media are shorter, they are full of noise,
00:01:42
and they're not independent. And plus, are they conspiracy theories?
00:01:46
So we got this idea to look at texts in web pages.
00:01:51
Texts in web pages are standalone documents, they have space for in-depth
00:01:57
and elaborated discourse, and they are specialised sources created for developing and spreading conspiracy theories.
00:02:03
They can also provide a measure of the spread.
00:02:08
So, where to look? We have labelled websites,
00:02:14
classified on a continuum of the quality of information, but
00:02:18
also on a hard classification between conspiracy and mainstream.
00:02:23
So we can look at those websites. Okay, so once we have defined the websites,
00:02:30
we can simply take all the documents from these websites, right? So
00:02:34
some conspiracy documents and, I don't know, some cooking documents from the mainstream, right?
00:02:40
But this is a problem, because then you get "cannelloni" as an important word for mainstream
00:02:46
language, right? So it's not really accurate. So the idea is
00:02:52
to match documents by topic. In this way we have topics like Lady Diana,
00:02:58
and we have the conspiratorial version and the mainstream version, so in this way we can extract
00:03:04
the language of conspiracy theories. So our idea was to do some Google searches,
00:03:12
and we simply searched for topics within websites. We crossed
00:03:16
the list of all the topics and the list of websites,
00:03:21
and we got some Google queries from which we extracted the URLs and then the text from
00:03:27
them. And in the end we got LOCO, the Language of Conspiracy corpus, and DONALD,
00:03:34
the Dataset of News Articles for studying the Language of Dubious information.
00:03:39
Yeah, I know.
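The crossing of topics and websites described above can be sketched as follows. This is only an illustration: the function name `build_queries`, the seed topics, the domains, and the use of the `site:` search operator are all assumptions, not details given in the talk.

```python
from itertools import product


def build_queries(topics, domains):
    """Cross every topic with every website domain, one search query per pair."""
    # The site: operator restricting results to one domain is an assumption.
    return [f'"{topic}" site:{domain}' for topic, domain in product(topics, domains)]


# Hypothetical seeds and domains, for illustration only.
topics = ["lady diana", "moon landing"]
domains = ["example-conspiracy.org", "example-news.com"]

queries = build_queries(topics, domains)
for query in queries:
    print(query)
```

Because every topic is searched on both kinds of website, each topic can end up with a conspiracy version and a mainstream version, which is what makes the documents topic matched.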
00:03:41
So LOCO consists of one hundred thousand documents
00:03:46
revolving around topics that have generated conspiracy theories,
00:03:51
so it's mainly focused on conspiracy, while DONALD has
00:03:54
been built to analyse the language of conspiracy and political ideology,
00:03:59
so also for, you know, psychology but also political science and stuff like that.
00:04:06
And it's quite big, because it's two point two million documents,
00:04:10
so it's huge, it's gigabytes of stuff. So the
00:04:16
quality of these two corpora is that they're quite rich in data and
00:04:20
metadata. They have information on three levels,
00:04:25
in three data sets. We have information about the documents, so the text
00:04:28
itself, the title, the semantic content, like topics.
00:04:35
We have information about the web page and the spread on Facebook, which is important, so we can
00:04:40
connect the lexical features to the spread, the
00:04:45
popularity of a single web page. And we have information about the websites.
00:04:50
So now for the fun part. How much time do I have? Because this is...
00:04:56
Oh, wow, fantastic. So we can spend a lot of time on the things that
00:05:01
we can do with LOCO. I think the most important thing we
00:05:04
can do with these corpora is create an algorithm for the automatic detection of
00:05:10
conspiracy and also misinformation. So I'm working with
00:05:16
colleagues on this, and we got
00:05:18
this idea. This is quite technical, but I'll try
00:05:23
to give you the idea. So TF-IDF
00:05:26
is a way to extract the important words in
00:05:29
a document, okay? So the idea is
00:05:34
to extract the important words in the conspiracy corpus, but based on the fact that
00:05:39
they are topic matched. If we do that, we obtain a list of words
00:05:45
which are super conspiratorial. I know you cannot read them at the back, so I will read them for you:
00:05:52
elite, CIA, evil, proved, propaganda, et cetera,
00:05:58
and then the exclamation point too. So this is
00:06:01
the, you know, very important conspiratorial language.
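A minimal sketch of how topic-matched TF-IDF could surface conspiratorial words. The talk does not spell out the actual scoring, so this is an assumed variant: TF-IDF is computed within each topic, then summed with a positive sign for conspiracy documents and a negative sign for mainstream ones. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append({})
            continue
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def conspiratorial_words(records, top_n=5):
    """records: (topic, label, tokens) with label 'conspiracy' or 'mainstream'."""
    by_topic = defaultdict(list)
    for topic, label, tokens in records:
        by_topic[topic].append((label, tokens))
    totals = Counter()
    for items in by_topic.values():
        # Scoring inside one topic cancels out words that only mark the topic.
        scores = tf_idf([tokens for _, tokens in items])
        for (label, _), doc_scores in zip(items, scores):
            sign = 1 if label == "conspiracy" else -1
            for word, score in doc_scores.items():
                totals[word] += sign * score
    return [word for word, _ in totals.most_common(top_n)]


# Toy topic-matched pair, for illustration only.
records = [
    ("lady diana", "conspiracy", "the elite cover up the crash".split()),
    ("lady diana", "mainstream", "the crash was an accident".split()),
]
top_words = conspiratorial_words(records, top_n=3)
```

In the toy pair, words shared by both versions of the topic ("the", "crash") score zero, so only words specific to the conspiracy version rise to the top.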
00:06:06
We asked ChatGPT to evaluate some texts from DONALD,
00:06:12
nine thousand texts, and there was a correlation in the point fifties, which is
00:06:19
okay, I would say, for this kind of task, but we can talk about it later.
00:06:23
I'm still satisfied with that. Another thing you can do with
00:06:28
LOCO, I mean, I can do it but you can also do it, is
00:06:33
analysing the websites' incoming traffic. So for example,
00:06:39
we took the one thousand seven hundred websites from DONALD
00:06:43
and we analysed how people get to the websites.
00:06:47
So basically, in the graph here,
00:06:51
you see that as the ideological bias increases,
00:06:58
people into conspiracy tend to go directly to the website and are not pushed by social media.
00:07:04
So this is quite important: people go there because they have this confirmation bias, they want
00:07:09
to confirm their ideas through the website; they don't go there because they're pushed by other people,
00:07:14
according to our results. Another thing people can do,
00:07:18
also easily, is just analysing lexical features,
00:07:24
so you can take the combined effect of political ideology and,
00:07:28
you know, conspiracy, or analyse lexical features by domain quality,
00:07:33
as you see here. Or you can see the
00:07:38
evolution of the topics through time. Here you
00:07:42
see the two spikes in the Russia-Ukraine topic, in two thousand fourteen and in two thousand twenty-two.
00:07:49
There's another variable in LOCO, quite important,
00:07:52
which is the set of representative conspiracy theories,
00:07:57
which is a set of documents that are lexically similar
00:08:00
to the whole conspiracy sub-corpus. So those documents,
00:08:05
as you can see on the right here, and you cannot read it, but
00:08:09
are based on language of the sexual dominance
00:08:13
of batteries mcgrath show the users clinician markers
00:08:17
and they also use the language of in-group and out-group, like
00:08:20
the use of the pronouns "we" and "they", and they are also shared more on Facebook.
00:08:25
So you see how many things you can do with LOCO, and we don't have time. I'm just promoting them
00:08:30
in case you haven't seen them. So feel free to use them, they're free, by the way.
00:08:37
So another thing we did, actually, and we published it in Science
00:08:41
Advances, was that we wanted to understand the conspiracy mentality,
00:08:48
to see whether it shows up in text. So
00:08:51
one obvious thing in conspiracies is that everything is interconnected.
00:08:56
So if there are events and you have the conspiracy mentality, you tend to see those
00:09:00
events in a conspiratorial way, because events are connected to each other, because evidence that
00:09:07
there is a conspiracy behind Lady Diana is also evidence for the fact that the Earth is flat.
00:09:13
Okay, so what did we do? We checked how topics
00:09:19
co-occur together, and we did it, and we made this fantastic plot.
00:09:25
We show that in documents, that is, in conspiracy documents, if you talk about chemtrails,
00:09:31
you also talk about Ebola, Bill Gates, and COVID vaccines. So you have these
00:09:36
unrelated topics together, but in the end they're similar to each other. So here you see,
00:09:43
yeah i've seen you know there's a quite big data will be like
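A minimal sketch of the topic co-occurrence check described above, assuming each document already comes with a set of topic labels (for example from a topic model). The function name `topic_cooccurrence` and the toy documents are hypothetical and do not come from LOCO.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(doc_topics):
    """Count how many documents mention each unordered pair of topics."""
    pairs = Counter()
    for topics in doc_topics:
        # Sorting makes (a, b) and (b, a) land on the same key.
        pairs.update(combinations(sorted(set(topics)), 2))
    return pairs


# Toy topic assignments, for illustration only.
conspiracy_docs = [
    {"chemtrails", "vaccines", "bill gates"},
    {"chemtrails", "vaccines"},
    {"lady diana", "chemtrails"},
]
mainstream_docs = [{"vaccines"}, {"lady diana"}, {"chemtrails"}]

conspiracy_pairs = topic_cooccurrence(conspiracy_docs)
mainstream_pairs = topic_cooccurrence(mainstream_docs)
print(conspiracy_pairs.most_common(3))
```

Comparing the pair counts of the two sub-corpora gives a rough picture of interconnectedness: in the toy data the conspiracy documents link several topics, while the mainstream documents each stay on one.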
00:09:49
And then we can also explore
00:09:53
creativity, with LOCO, in conspiracy theories. So
00:09:57
we asked: is this interconnectedness a product of creativity or something else?
00:10:04
So we took nominal compounds from LOCO and DONALD,
00:10:08
we created a series of measures. Also, there's a poster
00:10:12
downstairs, so if you missed the chance, you can go there later. Otherwise,
00:10:18
I'll just go quickly. So we found that
00:10:23
language in conspiracy, actually nominal compounds in conspiracy theories, are truly
00:10:29
formed in a creative way, but they are used in a very rigid
00:10:33
way so with all that d. c. is because that is the um
00:10:38
there's a isn't probably a semantic problem in the head of price but also for social signalling
00:10:46
and another thing is that they are signalling that they're smarter. So this
00:10:50
is another thing you can do; this is my favourite. So, okay,
00:10:55
there's this idea that conspiracy theories,
00:10:59
conspiracy texts, use fancy words in a
00:11:02
context that is not really fancy. So you can be talking about pizza,
00:11:07
and then, boom, you put in a cool word like "orbitofrontal cortex", okay? And this sounds
00:11:13
cool, right? A way to measure this is to use the Gini coefficient. Okay, the Gini coefficient
00:11:21
is an index of unequal distribution. So if you have
00:11:25
a text with only highly sophisticated words, you don't have a high Gini; if you have a text with only unsophisticated words, you
00:11:31
don't have a high Gini; but if you put a lot of cool words in the context of low words, you have a high Gini.
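A minimal sketch of the Gini idea described above. The Gini formula is the standard one for a sorted sample; using word length as the per-word sophistication score is purely an assumption for illustration, since the talk does not say which sophistication measure was used. The function names are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def sophistication_gini(text, score=len):
    """Gini of per-word scores; word length is only a stand-in score here."""
    return gini([score(token) for token in text.lower().split()])


plain = sophistication_gini("we ate a pizza")
mixed = sophistication_gini("we ate orbitofrontal pizza")
all_fancy = sophistication_gini("orbitofrontal hippocampal amygdaloid")
print(plain, mixed, all_fancy)
```

With this stand-in score, the text that drops one fancy word into plain words gets a higher Gini than either the plain text or the uniformly fancy one, which is the pattern the measure is meant to capture.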
00:11:39
Okay, so we asked ChatGPT again, so I also use AI
00:11:46
to create texts, and we based them on the LOCO seeds,
00:11:50
so the topics of LOCO, and then we asked ChatGPT to create
00:11:53
the bullshit version of these texts. We applied our Gini and
00:11:58
we found there are actually differences: the bullshit version
00:12:02
had a higher Gini coefficient on lexical sophistication. And then we
00:12:07
did it on LOCO and DONALD, and we found that
00:12:11
basically there is this effect. So this effect, um, I don't think it means that...
00:12:20
Okay, I've actually finished. So those corpora are
00:12:23
quite big, and you can train your own word2vec models.
00:12:27
I don't want to expand on this, because actually it is just to show that you can do word2vec.
00:12:32
For example, we created a conspiracy
00:12:36
corpus from DONALD and LOCO of two hundred forty-four thousand documents,
00:12:44
which is quite big and very good to train a word2vec model. I don't know if
00:12:50
you can train a large language model from this small thing, but it would be cool to have a
00:12:56
conspiracy-type AI. And one last thing, because we're in
00:13:01
Switzerland and there are languages other than English:
00:13:06
we built the LOCO for Italian, with the Italian flag,
00:13:12
but I should also put Vatican City and San Marino and Ticino, you know, just to be fair.
00:13:17
And German: we are building a German corpus.
00:13:22
uh some making soul and bucks i didn't talk about
00:13:27
those corpora because they're not topic matched, so they're only
00:13:30
corpora of misinformation, so bad information. And
00:13:38
that's it. I just want to say thanks.


Conference Program

Opening and introduction
Prof. Lonneke van der Plas, Group Leader at Idiap, Computation, Cognition & Language
Feb. 21, 2024 · 9 a.m.
Democracy in the Time of AI: The Duty of the Media to Illuminate, Not Obscure
Sara Ibrahim, Online Editor & Journalist for the public service SWI swissinfo.ch, the international unit of the Swiss Broadcasting Corporation
Feb. 21, 2024 · 9:15 a.m.
AI in the federal administration and public trust: the role of the Competence Network for AI
Dr Kerstin Johansson Baker, Head of CNAI Unit, Swiss Federal Statistical Office
Feb. 21, 2024 · 9:30 a.m.
Automated Fact-checking: an NLP perspective
Prof. Andreas Vlachos, University Cambridge
Feb. 21, 2024 · 9:45 a.m.
DemoSquare: Democratize democracy with AI
Dr. Victor Kristof, Co-founder & CEO of DemoSquare
Feb. 21, 2024 · 10 a.m.
Claim verification from visual language on the web
Julian Eisenschlos, AI Research @ Google DeepMind
Feb. 21, 2024 · 11:45 a.m.
Generative AI and Threats to Democracy: What Political Psychology Can Tell Us
Dr Ashley Thornton, Geneva Graduate Institute
Feb. 21, 2024 · noon
Morning panel
Feb. 21, 2024 · 12:15 p.m.
AI and democracy: a legal perspective
Philippe Gilliéron, Attorney-at-Law, Wilhelm Gilliéron avocats
Feb. 21, 2024 · 2:30 p.m.
Smartvote: the present and future of democracy-supporting tools
Dr. Daniel Schwarz, co-founder Smartvote and leader of Digital Democracy research group at IPST, Bern University of Applied Sciences (BFH)
Feb. 21, 2024 · 2:45 p.m.
Is Democracy ready for the Age of AI?
Dr. Georges Kotrotsios, Technology advisor, and former VP of CSEM
Feb. 21, 2024 · 3 p.m.
Fantastic hallucinations and how to find them
Dr Andreas Marfurt, Lucerne University of Applied Sciences and Arts (HSLU)
Feb. 21, 2024 · 3:15 p.m.
LOCO and DONALD: topic-matched corpora for studying misinformation language
Dr Alessandro Miani, University of Bristol
Feb. 21, 2024 · 3:30 p.m.
Afternoon panel
Feb. 21, 2024 · 3:45 p.m.