Transcriptions
Note: this content has been automatically generated.
00:00:01
Hello everyone. I'm a senior manager at SICPA, and today I'm
00:00:06
going to talk about enabling data sovereignty in
00:00:10
machine learning. I also have to say that AI is not really
00:00:14
my main area of expertise; I have a background in finance. But recently I have
00:00:20
had the chance to collaborate on some projects
00:00:25
working with data.
00:00:31
So first I will talk about who we are
00:00:34
as a company, then about some of the challenges in the context of digital sovereignty,
00:00:39
some threats and use cases in the domain of AI and machine
00:00:44
learning, and also some of the solutions that we are working on. First, who we are.
00:00:52
SICPA is a leading provider of
00:00:54
secure authentication, identification and traceability solutions.
00:00:59
We are mostly known for our security
00:01:01
inks, which provide physical security features
00:01:06
that you can find in many banknotes around the world.
00:01:11
We are also known for our solutions for the physical marking of products and excise goods.
00:01:17
Due to the nature of our business, we are a long-trusted adviser to governments, central banks
00:01:22
and banknote printers, and we like to say that we are
00:01:27
in the business of enabling trust, for our customers and our partners.
00:01:33
We are also a global company: we are roughly three
00:01:37
thousand people, present in many countries of the world.
00:01:42
We have a presence on, let's say, all the continents,
00:01:45
and we provide technology and services in many other countries.
00:01:50
We are an almost one-hundred-year-old company, based in Switzerland,
00:01:55
and most of our R&D is actually based here as well.
00:02:00
Now, as I said earlier,
00:02:04
we are in the business of enabling trust.
00:02:08
This is difficult: trust is difficult to achieve even in the physical world,
00:02:13
and it is even more challenging to achieve in the digital world. And
00:02:18
some of the challenges that our customers and partners have are in the domain of digital sovereignty.
00:02:26
So when we say digital sovereignty, what do we mean by that? It is about controlling
00:02:32
the data, both assets and personal data; it is about the ability to act independently in the digital world;
00:02:39
and also the ability to exercise control,
00:02:45
to enforce rules and regulations, in the sense of staying in control.
00:02:49
And when we talk about sovereignty, what governments mostly implement, in practice,
00:02:57
is basically data sovereignty rules and regulations, such as
00:03:00
data residency, where you have regulations about
00:03:06
where the data is stored, how it is accessed, how it is processed, and also data protection and so on.
00:03:13
Roughly around seventy percent of all countries have implemented
00:03:16
some sort of, some level of, data sovereignty rules.
00:03:22
But despite all that, we still see a lot
00:03:26
of headlines where a lot of personal data gets exposed in breaches,
00:03:30
where public agencies and public administrations are
00:03:36
being hit by some kind of hacking or sophisticated attack, and so on.
00:03:41
So despite the regulations, governments and companies
00:03:46
lose control of their data, and data regulation alone is not enough.
00:03:52
And of course, when you look at the emergence of generative AI and large
00:03:58
language models, they only amplify the problem, so now we have even bigger problems,
00:04:04
and we see headlines like these: ChatGPT, for example, has been sued
00:04:12
over the use of private information for training the models, and also
00:04:16
over privacy violations and copyright claims.
00:04:22
So those problems are now even bigger with the emergence of generative AI.
00:04:29
But still, AI is a very
00:04:32
interesting and very promising field, and, let's say,
00:04:36
our customers will use these technologies. So now let me quickly walk through some
00:04:44
common AI threats and use cases in this domain.
00:04:50
So when you look at the whole machine learning process,
00:04:54
it usually consists of a few phases. First you
00:04:57
have a phase where you need to get the data.
00:05:01
The data is provided by some legitimate data owner;
00:05:06
in our case these are the governments. The provided data
00:05:10
usually gets normalised and cleaned, and
00:05:14
in some cases anonymised. Then it goes into
00:05:18
training. Once training is completed, the model gets deployed and operated
00:05:23
in some production environment, and then it gets queried by, let's say, external parties.
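As a rough sketch of the phases just described (data from an owner, normalisation, anonymisation, training, then serving), here is a toy Python pipeline; the stage functions and the trivial mean-predictor "model" are invented for illustration and have nothing to do with any real deployment.

```python
import statistics

def anonymize(records):
    # Anonymisation phase: drop direct identifiers, keep only feature values.
    return [value for _name, value in records]

def normalize(values):
    # Normalisation/cleaning phase: zero mean, unit variance.
    mean = statistics.mean(values)
    spread = statistics.pstdev(values) or 1.0
    return [(v - mean) / spread for v in values]

def train_mean_model(values):
    # Training phase, with a trivial "model" that predicts the training mean.
    mean = statistics.mean(values)
    return lambda _query: mean

# The data owner (e.g. a government agency) provides raw records.
raw = [("alice", 10.0), ("bob", 20.0), ("carol", 30.0)]
features = normalize(anonymize(raw))
model = train_mean_model(features)  # deployed, then queried by external parties
```

External parties only ever see `model`, while the data owner, the service provider doing the training, and anyone with pipeline access see the raw records; that asymmetry is exactly the exposure discussed next.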
00:05:30
Usually an actor like a government plays the role of the data owner: they
00:05:36
provide the data, in some cases they also supervise or
00:05:40
control the overall architecture, the training architecture,
00:05:44
and the training itself, and in some cases they actually operate the model in production.
00:05:49
But what happens much
00:05:52
more often is that
00:05:56
this job of
00:06:00
data processing, training and then operating the model gets done by a service provider,
00:06:06
by a subcontractor. So now you have government service providers who all have access
00:06:13
to the data and also to the training, and to the models as well.
00:06:19
And then of course there are other third parties that also have access
00:06:25
to the architecture of the model and of the training, and also to the model itself.
00:06:33
And when you look at the possible threats: of course,
00:06:38
there can be a threat from the inside, from some kind of
00:06:42
government insider who is acting maliciously, either intentionally or unintentionally. For example,
00:06:48
if a worker's computer gets hacked, then the hackers actually get access to
00:06:55
some of the company or government resources through it.
00:07:00
There is also a threat from the
00:07:03
service provider, because the service provider is usually a private company, so they
00:07:07
have their own interests, they have
00:07:11
different teams and so on. For example, a service provider may
00:07:17
want to save money and effort, so maybe they don't want to train the
00:07:22
model, so they will take some similar model from another customer; or, if they cannot achieve
00:07:29
the required accuracy of the model,
00:07:33
perhaps they add some unauthorised data
00:07:36
to the training in order to achieve the accuracy.
00:07:41
Those are all possible threats, and of
00:07:44
course there are malicious third parties as well.
00:07:49
And when it comes to goals, it is usually to steal the data, to steal the model,
00:07:54
to worsen its performance, or to earn money or save money.
00:07:59
These are the common goals. And when it comes to techniques,
00:08:03
there is data manipulation, data poisoning,
00:08:07
and also membership inference, so querying the model to find out
00:08:11
whether certain records were used to train the model, and also reconstruction attacks, querying the
00:08:17
model to reconstruct the training data or the model weights.
00:08:23
And the impact of an attack can be financial,
00:08:26
regulatory, reputational, and also operational disruptions.
00:08:33
So when it comes to data sovereignty in AI, what are the needs of governments,
00:08:39
or of the government sector? First, a government
00:08:42
wants sovereignty over its data and its machine learning models,
00:08:46
in order to ensure the integrity of the data used for machine learning,
00:08:52
and also the protection of the raw data they use against data leakage
00:08:56
or theft. So, as I said, we are,
00:09:02
as a company, famous for our physical marking solutions, but in the
00:09:07
digital domain, in the AI world, new watermarking technologies are actually required as well,
00:09:13
and this is to watermark what is the most valuable asset, which is the data.
00:09:22
Alright, so here are some of the solutions that we are now looking into to actually address these
00:09:27
threats, how we can actually solve these problems.
00:09:33
So one of the main research questions that
00:09:36
we are now investigating is:
00:09:39
can we watermark the data so that we can, first of all, detect data leakage,
00:09:45
and also, can we detect that some
00:09:49
other model was trained on the watermarked data,
00:09:52
and also, can we detect the leakage source and report it?
00:09:58
And for this we are looking into different things, and one of them is
00:10:04
the membership inference attack. These techniques aim
00:10:09
to detect whether a known record was used to train a given model.
00:10:13
These techniques are usually based on measuring the different behaviour of
00:10:18
the model on members and non-members, mostly due to overfitting. It is
00:10:23
basically based on the fact that models act differently, or act better,
00:10:30
on the training data versus on data that they haven't seen before.
00:10:36
And we believe that these types of techniques can be
00:10:39
used mostly for prevention, to
00:10:44
find out if your model actually leaks the data that was used to train it,
00:10:49
or as a detection tool, to discover whether another model was trained on your data.
00:10:56
Here is just a short little schema about
00:11:01
this. Basically, for this type of
00:11:06
technique: when you train a model, you take some samples from
00:11:10
the data population, and you use them to train the model.
00:11:15
And if you don't train the model in the proper way, the model actually
00:11:22
acts differently, it provides more accurate outputs on the training data versus on the test data, and then
00:11:29
this can later be exploited by an attacker,
00:11:33
even against a kind of black-box model.
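To make that idea concrete, here is a minimal sketch of a confidence-threshold membership inference attack; the toy "model", the confidence values and the 0.9 threshold are all assumptions made for this example, not a real system.

```python
import random

def confidence(trained_on, record):
    # Toy overfitted "model": very confident on records it memorised
    # during training, noticeably less confident on unseen records.
    if record in trained_on:
        return 0.95
    return 0.55 + random.random() * 0.2  # always below 0.75

def membership_inference(trained_on, record, threshold=0.9):
    # Threshold attack: guess "member" whenever the black-box
    # confidence score exceeds the threshold.
    return confidence(trained_on, record) > threshold

random.seed(0)
train_set = {("alice", 42), ("bob", 17)}
print(membership_inference(train_set, ("alice", 42)))  # True: record was in training
print(membership_inference(train_set, ("carol", 99)))  # False: unseen record
```

Note that the attacker never opens the model; only its confidence scores are observed, which matches the black-box setting mentioned above.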
00:11:39
And the other one is dataset watermarking,
00:11:43
and these techniques aim to verify the authenticity and integrity of datasets and
00:11:48
to confirm rightful ownership. They usually consist of two phases: the embedding phase, where
00:11:54
the watermark is inserted into the entire dataset or a subset of it,
00:11:58
and then there is the verification phase, where the watermark is verified to prove ownership.
00:12:05
And we also believe that this type of technique can be used as
00:12:11
a detection tool, to discover that another, public model actually used your data for training.
00:12:17
Also, this type of watermark
00:12:20
can be used, or can be adapted,
00:12:24
to modify some of the records in a way that the watermark gets assimilated into the model.
00:12:31
And here is just a short diagram about dataset watermarking. So when you have
00:12:37
some kind of dataset, for example for some kind of classification task,
00:12:42
you can add some perturbation, or some kind of a key
00:12:46
label, to a record, so that this becomes a kind of watermarked sample.
00:12:51
Then all this data goes into training, and then, if you later want to actually detect whether the
00:12:57
data has been leaked somewhere,
00:13:02
you can query the suspicious model with this particular sample,
00:13:06
and you can actually get back the information,
00:13:12
the label, that was embedded before.
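That embed-train-verify loop can be sketched as follows; the trigger inputs, the secret label 7, the 1-nearest-neighbour stand-in for a real model, and the 80% verification threshold are all assumptions made for this sketch.

```python
def embed_watermark(dataset, key_inputs, key_label):
    # Embedding phase: append trigger records that carry a secret
    # label only the data owner knows.
    return dataset + [(x, key_label) for x in key_inputs]

def train_1nn(dataset):
    # Toy 1-nearest-neighbour "model" over scalar inputs.
    def model(x):
        return min(dataset, key=lambda rec: abs(rec[0] - x))[1]
    return model

def verify_watermark(model, key_inputs, key_label, threshold=0.8):
    # Verification phase: a model trained on the watermarked data
    # should reproduce the secret label on the trigger inputs.
    hits = sum(1 for x in key_inputs if model(x) == key_label)
    return hits / len(key_inputs) >= threshold

clean_data = [(float(i), i % 2) for i in range(10)]  # label = parity
triggers, secret = [100.0, 200.0, 300.0], 7          # out-of-distribution keys
marked_data = embed_watermark(clean_data, triggers, secret)

suspicious = train_1nn(marked_data)  # trained on the leaked, marked data
innocent = train_1nn(clean_data)     # never saw the watermark
```

In this sketch `verify_watermark(suspicious, triggers, secret)` succeeds because the marked model reproduces the secret label on the triggers, while the model trained only on clean data does not.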