Transcriptions

Note: this content has been automatically generated.
00:00:00
Yeah, and thanks for hosting this talk.
00:00:03
So I'm going to talk about investigating multiple facets of communication
00:00:07
skill assessment and feedback. That will be the
00:00:11
primary part of the talk. But quickly,
00:00:17
just to introduce myself: I spent two years
00:00:22
working on vision [unclear], as a senior research engineer, then came to
00:00:29
Idiap and started working on this intersection of social psychology and
00:00:34
computing with Daniel, and so this was my
00:00:39
thesis topic. I was a postdoc again with Daniel, and
00:00:44
with Jean-Marc as well, and I looked at
00:00:48
[unclear] as well as human-robot interaction. And now I'm
00:00:53
at IIIT Bangalore, and also visiting UNIL for two months,
00:00:58
and I'm now working with Professor Marianne
00:01:01
Schmid Mast, with whom Daniel has had a longstanding collaboration
00:01:07
for some time. So in terms of my research area, I'm visiting
00:01:13
UNIL. Very briefly, I just wanted to
00:01:17
introduce our institute as well, just to give you
00:01:21
a rough idea. You might have heard of the IITs, the Indian Institutes of
00:01:26
Technology, which are much more well known. Those are the old ones, which
00:01:32
have, let's say, a lot of history. But more recently there have
00:01:36
been many new IITs. So you have IITs,
00:01:42
one per state. This is how India looks, and here is where we
00:01:47
are, Bangalore. And I'm originally from
00:01:51
Chennai; Madras is here. And so
00:01:56
these institutes are centrally funded, like EPFL, whereas the
00:02:03
IIITs have an extra I, which actually stands for information,
00:02:10
that is, information technology, so they are particularly
00:02:14
focusing on information technology. And it turns out that
00:02:18
IIIT Hyderabad and IIIT Bangalore
00:02:22
are actually self-funded, so the first I stands for International.
00:02:27
If it is funded by the centre, then you can use the word Indian, it seems. That's the
00:02:33
government's rule. Ours is actually International and self-funded.
00:02:37
Just the capital expense [unclear] the state funds,
00:02:41
but otherwise we are self-driven, self-run:
00:02:44
student fees go as input to the staff and faculty salaries.
00:02:50
We have two main programs. MTech, so it is a
00:02:54
master's institute: we have MTech in CSE and
00:02:57
ECE, and integrated MTech, which is a five-year program, in
00:03:02
CSE and ECE. More recently we have also started
00:03:07
an online PG diploma in data science and ML/AI with a partner,
00:03:12
and this has become a very big thing now. It is one of the, we would say,
00:03:19
best performing in terms of revenue in the country, this program,
00:03:24
and also in terms of scale: we are working with two thousand students
00:03:28
[unclear]. So it's a big thing that has happened to the institute, in terms of a lot of
00:03:36
teaching load and so on. We do spend a lot of time on this,
00:03:41
because we have financial needs. All right.
00:03:46
So this is Bangalore, where we are, and our campus.
00:03:52
The facilities, everything is pretty good, so in case somebody
00:03:57
wants to visit our institute, you're welcome. And this is my
00:04:00
lab, the Multimodal Perception Lab. We are looking at different modalities:
00:04:05
vision, speech and text modalities. And to begin with,
00:04:11
we look at human-centred tasks. These are some of the interesting applications, which I will come to
00:04:17
at some point. As you can see, we need to process different modalities,
00:04:23
and I'll go over some of these applications in detail. But to step
00:04:29
back a little and see where we're coming from, there are some main trends, the first one being
00:04:35
automatic behaviour analysis. Thanks to ML and
00:04:43
vision and speech technologies, we can do pretty good automatic behaviour analysis.
00:04:50
We can also do automatic behaviour perception, which was sort of my thesis work,
00:04:55
and you can also start to do automatic behaviour analysis and synthesis.
00:05:00
And, I mean, I'm just going to throw in a few keywords. Nonverbal cues are
00:05:07
known to be very important; that is something we all know. Traditionally,
00:05:13
social psychologists or human communication experts used to do this
00:05:20
manually. They would look at, say,
00:05:24
a piece of a sentence or phrase and look at, say, eyes, brows, mouth, head,
00:05:30
gaze and so on, and then they would code it meticulously,
00:05:36
and it is a very cumbersome process. That is where
00:05:41
we say humans are able to look at nonverbal behaviour and make social inferences.
00:05:47
This is an old slide, but I'm just using it here to set the context.
00:05:53
Social psychologists look at the correlation between behaviour
00:05:57
and the social construct using manual cue extraction.
00:06:01
What we could do, in terms of social interaction modelling, to begin with,
00:06:06
is automate cue extraction, behaviour cue extraction. We
00:06:09
can also start to model from behaviour to perception.
00:06:13
This is in the mind, the social inference. So we could also have models which,
00:06:18
say, infer who is the dominant person, or how the
00:06:21
group is in terms of the group behaviour, and so on. So
00:06:26
this is sort of the context: automatic cue extraction
00:06:29
and automatic behaviour perception.
00:06:35
And over the years, things have only
00:06:39
become better and better. For example, OpenFace:
00:06:44
using OpenFace we could look at the eye gaze, we could look at the head pose,
00:06:50
we could look at facial landmarks and facial expressions
00:06:55
in a real-time way. So what this helps us with,
00:07:00
at least to begin with, is, say, the face
00:07:04
and eyes: we're able to quantify what's happening in a
00:07:09
precise, near-real-time fashion. So that's a huge development.
00:07:17
We could look at body gestures and
00:07:23
hand gestures as well, thanks to OpenPose.
00:07:28
One can start looking at very precise joint location estimation in terms of
00:07:32
body gestures as well as hand gestures. So that is also fantastic.
00:07:41
So
00:07:44
once we have these cues automatically extracted,
00:07:49
this can help us. For example, if you can do this behaviour tracking, as in
00:07:56
extract the head pose, extract where a person is looking, and so on,
00:08:00
maybe you could summarise the behaviour and start to predict social constructs.
00:08:08
This is what is shown here: you can extract that summary and then start to build models
00:08:14
which take in behaviour, as in summarised behaviour, and map that to a perception.
00:08:20
This is an interesting bit of research.
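
As a rough illustration of what "summarising the behaviour" can mean, the sketch below turns a per-frame head-yaw track into a few summary statistics that a downstream model could take as input. The input format, the 15-degree threshold and all names are assumptions made for illustration; they are not taken from the talk or from any particular toolkit's output.

```python
from statistics import mean, pstdev


def summarise_gaze(yaw_degrees, threshold=15.0):
    """Summarise a per-frame head-yaw track into simple cue statistics.

    yaw_degrees: hypothetical list of head-yaw angles, one per video frame,
    where 0 means facing the interaction partner.
    threshold: assumed angle (degrees) within which we count the person
    as looking at the partner.
    """
    looking = [abs(y) <= threshold for y in yaw_degrees]
    return {
        "mean_yaw": mean(yaw_degrees),
        "std_yaw": pstdev(yaw_degrees),
        "frac_looking_at_partner": sum(looking) / len(looking),
    }


# Toy track: four of the six frames are within the threshold.
track = [0.0, 5.0, -10.0, 30.0, 40.0, 2.0]
summary = summarise_gaze(track)
```

A vector of such statistics per person is one plausible input for a model that maps behaviour to a perceived social construct.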
00:08:25
And the other technology trend: starting from, say,
00:08:30
what used to be industrial robots, which do not interact with a human, or looking at
00:08:37
social navigation, sorry, physical navigation, now you can start to move towards
00:08:43
social navigation or conversational AI.
00:08:48
It is also situated: if you have a robot,
00:08:51
you can show directions, you can make some movements. Or you can also look at
00:08:57
just agents, without the robot
00:09:02
doing gestures and moving and so on. So
00:09:05
you don't have that, but then you can pretty much have a conversational agent, with
00:09:12
multiple people talking to the agent and so on. So this is also becoming very
00:09:17
interesting and possible, of course driven by all these improvements
00:09:22
in perception technologies, so that you can do perception and action
00:09:26
together. On one hand you have the behaviour analysis, which
00:09:31
is helping you to analyse or perceive the situation, and you can
00:09:37
start to make some decisions, as in whether to speak now or not, and so
00:09:41
on, which is taking decisions and starting to output them. And also,
00:09:46
based on interactions, you can learn models; I mean, you can use prior models or keep learning
00:09:52
models of how to interact. So these are the tech trends. And now,
00:09:59
when we look back at India and look at some of the, what is it,
00:10:06
social problems, or problems which can
00:10:10
be addressed using these tech trends,
00:10:14
at least a few, we thought, were pretty interesting. So to
00:10:19
begin with, a general problem in India is that trained manpower is scarce,
00:10:25
and it is only available in cities,
00:10:29
particularly in private schools or colleges, or in private
00:10:34
situations. Largely, the government machinery
00:10:39
is operating, you would say, below potential.
00:10:44
In engineering colleges, of which lots are private colleges,
00:10:49
even there the quality of, say, the
00:10:53
engineers coming out is pretty bad in terms of employability.
00:10:58
One aspect of that is the soft skills; there are also hard skills,
00:11:04
coding skills and so on. It is lacking. And if you want to have
00:11:08
more info, you can look at the Aspiring Minds study, which has been
00:11:12
doing a lot of good work in terms of measuring how the situation is on these skills and so on.
00:11:18
So basically soft skill is low, and we think
00:11:24
technology could be a solution, or
00:11:28
play a role in this problem. Again,
00:11:32
PISA tests across countries show that we are at the
00:11:37
bottom. So this is India, whereas you have other countries doing much better. And these two states are supposed to
00:11:43
be the best states, which the government thought it could showcase,
00:11:48
but then the results turned out to be pretty bad.
00:11:52
And [unclear] schools school, say, seventy percent of the students.
00:11:56
So it's a major problem, because,
00:12:00
in terms of population, we have lots of young people
00:12:03
entering, say, schools and colleges, but then the quality
00:12:08
is a major issue. So
00:12:13
maybe we think that [unclear]
00:12:17
to be able to teach as well as
00:12:21
assess people is probably an opportunity where we think we can play a role. And another
00:12:27
problem, which is in the space of accessibility: people with hearing impairment
00:12:33
[unclear]; we do not have enough sign language interpreters.
00:12:39
So these are some of the motivations, which form the basis for some
00:12:44
of the problems that we're going to look at. One is skill assessment and feedback,
00:12:49
two, situated human-agent interaction, and three, Indian sign language synthesis.
00:12:56
Let me focus on skill assessment and feedback.
00:13:02
It is a slightly general term; soft skill is one aspect, which is what we're
00:13:08
focusing on much more than others, which could mean in interviews or
00:13:15
in teaching or presenting situations. But I am also interested in this larger
00:13:20
theme, which is about, say, dancing skill or photographic skill or
00:13:25
any other skill where maybe you can assess and give feedback.
00:13:30
Apart from art, you can also look at sports, where I think there's a lot of
00:13:35
interesting work already happening. From our point of view, we think these are
00:13:40
important problems to look at, and maybe we could have some impact by working on these problems.
00:13:47
Just to quickly give a snapshot, and then we'll move on to the primary topic.
00:13:51
In terms of dancing skill, we have looked at the Bharatanatyam
00:13:56
dance form. What we looked at was
00:14:00
not just recognising postures but also the expertise,
00:14:05
I mean, what the expertise level is of a dancer who is performing these
00:14:10
postures. This class of work is also starting to be known as quality of action recognition, not just
00:14:17
action recognition, which is the posture, but the quality of action,
00:14:24
and not just postures but also expressions. So this is for the dancing; we just
00:14:31
looked at one particular dance form, which is Bharatanatyam.
00:14:34
We also looked at photograph analysis, different [unclear] of photographs,
00:14:39
and how you can give feedback to people. So that is one piece of work we did.
00:14:45
And then table tennis video analysis, as in looking at the body
00:14:52
joints and the bat relationship, the bat-ball relationship, and so on. So these are
00:14:58
things that we have started to look at in sports as well. But I'll focus
00:15:02
on the first problem, which is soft skill, and particularly communication skill assessment.
00:15:10
You have probably heard of HireVue,
00:15:17
which is a platform to do the interviewing,
00:15:21
and this interviewing is asynchronous, in the sense that there is no person on the
00:15:26
other side. Maybe a question is given to you, and then you
00:15:30
record your response. So this is
00:15:37
one example, which is HireVue, but
00:15:39
such systems are starting to become commonplace lately,
00:15:44
and people are assessed using these platforms.
00:15:50
What we would like to look at is the
00:15:55
set of, say, a certain set of possibilities for
00:16:00
assessing people. These are fairly well understood: face-to-face discussions and
00:16:07
group discussions, which need not be assessment-oriented. Then you have these assessment-oriented interactions,
00:16:14
of which, say, the face-to-face interview is the most common
00:16:19
form. And then there is the asynchronous video interview, which we just talked about.
00:16:25
But then there are also other possibilities: instead of having an interview,
00:16:28
you can have a discussion with a peer, or a group discussion,
00:16:32
and also written interviews, essays and so on. So this is the assessment-oriented
00:16:38
interaction space. On advantages and disadvantages: a face-to-face interview is a
00:16:44
proper social interaction; there is a person on the other side who can prompt, can ask questions.
00:16:49
But then it is non-scalable, in the sense of
00:16:53
the number of interviewers: you cannot scale if you
00:16:57
have, say, a thousand interviewees. Whereas here, this is going to be scalable,
00:17:04
and you can also prompt, you can ask questions. A résumé,
00:17:08
in its traditional form, does not have the prompting option; it is just free form,
00:17:14
document-based. [unclear]
00:17:18
It is still scalable, but then maybe there's no prompting.
00:17:24
And so on. So this is sort of the space we would
00:17:27
like to look at: face-to-face interviews, [unclear] résumés,
00:17:32
and also the asynchronous video interview. So there is a platform where
00:17:38
you are asked a question, and so on, and then you have the video recorder,
00:17:43
and you can go on to the next question, and so
00:17:46
on. It's like a HireVue type of platform.
00:17:51
So what is the main question that we are asking?
00:17:55
AVIs, asynchronous video interviews, are emerging.
00:18:00
Face-to-face interviews are fairly standard and well understood,
00:18:03
but these are relatively new. There is no
00:18:07
systematic study which compares face-to-face interviews with asynchronous
00:18:15
video interviews, in terms of
00:18:20
assessment, and also in terms of
00:18:23
comparing spoken versus written asynchronous
00:18:28
interviews. One setting we also have in our work
00:18:31
is that the same person appears in both settings: we have the same person appearing in face-to-face as well as
00:18:37
asynchronous video interviews. And also, to
00:18:41
our knowledge, we have an interesting problem, which is to directly predict
00:18:47
actionable feedback to participants; I'll come to it subsequently. Instead of getting feedback
00:18:52
where you say you're in this percentile, and so on, you can also directly predict
00:18:56
actionable feedback. So I'll talk about that as well. OK, so
00:19:01
the first study, where we had a hundred participants
00:19:06
who go through the same; I mean, they do both
00:19:10
asynchronous video interviews as well as face-to-face interviews, one after the other. In this case
00:19:20
it is not randomised: everybody goes through this, followed by the face-to-face interview.
00:19:28
We had a hundred participants, in the twenty to twenty-[unclear] age group,
00:19:33
and we had a hundred questions, curated by asking
00:19:37
HR [unclear] for possible questions, and five questions are sampled randomly,
00:19:43
and then you go through this asynchronous video interview.
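
The question-selection step described here (a curated bank of a hundred questions, five drawn at random per interview) can be sketched in a few lines. The question identifiers, the function name and the use of a seed are placeholders for illustration; the talk does not say how the sampling was implemented.

```python
import random


def sample_questions(bank, k=5, seed=None):
    """Draw k distinct questions from the bank, without replacement."""
    rng = random.Random(seed)  # a seed makes a session reproducible
    return rng.sample(bank, k)


# Placeholder bank of one hundred question IDs.
bank = [f"Q{i}" for i in range(1, 101)]
picked = sample_questions(bank, k=5, seed=0)
```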
00:19:49
We wanted to study, in terms of
00:19:53
these two settings, how the manual ratings differ, in terms of
00:19:57
how people perceive the interviewees in these two settings.
00:20:01
We also wanted to study how the automatic prediction is different,
00:20:07
in terms of, say, performance, for this setting versus that setting.
00:20:11
And we also wanted to see if there is any difference
00:20:15
in behaviour across the settings. So this was our original
00:20:19
goal. To do this, we looked at
00:20:24
a rubric, a set of rubric attributes, that
00:20:27
we manually annotated. The annotators annotated
00:20:32
all the interviewees on these attributes, of which this is our
00:20:39
variable of interest, the overall communication skill rating. We also have the
00:20:45
other attributes, based on this work, which
00:20:49
is a fairly standard work in this area. So
00:20:54
once we have the annotators looking at people, we could also start to look at the kappa agreement,
00:21:01
and the agreement for the effective communicator rating is pretty good.
00:21:06
So in terms of agreement it is pretty good. There are attributes
00:21:11
which are good; some didn't have a lot of agreement.
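
For readers unfamiliar with kappa, here is a minimal sketch of Cohen's kappa for two annotators. The talk does not say which kappa variant or how many annotators were used, so the two-rater form and the toy ratings below are assumptions for illustration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters labelled independently
    # according to their own label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy ratings on a one-to-five scale; the raters disagree on one item.
a = [1, 2, 3, 3, 4, 5]
b = [1, 2, 3, 4, 4, 5]
kappa = cohen_kappa(a, b)
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual choice instead.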
00:21:18
In terms of correlation, to begin with, fluency turned out to be the most related.
00:21:25
This is a manually annotated attribute, please note.
00:21:28
It was pretty closely correlated with
00:21:32
the overall communication skill: point eight two and point
00:21:38
seven three; actually it is even higher in the case of the
00:21:42
interface-based interview versus face-to-face. And there are a few other attributes where
00:21:48
we see some differences, as you see, but more or less you see that
00:21:56
it is a similar pattern, in
00:21:59
terms of what features or what attributes are related, and so on.
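
The correlations quoted here relate one annotated attribute to the overall rating. A minimal sketch of the Pearson correlation coefficient is given below, assuming that is the statistic meant; the talk does not name it, and the ratings are toy values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy one-to-five ratings for five interviewees (illustrative only).
fluency = [2, 3, 3, 4, 5]
overall = [2, 3, 4, 4, 5]
r = pearson(fluency, overall)
```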
00:22:07
And out of the hundred videos that were used in
00:22:13
the interface setting, I mean, we had a hundred, in eighty-nine cases we had
00:22:18
a majority agreement. So in a large portion there was majority
00:22:23
agreement. We're talking about a one-to-five scale. In
00:22:29
eighty-nine videos there was agreement, and in eighty-two videos there was agreement in the case of
00:22:35
face-to-face. Then, just looking at the common set of videos, we have sixty-nine videos,
00:22:42
and this is the set of videos we started to look at, on which we did
00:22:46
the subsequent analysis. And if you think about what
00:22:51
our goal is, our goal is to see, and this is sort of a benchmark,
00:22:55
using the attributes, the manually extracted attributes, how
00:23:00
well can you predict the communication skill rating.
00:23:05
Can you do it similarly for the case of
00:23:09
automatic prediction? So here, having videos, that is the task as well. So the
00:23:16
task is basically the task of predicting who is a below-average communicator.
00:23:24
So you have these ratings, and we have these
00:23:28
videos where there is agreement, and in those videos,
00:23:35
I mean, the ones who are below the average
00:23:38
score, which is three, those are the ones who are below average,
00:23:43
and average and above is the other set. So we have a binary classification task.
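
The data preparation described above (keep only videos with majority agreement, take the set common to both settings, then label ratings below three as below average) can be sketched as follows. The assumption of three annotators per video, the dictionary layout and the names `avi` and `f2f` are hypothetical; the talk gives only the counts and the threshold.

```python
from collections import Counter


def majority_rating(ratings):
    """Return the rating given by more than half of the annotators, else None."""
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > len(ratings) / 2 else None


def below_average_labels(avi, f2f, average=3):
    """Label participants who have majority agreement in BOTH settings.

    avi, f2f: dicts mapping participant id -> list of annotator ratings (1-5).
    Returns id -> {"avi": bool, "f2f": bool}, True meaning below average.
    """
    labels = {}
    for pid in avi.keys() & f2f.keys():
        a, f = majority_rating(avi[pid]), majority_rating(f2f[pid])
        if a is not None and f is not None:  # common set with agreement
            labels[pid] = {"avi": a < average, "f2f": f < average}
    return labels


# Toy ratings: p3 has no majority in the interface setting and is dropped.
avi = {"p1": [2, 2, 3], "p2": [4, 4, 5], "p3": [1, 3, 5]}
f2f = {"p1": [2, 3, 3], "p2": [4, 4, 4], "p3": [3, 3, 3]}
labels = below_average_labels(avi, f2f)
```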
00:23:49
That is what we are trying to attack. Using the manual
00:23:55
and other features, we're able to reach an accuracy of the order of
00:24:00
eighty-eight to ninety, in both the settings, face-to-face as well
00:24:05
as interface-based. It is slightly lower in the case of interface-based, but still you are able to reach
00:24:11
of the order of ninety. But then, when you start
00:24:15
using automatic features, here is the set of
00:24:21
features which turned out to be the most important. Here you have features that are
00:24:26
speaking activity based, in terms of time and so on, prosody, speech
00:24:32
rate, articulation rate. So speech rate, an optimal speaking rate, turned out to be
00:24:38
an important feature, and the prosodic features, we observed, were
00:24:46
the most important. It is understandable that communication skill manifests mainly in terms of
00:24:52
audio, and here even more so in terms of prosody.
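
To make the speaking-activity features concrete, the sketch below computes a few of them from speech segments. It assumes the segments come from some voice activity detector and that a word count is available; the feature names and input format are illustrative and are not the feature set used in the study. Pitch and energy based prosodic features are not covered here.

```python
def speaking_features(segments, total_duration, word_count):
    """Simple speaking-activity features from speech segments.

    segments: list of (start, end) times in seconds where the person speaks,
    sorted by start time (assumed output of a voice activity detector).
    total_duration: length of the response in seconds.
    word_count: number of words spoken in the response.
    """
    speaking_time = sum(end - start for start, end in segments)
    pauses = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    return {
        "speaking_fraction": speaking_time / total_duration,
        # Speech rate counts pauses in the denominator...
        "speech_rate_wps": word_count / total_duration,
        # ...whereas articulation rate uses speaking time only.
        "articulation_rate_wps": word_count / speaking_time,
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
    }


feats = speaking_features(
    [(0.0, 4.0), (5.0, 9.0), (10.0, 12.0)], total_duration=12.0, word_count=30
)
```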
00:24:58
What this also means is, maybe, if you have an automatic model to predict
00:25:05
communication skill, you can use a speech-based model or a prosody-based model.
00:25:10
It turns out here that the visual features were not so important.
00:25:16
Of course, vision features are hard to compute, and for this task they didn't add much,
00:25:22
which means that maybe a model using prosody is good enough
00:25:26
for predicting communication skill. We also had, apart from
00:25:33
visual, prosody and speaking activity features, lexical features. For example,
00:25:40
an interesting feature was a difficult word
00:25:43
count. We looked at, say, SAT,
00:25:48
GRE type exams, and what were supposed to be difficult words
00:25:53
from there. We hypothesised that maybe
00:25:56
the ones who have good communication skill also start to
00:26:00
use words which are probably not used by others; they're supposed to be difficult words.
00:26:06
And it turned out, yes, those features were pretty useful.
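
A difficult word count of the kind described can be sketched as a lookup against a reference word list. The three-word list below is a placeholder; the actual list (described only as coming from exam-style vocabulary) and the tokenisation used in the study are not given in the talk.

```python
import re


def difficult_word_count(transcript, difficult_words):
    """Count tokens in the transcript that appear in a difficult-word list."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return sum(1 for token in tokens if token in difficult_words)


# Placeholder list standing in for an exam-style vocabulary list.
DIFFICULT = {"meticulous", "cumbersome", "ubiquitous"}
n = difficult_word_count(
    "The process was cumbersome, but she was meticulous. Cumbersome indeed.",
    DIFFICULT,
)
```

In practice one might normalise the count by response length so that longer answers are not favoured automatically.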
00:26:11
Another interesting observation here was that we did do the manual transcription completely.
00:26:18
For all the videos we had a manual transcription as well as the ASR output.
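
When a manual transcript and an ASR output exist for the same recording, their mismatch is commonly summarised as word error rate. The sketch below computes it with a word-level edit distance; it is a generic illustration and not necessarily the measure used in the study.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Two of the five reference words are misrecognised.
wer = word_error_rate(
    "we study communication skill assessment",
    "we study communications kill assessment",
)
```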
00:26:24
And we could see that, for
00:26:28
this task, the manual transcriptions perform as well as
00:26:33
the automatic transcription, and the accuracy of the ASR was not
00:26:38
very high; it was, I would say, fifty percent or so, so
00:26:43
it is not really good. But still it was
00:26:50
as good as the manual
00:26:54
transcription. So I think the broad conclusions
00:27:00
from the study are: yes, we can get similar accuracies,
00:27:05
face-to-face versus interface-based. So we have a benchmark here,
00:27:11
which is face-to-face; with respect to that we are able to reach pretty close: seventy-
00:27:17
nine and eighty-three. So we're able to reach pretty close in the interface-based setting. That sort of
00:27:23
says that there is not much lost in terms of automatic prediction
00:27:29
in the case of the interface-based setting. And
00:27:33
also, if you wanted to have a quick job of prediction,
00:27:40
using prosody cues is a safe and good option.
00:27:47
OK, so more results you can find in this paper.
00:27:51
We have also tried to look at differences in behaviour and so on.
00:27:57
OK, so this was about comparing face-to-face interviews
00:28:03
versus interface-based interviews. So
00:28:11
after doing this, in the next study, what we tried to do was see how this
00:28:16
interface-based interview, which is a HireVue type of interview, compares
00:28:21
with when you have not spoken but written, a written interface-based interview,
00:28:28
and written essays, where you have a longer
00:28:33
time to think and write about a topic, whereas
00:28:38
in the interface-based setting you might only have
00:28:41
very limited time to respond to the questions. So
00:28:47
in terms of the protocol,
00:28:51
it was a very similar protocol. It is a different study, a different data set,
00:28:58
where we have again the same situation: the same participant goes through
00:29:05
the video interview as well as the written interview and the essay.
00:29:10
The interface looks something like this: you can choose to go to
00:29:15
one of these options. The choice was given to the participants: they
00:29:20
could do one first and then the other one later, and so on.
00:29:24
And again, looking at the kappa values and so on, here
00:29:32
you see, there are multiple
00:29:36
observations. One of them is that content relevance had a pretty low
00:29:41
kappa value. And overall,
00:29:46
maybe people do not agree
00:29:50
on how relevant the content is when
00:29:55
the interview itself is a short interview, it is not long.
00:29:59
But the moment you move to essays, you start to get
00:30:03
higher kappa values, point seven one and higher.
00:30:10
Higher values of this kappa sort of tell you why
00:30:13
the essay is looked at as an important instrument
00:30:18
for assessing written communication skill. So the kappa
00:30:23
values here are slightly lower. That is one of the observations.
00:30:27
Apart from that, we did have some writing-based rubrics,
00:30:32
which we rated, for which we did manual annotation.
00:30:39
And the correlations, in terms of what
00:30:45
rubrics are most related to the overall communication skill:
00:30:50
these were some of the important ones you see here, fluency, confidence
00:30:55
and so on, convincing. So convincingness or persuasiveness seemed to be a very important
00:31:01
attribute to predict communication skill.
00:31:07
And again, we have the score distributions and
00:31:11
so on, and we make it into a binary classification task,
00:31:16
and then try to see; it is exactly the same story
00:31:22
that we saw before. We try to use the rubrics to predict,
00:31:26
and with the rubrics you can go pretty high in terms of
00:31:29
numbers: accuracy, F1 score and so on. You can do pretty well.
00:31:34
And we want to see, with the automatic features, how
00:31:38
far you can get. And the results show,
00:31:45
in terms of the overall numbers here, [unclear]
00:31:48
we have point seven five, which is the largest
00:31:52
accuracy score here.
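
For reference, accuracy and F1 score for a binary task like this one can be computed from the confusion counts as in the sketch below. The labels are toy values chosen for illustration, with True standing for "below average"; they do not reproduce any result from the talk.

```python
def accuracy_and_f1(y_true, y_pred, positive=True):
    """Accuracy and F1 score for binary labels, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels: 3 true positives, 2 false negatives, 1 false positive, 4 true negatives.
y_true = [True, True, True, True, True, False, False, False, False, False]
y_pred = [True, True, True, False, False, True, False, False, False, False]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Reporting F1 alongside accuracy matters when the two classes are unbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.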
00:31:56
Here, this is only text-based features; that one did include
00:32:03
prosody features and so on. So you can see that,
00:32:10
I mean, you can use this setting pretty much;
00:32:16
in terms of automatic assessment these settings are as good as,
00:32:20
say, using essays as an instrument, though the kappa
00:32:25
was a little low. But in terms of prediction accuracy
00:32:30
it is sort of similar numbers. So that is one observation. We also observed that
00:32:35
there seems to be one common factor, which is the communication skill
00:32:40
factor, which was related to the overall rating in all three settings. So
00:32:46
there's not much difference between the settings. And maybe,
00:32:53
we might hypothesise, we might
00:32:58
say some people might be good at this, some people might be good at
00:33:01
that, and so on. That was found to be very low,
00:33:05
in the sense that there seems to be a common factor: if you are low in this
00:33:10
skill, then you are pretty much low in the other settings, and so on. So that was also an
00:33:16
observation that came out of this. We also had a user preference sort of
00:33:22
question, and in terms of the interviewees, the preference was
00:33:28
this: half of the participants preferred this setting
00:33:36
compared to, say, one of these, and between these it was sort of equally
00:33:41
distributed. But this turned out to be a
00:33:45
more usable setting. So that was also an interesting observation for us.
00:33:52
right so so as we just saw a i've for study uh try
00:33:58
to understand the interface based setting reserve is well understood face to face interviews
00:34:04
right uh because your replace this and what are those right so we
00:34:08
want to understand ah how good this is in terms of various aspects
00:34:13
ah also in terms of uh other options maybe if you have a uh
00:34:18
if you have a situation where you can't do the video streaming already gone uh
00:34:22
uh you have a bandwidth issue maybe this could be a possible solution the written interview
00:34:27
of course you might have to do proctoring uh well maybe switch on the camera once in a while
00:34:33
and do proctoring maybe you never know who's writing the exam and so on right so maybe that is one issue uh
00:34:40
and you also uh we could we could compare it with a a
00:34:43
more well studied uh essay assessment tool uh in the written space right
00:34:52
um so one of the limitations of ah these emerging uh sort of uh
00:34:59
emerging sort of assessment tools is you know that uh you can understand how well they talk
00:35:06
right uh in that sense but you do not know uh if they're a good team player
00:35:12
right so how would you uh how to know about it right
00:35:15
of course if you have an interviewer interviewee interaction uh then
00:35:21
based on the interaction you can find out some of these aspects uh oh but then ah
00:35:27
again the interviewer uh having an interviewer is not scalable right so one
00:35:32
uh one option uh is to have an online
00:35:36
face to face discussion where the interaction partner is actually
00:35:40
a peer right it is not um it is not an interviewer
00:35:44
uh but it is uh a peer right so two people are interacting
00:35:48
uh over a skype like platform
00:35:53
uh and at the same time also for analysis you need to uh you need to record the videos as
00:35:59
well right so we built this uh discussion platform right and then started to uh
00:36:07
ah started to just do a similar sort of study that uh you're predicting communication skill
00:36:11
at the individual level but also uh trying to predict the discussion rating at a dyad level
00:36:18
right and the hypothesis is uh if you are a good communicator
00:36:25
and you have such a peer to peer discussion maybe
00:36:28
with five or ten people right you would you
00:36:32
would of course be rated high in terms of your
00:36:35
communication skill but also you possibly have very good discussions
00:36:39
uh with with multiple people right so that's a possible a hypothesis uh of
00:36:44
course we didn't have many such interactions but the the goal was
00:36:48
uh to begin with go towards assessing um whether
00:36:51
the person uh is able to uh say interact with others
00:36:56
uh you know the which is a which is perceived as
00:36:59
say not dominating or not interrupting and and and be able to
00:37:03
listen to others and so on right so those things you can uh you can measure maybe in this uh in this setting
00:37:11
right so this was uh so it is a similar flow i'm not explaining a a lot on this
00:37:16
uh and then the other work that we uh i just mentioned was about feedback
00:37:22
uh prediction where you have a positive feedback and a negative feedback
00:37:27
and the way we uh posed the problem is uh of course i mean the motivation for this
00:37:32
work came from the fact that if you have feedback in terms of say putting people in bins
00:37:38
and say that you are in this bin or that bin right it is not really easy for
00:37:43
uh say uh interviewee to take the feedback and work on it right so that that becomes
00:37:48
a little challenging whereas if you have a direct feedback uh like this of course we
00:37:55
we did some work before we came up with say this set of feedbacks
00:38:00
by talking to people who are trainers and usually ah give feedback right and then we said ah
00:38:07
let us pose it as a problem where you just
00:38:11
uh for a given video uh you predict whether this is on or off
00:38:15
right just put it on or off and maybe for some people
00:38:19
everything is on right so they have to work on multiple feedbacks
00:38:22
maybe for some it is only one feedback right and and that that is what it is in terms of actionable uh nature right
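[Editor's sketch] The on/off framing described above can be illustrated as a multi-label decision over a fixed set of feedback items. This is only a minimal sketch under stated assumptions: the feedback item names, the score values, and the 0.5 threshold are hypothetical and are not taken from the talk, which does not list the actual feedback set or the model used.

```python
from typing import Dict, List

# Hypothetical feedback items; the talk does not enumerate the real set.
FEEDBACK_ITEMS: List[str] = [
    "speak louder",
    "reduce fillers",
    "maintain eye contact",
]


def predict_feedback(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Turn per-item model scores in [0, 1] into on/off flags.

    The scores are assumed to come from some per-video classifier;
    a missing score is treated as 0.0, that is, the feedback stays off.
    """
    return {item: scores.get(item, 0.0) >= threshold for item in FEEDBACK_ITEMS}


def actionable_items(flags: Dict[str, bool]) -> List[str]:
    """List the feedback items that are switched on for one video."""
    return [item for item in FEEDBACK_ITEMS if flags[item]]


# Example scores for one video (made-up numbers for illustration).
example_scores = {
    "speak louder": 0.8,
    "reduce fillers": 0.2,
    "maintain eye contact": 0.6,
}
example_flags = predict_feedback(example_scores)
print(actionable_items(example_flags))
```

For one candidate every item may be on, for another only one, which is what makes the output directly actionable compared with a single overall bin.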
00:38:32
okay so apart from these studies we also
00:38:37
did ah one um or rather two
00:38:42
works where we wanted to see that this uh sort of makes sense
00:38:47
uh in the real world right so we worked with two startups right uh
00:38:53
these are two startups so this um the startup e. hunch
00:38:58
right so they are in the space of ah in the space of hiring
00:39:03
uh hiring people uh or selecting people uh for a particular profession save face
00:39:11
uh and so on right so they they did have some clients where um they did receive lots of applications
00:39:18
and they can they can devise a a test and then maybe based
00:39:23
on the profession they have to ah select people who can be subsequently
00:39:28
uh interviewed face to face and so on right so this was uh this company and uh what we did
00:39:35
uh so i mean they had created a business model right um
00:39:41
they were doing it manually in the sense videos will come in overnight i mean they just gave
00:39:47
uh i say two days time right so they have to do this
00:39:50
manual uh rating of every person um within two days right and
00:39:55
uh i mean over time they found that uh it is not going to be viable in the sense uh they don't have enough people to
00:40:03
handle this so they approached us and uh we gave them this
00:40:07
prosodic model that i talked about uh just to um just use prosody features
00:40:12
um for predicting the communication skill uh rating maybe on a scale of one to five right and they found that
00:40:19
uh that was giving them a very good starting point right
00:40:23
so they in fact used the model and ah
00:40:27
and started uh getting benefits from it right so that's one uh or rather a small success story or whatever
00:40:33
right and we also approached another uh another startup uh
00:40:37
which was working on face to face interviews
00:40:41
right um but uh they had a slightly different business model their business model was based on feedback
00:40:47
right so they uh they have a part time say expert
00:40:53
who will do a interview with a with a candidate in a college right
00:40:59
and the student pays for that interview they they
00:41:02
they uh pay money for this interaction like a mock interview
00:41:06
and the mock interview could be for hard skills
00:41:10
as well as for soft skills so they give back feedback saying
00:41:14
okay maybe you have to uh include this or the other uh so they have the c. v. and start asking questions
00:41:20
and they will give feedback both on hard skills as well as on uh soft skills right so here uh so
00:41:28
they already have a working business model and the challenge
00:41:31
was um their margin is very low in the sense whatever money the student uh
00:41:37
uh gives it goes uh to the uh expert
00:41:45
right and then little bit to the platform which is doing the storing and
00:41:48
so on right so um their margin is very small so we were trying to explore if
00:41:53
the asynchronous video interview could be uh could be something that
00:41:59
could add value to the uh interviewee or the one who is uh trying to get training right
00:42:05
so as soon as we talk to them and they did try a
00:42:09
few uh user experiments and so on what they realised was um
00:42:15
so that here in terms of assessment this makes sense but for
00:42:19
a person who is uh trying to get feedback right for them um
00:42:24
i mean this mechanical ah nature of one question after the other
00:42:28
uh that was very uh very boring right uh i get a question i answer i get a question i answer
00:42:35
uh and so on right unless of course if a feedback quickly follows it's a it's a
00:42:39
good thing but then um somehow it was a little mechanical type of
00:42:47
an interview right and they in fact asked us to work on the problem which is uh can you work on a
00:42:53
sort of follow up question generation right so if you have a question
00:42:58
and you have an answer right now can ah can you come up with
00:43:02
a possible follow up question uh which can be generated based on the
00:43:08
answer that the person has given right somehow i mean that brings in the interactive nature
00:43:12
to the whole uh interviewing process right so it just means
00:43:17
that yeah there is somebody who is responding to the
00:43:20
answer that the person has given right so based on this we started
00:43:24
working on this problem to begin with we uh we looked at the written setting where
00:43:29
everything is in text form right uh in the future we can extend it to uh the
00:43:34
spoken format where you can do the a. s. r. and then do the whole stuff right so
00:43:39
what we did ah well this is more recent work was to
00:43:44
uh somehow um summarise this ah this paragraph with uh
00:43:50
one or two sentences and the sentence has to
00:43:53
be such that um it has to be different right in the sense if you uh if you have a question and an answer
00:43:59
and you pick up a part of the answer which is very
00:44:03
much related to the question uh then you are almost asking a question similar to the
00:44:08
question that was previously asked right so our first step was to somehow um find out
00:44:15
a sentence uh here which is dissimilar to this uh question and possibly and
00:44:21
so it is unrelated right ah but then it is part of the answer
00:44:26
and then uh so here you see these are the sentences for this which are chosen right
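[Editor's sketch] The selection step described above, picking a sentence from the answer that is dissimilar to the question, can be sketched as follows. The talk does not say which similarity measure was used, so this sketch assumes a plain bag-of-words cosine similarity as a stand-in; the function names and the example question and answer are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(answer: str) -> List[str]:
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def least_similar_sentence(question: str, answer: str) -> str:
    """Return the answer sentence with the lowest similarity to the question."""
    sentences = split_sentences(answer)
    if not sentences:
        return ""
    q = bag_of_words(question)
    return min(sentences, key=lambda s: cosine(q, bag_of_words(s)))


question = "What is your favourite programming language?"
answer = "My favourite programming language is Python. I also led a robotics club in college."
print(least_similar_sentence(question, answer))
```

The chosen sentence would then be handed to a separate question generation model; that stage is not sketched here.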
00:44:33
and then we also have some focus are broken down to one and then we use the uh uh
00:44:39
a model a question generation qg-net model which was already pre trained and uh
00:44:44
uh was used in a particular application we used that uh model um to our new
00:44:52
of course the model itself needs a couple of data sets for it so we used this squad data set as well as an amazon
00:44:59
uh data this is from stanford is uh i misunderstood which was
00:45:02
so later we found that this was pretty good for our task
00:45:06
right so this qg-net model is a bi directional l. s. t. m. model and uh
00:45:12
it is trained on a corpus which is an educational uh corpus right so as you can see uh
00:45:18
that is uh pretty interesting right so uh in terms of
00:45:22
in terms of what we observe uh somehow using the summarisation step
00:45:27
uh that turns out to be important so uh the follow up question generation
00:45:34
with summarisation uh seems to be better right uh compared to
00:45:40
uh without summarisation and we also found that uh
00:45:45
i mean you can use existing data sets which have questions and answers for this task
00:45:50
and get uh reasonably good results right so we did go ahead and
00:45:53
evaluate uh evaluate these questions uh in terms of grammar and relevance right and um
00:46:01
and we could get fairly decent results some eighty five percent or whatever
00:46:06
right so we were able to uh we were able to get pretty good uh
00:46:11
scores in terms of uh relevance and uh grammar
00:46:16
grammar slightly better but relevance also was not bad okay
00:46:21
so this was an interesting problem we stumbled upon because
00:46:25
we interacted uh with the startup company right so that was
00:46:30
good and uh just to sort of wrap up this uh this aspect right so uh
00:46:38
once we started looking at the follow up question and so on so we started also thinking about i mean what should be the
00:46:44
basis of the follow up question can it be just learned using some
00:46:49
data and so on and so i mean there are some interesting ah
00:46:53
interesting papers uh in the space more from the personnel uh
00:46:58
psychology literature right so which i doubt i'll mention just either
00:47:01
uh into a a part report so
00:47:07
one thing that is very clear is that ah structured interviews
00:47:11
are better than unstructured interviews by structured interviews we mean
00:47:16
um questions which are fairly standardised you go through those questions
00:47:21
versus say unstructured interviews where the interviewer just comes up with
00:47:25
his or her own questions right so it is very well it is fairly proven
00:47:30
that uh structured interviews reduce the um bias
00:47:34
so in terms of discrimination of say people by gender uh or
00:47:39
race uh and so on age and so on so
00:47:42
the discrimination ah was found to be less uh less in
00:47:48
ah less in structured interviews versus unstructured interviews
00:47:52
uh our interview setting is actually a structured interview in that sense right um
00:47:59
and if you see the flow right uh the follow up question is going in
00:48:02
the opposite direction it is not a structured uh interview in that sense right so it is
00:48:10
uh it is becoming an unstructured type of interview right so the
00:48:15
very i mean one has to marry these uh things or somehow
00:48:20
um get um a social psychological basis for this right i mean
00:48:26
there is a need for structured interview but there are some works which also
00:48:29
uh say that there are certain limitations uh certain limitations with the
00:48:34
structured interview which can be maybe managed with unstructured interviews maybe both should be there right so
00:48:41
and uh of course the questions uh what goes as a valid question uh in the uh
00:48:48
in the set of questions that are being asked in a structured interview there's also a basis for that
00:48:53
there are questions about past behaviour a situational and so on and and sort of some of the logic that is
00:48:59
maybe if you reveal your intention uh that also determines your future behaviour
00:49:04
and so on right because uh if you see uh you know that uh interviews are
00:49:09
uh meant to sort of get some understanding of how the person is going to behave in his job right
00:49:15
uh and so on so i mean i would suggest reading this if you're interested um
00:49:20
where they talk about impression management follow up questions by humans so there is i mean
00:49:26
follow up questions uh with humans when you have a structured interview uh there
00:49:31
shouldn't be many follow up questions maybe one is good but not many because the more it becomes a
00:49:38
it loses the structured nature itself because um i mean
00:49:45
i mean in some sense uh you want to control the interviewers to just stick to
00:49:49
stick to assessing the thing that they want to assess but not
00:49:53
get into a lot of um maybe their own biases and so on right
00:49:57
okay so maybe a future goal will be to somehow
00:50:03
incorporate um incorporate some of these uh studies as well yeah
00:50:09
so this is the main part of work that i wanted to
00:50:12
talk about but i'm going to now just flash uh some ah
00:50:20
some related work that we have done just to give you
00:50:23
a ah intro right so we also looked at a presentation skill
00:50:29
as in uh looking at how engaging a person is when presenting and how engaged
00:50:37
the students are and so on we have ah we
00:50:40
have also ratings in terms of content and delivery
00:50:44
uh that was given by the uh students or the participants right so that was uh in a day long seminar
00:50:51
and the people were uh presenting one after the other they were also rating each other and so on so this is
00:50:56
ah one word biting chew visitor uh uh you happen but but then you are
00:51:04
so of course you can see uh openpose and openface in action uh
00:51:13
and so we wanted to look at not just the classroom setting but
00:51:17
also mooc setting engagement right ah so this was participating in a challenge uh
00:51:25
written written communication skill not just in terms of like text but also handwritten
00:51:31
right maybe uh it's it is possible to uh take a picture of
00:51:36
a notebook that a student writes ah and be able to assess
00:51:41
ah the communication skill maybe you can have an automatic english teacher and
00:51:46
so on right uh so this is one area where we started working and
00:51:51
uh i'll mention this a data set it is interesting so
00:51:55
this is a p. c. i. data set a annotated set
00:51:59
and uh it so don't doubt that well it has these uh don't roots
00:52:04
uh here and also the overall rating and so on and it's a very challenging
00:52:09
a data set right and i think this is sort of a reasonably close to how maybe
00:52:15
the notebooks of students are going to look like right and if you can do uh
00:52:21
do handwriting recognition as well as rating uh that would be an interesting problem
00:52:28
and uh well
00:52:34
so i'm gonna conclude in the next five minutes or so
00:53:17
okay so this shows uh this sort of shows the direction in
00:53:26
which uh we want to go which is a which is to be able to do the um
00:53:32
employment interview using an agent right and uh maybe start to have a
00:53:37
follow up question generated and ask the question and so on that is uh
00:53:42
that is a direction in which we want to go and we also uh
00:53:49
ah
00:53:55
okay so here uh i was mentioning about the language assessment uh as well as language teaching so
00:54:01
this is uh there's one platform where we're trying to see if we can have a language
00:54:07
teaching and assessment tool where somebody comes maybe
00:54:12
they register themselves as a user and uh they can do multiple
00:54:16
sessions interacting with the agent and maybe learn a few words in any language
00:54:21
maybe converse in a home language maybe english then french
00:54:25
uh and then maybe also assess ah subsequently uh
00:54:31
in terms of uh how well they learnt and so on and so this
00:54:35
is something that we're building and we think this could be an interesting uh application
00:54:47
yeah
00:54:52
if
00:54:58
uh huh so uh here you see so well we say it's it
00:55:04
tori sentence i would say but the idea is if somebody speaks
00:55:08
ah do the a. s. r. ah maybe simplify the sentence
00:55:13
translate the sentence into sign language and be able to uh do it with
00:55:18
the agent right so this is the sign language synthesis work i was just mentioning
00:55:23
and uh of course he uh the the the challenges uh
00:55:29
in its current form right so what i
00:55:31
uh showed you uh has reasonable limitations
00:55:39
so there's a lot of manual work that went in to create these uh
00:55:44
these gestures but it would be ideal if we can have um say vision
00:55:49
uh vision helping in terms of getting the three d. joint locations
00:55:54
right given a sign if an expert performs this uh signing uh
00:55:59
if you can get the three d. joint locations and be able to transfer it
00:56:02
uh to the agent and then ah make it not concatenative but parametric and so
00:56:07
on ah i think ah it is an interesting space we started uh working on right
00:56:14
where you can have maybe a person who's hearing impaired they can follow a lecture
00:56:20
or uh they can be part of an interaction right so that is the goal and here there are a lot of challenges
00:56:27
uh in terms of text summarisation translation less data uh and so on okay
00:56:37
so i'll stop here i think we also have done some work in a few related areas but i



Conference Program

Idiap Speaker Series: Investigating Multiple facets of communication skill assessment and feedback
Dr. Dinesh Babu Jayagopi, Assistant Professor at IIIT Bangalore
June 13, 2019 · 11:06 a.m.
230 views
Q&A
Dr. Dinesh Babu Jayagopi, Assistant Professor at IIIT Bangalore
June 13, 2019 · 12:03 p.m.
200 views
