Transcriptions

Note: this content has been automatically generated.
00:00:00
coming. So my name is Andy, I'm from EPFL, from Lausanne,
00:00:04
where my research area is interaction systems, in a group called REACT.
00:00:09
And normally I wear like three hats in my life. One hat is doing research, publishing
00:00:14
papers. Another hat, which I actually enjoy, maybe even more now,
00:00:19
is doing some software engineering, mainly web and mobile, or some data back-end stuff,
00:00:24
mainly for research purposes: building some applications, testing them with users and trying to better understand their behaviour.
00:00:31
And another hat is entrepreneurial: I've been in two startups, they all failed by now, so anyway,
00:00:37
it was a nice experience, and I still remember it well. So...
00:00:44
when you're coming
00:00:48
okay
00:00:51
Okay, so anyway, thanks for coming, everyone, and thanks for inviting me.
00:00:58
It's my second time here. It's a pleasure being here; I really like coming, talking to people and sharing ideas.
00:01:04
Today I'll be talking about how Elasticsearch can be used to build
00:01:08
a recommender system in your mobile application or your web application, basically to provide
00:01:13
some additional value to users, and I will show you that it's basically super easy
00:01:18
to do it if you already have Elasticsearch up and running.
00:01:22
So this I'm not going to repeat, maybe just my name: I'm Andy, and
00:01:26
I'm doing lots of different stuff in life, even more, but not related to my profession.
00:01:31
So basically, let's start with why I focus on recommendations.
00:01:36
If you look at modern web platforms, there are a few main reasons why recommendations exist.
00:01:42
The first one is basically to help platforms make more money by increasing engagement,
00:01:47
so you can increase page views, you can make your users stay longer
00:01:52
on your website or interact more with your platform, like increasing the number
00:01:57
of page views or screen views and so on. The second one is to
00:02:02
overcome information overload. I mean, everyone here knows
00:02:06
that in social media or on websites, in online
00:02:10
markets, whatever you have, there's so much content that it's very hard to
00:02:14
find what you're interested in. So one way of
00:02:17
addressing this, and I'll actually show how it's possible to do it, is
00:02:20
to implement some kind of recommender system. And finally,
00:02:24
the third one is somehow related: it's information findability,
00:02:29
mainly in content repositories that you can find inside enterprises or big organisations where you have lots of files:
00:02:35
to assist the user in finding something which is useful for him
00:02:40
but which he might otherwise never find. I will show you some examples, so as not to stay too abstract at this point.
00:02:47
So just a very brief intro on how recommender systems work.
00:02:52
that's all that's great then i really need this light that's amazing
00:02:56
At a high level there are three types of recommender systems, if you look at them: content-based,
00:03:01
collaborative filtering, and something called hybrid, which is in essence
00:03:05
some kind of combination of content-based and collaborative filtering.
00:03:09
Now let's check how each of them works, more or less. So if you have a content-based recommender system:
00:03:15
imagine you have some web platform and you have a user who
00:03:18
interacts with some type of web page, with some content item.
00:03:23
And using some similarity metric based on content, so if you have two web pages you can
00:03:29
compute some similarity, you have another web page which is similar based on the content.
00:03:34
It can be the name, or it can be, say, the text inside of a book, so
00:03:37
this is content-based. And since this user interacts with this file,
00:03:42
so he kind of likes this file, and this file is similar to that one, the hypothesis is that
00:03:47
the user might be interested in the other file as well. And that's a content-based recommender system in essence.
00:03:54
Now, another way is collaborative filtering. In this case the user also interacts with some content items: web
00:04:01
pages, or on Spotify it can be music, or videos, this kind of stuff,
00:04:07
not necessarily containing any text, you just know the content type. And you also have other users
00:04:12
around the platform, maybe another guy listening to some Spotify music, and he listens to similar
00:04:18
songs to the ones you do, so there are many songs that overlap: you listen to hard rock,
00:04:23
this guy listens to it too; you listen to, I don't know, a bit of jazz, this guy listens to it too.
00:04:28
So the assumption in this case is that this guy is somehow similar to you, because you have similar tastes,
00:04:33
and then the recommendation would be provided in the following way: you know that a guy who is similar to
00:04:37
you also listens to Pink Floyd, for instance, so it makes sense to recommend it to the original user.
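A minimal sketch of the user-user idea just described, under stated assumptions: the listening data and song names are invented for illustration (only Pink Floyd comes from the talk), and Jaccard overlap is one possible similarity measure, not necessarily what any real platform uses.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of items: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, listens, k=1):
    """Recommend items consumed by the k users most similar to `target`."""
    mine = listens[target]
    neighbours = sorted(
        (user for user in listens if user != target),
        key=lambda user: jaccard(mine, listens[user]),
        reverse=True,
    )[:k]
    scores = Counter()
    for user in neighbours:
        weight = jaccard(mine, listens[user])
        # Only items the target user has not interacted with yet are candidates.
        for item in listens[user] - mine:
            scores[item] += weight
    return [item for item, _ in scores.most_common()]


# Hypothetical interaction data: no text features are needed, only who listened to what.
listens = {
    "you": {"hard_rock_1", "hard_rock_2", "jazz_1"},
    "similar_guy": {"hard_rock_1", "hard_rock_2", "jazz_1", "Pink Floyd"},
    "someone_else": {"pop_1"},
}

print(recommend("you", listens))
```

Note that nothing in this sketch looks inside the content itself, which is the contrast with the content-based approach.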
00:04:43
I don't know if I made it clear, it can be a bit complicated, but the idea is that with content-based,
00:04:48
here, you need to have actual features, like some text,
00:04:51
to compute similarities, unless you use some other similarity metrics.
00:04:57
uh_huh
00:04:59
i
00:05:01
Yeah, some kind of metrics, but mainly there should be some type of content feature inside.
00:05:06
Here it's not necessary: you can have almost zero knowledge about the content type and what is inside.
00:05:11
Take a song: it's quite hard to quantify it, although some people do some frequency
00:05:15
analysis and so on, but still this way of recommending things will work.
00:05:19
And the hybrid approach tries to combine these in some way.
00:05:23
This is just a very basic intro. In my talk I will focus on
00:05:27
content-based recommendations with Elasticsearch, and I will show it
00:05:32
using the example of a web platform that we developed in the lab; it's called Graasp,
00:05:38
a social media platform and a collaborative work and knowledge-sharing environment in essence,
00:05:43
so you can also think of it as an advanced content management system.
00:05:49
Okay, so just to get more specific, these are the technologies that we're using.
00:05:56
We use MongoDB, we use Express, AngularJS and Node, so
00:06:00
it's called the MEAN stack, a more or less usual stack. I won't focus on it a lot, just maybe
00:06:05
later I'll refer to it to show how we integrate Elasticsearch into all this. But
00:06:10
I think, to get really specific, I will just make a short demo of the platform.
00:06:15
It's available for free; we use it in an educational context, and we also use it with
00:06:19
humanitarian organisations to help them with knowledge sharing inside the organisation.
00:06:24
So I'll just make a short demo and show how recommendations are implemented inside of the
00:06:28
platform, and then I will go into more detail on how these recommendations actually work.
00:06:33
maybe i'll just drop some share for that reason why seat
00:06:39
so you can see actually was like maybe oh just also take my laptop uh_huh from singer it's all off
00:06:55
So yeah, I was just logged out for a second, so I'm logging in again to show you.
00:07:00
So the platform is called Graasp; I'm just here
00:07:04
for demonstration purposes, I can just log in quickly.
00:07:14
And in Graasp everything is organised in kind of spaces, which
00:07:18
are folders containing content. For instance, this is
00:07:22
a demo malaria space which contains different types of files. You can integrate,
00:07:27
like, Wikipedia pages; you can integrate different types of content, basically coming from
00:07:33
either your computer, say PDF files, Microsoft files, or coming from online; for instance you can integrate
00:07:39
YouTube videos. So the idea here is that a space is some kind of place where your team
00:07:44
can integrate various types of content coming from different sources. So that's one point.
00:07:51
And then where we have recommendations here is, for instance, let me show here.
00:07:57
We have this: if you open one content item, this PDF
00:08:01
file, you can go here into this Related tab.
00:08:06
It takes a bit of time, and I'll talk about why, and you
00:08:09
see other content items on the platform that are similar to this
00:08:14
PDF report, but it also takes into account the permissions established
00:08:19
in the platform, because of course it's important that if there is some
00:08:21
content item to which you don't have access, we don't recommend it. So
00:08:25
it's another aspect of recommender systems to take into account:
00:08:29
you need to really take privacy seriously, so as not to leak anything by chance
00:08:35
to users that they're not expected to see. So this is
00:08:38
one way of using a recommender system: recommending relevant content.
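One way to combine similarity with permissions, sketched under assumptions: the talk does not say how Graasp enforces access, so the `allowed_user_ids` field, the index name and the field list below are hypothetical. The sketch only builds an Elasticsearch request body, using the `more_like_this` query that the talk introduces later, wrapped in a `bool` query whose `filter` clause restricts hits to items the current user may see.

```python
def related_items_query(item_id, user_id, index="items"):
    """Build a request body for 'items similar to item_id that user_id may see'.

    `allowed_user_ids` is a hypothetical keyword field listing who can access
    each indexed item; adapt it to however permissions are actually stored.
    """
    return {
        "query": {
            "bool": {
                # Scoring part: similarity to the currently opened item.
                "must": [
                    {
                        "more_like_this": {
                            "fields": ["title", "description", "content"],
                            "like": [{"_index": index, "_id": item_id}],
                            "min_term_freq": 1,
                        }
                    }
                ],
                # Non-scoring part: drop everything the user cannot access.
                "filter": [{"term": {"allowed_user_ids": user_id}}],
            }
        }
    }


body = related_items_query(item_id="42", user_id="user-1")
```

Putting the permission check in `filter` rather than filtering results afterwards means inaccessible items never reach the result list at all.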
00:08:42
It's fairly straightforward. We can do it for a document, and we can also do it
00:08:47
for a space, where you get the related documents not only for one file but, for example,
00:08:55
for a folder just as well. So this is something we call contextual recommendations: recommendations
00:09:02
that work in the context of the interaction. Another type of recommendations we have here in Graasp
00:09:08
is personalised recommendations. I will explain how it works in detail, but the idea here is that
00:09:15
we automatically identify user interests based on user interaction with the platform. So here
00:09:20
you see this is my profile in Graasp, and based on my interactions
00:09:24
the system thinks I'm interested in knowledge management, entrepreneurship, education and so on.
00:09:30
This is all done automatically, and I will talk about how we can do it automatically.
00:09:35
And based on this profile that is constructed in the system, the system can recommend to me
00:09:42
different content that matches my interests. So you see something about knowledge,
00:09:46
trends in knowledge management and so on. And as well, the system is able to recommend to me people with similar interests.
00:09:53
This is driven more by a demand from MSF, where they have an organisation with lots
00:09:57
of people, and the employees of MSF would like to see
00:10:02
people from inside MSF with similar interests so they can get in touch, because
00:10:06
the organisation is very distributed. MSF is Doctors Without Borders; we work with them.
00:10:13
So again, I just briefly showed these types of recommendations, and I'll come
00:10:16
back to my presentation.
00:10:26
Yep, so just to sum up what you've seen in the previous,
00:10:30
in the demo: there are two types of recommendations in Graasp. Contextual:
00:10:34
similar content to a specific content item. And the second type is personalised, where the
00:10:39
recommendations are provided not for content but for a specific user.
00:10:43
And you can actually combine both; we don't do it right now, but you can do both contextual and personalised,
00:10:50
in the context and for you. That's outside of the scope of this talk.
00:10:55
Now, how do we do these contextual recommendations? I already
00:10:59
told you briefly why we do them, but just to
00:11:04
recall: the main scenario for us was teachers. We work
00:11:07
with teachers, and teachers assemble spaces for educational purposes.
00:11:11
So you want to have a space to educate your students about nuclear physics.
00:11:17
To assist them in this process, that was the main goal of these contextual recommendations: the teacher would upload some
00:11:23
technical part of a book about nuclear physics, and our system would automatically pick up other
00:11:29
educational resources already created or added by other teachers. So we facilitate the
00:11:35
content creation or content population part of the teachers' work. That was the
00:11:40
main purpose, more or less what is written here.
00:11:44
And the first step to provide this kind of recommendations is to get the data into Elasticsearch.
00:11:49
So Graasp is a MEAN application, meaning we have MongoDB as the
00:11:53
main data storage, and then you want to get the data into Elasticsearch.
00:11:57
Fortunately, we use this object-relational mapping
00:12:02
plugin or library called Mongoose, and with Mongoose it's quite easy:
00:12:06
you have a plugin called mongoosastic,
00:12:09
which basically, it's all in the name. Again, you have
00:12:12
Mongoose, and what it does is keep track of the different
00:12:16
API requests that come from Mongoose to MongoDB, and based on these API
00:12:21
requests it makes sure that the data in Elasticsearch stays
00:12:25
up to date. It works quite well in practice.
00:12:29
So okay, we have the data in Elasticsearch, which was imported from MongoDB. Now I'm just
00:12:35
reminding you a bit how search works in Elasticsearch, because I think it's a
00:12:39
good practice, also in order to understand why Elasticsearch is good for recommendations.
00:12:45
So when the data arrives in Elasticsearch, I'm not
00:12:50
talking about how it's analysed, you remove stop words, you do mapping,
00:12:54
all this kind of stuff. But then when you start computing relevance, say when there is a query coming, or even at index time,
00:13:01
you do a few things. I think actually steps one and two should be
00:13:06
reversed. So first you need to compute TF-IDF,
00:13:09
that's term frequency, inverse document frequency, for the terms
00:13:14
present in the string, so you know how descriptive each term is
00:13:18
for a specific string in your text document.
00:13:23
Then, based on the weights you obtain
00:13:25
here, you represent the document in a multidimensional document vector space.
00:13:29
I will show it a bit more visually; it's more abstract
00:13:33
right now. So you have some vector representing a specific document in a multidimensional space,
00:13:38
and then when you want to compute similarity between the query and the document, you just
00:13:42
compute, by default at least, a cosine similarity in this multidimensional space.
00:13:51
You can change it; the similarity is actually pluggable.
00:13:56
And if we look visually at an example of how it works, maybe
00:14:00
it's not super readable, but say you have three documents: "I'm happy
00:14:04
in summer", "After Christmas I'm a hippopotamus", and the third
00:14:07
one. And you have this query, "happy hippopotamus".
00:14:13
So this query will look at the TF-IDF values that were computed for
00:14:18
these documents. Here we skip the unimportant part and just focus on these two keywords,
00:14:23
so we have "happy" and "hippopotamus", and you have some TF-IDF weights. I won't go
00:14:27
into the details of how this formula works, but what's important is that for whatever document is inside Elasticsearch,
00:14:32
when you have a query, you can compute the cosine similarity between the query, which
00:14:37
is here (this is a two-dimensional example because there are two
00:14:41
keywords), and specific documents. So you have, say, document one and document two:
00:14:47
they're not super relevant, but there is document three, which is actually
00:14:52
completely aligned with the query, so that's the
00:14:55
document which is super relevant for this specific query.
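The two-keyword example can be reproduced in a few lines. This is a sketch under assumptions: the third document is not audible in the recording, so the sentence used below is a stand-in chosen to contain both keywords, and the weighting is a plain textbook TF-IDF variant rather than Lucene's exact scoring formula.

```python
import math
from collections import Counter

DOCS = [
    "I am happy in summer",
    "After Christmas I'm a hippopotamus",
    "The happy hippopotamus helped Harry",  # assumed stand-in for the third document
]
QUERY = "happy hippopotamus"


def tokenize(text):
    """Very naive analysis: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf(tokens, vocabulary, doc_freq, n_docs):
    """Vector with one TF-IDF weight per vocabulary term (textbook variant)."""
    counts = Counter(tokens)
    return [
        counts[term] * (1.0 + math.log(n_docs / (1 + doc_freq[term])))
        for term in vocabulary
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocabulary = tokenize(QUERY)  # the two dimensions: "happy", "hippopotamus"
tokenized = [tokenize(doc) for doc in DOCS]
doc_freq = {t: sum(t in doc for doc in tokenized) for t in vocabulary}

query_vec = tfidf(vocabulary, vocabulary, doc_freq, len(DOCS))
similarities = [
    cosine(query_vec, tfidf(doc, vocabulary, doc_freq, len(DOCS)))
    for doc in tokenized
]
best = max(range(len(DOCS)), key=similarities.__getitem__)

for doc, sim in zip(DOCS, similarities):
    print(f"{sim:.3f}  {doc}")
```

Documents one and two each share only one dimension with the query, so their vectors sit at an angle to it, while the third points in the same direction as the query.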
00:15:01
Okay, and using the same reasoning you can actually get into recommendations, because in essence it's the same idea.
00:15:08
You have some string, or you have some piece of content (I'm talking about content-based recommendations), you
00:15:13
have some piece of content, be it a document or a string, and
00:15:17
you want to get other documents which are similar
00:15:21
based on the content of the specific document. So in essence it's the same algorithm; basically
00:15:27
it's exposed with a different API, which I will show, but the idea behind it is very similar:
00:15:32
represent the document. And you can also do similarity between people and documents; I will consider
00:15:37
it a bit later, but again the idea is exactly the same: represent
00:15:42
the item in a multidimensional space, compute some similarity metric, in
00:15:47
this case cosine similarity, and then you can rank
00:15:50
the results based on their similarity.
00:15:53
And for this purpose, to make life easier, there is this query
00:15:58
called More Like This inside of Elasticsearch.
00:16:01
That's the query that allows you to get items similar to your specific item.
00:16:07
And this query has two ways of representing things. One is text-based, where you specify a string
00:16:14
for which you want to get similar items. But the more interesting one, and the one that we will see, is document-based,
00:16:20
where you call More Like This, but instead of specifying a piece of text you specify
00:16:25
the IDs of different documents which are already indexed, saying that I want to get
00:16:32
items from your index which are similar to the first one,
00:16:37
ID one, and the second one, ID two, in this example.
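As a sketch of what such a request body can look like, the helper below builds a `more_like_this` query in the form documented for recent Elasticsearch versions, where the `like` parameter accepts indexed documents (by `_index` and `_id`) as well as free text. The index name and field names are placeholders, and the slide shown in the talk may have used an older syntax.

```python
def more_like_this_query(doc_ids, index, fields, like_text=None, max_query_terms=25):
    """Build a More Like This request body.

    doc_ids:   IDs of already indexed documents to find neighbours for
    like_text: optional free text to mix in (the text-based form)
    """
    like = [{"_index": index, "_id": doc_id} for doc_id in doc_ids]
    if like_text:
        like.append(like_text)
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": like,
                "min_term_freq": 1,
                # Only the most descriptive terms are kept; 25 is the default.
                "max_query_terms": max_query_terms,
            }
        }
    }


# Document-based form: items similar to documents "1" and "2".
body = more_like_this_query(["1", "2"], index="items", fields=["title", "description"])
```

The resulting dictionary would be sent as the body of a search request against the index.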
00:16:41
The text-based and document-based forms of the More Like This query can be combined,
00:16:45
because you have documents like this, with ID one and ID two,
00:16:49
and you have some string for which you want to
00:16:52
get similar items. So essentially, this is a quote from the Elastic documentation:
00:16:58
what this More Like This query does is that it looks into all the
00:17:01
fields that you have specified here, the fields it should look at.
00:17:06
It concatenates the text information from the fields. So
00:17:10
say you have a document with a title,
00:17:14
and maybe some content inside (think of a file that has textual content),
00:17:18
then this query will concatenate all the textual fields which are there, and then it will select
00:17:26
the terms with the highest TF-IDF, so it will compute TF-IDF
00:17:30
for every term in this concatenated string, and then it will select only the top k terms
00:17:35
which are the most descriptive for the specific document, and
00:17:39
then run a normal disjunctive, OR, query.
00:17:44
So it will run an OR query on the keywords with the top TF-IDF values.
00:17:50
I don't know if it's clear; feel free to interrupt me at any point and ask a question if
00:17:56
something is not clear. So that's the point of this More Like This query.
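The three steps just described (concatenate the fields, keep the top k terms by TF-IDF, run an OR query over them) can be approximated as follows. This is an illustration of the idea only, not Lucene's implementation: tokenisation is naive, the TF-IDF formula is a textbook variant, and the corpus, field names and `content` target field are made up.

```python
import math
from collections import Counter


def top_terms(doc_fields, corpus, k=25):
    """Return the k terms of a document with the highest TF-IDF weight."""
    # Step 1: concatenate the text of all selected fields.
    tokens = " ".join(doc_fields.values()).lower().split()
    term_freq = Counter(tokens)
    corpus_tokens = [set(text.lower().split()) for text in corpus]

    def idf(term):
        doc_freq = sum(term in doc for doc in corpus_tokens)
        return 1.0 + math.log(len(corpus) / (1 + doc_freq))

    # Step 2: rank terms by weight (ties broken alphabetically) and keep k.
    ranked = sorted(term_freq, key=lambda t: (-term_freq[t] * idf(t), t))
    return ranked[:k]


def or_query(terms, field):
    """Step 3: a disjunctive query, any of the selected terms may match."""
    return {"query": {"bool": {"should": [{"term": {field: t}} for t in terms]}}}


corpus = ["the cat sat", "the dog ran", "the nuclear physics course"]
document = {"title": "nuclear physics", "description": "the nuclear course"}

selected = top_terms(document, corpus, k=3)
body = or_query(selected, field="content")
```

In this toy corpus the common word "the" gets the lowest weight and is cut off, which is the effect the term selection is meant to have.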
00:18:02
We tried to use it. That's basically a piece of code that
00:18:07
allows you to include recommendations: if you have users,
00:18:10
you can get similar users with it; if you have books, you can get similar books with it; if you have,
00:18:17
whatever, well, audio, then using automatic speech recognition
00:18:21
you can get the text of the audio file. So it's very simple to add
00:18:26
similarity functionality to an application; that's the point
00:18:32
of this talk, basically. But that way has some problems.
00:18:38
I was working on the recommendations at the end of 2015, beginning of 2016, and then,
00:18:45
as I showed you, when the ID of a document was supplied in
00:18:49
a More Like This query, it was just concatenating strings
00:18:52
and then running the query on all the documents. The problem with
00:18:58
that way is that there is no way to boost a specific field.
00:19:02
For instance, if you do search, you want to boost the title, meaning that
00:19:07
if your query matches the title of a document, it's more important than if your query matches
00:19:12
some content somewhere on page two hundred twenty-two. I mean, that's logical. The same goes for
00:19:17
recommendations, but the API which exists, and also the implementation in Lucene,
00:19:22
which you can check here, and I did check it, didn't have any possibility to do it,
00:19:27
to boost specific fields for More Like This queries. That was one limitation.
00:19:32
Then, when I was preparing for this talk, I also checked recent developments,
00:19:36
and apparently at the beginning of 2016, like early February,
00:19:41
they switched from running one query over all of the fields to a more
00:19:45
field-by-field querying. So you would not concatenate all the fields into one string
00:19:51
and run it against all the fields; you would compare title with title,
00:19:57
description with description, content with content, which makes sense in some scenarios. In our specific
00:20:02
scenario it was not super good, because in Graasp some files have content,
00:20:08
say a PDF, but some files only have a description; say the file is a small application,
00:20:15
a web application: it doesn't have lots of content, but the user can actually provide a
00:20:19
description: this application is designed to support studying physics, and so on.
00:20:23
So in our case what we need to do is matching across
00:20:27
multiple fields. We didn't want to match just description with
00:20:30
description and content with content; we wanted to match description with content and
00:20:34
vice versa, so that approach didn't work super well for us.
00:20:38
And then what we did: we just decided to switch
00:20:41
to normal queries instead of More Like This queries
00:20:46
to compute recommendations, because essentially search is a recommendation. It's
00:20:51
just another way of looking at search: it's computing a recommendation
00:20:54
where the query is not written by a human being, but is automatically
00:20:59
built by your software based on the file. And that's essentially what this More Like This query was doing.
00:21:07
So here what we were doing: we were just manually concatenating strings, putting in some limits, because there is a limit
00:21:12
on how long your query can be, and doing this type of query across multiple fields.
00:21:18
That's a more or less usual match query, so I will not go through it: you're just boosting different
00:21:25
fields with different values. That's more or less how it works. And then,
00:21:32
as I demonstrated, that resulted in this tab where you can see relevant items like that.
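A hedged sketch of that approach: concatenate the text of an item, cap its length, and run it as a `multi_match` query with per-field boosts. The boost values, the word cap and the field names are assumptions for illustration, and the exact query Graasp runs is not shown in the talk. Because `multi_match` runs the same text against every listed field, a description can match another item's content and vice versa.

```python
def recommendation_query(item, boosts, max_terms=1000, size=10):
    """Build a boosted multi-field query from an item's own text.

    item:   dict with an "id" plus text fields (hypothetical schema)
    boosts: mapping of field name to boost, e.g. {"title": 3, "content": 1}
    """
    words = []
    for field in boosts:
        words.extend(str(item.get(field, "")).split())
    # Cap the query length; very long queries are slow and may be rejected.
    text = " ".join(words[:max_terms])
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # "title^3" means a match in title counts three times.
                            "fields": [f"{name}^{boost}" for name, boost in boosts.items()],
                            "type": "most_fields",
                        }
                    }
                ],
                # Do not recommend the item to itself.
                "must_not": [{"ids": {"values": [item["id"]]}}],
            }
        },
    }


item = {
    "id": "7",
    "title": "Nuclear physics",
    "description": "intro course",
    "content": "a b c d",
}
body = recommendation_query(item, {"title": 3, "description": 2, "content": 1}, max_terms=5)
```

The trade-off mentioned later in the talk applies here: the longer the generated query text, the slower the request.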
00:21:38
So that was the first part, where I showed you content-based contextual recommendations. And the second
00:21:43
part, the second type of recommendations that I showed you, was based on
00:21:48
identified user interests and showing something which corresponds to
00:21:52
user interests: personalised recommendations. That's what I'm going to talk about.
00:21:57
There is a long list of motivations, but basically what we were getting from
00:22:01
teachers, when we were working with the teachers that were using this platform,
00:22:06
is: if you provide recommendations to us, we want something which is understandable,
00:22:11
we want to understand why, and something which is interactive, that we can adjust. We don't
00:22:16
want the black-box approach to recommender systems which you can often see, where you
00:22:20
don't understand why something is being recommended and you don't know what the user
00:22:25
model inside of the system is. Those were the two main drivers of our design.
00:22:32
And based on this we decided to build a recommender system that is
00:22:37
based on user interests which are automatically identified. So we were saying: okay,
00:22:42
let's say we explicitly have a vector of user interests, a list of user interests, saying I'm interested
00:22:47
in this and that and that; this you can find on some platforms.
00:22:51
And based on this vector we can build the recommendations using
00:22:54
content-based recommendation. But then the question was how to automatically
00:22:58
build this user interest profile. Because you can ask the user: hey user, please input what you're interested in,
00:23:05
but studies show that more than seventy percent of users will never input anything. And you actually don't need studies for that:
00:23:11
we know from our users that if you ask for lots of manual input, then most of the users will never do the input,
00:23:17
and then they will not have these recommendations, so you don't know if
00:23:21
it's useful or not. So we were thinking of doing this thing automatically.
00:23:26
I'll just try to go through it quickly, and then if you have questions you can just ask me and I will come back.
00:23:33
Because it's a technical meetup, there should be some crappy technical diagram, and that's what this is. We needed to build a pipeline
00:23:40
for extracting content from multiple content types. So when you upload
00:23:45
some content to Graasp, be it a plain text file, some PDF, or
00:23:50
a video (this we didn't do, but it's the same idea),
00:23:54
we built a pipeline that would get the text information and metadata from these files and put it into
00:23:59
MongoDB and then into Elasticsearch. For plain text files it's easy:
00:24:04
you just read the text, you do content analysis, you extract some features,
00:24:09
and then you can do recommendations. For something like images,
00:24:14
you also want to do content-based recommendation, and then you need to extract the text if it's present in the image.
00:24:20
For this reason we used multiple open-source libraries; maybe some of you are familiar with them.
00:24:25
They're quite good. Apache Tika allows you to extract textual
00:24:28
information from multiple content types: PDFs,
00:24:32
PowerPoint, you name it; they support lots of file types.
00:24:38
Tesseract is a very nice open-source OCR,
00:24:43
optical character recognition, that allows you to scan files. It works surprisingly well;
00:24:48
it was trained by Google at some point, so for English at least it works very well, and it's trainable for other
00:24:54
languages as well; I think it also works for French. And for some natural language processing stuff,
00:25:01
which I already mentioned, with the concepts and so on, we used
00:25:04
AlchemyAPI, before it became part of IBM Watson.
00:25:08
Again, if you have questions later I can explain it, because it would take really lots of
00:25:13
time to cover. So we don't just focus on keywords,
00:25:18
because user interests should be quite high-level, right? You would
00:25:21
say I'm interested in data science and analytics; you wouldn't say
00:25:25
I'm interested in this TF-IDF algorithm, you know, when it's applied to educational data.
00:25:31
So that's how users think about their interests: they think in high-level
00:25:35
abstraction categories. That's why it's not enough to
00:25:39
just do keyword-based analysis of your content.
00:25:43
So what we were doing is something called concept extraction or concept
00:25:47
identification, where automatically, instead of using the pure keywords of your content, you
00:25:54
analyse the keywords and try to derive some high-level abstractions, how humans would
00:26:00
think when they read a book or an article, using high-level concepts. We didn't build it ourselves; there are
00:26:06
APIs; you can train your own algorithms, but there are APIs which are quite good and quite fast to put in place.
00:26:12
So what is important to take from this is that you have multiple files on the platform, PDFs, PowerPoint and so on.
00:26:19
For each file you can do, using the pipeline that I showed you, some concept identification, so we know
00:26:24
that this PDF file is about education and so on,
00:26:28
and then, based on those items (we also track how users interact with the platform, so we know
00:26:34
that Andy visited this file, downloaded that file, that kind of thing),
00:26:39
and by some combination of these different things, using some models,
00:26:43
that's how we were doing it, you can build the final profile
00:26:49
of the user, basically by adding things up, and just get
00:26:53
this final profile, which you can see here on the screen
00:26:57
and which I showed you. It turned out to be meaningful, though not for everyone; we did some
00:27:04
checks with users on what the accuracy was, this kind of stuff, so there
00:27:08
are problems with some misidentified concepts, but I can also talk about that.
00:27:13
But in general it was meaningful, at least for me and for the people who interacted with it.
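To make the profile-building idea concrete, here is a deliberately simple sketch. It is not the model used in Graasp, which the talk does not spell out: it just sums concept relevance scores over the items a user interacted with, weighted by interaction type. The weights, file names and concept scores are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical weights: a download is taken as a stronger signal than a view.
INTERACTION_WEIGHTS = {"view": 1.0, "download": 2.0}


def build_interest_profile(interactions, item_concepts, top_n=5):
    """Aggregate per-item concepts into a ranked list of user interests.

    interactions:  list of (item_id, action) pairs for one user
    item_concepts: item_id -> {concept: relevance in [0, 1]}
    """
    scores = defaultdict(float)
    for item_id, action in interactions:
        weight = INTERACTION_WEIGHTS.get(action, 0.0)
        for concept, relevance in item_concepts.get(item_id, {}).items():
            scores[concept] += weight * relevance
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [concept for concept, _ in ranked[:top_n]]


item_concepts = {
    "report.pdf": {"knowledge management": 0.9, "education": 0.4},
    "slides.ppt": {"education": 0.8, "entrepreneurship": 0.5},
}
interactions = [("report.pdf", "download"), ("slides.ppt", "view")]

profile = build_interest_profile(interactions, item_concepts)
```

The resulting list plays the role of the interest vector: it can be indexed as the text of a user document and matched against item documents like any other content.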
00:27:19
And then, in a similar way, we just did the match
00:27:23
query on multiple fields, the same as we did with content-based
00:27:28
recommendation, because this is essentially also content-based recommendation, but
00:27:32
cross-type: you have documents coming from the
00:27:36
user type and you match them with documents coming from the item type, let's say it like that.
00:27:42
And based on that we have the Suggested tab,
00:27:45
which is personalised recommendations, and we have this Similar tab,
00:27:49
which is users that are content-based similar to you, where the
00:27:53
content is a vector of your interests, essentially.
00:27:57
I'm just finishing my presentation. So here I just talked about
00:28:02
two types of recommendations: contextual recommendations, based on the content of an item or a collection of items,
00:28:08
and personalised recommendations, where you automatically identify user interests and then do
00:28:14
content-based recommendation based on this vector of user interests.
00:28:18
And I discussed a bit More Like This versus match queries. The shortcoming of using a match
00:28:25
query instead of More Like This, at least the way we're doing it, is that it's a bit slower.
00:28:29
So maybe in the demo you saw this rotating thing for a bit, like three or four seconds.
00:28:35
That's because we build really long queries, and it takes lots of time to match.
00:28:40
With More Like This there is a cap: they select just the twenty-five, by default,
00:28:45
most descriptive terms and run the query on twenty-five terms, which is much faster than running queries with, I think,
00:28:51
a maximum of something like two thousand plus terms, as we do. So there is room for optimisation.
00:28:57
And some possibilities: if you are going this way, one
00:29:01
thing is that it's easy to add highlights. Highlights are,
00:29:06
basically: when you use a More Like This query or just a match
00:29:09
query, the Elasticsearch engine will return to you
00:29:13
the matching keywords. So with this you have the potential to explain to the user
00:29:17
why a specific thing was recommended, which is often the problem with recommender systems.
00:29:22
It's called the problem of interpretability or explainability: understanding why something is being recommended.
00:29:29
It's recommended, but why? With this, using highlights based on content, you can actually
00:29:35
show the user what the specific part of interest is that is matching with
00:29:40
the document, or matching with the user. So it's kind of cool.
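A small sketch of how highlights can feed an explanation. The `highlight` section of a search request and the per-hit `highlight` object in the response are standard Elasticsearch features; the field names and the sample hit below are made up, and how Graasp presents explanations is not shown in the talk.

```python
def with_highlights(query_body, fields):
    """Return a copy of a search body that also asks for highlighted matches."""
    body = dict(query_body)
    body["highlight"] = {"fields": {field: {} for field in fields}}
    return body


def explain_hit(hit):
    """Turn the highlight fragments of one search hit into readable reasons."""
    reasons = []
    for field, fragments in hit.get("highlight", {}).items():
        for fragment in fragments:
            reasons.append(f"matched in {field}: {fragment}")
    return reasons


search_body = with_highlights(
    {"query": {"match": {"content": "nuclear physics"}}},
    fields=["title", "content"],
)

# Shape of a hit as returned by Elasticsearch (values invented for the example).
sample_hit = {
    "_id": "7",
    "_score": 1.3,
    "highlight": {"content": ["an introduction to <em>nuclear</em> <em>physics</em>"]},
}
print(explain_hit(sample_hit))
```

By default the matched terms come back wrapped in `<em>` tags, which a front end can render as the visible reason for the recommendation.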
00:29:44
one more thing, has anyone used percolator or knows percolator
00:29:50
yeah it's an advanced thing in elasticsearch kind of allowing you to set up alerts
00:29:58
it's very interesting, so you can set up an alert and when
00:30:02
a document, so what it is is a query, you can set up a query
00:30:05
and then let's say another user uploads documents, and when one of the
00:30:10
newly uploaded documents matches your query you can trigger something
00:30:15
so it's like triggers in databases in some way, in the backend
00:30:19
so it's nice if you have this vector of user interests because
00:30:22
you can actually set up this percolator based on the vector of interests, and when a new file is being uploaded
00:30:28
and this file matches the interests of a specific user you can inform the user saying hey check it out
00:30:34
joan or whoever uploaded this new file which we think might be interesting to you, so it's also another
00:30:40
way of engaging users and making sure they come back to your platform from time to time
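The alert idea described here can be sketched as three request bodies: a mapping with a `percolator` field, a stored query per user, and a `percolate` search run when a new document arrives. The field names `query`, `content` and `user_id` and the helper names are assumptions for this example. The mapping shape follows recent Elasticsearch versions; older versions nest `properties` under a type name.

```python
from typing import Any, Dict, List


def percolator_mapping(text_field: str = "content") -> Dict[str, Any]:
    """Index mapping: one field stores queries, one describes the documents."""
    return {"mappings": {"properties": {
        "query": {"type": "percolator"},
        text_field: {"type": "text"},
    }}}


def interest_alert(user_id: str, interest_terms: List[str],
                   text_field: str = "content") -> Dict[str, Any]:
    """Document to index: one user's interests stored as a match query."""
    return {
        "user_id": user_id,
        "query": {"match": {text_field: " ".join(interest_terms)}},
    }


def percolate_request(new_document: Dict[str, Any]) -> Dict[str, Any]:
    """Search body asking which stored queries match a newly uploaded document."""
    return {"query": {"percolate": {"field": "query", "document": new_document}}}


# Hypothetical user and upload, for illustration only.
alert = interest_alert("joan", ["malaria", "vaccination"])
request = percolate_request({"content": "field report on malaria prevention"})
```

Each hit of the percolate search is a stored alert, so its `user_id` tells you whom to notify about the new file.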
00:30:46
and also i didn't talk about collaborative filtering, which as i showed
00:30:49
to you is based on user-user similarity, and user
00:30:54
user similarity you can do with elasticsearch, so in theory it's also possible to do collaborative filtering
00:30:59
just it's not that straightforward like i showed you with one query, you will need to do
00:31:04
a bit of data preparation and then put it into elasticsearch and you can do it
00:31:09
maybe also you will need to, how would you say, replace the default relevance uh
00:31:14
computation thing because it's pluggable, the engine that computes relevance in elasticsearch is pluggable, so you can
00:31:21
define some of your own metrics to compute this kind of relevance, but it's possible
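To make the user-user similarity idea concrete, here is a small plain-Python sketch of neighbourhood-based collaborative filtering. It is a generic textbook version with made-up users and document ids, not the data preparation or the Elasticsearch setup the speaker has in mind; cosine similarity over sets of seen documents is just one possible metric.

```python
from math import sqrt
from typing import Dict, List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two users seen as sets of document ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(target: str, seen: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Rank unseen documents by similarity-weighted votes of the k nearest users."""
    neighbours = sorted(
        ((cosine(seen[target], docs), user)
         for user, docs in seen.items() if user != target),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    for similarity, user in neighbours:
        if similarity <= 0:
            continue  # users with nothing in common do not get a vote
        for doc in seen[user] - seen[target]:
            scores[doc] = scores.get(doc, 0.0) + similarity
    # Highest score first, document id as a deterministic tie-breaker.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical interaction data, for illustration only.
seen = {
    "ann": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "eve": {"d5"},
}
suggestions = recommend("ann", seen)
```

Here `bob` is the only user who shares documents with `ann`, so only his unseen document is suggested.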
00:31:26
yep i think that's it for me
00:31:29
i don't know if we are on time or not, but if you have questions
00:31:43
uh_huh
00:31:51
yeah we were considering, we were thinking of doing that but we needed
00:31:55
to go fast and in that time uh it didn't happen
00:31:58
so we were thinking of it of course but it looked like too advanced stuff at that point you know
00:32:04
digging into lucene internals, which i did, and understanding
00:32:07
but then contributing and changing it was like
00:32:11
and we are like a research lab so it's a bit different type of
00:32:15
activities that you should focus on, but that would be good to actually do
00:32:20
uh like for instance boosting, you can implement it, it's there, it's just not exposed
00:32:24
but theoretically you can implement and expose it to users
00:32:29
but then again we need to write a paper and do measurements and if you contribute to
00:32:35
lucene and propagate it to elasticsearch and so on, it's like a very long cycle
00:32:54
ooh uh_huh oh
00:33:00
uh work uh hum
00:33:07
oh you mean not compute not to compute it dynamically when the user presses yep
00:33:17
uh i actually
00:33:34
or
00:33:43
yeah i agree you can uh amortise it, it all depends on the ratio between number of users and number of documents
00:33:49
well usually the number of documents is much much bigger than the number of users
00:33:53
so i agree it makes sense to amortise the thing
00:33:57
basically it means that you do percolator, you set up a percolation, like a percolation
00:34:03
query for each user based on their interest profile and when a new
00:34:06
document arrives you match the users and the document right away
00:34:11
and you store it, so when uh the user presses this related
00:34:15
personalised recommendations they already have it, yeah i agree that's a wonderful thing
00:34:20
more or less a good technique to optimise this kind of long
00:34:23
running queries but you cannot do it at least as i understand
00:34:28
uh for these contextual recommendations because yeah the number of documents is so huge
00:34:35
yeah and i think the data set will be too big if you
00:34:38
match every new arriving document with every existing document too yeah
00:34:47
so that will be just a big storage i guess
00:34:51
but uh for the user side yeah i think it's a good idea
00:35:06
oh
00:35:10
okay
00:35:17
so you put the data in elasticsearch
00:35:21
you set up the mapping, it indexes it, so it computes tf-idf, this kind of stuff, it's already there
00:35:27
now everything else that happens, how it's implemented now, it's probably not the best implementation
00:35:33
is happening when the user clicks on this tab on this panel
00:35:40
so like whatever you saw there, when they open this tab with related content or
00:35:45
my suggested content all computations start happening at that moment of time
00:35:51
so there is not much precomputation except tf-idf which is
00:35:55
by default what elasticsearch does for any document
00:36:01
but for me, like you said, caching and percolation would definitely be a good way of doing things
00:36:09
yeah
00:36:13
uh_huh oh
00:36:17
oh oh oh oh oh oh oh huh
00:36:31
ooh
00:36:34
oh uh_huh it's yep uh but uh
00:36:49
uh we didn't compare
00:36:52
we did an evaluation with users of just what we implemented
00:36:56
but we didn't compare it with alternatives, like we didn't compare
00:36:59
like with um m. s. f. that we work with, they have this, it's just a hierarchical library with uh
00:37:06
yeah they work with a taxonomy of terms uh like
00:37:11
okay this is about diseases, this is about malaria
00:37:15
so normally in these systems, how we see it, you have a number of ways of reaching the same content
00:37:22
you would have search obviously, you would have a taxonomy, some would put tags
00:37:27
recommendations are another way of doing things, for instance the case with
00:37:32
m. s. f. is that something is not in the taxonomy
00:37:35
because they don't know it exists, but on the user side it's possible that you want
00:37:38
to be able to find it, the problem with keywords, something that
00:37:42
i didn't explain maybe well, is a problem that when users search for specific keywords
00:37:47
it's often too specific, whereas a document can use different keywords
00:37:51
you see it can be still about malaria or maybe diseases but it will use different keywords, so sometimes
00:37:57
it's possible to miss a document just because you don't use the correct keywords, in our case
00:38:05
so for now no we don't
00:38:09
you can, you know, you have the possibility of doing that, it's just we don't
00:38:13
use it, yeah you have synonym tables, then query expansion based on synonyms, basically elasticsearch
00:38:20
keeps the table and then when you write a query it can expand your uh
00:38:25
query, uh, but for that you have to have these synonym tables for different languages
00:38:33
but you trade precision and recall right, i mean if you start
00:38:37
expanding synonyms it's sometimes possible to lose the semantic meaning of this
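For reference, a synonym table of the kind mentioned here can be declared in the index settings as a `synonym` token filter inside a custom analyzer. The sketch below only builds that settings body. The filter and analyzer names and the single rule `malaria, paludism` are hypothetical examples, and a real deployment would need one such table per language.

```python
from typing import Any, Dict, List


def synonym_settings(synonym_rules: List[str]) -> Dict[str, Any]:
    """Index settings with a custom analyzer that expands synonyms."""
    return {"settings": {"analysis": {
        "filter": {
            "domain_synonyms": {
                "type": "synonym",
                # Each rule lists terms that are treated as equivalent.
                "synonyms": list(synonym_rules),
            }
        },
        "analyzer": {
            "search_with_synonyms": {
                "type": "custom",
                "tokenizer": "standard",
                # Lowercase first so the rules match regardless of case.
                "filter": ["lowercase", "domain_synonyms"],
            }
        },
    }}}


# Hypothetical rule, for illustration only.
settings = synonym_settings(["malaria, paludism"])
```

Broader rules tend to raise recall and lower precision, which is the trade-off raised in the answer above.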
00:38:46
it's about break ups some
00:38:51
yeah if you have any questions anyways you can drop me messages, the presentation will be online
00:38:56
it was a bit maybe too technical i agree, so it's fine uh
00:39:05
one last real quick slide so
00:39:08
i think it might be here recently i'm looking what to do next in my life
00:39:12
if you have some ideas for startups employment uh different things talk to me
00:39:17
we kind of created this consulting company called palm a lock me and my friends
00:39:21
and if you need someone to build your app, i think you know one is here, web or mobile or
00:39:27
some data, you can talk to us, or if you have some uh hiring ideas or startup ideas i'm open to lots of things

Conference program

Multiple ways of building a recommender system with ElasticSearch
Andrii Vozniuk, React-EPFL
11 May 2017 · 1:04 p.m.
490 views