Hi everybody, my name is ... and, as I said, I'm going to present Elasticsearch, Logstash and Kibana: how we set them up, how it works, and why we do it. I know it's not something really new, a lot of people do it, but there are quite a few decisions to make when setting this up, and I think it may be interesting to discuss them together. So this is the agenda, what we're going to talk about.
We'll start with a short presentation of me, where I'm working and what we're doing. Then I'll describe the problem we try to solve, and why we put Elasticsearch, Logstash and Kibana in place. Then we'll show you how we solved it and which approach we chose. We'll finish with a few things we learned along the way while setting up Elasticsearch, Logstash and Kibana, and what our next steps are. We have had this stack running in production for about two or three years now.
So, first a few words about me. I'm a software engineer; I studied just a few meters away from here, at EPFL. I have about ten years of software development behind me, and for the last four years I've been working mostly on distributed systems, using Elasticsearch and doing quite a lot of Kafka and Spark. I help my colleagues who work in the development part of the company to architect and operate distributed applications, mostly in the cloud: mostly the Amazon cloud, but other ones too.
I work for a company called Camptocamp, which is just a few buildings away from here. It has three departments. One is geospatial, where we build websites that display spatial information, for example geoportals for public administrations. Another one is focused on enterprise resource planning, and the third one is where I work.
I work in the infrastructure division. We work a lot with tools like Docker and generally encourage infrastructure as code, and our mission is that you almost never do anything by connecting to a server by hand: we always automate, so that what works for one server also works for many. So we go from infrastructure creation and architecture to management, monitoring, and scaling out or down whenever necessary, always in virtualized environments, mostly the Amazon cloud.
So let's go to the problem. What are the problems when you run a distributed application that is composed of many different services and runs on many servers? Many servers, because each service always runs on a cluster of at least two machines for redundancy, and most of the time more, to be able to handle the load. It becomes difficult when you have so many different servers and a customer calls you to say: "a few of my customers say the website is very slow, what can you do?" How can you handle this problem and find the solution? Of course you can't just try it yourself and say "it's quick for me, it works for me", because there are many different servers. Just because it works for you doesn't mean it works for your customers.
So how do we get a quick overview of what is happening in a distributed environment? How do we measure the impact of a change? If we change the configuration, or put a new version of the application online, how do we know whether it's quicker or slower, or whether we have more or fewer errors? How do we monitor the error rate and the response time across the different servers for a given service, or for a whole application? One more difficult point is that sometimes one service calls another service. So just because a service is slow doesn't mean that service really has a problem: maybe one of the services it calls to get information is answering slowly or not working as expected.
Log analysis helps a lot to answer this kind of question, and what I would like to underline is that log analysis is not only a tool for system administrators. People in different roles can find interesting information in it. For example, application owners, who are maybe not really technical, would be interested to see the most used features of the application, or which datasets are used most in their application, and we can answer this kind of question with log analysis and the ELK stack.
Application developers are very interested in seeing whether there are errors in the application. I'm talking mostly about web applications, where an error means the answer is something like a 4xx or 5xx status code and it just doesn't work in the customer's browser. They're also interested in knowing, when they put a new release in production, whether it is faster or slower than the previous one: what is the impact of this new version?
Another question that is quite often important: is the cache working as expected? I don't want to rebuild the response each time, so I need to make sure the cache works as expected, and that's not so easy to assess. What is the ratio of cache hits to cache misses?
For system administrators, questions that are very interesting and not that easy to answer are: is load balancing working as expected, is the load balanced evenly between all the servers? And what is the impact of a configuration change? Maybe you just changed the configuration or upgraded the version of some component, and you want to see whether it has an impact. In a distributed application you can't just connect to the server and look at the log file there; you really need to centralise, index and search the logs. That's why we use Elasticsearch, Logstash and Kibana.
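Once the access logs are centralised, the load-balancing question becomes a simple count. A minimal sketch in Python, assuming each parsed log event carries the name of the backend that served it (a hypothetical field; the talk does not say how it is recorded) and using an arbitrary 20% tolerance around a perfectly even split:

```python
from collections import Counter


def backend_shares(backends):
    """Fraction of requests served by each backend, from one backend name per request."""
    counts = Counter(backends)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def is_balanced(backends, tolerance=0.2):
    """True if every backend's share is within `tolerance` (relative) of an even split."""
    shares = backend_shares(backends)
    if not shares:
        return True  # no traffic, nothing to compare
    even = 1 / len(shares)
    return all(abs(share - even) <= tolerance * even for share in shares.values())


# Backend name taken from each load-balancer log line (hypothetical field).
requests = ["web1", "web2", "web1", "web2", "web1", "web2", "web1", "web1"]
print(backend_shares(requests))  # {'web1': 0.625, 'web2': 0.375}
print(is_balanced(requests))     # False
```

A backend that receives no traffic at all never appears in the logs, so in practice you would compare against the list of backends you expect to be serving.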
This is the view of a web application from, let's say, a high-level point of view. In yellow is the client browser, which sends the request to the load balancer. We forward the request to one of the services running in the cluster, which maybe calls another service or a database, and then we answer the client browser.
Now, this is how we see the data flows. We add two new data flows to each application we manage in production; we always add the flows you see here. In green, we collect metrics on the servers, for example CPU usage, memory usage, disk space, or quite a lot of other metrics, but these are all numbers, and we centralise them on a server, the one in green, called Graphite. Graphite is quite old and there are other options, but in any case you have a time series database where you collect all these metrics, and then we build dashboards to see what's happening on the servers. That could be the subject of another talk; today we focus on the blue data flow.
In the blue data flow we ship the logs. Here we're talking mostly about access logs, the ones generated by web servers like Apache or caching servers like Varnish. We use the logs from the load balancer a lot, and we like them because they are closer to the end users: what you see in the load balancer's log is more or less what the end user sees. So we take these logs and ship them from each server to a central server (maybe we'll come back to this later), and then Logstash reads these logs, transforms them, parses them, adds information to enrich them, and indexes them in Elasticsearch. On top of that we use Kibana to have a dashboard, and in green you can see what the dashboard looks like on the slide.
This is the basic architecture. It's just an example application, but the important point is that each time we put an application in production, we always start by setting up these two data flows, the flow of logs and the flow of metrics, which are really mandatory for us to be able to run an application in production.
If we go into more detail on the log flow, it can be divided more or less into four stages. The first stage is to collect and ship the logs: on the web server I need to read these logs and ship them to a centralised system. The second is to transform and enrich the logs: read them, maybe add fields, maybe parse them, maybe change the format, to make them more meaningful or more useful. The third is to put them in Elasticsearch to make them searchable, so index them in Elasticsearch. And finally, the final product we want is to be able to visualise and aggregate information in a Kibana dashboard, and sometimes also to extract interesting information with direct queries to Elasticsearch.
So now we go a little bit deeper into each of these four stages. This is a simplified view where you just see one box per stage: first collect and ship, then analyse and extract, then index to make the logs searchable, and finally the Kibana dashboard.
The first step is to collect and ship the logs, which means moving the logs from the application server, the web server, the proxy or whatever, to the log management infrastructure. This sounds very easy, but in fact I think it's maybe the most difficult part, and for sure the most dangerous one, because if you don't do it well, you can kill your application server, which is of course not the goal. So it must be very light on the server side. If you do parsing, or anything CPU- or memory-consuming, then under high load you will have a lot of logs and it will consume a lot of resources, exactly when you want your server to use its resources to answer requests, not to process logs. That's not its job, so you should do as little as possible on this server.
A very important question you have to ask yourself when setting up this part is: what happens when the receiving side is not responding? What happens if Logstash is down for one reason or another, the network link is down, or you have maintenance? The first idea is to buffer, so maybe you keep the logs on your web server. But then this buffer may fill up, and you won't have enough space to keep buffering. In this case you have to decide whether you just block your application: "I cannot log, I cannot keep my logs, so I prefer not to answer requests", which is what a bank, for example, would probably do, because it cannot answer requests it cannot log. Or maybe you decide to just drop the logs, because you say: "it's an information website; if I lose some logs, the most important thing is that the service continues to work". But you have to take this decision, so that it's no surprise whether you block or drop once your buffer is full. If you don't decide, your tools will probably decide for you.
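The block-or-drop decision can be made explicit in the shipper. The sketch below is a hypothetical in-memory bounded buffer, not the behaviour of any particular shipping tool: with `policy="drop"` it never blocks the application and counts what it discards, while with `policy="block"` it applies back-pressure and raises `queue.Full` after `timeout` seconds if the receiver stays unavailable.

```python
import queue


class LogShipperBuffer:
    """Hypothetical bounded buffer between the application and the log shipper.

    policy="drop":  never block the application; discard and count lines when full.
    policy="block": wait up to `timeout` seconds for room (None = wait forever),
                    then raise queue.Full.
    """

    def __init__(self, maxsize, policy="drop", timeout=None):
        if policy not in ("drop", "block"):
            raise ValueError("policy must be 'drop' or 'block'")
        self._queue = queue.Queue(maxsize=maxsize)
        self.policy = policy
        self.timeout = timeout
        self.dropped = 0

    def push(self, line):
        if self.policy == "drop":
            try:
                self._queue.put_nowait(line)
            except queue.Full:
                self.dropped += 1  # losing this line is the accepted trade-off
        else:
            self._queue.put(line, timeout=self.timeout)  # may raise queue.Full

    def drain(self, max_items):
        """Take up to `max_items` lines to send to the receiving side (e.g. Logstash)."""
        items = []
        while len(items) < max_items:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return items


# Receiver is down: nothing drains the buffer, so it fills up.
buf = LogShipperBuffer(maxsize=3, policy="drop")
for i in range(5):
    buf.push(f"GET /page/{i}")
print(buf.dropped)     # 2
print(buf.drain(10))   # ['GET /page/0', 'GET /page/1', 'GET /page/2']
```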
The second step, once you have shipped your logs from the web server to the Logstash server, your centralised log infrastructure, is to use Logstash (or maybe something else, but in this case we use Logstash) to add contextual information or to enrich the logs with information from external sources. For example, we add contextual information about the source: in a standard access log you don't have the name of the server that produced the log, so we add this information on the receiving side, because we know where the connection is coming from. So we add information about who sent the log, and we also add information coming from our classification system, which tells us, for each server, whether it is a development, integration or production server.
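A minimal sketch of this enrichment step, written in Python rather than as a Logstash filter. The source host is assumed to be known from the incoming connection, and the `classification` dictionary is a hypothetical stand-in for the server classification system:

```python
def add_context(event, source_host, classification):
    """Return a copy of `event` with the sending host and its environment added.

    `classification` is a hypothetical host -> environment mapping standing in
    for the server classification system.
    """
    enriched = dict(event)  # do not mutate the caller's event
    enriched["host"] = source_host
    enriched["environment"] = classification.get(source_host, "unknown")
    return enriched


classification = {"web-01": "production", "web-int-01": "integration"}
event = {"message": '192.0.2.10 - - "GET /index.html HTTP/1.1" 200 512'}
print(add_context(event, "web-01", classification))
```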
Another possible transformation is to enrich the information. For example, we do geolocation of the requesting IP: you can use a GeoIP database and put the position in your log, to see where your users are coming from, or maybe just the country if you don't want the position. Or, when you have a service that is only used by authenticated users, maybe you add something about the user, taken from your user database.
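To illustrate the geolocation step without depending on a real GeoIP database, this sketch uses a hypothetical lookup table built from the reserved documentation address ranges (RFC 5737). A real pipeline would query a GeoIP database instead; only the lookup logic is shown:

```python
import ipaddress

# Hypothetical table; a real deployment would query a GeoIP database.
NETWORK_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "CH",
    ipaddress.ip_network("198.51.100.0/24"): "FR",
}


def add_country(event, ip_field="client_ip"):
    """Add a `country` field from the client IP; None if the IP is missing, invalid or unknown."""
    enriched = dict(event)
    try:
        ip = ipaddress.ip_address(event[ip_field])
    except (KeyError, ValueError):
        enriched["country"] = None
        return enriched
    enriched["country"] = next(
        (country for net, country in NETWORK_TO_COUNTRY.items() if ip in net), None
    )
    return enriched


print(add_country({"client_ip": "192.0.2.10"})["country"])  # CH
```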
Another very classical one, something we do very often, is to parse the URL and extract some of its parameters into their own fields, because then it becomes far easier to search or aggregate on those fields. If the value is only inside the URL, your queries become more and more complicated.
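Extracting URL parameters into fields can be sketched with Python's standard `urllib.parse`. The field name `url` and the parameter names `dataset` and `format` are hypothetical examples; in the setup described here this happens inside Logstash:

```python
from urllib.parse import parse_qs, urlsplit


def extract_params(event, url_field="url", wanted=("dataset", "format")):
    """Copy selected query-string parameters of the request URL into their own fields.

    `url_field` and the names in `wanted` are hypothetical examples.
    """
    enriched = dict(event)
    parts = urlsplit(event.get(url_field, ""))
    query = parse_qs(parts.query)
    enriched["path"] = parts.path
    for name in wanted:
        if name in query:
            enriched[name] = query[name][0]  # keep the first value if repeated
    return enriched


e = extract_params({"url": "/api/data?dataset=roads&format=json&page=2"})
print(e["path"], e["dataset"], e["format"])  # /api/data roads json
```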
That's it for the transformation part. Just be careful: if you put a lot of transformations with lots of regular expressions in Logstash, it can become quite heavy; grok parsing consumes quite a lot of CPU.
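To give a sense of what a grok pattern does, here is a rough Python equivalent that parses a line in the Apache "combined" log format with one regular expression. It is an illustration, not Logstash's actual pattern; every log line goes through a match like this, which is why heavy regex filtering costs CPU at high volume:

```python
import re

# Roughly what a grok pattern for the Apache "combined" log format matches.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


line = ('192.0.2.10 - - [10/Oct/2016:13:55:36 +0200] '
        '"GET /api/data?dataset=roads HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
print(parse_line(line)["status"])  # 200
```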
Once we have enriched all this information, we index it in Elasticsearch with Logstash. I think one of the very nice features of Logstash is that it has a lot of input and output plugins: you can read from a lot of different sources and write to a lot of different outputs. Here we're talking about Elasticsearch, but the number of inputs and outputs available is really amazing.
When indexing in Elasticsearch, keep in mind that Elasticsearch can use dynamic mapping: it tries to put a type on your data based on the first document it indexes. We strongly recommend that you use a predefined mapping that you build yourself, so that you decide the type of each field. This makes things easier later on, when you want to build visualisations, and avoids bad surprises.
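A sketch of an explicit mapping for access-log documents, written as a Python dict and serialised to JSON. It assumes the typeless mapping format of Elasticsearch 7.x and later (Elasticsearch 5, mentioned in this talk, still grouped mappings under a mapping type), and the field names are hypothetical; they should match what your pipeline actually produces:

```python
import json

# Typeless mapping (Elasticsearch 7.x+). Field names are hypothetical.
ACCESS_LOG_MAPPING = {
    "mappings": {
        # Unlisted fields stay in _source but are not indexed,
        # instead of getting a type guessed from the first document seen.
        "dynamic": False,
        "properties": {
            "@timestamp": {"type": "date"},
            "client_ip": {"type": "ip"},
            "method": {"type": "keyword"},
            "url": {"type": "keyword"},
            "dataset": {"type": "keyword"},
            "status": {"type": "short"},
            "bytes": {"type": "long"},
            "response_time": {"type": "float"},
            "host": {"type": "keyword"},
            "environment": {"type": "keyword"},
        },
    }
}

print(json.dumps(ACCESS_LOG_MAPPING, indent=2))
```

Setting `dynamic` to `false` keeps unlisted fields in `_source` without indexing them; `"strict"` would instead reject documents with unknown fields, which is usually too harsh for logs. The body can be sent when creating an index; a legacy index template would additionally need an `index_patterns` entry.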
There are a few other things to consider. The maximum indexing rate depends on the number of shards, so if you have a really high-volume website you need enough shards for Elasticsearch to parallelise the indexing process. But keep in mind that indexed logs take quite a lot of space. For example, for one of our biggest customers, a really high-volume website, it quickly becomes quite costly to keep all the logs for a very long time. So you have to think a bit about the data lifecycle: how long do you keep these logs? Maybe after a few months you only keep a subset of the logs, or an aggregated summary. You can use a tool called Curator to remove old data. That's the lifecycle of your data.
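A minimal sketch of the retention decision, assuming daily indices named with Logstash's default `logstash-YYYY.MM.DD` pattern. It only selects the indices older than the retention period, roughly what a Curator age filter does; actually deleting them (for example with the `DELETE /<index>` API) is left out:

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices (prefix + YYYY.MM.DD) older than `keep_days`.

    Names that do not follow the pattern are ignored rather than deleted.
    """
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        if day < cutoff:
            old.append(name)
    return sorted(old)


existing = ["logstash-2017.01.01", "logstash-2017.01.28", "logstash-2017.02.01", ".kibana"]
print(indices_to_delete(existing, today=date(2017, 2, 1), keep_days=30))
# ['logstash-2017.01.01']
```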
Once your logs are in Elasticsearch, you can start to use them. You can use Kibana to build dashboards, and you can build different dashboards depending on your target users. For example, we have a dashboard for the system administrators, another one for application developers, and another one for customers. In the beginning we used Kibana really as a tool for technical people, and it's really interesting, because we realised that the management of one of our customers has Kibana open something like eight hours per day: they open it at the beginning of the day, keep it open, and close it when they finish the day. It really gives them a concrete view of their system, something really easy to understand, and we were very surprised to see how much they use it. Now they push us to have more of this, whereas in the beginning we really had to push quite hard to make them understand how useful it could be.
We also use this log aggregation stack for data extraction. We have time-based jobs, cron jobs, that run a few queries on Elasticsearch and put the results in another system, because the customer has a system where they track, for example, the number of requests per day. So we do regular extraction of aggregated data from Elasticsearch. We also use it to measure the compliance of a given service with its service level agreement, for the service manager. We can really easily see the number of requests and how many failed, which is very interesting for the quality management of your applications.
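A sketch of such a scheduled extraction, assuming fields named `@timestamp`, `status` and `response_time`, and a hypothetical SLA of "no 5xx error and answered in under N seconds". It builds an Elasticsearch search body with a per-day `date_histogram` and a `filter` sub-aggregation, then turns the response into a daily compliance ratio. `calendar_interval` requires Elasticsearch 7.2 or later (older versions use `interval`), and the sample response uses made-up counts:

```python
def daily_sla_query(max_response_time, days=7):
    """Search body: requests per day, plus how many met the (hypothetical) SLA."""
    return {
        "size": 0,  # only the aggregations are needed, not the documents
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d/d", "lt": "now/d"}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "day"},
                "aggs": {
                    "within_sla": {
                        "filter": {
                            "bool": {
                                "filter": [
                                    {"range": {"status": {"lt": 500}}},
                                    {"range": {"response_time": {"lt": max_response_time}}},
                                ]
                            }
                        }
                    }
                },
            }
        },
    }


def sla_compliance(response):
    """Map each day (key_as_string) to the fraction of requests within the SLA."""
    result = {}
    for bucket in response["aggregations"]["per_day"]["buckets"]:
        total = bucket["doc_count"]
        ok = bucket["within_sla"]["doc_count"]
        result[bucket["key_as_string"]] = ok / total if total else None
    return result


# Shape of a search response, with made-up counts for illustration.
fake_response = {"aggregations": {"per_day": {"buckets": [
    {"key_as_string": "2017-02-01T00:00:00.000Z", "doc_count": 1000, "within_sla": {"doc_count": 990}},
    {"key_as_string": "2017-02-02T00:00:00.000Z", "doc_count": 0, "within_sla": {"doc_count": 0}},
]}}}
print(sla_compliance(fake_response))
```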
So this is a small example. It's quite old: the latest version of Kibana doesn't look exactly like this, it changes quite a lot and quite fast. It's just to show you an example with a few different graphs. This one aggregates data from six or seven servers, I think; it shows different kinds of graphs we can have, which may be of interest to different kinds of people. The first one is just the number of requests per minute.
We use a different colour depending on the response time: if the response time is less than half a second it's green, between half a second and one second it's light green, between one second and one and a half seconds it's orange, and above that it's red.
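The colour bands amount to bucketing each response time against a few thresholds; in Elasticsearch itself this would typically be a `range` aggregation. A small Python sketch using the thresholds described above (in seconds), where a value equal to a threshold falls into the slower band:

```python
from bisect import bisect_right

# Upper bounds (seconds) and colours of the response-time bands.
BOUNDS = [0.5, 1.0, 1.5]
COLOURS = ["green", "light green", "orange", "red"]


def colour_for(response_time):
    """Colour band for one request's response time in seconds."""
    return COLOURS[bisect_right(BOUNDS, response_time)]


def band_counts(response_times):
    """Number of requests in each colour band, as stacked in the dashboard."""
    counts = dict.fromkeys(COLOURS, 0)
    for t in response_times:
        counts[colour_for(t)] += 1
    return counts


print(band_counts([0.2, 0.4, 0.7, 1.2, 3.0]))
# {'green': 2, 'light green': 1, 'orange': 1, 'red': 1}
```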
Next we have the proportion of cache hits: how many of the requests are served directly from the cache. Here, for example, we have which datasets have been requested; that is extracted from the URL of the request, and it's mostly for the customer, so they can see which datasets are requested. Here we have a zoomed view on the slow requests, only the orange and red parts, because in the first graph you can't see much of them, they are a very low proportion. Then there is a view of the slow requests and their response codes. What's interesting here is that we get different performance depending on the dataset, maybe because an index is missing in the database for some of them. You can select one of the datasets, just click on it, and Kibana adds a filter and refreshes all the graphs with it, showing only the statistics for that dataset. Or you can click, for example, on cache misses to see which datasets are never in the cache, and check whether that's expected.
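Cache hit ratios per dataset, and the datasets that are never served from the cache, can be sketched as below. It assumes each parsed event has a `dataset` field extracted from the URL and a hypothetical `cache` field holding `"hit"` or `"miss"` (for example derived from the caching server's log):

```python
from collections import defaultdict


def cache_stats(events):
    """Per-dataset cache hit ratio.

    Assumes a `dataset` field and a hypothetical `cache` field ("hit" or "miss").
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for event in events:
        dataset = event.get("dataset", "unknown")
        totals[dataset] += 1
        if event.get("cache") == "hit":
            hits[dataset] += 1
    return {ds: hits[ds] / totals[ds] for ds in totals}


def never_cached(events):
    """Datasets that were requested but never served from the cache."""
    return sorted(ds for ds, ratio in cache_stats(events).items() if ratio == 0)


events = [
    {"dataset": "roads", "cache": "hit"},
    {"dataset": "roads", "cache": "miss"},
    {"dataset": "rivers", "cache": "miss"},
]
print(cache_stats(events))   # {'roads': 0.5, 'rivers': 0.0}
print(never_cached(events))  # ['rivers']
```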
A few lessons we learned along the way. I'll put them all on this slide so you don't have to spend time reading. The log collection and shipping part is the most critical, because if it fails badly it can break your application. If you use something like syslog over TCP and you lose the connection, at some point it will just block your web server and stop it. You need to decide what to do; if you configure it carefully you can do a lot of things, but you will have to spend time on this and decide whether it's acceptable for you to drop logs when the receiving side, Logstash, is not available anymore.
The other point is that you can do a lot of things in Logstash, with its inputs, filters and outputs, and we use the inputs and outputs a lot. But if you do a lot of regular expression parsing in the filter part of Logstash, it will use a lot of resources, and these transformations can be quite difficult. If you have a few quite simple things, I think it's really the right choice. But if you want to do a lot in your filters, change a lot of things in your logs before inserting them in Elasticsearch, aggregate things, compute things, modify things, maybe you should use something other than Logstash filters. Maybe the Elasticsearch 5 ingest node: I never tried it, but I know that Elasticsearch can now do quite a lot of transformations at indexing time. On our side we will use Kafka Streams, a framework you can use to modify streams of data; other people use Spark Streaming. I'm just saying it's fine to have a few filters, but it's not a good idea to write a whole program that transforms your data in Logstash filters.
I already said it, but keeping data for a very long time can be quite expensive. Data indexed in Elasticsearch takes quite a lot of space. The default Logstash template, if you use it, keeps each field in two formats: one that is analysed, and another that is just the raw string. So of course it takes far more space than if you just keep your files compressed on disk. So if you want to keep the logs of a high-volume website for a very long time, maybe you should build an aggregated dataset: aggregate your data per day or per week, how many requests, how many requests per dataset, and things like that per unit of time. You will be able to keep those for a long time: if it's just one document per hour, you can keep them for years.
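A minimal sketch of such a rollup, turning raw request events into one summary document per day and dataset. The field names (`timestamp`, `dataset`, `status`) are assumptions, `datetime.fromisoformat` needs Python 3.7 or later, and in practice the same numbers could also come from an Elasticsearch aggregation rather than client-side code:

```python
from collections import Counter
from datetime import datetime


def daily_rollup(events):
    """Aggregate raw request events into one summary record per (day, dataset).

    Field names `timestamp` (ISO 8601), `dataset` and `status` are assumptions.
    """
    requests = Counter()
    errors = Counter()
    for event in events:
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        key = (day, event.get("dataset", "unknown"))
        requests[key] += 1
        if event["status"] >= 500:
            errors[key] += 1
    return [
        {"day": day, "dataset": ds, "requests": n, "errors": errors[(day, ds)]}
        for (day, ds), n in sorted(requests.items())
    ]


raw = [
    {"timestamp": "2017-02-01T10:00:00", "dataset": "roads", "status": 200},
    {"timestamp": "2017-02-01T11:30:00", "dataset": "roads", "status": 503},
    {"timestamp": "2017-02-02T09:15:00", "dataset": "roads", "status": 200},
]
for row in daily_rollup(raw):
    print(row)
# {'day': '2017-02-01', 'dataset': 'roads', 'requests': 2, 'errors': 1}
# {'day': '2017-02-02', 'dataset': 'roads', 'requests': 1, 'errors': 0}
```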
I would also say that with Elasticsearch, as with any distributed system, I recommend starting quite big, with a fairly large infrastructure. Don't be shy about putting in a lot of RAM and a lot of CPU; you can lower your requirements later if you see that it works well and doesn't use all your resources. Doing it the other way round, starting small and growing, is quite time consuming. So in our experience the good way is to start bigger than what you expect to need and then scale down.
But there's something else that is important to understand. It's a tool that works perfectly well most of the time, but if you have a huge spike, like a distributed denial-of-service attack on your application, it will likely not help you as much as you would like, because processing the logs takes quite a lot of time. Your infrastructure won't be designed to handle the kind of traffic you get during the spike, so it will probably fall behind trying to index everything, and you won't see the last minute, the last seconds, in Kibana. You cannot design your infrastructure to index unusual spikes in real time; it would just be too expensive to maintain. So for this kind of really huge and unexpected spike, don't expect real-time information from this stack; you probably need to focus on metrics, just the numbers coming from the servers, rather than on something that involves computing, parsing and indexing logs.
So I think that's more or less how we do it today. Let me just give you, in a few words, our next steps, because I think they're interesting. We are currently on an older version of Elasticsearch (I don't remember exactly which one), and we are in the process of moving everything to version 5. A lot of things changed in between, so we really have to redo a lot. That's what happens with a fast-moving company like Elastic: the versions have moved on a lot since we started, things move very quickly, and we have to change a lot of things.
Right now, as I showed you, we have a buffer, but it's just an in-memory buffer. It gives us a bit of time to restart or change something, but if we have a huge spike, like a denial-of-service attack, it fills up very quickly, so it's not enough to really stop losing logs; it doesn't give us enough time to react. So we will replace it with Kafka, which will open a new world of possibilities. Kafka is a queue system that writes the queue to disk, so you can keep a huge amount of logs, and it performs very well.
In our next steps we will still use Logstash a lot, because we do a lot of inputs and outputs and it's really amazing for that and saves a lot of time, but we won't use the filter features anymore, because we will do most of the transformations in Kafka Streams. Also, personally, I think it's quite difficult to test and debug complex transformations in Logstash; it's far easier in a programming language, with something like Kafka Streams.
The last point is not very important, but we will use Docker everywhere. I don't know if you know Docker a bit, but we have this whole stack running in Docker, which is very nice for testing new developments very quickly in the future.
