How to accelerate in Splunk

Blue Team Consulting
30 Oct 2022 · 17:55

Summary

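TL;DR: This video walks through the three ways to accelerate data processing in Splunk: report acceleration, summary indexing, and data model acceleration. It starts with the basic principle of report acceleration, which speeds up a search by precomputing and storing summary data. It then covers summary indexing, in which search results are written to a separate index so that later searches have less data to scan. Finally, it explains the advantages of data model acceleration, in particular querying an accelerated data model with the tstats command, which is considerably faster than the regular stats command. Hands-on demos show how to apply each technique to improve Splunk search performance.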

Takeaways

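  • 🔍 Acceleration in Splunk can be implemented in three ways: report acceleration, summary indexing, and data model acceleration.
  • ⚡ The tstats command is more powerful and faster than the regular stats command and is used to make Splunk searches more efficient.
  • 📊 Report acceleration speeds up a search by building, in the background, a summary based on the report's results.
  • 🔄 Only searches that include a transforming command qualify for report acceleration.
  • 📈 Summary indexing accelerates searches by writing search results to another index; the video likens this to "index inception".
  • 🔗 The collect command writes search results to a summary index, which reduces the amount of data to search and speeds up later searches.
  • 🚀 The video ranks data model acceleration as the best of the three options.
  • 🔧 The tstats command is closely tied to data model acceleration and queries statistics efficiently.
  • 📋 Data model acceleration comes in two types: temporary ad hoc acceleration and persistent acceleration.
  • 🔑 For security operations center (SOC) analysts, tstats is especially useful when building correlation searches that trigger notable events.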

Q & A

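  • What does a report need in order to be accelerated in Splunk?

    -To qualify for report acceleration, the search must include a transforming command.

  • Why is data model acceleration considered the best option in Splunk?

    -Because it builds persistent summaries (TSIDX files) that tstats can query without reading the raw events, which pays off most when large volumes of data are involved.

  • What is summary indexing and how does it work?

    -Summary indexing is an acceleration method in which the results of a regular search are written to another index (the summary index). Later searches run against that smaller data set and therefore finish faster.

  • Why is the tstats command more powerful and faster than the regular stats command?

    -Because it is built to run against accelerated data models: it performs its statistical queries directly on the time series index (TSIDX) files instead of on the raw events.

  • What are the persistent and ad hoc types of data model acceleration?

    -Persistent acceleration builds a lasting acceleration summary for the data model. Ad hoc acceleration is temporary and applies only while the Pivot Editor is in use.

  • Why is the parent search not pointed at the summary index when it is edited?

    -The parent search has to keep querying the original indexes so that new data continues to flow into the summary index. Only a collect command is appended to it; the search against the summary index goes elsewhere, for example into the dashboard panel.

  • Why does summary indexing make dashboard panels run faster?

    -Because the panel searches only the pre-summarized data in the summary index, which is far smaller than the original data set.

  • When should the tstats command be considered?

    -Whenever search speed is critical, especially for detections in a security operations center (SOC), where analysts should not have to wait on slow searches.

  • What are TSIDX files and what role do they play in Splunk?

    -TSIDX stands for time series index. With an accelerated data model, Splunk searches these files rather than the raw data, which is what makes the search fast.

  • What are the three ways to implement acceleration in Splunk?

    -Report acceleration, summary indexing, and data model acceleration.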

Outlines

00:00

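📈 Report acceleration and an introduction to the tstats command

This part introduces acceleration in Splunk and its three forms: report acceleration, summary indexing, and data model acceleration. Report acceleration runs a background process that builds a summary from the report's results so that the search completes faster. To qualify, the search must contain a transforming command. The acceleration options for a report can be viewed and edited under Settings > Searches, reports, and alerts. The presenter is not a fan of summary indexing, although he acknowledges that it is useful for some use cases. The part also stresses the importance of tstats, a faster and more powerful statistical command than the standard stats command, particularly when working with accelerated data.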

05:00

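🔍 Summary indexing in practice

A detailed walkthrough of using summary indexing to speed up Splunk searches. A new index is created to serve as the summary index, and the collect command writes the search results into it. The summary index contains only the subset of data sent to it, so searching this smaller data set completes much faster. The part also shows how to apply summary indexing to dashboard panels, particularly ones that load slowly: the demo redirects a panel's search results to a summary index and then points the panel at that index.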
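As a rough sketch of the workflow described above, the two searches below show the populating side and the reporting side. The index name `summary_executables` is the one used in the video; the parent index `main`, the sourcetype, and the `process` field are hypothetical stand-ins, since the panel's real search is not shown in the source. The first search is the scheduled report with `collect` appended; the second is what the dashboard panel would run instead.

```spl
index=main sourcetype=hypothetical:process:log
| stats count BY process
| collect index=summary_executables

index=summary_executables
| stats sum(count) AS count BY process
```

Because the first search stores one pre-counted row per `process` on each run, the second search sums the stored `count` values rather than counting events again.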

10:02

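🚀 Data model acceleration and advanced use of the tstats command

Covers the two types of data model acceleration, ad hoc and persistent, and the use of tstats with persistent acceleration. Ad hoc acceleration applies only within the Pivot Editor, while persistent acceleration builds summaries made up of multiple TSIDX files to optimize search speed. The part explains how to set up acceleration for a data model and how to use tstats against it, and closes with a practical example that retrieves and analyzes data from a specific data model with tstats.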

15:05

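🔎 A closer look at the tstats command

Works through a concrete tstats example: building the query, spotting an unidentified source in the results, and investigating it. The demo builds a statistical query over user and application data and then drills into one specific result. Besides showing how fast tstats is on large data volumes, the part shows how to follow up when a result is uncertain or unknown.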

Keywords

💡Splunk

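Splunk is a software platform for searching, monitoring, and analyzing machine-generated data. It is the platform used throughout the video to explain acceleration and the tstats command, and the demos show how to make searches on Splunk data faster and more efficient.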

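💡Acceleration

In the video, acceleration means speeding up searches by preprocessing and summarizing data ahead of time. Splunk offers report acceleration, summary indexing, and data model acceleration for this purpose, so that users get faster responses when they search.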

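💡tstats command

tstats is a powerful statistical command in Splunk that quickly aggregates, computes, and returns statistics from data models. The video stresses its speed and efficiency advantage over the regular stats command, especially in combination with data model acceleration.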
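For a feel of the difference, here is a minimal pair of searches that return the same table. It uses only default indexed fields (`index`, `sourcetype`) on the `_internal` index, so it does not depend on any data model; that `tstats` can also run against ordinary indexed fields is standard Splunk behavior rather than something shown in the video. The first search reads events and counts them with `stats`; the second answers the same question from the index files with `tstats`.

```spl
index=_internal
| stats count BY sourcetype

| tstats count WHERE index=_internal BY sourcetype
```

Running both over the same time range and comparing the run times in the Job Inspector is a simple way to see the gap on your own data.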

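💡Report acceleration

Report acceleration is a Splunk technique in which the report runs in the background and a summary is built from its results. The summary is smaller than the full index, so searches that run against it finish faster. The video demonstrates how to enable and use report acceleration.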
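One way to check which reports currently have acceleration enabled is to query the saved-search REST endpoint from the search bar. This is a sketch that is not part of the video, and it assumes the endpoint returns the `auto_summarize` settings under the same names used in savedsearches.conf; adjust the field list if your version reports them differently.

```spl
| rest /servicesNS/-/-/saved/searches splunk_server=local
| search auto_summarize=1
| table title eai:acl.app auto_summarize auto_summarize.dispatch.earliest_time
```

The last column corresponds to the summary range that the video sets in the Edit Acceleration dialog.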

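💡Summary indexing

Summary indexing is an acceleration method that writes search results to another index; the video describes it as "index inception". Later searches run faster because the results have already been processed and stored in a smaller data set. The presenter does not rate the method highly but still demonstrates where it fits.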
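The summary index has to exist before anything can be collected into it (the video creates it beforehand). A quick, hedged way to confirm that such an index is present and receiving events is `eventcount`; the `summary_*` wildcard is an assumption based on the `summary_<name>` naming convention suggested in the video.

```spl
| eventcount summarize=false index=summary_*
```

With `summarize=false` the command returns one row per matching index, so a missing or empty summary index is easy to spot.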

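💡Data model acceleration

Data model acceleration speeds up searches by precomputing and storing summary data for a data model. The video rates it as the best of the three acceleration methods, since it markedly improves the speed of searches built on complex data models.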
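To gauge how much of a data model is actually covered by its acceleration summary, the counts with and without `summariesonly` can be compared. This is a sketch, not something shown in the video; it assumes an accelerated data model named `Authentication`, and `percent_summarized` is an illustrative field name.

```spl
| tstats summariesonly=true count AS summarized FROM datamodel=Authentication
| appendcols
    [| tstats summariesonly=false count AS total FROM datamodel=Authentication]
| eval percent_summarized = round(summarized / total * 100, 1)
```

With `summariesonly=true`, `tstats` returns only data that is already in the summary, so a low percentage suggests the summary is still building or its range is shorter than the search window.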

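💡Transforming commands

In the video, transforming commands are Splunk search commands that turn events into a table of results, such as stats. Only searches that contain a transforming command qualify for report acceleration, because the results such commands produce can be summarized and precomputed.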
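The distinction can be illustrated with a hedged pair of searches on Splunk's own `_internal` logs (the video uses different reports whose text is not in the source). The first returns raw events and has no transforming command, so it would not offer an acceleration option; the second ends in `stats` and would. Splunk's documentation also requires that any commands placed before the first transforming command be streaming commands.

```spl
index=_internal sourcetype=splunkd log_level=ERROR

index=_internal sourcetype=splunkd log_level=ERROR
| stats count BY component
```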

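💡collect command

The collect command writes search results to another index and is commonly used to populate a summary index. The video uses it to store the summarized results of a detailed search so that later searches run more efficiently.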
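Before writing to a summary index for real, `collect` can be dry-run with its `testmode` option, which shows the results as they would be written without actually indexing them. The sketch below is not from the video; the source data and the index name `summary_splunkd_errors` are hypothetical.

```spl
index=_internal sourcetype=splunkd log_level=ERROR
| stats count BY component
| collect index=summary_splunkd_errors testmode=true
```

Removing `testmode=true` (or setting it to `false`) makes the same search write its results into the index.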

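💡TSIDX files

TSIDX (time series index) files are the index files Splunk stores alongside the raw data. With data model acceleration, Splunk searches these files instead of the raw data, which speeds up the search. The video notes that tstats performs its statistical queries on these files.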

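💡sistats command

sistats is the summary-indexing variant of the stats command and belongs to the si- command family. It is meant for the search that populates a summary index: it stores the intermediate statistics so that a later stats over the summary index returns correct results. The video presents it as the version of stats to use when working with summary-indexed data and says it makes those searches faster still.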
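A minimal sketch of how the pair is usually split between the two searches: `sistats` in the populating search, and plain `stats` with the same arguments in the reporting search. The source data and the index name `summary_splunkd_components` are hypothetical, and writing with `collect` is an assumption carried over from the video; Splunk's documentation more commonly pairs the si- commands with a scheduled report's built-in summary indexing option.

```spl
index=_internal sourcetype=splunkd
| sistats count BY component
| collect index=summary_splunkd_components

index=summary_splunkd_components
| stats count BY component
```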

Highlights

Introduction to acceleration and the power of the tstats command in Splunk.

Explanation of how acceleration can improve search efficiency in Splunk.

Overview of the three ways to implement acceleration in Splunk: report acceleration, summary indexing, and data model acceleration.

Detailed guide on report acceleration and its prerequisites.

Step-by-step instructions on enabling report acceleration and its impact on search speed.

Introduction to summary indexing as a method of acceleration.

How to use the collect command for summary indexing to optimize Splunk searches.

Example use case of summary indexing for speeding up dashboard panel loading.

Explanation of how summary indexing reduces search times by creating a smaller dataset.

Introduction to data model acceleration as the preferred method for speed optimization.

Differences between ad hoc and persistent data model acceleration.

How the tstats command uses TSIDX files for faster data retrieval.

Demonstration of building a tstats command with a focus on specific fields.

Example of investigating unusual data using the tstats command.

Importance of data model acceleration for security analysts in Splunk Enterprise Security.

Transcripts

00:00

in this one we are going to cover

00:02

acceleration in Splunk and we're also

00:04

going to cover a critical command

00:06

tstats but it would kind of be pointless

00:09

to just jump off and start showing you

00:11

tstats off the bat without first

00:14

understanding how acceleration can be

00:16

invoked and then we will definitely get

00:17

into tstats and why it's more powerful and

00:20

faster than regular stats function and

00:24

to do that like I said we're going to

00:25

cover acceleration first acceleration in

00:28

Splunk can be implemented in one of

00:30

three ways acceleration of reports

00:33

summary indexing and the acceleration of

00:36

data models and the acceleration of data

00:39

models will probably always be your best

00:41

bet someone please try and prove me

00:44

wrong so starting with report

00:46

acceleration when you accelerate a

00:48

report Splunk software runs in the

00:50

background a process that then builds a

00:52

summary based on the results returned by

00:55

that report when you run the search it

00:57

runs against the summary rather than the

00:59

full index because the summary is

01:02

smaller than the full index and contains

01:04

precomputed summary data relevant to

01:06

what that search was the search should

01:08

complete much quicker than it did when

01:10

you first ran it let me just show you

01:13

what I

01:15

mean first in order to qualify for

01:18

acceleration it has to be including a

01:21

transforming command we can go to

01:25

settings searches reports and alerts and

01:28

take a look at a few examples so if we

01:30

open one that doesn't have a

01:32

transforming command like the Wireshark

01:33

one here we can click edit and notice

01:35

that there is no option to accelerate it

01:38

that means that this search is not

01:40

formatted and it's not meeting the

01:42

criteria to become accelerated as you

01:44

can see there are no transforming

01:46

commands present in that search but if

01:49

we go back and we click on one that does

01:52

have a transforming search in it we can

01:53

see the edit acceleration and edit

01:55

summary index button are

01:58

present we will open this one in new tab

02:01

run it and see that the transforming

02:03

command present is

02:04

stats so we can go ahead and click edit

02:07

and edit the acceleration this is just a

02:10

checkbox saying I'm going to accelerate

02:12

the report and the summary range when

02:15

you set your summary range this is the

02:17

specific length of time that you want

02:19

the data to be accelerated then your

02:22

report will run as accelerated and it

02:24

will become more

02:26

efficient so you can just set the day

02:28

here

02:31

but for now I'm just going to leave it

02:33

as is and press cancel we'll come back

02:35

to that

02:37

later so that kind of covers report

02:40

acceleration in a nutshell it's very

02:43

simple and if it fits your use case go

02:46

ahead and do it next we will move into

02:48

the second way to accelerate in Splunk

02:50

which is summary indexing honestly I'm

02:52

not a fan of this method as it kind of

02:54

just gets like index Inception as you'll

02:57

see but a lot of people actually do use

03:00

it and find it very relevant and very

03:02

helpful for their use cases so clearly I'm

03:04

missing something major here but I

03:07

usually just roll with creating an

03:09

accelerated data model and leveraging

03:11

tstats but I would be remiss if I

03:14

didn't cover summary indexing as well

03:16

because I've seen I've seen so many

03:17

people use it in the past so we can dive

03:20

in basically what you do with summary

03:23

indexing is you build your normal search

03:26

and then you can output those results to

03:28

another index using the collect command

03:30

and when you do that you're going to

03:32

output those search results and you're

03:34

going to collect it into another index

03:36

that other index is going to be your

03:38

summary index you have to create that

03:40

summary index call it summary underscore

03:44

whatever you're summarizing or what you

03:45

will remember and then you can use that

03:48

index to search against and that is the

03:50

name summary indexing and it will only

03:53

contain the other small sets of data

03:55

results that you are sending to it with

03:56

the collect command so, you know, air

03:59

quotes I'm doing, "faster": summary

04:01

indexing is faster because there's less

04:02

data to search um because it's already

04:05

being searched by that primary search

04:07

from the other main index in the

04:09

original search you built but let me

04:12

stop trying to explain it with words and

04:14

we'll just show it in the demo actually

04:17

a good use case for this is to change

04:19

your dashboard panels if you find that

04:20

your dashboard panels are starting to

04:22

run slow so we're actually going to do

04:25

it for that use case so I'm going to

04:28

find a dashboard to edit I'll go into my

04:30

dashboards and home Dash I'll just work

04:32

with that one that's

04:37

fine we can click into it and this is

04:39

what we would do to find information

04:41

about our log levels and executables

04:44

running so here you would put in what

04:45

log level you want information on and it

04:48

would pass the token to the first panel

04:51

let's not work with the token panel

04:52

let's work with the second panel for

04:55

executables so this is a output of all

04:58

the executables that are running and you

05:00

can click into it and view the events

05:01

related to them but let's go ahead and

05:03

open it in a search here we can see it

05:06

already has a transforming command of

05:07

stats that we covered earlier and it

05:09

outputs all of my executables running on

05:11

my computer for whatever my time picker

05:14

all time so what we're going to do as I

05:17

mentioned before is send this to a

05:20

new index that we had created that will

05:22

act as our summary index so I'm going to

05:25

add a pipe collect and then the index that

05:28

I've previously created so you will have

05:30

to create a new index that was going to

05:32

act that is going to act as your summary

05:33

index before you do this I called it

05:35

summary_executables

05:37

we can go ahead and run this

05:40

and by running this it now takes the

05:42

results of those events and sends them

05:45

to that new summary index of

05:48

summary_executables so now that index has become

05:51

populated with the events that were

05:53

generated from our original

05:57

search and if we wanted to take a

05:59

look notice we have almost 400,000

06:02

events here we can copy this and search

06:05

on this index that we just

06:07

created paste it

06:10

in search over 24 hours because I just

06:12

populated that index and boom it cuts it

06:15

down to 189 events only the ones that

06:17

are applicable to executables running on

06:20

my

06:22

machine so summary index searches run

06:25

faster because they're searching a

06:27

smaller data set and that one just

06:30

narrows down the results based off what

06:31

you decide to send to it we can go back to

06:34

the main search here and I'm going to

06:36

copy

06:38

this and if this is something that

06:40

you're going to want to do then you're

06:41

also going to want to cover the SI

06:44

command

06:46

family and one of the commands that we

06:48

have is sistats probably the most

06:50

common one that you will use this just

06:52

takes it to the next level of the normal

06:55

stats command and now you have the

06:57

summary indexing SI version of stats so

07:00

the command knows to pick it up and work

07:02

with it and it knows it's working with

07:04

a summary index set of data so it will

07:06

go even faster than if you were to do it

07:08

with regular

07:11

stats we can take a look at the time

07:13

that this search took to

07:20

run and it's 432

07:25

seconds and if we go back into our

07:28

search and reports and we edit our first

07:31

search now bear with me this is where

07:34

people get very confused we have a

07:36

scheduled search that's set to run

07:38

acting as a report this right here is

07:41

the parent command what we want to do

07:43

with this parent search parent parent

07:46

set of uh parameters that we are giving

07:48

Splunk is to keep running it on our

07:51

cron schedule or however often we have

07:53

it set but also collect it to our new

07:57

summary index we don't want to change

08:00

this command here to our index equals

08:04

summary

08:05

executables

08:07

because that will not be querying the

08:10

actual data set that we need from our

08:13

correct parent indexes I'll show you

08:16

where to put the index equals

08:19

summary_executables in a moment but when you're

08:21

editing the current search you're only

08:23

going to add the collect command at the

08:25

end because we still need to generate

08:26

that new data that's coming in from our

08:28

data sources from those indexes that's

08:31

relevant to your Splunk environment so

08:33

when editing the search we're only going

08:34

to put in the pipe collect index and the

08:37

summary index that we want to send it to

08:40

this will now run on a cron and that

08:42

cron will run this search and that

08:44

search will populate that summary index

08:46

over time go ahead and save it it's

08:48

accelerated now we go back to our

08:50

command that we

08:51

created to query our summary index we

08:54

can copy this go back to our

08:58

dashboard and

09:01

edit and then the search we're going to

09:03

tell our dashboard to make it more

09:06

efficient and run faster is if we edit

09:08

the search we're now going to take this

09:11

parent command that is used to

09:13

run the scheduled search or the cron and

09:15

populate that dashboard we're now going

09:18

to pull that out and input our new

09:21

command that we built with our summary

09:23

index so we will copy this one and paste

09:26

it into the dashboard panel this is

09:28

critical because if you do have

09:30

dashboard panels that are taking forever

09:32

to populate summary indexing can be very

09:35

useful in this really niche use case so

09:38

I'll paste in our new search that's only

09:42

querying that smaller data set in the

09:44

summary

09:46

index and save it

09:52

off now that dashboard panel is going to

09:54

be wicked fast go ahead and save it and

09:58

you can do this to as many dashboard

09:59

panels that would fit your use case or

10:01

any kind of data source that needs to

10:03

run faster or any panels that are just

10:05

acting slow if we open this in a search

10:07

now you can see we are now leveraging

10:10

our summary indexing to generate our

10:12

results for our dashboard panel that's

10:14

summary indexing sorry for that

10:16

headache but a lot of people get

10:18

confused on which one needs to go where

10:22

and as long as you think about what the

10:24

command is doing and where your actual

10:26

data is getting input to during ingest

10:28

to what index you should be able to keep

10:30

it straight just fine let me go ahead

10:32

and clean some of this up and we will

10:34

move into our last way to invoke

10:36

acceleration in Splunk which is

10:38

leveraging a data model and so far if I

10:41

would rank report acceleration I would

10:44

put it above the summary indexing option

10:47

but if you can leverage summary indexing

10:49

in this way maybe I would put it above

10:52

report acceleration it all kind of

10:54

depends but like I said I I'm going to

10:56

rank data model acceleration as number

10:58

one so we can go ahead and get into that

11:00

and cover the tstats command so there are

11:03

two types of acceleration you can do

11:05

with data models ad hoc and persistent

11:08

ad hoc means you are using the pivot

11:09

editor and it's temporary and it's only

11:12

usable with pivot so that's about all

11:14

the time I'm going to spend talking

11:15

about the ad hoc way to invoke

11:17

acceleration with data models next up we

11:20

have the persistent way this is where

11:22

tstats can be used and this makes it so

11:24

that there are specific summaries of

11:26

multiple TSIDX files that are being

11:29

leveraged to optimize

11:30

speed TSIDX stands for time

11:35

series index files so let's get into

11:38

some examples of persistent data model

11:40

acceleration and of course you got to be

11:42

admin to accelerate it or at least have

11:44

the permission granted to you to

11:46

accelerate your data models go back into

11:48

uh search and

11:51

Reporting all right and we're going to

11:53

head over to the index of web nah I'm just

11:57

joking that's terrible all right

12:00

we can go into settings data models and

12:02

I think for this one I'm going to pick

12:04

on the authentication data

12:08

model and here we have our breakdown of

12:11

the data model and the components to it

12:14

if we scroll to the bottom these are the

12:15

fields that I'm going to be working with

12:17

when you run an acceleration Splunk will

12:20

build an acceleration summary based on

12:22

the Range that you set so what does this

12:25

mean it means that the range of the data

12:28

it will take on a new form of TSIDX

12:30

files and crank up your search speeds

12:32

when you have your index of data there

12:34

are only two parts to that index the

12:37

first part is the raw data files and the

12:40

second part is the TSIDX files so when

12:43

you accelerate a data model in Splunk

12:46

you tell it to basically ignore

12:48

searching of all that bulky raw data in

12:51

the index and only search those TSIDX

12:53

files that are in there and they're a

12:56

lot smaller and I'm not going to get

12:57

into the granularity of how they differ

12:59

and how they're leveraged and how Splunk

13:00

knows to search them but just push the I

13:04

believe button there and believe me when

13:06

I say when you're using

13:08

acceleration it's only going to leverage

13:10

your TSIDX

13:11

files and when you leverage tstats that

13:14

will be performing those statistical

13:17

queries on your TSIDX files and as a

13:20

side note the tstats command is most

13:23

commonly used with Splunk Enterprise

13:24

security so that's pretty much for all

13:26

your SOC analysts out there because

13:28

anytime we are creating a new

13:30

correlation search to trigger a notable

13:32

event we want to first consider if we

13:34

can utilize the tstats command because we

13:36

would want those searches that are

13:38

leveraging detections in the SOC to be

13:40

as fast as possible so that there is no

13:42

delay for the analyst triaging those

13:44

searches I'm going to take app action

13:47

destination and

13:49

user so now that I know what fields I'm

13:52

going to use to search I can start

13:54

populating out my tstats command

13:56

leveraging the data model so I'll keep

13:58

that open for reference and I'll pop

14:00

open a new tab and start building it out

14:03

so I'm going to start with tstats and

14:05

then I'm just going to values out some

14:07

of those leverage

14:10

fields so I'm going to start with

14:11

authentication that's the name of the

14:13

data model and then the field of

14:18

app and I'll call it source

14:24

application I'm also going to pull

14:27

Authentication action and I'll leave it

14:31

as

14:32

action just make sure it's the correct

14:35

field here parent Authentication.app

14:39

yep

14:41

dot action okay so I think I got it

14:44

Authentication.action

14:48

I'm just going to call it

14:54

action notice my colors are not popping

14:57

up so I've definitely typed something

14:59

wrong

15:01

here I forgot a double quote whoops okay

15:05

call it

15:10

action and I will take

15:25

destination as I'll just leave it t

15:40

from and I will count from the data

15:41

model of

15:43

authentication and I'm going to do it by

15:51

user so

15:57

Authentication.user
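For reference, the search dictated in this stretch of the video can be reconstructed roughly as follows. This is an editorial sketch assembled from the spoken fragments, not a copy of the presenter's screen: the alias `source_application` stands in for the spoken "source application", and `Authentication.dest` is an assumption for the field the presenter calls "destination".

```spl
| tstats values(Authentication.app) AS source_application
         values(Authentication.action) AS action
         values(Authentication.dest) AS dest
         count
  FROM datamodel=Authentication
  BY Authentication.user
```

Each row of the result is one user, with the distinct apps, actions, and destinations seen for that user and the total event count.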

15:59

all right we can go ahead and run this

16:01

and I'll just do it over the past

16:06

month that was my dog

16:08

sneezing and I would expect to only see

16:11

me so this one number eight Lo internal

16:14

unknown only makes me a little bit

16:16

nervous and I have no populating values

16:18

for source application so definitely

16:21

spelled something wrong there's no T

16:25

man got to have the T live for it okay

16:30

rerun it and we see okay yeah it's an

16:32

internal application that I have but

16:34

let's say I didn't have that field

16:35

populate and I saw unknown and I was

16:36

super paranoid um let's just go ahead

16:40

and

16:41

investigate those two counts there so

16:46

unknown application I'm just going to

16:47

copy this because I'm lazy and I'll open

16:49

it up in a new tab and I'll say from

16:52

data model authentication because those

16:54

are where the events are populating

16:57

from

17:02

and I will do just a

17:04

search and give it the

17:06

app copy

17:10

pasta and user was unknown so user

17:15

equals unknown and I'll run this over

17:18

all

17:19

time well 30

17:25

days and let's see what we get
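The drill-down described here can be sketched as a `from` search against the data model followed by a filter. The dataset path `Authentication.Authentication` is an assumption about how the data model is laid out, and `REPLACE_WITH_APP_VALUE` is a placeholder: the app value the presenter pastes in is not audible in the source.

```spl
| from datamodel:"Authentication.Authentication"
| search app="REPLACE_WITH_APP_VALUE" user=unknown
```

Unlike the `tstats` search, this returns the underlying events, which is what makes it suitable for investigating a single suspicious row.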

17:29

yes that i