Splunk Field Extraction Walkthrough
Summary
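TLDR: In this video, Travis from Splunk shares how he approaches field extraction in Splunk. He begins with his own history with Splunk, then walks through field extraction in detail, including the Field Extractor and a worked example on a Debian package log data set. He emphasizes using the props and transforms files to refine data parsing, and offers practical tips such as using regex101 to help build regular expressions. The video aims to help users better understand and use Splunk for data analysis and visualization.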
Takeaways
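- 📈 Learn how to use Splunk for data analysis and visualization, particularly for parsing Linux system logs.
- 🔍 Use Splunk's Field Extractor to extract fields, while being aware of its limitations.
- 🔧 Match and extract fields with regular expressions.
- 🛠️ Understand the role of the props and transforms files in data parsing.
- 📚 Splunk documentation and online tools such as regex101 help with learning regular expressions and data parsing.
- 🎥 A video walkthrough offers a more direct way to learn Splunk workflows and practices.
- 🖥️ Use searches and commands in Splunk to query and analyze data.
- 🔄 Improve data parsing by modifying the props and transforms files.
- 📊 Build dashboards and visualizations in Splunk to understand the data better.
- 🔗 Community resources such as gosplunk.com and Splunk Lantern help with finding and sharing queries and dashboards.
- 💡 Continued learning and practice are key to improving Splunk skills.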
Q & A
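How does Travis help others understand the product?
-Travis creates videos showing how he works with data and solves problems in Splunk, so that others can understand the product better and get more use out of it.
What does Travis mean by data ingestion?
-Data ingestion is the process of bringing data into a Splunk environment. Travis assumes viewers have already ingested their data, or are in the process of bringing it in.
What is the purpose of field extraction in Splunk?
-Field extraction identifies and creates new fields from raw data, which makes the data easier to analyze and visualize, for example in dashboards.
Which method does Travis use to extract fields in the video?
-Travis starts with Splunk's built-in Field Extractor and chooses the regular expression route rather than the delimited route, because his data is not cleanly separated by a delimiter.
What challenges does Travis run into with the Debian package log?
-When he tries to extract specific values from the Debian package log, he hits limitations of the Field Extractor: it cannot capture everything he needs in a single extraction, and hyphenated values such as config-files and half-installed are cut short.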
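The truncated hyphenated values can be reproduced outside Splunk. The following is a minimal Python sketch, assuming the generated pattern relied on a word-character class (the video does not show enough of the pattern to confirm this). The sample log lines are illustrative and are not taken from the video.

```python
import re

# Assumed sample lines in the style of dpkg.log (illustrative only).
lines = [
    "2023-01-15 10:23:44 status half-installed libc6:amd64 2.31-0ubuntu9.2",
    "2023-01-15 10:23:46 status config-files oldpkg:amd64 1.0-1",
    "2023-01-15 10:23:47 status installed libc6:amd64 2.31-0ubuntu9.2",
]

# \w matches letters, digits and underscore, so the capture stops at a hyphen.
word_only = re.compile(r"status\s(\w+)")
# Adding the hyphen to the character class keeps hyphenated values whole.
with_hyphen = re.compile(r"status\s([\w-]+)")

truncated = [word_only.search(line).group(1) for line in lines]
complete = [with_hyphen.search(line).group(1) for line in lines]

print(truncated)  # ['half', 'config', 'installed']
print(complete)   # ['half-installed', 'config-files', 'installed']
```

A hand-written pattern like the second one is the kind of fine-tuning that the rex command or a transforms entry allows and the point-and-click extractor does not.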
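Which tools does Travis mention for understanding and building regular expressions?
-Travis mentions regex101, an online tool that shows match information, a quick reference, and explanations, which helps users understand and build regular expressions.
What do the props and transforms files do in Splunk?
-In the video, the props file matches events by source or source type (and can rename the source type) and names the actions to apply to them. The transforms file holds those actions, here regex patterns that break the raw event into fields when a search runs.
How does Travis test and debug his Splunk configuration files?
-Travis edits the props and transforms files in the app's local folder and checks the result by rerunning his search. He notes that the changes also need to be pushed to the search heads and universal forwarders.
Which Splunk community resources does Travis mention?
-Travis mentions gosplunk.com, a community-run repository of queries and dashboards, and Splunk Lantern, a Splunk-supported site with guidance on using Splunk.
Which Splunk commands does Travis mention in the video?
-Travis mentions the rex command, and uses stats (with count ... by) and rename to analyze and present the data.
Which fields does Travis create in the video?
-Travis creates a field named action and extracts values such as installed, unpacked, configured, startup, upgrade, and remove. The final configuration also produces package and version fields.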
Outlines
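📺 Why the video was made, and Splunk basics
Travis introduces his experience with Splunk and his time at the company. He stresses the importance of helping others understand and make better use of the product, and explains his idea of sharing how he works in Splunk through videos. This section also briefly covers the basics of field extraction, from parsing data to turning it into reports and visualizations.
🔍 Field extraction and regular expressions
Travis explains how to extract fields in Splunk, in particular by parsing data with regular expressions. Using a concrete data set, he shows how Splunk's Field Extractor identifies and extracts key information. He also points out the tool's limitations and shows how to find more help in the Splunk documentation.
📝 Parsing data with the props and transforms files
Travis explains how to use Splunk's props and transforms files to refine parsing further. By examining the files in the Splunk Add-on for Unix and Linux, he shows how the props file renames a source type and how the transforms file applies regex patterns to parse log files. He also stresses not editing files in the default folder directly.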
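The flow described here (match on the source, then apply a list of named regex rules) can be sketched in Python. This is an emulation for illustration, not Splunk's implementation and not the contents of Travis's files: the rule names `dpkg_installed` and `dpkg_upgrade`, the `arch` field, and the sample event are hypothetical, while `action`, `package`, `version_old`, and `version_new` follow the field names heard in the video.

```python
import re

# Hypothetical stand-in for a props entry: a source suffix mapped to
# the names of the extraction rules that should run for that source.
PROPS = {"/dpkg.log": ["dpkg_installed", "dpkg_upgrade"]}

# Hypothetical stand-in for transforms entries: rule name -> regex.
TRANSFORMS = {
    "dpkg_installed": re.compile(
        r"status\s(?P<action>installed)\s(?P<package>\S+?):(?P<arch>\S+)"
        r"\s(?P<version>\S+)"
    ),
    "dpkg_upgrade": re.compile(
        r"\s(?P<action>upgrade)\s(?P<package>\S+?):(?P<arch>\S+)"
        r"\s(?P<version_old>\S+)\s(?P<version_new>\S+)"
    ),
}


def extract_fields(source: str, raw: str) -> dict:
    """Run every rule registered for a matching source and merge the named groups."""
    fields = {}
    for suffix, rule_names in PROPS.items():
        if source.endswith(suffix):
            for name in rule_names:
                match = TRANSFORMS[name].search(raw)
                if match:
                    fields.update(match.groupdict())
    return fields


# Assumed sample event in the style of dpkg.log.
event = "2023-01-15 10:23:40 upgrade libc6:amd64 2.31-0ubuntu9.1 2.31-0ubuntu9.2"
print(extract_fields("/var/log/dpkg.log", event))
```

Keeping the source match separate from the regex rules mirrors the split between the two files: one place decides which events are affected, the other decides how each event is broken up.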
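🔧 Building and testing regular expressions
In this section, Travis demonstrates how to build and test a regular expression on regex101. He shows how to capture specific pieces of text as groups, and how to use the resulting expressions in Splunk's transforms file to parse log data more precisely.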
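The step-by-step build can be replayed with Python's `re` module. This is a sketch under assumptions: the sample line is illustrative rather than copied from the video, and the pattern follows the steps as narrated (a literal, then parentheses, then `\s`, then `.+` groups split on the colon).

```python
import re

# Illustrative dpkg.log line (an assumed sample, not taken from the video).
line = "2023-01-15 10:23:45 status installed libc6:amd64 2.31-0ubuntu9.2"

# Step 1: a bare literal matches, but there is no capture group yet.
step1 = re.search(r"installed", line)
print(step1.group(0), step1.groups())  # installed ()

# Step 2: wrapping the literal in parentheses makes it capture group 1.
step2 = re.search(r"(installed)", line)
print(step2.group(1))  # installed

# Step 3: \s consumes the space without capturing it, (.+) captures the
# package name up to the colon, and two more (.+) groups capture the
# architecture and the version.
pattern = re.compile(r"(installed)\s(.+):(.+)\s(.+)")
action, package, arch, version = pattern.search(line).groups()
print(action, package, arch, version)  # installed libc6 amd64 2.31-0ubuntu9.2
```

Because `.+` is greedy, this pattern depends on the line having a single colon and a single space after the action. A stricter variant would use `\S+` in place of `.+`.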
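📂 File permissions and managing Splunk configuration
Travis discusses managing Splunk configuration files on Linux. He shows how to create and edit the props and transforms files and stresses file ownership: the splunk user must own the files so that Splunk can read them. He also shows how to verify the parsing result with a Splunk search.
🌐 Splunk community resources and data visualization
In the final part of the video, Travis introduces community resources such as gosplunk.com and Splunk Lantern, which offer queries and dashboards to reuse. He also shows how to use Splunk's search and stats commands to build reports, and invites viewers to reach him through the comments and the Splunk community.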
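The closing report filters on an action and then counts events per host and package. The grouping logic can be sketched in Python as a rough emulation of that kind of count-by report. It is not SPL, and the events, host names, and the helper `count_by` are made up for illustration.

```python
from collections import Counter

# Hypothetical already-extracted events (field names follow the video;
# the values are invented for this sketch).
events = [
    {"host": "web01", "action": "installed", "package": "libc6"},
    {"host": "web01", "action": "installed", "package": "curl"},
    {"host": "db01", "action": "installed", "package": "libc6"},
    {"host": "web01", "action": "installed", "package": "libc6"},
    {"host": "db01", "action": "remove", "package": "nano"},
]


def count_by(rows, keys, **filters):
    """Count rows grouped by `keys`, keeping only rows that match `filters`."""
    selected = (r for r in rows if all(r.get(k) == v for k, v in filters.items()))
    return Counter(tuple(r[k] for k in keys) for r in selected)


counts = count_by(events, ("host", "package"), action="installed")
for (host, package), count in sorted(counts.items()):
    print(host, package, count)
```

The grouping only works once `action`, `host`, and `package` exist as fields, which is why the extraction work comes before any dashboard.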
Keywords
💡Splunk
💡Field extraction
💡Regular expressions
💡props.conf
💡transforms.conf
💡Data visualization
💡Linux system logs
💡Data parsing
💡Dashboards
💡Search and reporting
Highlights
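Travis shares his experience working in Splunk, including how he uses it for data analysis and visualization.
Travis has used Splunk since 2009 and joined the company in 2017; he enjoys helping others understand the product.
The video focuses on field extraction in Splunk, an important step in data analysis.
Travis describes the challenges of working with Linux system logs, specifically the Debian package log.
He introduces Splunk's Field Extractor and the use of regular expressions to parse data.
He discusses the Field Extractor's limitations and ways to work around them.
He shows how to use the rex command and the props.conf file to improve parsing.
He explains how the transforms.conf file further customizes parsing rules.
He gives a practical example of using the online tool regex101 to learn and build regular expressions.
He discusses creating and using custom apps in Splunk to organize configuration files.
He stresses not editing files directly in the default folder, because an upgrade overwrites them.
He offers pointers on using Splunk's search and reporting features to build dashboards.
He explains how to find and reuse queries and dashboards built by others through the Splunk community and related sites.
He shows how stats commands and the extracted fields produce reports on package installation and system activity.
At the end of the video, Travis invites viewers to ask questions and take part in the Splunk community.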
Transcripts
hi travis with splunk here
wanted to create some videos around how
does travis do you know stuff in splunk
i've been using splunk since 2009
joined splunk back in 2017 and i really
enjoy helping others understand our
product and
how to get you know better with it and
become more you know make it more useful
so i thought what better way but
create some videos on how i do stuff it
may not be you know
the best way of doing it there's
probably somebody smarter that looks at
and go oh you could do it this way but
this is how i do stuff in splunk
in this video i want to talk about field
extractions so i'm going to assume that
you've already ingested the data
or you know you're bringing data into
your splunk environment
and you're looking at it going what do i
do now
and that's actually the series you know
let's let's how do we parse the data
i'll probably have another video about
how do i actually bring the data in and
then how do i turn to report turn into
dashboards but
let's not get ahead of ourselves here
and we're talking about field
extractions and you know i have a data
set here i was actually working on
another project
around compliance and understanding hey
what is being installed
on my endpoints
you know windows linux unix and i
decided i want to focus on linux
ubuntu and debian package log
i do have yum logs coming in i have a
centos box but i'm going to talk about
debian package log
and
here is the
the raw data i have host source and
source type but i don't really have any
other fields
that i think i should have
and what i mean
here is i've got a status i've got
a startup i've got remove i have
half-configured i have installed
i would like to see something over here
to help me
to be able to create
maybe uh
dashboards and visualizations around it
so i'm going to
expand this go to event actions and use
splunk's internal extract you know field
extractor
so click extract fields
bring you to the
field extractor here and this has gotten
a lot better since 2009 but it's you
know there's still some limitations and
you'll see that here in a second
i'm going to go regular expression
route and not delimited
you know there is other structured data
it's maybe spaced out by commas
or
spaces or any kind of special character
that you could go delimited route but i
can't with this
data set
so i'ma click next
and in here you'll see the event that i
highlighted or that i selected
and
you want to highlight the word you don't
want to double click and the reason why
if you double click you may get a space
behind the word
and you don't want that
so we're going to make sure we highlight
the word
and now we can say you know give it
whatever field name that you like i'm
going to call it action
um
and you can see that
here we have
matching
it's getting
what's
you know
from the onset seems like oh yeah it's
working out great
you know if i click on non-matches it
matched everything well it's a you know
i can show you the regular expression
and it's a very simple regex
and let's go back to all events
and what i mean by that
here's startup and then it
grabbed packages it didn't grab startup
startup's what i really want but because
status is here it did grab config but
didn't grab files which config-files
is all one word
half-installed i want that as all
one so i can already in linux wait
that's part of the the package
maybe we can fix the linux or maybe
maybe
we you know click on remove
and what it it's very quick it'll add it
up here
and then i can actually highlight remove
you know select the same field name if
you had multiple ones already built you
know you you know click the down box
there and go
and this is where limitations of our
field extraction utility comes in and it
breaks
so at this point
you know i'm kind of left with i can try
and do something you know
simpler
you know maybe not be i'm going to
remove this field
and maybe i want to go back here the
word status
the whole word status
and then uh
action and run that extraction
you know and would i be happy with that
and then try to
you know match the rest
but
you know if i
i don't want just status because i want
half-installed and half-configured
you know we can also take
the side note
you can take whatever regular expression
that you've you know splunk's built here
and test it
there's couple there's two different
ways we can click view in search
which will open up another search and
have it there
but i like to just go back to my
original search
and
pipe
rex
and if you've never used the rex command
you're new to splunk
um
there's a lot of commands out there
and if you
need more help or more information
there's a couple different ways we can
go about it uh inside of splunk you you
can see right here i have the word help
and i can go to the you know
page or splunk documentation
to show us the command and and all the
other commands and examples and whatnot
or i can click more right here now if
you're not getting this kind of
information
go up to your username i'm administrator
click on preferences
and then spl editor
and click on full you may be on compact
for the search assistant and if you
liked when i hit the pipe it dropped a
new line you know just check you know
search auto format
so i'm gonna be set on full that hit
apply
and so when i hit a pipe it drops a new
line and now i have more information
when i do
my different commands
so i'm going to wrap this in quotes as
you will need to wrap it in quotes
and
paste what i have in there
and hit enter
and yeah during this video you're going
to get all my little
mistakes
you know me correcting myself
i don't like doing multiple videos or
you know mesh them all
together i just do it all in one take so
there the search is
finished and we can see i have a new
field called action and i can see status
startup configure upgrade i mean i mean
good start
but it's not what i want and there is
another way that we can go
and really
you know fine-tune how we're going to
extract
and be able to parse each one of these
events and that's using the props and
transforms
so i'm going to clear this out here
and you could
you know go back
to you know google search
and say okay splunk documentation
transforms.conf splunk documentation props.conf
um you could go in here and start
reading the different
you know documentation that we have
around
what props and transforms is and i'll
explain a little bit more here in a
second
here is our transforms our documentation
we have examples and then you can come
up here and see what all those different
pieces mean and what you can do
same thing with props you know
examples and you know what does all this
stuff really mean and if you want to
take that time or
to read all that
you know go at it
so
i need to i'm going to close this screen
out
and what i want to do now is switch to
another screen
and show you
you know from that
splunk
add-on for unix and linux you can go out
there download that file unzip it
and then start reverse engineering
tearing it apart
and this is you know what i've already
been working on for
you know this presentation here
but if i go to the props.conf
you know i open the one that's in the
default folder
side note do not edit anything in the
default folder especially if you're you
know live in your environment
you will want to make a copy of the
configuration file the conf file
you know make a copy of props.conf make a
copy of transforms.conf
or inputs.conf and move it to a local
folder for example i have an inputs.conf
file here
and i have it in a local folder
so this is where if i need to make any
adjustments or changes to how i'm
explaining here how i'm telling my
universal forwarders to you know what
data to send i go into the local copy
because if you edit anything in default
it will uh be overwritten if you do an
upgrade
so props here we have the
splunk
unix you know add-on for unix and linux
props.conf file
you know i could do and that's what i
did you know when i was first figuring
out hey
this isn't helping me with my
my package log you know i brought up a
find
and then did the search and look for
anything
and can't find text
if i do syslog
i mean it will find
you know the syslog data there
or
you will find the reference to syslog
and that's what the props does if you
have a source or source type if you want
to rename your source
so or you want to rename the
the source type and what i mean by that
let me go back to my
windows 7 my chrome here
and if i look at source type i have
you know multiple different source types
so going back over here
i can see that hey if it's source and it
ends with syslog
call it source type equals syslog
and the reason why
our app is doing this is if i go back to
my inputs.conf file
and if i do a search in here for var
log
you can see when i
i monitor var log the folder
so you can monitor
you know to a
log file or you can monitor a folder
and say hey
give me anything that's dot log give me
anything that
ends in messages anything that's auth and
don't send me this information
and there's i mean there's a couple of
different ways if you want to specify
var log then you can specify your source
type there like here in the bash history
you know root dot bash history and then
this is specified you know bash history
so since i am using the add-on
i'm getting all of these i'm not
specifying my source type in my
inputs.conf file
so then that's why you have to use a
props to look at the hey it's this
source
rename the source type to syslog and
then look in syslog
the only
i mean the only drawback to this way
is any any new data this will work for
but any index data that you've already
indexed
you have to have some of these settings
out on your universal forwarder so it knows
that index you know when it's sent in
the data it's rewriting the source type
so this or when it first comes into the
indexer it writes the source type
so at search time i mean this doesn't
help
so if we want to build something at
search time you know this is where
you can
instead use and here's props and this is
what i just explained
you can say hey if you see source
and it ends at you know slash dpkg.log
apply
these actions to that source
and when you are editing or you're
creating this make sure you only have
three dots
um i accidentally put four in and then i
couldn't figure out for like a half an
hour
what was why my
props and transforms wasn't working then
i realized i put four dots instead of
three
so splunk is a little sensitive there
and then you see here you know dpkg
startup
if i highlight that one
you see how it matches up here
you know installed
it matches you know up here
installed and how did i create all of
these so this is transforms and this is
what
you know after the props goes hey
here's a source
go to this report and do these actions
and these actions are going to be hey
here's a regex pattern for installed i
need
this
raw event broken up like this
so how did i get and how did i figure
out this regex pattern
easy i went to regex101 so let me switch
over to that screen again
and you can come in here and you grab
your
event
and go to regex101 you can see i've
already copied
the event into this regex101 this is a
on the web
it's a great utility
especially for people who don't know
much about
regular expressions it gives you helpful
information quick references off to the
right and explanations
i said okay i want to my first capture
group
i want installed
and you can see it says match but it's
not
a group
to make it a capture group we got to put
it in parentheses
wrap it in parentheses and now it is a
group
so then after that is a space
but i don't want that captured i'm going
to put them you know space and \s
for that that you know you can look off
to the right hand side and see
any white space character \s
and now i'm going to have another
capture group because i want the package
and i'm going to do a dot if you just do
a dot it's only going to capture
a single character
so you know i could i guess i could put
a bunch of dots in there
that's not what i'm going to do here i'm
going to put a dot and a plus
and then i'm going to move out of the
capture group
but it captures everything all the way
to the end because
i haven't you know broken it up yet you
know i can do a slash no wait
i got ahead of myself because there is a
amd64 and there is
comma there let's see a colon in there
not comma and i'm going to put that
there
all right now i want that amd
now yeah i'm not going to type in a word
amd i'm just going to do a you know dot
plus
and you can skip ahead if you really
want
star because i'm not going to do this
for everyone but i just wanted to show
you
an example of me capturing one group at
least or one raw event and show you
it wasn't that difficult
and here we can see
group one group two group three group
four
and what it's capturing and you know
like i said all the explanations and
once i have that
you know i can go back to my
notepad or whatever you want and start
building out this for all the different
events
as i can go back to
you know my search
you know to match the remove to match
the half-configured to match the
installed so what do i do when i
you know i've got
all this built out
i have how i'm going to
you know match on the source to call out
all the functions that i want to do over
here all the the matching and parsing
so i'm going to go to my terminal
and i'm going to clear
i don't know why i did that but anyway
um you've got to decide where you want
to put
or which props and transforms file you
want to modify
i have
you know multiple apps so i have you
know multiple i can choose from you know
you can build your own app and put it
all in there like i have one called
audit
you know i can go in here into local and
you know i could edit it here
or create the file here because i don't
have one in here
um but if you want it like global i mean
you can create a whole new app i mean
i'll let you decide how you want to do
that
i'm going to go back to that uh
Splunk_TA_nix and make sure i go into
local
and then there if i do a i'll just do a
vi on the props
you can see that i have a props.conf file
and i've already
have an entry in there this is a help if
you have an ubuntu server and you notice
that you're not parsing
um
auth.log
and you want it to look like syslog
you know you'll need to push this out
to your search heads and you'll need to
push this out to your universal
forwarders like i said i might go over
that in another video
but anyway
i'm gonna go back over to my
notepad
come back in here hit i for insert
and i'm going to right click
so now
the props will know what to do with
source
dpkg.log when your search
calls it out
i'm going to hit the escape key
and then write and quit
so if you don't know vi there's a lot of
tutorials and
information out there
um
get familiar with if you're gonna be
playing with the unix so vi trans
there's other ones too but
i i just i've used vi for a while and
i've got very comfortable with it so use
whatever editor you're comfortable with
but now i'm going to create the
transforms.conf file here
it's blank
i'm going to insert
go back over to my
notepad
copy this
and there we go
escape
then write and quit
and
before i go too much further i'm gonna
do an ls dash la lah whatever
and look at who owns that file and since
i am logged in as root
uh it did
you know create the file as root and
what i'm going to do is chown
splunk and give splunk because that
splunk owns everything else
ownership of this file
oh i misspelled chown look at me
there we go
just wonder why it was taking so long
and if i do a clear again and do an ls
dash la you can see now splunk owns this
file
so at this point
we should be able to go back to
our splunk search
and you can see here
that
you know there's no fields over here
except for extraction it still remembers
the last search that we ran
so i'm going to rerun this search
and hopefully it's had enough time to
reprocess that if not another thing that
you can do
is remove everything
here
that was probably probably better way of
doing it
and do a debug refresh
and this may
take a little bit but you can refresh
um
all the the props and you know some of
the stuff that's
in the background and while that's
happening i also
would like to talk about
if you are new to splunk and you're
looking for ways to
discover what other people are building
or even this video i'm going to build
some like said dashboards around this
data
there's websites out there that you can
go to like gosplunk.com
where this is a repository for
you know queries and dashboards for
people to use
you can come in here and if you want to
look up you know windows and do a
keyword search and find out all the
different dashboards or searches that
other people in the community and this
is not owned by splunk but you can come
out here and find other dashboards and
searches you know the reports that
people have used i've actually posted
and this is where you know i'll have a
link to this video eventually out here
for this because i'm going to build a
dashboard around this and you can see i
have one for netflow activity nix
logon dashboard
i do put in some information in here
about
you know how i got this dashboard to
work and even a youtube video
i thought i had a youtube video
i don't have a youtube video on this one
and guess what i'll be building later on
um but here is a
you know documentation and the xml for
that data
and other you know another site that you
could go to
is splunk lantern this is a splunk
supported site where we do have a lot of
information out here
about different ways and
content you know of using splunk
so let's go back to
yeah we can see this is what it'll give
you once it's done
and then i'm going to
well
i could have just clicked on search
so let me click search
and then
index
equals
mix source
in and you can see i already have that
right there
let's say last well i can see do i get
anything the last 24 hours
yeah i do
and looky there there's my fields i
wanted
there's action
and i'm getting installed unpack
configured startup
upgrade remove i'm getting info
which that's interesting
and then i'm also getting a version
so if i wanted to i can say action
installed
and now i can say stats
count by
you know if we want host
and package
do i call it package or packaged
i did it again
and now i can get a you know maybe a
better representation of what packages
or which hosts it's being installed
you know if you don't want the word host
i mean there's other ways that we can
start formatting
you know if i wanted to let's see here
i'll show you uh what is it
list
i can't type with you all watching me
you know we can do stuff like this
i mean there's a lot of like i said
there's a lot of commands out there that
make this you know
data very useful
um i talked about i upgraded splunk you
know earlier so i can say you know do a
keyword search for splunk and
you know
open this up to the last 30 days
and then if i want there's the actions i
can say you know upgrade
and then you know stats
count by
and and off to the left-hand side we can
see some of the fields i've got some of
them covered up
to where i can say host
and then
package
version
underscore old
version underscore new
run that
if you don't like
version old and version new
um and you want to rename it just you
know rename
version underscore old to old
oh
it's not gonna work forgot the as
there we go
so hopefully this video has been helpful
for you today and learning more about
props.conf transforms
and how we can use that to
parse the data so we can start making
dashboards and visualizations around
this data
if you have any questions leave it in
comment
find me at splunk
and have a good day