log4j2 log pattern: %d %-5level [%t] %class{36}.%M:%L - %msg%n
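For reference, a pattern like this normally lives in a PatternLayout in log4j2.xml. A minimal sketch (the file layout and appender name here are illustrative, not taken from the original project):

```xml
<!-- Minimal log4j2.xml sketch; appender and logger names are assumptions -->
<Configuration>
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <!-- the pattern above: date, level, thread, class.method:line, message -->
      <PatternLayout pattern="%d %-5level [%t] %class{36}.%M:%L - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```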
The resulting Tomcat log output looks like this:
2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] net.sf.log4jdbc.log.log4j2.Log4j2SpyLogDelegator.sqlTimingOccurred:243 - 36. select articleent0_.id as id1_6_, articleent0_.created_at as created_2_6_, articleent0_.updated_at
as updated_3_6_, articleent0_.audit_at as audit_at4_6_, articleent0_.audit_remark as audit_re5_6_,
articleent0_.audited_status as audited_6_6_, articleent0_.auditor_id as auditor_7_6_, articleent0_.article_category_id
as article23_6_, articleent0_.creator_id as creator_8_6_, articleent0_.hot_time as hot_time9_6_,
articleent0_.identifier as identif10_6_, articleent0_.keywords as keyword11_6_, articleent0_.like_count
as like_co12_6_, articleent0_.media_info as media_i13_6_, articleent0_.online_status as online_14_6_,
articleent0_.posted_by as posted_15_6_, articleent0_.publish_time as publish16_6_, articleent0_.resource_info
as resourc17_6_, articleent0_.source_platform as source_18_6_, articleent0_.stick_expiration_time
as stick_e19_6_, articleent0_.stick_status as stick_s20_6_, articleent0_.title as title21_6_,
articleent0_.user_id as user_id22_6_ from article articleent0_ where articleent0_.user_id=202
order by articleent0_.created_at DESC {executed in 0 ms}
2019-01-22 14:07:10,602 DEBUG [http-nio-8076-exec-1] cn.amd5.community.common.web.config.Log4j2SqlLogDelegator.resultSetCollected:70 - Returned empty results
Because Java log entries are highly irregular (some single-line, some multi-line), my first attempts at parsing the logs never matched everything, and I could not find a way to match multi-line logs online either. After trying many regular expressions, I fell back to `format none`, i.e. no parsing at all. With raw logs shipped to Kibana that way, multi-line entries frequently showed up out of order, and tabular database query results in particular were scrambled beyond recognition.
I had planned to switch to filebeat; after setting up the latest Kibana locally, I found that Kibana even ships tutorials for collecting common application logs with filebeat and analyzing metrics from them.
I then looked up how filebeat collects multi-line Java logs and found it has a multiline-matching feature. Since fluentd's plugin ecosystem is so rich, I figured it should have an equivalent, so I searched the official site and finally found the multiline parser plugin, parser_multiline, together with its usage documentation.
Below is how to collect multi-line Java (log4j2) logs with fluentd:
1. Prerequisite: an EFK stack is already installed.
For EFK setup, see:
Building an EFK log management system with fluentd instead of logstash
2. Install the plugins.
td-agent-gem install fluent-plugin-elasticsearch
td-agent-gem install fluent-plugin-grep #filtering plugin
td-agent-gem install fluent-plugin-tail-multiline #multiline plugin (multiline matching worked for me even without this plugin; if matching fails or errors occur, install it and try collecting again)
3. Configure fluentd's collection rules.
#vim /etc/td-agent/td-agent.conf
#add the following
#Java logs
#forward logs to Elasticsearch for storage, to be viewed in Kibana
<match dev.**>
@type elasticsearch
host gdg-dev
port 9200
flush_interval 10s
index_name ${tag}-%Y.%m.%d
type_name ${tag}-%Y.%m.%d
logstash_format true
logstash_prefix ${tag}
include_tag_key true
tag_key @log_name
<buffer tag, time>
timekey 1h
</buffer>
</match>
#日志采集并格式化
<source>
@type tail
format multiline
format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/
#time_format %Y-%m-%dT%H:%M:%S.%NZ
path /usr/local/logs/*.bms.api.log
pos_file /var/log/td-agent/bms.api.log.pos
read_from_head true
tag dev.bms.api.log
</source>
<source>
@type tail
format multiline #enable multiline parsing
format_firstline /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/ #regex identifying the first line of each multi-line event; subsequent lines are appended to the event until format_firstline matches again
format1 /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/ #regex that parses the assembled event into fields
#time_format %Y-%m-%dT%H:%M:%S.%NZ
path /usr/local/logs/*.article.api.log
pos_file /var/log/td-agent/article.api.log.pos
read_from_head true
tag dev.article.api.log
</source>
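Before restarting td-agent, it can be worth sanity-checking the two regexes against a sample log line. fluentd evaluates these patterns as Ruby regular expressions, so a quick Ruby check (a sketch added here for illustration; the sample strings are abbreviated from the Tomcat output above):

```ruby
# The two patterns from the <source> config above, as Ruby regexes
# (fluentd uses Ruby's regex engine internally).
firstline = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}/
format1   = /^(?<access_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}) (?<level>\S+)\s+\[(?<thread>\S+)\] (?<message>.*)/

# Abbreviated lines from the sample log: one event start, one continuation.
first = "2019-01-22 14:07:10,599 INFO [http-nio-8076-exec-1] net.sf.log4jdbc.log.log4j2.Log4j2SpyLogDelegator.sqlTimingOccurred:243 - 36. select ..."
cont  = "as updated_3_6_, articleent0_.audit_at as audit_at4_6_, ..."

m = format1.match(first)
puts m[:access_time]         # 2019-01-22 14:07:10,599 (the "." matches the comma)
puts m[:level]               # INFO
puts m[:thread]              # http-nio-8076-exec-1
puts firstline.match?(cont)  # false -> the line is appended to the previous event
```

Note that the `.` before `\d{3}` matches the comma that log4j2's `%d` puts before the milliseconds; writing `[.,]` would be stricter but is not required here.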
4. View the collected logs in Kibana.
Single-line logs, multi-line logs, and tabular database query results are all collected, parsed, and displayed correctly, as shown below: