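Filebeat can do some simple processing while it collects logs, such as dropping unwanted events, which reduces the load on downstream components such as Kafka.

Plain-text log format

Raw log format: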
    {"log":"2022-03-15 14:53:48.972 [http-nio-8080-exec-10] o.s.c.c.c.ConfigServicePropertySourceLocator-[227]-[INFO]-Connect Timeout Exception on Url - http://localhost:8888. Will be trying the next url if available\n","stream":"stdout","time":"2022-03-15T06:53:48.972854745Z"}
 
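Here, "raw log" means the format of the log file that Filebeat collects. The line above has already been wrapped by Kubernetes; what the program itself actually printed is the value of the log field.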
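To make the wrapping concrete, here is a minimal Python sketch that decodes one wrapped line and extracts the program's own output from the log field. It is only an illustration and not part of Filebeat; the function name `unwrap_container_line` is hypothetical, and the sample message is a shortened version of the line above.

```python
import json

# One line as it appears in the container log file (message shortened for the example).
raw_line = (
    '{"log":"2022-03-15 14:53:48.972 [INFO] Connect Timeout Exception\\n",'
    '"stream":"stdout","time":"2022-03-15T06:53:48.972854745Z"}'
)


def unwrap_container_line(line: str) -> str:
    """Return the program's original output from one wrapped log line."""
    wrapper = json.loads(line)          # outer object with the keys log, stream, time
    return wrapper["log"].rstrip("\n")  # the program's own output, trailing newline removed


app_output = unwrap_container_line(raw_line)
print(app_output)
```

Everything outside the log field (stream, time) is metadata added by the wrapper, not by the application.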
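The corresponding Filebeat configuration is as follows:

    filebeat.inputs:
    - type: log
      symlinks: true                 # follow symlinks (files under /var/log/containers are symlinks)
      enabled: true
      json.keys_under_root: false    # keep the decoded keys under "json" (json.log, json.stream, ...)
      json.overwrite_keys: false
      json.add_error_key: true       # add an error key when JSON decoding fails
      tail_files: true               # start reading new files at the end
      paths:
        - /var/log/containers/*_dev_*.log
      processors:
        - drop_event:
            when:
              or:
                # the patterns are literal Chinese text that appears in the application's scheduled-task logs
                - regexp:
                    json.log: '定时任务task'
                - regexp:
                    json.log: '定时任务执行成功'
      fields:
        log_topic: k8s-pod-logs
        type: "kube-logs"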
 
After Filebeat processes a line, the event looks like the output below. The original log text ends up in json.log, which is why the drop_event conditions in the configuration above run their regexp match against json.log: any event whose json.log contains one of the specified strings is dropped.

    {
        "@timestamp": "2022-03-15T07:04:44.917Z",
        "@metadata": {
            "beat": "filebeat",
            "type": "_doc",
            "version": "7.3.2",
            "topic": "k8s-pod-logs"
        },
        "log": {
            "offset": 6799764,
            "file": {
                "path": "/var/log/containers/idk-oms-server-7d95759554-9xqjs_dev_idk-oms-server-d43d4364fa14ac1657d4ee1d4ef2dda64acce26025bb1c7c605844dc906efbef.log"
            }
        },
        "json": {
            "log": "[2022-03-15 15:04:37.637] http-nio-8080-exec-8 com.ingeek.idk.oms.b-[-1]-[DEBUG]-[TID:null NAME:APPNAME ENV:dev INS:172.20.37.197] CORSFilter is work~~~\n",
            "stream": "stdout",
            "time": "2022-03-15T07:04:37.638273461Z"
        },
        "input": {
            "type": "log"
        },
        "fields": {
            "type": "kube-logs",
            "log_topic": "k8s-pod-logs"
        },
        "ecs": {
            "version": "1.0.1"
        },
        "host": {
            "name": "localhost.localdomain",
            "architecture": "x86_64",
            "os": {
                "platform": "centos",
                "version": "7 (Core)",
                "family": "redhat",
                "name": "CentOS Linux",
                "kernel": "5.4.113-1.el7.elrepo.x86_64",
                "codename": "Core"
            },
            "containerized": false,
            "hostname": "localhost.localdomain"
        },
        "agent": {
            "ephemeral_id": "d004fcc4-901c-4cc9-89c2-88ff0717c16d",
            "hostname": "localhost.localdomain",
            "id": "e7750a81-c8a1-43a6-a49c-db2be0ac4a8b",
            "version": "7.3.2",
            "type": "filebeat"
        }
    }
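The matching logic of the drop_event condition can be sketched in Python. This is only an illustration of the "drop if json.log contains one of these strings" rule described above, not Filebeat's implementation; `should_drop` and the sample events are hypothetical, and treating a missing field as "no match" is an assumption of this sketch.

```python
import re

# Patterns copied from the drop_event conditions in the configuration above.
DROP_PATTERNS = [re.compile("定时任务task"), re.compile("定时任务执行成功")]


def should_drop(event: dict) -> bool:
    """Mimic `or` over `regexp: json.log`: True if any pattern occurs in json.log."""
    message = event.get("json", {}).get("log")
    if not isinstance(message, str):
        return False  # assumption: a missing field matches nothing, so the event is kept
    # search() is unanchored, so a pattern may match anywhere in the message
    return any(pattern.search(message) for pattern in DROP_PATTERNS)


events = [
    {"json": {"log": "[INFO] 定时任务执行成功\n"}},        # hypothetical scheduled-task line
    {"json": {"log": "[DEBUG] CORSFilter is work~~~\n"}},  # ordinary line
]
kept = [event for event in events if not should_drop(event)]
print(len(kept))
```

Only the second event survives, which mirrors what reaches the output once the processor has run.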
 
Handling JSON-format logs

Raw log:

    {"@timestamp": "2022-03-15T14:15:14+08:00", "server_addr": "192.168.13.120", "remote_addr": "10.200.4.70", "scheme": "https", "request_method": "POST", "request_uri": "/ingeek-analysis/apis/v3/collect", "request_length": "25545", "uri": "/ingeek-analysis/apis/v3/collect", "request_time": 0.004, "body_bytes_sent": 833, "bytes_sent": 1072, "status": "200", "upstream_host": "172.20.17.40:8084", "domain": "gemalto-tam-dk.vrzbq.com", "http_referer": "-", "http_user_agent": "-", "http_app_id": "3bdd8886a6261935", "x_forwarded": "-", "up_r_time": "0.003", "up_status": "200", "os_plant": "android", "os_version": "11", "app_version": "4.0.4", "app_build": "97", "guid": "b1d05ee5-b45a-40ef-92de-43008ee5eccb", "resolution_ratio": "1080*2193", "ip": "fe80::8070:92ff:fe99:902a%dummy0", "imsi": "b1d05ee5-b45a-40ef-92de-43008ee5eccb", "listen_port": "443"}
 
Dropping a line if it contains xx

For example, suppose I want to drop every log whose request_uri is /actuator/info or whose http_user_agent is Go-http-client/2.0.

The filebeat.yml configuration is as follows:

    filebeat.inputs:
    - type: log
      symlinks: true
      enabled: true
      json.keys_under_root: true     # put the wrapper's keys (log, stream, time) at the event root
      json.overwrite_keys: true
      json.add_error_key: true
      tail_files: true
      paths:
        - /var/log/containers/ingress-nginx-controller*.log
      processors:
        # second pass: decode the JSON string stored in the "log" field
        - decode_json_fields:
            fields: ['log']
            target: ""               # merge the decoded keys into the event root
            overwrite_keys: false
            process_array: false
            max_depth: 1
        - drop_event:
            when:
              or:
                - regexp:
                    http_user_agent: 'Go-http-client/2.0'
                - regexp:
                    request_uri: '/actuator/info'
        # remove the raw "log" string and the host metadata from the output
        - drop_fields:
            fields: ["log", "host"]
            ignore_missing: false
      fields:
        log_topic: "ingress-k8s"
        type: "ingress"
 
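Because the log line is itself JSON and reaches Filebeat wrapped inside another JSON object, it has to be parsed twice: the json.* input options decode the outer wrapper, and decode_json_fields then decodes the inner JSON held in the log field.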
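The two parsing passes, followed by the drop rules, can be sketched in Python as follows. This is a simplified model rather than Filebeat's code: `wrap` and `process_line` are hypothetical helpers, the sample access-log entries are shortened, the timestamp in `wrap` is made up, and key conflicts are resolved by simply keeping the existing key.

```python
import json
import re
from typing import Optional


def wrap(access_log: dict) -> str:
    """Build a wrapped line as it would appear in the container log file (sample data)."""
    return json.dumps({
        "log": json.dumps(access_log) + "\n",   # inner JSON stored as a string
        "stream": "stdout",
        "time": "2022-03-15T06:15:14.000000000Z",
    })


def process_line(line: str) -> Optional[dict]:
    """Decode both JSON layers, apply the drop rules, return the event or None if dropped."""
    # Pass 1: decode the outer wrapper; its keys become top-level fields.
    event = json.loads(line)
    # Pass 2: decode the JSON string in "log" and merge its keys into the top level.
    inner = json.loads(event["log"])
    for key, value in inner.items():
        event.setdefault(key, value)  # keep existing keys on conflict (sketch simplification)
    # Drop rules: either field matching its pattern drops the event.
    if re.search(r"Go-http-client/2.0", str(event.get("http_user_agent", ""))):
        return None
    if re.search(r"/actuator/info", str(event.get("request_uri", ""))):
        return None
    # Remove the raw "log" string now that its content has been merged.
    event.pop("log", None)
    return event


kept = process_line(wrap({"request_uri": "/ingeek-analysis/apis/v3/collect",
                          "http_user_agent": "-", "status": "200"}))
dropped = process_line(wrap({"request_uri": "/actuator/info",
                             "http_user_agent": "-", "status": "200"}))
print(kept)
print(dropped)
```

The drop rules can only see request_uri and http_user_agent after the second pass, which is why decode_json_fields is listed before drop_event in the processors.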