multiline_grok
crashes when multiline_flush_interval
is used. #107
Description
Describe the bug
I think there is a bug when using multiline_grok
.
If Fluentd receives a multi line log entry with special characters and it flushes before it gets a new log entry Fluentd crashes.
This error only appears when multiline_flush_interval
is used.
Fluentd is in a crash loop state after the error and it's not able to recover. A restart is needed.
To Reproduce
Just run the following command with the setup provided in the Configuration section:
/opt/td-agent/bin/fluentd --config /home/td-agent/fluentd.conf
Your Environment
- Fluentd version: fluentd 1.16.2
- TD Agent version: td-agent 4.5.1 fluentd 1.16.2 (d5685ada81ac89a35a79965f1e94bbe5952a5d3a
- Operating system: AlmaLinux 9.2 (Turquoise Kodkod)
- Kernel version: 5.14.0-284.30.1.el9_2.x86_64
Configuration
# /home/td-agent/fluentd.conf
<source>
@type tail
tag encoding
path /tmp/fluentd-encoding/utf8.log
read_from_head true
multiline_flush_interval 5s
<parse>
@type multiline_grok
custom_pattern_path /home/td-agent/patterns.conf
grok_failure_key grokfailure
multiline_start_regexp /\d{4}-\d{1,2}-\d{1,2}/
<grok>
pattern %{CUSTOM_LOG_FAILS}
</grok>
<grok>
pattern %{CUSTOM_LOG_WORKS}
</grok>
</parse>
</source>
<match encoding>
@type stdout
</match>
# /tmp/fluentd-encoding/utf8.log
2023-11-22 18:18:09.823+0100 Testing Berlin
2023-11-22 18:18:09.823+0100 Testing Nürnberg
2023-11-22 18:18:09.823+0100 Testing Hannover
2023-11-22 18:18:09.823+0100 Testing München
javax.management.RuntimeErrorException: null
at Main.main(Main.java:16) ~[bin/:na]
# /home/td-agent/patterns.conf
CUSTOM_LOG_WORKS %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}
CUSTOM_LOG_FAILS %{HTTPDATE:timestamp} %{NUMBER:response}
Error Log
[td-agent@5200b897ecea ~]$ /opt/td-agent/bin/fluentd --config /home/td-agent/fluentd.conf
2023-12-12 17:40:48 +0100 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-12-12 17:40:48 +0100 [info]: parsing config file is succeeded path="/home/td-agent/fluentd.conf"
2023-12-12 17:40:48 +0100 [info]: gem 'fluentd' version '1.16.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-opensearch' version '1.1.4'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-opensearch' version '1.1.3'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-prometheus' version '2.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.1'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-12-12 17:40:48 +0100 [info]: Expanded the pattern %{CUSTOM_LOG_FAILS} into (?:(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+)))) (?<response>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))
2023-12-12 17:40:48 +0100 [info]: Expanded the pattern %{CUSTOM_LOG_WORKS} into (?:(?<timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<message>.*))
2023-12-12 17:40:48 +0100 [warn]: 'pos_file PATH' parameter is not set to a 'tail' source.
2023-12-12 17:40:48 +0100 [warn]: this parameter is highly recommended to save the position to resume tailing.
2023-12-12 17:40:48 +0100 [info]: using configuration file: <ROOT>
<source>
@type tail
path "/tmp/fluentd-encoding/utf8.log"
read_from_head true
tag "encoding"
multiline_flush_interval 5s
<parse>
@type "multiline_grok"
multiline_start_regexp "/\\d{4}-\\d{1,2}-\\d{1,2}/"
grok_failure_key "grokfailure"
custom_pattern_path "/home/td-agent/patterns.conf"
unmatched_lines
<grok>
pattern "%{CUSTOM_LOG_FAILS}"
</grok>
<grok>
pattern "%{CUSTOM_LOG_WORKS}"
</grok>
</parse>
</source>
<match encoding>
@type stdout
</match>
</ROOT>
2023-12-12 17:40:48 +0100 [info]: starting fluentd-1.16.2 pid=857 ruby="3.1.4"
2023-12-12 17:40:48 +0100 [info]: spawn command to main: cmdline=["/opt/td-agent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/bin/fluentd", "--config", "/home/td-agent/fluentd.conf", "--under-supervisor"]
2023-12-12 17:40:48 +0100 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-12-12 17:40:48 +0100 [info]: adding match pattern="encoding" type="stdout"
2023-12-12 17:40:48 +0100 [info]: adding source type="tail"
2023-12-12 17:40:48 +0100 [info]: #0 Expanded the pattern %{CUSTOM_LOG_FAILS} into (?:(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+)))) (?<response>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))
2023-12-12 17:40:48 +0100 [info]: #0 Expanded the pattern %{CUSTOM_LOG_WORKS} into (?:(?<timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<message>.*))
2023-12-12 17:40:48 +0100 [warn]: #0 'pos_file PATH' parameter is not set to a 'tail' source.
2023-12-12 17:40:48 +0100 [warn]: #0 this parameter is highly recommended to save the position to resume tailing.
2023-12-12 17:40:48 +0100 [info]: #0 starting fluentd worker pid=862 ppid=857 worker=0
2023-12-12 17:40:48 +0100 [info]: #0 following tail of /tmp/fluentd-encoding/utf8.log
2023-12-12 17:40:48 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing N\xC3\xBCrnberg" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:40:48.714389300 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Berlin"}
2023-12-12 17:40:48.714509014 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Hannover"}
2023-12-12 17:40:48 +0100 [info]: #0 fluentd worker is now running worker=0
2023-12-12 17:40:53 +0100 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/parser_regexp.rb:50:in `match'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/parser_regexp.rb:50:in `parse'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:21:in `block in parse'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:20:in `each'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:20:in `parse'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:606:in `flush_buffer'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:1256:in `on_notify'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:850:in `on_notify'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:407:in `block in setup_watcher'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:783:in `on_timer'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
When the error appears Fluentd is not able to recover and it gets in a crash loop.
Additional context
If you just add a new line after the multi line log entry with special characters the error doesn't appear:
# /tmp/fluentd-encoding/utf8.log
2023-11-22 18:18:09.823+0100 Testing Berlin
2023-11-22 18:18:09.823+0100 Testing Nürnberg
2023-11-22 18:18:09.823+0100 Testing Hannover
2023-11-22 18:18:09.823+0100 Testing München
javax.management.RuntimeErrorException: null
at Main.main(Main.java:16) ~[bin/:na]
2023-11-22 18:18:09.823+0100 Testing Stuttgart
Output
2023-12-12 17:45:46 +0100 [info]: #0 starting fluentd worker pid=874 ppid=869 worker=0
2023-12-12 17:45:46 +0100 [info]: #0 following tail of /tmp/fluentd-encoding/utf8.log
2023-12-12 17:45:46 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing N\xC3\xBCrnberg" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:45:46 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing M\xC3\xBCnchen\njavax.management.RuntimeErrorException: null\n at Main.main(Main.java:16) ~[bin/:na]" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:45:46.604890113 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Berlin"}
2023-12-12 17:45:46.604995979 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Hannover"}
2023-12-12 17:45:46 +0100 [info]: #0 fluentd worker is now running worker=0
2023-12-12 17:45:51.605177650 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Stuttgart"}
There are some warning but Fluentd keeps working.
Conclussion
It seems it's a matter of how frequent Fluentd receives logs and the multiline_flush_interval
parameter frequency.
If you get a multi line log entry with special characters there are two options:
- Fluentd flushes before it gets a new log entry. Fluentd crashes.
- Fluentd gets a new log entry before it flushes. Fluentd keeps working.