Skip to content

multiline_grok crashes when multiline_flush_interval is used. #107

Open
@gadiego92

Description

Describe the bug

I think there is a bug when using multiline_grok .

If Fluentd receives a multi line log entry with special characters and it flushes before it gets a new log entry Fluentd crashes.
This error only appears when multiline_flush_interval is used.
Fluentd is in a crash loop state after the error and it's not able to recover. A restart is needed.

To Reproduce
Just run the following command with the setup provided in the Configuration section:

/opt/td-agent/bin/fluentd --config /home/td-agent/fluentd.conf

Your Environment

  • Fluentd version: fluentd 1.16.2
  • TD Agent version: td-agent 4.5.1 fluentd 1.16.2 (d5685ada81ac89a35a79965f1e94bbe5952a5d3a
  • Operating system: AlmaLinux 9.2 (Turquoise Kodkod)
  • Kernel version: 5.14.0-284.30.1.el9_2.x86_64

Configuration

# /home/td-agent/fluentd.conf

<source>

  @type tail
  tag encoding

  path /tmp/fluentd-encoding/utf8.log
  read_from_head true
  multiline_flush_interval 5s

  <parse>

    @type multiline_grok

    custom_pattern_path /home/td-agent/patterns.conf
    grok_failure_key grokfailure
    multiline_start_regexp /\d{4}-\d{1,2}-\d{1,2}/

    <grok>
      pattern %{CUSTOM_LOG_FAILS}
    </grok>

    <grok>
      pattern %{CUSTOM_LOG_WORKS}
    </grok>

  </parse>

</source>

<match encoding>

    @type stdout

</match>
# /tmp/fluentd-encoding/utf8.log

2023-11-22 18:18:09.823+0100 Testing Berlin
2023-11-22 18:18:09.823+0100 Testing Nürnberg
2023-11-22 18:18:09.823+0100 Testing Hannover
2023-11-22 18:18:09.823+0100 Testing München
javax.management.RuntimeErrorException: null
      at Main.main(Main.java:16) ~[bin/:na]
# /home/td-agent/patterns.conf

CUSTOM_LOG_WORKS %{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}
CUSTOM_LOG_FAILS %{HTTPDATE:timestamp} %{NUMBER:response}

Error Log

[td-agent@5200b897ecea ~]$ /opt/td-agent/bin/fluentd --config /home/td-agent/fluentd.conf
2023-12-12 17:40:48 +0100 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-12-12 17:40:48 +0100 [info]: parsing config file is succeeded path="/home/td-agent/fluentd.conf"
2023-12-12 17:40:48 +0100 [info]: gem 'fluentd' version '1.16.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-opensearch' version '1.1.4'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-opensearch' version '1.1.3'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-prometheus' version '2.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.1'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-12-12 17:40:48 +0100 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-12-12 17:40:48 +0100 [info]: Expanded the pattern %{CUSTOM_LOG_FAILS} into (?:(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+)))) (?<response>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))
2023-12-12 17:40:48 +0100 [info]: Expanded the pattern %{CUSTOM_LOG_WORKS} into (?:(?<timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<message>.*))
2023-12-12 17:40:48 +0100 [warn]: 'pos_file PATH' parameter is not set to a 'tail' source.
2023-12-12 17:40:48 +0100 [warn]: this parameter is highly recommended to save the position to resume tailing.
2023-12-12 17:40:48 +0100 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    path "/tmp/fluentd-encoding/utf8.log"
    read_from_head true
    tag "encoding"
    multiline_flush_interval 5s
    <parse>
      @type "multiline_grok"
      multiline_start_regexp "/\\d{4}-\\d{1,2}-\\d{1,2}/"
      grok_failure_key "grokfailure"
      custom_pattern_path "/home/td-agent/patterns.conf"
      unmatched_lines
      <grok>
        pattern "%{CUSTOM_LOG_FAILS}"
      </grok>
      <grok>
        pattern "%{CUSTOM_LOG_WORKS}"
      </grok>
    </parse>
  </source>
  <match encoding>
    @type stdout
  </match>
</ROOT>
2023-12-12 17:40:48 +0100 [info]: starting fluentd-1.16.2 pid=857 ruby="3.1.4"
2023-12-12 17:40:48 +0100 [info]: spawn command to main:  cmdline=["/opt/td-agent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/bin/fluentd", "--config", "/home/td-agent/fluentd.conf", "--under-supervisor"]
2023-12-12 17:40:48 +0100 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-12-12 17:40:48 +0100 [info]: adding match pattern="encoding" type="stdout"
2023-12-12 17:40:48 +0100 [info]: adding source type="tail"
2023-12-12 17:40:48 +0100 [info]: #0 Expanded the pattern %{CUSTOM_LOG_FAILS} into (?:(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+)))) (?<response>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))
2023-12-12 17:40:48 +0100 [info]: #0 Expanded the pattern %{CUSTOM_LOG_WORKS} into (?:(?<timestamp>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<message>.*))
2023-12-12 17:40:48 +0100 [warn]: #0 'pos_file PATH' parameter is not set to a 'tail' source.
2023-12-12 17:40:48 +0100 [warn]: #0 this parameter is highly recommended to save the position to resume tailing.
2023-12-12 17:40:48 +0100 [info]: #0 starting fluentd worker pid=862 ppid=857 worker=0
2023-12-12 17:40:48 +0100 [info]: #0 following tail of /tmp/fluentd-encoding/utf8.log
2023-12-12 17:40:48 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing N\xC3\xBCrnberg" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:40:48.714389300 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Berlin"}
2023-12-12 17:40:48.714509014 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Hannover"}
2023-12-12 17:40:48 +0100 [info]: #0 fluentd worker is now running worker=0
2023-12-12 17:40:53 +0100 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/parser_regexp.rb:50:in `match'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/parser_regexp.rb:50:in `parse'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:21:in `block in parse'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:20:in `each'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluent-plugin-grok-parser-2.6.2/lib/fluent/plugin/parser_multiline_grok.rb:20:in `parse'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:606:in `flush_buffer'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:1256:in `on_notify'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:850:in `on_notify'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:407:in `block in setup_watcher'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin/in_tail.rb:783:in `on_timer'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2023-12-12 17:40:53 +0100 [error]: #0 /opt/td-agent/lib/ruby/gems/3.1.0/gems/fluentd-1.16.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

When the error appears Fluentd is not able to recover and it gets in a crash loop.

Additional context
If you just add a new line after the multi line log entry with special characters the error doesn't appear:

# /tmp/fluentd-encoding/utf8.log

2023-11-22 18:18:09.823+0100 Testing Berlin
2023-11-22 18:18:09.823+0100 Testing Nürnberg
2023-11-22 18:18:09.823+0100 Testing Hannover
2023-11-22 18:18:09.823+0100 Testing München
javax.management.RuntimeErrorException: null
      at Main.main(Main.java:16) ~[bin/:na]
2023-11-22 18:18:09.823+0100 Testing Stuttgart 

Output

2023-12-12 17:45:46 +0100 [info]: #0 starting fluentd worker pid=874 ppid=869 worker=0
2023-12-12 17:45:46 +0100 [info]: #0 following tail of /tmp/fluentd-encoding/utf8.log
2023-12-12 17:45:46 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing N\xC3\xBCrnberg" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:45:46 +0100 [warn]: #0 invalid line found file="/tmp/fluentd-encoding/utf8.log" line="2023-11-22 18:18:09.823+0100 Testing M\xC3\xBCnchen\njavax.management.RuntimeErrorException: null\n      at Main.main(Main.java:16) ~[bin/:na]" error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"
2023-12-12 17:45:46.604890113 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Berlin"}
2023-12-12 17:45:46.604995979 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Hannover"}
2023-12-12 17:45:46 +0100 [info]: #0 fluentd worker is now running worker=0
2023-12-12 17:45:51.605177650 +0100 encoding: {"timestamp":"2023-11-22 18:18:09.823+0100","message":"Testing Stuttgart"}

There are some warning but Fluentd keeps working.

Conclussion
It seems it's a matter of how frequent Fluentd receives logs and the multiline_flush_interval parameter frequency.

If you get a multi line log entry with special characters there are two options:

  1. Fluentd flushes before it gets a new log entry. Fluentd crashes.
  2. Fluentd gets a new log entry before it flushes. Fluentd keeps working.

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions