ansible.builtin.service_facts state is not accurate when the source is systemd #84606
Description
Summary
As indicated in the title, the ansible.builtin.service_facts module does not accurately report the state of services when the source is systemd.
On hosts using systemd, the state exited or dead indicates that the service is inactive.
However, this does not confirm that the service is properly in the stopped state.
According to the code in the SystemctlScanService class:
class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        return is_systemd_managed(self.module)

    def _list_from_units(self, systemctl_path, services):

        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:

                state_val = "stopped"
                status_val = "unknown"
                fields = line.split()

                # systemd sometimes gives misleading status
                # check all fields for bad states
                for bad in self.BAD_STATES:
                    # except description
                    if bad in fields[:-1]:
                        status_val = bad
                        break
                else:
                    # active/inactive
                    status_val = fields[2]

                service_name = fields[0]
                if fields[3] == "running":
                    state_val = "running"

                services[service_name] = {"name": service_name, "state": state_val, "status": status_val, "source": "systemd"}
the service state on hosts is derived from the SUB column in the output of the following command.
$ systemctl list-units --no-pager --type service --all --plain
Unfortunately, the SUB value, which is read as fields[3], is only compared against running. Any other value is discarded, and state_val keeps the stopped default it was given at the start of the loop.
As a result, the module can only ever return one of two states, stopped or running, rather than the true state of the service.
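To illustrate the collapse described above, here is a minimal standalone sketch that reproduces only the state-mapping logic, without Ansible. The function name current_state_mapping and the sample lines are hypothetical, made up for illustration; they only follow the UNIT LOAD ACTIVE SUB DESCRIPTION column layout and were not captured from a real host.

```python
# Hypothetical sample lines in the "UNIT LOAD ACTIVE SUB DESCRIPTION" layout;
# made up for illustration, not captured from a real host.
SAMPLE_OUTPUT = """\
rsyslog.service loaded inactive dead System Logging Service
sshd.service loaded active running OpenSSH server daemon
kmod-static-nodes.service loaded active exited Create list of static device nodes
"""


def current_state_mapping(stdout):
    """Mimic how the module currently derives "state" from the SUB column."""
    states = {}
    for line in stdout.split("\n"):
        if ".service" not in line:
            continue
        fields = line.split()
        # Only "running" is recognised; every other SUB value becomes "stopped".
        state = "running" if fields[3] == "running" else "stopped"
        states[fields[0]] = {"sub": fields[3], "state": state}
    return states


result = current_state_mapping(SAMPLE_OUTPUT)
for name, info in result.items():
    print("%s: SUB=%s -> state=%s" % (name, info["sub"], info["state"]))
```

In this sketch, both dead and exited end up reported as stopped, so the two cannot be told apart from the resulting facts.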
Issue Type
Bug Report
Component Name
lib/ansible/modules/service_facts.py
Ansible Version
$ ansible --version
ansible [core 2.15.12]
config file = /root/.ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /root/.local/lib/python3.9/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /root/.local/bin/ansible
python version = 3.9.19 (main, Jul 18 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (/usr/bin/python3)
jinja version = 3.1.4
libyaml = True
Configuration
# if using a version older than ansible-core 2.12 you should omit the '-t all'
$ ansible-config dump --only-changed -t all
CONFIG_FILE() = /root/.ansible/ansible.cfg
DEFAULT_BECOME(/root/.ansible/ansible.cfg) = False
DEFAULT_HOST_LIST(/root/.ansible/ansible.cfg) = ['/etc/ansible/hosts']
DEFAULT_REMOTE_USER(/root/.ansible/ansible.cfg) = root
OS / Environment
- CentOS Linux 7
- CentOS Linux 8
- CentOS Stream 9
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
- Red Hat Enterprise Linux 9
Steps to Reproduce
This playbook captures the state of services both before and after killing the rsyslog service.
After the kill, the service enters a dead state, but the service_facts module reports it as stopped.
# systemd_test.yml
- name: Test systemd service_facts
  hosts: all
  gather_facts: no
  # Global Vars
  vars:
    servname: "rsyslog"
  tasks:
    - name: state of services now
      command: systemctl list-units --no-pager --type service --all --plain
      register: service_list_pre
    - name: set fact
      set_fact:
        rsyslog_state_pre: "{{ service_list_pre.stdout_lines | select('search', servname) | list | first | regex_replace('\\s+', ' ') | split(' ') }}"
    - name: print state from command
      debug:
        msg: "rsyslog is {{ rsyslog_state_pre[4]}}"
    - name: kill rsyslog
      command: killall rsyslogd
    - name: new state of services
      command: systemctl list-units --no-pager --type service --all --plain
      register: service_list_post
    - name: set fact
      set_fact:
        rsyslog_state_now: "{{ service_list_post.stdout_lines | select('search', servname) | list | first | regex_replace('\\s+', ' ') | split(' ') }}"
    - name: print new state from command
      debug:
        msg: "rsyslog is {{ rsyslog_state_now[4]}}"
    - name: populate service facts
      service_facts:
    - name: print state from service_facts
      debug:
        msg: "rsyslog is {{ ansible_facts['services'].values() | selectattr('name', 'equalto', servname + '.service') | map(attribute='state') }}"
Unfortunately, I cannot provide output with the -vvv option, as it exceeds 65536 characters.
Expected Results
This module should report the true state of the services without overwriting it.
Accurately determining the state of a service is essential when using Ansible, particularly in enterprise environments.
Here is a suggested modification to the SystemctlScanService class that passes the value of fields[3] directly to the state_val variable.
class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        return is_systemd_managed(self.module)

    def _list_from_units(self, systemctl_path, services):

        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:

                state_val = "unknown"  # Default to unknown
                status_val = "unknown"
                fields = line.split()

                # systemd sometimes gives misleading status
                # check all fields for bad states
                for bad in self.BAD_STATES:
                    # except description
                    if bad in fields[:-1]:
                        status_val = bad
                        break
                else:
                    # active/inactive
                    status_val = fields[2]

                service_name = fields[0]
                state_val = fields[3]  # Set state_val to the real value of fields[3]

                services[service_name] = {"name": service_name, "state": state_val, "status": status_val, "source": "systemd"}
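As a rough check of what the suggested change would report, the following standalone sketch applies only the pass-through mapping, outside of Ansible. The function name proposed_state_mapping and the sample lines are hypothetical and made up for illustration; they are not part of the module and were not captured from a real host.

```python
# Hypothetical sample lines in the "UNIT LOAD ACTIVE SUB DESCRIPTION" layout;
# made up for illustration, not captured from a real host.
SAMPLE_OUTPUT = """\
rsyslog.service loaded inactive dead System Logging Service
sshd.service loaded active running OpenSSH server daemon
kmod-static-nodes.service loaded active exited Create list of static device nodes
"""


def proposed_state_mapping(stdout):
    """Return {unit name: SUB value}, passing the SUB column through unchanged."""
    states = {}
    for line in stdout.split("\n"):
        if ".service" not in line:
            continue
        fields = line.split()
        # fields[3] is the SUB column (e.g. running, exited, dead).
        states[fields[0]] = fields[3]
    return states


proposed = proposed_state_mapping(SAMPLE_OUTPUT)
print(proposed)
```

With this mapping, dead and exited stay distinguishable. One thing to keep in mind is that existing playbooks comparing the state against stopped would presumably need adjusting, since that value would no longer be produced for systemd units.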
Actual Results
PLAY [Test systemd service_facts] **********************************************************************************************************************************************************************************
TASK [state of rsyslog now] ****************************************************************************************************************************************************************************************
changed: [localhost]
TASK [set fact] ****************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [print state from command] ************************************************************************************************************************************************************************************
ok: [localhost] => {
"msg": "rsyslog is System"
}
TASK [kill rsyslog] ************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [new state of rsyslog] ****************************************************************************************************************************************************************************************
changed: [localhost]
TASK [set fact] ****************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [print new state from command] ********************************************************************************************************************************************************************************
ok: [localhost] => {
"msg": "rsyslog is System"
}
TASK [populate service facts] **************************************************************************************************************************************************************************************
ok: [localhost]
TASK [print state from service_facts] ******************************************************************************************************************************************************************************
ok: [localhost] => {
"msg": "rsyslog is ['stopped']"
}
PLAY RECAP *********************************************************************************************************************************************************************************************************
localhost : ok=9 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Code of Conduct
- I agree to follow the Ansible Code of Conduct