Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Fix: Change fast track timestamp default from None to datetime.min #2577

Merged
merged 13 commits into from
Apr 30, 2022

Conversation

kevinclark19a
Copy link
Contributor

Description

In case of a live migration from a host supporting VmSettings to one that doesn't, the current code caches the timestamp of the last fast track goal state in memory so we can determine whether the next fabric goal state should be executed.

However, if the live migration happens before we ever receive a fast track goal state from HostGAPlugin (i.e. HostGAPlugin supports VmSettings but only ever sent Fabric goal states through), we never update the initial value of None for _fast_track_timestamp. This can cause a NoneType exception whenever we try to compare that cached timestamp to a fabric one.

This PR instead initializes the timestamp to datetime.min, indicating that any fabric goal state is newer than the (nonexisted) cached last fast track goal state.


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Apr 29, 2022

Codecov Report

Merging #2577 (3425321) into release-2.8.0.0 (dee5fef) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@                 Coverage Diff                 @@
##           release-2.8.0.0    #2577      +/-   ##
===================================================
+ Coverage            72.12%   72.17%   +0.05%     
===================================================
  Files                  102      102              
  Lines                15379    15379              
  Branches              2441     2441              
===================================================
+ Hits                 11092    11100       +8     
+ Misses                3795     3791       -4     
+ Partials               492      488       -4     
Impacted Files Coverage Δ
azurelinuxagent/common/protocol/hostplugin.py 87.66% <100.00%> (ø)
azurelinuxagent/common/protocol/goal_state.py 95.81% <0.00%> (+0.55%) ⬆️
azurelinuxagent/ga/collect_telemetry_events.py 91.43% <0.00%> (+1.71%) ⬆️
azurelinuxagent/common/utils/timeutil.py 69.23% <0.00%> (+7.69%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dee5fef...3425321. Read the comment docs.

@@ -2919,16 +2919,53 @@ def test_it_should_mark_outdated_goal_states_on_service_restart_when_host_ga_plu
def test_it_should_clear_the_timestamp_for_the_most_recent_fast_track_goal_state(self):
data_file = self._prepare_fast_track_goal_state()

if HostPluginProtocol.get_fast_track_timestamp() is None:
if HostPluginProtocol.get_fast_track_timestamp() is timeutil.create_timestamp(datetime.min):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== instead of is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, will push a fix


def test_it_should_default_fast_track_timestamp_to_datetime_min(self):
data = DATA_FILE_VM_SETTINGS.copy()
data["vm_settings"] = "hostgaplugin/vm_settings-fabric-no_extension_manifests.json"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the extension manifests relevant to the test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I'm using an extensions config with no certificates to get around a bug in the test code (the permissions issue on the certificate file we talked about). The default vm_settings.json fails in _check_certificates when doing the fetch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is ok for goal_state_no_certs.xml, but why vm_settings-fabric-no_extension_manifests.json? you could have left the extension manifest in the test data and just remove the certificate properties.

or rename the file not to draw attention to the manifests, since they are irrelevant. vm_settings-fabric.json? (though test data without extension manifests looks funny)


with mock_update_handler(protocol, 3, on_new_iteration=mock_live_migration) as update_handler:
with patch("azurelinuxagent.ga.update.logger.error") as patched_error:
def match_unexpected_errors():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy n' paste error? it's checking error messages, not timestamps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually want to leave this here to catch any failures, but I agree I need to add a check on the actual timestamp.

@@ -95,7 +95,7 @@ def __init__(self, endpoint):
if not os.path.exists(self._get_fast_track_state_file()):
self._supports_vm_settings = False
self._supports_vm_settings_next_check = datetime.datetime.now()
self._fast_track_timestamp = None
self._fast_track_timestamp = timeutil.create_timestamp(datetime.datetime.min)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this cause opposite behavior where we want to skip the goal state if current goal state is outdated and fast_track_timestamp has min value. I don't know we ever get to that case, just throwing to think of any.

        if self._extensions_goal_state.created_on_timestamp < vm_settings_support_stopped_error.timestamp:
            self._extensions_goal_state.is_outdated = True
            msg = "Fetched a Fabric goal state older than the most recent FastTrack goal state; will skip it.\nFabric:    {0}\nFastTrack: {1}".format(
                  self._extensions_goal_state.created_on_timestamp, vm_settings_support_stopped_error.timestamp)
            self.logger.info(msg)

@kevinclark19a kevinclark19a merged commit 35fed83 into Azure:release-2.8.0.0 Apr 30, 2022
kevinclark19a added a commit to kevinclark19a/WALinuxAgent that referenced this pull request May 10, 2022
kevinclark19a added a commit that referenced this pull request May 16, 2022
nagworld9 added a commit that referenced this pull request Mar 7, 2023
* release prepare-2.4.0.0 (#2280)

* Fix bug with dependent extensions with no settings (#2285)

* update test-requirements to pin pylint. (#2288) (#2299)

Co-authored-by: Kevin Clark <kevin.clark@microsoft.com>

* Do not create placeholder status file for AKS extensions (#2298)

* Exception for Linux Patch Extension for creating placeholder status file (#2307)

* update release version (#2308)

* Dont create default status file for Single-Config extensions (#2318)

* version update (#2319)

* Update Version (#2348)

* Fix bug with dependent extensions with no settings (#2285) (#2349)

Co-authored-by: Laveesh Rohra <larohra@microsoft.com>

* Use handler status if extension status is None when computing the Ext… (#2358)

* Use handler status if extension status is None when computing the ExtensionsSummary

* Add NotReady to convergence statuses

Co-authored-by: narrieta <narrieta>

* Release preparation 2.5.0.1 (#2360)

* Define ExtensionsSummary.__eq__ (#2371)

* define ExtensionsSummary.__eq__

* Fix test name

* Fix test name

* Fix test name

Co-authored-by: narrieta <narrieta>

* Release preparation 2.5.0.2 (#2372)

* update cgroups monitoring expiry date (#2392)

* Updated Agent Version to 2.6.0.0 (#2393)

* Do not parse status blob (#2394)

Co-authored-by: narrieta <narrieta>

* Exclude dcr from setup (#2396)

* Improve error message for vmSettings errors (#2397)

* Improve error message for vmSettings errors

* Add etag and correlation id

* pylint warning

Co-authored-by: narrieta <narrieta>

* Updated agent version to 2.6.0.1 (#2400)

* Verify that extensions goal state from vmsettings has been initialized (#2404)

Co-authored-by: narrieta <narrieta>

* Updated agent version to 2.6.0.2 (#2405)

* Set Agent version to 2.7.0.0 (#2457)

* Set Agent version to 2.7.0.0

* Remove duplicate import

Co-authored-by: narrieta <narrieta>

* Simplify the logic to update the extensions goal state (#2466)

* Simplify the logic to update the extensions goal state

* Added telemetry for NotSupported

* Added comments

* Do not support old hostgaplugin

Co-authored-by: narrieta <narrieta>

* Retry update_goal_state on GoalStateMismatchError (#2470)

* Retry update_goal_state on GoalStateMismatchError

* Add sleep before retry

* Enable test

* Enable test

* Update test

* Add unit test

* Add data file

* pylint warning

* Add comment; fix typos

* fix typo

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.1 (#2473)

* Redact settings from mismatch message (#2477)

Co-authored-by: narrieta <narrieta>

* Set Agent Version to 2.7.0.2 (#2478)

Co-authored-by: narrieta <narrieta>

* Improvements in telemetry for vmSettings (#2482)

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.7.0.3 (#2483)

* Do not raise on missing status blob; reduce amount of logging for vms… (#2492)

* Do not raise on missing status blob; reduce amount of logging for vmsettings

* remove extra file; fix typo

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.4 (#2494)

Co-authored-by: narrieta <narrieta>

* Save vmSettings on parse errors; improve messages in parse errors (#2503)

* Save vmSettings on parse errors; improve messages in parse errors

* pylint warnings

* pylint warnings

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.7.0.5 (#2504)

Co-authored-by: narrieta <narrieta>

* Revert "Set agent version to 2.7.0.5 (#2504)" (#2505)

This reverts commit ae5a222.

Co-authored-by: narrieta <narrieta>

* ignore firewall packets reset error, check enable firewall config flag and extend cgroup extension monitoring expiry time (#2502)

* Set agent version to 2.7.0.5 (#2506)

Co-authored-by: narrieta <narrieta>

* Handle OOM errors by stopping the periodic log collector (#2510)

* Set agent version to 2.7.0.6 (#2511)

* Add keep_alive property to collect_logs (#2533)

* Set agent version to 2.7.1.0. (#2534)

* Set agent version to 2.8.0.0 (#2545)

Co-authored-by: narrieta <narrieta>

* Update HGAP minimum required version for FastTrack (#2549)

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.1 (#2550)

Co-authored-by: narrieta <narrieta>

* Improvements in waagent.log (#2558)

* Improvements in waagent.log

* fix function call

* update comment

* typo

Co-authored-by: narrieta <narrieta>

* Change format of history items (#2560)

* Change format of history directory

* Update message; fix typo

* py2 compat

* py2 compat

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.2 (#2561)

Co-authored-by: narrieta <narrieta>

* Refresh goal state when certificates are missing (#2562)

* Refresh goal state when certificates are missing

* Improve error reporting

* Fix assert message

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.3 (#2563)

Co-authored-by: narrieta <narrieta>

* Do not mark goal state as processed when goal state fails to update (#2569)

Co-authored-by: narrieta <narrieta>

* Update agent version to 2.8.0.4 (#2570)

Co-authored-by: narrieta <narrieta>

* Bug fix for fetching a goal state with empty certificates property (#2575)

* Move error counter reset down to end of block. (#2576)

* Bug Fix: Change fast track timestamp default from None to datetime.min (#2577)

* Update agent version to 2.8.0.5. (#2580)

* Create placeholder GoalState.*.xml file (#2594)

Co-authored-by: narrieta <narrieta>

* Update version to 2.8.0.6 (#2595)

Co-authored-by: narrieta <narrieta>

* fix network interface restart in RHEL9 (#2592) (#2612)

(cherry picked from commit b8ca432)

* set agent version to 2.7.2.0 (#2613)

* Parse missing agent manifests as empty

* Set agent version to 2.8.0.7

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status (#2621)

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status

* python 2.6 compat

Co-authored-by: narrieta <narrieta>

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status (#2621) (#2622)

* Retry HGAP's extensionsArtifact requests on BAD_REQUEST status

* python 2.6 compat

Co-authored-by: narrieta <narrieta>
(cherry picked from commit dbc82d3)

* fix if command in rhel v8.6+ (#2624)

* Set agent version to 2.7.3.0 (#2625)

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.8.0.8 (#2627)

Co-authored-by: narrieta <narrieta>

* fix network interface restart in RHEL9 (#2592) (#2629)

(cherry picked from commit b8ca432)

Co-authored-by: Nageswara Nandigam <84482346+nagworld9@users.noreply.github.com>

* fix if command in rhel v8.6+ (#2624) (#2630)

(cherry picked from commit e540728)

Co-authored-by: Nageswara Nandigam <84482346+nagworld9@users.noreply.github.com>

* Set Agent version to 2.8.0.9 (#2631)

Co-authored-by: narrieta <narrieta>

* Cleanup history directory when creating new subdirectories (#2633)

* Cleanup history directory when creating new subdirectories

* Review feedback

Co-authored-by: narrieta <narrieta>

* Set agent version to 2.8.0.10

* Save sharedconfig to disk (#2649)

* Save sharedconfig to disk

* Update tests

* pylint warnings

Co-authored-by: narrieta <narrieta>

* Set Agent version to 2.8.0.11 (#2650)

Co-authored-by: narrieta <narrieta>

* test fixes

* Microsoft mandatory file (#2737)

Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>

---------

Co-authored-by: Laveesh Rohra <larohra@microsoft.com>
Co-authored-by: Kevin Clark <kevin.clark@microsoft.com>
Co-authored-by: Norberto Arrieta <narrieta@users.noreply.github.com>
Co-authored-by: Dhivya Ganesan <dhivyaganesan@users.noreply.github.com>
Co-authored-by: narrieta <narrieta>
Co-authored-by: Kevin Clark <keclar@microsoft.com>
Co-authored-by: Norberto Arrieta <narrieta@microsoft.com>
Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants