Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create or use user's Session in WorkflowsService to reuse connections #1183

Merged
merged 3 commits into from
Sep 10, 2024

Conversation

AchilleAsh
Copy link
Contributor

Pull Request Checklist

Description of PR

The WorkflowService uses direct calls to requests library, which under the hood creates a Session, makes the request, then closes the session.

This means that the TCP connection is opened and closed for each call instead of being reused - which causes unnecessary overhead if the service is making multiple calls.

This PR adds the possibility to use a Session object in the Service, by default not enabled to keep the current behavior as-is.

From a performance point of view, if Argo is running locally the advantage will be virtually non-existent ; however, we can simulate the outcome with a remote Argo server using toxiproxy:

toxiproxy-cli create --listen localhost:2747 --upstream localhost:2746 argo 
toxiproxy-cli toxic add -t latency -a latency=20 -a jitter=5 argo  

Using ipython and the magic command %timeit:

from hera.workflows import WorkflowsService

# default current behavior: no session
ws = WorkflowsService("https://localhost:2747", verify_ssl=False)

%timeit -n10 -r10 ws.list_workflows("argo")
133 ms ± 2.17 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

ws = WorkflowsService("https://localhost:2747", verify_ssl=False, use_session=True)
%timeit -n10 -r10 ws.list_workflows("argo")
77.4 ms ± 2.01 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

In toxiproxy logs (and the kubectl port-forward logs) , we can see that the tcp open is done now only once.

Thanks!

Copy link

codecov bot commented Aug 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.9%. Comparing base (d553546) to head (adad63e).
Report is 196 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1183     +/-   ##
=======================================
+ Coverage   81.7%   81.9%   +0.2%     
=======================================
  Files         54      61      +7     
  Lines       4208    4750    +542     
  Branches     889    1015    +126     
=======================================
+ Hits        3439    3892    +453     
- Misses       574     625     +51     
- Partials     195     233     +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

self.namespace = namespace or global_config.namespace

def request(self, method, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd mark this private; I don't think we want this added to the public API. (Same for WorkflowsService.)

Suggested change
def request(self, method, **kwargs):
def _request(self, method, **kwargs):

@@ -47,6 +47,7 @@ def __init__(
token: Optional[str] = None,
client_certs: Optional[Tuple[str, str]] = None,
namespace: Optional[str] = None,
use_session: Optional[bool] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be more flexible to pass in a session directly? (Same for WorkflowsService.)

@elliotgunton
Copy link
Collaborator

elliotgunton commented Sep 2, 2024

by default not enabled to keep the current behavior as-is

Just out of curiosity - is there any reason not to enable it by default, i.e. is it not a strict enhancement to go from request.get to self.session.get? What could break? Looks like a straight upgrade from docs and stack overflow as request.get creates a one-time-use Session anyway.

Thinking about client usage of WorkflowsService with a Session, I think it might be better to implement context management for WorkflowsService for basic Session usage:

with WorkflowsService(...) as ws:
    ws.create(...)
    ws.list(...)

This would implicitly use a single session. If we want to also allow users to pass in the session for flexibility as per @alicederyn's comment, then they can follow a similar pattern.

with request.Session(...) as session:
    ws = WorkflowsService(session=session, ...)
    ws.create(...)
    ws.list(...)

It seems reusing a "closed" session just reopens it, and you can close multiple times, so using a "dead" session doesn't seem to be a problem:

>>> import requests
>>> session = requests.Session()
>>> session.get("https://example.com/")
<Response [200]>
>>> session.close()
>>> session.get("https://example.com/")
<Response [200]>
>>> session.close()
>>> session.close()

@alicederyn
Copy link
Collaborator

alicederyn commented Sep 2, 2024

I was unable to find a clear statement that sessions were thread-safe in requests. I did find a closed issue asking for it to be documented one way or another but I couldn't tell what the close status was intended to be. I'm also not sure if the service classes are intended to be thread-safe themselves?

@AchilleAsh
Copy link
Contributor Author

Just out of curiosity - is there any reason not to enable it by default, i.e. is it not a strict enhancement to go from request.get to self.session.get?

Oh, I fully agree :) and I think its fits anyway the "spirit" of having a WorkflowsService: it makes sense for it to hold a session to the Argo Server for the duration of its existence. I just initially thought of introducing this change as opt-in, but if we all agree then it can be the default.

Note that the Workflow class creates a default WorkflowsService if not provided, so this would also implicitly initialize and keep one session alive for the duration of the Workflow object (i.e. keep connections opened if calls were made) - and this would also not call it as context manager. So maybe the default service for workflows shouldn't use a session?

Thinking about client usage of WorkflowsService with a Session, I think it might be better to implement context management for WorkflowsService for basic Session usage:

I can add a .close() method to the Service which explicitly calls the .close() of the Session, and then yes it's just convenient to expose the context manager interface too.

This would implicitly use a single session. If we want to also allow users to pass in the session for flexibility as per @alicederyn's comment, then they can follow a similar pattern.

Perhaps it would be better to let the WorkflowService open and close the user provided Session, rather than relying on the user to open/close it. Also because we need to do it anyway if the user doesn't pass a custom Session and we use the regular one.

session = MyCustomSession()
with WorkflowsService(session=session) as ws:
    # will open/close the session
    ws.list(...)
    with Workflow(..., workflows_service=ws) as wf1:
        ...
    with Workflow(..., workflows_service=ws) as wf2:
        ...
    with Workflow(...) as wf3:
        # wf3.workflows_service != ws
        ...

It is probably better to still have to explicitly pass the service to any Workflow created under the context manager block ( Explicit is better than implicit.)

I was unable to find a clear statement that sessions were thread-safe in requests.

It's a long-standing question indeed, I've always sided with the "better safe than sorry" approach and used one session per thread. But since the urllib3 pool is thread safe in the newer versions (2022+), it is probably fine

I'm also not sure if the service classes are intended to be thread-safe themselves?

I feel like it should be thread safe: If you have tens of Workflows to submit at once, it makes sense to use one service and reuse connections. Else you just end up with the current behavior of opening one session thus one connection per call.

Copy link
Collaborator

@elliotgunton elliotgunton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the checks failing with Code is not up-to-date. Please run 'make codegen' - I'll run this with the proposed fix in __exit__, then LGTM! 🚀


def __exit__(self, *_):
\"\"\"Close the service.\"\"\"
self.session.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we add any more teardown logic, I think we should use close

Suggested change
self.session.close()
self.close()


def __exit__(self, *_):
"""Close the service."""
self.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you had it before but didn't regenerate the code?

AchilleAsh and others added 3 commits September 10, 2024 10:19
Signed-off-by: Timothé Perez <achille.ash@gmail.com>
Signed-off-by: Timothé Perez <achille.ash@gmail.com>
Signed-off-by: Elliot Gunton <elliotgunton@gmail.com>
@elliotgunton elliotgunton added type:enhancement A general enhancement semver:minor A change requiring a minor version bump labels Sep 10, 2024
Copy link
Collaborator

@elliotgunton elliotgunton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 🚀

@elliotgunton elliotgunton enabled auto-merge (squash) September 10, 2024 09:22
@elliotgunton elliotgunton merged commit e09ce27 into argoproj-labs:main Sep 10, 2024
22 of 24 checks passed
@elliotgunton elliotgunton changed the title Add optional Session to Service to be able to reuse connections Create or use user's Session in WorkflowsService to reuse connections Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver:minor A change requiring a minor version bump type:enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants