Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DTrace probes for all message push and pop operations #2295

Merged
merged 3 commits into from
Dec 8, 2017

Conversation

slfritchie
Copy link
Contributor

Existing DTrace probes for message sending & receiving / push&pop are
placed in an ad hoc fashion. As a result, many actor messages are
not visible to DTrace. This patch places new DTrace probes within the
lowest-level code for adding & removing messages from queues.

The same queue functions are used for communication between threads.
New probes are also added to observe inter-thread communication.

I've introduced C preprocessor macros to wrap the message push & pop
operations. The purposes of the macros are:

  • Encode the queue category into the function call: actor or thread.
  • Avoid cluttering the runtime's code by hiding the casting of
    various args to uintptr_t.

Also added are two new scripts in the examples/dtrace directory:

  • mbox-size-all-actor-messages.d: count probe firings for actor message
    pushes & pops.
  • mbox-size-all-thread-messages.d: count probe firings for thread message
    pushes & pops.

Aside from adding new constants (usually via CPP macros), this patch
adds a field to struct scheduler_t to permit reporting of which
scheduler thread fires a DTrace probe.

Stuff I expect to get questions and/or AYFKM from reviewers ^_^ includes:

  • Adding new probes vs. rearranging existing ones. (I don't want to change the meaning of existing probes unless there's extremely wide consensus that it's a good idea.)
  • (Ab)use of CPP macros, e.g., ACTOR_MESSAGEQ_PUSH() and ACTOR_MESSAGEQ_POP() to hide some argument casts & argument additions. (Perhaps could use real functions and hope that inlining would do The Right Thing?)
  • I haven't done any performance profiling or measurement yet. Recommendations are welcome.
  • I've chosen some arbitrary numeric constant values, e.g., Q_TYPE_THREAD and the sched_msg_t enum.
  • Whitespace/indentation and other C style stuff.

I've been using slow-actor.pony for testing on OS X Sierra/10.12.6. Example output of the two new DTrace scripts can be found at https://gist.github.com/slfritchie/69ad5f5d00e5d02b05af9b3f117824fe and at https://gist.github.com/slfritchie/d4d30159864396d302f6f05d7fec0c89

Copy link
Member

@Praetonus Praetonus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit wary regarding the additional parameters for message queues functions. It could result in a performance loss even in non-DTrace builds. The performance impact could be nullified by ifdefing the additional parameters based on the DTrace status.

I've also left some style comments.

}

bool ponyint_actor_messageq_push(scheduler_t* sched,
pony_actor_t* from_actor, pony_actor_t* to_actor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style convention for multiline function parameters is two spaces at the start of the line.


while(m != last)
{
DTRACE4(ACTOR_MSG_PUSH, index, m->id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation should be two spaces here.

}

bool ponyint_thread_messageq_push(uintptr_t from_thr, uintptr_t to_thr,
messageq_t* q, pony_msg_t* first, pony_msg_t* last)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces here.

}

bool ponyint_actor_messageq_push_single(scheduler_t* sched,
pony_actor_t* from_actor, pony_actor_t* to_actor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces here.

while(m != last)
{
DTRACE4(ACTOR_MSG_PUSH, index, m->id,
(uintptr_t)from_actor, (uintptr_t)to_actor);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces here.

@@ -61,8 +62,11 @@ static void handle_queue(asio_backend_t* b)
{
asio_msg_t* msg;

while((msg = (asio_msg_t*)ponyint_messageq_pop(&b->q)) != NULL)
while((msg = (asio_msg_t*)ponyint_thread_messageq_pop(
SPECIAL_THREADID_KQUEUE, &b->q)) != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces here.

@@ -356,9 +360,11 @@ PONY_API void pony_asio_event_unsubscribe(asio_event_t* ev)

asio_msg_t* msg = (asio_msg_t*)pony_alloc_msg(
POOL_INDEX(sizeof(asio_msg_t)), 0);
msg->msg.id = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above for the ID assignment.

@@ -681,7 +681,8 @@ static void final(pony_ctx_t* ctx, pony_actor_t* self)

do
{
while((msg = ponyint_messageq_pop(&self->q)) != NULL)
while((msg = ponyint_actor_messageq_pop(
ctx->scheduler, self, &self->q)) != NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces here.

@@ -41,6 +46,7 @@ struct scheduler_t
{
// These are rarely changed.
pony_thread_id_t tid;
int index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be best as an uint32_t, the same type used in the scheduler initialisation loop. Alteratively, this could use the thread ID instead if it makes sense in DTrace. I'm not an expert here so I don't know if would really appropriate.

if(DTRACE_ENABLED(ACTOR_MSG_PUSH))
{
pony_msg_t* m = first;
int index = (sched == NULL) ? -8 : sched->index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better with a define instead of a magic constant.

@slfritchie
Copy link
Contributor Author

Thanks @Praetonus for the review.

WRT all the whitespace changes, thanks. It hasn't been clear to me how the whitespace rule applied to function arguments. I guessed with two spaces to the right of the function's name. (Except when my editor's default 4 spaces slipped in without me noticing.) I'll fix them up.

WRT performance measurements, I'll start by pestering @SeanTAllen for recommendations.

WRT magic constants that you found, yup, I'll create #define's for them.

@slfritchie slfritchie force-pushed the move-message-probes2 branch 2 times, most recently from ea96786 to 516189e Compare October 31, 2017 15:31
jemc pushed a commit that referenced this pull request Nov 17, 2017
…ime.

A microbenchmark for measuring message passing rates in the Pony runtime.

This microbenchmark executes a sequence of intervals.  During an interval,
1 second long by default, the SyncLeader actor sends an initial
set of ping messages to a static set of Pinger actors.  When a Pinger
actor receives a ping() message, the Pinger will randomly choose
another Pinger to forward the ping() message.  This technique limits
the total number of messages "in flight" in the runtime to avoid
causing unnecessary memory consumption & overhead by the Pony runtime.

This small program has several intended uses:

* Demonstrate use of three types of actors in a Pony program: a timer,
  a SyncLeader, and many Pinger actors.

* As a stress test for Pony runtime development, for example, finding
  deadlocks caused by experiments in the "Generalized runtime
  backpressure" work in pull request
  #2264

* As a stress test for measuring message send & receive overhead for
  experiments in the "Add DTrace probes for all message push and pop
  operations" work in pull request
  #2295
@SeanTAllen
Copy link
Member

@slfritchie this needs to be rebased after I merged the runtime backpressure changes. you might have a few hairy bits not sure. Could you join us for sync next week to discuss this PR in more depth?

@slfritchie
Copy link
Contributor Author

@SeanTAllen Yes, I probably can join the sync on Wednesday. I've gone through the exercise of merging this branch with your backpressure branch a couple of times before, but I haven't tried lately.

@slfritchie
Copy link
Contributor Author

I thought that I'd commented earlier that @JONBRWN had run the Wallaroo "market spread" app and seen no change in throughput or the latencies at each processing stage within the app. (I'm not sure if that comment is no longer shown by GitHub or if I'm imagining having ever written it.) Good news. Note, however, that his test was run before the final squash & rebase.

Also, I've run tests on my MacBook Pro (4 core + HyperThreading) on macOS Sierra 10.12.6. The results are at https://gist.github.com/slfritchie/f9c38c2b938174c88c3c63659a3128eb. The tests all use examples/message-ubench/message-ubench --pingers 320 --initial-pings 5 --ponynoblock --ponythreads=7. Summary:

  1. The commit at the base of this PR sees no performance difference without or with use=dtrace
    • Inter-actor message rate is about 25.7M per second.
  2. The commit at the base of this PR and this PR, both without dtrace, sees no performance difference.
    • Inter-actor message rate is about 25.7M per second.
  3. This PR does introduce a performance change of about -4.8% when compiled with use=dtrace
    • Inter-actor message rate drops to about 24.6M per second.

}

pony_msg_t* ponyint_actor_messageq_pop(scheduler_t* sched,
pony_actor_t* actor, messageq_t* q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

bool ponyint_messageq_push_single(messageq_t* q, pony_msg_t* first,
pony_msg_t* last)
static bool messageq_push_single(messageq_t* q,
pony_msg_t* first, pony_msg_t* last)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

@slfritchie
Copy link
Contributor Author

Here's a final sanity check. I used the same microbenchmark mentioned in #2295 (comment) to check if the latest rebase on master had a performance impact relative to the base commit on `master. All the machines below were using Ubuntu 16.04 x86_64 except the MacBook.

  • "Real hardware" AMD Phenom(tm) II X6 1045T Processor, 6 core (no HyperThreading), 2.7GHz. This PR lowers throughput by message-ubench by about 4%.
  • EC2's c4.8xlarge (36 vCPUs). No significant difference in throughput.
  • EC2's t2.xlarge (4 vCPUs). No significant difference in throughput.
  • EC2's t2.2xlarge (8 vCPUs). No significant difference in throughput.
  • "Real hardware" MacBook Pro, 2.2 GHz Intel Core i7 (4 core + 4 HyperThread cores). This PR increases throughput by about 1.5%.

I'm really puzzled by the Mac's performance; the 1.5% improvement was quite repeatable. I'm also puzzled by the AMD's results. I tried that experiment first and was disappointed, but it's also repeatable. Does anyone get fired for buying^Wmicrobenchmarking only IBM^WEC2?

Existing DTrace probes for message sending & receiving / push&pop are
placed in an ad hoc fashion.  As a result, many actor messages are
not visible to DTrace.  This patch places new DTrace probes within the
lowest-level code for adding & removing messages from queues.

The same queue functions are used for communication between threads.
New probes are also added to observe inter-thread communication.

I've introduced C preprocessor macros to wrap the message push & pop
operations.  The purposes of the macros are:

* Encode the queue category into the function call: actor or thread.
* Avoid cluttering the runtime's code by hiding the casting of
  various args to uintptr_t.

Also added are two new scripts in the `examples/dtrace` directory:

* mbox-size-all-actor-messages.d: count probe firings for actor message
  pushes & pops.
* mbox-size-all-thread-messages.d: count probe firings for thread message
  pushes & pops.

Aside from adding new constants (usually via CPP macros), this patch
adds a field to `struct scheduler_t` to permit reporting of which
scheduler thread fires a DTrace probe.

Also:

Try @dipinhora's suggestion to avoid unnecessary branches

Via IRC in the #ponylang channel, @dipinhora suggested that
I try to avoid unnecessary branches (and their likely performance
penalty) and get rid of the macros that I'd introduced with this
PR.

I've removed the macros and instead created thread- and actor-
version of the messageq_push, messageq_push_single, and
messageq_pop functions.  The push & push single versions have
a `static bool` common function that allows them to share code.

I've had to make some changes to a couple header files to avoid
chicken-and-the-egg problems when using data types not yet defined

Measurements by Jonathan Brown using the Wallaroo "Market Spread"
app shows no change in throughput rates or latencies at each
stage of the app.
Switch ponyint_actor_messageq_* and ponyint_thread_messageq_*
function argument order to match the argument order of
messageq.c's static functions messageq_push() and
messageq_push_single().  I also changed the pop function
argument order for symmetry: the "optional" tracing-related
arguments are at the end of the argument list.
@slfritchie slfritchie force-pushed the move-message-probes2 branch from 405eba2 to 1bd82f4 Compare December 5, 2017 22:15
@jemc
Copy link
Member

jemc commented Dec 6, 2017

We discussed on the sync call, and I'm good with the way this was done with the ifdefs. It's not pretty, but it's fairly elegant.

However, I want @Praetonus to sign off on this before merging, so he can confirm that his concerns are resolved.

@slfritchie
Copy link
Contributor Author

Thanks, everyone, for your helpful reviews.

@slfritchie slfritchie merged commit 615f28f into ponylang:master Dec 8, 2017
@slfritchie slfritchie deleted the move-message-probes2 branch December 8, 2017 16:20
@SeanTAllen
Copy link
Member

Kickass Scott. This is AWESOME!

@SeanTAllen
Copy link
Member

@slfritchie I think this deserves a CHANGELOG entry, can you add an entry to the CHANGELOG for this?

@slfritchie
Copy link
Contributor Author

@SeanTAllen Oops, sorry, I didn't realize that the CHANGELOG bot thingie was in use by this repo. I'll make a tiny PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants