Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Fix UserData usage for XPTIScope #11091

Merged
merged 7 commits into from
Sep 27, 2023

Conversation

KseniyaTikhomirova
Copy link
Contributor

@KseniyaTikhomirova KseniyaTikhomirova commented Sep 6, 2023

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
@KseniyaTikhomirova KseniyaTikhomirova requested a review from a team as a code owner September 6, 2023 15:51
@KseniyaTikhomirova KseniyaTikhomirova marked this pull request as draft September 7, 2023 08:57
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
@KseniyaTikhomirova KseniyaTikhomirova marked this pull request as ready for review September 13, 2023 12:17
@KseniyaTikhomirova
Copy link
Contributor Author

hip failure is reported here #10460

@@ -67,7 +67,7 @@ event queue_impl::memset(const std::shared_ptr<detail::queue_impl> &Self,
// information
XPTIScope PrepareNotify((void *)this,
(uint16_t)xpti::trace_point_type_t::node_create,
SYCL_MEM_ALLOC_STREAM_NAME, "queue.memset()");
SYCL_STREAM_NAME, "memory_transfer_node");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure "memory_transfer_node" is clear as we lose information that it was a memset() operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memset and memcpy has different metadata for source: pointer or value. If it is needed we could add new node_create user_data type for memset and declare it at the design doc. Anyway - current "queue.memset" usage is invalid and should be fixed.

@@ -143,7 +143,7 @@ event queue_impl::memcpy(const std::shared_ptr<detail::queue_impl> &Self,
// pointer.
XPTIScope PrepareNotify((void *)this,
(uint16_t)xpti::trace_point_type_t::node_create,
SYCL_MEM_ALLOC_STREAM_NAME, "queue.memcpy()");
SYCL_STREAM_NAME, "memory_transfer_node");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment - we seem to have lost information. Are memset/memcpy being recorded elsewhere, perhaps in metadata? I am assuming this was a cosmetic change and not related to HIP failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not related to the HIP. Customer has reported the issue that after code_location enabling node_create signal for queue methods started to report different data and I found a few gaps here, this implementation is not aligned with design doc for node_create trace.

if (MTraceType == (uint16_t)xpti::trace_point_type_t::graph_create ||
MTraceType == (uint16_t)xpti::trace_point_type_t::node_create ||
MTraceType == (uint16_t)xpti::trace_point_type_t::edge_create)
MTP = new TracePoint(TData.fileName(), TData.functionName(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the contract regarding code_location has been changed, this will work for this use-case. We may need to look at how code_location is being used and may have to address it for all cases.

Context:

  • Previous code_location had all attributes set to NULL when NDEBUG was used and the previous code here handled it.
  • New definition of code_location only zeroes out file_location when NDEBUG is used, which changed the contract code_location had with all previous use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"may have to address it for all cases."
do you mean to handle the case with empty code_location data? Please correct me if I am wrong, I didn't understand why UserData usage is better there because it will be the same for all queue.memset calls and will not bring any benefits. If code_location is not available the only hope to distinguish calls is CodePtr which is still used, doesn't it?

Copy link
Contributor

@tovinkere tovinkere Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the metadata includes the node_create was for memset or memcpy, we are good here.

Regarding the change in code_location that now has the function name at all times, we may need to look at all uses of FunctionName and address it in a consistent manner. The above example is a case were it was trying to handle the case when all attributes of code_location were set to nullptr, and there may be more instances where this cleanup can be applied.

Copy link
Contributor Author

@KseniyaTikhomirova KseniyaTikhomirova Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tovinkere got it, thanks for the clarification. I walked through the SYCL RT code today and it looks like we do not have any code_location absence WAs left.

Copy link
Contributor

@tovinkere tovinkere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@againull
Copy link
Contributor

@KseniyaTikhomirova HIP failure doesn't look related to your changes. But I am not sure about failure in linux gen12 testing:


Failed Tests (1):
SYCL :: XPTI/basic_event_collection_linux.cpp
Can it be related to your changes?

@KseniyaTikhomirova
Copy link
Contributor Author

@KseniyaTikhomirova HIP failure doesn't look related to your changes. But I am not sure about failure in linux gen12 testing:

Failed Tests (1): SYCL :: XPTI/basic_event_collection_linux.cpp Can it be related to your changes?

seems to be related, will fix today

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
@KseniyaTikhomirova
Copy link
Contributor Author

@KseniyaTikhomirova HIP failure doesn't look related to your changes. But I am not sure about failure in linux gen12 testing:
Failed Tests (1): SYCL :: XPTI/basic_event_collection_linux.cpp Can it be related to your changes?

seems to be related, will fix today

done. previously node_create & edge_create seems to have one event and metadata was mixed.

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>
@KseniyaTikhomirova
Copy link
Contributor Author

KseniyaTikhomirova commented Sep 26, 2023

failures on win are the same as reported here #11224

@aelovikov-intel
Copy link
Contributor

aelovikov-intel commented Sep 26, 2023

failures on win are the same as reported here #11224

From Windows log:

warning: reached timeout, skipping remaining tests

Also, this PR's CI run seems to have had the exact problem we've been struggling with our Win CI.

@KseniyaTikhomirova KseniyaTikhomirova temporarily deployed to WindowsCILock September 27, 2023 12:22 — with GitHub Actions Inactive
@KseniyaTikhomirova KseniyaTikhomirova temporarily deployed to WindowsCILock September 27, 2023 12:45 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov merged commit f92e432 into intel:sycl Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants