Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Linux] xenia-cpu-ppc-tests (Rebase of #803) #1339

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

bwrsandman
Copy link
Contributor

@bwrsandman bwrsandman commented Mar 9, 2019

Current Status on master

Currently on master, the ppc emulation tests are not tested on travis. If run manually, they fail due to gcc/clang using different registers than the MSVC compiler. Guest to host and host to guest calls fail.

Fix status of this PR

This PR takes the original commits of #803 which have merge conflicts and rebases them on current master. There was a large refactoring of x64_backend.cc and some work had to be done to resolve them.

On top of that, I also enabled the tests on travis.

Currently, the tests run well on Debug and with the combination of this branch, #1317 and #1322 running roms actually execute code. There is more work to be done before an actual rom is playable but I was able to get guest to host logging calls from RDR. This may be fixed by getting the cpu (not cpu-ppc) tests green in another PR.

Original Description by @uytvbn in #803

Hi,

This pull request fixes #670 (and also partially #356). I've tried to change as little as possible in Windows code.

$ ./build/bin/Linux/Debug/xenia-cpu-ppc-tests --log_file stdout
...
i> 0000293A Total tests: 1457
i> 0000293A Passed: 1457
i> 0000293A Failed: 0

Comments:

  • src/xenia/base/exception_handler_posix.cc - straightforward implementation based almost entirely on Windows version,

  • src/xenia/base/math.h - off-by-one error, ffs contrary to _BitScanForward returns one-based index,

  • src/xenia/base/memory_posix.cc - porting this exactly is probably not possible as there are significant differences in virtual memory handling between Windows and Linux, tried to follow the behavior of Windows implementation as closely as possible,

  • src/xenia/base/string_buffer.cc - fixed non-standard behavior that happens to work with MSVC,

  • src/xenia/cpu/backend/x64/x64_backend.cc - conform to Linux calling convention,

  • src/xenia/cpu/backend/x64/x64_sequences.cc - the only change here that affects (code-wise) MSVC, it forces Linux to pass XMM values by pointer which is expected by codegen (default calling convention on Linux uses XMM registers), I've checked and it doesn't affect code generated by Visual Studio 2017, could probably hide it behind a typedef if you want to feel extra safe - though you might end up with different types for input and output,

  • src/xenia/cpu/hir/value.cc - fixes Value::MulHi for 64 bit values, returned low not high qword,

  • xenia-build - changes binutils path to use the one you get when building in third-party/binutils.

Thanks.

premake5.lua Outdated Show resolved Hide resolved
@bwrsandman bwrsandman force-pushed the linux_cpu branch 11 times, most recently from b477dda to 12b783f Compare March 10, 2019 20:39
@bwrsandman
Copy link
Contributor Author

Build is green on travis and appveyor

@bwrsandman bwrsandman marked this pull request as ready for review March 12, 2019 12:37
@bwrsandman
Copy link
Contributor Author

Branch is ready for review as it is green. Here were my issues that were resolved.

Issues

While Debug passes the tests and works for the most part, running in release has some serious issues which I have trouble fixing. This can be seen in travis tests.
The issue of Release is related to optimized builds. Even debug builds with an added -O1 flag have this issue.

This issue is also present in the original PR (#803).

My best guess at this is that optimized builds might either compile away some thunk call code or that the compiler uses different registers than non-optimized builds.

After doing some searching, I've tracked the obj file which is sensitive to -O1, it is constant_propagation_pass.o. If I compile xenia with -O1 and delete that obj file and recompile with -O0 then the tests run successfully. If I do this with any other file, the issue persists.
Here's the objdump of constant_propagation_pass.o with both -O0 and O1: https://gist.github.com/bwrsandman/d218a2d8590e5b74310cfd6dac8f234e

After more drilling down, it seems to come from some header included in constant_propagation_pass.cc. Clang has a pragma which disables optimization and by searching up the included headers, I found the highest single header for which disabling optimization fixes the problem. This header is src/xenia/cpu/hir/value.h. Something within this header is causing issues.

diff --git a/src/xenia/cpu/hir/value.h b/src/xenia/cpu/hir/value.h
index dcc95ca8..70a159fe 100644
--- a/src/xenia/cpu/hir/value.h
+++ b/src/xenia/cpu/hir/value.h
@@ -109,6 +109,7 @@ class Value {
   Use* AddUse(Arena* arena, Instr* instr);
   void RemoveUse(Use* use);
 
+#pragma clang optimize off
   void set_zero(TypeName new_type) {
     type = new_type;
     flags |= VALUE_IS_CONSTANT;
@@ -175,6 +176,7 @@ class Value {
     flags = other->flags;
     constant.v128 = other->constant.v128;
   }
+#pragma clang optimize on
 
   inline bool IsConstant() const { return !!(flags & VALUE_IS_CONSTANT); }
   bool IsConstantTrue() const {

The issue in within the set_zero() function of in src/xenia/cpu/hir/value.h. The following fixes the issue:

diff --git a/src/xenia/cpu/hir/value.h b/src/xenia/cpu/hir/value.h
index dcc95ca8..ae59a68e 100644
--- a/src/xenia/cpu/hir/value.h
+++ b/src/xenia/cpu/hir/value.h
@@ -109,7 +109,7 @@ class Value {
   Use* AddUse(Arena* arena, Instr* instr);
   void RemoveUse(Use* use);
 
-  void set_zero(TypeName new_type) {
+  void set_zero(TypeName new_type) __attribute__ ((optnone)){
     type = new_type;
     flags |= VALUE_IS_CONSTANT;
     constant.v128.low = constant.v128.high = 0;

The difference in opcode is here https://gist.github.com/bwrsandman/390712c2f29a7645f03e67d9daf8ce81

There are also similar issues happening on travis which I cannot replicate on my Arch setup with clang7. I am able to replicate these crashes on docker ubuntu 16.04 with clang6. Two solutions are available:

  • Do the same work as above to find the offending functions
  • Update travis to use clang7

The issues are similiar:

  • Processor::backend() inlining was causing issues
  • Value::Not() optimization was causing issues.

The issues are not clear and I am affraid that these are just hacks and not fixing the real underlying problems.

I am including this branch for review and for some helpful tips from the community since I haven't had the time to investigate this fully and didn't want to leave my partial work floating in the ether if someone wanted to pick up this particular task.

@DrChat
Copy link
Member

DrChat commented May 17, 2019

So, in that specific function the difference between optimizing and not is the usage of AVX instructions.
What were the issues, exactly? Crashes? Clobbering?

mov(qword[rsp + offsetof(StackLayout::Thunk, r[2])], r12);
mov(qword[rsp + offsetof(StackLayout::Thunk, r[3])], r13);
mov(qword[rsp + offsetof(StackLayout::Thunk, r[4])], r14);
mov(qword[rsp + offsetof(StackLayout::Thunk, r[5])], r15);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N.B: SysV does not have nonvolatile XMM registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the suggested way of resolving this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I think this was a note for myself. I might suggest you just drop a comment here and state what I said, just so nobody goes "wut, why aren't xmm registers being saved"

@bwrsandman
Copy link
Contributor Author

So, in that specific function the difference between optimizing and not is the usage of AVX instructions.
What were the issues, exactly? Crashes? Clobbering?

Clang 6 would vectorize the arguments inside a different register in Release than in Debug. This led to inconsistent behavior depending on the optimization level.

@bwrsandman
Copy link
Contributor Author

Rebased onto master and added the comment as per @DrChat request

@bwrsandman
Copy link
Contributor Author

bwrsandman commented Aug 13, 2019

@Margen67 cpu would be a better tag than test since the tests are testing the ppc cpu emulation :P

uytvbn and others added 13 commits May 17, 2021 10:54
va_list are not guarenteed to maintain their values after being used.
With clang on Linux, args is undefined after fetching length and will print
"(null)".
Copy args into another va_list before getting length to prevent this.
Add tests.
Disable optimization on set zero to prevent clang from vectorizing the
assigment to zero which would use different registers than expected.
With -O0: RAX. With -O1: RDI.
Disable optimization on set zero to prevent clang from vectorizing the
assigment to zero which would use different registers than expected.
Disable inlining on backend() which causes ppc issues on clang 7 in release
builds.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 21, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 22, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 22, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 24, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 25, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request May 28, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
JoelLinn added a commit to JoelLinn/xenia that referenced this pull request Jun 3, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
gibbed pushed a commit that referenced this pull request Jun 3, 2021
Shim exports are called from GuestToHostThunk which dictates the calling
convention. The default system calling convention is different depending
on OS (Windows vs. everything else) and architecture. PR #1339 addresses
this for x64 Linux.
There is no reason for explicit `__cdecl`. Also, it is not available in
GCC. We could use `__attribute__((ms_abi))` or
`__attribute__((sysv_abi))` but that just adds complexity.
@@ -671,7 +671,7 @@ EMITTER_OPCODE_TABLE(OPCODE_VECTOR_SUB, VECTOR_SUB);
// OPCODE_VECTOR_SHL
// ============================================================================
template <typename T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
static __m128i EmulateVectorShl(void*, __m128i src1, __m128i src2) {
static __m128i EmulateVectorShl(void*, __m128i& src1, __m128i& src2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const also, by the way.

@Triang3l
Copy link
Member

Triang3l commented Jul 4, 2022

The exception handler part has been merged into the main branch — thanks @uytvbn! 💚 I've also added RIP updating after an exception handler as that's needed for MMIO (it executes the mov manually and advances the instruction pointer). One additional change that I've made though is questionable — the original commit had bit 1 << 1 of the error code for a SIGSEGV interpreted as "read access violation" when it's set. However, asm/trap_pf.h (not doing an #include of it though as it's not available on Android apparently) says that it's the opposite — X86_PF_WRITE is set to 1 for a write page fault. Which behavior is correct in this situation?

RodoMa92 added a commit to RodoMa92/xenia-canary that referenced this pull request Jan 17, 2025
…#2228 back to canary builds. This fixes various emulation crashes caused from different calling conventions on System-V ABI platforms compared to the Windows standard.
RodoMa92 added a commit to RodoMa92/xenia-canary that referenced this pull request Jan 18, 2025
…#2228 back to canary builds.

This fixes various emulation crashes caused from different calling conventions on System-V ABI platforms compared to the Windows standard.
RodoMa92 added a commit to RodoMa92/xenia-canary that referenced this pull request Jan 19, 2025
… for Linux

Upstream changes made from xenia-project#1339 and xenia-project#2228 back to canary builds. This fixes various emulation crashes caused from different calling conventions on System-V ABI platforms compared to Windows standard.
Gliniak pushed a commit to xenia-canary/xenia-canary that referenced this pull request Jan 19, 2025
… for Linux

Upstream changes made from xenia-project#1339 and xenia-project#2228 back to canary builds. This fixes various emulation crashes caused from different calling conventions on System-V ABI platforms compared to Windows standard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xenia-cpu-ppc-tests Linux Support
9 participants