[Linux] xenia-cpu-ppc-tests (Rebase of #803) #1339

bwrsandman · 2019-03-09T15:16:32Z

Current Status on master

Currently on master, the ppc emulation tests are not run on travis. If run manually, they fail because gcc/clang use different registers than the MSVC compiler, so guest-to-host and host-to-guest calls fail.

Fix status of this PR

This PR takes the original commits of #803, which have merge conflicts, and rebases them on current master. There was a large refactoring of x64_backend.cc, and some work had to be done to resolve the resulting conflicts.

On top of that, I also enabled the tests on travis.

Currently, the tests run well on Debug. With the combination of this branch, #1317 and #1322, running roms actually executes code. There is more work to be done before an actual rom is playable, but I was able to get guest-to-host logging calls from RDR. This may be fixed by getting the cpu (not cpu-ppc) tests green in another PR.

Original Description by @uytvbn in #803

Hi,

This pull request fixes #670 (and also partially #356). I've tried to change as little as possible in Windows code.
$ ./build/bin/Linux/Debug/xenia-cpu-ppc-tests --log_file stdout
...
i> 0000293A Total tests: 1457
i> 0000293A Passed: 1457
i> 0000293A Failed: 0
Comments:

src/xenia/base/exception_handler_posix.cc - straightforward implementation based almost entirely on the Windows version,

src/xenia/base/math.h - off-by-one error: ffs, unlike _BitScanForward, returns a one-based index,

src/xenia/base/memory_posix.cc - porting this exactly is probably not possible, as there are significant differences in virtual memory handling between Windows and Linux; I tried to follow the behavior of the Windows implementation as closely as possible,

src/xenia/base/string_buffer.cc - fixed non-standard behavior that happens to work with MSVC,

src/xenia/cpu/backend/x64/x64_backend.cc - conform to Linux calling convention,

src/xenia/cpu/backend/x64/x64_sequences.cc - the only change here that affects MSVC (code-wise). It forces Linux to pass XMM values by pointer, which is what codegen expects (the default calling convention on Linux uses XMM registers). I've checked and it doesn't affect code generated by Visual Studio 2017. It could probably be hidden behind a typedef if you want to feel extra safe, though you might end up with different types for input and output,

src/xenia/cpu/hir/value.cc - fixes Value::MulHi for 64-bit values, which returned the low rather than the high qword,

xenia-build - changes binutils path to use the one you get when building in third-party/binutils.
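
To illustrate the math.h item: POSIX ffs() returns a one-based bit position (and 0 when no bit is set), while _BitScanForward reports a zero-based index. The helper below is a hypothetical sketch of the adjustment, not the code from this PR; the name BitScanForwardSketch and its signature are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <strings.h>  // POSIX ffs()

// Hypothetical helper (not xenia's actual code): reports the zero-based index
// of the lowest set bit, which is what _BitScanForward writes to its
// out-parameter. Returns false when no bit is set.
inline bool BitScanForwardSketch(std::uint32_t value, std::uint32_t* out_index) {
  // ffs() is one-based and returns 0 when the input has no set bits.
  int one_based = ffs(static_cast<int>(value));
  if (one_based == 0) {
    return false;
  }
  // Subtract one to convert to the zero-based index callers expect.
  *out_index = static_cast<std::uint32_t>(one_based - 1);
  return true;
}
```

Forgetting the subtraction shifts every result by one bit, which is the kind of off-by-one error the list item describes.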

Thanks.

premake5.lua

bwrsandman · 2019-03-10T21:35:31Z

Build is green on travis and appveyor

bwrsandman · 2019-03-12T12:39:16Z

Branch is ready for review as it is green. Here were my issues that were resolved.

Issues

While Debug passes the tests and works for the most part, running in Release has some serious issues which I have trouble fixing. This can be seen in the travis tests.
The Release issue is related to optimized builds: even Debug builds with an added -O1 flag have it.

This issue is also present in the original PR (#803).

My best guess at this is that optimized builds might either compile away some thunk call code or that the compiler uses different registers than non-optimized builds.

After doing some searching, I've tracked down the obj file which is sensitive to -O1: constant_propagation_pass.o. If I compile xenia with -O1, delete that obj file and recompile it with -O0, then the tests run successfully. If I do this with any other file, the issue persists.
Here's the objdump of constant_propagation_pass.o with both -O0 and -O1: https://gist.github.com/bwrsandman/d218a2d8590e5b74310cfd6dac8f234e

After more drilling down, it seems to come from some header included in constant_propagation_pass.cc. Clang has a pragma which disables optimization and by searching up the included headers, I found the highest single header for which disabling optimization fixes the problem. This header is src/xenia/cpu/hir/value.h. Something within this header is causing issues.
diff --git a/src/xenia/cpu/hir/value.h b/src/xenia/cpu/hir/value.h
index dcc95ca8..70a159fe 100644
--- a/src/xenia/cpu/hir/value.h
+++ b/src/xenia/cpu/hir/value.h
@@ -109,6 +109,7 @@ class Value {
   Use* AddUse(Arena* arena, Instr* instr);
   void RemoveUse(Use* use);
 
+#pragma clang optimize off
   void set_zero(TypeName new_type) {
     type = new_type;
     flags |= VALUE_IS_CONSTANT;
@@ -175,6 +176,7 @@ class Value {
     flags = other->flags;
     constant.v128 = other->constant.v128;
   }
+#pragma clang optimize on
 
   inline bool IsConstant() const { return !!(flags & VALUE_IS_CONSTANT); }
   bool IsConstantTrue() const {
The issue is within the set_zero() function in src/xenia/cpu/hir/value.h. The following fixes the issue:
diff --git a/src/xenia/cpu/hir/value.h b/src/xenia/cpu/hir/value.h
index dcc95ca8..ae59a68e 100644
--- a/src/xenia/cpu/hir/value.h
+++ b/src/xenia/cpu/hir/value.h
@@ -109,7 +109,7 @@ class Value {
   Use* AddUse(Arena* arena, Instr* instr);
   void RemoveUse(Use* use);
 
-  void set_zero(TypeName new_type) {
+  void set_zero(TypeName new_type) __attribute__ ((optnone)){
     type = new_type;
     flags |= VALUE_IS_CONSTANT;
     constant.v128.low = constant.v128.high = 0;
The difference in opcode is here https://gist.github.com/bwrsandman/390712c2f29a7645f03e67d9daf8ce81
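
For readers unfamiliar with the two clang mechanisms used in the diffs above, here is a minimal standalone sketch. ValueSketch, Vec128Sketch, NotSketch and the SKETCH_NO_OPT macro are made-up names for illustration only; the real Value class has more members. The guards make the example compile unchanged on non-clang compilers, where it simply has no effect.

```cpp
#include <cassert>
#include <cstdint>

// optnone is a clang attribute; expand to nothing elsewhere.
#if defined(__clang__)
#define SKETCH_NO_OPT __attribute__((optnone))
#else
#define SKETCH_NO_OPT
#endif

// Hypothetical stand-ins for the 128-bit constant and the Value class.
struct Vec128Sketch {
  std::uint64_t low;
  std::uint64_t high;
};

struct ValueSketch {
  std::uint32_t flags = 0;
  Vec128Sketch v128{1, 1};

  // Per-function form: only this function is compiled without optimization.
  SKETCH_NO_OPT void set_zero() {
    flags |= 1u;
    v128.low = v128.high = 0;
  }
};

// Region form: every function defined between "off" and "on" is unoptimized.
#if defined(__clang__)
#pragma clang optimize off
#endif
inline std::uint64_t NotSketch(std::uint64_t v) { return ~v; }
#if defined(__clang__)
#pragma clang optimize on
#endif
```

The attribute is the narrower tool, which is why it is convenient once the offending function has been found; the pragma is handy while bisecting a header.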

There are also similar issues happening on travis which I cannot replicate on my Arch setup with clang7. I am able to replicate these crashes on docker ubuntu 16.04 with clang6. Two solutions are available:

Do the same work as above to find the offending functions

Update travis to use clang7

The issues are similar:

Processor::backend() inlining was causing issues

Value::Not() optimization was causing issues.

The issues are not clear and I am afraid that these are just hacks that do not fix the real underlying problems.

I am including this branch for review and for some helpful tips from the community since I haven't had the time to investigate this fully and didn't want to leave my partial work floating in the ether if someone wanted to pick up this particular task.

DrChat · 2019-05-17T01:48:52Z

So, in that specific function the difference between optimizing and not is the usage of AVX instructions.
What were the issues, exactly? Crashes? Clobbering?

DrChat · 2019-05-17T02:04:02Z

src/xenia/cpu/backend/x64/x64_backend.cc

+  mov(qword[rsp + offsetof(StackLayout::Thunk, r[2])], r12);
+  mov(qword[rsp + offsetof(StackLayout::Thunk, r[3])], r13);
+  mov(qword[rsp + offsetof(StackLayout::Thunk, r[4])], r14);
+  mov(qword[rsp + offsetof(StackLayout::Thunk, r[5])], r15);


N.B: SysV does not have nonvolatile XMM registers.

bwrsandman · 2019-05-20

What is the suggested way of resolving this?

DrChat · 2019-06-30

Sorry - I think this was a note for myself. I might suggest you just drop a comment here and state what I said, just so nobody goes "wut, why aren't xmm registers being saved"
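
As background for the note above, the difference can be written down as a small lookup. This is an illustrative sketch only; AbiSketch, IsCalleeSavedXmm and CalleeSavedXmmCount are hypothetical names and do not exist in xenia. It encodes that the Windows x64 convention treats XMM6 through XMM15 as callee-saved, while the System V AMD64 ABI treats every XMM register as caller-saved.

```cpp
#include <cassert>

// Hypothetical enum for the two x64 calling conventions discussed here.
enum class AbiSketch { kWin64, kSysV };

// Windows x64: XMM6..XMM15 must be preserved by the callee.
// System V AMD64: no XMM register is preserved across calls.
constexpr bool IsCalleeSavedXmm(AbiSketch abi, int xmm_index) {
  return abi == AbiSketch::kWin64 && xmm_index >= 6 && xmm_index <= 15;
}

// Counts how many XMM registers a thunk would need to spill in order to
// honor the callee-saved set of the given convention.
constexpr int CalleeSavedXmmCount(AbiSketch abi) {
  int count = 0;
  for (int i = 0; i < 16; ++i) {
    count += IsCalleeSavedXmm(abi, i) ? 1 : 0;
  }
  return count;
}

static_assert(CalleeSavedXmmCount(AbiSketch::kWin64) == 10, "XMM6-XMM15");
static_assert(CalleeSavedXmmCount(AbiSketch::kSysV) == 0, "none on SysV");
```

This is why a SysV thunk that saves only general-purpose registers such as r12 to r15 is not missing any XMM spills.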

bwrsandman · 2019-05-20T12:17:20Z

> So, in that specific function the difference between optimizing and not is the usage of AVX instructions.
> What were the issues, exactly? Crashes? Clobbering?

Clang 6 would vectorize the arguments inside a different register in Release than in Debug. This led to inconsistent behavior depending on the optimization level.

bwrsandman · 2019-07-10T18:56:03Z

Rebased onto master and added the comment as per @DrChat's request.

bwrsandman · 2019-08-13T20:16:29Z

@Margen67 cpu would be a better tag than test since the tests are testing the ppc cpu emulation :P

A va_list is not guaranteed to maintain its value after being used. With clang on Linux, args is undefined after fetching length and will print "(null)". Copy args into another va_list before getting length to prevent this. Add tests.
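
The va_list fix described above follows a common pattern: measure with a copy, then format with the original. The sketch below shows that pattern in isolation. FormatVarargsSketch and FormatSketch are hypothetical stand-ins, not the actual StringBuffer::AppendVarargs code from this PR.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a varargs append routine.
inline std::string FormatVarargsSketch(const char* format, va_list args) {
  // Measure the output using a copy, so that `args` itself stays untouched.
  va_list size_args;
  va_copy(size_args, args);
  int length = std::vsnprintf(nullptr, 0, format, size_args);
  va_end(size_args);
  if (length < 0) {
    return std::string();
  }

  // Format for real with the original, still-unused va_list.
  std::string out(static_cast<std::size_t>(length) + 1, '\0');
  std::vsnprintf(&out[0], out.size(), format, args);
  out.resize(static_cast<std::size_t>(length));  // drop the trailing NUL
  return out;
}

// Convenience wrapper that builds the va_list from a variadic call.
inline std::string FormatSketch(const char* format, ...) {
  va_list args;
  va_start(args, format);
  std::string out = FormatVarargsSketch(format, args);
  va_end(args);
  return out;
}
```

Passing the same va_list to two vsnprintf calls happens to work with MSVC, but on x86-64 Linux the first call consumes it, which matches the "(null)" output described above.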

Disable optimization on set zero to prevent clang from vectorizing the assignment to zero which would use different registers than expected. With -O0: RAX. With -O1: RDI.

Disable optimization on set zero to prevent clang from vectorizing the assignment to zero which would use different registers than expected.

Disable inlining on backend() which causes ppc issues on clang 7 in release builds.

Shim exports are called from GuestToHostThunk which dictates the calling convention. The default system calling convention is different depending on OS (Windows vs. everything else) and architecture. PR xenia-project#1339 addresses this for x64 Linux. There is no reason for explicit `__cdecl`. Also, it is not available in GCC. We could use `__attribute__((ms_abi))` or `__attribute__((sysv_abi))` but that just adds complexity.
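
For reference, this is roughly what the rejected alternative would look like. It is a sketch under the assumption of GCC or clang targeting x86-64; the SKETCH_* macros and the two functions are invented for the example and are not part of xenia. On other compilers or architectures the macros expand to nothing.

```cpp
#include <cassert>
#include <cstdint>

// ms_abi / sysv_abi are GCC and clang attributes for x86-64 targets.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_MS_ABI __attribute__((ms_abi))
#define SKETCH_SYSV_ABI __attribute__((sysv_abi))
#else
#define SKETCH_MS_ABI
#define SKETCH_SYSV_ABI
#endif

// Takes its first two integer arguments in RCX and RDX (Windows x64).
SKETCH_MS_ABI inline std::uint64_t AddMsAbi(std::uint64_t a, std::uint64_t b) {
  return a + b;
}

// Takes its first two integer arguments in RDI and RSI (System V AMD64).
SKETCH_SYSV_ABI inline std::uint64_t AddSysVAbi(std::uint64_t a,
                                                std::uint64_t b) {
  return a + b;
}
```

From C++ both functions behave identically; the attribute only matters to generated code such as a thunk, which has to place arguments in the registers the callee expects. Leaving shims on the platform default and teaching the thunk both conventions avoids annotating every export.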

Triang3l · 2022-07-01T22:18:15Z

src/xenia/cpu/backend/x64/x64_seq_vector.cc

@@ -671,7 +671,7 @@ EMITTER_OPCODE_TABLE(OPCODE_VECTOR_SUB, VECTOR_SUB);
 // OPCODE_VECTOR_SHL
 // ============================================================================
 template <typename T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
-static __m128i EmulateVectorShl(void*, __m128i src1, __m128i src2) {
+static __m128i EmulateVectorShl(void*, __m128i& src1, __m128i& src2) {


const also, by the way.

Triang3l · 2022-07-04T22:41:02Z

The exception handler part has been merged into the main branch — thanks @uytvbn! 💚 I've also added RIP updating after an exception handler as that's needed for MMIO (it executes the mov manually and advances the instruction pointer). One additional change that I've made though is questionable — the original commit had bit 1 << 1 of the error code for a SIGSEGV interpreted as "read access violation" when it's set. However, asm/trap_pf.h (not doing an #include of it though as it's not available on Android apparently) says that it's the opposite — X86_PF_WRITE is set to 1 for a write page fault. Which behavior is correct in this situation?
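
The bit layout referred to above can be sketched as follows. The constants are redefined locally rather than included from asm/trap_pf.h, and kPfProt, kPfWrite, AccessSketch and AccessFromErrorCode are hypothetical names for this example. On x86-64 Linux with glibc, a SIGSEGV handler would typically read the error code from uc_mcontext.gregs[REG_ERR]; that part is left out so the sketch stays portable.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the x86 page fault error code bits (see asm/trap_pf.h).
constexpr std::uint64_t kPfProt = 1u << 0;   // 0: page not present, 1: protection fault
constexpr std::uint64_t kPfWrite = 1u << 1;  // 0: read access, 1: write access

enum class AccessSketch { kRead, kWrite };

// Classifies a page fault from its error code: bit 1 set means a write.
constexpr AccessSketch AccessFromErrorCode(std::uint64_t error_code) {
  return (error_code & kPfWrite) ? AccessSketch::kWrite : AccessSketch::kRead;
}

static_assert(AccessFromErrorCode(kPfWrite) == AccessSketch::kWrite, "");
static_assert(AccessFromErrorCode(kPfProt) == AccessSketch::kRead, "");
```

Under this reading, a set bit 1 indicates a write access violation, which agrees with the header quoted above rather than with the original commit.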

… for Linux Upstream changes made from xenia-project#1339 and xenia-project#2228 back to canary builds. This fixes various emulation crashes caused by different calling conventions on System-V ABI platforms compared to the Windows standard.

DrChat reviewed Mar 9, 2019

bwrsandman force-pushed the linux_cpu branch 11 times, most recently from b477dda to 12b783f Compare March 10, 2019 20:39

bwrsandman force-pushed the linux_cpu branch from 12b783f to 7bff619 Compare March 12, 2019 12:36

bwrsandman marked this pull request as ready for review March 12, 2019 12:37

bwrsandman force-pushed the linux_cpu branch from 7bff619 to 1898494 Compare March 12, 2019 12:41

bwrsandman mentioned this pull request Mar 12, 2019

Linux misc fixes #1322

Closed

bwrsandman force-pushed the linux_cpu branch from 1898494 to 5acb568 Compare May 7, 2019 12:36

gibbed added the platform-linux label May 12, 2019

DrChat reviewed May 17, 2019

bwrsandman force-pushed the linux_cpu branch from 5acb568 to c66b0b4 Compare July 10, 2019 18:55

bwrsandman force-pushed the linux_cpu branch from c66b0b4 to 21801d9 Compare July 13, 2019 18:30

Margen67 added the tests label Jul 19, 2019

bwrsandman force-pushed the linux_cpu branch from 40aa027 to 1a9adec Compare July 25, 2019 23:10

bwrsandman force-pushed the linux_cpu branch from 1a9adec to 277d7ec Compare August 13, 2019 19:58

uytvbn and others added 13 commits May 17, 2021 10:54

[Linux] Fix binutils path for gentests

6252e28

[Linux] Implement exception handler

9a16a84

[string] Remove reuse of va_list in AppendVarargs

03c0c8d

A va_list is not guaranteed to maintain its value after being used. With clang on Linux, args is undefined after fetching length and will print "(null)". Copy args into another va_list before getting length to prevent this. Add tests.

[Linux] Implement thunk generation

999ec74

[Linux] Force passing XMM values by pointer

bb9820a

[Linux] Fix Value::MulHi

c512e63

[Linux] Fix Value::set_zero() on release

f40e580

Disable optimization on set zero to prevent clang from vectorizing the assignment to zero which would use different registers than expected. With -O0: RAX. With -O1: RDI.

[Linux] Fix Value::Not on release

a72cc09

Disable optimization on set zero to prevent clang from vectorizing the assignment to zero which would use different registers than expected.

[Linux] Fix Processor::backend() on release

96a9199

Disable inlining on backend() which causes ppc issues on clang 7 in release builds.

[travis] Enable ppc gentest and tests

5b1b34e

[cpu] Add linux registers to CallExtern

04e2c69

[Linux] Add reminder note about assumptions

440c34a

[cpu] Test cpu-test on travis

18f121f

bwrsandman force-pushed the linux_cpu branch from ec03532 to 18f121f Compare May 17, 2021 14:55

amessier mentioned this pull request May 13, 2022

Support for SysV ABI in x64 emitter (and other fixes) #2018

Open

Triang3l requested changes Jul 1, 2022

RodoMa92 mentioned this pull request Jan 17, 2025

[cpu] Fix System-V ABI guest to host and host to guest thunk emitters for Linux xenia-canary/xenia-canary#506

Merged
