-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arm64 macos version , seems to integrate with built in opencl driver but not hardware #3047
Comments
Metal is its own API and development ecosystem made by Apple for their custom silicon, similar to OpenGL/CL, Vulkan, etc - as you said yourself, it doesn't work because it's not supported and probably won't be any time soon. |
It worked sometime, but maybe not RandomX. |
I always build with the bundled deps everywhere to avoid untested territory, so that may be why I never saw problems. |
Thanks for the additional information. I’m by-the-book using for development / compilation tools and various video drivers on Linux . It’s the Mac part where I went “well this isn’t expected “ and wasn’t sure how to proceed. I’m effectively cpu mining with an m1 compiled ethminer, that does not make the cpu or laptop get hot. At all. whereas xmrig makes it get hot. I appreciate you even bothered to offer the binary. I just could not (and can’t) figure out why ethminer can do 2mh/s and stay cool while it burns up doing 2000h/s on xmrig, it is using the system ram to calculate a job, as though it were basically video ram , and does so very quickly. From there it’s comparable to a 12 year old i5 I have. Would anyone know the general direction I might want to look in, for rolling or solving this on my own? |
Well different algorithms are... different. And 2MH/s is kinda slow for ethash. And RandomX (where I assume you are seeing the 2000H/s) is a lot more difficult and hits the memory really hard (as part of being anti-GPU and anti-ASIC). It's pretty terrible on everything other than a Ryzen since they have the best integrated memory controller. Except the Ryzens with iGPU then the memory controller is wasting time handling the GPU... similar to the M1. DDR5 is also worse than DDR4 for purposes of low latency. Also there is nothing special in the M1 support it's just generic ARM64 compilation, with polyfills that convert the x86_64 intrinsics (AVX2, AES-NI) into ARM64 versions (NEON, crypto-extensions, etc), which can't be an optimal way of making an M1 kernel. But, Apple wants people to use Obj-C or Swift to get into any of their proprietary CPU (or GPU for that matter) features so it's probably almost as good as it can be for C++ code. Unless of course someone that knows how to write a mining kernel and also knows ARM64 assembly tricks shows up, maybe they could find some optimizations and shortcuts. Then again they could work on it and make it perfect and the memory controller could still be the bottleneck after all that effort. Or the memory you can't change or tweak at all. |
thats actually a pretty fantastic answer , thank you for summarizing it. call me an old dog, trying to learn new tricks on a new architecture that for now is just a black box to me. (ive had to cope with 8088 -> x86 -> ppc -> modern intel, to this for other reasons than "muh coins"). even if this is a non-issue for xmrig, thank you for this answer |
that ALSO answers why the ryzen boxes generate the most tangible coinage for me at the same time!!! |
Same problem with me. Is there any resolution? |
Apple could better support OpenCL (they won't), or you could run antiques like HighSierra with an nvidia card which was the last CUDA that worked (bypass OpenCL and Metal completely). And still rx/0 will be slow because it's designed explicitly to be bad on GPUs (support exists only to prove that it's slow). So rx/0 not working on Apple is not really a loss so nobody is going to expend much effort to fix it. Other algorithms tend to work somewhat, at least until Apple drops OpenCL in an upcoming OS (another reason to not put much effort into it, Apple has said it will be killed off). They have not updated it in a long time, it is abandoned and unsupported but still included. Or, someone could write mining kernels in Metal which I assume is not possible or some would exist by now, or maybe since it is not cross-platform at all and Apple hardware is generally more expensive nobody would build a mining farm with Apple products. Also since x86-64 macs can just dualboot Linux and then work fine, the target audience for Metal mining would be AppleSilicon chips which is an even smaller amount of people that would be helped by all the effort (hobby miners that happen to already have an M1 unit for other uses, etc). Not sure any mining devs have Apple stuff so dev and testing would be difficult. I had remote access to an M1 unit graciously via another user here, which was how what does work works. But I'm also not a mining kernel expert as far as the guts of how they work, I only fiddled with the OpenCL support so that it would stop trying to use AMD extensions when unavailable (many places in the code would see card name containing "AMD" and then assume it was 2.0+extensions, but that's a bad assumption on Apple), or OpenCL 2.0 features (Apple OpenCL only supports 1.2, and pretty tight to spec while other "1.2" stacks allow some slop or had partial 2.0 features) and then some algorithms worked okay. RandomX was written and tested when most other OpenCL implementations were well beyond 1.2 and might have internal logic that depends on it, or at least the things it uses confuse CL2Metal, therefore the realtime-compilation crashes. Again since I do not understand what the kernel is doing I can not identify where things are not strictly 1.2 base-spec compliant, and never got any RandomX family algo to work. (LP)DDR5 memory is also slower overall latency than DDR4 (even at higher clock rates) so to some degree that holds the hashrate back pretty bad unless the algorithm is not as memory latency sensitive (older cryptonight?). The only good thing about the AppleSilicon design is the low-power which gives a pretty good hash-per-watt although then also a low overall hashrate per unit, and the hash-per-rig is low (you spend what you'd save by the low-power to increase parallelism, still better off building Ryzens, which are also not terrible at hash-per-watt compared to Intel although worse than AppleSilicon -- one $600 Ryzen rig does the hashrate of about five $600 M1 units, at not much more watts). |
Am wondering if progress was ever made incorporating Metal support into XMRig. Am XMRig now on an M3 MacBook Pro, and absent Metal support am wondering how much potential 'performance upside' is being left on the table. (On XMRig version 6.22.2 I note that both OpenCL and CUDA show as 'disabled' in the ARMv 8 64-bit, OS X version. Can't see any mention of Metal in the support documentation either) |
I sort of looked for similar functions for a while and then gave up, I also don't have any actual Apple hardware to fiddle with anyway. |
Describe the bug
arm64 macos version works fine as a cpu miner, but is partially recognized by opencl, why wouldnt you optimize xmrig to function in that manner? (it gets extremely hot with xmrig, but compiled ethminer can run 24/7 here and not afffect temperature at all)
To Reproduce
enable opencl
Expected behavior
I expected it to not work, I suppose, due to current support being for nvidia and amd rather than Metal
Is there any reason it can't be used?
Required data
OSX ARM64 M1 , Big Sur 11.6.5
L2:16.0 MB L3:0.0 MB 8C/8T NUMA:1
[2022-05-10 09:19:23.887] net use pool gulf.moneroocean.stream:10001 149.28.115.221
[2022-05-10 09:19:23.888] net new job from gulf.moneroocean.stream:10001 diff 1000 algo rx/0 height 2620504 (12 tx)
[2022-05-10 09:19:23.888] cpu use argon2 implementation default
[2022-05-10 09:19:23.888] randomx init dataset algo rx/0 (8 threads) seed e3da49220880d859...
[2022-05-10 09:19:23.888] randomx allocated 2336 MB (2080+256) huge pages 0% 0/1168 +JIT (1 ms)
[2022-05-10 09:19:28.799] randomx dataset ready (4910 ms)
[2022-05-10 09:19:28.799] cpu use profile rx (8 threads) scratchpad 2048 KB
[2022-05-10 09:19:28.799] cpu READY threads 8/8 (8) huge pages 0% 0/8 memory 16384 KB (1 ms)
[2022-05-10 09:19:28.800] opencl use profile rx (1 thread) scratchpad 2048 KB
| # | GPU | BUS ID | INTENSITY | WSIZE | MEMORY | NAME
| 0 | 0 | n/a | 128 | 8 | 256 | Apple M1
[2022-05-10 09:19:28.801] opencl GPU #0 compiling...
UNSUPPORTED (log once): buildComputeProgram: cl2Metal failed
[2022-05-10 09:19:29.045] opencl error CL_BUILD_PROGRAM_FAILURE when calling clBuildProgram
BUILD LOG:
Compilation failed:
program_source:1788:1: warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int'
update_max(latency,(last_memory_op_slot+WORKERS_PER_HASH)/WORKERS_PER_HASH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:1399:56: note: expanded from macro 'update_max'
#define update_max(value, next_value) do { if ((value) < (next_value)) (value) = (next_value); } while (0)
~~~~~ ^ ~~~~~~~~~~
program_source:1811:1: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'unsigned int'
update_max(first_allowed_slot,latency*WORKERS_PER_HASH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:1399:56: note: expanded from macro 'update_max'
#define update_max(value, next_value) do { if ((value) < (next_value)) (value) = (next_value); } while (0)
~~~~~ ^ ~~~~~~~~~~
program_source:1815:1: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'unsigned int'
update_max(first_allowed_slot,get_byte(is_fp?registerReadCycleFP:registerReadCycle,dst)*WORKERS_PER_HASH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:1399:56: note: expanded from macro 'update_max'
#define update_max(value, next_value) do { if ((value) < (next_value)) (value) = (next_value); } while (0)
[2022-05-10 09:19:29.046] opencl thread #0 self-test failed
[2022-05-10 09:19:29.046] opencl disabled (failed to start threads)
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: