hsa docs
smurfd committed Apr 7, 2024
1 parent 369fd46 commit e6810bd
Showing 7 changed files with 31 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -19,6 +19,7 @@ aka Navi31 aka gfx1100 aka Plum Bonito aka amd744c
- DCN = Display Core Next (dcn320) (amdgpu/dcn_3_2_0_dmcub.bin)
- VCN = Video Core Next (encoder/decoder) (vcn400) (vcn_4_0_0.bin)
- [SDMA](/docs/SDMA.md) = System DMA (lsdma600) (sdma_6_0_0.bin) (F32)
- [HSA](/docs/HSA.md) = Heterogeneous System Architecture

More info on each piece:
https://mjmwired.net/kernel/Documentation/gpu/amdgpu/driver-core.rst
@@ -32,7 +33,7 @@ RS64 = RISCV/RV64I + a few custom instructions! Load (at least MEC) with offset

## Architecture Diagram

![](/docs/arch1.jpg)
![](/docs/img/arch1.jpg)

- 1x 5nm GCD (graphics compute die)
- 6x 6nm MCD (memory cache die)
2 changes: 1 addition & 1 deletion docs/CU.md
@@ -2,7 +2,7 @@

It's where compute happens

![](/docs/big_compute-unit-pair.jpg)
![](/docs/img/big_compute-unit-pair.jpg)

7900XTX has 96 compute units (48 work-group processors)

28 changes: 28 additions & 0 deletions docs/HSA.md
@@ -0,0 +1,28 @@
# HSA stands for Heterogeneous System Architecture

[HSA](https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks.

The idea of HSA is to reduce communication latency between CPUs and GPUs, and to make it easier to offload calculations to the GPU.
![](/docs/img/gpu_with_hsa.png)

HSA defines a unified virtual address space for compute.

Usually the GPU and the CPU each have their own memory. HSA requires them to share page tables, so that they can exchange data by sharing pointers instead of copying it. This needs to be supported by HSA-specific [memory management units](https://web.archive.org/web/20140328140823/http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf).
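
The pointer-sharing idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `PageTable`, `Agent`, and the page and frame numbers are hypothetical names and values chosen for illustration, not part of any HSA API. It shows two agents translating the same virtual address through one shared page table, so a pointer written by the "CPU" is directly usable by the "GPU".

```python
# Toy model only: PageTable and Agent are hypothetical, not HSA APIs.
PAGE_SIZE = 4096


class PageTable:
    """Maps virtual page numbers to physical frame numbers."""

    def __init__(self):
        self.entries = {}

    def map(self, virtual_page, physical_frame):
        self.entries[virtual_page] = physical_frame

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        return self.entries[page] * PAGE_SIZE + offset


class Agent:
    """A CPU or GPU that reaches memory through a page table."""

    def __init__(self, name, page_table, memory):
        self.name = name
        self.page_table = page_table
        self.memory = memory  # physical memory, modelled as a dict

    def write(self, virtual_address, value):
        self.memory[self.page_table.translate(virtual_address)] = value

    def read(self, virtual_address):
        return self.memory[self.page_table.translate(virtual_address)]


physical_memory = {}
shared_table = PageTable()
shared_table.map(virtual_page=0x10, physical_frame=0x2A)

# Both agents use the SAME page table object and the same physical memory.
cpu = Agent("cpu", shared_table, physical_memory)
gpu = Agent("gpu", shared_table, physical_memory)

pointer = 0x10 * PAGE_SIZE + 8  # a virtual address handed from CPU to GPU
cpu.write(pointer, 123)
result = gpu.read(pointer)      # same pointer, same data, no copy
print(result)
```

With separate page tables, the same pointer value could resolve to a different frame on each side, which is why data would otherwise have to be copied between address spaces.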

HSA is intended to support both GPUs and CPUs, as well as high-level languages.

The CPU's [MMU](https://en.wikipedia.org/wiki/Memory_management_unit) and the GPU's [IOMMU](https://en.wikipedia.org/wiki/IOMMU) must both comply with HSA hardware specifications.
![](/docs/img/mmu_iommu.png)

Some of the HSA-specific features implemented in the hardware need to be supported by the operating system kernel and specific device drivers.

`amdkfd` supports heterogeneous queuing (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for heterogeneous memory management (HMM) is suited only for graphics hardware featuring version 2 of AMD's IOMMU.


## Graphics Core Next (GCN)

The HSA kernel driver resides in the directory `/drivers/gpu/hsa`, while the DRM graphics device drivers reside in `/drivers/gpu/drm`.

Hardware schedulers (HWS) perform scheduling and offload the assignment of compute queues to the Asynchronous Compute Engines (ACEs) from the driver to hardware. The HWS buffers these queues until there is at least one empty queue in at least one ACE. It then immediately assigns buffered queues to the ACEs until all queues are full or there are no more queues to safely assign.
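
The buffering behaviour described above can be sketched as a small simulation. The class names, the number of ACEs, and the two slots per ACE are assumptions made for illustration and do not reflect real hardware parameters. The "safe to assign" condition is not modelled either: every buffered queue is treated as assignable.

```python
from collections import deque

# Toy model: Ace, HardwareScheduler and the slot counts are hypothetical.


class Ace:
    def __init__(self, slots):
        self.slots = slots
        self.queues = []

    def has_empty_slot(self):
        return len(self.queues) < self.slots


class HardwareScheduler:
    def __init__(self, aces):
        self.aces = aces
        self.buffered = deque()

    def submit(self, queue_name):
        """Buffer a compute queue, then try to place buffered queues."""
        self.buffered.append(queue_name)
        self.assign()

    def assign(self):
        """Move buffered queues to ACEs until slots or queues run out."""
        for ace in self.aces:
            while self.buffered and ace.has_empty_slot():
                ace.queues.append(self.buffered.popleft())

    def retire(self, ace, queue_name):
        """A queue finished: free its slot and refill from the buffer."""
        ace.queues.remove(queue_name)
        self.assign()


aces = [Ace(slots=2), Ace(slots=2)]
hws = HardwareScheduler(aces)
for i in range(6):
    hws.submit(f"q{i}")  # q4 and q5 stay buffered: all four slots are taken

hws.retire(aces[0], "q0")  # a slot opens, so q4 is assigned immediately
print(aces[0].queues, aces[1].queues, list(hws.buffered))
```

The driver only calls `submit`; deciding which ACE receives which queue happens inside the scheduler, which is the offloading the paragraph describes.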

Part of the scheduling work includes prioritized queues, which allow critical tasks to run at a higher priority than other tasks without requiring the lower-priority tasks to be preempted. The tasks run concurrently: high-priority tasks are scheduled to occupy the GPU as much as possible, while other tasks use the resources that the high-priority tasks are not using. Hardware schedulers are essentially Asynchronous Compute Engines that lack dispatch controllers. They were first introduced in the fourth-generation [GCN](https://en.wikipedia.org/wiki/Graphics_Core_Next) microarchitecture.
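
The effect of prioritized queues without preemption can be approximated with a simple allocation sketch. The function name, the task names, the priority values, and the demands are all made up for illustration; real hardware arbitrates resources in a far more fine-grained way. The point is only that high-priority work is served first and lower-priority work keeps running on whatever is left.

```python
def share_compute_units(total_units, tasks):
    """Hand out compute units in priority order without evicting any task.

    tasks: list of (name, priority, demand); larger priority is served first.
    Returns a dict mapping task name to the number of units granted.
    """
    free = total_units
    grants = {}
    for name, _priority, demand in sorted(tasks, key=lambda t: -t[1]):
        granted = min(demand, free)  # take what is needed, if available
        grants[name] = granted
        free -= granted
    return grants


# Hypothetical workload on a GPU with 96 compute units.
tasks = [
    ("background", 0, 80),
    ("critical", 10, 60),
    ("normal", 5, 30),
]
grants = share_compute_units(96, tasks)
print(grants)
```

Here the critical task gets its full demand, and the remaining units are shared among the other tasks, which continue to run concurrently rather than being stopped.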
File renamed without changes
File renamed without changes
Binary file added docs/img/gpu_with_hsa.png
Binary file added docs/img/mmu_iommu.png