PrefixSum Re-enable tests. #420

Open: tewaro wants to merge 671 commits into master from AdityaAtulTewari/prefixsum-reenable-test


@tewaro tewaro commented Apr 15, 2024

No description provided.

Initializing the weights on the CPU did not initialize them on the GPU:
this commit adds a call to copy the newly initialized CPU weights to the
GPU. It also moves the GPU memory allocation before the CPU weight
initialization to prevent a nullptr copy.

Also adds a debug function to print vectors on the GPU.
Reenable assertions for the GPU GCN layer forward pass which now works.
The next step is to get the backward pass test working which involves
adding a function for user code to copy things over to CUDA without
needing to include the CUDA header + adding the appropriate GPU
functions in the backend.
Adds a function that allocates GPU memory and copies over a particular
passed in vector to the GPU. At the moment there is no way to free this
memory and it will leak. This function is added mostly for unit test
purposes and should not be used otherwise.
Add ifdefs to separate CPU/GPU code in the backward step and also change
the structures being used to PointerWithSize (GPU code cannot work with
CPU std::vectors).

Added a few TODOs too for better code organization later.
Adds functions for calculating the weight and layer gradients of the GCN
layer. Untested: the tests will be added in a commit down the line.
Add functions to copy the backward output and weight gradients of a
layer from GPU to CPU. Also moved some function definitions to the
header since the definitions were quite small.
Since aggregation on the GPU adds to the output rather than overwriting
it, the entire output matrix needs to be zeroed out before anything is
done with it; otherwise it will contain garbage values.
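
As a rough CPU-side illustration of why the zeroing matters, here is a minimal sketch in plain C++. The `Edge` alias and the `aggregate_add` / `aggregate` names are hypothetical and are not the functions in this PR; the sketch only assumes a row-major feature matrix and an accumulate-style aggregation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical edge type for this sketch: (destination node, source node).
using Edge = std::pair<std::size_t, std::size_t>;

// Accumulate-style aggregation: adds each source row into its destination
// row. It never overwrites `out`, so stale contents of `out` survive.
void aggregate_add(const std::vector<Edge>& edges,
                   const std::vector<float>& in,  // nodes x width, row-major
                   std::size_t width,
                   std::vector<float>& out) {     // nodes x width, row-major
  assert(in.size() == out.size());
  for (const Edge& e : edges) {
    assert((e.first + 1) * width <= out.size());
    assert((e.second + 1) * width <= in.size());
    for (std::size_t f = 0; f < width; ++f) {
      out[e.first * width + f] += in[e.second * width + f];
    }
  }
}

// Wrapper that zeroes the output first, then accumulates.
void aggregate(const std::vector<Edge>& edges, const std::vector<float>& in,
               std::size_t width, std::vector<float>& out) {
  std::fill(out.begin(), out.end(), 0.0f);
  aggregate_add(edges, in, width, out);
}
```

Calling `aggregate_add` directly on a buffer that still holds old values mixes those values into the result, while `aggregate` gives the same answer regardless of what the buffer held before.
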
The forward and backward pass of a GCN layer without dropout/activation
works fine now.

All that is left for fully functioning GPU code is the output layer
(softmax). Dropout and activation are nice to have but are not critical
to "function" (though obviously they will be added).
Add ifdefs to calls in softmax layer in preparation for GPU calls.
For some reason the masks were stored as Labels, which is unnecessary
since the masks are essentially bitsets: they have been changed to chars
to save space.
Returning dead objects + removing unused arguments in Softmax layer
files to allow GPU build to compile
Adds code to copy over the masks for the train, val, test sets to the
GPU. Removes norm factor variable + adds the free calls for masks
to the destructor as well.
Adds the file for the GPU object for the Softmax layer and adds the call
to the Forward phase of the GPU code. The call itself is not yet
defined.
Adds a softmax function on GPUs that can be called from GPU kernels.
Added a few things from old codebase's CUDA utils to new one in
preparation for using the newly added things to compute the softmax
layer. Also added the original source of the old code: Caffe.
This commit adds the softmax/cross entropy function to GNNMath.cu and
uses it to define the GPU Softmax forward phase function. An additional
argument was added to the forward phase gpu call to deal with the
different phases: the phase argument details which mask to use in the
softmax. There are a few things left to do that will be done later,
namely zero'ing out the output matrix.

Note that I have NOT defined cross entropy for the forward phase: it is
only used to calculate loss, and I am neither using the loss nor
referring to it anywhere in my code or analysis at the moment.
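
For reference, a minimal CPU sketch of a masked softmax forward pass in plain C++. The function names and the `char` mask layout (one entry per row, nonzero meaning "in the current phase") are assumptions made for illustration; this is not the GNNMath.cu implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row of `width` logits.
void softmax_row(const float* in, float* out, std::size_t width) {
  assert(width > 0);
  const float max_val = *std::max_element(in, in + width);
  float sum = 0.0f;
  for (std::size_t i = 0; i < width; ++i) {
    out[i] = std::exp(in[i] - max_val);  // shift by the max to avoid overflow
    sum += out[i];
  }
  for (std::size_t i = 0; i < width; ++i) {
    out[i] /= sum;
  }
}

// Softmax is applied only to rows selected by the phase mask; every other
// row of the output stays zero.
std::vector<float> masked_softmax_forward(const std::vector<float>& logits,
                                          const std::vector<char>& mask,
                                          std::size_t width) {
  assert(width > 0);
  assert(logits.size() == mask.size() * width);
  std::vector<float> out(logits.size(), 0.0f);  // zeroed before softmax
  for (std::size_t row = 0; row < mask.size(); ++row) {
    if (mask[row]) {
      softmax_row(&logits[row * width], &out[row * width], width);
    }
  }
  return out;
}
```

Selecting a different mask (train, val, or test) only changes which rows are filled in; the per-row computation is identical.
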
Fixed some bugs exposed by the unit test for softmax forward, namely
that the feature length size was incorrect and that the vector was not
being zeroed out before softmax occurred. The unit test in question has
been ported over from the CPU softmax unit test as well.

The next step is to finish up the backward pass for the softmax layer
and reactivate the unit test calls to the backward phase. I also need to
consider actually checking backward phase output to make sure it is
sane.
Moved the code that selects the right mask pointer based on the current
layer phase into a function, as it will be used in the backward phase as
well.
Ground truth is represented with GNNLabel, but I was using a GNNFloat.
This caused the labels to be read as garbage when used on the GPU.
This commit changes them to the correct type.

It also includes the signature definition of the backward phase: the
implementation will be included in the next commit. (Split the commits
up for modularity's sake)
Adds the backward phase for the softmax layer for the GPU. The
implementation is taken from the non-refactored old code: it copies a
prediction to shared memory (presumably to improve locality) then
computes the cross-entropy and softmax derivatives. It remains to be
seen if the shared memory copy is actually more efficient; some testing
will be done down the line.

Also adds print to both cpu and gpu softmax tests in order to verify
that both are doing the same compute (which they are in this commit).
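
A minimal CPU sketch of a softmax backward pass, for comparison. It assumes the standard combined result, where the gradient of cross entropy with respect to the softmax input is the prediction minus the one-hot label; the old code may compute the two derivatives in separate steps. The function name and the use of `std::size_t` for labels are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of cross-entropy loss with respect to the softmax input:
// grad[c] = probs[c] - (c == label ? 1 : 0), computed only for rows selected
// by the phase mask. Rows outside the mask keep a zero gradient.
std::vector<float> softmax_cross_entropy_backward(
    const std::vector<float>& probs,         // rows x width softmax output
    const std::vector<std::size_t>& labels,  // ground-truth class per row
    const std::vector<char>& mask,           // nonzero = row is in the phase
    std::size_t width) {
  assert(labels.size() == mask.size());
  assert(probs.size() == mask.size() * width);
  std::vector<float> grad(probs.size(), 0.0f);
  for (std::size_t row = 0; row < mask.size(); ++row) {
    if (!mask[row]) {
      continue;
    }
    assert(labels[row] < width);
    for (std::size_t c = 0; c < width; ++c) {
      const float one_hot = (c == labels[row]) ? 1.0f : 0.0f;
      grad[row * width + c] = probs[row * width + c] - one_hot;
    }
  }
  return grad;
}
```

Printing this output next to the GPU result is one simple way to check that both paths compute the same thing.
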
This commit adds the declarations for the global accuracy
getter for GPU GNNs as well as the orchestration of the call to the GPU
version. The rest of the implementation will come in a later commit: for now
this isn't a priority as I can still compute accuracy on the CPU.

Adds a new GNNGPU object to hold all GPU related things for the GNN
class.
Adds a GPU Adam optimizer class that holds the allocations for the
moments used in the Adam optimizer on the GPU. Adds a GPU version of the
Adam test as well to make sure the build is sane in its current state.
The CPU optimizer class is also now split into the CPU/GPU paths
depending on which build is being used.

Next step is to do the adam optimizer on the GPU proper.
The gradient descent call in the optimizers now uses PointerWithSize
rather than std::vectors. This is for compatibility with GPU pointers.
Calls to the function have been changed throughout the code accordingly.
Implements Adam optimization on the GPU and makes sure it's sane via the
gpu unit test. Also fixes an inconsistency with the CPU adam optimizer
where a sqrt wasn't being applied to epsilon like it is in the original
non-refactored code.
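
For orientation, a minimal CPU sketch of one Adam update in plain C++. It uses the textbook form with epsilon added to the square root of the second moment; the exact epsilon handling in this codebase is not shown here, so treat that placement as an assumption. `AdamConfig`, `AdamState`, and `adam_step` are hypothetical names, and plain `std::vector` stands in for PointerWithSize.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hyperparameters with commonly used defaults (assumed, not from the PR).
struct AdamConfig {
  float lr = 0.01f;
  float beta1 = 0.9f;
  float beta2 = 0.999f;
  float eps = 1e-8f;
};

// Per-parameter moment estimates plus running powers of the betas.
struct AdamState {
  std::vector<float> m;
  std::vector<float> v;
  float beta1_pow = 1.0f;
  float beta2_pow = 1.0f;
  explicit AdamState(std::size_t n) : m(n, 0.0f), v(n, 0.0f) {}
};

// One textbook Adam step: update the moments, bias-correct them, then move
// the weights against the gradient.
void adam_step(const std::vector<float>& grad, std::vector<float>& weights,
               AdamState& state, const AdamConfig& cfg) {
  assert(grad.size() == weights.size());
  assert(state.m.size() == weights.size());
  state.beta1_pow *= cfg.beta1;
  state.beta2_pow *= cfg.beta2;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    state.m[i] = cfg.beta1 * state.m[i] + (1.0f - cfg.beta1) * grad[i];
    state.v[i] = cfg.beta2 * state.v[i] + (1.0f - cfg.beta2) * grad[i] * grad[i];
    const float m_hat = state.m[i] / (1.0f - state.beta1_pow);
    const float v_hat = state.v[i] / (1.0f - state.beta2_pow);
    weights[i] -= cfg.lr * m_hat / (std::sqrt(v_hat) + cfg.eps);
  }
}
```

On the first step with a gradient of 1, both bias-corrected moments equal 1, so the weight moves by roughly the learning rate.
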
Adds a gpu version of the epoch test and fixes the pointers returned
from a GNN layer (it was always returning CPU pointers even in the GPU
build). Adds error checking to cuSparse call too.

gpu-epoch-test runs a GNN end to end (still missing some features that
CPU has), but it has to copy predictions over from GPU (slow, should do
this from GPU end) + there seem to be accuracy issues on reddit. Will be
resolved in a later commit.
Norm factors are required during aggregation in order for the current
computation on GPU to match CPU computation (earlier I was under the
impression that norm factors were integrated into the data that was
already copied, but this is incorrect). This commit adds the norm factor
copy from CPU to GPU.
Aggregation in the GPU for GCN now uses norm factors to normalize the
aggregations of neighbors. This change allows it to exactly match
computation done on a CPU if dropout is turned off.
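
A minimal CPU sketch of normalized neighbor aggregation over a CSR graph. It assumes the common GCN convention of scaling each contribution by the product of the destination and source norm factors; the commit does not spell out the exact formula, so that choice and the `aggregate_normalized` name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each destination node, sum the neighbor rows scaled by
// norm[dst] * norm[src]. The output starts at zero because the inner loop
// only ever adds to it.
std::vector<float> aggregate_normalized(
    const std::vector<std::size_t>& row_start,  // CSR offsets, nodes + 1
    const std::vector<std::size_t>& neighbors,  // CSR neighbor indices
    const std::vector<float>& norm,             // one norm factor per node
    const std::vector<float>& in,               // nodes x width, row-major
    std::size_t width) {
  assert(!row_start.empty());
  const std::size_t nodes = row_start.size() - 1;
  assert(norm.size() == nodes);
  assert(in.size() == nodes * width);
  assert(row_start[nodes] == neighbors.size());
  std::vector<float> out(nodes * width, 0.0f);
  for (std::size_t dst = 0; dst < nodes; ++dst) {
    for (std::size_t e = row_start[dst]; e < row_start[dst + 1]; ++e) {
      const std::size_t src = neighbors[e];
      assert(src < nodes);
      const float scale = norm[dst] * norm[src];
      for (std::size_t f = 0; f < width; ++f) {
        out[dst * width + f] += scale * in[src * width + f];
      }
    }
  }
  return out;
}
```

With all norm factors set to 1 this reduces to a plain neighbor sum, which makes it easy to see the effect of the normalization in isolation.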

The next step is to add dropout support to the GPU.
Efficient dropout support requires RNG on the GPU: this commit adds a
function to init the CuRAND RNG so that the GPU can generate the random
numbers required to choose things to drop for dropout.
Initializes a dropout mask for every GPU layer. Can be optimized if
dropout is disabled (i.e. do not allocate) for both CPU/GPUs. This will
be handled later once a base implementation of everything is settled.

It is a float because the float will be checked during dropout to see if
it crosses some threshold for dropout.
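
A minimal CPU sketch of threshold-based dropout with a float mask. `std::mt19937` stands in for the cuRAND generator, and rescaling the surviving values by 1 / (1 - rate) is an assumption (the usual "inverted dropout" convention) that these commits do not confirm. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fill a mask with uniform random floats, one per value (CPU stand-in for
// the random numbers the GPU would generate).
std::vector<float> make_dropout_mask(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  std::vector<float> mask(n);
  for (float& m : mask) {
    m = dist(rng);
  }
  return mask;
}

// Drop every value whose mask entry falls below the drop rate and rescale
// the survivors so the expected sum is unchanged (assumed convention).
void apply_dropout(std::vector<float>& values, const std::vector<float>& mask,
                   float drop_rate) {
  assert(values.size() == mask.size());
  assert(drop_rate >= 0.0f && drop_rate < 1.0f);
  const float scale = 1.0f / (1.0f - drop_rate);
  for (std::size_t i = 0; i < values.size(); ++i) {
    values[i] = (mask[i] < drop_rate) ? 0.0f : values[i] * scale;
  }
}
```

Keeping the mask as floats means the same mask can be reused in the backward pass with the same threshold comparison.
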
patrickkenney9801 and others added 28 commits March 18, 2024 11:55
chore: Remove unused submodules
* add info for compaction policy
* align atomics to cache line
* fix: WMD graph vertex schema and add phmap to part of importer
* chore: Remove instrumentation legacy code
* changes to switch to LC_LS_CSR graph

* fixing debug err

* fixing pre-commit issues

* changing api calls for wf4

* Added data.001.csv using lfs

* fixing getEdgeData api

* fix for getEdgeData()

* ci fix

* data.001.csv

* changing test dataset

* quickfix

* fixing precommit

* fixing graph deallocate()

* fixing test

* Update workflows to be realistic

* CPU set

* Try this again

* Try this again

* Slight refactor

---------

Co-authored-by: AdityaAtulTewari <adityaatewari@gmail.com>
* dynamic edges support

* adding correct test

* fixing precommit

* moving static file to lfs
@tewaro tewaro force-pushed the AdityaAtulTewari/prefixsum-reenable-test branch from 553e98d to 79a5e03 Compare April 15, 2024 19:11

9 participants