-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYSTEMDS-3544] Boolean Matrix Support #1849
base: main
Are you sure you want to change the base?
Conversation
Thanks @louislepage, for starting the project. |
First results from25 runs of 10k x 10k sized double matrices show an average speed-up:
Note, that the BOOLEAN DenseBlock uses Bitset, but simple boolean arrays as in the TRUE_BOOLEAN case, seem to perform better, at least regarding runtime. Further Steps
|
// if ValueFunction is comparison, return matrix is true boolean and boolean arithmetics are activated, use boolean arithmetics | ||
if (op.fn instanceof ValueComparisonFunction && ret.denseBlock != null && (ret.getDenseBlock() instanceof DenseBlockTrueBool || ret.getDenseBlock() instanceof DenseBlockBool) ){ | ||
//TODO: Optimize casting to boolean denseblock types :/ | ||
if(ret.getDenseBlock() instanceof DenseBlockTrueBool){ | ||
for(int r=rl; r<ru; r++) | ||
for(int c=0; c<clen; c++) { | ||
double v1 = m1.quickGetValue(r, c); | ||
double v2 = m2.quickGetValue(r, c); | ||
boolean vb = ((ValueComparisonFunction) op.fn).compare( v1, v2 ); | ||
|
||
//react what is happening in appendValuePlain() | ||
ret.allocateDenseBlock(false); | ||
((DenseBlockTrueBool) ret.getDenseBlock()).set(r,c,vb); | ||
lnnz += vb ? 1 : 0; | ||
} | ||
} else { | ||
for(int r=rl; r<ru; r++) | ||
for(int c=0; c<clen; c++) { | ||
double v1 = m1.quickGetValue(r, c); | ||
double v2 = m2.quickGetValue(r, c); | ||
boolean vb = ((ValueComparisonFunction) op.fn).compare( v1, v2 ); | ||
|
||
//react what is happening in appendValuePlain() | ||
ret.allocateDenseBlock(false); | ||
((DenseBlockBool) ret.getDenseBlock()).set(r,c,vb); | ||
lnnz += vb ? 1 : 0; | ||
} | ||
} | ||
|
||
} else { | ||
for(int r=rl; r<ru; r++) | ||
for(int c=0; c<clen; c++) { | ||
double v1 = m1.quickGetValue(r, c); | ||
double v2 = m2.quickGetValue(r, c); | ||
double v = op.fn.execute( v1, v2 ); | ||
ret.appendValuePlain(r, c, v); | ||
lnnz += (v!=0) ? 1 : 0; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please take the new code out to a new method and call that method from unsafeBinary?
FP32, FP64, INT32, INT64, BOOLEAN, STRING, UNKNOWN, | ||
FP32, FP64, INT32, INT64, BOOLEAN, TRUE_BOOLEAN, STRING, UNKNOWN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reuse the existing BOOLEAN value type instead of adding a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can reuse the old BOOLEAN type, yes. The reason for having two boolean types right now is to compare the bitset boolean denseblocks and the "true" boolean array denseblocks.
I would suggest renaming the "old" BOOLEAN, which currently is the bitset implementation, to BITSET and using BOOLEAN for the boolean array. That naming would make things more clear.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. You can rename those to BITSET and BOOLEAN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to revert this renaming. This Valuetype is used for so much more than just denseblocks, which I did not get the first time...
I will reuse the existing Valuetype.BOOLEAN and use another mechanism to create BooleanArrays when desired.
In the end, it makes a lot more sense when both DenseBlockBool implementations use the same ValueType, since both store booleans in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That is a better approach and makes integrating the boolean type into the system easier.
Thanks for the commits @louislepage. I have a few questions regarding the experiment setup. |
Thanks for the feedback @phaniarnab ! Since the boolean denseblocks are in an early stage and not yet usable via DML, I wrote a simple java program that creates the matrices and runs the operators on them. It runs the operators on two randomly generated double matrices of a predefined size. In my example, I took the average of 25 computations with random matrices. The time was taken using javas
As described here, the tool can be found in this repo. It should be public. The main reason for this approach is that I did not yet want to add full support for boolean matrices since I can not tell if that might break things in other places. Let me know if I should add this support for boolean matrices by using translations to doubles in the get() method to maintain compatibility. But first I should test which kind of boolean implementation performs better, boolean array or bitset. |
Measuring time in Java is fine. However, I recommend using |
New file, new missing license, sorry... |
Experiment:To analyze and compare the performance of DenseBlockBool as Bitset and as BooleanArray to the current FP64 implementation, a java program was created that uses the classes from systemds (systemds-boolean-matrix-evaluation). For each given matrix size, it does five computations with pseudo-randomly generated matrices for double to boolean operators and for boolean to boolean operators. Evaluation of Double to Bool (GreaterThan)As input, two dense FP64 Matrices are randomly generated based on a given seed. As the output, a MatrixBlock with a DenseBlock of the desired type (FP64, Bitset, or BooleanArray) is created. The plot shows the runtimes on the left and memory usage on the right. In blue, the results of the bitset implementation are plotted. In green is the current Double/FP64 implementation, and in orange is the Boolean Array implementation. The different markers represent the results of square (square marker), Nx100 (circle), and 100xN (cross) matrices. As expected, both runtime and memory usage increase linearly with an increased number of entries in the matrix. The results show a visible performance increase for computation regarding runtime when using BooleanArray. This is most likely due to faster, direct access to the smaller arrays and fewer translations between boolean results and their double representation to store them in the result matrix since the operators can return boolean values, which can then be directly stored in the result matrices. Regarding memory, both boolean implementations took less space. As expected, the Bitset takes even less space since each boolean in the boolean arrays still takes up one byte, but the bitset stores the booleans as individual bits that are collected into There is no difference in runtime or memory when using non-square matrices as long as the total number of entires is the same. Evaluation of Boolen To Boolean (Xor)As input, two dense Matrices of the given type (FP64, Bitset, or BooleanArray) with boolean values are randomly generated based on a given seed. As the output, a MatrixBlock with a DenseBlock of the desired type is created. The plot shows the runtimes on the left and memory usage on the right. In blue, the results of the bitset implementation are plotted. In green is the current Double/FP64 implementation, and in orange is the Boolean Array implementation. The different markers represent the results of square (square marker), Nx100 (circle), and 100xN (cross) matrices. As expected, an improved runtime for the boolean arrays is also visible in this case and both runtime and memory usage increase linearly with an increased number of entries in the matrix. We can also see an improvement in memory usage when using a boolean implementation. Memory usage for the Boolean to Boolean case, when using actual Booleans as input, has decreased significantly since all matrices in memory are small, not only the output matrix. It is also visible that the bitset again outperforms the boolean array regarding memory usage. There is no difference in runtime or memory when using non-square matrices as long as the total number of entries is the same. Conclusion:Since both boolean implementations improve memory usage, but only the boolean array outperforms the FP64 implementation in runtime, I suggest using the boolean array as the DenseBlock for any operation that results in only boolean data. Let me know if there are any further open questions. |
Have you tried pop count on the bit set for the sum operation? This should speed up the bit set significantly for sum. |
I did look into how some operators would probably have a speedup when using bitset because they can be performed on the entire bitset and not value by value. However, the GraterThan and other Comparison Functions were already slower for bitset , than when run on FP64, and these have to run value by value. This would mean that one would need a dynamic typing mechanism to decide during runtime which kind of boolean denseblock would be most beneficial for the upcoming operations. I doubt that translating from boolean array to bitset and vice-versa for each operation would be worth it. @phaniarnab and I discussed this briefly in a meeting but came to the result, that in the end, there should be only one kind of boolean array implementation for now. Therefore I proposed the boolean array since it seems to have some speedup over all operators while using the same value-by-value operations as the FP64 implementation did. It would provide a speedup while being backward-compatible with the current operator implementation, so to say. But I agree that there most certainly are situations where bitset could be significantly faster. I think that having both boolean implementations and deciding at runtime what should be used would be very interesting, but for now, I do not understand systemds well enough to confidently say if this would be feasible and how complex this would be. |
… support for boolean operations when using ValueComparisonFunctions
…ory size estimates for true bool
…lass for boolean dense blocks
- license - exception for double to bool for opearator of type and, or, xor - add constructor for operator to enforce desired sparse safety
revert renaming of valuetypes
a0813c3
to
badf2fb
Compare
Some changes from main were throwing errors for the overflow tests and, following the contributing guidelines, I had to rebase on the main again and ran into conflicts. I resolved them but had to force-push this branch to accommodate these changes. |
Can you restart the pipeline once? I cannot replicate the errors locally and do not understand how the changes in this branch could have changed anything regarding privacy handling or spark execution. If you do see an issue with tre test results that might be code dependent, I would be very grateful for any tips. |
Yes. They may not be related to your changes. |
Thanks, now it was successful! Anyways, I still see the same
and
I do see the same errors in other successful jobs: https://github.com/apache/systemds/actions/runs/5478347246/jobs/9978929681?pr=1850 Now I am interested in how these tests are designed... 😅 |
Thanks @louislepage. The plots conclude this task. Thanks for your contribution. 👍🏽 |
Great, thanks for your mentoring! The jupyter notebook and result CSVs with which I created the plots can be found under louislepage/systemds-boolean-matrix-evaluation/python-eval in the repo where you can also find the java code that I used to gather the data. |
This feature needs more thought and testing before it is ready for merging. I will leave it for the next release. @j143 |
No description provided.