[SYSTEMDS-3544] Boolean Matrix Support #1849

louislepage · 2023-06-22T15:29:38Z

No description provided.

phaniarnab · 2023-06-22T21:16:43Z

Thanks @louislepage, for starting the project.
Please add the Apache license to the new file, DenseBlockTrueBool.java.
Looking forward to seeing how you use this new denseblock for the binary matrix operations.

louislepage · 2023-06-25T19:29:01Z

added rudimentary support for true boolean operations by checking the result MatrixBLock and using boolean arithmetic accordingly.
java program that uses the systems jar to run operators and compute metrics can be found here: https://github.com/louislepage/systemds-boolean-matrix-evaluation

First results from25 runs of 10k x 10k sized double matrices show an average speed-up:

Matrix Dimensions: 10000 x 10000
Operator: GreaterThan
Runtime statistics in ms:
FP64:
LongSummaryStatistics{count=25, sum=60166, min=1712, average=2406,640000, max=3633}
BOOLEAN:
LongSummaryStatistics{count=25, sum=54211, min=1742, average=2168,440000, max=2521}
TRUE_BOOLEAN with boolean arithmetics:
LongSummaryStatistics{count=25, sum=33950, min=1009, average=1358,000000, max=1931}

Note, that the BOOLEAN DenseBlock uses Bitset, but simple boolean arrays as in the TRUE_BOOLEAN case, seem to perform better, at least regarding runtime.

Further Steps

more tests with different comparison operators
test memory efficiency
improve implementation of differentiation between boolean arithmetic and numeric/double arithmetics

phaniarnab · 2023-06-26T15:06:04Z

src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixBincell.java

+				// if ValueFunction is comparison, return matrix is true boolean and boolean arithmetics are activated, use boolean arithmetics
+				if (op.fn instanceof ValueComparisonFunction &&	ret.denseBlock != null && (ret.getDenseBlock() instanceof DenseBlockTrueBool || ret.getDenseBlock() instanceof DenseBlockBool) ){
+					//TODO: Optimize casting to boolean denseblock types :/
+					if(ret.getDenseBlock() instanceof DenseBlockTrueBool){
+						for(int r=rl; r<ru; r++)
+							for(int c=0; c<clen; c++) {
+								double v1 = m1.quickGetValue(r, c);
+								double v2 = m2.quickGetValue(r, c);
+								boolean vb = ((ValueComparisonFunction) op.fn).compare( v1, v2 );
+
+								//react what is happening in appendValuePlain()
+								ret.allocateDenseBlock(false);
+								((DenseBlockTrueBool) ret.getDenseBlock()).set(r,c,vb);
+								lnnz += vb ? 1 : 0;
+							}
+					} else {
+						for(int r=rl; r<ru; r++)
+							for(int c=0; c<clen; c++) {
+								double v1 = m1.quickGetValue(r, c);
+								double v2 = m2.quickGetValue(r, c);
+								boolean vb = ((ValueComparisonFunction) op.fn).compare( v1, v2 );
+
+								//react what is happening in appendValuePlain()
+								ret.allocateDenseBlock(false);
+								((DenseBlockBool) ret.getDenseBlock()).set(r,c,vb);
+								lnnz += vb ? 1 : 0;
+							}
 					}
+
+				} else {
+					for(int r=rl; r<ru; r++)
+						for(int c=0; c<clen; c++) {
+							double v1 = m1.quickGetValue(r, c);
+							double v2 = m2.quickGetValue(r, c);
+							double v = op.fn.execute( v1, v2 );
+							ret.appendValuePlain(r, c, v);
+							lnnz += (v!=0) ? 1 : 0;
+						}
+				}
+


Can you please take the new code out to a new method and call that method from unsafeBinary?

phaniarnab · 2023-06-26T15:07:45Z

src/main/java/org/apache/sysds/common/Types.java

-		FP32, FP64, INT32, INT64, BOOLEAN, STRING, UNKNOWN,
+		FP32, FP64, INT32, INT64, BOOLEAN, TRUE_BOOLEAN, STRING, UNKNOWN,


Is it possible to reuse the existing BOOLEAN value type instead of adding a new one?

I can reuse the old BOOLEAN type, yes. The reason for having two boolean types right now is to compare the bitset boolean denseblocks and the "true" boolean array denseblocks.

I would suggest renaming the "old" BOOLEAN, which currently is the bitset implementation, to BITSET and using BOOLEAN for the boolean array. That naming would make things more clear.
What do you think?

I agree. You can rename those to BITSET and BOOLEAN.

I have to revert this renaming. This Valuetype is used for so much more than just denseblocks, which I did not get the first time...

I will reuse the existing Valuetype.BOOLEAN and use another mechanism to create BooleanArrays when desired.
In the end, it makes a lot more sense when both DenseBlockBool implementations use the same ValueType, since both store booleans in the end.

Yes. That is a better approach and makes integrating the boolean type into the system easier.

phaniarnab · 2023-06-26T15:16:47Z

Thanks for the commits @louislepage. I have a few questions regarding the experiment setup.
How are you running and benchmarking the experiments? Are you running a dml script with time()?
How did you generate the input matrices? Are they dense or sparse?

louislepage · 2023-06-27T06:51:56Z

Thanks for the feedback @phaniarnab !

Since the boolean denseblocks are in an early stage and not yet usable via DML, I wrote a simple java program that creates the matrices and runs the operators on them. It runs the operators on two randomly generated double matrices of a predefined size. In my example, I took the average of 25 computations with random matrices. The time was taken using javas Instant.now() and then calculated as milli seconds, which were stored in a LongSummaryStatistics object to compute averages etc.:

        Instant start = Instant.now();
        mb1.binaryOperations(op, mb2, ret);
        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis();
        timeStatistics.accept(timeElapsed);

As described here, the tool can be found in this repo. It should be public.

The main reason for this approach is that I did not yet want to add full support for boolean matrices since I can not tell if that might break things in other places.
However, I think it should be fine to use boolean matrices whenever running unsafeBinary with operators that are Comparision functions. If the data of dense blocks are only accessed using the get() functions, we can always return the double versions of the boolean values there to maintain compatibility with the FP64 implementation. To access the values as boolean in the future, I added a getBoolean() function to the boolean denseblocks.

Let me know if I should add this support for boolean matrices by using translations to doubles in the get() method to maintain compatibility.

But first I should test which kind of boolean implementation performs better, boolean array or bitset.

phaniarnab · 2023-06-27T07:12:41Z

Measuring time in Java is fine. However, I recommend using System.nanoTime() instead.
Use boolean/bitset inputs of different sizes.

louislepage · 2023-06-30T09:43:36Z

New file, new missing license, sorry...
Will fix asap!

louislepage · 2023-07-01T10:57:46Z

Experiment:

To analyze and compare the performance of DenseBlockBool as Bitset and as BooleanArray to the current FP64 implementation, a java program was created that uses the classes from systemds (systemds-boolean-matrix-evaluation). For each given matrix size, it does five computations with pseudo-randomly generated matrices for double to boolean operators and for boolean to boolean operators.
Both the total runtime of each operation and the memory used are stored. Finally, the average over the five runs is taken.
The Matrix sizes are chosen to represent up to 16mio total entries, but with different dims. We used square matrices of up to 4000x4000 and non-square ones of up to 100x160000. In the non-square case, one dimension was always fixed to a length of 100.

Evaluation of Double to Bool (GreaterThan)

As input, two dense FP64 Matrices are randomly generated based on a given seed. As the output, a MatrixBlock with a DenseBlock of the desired type (FP64, Bitset, or BooleanArray) is created.
The timer is started, the operator called with the two input and the output matrices, the timer is stopped, and the result matrix is returned for validation. Memory is computed as the difference between the memory used before the in- and output matrices are created and after the result is computed (but before any Matrix blocks were dereferenced).

The plot shows the runtimes on the left and memory usage on the right. In blue, the results of the bitset implementation are plotted. In green is the current Double/FP64 implementation, and in orange is the Boolean Array implementation. The different markers represent the results of square (square marker), Nx100 (circle), and 100xN (cross) matrices.

As expected, both runtime and memory usage increase linearly with an increased number of entries in the matrix.

The results show a visible performance increase for computation regarding runtime when using BooleanArray. This is most likely due to faster, direct access to the smaller arrays and fewer translations between boolean results and their double representation to store them in the result matrix since the operators can return boolean values, which can then be directly stored in the result matrices.
Interestingly the boolean array outperforms the Bitset here. The Bitset was even slower than the FP64 implementation. Without further investigation, this is probably due to fewer instructions necessary to access the boolean array compared to more when storing a value in a bitset. Also, it is worth noting that the bitset stores the values internally as long and therefore has to access an 8bytes large long whenever accessing a value. This might influence caching.

Regarding memory, both boolean implementations took less space. As expected, the Bitset takes even less space since each boolean in the boolean arrays still takes up one byte, but the bitset stores the booleans as individual bits that are collected into long.

There is no difference in runtime or memory when using non-square matrices as long as the total number of entires is the same.

Evaluation of Boolen To Boolean (Xor)

As input, two dense Matrices of the given type (FP64, Bitset, or BooleanArray) with boolean values are randomly generated based on a given seed. As the output, a MatrixBlock with a DenseBlock of the desired type is created.
The timer is started, and the operator is called with the two input and the output matrices in a loop for 20 iterations. The timer is stopped, and the result matrix is returned for validation. Memory is computed as the difference between the memory used before the in- and output matrices are created and after the loop ended (but before any Matrix blocks were dereferenced).

The plot shows the runtimes on the left and memory usage on the right. In blue, the results of the bitset implementation are plotted. In green is the current Double/FP64 implementation, and in orange is the Boolean Array implementation. The different markers represent the results of square (square marker), Nx100 (circle), and 100xN (cross) matrices.

As expected, an improved runtime for the boolean arrays is also visible in this case and both runtime and memory usage increase linearly with an increased number of entries in the matrix.

We can also see an improvement in memory usage when using a boolean implementation.
The memory used is less than when running the Double to Boolean operator, even when using FP64, but this is probably due to reusing the result matrix as input to the next computation. Therefore after the first iteration, there are only two matrices in memory.

Memory usage for the Boolean to Boolean case, when using actual Booleans as input, has decreased significantly since all matrices in memory are small, not only the output matrix. It is also visible that the bitset again outperforms the boolean array regarding memory usage.

There is no difference in runtime or memory when using non-square matrices as long as the total number of entries is the same.

Conclusion:

Since both boolean implementations improve memory usage, but only the boolean array outperforms the FP64 implementation in runtime, I suggest using the boolean array as the DenseBlock for any operation that results in only boolean data.

Let me know if there are any further open questions.

Baunsgaard · 2023-07-02T19:26:18Z

Have you tried pop count on the bit set for the sum operation? This should speed up the bit set significantly for sum.

louislepage · 2023-07-04T10:29:57Z

Have you tried pop count on the bit set for the sum operation? This should speed up the bit set significantly for sum.

I did look into how some operators would probably have a speedup when using bitset because they can be performed on the entire bitset and not value by value.
OR, AND, and XOR are also candidates for this (e.g. BitSet.or(BitSet bitset).
But, it would mean a reimplementation of all of these operators. And a mechanism to decide when to use value by value and when Bitset on Bitset.

However, the GraterThan and other Comparison Functions were already slower for bitset , than when run on FP64, and these have to run value by value. This would mean that one would need a dynamic typing mechanism to decide during runtime which kind of boolean denseblock would be most beneficial for the upcoming operations. I doubt that translating from boolean array to bitset and vice-versa for each operation would be worth it.

@phaniarnab and I discussed this briefly in a meeting but came to the result, that in the end, there should be only one kind of boolean array implementation for now.

Therefore I proposed the boolean array since it seems to have some speedup over all operators while using the same value-by-value operations as the FP64 implementation did. It would provide a speedup while being backward-compatible with the current operator implementation, so to say.

But I agree that there most certainly are situations where bitset could be significantly faster.

I think that having both boolean implementations and deciding at runtime what should be used would be very interesting, but for now, I do not understand systemds well enough to confidently say if this would be feasible and how complex this would be.

…f bitset

… support for boolean operations when using ValueComparisonFunctions

…ory size estimates for true bool

…ockBool files

…lass for boolean dense blocks

- license - exception for double to bool for opearator of type and, or, xor - add constructor for operator to enforce desired sparse safety

revert renaming of valuetypes

louislepage · 2023-07-06T12:47:53Z

Some changes from main were throwing errors for the overflow tests and, following the contributing guidelines, I had to rebase on the main again and ran into conflicts. I resolved them but had to force-push this branch to accommodate these changes.

louislepage · 2023-07-07T10:03:55Z

Can you restart the pipeline once?
I think a couple of errors are due to issues with spark and caching and might not be directly code related.

I cannot replicate the errors locally and do not understand how the changes in this branch could have changed anything regarding privacy handling or spark execution.

If you do see an issue with tre test results that might be code dependent, I would be very grateful for any tips.
Thanks!

phaniarnab · 2023-07-07T10:05:39Z

Can you restart the pipeline once? I think a couple of errors are due to issues with spark and caching and might not be directly code related.

I cannot replicate the errors locally and do not understand how the changes in this branch could have changed anything regarding privacy handling or spark execution.

If you do see an issue with tre test results that might be code dependent, I would be very grateful for any tips. Thanks!

Yes. They may not be related to your changes.

louislepage · 2023-07-07T11:16:55Z

Thanks, now it was successful!
And with this, I do not have anything else to add to the project.
Let me know if you need further information on the code and experiments.

Anyways, I still see the same

org.apache.sysds.runtime.controlprogram.federated.FederatedWorkerHandlerException: Failed to getVariable

and

Exception : class io.netty.channel.AbstractChannel$AnnotatedConnectException
Message   : Connection refused: localhost/127.0.0.1:35013
LEVEL : 3
Exception : class java.net.ConnectException
Message   : Connection refused

I do see the same errors in other successful jobs: https://github.com/apache/systemds/actions/runs/5478347246/jobs/9978929681?pr=1850

Now I am interested in how these tests are designed... 😅

phaniarnab · 2023-07-08T09:13:45Z

Thanks @louislepage. The plots conclude this task. Thanks for your contribution. 👍🏽
If there are any scripts that are needed to reproduce the plots, please share those with us.

louislepage · 2023-07-08T09:49:16Z

Great, thanks for your mentoring!

The jupyter notebook and result CSVs with which I created the plots can be found under louislepage/systemds-boolean-matrix-evaluation/python-eval in the repo where you can also find the java code that I used to gather the data.

phaniarnab · 2024-02-08T21:35:05Z

This feature needs more thought and testing before it is ready for merging. I will leave it for the next release. @j143

phaniarnab reviewed Jun 26, 2023

View reviewed changes

louislepage added 10 commits July 5, 2023 20:09

[SYSTEMDS-3544] added DenseBlockTrueBool for boolean matrix instead o…

bdd3941

…f bitset

[SYSTEMDS-3544] added tests and license for DenseBlockTrueBool; added…

fe38cde

… support for boolean operations when using ValueComparisonFunctions

[SYSTEMDS-3544] fixed trigger for true bool computation and added mem…

9397a7a

…ory size estimates for true bool

[SYSTEMDS-3544] use boolean arithmetics if result is boolean

254d756

[SYSTEMDS-3544] add boolean set method to Bitset DenseBlockBool

2795177

[SYSTEMDS-3544] renaming of BOOLEAN to BITSET and renaming of DenseBl…

f2f80af

…ockBool files

[SYSTEMDS-3544] fix missing refactor in fromExternalString

285c34b

[SYSTEMDS-3544] move boolean computation to method and add abstract c…

d81df36

…lass for boolean dense blocks

[SYSTEMDS-3544]

010bdfb

- license - exception for double to bool for opearator of type and, or, xor - add constructor for operator to enforce desired sparse safety

[SYSTEMDS-3544]

badf2fb

revert renaming of valuetypes

louislepage force-pushed the SYSTEMDS-3544 branch from a0813c3 to badf2fb Compare July 6, 2023 12:42

j143 requested a review from phaniarnab December 4, 2023 02:57

j143 added this to the systemds-3.2.0 milestone Dec 4, 2023

j143 modified the milestones: systemds-3.2.0, next-release Feb 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYSTEMDS-3544] Boolean Matrix Support #1849

[SYSTEMDS-3544] Boolean Matrix Support #1849

louislepage commented Jun 22, 2023

phaniarnab commented Jun 22, 2023

louislepage commented Jun 25, 2023

phaniarnab Jun 26, 2023

phaniarnab Jun 26, 2023

louislepage Jun 27, 2023

phaniarnab Jun 27, 2023

louislepage Jul 4, 2023

phaniarnab Jul 4, 2023

phaniarnab commented Jun 26, 2023

louislepage commented Jun 27, 2023

phaniarnab commented Jun 27, 2023

louislepage commented Jun 30, 2023

louislepage commented Jul 1, 2023

Baunsgaard commented Jul 2, 2023

louislepage commented Jul 4, 2023 •

edited

Loading

louislepage commented Jul 6, 2023

louislepage commented Jul 7, 2023

phaniarnab commented Jul 7, 2023

louislepage commented Jul 7, 2023

phaniarnab commented Jul 8, 2023

louislepage commented Jul 8, 2023

phaniarnab commented Feb 8, 2024

		FP32, FP64, INT32, INT64, BOOLEAN, STRING, UNKNOWN,
		FP32, FP64, INT32, INT64, BOOLEAN, TRUE_BOOLEAN, STRING, UNKNOWN,

[SYSTEMDS-3544] Boolean Matrix Support #1849

Are you sure you want to change the base?

[SYSTEMDS-3544] Boolean Matrix Support #1849

Conversation

louislepage commented Jun 22, 2023

phaniarnab commented Jun 22, 2023

louislepage commented Jun 25, 2023

Further Steps

phaniarnab Jun 26, 2023

Choose a reason for hiding this comment

phaniarnab Jun 26, 2023

Choose a reason for hiding this comment

louislepage Jun 27, 2023

Choose a reason for hiding this comment

phaniarnab Jun 27, 2023

Choose a reason for hiding this comment

louislepage Jul 4, 2023

Choose a reason for hiding this comment

phaniarnab Jul 4, 2023

Choose a reason for hiding this comment

phaniarnab commented Jun 26, 2023

louislepage commented Jun 27, 2023

phaniarnab commented Jun 27, 2023

louislepage commented Jun 30, 2023

louislepage commented Jul 1, 2023

Experiment:

Evaluation of Double to Bool (GreaterThan)

Evaluation of Boolen To Boolean (Xor)

Conclusion:

Baunsgaard commented Jul 2, 2023

louislepage commented Jul 4, 2023 • edited Loading

louislepage commented Jul 6, 2023

louislepage commented Jul 7, 2023

phaniarnab commented Jul 7, 2023

louislepage commented Jul 7, 2023

phaniarnab commented Jul 8, 2023

louislepage commented Jul 8, 2023

phaniarnab commented Feb 8, 2024

louislepage commented Jul 4, 2023 •

edited

Loading