Simplification of vector handling: importance of helper functions? #159
Description
I've been thinking about ways the C backend could be simplified. I think a lot of the complexity comes from trying to handle so many details of the translation in a single pass. Splitting the work into multiple passes (operating on an IR, probably LLVM IR) might make the backend easier to work with.
Something that could be moved to a pass is the handling of vector operations. The LLVM Scalarizer pass can lower most vector operations to simple scalar operations for us, meaning we can remove the handling for vector addition, multiplication, etc., leaving just things like generating structs for them, converting GEPs, and a few other things like that.
It's a pretty simple change to use the scalariser:
--- a/lib/Target/CBackend/CTargetMachine.cpp
+++ b/lib/Target/CBackend/CTargetMachine.cpp
@@ -19,6 +19,8 @@
#include "llvm/Transforms/Utils.h"
#endif
+#include "llvm/Transforms/Scalar/Scalarizer.h"
+
namespace llvm {
bool CTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
@@ -53,6 +55,8 @@ bool CTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
// Lower atomic operations to libcalls
PM.add(createAtomicExpandPass());
+ PM.add(createScalarizerPass());
+
PM.add(new llvm_cbe::CWriter(Out));
return false;
}
The main difference in the generated C code is essentially that what would otherwise be the body of a helper function like llvm_fmul_f32x4 instead gets inlined at the call-site.
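To make the difference concrete, here is a rough sketch of the two shapes for a 4 x float multiply. It is not actual backend output: the struct name `f32x4`, its `v` field, the helper's signature, and the two `mul_*` wrapper functions are assumptions made up for illustration. Only the helper name llvm_fmul_f32x4 comes from the backend.

```c
/* Hypothetical stand-in for the struct the backend generates for <4 x float>.
   The real generated type name and layout may differ. */
struct f32x4 {
  float v[4];
};

/* Helper-function style: the element-wise work lives in one helper.
   The signature here is an assumption, not the backend's exact one. */
static inline struct f32x4 llvm_fmul_f32x4(struct f32x4 a, struct f32x4 b) {
  struct f32x4 r;
  r.v[0] = a.v[0] * b.v[0];
  r.v[1] = a.v[1] * b.v[1];
  r.v[2] = a.v[2] * b.v[2];
  r.v[3] = a.v[3] * b.v[3];
  return r;
}

/* Call-site as it might look without the Scalarizer pass. */
struct f32x4 mul_with_helper(struct f32x4 a, struct f32x4 b) {
  return llvm_fmul_f32x4(a, b);
}

/* Call-site as it might look with the Scalarizer pass: the same four
   scalar multiplies appear directly where the vector op used to be. */
struct f32x4 mul_scalarized(struct f32x4 a, struct f32x4 b) {
  float r0 = a.v[0] * b.v[0];
  float r1 = a.v[1] * b.v[1];
  float r2 = a.v[2] * b.v[2];
  float r3 = a.v[3] * b.v[3];
  struct f32x4 r = {{r0, r1, r2, r3}};
  return r;
}
```

Both functions compute the same result. The question is only whether the downstream C compiler optimises the two shapes equally well.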
I'm wondering whether there's a disadvantage to this approach. For a simple matrix multiplication test I wrote, clang seemed to produce similarly good code for the helper-function and non-helper-function versions (i.e. it successfully re-vectorises both). But it might be the case that in a more complex program, switching to scalarisation like this would produce worse code.
Any thoughts?