OPTIONAL MATCH issues + Continued Development and Support \ Community for Morpheus (Spark 3 support, bugfixes )Β #947
Description
hi,
i have encountered what i believe is a bug in the optional match implementation , and hoping to get some guidance as to how i could help remediate it (and submit a pull request for all to have this fix) .
the issue is this ->
i have done multiple tests that all lead to the same result -> using OPTIONAL MATCH to match against a non-existent relationship expansion , and afterwards having another OPTIONAL MATCH from the same variable to something that does exist , would never return the latter , and will only return a NULL .
re- example:
test graph ->
val test = morpheus.cypher(
"""
CONSTRUCT
CREATE (p1:Person {name: "Alice"})
CREATE (p2:Person {name: "Bob"})
CREATE (p3:Person {name: "Eve"})
CREATE (p4:Person {name: "Paul"})
CREATE (p1)-[:KNOWS]->(p3)
CREATE (p1)-[:KNOWS2]->(p2)
CREATE (p1)-[:KNOWS3]->(p3)
CREATE (p1)-[:KNOWS4]->(p4)
CREATE (p1)-[:KNOWS5]->(p2)
return GRAPH
""".stripMargin
).graph
--- query -->
val testres = test.cypher(
"""
match (p1:Person )
optional match (p1)-[:KNOWS]->(p2)
optional match (p1)-[:KNOWS15]->(p3)
optional match (p1)-[:KNOWS4]->(p4)
return p1.name, p2.name,p3.name,p4.name
""".stripMargin)
--- yields the following result ->
+-------+-------+-------+-------+
|p1_name|p2_name|p3_name|p4_name|
+-------+-------+-------+-------+
| Bob| null| null| null|
| Eve| null| null| null|
| Paul| null| null| null|
| Alice| Eve| null| null|
+-------+-------+-------+-------+
*** But , if we would change the query order to (having the non existent expansion last):
val testres = test2.cypher(
"""
match (p1:Person )
optional match (p1)-[:KNOWS]->(p2)
optional match (p1)-[:KNOWS4]->(p4)
optional match (p1)-[:KNOWS15]->(p3)
return p1.name, p2.name,p3.name,p4.name
""".stripMargin)
-- we would then get the correct result :
+-------+-------+-------+-------+
|p1_name|p2_name|p3_name|p4_name|
+-------+-------+-------+-------+
| Alice| Eve| null| Paul|
| Bob| null| null| null|
| Eve| null| null| null|
| Paul| null| null| null|
-- which is not expected - since alice is also connected via KNOWS4 to paul.
same query and graph setup in neo4j yields (in any ordering of the query) :
p1.name | p2.name | p3.name | p4.name |
---|---|---|---|
"Alice" | "Eve" | null | "Paul" |
"Bob" | null | null | null |
"Eve" | null | null | null |
"Paul" | null | null | null |
i found in the OptionalMatchTests.scala in morpheus the following test which doesn't cover the above, and nothing else that does (there is no test in morpheus \ tck that cover doesnt exist - exist :
val g = initGraph(
"""
|CREATE (:DoesExist {property: 42})
|CREATE (:DoesExist {property: 43})
|CREATE (:DoesExist {property: 44})
""".stripMargin)
val res = g.cypher(
"""
|OPTIONAL MATCH (f:DoesExist)
|OPTIONAL MATCH (n:DoesNotExist)
|RETURN collect(DISTINCT n.property) AS a, collect(DISTINCT f.property) AS b
""".stripMargin)
can any of the dev's please share they're thoughts as to how hard would it be to try and get to the expected behaviour? and where should i start looking in the solution to try and fix ?
is it a matter of a complex modification to the relation+logical planner to get the required behaviour ? if there is some small tweak that comes to mind it would greatly help me ( i am trying to use morpheus to automate a combination of left outer joins and inner joins needed in a very large datasets
thanks very much !