-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance and behavior of ListMap and ListSet #5103
Conversation
A Gist containing some naive benchmarks (together with the source code for them) is available here: https://gist.github.com/ruippeixotog/20f961055a94345ad24c2fd58079d26c |
@@ -0,0 +1,16 @@ | |||
package scala.collection.immutable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add copyright header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most, if not all, JUnit tests lack a copyright header. I already wrote several tests for this repository before and the copyright never was a requirement... Are you sure it's mandatory? @Ichoran, can you tell what is the "official" policy in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming this was an assumption I had made. Please wait for the official policy.
7ad3ac7
to
e5ddb47
Compare
this | ||
} | ||
|
||
def clear() = { elems.clear() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also seen.clear()
?
looks good otherwise. the build failure looks weird - let's just try again (amend your changes and |
e5ddb47
to
3cd0167
Compare
@lrytz thanks for the review, I just fixed both of the details you mentioned. The build failure didn't look like it was due to the modifications of this PR; I can try rebasing this into the current master if it doesn't work. |
3cd0167
to
bb05a4d
Compare
I just rebased the branch on top of 2.12.x, but the error still occurs and Jenkins does not seem to be giving any helpful error messages. However, looking at the pull request list it seems that I'm the only one having this problem... Can anyone help me figuring out what's wrong with this PR? |
|
||
@tailrec private def removeInternal(k: A, cur: ListSet[A], acc: List[ListSet[A]]): ListSet[A] = | ||
if (cur.isEmpty) | ||
acc.last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines are pretty short. It's easier to follow the logic if the result is on the same line as the conditional. (You already went this way with the hasNext
thing above, which is more debatable.)
I have one major concern about performance. I wasn't able to spot why the test is failing, though. |
For the compilation failure, if I re-add Honestly, when i started looking at this PR, I was very surprised to learn that " |
I was also surprised to see that As you saw, iterating in the "correct" order requires the collection to be converted to a list and reversed, which has a performance hit. However, as @Ichoran mentioned, these collections are really only useful with a small number of elements, so I think it would be OK to do that. |
d12cdd4
to
a103b36
Compare
The benchmarking you did isn't really reliable enough to do anything with. You should at least use Thyme; JMH would be better. From this, it looks like you need to special-case 0-4. Having a 2x performance hit in the most commonly used cases doesn't justify the gain in the cases which the data structure isn't suited for anyway. But I'd do some more robust benchmarking first. |
Thanks for the framework suggestions! I benchmarked again If I did not anything wrong in the benchmarks, it seems the custom builder is significantly slower for every size equal to or below 6... Considering again that the collection should only be used for very small sizes due to its overall performance, I don't think it's worth it to keep the custom builder and write special cases for small sizes. If you take a look the test results, it seems that the custom |
My recommendation would be to remove |
11def59
to
ca7ba15
Compare
Yes, I agree it's better that way. I have just pushed the updated commit. |
OK, I agree the latest change looks good. A few small things to do:
in general, maybe do another pass over the two classes to bring the implementations closer together (rename |
Sure, I'll move t7445.scala code to a junit test and I'll improve the iterator as you suggested (adding two new tests for that). t3822 is a test that becomes ridiculously slow when the custom builder is taken off (though the Jenkins build seems to have made it past that test)... Do we still want to keep it? SI-3822 is a very old issue and it involves an use case that we concluded here that was inadequate for that collection. Regardless of that, all Regarding the Scaladoc and the general revamp I was also thinking of doing that, as it's way easier to fix bugs and add improvement to those collections if their code is as similar as it could be (after all, its internal organization and algorithms associated are exactly the same). However, I believe it would be better if I did that either in a separate pull request or in a separate commit; otherwise, the bugs and improvements present in this PR would become much more difficult to track in the commit history later, as they would be concealed by a lot of method renames and code moves. What do you think? Is there a problem if this PR has multiple commits? |
Right, I agree we can get rid of it.
No, this is common practice in the scala repo. I agree that having separate commits for refactorings, scaladoc and bugfixes makes a lot of sense.
Can you make sure to include that in the Scaladoc cleanup? All operations are |
ca7ba15
to
1c52d87
Compare
I have just added some new commits then. I organized the changes in:
|
f2041df
to
fca5845
Compare
* $factoryInfo | ||
* | ||
* Note that each element insertion takes O(n) time, which means that creating a list set with | ||
* n elements will take O(n^2^) time. This makes the builder suitable only for a small number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is actually the superscript notation in Scaladoc: https://wiki.scala-lang.org/display/SW/Syntax#Syntax-Wiki-likeSyntaxinDocumentationSource. I checked the generated documentation and it appears correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, in that case you could use the same up here: https://github.com/scala/scala/pull/5103/files/78f1fdd0646e6da5de3f86eaec265d9b5d71b1f4..36a46b1cc2734504c95881c20b2495891c7b9bd7#diff-9f3860217088c43b2414a59d6ce4b2acR20
other than these minor comments, looks good! |
Makes the immutable `ListMap` and `ListSet` collections more alike one another, both in their semantics and in their performance. In terms of semantics, makes the `ListSet` iterator return the elements in their insertion order, as `ListMap` already does. While, as mentioned in SI-8985, `ListMap` and `ListSet` doesn't seem to make any guarantees in terms of iteration order, I believe users expect `ListSet` and `ListMap` to behave in the same way, particularly when they are implemented in the exact same way. In terms of performance, `ListSet` has a custom builder that avoids creation in O(N^2) time. However, this significantly reduces its performance in the creation of small sets, as its requires the instantiation and usage of an auxilliary HashSet. As `ListMap` and `ListSet` are only suitable for small sizes do to their performance characteristics, the builder is removed, the default `SetBuilder` being used instead.
ListSet and ListMap are two collections which share the exact same internal structure. This commit makes the two approaches as similar as possible by renaming and reordering internal methods, improving their Scaladoc and their code style. The Scaladoc of the classes and companion objects is also improved in order to alert users of the time complexity of the collections' operations.
fca5845
to
4552de4
Compare
I have just pushed the modifications you suggested :) |
LGTM! thanks for keeping up with the pedantic reviewer :) |
/rebuild |
No problem at all, they were good suggestions :) |
protected def value: B = throw new NoSuchElementException("value of empty map") | ||
protected def next: ListMap[A, B] = throw new NoSuchElementException("next of empty map") | ||
|
||
override def stringPrefix = "ListMap" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw this changed ListMap's toString between 2.11 and 2.12
Makes the immutable
ListMap
andListSet
collections more alike one another, both in their semantics and in their performance.In terms of semantics, makes the
ListMap
iterator return the elements in reverse order, asListSet
already does (improving its performance as a side-effect). While, as mentioned in SI-8985,ListMap
andListSet
doesn't seem to make any guarantees in terms of iteration order, I believe users expectListSet
andListMap
to behave in the same way, particularly when they are implemented in the exact same way. Thenext
methodListSet
's iterator now throws an exception when called after the last element, a behavior consistent with all the other collections.In terms of performance,
ListMap
is given a custom builder that avoids creation in O(N^2) time using a strategy similar to the one already applied inListSet
.ListSet
's element removal method was not tail-recursive as theListMap
one, so a tail-recursive implementation was added.