Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and behavior of ListMap and ListSet #5103

Merged
merged 3 commits into from
May 17, 2016

Conversation

ruippeixotog
Copy link
Contributor

Makes the immutable ListMap and ListSet collections more alike one another, both in their semantics and in their performance.

In terms of semantics, makes the ListMap iterator return the elements in reverse order, as ListSet already does (improving its performance as a side-effect). While, as mentioned in SI-8985, ListMap and ListSet doesn't seem to make any guarantees in terms of iteration order, I believe users expect ListSet and ListMap to behave in the same way, particularly when they are implemented in the exact same way. The next method ListSet's iterator now throws an exception when called after the last element, a behavior consistent with all the other collections.

In terms of performance, ListMap is given a custom builder that avoids creation in O(N^2) time using a strategy similar to the one already applied in ListSet. ListSet's element removal method was not tail-recursive as the ListMap one, so a tail-recursive implementation was added.

@scala-jenkins scala-jenkins added this to the 2.12.0-M5 milestone Apr 17, 2016
@ruippeixotog
Copy link
Contributor Author

A Gist containing some naive benchmarks (together with the source code for them) is available here: https://gist.github.com/ruippeixotog/20f961055a94345ad24c2fd58079d26c

@@ -0,0 +1,16 @@
package scala.collection.immutable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add copyright header.

Copy link
Contributor Author

@ruippeixotog ruippeixotog Apr 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most, if not all, JUnit tests lack a copyright header. I already wrote several tests for this repository before and the copyright never was a requirement... Are you sure it's mandatory? @Ichoran, can you tell what is the "official" policy in these cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming this was an assumption I had made. Please wait for the official policy.

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch from 7ad3ac7 to e5ddb47 Compare April 19, 2016 20:13
this
}

def clear() = { elems.clear() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seen.clear()?

@lrytz
Copy link
Member

lrytz commented Apr 29, 2016

looks good otherwise. the build failure looks weird - let's just try again (amend your changes and git push -f).

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch from e5ddb47 to 3cd0167 Compare April 29, 2016 18:36
@ruippeixotog
Copy link
Contributor Author

@lrytz thanks for the review, I just fixed both of the details you mentioned. The build failure didn't look like it was due to the modifications of this PR; I can try rebasing this into the current master if it doesn't work.

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch from 3cd0167 to bb05a4d Compare April 30, 2016 11:27
@ruippeixotog
Copy link
Contributor Author

I just rebased the branch on top of 2.12.x, but the error still occurs and Jenkins does not seem to be giving any helpful error messages. However, looking at the pull request list it seems that I'm the only one having this problem... Can anyone help me figuring out what's wrong with this PR?


@tailrec private def removeInternal(k: A, cur: ListSet[A], acc: List[ListSet[A]]): ListSet[A] =
if (cur.isEmpty)
acc.last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are pretty short. It's easier to follow the logic if the result is on the same line as the conditional. (You already went this way with the hasNext thing above, which is more debatable.)

@Ichoran
Copy link
Contributor

Ichoran commented Apr 30, 2016

I have one major concern about performance. I wasn't able to spot why the test is failing, though.

@lrytz
Copy link
Member

lrytz commented May 2, 2016

For the compilation failure, if I re-add .toList.reverseIterator to ListMap.iterator, compilation succeeds. This means that some part of the compiler (i guess the part checking forward references) depends on the current iteration order.

Honestly, when i started looking at this PR, I was very surprised to learn that "ListMap and ListSet doesn't seem to make any guarantees in terms of iteration (SI-8985)". I think this situation creates a lot of confusion, e.g., http://stackoverflow.com/questions/3481925/insertion-ordered-listset. Just like LinkedHashSet / LinkedHashMap for mutable collections, I think that ListSet / ListMap should guarantee that "all traversal methods of this class visit elements in the order they were inserted".

@ruippeixotog
Copy link
Contributor Author

ruippeixotog commented May 7, 2016

I was also surprised to see that ListMap and ListSet did not provide such guarantees - I always thought of them as an immutable alternative to LinkedHashMap and LinkedHashSet. If even the Scala compiler relies on their ordering, then I would rather make both ListMap and ListSet be insertion-ordered and document that properly in Scaladoc.

As you saw, iterating in the "correct" order requires the collection to be converted to a list and reversed, which has a performance hit. However, as @Ichoran mentioned, these collections are really only useful with a small number of elements, so I think it would be OK to do that.

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch 3 times, most recently from d12cdd4 to a103b36 Compare May 7, 2016 22:58
@Ichoran
Copy link
Contributor

Ichoran commented May 8, 2016

The benchmarking you did isn't really reliable enough to do anything with. You should at least use Thyme; JMH would be better.

From this, it looks like you need to special-case 0-4. Having a 2x performance hit in the most commonly used cases doesn't justify the gain in the cases which the data structure isn't suited for anyway.

But I'd do some more robust benchmarking first.

@ruippeixotog
Copy link
Contributor Author

Thanks for the framework suggestions! I benchmarked again ListMap using JMH. I also benchmarked ListSet. The full results are available here: ListMap, ListSet.

If I did not anything wrong in the benchmarks, it seems the custom builder is significantly slower for every size equal to or below 6... Considering again that the collection should only be used for very small sizes due to its overall performance, I don't think it's worth it to keep the custom builder and write special cases for small sizes.

If you take a look the test results, it seems that the custom ListSet builder is also significantly slower than the default one (a mutable.SetBuilder). It is expected, as the two collections are pretty much equal. Should I remove ListSetBuilder then?

@Ichoran
Copy link
Contributor

Ichoran commented May 9, 2016

My recommendation would be to remove ListSetBuilder if it's not an improvement.

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch 2 times, most recently from 11def59 to ca7ba15 Compare May 9, 2016 13:31
@ruippeixotog
Copy link
Contributor Author

Yes, I agree it's better that way. I have just pushed the updated commit.

@lrytz
Copy link
Member

lrytz commented May 11, 2016

OK, I agree the latest change looks good. A few small things to do:

  • can you specify the iteration order in the scaladoc, similar to LinkedHashMap/Set?
  • move the existing test case t3822.scala to the new junit test, so that they are all in the same place
  • also t7445 could be moved to junit. actually, ListSet probably has the same bug, it was only fixed for ListMap -- can you check? the fix for ListMap was in SI-7445 ListMap.tail is returning wrong result #3388.
  • method iterator first creates a List instance, then reverseIterator creates the reversed list instance. the following would only create one single list:
  def iterator: Iterator[(A,B)] = {
    def reverseList = {
      var self = this
      var res: List[(A, B)] = Nil
      while (!self.isEmpty) {
        res = (self.key, self.value) :: res
        self = self.next
      }
      res
    }
    reverseList.iterator
  }

in general, maybe do another pass over the two classes to bring the implementations closer together (rename ListMap.remove0 to removeInternal, replace ListSet.unchecked_outer by next, etc)

@ruippeixotog
Copy link
Contributor Author

Sure, I'll move t7445.scala code to a junit test and I'll improve the iterator as you suggested (adding two new tests for that). ListSet is indeed affected by SI-7445; I'll also fix that.

t3822 is a test that becomes ridiculously slow when the custom builder is taken off (though the Jenkins build seems to have made it past that test)... Do we still want to keep it? SI-3822 is a very old issue and it involves an use case that we concluded here that was inadequate for that collection. Regardless of that, all ListMap/ListSet methods are now tail recursive, so the bug is fixed anyway.

Regarding the Scaladoc and the general revamp I was also thinking of doing that, as it's way easier to fix bugs and add improvement to those collections if their code is as similar as it could be (after all, its internal organization and algorithms associated are exactly the same). However, I believe it would be better if I did that either in a separate pull request or in a separate commit; otherwise, the bugs and improvements present in this PR would become much more difficult to track in the commit history later, as they would be concealed by a lot of method renames and code moves. What do you think? Is there a problem if this PR has multiple commits?

@lrytz
Copy link
Member

lrytz commented May 13, 2016

t3822 is a test that becomes ridiculously slow

Right, I agree we can get rid of it.

Is there a problem if this PR has multiple commits?

No, this is common practice in the scala repo. I agree that having separate commits for refactorings, scaladoc and bugfixes makes a lot of sense.

involves an use case that we concluded here that was inadequate for that collection

Can you make sure to include that in the Scaladoc cleanup? All operations are O(n) (right?), including the creation of a ListSet/ListMap using companion apply.

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch from ca7ba15 to 1c52d87 Compare May 14, 2016 22:51
@ruippeixotog
Copy link
Contributor Author

I have just added some new commits then. I organized the changes in:

  • The original commit, improving the performance of the collections and changing ListSet to provide insertion-order iteration;
  • Adding tests for both collections regarding some core operations. I found out that I had some bugs in the changes I did in the first commit and this makes sure no regressions happen now nor in the future;
  • Making the two collections similar by renaming and reordering methods, improving their Scaladoc and their code style. Regarding this I removed the Scaladoc of most methods, as the text inherited by the superclasses is in general well written. On the other hand, I improved the class and object-level Scaladoc by making explicit the traversal order guarantee and the time complexity of the operations;
  • Adding a SerialVersionUID annotation that was missing in ListSet and including ListSet in the serialization stability test in order to ensure that future changes to this collection do not break the serialization format (or at least alert that a break occurred).

@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch 2 times, most recently from f2041df to fca5845 Compare May 15, 2016 10:42
* $factoryInfo
*
* Note that each element insertion takes O(n) time, which means that creating a list set with
* n elements will take O(n^2^) time. This makes the builder suitable only for a small number of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually the superscript notation in Scaladoc: https://wiki.scala-lang.org/display/SW/Syntax#Syntax-Wiki-likeSyntaxinDocumentationSource. I checked the generated documentation and it appears correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lrytz
Copy link
Member

lrytz commented May 17, 2016

other than these minor comments, looks good!

Makes the immutable `ListMap` and `ListSet` collections more alike one another, both in their semantics and in their performance.

In terms of semantics, makes the `ListSet` iterator return the elements in their insertion order, as `ListMap` already does. While, as mentioned in SI-8985, `ListMap` and `ListSet` doesn't seem to make any guarantees in terms of iteration order, I believe users expect `ListSet` and `ListMap` to behave in the same way, particularly when they are implemented in the exact same way.

In terms of performance, `ListSet` has a custom builder that avoids creation in O(N^2) time. However, this significantly reduces its performance in the creation of small sets, as its requires the instantiation and usage of an auxilliary HashSet. As `ListMap` and `ListSet` are only suitable for small sizes do to their performance characteristics, the builder is removed, the default `SetBuilder` being used instead.
ListSet and ListMap are two collections which share the exact same internal structure. This commit makes the two approaches as similar as possible by renaming and reordering internal methods, improving their Scaladoc and their code style. The Scaladoc of the classes and companion objects is also improved in order to alert users of the time complexity of the collections' operations.
@ruippeixotog ruippeixotog force-pushed the improve-list-map-set-perf branch from fca5845 to 4552de4 Compare May 17, 2016 10:15
@ruippeixotog
Copy link
Contributor Author

I have just pushed the modifications you suggested :)

@lrytz
Copy link
Member

lrytz commented May 17, 2016

LGTM! thanks for keeping up with the pedantic reviewer :)

@ruippeixotog
Copy link
Contributor Author

/rebuild

@ruippeixotog
Copy link
Contributor Author

ruippeixotog commented May 17, 2016

No problem at all, they were good suggestions :)

@lrytz lrytz merged commit bf35467 into scala:2.12.x May 17, 2016
@lrytz lrytz added the release-notes worth highlighting in next release notes label May 17, 2016
@adriaanm adriaanm added 2.12 and removed 2.12 labels Oct 29, 2016
protected def value: B = throw new NoSuchElementException("value of empty map")
protected def next: ListMap[A, B] = throw new NoSuchElementException("next of empty map")

override def stringPrefix = "ListMap"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw this changed ListMap's toString between 2.11 and 2.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes worth highlighting in next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants