Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShardBucketCollection #184

Merged
merged 2 commits into from
Aug 6, 2018
Merged

Conversation

jklamer
Copy link
Contributor

@jklamer jklamer commented Aug 1, 2018

Description:

This is a new abstract collection implementation that allows for easy sorting of located items, like AtlasEntities or CheckFlags, into different buckets as defined by shards. This collection acts as wrapper for the underlying bucket collections. The bucket collection type is determined by the implementer. There are a number of control functions that need to be implemented to allow for simple and flexible use.

Rectangle and Counter changes were made to enable the testing and creation of this class.

Details:
The collection lazily constructs collections per bucket. It uses synchronized blocks to ensure only one collection is initialized per shard and no information is lost. This is to allow for multithreaded insertion/sorting, but it does not guarantee the wrapped collection is thread safe.
The bucket collections are stored in an array and the initializing shards are stored in an immutable R-tree with the index of its corresponding collection in the array. In the figure below I show the SlippyTile shards I used to initialize a number of test collections, each with the corresponding collection array index.

shardbucketcollectiontest 002

All collection functions interface with the collection index and/or collection array.

The collection does not support safe iterating while inserting.

Potential Impact:

This will change no current implementations but it will enable a number of different located sorting tasks by enabling quick and simple implementation.

Unit Test Approach:

I created the most basic test collection possible and tested all the main collection functionalities on that as well as some of the unique functionalities. I then created more test collections that take advantage of the flexibility of the control functions and tested that they worked as intended.

There are multiple tests checking for the same thing in different ways to gain code coverage.

Test Results:

As expected.


In doubt: Contributing Guidelines

if (Objects.nonNull(collection))
{
final long addCount = collection.stream().filter(this::add).count();
return addCount > 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very slick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

{
/**
* Collection that will handle splitting multipolygons into buckets by overrideing the add
* function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: overriding

Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! Minor nits in the test classes.

caught = true;
}

Assert.assertTrue("Resolve shard default method is an unsupported operation", caught);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out @Test(expected = UnsupportedOperationException.class) annotation to allow for easily verifying expected exceptions. Might simplify this code a little.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this. I use this patten in ProtoIntegerStringDictionaryAdapterTest. Check that for an example in action.

final PolyLine[] allArray = polylineBuckets2.toArray(array1);
Assert.assertArrayEquals(array2, allArray);

// get code coverage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does get code coverage mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to use all the methods to get good coverage score for sonar.


/**
* Construct the collection by allocation space for a collection for each of the shards and
* assigning the index back.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: allocating

* @param shard
* shard associated with the collection
* @return true if the collection has been added to successful and changed as a result, false
* otherwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: successfully

public boolean contains(final Object object)
{
final Optional<LocatedType> typedItem = this.castToLocatedType(object);
if (typedItem.isPresent())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this pattern a lot actually. I will be stealing it ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credit due to @matthieun

@lucaspcram
Copy link
Contributor

lucaspcram commented Aug 2, 2018

Looks good @jklamer . Left a few minor documentation nits.

So help me to understand this a bit better. Is this sort of like a way to partition a collection of Locateds based on some sharding?

Based on the logic in the add method, it seems to me like you give it a Located and it figures out (using an RTree) which collection (which is tied to a shard, hence the RTree and the ShardToCollectionIndex class) that Located belongs in.

And then from there, it looks like you can ask for all collections that overlap some arbitrary bounds (based on the shard boundary with which that collection is associated), which it can compute efficiently due to having a spatial index.

Am I getting it?

What types of problems does this facilitate solving? (I am sure there are many, since you clearly put some serious effort into creating this.)

@jklamer
Copy link
Contributor Author

jklamer commented Aug 2, 2018

@lucaspcram Yes exactly you are getting it.
Problems this will help facilitate solving: sorting and helping limiting the scope of other locality related tasks.

Copy link
Collaborator

@matthieun matthieun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about the Sharding v.s. RTree. If you think this is unnecessary, I will approve!

* Type of the Collection of LocatedType associated with shards
* @author jklamer
*/
public abstract class ShardBucketCollection<LocatedType extends Located & Serializable, CollectionType extends Collection<LocatedType> & Serializable>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most complex declaration I have seen ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks much scarier than it is

this(maximumBounds, SlippyTile.allTiles(zoomLevel, maximumBounds));
}

public ShardBucketCollection(final Rectangle maximumBounds, final Sharding sharding)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharding implementations are already behaving like a spatial index. I believe here you could gain some performance and simplification by using the sharding itself to query the intersecting bounds instead of a RTree, as the scope here is much smaller that what RTrees can handle...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if I query the sharing directly I would also need a map of of shard -> index, so that would require two look-ups instead of just the one now, even if the second look up is constant time it is more steps. I would also need to store both sharding and the map of shard ->index, instead of just the RTree.

In terms of what RTree's can handle, this allows for initializations for slippy tiles up to the SlippyTile max zoom, which is 30. At zoom 19 there are 274.9 billion tiles. So even if the max bounds isn't the whole world the Rtree needs the ability to hold a lot even if my tests didn't work it that hard.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, if it works for you now, we can merge it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

this(maximumBounds, SlippyTile.allTiles(zoomLevel, maximumBounds));
}

public ShardBucketCollection(final Rectangle maximumBounds, final Sharding sharding)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, if it works for you now, we can merge it!

@matthieun matthieun merged commit 7f1e7a6 into osmlab:dev Aug 6, 2018
@matthieun matthieun added this to the 5.1.6 milestone Aug 6, 2018
@jklamer jklamer deleted the shardBucketCollection branch May 29, 2019 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants