Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atlas Delta Generator Sharded #199

Merged
merged 22 commits into from
Aug 17, 2018
Merged

Atlas Delta Generator Sharded #199

merged 22 commits into from
Aug 17, 2018

Conversation

hallahan
Copy link
Contributor

Description:

This is an extension on PR #195 .

Now the tool allows you to process diffs of directories of atlas shards, instead of single-file atlases. You can now delta before atlases that are sharded with non-sharded afters. You can also, more importantly, delta shard-by-shard before and after sharded atlases. Lastly, single atlas compares also still work. This delineation will be automatic based on if your before / after CLI args are files or directories.

Potential Impact:

This is non-impacting PR in that I am only improving the AtlasDeltaGenerator CLI, so even AtlasDelta should remain the same.

Unit Test Approach:

It's a command line tool, so my testing has been manual.

Test Results:

I have manually tested several countries' worth of data, comparing various atlases, both sharded and non-sharded.

@hallahan hallahan changed the title Delta sharded Atlas Delta Generator Sharded Aug 16, 2018
* @author matthieun
* @author hallahan
*/
public class AtlasDeltaGenerator extends Command
{
private static final int THREADS = 8; // Tweak this if desired.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this configurable through a switch, and then just make this the default value. Why did you choose 8 by the way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, that's what my laptop has, and the amount of memory used is not too bad. This is so when you're on a beefy server with many cores, you don't run out of memory. I can make this as a switch, good idea.

*/
private Atlas load(final Path path)
{
return Files.isDirectory(path) ? loadAtlasDirectory(path) : loadSingleAtlas(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "this." before local function calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, will do.

@lucaspcram
Copy link
Contributor

Huzzah for command line tools! I'll take a look at this tomorrow!

Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful addition @hallahan - thanks! Couple of minor nits along the way.

}

// Execute in a pool of threads so we limit how many atlases get loaded in parallel.
final ForkJoinPool customThreadPool = new ForkJoinPool(threads);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the Pool class from Atlas?

}
catch (final ExecutionException exec)
{
logger.error("There was an execution exception on the shard diff workers.", exec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

customThreadPool is never shutdown, recommend doing it in a finally block. Possibly a try-with-resources block.

{
logger.error("The shard diff workers were interrupted.", interrupt);
}
catch (final ExecutionException exec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: exec --> execution

final Atlas afterAtlas = load(after);
compare(beforeAtlas, afterAtlas, outputDir);
return new File(directory.toFile()).listFilesRecursively().stream()
.filter(file -> "atlas".equals(FilenameUtils.getExtension(file.getName())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend reusing the AtlasResourceLoader IS_ATLAS predicate here, this will also allow for gzipped files to be compared. Reference.

If that doesn't work, let's rely on FileSuffix instead of a raw string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, will try it out!


private Atlas loadAtlasDirectory(final Path path)
{
return new AtlasResourceLoader().load(fetchAtlasFilesInDirectory(path));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It almost feels like this functionality should be part of AtlasResourceLoader : given a path, load the file or files there. Seems very reusable and common logic. If others agree, it may be worthwhile to add it there.

Copy link
Contributor

@lucaspcram lucaspcram Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fairly certain AtlasResourceLoader already does this. Do:

new AtlasResourceLoader().load(new File(path.toAbsolutePath().toString()))

where File is an org.openstreetmap File, not a java.io File. This will handle all the recursive listing for you automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, remember writing this haha! Looks like we can get rid of some duplicated code here.

lucaspcram
lucaspcram previously approved these changes Aug 17, 2018
@hallahan
Copy link
Contributor Author

Thank you for your suggestions @MikeGost . Ready for review or merge.

Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, thanks @hallahan!

@MikeGost MikeGost merged commit f525c07 into osmlab:dev Aug 17, 2018
@matthieun matthieun added this to the 5.1.9 milestone Aug 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants