-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atlas Delta Generator Sharded #199
Conversation
…that include before and after entities. Also, the generator CLI is not working.
…arded # Conflicts: # src/main/java/org/openstreetmap/atlas/geography/atlas/delta/AtlasDeltaGenerator.java
* @author matthieun | ||
* @author hallahan | ||
*/ | ||
public class AtlasDeltaGenerator extends Command | ||
{ | ||
private static final int THREADS = 8; // Tweak this if desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this configurable through a switch, and then just make this the default value. Why did you choose 8 by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, that's what my laptop has, and the amount of memory used is not too bad. This is so when you're on a beefy server with many cores, you don't run out of memory. I can make this as a switch, good idea.
*/ | ||
private Atlas load(final Path path) | ||
{ | ||
return Files.isDirectory(path) ? loadAtlasDirectory(path) : loadSingleAtlas(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "this." before local function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, will do.
Huzzah for command line tools! I'll take a look at this tomorrow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useful addition @hallahan - thanks! Couple of minor nits along the way.
} | ||
|
||
// Execute in a pool of threads so we limit how many atlases get loaded in parallel. | ||
final ForkJoinPool customThreadPool = new ForkJoinPool(threads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the Pool
class from Atlas?
} | ||
catch (final ExecutionException exec) | ||
{ | ||
logger.error("There was an execution exception on the shard diff workers.", exec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
customThreadPool
is never shutdown, recommend doing it in a finally block. Possibly a try-with-resources block.
{ | ||
logger.error("The shard diff workers were interrupted.", interrupt); | ||
} | ||
catch (final ExecutionException exec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: exec --> execution
final Atlas afterAtlas = load(after); | ||
compare(beforeAtlas, afterAtlas, outputDir); | ||
return new File(directory.toFile()).listFilesRecursively().stream() | ||
.filter(file -> "atlas".equals(FilenameUtils.getExtension(file.getName()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend reusing the AtlasResourceLoader
IS_ATLAS
predicate here, this will also allow for gzipped files to be compared. Reference.
If that doesn't work, let's rely on FileSuffix
instead of a raw string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, will try it out!
|
||
private Atlas loadAtlasDirectory(final Path path) | ||
{ | ||
return new AtlasResourceLoader().load(fetchAtlasFilesInDirectory(path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It almost feels like this functionality should be part of AtlasResourceLoader
: given a path, load the file or files there. Seems very reusable and common logic. If others agree, it may be worthwhile to add it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fairly certain AtlasResourceLoader
already does this. Do:
new AtlasResourceLoader().load(new File(path.toAbsolutePath().toString()))
where File
is an org.openstreetmap
File
, not a java.io
File
. This will handle all the recursive listing for you automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, remember writing this haha! Looks like we can get rid of some duplicated code here.
Thank you for your suggestions @MikeGost . Ready for review or merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, thanks @hallahan!
Description:
This is an extension on PR #195 .
Now the tool allows you to process diffs of directories of atlas shards, instead of single-file atlases. You can now delta before atlases that are sharded with non-sharded afters. You can also, more importantly, delta shard-by-shard before and after sharded atlases. Lastly, single atlas compares also still work. This delineation will be automatic based on if your before / after CLI args are files or directories.
Potential Impact:
This is non-impacting PR in that I am only improving the AtlasDeltaGenerator CLI, so even AtlasDelta should remain the same.
Unit Test Approach:
It's a command line tool, so my testing has been manual.
Test Results:
I have manually tested several countries' worth of data, comparing various atlases, both sharded and non-sharded.