Serialize the PackedAtlas fields using protocol buffer format #107
Conversation
Nice start! A few questions.
 * the raw byte representation of the ProtoSerializable in protocol buffer format
 * @return The object represented by the byte stream.
 */
ProtoSerializable deserialize(byte[] byteStream);
How about calling this byteArray instead?
}

final LongArray longArray = new LongArray(protoLongArray.getElementsCount());
for (int i = 0; i < protoLongArray.getElementsCount(); i++)
Could a for (Xxx xxx : yyy) work here?
Unfortunately the generated proto code does not implement Iterable. I could iterate over the LongArray, but then I would still need an index variable to grab the correct value from the ProtoLongArray.
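For reference, a minimal sketch of the index-based copy being discussed, assuming the standard protobuf-generated getElements(index) accessor and a LongArray#add that appends a value:

```java
// Sketch only: ProtoLongArray follows standard protobuf Java codegen for a
// repeated int64 field, and LongArray#add is assumed to append a value.
final LongArray longArray = new LongArray(protoLongArray.getElementsCount());
for (int index = 0; index < protoLongArray.getElementsCount(); index++)
{
    longArray.add(protoLongArray.getElements(index));
}
```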
if (!(serializable instanceof LongArray))
{
    throw new CoreException(
            "Invalid ProtoSerializable type was provided to " + this.getClass().getName()
Here you can use new CoreException("blah blah {} and {}", one, two); and it will become "blah blah one and two", like when using a logger.
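Applied to the snippet above, a sketch of what that could look like (the exact message text is illustrative):

```java
// Sketch: CoreException substitutes the arguments into the {} placeholders,
// logger-style, so no manual string concatenation is needed
throw new CoreException("Invalid ProtoSerializable type {} was provided to {}",
        serializable.getClass().getName(), this.getClass().getName());
```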
@Override
public ProtoAdapter getProtoAdapter()
{
    return this.protoLongArrayAdapter;
How about returning a new one every time? That would let you not have any transient variable. Less risky if you try to de-serialize a Java based Atlas and then re-save it... You would get a NPE here, no?
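A sketch of that suggestion, assuming ProtoLongArrayAdapter has a no-argument constructor:

```java
@Override
public ProtoAdapter getProtoAdapter()
{
    // Build the adapter on demand instead of caching it in a transient field,
    // so a Java-deserialized LongArray never returns a null adapter here
    return new ProtoLongArrayAdapter();
}
```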
Clean code, nice tests! Thanks @lucaspcram!
Nice job @lucaspcram. A few suggestions and questions.
{
    return false;
}
Nit: Extra Line
{
    return false;
}
// We use reference equality for the dictionary, since PackedTagStores do not actually
Could you elaborate more on when you think the references should be equal? Maybe an example in the comment would make it easier to understand.
Good idea. A lot of the equals/hashCode implementations are still in flux.
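As a purely hypothetical illustration of the kind of comment and example being asked for (the dictionary field name comes from the code comment above; everything else is assumed):

```java
// Hypothetical sketch: two PackedTagStores only match on this field if they
// share the very same dictionary instance (e.g. both were built against the
// same PackedAtlas); distinct but identically-populated dictionaries still
// compare as not equal.
if (this.dictionary != that.dictionary)
{
    return false;
}
```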
final int initialPrime = 31;
final int hashSeed = 37;

int hash = initialPrime;
Instead of assigning initialPrime to hash, how about:
final int initialPrime = 31;
final int hashSeed = 37;
int hash = hashSeed * initialPrime + this.keys.hashCode();
hash = hashSeed * hash + this.values.hashCode();
return hash;
This makes a lot more sense haha.
Another optimization :P
final int initialPrime = 31;
final int hashSeed = 37;
final int hash = hashSeed * initialPrime + this.keys.hashCode();
return hashSeed * hash + this.values.hashCode();
builder.setRelationNumber(atlasMetaData.getSize().getRelationNumber());
builder.setOriginal(atlasMetaData.isOriginal());

final Optional<String> codeVersionOptional = atlasMetaData.getCodeVersion();
Instead of extracting into a local variable, you can also use a lambda:
atlasMetaData.getCodeVersion().ifPresent(value -> {
builder.setCodeVersion(value);
});
Good tip. I am new to using Optional. Thanks!
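For what it's worth, a method reference trims the lambda even further (same assumption about the builder setter shown above):

```java
// Same behavior as the lambda above, expressed as a method reference
atlasMetaData.getCodeVersion().ifPresent(builder::setCodeVersion);
```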
builder.addAllTags(ProtoAtlasMetaDataAdapter.PROTOTAG_LIST_CONVERTER.backwardConvert(tags));

final ProtoAtlasMetaData protoMetaData = builder.build();
return protoMetaData.toByteArray();
How about:
return builder.build().toByteArray();
I had that intermediate variable there just to make it very clear to future readers of the code what we are returning (since the builder is declared pretty far away from this return statement).
I'm actually going to build on your suggestion (no pun intended) by refactoring the name of 'builder' to be a little more descriptive, something like 'protoMetaDataBuilder'.
Best of both worlds that way!
}

final ProtoIntegerStringDictionary protoDictionary = dictionaryBuilder.build();
return protoDictionary.toByteArray();
How about
return dictionaryBuilder.build().toByteArray();
This way you don't need to instantiate a local variable before returning.
Great tests and very clean code! Left some minor nits along the way.
public int hashCode()
{
    final int sizeHash = this.getSize().hashCode();
    return Objects.hash(Integer.valueOf(sizeHash), Boolean.valueOf(this.original),
Just checking - was the data version field left out intentionally here?
No it should be there haha. Since I check against it in equals(), I should be using it in the hash computation. Good catch.
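A sketch of the fix; the existing arguments to Objects.hash are elided here, and this.dataVersion is an assumed field name:

```java
// Sketch: hash the data version alongside the fields already listed in this call
return Objects.hash(Integer.valueOf(sizeHash), Boolean.valueOf(this.original),
        /* ...the other fields already passed in the original call... */ this.dataVersion);
```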
}

/**
 * Builds an AtlasSize. By default it uses a small value for the size estimates.
Nit: Consider adding a link to the DEFAULT_VALUE constant in the javadoc by using the {@link #DEFAULT_VALUE} notation. This way, if it ever changes, the javadoc will get updated automatically.
@@ -54,6 +54,7 @@
 * Atlas that packs data in large arrays
 *
 * @author matthieun
 * @author lcram
Nit: Not sure this warrants an author update :P
{
    // We automatically get the correct adapter for whatever type 'field' happens to be
    final ProtoAdapter adapter = field.getProtoAdapter();
    // The adapter handles all the actual serialization using the protobuf classes. Easy!
Nice and clean!
@@ -48,7 +69,7 @@ protected PackedTagStore(final long maximumSize, final int memoryBlockSize,
  */
 public void add(final long index, final String key, final String value)
 {
-    if (index > this.index)
+    if (index > size())
     {
         throw new CoreException(
Nit: not part of your changes, but consider updating the CoreException to follow the insertion format:
throw new CoreException("Cannot add. Invalid index {} is bigger than the size {}", index, size());
public class ProtoPackedTagStoreAdapter implements ProtoAdapter
{
    /*
     * PackedTagStore does not provide an interface for setting its arrays directly. This class's
Nice commenting!
keyArrayField.setAccessible(true);
keyArrayField.set(store, keyArray);
}
catch (NoSuchFieldException | SecurityException | IllegalArgumentException |
All the exceptions get handled the same way - there is no difference what kind of Exception it is, we can't recover from it. Might be a good spot to just catch Exception.
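A sketch of the consolidated handling; the field name and message text are illustrative, and java.lang.reflect.Field is assumed to be imported:

```java
try
{
    // "keyArray" is a hypothetical field name for this sketch
    final Field keyArrayField = PackedTagStore.class.getDeclaredField("keyArray");
    keyArrayField.setAccessible(true);
    keyArrayField.set(store, keyArray);
}
catch (final Exception exception)
{
    // None of the reflective failures here are recoverable, so a single catch-all is enough
    throw new CoreException("Failed to set the key array field on PackedTagStore", exception);
}
```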
valueArrayField.setAccessible(true);
valueArrayField.set(store, valueArray);
}
catch (NoSuchFieldException | SecurityException | IllegalArgumentException |
Same comment here and in all below occurrences.
/**
 * {@link LargeArray} for type {@link Long}
 *
 * @author matthieun
 * @author lcram
Nit: Not sure some of these changes warrant an author update.
I thought a bit about this. Where should the line be? My internal rule was: if I added at least one new method, or significantly changed a method, I added an author tag - if nothing else, to document where the code in the class is coming from.
Looking at LongArray, I think this is probably a case where I was too overzealous with the tag. But I think for the classes where I started adding new interfaces, including equals() and hashCode() implementations, that tag is a good hint for future users of the class.
Another solution may be to simply add an author tag to the methods I add, instead of the class itself.
This is a tricky area - everyone has different opinions on what's the right convention. This is only my opinion, so take that for what it's worth :) - I try to answer both of these questions with a yes before adding an author tag:
- Did I make a change that can have downstream/user impact and/or add significant functionality on top of the existing codebase?
- Do I feel comfortable explaining this class and its functionality and/or debugging/being responsible for any bugs that are a result of my changes?
In your case, most of the author tags are spot on! I'd keep all of them as is, including the ones with equals/hashCode implementations. I'm not a fan of fine-grained per-method author tags, as that just brings more chaos to the code. I just nit-picked two classes that stood out to me as a bit overzealous (PackedAtlas and LongArray). Again, feel free to ignore :)
Got it. I'll revert the per-method tags in a future commit. All this makes sense. Yeah I do agree that PackedAtlas and LongArray are overzealous haha.
@Override
public boolean equals(final Object other)
{
    if (other instanceof LargeMap)
Should this be instanceof LongToLongMultiMap?
Yeah... copy paste strikes again.
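For clarity, a sketch of the corrected guard; the rest of the method is unchanged and elided:

```java
@Override
public boolean equals(final Object other)
{
    // Compare against the concrete type, not the LargeMap supertype
    if (other instanceof LongToLongMultiMap)
    {
        // ...field-by-field comparison as in the original method...
    }
    return false;
}
```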
protected static final String FIELD_BOUNDS = "bounds";
protected static final String FIELD_LOGGER = "logger";
protected static final String FIELD_SERIAL_VERSION_UID = "serialVersionUID";
Preemptive nit: extra line :)
catch (InstantiationException | IllegalAccessException | IllegalArgumentException
        | InvocationTargetException exception)
{
    throw new CoreException("", exception);
Consolidate these exceptions and add a message. There is a weird issue where …
private static final long serialVersionUID = -7582554057580336684L;
private static final Logger logger = LoggerFactory.getLogger(PackedAtlas.class);

// Keep track of the field names for reflection code in the Serializer.
protected static final String FIELD_PREFIX = "FIELD_";
protected static final String FIELD_BOUNDS = "bounds";
Why would those be needed if they are not serialized?
You already had them defined in EXCLUDED_FIELDS in PackedAtlasSerializer, but the values were hardcoded. I just added named constants so that there would be no hardcoded values in PackedAtlasSerializer that depended on the names of variables in PackedAtlas.
Here's a side-by-side:
Old PackedAtlasSerializer
private static final StringList EXCLUDED_FIELDS = new StringList("bounds", "serialVersionUID",
"logger", "$SWITCH_TABLE$", PackedAtlas.FIELD_SERIALIZER, "FIELD_");
New PackedAtlasSerializer
private static final StringList EXCLUDED_FIELDS = new StringList(PackedAtlas.FIELD_BOUNDS,
PackedAtlas.FIELD_SERIAL_VERSION_UID, PackedAtlas.FIELD_LOGGER, "$SWITCH_TABLE$",
PackedAtlas.FIELD_SERIALIZER, PackedAtlas.FIELD_PREFIX);
@@ -106,11 +126,16 @@
// Serializer.
private transient PackedAtlasSerializer serializer;

// Serialization formats for saving/loading this PackedAtlas
private transient AtlasSerializationFormat saveSerializationFormat = AtlasSerializationFormat.JAVA;
private transient AtlasSerializationFormat loadSerializationFormat = AtlasSerializationFormat.JAVA;
Isn't loadSerializationFormat always dependent on the file that it tries to load from? To my mind, only the saveSerializationFormat would be needed, no?
You need a flag to be set at load call time so that lazily deserialized Atlases know which format to select. However, you make a good point in a comment below: the loadSerializationFormat does not need to be a field of PackedAtlas. It can just be a flag in the PackedAtlasSerializer.
final ProtoAdapter adapter = protoHandle.getProtoAdapter();
final ProtoSerializable deserializedMember = adapter
        .deserialize(resource.readBytesAndClose());
This would return a byte[], correct? In case the field is big, would it be possible to stream the resource, one byte buffer at a time, using an InputStream instead? That would save memory. Only if it is easy though, as the tradeoff might not be worth it.
I may be misunderstanding your question. But looking at the code in Resource, is this not already happening?
default byte[] readBytesAndClose()
{
final List<Byte> byteContents = new ArrayList<>();
int kyte;
try (InputStream input = new BufferedInputStream(read()))
{
while ((kyte = input.read()) >= 0)
{
byteContents.add((byte) (kyte & BYTE_MASK));
}
}
catch (final IOException e)
{
throw new CoreException("Unable to read the bytes from {}.", this, e);
}
final byte[] contents = new byte[byteContents.size()];
for (int index = 0; index < byteContents.size(); index++)
{
contents[index] = byteContents.get(index);
}
return contents;
}
It looks to me like it is already using a BufferedInputStream to read the data from the underlying resource.
Were you instead asking to have the ProtoAdapter.deserialize() method take an InputStream instead of a byte[] as input? That would be possible. I believe protobuf provides a parseFrom overload that takes an InputStream instead of a byte[].
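For reference, a sketch of what an InputStream-flavored overload might look like; this method is not part of the PR, the conversion helper is hypothetical, and java.io imports are assumed:

```java
// Sketch: protobuf-generated messages expose parseFrom(InputStream), so an
// adapter overload could consume the resource's stream without building a byte[]
public ProtoSerializable deserialize(final InputStream inputStream) throws IOException
{
    final ProtoLongArray protoLongArray = ProtoLongArray.parseFrom(inputStream);
    return buildLongArrayFrom(protoLongArray); // hypothetical helper, same conversion as the byte[] path
}
```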
for (final AtlasSerializationFormat candidateFormat : possibleFormats)
{
    logger.info("Trying load format {}", candidateFormat);
    atlas.setLoadSerializationFormat(candidateFormat);
I would keep that load format local to the PackedAtlasSerializer here, as it is very dependent on the resource being read.
Yep! Good idea.
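A sketch of what keeping the format local could look like (the deserializeAllFields helper name is hypothetical):

```java
// Sketch: the candidate format stays a local of the serializer's load loop
// instead of being pushed into the PackedAtlas through a setter
for (final AtlasSerializationFormat candidateFormat : possibleFormats)
{
    logger.info("Trying load format {}", candidateFormat);
    try
    {
        deserializeAllFields(candidateFormat); // hypothetical helper that takes the format directly
        return;
    }
    catch (final Exception exception)
    {
        logger.warn("Load format {} failed, trying the next one", candidateFormat, exception);
    }
}
```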
…gy, including demo on LongArray type.
…n proto size limits.
…ign with spotless.
This is awesome! What's left? When will this get merged?
@hallahan Still working out a couple of kinks surrounding some changes in …
…default format back to Java.
Merge Public Changes
EDIT: I think we are ready for merge.
Moving forward with implementation using builtin protocol buffer types and the consequent autoboxing this incurs. Note that these current proto implementations do not support the "very large" functionality of the PackedAtlas field types. This is probably OK for our use-case, as we are unlikely to have arrays with size greater than Integer.MAX_VALUE (~2 billion).
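To make the autoboxing and size-limit remark concrete, here is a sketch of where both show up; accessor names assume standard protobuf Java codegen for a repeated int64 field called elements, and LongArray is assumed to be iterable as boxed Longs:

```java
// Sketch: copying a LongArray into the builder boxes each primitive long, and
// the element count is an int, hence the Integer.MAX_VALUE (~2 billion) ceiling
final ProtoLongArray.Builder builder = ProtoLongArray.newBuilder();
for (final Long value : longArray)
{
    builder.addElements(value.longValue());
}
final int elementCount = builder.getElementsCount();
```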
I added some new interfaces (basically a GoF adapter) in order to minimize changes to the PackedAtlasSerializer. Using this method, we can add new fields to PackedAtlas at will without changing the serializer code - provided those fields obey the interface contracts I am laying out in this PR.
Using the interface is easy and intuitive! Here's how.
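A minimal sketch of the pattern, pieced together from the ProtoSerializable and ProtoAdapter snippets in this diff; the serialize signature and the SomeFieldType/SomeFieldAdapter names are assumptions:

```java
// A field type opts in by implementing ProtoSerializable and handing back its adapter
public class SomeFieldType implements ProtoSerializable
{
    @Override
    public ProtoAdapter getProtoAdapter()
    {
        return new SomeFieldAdapter();
    }
}

// The serializer then works purely against the interfaces, never the concrete field type
final ProtoAdapter adapter = field.getProtoAdapter();
final byte[] contents = adapter.serialize(field);                 // save path (assumed signature)
final ProtoSerializable restored = adapter.deserialize(contents); // load path, as shown in the diff
```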
This PR will grow as more types are converted. I will update this list below as I go along.
Types that need adapters:
- AtlasMetaData
- IntegerDictionary
- LongArray
- LongToLongMap
- LongArrayOfArrays
- PackedTagStore
- LongToLongMultiMap
- PolyLineArray
- PolygonArray
- ByteArrayOfArrays
- IntegerArrayOfArrays