Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize the PackedAtlas fields using protocol buffer format #107

Merged
merged 91 commits into from
May 24, 2018
Merged

Serialize the PackedAtlas fields using protocol buffer format #107

merged 91 commits into from
May 24, 2018

Conversation

lucaspcram
Copy link
Contributor

@lucaspcram lucaspcram commented Apr 5, 2018

EDIT: I think we are ready for merge.

Moving forward with implementation using builtin protocol buffer types and the consequent autoboxing this incurs. Note that these current proto implementations do not support the "very large" functionality of the PackedAtlas field types. This is probably OK for our use-case, as we are unlikely to have arrays with size greater than Integer.MAX_VALUE (~2 billion).

I added some new interfaces (basically a GoF adapter) in order to minimize changes to the PackedAtlasSerializer. Using this method, we can add new fields to PackedAtlas at-will without changing the serializer code - provided those fields obey the interface contracts I am laying out in this PR.

Using the interface is easy and intuitive! Here's how.

message ProtoSomePackedType {
    // define fields here
}
public class SomePackedType implements ProtoSerializable
{
    @Override
    public ProtoAdapter getProtoAdapter()
    {
        return new SomePackedTypeAdapter();
    }
    // rest of class goes here
    .
    .
    .
}

public class SomePackedTypeAdapter implements ProtoAdapter
{
    @Override
    public ProtoSerializable deserialize(byte[] byteArray)
    {
        // type specific code that uses the generated proto classes
    }

    @Override
    public byte[] serialize(ProtoSerializable serializable)
    {
        // type specific code that uses the generated proto classes
    }
}

This PR will grow as more types are converted. I will update this list below as I go along.

Types that need adapters

AtlasMetaData
IntegerDictionary
LongArray
LongToLongMap
LongArrayOfArrays
PackedTagStore
LongToLongMultiMap
PolyLineArray
PolygonArray
ByteArrayOfArrays
IntegerArrayOfArrays

@matthieun matthieun changed the title Serialize the PackedAtlas fields using protocol buffer format Do Not Merge: Serialize the PackedAtlas fields using protocol buffer format Apr 5, 2018
Copy link
Collaborator

@matthieun matthieun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start! A few questions.

* the raw byte representation of the ProtoSerializable in protocol buffer format
* @return The object represented by the byte stream.
*/
ProtoSerializable deserialize(byte[] byteStream);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this byteArray instead?

}

final LongArray longArray = new LongArray(protoLongArray.getElementsCount());
for (int i = 0; i < protoLongArray.getElementsCount(); i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a for (Xxx xxx : yyy) work here?

Copy link
Contributor Author

@lucaspcram lucaspcram Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the generated proto code does not implement Iterable. I could iterate over the LongArray, but then I would still need an index variable to grab the correct value from the ProtoLongArray.

if (!(serializable instanceof LongArray))
{
throw new CoreException(
"Invalid ProtoSerializable type was provided to " + this.getClass().getName()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can use new CoreException ("blah blah {} and {}", one, two); and it will become blah blah one and two like when using a logger.

@Override
public ProtoAdapter getProtoAdapter()
{
return this.protoLongArrayAdapter;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about returning a new one every time? That would let you not have any transient variable. Less risky if you try to de-serialize a Java based Atlas and then re-save it... You would get a NPE here, no?

Copy link
Collaborator

@matthieun matthieun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean code, nice tests! Thanks @lucaspcram!

Copy link
Contributor

@snb3300 snb3300 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @lucaspcram. Few suggestions and questions

{
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Extra Line

{
return false;
}
// We use reference equality for the dictionary, since PackedTagStores do not actually
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate more on when do you think the references should be equal? Maybe an example in the comment might make it better to understand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. A lot of the equals/hashCode implementations are still in flux.

final int initialPrime = 31;
final int hashSeed = 37;

int hash = initialPrime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of assigning initialPrime to hash how about

final int initialPrime = 31;
final int hashSeed = 37;

int hash = hashSeed * initialPrime + this.keys.hashCode();
hash = hashSeed * hash + this.values.hashCode();

return hash;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a lot more sense haha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another optimization :P

final int initialPrime = 31;
final int hashSeed = 37;

final int hash = hashSeed * initialPrime + this.keys.hashCode();
return hashSeed * hash + this.values.hashCode();

builder.setRelationNumber(atlasMetaData.getSize().getRelationNumber());
builder.setOriginal(atlasMetaData.isOriginal());

final Optional<String> codeVersionOptional = atlasMetaData.getCodeVersion();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of extracting in a local variable you can also use a lambda

atlasMetaData.getCodeVersion().ifPresent(value -> {
  builder.setCodeVersion(value);
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good tip. I am new to using Optional. Thanks!

builder.addAllTags(ProtoAtlasMetaDataAdapter.PROTOTAG_LIST_CONVERTER.backwardConvert(tags));

final ProtoAtlasMetaData protoMetaData = builder.build();
return protoMetaData.toByteArray();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

return builder.build().toByteArray();

Copy link
Contributor Author

@lucaspcram lucaspcram Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that intermediate variable there just to make it very clear to future readers of the code what we are returning (since the builder is declared pretty far away from this return statement).

I'm actually going to build on your suggestion (no pun intended) by refactoring the name of 'builder' to be a little more descriptive, something like 'protoMetaDataBuilder'.

Best of both worlds that way!

}

final ProtoIntegerStringDictionary protoDictionary = dictionaryBuilder.build();
return protoDictionary.toByteArray();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

return dictionaryBuilder.build().toByteArray();

This way you don't need to instantiate a local variable before returning.

Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great tests and very clean code! Left some minor nits along the way.

public int hashCode()
{
final int sizeHash = this.getSize().hashCode();
return Objects.hash(Integer.valueOf(sizeHash), Boolean.valueOf(this.original),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking - data version field was left out intentionally here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it should be there haha. Since I check against it in equals(), I should be using it in the hash computation. Good catch.

}

/**
* Builds an AtlasSize. By default it uses a small value for the size estimates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider adding a link to DEFAULT_VALUE constant in the javadoc by using @link(#[your-variable]) notation. This way if it ever changes, it will get updated automatically.

@@ -54,6 +54,7 @@
* Atlas that packs data in large arrays
*
* @author matthieun
* @author lcram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not sure this warrants an author update :P

{
// We automatically get the correct adapter for whatever type 'field' happens to be
final ProtoAdapter adapter = field.getProtoAdapter();
// The adapter handles all the actual serialization using the protobuf classes. Easy!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean!

@@ -48,7 +69,7 @@ protected PackedTagStore(final long maximumSize, final int memoryBlockSize,
*/
public void add(final long index, final String key, final String value)
{
if (index > this.index)
if (index > size())
{
throw new CoreException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not part of your changes, but consider updating the CoreException to follow insertion format:
throw new CoreException("Cannot add. Invalid index {} is bigger than the size {}", index, size());

public class ProtoPackedTagStoreAdapter implements ProtoAdapter
{
/*
* PackedTagStore does not provide an interface for setting its arrays directly. This class's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice commenting!

keyArrayField.setAccessible(true);
keyArrayField.set(store, keyArray);
}
catch (NoSuchFieldException | SecurityException | IllegalArgumentException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the exceptions get handled the same way - there is no difference what kind of Exception it is, we can't recover from it. Might be a good spot to just catch Exception.

valueArrayField.setAccessible(true);
valueArrayField.set(store, valueArray);
}
catch (NoSuchFieldException | SecurityException | IllegalArgumentException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here and in all below occurrences.

/**
* {@link LargeArray} for type {@link Long}
*
* @author matthieun
* @author lcram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not sure some of these changes warrant an author update.

Copy link
Contributor Author

@lucaspcram lucaspcram Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a bit about this. Where should the line be? My internal thought was if I added at least one new method, or significantly changed a method, I added an author tag - if anything to document from whom the code in the class is coming from.

Looking at LongArray, I think this is probably a case where I was too overzealous with the tag. But I think for the classes where I started adding new interfaces, including equals() and hashCode() implementations, that tag is a good hint for future users of the class.

Another solution may be to simply add an author tag to the methods I add, instead of the class itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky area - everyone has different opinions on what's the right convention. This is only my opinion, so take that for what it's worth :) - I try to answer both of these questions with a yes before adding an author tag:

  1. Did I make a change that can have downstream/user impact and/or add significant functionality on top of the existing codebase?
  2. Do I feel comfortable explaining this class and its functionality and/or debugging/being responsible for any bugs that are a result of my changes?

In your case case, most of the author tags are spot on! I'd keep all of them as is, including the ones with equals/hashcode implementations. Not a fan of the fine-grained per-method author tags as that just brings more chaos to the code. I just nit-picked two classes that stood out to me as a bit overzealous (PackedAtlas and LongArray). Again, feel free to ignore :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I'll revert the per-method tags in a future commit. All this makes sense. Yeah I do agree that PackedAtlas and LongArray are overzealous haha.

@Override
public boolean equals(final Object other)
{
if (other instanceof LargeMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be instanceof LongToLongMultiMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... copy paste strikes again.

protected static final String FIELD_BOUNDS = "bounds";
protected static final String FIELD_LOGGER = "logger";
protected static final String FIELD_SERIAL_VERSION_UID = "serialVersionUID";

Copy link
Contributor Author

@lucaspcram lucaspcram Apr 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preemptive nit: extra line
:)

catch (InstantiationException | IllegalAccessException | IllegalArgumentException
| InvocationTargetException exception)
{
throw new CoreException("", exception);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidate these exceptions and add a message.

@lucaspcram
Copy link
Contributor Author

There is a weird issue where FullProtoSuiteTest.testEmptyTags() is passing locally, but failing on Travis. Definitely need to look into why this is.

private static final long serialVersionUID = -7582554057580336684L;
private static final Logger logger = LoggerFactory.getLogger(PackedAtlas.class);

// Keep track of the field names for reflection code in the Serializer.
protected static final String FIELD_PREFIX = "FIELD_";
protected static final String FIELD_BOUNDS = "bounds";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would those be needed if they are not serialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already had them defined in EXCLUDED_FIELDS in PackedAtlasSerializer, but the values were hardcoded. I just added named constants so that there would be no hardcoded values in PackedAtlasSerializer that depended on the names of variables in PackedAtlas.

Here's a side-by-side:

Old PackedAtlasSerializer

private static final StringList EXCLUDED_FIELDS = new StringList("bounds", "serialVersionUID",
            "logger", "$SWITCH_TABLE$", PackedAtlas.FIELD_SERIALIZER, "FIELD_");

New PackedAtlasSerializer

private static final StringList EXCLUDED_FIELDS = new StringList(PackedAtlas.FIELD_BOUNDS,
            PackedAtlas.FIELD_SERIAL_VERSION_UID, PackedAtlas.FIELD_LOGGER, "$SWITCH_TABLE$",
            PackedAtlas.FIELD_SERIALIZER, PackedAtlas.FIELD_PREFIX);

@@ -106,11 +126,16 @@
// Serializer.
private transient PackedAtlasSerializer serializer;

// Serialization formats for saving/loading this PackedAtlas
private transient AtlasSerializationFormat saveSerializationFormat = AtlasSerializationFormat.JAVA;
private transient AtlasSerializationFormat loadSerializationFormat = AtlasSerializationFormat.JAVA;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't loadSerializationFormat always dependent on the file that it tries to load from? To my mind only the saveSerializationFormat would be needed, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a flag to be set at load call time so that lazily deserialized Atlases know which format to select. However, you make a good point in a comment below. The loadSerializationFormat does not need to be a field of PackedAtlas. It can just be a flag in the PackedAtlasSerializer.


final ProtoAdapter adapter = protoHandle.getProtoAdapter();
final ProtoSerializable deserializedMember = adapter
.deserialize(resource.readBytesAndClose());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would return a byte[] correct? In case the field is big, would it be possible to stream the resource, one byte buffer at a time using an InputStream instead? That would save memory. Only if it is easy though, as the tradeoff might not be worth it..

Copy link
Contributor Author

@lucaspcram lucaspcram May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding your question. But looking at the code in Resource, is this not already happening?

default byte[] readBytesAndClose()
{
        final List<Byte> byteContents = new ArrayList<>();
        int kyte;
        try (InputStream input = new BufferedInputStream(read()))
        {
            while ((kyte = input.read()) >= 0)
            {
                byteContents.add((byte) (kyte & BYTE_MASK));
            }
        }
        catch (final IOException e)
        {
            throw new CoreException("Unable to read the bytes from {}.", this, e);
        }
        final byte[] contents = new byte[byteContents.size()];
        for (int index = 0; index < byteContents.size(); index++)
        {
            contents[index] = byteContents.get(index);
        }
        return contents;
}

It looks to me like it is already using a BufferedInputStream to read the data from the underlying resource.

Were you instead asking to have the ProtoAdapter.deserialize() method take an InputStream instead of a byte[] as input? That would be possible. I believe protobuf provides a parseFrom overload that takes an InputStream instead of a byte[].

for (final AtlasSerializationFormat candidateFormat : possibleFormats)
{
logger.info("Trying load format {}", candidateFormat);
atlas.setLoadSerializationFormat(candidateFormat);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep that load format local to the PackedAtlasSerializer here as it is very dependent on the resource being read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Good idea.

@hallahan
Copy link
Contributor

This is awesome! What's left? When will this get merged?

@lucaspcram
Copy link
Contributor Author

lucaspcram commented May 12, 2018

@hallahan Still working out a couple kinks surrounding some changes in PackedAtlas. I'm hoping to get this merged very soon, and then potentially iterate on it if necessary. The protobuf scheme is going to start out as opt-in, but will eventually be the default moving forward.

@matthieun matthieun changed the title Do Not Merge: Serialize the PackedAtlas fields using protocol buffer format Serialize the PackedAtlas fields using protocol buffer format May 24, 2018
@matthieun matthieun merged commit c837551 into osmlab:dev May 24, 2018
@lucaspcram lucaspcram deleted the packed branch May 24, 2018 23:24
gitclonefun pushed a commit to gitclonefun/atlas that referenced this pull request Feb 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants