Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PBF as In and Output API Formats #3464

Merged
merged 17 commits into from
Jan 4, 2022
Merged

Support PBF as In and Output API Formats #3464

merged 17 commits into from
Jan 4, 2022

Conversation

kevinkreiser
Copy link
Member

@kevinkreiser kevinkreiser commented Dec 17, 2021

What is this doing?

The point of this PR is to allow users to interact with the service (and bindings/library) directly using pbf objects/bytes in addition to json strings as is currently done.

This is likely not the last PR to enable this feature and for now I would say this feature is in beta. Let me explain.

Valhalla has since (almost) its inception, a bunch of protobuf messages that we use to keep track of the state of a "request" through our various APIs. What i mean here is that when you call into our top level apis like route locate isochrones etc, we parse your json directly into a protobuffer object. we then pass that object around to different parts of the library adding to it as we fulfill the request. This leads to a natural extension of valhalla in which, instead of passing json to the service/bindings/library you can pass protobuf bytes or objects and get back protobuf bytes or objects.

Why would we want to do this?

There are a couple good reasons for this. As formats go protobuf has many benifits over json:

  • speed of serialization and deserialization
  • over the wire size
  • backward/forward compatibility
  • working with an object on both sides of the network which eventually leads to
  • gRPC

The major drawback is that protobuf is a binary format so if you have just the bytes they arent much good to you.

How are we doing it?

Output

First I'll start with the easy stuff. Returning protobuf was easy. I simply added a new request format pbf and then had a shortcut in the serializers that says if the format is pbf then serialize the bytes and return that. There is an important design choice that I made for the time being but we can revisit it in a future PR when this mode of operation becomes more mature. That specific thing is that I choose to work with a single message type for all interaction the valhalla::Api message. What this means is that if you make a route request you get back everything: options trip directions you even get back the info object which has timing information about the request. This is obviously not ideal from an efficiency standpoint if you are calling valhalla over the network so I fully expect that we will add options to trim down the response to just those fields we care about (similar to trace_attributes and the expansion endpionts).

TODO: i still need to pbf-ify error responses.. This is done. When you specify pbf output and we encounter an error we fill out error messages in the info section of the protobuf and return that to you.

Input

This was pure hell 😄

First we had to go through many iterations to get the protobuf to proto3 if we were ever going to consider gRPC. To be clear gRPC SUPPORT IS NOT YET IMPLEMENTED but if we wanted to keep it on the table i need to transform the protobuf into something more usable. Aside from "upgrading" to v3 protobuf I did a bunch of rearranging various fields in messages. There is still more to do but what was done already required tons of mechanical updates, the most painful of which was the pinpoint tests since they rely on "pickled" pbfs.

The most difficult part of all of this was to find a way to do all of the validation that we do of the json but to also do it with the protobuf when it is the mode of input. What i tried to do was to make them share the same code path so i could be more sure that they are being treated the same. I am certain I have missed some spots. Case in point:

TODO: i still need to have the costing_options defaults not overwrite already set costing options values This is done and I vastly simplified the costing options parsing. To me it looks like we could move all parsing into a single function inside dynamic cost if we added more to the BaseCostingOptionsConfig object, which i think is a good idea. what do you think @genadz ?

Other Notes

There were a couple more things I want to do:

  • update documentation: i will do this in this PR this is done
  • add act method to python bindings (separate PR): id really like for us to be able to let the python pbf generated objects be passed to the python wrapper and have that do serialize/deserialize to talk to the c++ library. to me this would be the ultimate way to interact with valhalla via python
  • send only requested PBF objects (separate PR): the Apiobject has lots of stuff the requestor may not want so before sending a response to the client, as mentioned above, it will be way more efficient that you only get the parts you want to get
  • support all APIs (separate PR): currently we only support apis whose responses are stored in pbf form before serialization. several APIs dont fall into this category so they are not yet supported.
  • refactor the options message (separate PR): the request options message was incepted before we intended to use it as a request format. Because of that it's become pretty crufty and we should take a hard look at it before taking this feature out of beta.

Copy link
Member

@nilsnolde nilsnolde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just skimmed for an hour, looks good. I'll have to play around a little though, it's hard to really follow the details in the review.. other than that, I'll gladly help with e.g. expansion, but might take me a little while, also ok to wait for follow-up PRs IMO. thanks for taking is so far!

src/worker.cc Show resolved Hide resolved
@nilsnolde
Copy link
Member

nilsnolde commented Dec 18, 2021

I'll gladly help with e.g. expansion,

actually it just needs some testing & amending in the request validation & parsing that you already did, right? activating pbf output seems to be a oneliner. yeah right, it's all done in geojson currently 🤦 so create new proto definitions for isochrone & expansion eventually I guess?

@nilsnolde
Copy link
Member

@kevinkreiser you already got a small python pbf input generator/output parser by any chance? would save me that one hour to get it right to be able to play around. if not, no worries, I can put one here soon. might be nice actually to have an example, at least for routing, in the docs too (eventually).

@kevinkreiser
Copy link
Member Author

you already got a small python pbf input generator/output parser by any chance

yep see the script here for an example: https://github.com/valhalla/valhalla/blob/master/test/pinpoints/batch_edit.py

I started updating the python bindings to support this directly but didnt want to further complicate the pr i thought id get it in the next pr

@@ -0,0 +1,100 @@
#include "gurka.h"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new test that lets us do some basic api testing, specifically using the pbf api

Comment on lines +31 to +34
std::unordered_set<Options::Action> pbf_actions{
Options::route, Options::trace_route, Options::optimized_route,
Options::centroid, Options::trace_attributes, Options::status,
};
Copy link
Member Author

@kevinkreiser kevinkreiser Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we test all the apis, but only these ones support pbf as output so we have extra tests for them

@@ -25,6 +25,49 @@
#include <rapidjson/document.h>
#include <unordered_map>

// macros aren't great but writing these out for every option is an abomination worse than this macro
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know macros are generally frowned upon but in this instance it just made so much sense. for every costing option we need to do something very formulaic. in this case we have costing options on which we need to call one of the set_ methods. the thing we need to set needs to change for every member of costing options but also the type needs to change that we need to pull out of json, we also need to check that the values are in range and if there was no json and there was no preset pbf object we need to use a default.

doing this for one costing option is already quite verbose, and in the end the macro doesnt solve verbosity of what is actually compiled but it makes the costing parsing a hell of a lot more readible and concise. in fact its so very straight forward that i might suggest that we put all costing options parsing into a single function in dynamic cost at some point. pedestrian and bike have a tiny bit of extra logic that needs to be employed for parsing depending on transport type but i think that we could overcome this by having them properly setting their base costing options.

also note there are two flavors of this macro, one for getting an option set and changing that the value is in range using the ranged_default_t and the other where you provide just the default and no validity check is done (usually boolean or string values)

pbf_costing_options->set_flow_mask(kDefaultFlowMask);
pbf_costing_options->set_top_speed(kMaxAssumedSpeed);
}
CostingOptions* co) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, this diff is awesome, truck costing was 60 lines and is now just 15. all the information is easily visible within one page. all the costings were similarly simplified.

costing_options->set_##option_name( \
range(rapidjson::get<decltype(range.def)>(json, json_key, \
costing_options->has_##option_name##_case() \
? costing_options->option_name() \
Copy link
Member Author

@kevinkreiser kevinkreiser Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will note that the macro is a little bit redundant in that it says if the json isnt there but the pbf is already set, reset it to the value thats already in there, a slight waste of cpu but its trivial imho. this makes it easy to maintain the precidence of:

json -> pbf -> default value

Comment on lines +44 to +45
repeated CodedDescription errors = 2; // errors that occured during request processing
repeated CodedDescription warnings = 3; // warnings that occured during request processing
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a spot here to return errors and warnings in pbf output mode

@@ -41,20 +41,17 @@ enum ShapeFormat {

enum Costing {
auto_ = 0;
// deprecated auto_shorter = 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont need deprecated spots anymore because costing options has been converted to a map rather than a repeated field

repeated Location exclude_locations = 15; // Avoids for any costing
repeated Location sources = 16; // Sources for /sources_to_targets
repeated Location targets = 17; // Targets for /sources_to_targets
map<int32, CostingOptions> costing_options = 13; // A map of Costing enum value to its options of costing options for any costing models
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the integer is the costing enum value

if (!ped_opts->has_transit_start_end_max_distance_case())
ped_opts->set_transit_start_end_max_distance(min_transit_walking_dis);
auto transit_start_end_max_distance = ped_opts->transit_start_end_max_distance();
auto& ped_opts = options.mutable_costing_options()->find(pedestrian)->second;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are lots of changes like this because costing options is no longer an array but a map. we can dereference the iterator that comes out of find because upstream we make sure the costing is always there in the map

@@ -348,184 +348,129 @@ void DynamicCost::set_use_living_streets(float use_living_streets) {
2.f * (1.f - use_living_streets) * (1.f - kMinLivingStreetFactor));
}

void ParseSharedCostOptions(const rapidjson::Value& value, CostingOptions* pbf_costing_options) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved the guts of this function into ParseBaseCostOptions and in the future i hope to move the guts of most costings' parsing into this function maybe even such that only this function exists

@@ -51,69 +51,103 @@ void actor_t::cleanup() {
pimpl->cleanup();
}

std::string actor_t::act(Api& api, const std::function<void()>* interrupt) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in pbf mode you dont need to specify the action separately, instead you just set it on the pbf options object and we can use that to tell which action you want to perform. i'll expose this method in the python bindings in a future pr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess returning the pbf bytes as string despite the passed api object being filled out is not a big deal (on any level), but I'd leave a small comment/todo maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead i added javadoc to the header file that was sorely m issing which spells out exactly everything that is going on in the fuction

std::string
actor_t::route(const std::string& request_str, const std::function<void()>* interrupt, Api* api) {
// set the interrupts
pimpl->set_interrupts(interrupt);
// if the caller doesn't want a copy we'll use this dummy
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially there are two inputs to this function now, either the request string or the api object. previously the api object was just something that you could optionally get out but now it can be the primary input to this and the other functions like this. in that case we need to not overwrite (via swap) but rather mutate it accordinly instead. thats why when its not passed in by the user but the json request string is used we use the dummy instead.

@@ -51,6 +51,7 @@ const std::unordered_map<unsigned, valhalla::valhalla_exception_t> error_codes{
{100, {100, "Failed to parse json request", 400, HTTP_400, OSRM_INVALID_URL, "json_parse_failed"}},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the annoying changes to make json parsing also care about a potentially already provided protobuf object are in this file. i mentioned them in the pr description so i wont go into detail here.

}

std::string jsonify_error(const valhalla_exception_t& exception, Api& request) {
std::string serialize_error(const valhalla_exception_t& exception, Api& request) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i renamed this beacuse now the output is sometimes protobuf bytes rather than always json

Comment on lines +1135 to +1144
// if its a pbf request we only want to clear the stuff we are going to eventually set
if (api.has_options() && request.empty()) {
api.clear_trip();
api.clear_directions();
api.clear_status();
api.clear_info();
} // otherwise blank it all just in case
else {
api.Clear();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little tricky but the comment describes it. i just wanted to point it out

options.set_action(action);
}

// if its a protobuf mime go with that
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we see you are telling us its protobuf then we abort looking at query params or trying to parse json at all and just assume it has a request body and that that body is protobuf bytes

@@ -571,11 +571,20 @@ findEdgeByNodes(valhalla::baldr::GraphReader& reader,
end_node_name);
}

std::string
do_action(const map& map, valhalla::Api& api, std::shared_ptr<valhalla::baldr::GraphReader> reader) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the way to test the pbf interface in gurka, basically an actor::act wrapper

json_str = actor.status(request_json, nullptr, &api);
break;
case valhalla::Options::transit_available:
json_str = actor.transit_available(request_json, nullptr, &api);
Copy link
Member Author

@kevinkreiser kevinkreiser Dec 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added all the actions so that anything is callable now. technically i should rename json_str to response_str but i didnt for some reason. if anyone feels strongly about it let me know

EXPECT_EQ(request.options().costing_options(valhalla::auto_).ignore_restrictions(), true);
EXPECT_EQ(request.options().costing_options(valhalla::auto_).use_ferry(), 0.1f);
EXPECT_EQ(request.options().costing_options(valhalla::auto_).use_tolls(), 0.77f);
const auto& co = request.options().costing_options().find(valhalla::auto_)->second;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of these kinds of changes as well. it was easier to get the costing options once and then test the individual fields

@@ -19,6 +19,28 @@
#define VALHALLA_SOURCE_DIR
#endif

// this is useful when you modify the Options proto and need to restore it
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per usual, the canned protobufs are very annoying when changing the protobuf so i write a bit of code to help me modify them when i changed to a map from a repeated field. it could be useful in the future.

@@ -135,6 +135,15 @@ template <typename V> inline const rapidjson::Value& get_child(const V& v, const
return *ptr;
}

template <typename V>
inline const rapidjson::Value& get_child(const V& v, const char* source, const rapidjson::Value& t) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use this in costing parsing, so that when the user doesnt provide costing options in their json we can have an empty dummy json object instead. this lets us simplify our logic in that we dont ahve to check if the object exists, we get back one regardless. instead of doing this we could have just added a bunch of ifs to the macros i made for costing options parsing

@kevinkreiser
Copy link
Member Author

kevinkreiser commented Jan 4, 2022

I used my standard 14k auto routes as a RAD test where I tested master before I made any proto changes against this branch and there were no diffs. I will run other RAD test sets to be sure no other types of routes have changes....

Ok I ran ALL of the test requests that we have checked in (15k) and they too had no diffs so in so far as those test our request options they are all still working as intended. w00t (whew)

Copy link
Member

@nilsnolde nilsnolde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

puuh, what a bunch of work!

apart from the 1-2 comments, I think it'd be good to add a future work item (if I don't get to it before) to showcase this with python/js or so, in the documentation ideally. like an advanced request for routing, e.g. with time set etc. esp with those 2 languages, having 0 IDE support for pbf objects, that'd lower the barrier to try it out and discover bugs.

whole thing would be a good candidate for a github project (the kabana one), after turning the todo's/future work into issues?

docs/api/index.md Outdated Show resolved Hide resolved
src/sif/autocost.cc Show resolved Hide resolved
src/worker.cc Outdated Show resolved Hide resolved
@@ -51,69 +51,103 @@ void actor_t::cleanup() {
pimpl->cleanup();
}

std::string actor_t::act(Api& api, const std::function<void()>* interrupt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess returning the pbf bytes as string despite the passed api object being filled out is not a big deal (on any level), but I'd leave a small comment/todo maybe?

@kevinkreiser
Copy link
Member Author

@nilsnolde i think i addressed your 3 items, let me know if it looks ok!

@nilsnolde nilsnolde self-requested a review January 4, 2022 14:16
Copy link
Member

@nilsnolde nilsnolde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, lgtm

Copy link
Member

@dnesbitt61 dnesbitt61 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Been a long time coming! Thanks!

@kevinkreiser kevinkreiser merged commit ba379a3 into master Jan 4, 2022
@kevinkreiser kevinkreiser deleted the kk_pbf_inout branch January 5, 2022 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants