Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix stitching errors in gtfs ingestion #3943

Merged
merged 2 commits into from
Feb 2, 2023
Merged

fix stitching errors in gtfs ingestion #3943

merged 2 commits into from
Feb 2, 2023

Conversation

kevinkreiser
Copy link
Member

@kevinkreiser kevinkreiser commented Feb 2, 2023

After merging #3700 i wrote a huge pr description with the work left to do. i will convert that into an issue and we can burn down the tasks but today i bring you the first task already burned down. There were errors about not being able to find the origin/destination stops of some of the stop pairs (the connections between stations where you actually ride on public transit). this makes other things fail. thankfully the problem wasnt all that complex. i'll describe the fix in the diff, comments should as well

ok so the only sad thing about fixing the stitching error was that the error wasnt stitching at all. this means that in a separate pr we need to offset the graph a bit more so that a trip crosses tiles (forcing stitchign to have to happen). i can add this to the list of things we need to do

@@ -234,7 +234,7 @@ std::priority_queue<tileTransitInfo> select_transit_tiles(const boost::property_
void setup_stops(Transit& tile,
const gtfs::Stop& tile_stop,
GraphId& node_id,
std::unordered_map<feedObject, GraphId>& stop_graphIds,
std::unordered_map<feedObject, GraphId>& platform_node_ids,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we write the stops (nodes) into a transit pbf tile we keep track of the gtfs ids assigned to those node which are platforms, thats what this map does. we need that because later we will ingest the part of the feed that references those platforms to connected them with routes/trips at some sort of schedule.

@@ -257,18 +257,29 @@ void setup_stops(Transit& tile,
auto onestop_id = isGenerated ? tile_stop.stop_id + "_" + to_string(node_type) : tile_stop.stop_id;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to simplify our model of transit we require that every station (even if it doesnt in real life) has:

  1. at least one in/egress
  2. exactl one station
  3. at least one platform

this makes handling connecting the data somewhat easiser and it models the highest level of complexity we want to support (even though bus stops dont need this in almost all circumstances). anyway... on the line above though you'll notice that what we do when the gtfs feed doesnt come with one of these things (either in/egress or platform, as it always has a station) we fake one so that our simplified model is fulfilled. when we fake one we have to give it a unique id so we take its parent stations id and just add the type onto it (assuming real ones one use the type as a name).

// other gtfs entities will refer to it. however when it is generated we must use its parents id
// and not the generated one because the references in the feed have no idea that we are doing
// the generation of new ideas for non-existant platforms
platform_node_ids[{tile_stop.stop_id, feed_path}] = node_id;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with giving these generated platforms a new id is that that id is not known to any other entities in the gtfs feed. any schedule information (trips) that connection stations together will connect them by the stations id since from the gtfs feeds point of view there is no platform, valhalla is making that up. so later, when we want to look up with graphid we gave to a platform that we made up by its id, ie when we are iterating over the gtfs feeds trips we need to remember those graphids by their parent station id not their new made up generated id, because of course the feeds raw data doesnt know about that, its a valhalla convention

}
node_id++;
}

/**
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from here down there are pretty much no real important changes, its all me re-naming the generic "stop_graphids" to the specific "platform_node_ids". i think this variable name makes it much more clear what information is going in there even though for the generated ones the gtfs ids will be the parent station (but it is the one to use to get that platforms graphid)

bool dest_is_generated = tile_info.tile_station_children.find({dest_stopId, currFeedPath}) !=
tile_info.tile_station_children.end();
auto dest_onestop_id =
dest_is_generated ? dest_stopId
: dest_stopId + "_" + to_string(NodeType::kMultiUseTransitPlatform);
stop_pair->set_destination_onestop_id(dest_onestop_id);
stop_pair->set_origin_graphid(kInvalidGraphId);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first time i started debugging this problem i thought i could help myself by initializing these to invalid but what i forgot was that this is protobuf 2 so later on it easily just calls has_origin_graphid rather than checking for our sentinel value. same for destination below

.stop_lat = station_one_ll->second.second, .stop_lon = station_one_ll->second.first,
.parent_station = stopOneID, .location_type = gtfs::StopLocationType::EntranceExit,
.stop_timezone = "America/Toronto", .wheelchair_boarding = "1",
.stop_id = stopOneID + "_rotating_door_eh", .stop_name = gtfs::Text("POINT NEMO"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decided to use a custom id name rather than what valhalla would generate to increase hardness to input from real gtfs

.stop_lat = station_one_ll->second.second, .stop_lon = station_one_ll->second.first,
.parent_station = stopOneID, .location_type = gtfs::StopLocationType::StopOrPlatform,
.stop_timezone = "America/Toronto", .wheelchair_boarding = "1",
.stop_id = stopOneID + "_ledge_to_the_train_bucko", .stop_name = gtfs::Text("POINT NEMO"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, just allowing it to have a non standard id

@@ -152,6 +152,15 @@ TEST(GtfsExample, WriteGtfs) {
.stop_timezone = "America/Toronto", .wheelchair_boarding = "1",
};
feed.add_stop(thirdStop);
struct gtfs::Stop thirdStopPlatform {
.stop_id = stopThreeID + "_walk_the_plank_pal", .stop_name = gtfs::Text("THIRD STOP"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sure to test the case that we can find the graph id for platforms who have actual ids rather than being generated (like the second stations platform above)

Comment on lines +194 to +196
stopOneID + "_ledge_to_the_train_bucko",
stopTwoID,
stopThreeID + "_walk_the_plank_pal",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to update the trips information to reference the proper stops, you'll see that we reference the two non generated platforms by name but the middle one is generated and will use the parent station id to look up its platform when connecting during pbf generation

@@ -257,7 +269,7 @@ TEST(GtfsExample, WriteGtfs) {
EXPECT_EQ(trips[0].trip_id, tripOneID);

const auto& stops = feed_reader.get_stops();
EXPECT_EQ(stops.size(), 5);
EXPECT_EQ(stops.size(), 6);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a platform to the 3rd station above

Comment on lines +343 to +344
EXPECT_TRUE(transit.stop_pairs(i).has_origin_graphid());
EXPECT_TRUE(transit.stop_pairs(i).has_destination_graphid());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check for bad values, just check that they are actually set

Comment on lines +366 to +370
std::string stop_pair_ids[3] = {
stopOneID + "_ledge_to_the_train_bucko",
stopTwoID + "_" + to_string(NodeType::kMultiUseTransitPlatform),
stopThreeID + "_walk_the_plank_pal",
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to look for the correct names now that they are different

Comment on lines -360 to +376
EXPECT_EQ(*stops.find(stopID), stopID);
EXPECT_TRUE(stops.find(stopID) != stops.end());
}
for (const auto& stop_pair_id : stop_pair_ids) {
EXPECT_EQ(*stops.find(stop_pair_id), stop_pair_id);
EXPECT_TRUE(stops.find(stop_pair_id) != stops.end());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was fairly non-sensical

@@ -385,7 +398,7 @@ TEST(GtfsExample, MakeTile) {

GraphReader reader(pt.get_child("mjolnir"));
auto graphids = reader.GetTileSet();
int totalNodes = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad name made it hard to understand

Comment on lines -454 to +475
EXPECT_EQ(totalNodes, 9);
EXPECT_EQ(transit_nodes, 9);
EXPECT_EQ(uses[Use::kRoad], 6);
// TODO: transit connect edges should only exist between egress and street nodes
// EXPECT_EQ(uses[Use::kTransitConnection], 6);
// NOTE: there are 4 for every station (3 of those) because we connect to both ends of the closest
// edge to the station and the connections are bidirectional (as per usual)
EXPECT_EQ(uses[Use::kTransitConnection], 12);
EXPECT_EQ(uses[Use::kPlatformConnection], 6);
EXPECT_EQ(uses[Use::kEgressConnection], 6);
// TODO: it looks like convert transit skips putting any actual rail edges into level 3
// EXPECT_EQ(uses[Use::kRail], 4);
// TODO: this is the only time in the graph that we dont have opposing directed edges (should fix)
EXPECT_EQ(uses[Use::kRail], 2);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these pass now! that means good things!

@kevinkreiser kevinkreiser merged commit 36bd2ea into master Feb 2, 2023
@kevinkreiser kevinkreiser deleted the kk_stitch_gtfs branch February 2, 2023 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants