Describe the bug
The graph learning edge set samples introduce nodes that are not in the positive training set. Because only the positive training set is embedded, these extra nodes have no entry in the node-to-vector map, and looking them up raises a KeyError. The failure surfaces only after significant runtime.
To Reproduce
Create KG-COVID edge set samples:
python run.py edges -n data/merged/merged-kg_nodes.tsv -e data/merged/merged-kg_edges.tsv
Run the embiggen pipeline and observe:
  File "runLinkPrediction_ppi.py", line 260, in <module>
    main(args)
  File "runLinkPrediction_ppi.py", line 254, in main
    linkpred(pos_train_graph, pos_valid_graph, pos_test_graph, neg_train_graph, neg_valid_graph, neg_test_graph)
  File "runLinkPrediction_ppi.py", line 178, in linkpred
    lp.prepare_edge_and_node_labels()
  File "/global/scratch/marcin/N2V/N2V/embiggen/link_prediction.py", line 146, in prepare_edge_and_node_labels
    node2vector_map=self.map_node_vector)
  File "/global/scratch/marcin/N2V/N2V/embiggen/link_prediction.py", line 412, in create_edge_embeddings
    emb1 = node2vector_map[node1]
KeyError: 'BFO:0000067'
Expected behavior
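Every node that appears in any edge set sample (positive or negative; training, validation, or test) should also be present in the positive training set, so that every node has an embedding.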
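Until the sampling is fixed, a pre-flight check along the following lines could catch the problem before the long embedding run starts. This is only a sketch: the function names `nodes_of` and `find_unembedded_nodes` are hypothetical, the edge sets are assumed to be available as plain (source, target) pairs of node IDs, and the toy data is made up for illustration. It does not use the embiggen API.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # assumed representation: (source node ID, target node ID)


def nodes_of(edges: Iterable[Edge]) -> Set[str]:
    """Collect every node ID that appears as an endpoint of an edge."""
    return {node for edge in edges for node in edge}


def find_unembedded_nodes(
    pos_train_edges: Iterable[Edge],
    other_edge_sets: Dict[str, Iterable[Edge]],
) -> Dict[str, Set[str]]:
    """Return, per edge set, the nodes that are absent from the positive training set.

    Only edge sets with at least one missing node appear in the result.
    """
    known = nodes_of(pos_train_edges)
    missing: Dict[str, Set[str]] = {}
    for name, edges in other_edge_sets.items():
        extra = nodes_of(edges) - known
        if extra:
            missing[name] = extra
    return missing


# Toy data for illustration only; real edge sets would be read from the sample files.
pos_train = [("A", "B"), ("B", "C")]
samples = {
    "pos_test": [("A", "C")],             # all endpoints are in the training set
    "neg_train": [("A", "BFO:0000067")],  # introduces a node with no embedding
}

missing = find_unembedded_nodes(pos_train, samples)
for name, nodes in missing.items():
    print(f"{name}: {len(nodes)} node(s) not in positive training set: {sorted(nodes)}")
```

Running such a check right after the edge sets are generated would turn the late KeyError into an immediate, descriptive failure.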