
For graph learning edge set samples, all nodes in all edge sets need to be present in the positive training set #215

Open
@realmarcin

Description

Describe the bug

Graph learning edge set samples introduce nodes that are not in the positive training set. Because only the positive training set is embedded, these nodes have no entry in the node-to-vector embedding map, and looking them up raises a KeyError. The failure surfaces only after significant runtime, when edge embeddings are created.
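Because the failure only appears once `create_edge_embeddings` reaches the first unknown node, a cheap pre-flight check could catch it before any embedding is trained. This is a minimal sketch, assuming each edge set has already been loaded as a list of (subject, object) node-ID pairs; `nodes_of` and `missing_from_training` are hypothetical helpers, not functions from kg-covid-19 or embiggen.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical helpers (assumption): not part of kg-covid-19 or embiggen.
Edge = Tuple[str, str]  # (subject, object) node IDs


def nodes_of(edges: Iterable[Edge]) -> Set[str]:
    """Collect every node ID that appears as a subject or object."""
    nodes: Set[str] = set()
    for subject, obj in edges:
        nodes.add(subject)
        nodes.add(obj)
    return nodes


def missing_from_training(
    pos_train: Iterable[Edge], **other_sets: Iterable[Edge]
) -> Dict[str, List[str]]:
    """Return, per named edge set, the sorted node IDs absent from pos_train.

    Edge sets with no missing nodes are left out of the result, so an
    empty dict means every node can be looked up in the embedding map.
    """
    train_nodes = nodes_of(pos_train)
    report: Dict[str, List[str]] = {}
    for name, edges in other_sets.items():
        missing = nodes_of(edges) - train_nodes
        if missing:
            report[name] = sorted(missing)
    return report


# Usage: a held-out positive edge mentions a node never seen in training.
pos_train = [("A", "B"), ("B", "C")]
print(missing_from_training(pos_train, pos_test=[("A", "BFO:0000067")], neg_train=[("A", "C")]))
# {'pos_test': ['BFO:0000067']}
```

Running such a check right after the edge sets are written (and aborting if the report is non-empty) would turn a late KeyError into an immediate, explicit error listing every offending node.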

To Reproduce

Create KG-COVID edge set samples:

python run.py edges -n data/merged/merged-kg_nodes.tsv -e data/merged/merged-kg_edges.tsv

Run the embiggen link prediction pipeline; it fails with:

  File "runLinkPrediction_ppi.py", line 260, in <module>
    main(args)
  File "runLinkPrediction_ppi.py", line 254, in main
    linkpred(pos_train_graph, pos_valid_graph, pos_test_graph, neg_train_graph, neg_valid_graph, neg_test_graph)
  File "runLinkPrediction_ppi.py", line 178, in linkpred
    lp.prepare_edge_and_node_labels()
  File "/global/scratch/marcin/N2V/N2V/embiggen/link_prediction.py", line 146, in prepare_edge_and_node_labels
    node2vector_map=self.map_node_vector)
  File "/global/scratch/marcin/N2V/N2V/embiggen/link_prediction.py", line 412, in create_edge_embeddings
    emb1 = node2vector_map[node1]
KeyError: 'BFO:0000067'

Expected behavior

Every node in every edge set sample (positive and negative; training, validation, and test) should also be present in the positive training set, so that it has an embedding.
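One way to meet this when generating the samples is sketched below, assuming the positive edges are available in memory as (subject, object) pairs and treating edges as undirected. It swaps in a degree-aware holdout, which is not necessarily what `run.py edges` does: an edge is held out for validation/test only if both endpoints keep at least one other training edge, and negative edges are drawn only from nodes that occur in the training edges. `split_keeping_nodes` and `sample_negatives` are hypothetical names.

```python
import random
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical helpers (assumption): not part of kg-covid-19 or embiggen.
Edge = Tuple[str, str]


def split_keeping_nodes(
    edges: List[Edge], holdout_fraction: float, seed: int = 0
) -> Tuple[List[Edge], List[Edge]]:
    """Split positive edges into (train, held_out) so that every node in
    `edges` still appears in at least one training edge."""
    rng = random.Random(seed)
    # Number of not-yet-held-out edge endpoints per node.
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    shuffled = list(edges)
    rng.shuffle(shuffled)
    target = int(len(edges) * holdout_fraction)
    train: List[Edge] = []
    held_out: List[Edge] = []
    for u, v in shuffled:
        # Hold out only if both endpoints keep another training edge;
        # self-loops always stay in training.
        if len(held_out) < target and u != v and degree[u] > 1 and degree[v] > 1:
            degree[u] -= 1
            degree[v] -= 1
            held_out.append((u, v))
        else:
            train.append((u, v))
    return train, held_out


def sample_negatives(
    train: List[Edge], positives: Set[Edge], n: int, seed: int = 0, max_tries: int = 100_000
) -> List[Edge]:
    """Draw n node pairs that are not positive edges (in either direction),
    using only nodes that appear in the positive training edges."""
    rng = random.Random(seed)
    nodes = sorted({x for edge in train for x in edge})
    negatives: Set[Edge] = set()
    for _ in range(max_tries):
        if len(negatives) >= n:
            break
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u == v or (u, v) in positives or (v, u) in positives or (v, u) in negatives:
            continue
        negatives.add((u, v))
    if len(negatives) < n:
        raise ValueError(f"could only sample {len(negatives)} of {n} negative edges")
    return sorted(negatives)
```

Pass all positive edges (training and held-out) as `positives`, so that no negative sample collides with a held-out positive. The holdout may return fewer edges than `holdout_fraction` requests when the graph has many degree-1 nodes; that is the price of keeping every node embeddable.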

Version

145c7bb

Additional context


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests


Repository: Knowledge-Graph-Hub/kg-covid-19