We want to apply SNA (Social Network Analysis) measures to a Knowledge Graph to help novice users explore DBpedia more efficiently, without requiring them to master query languages or graph structures.
Proposal: Social Knowledge Graph: Employing SNA Measures on a Knowledge Graph
When novice users query DBpedia, the information they actually want is often buried in a flood of query results. In this project, we leverage the DBpedia Knowledge Graph to develop a graph-query tool that helps end users obtain information relevant to their request. We give the user a subgraph in which the queried concept/entity sits at the center, surrounded by its most important neighbors (e.g., the top 5 or top 10 in terms of Social Network Analysis measures).
The image above shows what our system does after a user makes a query.
- Users input the central entity they want to query.
- Our system converts the user's input into a SPARQL query (see the sketch after this list).
- The query is sent to the public DBpedia endpoint.
- The importance of all retrieved entities is calculated.
- The top-10 most important entities are selected and plotted as a graph.
- Users can click entities in the graph to continue querying...
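As a rough sketch of steps 2 and 3 (using the SPARQLWrapper library; the exact query the system issues may differ):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def fetch_neighbors(entity: str):
    """Fetch all (predicate, object) pairs around a DBpedia resource."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT ?p ?o WHERE {{
            <http://dbpedia.org/resource/{entity}> ?p ?o .
        }}
    """)
    results = sparql.query().convert()
    return [(b["p"]["value"], b["o"]["value"])
            for b in results["results"]["bindings"]]

# e.g. fetch_neighbors("Albert_Einstein")
```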
This section provides a more detailed explanation of the above process.
- Handling user input (see the sketch after this list)
  - Replace spaces with underscores
  - Backslash-escape non-alphanumeric characters (anything other than letters, digits, and underscores)
- Cleaning the data returned by the public endpoint
  - Keep only meaningful entities and relationships; drop, for example, time-related entities or relationships whose names contain "wiki"
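A minimal sketch of these preprocessing rules (the function names and exact filters are ours, not necessarily those of the released code):

```python
import re

def sanitize_input(name: str) -> str:
    """Turn user input into a DBpedia resource label: spaces become
    underscores, and anything outside letters/digits/underscores is
    backslash-escaped so the SPARQL query stays valid."""
    label = name.strip().replace(" ", "_")
    return re.sub(r"([^0-9A-Za-z_])", r"\\\1", label)

def is_meaningful(predicate: str, obj: str) -> bool:
    """Drop noisy triples: wiki-internal predicates and time literals."""
    if "wiki" in predicate.lower():
        return False
    if re.fullmatch(r"\d{4}(-\d{2}){0,2}", obj):  # looks like a date
        return False
    return True
```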
We leverage two methods to calculate the importance of entities: one ranks by degree, the other is a new method we propose ourselves.
- For the first method, a higher degree indicates that a node offers more room to expand and is therefore more helpful for users growing the graph (see the sketch after this list).
- Our proposed method consists of the following components (a toy sketch follows the next paragraph):
  - Clustering
    - Group similar relationships together
  - Normalization
    - Normalize scores within each cluster
  - Attenuation
    - Apply different levels of decay based on the normalized ranking
  - Ranking
    - Sort all nodes and output them
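As a rough illustration of the degree-based ranking (a minimal sketch with NetworkX; it assumes the fetched triples already cover a couple of hops, so neighbor degrees are informative):

```python
import networkx as nx

def top_k_by_degree(triples, center, k=10):
    """Rank the center's neighbors by degree in the fetched
    subgraph; higher degree means more room to keep expanding."""
    g = nx.Graph()
    for subj, pred, obj in triples:
        g.add_edge(subj, obj, label=pred)
    return sorted(g.neighbors(center), key=g.degree, reverse=True)[:k]
```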
Our proposed method, by contrast, performs well at removing data bias and at presenting as many dimensional attributes as possible.
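For a feel of how the four stages fit together, here is a toy, runnable stand-in. The clustering here groups by exact relationship label, whereas the real method (in the Colab notebook mentioned below) clusters by word similarity, and all scoring heuristics are our own placeholders:

```python
from collections import defaultdict

def rank_entities(neighbors, k=10):
    """Toy version of the proposed pipeline. `neighbors` is a list of
    (entity, relationship, degree) triples."""
    # 1. Clustering: group by relationship label -- a crude stand-in
    #    for clustering relationships by word similarity.
    clusters = defaultdict(list)
    for entity, rel, degree in neighbors:
        clusters[rel].append((entity, degree))

    scores = {}
    for members in clusters.values():
        top = max(degree for _, degree in members) or 1
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        for i, (entity, degree) in enumerate(ranked):
            normed = degree / top              # 2. Normalization in cluster
            scores[entity] = normed / (i + 1)  # 3. Attenuation by rank

    # 4. Ranking: sort all nodes and output the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```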
The following two figures show the different results obtained by the two methods above when Maxwell is the central node. As a concrete example of the output format, here are neighbor-relationship pairs returned around Albert Einstein:
{
"Einstein family": "rdf-schema#seeAlso",
"United States": "citizenship",
"German Empire": "birth place",
"Mileva Mari": "spouse",
"Fellow of the Royal Society": "award",
"Heinrich Friedrich Weber": "doctoral advisor",
"Physics": "fields",
"University of Oxford":"institution",
"Philosophy": "fields",
"General relativity": "famous",
"Alfred Kleiner": "doctoral advisor"
}
We can see that our proposed method surfaces more dimensions while still accounting for scalability (degree) and diversity (relationships).
Because of issues with the second method described below, the code we release uses degree as the ranking measure, but you can try the beta version here with Colab.
Note: even though we chose the seemingly simplest method, degree-based expansion, we did so after reading a good deal of literature and running experiments. Several common expansion methods are discussed in this paper.
Reasons why the new method was not applied in the final version:
- We cannot download the data locally, and compared to the degree method it takes much more time to compute.
- More importantly, relationship labels in DBpedia are not always natural-language words. A word dictionary has to be built manually for the word-similarity calculations to succeed, e.g., influencedBy -> influenced by.
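For predicates that are merely camelCase rather than opaque tokens, a simple regex split recovers the words (a sketch; genuinely opaque labels would still need the manual dictionary):

```python
import re

def split_camel_case(predicate: str) -> str:
    """e.g. 'influencedBy' -> 'influenced by'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", predicate).lower()
```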
Users can click on nodes in the graph for further exploration.
- input: Enter the name of an entity, e.g. James Clerk Maxwell or Albert Einstein.
- hover: Hover over an edge to see the specific relationship.
- click: Click an entity to expand it; the top-10 entities around the clicked entity are returned.
Tools and Frameworks used for developing this system:
- Flask framework (for backend operations and handling requests)
- Plotly and D3.js (for visualizations)
- NetworkX (for visualization during methodological exploration)
- VS Code (for efficient development and continuous integration)
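For the methodological exploration, the star subgraph around a queried entity can be drawn with NetworkX in a few lines (a minimal sketch; the web UI itself renders with Plotly/D3.js, and draw_star is a name of ours):

```python
import networkx as nx
import matplotlib.pyplot as plt

def draw_star(center, top_entities, relations):
    """Draw the center with its top-ranked neighbors,
    labelling each edge with its relationship."""
    g = nx.Graph()
    for entity in top_entities:
        g.add_edge(center, entity, label=relations[entity])
    pos = nx.spring_layout(g, seed=42)
    nx.draw_networkx(g, pos, node_color="lightblue")
    nx.draw_networkx_edge_labels(
        g, pos, edge_labels=nx.get_edge_attributes(g, "label"))
    plt.axis("off")
    plt.show()
```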
To run the system locally:
# clone the repository
git clone https://github.com/dbpedia/social-knowledge-graph.git
# install the Python dependencies
pip install -r requirements.txt
# start the Flask app
python app.py
We did not deploy the code on a server because the cloud service we purchased required a VPN connection to access the public endpoint, and the repeated errors we hit when downloading the data turned out to be storage problems.
The ideal setup would be to download the data so that network requests become local queries, and to precompute and store the results of both methods in advance, which would be efficient and stable for users. In that case, we could also adopt a more appealing way of expanding the graph.
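A sketch of that precomputation idea, with fetch_fn and rank_fn standing in for the functions sketched earlier and a hypothetical rankings.json file:

```python
import json

def precompute(entities, fetch_fn, rank_fn, path="rankings.json"):
    """Precompute the top-10 neighbors for each entity so the app
    can answer from a local file instead of the live endpoint."""
    cache = {entity: rank_fn(fetch_fn(entity)) for entity in entities}
    with open(path, "w") as f:
        json.dump(cache, f)

def lookup(entity, path="rankings.json"):
    """Serve a precomputed result; None if the entity is unknown."""
    with open(path) as f:
        return json.load(f).get(entity)
```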