Wikipedia Graph Database Insertion

2023-02-06 12:15 Q&A author:

I am trying to create a database from DBpedia RDF triples. I have a table Categories that contains all the categories in Wikipedia. To store categorizations I have created a table CatToCat with child and parent fields, both foreign keys to the Categories table. To load the categorizations from N-Triples I am using the following SQL query:

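-- one INSERT per triple: look up the child's id and the parent's id by identifier
-- ('Bar' is just the example value shown for both ends)
INSERT INTO CatToCat (`child`, `parent`)
VALUES ((SELECT id FROM Categories WHERE BINARY identifier = 'Bar'),
        (SELECT id FROM Categories WHERE BINARY identifier = 'Bar'));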

But the insertion is very slow: inserting 2.5 million relationships would take a very long time. Is there a better way to optimize the query or the schema?
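A minimal sketch of the setup described above, assuming SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL so it runs without a server; the helper insert_edge and the sample identifiers Foo and Bar are hypothetical. SQLite has no BINARY operator, but its default comparison is already case-sensitive, so a plain equality plays that role. The point is the one-INSERT-per-triple pattern: with no index on identifier, each of the two subselects has to scan Categories, so every insert gets slower as that table grows.

```python
import sqlite3

# Stand-in schema (SQLite instead of MySQL), as described in the question:
# identifier has no index, so each lookup by identifier scans the table.
SCHEMA = """
CREATE TABLE Categories (
    id         INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL
);
CREATE TABLE CatToCat (
    child  INTEGER REFERENCES Categories(id),
    parent INTEGER REFERENCES Categories(id)
);
"""


def insert_edge(conn, child_name, parent_name):
    """Hypothetical helper: one INSERT per triple, two subselects per row."""
    conn.execute(
        "INSERT INTO CatToCat (child, parent) VALUES ("
        "(SELECT id FROM Categories WHERE identifier = ?), "
        "(SELECT id FROM Categories WHERE identifier = ?))",
        (child_name, parent_name),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Categories (identifier) VALUES (?)",
                 [("Foo",), ("Bar",)])
insert_edge(conn, "Foo", "Bar")
print(conn.execute("SELECT child, parent FROM CatToCat").fetchall())  # [(1, 2)]
```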

You could try a graph database like Neo4j with an RDF layer on top; one option is the TinkerPop SAIL implementation, see https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation

That should work somewhat better than an RDBMS, at least with Neo4j.

/peter

Consider loading the result of SELECT id, identifier FROM Categories into a hash table (or trie) on the client side, and using that to fill CatToCat. On a database the size of Wikipedia, I'd expect a huge performance difference between constant-time hash or trie lookups (trie lookups are constant with respect to the number of distinct data items) and O(log n) B-tree lookups. (Of course, you need enough memory available.)
Consider using a single PreparedStatement with parameter binding, so that MySQL doesn't have to re-parse and re-optimize the query for every insertion.
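Both suggestions combined, as a sketch under the same assumption (SQLite via Python's sqlite3 standing in for MySQL; load_edges and the sample data are hypothetical). Python has no PreparedStatement class; the closest DB-API analogue used here is one parameterized statement passed to executemany, which binds each row's values instead of building a new SQL string per insert. Note one deliberate difference: pairs whose identifier is missing from Categories are skipped, whereas the original subselects would insert NULL for them.

```python
import sqlite3


def load_edges(conn, edges):
    """Fill CatToCat from (child_identifier, parent_identifier) pairs.

    Hypothetical helper; SQLite stands in for MySQL. `edges` would come
    from parsing the N-Triples file, which is not shown here.
    """
    # Suggestion 1: read Categories once into a client-side hash table
    # (identifier -> id), so no per-row SELECT is sent to the database.
    ids = dict(conn.execute("SELECT identifier, id FROM Categories"))

    # Resolve both ends in memory; pairs with an unknown category are skipped.
    rows = ((ids[c], ids[p]) for c, p in edges if c in ids and p in ids)

    # Suggestion 2: a single parameterized statement, bound once per row.
    with conn:  # commits once at the end (rolls back on error)
        conn.executemany(
            "INSERT INTO CatToCat (child, parent) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (id INTEGER PRIMARY KEY, identifier TEXT NOT NULL);
CREATE TABLE CatToCat (child  INTEGER REFERENCES Categories(id),
                       parent INTEGER REFERENCES Categories(id));
""")
conn.executemany("INSERT INTO Categories (identifier) VALUES (?)",
                 [("Foo",), ("Bar",), ("Baz",)])
load_edges(conn, [("Foo", "Bar"), ("Baz", "Bar"), ("Foo", "Missing")])
print(conn.execute("SELECT child, parent FROM CatToCat ORDER BY child").fetchall())
# [(1, 2), (3, 2)]
```

Because the lookups happen in the dictionary, this version does not depend on an index on identifier at all; the trade-off is holding every identifier in client memory, as the answer points out.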

You'll have to benchmark these to figure out how much of an improvement they actually are.

I solved the problem. It was an indexing issue: I made identifier in Categories unique and binary. I guess that sped up the two selects.
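This fits how MySQL treats the original query: BINARY identifier = 'Bar' casts the column, which typically keeps the optimizer from using an ordinary index on identifier, whereas a binary column with a UNIQUE index can answer the lookup directly. The sketch below only illustrates the general effect of adding that index, again assuming SQLite via Python's sqlite3 as a stand-in (lookup_plan is a hypothetical helper): SQLite's EXPLAIN QUERY PLAN reports a full scan of Categories without the index and an index search with it. On the real schema, MySQL's own EXPLAIN would be the way to confirm the same change.

```python
import sqlite3


def lookup_plan(conn):
    """Return SQLite's query-plan text for the per-row category lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM Categories WHERE identifier = ?",
        ("Bar",),
    ).fetchall()
    # The plan detail is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in rows]


# Before: identifier is a plain column with no index.
plain = sqlite3.connect(":memory:")
plain.execute(
    "CREATE TABLE Categories (id INTEGER PRIMARY KEY, identifier TEXT NOT NULL)"
)

# After: UNIQUE creates an index on identifier. SQLite's default BINARY
# collation compares byte-for-byte, standing in for MySQL's binary column.
indexed = sqlite3.connect(":memory:")
indexed.execute(
    "CREATE TABLE Categories "
    "(id INTEGER PRIMARY KEY, identifier TEXT NOT NULL UNIQUE)"
)

print(lookup_plan(plain))    # a SCAN of Categories
print(lookup_plan(indexed))  # a SEARCH using the UNIQUE index
```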
