Wikipedia Graph Database Insertion
I am trying to create a database from dbpedia RDF triples. I have a table Categories
which contains all the Categories in wikipedia. To store categorizations i have created a table with child
and parent
f开发者_运维百科ields, both foreign keys to Categories
table. To load categories from NTriples iam using the following SQL Query
INSERT INTO CatToCat (`child`, `parent`)
values((SELECT id FROM Categories WHERE BINARY identifier='Bar'),
(SELECT id FROM Categories WHERE BINARY identifier='Bar'));
But the insertion is very slow.. inserting 2.5Million relationships would take very long time.. is there better way to optimize the query, schema??
you could try a Graph Database like Neo4j, with RDF layers on top, there is for instance the Tinkerpop SAIL implementation, see https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation
That should work a bit better than RDBMS, at least for Neo4j.
/peter
Consider loading
SELECT id, indentifier from Categories
into a hash table (or trie) on the client side, and using that to fill CatToCat. On a database the size of wikipedia, I'd expect to see a huge performance difference between constant time hash lookups and trie lookups (which are constant with respect to the number of different data items), andlog n
B-Tree lookups. (Of course, you need to have the memory available.)Consider using a single PreparedStatement, with parameter binding so that MySQL doesn't have to re-parse and re-optimize the query for every insertion.
You'll have to benchmark these to figure out how much of an improvement they actually are.
I solved the problem. Was some indexing issues. Made identifier in Categories unique and binary. I guess that sped up the two selects.
精彩评论