Triplestore for Large Datasets [closed]
Closed 8 years ago.
I am looking for a good triplestore to use for large datasets. It should:
- Scale well (millions of triples)
- Have a Java interface
You should consider using the OpenLink Virtuoso store. It is available under an open-source license and scales to billions of triples. You can use it via the Sesame and Jena APIs.
See here for an overview of large-scale triple stores. Virtuoso is definitely easier to set up than BigData. Besides that, I have used the Sesame NativeStore, which doesn't scale too well.
4Store is also a good choice, although I haven't used it. One benefit of Virtuoso over 4Store is that you can easily mix standard relational models with RDF, since Virtuoso is a relational database under the hood.
4store: Scalable RDF storage
Quoting 4store Web ...
4store's main strengths are its performance, scalability and stability. It does not provide many features over and above RDF storage and SPARQL queries, but if you are looking for a scalable, secure, fast and efficient RDF store, then 4store should be on your shortlist.
Personally, I have tested 4store with very large databases (up to 2 billion triples) with very good results. 4store is written in C, runs on 64-bit Linux/Unix platforms, and the current version, 1.1.1, partially implements SPARQL 1.1.
4store can be deployed on a cluster of commodity servers, which may boost the performance of your queries; assertion throughput can reach 100 KTriples/second. Even on a single server you will get quite decent performance.
Here at the University of Southampton it is our choice for very big datasets in research projects, and also for our Webmaster team; see Data Stores for Southampton and ECS Open Data.
There is also a list of the libraries you can use to query and administer 4store: Client Libraries. In addition, 4store's IRC channel has an active community of users who will help if you run into any issues.
If you are a Linux/Unix user, 4store is definitely a good choice.
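To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Java. It assumes the peak rate of 100 KTriples/second is sustained for the whole import, which is an assumption on my part and not something 4store promises; the class and method names are made up for illustration.

```java
public class LoadTimeEstimate {
    // Hours needed to assert `triples` triples at a constant rate.
    // Assumption: the rate stays flat for the whole load (real imports rarely do).
    public static double hoursToLoad(long triples, long triplesPerSecond) {
        return (double) triples / triplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long triples = 2_000_000_000L; // the 2 billion triples mentioned above
        long rate = 100_000L;          // 100 KTriples/second, the quoted peak
        System.out.println(hoursToLoad(triples, rate) + " hours");
    }
}
```

Under that assumption the load takes 20,000 seconds, roughly five and a half hours. Treat it as a lower bound, since a single server will likely run below the quoted peak.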
I would also recommend 4store, but in the spirit of full disclosure, I was the lead architect :)
If you want to take advantage of the standardisation of RDF stores, then you should look to use a Java library that implements SPARQL, rather than one that exposes a native Java API.
Otherwise you could end up being stuck with whatever store you choose first, due to the effort of moving between them, which is typical SQL migration hell.
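As a minimal sketch of what store-independent access looks like: under the SPARQL Protocol a query is an HTTP request against an endpoint, with the query text passed in the `query` parameter, so only the endpoint URL changes when you switch stores. The endpoint below is a placeholder I picked for illustration (your store's documentation will give the real one), and the class and method names are hypothetical. It uses only the JDK (Java 11+).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SparqlEndpointUrl {
    // Builds a SPARQL Protocol GET URL: <endpoint>?query=<URL-encoded query>.
    // Nothing here is specific to one triplestore.
    public static URI buildQueryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed endpoint; replace with whatever your store exposes.
        String endpoint = "http://localhost:8890/sparql";
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(buildQueryUri(endpoint, query));
    }
}
```

You would then send the resulting URI with `java.net.http.HttpClient` (or any HTTP client) and an `Accept` header such as `application/sparql-results+json`. Moving to a different store means changing the endpoint string, not the code.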
I am personally quite happy with GraphDB, which runs quite well on medium hardware (a server with 256 GB of RAM) with 15 billion triples. It is accessible via both the Sesame and Jena interfaces (although the Jena one is beta-ish).
If you can afford it, an Oracle 12c instance is not bad, and it might fit in with an existing Oracle infrastructure (backups etc.).
Virtuoso 7.1 scales very well and can deal with humongous data volumes at reasonable cost. Unfortunately, its SPARQL standards compliance is spotty.
@Steve - don't know how to comment so I guess I am going to answer 2 questions at once.
JDBC driver for SPARQL below:
http://code.google.com/p/jdbc4sparql/
It supports the SPARQL Protocol and SPARUL (over the SPARQL protocol as an update, not over the SPARUL protocol).
@myahya
4Store is highly recommended, so worth appraising as a candidate.
Virtuoso also has native JDBC drivers and supports large datasets (up to 12 billion triples):
www.openlinksw.com/wiki/main/Main/
Also, Oracle have something, but be prepared to pay big bucks:
http://www.oracle.com/technetwork/database/options/semantic-tech/index.html
In addition to 4Store, Virtuoso, and Owlim, Bigdata is also worth looking at.