How can I provide public access to a subset of a database?
Background
My research group and I are developing a database to store our data, and we are building a software tool that simplifies access to these data. The database will hold data that have been published and that we would like to make available, alongside data that have not been published and that belong to other researchers.
Objective
We would like our work to be easily reproducible, and to this end we need to allow the public to run SELECT statements on the data. Possible solutions include:
- for each publication, create a subset of the database that can be freely downloaded (possibly in a virtual machine so that the dependencies of the software tool are met)
- for each publication, create a many-to-many lookup table that links data records to publications, and then provide public SELECT permissions to access these records. We could easily replicate the database for public use.
However, I have been told that even allowing wildcard statements compromises security, which is why I consider option 1 more plausible. Option 1 would also enable us to archive the database as it was used with a particular publication.
Update: to clarify, I want users to be able to reproduce the entire computational workflow, which requires SELECT statements that can join data tables with auxiliary data (such as covariates and experimental details) in lookup tables.
Question
What is the best way to provide public access to a subset of the database?
You can distribute subsets of the data as a SQLite database, that is, create a standalone data file that people can download to their own computers. Many scholars, economists, etc. use SQLite to share datasets because it is self-contained, cross-platform, and painless to install.
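As a rough illustration of this approach, the sketch below copies only the records linked to one publication into a standalone SQLite file. The table names (`traits`, `publication_records`), the columns, and the function `export_publication_subset` are hypothetical and not taken from the question; an in-memory SQLite database stands in for the real source database, whatever DBMS that actually is.

```python
import os
import sqlite3
import tempfile


def export_publication_subset(source, dest_path, publication_id):
    """Copy the records linked to one publication into a new SQLite file.

    Assumes a hypothetical schema: traits(id, site, value) and a lookup
    table publication_records(publication_id, trait_id).
    """
    rows = source.execute(
        "SELECT t.id, t.site, t.value "
        "FROM traits AS t "
        "JOIN publication_records AS pr ON pr.trait_id = t.id "
        "WHERE pr.publication_id = ? "
        "ORDER BY t.id",
        (publication_id,),
    ).fetchall()

    dest = sqlite3.connect(dest_path)
    try:
        dest.execute(
            "CREATE TABLE traits (id INTEGER PRIMARY KEY, site TEXT, value REAL)"
        )
        dest.executemany("INSERT INTO traits VALUES (?, ?, ?)", rows)
        dest.commit()
    finally:
        dest.close()
    return len(rows)


# Stand-in for the private source database (made-up example data).
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE traits (id INTEGER PRIMARY KEY, site TEXT, value REAL);
    CREATE TABLE publication_records (publication_id INTEGER, trait_id INTEGER);
    INSERT INTO traits VALUES (1, 'A', 1.5), (2, 'B', 2.5), (3, 'C', 3.5);
    INSERT INTO publication_records VALUES (10, 1), (10, 3), (11, 2);
    """
)

# Write the subset for publication 10 to a downloadable file.
public_path = os.path.join(tempfile.mkdtemp(), "publication_10.sqlite")
n_exported = export_publication_subset(source, public_path, 10)
```

The resulting file contains only the rows tied to that publication, so it can also serve as an archived snapshot of the data as used in the paper. Auxiliary lookup tables needed for joins would have to be copied the same way.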
Create views with appropriate access privileges, and users that can only access these views, but no underlying tables.
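A minimal sketch of the view part of this idea follows, again with a hypothetical schema (`traits`, `publication_records`) and a hypothetical view name `public_traits`. SQLite is used only so the example is runnable; it has no user accounts, so the privilege step is shown as a comment and would need to be adapted to whichever server DBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE traits (id INTEGER PRIMARY KEY, site TEXT, value REAL);
    CREATE TABLE publication_records (publication_id INTEGER, trait_id INTEGER);

    -- Made-up data: trait 2 is not linked to any publication (unpublished).
    INSERT INTO traits VALUES (1, 'A', 1.5), (2, 'B', 2.5), (3, 'C', 3.5);
    INSERT INTO publication_records VALUES (10, 1), (10, 3);

    -- The view exposes only records that appear in the lookup table.
    CREATE VIEW public_traits AS
    SELECT DISTINCT t.id, t.site, t.value
    FROM traits AS t
    JOIN publication_records AS pr ON pr.trait_id = t.id;
    """
)

# On a server DBMS such as PostgreSQL, the public account would then be
# granted access to the view only, for example (role name is hypothetical):
#   GRANT SELECT ON public_traits TO public_reader;
# This statement is not executed here because SQLite has no GRANT.

public_rows = conn.execute(
    "SELECT id, site, value FROM public_traits ORDER BY id"
).fetchall()
```

Queries against `public_traits` never see the unpublished row, and a public account with no privileges on `traits` itself could not bypass the view.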