Learning JBoss Drools: what should my model be?
I'm learning JBoss Drools and I'm playing with the genetic data from the HapMap project (http://hapmap.ncbi.nlm.nih.gov/genotypes/latest/forward/non-redundant/). Each file in this directory is a table with the individuals across the top, the positions on the genome down the left, and the observed mutations for each individual/position pair.
Using Drools, I'd like to find potential errors in these files (e.g. a child who has none of his parents' mutations).
1) I want to load this data into Drools. It can be a large amount of data (e.g. genotypes_chr2_YRI_r27_nr.b36_fwd.txt.gz is 20 MB gzipped). Will the data be stored in memory, does Drools store it somewhere else, or should I use a persistence system?
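The check behind that example can be sketched in plain Java before it is turned into a rule: a child's genotype is consistent with the parents when one allele can have come from each of them. This is a minimal sketch, not Drools code. It assumes genotypes are two-letter strings such as "AG" and that a call containing 'N' (e.g. "NN") means "not called"; both are assumptions about the file format, and the class name MendelianCheck is hypothetical.

```java
final class MendelianCheck {
    /**
     * True if the child's two alleles can be explained as one allele from each
     * parent. Genotypes are two-letter strings such as "AG"; a call containing
     * 'N' (assumed to mean "not called") is treated as uninformative.
     */
    static boolean isConsistent(String child, String mother, String father) {
        if (isMissing(child) || isMissing(mother) || isMissing(father)) {
            return true; // a missing call cannot contradict anything
        }
        char a = child.charAt(0);
        char b = child.charAt(1);
        // Either a came from the mother and b from the father, or the other way round.
        return (carries(mother, a) && carries(father, b))
            || (carries(mother, b) && carries(father, a));
    }

    /** True if the genotype contains the given allele at least once. */
    static boolean carries(String genotype, char allele) {
        return genotype.indexOf(allele) >= 0;
    }

    /** True for null, malformed, or uncalled genotypes. */
    static boolean isMissing(String genotype) {
        return genotype == null || genotype.length() != 2 || genotype.indexOf('N') >= 0;
    }
}
```

For example, a child "AG" with parents "AA" and "GG" is consistent, while a child "AA" with a "GG" father is not. In Drools, the same condition would become the constraint of a rule that matches a child's call and both parents' calls at one position; since the genotype table only lists individuals, the parent/child relationships would have to come from another source.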
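However the facts end up being stored, the file itself does not have to be read into memory in one go. Below is a minimal sketch using the JDK's java.util.zip.GZIPInputStream to decompress and parse line by line. It assumes the usual HapMap layout of 11 metadata columns (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode) followed by one genotype column per individual; that layout should be checked against the real header line. HapMapReader and Call are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

final class HapMapReader {
    // Assumed layout: rs# alleles chrom pos strand assembly# center protLSID
    // assayLSID panelLSID QCcode, then one genotype column per individual.
    static final int FIXED_COLUMNS = 11;

    /** One genotype call, shaped like the question's String-keyed ObservedMutation. */
    record Call(String individualName, String positionName,
                String chromosome, int position, String observed) {}

    /** Decompresses and parses line by line, handing each call to sink as soon as it is read. */
    static void read(InputStream gzipped, Consumer<Call> sink) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.US_ASCII))) {
            String header = in.readLine();
            if (header == null) {
                return; // empty file
            }
            String[] individuals = header.trim().split("\\s+");
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.trim().split("\\s+");
                if (f.length <= FIXED_COLUMNS) {
                    continue; // blank or truncated row
                }
                String rsId = f[0];
                String chromosome = f[2];
                int position = Integer.parseInt(f[3]);
                for (int i = FIXED_COLUMNS; i < f.length && i < individuals.length; i++) {
                    sink.accept(new Call(individuals[i], rsId, chromosome, position, f[i]));
                }
            }
        }
    }
}
```

In a Drools program the sink could simply insert each call, e.g. call -> ksession.insert(call). Note that this only avoids holding the raw text: every inserted fact still stays in working memory.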
2) About the model:
I was thinking about inserting instances of the following classes into a StatefulKnowledgeSession:
class Individual
{
private String name;
//constructor, getters, setters etc...
}
class Position
{
private String name;
private String chromosome;
private int position;
//constructor, getters, setters etc...
}
class ObservedMutation
{
private String individualName;
private String positionName;
private String observed;
//constructor, getters, setters etc...
}
Or should ObservedMutation be:
class ObservedMutation
{
private Individual individual;
private Position position;
private String observed;
//constructor, getters, setters etc...
}
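With the object-reference version, it helps if every mutation at a given SNP points at one shared Position instance (and likewise for Individual), so that the Position inserted as a fact and the one held by each ObservedMutation really are the same object. A minimal sketch, assuming plain Java classes with the same fields as the second version above, nested inside a hypothetical HapMapModel class that hands out one instance per name:

```java
import java.util.HashMap;
import java.util.Map;

/** Object-reference variant of the model (the second ObservedMutation above). */
final class HapMapModel {
    static final class Individual {
        private final String name;
        Individual(String name) { this.name = name; }
        public String getName() { return name; }
    }

    static final class Position {
        private final String name; // e.g. the rs# identifier
        private final String chromosome;
        private final int position;
        Position(String name, String chromosome, int position) {
            this.name = name;
            this.chromosome = chromosome;
            this.position = position;
        }
        public String getName() { return name; }
        public String getChromosome() { return chromosome; }
        public int getPosition() { return position; }
    }

    static final class ObservedMutation {
        private final Individual individual;
        private final Position position;
        private final String observed;
        ObservedMutation(Individual individual, Position position, String observed) {
            this.individual = individual;
            this.position = position;
            this.observed = observed;
        }
        public Individual getIndividual() { return individual; }
        public Position getPosition() { return position; }
        public String getObserved() { return observed; }
    }

    // One shared instance per name: every mutation at a given SNP references the same Position.
    private final Map<String, Individual> individuals = new HashMap<>();
    private final Map<String, Position> positions = new HashMap<>();

    Individual individual(String name) {
        return individuals.computeIfAbsent(name, Individual::new);
    }

    Position position(String name, String chromosome, int pos) {
        return positions.computeIfAbsent(name, n -> new Position(n, chromosome, pos));
    }

    /** Builds a mutation whose Individual and Position are the shared instances. */
    ObservedMutation observe(String individualName, String positionName,
                             String chromosome, int pos, String observed) {
        return new ObservedMutation(individual(individualName),
                                    position(positionName, chromosome, pos), observed);
    }
}
```

The maps' values (all Individual and Position objects) would then be inserted into the session once each, alongside the mutations. Sharing instances also avoids storing a separate copy of each name in every mutation.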
Thanks for your suggestions.
Pierre
Update: my first test: http://plindenbaum.blogspot.com/2010/07/rules-engine-for-bioinformatics-playing.html
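Yes: when you insert the data, Drools stores it in memory. 20 MB is probably not a problem, although bear in mind that this is the gzipped size, so the uncompressed table and the Java objects built from it will be considerably larger. Just try it.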
It should be straightforward to write rules for the model classes you propose; the rules in the hapmap.drl example in your first test look reasonable. The choice between your two ObservedMutation classes is largely a matter of taste: both work, but each leads to a different DRL rule syntax. I would start with the second version and see how you get on. Perhaps the non-obvious thing about object properties (as in the second version of ObservedMutation) is that you may need to use the keyword this to refer to a bound object, e.g. $p in:
when
ObservedMutation($p : position)
Position(this == $p)
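To make those two patterns concrete, here is a plain-Java picture of which pairs they match together. It is only an illustration, not Drools code and not how the engine evaluates rules internally; JoinSketch and matching are hypothetical names. It shows two consequences: the Position must itself be in the list of facts, and a second Position object carrying the same name does not match. Whether DRL's == on objects compares with equals() or by identity is worth checking in the documentation for your Drools version; the sketch sidesteps the question by not overriding equals(), so both readings coincide.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinSketch {
    // Plain classes without an equals() override: equality and identity coincide.
    static final class Position {
        final String name;
        Position(String name) { this.name = name; }
    }

    static final class ObservedMutation {
        final String individualName;
        final Position position;
        final String observed;
        ObservedMutation(String individualName, Position position, String observed) {
            this.individualName = individualName;
            this.position = position;
            this.observed = observed;
        }
    }

    /** One entry per (mutation, Position fact) pair that both patterns would match. */
    static List<ObservedMutation> matching(List<ObservedMutation> mutations,
                                           List<Position> positionFacts) {
        List<ObservedMutation> out = new ArrayList<>();
        for (ObservedMutation m : mutations) {
            Position p = m.position;                 // ObservedMutation($p : position)
            for (Position candidate : positionFacts) {
                if (candidate.equals(p)) {           // Position(this == $p)
                    out.add(m);
                }
            }
        }
        return out;
    }
}
```

A mutation whose Position was never inserted therefore produces no match, which is an easy mistake to make when the model holds object references.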
I think it should be the second one. I'd prefer references to domain objects over plain String identifiers.