Does this scenario equate to any well known computer science problems?
I'm working on a problem in the field of survey data integration, but it's easier to describe with an artificial analogy. Since the setup is a bit lengthy I'll pose my questions at the outset:
Does this scenario equate to any computer science problems whose solutions I could simply borrow and adapt?
If not, what approaches might you suggest? I'm not asking anyone to solve the problem; rather, just hoping to be pointed in one or more promising directions.
Imagine that ecosystems have creatures, which consist of one or more molecules, which consist of one or more atoms. To be viable creature, its molecules must collectively use exactly one of each type of atom. In the example below, note that every creature uses all four atoms just once. Also worth noting is that the ordering of molecules within creatures and of atoms within molecules is irrelevant.
Atoms in the universe: a b c d.
Ecosystem X
creature x1
molecule 1: a b
molecule 2: c
molecule 3: d
creature x2
molecule 4: a b c
molecule 5: d
creature x3
molecule 6: a b c d
Ecosystem Y
creature y4
molecule 7: a b
molecule 8: c
molecule 9: d
creature y5
molecule 10: a b
molecule 11: c d开发者_StackOverflow中文版
Given two ecosystems, my task is to produce "pairings". A pairing consists of a set of molecules (or molecule combinations) from one ecosystem that map to equivalent molecules (or molecule combinations) from the other ecosystem. Equivalence is determined by the underlying atoms. Like creatures, pairings are not viable unless each of the two sets of molecules (one from each ecosystem) uses all of the atoms exactly once. Here are some (not all) of the pairings from the example above:
# A direct mapping from the molecules of creature x1 to those of y4.
m1 = m7
m2 = m8
m3 = m9
# As above, but substitute m10 for m7.
m1 = m10
m2 = m8
m3 = m9
# We can combine molecules.
m4 = m7 + m8
m5 = m9
# Another combination.
m1 = m10
m2 + m3 = m11
In the real problem domain, there could anywhere from 2 to 100 atoms in play (with corresponding variety in molecule sizes) and a couple dozen creatures per ecosystem. Also, it's possible for two ecosystems to produce no viable pairings. In that case, my Python application will eventually need to suggest approximate pairings (a list of the molecule combinations that pair up, followed by a listing of the stragglers from each ecosystem).
This smells like some flavor of covering problem.
- Index (hash) molecules by their atom subsets, producing mappings like
{a, b} -> {m1, m7, m10}
- Select an ecosystem and, by enumerating partitions, discover and index atom subsets with their other-ecosystem expansions (such as
{a, b, c} -> {{{a, b}, {c}}}
form4 = m7 + m8
.) - Discard any atom subsets that don't have an expansion (understanding that
m1 = m7
counts as an expansion.) - From the remainder, enumerate partitions of the alphabet (set of all atoms.) From step 3, we know that any discovered partitions will be translatable into potentially many partitions in the other ecosystem, via the mapping already computed.
- Select the other ecosystem and repeat steps 2-4.
- De-dupe results (possibly by accumulating them in a hash set.)
- Expand partitions composed of subsets of atoms back into collections of molecules with the mapping built in step 1.
The one part that seems tricky off-hand is the sub-routine that accepts a collection of subsets and enumerates the constructible partitions of some target set. Depending on the exact semantics, that may in fact be NP-hard.
精彩评论