开发者

How to initialize and "modify" a cyclic persistent data structure in Scala?

I have searched and found some info on this topic but the answers are either confusing or not applicable.

I have something like this:

class Thing (val name:String, val refs:IndexedSeq[开发者_Go百科Ref])
class Ref (val name:String, val thing:Thing)

Now, I want to say, load in a file, parse it and populate this data structure from it. It being immutable and cyclic, how might one do so?

Also, let's say I do get this data structure populated, now I want to modify it, like change rootThing.refs(3).name, how might one do that?


Thanks for the ideas posted here. At this point, I'm thinking that if one really wants persistent data structures for something like this, to think outside the box and consider what questions client code will need to ask. So instead of thinking of objects and fields, think of queries, indexes and such. To start with, I'm thinking in terms of: Is there a bidirectional multimap persistent data structure?


You can initialize a cyclic data structure of this form if you're prepared to modify it to introduce a degree of laziness,

scala> class Thing (val name:String, refs0: => IndexedSeq[Ref]) { lazy val refs = refs0 } ; class Ref (val name:String, thing0: => Thing) { lazy val thing = thing0 }
defined class Thing
defined class Ref

scala> val names = Vector("foo", "bar", "baz")                                                                                                                       
names: scala.collection.immutable.Vector[java.lang.String] = Vector(foo, bar, baz)

scala> val rootThing : Thing = new Thing("root", names.map { new Ref(_, rootThing) })
rootThing: Thing = Thing@1f7dab1

scala> rootThing.refs(1).name
res0: String = bar

However, you can't make it persistent: being cyclic, any change is visible via every element of the structure, so there are no opportunities for sharing between versions.


For a single cyclic reference, you can use lazy:

lazy val t: Thing = new Thing("thing", Vector(new Ref("ref", t)))

However obviously this gets complicated with many-to-many connections.

I don't know if a general purpose purely functional cyclic graph data structure exists. With acyclic graphs this would be easy as you could topologically sort it and then initialize it step by step.

Maybe using an indirection is an option for you, say to refer to objects through an identifier instead of the actual scala object reference?

case class ThingByID(id: Int, name: String, refs: IndexedSeq[RefByID])
case class RefByID(name: String, thingID: Int)

Then you could after loading your file collect the things by their ID into an immutable map (e.g. collection.immutable.IntMap) and look them up when coming from a ref.

EDIT

Miles is right about the first case of the lazy val t. Indeed you need by-name parameters as in his answer.

class Thing(val name: String, val refs: IndexedSeq[Ref])
class Ref(val name: String, _thing: => Thing) { def thing = _thing }

val t: Thing = new Thing("thing", Vector(new Ref("ref", t)))


There's an alternative approach which requires rethinking how object associations are represented: instead of storing associations between objects inside the objects themselves (as typically done in OO code) add them afterwards in a separate layer as Maps.

Because all of the nodes in the object graph exist by the time associations are created, immutable bidrectional associations can be easily created using Maps.

scala> class Thing (val name:String)
defined class Thing

scala> class Ref (val name:String)
defined class Ref

scala> new Thing("Thing1")
res0: Thing = Thing@5c2bae98

scala> new Ref("Ref1")
res1: Ref = Ref@7656acfa       

scala> val thing2Ref = Map(res0 -> res1)
thing2Ref: scala.collection.immutable.Map[Thing,Ref] = Map(Thing@5c2bae98 -> Ref
@7656acfa)

scala> val ref2Thing = Map(res1 -> res0)
ref2Thing: scala.collection.immutable.Map[Ref,Thing] = Map(Ref@7656acfa -> Thing
@5c2bae98)

If you think about it, this approach is similar to how relational databases work. Multi-valued associations between tables are not stored in the rows themselves, but in separate indexes. Even when association indexes are not present and so a query is resolved using a table scan, it is using the primary key indexes to locate all candidate rows for the result.

An advantage of this style is that associations can be added or changed without affecting the objects themselves, and different associations can be employed for different audiences/use-cases. A disadvantage is that an association map is not directly accessible on instances where the associations begins; it has to be passed around & provided separately.


Immutable data structures can be initialised entirely by their constructor, or you can accept a need to keep copying the structure as you change its properties. So to answer the first part of the question, you load data into the immutable data structure by defining a constructor that accepts all the information in your datum, or ensure you're aware of the cyclic subgraphs:

Cyclic data structures aren't necessarily entirely cyclic, I think. If you imagine the network of pointers that a single instance/state holds, you could have a subgraph containing a parent and child that point to each other, but no other cyclic structures. In this scenario, copying instance 1 to lazily create instance 2 with a different parent node (for example) would necessitate copying the parent and child nodes, as they form a cyclic subgraph. But the references held within the child other than the parent can continue to be references to the same immutable structures as the first instance.

For example, my class House has a reference to a Door, a Window and a Roof. A Door has a colour and a toHouse reference to the House, a Window has a size and a Roof has a pitch. So I create bobsHouse with a green Door, a large Window and a flat Roof. In fact, since all of these are immutable, there is theoretically only one large Window - all houses with large Windows have the same Window. A second instance, janesHouse, is just like bobsHouse, but with the gabled Roof. So if I say janesHouse = bobsHouse.withRoof(gabled), then I should get a new instance of House, with a new (also green) Door, and a new (gabled) Roof, but with the same Window.

So if janesHouse is evaluated lazily, it need only create a new House if the Door or Roof are referenced. If janesHouse.Window is requested, it need not create a new House at all - only bobsHouse is needed.

tl;dr: You can have persistent (lazy) cyclic data structures, but only if you can find non-cyclic subgraphs in it, i.e. it's not a chain.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜