开发者

Database Functional Programming in Clojure

"It is temp开发者_如何学JAVAting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow

I need to write a tool to dump a large hierarchical (SQL) database to XML. The hierarchy consists of a Person table with subsidiary Address, Phone, etc. tables.

  • I have to dump thousands of rows, so I would like to do so incrementally and not keep the whole XML file in memory.

  • I would like to isolate non-pure function code to a small portion of the application.

  • I am thinking that this might be a good opportunity to explore FP and concurrency in Clojure. I can also show the benefits of immutable data and multi-core utilization to my skeptical co-workers.

I'm not sure how the overall architecture of the application should be. I am thinking that I can use an impure function to retrieve the database rows and return a lazy sequence that can then be processed by a pure function that returns an XML fragment.

For each Person row, I can create a Future and have several processed in parallel (the output order does not matter).

As each Person is processed, the task will retrieve the appropriate rows from the Address, Phone, etc. tables and generate the nested XML.

I can use a a generic function to process most of the tables, relying on database meta-data to get the column information, with special functions for the few tables that need custom processing. These functions could be listed in a map(table name -> function).

Am I going about this in the right way? I can easily fall back to doing it in OO using Java, but that would be no fun.

BTW, are there any good books on FP patterns or architecture? I have several good books on Clojure, Scala, and F#, but although each covers the language well, none look at the "big picture" of function programming design.


Ok, cool, you're using this as an opportunity to showcase Clojure. So, you want to demonstrate FP and concurrency. Roger that.

To wow your interlocutors I would make a point to demonstrate:

  • Performance of your program using a single thread.
  • How your program's performance increases as you increase the number of threads.
  • How easy it is to take your program from single to multi-threaded.

You might create a function to dump a single table to an XML file.

(defn table-to-xml [name] ...)

With that you can work out all or your code for the core task of converting your relational data to XML.

Now that you've solved the core problem see if throwing more threads at it will increase your speed.

You might modify table-to-xml to accept an additional parameter:

(defn table-to-xml [name thread-count] ...)

This implies that you have n threads working on one table. In this case every thread might processes every nth row. A problem with putting multiple threads on one table is that each thread is going to want to write to the same XML file. That bottleneck may make the strategy useless, but it's worth a shot.

If creating one XML file per table is acceptable then spawning one thread per table would likely be an easy win.

(map #(future (table-to-xml %)) (table-names))

Using just a one-to-one relationship between tables, files and threads: as a guideline, I would expect your code to not contain any refs or dosyncs and the solution should be pretty straight forward.

Once you start spawning multiple threads per table you are adding complexity and may not see much of a performance increase.

In any case you would likely have one or two queries per table for getting values and meta-data. Regarding your comment about not wanting to load all the data in memory: Each thread would only be processing one row at a time.

Hope that helps!

Given your comment here's some pseudo-ish code that might help:

(defn write-to-xml [person]
  (dosync
   (with-out-append-writer *path*
     (print-person-as-xml))))

(defn resolve-relation [person table-name one-or-many]
  (let [result (query table-name (:id person))]
    (assoc person table-name (if (= :many one-or-many)
                               result
                               (first result)))))

(defn person-to-xml [person]
  (write-to-xml
   (-> person
       (resolve-relation "phones" :many)
       (resolve-relation "addresses" :many))))

(defn get-people []
  (map convert-to-map (query-db ...)))

(defn people-to-xml []
  (map (fn [person]
         (future (person-to-xml %)))
       (get-people)))

You might consider using the Java executors library to create a thread pool.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜