Database Functional Programming in Clojure

2023-02-02 20:17

"It is temp开发者_如何学JAVAting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow

I need to write a tool to dump a large hierarchical (SQL) database to XML. The hierarchy consists of a Person table with subsidiary Address, Phone, etc. tables.

I have to dump thousands of rows, so I would like to do so incrementally and not keep the whole XML file in memory.
I would like to isolate non-pure function code to a small portion of the application.
I am thinking that this might be a good opportunity to explore FP and concurrency in Clojure. I can also show the benefits of immutable data and multi-core utilization to my skeptical co-workers.

I'm not sure what the overall architecture of the application should be. I am thinking that I can use an impure function to retrieve the database rows and return a lazy sequence, which can then be processed by a pure function that returns an XML fragment.

For each Person row, I can create a Future and have several processed in parallel (the output order does not matter).

As each Person is processed, the task will retrieve the appropriate rows from the Address, Phone, etc. tables and generate the nested XML.

I can use a generic function to process most of the tables, relying on database metadata to get the column information, with special functions for the few tables that need custom processing. These functions could be listed in a map (table name -> function).
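
A minimal sketch of that lookup, assuming rows arrive as Clojure maps. The handler names and the "phone" table are hypothetical, and XML escaping is left out for brevity:

```clojure
(require '[clojure.string :as str])

;; Hypothetical generic handler: emits one child element per column.
;; No XML escaping is done here, for brevity.
(defn generic-row->xml [table row]
  (str "<" table ">"
       (str/join (for [[k v] row]
                   (str "<" (name k) ">" v "</" (name k) ">")))
       "</" table ">"))

;; Hypothetical custom handler for a table that needs special treatment.
(defn phone-row->xml [_table row]
  (str "<phone type=\"" (:type row) "\">" (:number row) "</phone>"))

;; table name -> function; tables not listed fall back to the generic handler.
(def handlers {"phone" phone-row->xml})

(defn row->xml [table row]
  ((get handlers table generic-row->xml) table row))
```

Calling `(row->xml "address" {:city "Oslo"})` goes through the generic handler, while `(row->xml "phone" ...)` picks up the custom one.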

Am I going about this in the right way? I can easily fall back to doing it in OO using Java, but that would be no fun.

BTW, are there any good books on FP patterns or architecture? I have several good books on Clojure, Scala, and F#, but although each covers the language well, none look at the "big picture" of functional programming design.

Ok, cool, you're using this as an opportunity to showcase Clojure. So, you want to demonstrate FP and concurrency. Roger that.

To wow your interlocutors I would make a point to demonstrate:

Performance of your program using a single thread.
How your program's performance increases as you increase the number of threads.
How easy it is to take your program from single to multi-threaded.
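
For the last point, one possible demonstration is swapping `map` for `pmap`. This is only a sketch: `slow-work` is a hypothetical stand-in for the real per-row processing, and the sleep simulates a database round trip.

```clojure
;; Stand-in for the real per-row work; the sleep simulates a database round trip.
(defn slow-work [x]
  (Thread/sleep 10)
  (* x x))

;; Single-threaded baseline.
(defn run-serial [xs]
  (doall (map slow-work xs)))

;; Same code with pmap, which applies the function on several threads.
(defn run-parallel [xs]
  (doall (pmap slow-work xs)))
```

Wrapping each call in `time`, for example `(time (run-serial (range 20)))` and `(time (run-parallel (range 20)))`, gives a rough side-by-side comparison. In a standalone script, call `(shutdown-agents)` at the end so the JVM exits promptly.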

You might create a function to dump a single table to an XML file.

(defn table-to-xml [name] ...)

With that you can work out all of your code for the core task of converting your relational data to XML.

Now that you've solved the core problem see if throwing more threads at it will increase your speed.

You might modify table-to-xml to accept an additional parameter:

(defn table-to-xml [name thread-count] ...)

This implies that you have n threads working on one table. In this case, every thread might process every nth row. A problem with putting multiple threads on one table is that each thread is going to want to write to the same XML file. That bottleneck may make the strategy useless, but it's worth a shot.
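
The every-nth-row split could be sketched like this, assuming the rows are already available as a sequence. Both `rows-for-worker` and `process-in-parallel` are hypothetical helpers, and `process-row` stands in for whatever converts a row to XML:

```clojure
;; Worker i (0-based) of n takes rows i, i+n, i+2n, ...
(defn rows-for-worker [rows n i]
  (take-nth n (drop i rows)))

;; Runs one future per worker and waits for all of them.
;; Returns one vector of results per worker.
(defn process-in-parallel [process-row rows n]
  (let [workers (mapv (fn [i]
                        (future (mapv process-row (rows-for-worker rows n i))))
                      (range n))]
    (mapv deref workers)))
```

This only covers splitting the work. Writing the results to a shared file still has to be serialized somewhere, which is the bottleneck mentioned above.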

If creating one XML file per table is acceptable then spawning one thread per table would likely be an easy win.

;; doall forces the lazy seq so that every future is actually started
(doall (map #(future (table-to-xml %)) (table-names)))

With a simple one-to-one relationship between tables, files, and threads, I would expect your code, as a guideline, not to contain any refs or dosyncs, and the solution should be pretty straightforward.

Once you start spawning multiple threads per table you are adding complexity and may not see much of a performance increase.

In any case you would likely have one or two queries per table for getting values and meta-data. Regarding your comment about not wanting to load all the data in memory: Each thread would only be processing one row at a time.
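
As a sketch of that row-at-a-time flow, assume `rows` is a lazy sequence coming from the database layer and `row->xml` is a pure conversion function (both are hypothetical here). Whether the rows are really streamed depends on the database driver and how the query is run.

```clojure
(import '[java.io StringWriter Writer])

;; Writes one fragment per row as it is produced. doseq does not retain the
;; head of the sequence, so rows already written can be garbage collected.
(defn write-rows! [^Writer out row->xml rows]
  (doseq [row rows]
    (.write out ^String (row->xml row))
    (.write out "\n")))
```

For a quick check without a database, pass a `StringWriter` and a small in-memory sequence, for example `(write-rows! (StringWriter.) #(str "<person id=\"" (:id %) "\"/>") [{:id 0} {:id 1}])`.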

Hope that helps!

Given your comment here's some pseudo-ish code that might help:

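(def write-lock (Object.))

(defn write-to-xml [person]
  ;; locking, not dosync: dosync coordinates refs and does not guard file I/O
  (locking write-lock
    (with-open [w (clojure.java.io/writer *path* :append true)]
      (binding [*out* w]
        (print-person-as-xml person)))))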

(defn resolve-relation [person table-name one-or-many]
  (let [result (query table-name (:id person))]
    (assoc person table-name (if (= :many one-or-many)
                               result
                               (first result)))))

(defn person-to-xml [person]
  (write-to-xml
   (-> person
       (resolve-relation "phones" :many)
       (resolve-relation "addresses" :many))))

(defn get-people []
  (map convert-to-map (query-db ...)))

(defn people-to-xml []
  ;; doall forces the lazy seq so that every future starts immediately
  (doall (map (fn [person]
                (future (person-to-xml person)))
              (get-people))))

You might consider using the Java executors library to create a thread pool.
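
A minimal sketch of that idea using `java.util.concurrent`. The helper name `pmap-with-pool` is made up for illustration; in the code above, `f` would be something like `person-to-xml`.

```clojure
(import '[java.util.concurrent Executors ExecutorService Future])

;; Runs (f item) for every item on a fixed-size pool and returns the results
;; in input order. Clojure functions implement Callable, so they can be
;; handed to invokeAll directly.
(defn pmap-with-pool [pool-size f items]
  (let [^ExecutorService pool (Executors/newFixedThreadPool pool-size)]
    (try
      (let [tasks (mapv (fn [item] (fn [] (f item))) items)]
        ;; invokeAll blocks until every task has completed
        (mapv (fn [^Future fut] (.get fut)) (.invokeAll pool tasks)))
      (finally
        (.shutdown pool)))))
```

Compared with one `future` per person, a fixed pool caps the number of concurrent database queries at `pool-size`.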
