How can I modify a column in an Incanter dataset?

I'd like to be able to transform an individual column in an incanter data set, and save the resulting data set to a new (csv) file. What is the simplest way to do that?

Essentially, I'd like to be able to map a function over a column in the data set, and replace the original column with this result.


You can define something like:

(defn map-data [dataset column f]
  (conj-cols (sel dataset :except-cols column)
             ($map f column dataset)))

and use it like this:

(def data (get-dataset :cars))
(map-data data :speed #(* % 2))

The only problem is that the column names change. I'll try to fix that when I have some free time.


Here are two similar functions, both column name and order preserving.

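(defn transform-column [col-name f data]
  ;; f receives the whole column and must return the new column values.
  (let [;; original names, with col-name moved to the end
        new-col-names (sort-by #(= % col-name) (col-names data))
        ;; the transformed column is appended as the last column
        new-dataset (conj-cols
                      (sel data :except-cols col-name)
                      (f ($ col-name data)))]
    ;; restore the names, then select columns in their original order
    ($ (col-names data) (col-names new-dataset new-col-names))))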

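(defn transform-rows [col-name f data]
  ;; f receives one value of the column at a time.
  (let [new-col-names (sort-by #(= % col-name) (col-names data))
        new-dataset (conj-cols
                      (sel data :except-cols col-name)
                      ($map f col-name data))]
    ;; same final step as transform-column: restore names and order
    ($ (col-names data) (col-names new-dataset new-col-names))))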

And here is an example illustrating the difference:

=> (def test-data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}])) 
=> (transform-column :a (fn [x] (map #(* % 2) x)) test-data)
[:a :b]
[2 2]
[6 4]

=> (transform-rows   :a #(* % 2) test-data)
[:a :b]
[2 2]
[6 4]

transform-rows is best for simple transformations, where each value is transformed independently, whereas transform-column is for cases where the transformation of one row depends on other rows (such as when normalizing a column).
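To make the column-wise case concrete, here is a minimal sketch of a function that needs the whole column at once. The name center is hypothetical (it is not part of Incanter), and it assumes a non-empty column of numbers. It subtracts the column mean from every value, which is one simple form of normalization.

```clojure
;; Hypothetical helper, not an Incanter function:
;; subtracts the column mean from every value in the column.
;; Assumes xs is a non-empty sequence of numbers.
(defn center [xs]
  (let [m (/ (reduce + xs) (count xs))]
    (map #(- % m) xs)))

(center [1 3])
;; => (-1 1)

;; With Incanter loaded and transform-column defined as above,
;; it would be passed in like this:
;; (transform-column :a center test-data)
```

A per-value function given to transform-rows cannot do this, because it never sees the other rows.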

Saving and loading CSV can be done with the standard Incanter functions, so a full example looks like:

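(use '(incanter core io))

;; read the CSV and name its columns :a and :b
(def data (col-names (read-dataset "data.csv") [:a :b]))

;; double every value in column :a and write the result to a new CSV
(save (transform-rows :a #(* % 2) data) "transformed-data.csv")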


Again: you may be able to work directly with the internal structure of the dataset.

user=> (defn update-column
         [dataset column f & args]
         (->> (map #(apply update-in % [column] f args) (:rows dataset))
           vec
           (assoc dataset :rows)))
#'user/update-column
user=> d
[:col-0 :col-1]
[1 2]
[3 4]
[5 6]

user=> (update-column d :col-1 str "d")
[:col-0 :col-1]
[1 "2d"]
[3 "4d"]
[5 "6d"]

Again, you should check to what extent this is part of the public API.
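The idea can be tried without Incanter at all. The sketch below uses a plain map as a stand-in for a dataset, on the assumption (implied by the code above) that rows are stored as maps under the :rows key. It is not a real Incanter dataset, and the stand-in d is made up for illustration.

```clojure
;; A plain map standing in for a dataset. This is an assumption about
;; the internal layout, NOT a real Incanter dataset.
(def d {:column-names [:col-0 :col-1]
        :rows [{:col-0 1 :col-1 2}
               {:col-0 3 :col-1 4}
               {:col-0 5 :col-1 6}]})

;; Same logic as update-column above: apply f (plus any extra args)
;; to the given key of every row map, then put the rows back.
(defn update-column
  [dataset column f & args]
  (->> (:rows dataset)
       (mapv #(apply update-in % [column] f args))
       (assoc dataset :rows)))

(:rows (update-column d :col-1 str "d"))
;; => [{:col-0 1, :col-1 "2d"} {:col-0 3, :col-1 "4d"} {:col-0 5, :col-1 "6d"}]
```

Because only the :rows entry is replaced, the column names and their order are left untouched.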


NOTE: this solution requires Incanter 1.5.3 or greater

For those who can use recent versions of Incanter...

add-column & add-derived-column were added to Incanter in 1.5.3 (pull request)

From the docs:

add-column

"Adds a column, with given values, to a dataset."

(add-column column-name values)

or

(add-column column-name values data)

Or you can use:

add-derived-column

"This function adds a column to a dataset that is a function of existing columns. If no dataset is provided, $data (bound by the with-data macro) will be used. f should be a function of the from-columns, with arguments in that order."

(add-derived-column column-name from-columns f)

or

(add-derived-column column-name from-columns f data)

A more complete example:

(use '(incanter core datasets))
(def cars (get-dataset :cars))

(add-derived-column :dist-over-speed [:dist :speed] (fn [d s] (/ d s)) cars)

(with-data (get-dataset :cars)
  (view (add-derived-column :speed**-1 [:speed] #(/ 1.0 %))))