How can I create a dendrogram in R using pre-clustered data created elsewhere?
I have clustering code written in Java, from which I can create a nested tree structure. For example, the following shows a tiny piece of the tree in which the two "isRetired" objects were clustered in the first iteration, and that group was then clustered with "setIsRetired" in the fifth iteration. The distance between the clustered objects is shown in parentheses.
|+5 (dist. = 0.0438171125324851)
  |+1 (dist. = 2.220446049250313E-16)
    |-isRetired
    |-isRetired
  |-setIsRetired
I would prefer to present my results in a more traditional dendrogram style, and it looks like R has some nice capabilities, but because I know very little about R, I am unclear on how to take advantage of them.
Is it possible for me to write out a tree structure to a file from Java, and then, with a few lines of R code, produce a dendrogram? From the R program, I'd like to do something like:
- Read from a file into a data structure (an "hclust" object?)
- Convert the data structure into a dendrogram (using "as.dendrogram"?)
- Display the dendrogram using "plot"
I guess the question boils down to whether R provides an easy way of reading from a file and converting that string input into an (hclust) object. If so, what should the data in the input file look like?
I think what you are looking for is phylog, a tree class provided by the ade4 package. You can write your tree to a file in Newick notation, read it back in R, and construct a phylog object, which you can easily visualize. The end of the webpage gives an example of how to do this. You might also want to consider phylobase. Although you don't need the entire functionality provided by these packages, you can piggyback on the constructs they use to represent trees and on their plotting capabilities.
EDIT: It looks like a similar question to yours has been asked before here providing a simpler solution. So basically the only thing you will have to code here is your Newick parser or a parser for any other representation you want to output from Java.
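For reference, here is a minimal sketch of what the Newick input for the fragment in the question might look like. It assumes each branch length is written as the difference between the merge distance of a node and that of its child (leaves sit at 0); that convention, and the `_1`/`_2` suffixes used to keep the leaf labels unique, are my assumptions and not something the question specifies. It uses `read.tree(text = ...)` from ape so that no file is needed:

```r
library(ape)  # install.packages("ape") if it is not installed yet

# Assumed encoding of the fragment from the question:
#   node 1 joins the two isRetired leaves at distance 2.220446049250313E-16
#   node 5 joins node 1 and setIsRetired at distance 0.0438171125324851
# Branch length = parent's merge distance minus child's merge distance.
newick <- paste0(
  "((isRetired_1:2.220446049250313E-16,",
  "isRetired_2:2.220446049250313E-16):0.0438171125324849,",
  "setIsRetired:0.0438171125324851);"
)

tree <- read.tree(text = newick)  # parse the Newick string into a "phylo" object
plot(tree)                        # draw it as a tree
```

On the Java side, the same string would simply be written to a file (one tree, terminated by `;`) and read with `read.tree(file = ...)`.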
The ape (Analyses of Phylogenetics and Evolution) package contains dendrogram drawing functionality, and it can read trees in Newick format. Because it is not part of the base R distribution, you'll need to install it first, e.g. with install.packages("ape"). In principle it is easy to use, e.g. the following commands produce a dendrogram:
> library("ape")
> gcPhylo <- read.tree(file = "gc.tree")
> plot(gcPhylo, show.node.label = TRUE)
My main complaint thus far is that there is little diagnostic information when there is trouble with the syntax of the file containing the tree information in Newick format. I've had success reading these same files with other tools (which in some cases, may be because the tools are forgiving of certain faults in the syntax).
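Since the parser says little about what is wrong, one workaround is to run a few cheap sanity checks on the Newick string before handing it over. The sketch below uses base R only; `check_newick` is a hypothetical helper of my own, not part of ape, and it only catches the most obvious faults (it will miscount if quoted labels contain parentheses):

```r
# Hypothetical helper: returns a character vector describing obvious problems,
# or an empty vector if none of the checks fail.
check_newick <- function(s) {
  s <- trimws(paste(s, collapse = ""))   # join lines, drop outer whitespace
  chars <- strsplit(s, "")[[1]]
  problems <- character(0)

  n_open <- sum(chars == "(")
  n_close <- sum(chars == ")")
  if (n_open != n_close) {
    problems <- c(problems,
                  sprintf("unbalanced parentheses: %d '(' vs %d ')'",
                          n_open, n_close))
  }
  if (!endsWith(s, ";")) {
    problems <- c(problems, "missing terminating ';'")
  }
  problems
}

check_newick("((a:1,b:1):2,c:3);")  # well-formed example
check_newick("((a:1,b:1):2,c:3")    # missing ')' and ';'
```

For a file, something like `check_newick(readLines("gc.tree"))` should work the same way.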
You can also produce a dendrogram using the phylog class from the ade4 package, as shown below.
> library(ade4)
> newickString <- paste(readLines("gc.tree"), collapse = "")
> gcPhylog <- newick2phylog(newickString)
> plot(gcPhylog, clabel.nodes=1)
Both can work with trees in Newick format and both have many plotting options.
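If you specifically want the hclust/dendrogram route sketched in the question, ape also provides an `as.hclust` method for its tree objects. As far as I know it only accepts trees that are rooted, binary, and ultrametric (all leaves at the same depth), so the sketch below checks those conditions first. The Newick string is an assumed encoding of the fragment from the question, with `_1`/`_2` suffixes added by me to keep the labels unique:

```r
library(ape)

# Assumed Newick encoding of the question's fragment (see assumptions above).
newick <- paste0(
  "((isRetired_1:2.220446049250313E-16,",
  "isRetired_2:2.220446049250313E-16):0.0438171125324849,",
  "setIsRetired:0.0438171125324851);"
)
tree <- read.tree(text = newick)

# Conditions the conversion is expected to require.
stopifnot(is.rooted(tree), is.binary(tree), is.ultrametric(tree))

hc <- as.hclust(tree)       # "phylo" -> "hclust"
dend <- as.dendrogram(hc)   # "hclust" -> "dendrogram"
plot(dend)                  # traditional dendrogram layout
```

I believe the heights reported by the conversion are distances between tips and not node depths, so it is worth comparing `hc$height` with your merge distances and rescaling the branch lengths written from Java if they differ.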