
Fix unreadable PostScript tree output in R

I have a relatively complicated classification tree that I'm trying to output. The resulting postscript output looks very jumbled.

> fit = rpart(virility ~ friend_count  + recip_count + twitter_handles + has_email + 
                          has_bio + has_foursquare + has_linkedin + auto_tweet + 
                          interaction_visibility + site_own_cnt + site_rec_cnt + has_url +
                          has_linkedin_url + lb_cnt, + mob_own_cnt + mob_rec_cnt + 
                          twt_own_cnt + twt_rec_cnt, method="class", data=vir)
> fit
n= 9704 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 9704 3742 virile (0.39970092 0.60029908)  
   2) recip_count< 15.5 9610 3159 mule (0.52005469 0.47994531)  
     4) site_own_cnt< 0.5 7201 1372 mule (0.65423387 0.34576613)  
       8) friend_count< 2.5 6763  948 mule (0.69566613 0.30433387)  
        16) has_bio>=0.5 4030  601 mule (0.73743993 0.26256007) *
        17) has_bio< 0.5 2733  347 mule (0.57990315 0.42009685)  
          34) recip_count< 0.5 2496   88 mule (0.78000000 0.22000000) *
          35) recip_count>=0.5 237  167 virile (0.39201878 0.60798122) *
       9) friend_count>=2.5 438  424 mule (0.50293083 0.49706917)  
        18) lb_cnt< 2.5 427  344 mule (0.55208333 0.44791667)  
          36) has_foursquare< 0.5 401  257 mule (0.61353383 0.38646617)  
            72) twitter_handles>=0.5 382  210 mule (0.65742251 0.34257749) *
            73) twitter_handles< 0.5 19    5 virile (0.09615385 0.90384615) *
          37) has_foursquare>=0.5 26   16 virile (0.15533981 0.84466019) *
        19) lb_cnt>=2.5 11    5 virile (0.05882353 0.94117647) *
     5) site_own_cnt>=0.5 2409  827 virile (0.31637337 0.68362663)  
      10) recip_count< 0.5 1344  274 mule (0.62102351 0.37897649)  
        20) friend_count< 0.5 955   75 mule (0.81155779 0.18844221) *
        21) friend_count>=0.5 389  126 virile (0.38769231 0.61230769)  
          42) twitter_handles< 0.5 62    3 mule (0.93181818 0.06818182) *
          43) twitter_handles>=0.5 327   85 virile (0.30249110 0.69750890) *
      11) recip_count>=0.5 1065  378 virile (0.19989424 0.80010576) *
   3) recip_count>=15.5 94  319 virile (0.11474820 0.88525180)  
     6) friend_count< 2.5 40  265 virile (0.32435741 0.67564259)  
      12) site_rec_cnt>=1.5 24  175 mule (0.59112150 0.40887850)  
        24) site_rec_cnt< 4 13   46 mule (0.80257511 0.19742489) *
        25) site_rec_cnt>=4 11   66 virile (0.33846154 0.66153846) *
      13) site_rec_cnt< 1.5 16   12 virile (0.03084833 0.96915167) *
     7) friend_count>=2.5 54   54 virile (0.02750891 0.97249109) *

> post(fit, file = "/tmp/blah.ps", title = "virility model")

This results in:

[Image: the resulting PostScript tree plot]

The nodes of the tree are all written half on top of each other. Is there any way to make this output look reasonably readable?


For rpart objects, post() first calls the plot method and then the text method. This means the help pages ?plot.rpart and ?text.rpart list the options you can use to improve the plot output.
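
One way to check this for yourself is to print the method's source. This is only a minimal sketch: it assumes the rpart package is installed, and it uses getS3method() from the utils package to look up the method behind the post() generic.

```r
library(rpart)

# Look up the rpart method for the post() generic
post_method <- getS3method("post", "rpart")

# Printing a function shows its body, so the plot() and text() calls can be read off
print(post_method)
```

Whatever arguments appear in the text() call of that listing are the natural candidates to override when the output is too crowded.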

?text.rpart offers some very good pointers. I suggest you try the following parameters:

  • fancy=FALSE removes the ellipses and boxes around the nodes. Your tree is too large and busy for them, and dropping them improves legibility.
  • cex=0.8 scales the text to 0.8 of the normal size. Slightly smaller labels leave more space between the elements of the plot.

Here is an example of the difference this can make, using a model fitted to the diamonds data in ggplot2:

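library(ggplot2)  # provides the diamonds data
library(rpart)
fit <- rpart(price ~ ., data = diamonds)

# Draw two plots side by side
par(mfrow = c(1, 2))

# Left: fancy=TRUE, the setting that post() uses
plot(fit, main = "Default settings")
text(fit, fancy = TRUE)

# Right: plain labels, uniform branch spacing, smaller text
plot(fit, uniform = TRUE, main = "fancy=FALSE")
text(fit, fancy = FALSE, pretty = NULL, cex = 0.8)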
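
Since the question is about a PostScript file, the same plot() and text() calls can also be sent to a postscript() device directly instead of going through post(). The following is a sketch under assumptions, not a drop-in fix: it uses the kyphosis data that ships with rpart as a stand-in for the question's vir data, and the output path, page size and margin are illustrative values you would need to tune for your own tree.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the question's `vir` data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Hypothetical output location, chosen only for this example
out <- file.path(tempdir(), "tree.ps")

# Open a PostScript device with a custom page size (in inches);
# a larger page gives the node labels more room
postscript(file = out, paper = "special", width = 14, height = 10,
           horizontal = FALSE)

# Draw the tree skeleton, then add plain labels at a reduced font size
plot(fit, uniform = TRUE, margin = 0.1, main = "kyphosis tree")
text(fit, fancy = FALSE, use.n = TRUE, cex = 0.8)

# Close the device so the file is written out
dev.off()
```

If labels still overlap, increasing width and height or lowering cex further are the first things to try.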
