Fix unreadable PostScript tree output in R

2023-03-01 06:26

I have a relatively complicated classification tree that I'm trying to output. The resulting postscript output looks very jumbled.

> fit = rpart(virility ~ friend_count  + recip_count + twitter_handles + has_email + 
                          has_bio + has_foursquare + has_linkedin + auto_tweet + 
                          interaction_visibility + site_own_cnt + site_rec_cnt + has_url +
                          has_linkedin_url + lb_cnt, + mob_own_cnt + mob_rec_cnt + 
                          twt_own_cnt + twt_rec_cnt, method="class", data=vir)
> fit
n= 9704 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 9704 3742 virile (0.39970092 0.60029908)  
   2) recip_count< 15.5 9610 3159 mule (0.52005469 0.47994531)  
     4) site_own_cnt< 0.5 7201 1372 mule (0.65423387 0.34576613)  
       8) friend_count< 2.5 6763  948 mule (0.69566613 0.30433387)  
        16) has_bio>=0.5 4030  601 mule (0.73743993 0.26256007) *
        17) has_bio< 0.5 2733  347 mule (0.57990315 0.42009685)  
          34) recip_count< 0.5 2496   88 mule (0.78000000 0.22000000) *
          35) recip_count>=0.5 237  167 virile (0.39201878 0.60798122) *
       9) friend_count>=2.5 438  424 mule (0.50293083 0.49706917)  
        18) lb_cnt< 2.5 427  344 mule (0.55208333 0.44791667)  
          36) has_foursquare< 0.5 401  257 mule (0.61353383 0.38646617)  
            72) twitter_handles>=0.5 382  210 mule (0.65742251 0.34257749) *
            73) twitter_handles< 0.5 19    5 virile (0.09615385 0.90384615) *
          37) has_foursquare>=0.5 26   16 virile (0.15533981 0.84466019) *
        19) lb_cnt>=2.5 11    5 virile (0.05882353 0.94117647) *
     5) site_own_cnt>=0.5 2409  827 virile (0.31637337 0.68362663)  
      10) recip_count< 0.5 1344  274 mule (0.62102351 0.37897649)  
        20) friend_count< 0.5 955   75 mule (0.81155779 0.18844221) *
        21) friend_count>=0.5 389  126 virile (0.38769231 0.61230769)  
          42) twitter_handles< 0.5 62    3 mule (0.93181818 0.06818182) *
          43) twitter_handles>=0.5 327   85 virile (0.30249110 0.69750890) *
      11) recip_count>=0.5 1065  378 virile (0.19989424 0.80010576) *
   3) recip_count>=15.5 94  319 virile (0.11474820 0.88525180)  
     6) friend_count< 2.5 40  265 virile (0.32435741 0.67564259)  
      12) site_rec_cnt>=1.5 24  175 mule (0.59112150 0.40887850)  
        24) site_rec_cnt< 4 13   46 mule (0.80257511 0.19742489) *
        25) site_rec_cnt>=4 11   66 virile (0.33846154 0.66153846) *
      13) site_rec_cnt< 1.5 16   12 virile (0.03084833 0.96915167) *
     7) friend_count>=2.5 54   54 virile (0.02750891 0.97249109) *

> post(fit, file = "/tmp/blah.ps", title = "virility model")

In the resulting plot, the nodes of the tree are all written half on top of each other. Is there any way to make this output reasonably readable?

The post method for rpart in fact calls first the plot method and then the text method for rpart. This means you can study the help for ?plot.rpart and ?text.rpart to find ways of improving your plot output.
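Since post is essentially a plot call followed by a text call on a PostScript device, the same steps can be done by hand. The following is only a rough sketch of that sequence, not the exact arguments post uses internally. It assumes the kyphosis data bundled with rpart as a stand-in for your data, and ps_file is a hypothetical output path.

```r
library(rpart)

# Assumption: rpart's bundled kyphosis data stands in for the asker's data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

ps_file <- tempfile(fileext = ".ps")  # hypothetical output path
postscript(ps_file)                   # open the PostScript device
plot(fit)                             # draw the tree skeleton
text(fit)                             # add split and leaf labels
dev.off()                             # close the device so the file is written
```

Doing it this way means any argument documented in ?plot.rpart or ?text.rpart can be passed directly to the matching call.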

?text.rpart offers some very good pointers. I suggest you try the following parameters:

fancy=FALSE will remove the ellipses and boxes. Your plot is clearly too busy and large to have this. Removing it will increase legibility.
cex=0.8 will reduce the font size to 0.8 of the normal size. Slightly smaller fonts may increase spacing between elements on the plot.

Here is an example of the difference this can make, using a model fitted to the diamonds data in ggplot2:

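library(ggplot2)  # only needed for the diamonds data set
library(rpart)
fit <- rpart(price ~ ., data = diamonds)

# Draw both versions side by side
par(mfrow = c(1, 2))

# Left: fancy=TRUE draws the ellipses and boxes that post() uses
plot(fit, main = "Default settings")
text(fit, fancy = TRUE)

# Right: plain labels, smaller font, uniform vertical spacing of the nodes
plot(fit, uniform = TRUE, main = "fancy=FALSE")
text(fit, fancy = FALSE, pretty = NULL, cex = 0.8)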
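To get the same effect in a PostScript file rather than on screen, one option is to wrap the two calls in a small helper that also enlarges the page. This is a sketch under assumptions: write_tree_ps is a hypothetical helper name, the page dimensions and margin are arbitrary starting values to tune, and the kyphosis data bundled with rpart stands in for a real model.

```r
library(rpart)

# Hypothetical helper: write an rpart tree to a PostScript file using
# plain labels, a smaller font, and a custom page size (in inches).
write_tree_ps <- function(fit, file, width = 11, height = 8.5, cex = 0.8) {
  postscript(file, width = width, height = height,
             paper = "special", horizontal = FALSE, onefile = FALSE)
  on.exit(dev.off())                       # always close the device
  plot(fit, uniform = TRUE, margin = 0.1)  # margin leaves room for labels
  text(fit, fancy = FALSE, use.n = TRUE, cex = cex)
  invisible(file)
}

# Assumption: kyphosis is used here only as a small, self-contained example.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
out <- write_tree_ps(fit, tempfile(fileext = ".ps"), width = 14, height = 10)
```

For a tree as deep as the one in the question, increasing width and height and lowering cex further are the first things to try.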
