Is it good practice to update R packages often? [closed]
I started using R a little while ago and am not sure how often to update the installed packages (at this time, I'm using mostly ggplot2 and rattle). On one hand, there's the typical geek impulse to have the latest version :-) On the other, updates can break functionality, and as an R beginner I don't want to waste time looking into package incompatibilities and reinstalling libraries; it's almost certain I wouldn't notice any difference with an improved package.
With other applications, I have a sense, developed from experience, of how often to upgrade, how long to wait between the release of an upgrade and installing it, and so on. But I'm in the dark with regard to R.
And to be clear: I'm not talking about R itself, but its libraries.
Thanks.
Here is my philosophy: the naïve user never updates. The sophisticated user always updates. The power user updates often, but carefully.
Mindless updating is not always beneficial. Bugs work their way into updated versions of R libraries (or R itself!), and you could break your existing code by updating without reading the change log or commit history. For example, R 2.11 broke lme4 on OS X. It pays to update carefully and run the demos of packages between releases. It really sucks to update to a new library or R release and realize something broke when you have a deadline.
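As a concrete illustration, here is a minimal sketch of what a careful update might look like in an interactive session, with lme4 (the package mentioned above) standing in for whatever package you care about:

    update.packages(ask = TRUE)   # confirm or skip each package individually

    # After updating, exercise the package before trusting it with real work:
    library(lme4)
    example(lmer)                 # run the shipped examples as a quick smoke test
    demo(package = "lme4")        # list any demos the package ships

Running the examples won't catch everything, but it's a cheap way to notice obvious breakage before a deadline does.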
Yes it is.
Why exactly would you want to hang on to old bugs and missing features?
The question is already answered, but I'll offer my 2 cents. In an organization, updating R should be treated like updating gcc or Java: with warnings, a staging area, a rollback plan, etc. Others' work and results may be affected. [See update #2]
I am more impulsive about updating R packages. As long as you can reproduce the state of your system at any point in time, there's little to worry about. Ensuring that nightly backups occur should be the domain of your sysadmin.
The basic idea is that you should be able to reproduce everything. Whether you actually test that your earlier results are reproduced depends on whether you want to disprove your assumption that no bugs or changes will affect later results. :)
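In that spirit, here is a minimal sketch of snapshotting the state of your library before an update, so you can later see exactly which versions produced a given result (the file names are arbitrary choices, not a convention):

    # Record every installed package and its version:
    snapshot <- installed.packages()[, c("Package", "Version")]
    write.csv(snapshot, file = paste0("package-versions-", Sys.Date(), ".csv"),
              row.names = FALSE)

    # sessionInfo() captures R's own version plus the attached packages;
    # saving it alongside your results makes them easier to reproduce later:
    capture.output(sessionInfo(), file = "session-info.txt")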
Update 1. As has been mentioned in the comments and above, updating in a production environment, or any environment where stability is paramount (e.g. where bugs are either known or not significant), should be done quite carefully, since an update can introduce new bugs, new dependencies, different output, or any variety of other software regressions. Moreover, where you're updating from matters a lot. Updating from R-Forge is more likely to expose you to the newest bugs than updating from CRAN. Even so, I have found and reported bugs that persisted through 3+ versions of a package on CRAN, as well as other regressions that just magically appeared. I test a lot, but updating, finding new bugs, and debugging is an effort that I don't always want to (or have time to) undertake.
I am reminded of this question after finding a new bug in a new version of a package that I use a lot. Just to check, I reverted to an earlier version: no more crashes, though tracking down the cause took a couple of hours because I assumed the problem was not arising in this package. I'll send a note to the maintainer before long, so others won't have to stumble on the same bug.
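For reference, reverting like this can be done from within R; here is a minimal sketch assuming the remotes package is available, with the package name and version as placeholders:

    # install.packages("remotes")   # if not already installed
    remotes::install_version("somepackage", version = "1.2.3")

Restarting R afterwards ensures the older version is the one actually loaded.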
Update 2. In an organization, I have to say that the answer is no. In fact, in any case where there may be two or more concurrent instances of R, it is very unwise to blindly update the packages while another instance may be using them. There are likely to be good methods for hot-swapping packages; I just don't yet know them. Keep in mind that the two instances need only share libraries (i.e. where the packages are stored) and, AFAIK, need not run concurrently on the same machine. Thus, if libraries are placed on shared systems, e.g. over NFS, one may not know where else R is running at the same time, using those libraries. Accidentally killing another R process is not usually a good thing.
Yes, unless you have a good reason not to (see my comment to Dirk)
Although some of the following has been mentioned in previous answers, I think it might be beneficial to make a few things explicit. As a developer, I think that updating packages often (and R-devel, for that matter) is good practice. You definitely want to stick with the latest out there. If your package imports/depends/suggests... interacts with other packages, you want to ensure interoperability on a day-to-day basis, and not face the 'bugs' just before release, when time is short. On the other hand, some environments will put special emphasis on exact reproducibility. In that case, one may want to adopt a more careful updating strategy.
But it is worth emphasising that these two behaviours are not mutually exclusive. It is possible to install different versions of R and maintain different libraries, to benefit from a bleeding-edge development environment and a more stable one for production.
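As one way to set that up, here is a minimal sketch of keeping a separate bleeding-edge library alongside the stable default one; the directory path is an arbitrary choice:

    dev_lib <- "~/R/dev-library"
    dir.create(dev_lib, recursive = TRUE, showWarnings = FALSE)

    # Install experimental versions into the dev library only:
    install.packages("ggplot2", lib = dev_lib)

    # Load from the dev library for this session; the default library
    # (and anything in production) is left untouched:
    library(ggplot2, lib.loc = dev_lib)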
Hope this helps.
I'd be inclined to respond: as often as you need to, and never when you're in a hurry!
Firstly, consider the chances that you're labouring under a bug of which you are unaware. I would moot that it's quite rare. If you're suffering under a bug and there's a newer version in which the bug is fixed, plan an upgrade. If you want a new feature, plan an upgrade. If it's your first day back after Christmas and the biggest overhead is trying to remember what you were actually doing last, then the overhead of messing about with some new dependency requirements (which may include system components outside of R) is probably relatively small, so consider seeing what updates are available (guess what I did today) ;-)
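Seeing what's available is itself a low-commitment step; a minimal sketch, with ggplot2 (from the question) standing in for any package of interest:

    # List installed packages that have newer versions on CRAN,
    # without installing anything:
    avail <- old.packages()
    if (!is.null(avail)) avail[, c("Package", "Installed", "ReposVer")]

    # For a package that ships a NEWS file, review what actually changed
    # before deciding whether today is the day:
    news(package = "ggplot2")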
The golden rule is probably that there isn't a single, recommended schedule other than what makes sense for your use; daily updates will inevitably result in fewer updates each time and thus minimize the pain of the actual update, but it's not worth it if you constantly get different numerical results from one day to the next because of some change to how a function does sampling (different numerical results have plagued Coursera students using caret). Don't underestimate the value of a stable system that allows you to just get on with productive work rather than faffing.