R: Replacing rownames of data frame by a substring[2]

I have a question about the use of gsub. Many of the rownames of my data share the same partial names. See below:

> rownames(test)
[1] "U2OS.EV.2.7.9"   "U2OS.PIM.2.7.9"  "U2OS.WDR.2.7.9"  "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9"  "U2OS.EV.18.6.9"  "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX"   "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM"   "X5.U2OS...EV"    "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC"   "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV"    "EXP2.U2OS.MYC"   "EXP2.U2OS.PIM1"  "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"

In my previous question, I asked whether there is a way to give rows that share a partial name the same name. See this question: Replacing rownames of data frame by a sub-string

The answer there is a very nice solution. The function gsub is used in this way:

 transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test))

Now I have another problem: the program I run R with (Galaxy) doesn't recognize the | character. My question is, is there another way to get the same result without using |?

Thanks!


If you don't want to use the "|" character, you can try something like:

Rnames <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
            "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9")

Rlevels <- c("MYC", "EV", "PIM", "WDR", "OBX")
## logical matrix: one row per name, one column per level
tmp <- sapply(Rlevels, grepl, Rnames)
## for each row, pick the column name (level) that matched
apply(tmp, 1, function(i) colnames(tmp)[i])
[1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR"
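
One thing to keep in mind: the apply() call above only gives back a plain character vector when every name matches exactly one level. A minimal sketch of a more defensive variant follows; the extra name "X9.UNKNOWN" and the variable names hits and label are made up for illustration, and fixed = TRUE is my addition so that each level is treated as a literal string rather than a regular expression:

```r
Rnames  <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "X9.UNKNOWN")
Rlevels <- c("MYC", "EV", "PIM", "WDR", "OBX")

## logical matrix: one row per name, one column per level
hits <- vapply(Rlevels, grepl, logical(length(Rnames)),
               x = Rnames, fixed = TRUE)

## keep the level only when exactly one level matched, otherwise NA
label <- vapply(seq_along(Rnames), function(i) {
    found <- Rlevels[hits[i, ]]
    if (length(found) == 1) found else NA_character_
}, character(1))

label
## [1] "EV"  "PIM" "WDR" NA
```

Names that match no level (or more than one) end up as NA, so they are easy to spot afterwards with is.na(label).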

But I would seriously consider mentioning this to the Galaxy team, as it seems rather awkward not to be able to use the symbol for OR...


I wouldn't recommend doing this in general in R, as it is far less efficient than the solution @csgillespie provided. An alternative, though, is to loop over the various strings you want to match and do the replacement for each string separately, i.e. search for "MYC" and replace only in those rownames that match "MYC".

Here is an example using the x data from @csgillespie's answer:

x <-  c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")

Copy the data so we have something to compare with later (this is just for the example):

x2 <- x

Then create a list of strings you want to match on:

matches <- c("MYC","EV","PIM","WDR","OBX")

Then we loop over the values in matches and do three things (numbered ##X in the code):

  1. Create the regular expression by pasting together the current match string i with the other bits of the regular expression we want to use,
  2. Use grepl() to get a logical vector marking the elements of the original x that contain the string i,
  3. Use the same style of gsub() call as you were already shown, but apply it only to the elements of x2 that matched, and replace only those elements.

The loop is:

for(i in matches) {
    rgexp <- paste(".*(", i, ").*", sep = "") ## 1
    ind <- grepl(rgexp, x)                    ## 2
    x2[ind] <- gsub(rgexp, "\\1", x2[ind])    ## 3
}
x2

Which gives:

> x2
 [1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
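
Since the gsub() call replaces the whole string with the matched label, the loop can arguably be simplified: once grepl() has found the matching elements, the label can be assigned directly, with no regular expression at all. Here is a rough sketch of that idea wrapped in a function; extract_label is a name I made up, and it assumes each element contains at most one of the labels (if several match, the last one in labels wins):

```r
## hypothetical helper: replace each element of x by the label it contains
extract_label <- function(x, labels) {
    out <- x
    for (lab in labels) {
        ind <- grepl(lab, x, fixed = TRUE)  ## literal match, no regex needed
        out[ind] <- lab                     ## overwrite matches with the label
    }
    out
}

x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9", "U2OS.OBX.18.6.9", "X1.U2OS...OBX", "X2.U2OS...MYC")
matches <- c("MYC", "EV", "PIM", "WDR", "OBX")

extract_label(x, matches)
##  [1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
```

Elements that contain none of the labels are returned unchanged. In the original question you would call it as transfecties <- extract_label(rownames(test), matches).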