Adding a column based on other values

I have a dataframe with millions of rows and three columns labeled Keywords, Impressions, Clicks. I'd like to add a column with values depending on the evaluation of this function:

isType <- function(Impressions, Clicks)
{ 
if (Impressions >= 1 & Clicks >= 1){return("HasClicks")} else if (Impressions >=1 & Clicks == 0){return("NoClicks")} else {return("ZeroImp")}
}

So far so good. I then try this to create the column, but 1) it takes forever and 2) it marks all the rows as "HasClicks", even the ones where it shouldn't.

# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)

input data:

Keywords,Impressions,Clicks

"Hello",0,0

"World",1,0

"R",34,23

Wanted output:

Keywords,Impressions,Clicks,Type

"Hello",0,0,"ZeroImp"

"World",1,0,"NoClicks"

"R",34,23,"HasClicks"


Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note however that this presumes Clicks >= 0...)

Mydf$Type = ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
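
The Clicks >= 0 assumption can be checked before the column is built. A minimal sketch, using the three sample rows from the question as stand-in data (the up-front check is my addition, not part of the answer above):

```r
# Sample rows from the question.
Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23)
)

# Stop early if the assumption behind the nested ifelse does not hold.
# A missing value in Clicks also fails this check.
stopifnot(all(Mydf$Clicks >= 0))

Mydf$Type <- ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')

print(Mydf)
```

On these rows the Type column should come out as ZeroImp, NoClicks, HasClicks, matching the wanted output in the question.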


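First, when it is called with vectors rather than single values, the if/else block in your function will give this warning (R 4.2.0 and later raise an error instead):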

Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
the condition has length > 1 and only the first element will be used

which explains why all the rows are the same.

Second, you should allocate your data.frame and fill in the elements rather than repeatedly combining objects together. I imagine this is causing your long run-times.
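
As a rough illustration of allocating once and filling in place, here is a sketch that keeps the loop but writes into a character vector created up front. The sample rows are the ones from the question, and the scalar isType below is rewritten with && for this sketch; a vectorised ifelse would still be my first choice on millions of rows.

```r
# Scalar version of the question's function; && is the scalar "and".
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 && Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

# Sample rows from the question.
Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23)
)

# Allocate the whole result once, then fill it by position,
# instead of growing an object with rbind on every iteration.
Type <- character(nrow(Mydf))
for (i in seq_len(nrow(Mydf))) {
  Type[i] <- isType(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- Type

print(Mydf)
```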

EDIT: My shared code. I'd love for someone to provide a more elegant solution.

Mydf <- data.frame(
  Keywords = sample(c("Hello","World","R"),20,TRUE),
  Impressions = sample(0:3,20,TRUE),
  Clicks = sample(0:3,20,TRUE) )

Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)


This is a case where arithmetic can be cleaner and most likely faster than nested ifelse statements.

Again building on Joshua's solution:

# Code: 0 = no impressions, 1 = impressions but no clicks, 2 = impressions and clicks
Mydf$Type <- factor(with(Mydf, (Impressions>=1) * (1 + (Clicks>=1))),
                    levels=0:2, labels=c("ZeroImp","NoClicks","HasClicks"))
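
Whether arithmetic really is faster is easy to measure. The sketch below is only a rough comparison under my own assumptions: random data of one million rows, non-negative Clicks with no missing values, and two helper functions (nested and arith) that I named for this example. It first checks that both approaches label every row the same way, then times each one. Timings will vary by machine, so none are quoted here.

```r
set.seed(1)  # arbitrary seed so the run is repeatable
n <- 1e6
Mydf <- data.frame(
  Impressions = sample(0:3, n, TRUE),
  Clicks = sample(0:3, n, TRUE)
)

# Nested ifelse, as in the earlier answer.
nested <- function(d) {
  ifelse(d$Impressions >= 1,
         ifelse(d$Clicks >= 1, "HasClicks", "NoClicks"), "ZeroImp")
}

# Arithmetic code used as an index into a lookup vector:
# 0 = no impressions, 1 = impressions but no clicks, 2 = impressions and clicks.
arith <- function(d) {
  code <- (d$Impressions >= 1) * (1 + (d$Clicks >= 1))
  c("ZeroImp", "NoClicks", "HasClicks")[code + 1]  # R indexes from 1
}

# Both approaches should label every row identically.
stopifnot(identical(nested(Mydf), arith(Mydf)))

print(system.time(nested(Mydf)))
print(system.time(arith(Mydf)))
```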