
how to start a for loop in R programming

I'm new to programming and I wrote some code that finds spam words in the first email, but I would like to write a for loop that does this for all of the emails. Any help would be appreciated. Thank you.

words = grepl("viagra", spamdata[[ 1 ]]$header[ "Subject"])


I presume that you want to loop over the elements of spamdata and build up an indicator whether the string "viagra" is found in the subject lines of your emails.

Let's set up some dummy data for illustration purposes:

subjects <- c("Buy my viagra", "Buy my Sildenafil citrate",
              "UK Lottery Win!!!!!")
names(subjects) <- rep("Subject", 3)
spamdata <- list(list(Header = subjects[1]), list(Header = subjects[2]),
                 list(Header = subjects[3]))

Next we create a vector words to hold the result of each iteration of the loop. You do not want to be growing words or any other object at each iteration - that will force copying and will slow your loop down. Instead allocate storage before you begin - here using the length of the list over which we want to loop:

words <- logical(length = length(spamdata))
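As a rough illustration of the difference between growing and pre-allocating, here is a minimal sketch. The names n, grown and filled are made up for this example and are not part of the spam data; both loops produce the same result, but only the second avoids extending the vector at each iteration.

```r
n <- 5

# Growing: the vector starts empty and is extended at every iteration,
# which can force R to copy it each time.
grown <- logical(0)
for (i in seq_len(n)) {
    grown <- c(grown, i %% 2 == 0)
}

# Pre-allocating: the vector already has its final length and each
# iteration only fills in one element.
filled <- logical(n)
for (i in seq_len(n)) {
    filled[i] <- i %% 2 == 0
}

identical(grown, filled)
```

For a handful of emails the difference is negligible; it starts to matter as the number of iterations grows.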

You can set up the loop like so:

## seq_along() creates a sequence of 1:length(spamdata) 
for(i in seq_along(spamdata)) {
    words[ i ] <- grepl("viagra", spamdata[[ i ]]$Header["Subject"])
}

We can then look at words:

> words
[1]  TRUE FALSE FALSE

This matches what we know from the made-up subjects.

Notice how we used i as a placeholder for 1, 2, and 3. At each iteration of the loop, i takes on the next value in the sequence 1, 2, 3, so we can i) access the ith component of spamdata to get the next subject line, and ii) access the ith element of words to store the result of the grepl() call.

Note that instead of an explicit loop we could also use the sapply() or lapply() functions, which create the loop for you but might need a bit of work to write a custom function. Instead of using grepl() directly, we can write a wrapper:

foo <- function(x) {
    grepl("viagra", x$Header["Subject"])
}

In the above function we use x instead of the list name spamdata because when lapply() and sapply() loop over the spamdata list, the individual components (referenced by spamdata[[i]] in the for() loop) get passed to our function as argument x so we only need to refer to x in the grepl() call.

This is how we could use our wrapper function foo() in lapply() or sapply(), first lapply():

> lapply(spamdata, foo)
[[1]]
[1] TRUE

[[2]]
[1] FALSE

[[3]]
[1] FALSE

sapply() will simplify the returned object where possible, as follows:

> sapply(spamdata, foo)
[1]  TRUE FALSE FALSE

Other than that, they work similarly.
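A related base R function that the answer above does not use is vapply(). It works like sapply() but makes you state the type and length each call should return, so a surprise result raises an error instead of being silently simplified. The sketch below rebuilds the dummy data so that it runs on its own:

```r
# Same dummy data and wrapper as above, rebuilt so this runs standalone
spamdata <- list(list(Header = c(Subject = "Buy my viagra")),
                 list(Header = c(Subject = "Buy my Sildenafil citrate")),
                 list(Header = c(Subject = "UK Lottery Win!!!!!")))

foo <- function(x) {
    grepl("viagra", x$Header["Subject"])
}

## logical(1) says: each call to foo() must return one logical value
res <- vapply(spamdata, foo, logical(1))
res
```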

Note we can make our wrapper function foo() more useful by allowing it to take an argument defining the spam word you wish to search for:

foo <- function(x, string) {
    grepl(string, x$Header["Subject"])
}

We can pass extra arguments to our functions with lapply() and sapply() like this:

> sapply(spamdata, foo, string = "viagra")
[1]  TRUE FALSE FALSE
> sapply(spamdata, foo, string = "Lottery")
[1] FALSE FALSE  TRUE
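If you want to check several spam words in one go, one possible approach (a sketch only; spam_words, hits and flagged are names invented for this example) is to loop over the words with another sapply(), which gives one column of results per word:

```r
# Same dummy data and wrapper as above, rebuilt so this runs standalone
spamdata <- list(list(Header = c(Subject = "Buy my viagra")),
                 list(Header = c(Subject = "Buy my Sildenafil citrate")),
                 list(Header = c(Subject = "UK Lottery Win!!!!!")))

foo <- function(x, string) {
    grepl(string, x$Header["Subject"])
}

spam_words <- c("viagra", "Lottery")

## one row per email, one column per spam word
hits <- sapply(spam_words, function(w) sapply(spamdata, foo, string = w))

## TRUE where an email matched at least one of the words
flagged <- rowSums(hits) > 0
flagged
```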

Which you will find most useful (for() loop or the lapply(), sapply() versions) will depend on your programming background and which you find most familiar. Sometimes for() is easier and simpler to use, but perhaps more verbose (which isn't always a bad thing!), whilst lapply() and sapply() are quite succinct and useful where you don't need to jump through hoops to create a workable wrapper function.


In R a for loop takes this form, where variable is the name of your iteration variable, and sequence is a vector or list of values:

for (variable in sequence) expression

The expression can be a single R command - or several lines of commands wrapped in curly brackets:

for (variable in sequence) { 
    expression
    expression
    expression
}
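As a concrete sketch of both forms (the values and the names total and squared are made up for illustration):

```r
## single expression, no curly brackets needed
for (i in 1:3) print(i)

## several expressions wrapped in curly brackets
total <- 0
for (value in c(2, 4, 6)) {
    squared <- value^2
    total <- total + squared
}
total
```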

In this case it would be something like for (i in seq_along(spamdata)) { do whatever you want to do with spamdata[[i]] }

Also

Basic loop theory

The basic structure for loop commands is: for(i in 1:n){stuff to do}, where n is the number of times the loop will execute.

listname[[1]] refers to the first element in the list “listname.”

In a for loop, listname[[i]] refers to the ith element of the list, that is, the element handled in the ith iteration of the loop.

The code for(i in 1:length(yesnovars)) tells the loop to execute only once for each variable in the list.
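One caveat with 1:length(x) is worth knowing: if the list is empty, 1:0 counts down and gives the values 1 and 0, so the loop body runs twice instead of not at all. seq_along() does not have this problem. A small sketch, with empty, count_a and count_b as made-up names:

```r
empty <- list()

## 1:length(empty) is 1:0, i.e. the two values 1 and 0
count_a <- 0
for (i in 1:length(empty)) count_a <- count_a + 1

## seq_along(empty) has length zero, so the body never runs
count_b <- 0
for (i in seq_along(empty)) count_b <- count_b + 1

c(count_a, count_b)
```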

Answer taken from the following sources:
Loops in R
Programming in R
