Basic input file parsing in R
I'm used to perl and new to R. I know you can read whole tables using read.table()
but I wonder how can I use R to parse a single line from an input file.
Specifically, what is the equivalent of the following Perl snippet:
open my $fh, '<', $filename or die "can't open file $filename";
my $line = <$fh>;
my ($first, $second, $third) = split ("\t", $line);
Similar to the above would be:
filename <- 'your/file/name/here'
fh <- file( filename, open='rt' )
line <- readLines(fh, n=1 )
tmp <- strsplit(line, "\\t")
first <- tmp[[1]][1]; second <- tmp[[1]][2]; third <- tmp[[1]][3]
The file function creates a connection to the file and opens it. Opening is optional: if you don't open the connection, each read opens the file, reads, and closes it again, so every read starts from the top. If you do open it, it stays open and the next read continues from where the previous one left off (the closest match to what Perl is doing above).
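To make the difference between an opened and an unopened connection concrete, here is a small sketch. The temporary file and its contents are made up for the demonstration; only file, readLines, writeLines, tempfile and close from base R are used.

```r
# Hypothetical demo file with two tab-separated lines
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "d\te\tf"), path)

# Opened connection: successive reads advance through the file
fh <- file(path, open = "rt")
line1 <- readLines(fh, n = 1)   # first line
line2 <- readLines(fh, n = 1)   # second line
close(fh)

# Unopened connection: each readLines call opens, reads and closes it,
# so every read starts again at the top of the file
fh2 <- file(path)
again1 <- readLines(fh2, n = 1) # first line
again2 <- readLines(fh2, n = 1) # first line again
close(fh2)
```

Here line1 and line2 hold different lines, while again1 and again2 both hold the first line.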
The readLines function reads the specified number of lines (1 in this case), and strsplit works basically the same as the Perl split function.
R does not have the multiple assign like Perl (it is often best to just keep the results together anyways rather than splitting into multiple global variables).
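One way to keep the results together, sketched under the assumption that the line has already been read (the string below is a stand-in for the output of readLines), is to name the elements of the split vector and look them up by name:

```r
# Stand-in for a line obtained with readLines(fh, n = 1)
line <- "alpha\tbeta\tgamma"

# Split on a literal tab and take the single result vector
fields <- strsplit(line, "\t", fixed = TRUE)[[1]]

# Attach names instead of creating three separate variables
names(fields) <- c("first", "second", "third")

second <- fields[["second"]]   # look up one field by name
```

This keeps the fields in one object, which is usually easier to pass around than three globals.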
In general, you should use scan to do this, or in more complex cases read the whole file with readLines and parse it manually with strsplit, grep and so on.
In your case:
d <- scan(filename, character(0), nmax=3)
first <- d[1]; second <- d[2]; third <- d[3]
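Note that scan splits on any whitespace by default, and nmax limits the number of values rather than the number of lines. If the goal is to mirror the Perl snippet more closely (one line, split on tabs only), a variant along these lines may be closer; the temporary file and its contents are invented for the illustration:

```r
# Hypothetical demo file; the first field contains a space
path <- tempfile(fileext = ".txt")
writeLines(c("one two\tthree\tfour", "x\ty\tz"), path)

# sep = "\t" splits on tabs only; nlines = 1 stops after the first line
d <- scan(path, what = character(0), sep = "\t", nlines = 1, quiet = TRUE)

first <- d[1]; second <- d[2]; third <- d[3]
```

With the default separator, "one two" would have been read as two separate values.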
Just to show another way to do it (assuming your input is "temp/3.txt"):
> d <- read.csv("temp/3.txt", sep="\t", stringsAsFactors=F, header=F, nrows=1)
# Show the default column names:
> colnames(d)
[1] "V1" "V2" "V3"
# Assign the requested column names
> colnames(d) <- c("first", "second", "third")
# Show the current structure of d
> d
  first second third
1     1      2     3
# Probably not recommended: Add the columns of d to the search path
> attach(d)
> first
[1] 1
# Clean up:
> detach(d)
I guess the most important part above in terms of addressing your question is just nrows=1, which tells it to parse one row of input. (Underneath, read.csv eventually just calls down to scan.)
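As a non-interactive variant of the same idea, the column names can be supplied directly through col.names so no renaming step is needed. This is only a sketch: the temporary file below is a made-up stand-in for "temp/3.txt".

```r
# Hypothetical stand-in for the input file, with two tab-separated rows
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3", "4\t5\t6"), path)

# nrows = 1 parses only the first row; col.names sets the names up front
d <- read.table(path, sep = "\t", header = FALSE, nrows = 1,
                col.names = c("first", "second", "third"),
                stringsAsFactors = FALSE)

second <- d$second   # access a field without attach()
```

Accessing fields as d$first, d$second and d$third avoids the attach/detach step.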