Rattle loading String to Vector file from WEKA

I have been using WEKA to do some text classification work and I want to try out R.

The problem is that I cannot load the String to Vector ARFF files created by WEKA's string parser into Rattle.

Looking at the logs I get something like:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got '2281}'

My ARFF data file looks a bit like this:

@relation 'reviewData'

@attribute polarity {0,2}
.....
@attribute $$ numeric
@attribute we numeric
@attribute wer numeric
@attribute win numeric
@attribute work numeric

@data
{0 2,63 1,71 1,100 1,112 1,140 1,186 1,228 1}
{14 1,40 1,48 1,52 1,61 1,146 1}
{2 1,41 1,43 1,57 1,71 1,79 1,106 1,108 1,133 1,146 1,149 1,158 1,201 1}
{0 2,6 1,25 1,29 1,42 1,49 1,69 1,82 1,108 1,116 1,138 1,140 1,155 1}
....

Any ideas how I can convert this into an R readable format?

Cheers!


When you save the result of the StringToWordVector attribute filter, it will be saved as a sparse ARFF file.

Check whether Rattle supports reading this format. If it does not, apply the SparseToNonSparse instance filter (weka.filters.unsupervised.instance.SparseToNonSparse), which converts the data to the dense format; the resulting file will be much larger.

Example: if the sparse data looks like:

sparse.arff

@relation name
@attribute word1 numeric
@attribute word2 numeric
..
@attribute word10 numeric
@data
{0 1,3 3,8 1,9 1}
{2 2,5 1,8 1,9 1}

it will be converted to:

nonsparse.arff

@relation name
@attribute word1 numeric
@attribute word2 numeric
..
@attribute word10 numeric
@data
1,0,0,3,0,0,0,0,1,1
0,0,2,0,0,1,0,0,1,1
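
If running the WEKA filter is not convenient, the same expansion can be sketched by hand. The Python below is only an illustration of what the conversion does, not WEKA's implementation. The function names sparse_row_to_dense and sparse_arff_to_dense are made up for this example, and it assumes every attribute is numeric (so omitted entries are filled with 0) and that no value contains a quoted comma.

```python
def sparse_row_to_dense(row, n_attributes, fill="0"):
    """Expand one sparse ARFF row such as '{0 1,3 3}' into a dense list of strings."""
    dense = [fill] * n_attributes
    body = row.strip()[1:-1].strip()  # drop the surrounding braces
    if body:
        for pair in body.split(","):
            # Each pair is '<attribute index> <value>'.
            index, value = pair.strip().split(None, 1)
            dense[int(index)] = value
    return dense


def sparse_arff_to_dense(lines):
    """Copy the header unchanged and expand every sparse data row."""
    out = []
    n_attributes = 0
    in_data = False
    for line in lines:
        stripped = line.strip()
        lowered = stripped.lower()
        if not in_data:
            if lowered.startswith("@attribute"):
                n_attributes += 1  # count attributes to know the row width
            elif lowered.startswith("@data"):
                in_data = True
            out.append(stripped)
        elif stripped.startswith("{"):
            out.append(",".join(sparse_row_to_dense(stripped, n_attributes)))
        else:
            out.append(stripped)  # already dense, blank, or a comment
    return out


# Hypothetical 10-attribute header mirroring the example above.
header = (["@relation name"]
          + [f"@attribute word{i} numeric" for i in range(1, 11)]
          + ["@data"])
sparse = header + ["{0 1,3 3,8 1,9 1}", "{2 2,5 1,8 1,9 1}"]
dense = sparse_arff_to_dense(sparse)
print("\n".join(dense[-2:]))
```

For the two sample rows this prints the same dense rows as nonsparse.arff above. For a real file, read the lines from disk, pass them through sparse_arff_to_dense, and write the result to a new .arff file before loading it into Rattle.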