Creating a confusion matrix out of various data files but getting ValueError: x and y must be the same size

2022-12-07 20:58 Q&A author:

I'm new to Python and trying to do sentiment analysis using VADER.

I pulled data for 13 artists into individual dataframes, converted the lyrics to words, kept only the unique words, removed stopwords and so on, then put it all into a single dataframe:

import pandas as pd

# For each artist: clean the frame, reduce the lyrics to unique words,
# and collect the result. cleaning() and to_unique_words() are my own helpers.
df_allocate = []
for df in df_all:
    df_clean = cleaning(df)
    df_words = to_unique_words(df_clean)
    df_allocate.append(df_words)

# Stack the per-artist frames into one frame with a fresh 0..n-1 index
df_main = pd.concat(df_allocate, ignore_index=True)

Now I'm trying to train a logistic regression model, predict test results and get a confusion matrix.

I think I'm getting confused about how data frames work and also how to train_test_split the data correctly.

Right now, I have:

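from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Each element of df_all is one artist's dataframe
for column_name in df_all:
    cv = CountVectorizer(max_features=100000)
    X = cv.fit_transform(df_main['Artist']).toarray()
    y = column_name.sentiment

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=20)

    classifier = LogisticRegression(random_state=25)
    classifier.fit(X_train, y_train)

    y_predict = classifier.predict(X_test)

    print_confusionMatrix = confusion_matrix(y_test, y_predict)
    print(print_confusionMatrix)
    print("accuracy score : ", accuracy_score(y_test, y_predict))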

When I debug the program, I can see why it's complaining, but I don't know how to fix it. I read up on how to iterate through a dataframe and tried

for df in df_all.index

but it didn't work.

The columns are Artist, Title, Album, Date, Lyric, Year, and sentiment. What I want to accomplish is to iterate through each artist (df_all holds the dataframes of the individual artists, which is why I use it), predict the sentiment of their lyrics, and build a confusion matrix for each of the 13 artists.
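For reference, the goal described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the two toy frames are hypothetical stand-ins for the 13 frames in df_all, `Lyric` is assumed to be the text column to vectorize, and the label values "neg"/"pos" are made up. The point it illustrates is that X and y are both taken from the same artist frame inside the loop, so their row counts agree.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy frames standing in for the per-artist frames in df_all.
lyrics = ["love happy sun", "sad rain cry", "happy joy love", "cry pain sad",
          "joy sun happy", "pain rain sad", "love joy sun", "sad cry pain"]
labels = ["pos", "neg"] * 4
df_all = [
    pd.DataFrame({"Artist": name, "Lyric": lyrics, "sentiment": labels})
    for name in ("Artist A", "Artist B")
]

results = {}
for df in df_all:
    # Features and labels both come from this artist's frame.
    X = CountVectorizer().fit_transform(df["Lyric"])
    y = df["sentiment"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=20, stratify=y)
    clf = LogisticRegression(random_state=25).fit(X_train, y_train)
    # Fixing the label order keeps every matrix the same 2x2 shape.
    cm = confusion_matrix(y_test, clf.predict(X_test), labels=["neg", "pos"])
    results[df["Artist"].iloc[0]] = cm

for artist, cm in results.items():
    print(artist)
    print(cm)
```

Passing `stratify=y` is a choice made here so that both classes show up in the training half of such a small sample; it is not part of the original code.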

In a previous attempt I changed X to use the whole of df_main and took y from df_main as well, so it was:

X = cv.fit_transform(df_main).toarray()
y = df_main.sentiment

However, this is where I get the error that x and y must be the same size.
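A likely source of this mismatch, sketched under assumptions: `CountVectorizer.fit_transform` iterates over whatever it is given, and iterating a dataframe yields its column labels rather than its rows. The toy df_main below is hypothetical and carries only three of the seven columns; `Lyric` is assumed to be the column that should be vectorized.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for df_main (three of the seven columns).
df_main = pd.DataFrame({
    "Artist": ["A", "A", "B", "B"],
    "Lyric": ["love happy sun", "sad rain cry", "joy sun happy", "pain rain sad"],
    "sentiment": ["pos", "neg", "pos", "neg"],
})

# Passing the whole frame: iteration yields the column labels, so the
# vectorizer sees one "document" per column name, not one per row.
X_wrong = CountVectorizer().fit_transform(df_main)
print(X_wrong.shape[0], len(df_main["sentiment"]))

# Passing the text column: one row per lyric, matching the labels.
X = CountVectorizer().fit_transform(df_main["Lyric"])
y = df_main["sentiment"]
print(X.shape[0], len(y))
```

With the real data the first count would be the number of columns and the second the number of rows, which is why the two sizes disagree.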

Please push me in the right direction. I'm quite lost.

Tags: confusion-matrix countvectorizer logistic-regression python sentiment-analysis
