
达观杯 (DataGrand Cup) baseline

程序员文章站 2022-06-12 16:02:54

Simple baseline

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm

# Column that holds the word-segmented text
column = "word_seg"
train = pd.read_csv('train_set.csv')
test = pd.read_csv('test_set.csv')
test_id = test["id"].copy()

# TF-IDF over unigrams and bigrams, with sublinear term-frequency scaling
vec = TfidfVectorizer(ngram_range=(1, 2), min_df=3, max_df=0.9, use_idf=True, smooth_idf=True, sublinear_tf=True)
train_term_doc = vec.fit_transform(train[column])
test_term_doc = vec.transform(test[column])

# Train a linear SVM on the class labels and predict on the test set
y = train["class"]
lin_clf = svm.LinearSVC()
lin_clf.fit(train_term_doc, y)
preds = lin_clf.predict(test_term_doc)

# Write the submission file; the row index is written as the id, as in the original loop
with open('baseline.csv', 'w') as fid:
    fid.write("id,class" + "\n")
    for i, item in enumerate(preds):
        fid.write(str(i) + "," + str(item) + "\n")
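
The step that writes baseline.csv can also be pulled out into a small helper. The following is only a sketch, not part of the original baseline: the name write_submission is hypothetical, the ids and predictions in the usage lines are made-up values, and it assumes the submission format is simply an "id,class" header followed by one row per test sample. It uses only the standard csv module and pairs each id with its prediction explicitly instead of relying on the row index.

```python
import csv
import io


def write_submission(ids, preds, out):
    """Write an "id,class" header and one row per (id, prediction) pair.

    `out` is any open text file object. Hypothetical helper for illustration.
    """
    ids, preds = list(ids), list(preds)
    if len(ids) != len(preds):
        raise ValueError("ids and preds must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "class"])
    writer.writerows(zip(ids, preds))
    return len(ids)


# Usage with made-up values, written to an in-memory buffer instead of a file
buf = io.StringIO()
n_rows = write_submission([0, 1, 2], [3, 13, 9], buf)
submission_text = buf.getvalue()
```

With the baseline's variables, this could be called as write_submission(test_id, preds, fid) inside a `with open('baseline.csv', 'w', newline='') as fid:` block, which would also make use of the otherwise unused test_id.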

score: 0.77788

Related tags: NLP
