刘小恺 (Kyle)'s Cloud Notes

Eating, drinking and having fun; gluttonous and lazy; living in a drunken dream; reaping without sowing

Sentiment Analysis - Text Sentiment Analysis

Posted on 2018-05-23 | Categories: Machine Learning, Natural Language Processing

Workflow for sentiment analysis of text

Import the modules (NaiveBayesClassifier)

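import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.classify import NaiveBayesClassifier  # NLTK's naive Bayes classifier

# word_tokenize, WordNetLemmatizer and stopwords rely on NLTK data packages; download them once:
# nltk.download('punkt'); nltk.download('wordnet'); nltk.download('stopwords')
# (recent NLTK releases may ask for 'punkt_tab' instead of 'punkt')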

Define the text-processing function

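def proc_text(text):
    raw_words = nltk.word_tokenize(text)  # split the text into word tokens
    wordnet_lemmatizer = WordNetLemmatizer()  # lemmatizer used to normalize word forms
    # Lowercase each token so capitalized stop words such as 'That' or 'I' are also caught,
    # then reduce it to its base form (lemma)
    words = [wordnet_lemmatizer.lemmatize(raw_word.lower()) for raw_word in raw_words]
    stop_words = set(stopwords.words('english'))  # load the stop-word list once, not once per word
    filtered_words = [word for word in words if word not in stop_words]  # remove stop words
    # Turn the remaining words into a dictionary: each key is a word, and the value True
    # means the word occurs in the text (words that do not occur are simply left out)
    return {word: True for word in filtered_words}  # the features of one sample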
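To see what kind of object `proc_text` returns without installing any NLTK data, here is a minimal pure-Python sketch. It only approximates the function above, under stated assumptions: a regular expression stands in for `nltk.word_tokenize`, lemmatization is skipped, and `STOP_WORDS` is a small hand-picked subset of NLTK's English stop-word list. The names `simple_features` and `STOP_WORDS` are hypothetical.

```python
import re

# Hypothetical: a tiny subset of NLTK's English stop-word list, for illustration only
STOP_WORDS = {"i", "the", "so", "that", "this", "is", "a"}


def simple_features(text):
    """Approximate proc_text: tokenize, lowercase, drop stop words, build a {word: True} dict."""
    # Assumption: words and single punctuation marks become tokens (stand-in for nltk.word_tokenize)
    tokens = re.findall(r"\w+|[^\w\s]", text)
    words = [token.lower() for token in tokens]
    # Every remaining word maps to True, meaning "this word occurs in the text"
    return {word: True for word in words if word not in STOP_WORDS}


print(simple_features("That is a good movie."))  # {'good': True, 'movie': True, '.': True}
```

Words that do not occur are absent from the dictionary rather than mapped to False; when classifying, NLTK's `NaiveBayesClassifier` only scores the features that are present in the dictionary it is given.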

Train and test the model
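if __name__ == "__main__":
    # Build five training sentences
    text1 = 'I like the movie so much!'
    text2 = 'That is a good movie.'
    text3 = 'This is a great one.'
    text4 = 'That is a really bad movie.'
    text5 = 'This is a terrible movie.'

    # Build the training set: a list of (feature dictionary, sentiment label) pairs, where the
    # dictionary holds the words of one sentence and the label is 1 (positive) or 0 (negative)
    train_data = [
        (proc_text(text1), 1),
        (proc_text(text2), 1),
        (proc_text(text3), 1),
        (proc_text(text4), 0),
        (proc_text(text5), 0),
    ]

    # Train a naive Bayes model. NaiveBayesClassifier works directly on the feature dictionaries
    # (it counts how often each word occurs per label; there is no separate vectorization step)
    # and returns the trained classifier.
    nb_model = NaiveBayesClassifier.train(train_data)

    # Test the model on two sentences it has not seen
    text6 = 'That is a bad one.'  # test text
    text7 = 'That is a good one!'  # test text
    print(nb_model.classify(proc_text(text6)))  # 0 (negative)
    print(nb_model.classify(proc_text(text7)))  # 1 (positive)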
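The predictions 0 and 1 above can be reproduced by hand. Below is a minimal from-scratch sketch of the kind of scoring `NaiveBayesClassifier` performs, under assumptions: it mirrors NLTK's default `ELEProbDist` estimator (expected-likelihood estimation, i.e. add 0.5 to every count), treats each word as having two possible values (present or absent), and scores only the words present in the test sentence, skipping words never seen in training. The feature dictionaries are written out by hand, assuming a preprocessing step that lowercases words and drops stop words; `TRAIN`, `train_nb` and `classify_nb` are hypothetical names.

```python
import math
from collections import Counter

# Hand-written feature sets: assumed preprocessed versions of the five training sentences
TRAIN = [
    ({"like": True, "movie": True, "much": True, "!": True}, 1),
    ({"good": True, "movie": True, ".": True}, 1),
    ({"great": True, "one": True, ".": True}, 1),
    ({"really": True, "bad": True, "movie": True, ".": True}, 0),
    ({"terrible": True, "movie": True, ".": True}, 0),
]


def train_nb(data):
    """Count documents per label and, per label, how many documents contain each word."""
    label_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in label_counts}
    for features, label in data:
        word_counts[label].update(features.keys())
    vocabulary = {word for features, _ in data for word in features}
    return label_counts, word_counts, vocabulary


def classify_nb(model, features):
    """Return the label with the highest log-score, using add-0.5 (ELE) smoothing."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        # Prior: P(label) = (count + 0.5) / (total + 0.5 * number_of_labels)
        score = math.log((n_docs + 0.5) / (total + 0.5 * len(label_counts)))
        for word in features:
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # P(word present | label) = (docs_with_word + 0.5) / (n_docs + 0.5 * 2 values)
            score += math.log((word_counts[label][word] + 0.5) / (n_docs + 1.0))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(classify_nb(model, {"bad": True, "one": True, ".": True}))  # 0
print(classify_nb(model, {"good": True, "one": True, "!": True}))  # 1
```

Working in log space avoids underflow when many words are multiplied together. For a closer look at the real model, NLTK's classifier also provides `prob_classify` (returns a probability distribution over labels) and `show_most_informative_features`.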
