Python Text Analysis and Mining (Part 1): Building a Corpus

数据杂坛 · Published 2022-06-05 12:35:16

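Functionality:

Build a corpus in Python: walk a folder, read every text file in it, and store the file paths and file contents in a pandas DataFrame.

Implementation: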

import os
import codecs
from warnings import simplefilter

import pandas

# Suppress FutureWarning messages
simplefilter(action='ignore', category=FutureWarning)

#========== Build the corpus ==========
def Create_corpus(file):
    filePaths = []
    fileContents = []
    # os.walk() yields (root, dirs, files) for every directory under `file`
    for root, dirs, files in os.walk(file):
        print(root)
        print(dirs)
        print(files)
        for name in files:
            # os.path.join() builds the full path of each file, which is stored in filePaths
            filePath = os.path.join(root, name)
            filePaths.append(filePath)
            print(filePath)
            # codecs.open() opens the file as UTF-8 and read() returns its text;
            # the with block closes the file afterwards
            with codecs.open(filePath, 'r', 'utf-8') as f:
                fileContent = f.read()
            print(fileContent)
            # The text of every file is stored, in order, in fileContents
            fileContents.append(fileContent)
    # Create the DataFrame corpus with filePaths and fileContents as its two columns
    corpus = pandas.DataFrame({'filePath': filePaths, 'fileContent': fileContents})
    print(corpus)
    return corpus

# Raw string, so the backslashes in the Windows path are not read as escape sequences
Create_corpus(r"F:\医学大数据课题\AI_SLE\AI_SLE_TWO\TEST_DATA")
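The same traversal can also be tried without the author's data folder. The following is only a minimal sketch, not the author's method: it swaps os.walk() and codecs.open() for pathlib, and the function name build_corpus_records, the temporary folder, and the two demo files are all hypothetical. It assumes every file under the folder is UTF-8 text.

```python
import tempfile
from pathlib import Path


def build_corpus_records(folder, encoding="utf-8"):
    """Return one {'filePath', 'fileContent'} dict per file under folder.

    Hypothetical helper for illustration; assumes all files are text
    in the given encoding.
    """
    records = []
    # rglob("*") visits the folder recursively, much like os.walk()
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            records.append({
                "filePath": str(path),
                "fileContent": path.read_text(encoding=encoding),
            })
    return records


# Hypothetical demo data: two small files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("first document", encoding="utf-8")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("second document", encoding="utf-8")
    records = build_corpus_records(tmp)

for record in records:
    print(record["filePath"], "->", record["fileContent"])
```

Passing the list to pandas.DataFrame(records) should give a table with the same filePath and fileContent columns as the function above.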

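Result: the function prints the resulting DataFrame, with one row per file and the columns filePath and fileContent.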

The dataset, complete code, and results are available from the WeChat subscription account 数据杂坛, which will continue to be updated.
