Python data analysis: code for a depression study
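Summary of the depression patient data analysis: the dataset contains 8,400 consultation records from depression patients. Its main fields are the patient information (sex and age, stored together in the Patient_name column), the consultation label, date, title, number of communications, and the doctor's details (name, hospital, department). The analysis shows that: 1) 112 records have no date; 2) the Patient_name column needs to be split into sex and age; 3) the preprocessing stage performs this split. The dataset can be used to study the sex and age distribution of depression patients, consultation frequency, doctors' caseloads, and similar medical data questions.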
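The split of Patient_name into sex and age can also be sketched with a regular expression. This is a minimal standard-library illustration, not the notebook's own code: the helper name `parse_patient` is hypothetical, and the choice to record ages that have no year part (months or days only) as 1 is an assumption that only loosely follows the notebook, which sets every age mentioning days or months to 1.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Accepts an ASCII or full-width colon and any amount of whitespace.
_PATIENT = re.compile(r"患者[::]\s*(?P<sex>\S+)\s+(?P<age>\S+)")


def parse_patient(text):
    """Return (sex, age_in_years) parsed from a Patient_name value."""
    match = _PATIENT.search(text)
    if match is None:
        raise ValueError(f"unrecognized Patient_name value: {text!r}")
    sex = match.group("sex")
    years = re.match(r"(\d+)岁", match.group("age"))
    # Assumption: ages without a year part (months or days only) become 1.
    age = int(years.group(1)) if years else 1
    return sex, age


print(parse_patient("患者:女 43岁"))
print(parse_patient("患者:男  3个月"))
```

Raising on unrecognized values makes irregular rows visible early, instead of letting them fail later during the integer conversion.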
{
“cells”: [
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"# Import data"
]
},
{
“cell_type”: “code”,
“execution_count”: 27,
“metadata”: {},
“outputs”: [
{
“data”: {
"text/plain": [
" Patient_name Label Date \\\n",
"0 患者:女 43岁 压抑 05.28 \n",
"\n",
" Title Communications Doctor \\\n",
"0 压抑 个人情况:去年1月份开始夫妻两地分居,孩子13岁男孩住校,平... 这种情况是否需要去... 115 杨胜文 \n",
"\n",
" Hospital Faculty \n",
"0 襄阳市安定医院 心理科 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
“source”: [
“import pandas as pd\n”,
"from pyecharts.charts import Pie\n",
“from pyecharts import options as opts\n”,
“\n”,
“df=pd.read_csv(‘YiYuZheng.csv’)\n”,
“df.head(1)”
]
},
{
“cell_type”: “code”,
“execution_count”: 28,
“metadata”: {},
“outputs”: [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 8400 entries, 0 to 8399\n",
"Data columns (total 8 columns):\n",
" #   Column          Non-Null Count  Dtype \n",
"---  ------          --------------  ----- \n",
" 0   Patient_name    8400 non-null   object\n",
" 1   Label           8400 non-null   object\n",
" 2   Date            8288 non-null   object\n",
" 3   Title           8400 non-null   object\n",
" 4   Communications  8400 non-null   int64 \n",
" 5   Doctor          8400 non-null   object\n",
" 6   Hospital        8400 non-null   object\n",
" 7   Faculty         8400 non-null   object\n",
"dtypes: int64(1), object(7)\n",
"memory usage: 525.1+ KB\n"
]
}
],
"source": [
"# Inspect the data\n",
"df.info()"
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
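"The output shows that the Date column has missing values and is not stored as a date type.\n",
"\n",
"The Patient_name column mixes several pieces of information, so sex and age need to be split out."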
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"# Data preprocessing"
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"## Split out age and sex"
]
},
{
“cell_type”: “code”,
“execution_count”: 29,
“metadata”: {},
“outputs”: [
{
“data”: {
"text/plain": [
" Patient_name Label Date \\\n",
"0 患者:女 43岁 压抑 05.28 \n",
"1 患者:女 32岁 生气。心梗。抑郁 05.28 \n",
"2 患者:女 15岁 情绪低落,烦躁抑郁 05.28 \n",
"3 患者:女 16岁 抑郁 05.28 \n",
"4 患者:女 67岁 焦虑症 严重躯干反应、抑郁症 05.28 \n",
"\n",
" Title Communications Doctor \\\n",
"0 压抑 个人情况:去年1月份开始夫妻两地分居,孩子13岁男孩住校,平... 这种情况是否需要去... 115 杨胜文 \n",
"1 生气。心梗。抑郁 郁郁寡欢。被他人语言刺激。卧床不起。没动力。心疼。受伤 是什么病。怎么办 12 郭汉法 \n",
"2 情绪低落,烦躁抑郁 情绪低落,压抑烦躁,思考能力降低。长时间学习,睡眠时间少。睡... 还有... 2 郭苏皖 \n",
"3 抑郁 前面已简述,2024年夏季中考,本来学习非常好,非常自律,自... 已经服用9个月的艾... 2 刘丽 \n",
"4 焦虑症 严重躯干反应 抑郁症 草酸加量以后,还是有比较严重的躯干反应,主要表现为背痛 脖... 2 刘晓华 \n",
"\n",
" Hospital Faculty Sex Age \n",
"0 襄阳市安定医院 心理科 女 43 \n",
"1 泰安八十八医院 临床心理科 女 32 \n",
"2 南京脑科医院 医学心理科 女 15 \n",
"3 联勤保障部队第九〇四医院(常州院区) 精神3科(物质依赖科) 女 16 \n",
"4 上海市精神卫生中心 精神科 女 67 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
“source”: [
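"# Extract the sex into a new column.\n",
"# For '患者:女 43岁': split on whitespace and take the first piece ('患者:女'),\n",
"# then split on the colon (full-width or ASCII) and take the last piece ('女').\n",
"df['Sex'] = df['Patient_name'].map(lambda x: x.split()[0].replace(':', ':').split(':')[-1])\n",
"\n",
"# Extract the age into a new column.\n",
"# Take the last whitespace-separated piece ('43岁') and drop the trailing '岁'.\n",
"# split() without arguments tolerates one or more spaces between the pieces.\n",
"df['Age'] = df['Patient_name'].map(lambda x: x.split()[-1][:-1])\n",
"\n",
"df.head()"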
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"## Handle missing values"
]
},
{
“cell_type”: “code”,
“execution_count”: 30,
“metadata”: {},
“outputs”: [
{
"data": {
"text/plain": [
"Patient_name        0\n",
"Label               0\n",
"Date              112\n",
"Title               0\n",
"Communications      0\n",
"Doctor              0\n",
"Hospital            0\n",
"Faculty             0\n",
"Sex                 0\n",
"Age                 0\n",
"dtype: int64"
]
},
“execution_count”: 30,
“metadata”: {},
“output_type”: “execute_result”
}
],
“source”: [
“df.isnull().sum()”
]
},
{
“cell_type”: “code”,
“execution_count”: 31,
“metadata”: {},
“outputs”: [],
“source”: [
"# Only a few rows are missing a date and imputation is not appropriate here, so drop them\n",
"df.dropna(inplace=True)  # modifies df in place"
]
},
{
“cell_type”: “code”,
“execution_count”: 32,
“metadata”: {},
“outputs”: [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 8288 entries, 0 to 8399\n",
"Data columns (total 10 columns):\n",
" #   Column          Non-Null Count  Dtype \n",
"---  ------          --------------  ----- \n",
" 0   Patient_name    8288 non-null   object\n",
" 1   Label           8288 non-null   object\n",
" 2   Date            8288 non-null   object\n",
" 3   Title           8288 non-null   object\n",
" 4   Communications  8288 non-null   int64 \n",
" 5   Doctor          8288 non-null   object\n",
" 6   Hospital        8288 non-null   object\n",
" 7   Faculty         8288 non-null   object\n",
" 8   Sex             8288 non-null   object\n",
" 9   Age             8288 non-null   object\n",
"dtypes: int64(1), object(9)\n",
"memory usage: 712.2+ KB\n"
]
}
],
“source”: [
“df.info()”
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"## Normalize the Date column"
]
},
{
“cell_type”: “code”,
“execution_count”: 33,
“metadata”: {},
“outputs”: [],
“source”: [
"# Make sure every value is a string\n",
"df['Date'] = df['Date'].astype(str)\n",
"\n",
"# Bring the Date column into a single year-month-day format\n",
"def trans_date(tag):\n",
"    if tag.startswith('20'):  # a leading '20' means the year is already present\n",
"        tag = tag.replace('.', '-')\n",
"    else:\n",
"        tag = '2025-' + tag.replace('.', '-')  # otherwise prepend the year\n",
"    return tag\n",
"\n",
"df['Date'] = df['Date'].map(trans_date)  # apply the conversion\n",
"\n",
"# Convert to a datetime type\n",
"df['Date'] = pd.to_datetime(df['Date'])"
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"# Data visualization and analysis"
]
},
{
“cell_type”: “code”,
“execution_count”: 19,
“metadata”: {},
“outputs”: [],
“source”: [
"from pyecharts.globals import ThemeType  # chart themes"
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"## Patient sex distribution"
]
},
{
“cell_type”: “code”,
“execution_count”: 41,
“metadata”: {},
“outputs”: [
{
“data”: {
“text/plain”: [
“<pyecharts.render.display.HTML at 0x18dbfdb6100>”
]
},
“execution_count”: 41,
“metadata”: {},
“output_type”: “execute_result”
}
],
“source”: [
"# Prepare the data: count records per sex\n",
"data = df['Sex'].value_counts()\n",
"x = data.index.tolist()\n",
"y = data.tolist()\n",
"\n",
"# Draw the pie chart\n",
"pie = (\n",
"    Pie(init_opts=opts.InitOpts(theme=ThemeType.LIGHT))  # set the theme\n",
"    .add(\n",
"        '',\n",
"        [list(z) for z in zip(x, y)],  # data must be packed as [[key, value], [key, value], ...]\n",
"        label_opts=opts.LabelOpts(formatter='{b}:{d}%'),  # show labels as percentages\n",
"    )\n",
"    .set_global_opts(title_opts=opts.TitleOpts(title='Patient sex distribution'))\n",
")\n",
"pie.render_notebook()"
]
},
{
“cell_type”: “markdown”,
“metadata”: {},
“source”: [
"## Patient age distribution"
]
},
{
“cell_type”: “code”,
“execution_count”: 34,
“metadata”: {},
“outputs”: [],
“source”: [
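"# Data preparation\n",
"# 1. Convert Age to a numeric type.\n",
"# A plain df['Age'].astype(int) fails because some ages are irregular (for example 'X岁Y月'),\n",
"# so values that mention days or months are set to 1 before the conversion.\n",
"df['Age'] = df['Age'].map(lambda x: '1' if ('天' in x or '个' in x or '月' in x) else x).astype(int)"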
]
},
{
“cell_type”: “code”,
“execution_count”: 40,
“metadata”: {},
“outputs”: [
{
“data”: {
“text/plain”: [
“<pyecharts.render.display.HTML at 0x1f3c6256940>”
]
},
“execution_count”: 6,
“metadata”: {},
“output_type”: “execute_result”
}
],
“source”: [
"data = df['Faculty'].value_counts()[:10]  # keep the ten most frequent departments\n",
"\n",
"pie = (\n",
"    Pie()\n",
"    .add(\n",
"        '',\n",
"        [list(z) for z in zip(data.index.tolist(), data.tolist())],  # pie data format: [[key1, value1], [key2, value2], ...]\n",
"        label_opts=opts.LabelOpts(formatter='{b}:{d}%'),  # label format\n",
"    )\n",
")\n",
"pie.render_notebook()"
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“metadata”: {},
“outputs”: [],
“source”: []
}
],
“metadata”: {
“kernelspec”: {
“display_name”: “Python 3”,
“language”: “python”,
“name”: “python3”
},
“language_info”: {
“codemirror_mode”: {
“name”: “ipython”,
“version”: 3
},
“file_extension”: “.py”,
“mimetype”: “text/x-python”,
“name”: “python”,
“nbconvert_exporter”: “python”,
“pygments_lexer”: “ipython3”,
“version”: “3.8.5”
}
},
“nbformat”: 4,
“nbformat_minor”: 4
}
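The date normalization step in the notebook builds a string and hands it to pandas. As a rough standard-library sketch of the same idea, the function below parses the value directly. The name `normalize_date`, the `default_year` parameter, and the assumption that full dates look like `YYYY.MM.DD` are all illustrative; the notebook only shows the short `MM.DD` form and hard-codes 2025 as the missing year.

```python
from datetime import date


def normalize_date(tag, default_year=2025):
    """Parse 'MM.DD' or 'YYYY.MM.DD' into a datetime.date.

    default_year fills in the year when the value has none; 2025 mirrors
    the year hard-coded in the notebook.
    """
    parts = [int(piece) for piece in tag.strip().split(".")]
    if len(parts) == 2:
        month, day = parts
        return date(default_year, month, day)
    if len(parts) == 3:
        year, month, day = parts
        return date(year, month, day)
    raise ValueError(f"unrecognized date value: {tag!r}")


print(normalize_date("05.28"))
print(normalize_date("2024.12.31"))
```

Passing the year as a parameter keeps the function usable if the data is collected again in a later year.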

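The notebook converts Age to integers but the excerpt stops before any age chart is drawn. One possible way to prepare such a chart is to count patients per age group, sketched below with the standard library only. The bin edges, the labels, and the function names are assumptions chosen for illustration, not values from the original analysis.

```python
from collections import Counter

# Hypothetical bins (low, high, label); both bounds are inclusive.
AGE_BINS = [
    (0, 17, "0-17"),
    (18, 29, "18-29"),
    (30, 44, "30-44"),
    (45, 59, "45-59"),
    (60, 200, "60+"),
]


def age_group(age):
    """Return the label of the bin that contains the given age."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def age_distribution(ages):
    """Return [[label, count], ...] in bin order, including empty bins."""
    counts = Counter(age_group(age) for age in ages)
    return [[label, counts.get(label, 0)] for _, _, label in AGE_BINS]


# The five ages visible in the notebook's df.head() output.
sample_ages = [43, 32, 15, 16, 67]
print(age_distribution(sample_ages))
```

The result has the same `[[key, value], ...]` shape that the notebook passes to `Pie().add`, so with real data it could be fed from `df['Age'].tolist()`.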