去掉文本中多余的空格,而文本中的英文之间的正常空格需要保留,输入输出如下:

input:我今天 赚了 10 个亿,老百姓very happy。
output:我今天赚了10个亿,老百姓very happy。

代码

def clean_space(text):
  """"
  处理多余的空格
  """
  match_regex = re.compile(u'[\u4e00-\u9fa5。\.,,::《》、\(\)()]{1} +(?<![a-zA-Z])|\d+ +| +\d+|[a-z A-Z]+')
  should_replace_list = match_regex.findall(text)
  order_replace_list = sorted(should_replace_list,key=lambda i:len(i),reverse=True)
  for i in order_replace_list:
    if i == u' ':
      continue
    new_i = i.strip()
    text = text.replace(i,new_i)
  return text

python去除英文单词之间多余的空格

# 第一种方法re.sub
import re 
s = "     info has been found (+/- 100 pages, and 4.5 mb of .pdf files) now i have to wait untill our team leader has processed it and learns html.     "
# re.sub函数
re.sub(" +", " ", s)


# 第二种方法s.split()
s = "     info has been found (+/- 100 pages, and 4.5 mb of .pdf files) now i have to wait untill our team leader has processed it and learns html.     "
s = ' '.join(s.split())

Logo

腾讯云面向开发者汇聚海量精品云计算使用和开发经验,营造开放的云计算技术生态圈。

更多推荐