【Python】：re.error bad escape i at position 4

# 问题

try:
    replaceMdPic = r'(?<=\])\('
    cwd = os.getcwd()
    re.sub(replaceMdPic, cwd, line, 1)
catch:
    print('替换错误')

1
2
3
4
5
6

报错bad escape /i at position 4.md

# 原因

错误为尝试编译一个包含不当转义字符的正则表达式时。具体来说，\z在正则表达式中并不是一个有效的转义序列，这就导致了编译错误。

# 这两个字符串等价
str1 = r'(?<=\])\(' # 允许我们输入原生的字符串，python会帮我们转义为python格式的字符串
str2 = '(?<=\\])\\('
# str1 == str2

1
2
3
4

正则表达式的re库执行前，会触发python解析字符串，python发现了无效的字符串从而抛出错误，表面上是re库抛出的，实际上是python解析字符串是抛出的错误

只是简单的字符串赋值、输出等不会触发字符串解析，同样的可能会触发python解析字符串的场景：

re 正则表达式
open() 读取文件路径
json 解析字符串
eval() 解析字符串

再来看python解析字符串的规则

re.sub( r'C:','C:\user',r'D:',1) # error:bad escape \u at position 2

re.sub( r'C:',r'C:\user',r'D:',1) # error:bad escape \u at position 2
 re.sub( r'C:',r'C:\\user',r'D:',1) # 'D:'
# 上面两个被转义为    
re.sub( r'C:','C:\\user',r'D:',1) # error:bad escape \u at position 2
 re.sub( r'C:','C:\\\\user',r'D:',1) # 'D:'

1
2
3
4
5
6
7

'C:\user'显然不行，\u被视为转义字符，'C:\\user'也不行就很诡异了，据说\u会先被解析导致的，'C:\\\\user'会被解析为两个\所以没事

# 解决

首先想到用re.escape()将输入的字符手动转义

try:
    replaceMdPic = r'(?<=\])\('
    cwd = os.getcwd() # 这不是一个合法的python字符串
    re.sub(replaceMdPic, re.escape(cwd) , line, 1) # 转义后输入
catch:
    print('替换错误')

1
2
3
4
5
6

但是这样替换的结果会带上转义后的内容

例如：

re.sub( r'C:',re.escape('C:\ser'),r'C:',1)

输出：'C:\\ser'，期望的输出为'C:\ser'

因此只好先编码用re替换后，再解码

# 先确定编码
with open(old_file, "rb") as f:
    raw_data = f.read(1000)
    result = chardet.detect(raw_data)
    encoding = result["encoding"]
    
try:
    replaceMdPic = r'(?<=\])\('
    cwd = re.escape(os.getcwd())
    re.sub(replaceMdPic, cwd, line, 1).encode().decode(encoding)
catch:
    print('替换错误')

1
2
3
4
5
6
7
8
9
10
11
12

上次更新: 2025/02/21, 14:57:10

← linux调试搭建ai知识助手→