Regex: a group backreference fails to match when the string contains Chinese text
>>> re.findall('(\\d)(asd)\\1'," 4asd4",re.I)
[('4', 'asd')]
>>> re.findall('(\\d)(asd)\\1',"中文 4asd4",re.I)
[]
>>> re.findall(u'(\\d)(asd)\\1',u"中文 4asd4",re.I)
[]
How can I solve this?
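A quick way to check whether a given interpreter is affected is to run the failing call from the question as a script. This is a minimal sketch: the function name `backref_ignorecase_ok` is hypothetical, and the question does not say which Python version produced the empty result. On a current Python 3 `re` module it is expected to print `True`.

```python
import re

def backref_ignorecase_ok():
    # Hypothetical helper that reproduces the failing call from the question.
    # Pattern: a digit, then "asd", then the same digit again (backreference \1),
    # matched case-insensitively in a string that starts with Chinese text.
    found = re.findall(r'(\d)(asd)\1', "中文 4asd4", re.I)
    return found == [('4', 'asd')]

print(backref_ignorecase_ok())
```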
[Solution]
Hmm, looks like a bug. Another regex module I tried works fine...
regex: an alternative regular expression module, to replace re.
>>> import regex as re
>>> re.findall(r'(\d)(asd)\1',"中文4aSd4", re.I)
[('4', 'aSd')]
>>>
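If the third-party `regex` package might not be installed everywhere, one option is to prefer it and fall back to the standard `re` module. This is a sketch under that assumption: `regex` has to be installed separately (`pip install regex`), and the fallback only helps on interpreters whose own `re` does not have the bug.

```python
try:
    import regex as re  # third-party module from PyPI (pip install regex)
except ImportError:
    import re  # standard-library fallback

# Same call as the session above: a digit, "asd" in any case, the same digit again.
matches = re.findall(r'(\d)(asd)\1', "中文4aSd4", re.I)
print(matches)  # [('4', 'aSd')]
```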
[Solution]
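Chinese is only convenient to handle as Unicode. Avoid re.I and handle English upper/lower case the simple, clumsy way (spell out both cases in a character class)...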
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> re.findall(r'(.)([aA][sS][dD])\1', "中文 中aSd中")
[('中', 'aSd')]
>>> re.findall(r'(.)([aA][sS][dD])\1', "中文 'aSd'")
[("'", 'aSd')]
>>> re.findall(r'(.)([aA][sS][dD])\1', "中文 4aSd4")
[('4', 'aSd')]
>>>
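Writing `[aA][sS][dD]` by hand gets tedious for longer words, so a small helper can generate those character classes from a plain literal. This is a minimal sketch: the name `ci_literal` is hypothetical, and it only expands characters whose lower- and upper-case forms are single characters (others, such as `ß`, whose upper case is `SS`, are matched literally).

```python
import re

def ci_literal(text):
    # Hypothetical helper: turn a literal into a case-insensitive pattern
    # fragment without re.I, e.g. 'asd' -> '[aA][sS][dD]'.
    parts = []
    for ch in text:
        # Collect single-character case variants, lower case first.
        variants = []
        for v in (ch.lower(), ch.upper(), ch):
            if len(v) == 1 and v not in variants:
                variants.append(v)
        if len(variants) > 1:
            parts.append('[' + ''.join(re.escape(v) for v in variants) + ']')
        else:
            # No case distinction (e.g. Chinese characters): match as-is.
            parts.append(re.escape(ch))
    return ''.join(parts)

pattern = r'(.)(' + ci_literal('asd') + r')\1'
print(pattern)                                # (.)([aA][sS][dD])\1
print(re.findall(pattern, "中文 中aSd中"))    # [('中', 'aSd')]
```

Because the case handling lives in the pattern itself, the backreference `\1` is matched exactly, so re.I is never needed.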