Python GBK encoding: what to do about rare characters

2012-05-22


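How do I handle rare characters with GBK encoding in Python?
I'm using Python to sync data from an Oracle database into MySQL, whose encoding is UTF-8. In the Python code I use decode("gbk").encode("utf-8") to convert the Chinese characters, but whenever it hits a rare character the conversion fails: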
if var is None:
    return 'NULL'
else:
    return str(var).decode('gbk').encode('utf-8')
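A minimal Python 3 reproduction of the failure, assuming (as the later answers suggest) that the problem values are characters GBK does not define at all. U+4DAE (䶮) and its GB18030 bytes are used only as an example; the actual failing characters in the Oracle data are unknown.

```python
# Python 3 sketch; the thread itself uses Python 2, where str is a byte string.
RARE = "\u4dae"  # 䶮, used only as an example of a character outside GBK


def can_decode(raw: bytes, codec: str) -> bool:
    """Return True if raw decodes with the given codec without errors."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


# Bytes as a GB18030-aware source would store the rare character.
raw = RARE.encode("gb18030")

print(can_decode(raw, "gbk"))      # False: the failure described in the question
print(can_decode(raw, "gb18030"))  # True
```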

I found online that the errors can be ignored, but in testing the returned data turned out to be empty:
return str(var).decode('gbk', 'ignore').encode('utf-8')
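The 'ignore' handler does not teach GBK the missing character; it only tells the decoder to discard bytes it cannot decode. A value made up largely of such characters can therefore come back empty or truncated, which is consistent with what the question reports. A Python 3 sketch with a made-up sample value:

```python
# Python 3 sketch: errors='ignore' hides the error by discarding bytes.
RARE = "\u4dae"  # 䶮, example of a character outside GBK
original = "刘" + RARE  # hypothetical sample value
raw = original.encode("gb18030")

lossy = raw.decode("gbk", "ignore")  # no exception, but bytes are dropped
exact = raw.decode("gb18030")        # decodes every byte

print(RARE in lossy)      # False: the rare character is silently gone
print(exact == original)  # True
```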

Is there a fix, or does anyone have other ideas?

[Answer]
What is var? Calling str(var) on it doesn't seem to make much sense...
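[Answer]
Try a few different encodings. GBK generally covers all the characters used in Simplified Chinese; if it doesn't work, have a look at CP936, GB2312 and the like~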
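Before cycling through codecs it helps to know how they relate: GB2312 is a subset of GBK, so it cannot decode anything GBK rejects, and in CPython cp936 is simply an alias of the gbk codec. GB18030 is the superset that covers the characters GBK lacks. A Python 3 sketch to check this, again using U+4DAE only as an example character:

```python
import codecs

# Python 3 sketch: which of the suggested codecs can decode a non-GBK character?
CANDIDATES = ("gb2312", "gbk", "cp936", "gb18030")
RARE = "\u4dae"  # 䶮, example of a character outside GBK


def check_codecs(raw: bytes) -> dict:
    """Map each candidate codec name to whether raw decodes cleanly with it."""
    results = {}
    for name in CANDIDATES:
        try:
            raw.decode(name)
            results[name] = True
        except UnicodeDecodeError:
            results[name] = False
    return results


for name, ok in check_codecs(RARE.encode("gb18030")).items():
    # codecs.lookup() reveals which real codec an alias resolves to.
    print(f"{name:8} (codec {codecs.lookup(name).name}): {ok}")
```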
[Answer]

Python code

Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
==== No Subprocess ====
>>> import locale
>>> locale.getdefaultlocale()
('zh_CN', 'UTF-8')
>>> a = '龚齐名'
>>> a
'\xe9\xbe\x9a\xe9\xbd\x90\xe5\x90\x8d'
>>> b = a.decode('utf-8')
>>> b
u'\u9f9a\u9f50\u540d'
>>> c = b.encode('gbk')
>>> c
'\xb9\xa8\xc6\xeb\xc3\xfb'
>>> a, b, c
('\xe9\xbe\x9a\xe9\xbd\x90\xe5\x90\x8d', u'\u9f9a\u9f50\u540d', '\xb9\xa8\xc6\xeb\xc3\xfb')
>>> print a, b, c
龚齐名 龚齐名 ??ÆëÃû
>>>
>>> print a
龚齐名
>>> print b
龚齐名
>>> print c
??ÆëÃû
>>>
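The session shows that the same three characters have different byte forms in UTF-8 and GBK, and that printing the GBK bytes on a UTF-8 console produces garbage. The same checks as a Python 3 sketch, where text (str) and bytes are separate types; decoding the GBK bytes as Latin-1 is just one way to show what a wrong single-byte interpretation looks like:

```python
# Python 3 sketch of the session above.
text = "龚齐名"

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(utf8_bytes)  # b'\xe9\xbe\x9a\xe9\xbd\x90\xe5\x90\x8d'
print(gbk_bytes)   # b'\xb9\xa8\xc6\xeb\xc3\xfb'

# Reading GBK bytes as Latin-1 yields mojibake ending in the same "ÆëÃû".
print(gbk_bytes.decode("latin-1"))  # ¹¨ÆëÃû

# Decoding with the matching codec restores the original text.
print(gbk_bytes.decode("gbk") == text)  # True
```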
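[Answer]
Don't call str() carelessly on character data, it easily causes problems: as soon as a unicode string contains non-ASCII characters, str() raises a UnicodeEncodeError. Check var's type first, via __class__ or type(). Non-string data doesn't need transcoding at all; generally only character data needs handling: a GBK byte string (str) is converted with var.decode('gbk').encode('utf-8'), while a unicode object only needs var.encode('utf-8'). Again, don't reach for str() by default, or you'll easily get errors.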
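The type check described above, as a Python 3 sketch. normalize_value is a hypothetical helper name, and it assumes character columns arrive either as GBK-family bytes or as already-decoded str; it returns Unicode text, which you would then encode to UTF-8 (or let the MySQL driver encode) when writing.

```python
from typing import Any

# Hypothetical helper (name chosen for illustration): normalize one value
# read from the source database before writing it to MySQL.


def normalize_value(value: Any) -> Any:
    """Decode byte strings from the source DB; pass every other type through."""
    if value is None:
        return None  # keep NULLs as None instead of stringifying them
    if isinstance(value, (bytes, bytearray)):
        # gb18030 is a superset of gbk, so rare characters decode too.
        return bytes(value).decode("gb18030")
    # str is already Unicode; numbers, dates, etc. need no transcoding.
    return value


row = (None, 17, "龚齐名".encode("gbk"), "\u4dae".encode("gb18030"))
print([normalize_value(v) for v in row])  # [None, 17, '龚齐名', '䶮']
```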


[Answer]
$ python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08)  
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # encoding: utf-8
...  
>>> a = '\xfe\x9f'
>>> b = a.decode('gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode bytes in position 0-1: illegal multibyte sequence
>>> b = a.decode('gb18030')
>>> codings = ['gbk','gb18030','gb2312','cp936']
>>> def tryDecode(c):
...     for coding in codings:
...         try:
...             return c.decode(coding)
...         except:
...             pass
...     else:
...         print 'decode Chinese error;'
...  
>>> tryDecode(a)
u'\u4dae'
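The tryDecode idea can be tightened slightly. A Python 3 sketch; the name try_decode and the raise-instead-of-print error handling are assumptions, not part of the original answer. gb2312 and cp936 are left out because the first is a subset of gbk and the second is an alias of gbk in CPython, so they can never succeed where gbk failed.

```python
from typing import Iterable

# Trying gbk first keeps results identical to the original code for data
# GBK already handles; gb18030 then catches the characters GBK lacks.
DEFAULT_ENCODINGS = ("gbk", "gb18030")


def try_decode(raw: bytes, encodings_to_try: Iterable[str] = DEFAULT_ENCODINGS) -> str:
    """Return raw decoded with the first encoding that succeeds.

    Catches only UnicodeDecodeError and re-raises the last one instead of
    printing a message and implicitly returning None.
    """
    last_error = None
    for encoding in encodings_to_try:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings to try")
    raise last_error


print(try_decode("龚齐名".encode("gbk")))        # 龚齐名 (gbk succeeds)
print(try_decode("\u4dae".encode("gb18030")))  # 䶮 (falls back to gb18030)
```

Raising keeps an undecodable row from being written to MySQL as an empty value, which is the silent failure the question ran into with 'ignore'.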
