Python error: UnicodeEncodeError: 'charmap' codec can't encode characters: character maps to <undefined>


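The Python error "UnicodeEncodeError: 'charmap' codec can't encode characters in position" occurs when we encode a string to bytes with a codec that cannot represent some of its characters. To solve the error, specify a suitable encoding, such as UTF-8, when opening the file or encoding the string.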


Here is an example of code that raises the error.

my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

# ⛔️ UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-14: character maps to <undefined>
my_bytes = my_str.encode('cp856')

The error occurs because the string contains characters that the specified encoding (here cp856) cannot represent.
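To see exactly which characters a codec rejects, the exception object can be inspected. The sketch below is only illustrative: the helper name `describe_encode_error` is hypothetical, while `encoding`, `start`, `end`, `reason` and `object` are standard attributes of `UnicodeEncodeError`.

```python
def describe_encode_error(text, encoding):
    """Return details about the first encoding failure, or None if text encodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return {
            'codec': exc.encoding,                    # codec name reported in the message
            'start': exc.start,                       # index of the first failing character
            'end': exc.end,                           # index just past the last failing character
            'failed': exc.object[exc.start:exc.end],  # the characters that could not be encoded
            'reason': exc.reason,
        }
    return None


details = describe_encode_error('hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ', 'cp856')

# ascii() keeps the output printable even on consoles with a limited encoding
print(ascii(details))
```

The `start` and `end` values correspond to the "in position" range shown in the error message, which makes it easier to locate the offending text in longer strings.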

To solve the error, encode the string with an encoding that supports all of its characters, such as UTF-8.

my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_bytes = my_str.encode('utf-8')

# b'hello \xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_bytes)

The UTF-8 encoding can represent every valid Unicode code point, of which there are more than a million.
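UTF-8 covers that range by using between one and four bytes per code point. A minimal sketch, with sample characters chosen here purely for illustration:

```python
# One sample character for each UTF-8 length (1 to 4 bytes)
samples = ['h', 'Н', 'Ḇ', '𝘈']

byte_lengths = []
for ch in samples:
    encoded = ch.encode('utf-8')
    byte_lengths.append(len(encoded))
    # Print only ASCII (code point and escaped bytes) so this works on any console
    print(f'U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}')

# The highest Unicode code point also encodes without error
highest = chr(0x10FFFF).encode('utf-8')
```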

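All of the standard encodings are listed in the "Standard Encodings" table in the official documentation for the codecs module.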

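If the error occurs when writing to a file, set the encoding keyword argument to utf-8 in the call to open(). When encoding is omitted, open() falls back to a platform-dependent default, which on Windows is often a legacy code page such as cp1252.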

my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

with open('example.txt', 'w', encoding='utf-8') as f:
    f.write(my_str)
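The same applies when reading the file back: pass the encoding that was used to write it. The sketch below writes to a temporary directory (an assumption made here to avoid leaving files behind) and also prints the locale's preferred encoding, which is typically what open() falls back to when no encoding is given.

```python
import locale
import os
import tempfile

text = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

# Platform-dependent default, e.g. 'cp1252' on many Windows setups or 'UTF-8' on most Linux systems
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    # Write and read with the same explicit encoding
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        round_tripped = f.read()

print(round_tripped == text)
```

Passing the encoding explicitly on both sides makes the script behave the same regardless of the machine's locale settings.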

Here is a complete example that encodes the string to bytes and decodes it back.

my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

# Encode the string to bytes
my_bytes = my_str.encode('utf-8')
print(my_bytes)

# Decode the bytes back to a string
my_str_again = my_bytes.decode('utf-8')
print(my_str_again)  # "hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ"

When decoding a bytes object, we must use the same encoding that was used to encode the string to bytes.
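To illustrate why the encodings must match, the sketch below decodes UTF-8 bytes with two other codecs. The choice of latin-1 and ascii is an assumption made for the example: one silently produces wrong text, the other raises an error.

```python
text = 'НǏ'
data = text.encode('utf-8')  # b'\xd0\x9d\xc7\x8f'

# Matching codec: the original string comes back
right = data.decode('utf-8')

# Mismatched codec that accepts every byte: no error, but the text is wrong
wrong = data.decode('latin-1')

# Mismatched codec that rejects these bytes: raises UnicodeDecodeError
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(right == text)   # True
print(wrong == text)   # False
print(ascii_failed)    # True
```

The silent case is the more dangerous of the two, because the corrupted text can propagate unnoticed.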

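If the error persists even with utf-8 (which can happen, for example, when the string contains lone surrogate code points), try setting the errors keyword argument to ignore so that characters that cannot be encoded are dropped.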

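my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

# Encode the string to bytes, skipping characters that cannot be encoded
my_bytes = my_str.encode('utf-8', errors='ignore')
print(my_bytes)

# Decode the bytes back to a string, skipping bytes that cannot be decoded
my_str_again = my_bytes.decode('utf-8', errors='ignore')
print(my_str_again)  # "hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ"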

Note that ignoring characters that cannot be encoded can lead to data loss.
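The data loss is easiest to see with a narrow codec. The sketch below uses ascii as an illustrative target encoding and compares three of the standard error handlers: ignore, replace and backslashreplace.

```python
text = 'hello ḞԍНǏ'

# 'ignore' silently drops every character the codec cannot represent
ignored = text.encode('ascii', errors='ignore')

# 'replace' substitutes a question mark for each such character
replaced = text.encode('ascii', errors='replace')

# 'backslashreplace' keeps the information as escape sequences
escaped = text.encode('ascii', errors='backslashreplace')

print(ignored)   # b'hello '
print(replaced)  # b'hello ????'
print(escaped)   # b'hello \\u1e1e\\u050d\\u041d\\u01cf'
```

Where the original characters matter, backslashreplace or a wider encoding such as UTF-8 is usually a safer choice than ignore.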

We can also set the errors keyword argument to ignore when opening a file, so that any encoding errors are skipped.

my_str = 'hello 𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

with open('example.txt', 'w', encoding='utf-8', errors='ignore') as f:
    f.write(my_str)
