encoder和decoder的区别_decode作用

encoder和decoder的区别_decode作用encoder和decoder的区别_decode作用

I’ve never been sure that I understand the difference between str/unicode decode and encode.

I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding, given that encoding name it will return a unicode string.

I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.

But I don’t understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I’ve gotten wrong above?

EDIT:

Several answers give info on what .encode does on a string, but no-one seems to know what .decode does for unicode.

解决方案

The decode method of unicode strings really doesn’t have any applications at all (unless you have some non-text data in a unicode string for some reason — see below). It is mainly there for historical reasons, i think. In Python 3 it is completely gone.

unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so:

>>> s = u’ö’

>>> s.decode()

Traceback (most recent call last):

File “”, line 1, in

UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xf6′ in position 0:

ordinal not in range(128)

>>> s.encode(‘ascii’)

Traceback (most recent call last):

File “”, line 1, in

UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xf6′ in position 0:

ordinal not in range(128)

The error messages are exactly the same.

For str().encode() it’s the other way around — it attempts an implicit decoding of s with the default encoding:

>>> s = ‘ö’

>>> s.decode(‘utf-8’)

u’\xf6′

>>> s.encode()

Traceback (most recent call last):

File “”, line 1, in

UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xc3 in position 0:

ordinal not in range(128)

Used like this, str().encode() is also superfluous.

But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:

>>> s.encode(‘zip’)

‘x\x9c;\xbc\r\x00\x02>\x01z’

You are right, though: the ambiguous usage of “encoding” for both these applications is… awkard. Again, with separate byte and string types in Python 3, this is no longer an issue.

Published by

风君子

独自遨游何稽首 揭天掀地慰生平

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注