Chunked conversion

Never split a legacy byte stream into chunks and decode each chunk independently with the one-shot decode function. Many encodings use more than one byte per character, and a chunk boundary can fall in the middle of such a sequence.

Incorrect shape

// Avoid this for arbitrary chunks.
auto a = polycpp::iconv_lite::decode(firstChunk, "gbk");
auto b = polycpp::iconv_lite::decode(secondChunk, "gbk");

Each call creates and flushes a new decoder. If firstChunk ends with a lead byte, the decoder has no chance to combine it with secondChunk.
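To make the failure concrete, here is a minimal sketch in standard C++ that detects a chunk ending in the middle of a GBK character. The helper endsWithDanglingGbkLead is hypothetical and not part of polycpp. It assumes plain two-byte GBK (lead bytes 0x81 to 0xFE, each followed by exactly one trail byte) and a chunk that starts on a character boundary.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of polycpp: reports whether a GBK chunk that
// starts on a character boundary ends in the middle of a two-byte character.
// Assumes plain GBK, where lead bytes are 0x81-0xFE and every lead byte is
// followed by exactly one trail byte.
bool endsWithDanglingGbkLead(std::string_view chunk) {
    std::size_t i = 0;
    while (i < chunk.size()) {
        const unsigned char byte = static_cast<unsigned char>(chunk[i]);
        const bool isLead = byte >= 0x81 && byte <= 0xFE;
        if (!isLead) {
            ++i;          // single-byte character
        } else if (i + 1 < chunk.size()) {
            i += 2;       // complete two-byte character
        } else {
            return true;  // lead byte with no trail byte in this chunk
        }
    }
    return false;
}
```

The GBK bytes for 中文 are D6 D0 CE C4. Splitting after the third byte leaves 0xCE as a dangling lead byte, so the helper returns true for the first chunk. The scan has to start from a known boundary because trail bytes such as 0xD0 also fall inside the lead-byte range, so inspecting only the last byte is not enough.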

Correct shape with Decoder

auto decoder = polycpp::iconv_lite::getDecoder("gbk");
std::string text;
text += decoder.write(firstChunk);
text += decoder.write(secondChunk);
text += decoder.end();

Keep one decoder for the lifetime of the byte stream. Call end exactly once when no more bytes are expected, so incomplete trailing sequences can be handled according to iconv-lite behavior.
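The state a stream decoder has to keep can be sketched in a few lines of standard C++. The class GbkChunkJoiner below is hypothetical and is not how polycpp or iconv-lite implement their decoders. It only illustrates the write/end contract: write holds back a dangling lead byte until the next chunk arrives, and end hands back whatever is still incomplete. It assumes plain two-byte GBK with lead bytes 0x81 to 0xFE.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical illustration of decoder state, not the polycpp implementation.
// Assumes plain GBK: lead bytes 0x81-0xFE, each followed by one trail byte.
class GbkChunkJoiner {
public:
    // Returns the bytes that form whole characters. A lead byte at the very
    // end is kept in pending_ and prepended to the next chunk.
    std::string write(std::string_view chunk) {
        std::string data = std::move(pending_);
        data.append(chunk.data(), chunk.size());
        pending_.clear();

        std::size_t i = 0;
        while (i < data.size()) {
            const unsigned char byte = static_cast<unsigned char>(data[i]);
            const bool isLead = byte >= 0x81 && byte <= 0xFE;
            if (!isLead) {
                ++i;      // single-byte character
            } else if (i + 1 < data.size()) {
                i += 2;   // complete two-byte character
            } else {
                break;    // dangling lead byte: wait for the next chunk
            }
        }
        pending_ = data.substr(i);
        data.resize(i);
        return data;
    }

    // Returns any bytes still waiting for a trail byte and resets the state.
    // A real decoder decides here how to represent the incomplete sequence.
    std::string end() { return std::exchange(pending_, std::string{}); }

private:
    std::string pending_;
};
```

Feeding D6 D0 CE and then C4 yields D6 D0 from the first write and CE C4 from the second, and end returns nothing. Creating a new joiner per chunk would discard pending_ between calls, which is the same mistake as calling one-shot decode on each chunk.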

Correct shape with DecodeStream

auto decoder = polycpp::iconv_lite::decodeStream("utf8");
decoder.write(firstChunk);
decoder.write(secondChunk);
decoder.end();

DecodeStream uses the same stateful decoder internally and fits stream pipelines. The readable side emits UTF-8 buffers because polycpp streams carry bytes.

Chunk-sensitive encodings

Use stateful conversion for arbitrary chunks of:

UTF-8, UTF-16, and UTF-32
UTF-7 and UTF-7-IMAP
Shift_JIS, GBK, GB18030, Big5, EUC-JP, and EUC-KR
base64 and other codecs with buffered output

Single-byte encodings such as latin1 or Windows-1251 cannot be split in the middle of a character, because every byte is a complete character. Using one stateful converter still keeps code uniform and makes later encoding changes safer.
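As a small illustration of why single-byte encodings tolerate arbitrary chunk boundaries, the following sketch converts ISO-8859-1 bytes to UTF-8 one byte at a time. The function latin1ToUtf8 is a hypothetical helper written for this example, not a polycpp API. It relies on the fact that each ISO-8859-1 byte value equals its Unicode code point.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of polycpp: converts ISO-8859-1 bytes to
// UTF-8. Each input byte maps to one code point, so no state is carried
// from one byte to the next.
std::string latin1ToUtf8(std::string_view bytes) {
    std::string out;
    for (const char c : bytes) {
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte < 0x80) {
            out += static_cast<char>(byte);                  // ASCII range
        } else {
            out += static_cast<char>(0xC0 | (byte >> 6));    // UTF-8 lead byte
            out += static_cast<char>(0x80 | (byte & 0x3F));  // UTF-8 trail byte
        }
    }
    return out;
}
```

Because the loop never looks at a neighboring byte, converting the chunks separately and concatenating the results gives the same output as converting the whole input at once. That property does not hold for the multi-byte encodings listed above.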

For the base64 label, follow iconv-lite’s direction: encoding consumes base64 text and produces bytes, while decoding consumes bytes and produces base64 text.