BOM handling

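The byte order mark (BOM) is the character U+FEFF placed at the start of a text stream. UTF-16 and UTF-32 use it to signal byte order. In UTF-8, where byte order does not apply, it only acts as an encoding signature (the bytes EF BB BF). Applications often encounter it in CSV exports, XML files, UTF-16 text files, and partner feeds. The library follows iconv-lite's behavior for stripping BOMs when decoding and prepending them when encoding.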
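
To make the byte signatures concrete, here is a minimal sketch, assuming you want to inspect raw input before choosing an encoding to decode with. `detectBOM` is a hypothetical helper written for illustration and is not part of polycpp::iconv_lite. It checks the standard BOM byte sequences, testing the four-byte UTF-32 signatures before the two-byte UTF-16 ones because FF FE 00 00 begins with FF FE.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; not part of polycpp::iconv_lite.
// Returns the name of the BOM that starts a raw byte buffer, or "" if none.
std::string detectBOM(std::string_view bytes) {
    struct Signature {
        std::string_view name;
        std::string_view prefix;
    };
    // Longest signatures first: the UTF-32LE BOM (FF FE 00 00) starts with
    // the UTF-16LE BOM (FF FE), so the order of this table matters.
    static const Signature signatures[] = {
        {"utf-32be", std::string_view("\x00\x00\xFE\xFF", 4)},
        {"utf-32le", std::string_view("\xFF\xFE\x00\x00", 4)},
        {"utf-8", "\xEF\xBB\xBF"},
        {"utf-16be", "\xFE\xFF"},
        {"utf-16le", "\xFF\xFE"},
    };
    for (const auto& sig : signatures) {
        // substr clamps the length, so short inputs are handled safely.
        if (bytes.substr(0, sig.prefix.size()) == sig.prefix) {
            return std::string(sig.name);
        }
    }
    return "";  // no BOM at the start of the buffer
}
```

One caveat of the signatures themselves: a UTF-16LE payload whose first character is U+0000 starts with FF FE 00 00 and is indistinguishable from a UTF-32LE BOM.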

Decoding with the default policy

BOM-aware decoders strip an initial decoded U+FEFF by default:

```cpp
// `bytes` holds the raw input; a leading BOM is not part of `text`.
auto text = polycpp::iconv_lite::decode(bytes, "utf8");
```

Only a BOM at the very start of the decoded text is removed. U+FEFF anywhere else in the decoded text is kept as ordinary content.
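
The stripping rule can be sketched on text that is already UTF-8, where U+FEFF is the byte sequence EF BB BF. `stripLeadingBOM` is a hypothetical helper for illustration, not the library's implementation; it removes the sequence only at offset 0 and leaves later occurrences untouched.

```cpp
#include <string>

// Hypothetical sketch of the default policy; not the library's code.
// Input is text already decoded to UTF-8, where U+FEFF is EF BB BF.
std::string stripLeadingBOM(std::string text) {
    static const std::string kBOM = "\xEF\xBB\xBF";
    // Only a BOM at offset 0 is removed; later U+FEFF bytes are content.
    if (text.compare(0, kBOM.size(), kBOM) == 0) {
        text.erase(0, kBOM.size());
    }
    return text;
}
```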

Keeping a BOM as content

Disable stripping when the BOM is meaningful data:

```cpp
polycpp::iconv_lite::DecodeOptions options;
options.stripBOM = false;  // keep a leading U+FEFF in the result
auto text = polycpp::iconv_lite::decode(bytes, "utf8", options);
```

When stripBOM is false, onBOMStripped is not called because no BOM is removed.
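
A small model makes the interaction between the two options explicit. `SketchDecodeOptions` and `applyBOMPolicy` are hypothetical names: the fields mirror DecodeOptions, but the code works on already-decoded UTF-8 and is not the library's decoder.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for DecodeOptions, for illustration only.
struct SketchDecodeOptions {
    bool stripBOM = true;                 // default: remove a leading BOM
    std::function<void()> onBOMStripped;  // optional observer
};

// Applies the BOM policy to text that is already decoded to UTF-8.
std::string applyBOMPolicy(std::string text, const SketchDecodeOptions& options) {
    static const std::string kBOM = "\xEF\xBB\xBF";  // U+FEFF in UTF-8
    if (!options.stripBOM) {
        return text;  // BOM kept as content, so the callback never runs
    }
    if (text.compare(0, kBOM.size(), kBOM) == 0) {
        text.erase(0, kBOM.size());
        if (options.onBOMStripped) {
            options.onBOMStripped();  // runs only when a BOM was removed
        }
    }
    return text;
}
```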

Observing BOM removal

To find out whether a BOM was actually removed, register an onBOMStripped callback:

```cpp
bool removed = false;
polycpp::iconv_lite::DecodeOptions options;
options.onBOMStripped = [&] { removed = true; };  // runs only if a BOM is stripped
auto text = polycpp::iconv_lite::decode(bytes, "utf8", options);
```

The callback is useful for telemetry and compatibility checks: it tells you that the payload included an initial BOM without requiring the application to keep U+FEFF in the returned string.
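
As an illustration of the telemetry use, the sketch below counts how many payloads in a batch began with a BOM while still returning clean text. `stripBOMWithCallback`, `BatchResult`, and `decodeBatch` are hypothetical names; the stand-in decoder only strips a leading UTF-8 BOM and calls the callback when it removes one, which is the contract described above, not the library's actual code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in decoder: strips a leading UTF-8 BOM and calls
// onStripped only when it actually removes one.
std::string stripBOMWithCallback(std::string text,
                                 const std::function<void()>& onStripped) {
    static const std::string kBOM = "\xEF\xBB\xBF";
    if (text.compare(0, kBOM.size(), kBOM) == 0) {
        text.erase(0, kBOM.size());
        if (onStripped) onStripped();
    }
    return text;
}

struct BatchResult {
    std::vector<std::string> texts;  // payloads without a leading BOM
    std::size_t bomCount = 0;        // how many payloads had one
};

// Telemetry pattern: record BOM-prefixed inputs without keeping U+FEFF.
BatchResult decodeBatch(const std::vector<std::string>& payloads) {
    BatchResult result;
    for (const auto& payload : payloads) {
        result.texts.push_back(
            stripBOMWithCallback(payload, [&] { ++result.bomCount; }));
    }
    return result;
}
```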

Encoding with a BOM

When encoding, the utf16 and utf32 encoders, which pick the byte order automatically, add a BOM by default. Other BOM-aware encodings, such as utf8, add one only when requested:

```cpp
polycpp::iconv_lite::EncodeOptions options;
options.addBOM = true;  // utf8 does not add a BOM unless asked
auto bytes = polycpp::iconv_lite::encode("hello", "utf8", options);
```

Set addBOM=false to suppress the default BOM on utf16 or utf32.
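
The default rule can be modeled with an optional flag, where an unset value means "use the encoding's default" and an explicit true or false always wins. `resolveAddBOM` and `bomPrefix` are hypothetical helpers, not the library's API. The little-endian prefixes are an assumption based on iconv-lite, whose utf16 and utf32 encoders emit little-endian output by default.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of the default described above; not the library's code.
// An unset addBOM means "use the encoding's default".
bool resolveAddBOM(std::string_view encoding, std::optional<bool> addBOM) {
    if (addBOM.has_value()) {
        return *addBOM;  // explicit true or false always wins
    }
    return encoding == "utf16" || encoding == "utf32";
}

// Bytes placed before the encoded payload when a BOM is added.
// Assumption: the auto encoders write little-endian, as iconv-lite does.
std::string bomPrefix(std::string_view encoding) {
    if (encoding == "utf8") return "\xEF\xBB\xBF";
    if (encoding == "utf16") return "\xFF\xFE";
    if (encoding == "utf32") return std::string("\xFF\xFE\x00\x00", 4);
    return "";
}
```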

Choosing a policy

| Situation | Suggested policy |
|---|---|
| Reading modern UTF-8 text | Keep the default `stripBOM=true`. |
| Preserving exact text content for a diff or editor | Use `stripBOM=false`. |
| Exporting for a system that requires a UTF-8 BOM | Set `EncodeOptions::addBOM=true`. |
| Exporting UTF-16 or UTF-32 with automatic endianness | Keep the default BOM unless the target explicitly forbids it. |