LKBEN11009: What is the default encoding for an XML document


Symptom

You do not have an encoding in an XML document and need to know it

Cause

This is by design

Solution

If no encoding declaration is present in the XML document (and no external encoding declaration mechanism such as the HTTP header is available), the assumed encoding of an XML document depends on the presence of the Byte-Order-Mark (BOM).

The Byte-Order-Mark (or BOM), is a special marker added at the very beginning of an Unicode file encoded in UTF-8, UTF-16  or UTF-32. It is used to indicate whether the file uses the big-endian or little-endian byte order. The BOM is mandatory for UTF-16 and UTF-32, but it is optional for UTF-8.

The BOM is a Unicode special marker placed at the top of the file that indicate its encoding. The BOM is optional for UTF-8.
First bytes     Encoding assumed
EF BB BF     UTF-8
FE FF     UTF-16 (big-endian)
FF FE     UTF-16 (little-endian)
00 00 FE FF     UTF-32 (big-endian)
FF FE 00 00     UTF-32 (little-endian)
None of the above     UTF-8

Note that the encoding of an XML document is never iso-8859-1 by default.

One of the most common mistake when editing an XML document is to add some extended characters and forget to set the encoding declaration at the top of the document.

Disclaimer:

The information provided in this document is intended for your information only. Lubby makes no claims to the validity of this information. Use of this information is at own risk!

About the Author

Author: Wim Peeters - Keskon GmbH & Co. KG

Wim Peeters is electronics engineer with an additional master in IT and over 30 years of experience, including time spent in support, development, consulting, training and database administration. Wim has worked with SQL Server since version 6.5. He has developed in C/C++, Java and C# on Windows and Linux. He writes knowledge base articles to solve IT problems and publishes them on the Lubby Knowledge Platform.