LKBEN11009: What is the default encoding for an XML document
LKB | Created: 02/04/2020 | Version: 0 | Language: EN | Rating: 0 | Outdated: False | Marked for deletion: False
Author: Wim Peeters - Keskon GmbH & Co. KG
Symptom
You do not have an encoding in an XML document and need to know it
Cause
This is by design
Solution
If no encoding declaration is present in the XML document (and no external encoding declaration mechanism such as the HTTP header is available), the assumed encoding of an XML document depends on the presence of the Byte-Order-Mark (BOM).
The Byte-Order-Mark (or BOM), is a special marker added at the very beginning of an Unicode file encoded in UTF-8, UTF-16 or UTF-32. It is used to indicate whether the file uses the big-endian or little-endian byte order. The BOM is mandatory for UTF-16 and UTF-32, but it is optional for UTF-8.
The BOM is a Unicode special marker placed at the top of the file that indicate its encoding. The BOM is optional for UTF-8.
First bytes Encoding assumed
EF BB BF UTF-8
FE FF UTF-16 (big-endian)
FF FE UTF-16 (little-endian)
00 00 FE FF UTF-32 (big-endian)
FF FE 00 00 UTF-32 (little-endian)
None of the above UTF-8
Note that the encoding of an XML document is never iso-8859-1 by default.
One of the most common mistake when editing an XML document is to add some extended characters and forget to set the encoding declaration at the top of the document.
About the Author
Wim Peeters is electronics engineer with an additional master in IT and over 30 years of experience including time spent in support, development, consulting, training and database administration. Wim has worked with SQL Server since version 6.5. He has developed in C/C++, Java and C# on Windows and Linux in different European countries and different European languages. He writes knowledge base articles to solve IT problems and publishes them on the Lubby Knowledge Platform where he is one of the most important contributors and the main developer.