System.Text.Encoding.WebName is wrong for System.Text.UnicodeEncoding and System.Text.UTF32Encoding #63605
Description
Not sure if this is a bug or a mistake in the API documentation, but per the API documentation, "WebName returns an IANA-registered name for the encoding"; however, IANA cites RFC 2781 for the proper names for UTF-16 encodings and Unicode Standard Annex #19 for the proper names for UTF-32 encodings. Both resources require a BOM to be used if and only if "le" and "be" are omitted in the name.
Reproduction Steps
using System;
using System.Text;

namespace Bug {
    static class Program {
        static void Main() {
            // Displays "utf-16"; but since we don't include a BOM, it MUST display "utf-16le" per RFC 2781 Section 3.3.
            Console.WriteLine(new UnicodeEncoding(false, false, true).WebName);
            // Displays "utf-16BE"; but since we include a BOM, it MUST display "utf-16" per RFC 2781 Section 3.3.
            Console.WriteLine(new UnicodeEncoding(true, true, true).WebName);
            // Displays "utf-32"; but since we don't include a BOM, it MUST display "utf-32le" per UAX #19.
            Console.WriteLine(new UTF32Encoding(false, false, true).WebName);
            // Displays "utf-32BE"; but since we include a BOM, it MUST display "utf-32" per UAX #19.
            Console.WriteLine(new UTF32Encoding(true, true, true).WebName);
        }
    }
}
Expected behavior
new System.Text.UnicodeEncoding(false, false, true).WebName to equal utf-16le, new System.Text.UnicodeEncoding(true, true, true).WebName to equal utf-16, new System.Text.UTF32Encoding(false, false, true).WebName to equal utf-32le, and new System.Text.UTF32Encoding(true, true, true).WebName to equal utf-32.
Actual behavior
System.Text.UnicodeEncoding always returns utf-16 when little-endian regardless of the BOM bool, System.Text.UnicodeEncoding always returns utf-16BE when big-endian regardless of the BOM bool, System.Text.UTF32Encoding always returns utf-32 when little-endian regardless of the BOM bool, and System.Text.UTF32Encoding always returns utf-32BE when big-endian regardless of the BOM bool.
Regression?
I don't believe this worked with previous .NET versions. I have verified that .NET 5 and .NET Framework 4.7.2 also have this issue.
Known Workarounds
When the System.Text.Encoding is known ahead of time, hardcode the proper value yourself. In the event the actual System.Text.Encoding is not known, use reflection to test if the runtime type is one of these types as well as call GetPreamble() and test if it's empty.
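A minimal sketch of the second workaround, for illustration only. The class EncodingLabels and the method GetIanaLabel are hypothetical names, not part of .NET. Instead of a reflection-based type test, this sketch switches on Encoding.CodePage (1200/1201 for UTF-16 LE/BE, 12000/12001 for UTF-32 LE/BE), which also reveals the endianness; it assumes no custom Encoding subclass reports one of those code pages with different semantics.

```csharp
using System;
using System.Text;

static class EncodingLabels
{
    // Hypothetical helper (not a .NET API): returns the label that
    // RFC 2781 / UAX #19 would call for, based on whether the encoding
    // emits a BOM. Other encodings fall back to Encoding.WebName.
    public static string GetIanaLabel(Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // An empty preamble means the encoding was constructed without a BOM.
        bool hasBom = encoding.GetPreamble().Length > 0;

        switch (encoding.CodePage)
        {
            case 1200:  return hasBom ? "utf-16" : "utf-16le"; // UTF-16, little-endian
            case 1201:  return hasBom ? "utf-16" : "utf-16be"; // UTF-16, big-endian
            case 12000: return hasBom ? "utf-32" : "utf-32le"; // UTF-32, little-endian
            case 12001: return hasBom ? "utf-32" : "utf-32be"; // UTF-32, big-endian
            default:    return encoding.WebName;
        }
    }
}
```

With this helper, EncodingLabels.GetIanaLabel(new UnicodeEncoding(false, false, true)) yields utf-16le and EncodingLabels.GetIanaLabel(new UTF32Encoding(true, true, true)) yields utf-32, matching the expected behavior described above.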
Configuration
.NET versions: .NET 6, .NET 5, and .NET Framework 4.7.2
OS: Windows 10
CPU architecture: x64
I highly doubt the configuration matters.
Other information
No response