System.Text.Encoding.WebName is wrong for System.Text.UnicodeEncoding and System.Text.UTF32Encoding #63605

Open
@zacknewman

Description

I'm not sure whether this is a bug or a mistake in the API documentation. Per the API documentation, "WebName returns an IANA-registered name for the encoding"; however, IANA cites RFC 2781 for the proper names of the UTF-16 encodings and Unicode Standard Annex #19 for the proper names of the UTF-32 encodings. Both resources require a BOM to be used if and only if "le" and "be" are omitted from the name.

Reproduction Steps

using System;
using System.Text;

namespace Bug {
    static class Program {
        static void Main() {
            // Displays "utf-16"; but since we don't include a BOM, it MUST display "utf-16le" per RFC 2781 Section 3.3.
            Console.WriteLine(new UnicodeEncoding(false, false, true).WebName);
            // Displays "utf-16BE"; but since we include a BOM, it MUST display "utf-16" per RFC 2781 Section 3.3.
            Console.WriteLine(new UnicodeEncoding(true, true, true).WebName);
            // Displays "utf-32"; but since we don't include a BOM, it MUST display "utf-32le" per UAX #19.
            Console.WriteLine(new UTF32Encoding(false, false, true).WebName);
            // Displays "utf-32BE"; but since we include a BOM, it MUST display "utf-32" per UAX #19.
            Console.WriteLine(new UTF32Encoding(true, true, true).WebName);
        }
    }
}

Expected behavior

new System.Text.UnicodeEncoding(false, false, true).WebName to equal utf-16le, new System.Text.UnicodeEncoding(true, true, true).WebName to equal utf-16, new System.Text.UTF32Encoding(false, false, true).WebName to equal utf-32le, and new System.Text.UTF32Encoding(true, true, true).WebName to equal utf-32.

Actual behavior

WebName depends only on the byte order and ignores the byteOrderMark constructor argument: System.Text.UnicodeEncoding always returns utf-16 when little-endian and utf-16BE when big-endian, and System.Text.UTF32Encoding always returns utf-32 when little-endian and utf-32BE when big-endian.

Regression?

I don't believe this worked with previous .NET versions. I have verified that .NET 5 and .NET Framework 4.7.2 also have this issue.

Known Workarounds

When the System.Text.Encoding is known ahead of time, hardcode the proper value yourself. When the actual System.Text.Encoding is not known, check at run time whether the instance is one of these two types, then call GetPreamble() and test whether the returned array is empty.
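
The second workaround could be sketched roughly as follows. This is only an illustration: `EncodingNames` and `GetIanaName` are hypothetical names, not part of .NET, and the sketch assumes that the code pages 1201 and 12001 identify the big-endian variants of UnicodeEncoding and UTF32Encoding respectively.

```csharp
using System;
using System.Text;

static class EncodingNames
{
    // Hypothetical helper (not a .NET API): returns the name this issue
    // argues WebName should return, based on byte order and BOM usage.
    public static string GetIanaName(Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // An empty preamble means the encoding does not emit a BOM.
        bool hasBom = encoding.GetPreamble().Length != 0;

        if (encoding is UnicodeEncoding)
        {
            // With a BOM, the unqualified name applies.
            if (hasBom) return "utf-16";
            // Assumption: code page 1201 is big-endian UTF-16, 1200 little-endian.
            return encoding.CodePage == 1201 ? "utf-16be" : "utf-16le";
        }

        if (encoding is UTF32Encoding)
        {
            if (hasBom) return "utf-32";
            // Assumption: code page 12001 is big-endian UTF-32, 12000 little-endian.
            return encoding.CodePage == 12001 ? "utf-32be" : "utf-32le";
        }

        // Every other encoding keeps its existing WebName.
        return encoding.WebName;
    }
}
```

With this helper, `EncodingNames.GetIanaName(new UnicodeEncoding(false, false, true))` would yield utf-16le instead of the utf-16 that WebName reports. The names are returned in lowercase, matching the expected values listed above.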

Configuration

.NET versions: .NET 6, .NET 5, and .NET Framework 4.7.2
OS: Windows 10
CPU architecture: x64
I highly doubt the configuration matters.

Other information

No response
