Description
I was wondering how to replace undefined conversions by a substitute character when they are outside of the destination encoding, e.g. when I try to convert the euro sign (€) to SHIFT JIS encoding.
In Ruby, we can do this:
"xx€xx".encode('SHIFT_JIS', 'UTF-8', undef: :replace)
=> "xx?xx"
And the € which cannot be converted is replaced by a "?" character. This is important when doing text comparison i.e. https://unicode.org/reports/tr36/#Text_Comparison
When converting charsets, never simply omit characters that cannot be converted; at least substitute U+FFFD (when converting to Unicode) or 0x1A (when converting to bytes) to reduce security problems.
Can we do this using iconv library in Elixir/Erlang? Currently the undefined character is omitted. I guess I could do the conversion char by char and check if it returns an empty string but I was hoping if there is anything more elegant possible?
Activity
krepflap commentedon Jun 28, 2019
If any one stumbles upon this, I'm using this to handle the case above, though it does call :iconv.convert for every character.