Encode Unicode

The choice of Unicode is preserved, the only change was from UTF-16 encode (of Unicode) to UTF-8.

About tools like "regular templates" and separators, see "" (F002) or "" (F003) of the PUA as neutral separators for "free UTF-8 alphabet" in multilingual corpus. F002 and F003 are only a text separators that are not content, and can be also in split or regular-expression operarations... So it is an internal representation on software-control.

Convention: use  (F002) as open-tag and  (F003) as close-tag. Example:
<section class="main"><p>Hello</p><p>Bye!</p></section>
substituted by representation of array+encode:
["section class=\"main\"","p","p"]
+
HelloBye!

So, it is easy to do fast what is not so fast in DOMDocument representation. Fast translate to TXT (tr// /), to replace tags, to analyze neighborhoods (eg. sequence any-open-tag and "Hello") and to analyze structre by complex regular expressions, where XPath is not so good.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encode Unicode

Clone this wiki locally