Commits
- Commit: ea5e974da9b1047689411a00ecc0a9c1fb101d73
- From: Omar Polo <op@omarpolo.com>
- Date:
got-notify-http: fix unicode handling
JSON strings are made of Unicode codepoints, of which only \, " and
control characters have to be escaped, and the whole document MUST
be encoded in UTF-8. The current code generates invalid strings
for non-ASCII characters, so it has to be made UTF-8 aware.
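As a rough illustration of the escaping rule described above, here is a minimal sketch. The helper name json_escape() is hypothetical and is not taken from got-notify-http. It assumes the input is already valid UTF-8, so bytes >= 0x80 are copied through untouched and only \, " and the control characters below U+0020 are escaped.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the got sources): write `s` into `out`
 * as the body of a JSON string.  Assumes `s` is already valid UTF-8,
 * so non-ASCII bytes are copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
json_escape(char *out, size_t outlen, const char *s)
{
	size_t n = 0;
	int r;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		if (c == '"' || c == '\\')
			r = snprintf(out + n, outlen - n, "\\%c", c);
		else if (c < 0x20)	/* control characters */
			r = snprintf(out + n, outlen - n, "\\u%04x", c);
		else			/* ASCII and UTF-8 bytes as-is */
			r = snprintf(out + n, outlen - n, "%c", c);

		/* snprintf reports truncation by returning >= size */
		if (r < 0 || (size_t)r >= outlen - n)
			return -1;
		n += (size_t)r;
	}
	return 0;
}
```

Copying non-ASCII bytes verbatim is only safe once the text is known to be well-formed UTF-8, which is why validation has to happen first.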
tedu's isu8cont() can't be used since it allows surrogate pairs and
overlong sequences which will cause decoding errors on the receiving
side. Similarly, mbtowc() depends on the current locale and could
cause issues in -portable.
Instead, bundle Björn Höhrmann's "Flexible and Economical UTF-8
Decoder" and use it to parse the text. Decoding errors result in
the replacement character U+FFFD being emitted and the bytes
considered so far being discarded; the decoder is then restarted
with the next byte.
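The recovery strategy described above can be sketched as follows. This is not the bundled decoder: it swaps in a simple hand-written strict decoder so the example stays self-contained, and the names u8_feed() and u8_sanitize() are hypothetical. It rejects overlong sequences, surrogates and values above U+10FFFF, emits U+FFFD on error, drops the bytes consumed so far (including the offending one) and restarts with the next byte. Emitting U+FFFD for a sequence truncated at the end of the input is an assumption of this sketch. Because errors are detected at different points than in a table-driven decoder, the number of U+FFFD emitted for a given bad sequence may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { U8_MORE, U8_DONE, U8_ERROR };

struct u8dec {
	uint32_t cp;	/* codepoint accumulated so far */
	uint32_t min;	/* smallest codepoint valid for this length */
	int need;	/* continuation bytes still expected */
};

/* Feed one byte to the decoder; the state is reset on error. */
static int
u8_feed(struct u8dec *d, uint8_t b)
{
	if (d->need == 0) {
		if (b < 0x80) {
			d->cp = b;
			return U8_DONE;
		} else if (b >= 0xC2 && b <= 0xDF) {
			d->cp = b & 0x1F; d->min = 0x80; d->need = 1;
		} else if ((b & 0xF0) == 0xE0) {
			d->cp = b & 0x0F; d->min = 0x800; d->need = 2;
		} else if (b >= 0xF0 && b <= 0xF4) {
			d->cp = b & 0x07; d->min = 0x10000; d->need = 3;
		} else
			return U8_ERROR; /* stray continuation, C0, C1, F5+ */
		return U8_MORE;
	}
	if ((b & 0xC0) != 0x80) {
		d->need = 0;
		return U8_ERROR;
	}
	d->cp = (d->cp << 6) | (b & 0x3F);
	if (--d->need > 0)
		return U8_MORE;
	/* reject overlong forms, surrogates and out-of-range values */
	if (d->cp < d->min || d->cp > 0x10FFFF ||
	    (d->cp >= 0xD800 && d->cp <= 0xDFFF))
		return U8_ERROR;
	return U8_DONE;
}

/* Copy `in` to `out`, replacing invalid sequences with U+FFFD. */
static int
u8_sanitize(const char *in, char *out, size_t outlen)
{
	struct u8dec d = { 0, 0, 0 };
	size_t i, len, start = 0, n = 0;

	if (outlen == 0)
		return -1;
	for (i = 0; in[i] != '\0'; i++) {
		switch (u8_feed(&d, (uint8_t)in[i])) {
		case U8_MORE:
			continue;
		case U8_DONE:
			len = i + 1 - start;
			if (n + len >= outlen)
				return -1;
			memcpy(out + n, in + start, len);
			n += len;
			break;
		case U8_ERROR:
			if (n + 3 >= outlen)
				return -1;
			memcpy(out + n, "\xEF\xBF\xBD", 3);
			n += 3;
			break;
		}
		start = i + 1;	/* restart with the next byte */
	}
	if (d.need != 0) {	/* assumption: truncated tail -> U+FFFD */
		if (n + 3 >= outlen)
			return -1;
		memcpy(out + n, "\xEF\xBF\xBD", 3);
		n += 3;
	}
	out[n] = '\0';
	return 0;
}
```

The output of such a pass is always well-formed UTF-8, so it can be written into a JSON string with only the \, " and control-character escapes applied.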
Git commit messages don't carry the notion of the encoding, but
it's reasonable to expect UTF-8 (which is a superset of ASCII).
For other, more exotic encodings, the commit id can be used to
manually extract the data.
ok stsp@