Commit Briefs

Thomas Adam

got-notify-http: fix unicode handling

JSON strings are made of Unicode codepoints, of which only \, " and control characters have to be escaped, and the whole document MUST be encoded in UTF-8. The current code generates invalid strings for non-ASCII characters, so it has to be made UTF-8 aware.

tedu's isu8cont() can't be used since it allows surrogate pairs and overlong sequences, which will cause decoding errors on the receiving side. Similarly, mbtowc() depends on the current locale and could cause issues in -portable.

Instead, bundle Björn Höhrmann's "Flexible and Economical UTF-8 Decoder" and use it to parse the text. A decoding error results in the replacement character U+FFFD being emitted and the bytes considered so far being discarded; the decoder is then restarted with the next byte.

Git commit messages don't carry any notion of their encoding, but it's reasonable to expect UTF-8 (which is a superset of ASCII). For other, more exotic encodings, the commit id can be used to manually extract the data.

ok stsp@
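
The escaping and error-recovery rules described in the commit message can be sketched roughly as below. This is not the code from the commit and not Höhrmann's table-driven decoder: decode_one() and json_escape() are hypothetical helpers, and the decoder is a plain hand-written strict UTF-8 check standing in for the bundled one. In particular, the number of U+FFFD characters emitted for a given malformed sequence may differ from what the real decoder produces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence from s[0..len), len >= 1.
 * Returns the number of bytes consumed (at least 1) and stores the
 * codepoint in *cp, or U+FFFD if the sequence is malformed.
 */
static size_t
decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* Smallest codepoint allowed for each sequence length. */
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t i, need;
	uint32_t c;

	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 4; c = s[0] & 0x07;
	} else {
		*cp = 0xFFFD;	/* stray continuation or invalid lead byte */
		return 1;
	}

	for (i = 1; i < need; i++) {
		if (i >= len || (s[i] & 0xC0) != 0x80) {
			*cp = 0xFFFD;
			return i;	/* drop bytes seen so far, restart at s[i] */
		}
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates and out-of-range values. */
	if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		c = 0xFFFD;
	*cp = c;
	return need;
}

/*
 * Hypothetical helper: write the JSON string body for s[0..len) into
 * out (capacity cap, NUL-terminated).  Returns 0, or -1 if out is too small.
 */
static int
json_escape(const unsigned char *s, size_t len, char *out, size_t cap)
{
	size_t i = 0, o = 0, n, srclen;
	const char *src;
	uint32_t cp;
	char tmp[8];

	assert(out != NULL && cap > 0);
	while (i < len) {
		n = decode_one(s + i, len - i, &cp);
		if (cp == '"' || cp == '\\') {
			tmp[0] = '\\';
			tmp[1] = (char)cp;
			src = tmp; srclen = 2;
		} else if (cp < 0x20) {
			/* Control characters become \u00XX escapes. */
			snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned int)cp);
			src = tmp; srclen = 6;
		} else if (cp == 0xFFFD) {
			src = "\xEF\xBF\xBD"; srclen = 3;	/* U+FFFD in UTF-8 */
		} else {
			/* Valid sequence: copy the original bytes unchanged. */
			src = (const char *)s + i; srclen = n;
		}
		if (o + srclen >= cap)
			return -1;
		memcpy(out + o, src, srclen);
		o += srclen;
		i += n;
	}
	out[o] = '\0';
	return 0;
}
```

Valid multi-byte sequences pass through untouched, since JSON only requires escaping \, " and control characters. Malformed input never reaches the output; it is replaced with U+FFFD so the receiving side always sees well-formed UTF-8.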