android - Parsing ASCII characters with Erlang -

September 15, 2014

I am confused about which parsing needs to be done, and at which end (client or server).

When I send an umlaut 'Ö' to ejabberd, it is received by ejabberd as <<"195, 150">>.

Following this, I send it to my client as a push notification (via GCM/APNS, silently). There, the client applies UTF-8 decoding to each numeral one by one (this is wrong).

I.e. 195 is first decoded to the gibberish character �, and so on.

This reconstruction needs to identify whether 2 bytes are to be taken together, or 3, or more. This varies with the language of the letters (German here, e.g.).

How would the client identify which language it is going to reconstruct (the number of bytes to decode in one go)?

To add more,

lists:flatten(mochijson2:encode({struct,[{registration_ids,[Reg_id]},{data,[{message,Message},{type,Type},{enum,Enum},{groupid,Groupid},{groupname,Groupname},{sender,Sender_list},{receiver,Content_list}]},{time_to_live,2419200}]})).

This produced the JSON as:

"{\"registration_ids\":[\"apa91bgljnkhqzlqfep7mto9p1vu9s92_a0uizluhnhl4xdftaz_0hpd5sisb4jnrpi2d7_c8d_mbhut_k-t2bo_i_g3jt1kiqbgqkrfwb3gp1jegatromsfg4gajsekclzffijeeyow\"],\"data\":{\"message\":[104,105],\"type\":[71,82,79,85,80],\"enum\":2001,\"groupid\":[71,73,68],\"groupname\":[71,114,111,117,112,78,97,109,101],\"sender\":[49,64,100,101,118,108,97,98,47,115,100,115],\"receiver\":[97,115,97,115]},\"time_to_live\":2419200}"

Where I had given "hi" as the message, mochijson gave me the ASCII values [104,105].

The groupname field was given the value "GroupName"; the ASCII values are also correct after JSON creation, i.e. 71,114,111,117,112,78,97,109,101.

However, when I use http://www.unit-conversion.info/texttools/ascii/

it decodes to Ǎo��me, not "GroupName".

So, where should the parsing be done? How should this be handled?

My reconstructed message is gibberish when the ASCII values are reconstructed.

Thanks

The things to worry about here are manifold, and have to do with both the desired encoding and the data structure. In Erlang, text can be handled in one of the following ways:

Lists of bytes ([0..255, ...])
- This is what you get if you listen to a socket and the data is returned as a list.
- The VM assumes no encoding. They're bytes, and mean little more.
- The VM can interpret these as strings (say in io:format("~s~n", [List])). When this happens (with the ~s flag specifically), the VM assumes the encoding is latin-1 (ISO-8859-1).
Lists of unicode codepoints ([0..1114111, ...])
- You may get these from files that are read as unicode and as a list.
- You can use them in output when you have a formatter such as io:format("~ts~n", [List]), where ~ts is like ~s but for unicode.
- Those lists represent codepoints as you see them in the unicode standard, without any encoding (they are not UTF-x).
- This can work in conjunction with latin-1 lists of characters, because unicode codepoints and latin-1 characters have the same numbers from 0 to 255.
Binaries (<<0..255, ...>>)
- This is what you get if you listen to, or read from, something under a binary format.
- The VM can be told to assume many things:
  1. They are sequences of bytes (0..255) without specific meaning (<<Bin/binary>>)
  2. They are UTF-8 encoded sequences (the utf8 segment type, one codepoint per segment, as in <<Char/utf8, Rest/binary>>)
  3. They are UTF-16 encoded sequences (the utf16 segment type)
  4. They are UTF-32 encoded sequences (the utf32 segment type)
- io:format("~s~n", [Bin]) will still assume the sequence is a latin-1 sequence; io:format("~ts~n", [Bin]) will assume UTF-8 only.
A mixed list of both unicode lists and UTF-encoded binaries (known as unicode:chardata(); the bytes-only equivalent is iodata()), used exclusively for output.
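
As a rough illustration of the forms above, here is the same character, 'Ö' (U+00D6), in each of them. The module name text_forms and the function demo/0 are made up for this sketch; the unicode module calls and the utf8 segment type are standard OTP. Note that decoding needs no knowledge of the language: the utf8 segment reads one whole codepoint, however many bytes it takes.

```erlang
-module(text_forms).
-export([demo/0]).

%% Returns the same character, 'Ö' (U+00D6), in several representations.
demo() ->
    Codepoints = [214],                     % list of unicode codepoints
    %% Encode the codepoints as a UTF-8 binary: two bytes for this character.
    Utf8Bin = unicode:characters_to_binary(Codepoints, unicode, utf8),
    Utf8Bytes = binary_to_list(Utf8Bin),    % the same bytes, as a list
    %% Encode the codepoints as latin-1: a single byte for this character.
    Latin1Bin = unicode:characters_to_binary(Codepoints, unicode, latin1),
    %% Decode one codepoint from the front of the UTF-8 binary.
    <<Char/utf8, Rest/binary>> = Utf8Bin,
    #{utf8_binary => Utf8Bin,
      utf8_bytes => Utf8Bytes,
      latin1_binary => Latin1Bin,
      decoded => Char,
      rest => Rest}.
```

The 195 and 150 from the question are exactly the two bytes of utf8_bytes here, so they only make sense when decoded together.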

So, in a gist:

Lists of bytes
Lists of latin-1 characters
Lists of unicode codepoints
Binary of bytes
UTF-8 binary
UTF-16 binary
UTF-32 binary
Lists of many of these, concatenated for output

Also note: until version 17.0, Erlang source files were latin-1 only. 17.0 added an option to have the compiler read your source file as unicode by adding this header:

%% -*- coding: utf-8 -*-

The next factor is that JSON, by specification, assumes UTF-8 as the encoding for everything it has. Furthermore, JSON libraries in Erlang tend to assume that a binary is a string, and that lists are JSON arrays.

This means that if you want your output to be adequate, you must use UTF-8 encoded binaries to represent any JSON string.
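
Applied to the payload from the question, that could look roughly like the sketch below. The module gcm_payload and its functions are hypothetical names for illustration, and only a few of the original fields are shown. It assumes that list inputs are lists of codepoints (a list of raw UTF-8 bytes would need list_to_binary/1 instead), and that mochijson2:encode/1 returns an iolist, as the question's use of lists:flatten/1 suggests.

```erlang
-module(gcm_payload).
-export([build/3, encode/1]).

%% Turn text into a UTF-8 binary so the JSON library sees a string,
%% not an array of integers.
%% Assumption: a binary is already UTF-8; a list is a list of codepoints.
to_utf8(Text) when is_binary(Text) -> Text;
to_utf8(Text) when is_list(Text) ->
    unicode:characters_to_binary(Text, unicode, utf8).

%% Build the term handed to the JSON encoder, with every text value
%% converted to a binary first.
build(RegId, Message, GroupName) ->
    {struct, [{registration_ids, [to_utf8(RegId)]},
              {data, [{message, to_utf8(Message)},
                      {groupname, to_utf8(GroupName)}]},
              {time_to_live, 2419200}]}.

%% Requires mochijson2 (part of mochiweb) on the code path.
encode(Payload) ->
    iolist_to_binary(mochijson2:encode(Payload)).
```

With binaries in place, "hi" should come out as a JSON string rather than as [104,105]. How non-ASCII characters are written (raw UTF-8 or \u escapes) depends on the encoder's options, so no output string is shown here.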

If what you have is:

A list of bytes that represents a UTF-8 encoded string: list_to_binary(List) gives the proper binary representation
A list of codepoints: use unicode:characters_to_binary(List, unicode, utf8) to get a UTF-8 encoded binary
A binary representing a latin-1 string: unicode:characters_to_binary(Bin, latin1, utf8)
A binary in another UTF encoding: unicode:characters_to_binary(Bin, utf16 | utf32, utf8)
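
The four cases above can be sketched as small wrappers. The module and function names are invented for this example; the calls inside them are the ones listed above. Keep in mind that unicode:characters_to_binary/3 returns an {error, ...} or {incomplete, ...} tuple instead of a binary when the input is not valid in the stated encoding, which this sketch does not handle.

```erlang
-module(to_utf8).
-export([from_bytes/1, from_codepoints/1, from_latin1/1, from_utf16/1]).

%% A list of bytes that is already UTF-8: just pack it into a binary.
from_bytes(Bytes) when is_list(Bytes) ->
    list_to_binary(Bytes).

%% A list of unicode codepoints: encode it as UTF-8.
from_codepoints(Chars) when is_list(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).

%% A latin-1 binary: re-encode it as UTF-8.
from_latin1(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(Bin, latin1, utf8).

%% A UTF-16 binary (big-endian, which is what the plain utf16 atom means).
from_utf16(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(Bin, utf16, utf8).
```

For 'Ö', all four should end up at the same two-byte binary, <<195,150>>.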

Take that UTF-8 binary, and send it to the JSON library. If the JSON library is correct and the client parses it properly, then it should come out right.
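
On the client side, the number of bytes to decode in one go does not depend on the language: in UTF-8 it is given by the first byte of each character. A real client would normally just use its platform's UTF-8 decoder on the whole byte sequence, but the rule itself can be sketched in Erlang as follows (utf8_walk, seq_len/1 and codepoints/1 are names made up for this example).

```erlang
-module(utf8_walk).
-export([seq_len/1, codepoints/1]).

%% Length in bytes of a UTF-8 sequence, judged from its first byte.
%% Continuation bytes (16#80..16#BF) and bytes that can never start
%% a valid sequence yield 'invalid'.
seq_len(B) when is_integer(B), B >= 0, B =< 16#7F -> 1;
seq_len(B) when is_integer(B), B >= 16#C2, B =< 16#DF -> 2;
seq_len(B) when is_integer(B), B >= 16#E0, B =< 16#EF -> 3;
seq_len(B) when is_integer(B), B >= 16#F0, B =< 16#F4 -> 4;
seq_len(_) -> invalid.

%% Decode a whole UTF-8 binary into a list of codepoints, one
%% character at a time. Crashes with function_clause on invalid input.
codepoints(<<>>) -> [];
codepoints(<<C/utf8, Rest/binary>>) -> [C | codepoints(Rest)].
```

Here seq_len(195) is 2, which is why 195 on its own decodes to gibberish: it has to be read together with the 150 that follows it.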
