Converting ISO-8859-1 to UTF-8 for MultipartFormData in Play2 + Scala when parsing email from SendGrid
I have hooked up a Play2 + Scala application to the SendGrid Parse API, and I'm struggling with decoding and encoding the content of the email.
Since emails can be in different encodings, SendGrid provides a JSON object, charsets:
{"to":"utf-8","cc":"utf-8","subject":"utf-8","from":"utf-8","text":"iso-8859-1","html":"iso-8859-1"}
In my test case, "text" is:
"med vänliga hälsningar jakobs webshop"
If I extract it from the multipart request and print it out:
    Logger.info(request.body.dataParts.get("text").get.head)
I get:
med v?nliga h?lsningar jakobs webshop
OK, so given the info from SendGrid, let's fix the string so that it is UTF-8.
    def parseMail = Action(parse.multipartFormData) { request =>
      val inputBuffer = request.body.dataParts.get("text").map { v =>
        ByteBuffer.wrap(v.head.getBytes())
      }
      val fromCharset = Charset.forName("ISO-8859-1")
      val toCharset = Charset.forName("UTF-8")
      val data = fromCharset.decode(inputBuffer.get)
      Logger.info("" + data)
      val outputBuffer = toCharset.encode(data)
      val text = new String(outputBuffer.array())
      // save stuff to the MongoDB instance
    }
This results in:
med v�nliga h�lsningar jakobs webshop
So this is strange. It should work. I wonder what happens in the body parser parse.multipartFormData, and in particular in the data part handler:
    def handleDataPart: PartHandler[Part] = {
      case headers @ PartInfoMatcher(partName) if !FileInfoMatcher.unapply(headers).isDefined =>
        Traversable.takeUpTo[Array[Byte]](DEFAULT_MAX_TEXT_LENGTH)
          .transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
          .flatMap { data =>
            Cont({
              case Input.El(_) => Done(MaxDataPartSizeExceeded(partName), Input.Empty)
              case in => Done(data, in)
            })
          }(play.core.Execution.internalContext)
    }
When the data is consumed, a new String is created with the encoding UTF-8:
    .transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
Does this mean that my ISO-8859-1 encoded text is decoded as UTF-8 when it is parsed? If so, how should I create my parser so that it decodes and encodes my params according to the provided JSON object charsets? I'm doing something wrong, but I can't figure it out!
You'll need to copy the implementation of the parse.multipartFormData function, changing the decoding from UTF-8 to ISO-8859-1, and then use that in your action.
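The decoding step that such a copied parser would perform in place of new String(bytes, "utf-8") could look roughly like the sketch below. It is plain JVM code, not Play API: the PartDecoding object, the decodePart helper and the hard-coded charsets map are hypothetical names that mirror the charsets JSON from the question, and falling back to UTF-8 is an assumption.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object PartDecoding {
  // Hypothetical lookup table mirroring the "charsets" JSON object from the question.
  val charsets: Map[String, String] = Map(
    "to" -> "utf-8", "cc" -> "utf-8", "subject" -> "utf-8", "from" -> "utf-8",
    "text" -> "iso-8859-1", "html" -> "iso-8859-1"
  )

  // Decode the raw bytes of one part with the charset declared for it.
  // Unknown part names and unsupported charset names fall back to UTF-8 (an assumption).
  def decodePart(partName: String, bytes: Array[Byte]): String = {
    val charset = charsets.get(partName)
      .flatMap(name => Try(Charset.forName(name)).toOption)
      .getOrElse(StandardCharsets.UTF_8)
    new String(bytes, charset)
  }
}
```

Since the charsets part may arrive after the text part in the request, a custom handler would probably have to keep each part as an Array[Byte] and only call something like decodePart once the whole body has been read.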
The problem is that Play decodes as UTF-8 by default, and there is no way to change that, other than implementing your own parser.
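To see why converting the String afterwards cannot help, here is a small stand-alone sketch (the LossyDecodeDemo object and its value names are made up for illustration). It assumes the sender wrote ä as the single ISO-8859-1 byte 0xE4, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder replaces it with U+FFFD and the original byte is gone before the action ever sees the String.

```scala
import java.nio.charset.StandardCharsets

object LossyDecodeDemo {
  val original = "v\u00e4nliga" // "vänliga", written with an escape to avoid source-encoding issues

  // Bytes as an ISO-8859-1 sender would put them on the wire: ä becomes the single byte 0xE4.
  val wireBytes: Array[Byte] = original.getBytes(StandardCharsets.ISO_8859_1)

  // What a UTF-8-only parser produces: the malformed byte is replaced by U+FFFD.
  val misdecoded: String = new String(wireBytes, StandardCharsets.UTF_8)

  // Trying to repair the String afterwards: the 0xE4 byte is no longer in it,
  // so re-encoding and decoding as ISO-8859-1 cannot bring the ä back.
  val repaired: String =
    new String(misdecoded.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)

  // Decoding the untouched bytes with the declared charset recovers the text.
  val correct: String = new String(wireBytes, StandardCharsets.ISO_8859_1)
}
```

Note also that getBytes() without an argument, as used in the question, encodes with the platform default charset, so the result of that attempt can differ from machine to machine.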