xmlrpc, libxml and numeric entities

A few months ago, I wrote an RPC client library for my framework (comodojo.org). It supports the XML-RPC (with multicall) and JSON-RPC (v2) protocols, plus a non-standard encrypted transport mode.

Implementing the JSON client was quite easy, although its test cases are still incomplete. XML kept me awake for almost two nights…

PHP XML-RPC functions, and why not to use them

PHP's XML-RPC functions are the fastest way to encode/decode values to/from XML, but (as the disclaimer says) these functions are experimental and currently undocumented.

For my purposes, there were also other reasons not to use them:

  1. There is no support for ex:nil or nil values
  2. CDATA cannot be injected directly into the XML
  3. My implementation of binary/base64 content does not fit the functions' behaviour
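Just to show what those three value types look like on the wire, here is a minimal sketch that writes them by hand with PHP's XMLWriter extension. The `encodeValue()` helper and its `$type` switch are hypothetical names I made up for illustration, not the comodojo API, and the sketch emits the plain `<nil/>` element rather than the namespaced `ex:nil` extension.

```php
<?php
// Hypothetical helper (not the comodojo API): returns one XML-RPC <value>
// element for a nil, CDATA-wrapped string, base64 or plain string value.
function encodeValue(?string $value, string $type = 'string'): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startElement('value');

    if ($value === null) {
        // No content: XMLWriter writes the empty element <nil/>
        $w->writeElement('nil');
    } elseif ($type === 'cdata') {
        // Markup is kept verbatim inside <![CDATA[ ... ]]>
        // (sketch only: content must not contain the sequence "]]>")
        $w->startElement('string');
        $w->writeCdata($value);
        $w->endElement();
    } elseif ($type === 'base64') {
        // Binary payloads travel as base64 text
        $w->writeElement('base64', base64_encode($value));
    } else {
        // Plain text: XMLWriter escapes <, > and & on its own
        $w->writeElement('string', $value);
    }

    $w->endElement(); // </value>
    return $w->outputMemory();
}

echo encodeValue(null), PHP_EOL;                   // <value><nil/></value>
echo encodeValue('<b>bold</b>', 'cdata'), PHP_EOL; // <value><string><![CDATA[<b>bold</b>]]></string></value>
echo encodeValue('abc', 'base64'), PHP_EOL;        // <value><base64>YWJj</base64></value>
```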

Searching on Google, I could not find a simple library that could be integrated with my RPC client code. This was crucial, because I plan to write other plugins using this library. That is why I decided to write my own.

SimpleXML, libxml and numeric entities

I started writing my library using the SimpleXML PHP extension. It was quite easy to build and seemed to work on the first attempt. I also ran some tests comparing the native functions' output with mine: same result… nice!

While I was thinking about how to implement CDATA support, I decided to test my metaWeblog implementation (I use it as a basic WordPress client) with the new encoder. This step led me straight into the rabbit hole.

In fact, using plain text or even base64 values, I was able to publish or edit a post correctly, but HTML code injected via the JS editor was not handled correctly: WordPress treated it as a bunch of in-text HTML entities. Not good.

It took about two hours to realize that libxml always converts HTML entities into named ones, and that this behaviour is not fully compatible with most XML-RPC server implementations (including WordPress).

Moreover, there is no way (as far as I know) to force libxml to change or skip entity conversion.

I had to think of a different solution.

A working encoder using XMLWriter

A good snippet I found to convert HTML in an XML-safe way is here, but there is no way to integrate it with SimpleXML (the HTML was re-converted at each SimpleXMLElement::asXML call).

The solution was to rewrite my code using XMLWriter (which supports raw XML text) together with that snippet; the working code to encode a string correctly looks like:

where self::numericEntities() is a static method derived from the snippet mentioned above.
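The original snippet is not reproduced here, so the following is only a sketch of the same idea under my own assumptions: `numericEntities()` below is a hypothetical stand-in for `self::numericEntities()`, written as a plain function that turns markup characters and every non-ASCII character into decimal references, and `encodeString()` is likewise an illustrative name. The key call is `XMLWriter::writeRaw()`, which injects the text without escaping it again.

```php
<?php
// Hypothetical stand-in for self::numericEntities(): converts markup-significant
// characters and every non-ASCII character into decimal references (&#NNN;).
// Sketch only: control characters that XML 1.0 forbids are not filtered out.
function numericEntities(string $text): string
{
    $out = preg_replace_callback(
        '/[^\x20-\x7E\t\n\r]|[&<>"\']/u',
        static function (array $m): string {
            return '&#' . utf8CodePoint($m[0]) . ';';
        },
        $text
    );
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

// Decodes one UTF-8 encoded character into its Unicode code point
// (dependency-free, so mbstring/intl are not required).
function utf8CodePoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical encoder: writeRaw() injects the pre-encoded text as-is,
// so XMLWriter does not escape the '&' of each reference a second time.
function encodeString(string $text): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startElement('value');
    $w->startElement('string');
    $w->writeRaw(numericEntities($text));
    $w->endElement(); // </string>
    $w->endElement(); // </value>
    return $w->outputMemory();
}

echo encodeString('<p>Café & co</p>'), PHP_EOL;
```

Running it should print `<value><string>&#60;p&#62;Caf&#233; &#38; co&#60;/p&#62;</string></value>`. Using `writeRaw()` rather than `text()` is the whole point: `text()` would escape each `&` again, and the server would receive `&amp;#233;` instead of `é`.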

In practice, injecting pre-encoded text into the XML body protects it from the automatic numeric-to-named entity translation.
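To check that nothing is lost on the other side, the encoded payload can be parsed back: any conforming XML parser resolves numeric character references, so the receiving server sees the original characters. A small sketch with SimpleXML (the sample payload is made up for illustration):

```php
<?php
// Payload as produced by an encoder that emits numeric character references.
$payload = '<value><string>&#60;p&#62;Caf&#233; &#38; co&#60;/p&#62;</string></value>';

// Parsing resolves every &#NNN; back to the character it stands for.
$value = simplexml_load_string($payload);
$decoded = (string) $value->string;

echo $decoded, PHP_EOL; // <p>Café & co</p>
```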

It works quite well and is pretty fast (although not as fast as the native functions!); someday I will also write PHPUnit tests.

How to use the library

You can find my code on GitHub or install it as a Composer dependency:

Basic how-to instructions are in the readme.md file; the API documentation is available here.
