When Is XML Not XML?
By: Raymond Camden
Here is a mystery for folks. I’ve updated my parsing engine for coldfusionbloggers.org.
I’m using CFHTTP now so I can check things like the ETag header. I take the result text and save it to a file to be parsed by CFFEED.
But before I do that I check to ensure it’s valid XML. Here is where it gets weird. Charlie Griefer’s blog works with CFFEED directly, but isXML on the result returns false. And yet I can xmlParse the same string with no problem. Simple example:
<!--- Fetch the feed with CFHTTP --->
<cfset f = "http://cfblog.griefer.com/feeds/rss2-0.cfm?blogid=30">
<cfhttp url="#f#">
<cfset text = cfhttp.filecontent>
<cfif isXml(text)>
    yes
<cfelse>
    no
    <!--- isXml said no, yet xmlParse accepts the same string --->
    <cfset z = xmlParse(text)>
    <cfdump var="#z#">
</cfif>
If you run this, you will see “no” output, and then a dump of the XML object. If you use CFFEED on the URL directly, that works as well. So it seems like isXML is being strict about something. I can obviously update my code to wrap an xmlParse in a try/catch, but I’d rather figure out why the above is happening first.
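For reference, the try/catch fallback mentioned above might look something like the sketch below. The function name canParseXml is made up for illustration and is not code from coldfusionbloggers.org; it simply treats a string as XML whenever xmlParse accepts it.

```cfml
<!--- Hypothetical helper (not from the original code base):
      returns true if xmlParse can handle the string, false otherwise --->
<cffunction name="canParseXml" returntype="boolean" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var doc = "">
    <cftry>
        <!--- xmlParse throws an exception when the string is not well-formed XML --->
        <cfset doc = xmlParse(arguments.text)>
        <cfreturn true>
        <cfcatch type="any">
            <cfreturn false>
        </cfcatch>
    </cftry>
</cffunction>

<!--- Usage: same feed as the example above --->
<cfset f = "http://cfblog.griefer.com/feeds/rss2-0.cfm?blogid=30">
<cfhttp url="#f#">
<cfoutput>#canParseXml(cfhttp.filecontent)#</cfoutput>
```

The trade-off is that this does a full parse just to answer a yes/no question, which is more work than a validity check needs to be, but it at least agrees with what xmlParse and CFFEED will accept.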
Follow Up
Yesterday I wrote a post about an issue I found with isXML. Lots of good suggestions and ideas were posted in the comments, including one by Rick O that seems to have nailed down the issue. Basically, if you have an invalid tag within a CDATA block, ColdFusion will report the XML as being invalid. From what I found, the spec says that anything should be allowed in CDATA (short of the ]]> that closes it), so this looks like a bug in ColdFusion. Here is a simple sample:
<cfsavecontent variable="test">
<foo>
<![CDATA[
<b>fdoo</i
]]>
</foo>
</cfsavecontent>
<cfoutput>#isxml(test)#</cfoutput>
The bad closing i tag at the end of the CDATA block is enough to make ColdFusion’s isXML function return false.
Something to look out for. And of course, don’t forget the issues with xmlFormat as well. xmlFormat will ignore “high” characters (like funky Microsoft quotes), resulting in XML that won’t be valid. My toXML CFC has its own xmlFormat function that tries to get around this.
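A rough sketch of that kind of workaround is below. This is not the toXML CFC's actual code: the name safeXmlFormat is hypothetical, and treating every character above 127 as "high" is an assumption made for the example. It runs the built-in xmlFormat first, then turns any remaining high character into a numeric character reference.

```cfml
<!--- Hypothetical helper, not the toXML CFC's implementation --->
<cffunction name="safeXmlFormat" returntype="string" output="false">
    <cfargument name="str" type="string" required="true">
    <!--- Let the built-in function escape &, <, > and quotes first --->
    <cfset var escaped = xmlFormat(arguments.str)>
    <cfset var result = "">
    <cfset var c = "">
    <cfset var i = 0>
    <cfloop from="1" to="#len(escaped)#" index="i">
        <cfset c = mid(escaped, i, 1)>
        <cfif asc(c) gt 127>
            <!--- Assumption: "high" means anything outside 7-bit ASCII.
                  chr(35) is the pound sign, which avoids escaping it in a string.
                  Characters stored as surrogate pairs are not handled by this sketch. --->
            <cfset result = result & "&" & chr(35) & asc(c) & ";">
        <cfelse>
            <cfset result = result & c>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>

<!--- Usage: a string wrapped in curly "smart" quotes (code points 8220 and 8221) --->
<cfset sample = chr(8220) & "quoted" & chr(8221)>
<cfoutput>#safeXmlFormat(sample)#</cfoutput>
```

Building the string one character at a time is slow for large documents, so anything beyond short feed entries would probably want a regular expression or a Java StringBuilder instead.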


