Coldfusion web service consumed as WSDL chokes on control characters
Clients upload files in .doc format to a server directory, and the text within them is extracted using POI as per Ray Camdens posting here The content is saved in a text/memo field in a MySQL database and is made available as a web service which is consumed as wsdl. All works as intended, until consumers of the web service access records containing certain (I presume) control characters, at which point the web service throws a 500 error.
In the database the problem rows seem to have control characters, and when the text field is displayed in Firefox there's odd characters too.
The web service simply returns a CF query of returntype = "any" and is called as<cfinvoke webservice="https://nww.someplace.nhs.uk/cfcs/providerapi.cfc?wsdl" 开发者_JAVA技巧
method="getPendingReferrals" returnvariable="getReferrals">
<cfinvokeargument name="userName" value=#username#>
<cfinvokeargument name="password" value=#password#>
<cfinvokeargument name="maxrows" value=#maxrows#>
</cfinvoke>
I presume the WSDL cannot transmit these characters, so is there a way to encode them, or do I just have to strip them out using a regex or something?
<cfcomponent>
<cffunction output="false" access="remote" returntype="any" name="getPendingReferrals">
<cfargument required="false" name="userName" type="string"/>
<cfargument required="false" name="password" type="string"/>
<cfargument required="false" name="maxrows" type="numeric" default="20"/>
<cfset var q="">
<cfinvoke component="cfcs.security" method="checkAuthenticated" returnvariable="checkAuth">
<cfinvokeargument name="username" value="#arguments.userName#">
<cfinvokeargument name="password" value="#arguments.password#">
</cfinvoke>
<cfif checkAuth.authenticates is "true">
<!--- log the login --->
<cfset filename=#datepart("yyyy", now())#&#datepart("m", now())#&#datepart("d", now())#&"loginlog.txt">
<CFSET OUTFILE = "#application.Root#"&"logs\"&"#filename#">
<cfif #FileExists(OUTFILE)# is "Yes">
<cffile action="append" file="#OUTFILE#" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#">
<cfelse>
<CFFILE action="write" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#" file="#OUTFILE#">
</cfif>
<cfif checkAuth.organisationID is 1>
<cfset toStr="toID=1">
<cfelseif checkAuth.organisationID is 28>
<cfset toStr="(toID=28 OR toID=29)">
</cfif>
<cfquery name="q" datasource='mySqlData' maxrows=#arguments.maxrows#>
SELECT messages.messageID, messages.toID, messages.fromID AS referrerID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.fromID) as referrerName,messages.threadID, messages.messageBody, messages.dateCreated, messages.dateSent,
messages.deleted, messages.createdByID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.createdByID) as createdByName, (SELECT organisationName FROM organisations WHERE messages.originatingOrganisationID = organisations.organisationID) as originatingOrganisationName, messages.originatingOrganisationID, messages.viewed, messages.referral, messages.actioned, messages.patientID, messages.refTypeID, messages.specialtyID, organisations.organisationName AS toOrganisationName, patients.nhsNumber AS patientNHSnumber, patients.patientTitle, patients.patientLastname, patients.patientFirstname, patients.patientDOB, patients.address1 as patientAddress1, patients.address2 AS patientAddress2, patients.address3 AS patientAddress3, patients.address4 AS patientAddress4, patients.postcode AS patientPostcode, patients.patientPhone1
FROM users INNER JOIN (organisations INNER JOIN (patients INNER JOIN messages ON patients.patientID = messages.patientID) ON organisations.organisationID = messages.toID) ON users.userID = messages.fromID
WHERE #toStr#
AND NOT actioned
AND NOT originatingOrganisationID=3
ORDER BY messageID
</cfquery>
<cfif isQuery(q)>
<cfreturn q>
<cfelse>
<cfreturn "Error : in query">
</cfif>
<cfelse>
<cfreturn "Error : failed to authenticate">
</cfif>
</cffunction>
You should probably strip out all high-ascii characters using a regular expression. One of the best I've found was written up by Ben Nadel, here. (Though it wasn't perfect, and I made some improvements to it in the comments.)
Basically, if you just want to strip the high-ascii characters out, do this:
<cfset result = reReplace(messageBody, "[^\x20-\x7E\x0D\x09]", "", "all") />
This regular expression takes a white-list approach, allowing only printable characters to remain:
\x20-\x7E
= {space} ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? @ A-Z [ \ ] ^ _ ` a-z { | } ~\x0D
= carriage return\x09
= horizontal tab
If you like this approach to sanitization, you could use Sean Coyne's method for updating the query with a loop:
<cfloop query="q">
<cfset querySetCell(
q,
"messageBody",
clean(q.messageBody[q.currentRow]),
q.currentRow
)/>
</cfloop>
<cffunction name="clean">
<cfargument name="in" />
<cfreturn reReplace(arguments.in, "[^\x20-\x7E\x0D\x09]", "", "all") />
</cffunction>
It is not ideal, but you could try something like:
<cfloop query="q">
<cfset querySetCell(q,"messageBody",xmlFormat(q.messageBody[q.currentRow]),q.currentRow) />
</cfloop>
If xmlFormat fails to remove all the characters (it has been known to miss a few), you may need to write a manual method to strip them out.
As Sean is mentioning you will need to escape all kind of special characters to get valid XML - have a look at http://www.petefreitag.com/item/202.cfm for example
精彩评论