Input Sanitization: Invalid XML Data, Validation
The article "INPUT SANITIZATION : INVALID XML DATA, VALIDATION" by Gaurav Thakur discusses how to handle invalid XML characters in input data, particularly when converting JSON to XML.
Key points:
- Problem: Data that is valid in JSON may contain characters that are invalid in XML, so it must be sanitized before conversion.
- Solution (initial): Use a precompiled regex pattern to identify characters that are not allowed in XML 1.0:
  private static final String xml10pattern = "[^" + "\u0009\r\n" + "\u0020-\uD7FF" + "\uE000-\uFFFD" + "\ud800\udc00-\udbff\udfff" + "]";
- Validation method: A Java method, hasInValidXmlCharacterData, uses the regex to check whether an input string contains invalid XML characters. It returns the invalid character found, or null if the input is valid.
- Curl request issue: The author found that validation failed for data sent via curl requests, even though it worked when the same check was run from a main program or JUnit tests.
- Resolution: The issue was resolved by unescaping the characters in the input payload before validation, using the StringEscapeUtils.unescapeJava method from Apache Commons Lang. The updated validation method includes this unescaping step.
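The flow described in these points can be sketched as follows, assuming the regex is compiled once into a java.util.regex.Pattern. The method name hasInValidXmlCharacterData comes from the article, but its body here is a reconstruction from the summary, not the author's code. The helpers unescapeUnicodeEscapes and validateUnescaped are hypothetical: unescapeUnicodeEscapes is a deliberately reduced stand-in for StringEscapeUtils.unescapeJava that decodes only backslash-u escapes, so the example runs without the Apache Commons Lang dependency. One plausible explanation for the curl behaviour, which the summary does not spell out: in Java source, an escape such as \u0001 is turned into the actual control character at compile time, whereas a payload typed into a shell can reach the server as six literal characters, all of which are legal in XML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlCharValidator {

    // Negated character class: anything outside the XML 1.0 Char production matches.
    // Allowed: tab, CR, LF, U+0020..U+D7FF, U+E000..U+FFFD, and U+10000..U+10FFFF
    // (the last range is written with surrogate pairs as its endpoints).
    private static final String xml10pattern = "[^"
            + "\u0009\r\n"
            + "\u0020-\uD7FF"
            + "\uE000-\uFFFD"
            + "\ud800\udc00-\udbff\udfff"
            + "]";

    // Compiled once; Pattern instances are immutable and safe to share across threads.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(xml10pattern);

    /**
     * Returns the first character XML 1.0 does not allow, or null if the input is clean.
     * Treating a null input as valid is an assumption of this sketch.
     */
    public static String hasInValidXmlCharacterData(String input) {
        if (input == null) {
            return null;
        }
        Matcher matcher = INVALID_XML_CHARS.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    /**
     * Hypothetical, simplified stand-in for StringEscapeUtils.unescapeJava: it decodes
     * only a backslash followed by 'u' and four hex digits, and copies everything else as-is.
     */
    static String unescapeUnicodeEscapes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            boolean isEscape = input.charAt(i) == '\\'
                    && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHex4(input, i + 2);
            if (isEscape) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6; // skip the backslash, the 'u' and the four hex digits
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if ("0123456789abcdefABCDEF".indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Unescape first, then validate: the order described for the article's updated method. */
    public static String validateUnescaped(String rawPayload) {
        return hasInValidXmlCharacterData(unescapeUnicodeEscapes(rawPayload));
    }

    public static void main(String[] args) {
        String fromJavaSource = "abc" + (char) 0x01;  // really contains U+0001
        String fromCurl = "abc\\u0001";               // six literal characters after "abc"

        System.out.println(hasInValidXmlCharacterData(fromJavaSource) != null); // true
        System.out.println(hasInValidXmlCharacterData(fromCurl) != null);       // false
        System.out.println(validateUnescaped(fromCurl) != null);                // true
    }
}
```

In a real project, swapping unescapeUnicodeEscapes for StringEscapeUtils.unescapeJava (org.apache.commons.lang3, or org.apache.commons.text in newer code, where the lang3 class is deprecated) would also decode escapes such as \n and \t, and would handle an escaped backslash correctly, which this reduced helper does not.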