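org.springframework.web.util.HtmlUtils: Escaping Special Characters
Escaping Special Characters
A Web application typically combines several languages, each with its own special characters. With dynamic or tag-based languages, a problem that comes up constantly when content is built dynamically is how to escape those special characters. The kinds of special characters that Web developers most often need to escape are:
- HTML special characters;
- JavaScript special characters;
- SQL special characters.
If these special characters are not escaped, they can not only break the document structure but also introduce security vulnerabilities. Spring provides utility classes for escaping HTML and JavaScript special characters: HtmlUtils and JavaScriptUtils, respectively.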
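Escaping HTML Special Characters
In HTML, characters such as <, > and & have special meaning. They are reserved characters of the language and cannot be used literally. To display them, use their escape sequences: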
- & : &amp;
- " : &quot;
- < : &lt;
- > : &gt;
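An HTML page is itself a structured text document, so writing content that contains HTML special characters directly into the page is very likely to break the structure of the whole document. Dynamic data should therefore normally be escaped, with HTML special characters represented by their escape sequences. The following JSP page writes a few variables into an HTML page dynamically:
Listing 1. A page that does not escape HTML special characters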
<%@ page language="java" contentType="text/html; charset=utf-8"%>
<%!
    String userName = "</td><tr></table>";
    String address = " \" type=\"button";
%>
<table border="1">
    <tr>
        <td>Name:</td><td><%=userName%></td> ①
    </tr>
    <tr>
        <td>Age:</td><td>28</td>
    </tr>
</table>
<input value="<%=address%>" type="text" /> ②
The page produces the following HTML:
<table border="1">
    <tr>
        <td>Name:</td><td></td><tr></table></td> ① breaks the structure of the <table>
    </tr>
    <tr>
        <td>Age:</td><td>28</td>
    </tr>
</table>
<input value=" " type="button" type="text" /> ② turns what should be a text input into a button
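To see how escaping neutralizes the two problems marked ① and ②, the sketch below escapes the same two values before building the markup. It deliberately avoids a Spring dependency so that it runs on its own: the class `Listing1EscapeSketch` and the helper `escapeBasic` are hypothetical stand-ins that cover only the four characters listed earlier, not the full HTML 4.01 entity table. In a Spring application one would presumably call HtmlUtils.htmlEscape instead.

```java
class Listing1EscapeSketch {

    // Hypothetical helper (not part of Spring): escapes only &, ", < and >.
    static String escapeBasic(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder escaped = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': escaped.append("&amp;"); break;
                case '"': escaped.append("&quot;"); break;
                case '<': escaped.append("&lt;"); break;
                case '>': escaped.append("&gt;"); break;
                default: escaped.append(c);
            }
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // The same values as in Listing 1.
        String userName = "</td><tr></table>";
        String address = " \" type=\"button";

        // The escaped values can no longer close tags or attributes.
        System.out.println("<td>" + escapeBasic(userName) + "</td>");
        System.out.println("<input value=\"" + escapeBasic(address) + "\" type=\"text\" />");
    }
}
```

Because every character is inspected exactly once, an & that belongs to a freshly written entity is never escaped a second time, which is an easy mistake to make with chained string replacements.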
package com.baobaotao.escape;

import org.springframework.web.util.HtmlUtils;

public class HtmpEscapeExample {
    public static void main(String[] args) {
        String specialStr = "<div id=\"testDiv\">test1;test2</div>";

        // ① Convert to HTML entity references
        String str1 = HtmlUtils.htmlEscape(specialStr);
        System.out.println(str1);

        // ② Convert to decimal numeric references
        String str2 = HtmlUtils.htmlEscapeDecimal(specialStr);
        System.out.println(str2);

        // ③ Convert to hexadecimal numeric references
        String str3 = HtmlUtils.htmlEscapeHex(specialStr);
        System.out.println(str3);

        // ④ Reverse the escaping of each result
        System.out.println(HtmlUtils.htmlUnescape(str1));
        System.out.println(HtmlUtils.htmlUnescape(str2));
        System.out.println(HtmlUtils.htmlUnescape(str3));
    }
}
str1: &lt;div id=&quot;testDiv&quot;&gt;test1;test2&lt;/div&gt;
str2: &#60;div id=&#34;testDiv&#34;&#62;test1;test2&#60;/div&#62;
str3: &#x3c;div id=&#x22;testDiv&#x22;&#x3e;test1;test2&#x3c;/div&#x3e;
<div id="testDiv">test1;test2</div>
<div id="testDiv">test1;test2</div>
<div id="testDiv">test1;test2</div>
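The decimal and hexadecimal forms returned by htmlEscapeDecimal and htmlEscapeHex are simply the character's numeric code written in base 10 or base 16 between a prefix and a semicolon. The sketch below reproduces that arithmetic in plain Java. The class and method names are hypothetical, and the literal prefixes "&#" and "&#x" are written out here as an assumption about what the corresponding Spring constants contain.

```java
class NumericReferenceSketch {

    // Decimal form: "&#" + character code in base 10 + ";"
    static String toDecimalReference(char c) {
        return "&#" + (int) c + ";";
    }

    // Hexadecimal form: "&#x" + character code in base 16 + ";"
    static String toHexReference(char c) {
        return "&#x" + Integer.toString(c, 16) + ";";
    }

    public static void main(String[] args) {
        // Print both numeric forms for the four basic special characters.
        for (char c : new char[] {'<', '>', '"', '&'}) {
            System.out.println(c + " -> " + toDecimalReference(c) + " / " + toHexReference(c));
        }
    }
}
```

For example, < has character code 60, which is 3c in hexadecimal, so it becomes &#60; or &#x3c;.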
/*
 * Copyright 2002-2008 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.springframework.web.util;

/**
 * Utility class for HTML escaping. Escapes and unescapes
 * based on the W3C HTML 4.01 recommendation, handling
 * character entity references.
 *
 * <p>Reference:
 * <a href="http://www.w3.org/TR/html4/charset.html">http://www.w3.org/TR/html4/charset.html</a>
 *
 * <p>For a comprehensive set of String escaping utilities,
 * consider Jakarta Commons Lang and its StringEscapeUtils class.
 * We are not using that class here to avoid a runtime dependency
 * on Commons Lang just for HTML escaping. Furthermore, Spring's
 * HTML escaping is more flexible and 100% HTML 4.0 compliant.
 *
 * @author Juergen Hoeller
 * @author Martin Kersten
 * @since 01.03.2003
 * @see org.apache.commons.lang.StringEscapeUtils
 */
public abstract class HtmlUtils {

    /**
     * Shared instance of pre-parsed HTML character entity references.
     */
    private static final HtmlCharacterEntityReferences characterEntityReferences =
            new HtmlCharacterEntityReferences();

    /**
     * Escapes HTML special characters to standard HTML entity references.
     * Turn special characters into HTML character references.
     * Handles complete character set defined in HTML 4.01 recommendation.
     * <p>Escapes all special characters to their corresponding
     * entity reference (e.g. <code>&lt;</code>).
     * <p>Reference:
     * <a href="http://www.w3.org/TR/html4/sgml/entities.html">
     * http://www.w3.org/TR/html4/sgml/entities.html
     * </a>
     * @param input the (unescaped) input string
     * @return the escaped string
     */
    public static String htmlEscape(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder escaped = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char character = input.charAt(i);
            String reference = characterEntityReferences.convertToReference(character);
            if (reference != null) {
                escaped.append(reference);
            }
            else {
                escaped.append(character);
            }
        }
        return escaped.toString();
    }

    /**
     * Escapes HTML special characters to decimal numeric references (the form with #).
     * Turn special characters into HTML character references.
     * Handles complete character set defined in HTML 4.01 recommendation.
     * <p>Escapes all special characters to their corresponding numeric
     * reference in decimal format (&amp;#<i>Decimal</i>;).
     * <p>Reference:
     * <a href="http://www.w3.org/TR/html4/sgml/entities.html">
     * http://www.w3.org/TR/html4/sgml/entities.html
     * </a>
     * @param input the (unescaped) input string
     * @return the escaped string
     */
    public static String htmlEscapeDecimal(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder escaped = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char character = input.charAt(i);
            if (characterEntityReferences.isMappedToReference(character)) {
                escaped.append(HtmlCharacterEntityReferences.DECIMAL_REFERENCE_START);
                escaped.append((int) character);
                escaped.append(HtmlCharacterEntityReferences.REFERENCE_END);
            }
            else {
                escaped.append(character);
            }
        }
        return escaped.toString();
    }

    /**
     * Escapes HTML special characters to hexadecimal numeric references (the form with #x).
     * Turn special characters into HTML character references.
     * Handles complete character set defined in HTML 4.01 recommendation.
     * <p>Escapes all special characters to their corresponding numeric
     * reference in hex format (&amp;#x<i>Hex</i>;).
     * <p>Reference:
     * <a href="http://www.w3.org/TR/html4/sgml/entities.html">
     * http://www.w3.org/TR/html4/sgml/entities.html
     * </a>
     * @param input the (unescaped) input string
     * @return the escaped string
     */
    public static String htmlEscapeHex(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder escaped = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char character = input.charAt(i);
            if (characterEntityReferences.isMappedToReference(character)) {
                escaped.append(HtmlCharacterEntityReferences.HEX_REFERENCE_START);
                escaped.append(Integer.toString((int) character, 16));
                escaped.append(HtmlCharacterEntityReferences.REFERENCE_END);
            }
            else {
                escaped.append(character);
            }
        }
        return escaped.toString();
    }

    /**
     * Turn HTML character references into their plain text UNICODE equivalent.
     * <p>Handles complete character set defined in HTML 4.01 recommendation
     * and all reference types (decimal, hex, and entity).
     * <p>Correctly converts the following formats:
     * <blockquote>
     * &amp;#<i>Entity</i>; - <i>(Example: &amp;amp;) case sensitive</i>
     * &amp;#<i>Decimal</i>; - <i>(Example: &amp;#68;)</i><br>
     * &amp;#x<i>Hex</i>; - <i>(Example: &amp;#xE5;) case insensitive</i><br>
     * </blockquote>
     * Gracefully handles malformed character references by copying original
     * characters as is when encountered.<p>
     * <p>Reference:
     * <a href="http://www.w3.org/TR/html4/sgml/entities.html">
     * http://www.w3.org/TR/html4/sgml/entities.html
     * </a>
     * @param input the (escaped) input string
     * @return the unescaped string
     */
    public static String htmlUnescape(String input) {
        if (input == null) {
            return null;
        }
        return new HtmlCharacterEntityDecoder(characterEntityReferences, input).decode();
    }

}
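The source above delegates unescaping to HtmlCharacterEntityDecoder, whose code is not shown here. As a rough illustration of the idea only, the following sketch decodes the decimal and hexadecimal numeric forms with a regular expression and copies anything it cannot decode unchanged. It is not Spring's implementation: the class `NumericUnescapeSketch` and the method `unescapeNumeric` are hypothetical, and named entities such as &amp; are intentionally left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericUnescapeSketch {

    // Matches &#NNN; (decimal, group 2) or &#xHHH; / &#XHHH; (hex, group 1).
    private static final Pattern NUMERIC_REFERENCE =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Hypothetical helper: decodes numeric references, leaves everything else as is.
    static String unescapeNumeric(String input) {
        if (input == null) {
            return null;
        }
        Matcher matcher = NUMERIC_REFERENCE.matcher(input);
        StringBuilder decoded = new StringBuilder(input.length());
        int last = 0;
        while (matcher.find()) {
            // Copy the plain text between the previous match and this one.
            decoded.append(input, last, matcher.start());
            String hex = matcher.group(1);
            try {
                int value = (hex != null)
                        ? Integer.parseInt(hex, 16)
                        : Integer.parseInt(matcher.group(2));
                decoded.appendCodePoint(value);
            }
            catch (IllegalArgumentException ex) {
                // Number too large or not a valid code point: keep the original text.
                decoded.append(matcher.group());
            }
            last = matcher.end();
        }
        decoded.append(input, last, input.length());
        return decoded.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeNumeric("&#60;div id=&#34;testDiv&#34;&#62;test1;test2&#60;/div&#62;"));
        System.out.println(unescapeNumeric("&#x3c;div id=&#x22;testDiv&#x22;&#x3e;test1;test2&#x3c;/div&#x3e;"));
    }
}
```

Copying undecodable references through unchanged mirrors the behavior described in the Javadoc of htmlUnescape, which states that malformed character references are copied as is.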