Question about dom4j and UTF-8 encoding
I build an XML string and parse it as shown below, but the parse fails with an error:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class TestDom4j {
    public static void main(String[] args) {
        String sXml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ROOT><TABLE>哈哈哈哈</TABLE></ROOT>";
        SAXReader saxReader = new SAXReader();
        InputStream inputStream = new ByteArrayInputStream(sXml.getBytes());
        try {
            Document document = saxReader.read(inputStream);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}
The error is:
org.dom4j.DocumentException: Error on line 1 of document : Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.
    at org.dom4j.io.SAXReader.read(SAXReader.java:482)
    at org.dom4j.io.SAXReader.read(SAXReader.java:343)
    at TestDom4j.main(TestDom4j.java:25)
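If I change the encoding in the XML declaration to gb2312, the error goes away. However, my program has to support multiple languages, so I need UTF-8. What should I do?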
[Solution]
public static void main(String[] args) {
    String sXml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ROOT><TABLE>哈哈哈哈</TABLE></ROOT>";
    try {
        // Parse the String directly, so no String-to-byte conversion is involved.
        Document document = DocumentHelper.parseText(sXml);
    } catch (DocumentException e) {
        e.printStackTrace();
    }
}
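This works. In the original code, sXml.getBytes() with no argument encodes the string using the platform default charset. Since switching the declaration to gb2312 made the error disappear, the default on your machine is most likely a GB-family charset, so the bytes did not match the declared utf-8. DocumentHelper.parseText reads the String as characters, so that mismatch cannot occur.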
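If the XML really has to go through an InputStream, as in the question, another option is to encode the bytes explicitly as UTF-8 so that they agree with the encoding="utf-8" declaration. Below is a minimal sketch of that idea. To keep it runnable without extra jars it parses with the JDK's built-in javax.xml.parsers parser instead of dom4j; the class name Utf8XmlBytes and the helper methods toUtf8Stream and rootText are made up for illustration. With dom4j, the stream returned by toUtf8Stream should be usable with saxReader.read(inputStream) exactly as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of dom4j or the original post.
public class Utf8XmlBytes {

    // Encode the XML text as UTF-8 explicitly, instead of relying on the
    // platform default charset that the no-argument getBytes() would use.
    public static InputStream toUtf8Stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Parse the stream with the JDK's built-in parser (an assumption made so
    // the sketch has no third-party dependency) and return the text content
    // of the root element.
    public static String rootText(InputStream in) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // "\u54c8" is the character used in the question; it is written as a
        // Unicode escape so the source file encoding does not matter.
        String sXml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<ROOT><TABLE>\u54c8\u54c8\u54c8\u54c8</TABLE></ROOT>";
        System.out.println(rootText(toUtf8Stream(sXml)));
    }
}
```

The only change that matters compared with the question is getBytes(StandardCharsets.UTF_8). The declaration in the XML and the charset used to produce the bytes have to name the same encoding; which parser then reads the stream is secondary.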