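How do I parse a .doc file? Is there a tool for it? I'm learning Java and still a beginner, any help from the experts is appreciated!!
I'm currently interning at a company and need to parse .doc files, but I don't know how to go about it. Help, please.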
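[Solution]
Use Apache POI to parse it. Below is code I wrote just the day before yesterday that gets the text content of a Word document.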
package se.analyzer.parser.impl;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;

import se.analyzer.parser.Parsable;
import se.analyzer.parser.Parser;

public class WordParser extends Parser implements Parsable {

    // Returns the plain text of a legacy .doc file, or an error message on failure.
    @Override
    public String getContext(File arg) {
        if (!arg.exists()) {
            return "file not exists";
        }
        // try-with-resources closes the stream even if extraction throws.
        try (FileInputStream in = new FileInputStream(arg)) {
            WordExtractor wordExtractor = new WordExtractor(in);
            return wordExtractor.getText();
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException.
            e.printStackTrace();
            return "failed to read file: " + e.getMessage();
        }
    }
}
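One thing to watch with the code above: WordExtractor in org.apache.poi.hwpf only reads the old binary .doc format, while .docx files are ZIP packages that POI handles through its separate XWPF classes (for example XWPFWordExtractor). A rough sketch of how you might check the first bytes of a file before choosing a parser is below. The class name WordFormatSniffer, the Container enum, and the detect methods are made up for illustration and are not part of POI. It only identifies the container: other Office files (.xls, .ppt) also use OLE2, and any ZIP archive will match the ZIP signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format from magic bytes.
final class WordFormatSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound-file signature, used by legacy binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local-file-header signature; .docx files are ZIP packages.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer holding the first bytes of a file.
    static Container detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;   // candidate for HWPF WordExtractor
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;    // candidate for the XWPF classes
        }
        return Container.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Container detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break;           // file is shorter than 8 bytes
                }
                total += n;
            }
        }
        return detect(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

If the result is OLE2, passing the file to WordParser.getContext is a reasonable next step. If it is ZIP, the HWPF extractor will reject it and the XWPF side of POI is the place to look.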
[Solution]
JACOB (the Java-COM bridge)
It's an open-source package.
Java itself has no built-in support for .doc files.