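Experts, please help! The file I need to process is fairly large, and the code below fails with an out-of-memory error. For now I only want to read a small part at the beginning of the file. How should I change the code?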
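import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
public class Html2Text {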
protected static final String lineSign = System.getProperty("line.separator");
protected static final int lineSign_size = lineSign.length();
/**
* @param args
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
getpage("D:/2.txt","D:/test.txt");
}
public static void getpage(String readFile,String writeFile) throws IOException{
extractText(readFile,writeFile);
}
/**
* Extracts the text content of an HTML file and writes it to a txt file
*/
public static void extractText(String readFile,String writeFile) throws IOException {
StringBuilder sb = new StringBuilder();
FileReader fr = new FileReader(readFile);
BufferedReader br = new BufferedReader(fr);
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(writeFile), "gbk"));
String line;
while ( (line=br.readLine()) !=null ) {
sb.append(line);
}
String s = sb.toString();
// String s="";
int index = s.indexOf("<p style=\"TEXT-INDENT: 2em\">");
s = s.substring(index+1);
String textOnly = Jsoup.parse(s).text();
Matcher m = Pattern.compile("\\[(.*?)\\]").matcher(textOnly);
textOnly = m.replaceAll("");
textOnly = textOnly.replaceAll("//?", "");
bw.write(textOnly);
bw.flush();
bw.close();
fr.close();
br.close();
}
}
Out of memory? html string stringbuilder class
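[Solution]
Honestly, I am really surprised
that reading an HTML file could overflow memory.
You can use RandomAccessFile to jump to a given position and start reading from there.
If you use RandomAccessFile, you can call seek to move to a specific position in the file.
The position is measured in bytes from the start of the file, not in characters.
To work with line numbers, try java.io.LineNumberReader, which counts lines as it reads them (java.io.LineNumberInputStream also exists but is deprecated).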
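As a rough illustration of the LineNumberReader suggestion, the sketch below stops after a fixed number of lines instead of loading the whole file. The class name `FirstLines`, the method `readFirstLines`, and the `maxLines` parameter are made up for this example and are not part of the original code.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FirstLines {
    // Hypothetical helper: returns at most maxLines lines from the reader.
    public static List<String> readFirstLines(Reader source, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            // getLineNumber() starts at 0 and grows by one for each line read.
            while (reader.getLineNumber() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the real file here.
        Reader demo = new StringReader("a\nb\nc\nd\n");
        System.out.println(readFirstLines(demo, 2)); // prints [a, b]
    }
}
```

For a real file, a FileReader (or an InputStreamReader with an explicit charset) could be passed in place of the StringReader. Note that LineNumberReader does not jump to a line; it still reads sequentially and only keeps count.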
Example: RandomAccessFile
package org.kodejava.example.io;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
public class RandomAccessFileExample {
public static void main(String[] args) {
try (RandomAccessFile raf = new RandomAccessFile("books.dat", "rw")) {
//
// Open the file in "rw" mode so that it can be both read and written.
// RandomAccessFile has no write-only mode: to write, you must open the
// file for reading as well. The try-with-resources statement closes
// the file when the block exits.
//
//
// Write some book titles, starting at the current file pointer
// (offset 0 right after the file is opened)
//
String books[] = new String[5];
books[0] = "Professional JSP";
books[1] = "The Java Application Programming Interface";
books[2] = "Java Security";
books[3] = "Java Security Handbook";
books[4] = "Hacking Exposed J2EE & Java";
for (int i = 0; i < books.length; i++) {
raf.writeUTF(books[i]);
}
//
// Append one more title at the end of the file.
//
raf.seek(raf.length());
raf.writeUTF("Servlet & JSP Programming");
//
// Move the file pointer to the beginning of the file
//
raf.seek(0);
//
// While the file pointer is less than the file length, read the
// next strings of data file from the current position of the
// file pointer.
//
while (raf.getFilePointer() < raf.length()) {
System.out.println(raf.readUTF());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
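For the original question (reading only the beginning of a large file), one possible approach is to stop after a fixed number of characters rather than appending every line to a StringBuilder. This is a minimal sketch, not the asker's code: the class `ReadPrefix`, the method `readPrefix`, and the one-million-character limit are assumptions chosen for illustration, and the default charset is used only because the original FileReader did the same.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ReadPrefix {
    // Hypothetical helper: reads up to maxChars characters from the start of the file.
    public static String readPrefix(String path, Charset charset, int maxChars) throws IOException {
        char[] buffer = new char[maxChars];
        int filled = 0;
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            // read() may return fewer characters than requested, so loop
            // until the buffer is full or the end of the file is reached.
            while (filled < maxChars) {
                int n = reader.read(buffer, filled, maxChars - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
        }
        return new String(buffer, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        // D:/2.txt is the input path from the question; the limit is an arbitrary example value.
        String head = readPrefix("D:/2.txt", Charset.defaultCharset(), 1_000_000);
        System.out.println("Read " + head.length() + " characters");
    }
}
```

The returned string could then be passed to the existing indexOf and Jsoup.parse steps in place of sb.toString(). The cut can land in the middle of a tag or sentence, so the last fragment of the extracted text may be incomplete.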