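Help needed: filtering text with Java
I'm new to text processing.
The file I need to process is a log of Amazon product records.
A single product record has the format shown below. The whole log contains several hundred thousand such records (the file is about 1 GB).
I want to read this file with Java and keep only each record's Id (e.g. 15) and its corresponding group (e.g. Book),
then write the Id (e.g. 15) and its group (Book) to a new file.
I don't know how to go about this. Any guidance would be much appreciated.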
Id: 15
ASIN: 1559362022
title: Wake Up and Smell the Coffee
group: Book
salesrank: 518927
similar: 5 1559360968 1559361247 1559360828 1559361018 0743214552
categories: 3
|Books[283155]|Subjects[1000]|Literature & Fiction[17]|Drama[2159]|United States[2160]
|Books[283155]|Subjects[1000]|Arts & Photography[1]|Performing Arts[521000]|Theater[2154]|General[2218]
|Books[283155]|Subjects[1000]|Literature & Fiction[17]|Authors, A-Z[70021]|( B )[70023]|Bogosian, Eric[70116]
reviews: total: 8 downloaded: 8 avg rating: 4
2002-5-13 cutomer: A2IGOA66Y6O8TQ rating: 5 votes: 3 helpful: 2
2002-6-17 cutomer: A2OIN4AUH84KNE rating: 5 votes: 2 helpful: 1
2003-1-2 cutomer: A2HN382JNT1CIU rating: 1 votes: 6 helpful: 1
2003-6-7 cutomer: A2FDJ79LDU4O18 rating: 4 votes: 1 helpful: 1
2003-6-27 cutomer: A39QMV9ZKRJXO5 rating: 4 votes: 1 helpful: 1
2004-2-17 cutomer: AUUVMSTQ1TXDI rating: 1 votes: 2 helpful: 0
2004-2-24 cutomer: A2C5K0QTLL9UAT rating: 5 votes: 2 helpful: 2
2004-10-13 cutomer: A5XYF0Z3UH4HB rating: 5 votes: 1 helpful: 1
[Solution]
Complete test code, for reference:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        File file = new File("c:\\Test.txt");
        File file2 = new File("c:\\demo.txt");
        if (!file.isFile()) {
            System.out.println("Cannot find the specified file!");
            return;
        }
        // Compile the pattern once, outside the loop.
        // It matches "Id: <value>" or "group: <value>", where the value is letters, digits or underscores.
        Pattern p = Pattern.compile("(Id|group): [\\d\\w]*");
        // try-with-resources closes both the reader and the writer, even if an error occurs.
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
             OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(file2))) {
            String lineTXT;
            // Read one line at a time so the 1 GB file is never loaded into memory as a whole.
            while ((lineTXT = bufferedReader.readLine()) != null) {
                Matcher m = p.matcher(lineTXT);
                while (m.find()) {
                    String tmp = m.group();
                    if (!"".equals(tmp)) {
                        writer.write(tmp + "\r\n");
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
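The code above writes "Id: 15" and "group: Book" on separate lines. Since the question asks for each Id together with its corresponding group, here is a rough sketch of one way to pair them on a single output line. It is only an illustration, not the original poster's method: the class name IdGroupPairs, the method copyPairs, the tab-separated output, the UTF-8 encoding and the command-line arguments are all assumptions. The pattern is matched against the whole line, so an "Id:" or "group:" that appears inside a title or review is ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class IdGroupPairs {
    // The whole line must be "Id: <value>" or "group: <value>" (leading/trailing spaces allowed).
    private static final Pattern FIELD = Pattern.compile("\\s*(Id|group):\\s*(.*?)\\s*");

    // Writes one "id<TAB>group" line per record and returns the number of lines written.
    static int copyPairs(Stream<String> lines, PrintWriter out) {
        String currentId = null; // Id seen most recently, still waiting for its group
        int written = 0;
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            Matcher m = FIELD.matcher(it.next());
            if (!m.matches()) {
                continue; // not an Id or group line
            }
            if ("Id".equals(m.group(1))) {
                currentId = m.group(2);
            } else if (currentId != null) {
                out.print(currentId + "\t" + m.group(2) + "\n");
                currentId = null; // wait for the next record's Id
                written++;
            }
        }
        out.flush();
        return written;
    }

    // Assumed usage: java IdGroupPairs <input file> <output file>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java IdGroupPairs <input> <output>");
            return;
        }
        // UTF-8 is an assumption; change the charset if the log uses another encoding.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8))) {
            System.out.println(copyPairs(in.lines(), out) + " records written");
        }
    }
}
```

For the sample record in the question this would write the single line "15", a tab, then "Book". A record that has an Id but no group line is simply skipped, because the next Id overwrites the pending one.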
String re = "(Id|group): [\\s\\d\\w]*";
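To see what the extra \\s changes, here is a small comparison of the two patterns. The input line "group: Video Games" is a made-up example of a group name containing a space (the sample record only shows "Book"), and the class name RegexCompare and the method firstMatch are hypothetical helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from this thread.
class RegexCompare {
    // Returns the first substring of line matched by regex, or null if there is no match.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "group: Video Games"; // assumed input, not taken from the sample record
        // Without \s the match stops at the first space in the value.
        System.out.println(firstMatch("(Id|group): [\\d\\w]*", line));    // prints "group: Video"
        // With \s the match continues across spaces inside the value.
        System.out.println(firstMatch("(Id|group): [\\s\\d\\w]*", line)); // prints "group: Video Games"
    }
}
```

Both patterns give the same result for "Id: 15" and "group: Book". Note that either one still stops at punctuation such as "&" or "-", so a value containing those characters would be cut short.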