A few questions about text processing; they are a bit challenging.
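Background: I have a text file in which almost every line starts with the string DATABASE, but a few lines do not. I want to merge each of those non-DATABASE lines into the preceding line that does start with DATABASE.
For example, given this text: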
DATABASE 10: 10 BANK OSD
DATABASE 10 10: COMPANY OSD
DATABASE 10 30/ BANKS OSDd
(a line containing only spaces)
DATABASE 40 10 BANK- pOSD
0 -osd
1 -dso
DATABASE 10 10 BANK OSD
absc
DATABASE 100 10 BANKs OSD
I want to turn it into:
DATABASE 10: 10 BANK OSD
DATABASE 10 10: COMPANY OSD
DATABASE 10 30/ BANKS OSDd (the whitespace-only line is removed)
DATABASE 40 10 BANK- pOSD 0 -osd 1 -dso
DATABASE 10 10 BANK OSD absc
DATABASE 100 10 BANKs OSD
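How would I write a program for this? Even just an outline of the approach would help.
[Solution]
If your requirements are not too strict, you could try: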
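use strict;
use warnings;

open(my $in,  '<', 'a.txt') or die "open error: $!";
open(my $out, '>', 'b.txt') or die "open error: $!";
my $lastline;    # the record currently being assembled
while (<$in>) {
    chomp;
    s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    next unless length;    # skip blank lines, but keep a line that is just "0"
    if (/^DATABASE/) {
        # a new record starts: flush the previous one first
        print $out "$lastline\n" if defined $lastline;
        $lastline = $_;
    } else {
        # continuation line: append it to the current record
        $lastline = defined $lastline ? "$lastline $_" : $_;
    }
}
# flush the final record
print $out "$lastline\n" if defined $lastline;
close($in);
close($out) or die "close error: $!";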
[Solution]
What happened to all the whitespace? Posting again:
def trans(openFilePath, saveFilePath):
    startStr = "DATABASE"
    delimiter = " "
    firstLine = True
    try:
        with open(openFilePath, "r") as of, open(saveFilePath, "w") as sf:
            for line in of:
                words = line.split()
                # ignore blank lines
                if not words:
                    continue
                saveStr = delimiter.join(words)
                if firstLine:
                    # nothing has been written yet, so no separator is needed
                    firstLine = False
                elif words[0] == startStr:
                    # a line whose first word is "DATABASE" begins a new output line
                    saveStr = "\n" + saveStr
                else:
                    # continuation line: append it to the current output line
                    saveStr = delimiter + saveStr
                sf.write(saveStr)
            # terminate the last output line
            if not firstLine:
                sf.write("\n")
        print("OK")
    except OSError as e:
        print("ERROR:", e)


if __name__ == "__main__":
    trans("a.txt", "b.txt")
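To check the merging logic against the example in the question without touching any files, it may help to separate it from the I/O. The following is only a minimal sketch under some assumptions: the names `merge_continuations`, `prefix` and `sample` are made up for illustration, a line counts as a new record when it starts with the prefix after trimming, and a continuation line that appears before any DATABASE line simply starts its own record.

```python
def merge_continuations(lines, prefix="DATABASE"):
    """Fold every line that does not start with `prefix` into the previous record."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop empty and whitespace-only lines
        if line.startswith(prefix) or not records:
            # a new record starts here (or this is the first non-blank line)
            records.append(line)
        else:
            # continuation: append to the most recent record
            records[-1] += " " + line
    return records


# The example input from the question (hypothetical variable name).
sample = [
    "DATABASE 10: 10 BANK OSD",
    "DATABASE 10 10: COMPANY OSD",
    "DATABASE 10 30/ BANKS OSDd",
    "   ",
    "DATABASE 40 10 BANK- pOSD",
    "0 -osd",
    "1 -dso",
    "DATABASE 10 10 BANK OSD",
    "absc",
    "DATABASE 100 10 BANKs OSD",
]

for record in merge_continuations(sample):
    print(record)
```

For the files in this thread, an open file handle could be passed as `lines`, with each returned record written out followed by a newline.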