How to remove certain newlines from a file

I have a file containing about 70,000 records, structured roughly like this:

01499     1000642   4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content
-Content
-Content
+Fieldname3
-Content
-Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content

01473     1000642   4520101000900000
...more numbers...

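EDIT 1: Each record starts with a column of numbers and ends with a blank line. Before this blank line, most records have +Fieldname5 followed by one or more -Content lines.

What I want to do is merge every multi-line entry into a single line, replacing the leading minus sign of each continuation line with a space, except for the lines that belong to the last field (in this case Fieldname5).

It should look like this: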

01499     1000642   4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content Content Content
+Fieldname3
-Content Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content

01473     1000642   4520101000900000
...more numbers...

What I have right now is this (adapted from this answer):

use strict;
use warnings;

our $input = "export.txt";
our $output = "export2.txt";

open our $in, "<$input" or die "$!\n"; 
open our $out, ">$output" or die "$!\n"; 

my $this_line = "";
my $new = "";

while(<$in>) {
    my $last_line = $this_line;
    $this_line = $_;

    # if both $last_line and $this_line start with a "-" do the following:
    if ($last_line =~ /^-.+/ && $this_line =~ /^-.+/) {

        #remove \n from $last_line
        chomp $last_line;

        #remove leading "-" from $this_line
        $this_line =~ s/^-//;

        #join both lines and print them to the file
        $new = join(' ', $last_line,$this_line);
        print $out $new;
    } else {
        print $out $last_line;
    }
}
close ($in);
close ($out);

But this has two problems:

  • It correctly prints out the joined line but then still prints out the second line e.g.

    +Fieldname2 -Content Content Content -Content

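So how can I make the script output only the joined line?

  • It only works on two lines at a time, while some multi-line entries run to as many as forty lines.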

EDIT 2: My question is thus how to do the following:

  1. Read in a file line by line and write it to an output file
  2. When a multi-line section appears, read and process it in one go, replacing \n- by a space, except if it belongs to a given fieldname (e.g. Fieldname5).
  3. Return to reading and writing each line again until another multi-line block appears
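
One way to read step 2 literally is to handle a whole record as a single string and substitute inside it. The following is only a minimal sketch under that reading: join_record is a hypothetical helper name, and it assumes that every field starts with a "+" line and that only content lines start with "-".

```perl
use strict;
use warnings;

# join_record() is a hypothetical helper, not part of the original script.
# It takes one whole record as a string and returns it with every run of
# consecutive "-" lines merged into one line, except under the field $keep.
sub join_record {
    my ($record, $keep) = @_;

    # Cut the record in front of every line that starts with "+", so each
    # chunk is either the leading number lines or one field plus its content.
    my @chunks = split /^(?=\+)/m, $record;

    for my $chunk (@chunks) {
        # Leave the exempt field exactly as it is.
        next if $chunk =~ /\A\+\Q$keep\E(?:\s|\z)/;

        # Merge a "-" line with the "-" line that follows it, replacing
        # "\n-" by a space; repeat until no such pair is left.
        1 while $chunk =~ s/^(-[^\n]*)\n-/$1 /m;
    }
    return join '', @chunks;
}

# Small made-up record in the shape shown in the question.
my $record = <<'END';
104000900169
+Fieldname2
-Content
-Content
-Content
+Fieldname5
-Content
-Content

END

print join_record($record, 'Fieldname5');
```

To run this over the real file, the record separator could be set with local $/ = ""; (Perl's paragraph mode, in which each read returns one blank-line-delimited record), printing join_record($_, 'Fieldname5') for each record read from $in.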

EDIT 3: It worked! I just added another conditional at the beginning:

use strict;
use warnings;

our $input = "export.txt";
our $output = "export2.txt";

open our $in, "<$input" or die "Cannot find '$input': $!\n";
open our $out, ">$output" or die "Cannot create '$output': $!\n";


my $insideMultiline = 0;
my $multilineBuffer = "";
my $exception = 0;                  # variable indicating whether the current multiline-block is a "special" or not

LINE:
while (<$in>) {
    if (/^\+Fieldname5/) {          # if line starts with +Fieldname5, set $exception to "1"
        $exception = 1;
    } 
    elsif (/^\s/) {                 # if line starts with whitespace (this includes the blank line that ends a record), set $exception to "0"
        $exception = 0;
    }
    if ($exception == 0 && /^-/) {  # if $exception is "0" AND the line starts with "-", do the following
        chomp;
        if ($insideMultiline) {
            s/^-/ /;
            $multilineBuffer .= $_;
        }
        else {
            $insideMultiline = 1;
            $multilineBuffer = $_;
        }
        next LINE;
    }
    else {
        if ($insideMultiline) {
            print $out "$multilineBuffer\n";
            $insideMultiline = 0;
            $multilineBuffer = "";
        }
        print $out $_;
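    }
}

# flush a multi-line block that runs to the end of the file
print $out "$multilineBuffer\n" if $insideMultiline;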

close ($in);
close ($out);

Best answer

Assuming that the only lines starting with "-" are the ones in these multi-line sections, you could do something like this...

# Open $in and $out as in your original code...

my $insideMultiline = 0;
my $multilineBuffer = "";

LINE:
while (<$in>) {
    if (/^-/) {
        chomp;
        if ($insideMultiline) {
            s/^-/ /;
            $multilineBuffer .= $_;
        }
        else {
            $insideMultiline = 1;
            $multilineBuffer = $_;
        }
        next LINE;
    }
    else {
        if ($insideMultiline) {
            print $out "$multilineBuffer\n";
            $insideMultiline = 0;
            $multilineBuffer = "";
        }
        print $out $_;
    }
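}

# Flush a multi-line block that runs to the end of the file.
print $out "$multilineBuffer\n" if $insideMultiline;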

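As for the embedded sub-question ("except for those belonging to the last field"), I would need more details about the file format to handle that. It looks like a blank line separates each set of fields and content from the next, but that is not 100% clear from the description. However, the code above should handle the requirements you listed at the bottom.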
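
If the format is as EDIT 1 describes it (a "+" line opens a field and a blank line ends the record), the exemption could be handled by remembering which field the current line belongs to. This is a rough sketch on that assumption rather than a tested solution for the real file; join_lines and its $keep parameter are hypothetical names, and it works on a list of lines so it can be tried without any files.

```perl
use strict;
use warnings;

# join_lines() is a hypothetical helper. It takes the name of the exempt
# field and a list of lines (each with its newline) and returns the new lines.
sub join_lines {
    my ($keep, @lines) = @_;
    my @out;
    my $buffer = '';   # "-" lines joined so far
    my $exempt = 0;    # true while inside the field named $keep

    for my $line (@lines) {
        if ($line =~ /^-/ && !$exempt) {
            chomp(my $text = $line);
            $text =~ s/^-/ / if length $buffer;   # later lines: "-" becomes a space
            $buffer .= $text;
            next;
        }

        # Any other line ends the current block, so write the block first.
        if (length $buffer) {
            push @out, "$buffer\n";
            $buffer = '';
        }

        # A "+" line starts a new field; a blank line ends the record.
        if    ($line =~ /^\+/)   { $exempt = $line =~ /^\+\Q$keep\E\s*$/ ? 1 : 0; }
        elsif ($line =~ /^\s*$/) { $exempt = 0; }

        push @out, $line;
    }

    # A block that runs to the end of the input still has to be written.
    push @out, "$buffer\n" if length $buffer;
    return @out;
}

# Made-up sample lines in the shape shown in the question.
print join_lines('Fieldname5',
    "+Fieldname2\n", "-Content\n", "-Content\n",
    "+Fieldname5\n", "-Content\n", "-Content\n", "\n");
```

Resetting the flag on every "+" line, and not only on the blank line, means the exempt field does not have to be the last one in the record.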
