Python RegEx One-Liner in a Pipe – Remove Hard Line Wrapping – No Perl

Just put Python in your pipe and let Perl rest in peace.

Python works fine as a one-liner in a pipe. Replacing text with a regular expression used to be the last holdout of Perl, but you can do it with Python just as well.
Python is more familiar to many of us, and Python 3 also shines with Unicode, UTF-8 encoding and multiline expressions. For very short or simple replacements, Perl syntax might still be more convenient.

$ echo "Hello Tero!" | python3 -c 'import sys, re; s = sys.stdin.read(); s = re.sub("Hello", "Good morning", s); print(s)'
Good morning Tero!
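
To illustrate the Unicode point, here is a small sketch. The function name shout_words and the Finnish greeting are made up for this example. It relies on the fact that, in Python 3, \w in a str pattern matches Unicode word characters by default, so accented letters need no special handling.

```python
import re


def shout_words(text):
    # \w+ matches whole words, including non-ASCII letters such as "ä".
    # The replacement is a function that receives each match object.
    return re.sub(r"\w+", lambda m: m.group(0).upper(), text)


print(shout_words("Hyvää huomenta, Tero!"))
# prints: HYVÄÄ HUOMENTA, TERO!
```

In a pipe, the encoding used to decode standard input normally follows your locale, so a UTF-8 terminal should give the same result.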

Consider a file with hard line wrapping:

$ cat foo
This is chapter
one. It has funny newlines
everywhere.

Chapter to, likewise,
is weirdly hard
line broken.

Third. third
third. third.
third.

Four, four,
four, four, yes
this is four and
not a
haiku.

We want to remove the single line feeds inside each chapter, but still keep the empty lines between chapters. So, 1) replace a single ‘\n’ with a space, and 2) leave a double newline ‘\n\n’ alone.

$ cat foo | python3 -c 'import sys, re; s = sys.stdin.read(); print(re.sub("([^\n])\n([^\n])", r"\1 \2", s))'
This is chapter one. It has funny newlines everywhere.

Chapter to, likewise,  is weirdly hard line broken.

Third. third third. third. third.

Four, four,  four, four, yes this is four and not a  haiku.

We slurp the whole input, that is, we read it all into memory. This makes it very easy to apply a regexp that spans multiple lines. On the other hand, it would be inconvenient to slurp a multi-gibibyte file.
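
If the input is too big to slurp, one alternative is to read it line by line and keep only the current paragraph in memory. This is a rough sketch rather than a drop-in replacement for the regexp: the names unwrap_paragraphs and main are made up here, and it assumes paragraphs are separated by completely empty lines.

```python
import sys


def unwrap_paragraphs(lines):
    """Yield one joined line per paragraph, and "" for each empty line."""
    buffer = []
    for line in lines:
        text = line.rstrip("\n")
        if text:
            # Still inside a paragraph: collect the line.
            buffer.append(text)
        else:
            # Empty line: flush the paragraph collected so far.
            if buffer:
                yield " ".join(buffer)
                buffer = []
            yield ""
    if buffer:
        # The input ended without a trailing empty line.
        yield " ".join(buffer)


def main():
    # Iterating over sys.stdin reads one line at a time.
    for out in unwrap_paragraphs(sys.stdin):
        print(out)
```

To use it in a pipe, save it to a file, call main() at the bottom, and run the file with python3. Memory use then depends on the longest paragraph, not on the size of the whole file.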

Python program explained

import sys, re
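s = sys.stdin.read()  # slurp, i.e. read all standard input into memory
# Pattern: not-newline, newline, not-newline. The pattern is a normal string,
# so Python turns \n into a real newline before the regexp engine sees it.
# Replacement: first not-newline, literal space, second not-newline.
# The replacement is an r"raw string", marked with the leading r, so that
# \1 and \2 reach re.sub as group references.
s = re.sub("([^\n])\n([^\n])", r"\1 \2", s)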
print(s)
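
One caveat with the pattern above: it consumes the characters on both sides of the newline, so a line that is only one character long can keep its line break, because that character was already used by the previous match. A possible variant, sketched here with the made-up function names unwrap_original and unwrap_lookaround, uses lookbehind and lookahead, which check the neighbouring characters without consuming them.

```python
import re


def unwrap_original(text):
    # The pattern from the article: both neighbours are part of the match.
    return re.sub("([^\n])\n([^\n])", r"\1 \2", text)


def unwrap_lookaround(text):
    # (?<=[^\n]) requires a non-newline character before the newline,
    # (?=[^\n]) requires one after it; neither is consumed by the match.
    return re.sub(r"(?<=[^\n])\n(?=[^\n])", " ", text)


sample = "a\nb\nc\n\nd\n"
print(repr(unwrap_original(sample)))    # prints 'a b\nc\n\nd\n'
print(repr(unwrap_lookaround(sample)))  # prints 'a b c\n\nd\n'
```

For input where every line has at least two characters, like the file foo above, both versions should give the same result.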

Perl might be history as a language, but her ideas will live forever in modern languages.
Updated.
