Vim vs AWK
Vim is a text editor and Awk a text processing language, I wouldn’t blame you
for believing that trying to compare them is bonkers, but trust me, I have
something to compare here.
Case study: Vinfo
Vinfo is a Vim plugin that allows you to read Info documentation files right
in a vim session by converting Info plain text files into Vim help-files, so you
get nice syntax highlighting and convenient tags for jumping between the file
contents. Let’s examine how it does its job.
This is how the Info text looks like, note that the titles are underlined using
*
chars:
Some context lines here
maybe a nice introduction
or something.
1. Title
********
This section refers to blah blah and is great because of blah blah...
And here is what we need. Titles underlined using =
:
Some context lines here
maybe a nice introduction
or something.
1. Title
========
This section refers to blah blah and is great because of blah blah...
It appears to be an easy task, but its actually not.
So why is this difficult using Awk? Or even other tools like grep, sed? Or a
combination of all these?
It’s because of the context! In this same file we have many other lines that
start with =
and that we don’t care about; We are actually looking for the
lines that start with =
and are under a line that starts with a number
followed by a dot and followed by a word that has a blank line separation and a
paragraph above and below that are not the beginning/end of nodes but some sort
of description text… Now that’s a difficult match!.
It is technically possible to accomplish this using Awk, but it requires some
complicated logic for handling the state of the context, and it grows out of
hand really quickly.
Let’s see how this is done using Vim script (from Vinfo code):
g/\v^$\n.+\n\=+\n^$\n/norm! jjvg_r-\<Esc>
One line. It looks scary, but it wasn’t written from scratch. The commands were
obtained by performing normal live editing on the text.
It could be said that one of the reasons for this kind of task to become too
complex using Awk, is because of its per-line processing nature.
But in Awk you can use the RS variable in order to handle multi line matches
I hear you protest. Yes, you can, but it doesn’t mean that it will become much
easier, you still need to assume a constant and formatted stream of records OR
dynamically change the RS variable to adjust it when needed. If you have a
non-homogeneous text with multi-line matches that are very context prone,
complexity will rapidly become a real pain to handle.
Vim goes further
With Vim, you can open the target text file and just start doing the editing
while Vim itself is recording your moves (:h q
). Then you can paste the
recorded commands and moves in a vim script and bang! The script for your
complex editing is written.
When writing an Awk script you need multiple trial and fail iterations to test
your regular expressions. With vim, however, you can visually test all your
regular expressions, make quick changes, improvements and then just paste them
in a vim script when they happen to work.
Vim regular expressions engine
You can use a regular, convoluted, expression programmatically in a language
like Perl for complex tasks, or you can use Vim’s regular expressions engine to
collapse a lot of logic into one single regex.
Let’s say you want to process some text in the middle of a line without touching
anything else in that same line, you can do this in Vim by using the \%V
atom
in order to work with a visual selected area.
But, obviously, Vim is going to be slower than an Awk script
Actually, it can be, but many milliseconds processing a large file is something
I would prefer over a couple of hours (or days!) of Awk scripting.
But Vim cannot do multi-threading, so when processing large files it hangs
until it’s done, and I have no option but to wait for it before I can continue
working!
That’s correct, But this will only be a problem if the text processing is meant
to be executed while working in a vim session. When you execute an Awk script in
your shell you must wait for it or open a new terminal session in order to
continue working. The same thing happens here. If you wrote a Vim script for a
specific text processing task, you should execute it and wait for it or get
another terminal while the processing takes place (or send it to the background
of course).
But I’m not a Vim user and don’t know how to use it
Perfect!, now you have a good reason for learning Vim, even if you’re not
planning to use it as your main text editor.
So should I completely forget about Awk?
Of course not!, Awk is an awesome tool that allows you to do awesome things, you
should learn it if you don’t know it already!. I still use grep, sed, tr,
etc, to perform basic tasks. Awk will always be there to save the day. You can
use it for a quick one liner or write down a proper script if needed too.
Being able to use it in combination with other tools is great, which leads us
to…
Vim cannot be used in combination with other tools using pipes
Except it can!, you just need to use the -
(dash) option, consider the following:
cat file | grep -i hello | vim -c scriptCommand -
In this contrived example, cat
pipes a file into grep, that pipes a result
into Vim, which in turn executes “scriptCommand” and perform the text
processing. So really you can use all the tools with normal piping, including
Vim.
echo text.txt | tr ... | sed ... | awk { print $2 } | vim -c Script -
Vim is a text editor and Awk a text processing language, I wouldn’t blame you for believing that trying to compare them is bonkers, but trust me, I have something to compare here.
Case study: Vinfo
Vinfo is a Vim plugin that allows you to read Info documentation files right in a vim session by converting Info plain text files into Vim help-files, so you get nice syntax highlighting and convenient tags for jumping between the file contents. Let’s examine how it does its job.
This is how the Info text looks like, note that the titles are underlined using
*
chars:
Some context lines here
maybe a nice introduction
or something.
1. Title
********
This section refers to blah blah and is great because of blah blah...
And here is what we need. Titles underlined using =
:
Some context lines here
maybe a nice introduction
or something.
1. Title
========
This section refers to blah blah and is great because of blah blah...
It appears to be an easy task, but its actually not.
So why is this difficult using Awk? Or even other tools like grep, sed? Or a combination of all these?
It’s because of the context! In this same file we have many other lines that
start with =
and that we don’t care about; We are actually looking for the
lines that start with =
and are under a line that starts with a number
followed by a dot and followed by a word that has a blank line separation and a
paragraph above and below that are not the beginning/end of nodes but some sort
of description text… Now that’s a difficult match!.
It is technically possible to accomplish this using Awk, but it requires some complicated logic for handling the state of the context, and it grows out of hand really quickly.
Let’s see how this is done using Vim script (from Vinfo code):
g/\v^$\n.+\n\=+\n^$\n/norm! jjvg_r-\<Esc>
One line. It looks scary, but it wasn’t written from scratch. The commands were obtained by performing normal live editing on the text.
It could be said that one of the reasons for this kind of task to become too complex using Awk, is because of its per-line processing nature.
But in Awk you can use the RS variable in order to handle multi line matches
I hear you protest. Yes, you can, but it doesn’t mean that it will become much easier, you still need to assume a constant and formatted stream of records OR dynamically change the RS variable to adjust it when needed. If you have a non-homogeneous text with multi-line matches that are very context prone, complexity will rapidly become a real pain to handle.
Vim goes further
With Vim, you can open the target text file and just start doing the editing
while Vim itself is recording your moves (:h q
). Then you can paste the
recorded commands and moves in a vim script and bang! The script for your
complex editing is written.
When writing an Awk script you need multiple trial and fail iterations to test your regular expressions. With vim, however, you can visually test all your regular expressions, make quick changes, improvements and then just paste them in a vim script when they happen to work.
Vim regular expressions engine
You can use a regular, convoluted, expression programmatically in a language like Perl for complex tasks, or you can use Vim’s regular expressions engine to collapse a lot of logic into one single regex.
Let’s say you want to process some text in the middle of a line without touching
anything else in that same line, you can do this in Vim by using the \%V
atom
in order to work with a visual selected area.
But, obviously, Vim is going to be slower than an Awk script
Actually, it can be, but many milliseconds processing a large file is something I would prefer over a couple of hours (or days!) of Awk scripting.
But Vim cannot do multi-threading, so when processing large files it hangs until it’s done, and I have no option but to wait for it before I can continue working!
That’s correct, But this will only be a problem if the text processing is meant to be executed while working in a vim session. When you execute an Awk script in your shell you must wait for it or open a new terminal session in order to continue working. The same thing happens here. If you wrote a Vim script for a specific text processing task, you should execute it and wait for it or get another terminal while the processing takes place (or send it to the background of course).
But I’m not a Vim user and don’t know how to use it
Perfect!, now you have a good reason for learning Vim, even if you’re not planning to use it as your main text editor.
So should I completely forget about Awk?
Of course not!, Awk is an awesome tool that allows you to do awesome things, you should learn it if you don’t know it already!. I still use grep, sed, tr, etc, to perform basic tasks. Awk will always be there to save the day. You can use it for a quick one liner or write down a proper script if needed too. Being able to use it in combination with other tools is great, which leads us to…
Vim cannot be used in combination with other tools using pipes
Except it can!, you just need to use the -
(dash) option, consider the following:
cat file | grep -i hello | vim -c scriptCommand -
In this contrived example, cat
pipes a file into grep, that pipes a result
into Vim, which in turn executes “scriptCommand” and perform the text
processing. So really you can use all the tools with normal piping, including
Vim.
echo text.txt | tr ... | sed ... | awk { print $2 } | vim -c Script -