I started this paper in 2013, and in 2015 sent it out for review to the people listed later on. After incorporating comments, I sent it to Rik Farrow, the editor of the USENIX magazine ;login: to see if he would publish it. He declined to do so, for reasonably good reasons.
The paper languished, forgotten, until early 2018 when I came across it and decided to polish it off, put it up on GitHub, and make it available from my home page in HTML.
In 2024, I took a fresh look at it, and decided to polish it a little bit more.
If you are interested in language design and evolution in general, and in Awk in particular, I hope you will enjoy reading this paper. If not, then why are you bothering looking at it now?
Arnold Robbins
Nof Ayalon, ISRAEL
June, 2024
At the March 1991 USENIX conference, Henry Spencer presented a paper entitled AWK As A Major Systems Programming Language. In it, he described his experiences using the original version of awk to write two significant “systems” programs—a clone of a reasonable subset of the nroff formatter1, and a simple parser generator. He described what awk did well, as well as what it didn’t, and presented a list of things that awk would need to acquire in order to become a reasonable alternative to C for systems programming tasks on Unix systems.
In particular, awk lies roughly in the middle of the spectrum between C, which is “close to the metal,” and the shell, which is quite high-level. A language at this level that is useful for doing systems programming is very desirable.
This paper reviews Henry’s wish list, and describes some of the events that have occurred in the Unix/Linux world since 1991. It presents a case that gawk—GNU Awk—fills most of the major needs Henry listed back in 1991, and then describes the author’s opinion as to why other languages successfully filled the systems programming role that awk did not. It discusses how the current version of gawk may finally be able to join the ranks of other popular, powerful scripting languages in common use today, and ends with some counter-arguments and the author’s responses to them.
Thanks to Andrew Schorr, Henry Spencer, Nelson H.F. Beebe, and Brian Kernighan for reviewing an earlier draft of this paper.
In this section we review the state of the Unix world in 1991, as well as the state of awk, and then list what Henry Spencer saw as missing from awk.
Undoubtedly, many readers of this paper were not using computers in 1991, so this section provides the context in which Henry’s paper was written. In March of 1991:
New awk was about 2.5 years old.
The book by Aho, Weinberger and Kernighan was published in October of 1987, so most people knew about new awk, but they just couldn’t get it.
Who could? New awk was available to educational institutions from the Bell Labs research group, and to those who had Unix source licenses for System V Releases 3.1, 3.2, and 4. By this time, source licensees were an extremely rare breed, since the cost of commercial licenses had skyrocketed, and even for educational licensees it had increased greatly.3 If I recall correctly, an educational license cost around US $1,000, considerably more than the earlier Unix licenses.
Two other versions of awk were available:
The problem with the first of these is that source code was not available. And the latter came with (to quote Henry) “troublesome licenses.” (Actually, Henry no longer remembers whether his statement about “troublesome licenses” referred to the GPL, or to the Bell Labs source licenses.)
mawk (also GPL’ed) was not yet available. Version 1.0 was accepted for posting in comp.sources.reviewed on September 30, 1991, half a year after Henry’s paper was published.
Here is a summary of what was wrong with the awk picture in 1991. These are in the same order as presented in Henry’s paper. We qualify each issue in order to later discuss how it has been addressed over time.
New awk was not widely available. Most Unix vendors still shipped only old awk. (Here is where he mentions that “the independently-available implementations either cost substantial amounts of money or come with troublesome [sic] licenses.”) His point then was that for portability, awk programs had to be restricted to old awk.
This could be considered a quality of implementation issue, although it’s really a “lack of available implementation” issue.
There was no way to ask awk to start matching all its patterns over again against the existing $0.
This is a language design issue.
There was no formal specification of the awk language. This leads to gratuitous portability problems.
This too is a quality of implementation issue, in that without a specification, it’s difficult to produce uniform, high-quality implementations.
There was no awk-level debugger. (Support tool or quality of implementation issue.)
There was no awk-level profiler. (Support tool or quality of implementation issue.)
In private email, Henry added the following items, saying “there are a couple more things I’d add now, in hindsight.” These are direct quotes:
If awk is being invoked from a shell file, the shell file can do substitutions or use multiple -f options, but those are mechanisms outside the language, and not very convenient ones. What’s really wanted is something like you get in Python etc., where one little statement up near the top says “arrange for this program to have the xyz library available when it runs.”
substr() so often.” My paper did allude to the difficulty of finding out where something matched in old-awk programs, but even in new awk, what you get is a number that you then have to feed to substr(). The language could really use some more convenient way of dissecting a string using regexp matching. [Caveat: I have not looked lately at Gawk to see if it has one.]
The first of these is somewhere between a language design and a language implementation issue. The latter is a language design issue.
Fast forward to 2024. Where do things stand?
The state of the awk world is much better now. In the same order:
New awk is the standard version of awk today on GNU/Linux, BSD, and commercial Unix systems. The one notable exception is Solaris, where /usr/bin/awk is still the old one; on all other systems, plain awk is some version of new awk.
There is still no way to ask awk to start matching all its patterns over again against the existing $0. Furthermore, this is a feature that has not been called for by the awk community, except in Henry’s paper. (We do acknowledge that this might be a useful feature.)
gawk, which has arrays of arrays, can do the trick nicely. It is also efficient, since gawk uses reference-counted strings internally:
function copy_array(dest, source,    i, count)
{
    delete dest
    for (i in source) {
        if (typeof(source[i]) == "array")
            count += copy_array(dest[i], source[i])
        else {
            dest[i] = source[i]
            count++
        }
    }

    return count
}
gawk, mawk, and Brian Kernighan’s awk all have "/dev/stderr" built in for I/O redirections, so even on systems without a real /dev/stderr special file, you can still send error messages to standard error.
There is now a POSIX standard for awk. As with all formal standards, it isn’t perfect. But it provides an excellent starting point, as well as chapter and verse to cite when explaining the behavior of a standards-compliant version of awk.
Additionally, the second edition of The AWK Programming Language is now available.
Brian Kernighan’s awk is the direct lineal descendant of Unix awk. He calls it the “One True Awk” (sic). It is available from GitHub:
$ git clone git://github.com/onetrueawk/awk bwkawk
gawk is available from the Free Software Foundation. You may use an HTTPS downloader: https://ftp.gnu.org/gnu/gawk/gawk-5.3.0.tar.gz is the current version. There may be a newer one.
Michael Brennan wrote an independent implementation of awk, known as mawk. In 2009, Thomas Dickey took on mawk maintenance. Basic information is available on the project’s web page. The download URL is https://invisible-island.net/datafiles/release/mawk.tar.gz. In 2017 Michael published a beta of mawk 2.0. It’s available from the project’s GitHub page.
There is also the MKS version of awk. For a while it was available as part of Open Solaris, but is no longer so. Some years ago, we were able to make this version compile and run on GNU/Linux after just a few hours’ work. Although Open Solaris is now history, the Illumos project does make the MKS Awk available. You can view the files one at a time from https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4.
Other versions of awk are listed in the gawk documentation.
The more difficult of the quality of implementation issues are addressed by gawk. In particular:
gawk provides an awk-level debugger which is modeled after GDB. This is a full debugger, with breakpoints, watchpoints, single-statement stepping, and expression evaluation capabilities. (Older versions had a separate executable named dgawk. Today it’s built into regular gawk.)
gawk has provided an awk-level statement profiler for many years (pgawk). Although there is no direct correlation with CPU time used, the statement-level profiler remains a powerful tool for understanding program behavior.
gawk has had an ‘@include’ facility whereby gawk goes and finds the named awk source program. For much longer, it has searched for files specified with -f along the path named by the AWKPATH environment variable. The ‘@include’ mechanism also uses AWKPATH.
gawk provides an optional third argument to the match() function. This argument is an array which gawk fills in with both the matched text for the full regexp and subexpressions, and index and length information for use with substr(). gawk also provides the gensub() general substitution function, an enhanced version of the split() function, and the patsplit() function for specifying contents instead of separators using a regexp.
While gawk has almost always been faster than Brian Kernighan’s awk, performance improvements (a byte-code based execution engine and internal improvements in array indexing) have brought it closer to mawk’s performance level. And gawk clearly has the most features of any version, many of which considerably increase the power of the language.
Despite all of the above, gawk is not as popular as other scripting languages. Since 1991, we can point to four major scripting languages which have enjoyed, or currently enjoy, differing levels of popularity: PERL, tcl/tk, Python, and Ruby. We think it is fair to say that Python is the most popular scripting language in the third decade of the 21st century.
Is awk, as we’ve described it up to this point, now ready to compete with the other languages? Not quite yet.
In retrospect, it seems clear (at least to us!) that there are two major reasons that all of the previously mentioned languages have enjoyed significant popularity. The first is their extensibility. The second is namespace management.
One certainly cannot attribute their popularity to improved syntax. In the opinion of many, PERL and Ruby both suffer from terrible syntax. Tcl’s syntax is readable but nothing special. Python’s syntax is elegant, although slightly unusual. The point here is that they all differ greatly in syntax, and none really offers the clean pattern–action paradigm that is awk’s trademark, yet they are all popular languages.
If not syntax, then what? We believe that their popularity stems from the fact that all of these languages are easily extensible. This is true with both “modules” in the scripting language, and more importantly, with access to C level facilities via dynamic library loading.
Furthermore, these languages allow you to group related functions and variables into packages or modules: they let you manage the namespace.
awk, on the other hand, has always been closed. An awk program cannot even change its working directory, much less open a connection to an SQL database or a socket to a server somewhere on the Internet (although gawk can do the latter).
If one examines the number of extensions available for PERL on CPAN, or for Python such as PyQt or the Python tk bindings, it becomes clear that extensibility is the real key to power (and from there to popularity).
Furthermore, in awk, all global variables and functions share a single namespace. This prevents many good software development practices based on the principle of information hiding.
To summarize: A reasonable language definition, efficient implementations, debuggers and profilers are necessary but not sufficient for true power. The final ingredients are extensibility and namespaces.
With version 4.1, gawk (finally) provides a defined C API for extending the core language. The API makes it possible to write functions in C or C++ that are callable from an awk program as if the function were written in awk. The most straightforward way to think of these functions is as user-defined functions that happen to be implemented in a different language.
The API provides the following facilities:
Converting awk string, numeric, and undefined values into C types that can be worked with.
Reading and updating awk variables, and creating and updating new variables. As an initial, relatively arbitrary design decision, extensions cannot update special variables such as NR or NF, with the single exception of PROCINFO.
Registering a callback function to run when gawk exits. This is conceptually the same as the C atexit() function.
Hooking into the I/O redirection mechanisms within gawk. In particular, there are separate facilities for input redirections with getline and ‘<’, output redirections with print or printf and ‘>’ or ‘>>’, and two-way pipelines with gawk’s ‘|&’ operator.
Considerable thought went into the design of the API. The gawk documentation provides a full description of the API itself, with examples (over 50 pages’ worth!), as well as some discussion of the goals and design decisions behind the API (in an appendix).
The development was done over the course of about a year and a half, together with the developers of xgawk, a fork of gawk that added features to make using extensions easier, and included an extension for processing XML files in a way that fits naturally with the pattern–action paradigm. While it may not be perfect, the gawk developers feel that it is a good start.
FIXME: Henry Spencer suggests adding more info on the API and on the design decisions. I think this paper is long enough, and the full doc is quite big. It’d be hard to pull API doc into this paper in a reasonable fashion, although it would be possible to review some of the design decisions. Comments?
The major xgawk additions to the C code base have been merged into gawk, and the extensions from that project have been rewritten to use the new API. As a result, the xgawk project developers renamed their project gawkextlib, and the project now provides only extensions.5
It is notable that functions written in awk can do a number of things that extension functions cannot, such as modify any variables, do I/O, call awk built-in functions, and call other user-defined functions.
While it would certainly be possible to provide APIs for all of these features to extension functions, this seemed to be overkill. Instead, the gawk developers took the view that extension functions should provide access to external facilities, and communicate with the awk level via function parameters and/or global variables, including associative arrays, which are awk’s only real data structure.
Consider a simple example. The standard du program can recursively walk one or more arbitrary file hierarchies, call stat() to retrieve file information, and then sum up the blocks used. In the process, du must track hard links, so that no file is counted or reported more than once.
The ‘filefuncs’ extension shipped with gawk provides a stat() function that takes a pathname and fills in an associative array with the information retrieved from stat(). The array elements have names like "size", "mtime" and so on, with corresponding appropriate values. (Compare this to PERL’s stat() function, which returns a linearly-indexed array!)
The fts() function in the ‘filefuncs’ extension builds on stat() to create a multidimensional array of arrays that describes the requested file hierarchies, with each element being an array filled in by stat(). Directories are arrays containing elements for each directory entry, with an element named "." for the array itself.
Given that fts() does the heavy lifting, du can be written quite nicely, and quite portably6, in awk. See Awk Code For du for the code, which weighs in at under 250 lines. Much of this is comments and argument parsing.
The extension facility is relatively new, and has undoubtedly introduced new “dark corners” into gawk. These remain to be uncovered, and any new bugs need to be shaken out and removed.
Some issues are known and may not be resolvable. For example, 64-bit integer values, such as the timestamps in stat() data on modern systems, don’t fit into awk’s 64-bit double-precision numbers, which have only 53 bits of significand. This is also a problem for the bit-manipulation functions.
With respect to namespaces, in 2017 I (finally) figured out how namespaces in awk ought to work to provide the needed functionality while retaining backwards compatibility. The feature was released with gawk 5.0.
One or two of the sample extensions shipped with gawk and in gawkextlib have been modified to take advantage of namespaces.
Brian Kernighan raised several counterpoints in response to an earlier draft of the paper. They are worth addressing (or at least trying to):
I’m not 100% convinced by your basic premise, that the lack of an extension mechanism is the main / a big reason why Awk isn’t used for the kinds of system programming tasks that Perl, Python, etc., are. It’s absolutely a factor—without such a mechanism, there’s just no way to do a lot of important computations. But how does that trade off against just having built-in mechanisms for the core system programming facilities (as Perl does) or a handful of core libraries like sys, os, regex, etc., for Python?
I think that Perl’s original inclusion of most of the Unix system calls was, from a language design standpoint, ultimately a mistake. At the time it was first done, there was no other choice: dynamic loading of libraries didn’t exist on Unix systems in the early and mid-1980s (nor did shared libraries, for that matter). But having all those built-in functions bloats the language, making it harder to learn, document, and maintain, and I definitely did not wish to go down that path for gawk.
With respect to Python, the question is: how are those libraries implemented? Are they built-in to the interpreter and separated from the “core” language simply by the language design? Or are they dynamically loaded modules?
If the latter, that sounds like an argument for the case of having extensions, not against it. And indeed, this merely emphasizes the point made at the end of the previous section, which is that to make an extension facility really scalable, you also need some sort of namespace / module capability.
Thus, Brian is correct: an extension facility is needed, but the last part of the puzzle would be a module facility in the language. I think that I have solved this, and invite the curious reader to check out the current versions of gawk.
I’m also not convinced that Awk is the right language for writing things that need extensions. It was originally designed for 1-liners, and a lot of its constructs don’t scale up to bigger programs. The notation for function locals is appalling (all my fault too, which makes it worse). There’s little chance to recover from random spelling mistakes and typos; the use of mere adjacency for concatenation looks ever more like a bad idea.
This is hard to argue with. Nonetheless, gawk’s --lint option may be of help here, as well as the --dump-variables option, which produces a list of all variables used in the program.
Awk is fine for its original purpose, but I find myself writing Python for anything that’s going to be bigger than say 10-20 lines unless the lines are basically just longer pattern-action sequences. (That notation is a win, of course, which you point out.)
I have worked for several years in Python. For string manipulation and processing records, you still have to write all the manual stuff: open the file, read lines in a loop, split them, etc. Awk does all this stuff for me.
Additionally, I think that with discipline, it’s possible to write fairly good-sized, understandable and maintainable awk programs; in my experience awk does scale up well beyond the one-liner range.
Not to mention that Brian has published (twice now!) a whole book of awk programs larger than one line. :-) (See the Resources section.)
Some of my own good-sized awk programs are available from GitHub:
See https://github.com/arnoldrobbins/texiwebjr. The suite has two programs that total over 1,300 lines of awk. (They share some code.)
See https://github.com/arnoldrobbins/prepinfo. This script processes Texinfo files, updating menus as needed. This version is rewritten in TexiWeb Jr.; it’s about 350 lines of awk.
See https://github.com/arnoldrobbins/sortmail. This script sorts a Unix mbox format mailbox by thread. I use it daily. It’s also written in TexiWeb Jr. and is about 330 lines of awk.
Brian continues:
The du example is awfully big, though it does show off some of the language features. Could you get the same mileage with something quite a bit shorter?
My definition of “small” and “big” has changed over time. 250 lines may be big for a script, but the du.awk program is much smaller than a full implementation in C: GNU du is over 1,100 lines of C, plus all the libraries it relies upon in the GNU Coreutils.
With respect to shorter examples, nothing springs to mind immediately. However, gawk comes with several useful extensions that are worth exploring, much more than we’ve covered here.
For example, the readdir extension in the gawk distribution causes gawk to read directories and return one record per directory entry in an easy-to-parse format:
$ gawk -lreaddir '{ print }' .
-| 2109292/mail.mbx/f
-| 2109295/awk-sys-prog.texi/f
-| 2100007/./d
-| 2100056/texinfo.tex/f
-| 2100055/cleanit/f
-| 2109282/awk-sys-prog.pdf/f
-| 2100009/du.awk/f
-| 2100010/.git/d
-| 2098025/../d
-| 2109294/ChangeLog/f
How cool is that?!? :-)
Also, the gawkextlib project provides some very interesting extensions. Of particular interest are the XML and JSON extensions, but there are a number of others, and the project is worth checking out.
In 2018 I wrote here:

In short, it’s too early to really tell. This is the beginning of an experiment. I hope it will be a fun journey for me, the other gawk maintainers, and the larger community of awk users.
In 2024, I have to say that extensions haven’t particularly caught on. This saddens me, but it seems to be typical of awk users that they use what’s in the language and aren’t interested in extending it, or they don’t know that they can. Sigh.
It has taken much longer than any awk fan would like, but finally, GNU Awk fills in almost all the gaps listed by Henry Spencer for awk to be really useful as a systems programming language.
In addition, experience from other popular languages has shown that extensibility and namespaces are the keys to true power, usability, and popularity.
With the release of gawk 4.1, we feel that gawk (and thus the Awk language) is now almost on par with the basic capabilities of other popular languages. With gawk 5.0, we hope(d) to truly reach par.
Is it too late in the game?
In 2024, sadly, it does seem to be. But at least I had fun adding the new features to gawk.
I hope that this paper will have piqued your curiosity, and that you will take the time to give gawk a fresh look.
The gawk documentation: https://www.gnu.org/software/gawk/manual/.
The gawkextlib project: https://sourceforge.net/projects/gawkextlib/.
Awk Code For du

Here is the du program, written in Awk.
Besides demonstrating the power of the stat() and fts() extensions and gawk’s multidimensional arrays, it also shows the switch statement and the built-in bit-manipulation functions and(), or(), and compl().
The output is not identical to GNU du’s, since filenames are not sorted. However, gawk’s built-in sorting facilities should make sorting the output straightforward; we leave that as the traditional “exercise for the reader.”
#! /usr/local/bin/gawk -f

# du.awk --- write POSIX du utility in awk.
# See https://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html
#
# Most of the heavy lifting is done by the fts() function in the "filefuncs"
# extension.
#
# We think this conforms to POSIX, except for the default block size, which
# is set to 1024. Following GNU standards, set POSIXLY_CORRECT in the
# environment to force 512-byte blocks.
#
# Arnold Robbins
# arnold@skeeve.com

@include "getopt"
@load "filefuncs"

BEGIN {
    FALSE = 0
    TRUE = 1

    BLOCK_SIZE = 1024    # Sane default for the past 30 years
    if ("POSIXLY_CORRECT" in ENVIRON)
        BLOCK_SIZE = 512    # POSIX default

    compute_scale()

    fts_flags = FTS_PHYSICAL
    sum_only = FALSE
    all_files = FALSE

    while ((c = getopt(ARGC, ARGV, "aHkLsx")) != -1) {
        switch (c) {
        case "a":    # report size of all files
            all_files = TRUE;
            break
        case "H":    # follow symbolic links named on the command line
            fts_flags = or(fts_flags, FTS_COMFOLLOW)
            break
        case "k":
            BLOCK_SIZE = 1024    # 1K block size
            break
        case "L":    # follow all symbolic links
            # fts_flags &= ~FTS_PHYSICAL
            fts_flags = and(fts_flags, compl(FTS_PHYSICAL))
            # fts_flags |= FTS_LOGICAL
            fts_flags = or(fts_flags, FTS_LOGICAL)
            break
        case "s":    # do sums only
            sum_only = TRUE
            break
        case "x":    # don't cross filesystems
            fts_flags = or(fts_flags, FTS_XDEV)
            break
        case "?":
        default:
            usage()
            break
        }
    }

    # if both -a and -s
    if (all_files && sum_only)
        usage()

    for (i = 0; i < Optind; i++)
        delete ARGV[i]

    if (Optind >= ARGC) {
        delete ARGV    # clear all, just to be safe
        ARGV[1] = "."  # default to current directory
    }

    fts(ARGV, fts_flags, filedata)    # all the magic happens here

    # now walk the trees
    if (sum_only)
        sum_walk(filedata)
    else if (all_files)
        all_walk(filedata)
    else
        top_walk(filedata)
}

# usage --- print a message and die

function usage()
{
    print "usage: du [-a|-s] [-kx] [-H|-L] [file] ..." > "/dev/stderr"
    exit 1
}

# compute_scale --- compute the scale factor for block size calculations

function compute_scale(    stat_info, blocksize)
{
    stat(".", stat_info)

    if (! ("devbsize" in stat_info)) {
        printf("du.awk: you must be using filefuncs extension from gawk 4.1.1 or later\n") > "/dev/stderr"
        exit 1
    }

    # Use "devbsize", which is the units for the count of blocks
    # in "blocks".
    blocksize = stat_info["devbsize"]
    if (blocksize > BLOCK_SIZE)
        SCALE = blocksize / BLOCK_SIZE
    else    # I can't really imagine this would be true
        SCALE = BLOCK_SIZE / blocksize
}

# islinked --- return true if a file has been seen already

function islinked(stat_info,    device, inode, ret)
{
    device = stat_info["dev"]
    inode = stat_info["ino"]

    ret = ((device, inode) in Files_seen)

    return ret
}

# file_blocks --- return number of blocks if a file has not been seen yet

function file_blocks(stat_info,    device, inode)
{
    if (islinked(stat_info))
        return 0

    device = stat_info["dev"]
    inode = stat_info["ino"]

    Files_seen[device, inode]++

    return block_count(stat_info)    # delegate actual counting
}

# block_count --- return number of blocks from a stat() result array

function block_count(stat_info,    result)
{
    if ("blocks" in stat_info)
        result = int(stat_info["blocks"] / SCALE)
    else    # otherwise round up from size
        result = int((stat_info["size"] + (BLOCK_SIZE - 1)) / BLOCK_SIZE)

    return result
}

# sum_dir --- data on a single directory

function sum_dir(directory, do_print,    i, sum, count)
{
    for (i in directory) {
        if ("." in directory[i]) {    # directory
            count = sum_dir(directory[i], do_print)
            count += file_blocks(directory[i]["."])
            if (do_print)
                printf("%d\t%s\n", count, directory[i]["."]["path"])
        } else {    # regular file
            count = file_blocks(directory[i]["stat"])
        }
        sum += count
    }

    return sum
}

# simple_walk --- summarize directories --- print info per parameter

function simple_walk(filedata, do_print,    i, sum, path)
{
    for (i in filedata) {
        if ("." in filedata[i]) {    # directory
            sum = sum_dir(filedata[i], do_print)
            path = filedata[i]["."]["path"]
        } else {    # regular file
            sum = file_blocks(filedata[i]["stat"])
            path = filedata[i]["path"]
        }
        printf("%d\t%s\n", sum, path)
    }
}

# sum_walk --- summarize directories --- print info only for the top set of directories

function sum_walk(filedata)
{
    simple_walk(filedata, FALSE)
}

# top_walk --- data on the main arguments only

function top_walk(filedata)
{
    simple_walk(filedata, TRUE)
}

# all_walk --- data on every file

function all_walk(filedata,    i, sum, count)
{
    for (i in filedata) {
        if ("." in filedata[i]) {    # directory
            count = all_walk(filedata[i])
            sum += count
            printf("%s\t%s\n", count, filedata[i]["."]["path"])
        } else {    # regular file
            if (! islinked(filedata[i]["stat"])) {
                count = file_blocks(filedata[i]["stat"])
                sum += count
                if (i != ".")
                    printf("%d\t%s\n", count, filedata[i]["path"])
            }
        }
    }

    return sum
}
The Amazingly Workable Formatter, awf, is available from ftp://ftp.freefriends.org/arnold/Awkstuff/awf.tgz.
See the Wikipedia article, and some notes at the late Dennis Ritchie’s website. There are undoubtedly other sources of information as well.
Especially for budget-strapped educational institutions, source licenses were increasingly an expensive luxury, since SVR4 rarely ran on hardware that they had.
I’ve been told that one of the reasons Larry Wall created PERL is that he either didn’t know about new awk, or he couldn’t get it.
For more information, see the gawkextlib project page.
The awk version of du works on Unix, GNU/Linux, Mac OS X, and MS Windows. On Windows, only Cygwin is currently supported. We hope to one day support MinGW as well.