# AWK tutorial

See https://github.com/tdhopper/awk-lessons for inspiration

## Lesson 01: Basics of Awk

If you haven't read the Awk man page, you should start there. It's helpful! Some highlights:

```
awk - pattern-directed scanning and processing language

awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]
```

Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile.

With each pattern there can be an associated action that will be performed when a line of a file matches the pattern.

Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.

A pattern-action statement has the form pattern {action}.

A missing { action } means print the line; a missing pattern always matches.
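To make the two defaults concrete, here is a minimal sketch. The three-line input is made up for illustration and generated inline with printf, so no data file is needed.

```shell
# Hypothetical three-line input, generated inline so the sketch is self-contained.
input=$(printf 'a\nbb\nccc\n')

# Pattern only: the missing action defaults to printing the matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/b/')

# Action only: the missing pattern matches every line.
action_only=$(printf '%s\n' "$input" | awk '{ print "line: " $0 }')

echo "$pattern_only"
echo "$action_only"
```

The first command prints only the line containing a b; the second prefixes every line.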

I created a simple example file to demonstrate basic Awk:

In [1]:
cat data/letters.txt

a
bb
ccc
dddd
ggg
hh
i

### A Basic Pattern

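If we match lines longer than two characters and use the implicit print action, we get:

In [2]:
awk 'length($0) > 2' data/letters.txt

ccc
dddd
ggg


$0 is a built-in variable that contains the current line. The parentheses matter: written as length $0 > 2, the pattern is parsed as the concatenation of length and $0 (for example "2bb") compared with 2 as a string, which also lets bb and hh through.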

### A Basic Function

If we leave out a pattern, we will match every line. A trivial action would be to print each line:

In [3]:
awk '{ print }' data/letters.txt

echo
echo Using the length function as our action, we can get the length of each line:
echo 

awk '{ print length }' data/letters.txt

a
bb
ccc
dddd
ggg
hh
i

Using the length function as our action, we can get the length of each line:

1
2
3
4
3
2
1


In [4]:
# We can combine things
awk '{ print length $0}' data/letters.txt

echo 
echo The above prints length of line and line - the value of \$0
echo

awk '{ print length,$0}' data/letters.txt
echo 
echo Using "," as separator puts whitespace 
echo


1a
2bb
3ccc
4dddd
3ggg
2hh
1i

The above prints length of line and line - the value of $0

1 a
2 bb
3 ccc
4 dddd
3 ggg
2 hh
1 i

Using , as separator puts whitespace



In [5]:
# Awk has special controls for executing some code before the file input begins and after it is complete.

awk 'BEGIN { print "HI" } { print $0 } END { print "BYE!" }' data/letters.txt

HI
a
bb
ccc
dddd
ggg
hh
i
BYE!


In [6]:
awk "BEGIN { print \"Don't Panic! \" }"

Don't Panic! 
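BEGIN and END are often used together to set up and then report on an accumulator. A minimal sketch, using made-up inline numbers rather than one of the data files:

```shell
# Hypothetical input: one number per line.
summary=$(printf '3\n4\n5\n' | awk '
BEGIN { print "values:" }      # runs once, before any input is read
      { sum += $1; print $1 }  # runs for every input line
END   { print "sum = " sum }   # runs once, after the last line
')
echo "$summary"
```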


### Combining Patterns and Functions
Of course, patterns and actions can be combined so that the action (here, a call to a function) is only applied when the pattern is matched.

From the man page:
```
A pattern-action statement has the form

pattern { action }
```

We can print the length of all lines longer than 2 characters.

In [7]:
awk 'length($0) > 2 { print length($0) }' data/letters.txt

3
4
3


In [8]:
# Actually, we don't have to limit Awk to just one pattern! 
# We can have arbitrarily many patterns separated by a semicolon or a new line:

awk 'length($0) > 2 { print "Long: " length($0) }; length($0) < 2 { print "Short: " length($0) }' data/letters.txt

Short: 1
Long: 3
Long: 4
Long: 3
Short: 1
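The rules above never overlap, so each line triggers at most one of them. When patterns do overlap, every matching rule runs, in program order. A small sketch with made-up inline input:

```shell
labels=$(printf 'a\nbb\nccc\n' | awk '
length($0) >= 2 { print $0 ": at least two characters" }
length($0) <= 2 { print $0 ": at most two characters" }
')
echo "$labels"
```

The line bb satisfies both patterns, so it is reported twice.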


### Multiple Fields

Awk is designed for easy handling of data with multiple fields per row. 
The field delimiter can be specified with the -F option.

Here's a simple space-delimited file:

In [9]:
cat data/field_data.txt

Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.


In [10]:
# If we specify the field separator, we can print the second field from each row:

awk -F " " '{ print $2 }' data/field_data.txt
echo

# A single space is also the default separator, so -F can be omitted here
awk '{ print $2 }' data/field_data.txt


are
are
is
so

are
are
is
so


In [11]:
# We don't get an error if a line doesn't have the referenced field; it just shows up as blank
awk -F " " '{ print $4 }' data/field_data.txt




you.
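If the blank lines are not wanted, the built-in variable NF (the number of fields on the current line) can serve as a guard. A sketch, with two inline lines standing in for data/field_data.txt:

```shell
# Only lines with at least four fields reach the print action.
fourth=$(printf 'Roses are red,\nAnd so are you.\n' | awk 'NF >= 4 { print $4 }')
echo "$fourth"
```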


In [12]:
# The separator expression is interpreted as a regular expression.

awk -F "((so )?are|is) " '{print "Field 1: " $1 "\nField 2: " $2}' data/field_data.txt


Field 1: Roses 
Field 2: red,
Field 1: Violets 
Field 2: blue,
Field 1: Sugar 
Field 2: sweet,
Field 1: And 
Field 2: you.
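Because the separator is a regular expression, a bracket expression can split on any of several characters. A sketch with a made-up record that mixes commas and semicolons:

```shell
# Either "," or ";" ends a field.
fields=$(printf 'a,b;c\n' | awk -F '[,;]' '{ print $2 " " $3 }')
echo "$fields"
```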


### Regular Expressions

Patterns can be regular expressions, not just expressions such as length($0) > 2. From the man page:

Regular expressions are as defined in re_format(7).
Isolated regular expressions in a pattern apply to the entire line.

In [13]:
ls -la dict

total 1172
drwxr-xr-x 4 jovyan users 128 Oct 20 17:01 .
drwxr-xr-x 17 jovyan users 544 Oct 20 18:21 ..
-rw-r--r-- 1 jovyan users 256374 Oct 20 16:59 8927565-d9783627c731268fb2935a731a618aa8e95cf465.zip
-rw-r--r-- 1 jovyan users 938847 Feb 11 2014 words


In [22]:
awk '/^[a-z][aeiou]{4}/' dict/words

gooier
gooiest
queue
queue's
queued
queues
queuing
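A regular expression can also be tested against a single field with the ~ operator instead of against the whole line. A sketch using two made-up records:

```shell
# Print the first field of every record whose second field ends in ".edu".
names=$(printf 'Amelia gmail.com\nSamuel shu.edu\n' | awk '$2 ~ /\.edu$/ { print $1 }')
echo "$names"
```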


## Using simple filters
From GNU manual - https://www.gnu.org/software/gawk/manual/html_node/Very-Simple.html#Very-Simple

In [26]:
# print all lines containing 'li'
awk '/li/ { print $0 }' data/mail_data.txt

Amelia 555-5553 amelia.zodiacusque@gmail.com F
Broderick 555-0542 broderick.aliquotiens@yahoo.com R
Julie 555-6699 julie.perscrutabor@skeeve.com F
Samuel 555-3430 samuel.lanceolis@shu.edu A


In [27]:
# Print length of longest word in dictionary

awk '{ if (length($0) > max) max = length($0) }
 END { print max }' dict/words

23


In [30]:
# print total number of bytes used by files


echo The raw data
ls -l data 
echo ===========
ls -l data | awk '{ x += $5 }
 END { print "total bytes: " x }'
 
ls -l data | awk '{ x += $5 }
 END { print "total K-bytes:", x / 1024 }' 

The raw data
total 16
-rw-r--r-- 1 jovyan users 65 Oct 20 16:45 field_data.txt
-rw-r--r-- 1 jovyan users 320 Oct 20 22:04 inventory_shipped.txt
-rw-r--r-- 1 jovyan users 22 Oct 20 16:25 letters.txt
-rw-r--r-- 1 jovyan users 659 Oct 20 21:59 mail_data.txt
total bytes: 1066
total K-bytes: 1.04102


In [31]:
# Print a sorted list of the login names of all users:
awk -F: '{ print $1 }' /etc/passwd | sort

_apt
backup
bin
daemon
games
gnats
irc
jovyan
list
lp
mail
man
news
nobody
proxy
root
sync
sys
uucp
www-data


In [32]:
# Count the number of lines
awk 'END { print NR }' /etc/passwd

20


In [34]:
# print even numbered lines from file

awk 'NR % 2 == 0' data/mail_data.txt
echo
echo 'Full file'
cat data/mail_data.txt

Anthony 555-3412 anthony.asserturo@hotmail.com A
Bill 555-1675 bill.drowning@hotmail.com A
Camilla 555-2912 camilla.infusarum@skynet.be R
Julie 555-6699 julie.perscrutabor@skeeve.com F
Samuel 555-3430 samuel.lanceolis@shu.edu A

Full file
Amelia 555-5553 amelia.zodiacusque@gmail.com F
Anthony 555-3412 anthony.asserturo@hotmail.com A
Becky 555-7685 becky.algebrarum@gmail.com A
Bill 555-1675 bill.drowning@hotmail.com A
Broderick 555-0542 broderick.aliquotiens@yahoo.com R
Camilla 555-2912 camilla.infusarum@skynet.be R
Fabius 555-1234 fabius.undevicesimus@ucb.edu F
Julie 555-6699 julie.perscrutabor@skeeve.com F
Martin 555-6480 martin.codicibus@hotmail.com A
Samuel 555-3430 samuel.lanceolis@shu.edu A
Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R

### Using more than one rule / file

In [35]:
# The awk utility reads the input files one line at a time. 
# For each line, awk tries the patterns of each rule. 
# If several patterns match, then several actions execute in the order in which they appear in the awk program. 
# If no patterns match, then no actions run.


awk '/12/ { print $0 }
/21/ { print $0 }' data/mail_data.txt data/inventory_shipped.txt

# Note how the line beginning with ‘Jean-Paul’ in data/mail_data.txt was printed twice, once for each rule.

Anthony 555-3412 anthony.asserturo@hotmail.com A
Camilla 555-2912 camilla.infusarum@skynet.be R
Fabius 555-1234 fabius.undevicesimus@ucb.edu F
Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R
Jean-Paul 555-2127 jeanpaul.campanorum@nyu.edu R
Jan 21 36 64 620
Apr 21 70 74 514


In [38]:
# Sum of size of all files modified in November
ls -l /usr/bin | awk '$6 == "Nov" { sum += $5 }
 END { print sum }'

566520


### Environment variables
See https://www.gnu.org/software/gawk/manual/html_node/Environment-Variables.html#Environment-Variables

In [41]:
echo AWKPATH=$AWKPATH
echo AWKLIBPATH=$AWKLIBPATH

AWKPATH=
AWKLIBPATH=


### Using include files


In [42]:
cat test1.awk


BEGIN {
 print "This is script test1."
}

In [43]:
awk '@include "test1.awk" 
BEGIN {
 print "This is script test2."
}'

This is script test1.
This is script test2.


## Expressions

In [47]:
# Numeric constants: octal, decimal, hexadecimal
awk 'BEGIN { printf "%d, %d, %d\n", 011, 11, 0x11 }'
echo ---

# 8 is not a valid octal digit, so the conversion of 018 stops at the 8
awk 'BEGIN { print "021 is", 021 ; print "018 is", 018 }'



9, 11, 17
---
021 is 17
018 is 1


In [63]:
# RegExp constants
awk '{ if ($0 ~ /^foote/ || $0 ~ /camels/)
 print "found", $0 }' dict/words
 
# Is same as 
echo ----
awk '{ if (/^foote/ || /camels/)
 print "found", $0 }' dict/words
 


found camels
found footed
----
found camels
found footed


### Passing variables into program

The -v option for Awk allows us to pass variables into the program.
For example, we could use it to hard-code constants.

In [15]:
awk -v pi=3.1415 'BEGIN { print pi }'

# $USER is set in an interactive terminal but may be unset in Jupyter or Docker, so $PWD is used here
awk -v curdir="$PWD" 'BEGIN { print curdir }'

3.1415
/home/jovyan/work


In [72]:
# Variables given with -v are set before the program starts, so n=2 applies to both files
awk -v n=2 '{ print $n }' data/inventory_shipped.txt data/mail_data.txt

echo ----
# Assignments placed among the file names take effect when awk reaches them, so each file gets its own n
awk '{ print $n }' n=1 data/inventory_shipped.txt n=3 data/mail_data.txt


13
15
15
31
16
31
24
15
13
29
20
17

21
26
24
21
555-5553
555-3412
555-7685
555-1675
555-0542
555-2912
555-1234
555-6699
555-6480
555-3430
555-2127
----
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec

Jan
Feb
Mar
Apr
amelia.zodiacusque@gmail.com
anthony.asserturo@hotmail.com
becky.algebrarum@gmail.com
bill.drowning@hotmail.com
broderick.aliquotiens@yahoo.com
camilla.infusarum@skynet.be
fabius.undevicesimus@ucb.edu
julie.perscrutabor@skeeve.com
martin.codicibus@hotmail.com
samuel.lanceolis@shu.edu
jeanpaul.campanorum@nyu.edu


In [73]:
# How awk Converts Between Strings and Numbers

awk 'BEGIN {two = 2; three = 3
print (two three) + 4}'

# The numeric values of the variables two and three are converted to strings and concatenated together. 
# The resulting string is converted back to the number 23, to which 4 is then added.

27
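The conversion also runs in the other direction: a string used in arithmetic is converted to a number, taking only its leading numeric part. A short sketch of a few cases:

```shell
converted=$(awk 'BEGIN {
    two = 2; three = 3
    print (two three) + 4   # "23" + 4: concatenate first, then add
    print "3" + 4           # a numeric string is converted for arithmetic
    print "3x" + 4          # only the leading numeric prefix "3" is used
    print "abc" + 4         # no numeric prefix: the string counts as 0
}')
echo "$converted"
```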


In [75]:
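# Locale and the decimal point: by default gawk prints "." regardless of locale;
# it follows the locale's decimal-point character only in POSIX mode or with --use-lc-numeric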

awk 'BEGIN { printf "%g\n", 3.1415927 }'
LC_ALL=en_DK.utf-8 awk 'BEGIN { printf "%g\n", 3.1415927 }'

3.14159
3.14159


## Operators

The following list provides the arithmetic operators in awk, in order from the highest precedence to the lowest:
```
x ^ y
x ** y
Exponentiation; x raised to the y power. ‘2 ^ 3’ has the value eight; the character sequence ‘**’ is equivalent to ‘^’. (c.e.)

- x
Negation.

+ x
Unary plus; the expression is converted to a number.

x * y
Multiplication.

x / y
Division; because all numbers in awk are floating-point numbers, the result is not rounded to an integer—‘3 / 4’ has the value 0.75. (It is a common mistake, especially for C programmers, to forget that all numbers in awk are floating point, and that division of integer-looking constants produces a real number, not an integer.)

x % y
Remainder; in gawk the result takes the sign of x, so ‘-17 % 8’ has the value -1.

x + y
Addition.

x - y
Subtraction.
```
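A few of these operators in action, as a quick sketch that only uses constants:

```shell
results=$(awk 'BEGIN {
    print 2 ^ 3        # exponentiation
    print 3 / 4        # division is floating point
    print int(3 / 4)   # int() truncates toward zero
    print 7 % 3        # remainder
    print 2 + 3 * 4    # multiplication binds tighter than addition
}')
echo "$results"
```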

In [83]:
# Concatenation

# concatenation is performed by writing expressions next to one another, with no operator. 
awk '/^B/ { print "Field number one: " $1 }' data/mail_data.txt
# Without the space in the string constant after the ‘:’, the line runs together. 
echo ----
awk '/^B/ { print "Field number one:" $1 }' data/mail_data.txt


Field number one: Becky
Field number one: Bill
Field number one: Broderick
----
Field number one:Becky
Field number one:Bill
Field number one:Broderick


In [84]:
# The precedence of concatenation, when mixed with other operators, is often counter-intuitive. Consider this example:
awk 'BEGIN { print -12 " " -24 }'
# But where did the space disappear to?
awk 'BEGIN { print -12 " " (-24) }'
# This forces awk to treat the ‘-’ on the ‘-24’ as unary. Otherwise, it’s parsed as follows
## -12 (" " - 24)
## ⇒ -12 (0 - 24)
## ⇒ -12 (-24)
## ⇒ -12-24

-12-24
-12 -24


### Assignment

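An assignment is an expression whose value is the value assigned, and it associates from right to left, so assignments can be chained:

x = y = z = 5

This stores 5 in all three variables.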
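A minimal sketch of chained assignment, and of an assignment used as a value inside a larger expression (the variable names are arbitrary):

```shell
chained=$(awk 'BEGIN {
    x = y = z = 5          # evaluated right to left: z, then y, then x
    print x, y, z
    total = (n = 3) + 1    # an assignment has a value and can sit inside an expression
    print n, total
}')
echo "$chained"
```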

## Truth values

Any nonzero numeric value or any nonempty string value is true.

In [89]:
awk 'BEGIN {
 if (3.1415927)
 print "A strange truth value"
 if ("Four Score And Seven Years Ago")
 print "A strange truth value"
 if (j = 57)
 print "A strange truth value"
 
}'

# There is a surprising consequence of the “nonzero or non-null” rule: 
# the string constant "0" is actually true, because it is non-null. 

awk 'BEGIN {
 if (0)
 print "Numerical zero"
 if ("0")
 print "String zero"
 if (0 / 123)
 print "expression"
 
}'

A strange truth value
A strange truth value
A strange truth value
String zero


In [96]:
# Speaking of Types
awk 'BEGIN { print (a == "" && a == 0 ? "a is untyped" : "a has a type"); print typeof(a); }'
#awk 'BEGIN { a = 42 ; print typeof(a); b = a ; print typeof(b); }'

# typeof() was added in gawk 4.2; with an older gawk (4.1.4 here) the call above fails

a is untyped
awk: cmd. line:1: fatal: function `typeof' not defined



In [98]:
# Since ‘hello’ is alphabetic data, awk can only do a string comparison. 
# Internally, it converts 42 into "42" and compares the two string values "hello" and "42". Here’s the result:

echo hello | awk '{ printf("%s %s < 42\n", $1,
 ($1 < 42 ? "is" : "is not")) }'

# However, what happens when data from a user looks like a number? On the one hand, in reality, 
# the input data consists of characters, not binary numeric values. 
# But, on the other hand, the data looks numeric, and awk really ought to treat it as such. And indeed, it does:
echo 37 | awk '{ printf("%s %s < 42\n", $1,
 ($1 < 42 ? "is" : "is not")) }' 

hello is not < 42
37 is < 42


In [99]:
echo ' +3.14' | awk '{ print($0 == " +3.14") }' # True
echo ' +3.14' | awk '{ print($0 == "+3.14") }' # False
echo ' +3.14' | awk '{ print($0 == "3.14") }' # False
echo ' +3.14' | awk '{ print($0 == 3.14) }' # True

echo ' +3.14' | awk '{ print($1 == " +3.14") }' # False
echo ' +3.14' | awk '{ print($1 == "+3.14") }' # True
echo ' +3.14' | awk '{ print($1 == "3.14") }' # False
echo ' +3.14' | awk '{ print($1 == 3.14) }' # True


1
0
0
1
0
1
0
1


In [109]:
# echo hello 37 | awk '{ for(k in PROCINFO["identifiers"]) print(k, PROCINFO["identifiers"][k]) }'
echo hello 37 | awk -v a="hello" -v b=37 '{ print(a, PROCINFO["identifiers"]["a"]) }'

hello scalar


In [87]:
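# There are situations where using ‘+=’ (or any assignment operator)
# is not the same as simply repeating the lefthand operand in the righthand expression.
# With ‘+=’ the subscript rand() is evaluated once; written out in full it is evaluated twice,
# so bar ends up with two different elements. For example: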

# Thanks to Pat Rankin for this example
awk 'BEGIN {
 foo[rand()] += 5
 for (x in foo)
 print x, foo[x]

 bar[rand()] = bar[rand()] + 5
 for (x in bar)
 print x, bar[x]
}'

0.237788 5
0.845814 5
0.291066 
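The same effect can be shown deterministically by replacing rand() with a function that counts its own calls. The helper name next_key is hypothetical, chosen only for this sketch:

```shell
counts=$(awk '
# Hypothetical helper: returns a fresh key and records how often it was called.
function next_key() { return ++calls }

BEGIN {
    foo[next_key()] += 5
    print "after +=:", calls, "call(s)"

    calls = 0
    bar[next_key()] = bar[next_key()] + 5
    print "after = ... +:", calls, "call(s)"
}')
echo "$counts"
```

With += the subscript expression runs once; in the expanded form it runs twice, which is why bar above ends up with two elements.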


In [79]:
cat data/inventory_shipped.txt
echo    # the file does not end with a newline, so finish its last line first
# The blank line in the file has no fields, so it yields an empty name and an average of 0
awk '{ sum = $2 + $3 + $4 ; avg = sum / 3; print $1, avg }' data/inventory_shipped.txt

Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401

Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514
Jan 17.6667
Feb 23.6667
Mar 24.3333
Apr 48.6667
May 26.3333
Jun 49.3333
Jul 41.6667
Aug 32
Sep 35
Oct 50.3333
Nov 63
Dec 37.6667
 0
Jan 40.3333
Feb 54.6667
Mar 56.3333
Apr 55


In [117]:
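# Pitfall: the inner single quotes close the shell string, so awk sees ENVIRON[HOME],
# where HOME is an uninitialized awk variable, and the lookup returns an empty string.
# Write ENVIRON["HOME"] (double quotes) to look up the environment variable.
awk 'BEGIN { if (! ("HOME" in ENVIRON)) print "no home!"; else print("HOME", ENVIRON['HOME']);}'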

HOME 


## Patterns

Patterns in awk control the execution of rules—a rule is executed when its pattern matches the current input record. The following is a summary of the types of awk patterns:
```
/regular expression/
A regular expression. It matches when the text of the input record fits the regular expression. (See Regexp.)

expression
A single expression. It matches when its value is nonzero (if a number) or non-null (if a string). (See Expression Patterns.)

begpat, endpat
A pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches begpat and the final record that matches endpat. (See Ranges.)

BEGIN
END
Special patterns for you to supply startup or cleanup actions for your awk program. (See BEGIN/END.)

BEGINFILE
ENDFILE
Special patterns for you to supply startup or cleanup actions to be done on a per-file basis. (See BEGINFILE/ENDFILE.)

empty
The empty pattern matches every input record. (See Empty.)
```


In [119]:
# Using shell variables
# The /$PATTERN/ part is double-quoted so the shell expands it; the rest of the program stays single-quoted
PATTERN=zoom
awk "/$PATTERN/"'{nmatches++;} END {print nmatches}' dict/words

5
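Splicing a shell variable into the program text works, but the variable's contents then become awk source code. An alternative sketch passes the value with -v and matches it dynamically with the ~ operator; the inline word list is made up and stands in for dict/words:

```shell
PATTERN=zoom
# pat is an ordinary awk variable here; "n + 0" prints 0 instead of an empty line when nothing matches.
nmatches=$(printf 'zoom\nzoomed\nroom\n' |
    awk -v pat="$PATTERN" '$0 ~ pat { n++ } END { print n + 0 }')
echo "$nmatches"
```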


## Actions
An awk program or script consists of a series of rules and function definitions interspersed. (Functions are described later. See User-defined.) A rule contains a pattern and an action, either of which (but not both) may be omitted. The purpose of the action is to tell awk what to do once a match for the pattern is found. Thus, in outline, an awk program generally looks like this:
```
[pattern] { action }
 pattern [{ action }]
…
function name(args) { … }
…
```

An action consists of one or more awk statements, enclosed in braces (‘{…}’). Each statement specifies one thing to do. The statements are separated by newlines or semicolons. The braces around an action must be used even if the action contains only one statement, or if it contains no statements at all. However, if you omit the action entirely, omit the braces as well. An omitted action is equivalent to ‘{ print $0 }’:

```
/foo/ { } match foo, do nothing — empty action
/foo/ match foo, print the record — omitted action
```

### If-else Statements
If-else statements in Awk are of the form:

if (condition) then-body [else else-body]

For example:

In [16]:
printf "1\n2\n3\n4" | awk \
 '{ \
 if ($1 % 2 == 0) print $1, "is even"; \
 else print $1, "is odd" \
 }'

1 is odd
2 is even
3 is odd
4 is even
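Since the else-body can itself be an if statement, longer chains need no extra syntax. A sketch with three made-up input values:

```shell
signs=$(printf '%s\n' -2 0 7 | awk '{
    if ($1 < 0)       print $1, "is negative"
    else if ($1 == 0) print $1, "is zero"
    else              print $1, "is positive"
}')
echo "$signs"
```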


### Looping
Awk includes several looping statements: while, do while, and for.

They take the expected C-ish syntax.

In [17]:
awk \
 'BEGIN { \
 i = 0; \
 while (i < 5) { print i; i+=1; } \
 }'

0
1
2
3
4


In [120]:
awk '
{
 i = 1
 while (i <= 3) {
 print $i
 i++
 }
}' data/inventory_shipped.txt

Jan
13
25
Feb
15
32
Mar
15
24
Apr
31
52
May
16
34
Jun
31
42
Jul
24
34
Aug
15
34
Sep
13
55
Oct
29
54
Nov
20
87
Dec
17
35



Jan
21
36
Feb
26
58
Mar
24
75
Apr
21
70


In [18]:
awk \
 'BEGIN { \
 i = 0; \
 do { print i; i+=1; } while(i < 5) \
 }'

0
1
2
3
4


In [19]:
awk \
 'BEGIN { \
 i = 0; \
 for(i = 0; i<5; i++) print i \
 }'

0
1
2
3
4


In [20]:
awk --version

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2016 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.


In [124]:
# next: stop processing the current record and move on to the next one
cat data/field_data.txt
echo ----
awk 'NF != 4 {
 printf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) 
 next
}' data/field_data.txt




Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
----
data/field_data.txt:1: skipped: NF != 4
data/field_data.txt:2: skipped: NF != 4
data/field_data.txt:3: skipped: NF != 4
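Because next abandons the current record, rules that appear later in the program never see it. A sketch with made-up input in which comment lines are skipped before the printing rule runs:

```shell
kept=$(printf '# header\nalpha\n# note\nbeta\n' | awk '
/^#/ { next }              # skip comment lines; the rule below is not tried for them
     { print NR ": " $0 }  # NR still counts the skipped lines
')
echo "$kept"
```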


# Variables

## Built-in variables

https://www.gnu.org/software/gawk/manual/html_node/User_002dmodified.html#User_002dmodified

### FIELDWIDTHS 

A space-separated list of columns that tells gawk how to split input with fixed columnar boundaries. Starting in version 4.2, each field width may optionally be preceded by a colon-separated value specifying the number of characters to skip before the field starts. Assigning a value to FIELDWIDTHS overrides the use of FS and FPAT for field splitting. See Constant Size for more information.

### FPAT 

A regular expression (as a string) that tells gawk to create the fields based on text that matches the regular expression. Assigning a value to FPAT overrides the use of FS and FIELDWIDTHS for field splitting. See Splitting By Content for more information.

### FS

The input field separator (see Field Separators). The value is a single-character string or a multicharacter regular expression that matches the separations between fields in an input record. If the value is the null string (""), then each character in the record becomes a separate field. (This behavior is a gawk extension. POSIX awk does not specify the behavior when FS is the null string. Nonetheless, some other versions of awk also treat "" specially.)

The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces, TABs, and/or newlines is a single separator. It also causes spaces, TABs, and newlines at the beginning and end of a record to be ignored.

You can set the value of FS on the command line using the -F option:

```
awk -F, 'program' input-files
```

If gawk is using FIELDWIDTHS or FPAT for field splitting, assigning a value to FS causes gawk to return to the normal, FS-based field splitting. An easy way to do this is to simply say ‘FS = FS’, perhaps with an explanatory comment.
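FS can also be assigned inside the program; doing so in a BEGIN rule makes it take effect before the first record is split. A sketch with two made-up passwd-style records:

```shell
logins=$(printf 'root:x:0\ndaemon:x:1\n' | awk 'BEGIN { FS = ":" } { print $1 }')
echo "$logins"
```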

### IGNORECASE 

If IGNORECASE is nonzero or non-null, then all string comparisons and all regular expression matching are case-independent. This applies to regexp matching with ‘~’ and ‘!~’, the gensub(), gsub(), index(), match(), patsplit(), split(), and sub() functions, record termination with RS, and field splitting with FS and FPAT. However, the value of IGNORECASE does not affect array subscripting and it does not affect field splitting when using a single-character field separator. See Case-sensitivity.

### OFMT
A string that controls conversion of numbers to strings (see Conversion) for printing with the print statement. It works by being passed as the first argument to the sprintf() function (see String Functions). Its default value is "%.6g". Earlier versions of awk used OFMT to specify the format for converting numbers to strings in general expressions; this is now done by CONVFMT.
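A brief sketch of OFMT at work. Values that are integers are printed as integers regardless of OFMT, so only the first number is affected:

```shell
formatted=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 42 }')
echo "$formatted"
```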

### OFS
The output field separator (see Output Separators). It is output between the fields printed by a print statement. Its default value is " ", a string consisting of a single space.

### ORS
The output record separator. It is output at the end of every print statement. Its default value is "\n", the newline character. (See Output Separators.)
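OFS and ORS can be seen together in a small sketch on made-up input. Note that OFS is only inserted where the print statement separates its arguments with a comma:

```shell
joined=$(printf 'a b c\nd e f\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $3 }')
echo "$joined"
```

Each record is written as its first and third field joined by "-", and records end with ";" instead of a newline.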

### RS
The input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines. If it is a regexp, records are separated by matches of the regexp in the input text. (See Records.)

The ability for RS to be a regular expression is a gawk extension. In most other awk implementations, or if gawk is in compatibility mode (see Options), just the first character of RS’s value is used.
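The null-string case ("paragraph mode") is part of standard awk and is handy for blank-line-separated blocks. A sketch with two made-up two-line paragraphs; in this mode a newline also separates fields:

```shell
paragraphs=$(printf 'a\nb\n\nc\nd\n' |
    awk 'BEGIN { RS = "" } { print NR ": " $1 " (" NF " fields)" }')
echo "$paragraphs"
```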


## Built-in variables set by AWK

### ARGC, ARGV

The command-line arguments available to awk programs are stored in an array called ARGV. ARGC is the number of command-line arguments present. See Other Arguments. Unlike most awk arrays, ARGV is indexed from 0 to ARGC - 1. In the following example, ARGV[0] contains the program name and the remaining elements contain the file names:



In [127]:
awk 'BEGIN {
 for (i = 0; i < ARGC; i++)
 print ARGV[i] }' data/field_data.txt data/inventory_shipped.txt

awk
data/field_data.txt
data/inventory_shipped.txt


### ARGIND
The index in ARGV of the current file being processed. Every time gawk opens a new data file for processing, it sets ARGIND to the index in ARGV of the file name. When gawk is processing the input files, ‘FILENAME == ARGV[ARGIND]’ is always true.

This variable is useful in file processing; it allows you to tell how far along you are in the list of data files as well as to distinguish between successive instances of the same file name on the command line.

While you can change the value of ARGIND within your awk program, gawk automatically sets it to a new value when it opens the next file.

### ENVIRON
An associative array containing the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, ENVIRON["HOME"] might be /home/arnold.

For POSIX awk, changing this array does not affect the environment passed on to any programs that awk may spawn via redirection or the system() function.

However, beginning with version 4.2, if not in POSIX compatibility mode, gawk does update its own environment when ENVIRON is changed, thus changing the environment seen by programs that it creates. You should therefore be especially careful if you modify ENVIRON["PATH"], which is the search path for finding executable programs.

This can also affect the running gawk program, since some of the built-in functions may pay attention to certain environment variables. The most notable instance of this is mktime() (see Time Functions), which pays attention to the value of the TZ environment variable on many systems.

Some operating systems may not have environment variables. On such systems, the ENVIRON array is empty (except for ENVIRON["AWKPATH"] and ENVIRON["AWKLIBPATH"]; see AWKPATH Variable and see AWKLIBPATH Variable).
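A minimal sketch of reading ENVIRON. GREETING is a hypothetical variable set only for this one command, and the subscript is written with double quotes so that the single-quoted program stays intact:

```shell
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$greeting"
```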

### ERRNO
If a system error occurs during a redirection for getline, during a read for getline, or during a close() operation, then ERRNO contains a string describing the error.

In addition, gawk clears ERRNO before opening each command-line input file. This enables checking if the file is readable inside a BEGINFILE pattern (see BEGINFILE/ENDFILE).

Otherwise, ERRNO works similarly to the C variable errno. Except for the case just mentioned, gawk never clears it (sets it to zero or ""). Thus, you should only expect its value to be meaningful when an I/O operation returns a failure value, such as getline returning -1. You are, of course, free to clear it yourself before doing an I/O operation.

If the value of ERRNO corresponds to a system error in the C errno variable, then PROCINFO["errno"] will be set to the value of errno. For non-system errors, PROCINFO["errno"] will be zero.

### FILENAME
The name of the current input file. When no data files are listed on the command line, awk reads from the standard input and FILENAME is set to "-". FILENAME changes each time a new file is read (see Reading Files). Inside a BEGIN rule, the value of FILENAME is "", because there are no input files being processed yet. (d.c.) Note, though, that using getline (see Getline) inside a BEGIN rule can give FILENAME a value.

### FNR
The current record number in the current file. awk increments FNR each time it reads a new record (see Records). awk resets FNR to zero each time it starts a new input file.

### NF

The number of fields in the current input record. NF is set each time a new record is read, when a new field is created, or when $0 changes (see Fields).

Unlike most of the variables described in this subsection, assigning a value to NF has the potential to affect awk’s internal workings. In particular, assignments to NF can be used to create fields in or remove fields from the current record. See Changing Fields.

### FUNCTAB
An array whose indices and corresponding values are the names of all the built-in, user-defined, and extension functions in the program.

NOTE: Attempting to use the delete statement with the FUNCTAB array causes a fatal error. Any attempt to assign to an element of FUNCTAB also causes a fatal error.

### NR
The number of input records awk has processed since the beginning of the program’s execution (see Records). awk increments NR each time it reads a new record.
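The difference between NR and FNR only shows up with more than one input file. A sketch that creates two small temporary files (their names and contents are made up) and prints both counters:

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\n'    > "$tmpdir/second.txt"

# NR keeps counting across files; FNR restarts for each file.
numbers=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
echo "$numbers"

rm -r "$tmpdir"
```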

### PROCINFO
The elements of this array provide access to information about the running awk program. The elements below are a selection of those guaranteed to be available; the full list is in the gawk manual:

https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#Auto_002dset

PROCINFO["identifiers"]
A subarray, indexed by the names of all identifiers used in the text of the awk program. An identifier is simply the name of a variable (be it scalar or array), built-in function, user-defined function, or extension function. 

PROCINFO["pgrpid"]
The process group ID of the current process.

PROCINFO["pid"]
The process ID of the current process.

PROCINFO["ppid"]
The parent process ID of the current process.

PROCINFO["strftime"]
The default time format string for strftime(). Assigning a new value to this element changes the default. See Time Functions.

PROCINFO["uid"]
The value of the getuid() system call.

PROCINFO["version"]
The version of gawk.

The following additional elements in the array are available to provide information about the MPFR and GMP libraries if your version of gawk supports arbitrary-precision arithmetic (see Arbitrary Precision Arithmetic):

PROCINFO["gmp_version"]
The version of the GNU MP library.

PROCINFO["mpfr_version"]
The version of the GNU MPFR library.

PROCINFO["prec_max"]
The maximum precision supported by MPFR.

PROCINFO["prec_min"]
The minimum precision required by MPFR.

The following additional elements in the array are available to provide information about the version of the extension API, if your version of gawk supports dynamic loading of extension functions (see Dynamic Extensions):

PROCINFO["api_major"]
The major version of the extension API.

PROCINFO["api_minor"]
The minor version of the extension API.

### RLENGTH
The length of the substring matched by the match() function (see String Functions). RLENGTH is set by invoking the match() function. Its value is the length of the matched string, or -1 if no match is found.

### RSTART
The start index in characters of the substring that is matched by the match() function (see String Functions). RSTART is set by invoking the match() function. Its value is the position of the string where the matched substring starts, or zero if no match was found.
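RSTART and RLENGTH are typically passed straight to substr() to extract the matched text. A sketch on a fixed string, including the no-match case:

```shell
found=$(awk 'BEGIN {
    s = "foobar"
    if (match(s, /o+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    match(s, /z/)            # no match: RSTART becomes 0 and RLENGTH becomes -1
    print RSTART, RLENGTH
}')
echo "$found"
```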

### RT
The input text that matched the text denoted by RS, the record separator. It is set every time a record is read.

### SYMTAB
An array whose indices are the names of all defined global variables and arrays in the program. SYMTAB makes gawk’s symbol table visible to the awk programmer. It is built as gawk parses the program and is complete before the program starts to run.

In [131]:
awk 'BEGIN { foo = 5; SYMTAB["foo"] = 4; print foo }' 

4


In [134]:
# You may use an index for SYMTAB that is not a predefined identifier:

awk 'BEGIN { SYMTAB["xxx"] = 5
print SYMTAB["xxx"] }'


5


In [135]:
awk '
# Indirect multiply of any variable by amount, return result

function multiply(variable, amount)
{
 return SYMTAB[variable] *= amount
}

BEGIN {
 answer = 10.5
 multiply("answer", 4)
 print "The answer is", answer
}
'

The answer is 42


In [139]:
# changing NR

echo 'a
b
c
d' | awk 'NR == 2 { NR = 17 };
 { print NR }'

1
17
18
19
