File Test Operators
Before we start a program that creates a new file, let’s make sure that the file doesn’t already exist so that we don’t accidentally overwrite a vital spreadsheet datafile or that important birthday calendar. For this, we use the -e file test, testing a filename for existence:
my $filename = "test1.txt";
die "Oops! A file called '$filename' already exists.\n" if -e $filename;
Notice that we don’t include $! in this die message, since we’re not reporting that the system refused a request in this case.
Here’s an example of checking whether a file is being kept up-to-date. In this case, we’re testing an already opened filehandle, instead of a string filename. Let’s say that our program’s configuration file should be updated every week or two. (Maybe it’s checking for computer viruses, say.) If the file hasn’t been modified in the past 28 days, then something is wrong:
warn "Config file is looking pretty old!\n"
if -M CONFIG > 28;
The third example is more complex. Here, let’s say that disk space is filling up and rather than buy more disks, we’ve decided to move any large, useless files to the backup tapes. So let’s go through our list of files to see which of them are larger than 100 KB. But even if a file is large, we shouldn’t move it to the backup tapes unless it hasn’t been accessed in the last 90 days (so we know that it’s not used too often):
my @original_files = qw / fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files;
foreach my $filename (@original_files) {
push @big_old_files, $filename
if -s $filename > 100_000 and -A $filename > 90;
}
Here is the complete table of file tests and their meanings.
File test | Meaning |
---|---|
-r | File or directory is readable by this (effective) user or group |
-w | File or directory is writable by this (effective) user or group |
-x | File or directory is executable by this (effective) user or group |
-o | File or directory is owned by this (effective) user |
-R | File or directory is readable by this real user or group |
-W | File or directory is writable by this real user or group |
-X | File or directory is executable by this real user or group |
-O | File or directory is owned by this real user |
-e | File or directory name exists |
-z | File exists and has zero size (always false for directories) |
-s | File or directory exists and has nonzero size (the value is the size in bytes) |
-f | Entry is a plain file |
-d | Entry is a directory |
-l | Entry is a symbolic link |
-S | Entry is a socket |
-p | Entry is a named pipe (a "fifo") |
-b | Entry is a block-special file (like a mountable disk) |
-c | Entry is a character-special file (like an I/O device) |
-u | File or directory is setuid |
-g | File or directory is setgid |
-k | File or directory has the sticky bit set |
-t | The filehandle is a TTY (as reported by the isatty() system function; filenames can't be tested by this test) |
-T | File looks like a "text" file |
-B | File looks like a "binary" file |
-M | Modification age (measured in days) |
-A | Access age (measured in days) |
-C | Inode-modification age (measured in days) |
The tests -r, -w, -x, and -o tell whether the given attribute is true for the effective user or group ID, which essentially refers to the person who is “in charge of” running the program. These tests look at the “permission bits” on the file to see what is permitted. If your system uses Access Control Lists (ACLs), the tests will use those as well. These tests generally tell whether the system would try to permit something, but that doesn’t mean that it really would be possible. For example, -w may be true for a file on a CD-ROM, even though you can’t write to it, or -x may be true on an empty file, which can’t truly be executed.
The -s test does return true if the file is nonempty, but it’s a special kind of true. It’s the length of the file, measured in bytes, which evaluates as true for a nonzero number.
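As a minimal sketch of using -s as both a test and a value, the following writes a scratch file with File::Temp (the 11-byte payload and the variable names are made up for illustration) and then reads its size back:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file with a known 11-byte payload (illustrative data).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello world";
close $fh or die "Cannot close $path: $!";

# -s yields the size in bytes, so one call answers both
# "is it nonempty?" and "how big is it?".
my $size = -s $path;
print "$path holds $size bytes\n" if $size;
```

Storing the result in a variable avoids asking the filesystem twice when the size is needed after the test.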
On a Unix filesystem, there are just seven types of items, represented by the seven file tests: -f, -d, -l, -S, -p, -b, and -c. Any item should be one of those. But if you have a symbolic link pointing to a file, that will report true for both -f and -l. So, if you want to know whether something is a symbolic link, you should generally test that first.
The age tests, -M, -A, and -C (yes, they’re uppercase), return the number of days since the file was last modified, accessed, or had its inode changed. (The inode contains all of the information about the file except for its contents; see the stat system call manpage or a good book on Unix internals for details.) This age value is a full floating-point number, so you might get a value of 2.00001 if a file were modified two days and one second ago. (These “days” aren’t necessarily the same as a human would count; for example, if it’s 1:30 in the morning when you check a file modified at about an hour before midnight, the value of -M for this file would be around 0.1, even though it was modified “yesterday.”)
When checking the age of a file, you might even get a negative value like −1.2, which means that the file’s last-access timestamp is set at about 30 hours in the future! The zero point on this timescale is the moment your program started running, so that value might mean that a long-running program was looking at a file that had just been accessed. Or a timestamp could be set (accidentally or intentionally) to a time in the future.
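The zero point can be seen directly in a small sketch. It assumes that Perl’s $^T variable (the program start time, in epoch seconds) is the reference the age tests use, and it sets a scratch file’s timestamps with utime; the variable names are illustrative only:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh or die "Cannot close $path: $!";

# Pretend the file was last touched exactly two days before startup.
my $two_days_ago = $^T - 2 * 24 * 60 * 60;
utime $two_days_ago, $two_days_ago, $path
    or die "Cannot set times on $path: $!";
my $past_age = -M $path;

# Now push the timestamps one hour past program start.
my $one_hour_ahead = $^T + 60 * 60;
utime $one_hour_ahead, $one_hour_ahead, $path
    or die "Cannot set times on $path: $!";
my $future_age = -M $path;    # negative: "modified in the future"

printf "past: %.3f days, future: %.3f days\n", $past_age, $future_age;
```

A long-running program that wants ages relative to “now” rather than to its start would compare time() against the mtime from stat instead.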
The tests -T and -B take a try at telling whether a file is text or binary. But people who know a lot about filesystems know that there’s no bit (at least in Unix-like operating systems) to indicate that a file is a binary or text file, so how can Perl tell? The answer is that Perl cheats: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. As you might guess, it sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you just need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.
You’d think that -T and -B would always disagree, since a text file isn’t a binary and vice versa, but there are two special cases where they’re in complete agreement. If the file doesn’t exist, or can’t be read, both are false, since it’s neither a text file nor a binary. Alternatively, if the file is empty, it’s an empty text file and an empty binary file at the same time, so they’re both true.
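The agreement on empty files is easy to check with a short sketch. The text_and_binary helper and the sample contents are invented for illustration; the helper writes a scratch file and reports the two guesses as 1/0 flags:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to a scratch file and report
# what -T and -B say about it, as a pair of 1/0 flags.
sub text_and_binary {
    my ($content) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return ((-T $path) ? 1 : 0, (-B $path) ? 1 : 0);
}

my @empty  = text_and_binary('');
my @text   = text_and_binary("just some plain words\n" x 20);
my @binary = text_and_binary("\0\1\2\3" x 200);

print "empty:  -T=$empty[0] -B=$empty[1]\n";
print "text:   -T=$text[0] -B=$text[1]\n";
print "binary: -T=$binary[0] -B=$binary[1]\n";
```

The sample data here is deliberately unambiguous; real files that mix text with high-bit bytes can land on either side of the guess.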
The -t file test returns true if the given filehandle is a TTY; in short, if it’s interactive because it’s not a simple file or pipe. When -t STDIN returns true, it generally means that you can interactively ask the user questions. If it’s false, your program is probably getting input from a file or pipe, rather than a keyboard.
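A program might use -t to decide whether prompting makes sense. The sketch below wraps the test in a hypothetical input_source helper and tries it on STDIN and on a scratch file opened with File::Temp; what it reports for STDIN depends on how the program is run:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: describe where a filehandle is connected.
sub input_source {
    my ($fh) = @_;
    return (-t $fh) ? 'terminal' : 'file or pipe';
}

# The answer for STDIN varies: a terminal when run interactively,
# a file or pipe when input is redirected.
print "STDIN is connected to a ", input_source(\*STDIN), "\n";

# A handle on an ordinary file is never a TTY.
my ($scratch_fh, $scratch_path) = tempfile(UNLINK => 1);
print "The scratch file handle is a ", input_source($scratch_fh), "\n";
```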
If you omit the filename or filehandle parameter to a file test (that is, if you have just -r or just -s, say), the default operand is the file named in $_. So, to test a list of filenames to see which ones are readable, you simply type:
foreach (@lots_of_filenames) {
print "$_ is readable\n" if -r; # same as -r $_
}
But if you omit the parameter, be careful that whatever follows the file test doesn’t look like it could be a parameter. For example, if you wanted to find out the size of a file in KB rather than in bytes, you might be tempted to divide the result of -s by 1000 (or 1024), like this:
# The filename is in $_
my $size_in_K = -s / 1000; # Oops!
When the Perl parser sees the slash, it doesn’t think about division; since it’s looking for the optional operand for -s, it sees what looks like the start of a regular expression in forward slashes. One simple way to prevent this kind of confusion is to put parentheses around the file test:
my $size_in_k = (-s) / 1024; # Uses $_ by default
Testing Several Attributes of the Same File
We can use more than one file test on the same file to create a complex logical condition. Suppose we want to operate on only files that are both readable and writable. We check each attribute and combine them with and:
if( -r $file and -w $file ) {
#...
}
This is an expensive operation, though. Each time you perform a file test, Perl asks the filesystem for all of the information about the file (Perl’s actually doing a stat each time, which we talk about in the next section). Although we already got that information when we tested -r, Perl asks for the same information again so it can test -w. What a waste! This can be a significant performance problem if you’re testing many attributes on many files.
Perl has a special shortcut to help us not do so much work. The virtual file handle _ (just the underscore) uses the information from the last file lookup that a file test operator performed. Perl only has to look up the file information once now:
if( -r $file and -w _ ) {
#...
}
We don’t have to use the file tests next to each other to use _. Here we have them in separate if conditions:
if ( -r $file) {
print "The file is readable!\n"
}
if ( -w _ ) {
print "The file is writable!\n"
}
We have to watch out that we know what the last file lookup really was, though. If we do something else between the file tests, such as call a subroutine, the last file we looked up might be different. For instance, this example calls the lookup subroutine, which has a file test in it. When we return from that subroutine and do another file test, the _ filehandle isn’t for $file like we expect, but for $other_file:
if( -r $file ) {
print "The file is readable!\n";
}
lookup( $other_file );
if( -w _ ) {
print "The file is writable!\n";
}
sub lookup {
return -w $_[0];
}
Stacked File Test Operators
Before Perl 5.10, if we wanted to test several file attributes at the same time, we had to test them individually, even when using the _ filehandle to save ourselves some work. Perl 5.10 lets us “stack” our file test operators by lining them all up before the filename:
use 5.010;
if( -w -r $file ) {
print "The file is both readable and writable!\n";
}
This stacked example is the same as the previous example with just a change in syntax. Although it looks like the file tests are reversed, Perl does the file test nearest the filename first. Normally this isn’t going to matter.
Stacked file tests are especially handy for complex situations. Suppose we want to list all of the directories that are readable, writable, executable, and owned by our user. We just need the right set of file tests:
use 5.010;
if( -r -w -x -o -d $file ) {
print "My directory is readable, writable, and executable!\n";
}
Stacked file tests don’t work well with tests that return a value other than true or false when we want to use that value in a comparison. We might think that this next bit of code first tests that it’s a directory and then tests that it is less than 512 bytes, but it doesn’t:
use 5.010;
if( -s -d $file < 512) { # WRONG! DON'T DO THIS
print "The directory is less than 512 bytes!\n";
}
Rewriting the stacked file tests in the previous notation shows us what is going on. The result of the combination of the file tests becomes the argument for the comparison:
if( ( -d $file and -s _ ) < 512 ) {
print "The directory is less than 512 bytes!\n";
}
When the -d returns false, Perl compares that false value to 512. That turns out to be true since false will be 0, which just happens to be less than 512. Instead of worrying about that sort of confusion, we just write it as separate file tests to be nice to the maintenance programmers who come after us:
if( -d $file and -s _ < 512 ) {
print "The directory is less than 512 bytes!\n";
}
The stat and lstat Functions
While these file tests are fine for testing various attributes regarding a particular file or filehandle, they don’t tell the whole story. For example, there’s no file test that returns the number of links to a file or the owner’s user ID (uid). To get at the remaining information about a file, merely call the stat function, which returns pretty much everything that the stat Unix system call returns (hopefully more than you want to know). The operand to stat is a filehandle (including the _ virtual filehandle), or an expression that evaluates to a filename. The return value is either the empty list, indicating that the stat failed (usually because the file doesn’t exist), or a 13-element list of numbers, most easily described using the following list of scalar variables:
my($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($filename);
The names here refer to the parts of the stat structure, described in detail in the stat(2) manpage. Here is a quick summary of the important ones:
- $dev and $ino
The device number and inode number of the file. Together they make up a “license plate” for the file. Even if it has more than one name (hard link), the combination of device and inode numbers should always be unique.
- $mode
The set of permission bits for the file, and some other bits. If you’ve ever used the Unix command ls -l to get a detailed (long) file listing, you’ll see that each line of output starts with something like -rwxr-xr-x. The nine letters and hyphens of file permissions correspond to the nine least-significant bits of $mode, which would, in this case, give the octal number 0755.
- $nlink
The number of (hard) links to the file or directory. This is the number of true names that the item has. This number is always 2 or more for directories and (usually) 1 for files.
- $uid and $gid
The numeric user ID and group ID showing the file’s ownership.
- $size
The size in bytes, as returned by the -s file test.
- $atime, $mtime, and $ctime
The three timestamps, but here they’re represented in the system’s timestamp format: an integer (historically 32 bits wide) telling how many seconds have passed since the Epoch, an arbitrary starting point for measuring system time. On Unix systems and some others, the Epoch is the beginning of 1970 at midnight Universal Time, but the Epoch is different on some machines.
Invoking stat on the name of a symbolic link returns information on what the symbolic link points at, not information about the symbolic link itself (unless the link just happens to be pointing at nothing currently accessible). If you need the (mostly useless) information about the symbolic link itself, use lstat rather than stat (which returns the same information in the same order). If the operand isn’t a symbolic link, lstat returns the same things that stat would.
Like the file tests, the operand of stat or lstat defaults to $_, meaning that the underlying stat system call will be performed on the file named by the scalar variable $_.
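To make the fields concrete, here is a minimal sketch that stats a scratch file and pulls out a few of the values described above. It assumes a Unix-like system where chmod and permission bits behave as described; the file contents and variable names are illustrative:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a 10-byte scratch file with known permissions.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "some data\n";
close $fh or die "Cannot close $path: $!";
chmod 0644, $path or die "Cannot chmod $path: $!";

# stat returns the empty list on failure, so the assignment is false.
my @info = stat $path or die "Cannot stat $path: $!";

# Pick the fields by position: mode, nlink, uid, size, mtime.
my ($mode, $nlink, $uid, $size, $mtime) = @info[2, 3, 4, 7, 9];

# Mask off everything but the nine permission bits.
my $permissions = $mode & 0777;

printf "mode %04o, %d link(s), uid %d, %d bytes\n",
    $permissions, $nlink, $uid, $size;
```

The core File::stat module offers the same data through named methods, which some find easier to read than a 13-element list.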
The localtime Function
When you have a timestamp number (such as the ones from stat), it will typically look something like 1180630098. That’s not very useful for most humans, unless you need to compare two timestamps by subtracting. You may need to convert it to something human-readable, such as a string like Thu May 31 09:48:18 2007. Perl can do that with the localtime function in a scalar context:
my $timestamp = 1180630098;
my $date = localtime $timestamp;
In a list context, localtime returns a list of numbers, several of which may not be quite what you’d expect:
my($sec, $min, $hour, $day, $mon, $year, $wday, $yday, $isdst) = localtime $timestamp;
The $mon is a month number, ranging from 0 to 11, which is handy as an index into an array of month names. The $year is the number of years since 1900, oddly enough, so add 1900 to get the real year number. The $wday ranges from 0 (for Sunday) through 6 (for Saturday), and the $yday is the day-of-the-year (ranging from 0 for January 1, through 364 or 365 for December 31).
There are two related functions that you’ll also find useful. The gmtime function is just the same as localtime, except that it returns the time in Universal Time (what we once called Greenwich Mean Time). If you need the current timestamp number from the system clock, just use the time function. Both localtime and gmtime default to using the current time value if you don’t supply a parameter:
my $now = gmtime; # Get the current universal timestamp as a string
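The month and year adjustments are easy to get wrong, so here is a small sketch that builds a date string from the list-context values. The format_timestamp helper is hypothetical, and it uses gmtime rather than localtime only so that the result does not depend on the machine’s time zone:

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch timestamp as
# YYYY-MM-DD HH:MM:SS in Universal Time.
sub format_timestamp {
    my ($timestamp) = @_;
    my ($sec, $min, $hour, $day, $mon, $year) = gmtime $timestamp;

    # $mon counts from 0 and $year counts from 1900, so adjust both.
    return sprintf '%04d-%02d-%02d %02d:%02d:%02d',
        $year + 1900, $mon + 1, $day, $hour, $min, $sec;
}

print format_timestamp(1180630098), "\n";   # 2007-05-31 16:48:18
```

For anything beyond a simple fixed format, the strftime function from the core POSIX module accepts the same list of values.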
Bitwise Operators
Using Bitstrings
All of the bitwise operators can work with bitstrings, as well as with integers. If the operands are integers, the result will be an integer. If both operands of a bitwise operator are strings, Perl will perform the operation on those bitstrings. That is, "\xAA" | "\x55" will give the string "\xFF". Note that these values are single-byte strings; the result is a byte with all eight bits set. Bitstrings may be arbitrarily long.
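The difference between the two modes shows up in a short sketch. It assumes the classic operator behavior, that is, without the newer bitwise feature enabled (which changes how | treats strings); the variable names are illustrative:

```perl
use strict;
use warnings;

# Two single-byte strings: the OR is done bit by bit on the bytes.
my $string_or = "\xAA" | "\x55";        # the one-byte string "\xFF"

# Two integers: the OR is ordinary integer arithmetic.
my $number_or = 0xAA | 0x55;            # the number 255

# Longer bitstrings work byte by byte; setting bit 0x20 in each
# ASCII uppercase letter turns it into lowercase.
my $lowered = "AB" | "  ";              # "ab"

printf "string result: %s (length %d)\n",
    unpack('H*', $string_or), length $string_or;
print "number result: $number_or\n";
print "lowered: $lowered\n";
```

If one operand is a number and the other a string, Perl treats the operation as numeric, so it pays to be sure that both values really are strings.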