Strings and Sorting

Finding a Substring with index

Finding a substring depends on where you have lost it. If you happen to have lost it within a bigger string, you’re in luck because the index function can help you out. Here’s how it looks:

$where = index($big, $small);

Perl locates the first occurrence of the small string within the big string, returning an integer location of the first character. The character position returned is a zero-based value—if the substring is found at the very beginning of the string, index returns 0. If it’s one character later, the return value is 1, and so on. If the substring can’t be found at all, the return value is −1 to indicate that. In this example, $where gets 6:

my $stuff = "Howdy world!";
my $where = index($stuff, "wor");

Another way you could think of the position number is the number of characters to skip over before getting to the substring. Since $where is 6, we know that we have to skip over the first six characters of $stuff before we find wor.
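A match at the very beginning returns 0, and 0 is false in Perl, so a "was it found?" test should compare against -1 instead of testing the result for truth. Here is a minimal sketch. The helper name contains is hypothetical and is not a built-in:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $small occurs anywhere in $big.
# index returns 0 for a match at the very start, and 0 is false,
# so compare against -1 rather than testing the return value for truth.
sub contains {
    my ($big, $small) = @_;
    return index($big, $small) != -1;
}

my $stuff = "Howdy world!";
print "has Howdy\n" if contains($stuff, "Howdy");   # index returns 0 here
print "no xyz\n"    unless contains($stuff, "xyz"); # index returns -1 here
```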

The index function will always report the location of the first found occurrence of the substring. But you can tell it to start searching at a later point than the start of the string by using the optional third parameter, which tells index to start at that position:

my $stuff = "Howdy world!";
my $where1 = index($stuff, "w"); # $where1 gets 2
my $where2 = index($stuff, "w", $where1 + 1);  # $where2 gets 6
my $where3 = index($stuff, "w", $where2 + 1); # $where3 gets -1 (not found)
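Repeating that "start one past the last match" step in a loop collects every occurrence. The sketch below is a hypothetical helper, all_positions, and not part of Perl itself. Because it restarts at the previous position plus one, overlapping matches are reported. It also refuses an empty search string, because index would keep finding the empty string at the end of $big forever:

```perl
use strict;
use warnings;

# Hypothetical helper: every position where $small occurs in $big
sub all_positions {
    my ($big, $small) = @_;
    return () if $small eq '';             # avoid an endless loop
    my @found;
    my $pos = index($big, $small);
    while ($pos != -1) {
        push @found, $pos;
        $pos = index($big, $small, $pos + 1);  # search just past this match
    }
    return @found;
}

my @where = all_positions("Howdy world!", "w");
print "@where\n";   # prints "2 6"
```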

Once in a while, you might prefer to have the last found occurrence of the substring. You can get that with the rindex function. In this example, we can find the last slash, which turns out to be at position 4 in a string:

my $last_slash = rindex("/etc/passwd", "/"); # value is 4

The rindex function also has an optional third parameter, but in this case it effectively gives the maximum permitted return value:

my $fred = "Yabba dabba doo!";
my $where1 = rindex($fred, "abba"); # $where1 gets 7
my $where2 = rindex($fred, "abba", $where1 - 1); # $where2 gets 1 
my $where3 = rindex($fred, "abba", $where2 - 1); # $where3 gets -1
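The same capping trick can walk backward through every match, last one first. This is a small sketch, not a library routine. It stops explicitly after a match at position 0, since nothing can start before that and there is no lower limit left to pass:

```perl
use strict;
use warnings;

my $fred = "Yabba dabba doo!";
my @from_end;
my $pos = rindex($fred, "abba");
while ($pos != -1) {
    push @from_end, $pos;
    last if $pos == 0;                       # nothing can start before 0
    $pos = rindex($fred, "abba", $pos - 1);  # next match must start earlier
}
print "@from_end\n";   # prints "7 1"
```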

Manipulating a Substring with substr

The substr operator works with only a part of a larger string. It looks like this:

$part = substr($string, $initial_position, $length);

It takes three arguments: a string value, a zero-based initial position (like the return value of index), and a length for the substring. The return value is the substring:

my $mineral = substr("Fred J. Flintstone", 8, 5); # gets "Flint" 
my $rock = substr "Fred J. Flintstone", 13, 1000; # gets "stone"

As you may have noticed in the previous example, if the requested length (1000 characters, in this case) would go past the end of the string, Perl doesn't complain; you simply get a shorter string than you asked for. If you want to be sure to go to the end of the string, however long or short it may be, just omit the third parameter (the length), like this:

my $pebble = substr "Fred J. Flintstone", 13; # gets "stone"

The initial position of the substring in the larger string can also be negative, counting from the end of the string (that is, position −1 is the last character). In this example, position −3 is the third character from the end, the letter i:

my $out = substr("some very long string", -3, 2); # $out gets "in"
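Negative positions combine naturally with the rules above for omitted or oversized lengths. The variable names below are only illustrative:

```perl
use strict;
use warnings;

my $s = "some very long string";
my $last_char = substr($s, -1);      # "g": -1 is the last character
my $last_word = substr($s, -6);      # "string": no length, so run to the end
my $clamped   = substr($s, -3, 50);  # "ing": a length past the end is cut short
print "$last_char|$last_word|$clamped\n";
```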

Now here’s something really cool: the selected portion of the string can be changed if the string is a variable:

my $string = "Hello, world!";
substr($string, 0, 5) = "Goodbye"; # $string is now "Goodbye, world!"

As you see, the assigned (sub)string doesn’t have to be the same length as the substring it’s replacing. The string’s length is adjusted to fit. Or if that wasn’t cool enough to impress you, you could use the binding operator (=~) to restrict an operation to work with just part of a string. This example replaces fred with barney wherever possible within just the last 20 characters of a string:

substr($string, -20) =~ s/fred/barney/g;
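To see that the substitution really is confined to the tail, try it on a string where one fred falls outside the last 20 characters. The sample sentence is made up for illustration:

```perl
use strict;
use warnings;

my $string = "fred met fred, then fred left";  # 29 characters
# The last 20 characters are "fred, then fred left", so the first
# "fred" (at position 0) is outside the region and is left alone
substr($string, -20) =~ s/fred/barney/g;
print "$string\n";   # prints "fred met barney, then barney left"
```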

Besides assigning to the substr function (which looks a little weird at first glance, perhaps), you can also use substr in a slightly more traditional manner with the four-argument version, in which the fourth argument is the replacement substring:

my $string = "Hello, world!";
my $previous_value = substr($string, 0, 5, "Goodbye"); # $previous_value gets "Hello"
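Because the length and the replacement are independent, the four-argument form can also insert (length 0) or delete (empty replacement). A short sketch, with sample text chosen only for illustration:

```perl
use strict;
use warnings;

my $string = "Hello, world!";
my $previous_value = substr($string, 0, 5, "Goodbye");  # "Hello"
substr($string, 7, 0, " cruel");          # length 0 inserts: "Goodbye cruel, world!"
my $removed = substr($string, 7, 6, "");  # "" deletes, returning " cruel"
print "$previous_value|$removed|$string\n";  # prints "Hello| cruel|Goodbye, world!"
```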

Formatting Data with sprintf

The sprintf function takes the same arguments as printf (except for the optional file handle, of course), but it returns the requested string instead of printing it. This is handy if you want to store a formatted string into a variable for later use, or if you want more control over the result than printf alone would provide:

my $date_tag = sprintf "%4d/%02d/%02d %2d:%02d:%02d", $yr, $mo, $da, $h, $m, $s;
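With concrete values plugged in, the effect of each conversion is easy to see: %02d pads with zeros, while %2d pads with a space. The date and time below are made-up sample values:

```perl
use strict;
use warnings;

# Sample values, invented for illustration
my ($yr, $mo, $da, $h, $m, $s) = (2024, 3, 7, 9, 5, 0);
my $date_tag = sprintf "%4d/%02d/%02d %2d:%02d:%02d", $yr, $mo, $da, $h, $m, $s;
print "$date_tag\n";   # prints "2024/03/07  9:05:00" (the hour is space-padded)
```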

Using sprintf with “Money Numbers”

One popular use for sprintf is when you want to format a number with a certain number of places after the decimal point, such as when you want to show an amount of money as 2.50 and not 2.5—and certainly not as 2.49997! That’s easy to accomplish with the "%.2f" format:

my $money = sprintf "%.2f", 2.49997;
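A few more inputs show that "%.2f" both rounds and pads. The helper name money is hypothetical. One caveat worth knowing: binary floating point cannot store a value like 2.675 exactly (the stored value is slightly less), so it rounds down. For real accounting, keeping amounts as whole cents is a common alternative.

```perl
use strict;
use warnings;

# Hypothetical helper: format a number with exactly two decimal places
sub money { return sprintf "%.2f", shift }

print money($_), "\n" for 2.49997, 2.5, 3;   # 2.50, 2.50, 3.00

# 2.675 is stored as slightly less than 2.675, so it becomes "2.67"
print money(2.675), "\n";
```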

If you have a “money number” that may be large enough to need commas to show its size, you might find it handy to use a subroutine like this one:

sub big_money {
    my $number = sprintf "%.2f", shift @_;
    # Add one comma each time through the do-nothing loop
    1 while $number =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    # Put the dollar sign in the right place
    $number =~ s/^(-?)/$1\$/;
    $number;
}
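Here is the same subroutine with a few sample calls, so you can see where the commas and the dollar sign end up. Each pass of the loop inserts the comma farthest to the left that still has three digits after it. The sample amounts are arbitrary:

```perl
use strict;
use warnings;

sub big_money {
    my $number = sprintf "%.2f", shift @_;
    1 while $number =~ s/^(-?\d+)(\d\d\d)/$1,$2/;  # one comma per pass
    $number =~ s/^(-?)/$1\$/;                      # "$" goes after any minus sign
    $number;
}

print big_money($_), "\n" for 1234567.891, -9876.5, 42;
# prints $1,234,567.89 then -$9,876.50 then $42.00
```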

Advanced Sorting

By default, sort puts strings in ASCIIbetical order. For any other order, you tell Perl what you want by writing a sort-definition subroutine, or sort subroutine for short. Each time Perl needs to compare two elements, it puts them in $a and $b and calls the subroutine, which returns a coded value describing how they compare (much like the comparison function given to C’s qsort, though Perl uses its own internal sort implementation). If $a should appear before $b in the final list, the sort subroutine returns −1 to say so. If $b should appear before $a, it returns 1.

If the order of $a and $b doesn’t matter, the subroutine returns 0. Why would it not matter? Perhaps you’re doing a case-insensitive sort and the two strings are fred and Fred. Or perhaps you’re doing a numeric sort and the two numbers are equal.

You could write a numeric sort subroutine like this:

sub by_number {
    # a sort subroutine, expect $a and $b
    if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
}

To use the sort subroutine, just put its name (without an ampersand) between the keyword sort and the list you’re sorting. This example puts a numerically sorted list of numbers into @result:

my @result = sort by_number @some_numbers;

Notice that we don’t have to do anything in the sort subroutine to declare $a and $b, and to set their values—and if we did, the subroutine wouldn’t work right. We just let Perl set up $a and $b for us, and so all we need to write is the comparison.

In fact, we can make it even simpler (and more efficient). Since this kind of three-way comparison is frequent, Perl has a convenient shortcut to use to write it. In this case, we use the spaceship operator (<=>).* This operator compares two numbers and returns −1, 0, or 1 as needed to sort them numerically. So we could have written that sort subroutine better, like this:

sub by_number { $a <=> $b }
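Comparing the default sort with by_number on the same list shows why a numeric sort subroutine is needed. The sample numbers are arbitrary:

```perl
use strict;
use warnings;

sub by_number { $a <=> $b }

my @some_numbers = (100, 9, 25, 3);
my @default = sort @some_numbers;            # compares as strings
my @result  = sort by_number @some_numbers;  # compares as numbers
print "@default\n";   # prints "100 25 3 9"
print "@result\n";    # prints "3 9 25 100"
```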

* We call it that because it looks like one of the TIE fighters from Star Wars.

Since the spaceship compares numbers, you may have guessed that there’s a corresponding three-way string-comparison operator: cmp. cmp by itself provides the same order as the default sort. You’d never need to write this subroutine, which yields merely the default sort order:

sub ASCIIbetically { $a cmp $b }
my @strings = sort ASCIIbetically @any_strings;

But you can use cmp to build a more complex sort order, like a case-insensitive sort:

sub case_insensitive { "\L$a" cmp "\L$b" }
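With mixed-case names the difference is visible: in ASCIIbetical order every uppercase letter comes before every lowercase one, while case_insensitive compares lowercased copies. The names are just sample data:

```perl
use strict;
use warnings;

sub case_insensitive { "\L$a" cmp "\L$b" }

my @names   = qw(fred Barney wilma Dino betty);
my @ascii   = sort @names;                   # uppercase first
my @no_case = sort case_insensitive @names;  # case ignored
print "@ascii\n";     # prints "Barney Dino betty fred wilma"
print "@no_case\n";   # prints "Barney betty Dino fred wilma"
```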

When your sort subroutine is as simple as the ones we show here (and most of the time, it is), you can make the code simpler still by replacing the name of the sort routine with the entire sort routine “inline,” like so:

my @numbers = sort { $a <=> $b } @some_numbers;

In modern Perl code, you’ll rarely see a separate sort subroutine; sort routines are usually written inline, as we’ve done here.

Suppose you want to sort in descending numeric order. That’s easy enough to do with the help of reverse:

my @descending = reverse sort { $a <=> $b } @some_numbers;

But there is another way to get a reversed numeric sort: swap $a and $b in the comparison.

my @descending = sort { $b <=> $a } @some_numbers;
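Both approaches give the same order, as this quick check with arbitrary sample numbers shows. Swapping $a and $b does it in a single pass, without the extra reverse step:

```perl
use strict;
use warnings;

my @some_numbers = (17, 3, 250, 42);
my @reversed = reverse sort { $a <=> $b } @some_numbers;
my @swapped  = sort { $b <=> $a } @some_numbers;
print "@reversed\n";   # prints "250 42 17 3"
print "@swapped\n";    # prints "250 42 17 3"
```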

Sorting a Hash by Value

Once you’ve been sorting lists happily for a while, you’ll run into a situation where you want to sort a hash by value. For example, three of our characters went out bowling last night, and we’ve got their bowling scores in the following hash. We want to be able to print out the list in the proper order, with the game winner at the top, so we want to sort the hash by score:

my %score = ("barney" => 195, "fred" => 205, "dino" => 30);
my @winners = sort by_score keys %score;

and the subroutine looks like this:

sub by_score { $score{$b} <=> $score{$a} }
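Put together, and with the comparison written inline, the whole thing fits in a few lines. Note that sort still receives the keys; the comparison merely looks up their values. The printf layout is just one possible choice:

```perl
use strict;
use warnings;

my %score = ("barney" => 195, "fred" => 205, "dino" => 30);
# Sort the names, but compare the scores they look up (highest first)
my @winners = sort { $score{$b} <=> $score{$a} } keys %score;
for my $name (@winners) {
    printf "%-6s %3d\n", $name, $score{$name};
}
```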

Sorting by Multiple Keys

We forgot to mention that there was a fourth player bowling last night with the other three, so the hash really looked like this:

my %score = (
    "barney" => 195, "fred" => 205,
    "dino" => 30, "bamm-bamm" => 195,
);

Now, as you can see, bamm-bamm has the same score as barney. So which one will be first in the sorted list of players? There’s no telling because the comparison operator (seeing the same score on both sides) will have to return zero when checking those two.

Maybe that doesn’t matter, but we generally prefer a well-defined sort. If several players have the same score, we want them together in the list, of course, but within that group the names should be in ASCIIbetical order. How can we write the sort subroutine to say that? Again, this turns out to be pretty easy:

my @winners = sort by_score_and_name keys %score;

sub by_score_and_name {
    $score{$b} <=> $score{$a} # by descending numeric score
    or
    $a cmp $b # ASCIIbetically by name
}
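Here is the same two-key comparison written inline and run on the four-player hash. Because <=> returns 0 for a tie, and 0 is false, or falls through to the name comparison only for barney and bamm-bamm:

```perl
use strict;
use warnings;

my %score = (
    "barney" => 195, "fred"      => 205,
    "dino"   => 30,  "bamm-bamm" => 195,
);

my @winners = sort {
    $score{$b} <=> $score{$a}   # primary key: descending score
        or
    $a cmp $b                   # tie-breaker: name, ASCIIbetically
} keys %score;

print "@winners\n";   # prints "fred bamm-bamm barney dino"
```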
