Advanced sort (sorting by columns)

I was familiar of the general sort command and its basic options like sort -r for reverse sort and sort -n for numerical sort, but I did not know about the powerful -k option that allows sort to be used for sorting by multiple columns

The online documentation is poorly written for this option. If you want to use this command to sort by multiple Let’s use the example data below:

cat t.txt

comprehensive:test:25:abc
first:line:5:abc
random:stuff:33:def
good:stuff:5:efg
good:boy:25:def
random:test:111:efg

Here we sort on column 2, and it seems to work correctly:

# sort -t: -k2  <t.txt
good:boy:25:def
first:line:5:abc
random:stuff:33:def
good:stuff:5:efg
random:test:111:efg
comprehensive:test:25:abc

the lines with the same keys are just in a random order (we is not true, continue reading)

To make a specific order we can try to add secondary ordering on column 1 like that:

# sort -t: -k2 -k1 <t.txt
good:boy:25:def
first:line:5:abc
random:stuff:33:def
good:stuff:5:efg
random:test:111:efg
comprehensive:test:25:abc

It appears that the secondary sort order did not work. Why? In fact the syntax -k2 means to select from the 2nd column to the end of line. Sorting takes the rest of the data on the line, so test:111... comes before test:25..., and the secondary sort order not used. The correct syntax to search only on the specific colum is to specify the same column as the ending column like in the example below:

# sort -t: -k2,2 -k1,1 <t.txt
good:boy:25:def
first:line:5:abc
good:stuff:5:efg
random:stuff:33:def
comprehensive:test:25:abc
random:test:111:efg

That makes the multi-column search behave in the expected way.

Below are a few more examples

Sorting in reverse order on one column, and natural order of the other

sort -t: -k2,2r -k1,1 <t.txt

comprehensive:test:25:abc
random:test:111:efg
good:stuff:5:efg
random:stuff:33:def
first:line:5:abc
good:boy:25:def

Using numerical order

# sort -t: -k3,3rn -k1,1 <t.txt
random:test:111:efg
random:stuff:33:def
comprehensive:test:25:abc
good:boy:25:def
first:line:5:abc
good:stuff:5:efg

Sorting by the 2nd and 3rd character of the column 1

# sort -t: -k1.2,1.3 <t.txt
random:stuff:33:def
random:test:111:efg
first:line:5:abc
comprehensive:test:25:abc
good:boy:25:def
good:stuff:5:efg

A useful option is --debug, which allows you to see exactly what is selected for sorting:

# sort -t: -k1.2,1.3 -k2,2 --debug <t.txt
sort: using �en_US.UTF-8� sorting rules
random:stuff:33:def
 __
       _____
___________________
random:test:111:efg
 __
       ____
___________________
first:line:5:abc
 __
      ____
________________
comprehensive:test:25:abc
 __
              ____
_________________________
good:boy:25:def
 __
     ___
_______________
good:stuff:5:efg
 __
     _____
________________

Here you see the underlines showing the used characters for each criteria. Please note the long line of underlines which suggest that the whole line is still used as the very last criteria to sort the line. You can remove that with -s option like that:

# sort -s -t: -k1.2,1.3 -k2,2 --debug <t.txt
sort: using �en_US.UTF-8� sorting rules
random:stuff:33:def
 __
       _____
random:test:111:efg
 __
       ____
first:line:5:abc
 __
      ____
comprehensive:test:25:abc
 __
              ____
good:boy:25:def
 __
     ___
good:stuff:5:efg
 __
     _____

In some case if your columns contain spaces for alignment, another useful option is -b. For example, let’s say you have the following CSV data:

# cat grades.csv
"Last name", "First name", "SSN",        "Test1", "Test2", "Test3", "Test4", "Final", "Grade"
"Alfalfa",   "Aloysius",   "123-45-6789", 40.0,    90.0,   100.0,    83.0,    49.0,   "D-"
"Alfred",    "University", "123-12-1234", 41.0,    97.0,    96.0,    97.0,    48.0,   "D+"
"Gerty",     "Gramma",     "567-89-0123", 41.0,    80.0,    60.0,    40.0,    44.0,   "C"
"Android",   "Electric",   "087-65-4321", 42.0,    23.0,    36.0,    45.0,    47.0,   "B-"
"Bumpkin",   "Fred",       "456-78-9012", 43.0,    78.0,    88.0,    77.0,    45.0,   "A-"
"Rubble",    "Betty",      "234-56-7890", 44.0,    90.0,    80.0,    90.0,    46.0,   "C-"
"Noshow",    "Cecil",      "345-67-8901", 45.0,    11.0,    -1.0,     4.0,    43.0,   "F"
"Buff",      "Bif",        "632-79-9939", 46.0,    20.0,    30.0,    40.0,    50.0,   "B+"
"Airpump",   "Andrew",     "223-45-6789", 49.0      1.0,    90.0,   100.0,    83.0,   "A"
"Backus",    "Jim",        "143-12-1234", 48.0,     1.0,    97.0,    96.0,    97.0,   "A+"
"Carnivore", "Art",        "565-89-0123", 44.0,     1.0,    80.0,    60.0,    40.0,   "D+"
"Dandy",     "Jim",        "087-75-4321", 47.0,     1.0,    23.0,    36.0,    45.0,   "C+"
"Elephant",  "Ima",        "456-71-9012", 45.0,     1.0,    78.0,    88.0,    77.0,   "B-"
"Franklin",  "Benny",      "234-56-2890", 50.0,     1.0,    90.0,    80.0,    90.0,   "B-"
"George",    "Boy",        "345-67-3901", 40.0,     1.0,    11.0,    -1.0,     4.0,   "B"
"Heffalump", "Harvey",     "632-79-9439", 30.0,     1.0,    20.0,    30.0,    40.0,   "C"

You want to sort by the two middle character of SSN:

# sort -b -s -t, -k3.6,3 --debug <grades.csv
sort: using �en_US.UTF-8� sorting rules
"Last name", "First name", "SSN",        "Test1", "Test2", "Test3", "Test4", "Final", "Grade"
                                ^ no match for key
"Alfred",    "University", "123-12-1234", 41.0,    97.0,    96.0,    97.0,    48.0,   "D+"
                                ________
"Backus",    "Jim",        "143-12-1234", 48.0,     1.0,    97.0,    96.0,    97.0,   "A+"
                                ________
"Alfalfa",   "Aloysius",   "123-45-6789", 40.0,    90.0,   100.0,    83.0,    49.0,   "D-"
                                ________
"Airpump",   "Andrew",     "223-45-6789", 49.0      1.0,    90.0,   100.0,    83.0,   "A"
                                ________
"Franklin",  "Benny",      "234-56-2890", 50.0,     1.0,    90.0,    80.0,    90.0,   "B-"
                                ________
"Rubble",    "Betty",      "234-56-7890", 44.0,    90.0,    80.0,    90.0,    46.0,   "C-"
                                ________
"Android",   "Electric",   "087-65-4321", 42.0,    23.0,    36.0,    45.0,    47.0,   "B-"
                                ________
"George",    "Boy",        "345-67-3901", 40.0,     1.0,    11.0,    -1.0,     4.0,   "B"
                                ________
"Noshow",    "Cecil",      "345-67-8901", 45.0,    11.0,    -1.0,     4.0,    43.0,   "F"
                                ________
"Elephant",  "Ima",        "456-71-9012", 45.0,     1.0,    78.0,    88.0,    77.0,   "B-"
                                ________
"Dandy",     "Jim",        "087-75-4321", 47.0,     1.0,    23.0,    36.0,    45.0,   "C+"
                                ________
"Bumpkin",   "Fred",       "456-78-9012", 43.0,    78.0,    88.0,    77.0,    45.0,   "A-"
                                ________
"Heffalump", "Harvey",     "632-79-9439", 30.0,     1.0,    20.0,    30.0,    40.0,   "C"
                                ________
"Buff",      "Bif",        "632-79-9939", 46.0,    20.0,    30.0,    40.0,    50.0,   "B+"
                                ________
"Gerty",     "Gramma",     "567-89-0123", 41.0,    80.0,    60.0,    40.0,    44.0,   "C"
                                ________
"Carnivore", "Art",        "565-89-0123", 44.0,     1.0,    80.0,    60.0,    40.0,   "D+"
                                ________
             ____________

Note that without the option -b the results are wrong because of the different number of space padding:

# sort -s -t, -k3.6,3 --debug <grades.csv
sort: using �en_US.UTF-8� sorting rules
"Last name", "First name", "SSN",        "Test1", "Test2", "Test3", "Test4", "Final", "Grade"
                               _
"Dandy",     "Jim",        "087-75-4321", 47.0,     1.0,    23.0,    36.0,    45.0,   "C+"
                        ________________
"Alfred",    "University", "123-12-1234", 41.0,    97.0,    96.0,    97.0,    48.0,   "D+"
                               _________
"Backus",    "Jim",        "143-12-1234", 48.0,     1.0,    97.0,    96.0,    97.0,   "A+"
                        ________________
"Airpump",   "Andrew",     "223-45-6789", 49.0      1.0,    90.0,   100.0,    83.0,   "A"
                           _____________
"Franklin",  "Benny",      "234-56-2890", 50.0,     1.0,    90.0,    80.0,    90.0,   "B-"
                          ______________
"Alfalfa",   "Aloysius",   "123-45-6789", 40.0,    90.0,   100.0,    83.0,    49.0,   "D-"
                             ___________
"Rubble",    "Betty",      "234-56-7890", 44.0,    90.0,    80.0,    90.0,    46.0,   "C-"
                          ______________
"George",    "Boy",        "345-67-3901", 40.0,     1.0,    11.0,    -1.0,     4.0,   "B"
                        ________________
"Noshow",    "Cecil",      "345-67-8901", 45.0,    11.0,    -1.0,     4.0,    43.0,   "F"
                          ______________
"Elephant",  "Ima",        "456-71-9012", 45.0,     1.0,    78.0,    88.0,    77.0,   "B-"
                        ________________
"Bumpkin",   "Fred",       "456-78-9012", 43.0,    78.0,    88.0,    77.0,    45.0,   "A-"
                         _______________
"Carnivore", "Art",        "565-89-0123", 44.0,     1.0,    80.0,    60.0,    40.0,   "D+"
                        ________________
"Gerty",     "Gramma",     "567-89-0123", 41.0,    80.0,    60.0,    40.0,    44.0,   "C"
                           _____________
"Heffalump", "Harvey",     "632-79-9439", 30.0,     1.0,    20.0,    30.0,    40.0,   "C"
                           _____________
"Buff",      "Bif",        "632-79-9939", 46.0,    20.0,    30.0,    40.0,    50.0,   "B+"
                        ________________
"Android",   "Electric",   "087-65-4321", 42.0,    23.0,    36.0,    45.0,    47.0,   "B-"
                             ___________

Table of Contents