HOEKSTRA.CO.UK

Complex and useful BASH one-liners. Here's how to:

  • Sort a file based on line length
  • Determine the longest line in a text file

Believe me, there are some real-world applications where this is required, or else I would not be telling you about it!

Sort a file based on line length

This takes the input file, logfile.txt, and sorts its lines by length, shortest first.

awk '{ print length(), $0 }'  logfile.txt | sort -n | sed -e 's/^[0-9]*\s//'
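To try the one-liner without the real logfile.txt, you can wrap it in a shell function and feed it a few made-up lines through a here-document. The function name sort_by_length and the sample lines are illustrative assumptions, not part of the original. The pipeline is the one above with one change: the sed expression uses a literal space instead of GNU sed's \s. Because awk separates the number from the line with exactly one space, the result is the same, and the sketch also runs with non-GNU versions of sed.

```shell
#!/bin/sh
# Sort the lines of the named files (or stdin) by length, shortest first.
# sort_by_length is a hypothetical helper name used only for this sketch.
sort_by_length() {
    # 1. awk prefixes each line with its length and a single space.
    # 2. sort -n orders the lines on that leading number.
    # 3. sed deletes the number and the one space awk added.
    awk '{ print length(), $0 }' "$@" | sort -n | sed -e 's/^[0-9]* //'
}

# Made-up sample input, not taken from the real logfile.txt.
sort_by_length <<'EOF'
INFO: Make file pkg_af-ZA.xml
INFO: Cleaning up...
INFO: Committing package
EOF
# prints:
# INFO: Cleaning up...
# INFO: Committing package
# INFO: Make file pkg_af-ZA.xml
```

With a real file, pass its name as an argument, for example sort_by_length logfile.txt.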

The output is lines of increasing length:

2012.02.19 17:15:03 INFO: Cleaning up...
2012.02.19 17:14:38 INFO: Committing package
2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
2012.02.19 17:14:37 INFO: Make file site/LINGO.xml
2012.02.19 17:14:38 INFO: Make file site_af-ZA.zip
etc...

How does this work?

The first part of the command is a simple awk script,

awk '{ print length(), $0}' logfile.txt

that prepends the line length to each line of the example file, giving this:

92 2012.02.19 17:14:38 INFO: Create file list in site/install.xml file for 'site' language pack
82 2012.02.19 17:14:38 DEBUG: MkXMLInstallFileList site - filename: site/install.xml
81 2012.02.19 17:14:38 INFO: Create site/install.xml header for 'site' language pack
84 2012.02.19 17:14:38 DEBUG: MkXMLInstallFileFooter site - filename: site/install.xml
49 2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
etc...
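A quick way to see exactly what the awk step emits is to feed it a few lines with printf. The sample strings below are made up for illustration. The comma in print inserts the output field separator (OFS, a single space by default) between the length and the line, so an empty input line comes out as a 0 followed by one trailing space, and any leading whitespace in a line is kept after that separator.

```shell
# Prefix each line with its length; print's comma inserts OFS (one space).
printf 'abc\n\n  xy\n' | awk '{ print length(), $0 }'
# prints three lines:
#   "3 abc"
#   "0 "      (zero, then the OFS space)
#   "4   xy"  (the OFS space, then the line's own two spaces)
```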

Next, sort the lines numerically by piping them through sort -n. With -n, sort compares the number at the start of each line (here, the length that awk added) instead of comparing the lines as text:

awk '{ print length(), $0 }'  logfile.txt | sort -n 
40 2012.02.19 17:15:03 INFO: Cleaning up...
45 2012.02.19 17:14:38 INFO: Committing package
49 2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
50 2012.02.19 17:14:37 INFO: Make file site/LINGO.xml
50 2012.02.19 17:14:38 INFO: Make file site_af-ZA.zip
51 2012.02.19 17:14:37 INFO: Make file admin/LINGO.xml
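Why -n rather than a plain sort? Without it, sort compares the length prefixes as text, character by character, so a 100-character line would land before a 9-character one because "1" sorts before "9". A minimal sketch with made-up, already-prefixed lines; LC_ALL=C is set on the text sort only so that its ordering does not depend on the local collation rules.

```shell
# Three made-up lines with lengths 9, 100 and 10 already prefixed.
input='9 a
100 c
10 b'

# Text comparison: "10 b" < "100 c" (space sorts before "0") < "9 a".
printf '%s\n' "$input" | LC_ALL=C sort
# 10 b
# 100 c
# 9 a

# Numeric comparison on the leading number: 9 < 10 < 100.
printf '%s\n' "$input" | sort -n
# 9 a
# 10 b
# 100 c
```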

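The lines are now sorted on the numeric column at the left. All that remains is to remove the numbers, which we do by piping the output through the stream editor, sed. Its regular expression, ^[0-9]*\s, matches the digits at the start of each line plus the single whitespace character after them (the space that awk inserted), and the s command replaces that match with nothing. (\s is a GNU sed extension; other versions of sed may need [[:space:]] or a literal space instead.) The complete one-liner is: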

awk '{ print length(), $0 }'  logfile.txt | sort -n | sed -e 's/^[0-9]*\s//'
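If your sed does not understand \s, one alternative is to cut the number off instead of matching it with a regular expression. This swaps in a different tool and is not the author's command: cut -d' ' -f2- splits each line on single spaces and keeps field 2 through the end, which is exactly the original line because awk put one space after the number. The sample input is made up.

```shell
# Same pipeline, with cut instead of sed to drop the length prefix.
# -d' ' splits on single spaces; -f2- keeps field 2 to the end of the line,
# rejoined with the same delimiter, so spaces inside the line are preserved.
printf 'a  b   c\nx\n' | awk '{ print length(), $0 }' | sort -n | cut -d' ' -f2-
# x
# a  b   c
```

To run it on the real file, replace the printf with the file name: awk '{ print length(), $0 }' logfile.txt | sort -n | cut -d' ' -f2-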

Determine the longest line in a text file

Similar to the above, but with a twist - this prints out the longest line in the file, my_text_file:

$  awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s*//' | head -1

How does it work?

The first part of the command is a simple awk script,

awk '{ print length(), $0}' my_text_file

that prepends the line length to each line of the example file. The example file contains the credits for an open-source work. With the line lengths prepended, it looks like this (each line of the file begins with a space, hence the two spaces after each number):

52  Al Aab                   # founder of "seders" list
35  Edgar Allen              # various
35  Yiorgos Adamopoulos      # various
49  Dale Dougherty           # author of "sed & awk"
54  Carlos Duarte            # author of "do it with sed"
51  Eric Pement              # author of this document
51  Ken Pizzini              # author of GNU sed v3.02
48  S.G. Ravenhall           # great de-html script
58  Greg Ubben               # many contributions & much help

Piping the output of print into the sort command from inside the awk script,

$ awk '{ print length(), $0 | "sort -nr" }' my_text_file

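sorts the lines numerically on the leading number, in reverse order (the -nr bit is important here: -n compares the numbers as numbers rather than as text, and -r puts the largest numbers, and so the longest lines, first):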

58  Greg Ubben               # many contributions & much help
54  Carlos Duarte            # author of "do it with sed"
52  Al Aab                   # founder of "seders" list
51  Ken Pizzini              # author of GNU sed v3.02
51  Eric Pement              # author of this document
49  Dale Dougherty           # author of "sed & awk"
48  S.G. Ravenhall           # great de-html script
35  Yiorgos Adamopoulos      # various
35  Edgar Allen              # various
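Inside awk, print ... | "sort -nr" sends every printed line to a single sort process, which awk starts through the shell and keeps open until the end of the program (or an explicit close()); sort then writes its sorted output. The sketch below, using made-up input, checks that this gives the same result as piping awk's output into sort from the shell. The variable names are illustrative only.

```shell
input='bb
a
ccc'

# Sort inside awk: every print statement feeds the same "sort -nr" process.
inside=$(printf '%s\n' "$input" | awk '{ print length(), $0 | "sort -nr" }')

# Sort outside awk: an ordinary shell pipeline.
outside=$(printf '%s\n' "$input" | awk '{ print length(), $0 }' | sort -nr)

printf '%s\n' "$inside"
# 3 ccc
# 2 bb
# 1 a

[ "$inside" = "$outside" ] && echo "same output"
```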

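All we need to do now is get rid of the leading numbers in each line by piping all this through a regular expression in sed. This time the expression ends in \s* rather than \s, so it removes every whitespace character after the number, not just the one that awk added: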

$  awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s*//' 

to get something resembling the original file, but sorted from longest line to shortest line:

Greg Ubben               # many contributions & much help
Carlos Duarte            # author of "do it with sed"
Al Aab                   # founder of "seders" list
Ken Pizzini              # author of GNU sed v3.02
Eric Pement              # author of this document
Dale Dougherty           # author of "sed & awk"
S.G. Ravenhall           # great de-html script
Yiorgos Adamopoulos      # various
Edgar Allen              # various
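The two sed expressions in this article behave differently on indented lines. s/^[0-9]*\s// removes exactly one whitespace character after the number, while s/^[0-9]*\s*// removes the whole run of whitespace, so any indentation at the start of an original line disappears too. A quick check with a made-up indented line, written with the POSIX class [[:space:]] (which matches essentially the same characters as GNU sed's \s) so that it runs with any sed:

```shell
# "    indented" is 12 characters long, so awk emits "12     indented".
line=$(printf '    indented\n' | awk '{ print length(), $0 }')

# One whitespace character removed: the original indentation survives.
printf '%s\n' "$line" | sed -e 's/^[0-9]*[[:space:]]//'
# prints "    indented"

# Any run of whitespace removed: the indentation is lost as well.
printf '%s\n' "$line" | sed -e 's/^[0-9]*[[:space:]]*//'
# prints "indented"
```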

Since we are only interested in one line, the top line of the stream, we pipe it into head and ask for just 1 line (the -1 bit, shorthand for head -n 1):

$  awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s*//' | head -1

This prints the longest line in the file:

Greg Ubben               # many contributions & much help
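Sorting the whole file just to keep one line does more work than necessary. As an alternative technique (not the one used above), awk can track the longest line in a single pass and print it at the end. A minimal sketch; the function name longest_line and the sample input are hypothetical. Two behavioural differences are worth knowing: when several lines tie for the longest, this keeps the first of them, whereas sort -nr | head -1 may return a different one of the tied lines; and it prints the line untouched, leading whitespace included.

```shell
# Print the longest line of the named files (or stdin) in one pass.
# longest_line is a hypothetical helper name used only for this sketch.
longest_line() {
    # Strict ">" means the first of several equally long lines wins.
    # max starts uninitialized, which compares as 0 against a number.
    awk '
        length($0) > max { max = length($0); line = $0 }
        END { if (NR > 0) print line }
    ' "$@"
}

printf 'short\na much longer line\nmid-size\n' | longest_line
# prints "a much longer line"
```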