Complex and useful BASH one-liners. Here's how to:
- Sort a file based on line length
- Determine the longest line in a text file
Believe me, there are some real-world applications where this is required, or else I would not be telling you about it!
Sort a file based on line length
This takes the input file, logfile.txt, and sorts its lines by length, shortest first.
awk '{ print length(), $0 }' logfile.txt | sort -n | sed -e 's/^[0-9]*\s//'
The output is lines of increasing length:
2012.02.19 17:15:03 INFO: Cleaning up...
2012.02.19 17:14:38 INFO: Committing package
2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
2012.02.19 17:14:37 INFO: Make file site/LINGO.xml
2012.02.19 17:14:38 INFO: Make file site_af-ZA.zip
etc...
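If you use this often, it can be wrapped in a shell function. This is a minimal sketch that assumes a POSIX-compatible shell. The function name sort_by_length is hypothetical. The sed expression uses a literal space instead of the GNU-specific \s. That works because awk always separates the length from the line with exactly one space, its default output field separator.

```shell
# sort_by_length: hypothetical helper name, not a standard command.
# Prints the lines of the given files (or stdin) in ascending order of length.
sort_by_length() {
    # 1. awk prepends each line's length plus one space (its default OFS)
    # 2. sort -n orders the lines on that leading number
    # 3. sed removes the digits and the single space awk inserted
    awk '{ print length(), $0 }' "$@" | sort -n | sed -e 's/^[0-9]* //'
}
```

For example, sort_by_length logfile.txt prints the lines of logfile.txt shortest first, and some_command | sort_by_length does the same for a stream (some_command is a placeholder). Leading spaces in the original lines are preserved, because only the first space after the digits is removed.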
How does this work?
The first part of the command is a simple awk script,
awk '{ print length(), $0}' logfile.txt
that prepends the line length, followed by a space, to each line of the example file, giving:
92 2012.02.19 17:14:38 INFO: Create file list in site/install.xml file for 'site' language pack
82 2012.02.19 17:14:38 DEBUG: MkXMLInstallFileList site - filename: site/install.xml
81 2012.02.19 17:14:38 INFO: Create site/install.xml header for 'site' language pack
84 2012.02.19 17:14:38 DEBUG: MkXMLInstallFileFooter site - filename: site/install.xml
49 2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
etc...
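To see exactly what awk emits, you can feed it a few lines directly. The sketch below uses made-up sample input, and the helper name prefix_length is hypothetical. An empty line still gets a prefix: a 0 followed by a single space. That is why the sed step further on leaves an empty line in the right place.

```shell
# prefix_length: hypothetical helper name used for illustration only.
# Prints each input line preceded by its length and one space (awk's default OFS).
prefix_length() {
    awk '{ print length(), $0 }' "$@"
}

# Made-up sample input: a short line, an empty line, and a longer line
printf 'INFO: done\n\nDEBUG: starting up\n' | prefix_length
# 10 INFO: done
# 0
# 18 DEBUG: starting up
```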
Next, sort the lines numerically on the number at the start of each line (the length we just added) by piping the output through sort -n:
awk '{ print length(), $0 }' logfile.txt | sort -n
40 2012.02.19 17:15:03 INFO: Cleaning up...
45 2012.02.19 17:14:38 INFO: Committing package
49 2012.02.19 17:14:38 INFO: Make file pkg_af-ZA.xml
50 2012.02.19 17:14:37 INFO: Make file site/LINGO.xml
50 2012.02.19 17:14:38 INFO: Make file site_af-ZA.zip
51 2012.02.19 17:14:37 INFO: Make file admin/LINGO.xml
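The -n flag matters. Without it, sort compares the prefixes as text, one character at a time, so a 100-character line would sort before a 40-character one. The sketch below shows the difference on made-up lines that already carry a length prefix. LC_ALL=C is set only so that the result does not depend on your locale's collation rules.

```shell
# Made-up lines that already carry a length prefix
sample='100 a long line
40 a medium line
9 short'

# Text comparison: '1' < '4' < '9', so the 100 line comes first
printf '%s\n' "$sample" | LC_ALL=C sort

# Numeric comparison of the leading number: 9, then 40, then 100
printf '%s\n' "$sample" | LC_ALL=C sort -n
```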
The lines are now sorted on the left-hand numerical column. All that remains is to remove the numbers by piping the output through the stream editor, sed. Its regular expression removes the leading digits and the single whitespace character that follows them:
awk '{ print length(), $0 }' logfile.txt | sort -n | sed -e 's/^[0-9]*\s//'
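Awk always separates the length from the line with exactly one space. Because of that, cut can do the same job as the sed expression by keeping everything from the second space-delimited field onward. This variant swaps in a different technique from the one used above. The helper name strip_length_prefix is hypothetical. Extra spaces inside the original line survive, because cut re-emits the delimiters between the fields it keeps.

```shell
# strip_length_prefix: hypothetical helper name, not a standard command.
# -d ' ' : fields are separated by a single space
# -f 2-  : print field 2 through the last field, i.e. the original line
strip_length_prefix() {
    cut -d ' ' -f 2- "$@"
}
```

Used in place of sed, the pipeline would read: awk '{ print length(), $0 }' logfile.txt | sort -n | strip_length_prefix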
Determine the longest line in a text file
Similar to the above, but with a twist - this prints out the longest line in the file, my_text_file:
$ awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s//' | head -1
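On large files, sorting the whole file just to find one line is more work than needed. An alternative, swapped in here rather than taken from the one-liner above, is to let awk remember the longest line seen so far and print it at the end. This is a minimal sketch, and the function name longest_line is hypothetical. There is one behavioural difference. When several lines tie for longest, this version prints the first of them. The one-liner's sort -nr falls back to comparing whole lines, so its choice among tied lines depends on their text (and possibly on your sort implementation).

```shell
# longest_line: hypothetical helper name, not a standard command.
# Prints the longest line of the given files (or stdin) in a single pass.
# On a tie the first longest line wins, because only a strictly greater
# length replaces the remembered line. Empty input prints nothing.
longest_line() {
    awk '
        length($0) > max { max = length($0); line = $0 }
        END { if (NR > 0) print line }
    ' "$@"
}
```

Usage: longest_line my_text_file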
How does it work?
The first part of the command is a simple awk script,
awk '{ print length(), $0}' my_text_file
that prepends the line length to each line of the example file. The example file contains the credits for an open source piece of work, and looks like this with the line lengths prepended:
52 Al Aab # founder of "seders" list
35 Edgar Allen # various
35 Yiorgos Adamopoulos # various
49 Dale Dougherty # author of "sed & awk"
54 Carlos Duarte # author of "do it with sed"
51 Eric Pement # author of this document
51 Ken Pizzini # author of GNU sed v3.02
48 S.G. Ravenhall # great de-html script
58 Greg Ubben # many contributions & much help
Piping the output to the external sort command from inside the awk script,
$ awk '{ print length(), $0 | "sort -nr" }' my_text_file
reverse-sorts the lines numerically on the leading number of each line. The -nr bit is important here: -n compares numbers rather than text, and -r puts the largest first.
58 Greg Ubben # many contributions & much help
54 Carlos Duarte # author of "do it with sed"
52 Al Aab # founder of "seders" list
51 Ken Pizzini # author of GNU sed v3.02
51 Eric Pement # author of this document
49 Dale Dougherty # author of "sed & awk"
48 S.G. Ravenhall # great de-html script
35 Yiorgos Adamopoulos # various
35 Edgar Allen # various
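The | "sort -nr" inside the awk program uses awk's output-pipe feature. Awk starts sort -nr the first time it prints to that command string, sends every later line to the same pipe, and closes the pipe when the program ends. The result should match piping awk's output to sort in the shell. The sketch below compares the two on made-up input. LC_ALL=C is set only to keep the comparison deterministic.

```shell
sample='ccc
a
bb'

# Sort inside awk: every printed line goes to one "sort -nr" process
printf '%s\n' "$sample" | LC_ALL=C awk '{ print length(), $0 | "sort -nr" }'

# Sort in the shell: the same command as a separate pipeline stage
printf '%s\n' "$sample" | awk '{ print length(), $0 }' | LC_ALL=C sort -nr
```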
All we need to do now is get rid of the leading numbers in each line by piping all this through a regular expression in sed:
$ awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s//'
to get something resembling the original file, but sorted from longest line to shortest line:
Greg Ubben # many contributions & much help
Carlos Duarte # author of "do it with sed"
Al Aab # founder of "seders" list
Ken Pizzini # author of GNU sed v3.02
Eric Pement # author of this document
Dale Dougherty # author of "sed & awk"
S.G. Ravenhall # great de-html script
Yiorgos Adamopoulos # various
Edgar Allen # various
Since we are only interested in one line, the top line of the stream, we pipe it into head and ask for just 1 line from the start of the stream (the -1 bit):
$ awk '{ print length(), $0 | "sort -nr" }' my_text_file | sed -e 's/^[0-9]*\s//' | head -1
This prints the longest line in the file to the terminal:
Greg Ubben # many contributions & much help