cut and paste - Week of Unix Tools; Day 2

Intro

This week-of-unix-tools is intended to be a high concentration of information with little fluff. I'll be covering only the GNU versions of the tools; sticking to one implementation keeps things sane.

What is cut?

cut is a tool that lets you 'cut' pieces out of each line of input. You can cut by field, by character (column), or by byte.

Basic cut(1) usage

cut [-d delim -f range] [-c range] [-b range]
There are 3 different ways to cut: by bytes (-b), by characters (-c), and by fields (-f). When cutting by fields, -d sets the delimiter; the default delimiter is a tab.

A range is one or more sequences, separated by commas. A sequence can be any of the following:
  • N = Select only the Nth piece (on each line)
  • N- = Select the Nth piece through end-of-line
  • N-M = Select the Nth through Mth pieces
  • -M = Select the first through Mth pieces
field cutting - get me the 1st and 3rd fields delimited by comma
% echo "one,two,three,four" | cut -d"," -f 1,3
one,three

# output space-delimited from comma-separated input (GNU only)
% echo "one,two,three,four" | cut -d"," -f 1,3 --output-delimiter=" "
one three
 
character cutting - output everything except the first character of every line
% seq 15 19 | cut -c 2-
5
6
7
8
9
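To tie the range forms listed above to real output, here is a small sketch that runs each form against one made-up line (the "a:b:c:d:e" sample is mine, not from any real file). It also shows one thing that's easy to trip over: cut prints the selected fields in input order, no matter what order you list them in.

```shell
# Sample line invented for illustration
line="a:b:c:d:e"

# N: only the 2nd field
echo "$line" | cut -d: -f 2       # b

# N-: the 4th field through end-of-line
echo "$line" | cut -d: -f 4-      # d:e

# N-M: the 2nd through 4th fields
echo "$line" | cut -d: -f 2-4     # b:c:d

# -M: the first through 3rd fields
echo "$line" | cut -d: -f -3      # a:b:c

# Sequences combine with commas
echo "$line" | cut -d: -f 1,3-4   # a:c:d

# Listing fields out of order does not reorder the output
echo "$line" | cut -d: -f 3,1     # a:c

# The same range syntax works for bytes (and characters)
echo "$line" | cut -b 1-3         # a:b
```

If you need the fields in a different order than they appear in the input, that's a job for awk rather than cut.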
 

When to use cut

Cut provides features easily done in sed and awk. Why would you use it instead of sed or awk?

Simplicity of statement.

Example: Let's print the 1st and 7th fields from /etc/passwd:
% grep '^root' /etc/passwd | cut -d: -f1,7
root:/bin/sh
In the above invocation, it is very clear to the reader that you want the 1st and 7th fields. Yes, this would also be simple in awk, but if you don't know awk syntax, the awk version will be harder for you to write and read.
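For comparison, here is roughly what the same thing looks like in both tools. The passwd entry is a made-up sample line so the output is the same everywhere; the awk one-liner is just one of several ways to write it.

```shell
# Made-up passwd entry, used instead of the real /etc/passwd
entry="root:x:0:0:root:/root:/bin/sh"

# cut: name the delimiter, name the fields
with_cut=$(echo "$entry" | cut -d: -f1,7)

# awk: set the input separator (-F) and the output separator (OFS),
# then print the two fields
with_awk=$(echo "$entry" | awk -F: -v OFS=: '{ print $1, $7 }')

echo "$with_cut"   # root:/bin/sh
echo "$with_awk"   # root:/bin/sh
```

Both produce the same output. The awk version isn't hard, but you do have to know about -F, OFS, and the print syntax to read it.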

Use the tool that lets you describe what you want most concisely and clearly. Cut often lets you do that. However, there are a few cut-like things that you can't do in cut because of the way it determines fields.

When not to use cut

If your input has multiple instances of the delimiter in a row, cut won't split it the way you might expect. A delimiter in cut is a single character, and every occurrence of it starts a new field; runs of delimiters are not collapsed the way awk collapses runs of whitespace by default. Consider this simple example:
% echo "one    two     three" | cut -d' ' -f 2

% echo "one    two     three" | awk '{print $2}'
two
According to cut, field 2 in the above example is an empty string because it occurs between the first and second space (delimiter). Not what we wanted. Keep this behavior in mind.
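If you still want to use cut on input like this, one common workaround (my suggestion, not the only option) is to squeeze repeated delimiters down to one with tr -s before cutting. A minimal sketch:

```shell
# tr -s ' ' replaces each run of spaces with a single space,
# so cut sees exactly one delimiter between words
second=$(echo "one    two     three" | tr -s ' ' | cut -d' ' -f 2)
echo "$second"   # two

# Caveat: leading spaces still produce an empty first field,
# so every word shifts over by one
shifted=$(echo "   one two" | tr -s ' ' | cut -d' ' -f 2)
echo "$shifted"  # one
```

Once you're stacking workarounds like this, awk is probably the cleaner choice.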

What is paste?

Think of it as a horizontal version of cat(1). It joins the corresponding lines of its input files side by side, separated by a delimiter.

Basic paste(1) usage

 paste [-s] [-d delimiter_list] [input1 input2 input3 ...] 
Paste reads a line from each input, in order, and prints them without newlines. After one line has been read from each input, a newline is printed. The optional '-d' lets you specify a list of delimiters used to separate the inputs; the delimiters are used in turn, starting over at the beginning of the list for each output line. The default is separation with tab characters. An input of '-' means standard input.
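Here is a small sketch of the default tab separation, a single custom delimiter, and a delimiter list. The two files and their contents are invented for the example and live in a temporary directory.

```shell
# Two throwaway input files (names and contents are made up)
tmp=$(mktemp -d)
printf 'alice\nbob\n' > "$tmp/names"
printf '1000\n1001\n' > "$tmp/uids"

# Default: corresponding lines joined with a tab
tabbed=$(paste "$tmp/names" "$tmp/uids")
echo "$tabbed"

# One delimiter: used between every pair of inputs
pairs=$(paste -d: "$tmp/names" "$tmp/uids")
echo "$pairs"
# alice:1000
# bob:1001

# Delimiter list: ',' goes between inputs 1 and 2, ':' between 2 and 3
triples=$(seq 6 | paste -d ',:' - - -)
echo "$triples"
# 1,2:3
# 4,5:6

rm -r "$tmp"
```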

'-s' is a neat little flag that makes paste work serially: instead of joining lines across inputs, it joins all the lines of each input into a single line of output, one output line per input. With a single input, the effect is that every line is concatenated onto one line. Very similar to "tr '\n' '*delimiter*'" except there's no trailing delimiter. Useful!
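A quick sketch of the difference between paste -s and tr, plus what -s does when you hand it more than one input (GNU paste gives you one output line per input). The temporary files are made up for the example.

```shell
# paste -s: lines joined by the delimiter, output ends with a newline
joined=$(seq 3 | paste -s -d, -)
echo "$joined"    # 1,2,3

# tr: every newline becomes a comma, including the last one
with_tr=$(seq 3 | tr '\n' ',')
echo "$with_tr"   # 1,2,3,

# With several inputs, -s emits one joined line per input
tmp=$(mktemp -d)
seq 1 3 > "$tmp/a"
seq 4 6 > "$tmp/b"
serial=$(paste -s -d, "$tmp/a" "$tmp/b")
echo "$serial"
# 1,2,3
# 4,5,6

rm -r "$tmp"
```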

The same input file can be specified multiple times, which gives you some neat effects.

Why is it useful?

I rarely use paste, but what it does is quite useful.
Join input lines in triplets
% seq 9 | paste - - -
1       2       3
4       5       6
7       8       9
Print line numbers
% FILE="/etc/hosts"
% seq $(wc -l < "$FILE") | paste - "$FILE" | head -3
1       # /etc/hosts
2       #
3       # This file describes a number of hostname-to-address
List of users on a system
% cut -d: -f1 /etc/passwd | paste -d, -s -
root,bin,daemon,adm,lp,sync,shutdown,halt,mail,news,uucp,operator

Conclusion

Cut and paste are somewhat niche tools, but keep them in your toolbelt because of the functionality they provide. Sometimes it's much simpler to use cut or paste instead of another tool.