sed - Week of Unix Tools; Day 1

Intro

I think it's fair to say that not enough people know sed, probably because it looks scary. This week-of-unix-tools is intended to be a high concentration of information with little fluff. For sanity's sake, I'll be covering only the GNU versions of the tools.

What is sed?

Sed is short for 'stream editor'. It reads text a line at a time, applies the editing commands you give it to each line, and writes the result to standard output.

Basic usage and Invocation

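sed [-urn] [-e 'sedscript'] [file1 file2 ...]
-u means unbuffered (ie; flush output every line), -r means use extended regex, -n silences default output, and -e supplies the sed script to run. (FreeBSD's sed spells line-buffered as -l; in GNU sed, -l N instead sets the line-wrap length for the 'l' function.) There are other flags (such as -f, which reads the script from a file) but I never use them. See the man page for more information.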
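
To make the effect of -n concrete, here is a small sketch. The two-line sample input and the function names are made up for illustration. Without -n, sed prints every line by default and 'p' prints matching lines a second time; with -n, only the explicit 'p' output appears.

```shell
# Sample input is an assumption for illustration only.
sample() { printf 'foo\nbar\n'; }

# Without -n: every line is printed by default, and 'p' prints matches again.
with_default_output() { sample | sed -e '/foo/p'; }

# With -n: default output is silenced, so only the explicit 'p' prints.
with_n() { sample | sed -n -e '/foo/p'; }

with_default_output   # foo, foo, bar
with_n                # foo
```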

If you've ever seen the perlism s/foo/bar/, that came from sed (which in turn inherited it from ed). Sed is basically a string processing language. Its grammar is very small, but it is still very powerful. Here are some examples:

Simple text replacement.
% echo "Hello there foo" |  sed -e 's/foo/bar/'
Hello there bar
Grep-like behavior.
% sed -ne '/FreeBSD/p' /etc/motd
FreeBSD 6.2-PRERELEASE (FOO) #0: Sat Nov 11 00:12:52 EST 2006
Welcome to FreeBSD!
Grep '-v'-like behavior.
% printf 'foo\nbar\nbaz\nfoobar\n' | sed -ne '/foo/!p'
bar
baz

Backreferences

A backreference lets you reuse a captured group's matched text later in the pattern or in the replacement. You group regexp patterns with parentheses, but in non-extended mode (ie; without -r), you must escape the parentheses. Example:
% echo "hello world" | sed -e 's/\([a-z]*\) world/\1 sed/'
hello sed

# Now with -r (or -E on FreeBSD and OS X):
% echo "hello world" | sed -r -e 's/([a-z]*) world/\1 sed/'
hello sed
Substitution (s///) also has a special "reference": the ampersand (&), which expands to the entire matched text:
% echo "hello world" | sed -e 's/.*/I say, "&"/'
I say, "hello world"
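
Backreferences become more useful with more than one group. The sketch below swaps two words; the function names are made up for illustration, and -E is used as the extended-regex flag because current GNU sed and the BSDs both accept it.

```shell
# Swap two words using two capture groups (basic regex, escaped parentheses).
swap_basic() { sed -e 's/\([a-z]*\) \([a-z]*\)/\2 \1/'; }

# Same substitution with extended regex, so the parentheses are not escaped.
swap_extended() { sed -E -e 's/([a-z]*) ([a-z]*)/\2 \1/'; }

echo "hello world" | swap_basic      # world hello
echo "hello world" | swap_extended   # world hello
```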

Syntax and Functions

Sed syntax is pretty straightforward. A general expression will look like this:

[address[,address]]function

That's it. Expressions are separated by newlines or semicolons.
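
As a quick sketch of the separators, here are three ways to chain the same two substitutions. The function names and input are made up for illustration; passing several -e options is a further way to supply one expression per option.

```shell
# Expressions separated by a semicolon.
with_semicolon() { sed -e 's/foo/bar/; s/there/world/'; }

# One expression per -e option.
with_two_e() { sed -e 's/foo/bar/' -e 's/there/world/'; }

# Expressions separated by a literal newline inside the script.
with_newline() { sed -e 's/foo/bar/
s/there/world/'; }

echo "Hello there foo" | with_semicolon   # Hello world bar
echo "Hello there foo" | with_two_e       # Hello world bar
echo "Hello there foo" | with_newline     # Hello world bar
```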

What is an address?

An address is a way to indicate a location in your data stream. An address can be any of:
  1. A line number (eg 1). The first line is '1'
  2. A regexp match expression, such as /foo/.
  3. The literal '$', which means 'last line of file'
  4. Nothing at all, which means "every line in the file"
If you specify two addresses, they form an inclusive range: the function applies to the line matching the first address, the line matching the second address, and every line in between. Once the range has ended, sed searches for the first address again further down the input, so one range expression can match several blocks of lines.
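
The address forms above can be sketched against a small sample. The six input lines are made up for illustration.

```shell
# Six sample lines; the content is an assumption for illustration.
sample() { printf 'start\na\nend\nb\nstart\nc\n'; }

sample | sed -n -e '2p'              # line number: prints "a"
sample | sed -n -e '/^b$/p'          # regexp: prints "b"
sample | sed -n -e '$p'              # last line: prints "c"
sample | sed -n -e '2,4p'            # line-number range: a, end, b
sample | sed -n -e '/start/,/end/p'  # regexp range: start, a, end, start, c
```

In the last command the range closes at 'end', skips 'b', and opens again at the second 'start'. Since no further 'end' appears, that second range runs to the end of the input.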

What are functions?

Functions are always one-letter in sed. The useful ones (to me) are:
  • p (print)
  • s (substitute)
  • d (delete)
  • x (swap pattern and hold buffer)
  • h and H (copy and append to hold buffer)
  • ! (apply the next function against lines not matched)
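
The hold-buffer functions are the least obvious ones in this list, so here is a minimal sketch of H and x working together. The join_lines name and the sample input are made up for illustration. H appends a newline plus the current line to the hold buffer; on the last line, x brings the collected text back into the pattern space so it can be edited and printed.

```shell
# Join all input lines with commas.
join_lines() {
  # H:  append each line to the hold buffer (as "\n" + line)
  # $:  on the last line only, swap the hold buffer in, turn the
  #     newlines into commas, drop the leading comma, and print
  sed -n -e 'H' -e '${x;s/\n/,/g;s/^,//;p;}'
}

printf 'one\ntwo\nthree\n' | join_lines   # one,two,three
```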

What can I do with sed?

Print the first line of input (same as head -n 1)
sed -ne 1p
Print everything *except* the first line
sed -ne '1!p' # print everything not on the first line
or
sed -e '1d'   # delete the first line
              # default action is to print, so everything else is printed
Print the first non-whitespace, non-comment line in httpd.conf
sed -ne '/^[^# ]/{p;q;}' httpd.conf
or
sed -ne '/^#/! { /^ *$/! { p;q; }; }' httpd.conf
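
The two httpd.conf commands are not quite equivalent, which a small sketch makes visible. The config fragment below (sample_conf) is a made-up stand-in for a real file. The first command rejects any line that begins with a space, while the second accepts indented lines, including indented comments.

```shell
# A made-up config fragment standing in for httpd.conf.
sample_conf() {
  printf '# comment\n\n  # indented comment\nListen 80\nServerName example\n'
}

# First form: the line must start with something other than '#' or a space.
sample_conf | sed -n -e '/^[^# ]/{p;q;}'          # Listen 80

# Second form: the line must not start with '#' and must not be blank,
# so the indented comment is the first line that qualifies.
sample_conf | sed -n -e '/^#/!{/^ *$/!{p;q;};}'   # "  # indented comment"
```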
Show only 'Received:' headers in a mail
% cat mymail \
  | sed -ne '/^[A-Za-z0-9]/ { x; /^Received: /{p;}; }; /^[A-Za-z0-9]/!H' 
Received: from localhost (localhost [127.0.0.1])
        by whitefox.csh.rit.edu (Postfix) with ESMTP id 731F81145C
        for <email-snipped>; Sat, 19 May 2007 01:19:30 -0400 (EDT)
Received: from whitefox.csh.rit.edu ([127.0.0.1])
        by localhost (whitefox.csh.rit.edu [127.0.0.1]) (amavisd-new, port 10024)
        with ESMTP id EURHKUeHSrao for <email-snipped>;
        Sat, 19 May 2007 01:19:16 -0400 (EDT)
... etc ...
  
Noisy code, eh? It gets the job done though. There are two checks here. The first checks whether the line starts with a letter or number, that is, whether it begins a new header. If so, it swaps the pattern space with the "hold" buffer, which brings the previous header (built up over the earlier lines) into the pattern space, and prints it if it starts with 'Received:'. The side effect is that the current input line is now in the hold buffer and the old header "line" is in the pattern space, which we discard. After that, we check if the line does *not* start with a letter or number (a continuation line), in which case we append the input (aka pattern space) to the hold space.

Basically, we build the current header (which can be multiple lines) in the hold buffer until the next header happens.
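
To watch the hold buffer at work, the same command can be run on a tiny made-up message. The received_headers name and the sample headers are assumptions for illustration.

```shell
# The article's command, wrapped in a function so it can read any input.
received_headers() {
  sed -n -e '/^[A-Za-z0-9]/ { x; /^Received: /{p;}; }; /^[A-Za-z0-9]/!H'
}

# A made-up message: the Received header has one continuation line.
printf 'From: a\nReceived: from one\n\tby two\nSubject: hi\n' | received_headers
# Received: from one
#         by two
```

Note that a header is printed only when the next header line arrives and swaps it out of the hold buffer. Here 'Subject:' flushes the Received block; a Received block at the very end of the input would never be printed.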
Output a file, but color matched patterns.
# The '^[' below are raw escape characters, entered at the shell 
# with CTRL+V and hitting escape.
% dmesg | sed -e 's/ath0/^[[33m&^[[0m/g'
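
If typing raw escape characters is awkward, one alternative is to let printf produce the escape byte and splice it into the script. This is a sketch rather than the article's method; the highlight function and the esc variable are made-up names, and the pattern argument is assumed to be a basic regex without '/' in it.

```shell
# Build the escape character with printf so the script holds no raw control bytes.
esc=$(printf '\033')

# Wrap every match of the pattern in $1 in yellow (ESC[33m ... ESC[0m).
highlight() { sed -e "s/$1/${esc}[33m&${esc}[0m/g"; }

echo "ath0: link up" | highlight ath0
```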

Use sed to make a 'section grep' tool

You can use sed to "grep" paragraphs of data using similar techniques to the above mail header example. This script will let you 'grep' whole paragraphs (empty-line-delimited).
#!/bin/sh

if [ $# -eq 0 ] || [ "${1:-}" = "-h" ] ; then
  echo "usage: $0 [-v] pattern [files]"
  # 'return' is only valid inside a function; a script must use 'exit'
  exit 1
fi

func='!d'
if [ "$1" = "-v" ]; then
  # support '-v' like 'grep -v' 
  func='d'
  shift
fi

pattern="$1"
shift

# "$@" passes every remaining file; with none, sed reads standard input
sed -ure '/./{H;$!d;}; '"x;/${pattern}/$func;" "$@"
Call this 'sgrep.sh', put it somewhere, and make it executable. Let's use it to find anything with 'Delete' and 'cycle' in FreeBSD's sed manpage:
% man sed | ./sgrep.sh 'Delete .* cycle' 

     [2addr]d
             Delete the pattern space and start the next cycle.

     [2addr]D
             Delete the initial segment of the pattern space through the first
             newline character and start the next cycle.
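
The core of the script can be tried on a small made-up input to see what both modes do. The paragraphs function and the sample text are assumptions for illustration; -E is used in place of -r.

```shell
# $1 is the pattern; $2 is '!d' (keep matching paragraphs) or 'd' (drop them).
paragraphs() { sed -E -e '/./{H;$!d;}' -e "x;/$1/$2"; }

# Two paragraphs separated by an empty line.
sample() { printf 'a one\na two\n\nb one\nb two\n'; }

sample | paragraphs 'b one' '!d'   # empty line, then the "b" paragraph
sample | paragraphs 'b one' 'd'    # empty line, then the "a" paragraph
```

Each paragraph comes out with a leading empty line because H always puts a newline in front of the line it appends to the hold buffer.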

Bonus notes

  • The 's' function has a 'p' flag, which prints only if a substitution was made.
    # this:
    sed -ne '/foo/ { s/foo/bar/; p }'
    
    # is the same as
    sed -ne 's/foo/bar/p'
    
  • You can insert data into the hold space (or the pattern space) if you really want:
    # Print 'Hello there' before the second line
    % printf 'one\ntwo\nthree\n' | sed -e '2 { x; s/.*/Hello there/; p; x; }'
    one
    Hello there
    two
    three
    

Ok, now what?

Among the filter tools available to you, sed is an extremely useful one: it often lets you describe what you want to do with your text in a shorter, simpler form than awk or perl would. If you wish to venture down the path of unix ninja, then sed should be on your list of commands to understand.

Want to really make your eyes hurt? Check out this calculator written entirely in sed.