data sources - Week of Unix Tools; Day 4

Intro

This week-of-unix-tools series is intended to be a high concentration of information with little fluff. I'll be covering only the GNU versions of the tools, for the sake of sanity in sticking to a single version of each.

Data, where are you?

Data comes from lots of places. Loosely categorized, there are three sources:
  1. Files and devices
  2. Output of other tools
  3. The network (via other tools)

cat

Cat means 'concatenate'. It is mostly useful for a few things (examples below):
  • Cat lots of files together, e.g. 'cat *.c', for processing by another tool, or generally gluing data sets (from files) together
  • Make a shell script more readable by making the input more obvious
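For example (the file names here are made up): the first pipeline glues several data sets together before filtering, and the second exists purely to state the script's input up front.
Glue several data files together for one pass of processing
% cat access.log access.log.1 access.log.2 | grep "GET /index" | wc -l
Same filter on a single file, with the data source made obvious at the head of the pipeline
% cat access.log | grep "GET /index" | wc -l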

nc

Netcat. Basically gives you the ability to talk TCP and UDP from the shell: data you feed it on standard input gets sent, and whatever arrives comes out on standard output. Simple.
TCP client (connect to google.com port 80)
nc google.com 80
TCP server (listen on port 8080)
nc -l 8080
UDP client (connect to ns1.slashdot.org port 53)
nc -u ns1.slashdot.org 53
UDP server (listen on port 5353)
nc -l -u 5353
Examples:
Basic HTTP request
% echo "GET / HTTP/1.0\n" | nc google.com 80 | head -1
HTTP/1.0 200 OK
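Because nc just shuttles bytes between the network and stdin/stdout, it also works as a quick one-shot file transfer tool. A rough sketch, using the same -l syntax as above and made-up host and file names:
Receiver: listen on a port and write whatever arrives to disk
% nc -l 9999 > backup.tar
Sender: push the file to the listener
% nc receiver-host 9999 < backup.tar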

openssl

openssl is a command that just about any Unix-like system will have installed. It can do a great many things, but for this article I'll only cover the s_client subcommand.

'openssl s_client' is essentially 'netcat + SSL'. It is extremely useful for debugging text-based protocols wrapped in SSL, such as NNTP over SSL, IMAPS, and HTTPS.

Example:
Open an HTTPS connection to addons.mozilla.org
% echo "GET / HTTP/1.0\r\n\r\n" \
| openssl s_client -quiet -connect addons.mozilla.org:443 \
| col \
| sed -e '/^$/q'
depth=3 /C=BE/O=GlobalSign nv-sa/OU=Root CA/CN=GlobalSign Root CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
HTTP/1.1 302 Found
Date: Fri, 25 May 2007 10:07:25 GMT
Server: Apache/2.0.52 (Red Hat)
Location: http://www.mozilla.com/
Content-Length: 293
Keep-Alive: timeout=300, max=1000
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
* The 'col' command will strip the \r (carriage return) characters from the http response, allowing sed's /^$/ to match an empty line (end of headers).
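The same approach works for any text protocol behind SSL, not just HTTPS. A sketch against an IMAPS server (the hostname is a placeholder; CAPABILITY and LOGOUT are standard IMAP commands):
Talk IMAP over SSL by hand
% printf 'a1 CAPABILITY\r\na2 LOGOUT\r\n' \
| openssl s_client -quiet -connect imap.example.com:993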

GET/curl/wget/fetch

You can query web servers (HTTP) with any number of tools, and you'll get the raw source or data for any page you query. This is really useful.
  • GET, POST, lwp-request, et al. These come with libwww-perl
  • curl
  • wget
  • fetch (FreeBSD)
Most of the time, when I need to fetch a page to stdout, I use GET, because it's less typing. Here are some examples of the above commands:
Fetch / from www.w3schools.com and output page to stdout
  • GET http://www.w3schools.com/
  • wget -O - -q http://www.w3schools.com/
  • fetch -q -o - http://www.w3schools.com/
  • curl http://www.w3schools.com/
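Once the page is on stdout it's just text, so any of the usual filters apply. For example, a rough link count (simple pattern matching, not a real HTML parse):
Count the anchor tags on the front page
% GET http://www.w3schools.com/ | grep -o '<a ' | wc -l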

w3m/lynx

But what if you don't want the raw HTML from a web page? w3m and lynx can do some basic rendering for you, also to stdout. I recommend w3m over lynx, but use whichever you like.
  • w3m -dump http://www.google.com/
  • lynx -dump http://www.google.com/
w3m's -dump output is the page rendered as plain text, which makes it easy to feed into other tools.
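For example, since the dump goes to stdout, it drops straight into a normal pipeline; grep the rendered page (not the HTML) for whatever string you're after:
% w3m -dump http://www.google.com/ | grep -i search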

ssh

ssh can be a data source too. Run a command on 1000 machines and process the output locally, for fun and profit.

Log in to N systems and get uptime, prefixing the output with the hostname
% echo "fury\ntempest" \
| xargs -n1 -i@ sh -c 'ssh @ "uptime" | sed -e "s/^/@/"'
fury  6:18am  up  2:25,  1 user,  load average: 0.06, 0.04, 0.04
tempest 06:18:00 up  9:01,  2 users,  load average: 0.12, 0.09, 0.09
 
Combining xargs and ssh gives you an easy, powerful way to execute commands on multiple machines, even in parallel.
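For the parallel part, GNU xargs can fan the ssh invocations out with -P. A sketch reusing the host list from above (tune -P to taste; output lines may arrive out of order when runs overlap):
Run uptime on up to 10 hosts at once, prefixing each line with its hostname
% printf 'fury\ntempest\n' \
| xargs -P10 -I@ sh -c 'ssh @ "uptime" | sed -e "s/^/@ /"'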