st
==

simple statistics from the command line interface (CLI)

### Description

Imagine you have this sample file:

    $ cat numbers.txt
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

How do you calculate the sum of the numbers?

#### The traditional way

If you ask around, you'll come up with suggestions like these:

    $ awk '{s+=$1} END {print s}' numbers.txt
    55

    $ perl -lne '$x += $_; END { print $x; }' numbers.txt
    55

    $ sum=0; while read num ; do sum=$(($sum + $num)); done < numbers.txt ; echo $sum
    55

    $ paste -sd+ numbers.txt | bc
    55

Now imagine that you need to calculate the arithmetic mean, median,
or standard deviation...


#### Using st

"st" is a command-line tool to calculate simple statistics from a
file or standard input.

Let's start with "sum":

    $ st --sum numbers.txt
    55

That was easy!

How about mean and standard deviation?

    $ st --mean --stddev numbers.txt
    mean  stddev
    5.5   3.02765

If you don't specify any options, you'll get this output:

    $ st numbers.txt
    N     min   max   sum   mean  stddev
    10    1     10    55    5.5   3.02765

You can switch rows and columns using the "--transpose-output" option:

    $ st --transpose-output numbers.txt
    N       10
    min     1
    max     10
    sum     55
    mean    5.5
    stddev  3.02765

The "--summary" option will provide the five-number summary:

    $ st --summary numbers.txt
    min   q1    median  q3    max
    1     3.5   5.5     7.5   10

And "--complete" will print a complete description:

    $ st --complete numbers.txt
    N   min   q1    median  q3    max   sum   mean  stddev  stderr
    10  1     3.5   5.5     7.5   10    55    5.5   3.02765 0.957427

#### How does it compare with R, Octave and other analytical tools?

"R" and Octave are integrated suites for data manipulation, calculation
and graphical display.

They provide high-level interpreted languages, capabilities for the
numerical solution of linear and nonlinear problems, and for
performing other numerical experiments, including statistical tests,
classification, clustering, etc.

"st" is a simpler solution for simpler problems, focused on descriptive
statistics for small datasets, handy when you need quick results
without leaving the shell.


### Usage

    st [options] [file]

#### Options

##### Functions

    --N|n|count
    --mean|avg|m
    --stddev|sd
    --stderr|sem|se
    --sum|s
    --var|variance

    --min
    --q1
    --median
    --q3
    --max

    --percentile=<0..1>
    --quartile=<1..4>

If no functions are selected, "st" will print the default output:

    N     min  max  sum  mean  stddev

You can also use the following predefined sets of functions:

    --summary   # five-number summary (min q1 median q3 max)
    --complete  # everything

##### Formatting

    --format|fmt|f=<value>  # default: "%g"
    --delimiter|d=<value>   # default: "\t"

    --no-header|nh          # don't display header
    --transpose-output|to   # switch rows and columns

Examples of valid formats ("--format" option):

        %d    signed integer, in decimal
        %e    floating-point number, in scientific notation
        %f    floating-point number, in fixed decimal notation
        %g    floating-point number, in %e or %f notation

##### Input validation

By default, "st" skips invalid input with a warning.

You can change this behavior with the following options:

    --strict   # throws an error, interrupting process
    --quiet|q  # no warning

### Author

Nelson Ferraz <<nferraz@gmail.com>>

### Contribute

Send comments, suggestions and bug reports to:

https://github.com/nferraz/st/issues

Or fork the code on github:

https://github.com/nferraz/st