The Linux SCSI Generic (sg) HOWTO

Douglas Gilbert

      dgilbert@interlog.com
     

Copyright © 2001, 2002 by Douglas Gilbert

2002-05-03
Revision History                                                             
Revision 1.2           2002-05-03            Revised by: dpg                 
ENOMEM, EPERM; DRIVER_SENSE->CHECK_CONDITION                                 
Revision 1.1           2002-01-26            Revised by: dpg                 
corrections, host_status, odd dxfer_len                                      
Revision 1.0           2001-12-21            Revised by: dpg                 
original, displace SCSI-PROGRAMMING-HOWTO                                    


This HOWTO describes the SCSI Generic driver (sg) found in the Linux 2.4
production series of kernels. It focuses on the the interface and
characteristics of the driver that application writers may need to know. The
driver's theory of operations is covered and some brief examples are
included.

Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.1 or any later
version published by the Free Software Foundation; with no Invariant
Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

For an online copy of the license see www.fsf.org/copyleft/fdl.html.

-----------------------------------------------------------------------------
Table of Contents
1. Introduction
2. What the sg driver does
3. Identifying the version of the SG driver
4. Interface
5. Theory of operation
6. The sg_io_hdr_t structure in detail
    6.1. interface_id
    6.2. dxfer_direction
    6.3. cmd_len
    6.4. mx_sb_len
    6.5. iovec_count
    6.6. dxfer_len
    6.7. dxferp
    6.8. cmdp
    6.9. sbp
    6.10. timeout
    6.11. flags
    6.12. pack_id
    6.13. usr_ptr
    6.14. status
    6.15. masked_status
    6.16. msg_status
    6.17. sb_len_wr
    6.18. host_status
    6.19. driver_status
    6.20. resid
    6.21. duration
    6.22. info
   
   
7. System calls
    7.1. open()
    7.2. write()
    7.3. read()
    7.4. poll()
    7.5. close()
    7.6. mmap()
    7.7. fcntl(sg_fd, F_SETFL, oflags | FASYNC)
    7.8. Errors reported in errno
   
   
8. Ioctl()s
    8.1. SG_IO
    8.2. SG_GET_ACCESS_COUNT
    8.3. SG_SET_COMMAND_Q (and _GET_)
    8.4. SG_SET_DEBUG
    8.5. SG_EMULATED_HOST
    8.6. SG_SET_KEEP_ORPHAN (and _GET_)
    8.7. SG_SET_FORCE_LOW_DMA
    8.8. SG_GET_LOW_DMA
    8.9. SG_NEXT_CMD_LEN
    8.10. SG_GET_NUM_WAITING
    8.11. SG_SET_FORCE_PACK_ID
    8.12. SG_GET_PACK_ID
    8.13. SG_GET_REQUEST_TABLE
    8.14. SG_SET_RESERVED_SIZE (and _GET_ )
    8.15. SG_SCSI_RESET
    8.16. SG_GET_SCSI_ID
    8.17. SG_GET_SG_TABLESIZE
    8.18. SG_GET_TIMEOUT
    8.19. SG_SET_TIMEOUT
    8.20. SG_SET_TRANSFORM
    8.21. SG_GET_TRANSFORM
    8.22. Sg ioctls removed in version 3
    8.23. SCSI_IOCTL_GET_IDLUN
    8.24. SCSI_IOCTL_GET_PCI
    8.25. SCSI_IOCTL_PROBE_HOST
    8.26. SCSI_IOCTL_SEND_COMMAND
   
   
9. Direct and Mmap-ed IO
    9.1. Direct IO
    9.2. Mmap-ed IO
   
   
10. Driver and module initialization
11. Sg and the "proc" file system
    11.1. /proc/scsi/sg/debug
   
   
12. Asynchronous usage of sg
A. Sg3_utils package
B. sg_header, the original sg control structure
C. Programming example
D. Debugging
E. Other references

-----------------------------------------------------------------------------
Chapter 1. Introduction

This document outlines the Linux SCSI Generic (sg) driver interface as found
in the 2.4 series kernels. The driver's purpose is to allow SCSI commands to
be sent directly to SCSI devices. The responses of those commands can then be
obtained. This type of driver is sometimes termed as a "pass through". In the
case of SCSI disks, the block subsystem which is normally used to mount and
access a disk, is bypassed permitting low level operations such as formatting
to be performed. Various specialized applications for writing CD-Rs and
document scanning use the sg driver.

Many devices that use other physical buses (e.g. ATAPI cdroms, USB mass
storage devices and IEEE 1394 sbp2 devices) utilize the SCSI command set. By
using Linux pseudo SCSI device drivers which bridge between the native
protocol stack and the SCSI subsystem, the upper level SCSI device drivers,
including sg, can be used to control "non-SCSI" devices.

This is the third major version of the sg driver. A summary of the sg driver
history is as follows:

  * sg version 1 (original) from 1992 to early 1999 (lk 2.2.5) . A copy of
    the original HOWTO (in plain text) is at www.torque.net/sg/p/original/
    SCSI-Programming-HOWTO.txt
   
  * sg version 2 from lk 2.2.6 in the 2.2 series. Its documentation is
    available in abridged form [www.torque.net/sg/p/scsi-generic.txt] and a
    longer form [www.torque.net/sg/p/scsi-generic_long.txt].
   
  * sg version 3 in the linux kernel 2.4 series.
   

This document can be found at the Linux Documentation Project's site at 
www.linuxdoc.org/HOWTO/SCSI-Generic-HOWTO/ . It is available in plain text
and pdf renderings at that site. A (possibly later) version of this document
can be found at www.torque.net/sg/p/sg_v3_ho.html. That is a single html
page; drop the ".html" extension for multi-page html. There are also
postscript, pdf and rtf renderings from the original SGML (docbook) file at
the same location.

A more general description of the Linux SCSI subsystem of which sg is a part
can be found in the SCSI-2.4-HOWTO.

This document was last modified on 3rd May 2002.
-----------------------------------------------------------------------------

Chapter 2. What the sg driver does

The sg driver permits user applications to send SCSI commands to devices that
understand them. SCSI commands are 6, 10, 12 or 16 bytes long [1]. The SCSI
disk driver (sd), once device initialization is complete, only sends SCSI
READ and WRITE commands. There a several other interesting things one might
want to do, for example, perform a low level format or turn on write caching.

Associated with some SCSI commands there is data to be written to the device.
A SCSI WRITE command is one obvious example. When instructed, the sg driver
arranges for data to be transferred to the device along with the SCSI
command. It is possible that the lower level driver (often known as the "Host
Bus Adapter" [HBA] or simply "adapter" driver) is unable to send the command
to the device. An example of this occurs when the device does not respond in
which case a 'host_status' or 'driver-status' error will be conveyed back to
the user application.

All going well the SCSI command (and optionally some data) are conveyed to
the device. The device will respond with a single byte value called the
'scsi_status'. GOOD is the scsi status indicating everything has gone well.
The most common other status is CHECK CONDITION. In this latter case, the
SCSI mid level issues a REQUEST SENSE SCSI command The response of the
REQUEST SENSE is 18 bytes or more in length and is called the "sense buffer".
It will indicate why the original command may not have been executed. It is
important to realize that a CHECK CONDITION may vary in severity from
informative (e.g. command needed to be retried before succeeding) to fatal
(e.g. "medium error" which often indicates it is time to replace the disk).

So in all cases a user application should check the various status values. If
necessary the "sense buffer" will be copied back to the user application.
SCSI commands like READ convey data back to the user application (if they
succeed). The sg driver arranges for this data transfer from the device to
the user space, if necessary.

The description so far has concentrated on a disk device, but in reality the
sg driver is not needed very often for disks because there already is a
purpose built device driver for that: sd. The same is true of reading audio
and data CDs (sr [scd]) and tapes (st). However scanners that understand the
SCSI command set and CDR "burning" programs tend to use the sg driver. Other
applications include tape "robots" and music CD "ripping".

To find out more about SCSI (draft) standards and resources visit www.t10.org
. To use the sg device driver you should be familiar with the SCSI commands
supported by the device that you wish to control. Getting hold of such
information for devices like scanners can be quite challenging (if the vendor
does not provide it).

The first SCSI command sent to a SCSI device when it is initialized is an
INQUIRY. All SCSI devices should respond promptly to an INQUIRY supplying
information such as the vendor, product designation and revision. Appendix C
shows the sg driver being used to send an INQUIRY and print out some of the
information in the response.
-----------------------------------------------------------------------------

Chapter 3. Identifying the version of the SG driver

Earlier versions of the sg device driver either have no version number (e.g.
the original driver) or a version number starting with "2". The drivers that
support this new interface have a major version number of "3". The sg version
numbers are of the form "x.y.z" and the single number given by the
SG_GET_VERSION_NUM ioctl() is calculated by (x * 10000 + y * 100 + z). The sg
driver discussed here will yield a number greater than or equal to 30000 from
SG_GET_VERSION_NUM. The version number can also be seen using cat /proc/scsi/
sg/version in the new driver. This document describes sg version 3.1.24 for
the lk 2.4 series. Where some facility has been added during the lk 2.4
series (e.g. mmap-ed IO) and hence is not available in all versions of the lk
2.4 series, this is noted. [2]

Here is a list of sg versions that have appeared to date during the lk 2.4
series.

  * lk 2.4.0 : sg version 3.1.17
   
  * lk 2.4.7 : sg version 3.1.19 [see include/scsi/sg.h in that or a later
    version for the changelog]
   
  * lk 2.4.10 : sg version 3.1.20 [This version had several changes put into
    it by third parties over the next 6 release kernel versions.]
   
  * lk 2.4.17 : sg version 3.1.22
   
  * lk 2.4.19 : sg version 3.1.24 [lk 2.4.19 hasn't been released at the time
    of writing. It will most likely contains sg version 3.1.24 .]
   

-----------------------------------------------------------------------------
Chapter 4. Interface

This driver supports the following system calls, most of which are typical
for a character device driver in Linux. They are:

  * open()
   
  * close()
   
  * write()
   
  * read()
   
  * ioctl()
   
  * poll()
   
  * fcntl(sg_fd, F_SETFL, oflags | FASYNC)
   
  * mmap()
   

The interface to these calls as seem from Linux applications is well
documented in the "man" pages (in section 2).

A user application accesses the sg driver by using the open() system call on
sg device file name. Each sg device file name corresponds to one
(potentially) attached SCSI device. These are usually found in the /dev
directory. Here are some sg device file names:
$ ls -l /dev/sg[01]                                                          
crw-rw----    1 root     disk      21,   0 Aug 30 16:30 /dev/sg0             
crw-rw----    1 root     disk      21,   1 Aug 30 16:30 /dev/sg1             
The leading "c" at the front of the permissions indicates a character device.
The absence of read or write permissions for "others" is prudent security.
The major number of all sg device names is 21 while the minor number is the
same as the number following "sg" in the device file name. When the device
file system (devfs) is active on a system then the primarily sg device file
names are found at the bottom of an informative subtree:
$ cd /dev/scsi/host1/bus0/target0/lun0                                       
$ ls -l generic                                                              
crw-r-----    1 root     root      21,   1 Dec 31  1969 generic              
Under devfs (when its daemon [devfsd] is running) there would usually be a
symbolic link from /dev/sg1 to /dev/scsi/host1/bus0/target0/lun0/generic.
This is so existing applications looking for the abridged device file name
will not be surprised. One advantage of devfs is that only attached SCSI
devices appear in the /dev/scsi subtree.

A significant addition in sg v3 is an ioctl() called SG_IO which is
functionally equivalent to a write() followed by a blocking read(). In
certain contexts the write()/read() combination have advantages over SG_IO
(e.g. command queuing) and continue to be supported.

The existing (and original) sg interface based on the sg_header structure is
still available using a write()/read() sequence as before. The SG_IO ioctl
will only accept the new interface based on the sg_io_hdr_t structure.

The sg v3 driver thus has a write() call that can accept either the older
sg_header structure or the new sg_io_hdr_t structure. The write() calls
decides which interface is being used based on the second integer position of
the passed header (i.e. sg_header::reply_len or sg_io_hdr_t::
dxfer_direction). If it is a positive number then the old interface is
assumed. If it is a negative number then the new interface is assumed. The
direction constants placed in 'dxfer_direction' in the new interface have
been chosen to have negative values.

If a request is sent to a write() with the sg_io_hdr_t interface then the
corresponding read() that fetches the response must also use the sg_io_hdr_t
interface. The same rule applies to the sg_header interface.

This document concentrates on the sg_io_hdr_t interface introduced in the sg
version 3 driver. For the definition of the older sg_header interface see the
sg version 2 documentation. A brief description is given in Appendix B.
-----------------------------------------------------------------------------

Chapter 5. Theory of operation

The path of a request through the sg driver can be broken into 3 distinct
stages:

 1. The request is received from the user, resources are reserved as required
    (e.g. kernel buffer for indirect IO). If necessary, data in the user
    space is transferred into kernel buffers. Then the request is submitted
    to the SCSI mid level (and then onto the adapter) for execution. The SCSI
    mid level maintains a queue so the request may have to wait. If a SCSI
    device supports command queuing then it may be able to accommodate
    multiple outstanding requests.
   
 2. Assuming the SCSI adapter supports interrupts, then an interrupt is
    received when the request is completed. When this interrupt arrives the
    data transfer is complete. This means that if the SCSI command was a READ
    then the data is in kernel buffers (indirect IO) or in user buffers
    (direct or mmap-ed IO). The sg driver is informed of this interrupt via a
    kernel mechanism called a "bottom half" handler. Some kernel resources
    are freed up.
   
 3. The user makes a call to fetch the result of the request. If necessary,
    data in kernel buffers is transferred to the user space. If necessary,
    the sense buffer is written out to the user space. The remaining kernel
    resources associated with this request are freed up.
   

The write() call performs stage 1 while the read() call performs stage 3. If
the read() call is made before stage 2 is complete then it will either wait
or yield EAGAIN (depending on whether the file descriptor is blocking or
not). If asynchronous notification is being used then stage 2 will send a
SIGPOLL signal to the user process. The poll() system call will show this
file descriptor is now readable (unless it was sent by the SG_IO ioctl()).

The SG_IO ioctl() performs stage 1, waits for stage 2 and then performs stage
3. If the file descriptor in question is set O_NONBLOCK then SG_IO will
ignore this and still block! Also a SG_IO call will not effect the poll()
state nor cause a SIGPOLL signal to be sent. If you really want non-blocking
operation (e.g. for command queuing) then don't use SG_IO; use the write()
read() sequence instead.

For more information about normal (or indirect), direct and mmap-ed IO see 
Chapter 9 .

Currently the sg driver uses one Linux major device number (char 21) which in
the lk 2.4 series limits it to handling 256 SCSI devices. Any attempt to
attach more than this number will rejected with a message being sent to the
console and the log file. [3]
-----------------------------------------------------------------------------

Chapter 6. The sg_io_hdr_t structure in detail

The main control structure for the version 3 SCSI generic driver has a struct
tag name of "sg_io_hdr" and a typedef name of "sg_io_hdr_t". The structure is
shown in abridged form below. The "[i]" notation indicates an input value
while "[o]" indicates a value that is output. The "[i->o]" indicates a value
that is conveyed from input to output and apart from one special case, is not
used by the driver. The "[i->o]" members are meant to aid an application
matching the request sent to a write() to the corresponding response received
by a read(). For pointers the "[*i]" indicates a pointer that is used for
reading from user memory into the driver, "[*o]" is a pointer used for
writing, and "[*io]" indicates a pointer used for either reading or writing.
typedef struct sg_io_hdr                                                     
{                                                                            
    int interface_id;           /* [i] 'S' (required) */                     
    int dxfer_direction;        /* [i] */                                    
    unsigned char cmd_len;      /* [i] */                                    
    unsigned char mx_sb_len;    /* [i] */                                    
    unsigned short iovec_count; /* [i] */                                    
    unsigned int dxfer_len;     /* [i] */                                    
    void * dxferp;              /* [i], [*io] */                             
    unsigned char * cmdp;       /* [i], [*i]  */                             
    unsigned char * sbp;        /* [i], [*o]  */                             
    unsigned int timeout;       /* [i] unit: millisecs */                    
    unsigned int flags;         /* [i] */                                    
    int pack_id;                /* [i->o] */                                 
    void * usr_ptr;             /* [i->o] */                                 
    unsigned char status;       /* [o] */                                    
    unsigned char masked_status;/* [o] */                                    
    unsigned char msg_status;   /* [o] */                                    
    unsigned char sb_len_wr;    /* [o] */                                    
    unsigned short host_status; /* [o] */                                    
    unsigned short driver_status;/* [o] */                                   
    int resid;                  /* [o] */                                    
    unsigned int duration;      /* [o] */                                    
    unsigned int info;          /* [o] */                                    
} sg_io_hdr_t;  /* 64 bytes long (on i386) */                                
-----------------------------------------------------------------------------

6.1. interface_id

This must be set to 'S' (capital ess). If not, the ENOSYS error message is
placed in errno. The idea is to allow interface variants in the future that
identify themselves with a different value. [The parallel port generic driver
(pg) uses the letter 'P' to identify itself.] The type of interface_id is
int.
-----------------------------------------------------------------------------

6.2. dxfer_direction

The type of dxfer_direction is int. This is required to be one of the
following:

  * SG_DXFER_NONE /* e.g. a SCSI Test Unit Ready command */
   
  * SG_DXFER_TO_DEV /* e.g. a SCSI WRITE command */
   
  * SG_DXFER_FROM_DEV /* e.g. a SCSI READ command */
   
  * SG_DXFER_TO_FROM_DEV
   
  * SG_DXFER_UNKNOWN
   

The value SG_DXFER_NONE should be used when there is no data transfer
associated with a command (e.g. TEST UNIT READY). The value SG_DXFER_TO_DEV
should be used when data is being moved from user memory towards the device
(e.g. WRITE). The value SG_DXFER_FROM_DEV should be used when data is being
moved from the device towards user memory (e.g. READ).

The value SG_DXFER_TO_FROM_DEV is only relevant to indirect IO (otherwise it
is treated like SG_DXFER_FROM_DEV). Data is moved from the user space to the
kernel buffers. The command is then performed and most likely a READ-like
command transfers data from the device into the kernel buffers. Finally the
kernel buffers are copied back into the user space. This technique allows
application writers to initialize the buffer and perhaps deduce the number of
bytes actually read from the device (i.e. detect underrun). This is better
done by using 'resid' if it is supported.

The value SG_DXFER_UNKNOWN is for those (rare) situations where the data
direction is not known. It may be useful for backward compatibility of
existing applications when the relevant direction information is not
available in the sg interface layer. There is a (minor) performance "hit"
associated with choosing this option (e.g. on the PCI bus). Some recent
pseudo device drivers (e.g. USB mass storage) may have problems handling this
value (especially on vendor-specific SCSI commands).

N.B. 'dxfer_direction' must have one of the five indicated values and cannot
be uninitialized or zero.

If 'dxfer_len' is zero then all values are treated like SG_DXFER_NONE.
-----------------------------------------------------------------------------

6.3. cmd_len

This is the length in bytes of the SCSI command that 'cmdp' points to. As a
SCSI command is expected an EMSGSIZE error number is produced if the value is
less than 6 or greater than 16. Further, if the SCSI mid level has a further
limit then EMSGSIZE is produced in this case as well. [4] The type of cmd_len
is unsigned char.
-----------------------------------------------------------------------------

6.4. mx_sb_len

This is the maximum size that can be written back to the 'sbp' pointer when a
sense_buffer is output which is usually in an error situation. The actual
number written out is given by 'sb_len_wr'. In all cases 'sb_len_wr' <=
'mx_sb_len' . The type of mx_sb_len is unsigned char.
-----------------------------------------------------------------------------

6.5. iovec_count

This is the number of scatter gather elements in an array pointed to by
'dxferp'. If the value is zero then scatter gather (in the user space) is
_not_ being used and 'dxferp' points to the data transfer buffer. If the
value is greater than zero then each element of the array is assumed to be of
the form:
            typedef struct sg_iovec                                          
            {                                                                
                void * iov_base; /* starting address */                      
                size_t iov_len;  /* length in bytes */                       
            } sg_iovec_t;                                                    
Note that this structure has been named and defined in such a way to parallel
"struct iovec" used by the readv() and writev() system calls in Linux. See
"man 2 readv".

Note that the scatter gather capability offered by 'iovec_count' is unrelated
to the scatter gather capability (often associated with DMA) offered by most
modern SCSI adapters. Furthermore iovec_count's variety of scatter gather
(into the user space) is only available when normal (or "indirect") IO is
being used. Hence when the SG_FLAG_DIRECT_IO or SG_FLAG_MMAP_IO are set in
'flags' then 'iovec_count' should be zero.

The type of iovec_count is unsigned short.
-----------------------------------------------------------------------------

6.6. dxfer_len

This is the number of bytes to be moved in the data transfer associated with
the command. The direction of the transfer is indicated by 'dxfer_direction'.
If 'dxfer_len' is zero then no data transfer takes place. [5]

If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of
iov_len lengths. If not, the minimum of the two is the transfer length. The
type of dxfer_len is unsigned int.
-----------------------------------------------------------------------------

6.7. dxferp

If 'iovec_count' is zero then this value is a pointer to user memory of at
least 'dxfer_len' bytes in length. If there is a data transfer associated
with the command then the data will be transferred to or from this user
memory. If 'iovec_count' is greater than zero then this value points to a
scatter-gather array in user memory. Each element of this array should be an
object of type sg_iovec_t. Note that data is sometimes written to user memory
(e.g. from a failed SCSI READ) even when an error has occurred.

If mmap-ed IO is selected then the value in 'dxferp' is ignored and any data
transfers will be to and from the address returned by the prior mmap() call.

The type of dxferp is void * .
-----------------------------------------------------------------------------

6.8. cmdp

This value points to the SCSI command to be executed. The command is assumed
to be 'cmd_len' bytes long. If cmdp is NULL then the system call yields an
EMSGSIZE error number. The user memory pointed to is only read (not written
to). The type of cmdp is unsigned char * .
-----------------------------------------------------------------------------

6.9. sbp

This value points to user memory of at least 'mx_sb_len' bytes length where
the SCSI sense buffer will be output. Most successful commands do not output
a sense buffer and this will be indicated by 'sb_len_wr' being zero. Note
that there are error conditions that don't result in a sense buffer be
generated. The sense buffer results from the "auto-sense" mechanism in the
SCSI mid-level driver. This mechanism detects a CHECK_CONDITION status and
issues a REQUEST SENSE command and conveys its response back as the "sense
buffer". The type of sbp is unsigned char * .
-----------------------------------------------------------------------------

6.10. timeout

This value is used to timeout the given command. The units of this value are
milliseconds. The time being measured is from when a command is sent until
when sg is informed the request has been completed. A following read() can
take as long as the user likes. Timeouts are best avoided, especially if SCSI
bus resets will adversely effect other devices on that SCSI bus. When the
timeout expires, the SCSI mid level attempts error recovery. Error recovery
completes when the first action in the following list is successful. Note
that a more extreme measure is being taken at each step.

  * the SCSI command that has timed out is aborted [6]
   
  * a SCSI device reset is attempted
   
  * a SCSI bus reset is attempted. Note this may have an adverse effect on
    other devices sharing that SCSI bus.
   
  * a SCSI host (bus adapter) reset is attempted. This is an attempt to
    re-initialize the adapter card associated with the SCSI device that has
    the timed out command.
   

If all these fail then the device may be set "offline" which means that it is
no longer accessible (except by this driver when open()-ed O_NONBLOCK) until
the machine is rebooted. Offline devices still appear in the cat /proc/scsi/
scsi listing. The last column of the cat /proc/scsi/sg/devices listing shows
the online/offline status of a device ("1" means online while "0" is
offline). The exact status returned depends on which level of error recovery
succeeded. Most likely the 'host_status' will be set to DID_ABORT or
DID_RESET.

The two error statuses containing the word "TIME(_)OUT" are typically _not_
related to a command timing out. DID_TIME_OUT in the 'host_status' usually
means an (unexpected) device selection timeout. DRIVER_TIMEOUT in the
'driver_status' byte means the SCSI adapter is unable to control the devices
on its SCSI bus (and has given up).

The type of timeout is unsigned int (and it represents milliseconds).
-----------------------------------------------------------------------------

6.11. flags

These are single or multi-bit values that can be "or-ed" together:

  * SG_FLAG_DIRECT_IO This is a request for direct IO on the data transfer.
    If it cannot be performed then the driver automatically performs indirect
    IO instead. If it is important to find out which type of IO was performed
    then check the values from the SG_INFO_DIRECT_IO_MASK in 'info' when the
    request packet is completed (i.e. after read() or ioctl(,SG_IO,) ). The
    default action is to do indirect IO.
   
  * SG_FLAG_LUN_INHIBIT The default action of the sg driver to overwrite
    internally the top 3 bits of the second SCSI command byte with the LUN
    associated with the file descriptor's device. To inhibit this action set
    this flag. For SCSI 3 (or later) devices, this internal LUN overwrite
    does not occur.
   
  * SG_FLAG_MMAP_IO When set the driver will attempt to procure the reserved
    buffer. If the reserved buffer is occupied (EBUSY) or too small (ENOMEM)
    then the operation (write() or ioctl(SG_IO)) fails. No data transfers
    occur between the dxferp pointer and the reserved buffer (dxferp is
    ignored). In order for a user application to access mmap-ed IO, it must
    have successfully executed an appropriate mmap() system call on this sg
    file descriptor. This precondition is not checked by write() or ioctl
    (SG_IO) when this flag is set. Setting this flag and SG_FLAG_DIRECT_IO
    results in a EINVAL error.
   
  * SG_FLAG_NO_DXFER When set user space data transfers to or from the kernel
    buffers do not take place. This only has effect during indirect IO. This
    flag is for testing bus speed (e.g. the "sg_rbuf" utility uses it).
   

The type of flags is unsigned int.
-----------------------------------------------------------------------------

6.12. pack_id

This value is not normally acted upon by the sg driver. It is provided so the
user can identify the request. This is useful when command queuing is being
used. The "abnormal" case is when SG_SET_FORCE_PACK_ID is set and a 'pack_id'
other than -1 is given to read(). In this case the read() will wait to fetch
a request that matches this 'pack_id'. If this mode is used be careful to set
'dxfer_direction' to a valid value (actually any of the SG_DXFER_* values
will do) on input to the read(), together with the wanted pack_id. The type
of pack_id is int.
-----------------------------------------------------------------------------

6.13. usr_ptr

This value is not acted upon by the sg driver. It is meant to allow the user
to associate some object with this request (e.g. to maintain state
information). The type of usr_ptr is void * .
-----------------------------------------------------------------------------

6.14. status

This is the SCSI status byte as defined by the SCSI standard. Note that it
can have vendor information set in bits 0, 6 and 7 (although this is
uncommon). Further note that this 'status' data does _not_ match the
definitions in <scsi/scsi.h> (e.g. CHECK_CONDITION). The following
'masked_status' does match those definitions. [7] The type of status is
unsigned char .
-----------------------------------------------------------------------------

6.15. masked_status

Logically: masked_status == ((status & 0x3e) >> 1) . So 'masked_status'
strips the vendor information bits off 'status' and then shifts it right one
position. This makes it easier to do things like "if (CHECK_CONDITION ==
masked_status) ..." using the definitions in <scsi/scsi.h>. The defined
values in this file are:

  * GOOD [0x00]
   
  * CHECK_CONDITION [0x01]
   
  * CONDITION_GOOD [0x02]
   
  * BUSY 0x04
   
  * INTERMEDIATE_GOOD 0x08
   
  * INTERMEDIATE_C_GOOD 0x0a
   
  * RESERVATION_CONFLICT 0x0c
   
  * COMMAND_TERMINATED 0x11
   
  * QUEUE_FULL 0x14
   

N.B. 1 bit offset from usual SCSI status values

Note that SCSI 3 defines some additional status codes. [8] The type of
masked_status is unsigned char .
-----------------------------------------------------------------------------

6.16. msg_status

The messaging level in SCSI is under the command level and knowledge of what
is happening at the messaging level is very rarely needed. Furthermore most
modern chip-sets used in SCSI adapters completely hide this value. Nearly all
adapters will return zero in 'msg_status' all the time. The type of
msg_status is unsigned char .
-----------------------------------------------------------------------------

6.17. sb_len_wr

This is the actual number of bytes written to the user memory pointed to by
'sbp'. 'sb_len_wr' is always <= 'mx_sb_len'. Linux 2.2 series kernels (and
earlier) truncate this value to a maximum of 16 bytes. The actual number of
bytes written will not exceed the length indicated by "Additional Sense
Length" field (byte 7) of the Request Sense response. The type of sb_len_wr
is unsigned char .
-----------------------------------------------------------------------------

6.18. host_status

These codes potentially come from the firmware on a host adapter or from one
of several hosts that an adapter driver controls. The 'host_status' field has
the following values whose #defines mimic those which are only visible within
the kernel (with the "SG_ERR_" removed from the front of each define). A copy
of these defines can be found in sg_err.h (see Appendix A):

  * SG_ERR_DID_OK [0x00] NO error
   
  * SG_ERR_DID_NO_CONNECT [0x01] Couldn't connect before timeout period
   
  * SG_ERR_DID_BUS_BUSY [0x02] BUS stayed busy through time out period
   
  * SG_ERR_DID_TIME_OUT [0x03] TIMED OUT for other reason (often this an
    unexpected device selection timeout)
   
  * SG_ERR_DID_BAD_TARGET [0x04] BAD target, device not responding?
   
  * SG_ERR_DID_ABORT [0x05] Told to abort for some other reason. From lk
    2.4.15 the SCSI subsystem supports 16 byte commands however few adapter
    drivers do. Those HBA drivers that don't support 16 byte commands will
    yield this error code if a 16 byte command is passed to a SCSI device
    they control.
   
  * SG_ERR_DID_PARITY [0x06] Parity error. Older SCSI parallel buses have a
    parity bit for error detection. This probably indicates a cable or
    termination problem.
   
  * SG_ERR_DID_ERROR [0x07] Internal error detected in the host adapter. This
    may not be fatal (and the command may have succeeded). The aic7xxx and
    sym53c8xx adapter drivers sometimes report this for data underruns or
    overruns. [9]
   
  * SG_ERR_DID_RESET [0x08] The SCSI bus (or this device) has been reset. Any
    SCSI device on a SCSI bus is capable of instigating a reset.
   
  * SG_ERR_DID_BAD_INTR [0x09] Got an interrupt we weren't expecting
   
  * SG_ERR_DID_PASSTHROUGH [0x0a] Force command past mid-layer
   
  * SG_ERR_DID_SOFT_ERROR [0x0b] The low level driver wants a retry
   

The type of host_status is unsigned short .
-----------------------------------------------------------------------------

6.19. driver_status

One driver can potentially control several host adapters. For example
Advansys provide one Linux adapter driver that controls all adapters made by
that company - if 2 of more Advansys adapters are in 1 machine, then 1 driver
controls both. When ('driver_status' & SG_ERR_DRIVER_SENSE) is true the
'sense_buffer' is also output. The 'driver_status' field has the following
values whose #defines mimic those which are only visible within the kernel
(with the "SG_ERR_" removed from the front of each define). A copy of these
defines can be found in sg_err.h (see the utilities section):

  * SG_ERR_DRIVER_OK [0x00] Typically no suggestion
   
  * SG_ERR_DRIVER_BUSY [0x01]
   
  * SG_ERR_DRIVER_SOFT [0x02]
   
  * SG_ERR_DRIVER_MEDIA [0x03]
   
  * SG_ERR_DRIVER_ERROR [0x04]
   
  * SG_ERR_DRIVER_INVALID [0x05]
   
  * SG_ERR_DRIVER_TIMEOUT [0x06] Adapter driver is unable to control the SCSI
    bus to its is setting its devices offline (and giving up)
   
  * SG_ERR_DRIVER_HARD [0x07]
   
  * SG_ERR_DRIVER_SENSE [0x08] Implies sense_buffer output
   
  * above status 'or'ed with one of the following suggestions
   
  * SG_ERR_SUGGEST_RETRY [0x10]
   
  * SG_ERR_SUGGEST_ABORT [0x20]
   
  * SG_ERR_SUGGEST_REMAP [0x30]
   
  * SG_ERR_SUGGEST_DIE [0x40]
   
  * SG_ERR_SUGGEST_SENSE [0x80]
   

The type of driver_status is unsigned short .
-----------------------------------------------------------------------------

6.20. resid

This is the residual count from the data transfer. It is 'dxfer_len' less the
number of bytes actually transferred. In practice it only reports underruns
(i.e. positive number) as data overruns should never happen. This value will
be zero if there was no underrun or the SCSI adapter doesn't support this
feature. [10] The type of resid is int .
-----------------------------------------------------------------------------

6.21. duration

This value will be the number of milliseconds from when a SCSI command was
sent until sg is informed that it is complete. For i386 machines the
granularity is 10ms while on alpha machines it is 1ms. This value is rounded
toward zero. The type of duration is unsigned int .
-----------------------------------------------------------------------------

6.22. info

This value is designed to convey useful information back to the user about
the associated request. This information does not necessarily indicate an
error. Several single bit and multi-bit fields are "or-ed" together to make
this value.

A single bit component contained in SG_INFO_OK_MASK indicates whether some
error or status field is non-zero. If either 'masked_status', 'host_status'
or 'driver_status' are non-zero then SG_INFO_CHECK is set. The associated
values are:

  * SG_INFO_OK_MASK [0x1]
   
  * SG_INFO_OK [0x0] no sense, host nor driver "noise"
   
  * SG_INFO_CHECK [0x1] something abnormal happened. In most but not all
    cases, the sense buffer will be written. If the sense buffer has not been
    written than 'sb_len_wr' will be zero. This flag indicates either
    'masked_status', 'host_status' or 'driver_status' is non-zero.
   

A multi bit component contained in SG_INFO_DIRECT_IO_MASK indicates what type
of data transfer has just taken place. If indirect IO (or no data transfer)
has taken place then SG_INFO_INDIRECT_IO is matched. Note that even if direct
IO was requested in 'flags' the driver may choose to do indirect IO instead.
If direct IO was requested and performed then SG_INFO_DIRECT_IO will be
matched. Currently SG_INFO_MIXED_IO is never set. The associated values are:

  * SG_INFO_DIRECT_IO_MASK [0x6]
   
  * SG_INFO_INDIRECT_IO [0x0] data xfer via kernel buffers (or no xfer)
   
  * SG_INFO_DIRECT_IO [0x2]
   
  * SG_INFO_MIXED_IO [0x4] part direct, part indirect IO
   

The type of info is unsigned int .
-----------------------------------------------------------------------------

Chapter 7. System calls

System calls that can be used on sg devices are discussed in this chapter.
The ioctl() system call is discussed in the following chapter [ see Chapter 8
].

Successfully opening a sg device file name (e.g. /dev/sg0) establishes a link
between a file descriptor and an attached SCSI device. The sg driver
maintains state information and resources at both the SCSI device (e.g.
exclusive lock) and the file descriptor (e.g. reserved buffer) levels.

A SCSI device can be detached while an application has a sg file descriptor
open. An example of this is a "hotplug" device such as a USB mass storage
device that has just been unplugged. Most subsequent system calls that
attempt to access the detached SCSI device will yield ENODEV. The close()
call will complete silently while the poll() call will "or" in POLLHUP to its
result. A subsequent attempt to open() that device name will yield ENODEV.
-----------------------------------------------------------------------------

7.1. open()

open(const char * filename, int flags). The filename should be a sg device
file name as discussed in the Chapter 4. Flags can be a number of the
following or-ed together:

  * O_RDONLY restricts operations to read()s and ioctl()s (i.e. can't use
    write() ).
   
  * O_RDWR permits all system calls to be executed.
   
  * O_EXCL waits for other opens on the associated SCSI device to be closed
    before proceeding. If O_NONBLOCK is set then yields EBUSY when someone
    else has the SCSI device open. The combination of O_RDONLY and O_EXCL is
    disallowed.
   
  * O_NONBLOCK Sets non-blocking mode. Calls that would otherwise block yield
    EAGAIN (e.g. read() ) or EBUSY (e.g. open() ). This flag is ignored by
    ioctl(SG_IO) .
   

Either O_RDONLY or O_RDWR must be set in flag. Either of the other 2 flags
(but not both) can be or-ed in.

Note that multiple file descriptors may be open to the same SCSI device.
[This is a way of side stepping the SG_MAX_QUEUE limit.] At the sg level
separate state information is maintained. This means that even if multiple
file descriptors are open to a single SCSI device their write() read()
sequences are essentially independent.

Open() calls may be blocked due to exclusive locks (i.e. O_EXCL). An
exclusive lock applies to a single SCSI device and only to sg's use of that
device (i.e. it has no effect on access via sd, sr or st to that device). If
the O_NONBLOCK flag is used then open() calls that would have otherwise
blocked, yield EBUSY. Applications that scan sg devices trying to determine
their identity (e.g. whether one is a scanner) should use the O_NONBLOCK flag
otherwise they run the risk of blocking.

The driver will attempt to reserve SG_DEF_RESERVED_SIZE bytes (32KBytes in
the current sg.h) on open(). The size of this reserved buffer can
subsequently be modified with the SG_SET_RESERVED_SIZE ioctl(). In both cases
these are requests subject to various dynamic constraints. The actual amount
of memory obtained can be found by the SG_GET_RESERVED_SIZE ioctl(). The
reserved buffer will be used if:

  * it is not already in use (e.g. when command queuing is in use)
   
  * a write() or ioctl(SG_IO) requests a data transfer size that is less than
    or equal to the reserved buffer size.
   

Returns a file descriptor if >= 0 , otherwise -1 implies an error.
-----------------------------------------------------------------------------

7.2. write()

write(int sg_fd, const void * buffer, size_t count). The action of write()
with a control block based on struct sg_header is discussed in the earlier
document: www.torque.net/sg/p/scsi-generic.txt (i.e the sg version 2
documentation). This section describes the action of write() when it is given
a control block based on struct sg_io_hdr.

The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should
be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the
write() call succeeds then the 'count' is returned as the result.

Up to SG_MAX_QUEUE (16) write()s can be queued up before any finished
requests are completed by read(). An attempt to queue more than that will
result in an EDOM error. [11] The write() command should return more or less
immediately. [12]

The version 2 sg driver defaulted the maximum queue length to 1 (and made
available the SG_SET_COMMAND_Q ioctl() to switch it to SG_MAX_QUEUE). So for
backward compatibility a file descriptor that only receives sg_header
structures in its write() will have a default "max" queue length of 1. As
soon as a sg_io_hdr_t structure is seen by a write() then the maximum queue
length is switched to SG_MAX_QUEUE on that file descriptor.

The "const" on the 'buffer' pointer is respected by the sg driver. Data is
read in from the sg_io_hdr object that is pointed to. Significantly this is
when the 'sbp' and the 'dxferp' are recorded internally (i.e. not from the
sg_io_hdr object given to the corresponding read() ).
-----------------------------------------------------------------------------

7.3. read()

read(int sg_fd, void * buffer, size_t count). The action of read() with a
control block based on struct sg_header is discussed in the earlier document:
www.torque.net/sg/p/scsi-generic.txt (i.e. the sg version 2 documentation).
This section describes the action of read() when it is given a control block
based on struct sg_io_hdr.

The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should
be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the
read() call succeeds then the 'count' is returned as the result.

By default, read() will return the oldest completed request that is queued
up. A read() will not interfere with any request associated with the SG_IO
ioctl() on this file descriptor except in a special case when a SG_IO ioctl()
is interrupted by a signal.

If the SG_SET_FORCE_PACK_ID,1 ioctl() is active then read() will attempt to
fetch the packet whose pack_id (given earlier to write()) matches the
sg_io_hdr_t::pack_id given to this read(). If not available it will either
wait or yield EAGAIN. As a special case, -1 in sg_io_hdr_t::pack_id given to
read() will match the request whose response has been waiting for the longest
time. Take care to also set 'dxfer_direction' to any valid value (e.g.
SG_DXFER_NONE) when in this mode. The 'interface_id' member should also be
set appropriately.

Apart from the SG_SET_FORCE_PACK_ID case (and then only for the 3 indicated
fields), the sg_io_hdr_t object given to read() can be uninitialized. Note
that the 'sbp' pointer value for optionally outputting a sense buffer was
recorded from the earlier, corresponding write().
-----------------------------------------------------------------------------

7.4. poll()

poll(struct pollfd *ufds, unsigned int nfds, int timeout). This call can be
used to check the state of a sg file descriptor. It will always respond
immediately. Typical usages are to periodically poll the state of a sg file
descriptor and to determine why a SIG_IO signal was received.

For file descriptors associated with sg devices:

  * POLLIN one or more responses is awaiting a read()
   
  * POLLOUT command can be sent to write() without causing an EDOM error
    (i.e. sufficient space on sg's queues)
   
  * POLLHUP SCSI device has been detached, awaiting cleanup
   
  * POLLERR internal structures are inconsistent
   

POLLOUT indicates the sg will not block a new write() or SG_IO ioctl().
However it is still possible (but unlikely) that the mid level or an adapter
may block (or yield EAGAIN).
-----------------------------------------------------------------------------

7.5. close()

close(int sg_fd). Preferably a close() should be done after all issued write
()s have had their corresponding read() calls completed. Unfortunately this
is not always possible (e.g. the user may choose to send a kill signal to a
running process). The sg driver implements "fast" close semantics and thus
will return more or less immediately (i.e. not wait on any event). This is
application friendly but requires the sg driver to arrange for an orderly
cleanup of those packets that are still "in flight".

When close() leaves outstanding SCSI commands still awaiting responses, the
sg driver maintains its internal structures for the now defunct file
descriptor. These internal structures are maintained until all outstanding
responses (some might be timeouts) are received. When the sg driver is loaded
as a module and has any open file descriptors or "defunct" file descriptors
then it cannot be unloaded. An attempt to call rmmod sg will report the
driver is busy. Defunct file descriptors that remain for some time, perhaps
awaiting a timeout, can be observed with the cat /proc/scsi/sg/debug command.
In this case "closed=1" will be set on the defunct file descriptor [see 
Section 11.1]. Defunct file descriptors do not impede attempts by
applications to open() new file descriptors on the same SCSI device.

The kernel arranges for only the last close() on a file descriptor to be seen
by a driver (and to emphasize this, the corresponding sg driver call is named
sg_release() rather than sg_close()). This is only significant when an
application uses fork() or dup().

Returns 0 if successful, otherwise -1 implies an error.
-----------------------------------------------------------------------------

7.6. mmap()

mmap(void * start, size_t length, int prot, int flags, int sg_fd, off_t
offset). This system call returns a pointer to the beginning of the reserved
buffer associated with the sg file descriptor 'sg_fd'. The 'start' argument
is a hint to the kernel and is ignored by this driver; best set it to 0. The
'length' argument should be less than or equal to the size of the reserved
buffer associated with 'sg_fd'. If it exceeds the reserved buffer size (after
'length' has been rounded up to a page size multiple) then MAP_FAILED is
returned and ENOMEM is placed in errno. The 'prot' argument should either be
PROT_READ or (PROT_READ | PROT_WRITE). The 'flags' argument should contain
MAP_SHARED. In a sense, the user application is "sharing" data with the sg
driver. The MAP_PRIVATE flag does not play well with compiler optimization
flags such as '-O2'. The 'offset' argument must be set to 0 (or NULL).

The mmap() system call can be made multiple times on the same sg_fd. The
munmap() system call is not required if close() is called on sg_fd. Mmap-ed
IO is well-behaved when a process is fork()-ed (or the equivalent finer
grained clone() system call is made). In the case of a fork(), 2 processes
will be sharing the same memory mapped area together with the sg driver for a
sg_fd and the last one to close the sg_fd (or exit) will cause the shared
memory to be freed.

It is assumed that if the default reserved buffer size of 32 KB is not
sufficient then a ioctl(SG_SET_RESERVED_SIZE) call is made prior to any calls
to mmap(). If the required size is not a multiple of the kernel's page size
(returned by getpagesize() system call) then the size passed to ioctl
(SG_SET_RESERVED_SIZE) should be rounded up to the next page size multiple.

Mmap-ed IO is requested by setting (or or-ing in) the SG_FLAG_MMAP_IO
constant into the flag member of the the sg_io_hdr structure prior to a call
to write() or ioctl(SG_IO). The logic to do mmap-ed IO _assumes_ that an
appropriate mmap() call has been made by the application. In other words it
does not check. [13]
-----------------------------------------------------------------------------

7.7. fcntl(sg_fd, F_SETFL, oflags | FASYNC)

fcntl(int sg_fd, int cmd, long arg). There are several uses for this system
call in association with a sg file descriptor. The following pseudo code
shows code that is useful for scanning the sg devices, taking care not to be
caught in a wait for an O_EXCL lock by another process, and when the
appropriate device is found, switching to normal blocked io. A working
example of this logic is in the sg_scan utility program.
open("/dev/sg0", O_RDONLY | O_NONBLOCK)                                      
/* check device, EBUSY means some other process has O_EXCL lock on it */     
/* when the device you want is found then ... */                             
flags = fcntl(sg_fd, F_GETFL)                                                
fcntl(sg_fd, F_SETFL, flags & (~ O_NONBLOCK))                                
/* since, with simple apps, it is easier to use normal blocked io */         

The sg driver supports asynchronous notification. This is a non-blocking mode
of operation in which, when the driver receives data back from a device so
that a read() can be done, it sends a SIGPOLL (aka SIGIO) signal to the
owning process. Here is a code snippet from the sg_poll test program.
sigemptyset(&sig_set)                                                        
sigaddset(&sig_set, SIGPOLL)                                                 
sigaction(SIGPOLL, &s_action, 0)                                             
fcntl(sg_fd, F_SETOWN, getpid())                                             
flags = fcntl(sg_fd, F_GETFL);                                               
fcntl(sg_fd, F_SETFL, flags | O_ASYNC)                                       
-----------------------------------------------------------------------------

7.8. Errors reported in errno

With the original interface almost any string could be accidentally given to
write() and potentially (but rarely) something nasty could happen. If some
error was detected then more than likely EIO was placed in errno.

Unfortunately this can still happen with write() since it can accept both the
original struct sg_header or the newer sg_io_hdr_t described in this note.
However since the SG_IO ioctl() will only accept the sg_io_hdr_t structure
there is less chance of a random string being interpreted as a command. Since
the sg_io_hdr_t interface does a lot more error checking, it attempts to give
out more precise errno values to help the user pinpoint the problem.
[Admittedly some of these errno values are picked in an arbitrary way from
the large set of available values.]

In most cases when a system call on a sg file descriptor fails, the call in
question will return -1. After an application detects that a system call has
failed it should read the value in the "errno" variable (prior to do any more
system calls). Applications should include the <errno.h> header.

Below is a table of errno values indicating which calls to sg will generate
them and the meaning of the error. A write() call is indicated by "w", a read
() call by "r" and an open() call by "o".

errno    which_calls    Meaning                                               
-----    -----------    ----------------------------------------------        
EACCES    <some ioctls> Root permission (more precisely CAP_SYS_ADMIN         
                        or CAP_SYS_RAWIO) required. Also may occur during     
                        an attempted write to /proc/scsi/sg files.            
EAGAIN    r             The file descriptor is non-blocking and the request   
                        has not been completed yet.                           
EAGAIN    w,SG_IO       SCSI sub-system has (temporarily) run out of          
                        command blocks.                                       
EBADF     w             File descriptor was not open()ed O_RDWR.              
EBUSY     o             Someone else has an O_EXCL lock on this device.       
EBUSY     w             With mmap-ed IO, the reserved buffer already in use.  
EBUSY     <some ioctls> Attempt to change something (e.g. reserved buffer     
                        size) when the resource was in use.                   
EDOM      w,SG_IO       Too many requests queued against this file            
                        descriptor. Limit is SG_MAX_QUEUE active requests.    
                        If sg_header interface is being used then the         
                        default queue depth is 1. Use SG_SET_COMMAND_Q        
                        ioctl() to increase it.                               
EFAULT    w,r,SG_IO     Pointer to user space invalid.                        
          <most ioctls>                                                       
EINVAL    w,r           Size given as 3rd argument not large enough for the   
                        sg_io_hdr_t structure. Both direct and mmap-ed IO     
                        selected.                                             
EIO       w             Size given as 3rd argument less than size of old      
                        header structure (sg_header). Additionally a write()  
                        with the old header will yield this error for most    
                        detected malformed requests.                          
EIO       r             A read() with the older sg_header structure yields    
                        this value for some errors that it detects.           
EINTR     o             While waiting for the O_EXCL lock to clear this call  
                        was interrupted by a signal.                          
EINTR     r,SG_IO       While waiting for the request to finish this call     
                        was interrupted by a signal.                          
EINTR     w             [Very unlikely] While waiting for an internal SCSI    
                        resource this call was interrupted by a signal.       
EMSGSIZE  w,SG_IO       SCSI command size ('cmd_len') was too small           
                        (i.e. < 6) or too large                               
ENODEV    o             Tried to open() a file with no associated device.     
                        [Perhaps sg has not been built into the kernel or     
                        is not available as a module?]                        
ENODEV    o,w,r,SG_IO   SCSI device has detached, awaiting cleanup.           
                        User should close fd. Poll() will yield POLLHUP.      
ENOENT    o             Given filename not found.                             
ENOMEM    o             [Very unlikely] Kernel was not even able to find      
                        enough memory for this file descriptor's context.     
ENOMEM    w,SG_IO       Kernel unable to find memory for internal buffers.    
                        This is usually associated with indirect IO.          
                        For mmap-ed IO 'dxfer_len' greater than reserved      
                        buffer size.                                          
                        Lower level (adapter) driver does not support enough  
                        scatter gather elements for requested data transfer.  
ENOSYS    w,SG_IO       'interface_id' of a sg_io_hdr_t object was _not_ 'S'. 
ENXIO     o             "remove-single-device" may have removed this device.  
ENXIO     o, w,r,SG_IO  Internal error (including SCSI sub-system busy doing  
                        error processing - e.g. SCSI bus reset). When a       
                        SCSI device is offline, this is the response. This    
                        can be bypassed by opening O_NONBLOCK.                
EPERM     o             Can't use O_EXCL when open()ing with O_RDONLY         
EPERM     w,SG_IO       File descriptor open()-ed O_RDONLY but O_RDWR         
          <some ioctls> access mode needed for this operation.                
-----------------------------------------------------------------------------

Chapter 8. Ioctl()s

The Linux SCSI upper level drivers, including sg, have a "trickle down" ioctl
() architecture. This means that ioctl()s whose request value (i.e. the
second argument) is not understood by the upper level driver, are passed down
to the SCSI mid-level. Those ioctl()s that are not understood by the mid
level driver are passed down to the lower level (adapter) driver. If none of
the 3 levels understands the ioctl() request value then -1 is returned and
EINVAL is placed in errno. By convention the beginning of the request value's
symbolic name indicates which level will respond to the ioctl(). For example,
request values starting with "SG_" are processed by the sg driver while those
starting with "SCSI_" are processed by the mid level.

Most of the sg ioctl()s read or write information via a pointer given as the
third argument to the ioctl() call and return 0 on success. A few of the
older ioctl()s that get a value from the driver return that value as the
result of the ioctl() call (e.g. ioctl(SG_GET_TIMEOUT) ).

All sg driver ioctl()s are listed below. They all start with "SG_". They are
followed by several interesting SCSI mid level ioctl()s which start with
"SCSI_IOCTL_". The sg ioctl()s are roughly in alphabetical order (with _SET_,
_GET_ and _FORCE_ ignored). Since ioctl(SG_IO) is a complete SCSI command
request/response sequence then it is listed first.
-----------------------------------------------------------------------------

8.1. SG_IO

SG_IO 0x2285. The idea is deceptively simple: just hand a sg_io_hdr_t object
to an ioctl() and it will return when the SCSI command is finished. It is
logically equivalent to doing a write() followed by a blocking read(). The
word "blocking" here implies the read() will wait until the SCSI command is
complete.

The same file descriptor can be used both for SG_IO synchronous calls and the
write() read() sequences at the same time. The sg driver makes sure that the
response to a SG_IO call will never accidentally be fetched by a read(). Even
though a single file descriptor can be shared in this manner, it is probably
more sensible (and results in cleaner code) if separate file descriptors to
the same SCSI device are used in this case.

It is possible that the wait for the command completion is interrupted by a
signal. In this case the SG_IO call will yield an EINTR error. This is
reasonably complex to handle and is discussed in the ioctl
(SG_SET_KEEP_ORPHAN) description below. The following SCSI commands will be
permitted by SG_IO when the sg file descriptor was opened O_RDONLY:

  * TEST UNIT READY
   
  * REQUEST SENSE
   
  * INQUIRY
   
  * READ CAPACITY
   
  * READ BUFFER
   
  * READ(6) (10) and (12)
   
  * MODE SENSE(6) and (10)
   
  * LOG SENSE
   

All commands to SCSI device type SCANNER are accepted. Other cases yield an
EPERM error. Note that the write() read() interface must have the sg file
descriptor open()-ed with O_RDWR as write permission is required by Linux to
execute a write() system call.

The ability of the SG_IO ioctl() to issue certain SCSI commands has led to
some relaxation on file descriptors open()ed "read-only" compared with the
version 2 sg driver. The open() call will now attempt to allocate a reserved
buffer for all newly opened file descriptors. The ioctl(SG_SET_RESERVED_SIZE)
will now work on "read-only" file descriptors.
-----------------------------------------------------------------------------

8.2. SG_GET_ACCESS_COUNT

SG_GET_ACCESS_COUNT 0x2289. This ioctl() yields the access count maintained
by the mid level for this SCSI device. This number is incremented by each
open() call done by the upper level SCSI drivers (i.e. sd, sr, st and sg) and
decremented by those drivers' release(). [A driver's release() corresponds to
the last close() on a file descriptor, or is supplied by the kernel when a
process is aborted.] Each SCSI device has a separate access count.
-----------------------------------------------------------------------------

8.3. SG_SET_COMMAND_Q (and _GET_)

SG_SET_COMMAND_Q 0x2271 [_GET_ 0x2270] . The default it the original sg
driver was not to allow commands to be queued on the same file descriptor
(actually it was more restrictive, commands could not be queued on a SCSI
device). The version 2 sg driver kept this action as its default (for
backward compatibility) and offered these ioctl()s to change and monitor the
command queuing state.
-----------------------------------------------------------------------------

8.4. SG_SET_DEBUG

SG_SET_DEBUG 0x227e. The third argument is assumed to point to an int. The
default value is 0. If this call is made pointing to an int greater than 0
then any SCSI request that is issued that results in the SCSI status of
CHECK_CONDITION (or COMMAND_TERMINATED) will cause a message to be sent to
the log (and perhaps the console). The message is information derived from
the sense buffer (i.e. the SCSI error message) and it is prefixed with
"sg_cmd_done_bh".

The other actions of debug mode performed in version 2 of the sg driver have
been removed as they are no longer needed. The internal state of the sg
driver can now be found by viewing the output of cat /proc/scsi/sg/debug.
-----------------------------------------------------------------------------

8.5. SG_EMULATED_HOST

SG_EMULATED_HOST 0x2203. Assumes 3rd argument points to an int and outputs a
flag indicating whether the host (adapter) is connected to a "real" SCSI bus
or is an emulated one (e.g. ide-scsi or usb storage device driver). A value
of 1 means emulated while 0 is not. [To check: is IEEE1394 a "real" SCSI
serial bus?]
-----------------------------------------------------------------------------

8.6. SG_SET_KEEP_ORPHAN (and _GET_)

SG_SET_KEEP_ORPHAN 0x2287 [_GET_ 0x2288]. These ioctl()s allow the setting
and reading of the "keep_orphan" flag. This controls what happens to the
request associated with a SG_IO ioctl() that is interrupted (i.e. errno is
EINTR). The default action is to drop the response as soon as it is received.
This corresponds to the "keep_orphan" flag being 0. When the "keep_orphan"
flag is 1 then the response is transformed in such a way that it can be
fetched by a read(). This is the only circumstance in which a request sent by
a SG_IO ioctl() can have the associated response fetched by a read().
-----------------------------------------------------------------------------

8.7. SG_SET_FORCE_LOW_DMA

SG_SET_FORCE_LOW_DMA 0x2279. Assumes 3rd argument points to an int containing
0 or 1. 0 (default) means sg decides whether to use memory above 16 Mbyte
level (on i386) based on the host adapter being used by this SCSI device.
Typically PCI SCSI adapters will indicate they can DMA to the whole 32 bit
address space. If 1 is given then the host adapter is overridden and only
memory below the 16MB level is used for DMA. A requirement for this should be
extremely rare. If the "reserved" buffer allocated on open() is not in use
then it will be de-allocated and re-allocated under the 16MB level (and the
latter operation could fail yielding ENOMEM). Only the current file
descriptor is affected.
-----------------------------------------------------------------------------

8.8. SG_GET_LOW_DMA

SG_GET_LOW_DMA 0x227a. Assumes 3rd argument points to an int and places 0 or
1 in it. 0 indicates the whole 32 bit address space is being used for DMA
transfers on this file descriptor. 1 indicates the memory below the 16MB
level (on i386) is being used (and this may be the case because the host
adapters setting has been overridden by SG_SET_FORCE_LOW_DMA,1 .
-----------------------------------------------------------------------------

8.9. SG_NEXT_CMD_LEN

SG_NEXT_CMD_LEN 0x2283. This ioctl() is not required with sg_io_hdr structure
since command length is set explicitly for every command. Assumes 3rd
argument is pointing to an int. The value of the int (if > 0) will be used as
the SCSI command length of the next SCSI command sent to a write() using the
sg_header interface. After that write() the SCSI command length logic is
reset to use automatic length detection (i.e. depending on SCSI command group
and the 'twelve_byte' field). If the current SCSI command length maximum of
16 is exceeded then the affected write() will yield an EDOM error. Giving
this ioctl() a value of 0 will set automatic length detection for the next
write(). N.B. Only the following write() on this fd is affected by this ioctl
().
-----------------------------------------------------------------------------

8.10. SG_GET_NUM_WAITING

SG_GET_NUM_WAITING 0x227d. Assumes 3rd argument points to an int and places
the number of packets waiting to be read in it. Only those requests that have
been issued by a write() and are now available to be read() are counted. In
other words any ioctl(SG_IO) operations underway on this file descriptor will
not effect this count [14].
-----------------------------------------------------------------------------

8.11. SG_SET_FORCE_PACK_ID

SG_SET_FORCE_PACK_ID 0x227b. Assumes 3rd argument is pointing to an int. 0
(default) instructs read() to return the oldest (written) packet if multiple
packets are waiting to be read. 1 instructs read() to view the sg_io_hdr::
pack_id (or sg_header::pack_id) as input and return the oldest packet
matching that pack_id or wait until it arrives. If the file descriptor is in
O_NONBLOCK state, rather than wait this ioctl() will yield EAGAIN. As a
special case the pack_id of -1 given to read() in the mode will match the
oldest packet. Only the current file descriptor is affected by this command.
-----------------------------------------------------------------------------

8.12. SG_GET_PACK_ID

SG_GET_PACK_ID 0x227c. Assumes 3rd argument points to an int and places the
pack_id of the oldest (written) packet in it. If no packet is waiting to be
read then yields -1.
-----------------------------------------------------------------------------

8.13. SG_GET_REQUEST_TABLE

SG_GET_REQUEST_TABLE 0x2286. This ioctl outputs an array of information about
the status of requests associated with the current file descriptor. Its 3rd
argument should point to memory large enough to receive SG_MAX_QUEUE objects
of the sg_req_info_t structure. This structure has the following members:
        req_state                                                            
            0 -> request not in use                                          
            1 -> request has been sent, but is not finished (i.e. it is      
                 between stages 1 and 2 in the "theory of operation")        
            2 -> request is ready to be read() (i.e. it is between stages    
                 2 and 3 in the "theory of operation")                       
        orphan                                                               
            0 -> normal request                                              
            1 -> request sent by SG_IO ioctl() which has been interrupted    
                 by a signal                                                 
        sg_io_owned                                                          
            0 -> request sent by a write()                                   
            1 -> request sent by a SG_IO ioctl()                             
        problem                                                              
            0 -> no problem (or 1 == req_state)                              
            1 -> req_state is 2 and either masked_status, host_status or     
                 driver_status is non-zero                                   
        duration                                                             
            [if 1 == req_state] time since request was sent (in millisecs)   
            [if 2 == req_state] duration of request (in millisecs). Clock    
                 is stopped when stage 2 in "theory of operation" is         
                 reached                                                     
        pack_id                                                              
        usr_ptr                                                              
            these are user provided values in the sg_io_hdr_t (or            
            struct sg_header) that sent the request                          
-----------------------------------------------------------------------------

8.14. SG_SET_RESERVED_SIZE (and _GET_ )

SG_SET_RESERVED_SIZE 0x2275 [_GET_ 0x2272]. Both ioctl()s assume the 3rd
argument is pointing to an int.

For ioctl(SG_SET_RESERVED_SIZE) the value will be used to request a new
reserved buffer of that size. The previous reserved buffer is freed (if it is
not in use; if it was in use then the ioctl() fails and EBUSY is placed in
errno). A new reserved buffer is then allocated and its actual size can be
found by calling the ioctl(SG_GET_RESERVED_SIZE). The reserved buffer is then
used for DMA purposes by subsequent write() and ioctl(SG_IO) commands if it
is not already in use and if the write() is not calling for a buffer size
larger than that reserved. The reserved buffer may well be a series of kernel
buffers if the adapter supports scatter-gather. Large buffers can be
requested (e.g. 4 MB) but not necessarily granted. Once a mmap() call has
been made on a sg file descriptor, subsequent calls to this ioctl() will fail
with EBUSY placed in errno.

In the case of ioctl(SG_GET_RESERVED_SIZE) the size in bytes of the reserved
buffer from open() or the most recent SG_SET_RESERVED_SIZE ioctl() call on
this fd. The result can be 0 if memory is very tight. In this case it may not
be wise to attempt something like burning a CD on this file descriptor.
-----------------------------------------------------------------------------

8.15. SG_SCSI_RESET

SG_SCSI_RESET 0x2284. Assumes 3rd argument points to an int. That int should
be one of the following defined in the sg.h header:

  * SG_SCSI_RESET_NOTHING (0x0): can be used to poll the device after a reset
    has been issued to see if it has returned to the normal state. If it is
    still being reset or it is offline then EBUSY will be placed in errno,
   
  * SG_SCSI_RESET_DEVICE (0x1): issues a reset to the SCSI device associated
    with the current sg file descriptor,
   
  * SG_SCSI_RESET_BUS (0x2): issues a reset to the SCSI bus that contains the
    device associated with the current sg file descriptor. This will usually
    have an adverse effect on any other SCSI device sharing this SCSI bus,
    especially if it was in the middle of an operation,
   
  * SG_SCSI_RESET_HOST (0x3): issues a reset to the host that controls the
    SCSI bus that contains the device associated with the current sg file
    descriptor. This operation can have an adverse effect on any SCSI device
    that is connected to this host.
   

The reset options are in ascending order of severity. Not all levels are
supported by all linux lower level drivers. Most lower level (adapter)
drivers support the SCSI bus reset. These boards often issue a SCSI bus reset
during their initialization.

Unfortunately this ioctl() doesn't currently do much (but may in the future
after other issues are resolved). Yields an EBUSY error if the SCSI bus or
the associated device is being reset when this ioctl() is called, otherwise
returns 0. N.B. In some recent distributions there is a patch to the SCSI mid
level code that activates this ioctl. Check your distribution.
-----------------------------------------------------------------------------

8.16. SG_GET_SCSI_ID

SG_GET_SCSI_ID 0x2276. Assumes 3rd argument is pointing to an object of type
Sg_scsi_id (see sg.h) and populates it. That structure contains ints for
host_no, channel, scsi_id, lun, scsi_type, allowable commands per lun and
queue_depth. Most of this information is available from other sources (e.g.
SCSI_IOCTL_GET_IDLUN and SCSI_IOCTL_GET_BUS_NUMBER) but tends to be awkward
to collect. Allowable commands per lun and queue_depth give an insight to the
command queuing capabilities of the adapters and the device. The latter
overrides the former (logically) and the former is only of interest if it is
equal to queue_depth which probably indicates the device does not support
queuing commands (e.g. most scanners).
typedef struct sg_scsi_id { /* used by SG_GET_SCSI_ID ioctl() */              
    int host_no;        /* as in "scsi<n>" where 'n' is one of 0, 1, 2 etc */ 
    int channel;                                                              
    int scsi_id;        /* scsi id of target device */                        
    int lun;                                                                  
    int scsi_type;      /* TYPE_... defined in scsi/scsi.h */                 
    short h_cmd_per_lun;/* host (adapter) maximum commands per lun */         
    short d_queue_depth;/* device (or adapter) maximum queue length */        
    int unused[2];      /* probably find a good use, set 0 for now */         
} sg_scsi_id_t;                                                               
-----------------------------------------------------------------------------

8.17. SG_GET_SG_TABLESIZE

SG_GET_SG_TABLESIZE 0x227F. Assumes 3rd argument points to an int and places
the maximum number of scatter gather elements supported by the host adapter
associated with the current SCSI device. 0 indicates that the adapter does
support scatter gather.
-----------------------------------------------------------------------------

8.18. SG_GET_TIMEOUT

SG_GET_TIMEOUT 0x2202. Ignores its 3rd argument and _returns_ the timeout
value (which will be >= 0 ). The unit of this timeout is "jiffies" which are
currently 10 millisecond intervals on i386 (less on an alpha). Linux supplies
a manifest constant HZ which is the number of "jiffies" in 1 second. This
ioctl() is not relevant to the sg version 3 driver because timeouts are
specified explicitly for each command in the sg_io_hdr structure.
-----------------------------------------------------------------------------

8.19. SG_SET_TIMEOUT

SG_SET_TIMEOUT 0x2201. Assumes 3rd argument points to an int containing the
new timeout value for this file descriptor. The unit is a "jiffy". Packets
that are already "in flight" will not be affected. The default value is set
on open() and is SG_DEFAULT_TIMEOUT (defined in sg.h). This default is
currently 1 minute and may not be long enough for formats. Negative values
will yield an EIO error. This ioctl() is not relevant to the sg version 3
driver because timeouts are specified explicitly for each command in the
sg_io_hdr structure. Only when the sg_header structure is used is the timeout
inherited from this value (help on a per file descriptor basis).
-----------------------------------------------------------------------------

8.20. SG_SET_TRANSFORM

SG_SET_TRANSFORM 0x2204. Only is meaningful when SG_EMULATED host has yielded
1 (i.e. the low-level is the ide-scsi device driver); otherwise an EINVAL
error occurs. The default state is to _not_ transform SCSI commands to the
corresponding ATAPI commands but pass them straight through as is. [Only
certain classes of SCSI commands need to be transformed to their ATAPI
equivalents.] The third argument is interpreted as an integer. When it is
non-zero then a flag is set inside the ide-scsi driver that transforms
subsequent commands sent to this driver. When zero is passed as the 3rd
argument to this ioctl then the flag within the ide-scsi driver is cleared
and subsequent commands are not transformed. Beware, this state will affect
all devices (and hence all related sg file descriptors) associated with this
ide-scsi "bus".
-----------------------------------------------------------------------------

8.21. SG_GET_TRANSFORM

SG_GET_TRANSFORM 0x2205. Third argument is ignored. Only is meaningful when
SG_EMULATED host has yielded 1 (i.e. the low-level is the ide-scsi device
driver); otherwise an EINVAL error occurs. Returns 0 to indicate _not_
transforming SCSI to ATAPI commands (default). Returns 1 when it is
transforming them.
-----------------------------------------------------------------------------

8.22. Sg ioctls removed in version 3

Some seldom used ioctl()s introduced in the sg 2.x series drivers have been
withdrawn. They are:

  * SG_SET_UNDERRUN_FLAG (and _GET_) [use 'resid' in this new interface]
   
  * SG_SET_MERGE_FD (and _GET) [added complexity with little benefit]
   

-----------------------------------------------------------------------------
8.23. SCSI_IOCTL_GET_IDLUN

SCSI_IOCTL_GET_IDLUN 0x5382. This ioctl takes a pointer to a "struct
scsi_idlun" object as its third argument. The "struct scsi_idlun" is not
visible to user applications. To use this, that structure needs to be
replicated in the user's program. Something like:
typedef struct my_scsi_idlun {                                               
    int four_in_one;    /* 4 separate bytes of info compacted into 1 int */  
    int host_unique_id; /* distinguishes adapter cards from same supplier */ 
} My_scsi_idlun;                                                             
"four_in_one" is made up as follows:
(scsi_device_id | (lun << 8) | (channel << 16) | (host_no << 24))            
These 4 components are assumed (or masked) to be 1 byte each. These are the
four numbers that the SCSI subsystem uses to index devices, often written as
"<host_no, channel, scsi_id, lun>". The 'host_unique_id' assigns a different
number to each controller from the same manufacturer/low-level device driver.
Most of the information provided by this command is more easily obtained from
SG_GET_SCSI_ID.

The 'host_no' element is a change in lk 2.4 kernels. [In the lk 2.2 series
and earlier, it was 'low_inode & 0xff' from the procfs entry corresponding to
the host.] This change makes the use of the SCSI_IOCTL_GET_BUS_NUMBER ioctl()
superfluous.

The advantage of this ioctl() is that it can be called on any SCSI file
descriptor.
-----------------------------------------------------------------------------

8.24. SCSI_IOCTL_GET_PCI

SCSI_IOCTL_GET_PCI 0x5387. Yields the PCI slot name (pci_dev::slot_name)
associated with the lower level (adapter) driver that controls the current
device. Up to 8 characters are output to the location pointed to by 'arg'. If
the current device is not controlled by a PCI device then errno is set to
ENXIO. [This ioctl() was introduced in lk 2.4.4]
-----------------------------------------------------------------------------

8.25. SCSI_IOCTL_PROBE_HOST

SCSI_IOCTL_PROBE_HOST 0x5385. This command should be given a pointer to a
'char' array as its 3rd argument. That array should be at least sizeof(int)
long and have the length of the array as an 'int' at the beginning of the
array! An ASCII string of no greater than that length containing
"information" (or the name) of SCSI host (i.e. adapter) associated with this
file descriptor is then placed in the given byte array. N.B. A trailing '\0'
may need to be put on the output string if it has been truncated by the input
length. Returns 1 if host is present, 0 if it is not and a negative value if
there is an error.
-----------------------------------------------------------------------------

8.26. SCSI_IOCTL_SEND_COMMAND

SCSI_IOCTL_SEND_COMMAND 0x1. This ioctl() also offers a "pass through" SCSI
command capability which is a subset of what is offered by the sg driver.

The structure that we are passed should look like:
   struct sdata {                                                            
    unsigned int inlen;     [i] Length of data written to device             
    unsigned int outlen;    [i] Length of data read from device              
    unsigned char cmd[x];   [i] SCSI command (6 <= x <= 16)                  
                            [o] Data read from device starts here            
                            [o] On error, sense buffer starts here           
    unsigned char wdata[y]; [i] Data written to device starts here           
   };                                                                        
Notes:

  * The SCSI command length is determined by examining the 1st byte of the
    given command [15] . There is no way to override this.
   
  * Data transfers are limited to PAGE_SIZE (4K on i386, 8K on alpha).
   
  * The length (x + y) must be at least OMAX_SB_LEN bytes long to accommodate
    the sense buffer when an error occurs. The sense buffer is truncated to
    OMAX_SB_LEN (16) bytes so that old code will not be surprised.
   
  * If a Unix error occurs (e.g. ENOMEM) then the user will receive a
    negative return and the Unix error code in 'errno'. If the SCSI command
    succeeds then 0 is returned. Positive numbers returned are the compacted
    SCSI error codes (4 bytes in one int) where the lowest byte is the SCSI
    status. See the drivers/scsi/scsi.h file for more information on this.
   

 
-----------------------------------------------------------------------------

Chapter 9. Direct and Mmap-ed IO

The normal action of the sg driver for a read operation (from a device) is to
request the lower level (adapter) driver to DMA [16] data into kernel buffers
that the sg driver manages. The sg driver will then copy the contents of its
buffers into the user space. [This sequence is reversed for a write operation
(towards a device)]. While this double handling of data is obviously
inefficient it does decouple some hardware issues from user applications. For
these and historical reasons the "double-buffered" IO remains the default for
the sg driver.

Both "direct" and "mmap-ed" IO are techniques that permit the data to be
DMA-ed directly from the lower level (adapter) driver into the user
application (vice versa for write operations). Both techniques result in
faster speed, smaller latencies and lower CPU utilization but come at the
expense of complexity (as always). For example the Linux kernel must not
attempt to swap out pages in a user application that a SCSI adapter is busy
DMA-ing data into.
-----------------------------------------------------------------------------

9.1. Direct IO

Direct IO uses the kiobuf mechanism [see the Linux Device Drivers book] to
manipulate memory allocated within the user space so that a lower level
(adapter) driver can DMA directly to or from that user space memory. Since
the user can give a different data buffer to each SCSI command passed through
the sg interface then the kiobuf mechanism needs to setup its structures (and
undo that setup) for each SCSI command. [17] Direct IO is available as an
option in sg 3.1.18 (before that the sg driver needed to be recompiled with
an altered define). Direct IO support is designed in such a way that if it is
requested and cannot be performed then the command will still be performed
using indirect IO. If direct IO is requested and has been performed then the
SG_INFO_DIRECT_IO bit will be set in the 'info' member of the sg_io_hdr_t
control structure after the request has been completed. Direct IO is not
supported on ISA SCSI adapters since they only can address a 24 bit address
space.

One limit on direct IO is that sg_io_hdr_t::iovec_count==0. So the user
cannot (currently) use application level scatter gather and direct IO on the
same request.

For direct IO to be worthwhile, a reasonable amount of data should be
requested for data transfer. For transfers less than 8 KByte it is probably
not worth the trouble. On the other hand "locking down" a multiple 512 KB
blocks of data for direct IO could adversely impact overall system
performance. Remember that for the duration of a direct IO request, the data
transfer buffer is mapped to a fixed memory location and locked in such a way
that it won't be swapped out. This can "cramp the style" of the kernel if it
is overdone.

Prior to sg 3.1.18 the direct IO code was commented out with the
"SG_ALLOW_DIO" define. In sg 3.1.18 (available for lk 2.4.2 and later) the
direct IO code is active but is defaulted off by a run time value. This value
can be accessed via the "proc" file system at /proc/scsi/sg/allow_dio .
Direct IO is enabled when a user with root permissions writes "1" to that
file: echo 1 > /proc/scsi/sg/allow_dio . If SG_FLAG_DIRECT_IO is set in
sg_io_hdr::flags but /proc/scsi/sg/allow_dio holds "0" then indirect IO will
be performed (and this is indicated by ((sg_io_hdr::info &
SG_INFO_DIRECT_IO_MASK) == SG_INFO_INDIRECT_IO) after the request is
completed).
-----------------------------------------------------------------------------

9.2. Mmap-ed IO

Memory-mapped IO takes a different approach from direct IO to removing the
extra data copy performed by normal ("indirect") IO. With mmap-ed IO the
application calls the mmap() system call to memory map sg's reserved buffer.
The sg driver maintains one reserved buffer per file descriptor. The default
size of the reserved buffer is 32 KB and it can be changed with the ioctl
(SG_SET_RESERVED_SIZE). The mmap() system call only needs to be called once
prior [18] to doing mmap-ed IO. For more details on the mmap() see Section
7.6. An application indicates that it wants mmap-ed on a SCSI request by
setting the SG_FLAG_MMAP_IO value in 'flags'.

Since there is only reserved buffer per sg file descriptor then only one
mmap-ed IO command can be active at one time. In order to perform command
queuing with mmap-ed IO, an application will need to open() multiple file
descriptors to the same SCSI device. With mmap-ed IO the various status
values and the sense buffer (if required) are conveyed back to an application
in the same fashion as normal ("indirect") IO.

Mmap-ed has very low per command latency since the reserved buffer mapping
only needs to be done once per file descriptor. Also the reserved buffer is
set up by the sg driver to aid the efficient construction of the internal
scatter gather list used by the lower level (adapter) driver for DMA
purposes. This tends to be more efficient than the user memory that direct IO
requires the sg driver to process into an internal scatter gather list. So on
both these counts, mmap-ed IO has the edge over direct IO.
-----------------------------------------------------------------------------

Chapter 10. Driver and module initialization

The size of the default reserved buffer can be specified when the sg driver
is loaded. If it is built into the kernel then use:
    sg_def_reserved_size=<n>                                                 
on the boot line (only supported in 2.4 kernels).

If sg is a module, it can be loaded with modprobe in either manner:
    modprobe sg                                                              
    modprobe sg def_reserved_size=<n>                                        
In the second case "<n>" is an integer (non negative). The default value is
the value of the SG_DEF_RESERVED_SIZE defined in sg.h . This is currently
32768.

If sg is a module, it can be unloaded with rmmod like this:
    rmmod sg                                                                 
However if there is a file descriptor still open with the sg driver (or there
is an outstanding request awaiting a response) then the sg module is
considered to be busy and can't be unloaded.
-----------------------------------------------------------------------------

Chapter 11. Sg and the "proc" file system

The sg driver provides information about the SCSI subsystem and the current
internal state of the sg driver in the /proc/scsi/sg directory. Some sg
driver defaults can be changed by super user writing values to these "pseudo"
files [19].

The following files which are readable by all:
allow_dio       0 indicates direct IO disable, 1 for enabled                 
debug           debug information including active request data              
def_reserved_size  default buffer size reserved for each file descriptor     
devices         one line of numeric data per device                          
device_hdr      single line of column names corresponding to 'devices'       
device_strs     one line of vendor, product and rev info per device          
hosts           one line of numeric data per host                            
host_hdr        single line of column names corresponding to 'hosts'         
host_strs       one line of host information (string) per host               
version         sg version as a number followed by a string representation   

Each line in 'devices' and 'device_strs' corresponds to an sg device. For
example the first line corresponds to /dev/sg0. The line number (origin 0)
also corresponds to the sg minor device number. This mapping is local to sg
and is normally the same as given by th cat /proc/scsi/scsi command which is
reported by the SCSI mid level driver. The two mappings may diverge when
'remove-single-device' and 'add-single-device' are used (see the 
SCSI-2.4-HOWTO for more information).

Each line in 'hosts' and 'host_strs' corresponds to a SCSI host. For example
the first line corresponds to the host normally represented as "scsi0". This
mapping is invariant across the SCSI sub system. [So these entries could
arguably be migrated to the mid level.]

The column headers in 'device_hdr' are given below. If the device is not
present (and one is present after it) then a line of "-1" entries is output.
Each entry is separated by a whitespace (currently a tab):
host            host number (indexes 'hosts' table, origin 0)                
chan            channel number of device                                     
id              SCSI id of device                                            
lun             Logical Unit number of device                                
type            SCSI type (e.g. 0->disk, 5->cdrom, 6->scanner)               
opens           number of opens (by sd, sr, sr and sg) at this time          
depth           maximum queue depth supported by device                      
busy            number of commands being processed by host for this device   
online          1 indicates device is in normal online state, 0->offline     
A SCSI device is set offline by the SCSI mid level when it decides that a
device is no longer responding (e.g. the device does not respond to an SCSI
INQUIRY command after it has been reset).

The column headers in 'host_hdr' are given below. Each entry is separated by
a whitespace (currently a tab):
uid             unique id (non-zero if multiple hosts of same type)           
busy            number of commands being processed for this host              
cpl             maximum number of command per lun (may be 0 if "device depth" 
                is given                                                      
sgat            maximum elements of scatter gather the adapter (pseudo)       
                DMA can accommodate                                           
isa             0 -> non-ISA adapter, 1 -> ISA adapter. ISA adapters are      
                assumed to have a 24 bit address bus limit (16 MB).           
emu             0 -> real SCSI adapter, 1 -> emulated SCSI adapter            
                (e.g. ide-scsi device driver)                                 

The 'def_reserved_size' is both readable and writable. It is only writable by
root. It is initialized to the value of DEF_RESERVED_SIZE in the "sg.h" file.
Values between 0 and 1048576 (which is 2 ** 20) are accepted and can be set
from the command line with the following syntax:
$ echo "262144" > /proc/scsi/sg/def_reserved_size                            
Note that the actual reserved buffer associated with a file descriptor could
be less than 'def_reserved_size' if appropriate memory is not available. If
the sg driver is compiled into the kernel (but not when it is a module) this
value can also be read at /proc/sys/kernel/sg-big-buff . This latter feature
is deprecated.

The 'allow_dio' is both readable and writable. It is only writable by root.
When it is 0 (default) any request to do direct IO (i.e. by setting
SG_FLAG_DIRECT_IO) will be ignored and indirect IO will be done instead.
-----------------------------------------------------------------------------

11.1. /proc/scsi/sg/debug

This appendix explains the output from the /proc/scsi/sg/debug which is
typically viewed by the command cat /proc/scsi/sg/debug. Below is the
(slightly abridged) output while this command: sgp_dd if=/dev/sg0 of=/dev/
null bs=512 is executing on the system. That sgp_dd command is using command
queuing to read a disk (and the data is written to /dev/null which forgets
it).
$ cat /proc/scsi/sg/debug                                                    
dev_max(currently)=7 max_active_device=1 (origin 1)                          
 scsi_dma_free_sectors=416 sg_pool_secs_aval=320 def_reserved_size=32768     
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=0 sg_tablesize=255 excl=0       
   FD(1): timeout=60000ms bufflen=65536 (res)sgat=2 low_dma=0                
   cmd_q=1 f_packid=1 k_orphan=0 closed=0                                    
     fin: id=3949312 blen=65536 dur=10ms sgat=2 op=0x28                      
     act: id=3949440 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28           
     rb>> act: id=3949568 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28      
     act: id=3949696 blen=65536 t_o/elap=60000/0ms sgat=2 op=0x28            
Those items output above that are significant to user applications are
described below.

Broadly speaking the above output shows everything is going fine. Four SCSI
READ(10) commands (SCSI opcode 0x28) for different ids are underway. Three
commands are active while one is finished with its status and data read() and
the request structure is pending deletion. The "id" corresponds to the
pack_id given in the sg_io_hdr structure (or the sg_header structure). In the
case if sgp_dd the pack_id value is the block number being given to the SCSI
READ (or WRITE). You will notice the 4 ids are 128 apart.

The ">>>" line shows the sg device name followed by the linux scsi adapter,
channel, scsi id and lun numbers. The "em=" argument indicates whether the
driver emulates a SCSI HBA. The ide-scsi driver would set "em=1". The
"sg_tablesize" is the maximum number of scatter gather elements supported by
the adapter driver. The "excl=0" indicates no sg open() on this device is
currently using the O_EXCL flag.

The next two lines starting with "FD(1)" supply data about the first (and
only in this case) open file descriptor on /dev/sg0. The default timeout is
60 seconds however this is only significant if the sg_header interface is
being used since the sg_io_hdr interface explicits sets the timeout on a per
command basis. "bufflen=65536" is the reserved buffer size for this file
descriptor. The "(res)sgat=2" indicates that this reserved buffer requires 2
scatter gather elements. The "low_dma" will be set to 1 for ISA HBAs
indicating only the bottom 16 MB of RAM can be used for its kernel buffers.
The "cmd_q=1" indicates command queuing is being allowed. The "f_packid=1"
indicates the SG_SET_FORCE_PACK_ID mode is on. The "k_orphan" value is 1 in
the rare cases when a SG_IO is interrupted while a SCSI command is "in
flight". The "closed" value is 1 in the rare cases the file descriptor has
been closed while a SCSI command is "in flight".

Each line indented with 5 spaces represents a SCSI command. The state of the
command is either:

  * prior: command hasn't been sent to mid level (rare)
   
  * act: mid level (adapter driver or device) has command
   
  * rcv: sg bottom half handler has received response to this command
    (awaiting read() or SG_IO ioctl to complete
   
  * fin: SCSI response (and optionally data) has been or is being read but
    the command data structures have not been removed
   

These states can be optionally prefixed by "rb>>" which means the reserved
buffer is being used, "dio>>" which means this command is using direct IO, or
"mmap>>" which means that mmap-ed IO is being used by this command. The "id"
is the pack_id from this command's interface structure. The "blen" is the
buffer length used by the data transfer associated with this command. For
commands that a response has been received "dur" shows its duration in
milliseconds. For commands still "in flight" an indication of "t_o/elap=60000
/10ms" means this command has a timeout of 60000 milliseconds of which 10
milliseconds has already elapsed. The "sgat=2" argument indicates that this
command's "blen" requires 2 scatter gather elements. The "op" value is the
hexadecimal value of the SCSI command being executed.

If sg has lots of activity then the "debug" output may span many lines and in
some cases appear to be corrupted. This occurs because procfs requests fixed
buffer sizes of information and, if there is more data to output, returns
later to get the remainder. The problem with this strategy is that sg's
internal state may have changed. Rather than double buffering, the sg driver
just continues from the same offset. While procfs is very useful, ioctl()s
(such as SG_GET_REQUEST_TABLE) still have their place.
-----------------------------------------------------------------------------

Chapter 12. Asynchronous usage of sg

It is recommended that synchronous sg-based applications use the new SG_IO
ioctl() command. Existing applications (which are mainly synchronous) can
continue to use the older sg_header based interface which is still supported.

Asynchronous usage allows multiple SCSI commands to be queued up to the
device. If the device supports command queuing then there can be a major
performance gain. Even if the device doesn't support command queuing (or is
temporarily busy) then queuing up commands in the mid level or the host
driver can be a minor performance win (since there will be a lower latency to
transmit the next command when the device becomes free).

Asynchronous usage usually starts with setting the O_NONBLOCK flag on open()
[or thereafter by using the fcntl(fd, SETFD, old_flags | O_NONBLOCK) system
call]. A similar effect can be obtained without using O_NONBLOCK when POSIX
threads are used. There are several strategies that can then be followed:

 1. set O_NONBLOCK and use a poll() loop
   
 2. set O_NONBLOCK and use SIGPOLL signal to alert app when readable
   
 3. use POSIX threads and a single sg file descriptor
   
 4. use POSIX threads and multiple sg file descriptors to same device
   

The O_NONBLOCK flag also permits open(), write() and read() [but not the
ioctl(SG_IO)] to access a SCSI device even though it has been marked offline.
SCSI devices are marked offline when they are detected and don't respond to
the initial SCSI commands as expected, or, some SCSI error condition is
detected on that device and the mid level error recovery logic is unable to
"resurrect" the device. A SCSI device that is being reset (and still
settling) could be accessed during this period by using the O_NONBLOCK flag;
this could lead to unexpected behaviour so the sg user should take care.

In Linux SIGIO and SIGPOLL are the same signal. If POSIX real time signals
are used (e.g. when SA_SIGINFO is used with sigaction() and fcntl(fd,
F_SETSIG, SIGRTMIN + <n>) ) then the file descriptor with which the signal is
associated is available to the signal handler. The associated file descriptor
is in the si_fd member of the siginfo_t structure. The poll() system call
that is often used after a signal is received can thus be bypassed.
-----------------------------------------------------------------------------

Appendix A. Sg3_utils package

The sg3_utils package is a collection of programs that use the sg interface.
The utilities can be categorized as follows:

  * variants of the Unix dd command: sg_dd, sgp_dd, sgq_dd and sgm_dd,
   
  * scanning and mapping utilities: sg_scan, sg_map and scsi_devfs_scan,
   
  * SCSI support: sg_inq, scsi_inquiry, sginfo, sg_readcap, sg_start and
    sg_reset,
   
  * timing and testing: sg_rbuf, sg_test_rwbuf, sg_read, sg_turs and
    sg_debug,
   
  * example programs: sg_simple1..4 and sg_simple16,
   

The "dd" family of utilities take a sg device file name as input (i.e. if=<
sg_dev_filen_name>), as output of both. They can also take raw device file
names [20] instead of sg device file names. One important difference from the
standard dd command is that the value given to the block size (bs=) argument
must be the exact block size of that device and not a integral multiple as
allowed by dd. These "dd" variants are suitable for SCSI Direct Access
Devices such as disk and CDROMs (but are not suitable for SCSI tape devices).

The sg3_utils package is designed to be used with the sg version 3 driver
found in the lk 2.4 series. There is also a sg_utils package that supports a
subset of these commands for the sg version 2 driver (with some support for
the original sg driver) which is found in the lk 2.2 series (from and after
lk 2.2.6). There are links to the most recent sg3_utils (and sg_utils)
packages at the sg website at www.torque.net/sg. There are tarballs and both
source and binary rpm packages. At the time of writing the latest sg3_utils
tarball is at www.torque.net/sg/p/sg3_utils-0.97.tgz. There is a README file
in that tarball that should be examined for up to date information. The more
important utility commands (e.g. sg_dd) have "man" pages. [21]

Almost all of the sg device driver capabilities discussed in this document
appear in code in one or more of these programs. For example the recently
added mmap-ed IO can be found in sgm_dd, sg_read and sg_rbuf.

The sg3_utils package also provides some functions that may be useful for
applications that use sg. The functions declared in sg_err.h and defined in
sg_err.c categorize SCSI subsystem errors that are returned to an application
in a read() or a ioctl(SG_IO). In the case of sense buffers, they are decoded
into text message (as per SCSI 2 definitions). There is also a function to do
a 64 bit seek (llseek.h).
-----------------------------------------------------------------------------

Appendix B. sg_header, the original sg control structure

Following is the original interface structure of the sg driver that dates
back to 1991. Those field elements with a "[o]+" are added by the sg version
2 driver which was first placed in lk 2.2.6 in April 1999.
struct sg_header                                                             
{                                                                            
    int pack_len;    /* [o] */                                               
    int reply_len;   /* [i] */                                               
    int pack_id;     /* [i->o] */                                            
    int result;      /* [o] */                                               
    unsigned int twelve_byte:1;     /* [i] */                                
    unsigned int target_status:5;   /* [o]+ */                               
    unsigned int host_status:8;     /* [o]+ */                               
    unsigned int driver_status:8;   /* [o]+ */                               
    unsigned int other_flags:10;    /* unused */                             
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] */                      
};      /* This structure is 36 bytes long on i386 */                        
SCSI commands are sent via write() calls to an sg device name (e.g. /dev/
sg0). The data written to write() is of the form <a_sg_header_obj +
scsi_command [ + data_to_write]>. The "data_to_write" component is only
needed for SCSI commands that transfer data towards the SCSI device. The
corresponding read() to the sg device name will yield data of the form <
a_sg_header_obj [ + data_to_read]>.

This interface is fully described in the www.torque.net/sg/p/scsi-generic.txt
file which documents the sg version 2 driver.

Since many Linux applications use this interface, it is still supported in
this version (i.e. version 3) of the driver. Only its most perverse
idiosyncrasies have been modified and no major applications have reported any
problems running old applications atop this newer driver.
-----------------------------------------------------------------------------

Appendix C. Programming example

This appendix contains an example program. It is an abridged version of
sg_simple2.c found in the sg3_utils package. It send a SCSI INQUIRY command
to the nominated sg device and prints out some of the response or outputs
error information. Hopefully showing the error processing does not cloud what
is being illustrated.
#include <unistd.h>                                                              
#include <fcntl.h>                                                               
#include <stdio.h>                                                               
#include <string.h>                                                              
#include <errno.h>                                                               
#include <sys/ioctl.h>                                                           
#include <scsi/sg.h> /* take care: fetches glibc's /usr/include/scsi/sg.h */     
                                                                                 
/* This is a simple program executing a SCSI INQUIRY command using the           
   sg_io_hdr interface of the SCSI generic (sg) driver.                          
                                                                                 
*  Copyright (C) 2001 D. Gilbert                                                 
*  This program is free software.   Version 1.01 (20020226)                      
*/                                                                               
                                                                                 
#define INQ_REPLY_LEN 96                                                         
#define INQ_CMD_CODE 0x12                                                        
#define INQ_CMD_LEN 6                                                            
                                                                                 
int main(int argc, char * argv[])                                                
{                                                                                
    int sg_fd, k;                                                                
    unsigned char inqCmdBlk[INQ_CMD_LEN] =                                       
                    {INQ_CMD_CODE, 0, 0, 0, INQ_REPLY_LEN, 0};                   
/* This is a "standard" SCSI INQUIRY command. It is standard because the         
 * CMDDT and EVPD bits (in the second byte) are zero. All SCSI targets           
 * should respond promptly to a standard INQUIRY */                              
    unsigned char inqBuff[INQ_REPLY_LEN];                                        
    unsigned char sense_buffer[32];                                              
    sg_io_hdr_t io_hdr;                                                          
                                                                                 
    if (2 != argc) {                                                             
        printf("Usage: 'sg_simple0 <sg_device>'\n");                             
        return 1;                                                                
    }                                                                            
    if ((sg_fd = open(argv[1], O_RDONLY)) < 0) {                                 
        /* Note that most SCSI commands require the O_RDWR flag to be set */     
        perror("error opening given file name");                                 
        return 1;                                                                
    }                                                                            
    /* It is prudent to check we have a sg device by trying an ioctl */          
    if ((ioctl(sg_fd, SG_GET_VERSION_NUM, &k) < 0) || (k < 30000)) {             
        printf("%s is not an sg device, or old sg driver\n", argv[1]);           
        return 1;                                                                
    }                                                                            
    /* Prepare INQUIRY command */                                                
    memset(&io_hdr, 0, sizeof(sg_io_hdr_t));                                     
    io_hdr.interface_id = 'S';                                                   
    io_hdr.cmd_len = sizeof(inqCmdBlk);                                          
    /* io_hdr.iovec_count = 0; */  /* memset takes care of this */               
    io_hdr.mx_sb_len = sizeof(sense_buffer);                                     
    io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;                                  
    io_hdr.dxfer_len = INQ_REPLY_LEN;                                            
    io_hdr.dxferp = inqBuff;                                                     
    io_hdr.cmdp = inqCmdBlk;                                                     
    io_hdr.sbp = sense_buffer;                                                   
    io_hdr.timeout = 20000;     /* 20000 millisecs == 20 seconds */              
    /* io_hdr.flags = 0; */     /* take defaults: indirect IO, etc */            
    /* io_hdr.pack_id = 0; */                                                    
    /* io_hdr.usr_ptr = NULL; */                                                 
                                                                                 
    if (ioctl(sg_fd, SG_IO, &io_hdr) < 0) {                                      
        perror("sg_simple0: Inquiry SG_IO ioctl error");                         
        return 1;                                                                
    }                                                                            
                                                                                 
    /* now for the error processing */                                           
    if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK) {                         
        if (io_hdr.sb_len_wr > 0) {                                              
            printf("INQUIRY sense data: ");                                      
            for (k = 0; k < io_hdr.sb_len_wr; ++k) {                             
                if ((k > 0) && (0 == (k % 10)))                                  
                    printf("\n  ");                                              
                printf("0x%02x ", sense_buffer[k]);                              
            }                                                                    
            printf("\n");                                                        
        }                                                                        
        if (io_hdr.masked_status)                                                
            printf("INQUIRY SCSI status=0x%x\n", io_hdr.status);                 
        if (io_hdr.host_status)                                                  
            printf("INQUIRY host_status=0x%x\n", io_hdr.host_status);            
        if (io_hdr.driver_status)                                                
            printf("INQUIRY driver_status=0x%x\n", io_hdr.driver_status);        
    }                                                                            
    else {  /* assume INQUIRY response is present */                             
        char * p = (char *)inqBuff;                                              
        printf("Some of the INQUIRY command's response:\n");                     
        printf("    %.8s  %.16s  %.4s\n", p + 8, p + 16, p + 32);                
        printf("INQUIRY duration=%u millisecs, resid=%d\n",                      
               io_hdr.duration, io_hdr.resid);                                   
    }                                                                            
    close(sg_fd);                                                                
    return 0;                                                                    
}                                                                                

The sg_simple4.c program is an example of using mmap-ed IO in the sg3_utils
package. An example of using direct IO can be found in sg_rbuf.c in the same
package.
-----------------------------------------------------------------------------

Appendix D. Debugging

There are various ways to debug what is happening with the sg driver. The
information provided in the /proc/scsi/sg directory can be useful, especially
the debug pseudo file. It outputs the state of the sg driver when it is
called. Invoking it at the right time can be a challenge. One approach (used
in SANE) is to invoke the system() system call like this:
    system("cat /proc/scsi/sg/debug");                                       
at appropriate times within an application that is using the sg driver.

Another debugging technique is to trace all system calls a program makes with
the strace command (see its "man" page). This command can also be used to
obtain timing information (with the "-r" and "t" options).

To debug the sg driver itself then the kernel needs to be built with
CONFIG_SCSI_LOGGING selected. Then copious output will be sent by the sg
driver whenever it is invoked to the log (normally /var/log/messages) and/or
the console. This debug output is turned on by:
 $ echo "scsi log timeout 7" > /proc/scsi/scsi                               
As the number (i.e. 7) is reduced, less output is generated. To turn off this
type of debugging use:
 $ echo "scsi log timeout 0" > /proc/scsi/scsi                               

If you want the system to log SCSI (CHECK_CONDITION related) errors that sg
detects rather than process them within the application using sg then set
ioctl(SG_SET_DEBUG) to a value greater than zero. Processing SCSI errors
within the application using sg is my preference.
-----------------------------------------------------------------------------

Appendix E. Other references

The primary site for SCSI information, standards (draft and emerging) and
related reseources is www.t10.org.

The most recent news on the sg driver can be found at: www.torque.net/sg .

Some notes on the sg v3 driver can be found at: www.torque.net/sg/
s_packet.html . For some timings (and CPU utilizations) comparisons between
direct and indirect IO see: www.torque.net/sg/rbuf_tbl.html

The Linux Documentation Project's SCSI-2.4-HOWTO may help to put this driver
into perspective: linuxdoc.org/HOWTO/SCSI-2.4-HOWTO . The most recent version
of that document can be found at www.torque.net/scsi/SCSI-2.4-HOWTO .

To understand the inner workings of device drivers there is a fine book
called "Linux Device Drivers", second edition by Alessandro Rubini and
Jonathan Corbet published by O'Reilly [ISBN 0-596-00008-1]. The authors and
the publisher have unselfishly made this book available under the GNU Free
Documentation License (version 1.1). It can be found in html at 
www.oreilly.com/catalog/linuxdrive2/chapter/book .

Notes

[1]  SCSI command opcode 0x7f does allow for variable length commands but    
     that is not supported in Linux currently.                               
[2]  There is an sg version 3.0.19 which is an optional driver for the lk 2.2
     series. It has the following limitations:                               
                                                                             
       * maximum size of SCSI commands is 12 bytes                           
                                                                             
       * sense buffer limited to 16 bytes                                    
                                                                             
       * resid (residual data transfer count) is always 0                    
                                                                             
       * direct and mmap-ed IO not supported (defaults to indirect IO)       
                                                                             
                                                                             
[3]  Patches exist for sg to extend the number of SCSI devices past the 256  
     limit when the device file system (devfs) is being used.                
[4]  Linux kernel prior to 2.4.15 limited SCSI commands to a length of 12    
     bytes. In lk 2.4.15 this was raised to 16 bytes. However unless lower   
     level drivers (e.g. aic7xxx) indicate that they can handle 16 byte      
     commands (and few currently do) then the command is aborted with a      
     DID_ABORT host status.                                                  
[5]  Some HBA - SCSI device combinations have difficulties with an odd valued
     dxfer_len . In some cases the operation succeeds but a DID_ERROR host   
     status is returned. So unless there is a good reason, applications that 
     want maximum portability should avoid an odd valued dxfer_len .         
[6]  Whether aborting individual commands is supported or not is left to the 
     adapter. Many adapters are unable to abort SCSI commands "in flight"    
     because these details are handled in silicon by embedded processors in  
     hardware. SCSI device or bus resets are required.                       
[7]  Some lower level drivers (e.g. ide-scsi) clear this status field even   
     when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred.       
     However they do set DRIVER_SENSE in driver_status field. Also a         
     (sb_len_wr > 0) indicates there is a sense buffer.                      
[8]  Some lower level drivers (e.g. ide-scsi) clear this masked_status field 
     even when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred.  
     However they do set DRIVER_SENSE in driver_status field. Also a         
     (sb_len_wr > 0) indicates there is a sense buffer.                      
[9]  In some cases the sym53cxx driver reports a DID_ERROR when it internally
     rounds up an odd transfer length by 1. This is an example of a          
     "non-error".                                                            
[10] Unfortunately some adapters drivers report an incorrect number for      
     'resid'. This is due to some "fuzziness" in the internal interface      
     definitions within the Linux scsi subsystem concerning the _exact_      
     number of bytes to be transferred. Therefore only applications tied to a
     specific adapter that is known to give the correct figure should use    
     this feature. Hopefully this will be cleared up in the near future.     
[11] The command queuing capabilities of the SCSI device and the adapter     
     driver should also be taken into account. To this end the sg_scsi_id::  
     h_cmd_per_lun and sg_scsi_id::d_queue_depth values returned bu ioctl    
     (SG_GET_SCSI_ID) may be useful. Also some devices that indicate in their
     INQUIRY response that they can accept command queuing react badly when  
     queuing is actually attempted.                                          
[12] There is a small probability it will spend some time waiting for a      
     command block to become available. In this case the wait is             
     interruptible. If O_NONBLOCK is active then this scenario will cause a  
     EAGAIN.                                                                 
[13] The sg driver does record that the mmap() system call has been invoked  
     at least once on a file descriptor. This is not sufficient because the  
     given 'length' may be too short for the current IO. Also the driver is  
     unaware of munmap() calls so it could easily be tricked.                
[14] If ioctl(SG_SET_KEEP_ORPHAN) is set to 1 and a ioctl(SG_IO) operation is
     interrupted (e.g. by control-C by the user) then when the response      
     arrives then the "num_waiting" will be incremented to indicate a read() 
     can now pick up the response.                                           
[15] Here is the mapping from the SCSI opcode "group" (top 3 bits of opcode) 
     to the assumed length (in lk 2.4.15):                                   
     unsigned char scsi_command_size[8] =                                    
     {                                                                       
             6, 10, 10, 12,                                                  
             16, 12, 10, 10                                                  
     };                                                                      
     The assumed length of group 4 commands changed from 12 to 16 in lk      
     2.4.15 reflecting support for 16 byte SCSI commands being added to the  
     kernel.                                                                 
[16] Older SCSI adapters and some pseudo adapter drivers don't have DMA      
     capability in which case the CPU is used to copy the data.              
[17] Unfortunately that setup time is large enough in some versions of the lk
     2.4 series to adversely impact direct IO performance. Also memory malloc
     ()-ed in the user space tends to be made up of discontinuous pages seen 
     from the SCSI adapter. This requires the sg driver to build heavily     
     splintered scatter gather lists which is less than desirable. This      
     limits the maximum transfer size to                                     
     [(max_scsi_adapter_scatter_gather_elements - 1) * PAGE_SIZE]. [This is a
     _different_ scatter gather mechanism to that which the user sees in the 
     sg interface based on iovec.]                                           
[18] When a write() or ioctl(SG_IO) attempts mmap-ed IO there is no check    
     performed that a prior mmap() system call has been performed. If no mmap
     () has been issued then random data is written to the device or data    
     read from the device in inaccessible. Also once mmap() has been called  
     on a file descriptor then all subsequent calls to ioctl                 
     (SG_SET_RESERVED_SIZE) will yield EBUSY.                                
[19] One strange quirk is that the /proc/scsi/sg directory will not appear if
     there are no SCSI devices (or pseudo devices such as USB mass storage)  
     attached to the system. The reason for this is that in the absence of   
     SCSI devices, the SCSI mid level does not initialize the sg driver (even
     if it has been loaded as a module). When the sg driver is a module and  
     the rmmod sg is successfully executed then the /proc/scsi/sg directory  
     and its contents are removed.                                           
[20] Raw device names are of the form /dev/raw/raw<n> and can be bound to    
     block devices (e.g. an IDE disk partition such as /dev/hda3). The       
     binding is done with the raw command (see "man raw").                   
[21] Although the author wrote most of these programs, initially to test     
     facilities within the sg driver, some have been contributed by others.  
     See www.torque.net/sg/u_index.html for more information.