<chapter id="tsoverview-10750"><title>Troubleshooting
Software Problems (Overview)</title><highlights><para>This chapter provides a general overview of troubleshooting software
problems, including information on troubleshooting system crashes and viewing
system messages.</para><itemizedlist><para>This is a list of information in this chapter.</para><listitem><para><olink targetptr="exlvp" remap="internal">What's New in Troubleshooting?</olink></para>
</listitem><listitem><para><olink targetptr="tsoverview-12" remap="internal">Where to Find Software Troubleshooting
Tasks</olink></para>
</listitem><listitem><para><olink targetptr="tsoverview-30557" remap="internal">Troubleshooting a System
Crash</olink></para>
</listitem><listitem><para><olink targetptr="tsoverview-24917" remap="internal">Troubleshooting a System
Crash Checklist</olink></para>
</listitem>
</itemizedlist>
</highlights><sect1 id="exlvp"><title>What's New in Troubleshooting?</title><para>This section describes new or changed troubleshooting information in
this Solaris release. </para><itemizedlist><para>For information on new or changed troubleshooting features in the Solaris
10 release, see the following:</para><listitem><para><olink targetptr="frcbj" remap="internal">Dynamic Tracing Facility</olink></para>
</listitem><listitem><para><olink targetptr="gcdcd" remap="internal">kmdb Replaces kadb as Standard Solaris
Kernel Debugger</olink></para>
</listitem>
</itemizedlist><sect2 id="gexqk" arch="x86"><title>Error Message Upon System Boot if Multiboot Module From the Previous
GRUB Implementation Is Loaded</title><para><emphasis role="strong">Solaris 10 11/07:</emphasis> Changes have been made to the GRUB
bootloader that enable the direct loading and booting of the unix kernel.
The GRUB <literal>multiboot</literal> module is no longer used. If the multiboot
module from the previous GRUB implementation is loaded by GRUB, the console
displays an error message upon system boot. For more information about what
to do if this error message is displayed when the system boots, see <olink targetptr="gewwy" remap="internal">What to Do if the Multiboot Module From Previous GRUB Implementation
Is Loaded at Boot Time</olink>.</para><para>For more information about what's new in booting and changes to GRUB
in this Solaris release, see <olink targetdoc="sysadv1" targetptr="hboverview-25463" remap="external">Chapter 9, <citetitle remap="chapter">Booting a System (Overview),</citetitle> in <citetitle remap="book">System Administration Guide: Basic Administration</citetitle></olink>.</para>
</sect2><sect2 id="gcbxe"><title>Common Agent Container Problems</title><para><emphasis role="strong">Solaris 10 6/06:</emphasis> The
common agent container is a stand-alone Java program that is now included
in the Solaris OS. This program implements a container for Java management
applications.  The common agent container provides a management infrastructure
that is designed for Java Management Extensions (JMX) and Java Dynamic Management
Kit (Java DMK) based functionality.  The software is installed by the <literal>SUNWcacaort</literal> package and resides in the <filename>/usr/lib/cacao</filename> directory.</para><itemizedlist><para>Typically, the container is not visible.  However, there are two instances
when you might need to interact with the container daemon:</para><listitem><para>It is possible that another application might attempt to use
a network port that is reserved for the common agent container.</para>
</listitem><listitem><para>In the event that a certificate store is compromised, you
might have to regenerate the common agent container certificate keys. </para>
</listitem>
</itemizedlist><para>For information about how to troubleshoot these problems, see <olink targetptr="gcbwx" remap="internal">Troubleshooting Common Agent Container Problems in the Solaris
OS</olink>.</para>
</sect2><sect2 id="fxchw" arch="x86"><title>SMF Boot Archive Service Might Fail During
System Reboot</title><para> If
a system crash occurs in the GRUB based boot environment, it is possible that
the SMF service <command>svc:/system/boot-archive:default</command> might
fail when the system is rebooted. If this problem occurs, reboot the system
and select the Solaris failsafe archive in the GRUB boot menu. Follow the
prompts to rebuild the boot archive. After the archive is rebuilt, reboot
the system. To continue the boot process, you can use the <command>svcadm</command> command
to clear the <command>svc:/system/boot-archive:default</command> service.
For instructions, see <olink targetptr="fxcmh" remap="internal">What to Do if the SMF Boot
Archive Service Fails During a System Reboot</olink>. For more information
on GRUB based booting, see <olink targetdoc="group-sa" targetptr="hbx86boot-68676" remap="external"><citetitle remap="section">Booting an x86 Based System by Using GRUB (Task Map)</citetitle> in <citetitle remap="book">System Administration Guide: Basic Administration</citetitle></olink>.</para>
</sect2><sect2 id="frcbj"><title>Dynamic Tracing Facility</title><para>The Solaris Dynamic
Tracing (DTrace) facility is a comprehensive dynamic tracking facility that
gives you a new level of observerability into the Solaris kernel and user
processes. DTrace helps you understand your system by permitting you to dynamically
instrument the OS kernel and user processes to record data that you specify
at locations of interest, called, <emphasis>probes</emphasis>. Each probe
can be associated with custom programs that are written in the new D programming
language. All of DTrace's instrumentation is entirely dynamic and available
for use on your production system. For more information, see the <olink targetdoc="refman" targetptr="dtrace-1m" remap="external"><citerefentry><refentrytitle>dtrace</refentrytitle><manvolnum>1M</manvolnum></citerefentry></olink> man page and the <olink targetdoc="dynmctrcggd" remap="external"><citetitle remap="book">Solaris Dynamic Tracing Guide</citetitle></olink>.</para>
</sect2><sect2 id="gcdcd"><title><command>kmdb</command> Replaces <command>kadb</command> as
Standard Solaris Kernel Debugger</title><para><command>kmdb</command> has
replaced <command>kadb</command> as the standard &ldquo;in situ&rdquo; Solaris
kernel debugger.</para><itemizedlist><para><command>kmdb</command> brings all the power and flexibility of <command>mdb</command> to live kernel debugging. <command>kmdb</command> supports the
following:</para><listitem><para>Debugger commands (dcmds)</para>
</listitem><listitem><para>Debugger modules (dmods)</para>
</listitem><listitem><para>Access to kernel type data</para>
</listitem><listitem><para>Kernel execution control</para>
</listitem><listitem><para>Inspection</para>
</listitem><listitem><para>Modification</para>
</listitem>
</itemizedlist><para>For more information, see the <olink targetdoc="refman" targetptr="kmdb-1" remap="external"><citerefentry><refentrytitle>kmdb</refentrytitle><manvolnum>1</manvolnum></citerefentry></olink> man
page. For step-by-step instructions on using <command>kmdb</command> to troubleshoot
a system, see <olink targetdoc="group-sa" targetptr="euayz" remap="external"><citetitle remap="section">How to Boot the System With the Kernel Debugger (kmdb)</citetitle> in <citetitle remap="book">System Administration Guide: Basic Administration</citetitle></olink> and <olink targetdoc="group-sa" targetptr="fvzpl" remap="external"><citetitle remap="section">How to Boot a System With the Kernel Debugger in the GRUB Boot Environment (kmdb)</citetitle> in <citetitle remap="book">System Administration Guide: Basic Administration</citetitle></olink>.</para>
</sect2>
</sect1><sect1 id="tsoverview-12"><title>Where to Find Software Troubleshooting Tasks</title><informaltable frame="topbot"><tgroup cols="2" colsep="0" rowsep="0"><colspec colname="colspec2" colwidth="50*"/><colspec colname="colspec3" colwidth="50*"/><thead><row rowsep="1"><entry><para>Troubleshooting Task</para>
</entry><entry><para>For More Information</para>
</entry>
</row>
</thead><tbody><row><entry><para>Manage system crash information</para>
</entry><entry><para><olink targetptr="tscrashdumps-40145" remap="internal">Chapter&nbsp;17, Managing System
Crash Information (Tasks)</olink></para>
</entry>
</row><row><entry><para>Manage core files</para>
</entry><entry><para><olink targetptr="tscore-7" remap="internal">Chapter&nbsp;16, Managing Core Files (Tasks)</olink></para>
</entry>
</row><row><entry><para>Troubleshoot software problems such as reboot failures and backup problems</para>
</entry><entry><para><olink targetptr="tsgeneral-32935" remap="internal">Chapter&nbsp;18, Troubleshooting
Miscellaneous Software Problems (Tasks)</olink></para>
</entry>
</row><row><entry><para>Troubleshoot file access problems</para>
</entry><entry><para><olink targetptr="tsfileaccess-28392" remap="internal">Chapter&nbsp;19, Troubleshooting
File Access Problems (Tasks)</olink></para>
</entry>
</row><row><entry><para>Troubleshoot printing problems</para>
</entry><entry><para><olink targetdoc="sysadprtsvcs" targetptr="tsprint-34397" remap="external">Chapter 12, <citetitle remap="chapter">Troubleshooting Printing Problems (Tasks),</citetitle> in <citetitle remap="book">System Administration Guide: Solaris Printing</citetitle></olink></para>
</entry>
</row><row><entry><para>Resolve UFS file system inconsistencies</para>
</entry><entry><para><olink targetptr="tsfsck-26279" remap="internal">Chapter&nbsp;20, Resolving UFS File
System Inconsistencies (Tasks)</olink></para>
</entry>
</row><row><entry><para>Troubleshoot software package problems</para>
</entry><entry><para><olink targetptr="tsswmgr-40462" remap="internal">Chapter&nbsp;21, Troubleshooting Software
Package Problems (Tasks)</olink></para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable><sect2 id="getqg"><title>Additional Resources for Troubleshooting System and
Software Problems</title><para>You can use the Sun Explorer software to collect data for troubleshooting
system and software problems. For more information about downloading the Sun
Explorer software, see <olink targetdoc="expug" remap="external"><citetitle remap="book">Sun Explorer User&rsquo;s Guide</citetitle></olink>.</para>
</sect2>
</sect1><sect1 id="tsoverview-30557"><title>Troubleshooting a System Crash</title><para>If a system running the Solaris Operating System crashes, provide your service provider
with as much information as possible, including crash dump files.</para><sect2 id="tsoverview-18640"><title>What to Do if the System Crashes</title><para>The most important things to remember are:</para><orderedlist><listitem><para>Write down the system console messages.</para><para>If a system crashes, making
it run again might seem like your most pressing concern. However, before you
reboot the system, examine the console screen for messages. These messages
can provide some insight about what caused the crash. Even if the system reboots
automatically and the console messages have disappeared from the screen, you
might be able to check these messages by viewing the system error log, the<filename>/var/adm/messages</filename> file. For more information about viewing system
error log files, see <olink targetptr="eeklc" remap="internal">How to View System Messages</olink>.</para><para>If you have frequent crashes and can't determine
their cause, gather all the information you can from the system console or
the <filename>/var/adm/messages</filename> files, and have it ready for a
customer service representative to examine. For a complete list of troubleshooting
information to gather for your service provider, see <olink targetptr="tsoverview-30557" remap="internal">Troubleshooting a System Crash</olink>.   </para><para>If the system fails to reboot successfully after a system crash, see <olink targetptr="tsgeneral-32935" remap="internal">Chapter&nbsp;18, Troubleshooting Miscellaneous
Software Problems (Tasks)</olink>.</para>
</listitem><listitem><para>Synchronize the disks and reboot.</para><screen>ok <userinput>sync</userinput></screen><para>If the system fails to reboot successfully after a system crash, see <olink targetptr="tsgeneral-32935" remap="internal">Chapter&nbsp;18, Troubleshooting Miscellaneous
Software Problems (Tasks)</olink>.</para>
</listitem>
</orderedlist><para>Check to see if a system crash dump was generated after the system crash.
System crash dumps are saved by default. For information about crash dumps,
see <olink targetptr="tscrashdumps-40145" remap="internal">Chapter&nbsp;17, Managing System
Crash Information (Tasks)</olink>.</para>
</sect2><sect2 id="tsoverview-23810"><title>Gathering Troubleshooting Data</title><para>Answer the following questions to help isolate the system problem. Use <olink targetptr="tsoverview-24917" remap="internal">Troubleshooting a System Crash Checklist</olink> for
gathering troubleshooting data for a crashed system.</para><table frame="topbot" id="tsoverview-tbl-2"><title>Identifying System Crash
Data</title><tgroup cols="2" colsep="0" rowsep="0"><colspec colname="column1" colwidth="144*"/><colspec colname="column2" colwidth="216*"/><thead><row rowsep="1"><entry><para>Question</para>
</entry><entry><para>Description</para>
</entry>
</row>
</thead><tbody><row><entry><para><emphasis>Can you reproduce the problem?</emphasis></para>
</entry><entry><para>This is important because a reproducible test case is often essential
for debugging really hard problems. By reproducing the problem, the service
provider can build kernels with special instrumentation to trigger, diagnose,
and fix the bug.</para>
</entry>
</row><row><entry><para><emphasis>Are you using any third-party drivers?</emphasis></para>
</entry><entry><para>Drivers run in the same address space as the kernel, with all the same
privileges, so they can cause system crashes if they have bugs.</para>
</entry>
</row><row><entry><para><emphasis>What was the system doing just before it crashed?</emphasis></para>
</entry><entry><para>If the system was doing anything unusual like running a new stress test
or experiencing higher-than-usual load, that might have led to the crash.</para>
</entry>
</row><row><entry><para><emphasis>Were there any unusual console messages right before the crash?</emphasis></para>
</entry><entry><para>Sometimes the system will show signs of distress before it actually
crashes; this information is often useful.</para>
</entry>
</row><row><entry><para><emphasis>Did you add any tuning parameters to the</emphasis> <filename>/etc/system</filename> <emphasis>file</emphasis>?</para>
</entry><entry><para>Sometimes tuning parameters, such as increasing shared memory segments
so that the system tries to allocate more than it has, can cause the system
to crash.</para>
</entry>
</row><row><entry><para><emphasis>Did the problem start recently?</emphasis></para>
</entry><entry><para>If so, did the onset of problems coincide with any changes to the system,
for example, new drivers, new software, different workload, CPU upgrade, or
a memory upgrade.</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
</sect1><sect1 id="tsoverview-24917"><title>Troubleshooting a System Crash Checklist</title><para>Use this checklist when gathering system data for a crashed system.</para><informaltable frame="topbot"><tgroup cols="2" colsep="0" rowsep="0"><colspec colname="column2" colwidth="180*"/><colspec colname="column3" colwidth="180*"/><thead><row rowsep="1"><entry><para>Item</para>
</entry><entry><para>Your Data</para>
</entry>
</row>
</thead><tbody><row><entry rowsep="1"><para>Is a system crash dump available?</para>
</entry><entry rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Identify the operating system release and appropriate software application
release levels.</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Identify system hardware.</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry rowsep="1"><para>Include <command>prtdiag</command> output for sun4u systems. Include
Explorer output for other systems.</para>
</entry><entry rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Are patches installed? If so, include <command>showrev -p</command> output.</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Is the problem reproducible?</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Does the system have any third-party drivers?</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>What was the system doing before it crashed?</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Were there any unusual console messages right before the system crashed?</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="1"><para>Did you add any parameters to the <filename>/etc/system</filename> file?</para>
</entry><entry colsep="0" rowsep="1"><para></para>
</entry>
</row><row><entry colsep="0" rowsep="0"><para>Did the problem start recently?</para>
</entry><entry colsep="0" rowsep="0"><para></para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</sect1>
</chapter>