How does Dynatrace crash analysis work?

Processes crash for a multitude of reasons and it’s often difficult to understand the root causes that contribute to such crashes. When a monitored process crashes, you’ll see a process crash entry in the Events section of each affected process and host page. The example process below has some availability problems (shown in red on the timeline). By selecting the affected timeframe in the timeline, the Events section shows you the number of process crashes that occurred during that timeframe (3 crashes in this example).

Click the Process crash details button to view a detailed list of the crashes that occurred during the selected timeframe. Here you’ll find all details related to why each process crashed.

The provided crash details include the signal that killed the process (for example, Segmentation fault or Abort), the execution stack frame that crashed, and more. The available details vary based on the type of crash and may include a native core dump, a Java core dump, or an abnormal program exit due to exceptions.

Note that this functionality works for all processes on each monitored host (see example below).

Analyze additional crash artifacts

Crash details often include a Download button that provides access to additional crash artifacts, such as hs_err_pid files for Java crashes, text files that provide analysis of Linux and Windows core dumps, or files containing the .NET, Java, or Node.js exceptions that were potentially responsible for the crashes. For example, the Segmentation fault crash report above resulted in a core dump. Dynatrace OneAgent analyzed the core dump automatically and then produced the following report as a log artifact:

dumpproc version 1.108.0.20161025-115919, installer version 1.108.0.20161025-121046
2016-11-09 18:00:44: Application 'CreditCardAutho', inner pid '15891', outer pid '0', signal: 'Segmentation fault' (11)
process group ID: 0x441b2cb89962033d
process group instance ID: 0xfe58bab23100f42c
process group Name: easytravel-*-x*

threadCount: 1
thread: 0 - stack range: 0x7ffeda572000-0x7ffeda594000, size: 136 kB
 0x00007ffeda592be0 0x00007f4de477604d libpthread-2.15.so!<imagebase>+0xf04d
 0x00007ffeda592bf0 0x00000000004038d8 CreditCardAuthorizationS64!main+0x1b8
 0x00007ffeda592c60 0x00007f4de41c676d libc-2.15.so!__libc_start_main+0xed
 0x00007ffeda592d20 0x000000000040329a CreditCardAuthorizationS64!<imagebase>+0x329a

mapped files:
 0000000000400000-000000000041e000 0 /home/labuser/easytravel-2.0.0-x64/CreditCardAuthorizationS64 (MD5: da5992daf5ba3b76c633c853c7da5e87)
 000000000051d000-000000000051e000 1d /home/labuser/easytravel-2.0.0-x64/CreditCardAuthorizationS64 (MD5: da5992daf5ba3b76c633c853c7da5e87)
 00007f4de41a5000-00007f4de4359000 0 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
 00007f4de4359000-00007f4de4558000 1b4 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
 00007f4de4558000-00007f4de455c000 1b3 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
 00007f4de455c000-00007f4de455e000 1b7 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
 00007f4de4563000-00007f4de4565000 0 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
 00007f4de4565000-00007f4de4765000 2 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
 00007f4de4765000-00007f4de4766000 2 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
 00007f4de4766000-00007f4de4767000 3 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
 00007f4de4767000-00007f4de477f000 0 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
 00007f4de477f000-00007f4de497e000 18 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
 00007f4de497e000-00007f4de497f000 17 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
 00007f4de497f000-00007f4de4980000 18 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
 00007f4de4984000-00007f4de4a02000 0 /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
 00007f4de4a02000-00007f4de4c01000 7e /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
 00007f4de4c01000-00007f4de4c03000 7d /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
 00007f4de4c03000-00007f4de4c05000 7f /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
 00007f4de4cc0000-00007f4de4ce2000 0 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
 00007f4de4ee2000-00007f4de4ee3000 22 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
 00007f4de4ee3000-00007f4de4ee5000 23 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
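
If you want to process such a report programmatically, the stack section can be picked apart with a regular expression. The following is a minimal sketch, not a Dynatrace tool: the Frame type and parse_frames function are hypothetical names, and reading the two hexadecimal columns as stack address and instruction address is an assumption based on the sample report above.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    stack_addr: int   # first hex column (assumed: address on the thread stack)
    instr_addr: int   # second hex column (assumed: instruction address)
    module: str       # binary or library name before the '!'
    symbol: str       # symbol name, or '<imagebase>' when unresolved
    offset: int       # offset from the symbol


# Matches lines such as:
#  0x00007ffeda592bf0 0x00000000004038d8 CreditCardAuthorizationS64!main+0x1b8
FRAME_RE = re.compile(
    r"^\s*0x([0-9a-f]+)\s+0x([0-9a-f]+)\s+(\S+)!(\S+)\+0x([0-9a-f]+)\s*$"
)


def parse_frames(report: str) -> List[Frame]:
    """Return every stack frame line found in a dumpproc-style report."""
    frames = []
    for line in report.splitlines():
        match = FRAME_RE.match(line)
        if match:
            stack, instr, module, symbol, offset = match.groups()
            frames.append(
                Frame(int(stack, 16), int(instr, 16), module, symbol, int(offset, 16))
            )
    return frames
```

Calling parse_frames on the report text returns the frames in the order they appear, so the first entry corresponds to the top of the stack.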

Protect sensitive user data

Crash reports may contain sensitive personal information that should not be viewed by all users. For this reason, your Dynatrace administrator must enable the View logs account-security option in your user profile before you can view sensitive data. This option is disabled by default for all non-admin users and must be explicitly enabled before you can access log contents.

How Dynatrace handles crashes on Windows and core dumps on Linux

Crash handling on Windows

In order for a generic Windows process crash (core dump) to be visible to Dynatrace, the crash must be detected by Windows Error Reporting. For this reason, the Windows Error Reporting service must be enabled.

When a crash occurs on Windows, a dialog appears, asking if you want to debug or close the crashed application. This is not desirable for headless systems. You can disable this dialog by adding a value to the registry, as shown below:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting]
"DontShowUI"=dword:00000001
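
The same value could also be set from a script. The following is a minimal sketch using Python's standard winreg module. It assumes a Windows host and an elevated (administrator) session, and the function name disable_wer_dialog is illustrative only.

```python
# Sketch: set DontShowUI=1 under the Windows Error Reporting key.
WER_KEY = r"SOFTWARE\Microsoft\Windows\Windows Error Reporting"
VALUE_NAME = "DontShowUI"


def disable_wer_dialog() -> None:
    """Write the DontShowUI DWORD value; requires administrator rights."""
    # winreg exists only on Windows, so it is imported lazily here.
    import winreg

    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, WER_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1)
```

Call disable_wer_dialog() once on each headless host where the dialog should be suppressed.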

You can learn about other valuable settings related to Windows Error Reporting by visiting Microsoft documentation.

Linux core dump handling

In Linux, the way the kernel handles a core dump is set in /proc/sys/kernel/core_pattern. Beginning with kernel 2.6.19 (1), there are two methods of dealing with application crashes. The core dump may either be written to a file named by the /proc/sys/kernel/core_pattern entry or piped to an application, in which case the entry must be prefixed with a pipe character (|).

SUSE Linux uses the first method, so the content of /proc/sys/kernel/core_pattern is similar to core. This means that a file named core is written to the current working directory of the crashed process.

Ubuntu and Red Hat generally rely on their own tools for reporting crash dumps, so the entries appear as follows:
|/usr/share/apport/apport %p %s %c %P
or
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
In these examples, when a program crashes, the core dump is piped to stdin of the application named first in the entry. The kernel also substitutes values for any specifiers of the form %[a-zA-Z]. The apport reporting service overwrites the file /proc/sys/kernel/core_pattern: if apport is enabled (in /etc/default/apport), the core_pattern setting is written when the apport crash reporting service starts at system boot.
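
To check which of the two methods a host currently uses, the entry can be read and classified. The following is a minimal sketch. The CorePattern type and both function names are hypothetical, and the default path is the one named above.

```python
from typing import List, NamedTuple


class CorePattern(NamedTuple):
    piped: bool        # True when the entry starts with '|'
    target: str        # file name template, or path of the receiving program
    args: List[str]    # arguments passed to the program (empty for files)


def parse_core_pattern(raw: str) -> CorePattern:
    """Classify a core_pattern entry as 'write to file' or 'pipe to program'."""
    pattern = raw.strip()
    if pattern.startswith("|"):
        parts = pattern[1:].split()
        program = parts[0] if parts else ""
        return CorePattern(True, program, parts[1:])
    return CorePattern(False, pattern, [])


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> CorePattern:
    """Read and classify the entry of the running kernel (Linux only)."""
    with open(path) as handle:
        return parse_core_pattern(handle.read())
```

For the apport entry shown above, parse_core_pattern reports a piped pattern whose target is /usr/share/apport/apport and whose arguments are the four kernel specifiers.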

Dynatrace installer core_pattern handling

The Dynatrace installer overwrites the core pattern with its own command but preserves the original pattern.

  • The content of the original /proc/sys/kernel/core_pattern file is copied to /opt/dynatrace/oneagent/agent/.original_core_pattern. When Dynatrace OneAgent is uninstalled, the uninstaller restores the original core pattern present in this file to /proc/sys/kernel/core_pattern.

  • The content of the original kernel.core_pattern option of /etc/sysctl.conf is copied to /opt/dynatrace/oneagent/agent/.original.sysctl.corepattern. When Dynatrace OneAgent is uninstalled, the uninstaller restores the original core pattern present in this file to kernel.core_pattern in /etc/sysctl.conf.

Depending on the original entry in core_pattern, Dynatrace will write different patterns to core_pattern. The possible configurations and expected entries after installation are listed below:

Original core_pattern entry: core
core_pattern after ruxitdumpproc installation: |/opt/ruxit/agent/rdp -p %p -e %e -s %s
Comment: Simple core dump without parameters.

Original core_pattern entry: core_%s_%e
core_pattern after ruxitdumpproc installation: |/opt/ruxit/agent/rdp -p %p -e %e -s %s -kp %s,%e
Comment: Simple core dump with parameters in the filename. The -kp parameter is appended along with all kernel parameters needed for Dynatrace to substitute in the original filename.

Original core_pattern entry: |/usr/share/apport/apport
core_pattern after ruxitdumpproc installation: |/opt/ruxit/agent/rdp -p %p -e %e -s %s
Comment: Core dump piped to another application without parameters. The -a argument is not appended to the output core_pattern entry if there are no parameters.

Original core_pattern entry: |/usr/share/apport/apport %p %s %c %P
core_pattern after ruxitdumpproc installation: |/opt/ruxit/agent/rdp -p %p -e %e -s %s -a %p %s %c %P
Comment: Core dump piped to another application with parameters. The -a argument is appended along with all of the parameters that follow the apport binary path.
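
The four cases listed above follow a simple rule that can be sketched in code. The following is an illustration inferred only from those cases, not the installer's actual logic. The function name is hypothetical, and how the installer treats input outside the listed cases (for example a literal %% in a filename) is not covered.

```python
import re

# Command prefix taken from the listed cases.
RDP_BASE = "|/opt/ruxit/agent/rdp -p %p -e %e -s %s"


def rewritten_core_pattern(original: str) -> str:
    """Derive the post-installation entry for the four documented cases."""
    original = original.strip()
    if original.startswith("|"):
        # Piped pattern: pass along whatever followed the program path via -a.
        parts = original[1:].split(None, 1)
        args = parts[1].strip() if len(parts) > 1 else ""
        return f"{RDP_BASE} -a {args}" if args else RDP_BASE
    # File pattern: collect the kernel specifiers used in the file name for -kp.
    specifiers = re.findall(r"%[a-zA-Z]", original)
    if specifiers:
        return f"{RDP_BASE} -kp {','.join(specifiers)}"
    return RDP_BASE
```

For example, rewritten_core_pattern("core_%s_%e") yields the base command followed by -kp %s,%e, matching the second case.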

Core handling by OneAgent dumpproc

When a crash occurs, rdp is called first to dump the core to the OneAgent folders. This core dump is used by the crash reporting functionality. In the next step, OneAgent reads /opt/dynatrace/oneagent/agent/.original_core_pattern and generates the core dump according to the settings there. This means that if the original setup wrote the core file to a specific location, this still happens after OneAgent is installed.

In the next step, the core dump is analyzed to check whether Dynatrace could have been the root cause of the crash. If so, a support alert is generated and reported to our DevOps team. In this case, the core dump is zipped and retained along with all involved libraries for later offline analysis.

If OneAgent determines that Dynatrace is not at fault, the crash is reported to the user via the Dynatrace UI. If the crash has any impact on the customer's application, a problem is opened. An appropriate event is also generated for the involved processes, as described above.

Cleanup

The log and support alert directories are cleaned up automatically.

  • For support alerts, we process the core dump, then zip it and keep it so that it can be sent to the cluster.
  • For crashes (non-instrumented processes or instrumented ones where we decide Dynatrace is not at fault), we process and then delete the copy of the core dump.
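
The two cleanup cases above amount to a small decision rule. The following is a sketch of that rule as described here, not OneAgent code. The CoreAction enum and cleanup_action function are hypothetical names.

```python
from enum import Enum


class CoreAction(Enum):
    ZIP_AND_KEEP = "zip and keep for sending to the cluster"
    DELETE = "delete the copy of the core dump"


def cleanup_action(instrumented: bool, dynatrace_at_fault: bool) -> CoreAction:
    """Decide what happens to the processed copy of a core dump."""
    # A support alert applies only to an instrumented process where
    # Dynatrace is judged to be at fault; every other crash is an ordinary one.
    if instrumented and dynatrace_at_fault:
        return CoreAction.ZIP_AND_KEEP
    return CoreAction.DELETE
```

In both cases the core dump is processed first, and only the handling of the copy afterwards differs.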