Debugging Crashes

From WikiPrizm
Revision as of 02:19, 30 November 2014

Debugging crashes is fun! Actually no. With a little massaging, you can use the information on the crash screen to get a better idea of what's going wrong, though.

Symptoms

A crash usually looks like this:

System ERROR
REBOOT    :[EXIT]
INITIALIZE:[EXE]
 TLB ERROR
 TARGET=D223420F
 PC    =081007C0

The lower three lines are the interesting ones. They give the fault type, the address of the memory access that caused the fault, and the PC value at which the fault occurred (the meaning of the PC depends on the exception type). In this case, it's a TLB fault when trying to access memory at 0xD223420F. Whatever the fault type, it's usually safe to assume that it was caused by an invalid memory access.

Examining

By tweaking the linker options to emit a relocatable ELF object file (rather than the flat binary that is the default) [you can do this just by commenting out the first line of the prizm.x linker script], we can get an idea of what memory regions are in use:

$ sh3eb-elf-objdump -hr SDLTest.elf

SDLTest.elf:     file format elf32-sh

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0001b234  00300000  00300000  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00003300  0031b234  0031b234  0001b2b4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000088  08100004  0031e534  0001e604  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          0000225c  0810008c  0031e5bc  0001e68c  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  0001e68c  2**0
                  CONTENTS, READONLY
  5 .debug_info   000033df  00000000  00000000  0001e69d  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_abbrev 00001b1f  00000000  00000000  00021a7c  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_loc    00001bed  00000000  00000000  0002359b  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_aranges 000002b0  00000000  00000000  00025188  2**0
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_line   00000bbf  00000000  00000000  00025438  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_str    0000040e  00000000  00000000  00025ff7  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_frame  000003fc  00000000  00000000  00026408  2**2
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_ranges 000001e8  00000000  00000000  00026804  2**0
                  CONTENTS, READONLY, DEBUGGING

The .debug_* sections can be safely ignored for now, since they provide the machine-readable mappings of addresses to names. Of particular interest are the .text, .data and .bss sections, which contain the code, initialized writable data, and uninitialized writable data respectively.

Referring to the error message, we attempted to access memory at 0xD223420F, which is far outside any expected range. The PC was 0x081007C0, which is suspicious: that address is in .bss, and code should never execute from there. This is usually a symptom of a smashed stack, and it makes debugging very difficult: the crash comes from executing bogus code, and we have no way to retrieve a stack trace to see what went wrong earlier.

It appears that this particular crash was caused by a NULL pointer dereference, which is unusual (one wouldn't expect that to make the system begin executing in .bss). Further additions to this page will probably follow as more experiments are performed.

Exception Names

The third line on the System Error dialog tells you what exception the OS handled. This gives you an idea of what is wrong with the code at the PC. The Transition address is the address of the exception handler called. VBR is the address stored at 0xA0000000.

ADDRESS(x)

Causes: This exception is thrown on a data address error, which occurs when a word is accessed across a word boundary, a longword across a longword boundary, or a quadword across a quadword boundary. It may also be raised when an instruction is fetched from an address that is not word-aligned. The (x) will be either R or W, depending on whether the exception occurred on the Read or the Write cycle.

VBR Offset: 0x100

Exception Code: 0x0E0 on read, 0x100 on write

Priority: 5

PROTECT(x)

Causes: This is unconfirmed, but it may be caused by a read or write to protected memory. The (x) will be either R or W, depending on whether the exception occurred on the Read or the Write cycle.

VBR Offset: 0x100

Exception Code: 0x0A0 on read, 0x0C0 on write

Priority: 7

INTERRUPT

Causes: This is caused by the TRAPA #imm (unconditional trap) assembly instruction, which is used to deliberately trigger an exception that is handled by external code.

VBR Offset: 0x100

Exception Code: 0x160

Priority: 4

One way to encounter this exception is to enter the diagnostic mode and press 3, 9, 2, F1. The "SYSTEM ERROR" message appears on screen with "INTERRUPT", but the TARGET and PC lines are not shown. EXIT can't be used to reboot, despite what the text says. (Credit: Gbl08ma; tested on the "fx-CG10/20 Manager" emulator with OS 01.02.)

Illegal Code Err

Causes: Assumed to be caused either by executing an unknown instruction or by an instruction that raises a general error (such as an invalid use of branch instructions).

VBR Offset: 0x100

Exception Code: 0x180

Priority: 4

TLB ERROR

Causes: There was an access to a virtual memory address that isn't mapped to a physical memory page in the UTLB page table (and possibly the ITLB page table, if the access was an instruction fetch), and the OS was unable to handle it. Note that the underlying TLB miss exception is normal MMU behavior that the OS is expected to handle silently; this error screen only appears when the OS cannot handle it.

VBR Offset: 0x400

Exception Code: 0x040 for a TLB miss in the UTLB on a read (or on an instruction fetch, which also misses in the ITLB), 0x060 for a TLB miss in the UTLB on a write.

Priority: 2 if from an ITLB miss, 6 if from a UTLB miss.