PPC603 hang at external interrupt
John Howley
2003-08-22 16:49:13 UTC
I have seen something similar to this on the MPC8260, which has a 603e
core. It was caused by a bus error and it put the chip in a checkstop
state -- I could look at registers and memory with an emulator, but
couldn't step through any instructions. The bus error was caused by
trying to access a memory location that did not exist (or wasn't
configured yet -- it was very early in the project). The 8260 has a
couple of registers, TESCR1 and TESCR2 ("60x Bus Transfer Error Status
and Control Register") that showed the errors. I would expect your
processor to have something similar.

John
Dear All,
I'm sorry that my question is about a discontinued model, the MPC(XPC)603.
We have our own product using the MPC603 & MPC105, almost similar to the PReP
model.
This product sometimes hangs, and we thought that it might be
a bug in our program.
So we attached a JTAG debugger to catch what happened.
Fortunately, we caught the state 4 times, but we can't
understand what is going on with the chip.
1. The debugger always stops at the same address, in the
ext. interrupt handler, at the fetch and execution of the
first "stwu" opcode in this interrupt.
2. This state always appears at an ext. interrupt taken just after
a "divw" or "divwu" opcode has executed.
3. We didn't set any breakpoints with the debugger, but it
stopped just as the system hung.
4. All of memory can be accessed normally via the JTAG debugger
after it stops, but single-step execution is not possible.
I couldn't find any information like this, so I asked
Motorola, but they didn't have any useful information.
For your reference, I'd like to attach 2 different samples of
the register state when the JTAG debugger stopped, the disassembly
around the point of execution before the interrupt occurred, and the
source code of the interrupt handler.
Any suggestions appreciated. Thank you.
* register state when JTAG debugger stopped
gpr 0=003235ac 1=017ffe08 2=00000000 3=00000000 4=000009c4
5=009fbc18 6=009c70fc 7=20000000
8=003235ac 9=00000000 10=00000002 11=0000f030 12=000000a4
13=00000000 14=00000000 15=00000000
16=00000000 17=00000000 18=00000000 19=005ad450 20=005ad458
21=00002710 22=ffffffff 23=00000001
24=00000004 25=00bf0c5c 26=00000004 27=00000014 28=00d2f50c
29=00000000 30=00c1d5e0 31=00000294
tgpr0=ffffffff 1=ffffffff 2=ffffffff 3=ffffffff
dsisr=00000000 dar=00000000 msr=00001030 dec=4a2b483e
rtbu=0000013f rtbl=bbf30a7c
hid0=8000c000 lr=003235ac srr0=00323574 srr1=0000f030
ear=00000000 ctr=00000000
iabr=00000002 cr=44000000 xer=20000000 fpscr=00000000
sdr1=00000000 resrv=00000000
imis=00000000 dmis=00000000 sprg0=017ffe08 sprg1=009fbc18
sprg2=00c1d5e0 sprg3=00000294
seg 0=00000000 1=00000000 2=00000000 3=00000000 4=00000000
5=00000000 6=00000000 7=00000000
8=00000000 9=00000000 10=00000000 11=00000000 12=00000000
13=00000000 14=00000000 15=00000000
ib h0=000000ff l0=00000002 h1=fff0000e l1=fff00001 h2=00000000
l2=00000000 h3=00000000 l3=00000000
db h0=000003ff l0=00000012 h1=80001fff l1=80000032 h2=c0001fff
l2=c0000032 h3=d0001ffe l3=d0000032
ip=00003304
00003304: stwu r12,-4(r1) # store word, update, offset 0xfffffffc
00003308: stwu r11,-4(r1) # store word, update, offset 0xfffffffc
0000330c: stwu r10,-4(r1) # store word, update, offset 0xfffffffc
* code executing before the interrupt occurred
Address Opcode Disassembly Operation
00323558: lwz r30,8(r25) # load word, offset 0x0008
0032355c: cmpi 0,0,r10 # compare r10 to 0
00323560: bng +0314 # branch if not greater, 00323874
00323564: addi r22,0,0xffff # add -1 to r22
00323568: addi r21,0,0x2710 # add 10000 to r21
0032356c: lhz r4,22(r28) # load halfword,
00323570: divw r26,r21,r4 # divide r21 by r4
<-------- External Interrupt occurred
00323574: lhz r10,38(r30) # load halfword,
00323578: addi r4,r26,0x0 # move r26 to r4
0032357c: rlwinm r9,r4,0,16,31 # mask r4 with 0000ffff
00323580: cmp 0,0,r10,r9 # compare r10 to r9
00323584: beq +0010 # branch, if equal, 00323594
00323588: sth r4,38(r30) # store halfword,
0032358c: addi r3,r30,0x0 # move r30 to r3
00323590: bl +fffffa38 # branch and link to 00322fc8
00323594: stw r22,64(r1) # store word, offset 0x0040
00323598: addi r5,r1,0x44 # add 68 to r1
0032359c: addi r6,r1,0x40 # add 64 to r1
003235a0: addi r3,r25,0x0 # move r25 to r3
* register state when JTAG debugger stopped
gpr 0=00000000 1=017ffe08 2=00000000 3=00bccd3c 4=ffffffff
5=00000238 6=009c70fc 7=20000000
8=00323bc8 9=00000000 10=00000238 11=0000f030 12=000003e8
13=00000000 14=00000000 15=00000000
16=00000000 17=00000000 18=00000000 19=005ad450 20=005ad458
21=005d2a80 22=030c3c20 23=00000010
24=00000000 25=005ad454 26=030c3c20 27=030c3c20 28=00000070
29=006c0d20 30=00000000 31=00bccd14
tgpr0=ffffffff 1=ffffffff 2=ffffffff 3=ffffffff
dsisr=00000000 dar=00000000 msr=00001030 dec=8b9dd0bc
rtbu=00000149 rtbl=17cfe785
hid0=8000c000 lr=00323be0 srr0=0022c81c srr1=0000f030
ear=00000000 ctr=00000000
iabr=00000002 cr=44000000 xer=20000000 fpscr=00000000
sdr1=00000000 resrv=00000000
imis=00000000 dmis=00000000 sprg0=017ffe08 sprg1=009fcd78
sprg2=00000000 sprg3=00bccd14
seg 0=00000000 1=00000000 2=00000000 3=00000000 4=00000000
5=00000000 6=00000000 7=00000000
8=00000000 9=00000000 10=00000000 11=00000000 12=00000000
13=00000000 14=00000000 15=00000000
ib h0=000000ff l0=00000002 h1=fff0000e l1=fff00001 h2=00000000
l2=00000000 h3=00000000 l3=00000000
db h0=000003ff l0=00000012 h1=80001fff l1=80000032 h2=c0001fff
l2=c0000032 h3=d0001ffe l3=d0000032
ip=00003304
00003304: stwu r12,-4(r1) # store word, update, offset 0xfffffffc
00003308: stwu r11,-4(r1) # store word, update, offset 0xfffffffc
0000330c: stwu r10,-4(r1) # store word, update, offset 0xfffffffc
* code executing before the interrupt occurred
Address Opcode Disassembly Operation
0022c7e4: addi r31,r3,0x0 # move r3 to r31
0022c7e8: bl +0005cba0 # branch and link to 00289388
0022c7ec: lwz r0,72(r1) # load word, offset 0x0048
0022c7f0: mtlr r0 # move r0 to LR
0022c7f4: addi r3,r31,0x0 # move r31 to r3
0022c7f8: lwz r31,60(r1) # load word, offset 0x003c
0022c7fc: addi r1,r1,0x40 # add 64 to r1
0022c800: bclr 0,20 # cond branch to LR
0022c804: # pad word:0x00000000
0022c808: # pad word:0x00000401
0022c80c: lwz r0,0(r1) # load word, offset 0x0000
0022c810: lwz r5,4(r3) # load word, offset 0x0004
0022c814: addi r12,0,0x3e8 # add 1000 to r12
0022c818: divwu r0,r5,r12 # divide r5 by r12 unsigned
<-------- External Interrupt occurred
0022c81c: lwz r10,0(r3) # load word, offset 0x0000
0022c820: addi r11,0,0x3e8 # add 1000 to r11
0022c824: divwu r4,r5,r11 # divide r5 by r11 unsigned
0022c828: mull r0,r0,r12 # multiply r0 times r12
0022c82c: add r10,r4,r10 # add
0022c830: stw r10,0(r3) # store word, offset 0x0000
0022c834: subf r12,r0,r5 # subtract r0 from r5
0022c838: stw r12,4(r3) # store word, offset 0x0004
0022c83c: bclr 0,20 # cond branch to LR
0022c840: # pad word:0x00000000
0022c844: # pad word:0x00000400
0022c848: # pad word:0x00000000
0022c84c: lwz r6,4(r3) # load word, offset 0x0004
0022c850: addi r12,0,0x3e8 # add 1000 to r12
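One cross-check that can be made from the two dumps above: the destination register of each divide already holds the expected quotient, and srr0 points at the instruction after the divide. That would be consistent with the divide having completed before the interrupt was taken, although it does not prove it. The sketch below is only an illustration in Python; the helper names are made up for this example, and the register values are copied from the dumps.

```python
def to_signed32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def divw(ra, rb):
    """Signed 32-bit divide, truncating toward zero.
    Assumes rb != 0 and ignores the architecturally undefined overflow case."""
    a, b = to_signed32(ra), to_signed32(rb)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & 0xFFFFFFFF

def divwu(ra, rb):
    """Unsigned 32-bit divide. Assumes rb != 0."""
    return ((ra & 0xFFFFFFFF) // (rb & 0xFFFFFFFF)) & 0xFFFFFFFF

# Sample 1: divw r26,r21,r4 at 00323570
sample1 = {"r21": 0x00002710, "r4": 0x000009C4, "r26": 0x00000004,
           "srr0": 0x00323574, "div_addr": 0x00323570}
# Sample 2: divwu r0,r5,r12 at 0022c818
sample2 = {"r5": 0x00000238, "r12": 0x000003E8, "r0": 0x00000000,
           "srr0": 0x0022C81C, "div_addr": 0x0022C818}

# Does the destination register hold the quotient of the source registers?
sample1_ok = divw(sample1["r21"], sample1["r4"]) == sample1["r26"]
sample2_ok = divwu(sample2["r5"], sample2["r12"]) == sample2["r0"]

# Does srr0 point at the instruction following the divide?
srr0_after_div = all(s["srr0"] == s["div_addr"] + 4 for s in (sample1, sample2))

print(sample1_ok, sample2_ok, srr0_after_div)
```

In the second sample the quotient is zero, so the match on r0 is weaker evidence than the match on r26 in the first sample.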
* interrupt handler source code
mtsprg2 r3
ena_bat # BAT reg enable (macro)
mfsprg2 r3
mtsprg3 r31
mtsprg2 r30
mfsrr1 r31
mfcr r30
rlwinm. r31,r31,0,17,17
bc 12,2,glow_sp_irq
mtsprg1 r1
mfsprg0 r1
mtcr r30
mfsprg3 r31
mfsprg2 r30
stwu r12,-4(r1) <------ stopped here
stwu r11,-4(r1)
stwu r10,-4(r1)
stwu r9, -4(r1)
stwu r8, -4(r1)
stwu r7, -4(r1)
stwu r6, -4(r1)
stwu r5, -4(r1)
stwu r4, -4(r1)
stwu r3, -4(r1)
stwu r2, -4(r1)
mfsprg1 r4
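To relate the handler to the dumps, it can help to decode srr1 and msr and to see which bit `rlwinm. r31,r31,0,17,17` isolates. The sketch below is a hedged illustration in Python, not part of the original post. It assumes the usual 32-bit PowerPC MSR layout (big-endian bit numbering, bit 0 is the most significant bit) and lists only the bits that appear in these dumps; the function names are invented for the example.

```python
# MSR bit masks (only the bits seen in the dumps above)
MSR_BITS = {
    "EE": 0x00008000,  # bit 16: external interrupt enable
    "PR": 0x00004000,  # bit 17: problem (user) state
    "FP": 0x00002000,  # bit 18: floating point available
    "ME": 0x00001000,  # bit 19: machine check enable
    "IR": 0x00000020,  # bit 26: instruction address translation
    "DR": 0x00000010,  # bit 27: data address translation
}

def decode_msr(value):
    """Return the names of the listed MSR bits that are set in value."""
    return [name for name, mask in MSR_BITS.items() if value & mask]

def rlwinm_mask(mb, me):
    """Mask with ones in big-endian bit positions mb..me (mb <= me only)."""
    return (0xFFFFFFFF >> mb) & ((0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF)

srr1 = 0x0000F030   # same value in both dumps
msr = 0x00001030    # same value in both dumps
r1 = 0x017FFE08     # r1 at the stop, equal to sprg0 in both dumps

# rlwinm. r31,r31,0,17,17 keeps only bit 17 of srr1, i.e. the PR bit
from_user_mode = bool(srr1 & rlwinm_mask(17, 17))

# stwu r12,-4(r1) writes to r1 - 4 and then updates r1
first_store_addr = (r1 - 4) & 0xFFFFFFFF

print("srr1:", decode_msr(srr1))
print("msr :", decode_msr(msr))
print("interrupted in user mode:", from_user_mode)
print("first store address: 0x%08x" % first_store_addr)
```

With these values the PR test is non-zero, so the handler falls through to the path that loads r1 from sprg0, which agrees with r1 and sprg0 being equal in the dumps. The instruction where the debugger stops is then the first data store the handler performs.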
d***@excite.co.jp
2003-08-25 01:06:36 UTC
Hi, John.

Thank you very much for your suggestion.
Your reply is very useful to me, and it matches my suspicion about what is
happening.
Because of a mistake in our software, the machine check exception is
currently disabled.
During the boot sequence, the boot program probes for an
optional graphics board, and if no board exists, TEA/MCP of the MPC105 is
asserted (it is connected to TEA/MCP of the MPC603), so we disable the
machine check exception.
In our case the memory is valid; however, something similar to a bus error
may have occurred, and we suspect the Ethernet chip (DEC 21140A), because
the hang always occurs under very heavy network traffic.

I'd like to catch the evidence, but after the boot sequence, MCP from the
MPC105 stays asserted and we are unable to negate it.
Does anyone know how to negate it, or have any other ideas for catching the
evidence?

Finally, thanks a lot again for your suggestion.

regards

deacon1
Terry Greeniaus
2003-08-25 01:24:40 UTC
Post by d***@excite.co.jp
I'd like to catch the evidence, but after the boot sequence, MCP from the
MPC105 stays asserted and we are unable to negate it.
Does anyone know how to negate it, or have any other ideas for catching the
evidence?
On the MPC106 and MPC107, MCP remains asserted until the CPU performs a
read from 0x00000200-0x00000207 or 0xFFF00200-0xFFF00207 (the two
possible addresses of the start of the machine check exception handler).
I don't have an MPC105 manual handy, but it probably uses the same
method.

Note that if the code for the machine check exception has already been
cached, the CPU won't perform bus accesses and the MPC105/6/7 won't see
that the CPU has taken the exception. So you should probably also put
in a dummy non-cacheable read to one of those locations in your
machine check exception handler so that it is properly acknowledged to
the MPC105/6/7. Either that or invalidate the first cacheline of your
machine check handler (once the PC has left that cacheline) each time
the exception is taken so that the next time one happens it will need
to again fetch it from the 60x bus, which may be safer since it doesn't
perform unnecessary dummy bus accesses when your hardware may be in a
problem state.
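
The interaction described above can be illustrated with a toy model. The Python sketch below is only an assumption-laden illustration: the class and method names are invented, the 32-byte cache line size is assumed, and the latch behaviour is the one described for the MPC106/107 (it is not taken from an MPC105 manual). It shows why a cached handler leaves MCP asserted, and how either a non-cacheable dummy read or a cache-line invalidate lets the bridge see the access.

```python
# Address windows whose read clears MCP, as described for the MPC106/107
MCP_VECTOR_WINDOWS = ((0x00000200, 0x00000207), (0xFFF00200, 0xFFF00207))
LINE_MASK = ~0x1F  # assumed 32-byte cache lines

class BridgeModel:
    """Hypothetical bridge: latches MCP until it sees a read of the vector."""
    def __init__(self):
        self.mcp_asserted = False

    def raise_error(self):
        self.mcp_asserted = True

    def bus_read(self, addr):
        if any(lo <= addr <= hi for lo, hi in MCP_VECTOR_WINDOWS):
            self.mcp_asserted = False

class CpuModel:
    """Hypothetical CPU: reads hit the cache unless the line is absent."""
    def __init__(self, bridge):
        self.bridge = bridge
        self.cached_lines = set()

    def read(self, addr, cacheable=True):
        line = addr & LINE_MASK
        if cacheable and line in self.cached_lines:
            return "cache hit"          # the bridge never sees this access
        self.bridge.bus_read(addr)      # access goes out on the 60x bus
        if cacheable:
            self.cached_lines.add(line)
        return "bus read"

    def invalidate(self, addr):
        self.cached_lines.discard(addr & LINE_MASK)

bridge = BridgeModel()
cpu = CpuModel(bridge)

bridge.raise_error()
cpu.read(0x00000200)                       # first machine check: cache miss
first_acknowledged = not bridge.mcp_asserted

bridge.raise_error()
cpu.read(0x00000200)                       # second one: handler is cached
second_acknowledged = not bridge.mcp_asserted

cpu.read(0x00000200, cacheable=False)      # option 1: dummy non-cacheable read
after_dummy_read = not bridge.mcp_asserted

cpu.invalidate(0x00000200)                 # option 2: invalidate in the handler
bridge.raise_error()
cpu.read(0x00000200)                       # next fetch has to use the bus again
after_invalidate = not bridge.mcp_asserted

print(first_acknowledged, second_acknowledged, after_dummy_read, after_invalidate)
```

On real hardware the dummy read would have to go through a cache-inhibited mapping, and the invalidate would use the core's cache management instructions; neither is modelled here.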

TG
d***@excite.co.jp
2003-08-25 07:01:19 UTC
Thank you very much for your assistance, Terry.

I'd like to try what you described.
May I ask you about your suggestion?
Is a dummy read of the MCP exception handler's location, done with
caching inhibited, all that is needed to negate MCP?
Is there no need to execute the handler code?

Thanks in advance.

regards,
deacon1
Terry Greeniaus
2003-08-25 09:27:23 UTC
Post by d***@excite.co.jp
May I ask you about your suggestion?
Is a dummy read of the MCP exception handler's location, done with
caching inhibited, all that is needed to negate MCP?
Is there no need to execute the handler code?
Any read access on the 60x bus to the first instruction of the machine
check handler should deassert the /MCP line in the bridge controller.

The idea originally seems to be that the CPU would take a machine check
and then do an instruction fetch from 0xnnn00200 which the bridge
controller would see and then deassert /MCP.

The only problem with that is if you have taken a machine check earlier
and then cached that code - then the bridge would not deassert /MCP and
you would not get new exceptions.

That is why I suggest either a dummy read or invalidating that cache
line - a dummy read acknowledges the interrupt after the read has
completed, and invalidating the cacheline means the first instruction
fetch for the next machine check exception acknowledges the interrupt.

In either case, /MCP should be deasserted, since AFAIK the controller
does not distinguish between instruction fetches and data accesses.

TG
Masahiko_Yamaura
2003-08-26 00:55:51 UTC
Thank you so much, Terry.

Your explanation is so useful to me, and I'll try it.
with best regards,

deacon1
