bcm-v4

Specification

This page describes the instruction format for the microcode in core revisions 5 and higher. Core revisions 4 and lower use a different format.

general

The processor always works on 16-bit words. Hence, all memory addressing is also done in 16-bit quantities (except for jumps, which are addressed in instruction numbers, i.e. 8-byte quantities). For example, to write to the shared memory at byte offset 0x002 you have to write to 0x001 from the microcode.

The processor works completely in little endian.

instruction format

We write instructions as

ooo  Xxx yyy zzz

with each character indicating 4 bits, while the actual instruction in binary is then

xxyyyzzz0000oooX

if you treat the firmware as 32-bit values converted to big endian. If you treat it as 64-bit values and then convert to big endian, it becomes easier:

0000oooXxxyyyzzz

Core revisions 15 and up use yet another microcode format. It uses the same instructions, but each operand has 13 instead of 12 bits. In 64-bit big endian again (this time each character being a bit):

0000000000000ooooooooooooXxxxxxxxxxxxxYyyyyyyyyyyyyZzzzzzzzzzzzz
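As a rough illustration of the two 64-bit layouts above, the following Python sketch packs and unpacks an instruction. The function names are invented for this example and do not come from any existing tool; the 32-bit split only applies to the 12-bit operand layout.

```python
def encode_insn(opcode, x, y, z, operand_bits=12):
    """Pack an instruction into its 64-bit value.

    operand_bits=12 gives the rev 5-14 layout 0000oooXxxyyyzzz (one char = 4 bits),
    operand_bits=13 gives the rev 15+ layout (13 zero bits, 12 opcode bits, 3 x 13 bits).
    """
    if not 0 <= opcode < (1 << 12):
        raise ValueError("opcode must fit in 12 bits")
    for operand in (x, y, z):
        if not 0 <= operand < (1 << operand_bits):
            raise ValueError("operand out of range")
    return ((opcode << (3 * operand_bits)) | (x << (2 * operand_bits))
            | (y << operand_bits) | z)


def decode_insn(value, operand_bits=12):
    """Inverse of encode_insn: return (opcode, x, y, z)."""
    mask = (1 << operand_bits) - 1
    z = value & mask
    y = (value >> operand_bits) & mask
    x = (value >> (2 * operand_bits)) & mask
    opcode = (value >> (3 * operand_bits)) & 0xFFF
    return opcode, x, y, z


def split_words32(value):
    """Return the two 32-bit words in the order xxyyyzzz, 0000oooX."""
    return value & 0xFFFFFFFF, value >> 32
```

For example, `encode_insn(0x1c0, 0x001, 0x002, 0x003)` yields `0x00001c0001002003`, which splits into the 32-bit words `0x01002003` and `0x00001c00`.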

Also, some instructions are written in the same way as

ooo  Xxx yyy jjj

or

ooo  Aaa bbb ccc

These forms are explained in more detail with the respective instruction. If nothing else is mentioned, xxx, yyy and zzz always denote regular operands as below.

operands

For core revision 5 to 14 the operands look like this.

xxx is (binary)                 result
0b0mmm mmmm mmmm                m is a memory (shm) address (you can only use one as input per instruction)
0b100. .... ....                register access (you can only use one as input per instruction)
0b101r rroo oooo, 0 <= r <= 6   memory at oooooo + Base r
0b1011 10.. ....                (does not exist?)
0b1011 11rr rrrr                CPU register r
0b11ii iiii iiii                i is a 10-bit signed immediate (sign extended to 16 bits before operating with)

For core revisions 15 and up, the prefixes are the same but the variable part is longer, although it is not known whether there really are more registers and more memory.

xxx is (binary)                 result
0b0mmmm mmmm mmmm               m is a memory (shm) address (you can only use one as input per instruction)
0b100.. .... ....               register access (you can only use one as input per instruction)
0b101rr rooo oooo               memory at ooooooo + Base r
0b10111 0... ....               (does not exist?)
0b10111 1rrr rrrr               CPU register r
0b11iii iiii iiii               i is an 11-bit signed immediate (sign extended to 16 bits before operating with)
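The operand encodings listed above can be sketched as a small classifier. This is only an illustration: the function name and the returned tags are invented here, and `width` selects between the 12-bit (rev 5 to 14) and 13-bit (rev 15+) form. The pattern marked "(does not exist?)" coincides with base r = 6 of the indirect form, so the sketch checks the more specific patterns first.

```python
def decode_operand(op, width=12):
    """Classify a raw operand; width is 12 (rev 5-14) or 13 (rev 15+)."""
    if not 0 <= op < (1 << width):
        raise ValueError("operand does not fit in the given width")
    top3 = op >> (width - 3)
    if top3 < 0b100:
        # Prefix 0: direct shared memory address.
        return ("shm", op)
    if top3 >= 0b110:
        # Prefix 11: signed immediate, sign extended to 16 bits.
        imm_bits = width - 2
        raw = op & ((1 << imm_bits) - 1)
        if raw >> (imm_bits - 1):
            raw -= 1 << imm_bits
        return ("imm", raw & 0xFFFF)
    rest = op & ((1 << (width - 3)) - 1)
    if top3 == 0b100:
        # Prefix 100: register access.
        return ("reg_access", rest)
    # Prefix 101: three-bit base selector followed by an offset.
    offset_bits = width - 6
    base = rest >> offset_bits
    low = rest & ((1 << offset_bits) - 1)
    if base == 7:
        return ("cpu_reg", low)
    if base == 6:
        return ("unknown", low)  # "(does not exist?)" in the table
    return ("shm_indirect", base, low)
```

With these assumptions, the placeholder operand `bc0` used further down decodes to CPU register 0, and so does `1780` in the 13-bit format.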

instructions

arithmetic

add

1cL  xxx yyy zzz

zzz := xxx + yyy + (carry if applicable)

L can have values ORed from the following:

mask   meaning
0x1    use carry bit
0x2    set carry bit

The carry bit used by add is not the same as the one used by sub.
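A minimal sketch of the add semantics, under the assumption (not stated on this page) that the carry is simply bit 16 of the unsigned sum. The function name and calling convention are invented for illustration.

```python
def add16(x, y, flags=0, carry=0):
    """Model of 'add' on 16-bit words.

    flags is the L nibble: 0x1 = use carry bit, 0x2 = set carry bit.
    Returns (result, carry). The carry is left unchanged unless 0x2 is set.
    Assumption: carry-out is bit 16 of the unsigned sum.
    """
    total = (x & 0xFFFF) + (y & 0xFFFF)
    if flags & 0x1:
        total += carry & 1
    if flags & 0x2:
        carry = (total >> 16) & 1
    return total & 0xFFFF, carry
```

A 32-bit addition could then be built from two steps, the first with flags 0x2 and the second with flags 0x1.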

sub

1dL  xxx yyy zzz

zzz := xxx - yyy - (carry if applicable)

L can have values ORed from the following:

mask   meaning
0x1    use carry bit
0x2    set carry bit

The carry bit used by sub is not the same as the one used by add.

multiply (rev 11+ only)

101  xxx yyy zzz

zzz := (xxx * yyy) >> 16
IHR[06d] := xxx * yyy

arithmetic right shift

130  xxx yyy zzz

zzz := xxx >> yyy (filling up with the sign bit)

logical

or

160  xxx yyy zzz

zzz := xxx | yyy

and

140  xxx yyy zzz

zzz := xxx & yyy

xor

170  xxx yyy zzz

zzz := xxx ^ yyy

logical right shift

120  xxx yyy zzz

zzz := xxx >> yyy

left shift

110  xxx yyy zzz

zzz := xxx << yyy

shift right over two registers

2MS  xxx yyy zzz

mask := (1 << (M+1)) - 1
tmp  := (yyy<<16) | xxx
zzz  := (tmp >> S) & mask
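The pseudocode above translates directly into the following Python sketch. The function name is made up for this example; M and S are the two nibbles from the opcode 2MS.

```python
def shift_right_two_regs(x, y, m, s):
    """Shift the 32-bit value (y:x) right by s and keep the low m+1 bits."""
    mask = (1 << (m + 1)) - 1
    tmp = ((y & 0xFFFF) << 16) | (x & 0xFFFF)
    return (tmp >> s) & mask
```

With M = 15 and S = 8, for instance, the result is the high byte of xxx in the low byte and the low byte of yyy in the high byte.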

rotate left

1a0  xxx yyy zzz

zzz := (xxx << yyy) | (xxx >> (16-yyy))

rotate right

1b0  xxx yyy zzz

zzz := (xxx >> yyy) | (xxx << (16-yyy))
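Both rotates can be sketched as below. The pseudocode does not say so explicitly, but the result has to be truncated to 16 bits; reducing the rotate count modulo 16 is an assumption of this sketch, as are the function names.

```python
def rol16(x, n):
    """Rotate a 16-bit word left by n (n taken modulo 16 by assumption)."""
    x &= 0xFFFF
    n &= 0xF
    return ((x << n) | (x >> (16 - n))) & 0xFFFF


def ror16(x, n):
    """Rotate a 16-bit word right by n (n taken modulo 16 by assumption)."""
    x &= 0xFFFF
    n &= 0xF
    return ((x >> n) | (x << (16 - n))) & 0xFFFF
```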

clear bits

150  xxx yyy zzz

zzz := xxx & (~yyy)

or with shift and select

3MS  xxx yyy zzz

mask := (1 << (M+1)) - 1
mask := (mask << S) | (mask >> (16-S))
tmp  := (xxx << S) | (xxx >> (16-S))
zzz  := (tmp & mask) | (yyy & ~mask)
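The following sketch mirrors the pseudocode above step by step, with all intermediate values truncated to 16 bits. The function names are invented for illustration.

```python
def _rol16(x, n):
    """Rotate a 16-bit word left by n bits (0 <= n <= 15)."""
    x &= 0xFFFF
    return ((x << n) | (x >> (16 - n))) & 0xFFFF


def or_shift_select(x, y, m, s):
    """Insert the low m+1 bits of x, rotated left by s, into y."""
    mask = (1 << (m + 1)) - 1
    mask = _rol16(mask, s)
    tmp = _rol16(x, s)
    return (tmp & mask) | (y & ~mask & 0xFFFF)
```

For example, `or_shift_select(0x0005, 0xFF0F, 3, 4)` places the nibble 0x5 into bits 4 to 7 of yyy and gives `0xFF5F`.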

jumps

All but the special jumps can have their meaning inverted by setting the lowest bit of the opcode. For example, jump if less or equal is implemented as jump if not bigger, hence 0d5 (signed) or 0dd (unsigned).

jump if binary and

040  xxx yyy jjj

if (xxx & yyy)
    pc := jjj

jump if all bits set

050  xxx yyy jjj

Every bit set in xxx must also be set in yyy.

if ((xxx & yyy) == xxx)
    pc := jjj

jump if equal

0d0  xxx yyy jjj

if (xxx == yyy)
    pc := jjj

jump if less (signed, two's complement)

0d2  xxx yyy jjj

if (xxx < yyy)
    pc := jjj

jump if bigger (signed, two's complement)

0d4  xxx yyy jjj

if (xxx > yyy)
    pc := jjj

jump if difference is negative (two's complement)

0d6  xxx yyy jjj

The carry from the subtraction is ignored.

int16_t tmp := xxx - yyy
if (tmp < 0)
    pc := jjj

jump if difference is positive (two's complement)

0d8  xxx yyy jjj

The carry from the subtraction is ignored.

int16_t tmp := xxx - yyy
if (tmp > 0)
    pc := jjj

jump if less (unsigned)

0da  xxx yyy jjj

if (xxx < yyy)
    pc := jjj

jump if bigger (unsigned)

0dc  xxx yyy jjj

if (xxx > yyy)
    pc := jjj
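To summarise the regular jumps, here is a hedged sketch that evaluates whether a jump is taken, including the inversion through the lowest opcode bit. The helper names are invented for this example; the conditions are taken from the pseudocode above.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jump_taken(opcode, x, y):
    """Return True if the regular jump 'opcode' would branch for x and y."""
    x &= 0xFFFF
    y &= 0xFFFF
    base = opcode & ~1
    if base == 0x040:
        taken = (x & y) != 0
    elif base == 0x050:
        taken = (x & y) == x
    elif base == 0x0D0:
        taken = x == y
    elif base == 0x0D2:
        taken = to_signed16(x) < to_signed16(y)
    elif base == 0x0D4:
        taken = to_signed16(x) > to_signed16(y)
    elif base == 0x0D6:
        taken = to_signed16(x - y) < 0  # carry of the subtraction ignored
    elif base == 0x0D8:
        taken = to_signed16(x - y) > 0  # carry of the subtraction ignored
    elif base == 0x0DA:
        taken = x < y
    elif base == 0x0DC:
        taken = x > y
    else:
        raise ValueError("not a regular jump opcode")
    # The lowest opcode bit inverts the condition.
    return taken != bool(opcode & 1)
```

Note how 0d2 and 0d6 differ: for xxx = 0x8000 and yyy = 1 the signed comparison is true, but the truncated difference 0x7fff is not negative.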

special jumps

jump if zero after shift and mask

4MS  xxx yyy jjj

if ((((yyy << 16 | xxx) >> S) & ((1 << (M+1)) - 1)) == 0)
    pc := jjj

e.g. 40X tests bit X of argument xxx

jump if non-zero after shift and mask

5MS  xxx yyy jjj

if ((((yyy << 16 | xxx) >> S) & ((1 << (M+1)) - 1)) != 0)
    pc := jjj
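The two special jumps share the same extraction step and can be sketched as follows. The function names are invented for this example; M and S are the two low nibbles of the opcode.

```python
def extract_field(x, y, m, s):
    """Shift the 32-bit value (y:x) right by s and keep the low m+1 bits."""
    combined = ((y & 0xFFFF) << 16) | (x & 0xFFFF)
    return (combined >> s) & ((1 << (m + 1)) - 1)


def jump_if_zero_taken(x, y, m, s):
    """Condition of the 4MS jump."""
    return extract_field(x, y, m, s) == 0


def jump_if_nonzero_taken(x, y, m, s):
    """Condition of the 5MS jump."""
    return extract_field(x, y, m, s) != 0
```

With M = 0 the field is a single bit, so the jump tests bit S of the combined 32-bit value.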

jump on condition register

The register bc0 appears to be just a placeholder in the instruction. This makes sense because these instructions seem to be much faster than, for example, "jump if 0 < 1", so we can guess that they are fast-tracked through the pipeline without fetch/store cycles.

6CB  bc0 bc0 zzz

if (!external condition xx)
    pc = zzz

7CB  bc0 bc0 zzz

if (external condition xx)
    pc = zzz

Where C consists of four bits as below and B is the bit to test.

C (four bits)                                B (four bits)
3      2  1  0                               3  2  1  0
EOI?   R (condition register to test)        B (bit to test)
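Splitting a 6CB or 7CB opcode into its fields could look like the sketch below. The function name and the dictionary keys are invented for this example, and the meaning of the top bit of C (EOI?) is as uncertain here as it is in the layout above.

```python
def decode_condition_jump(opcode):
    """Split a 6CB / 7CB opcode into its fields."""
    kind = (opcode >> 8) & 0xF
    if kind not in (0x6, 0x7):
        raise ValueError("not a condition register jump")
    c = (opcode >> 4) & 0xF
    return {
        "jump_if_set": kind == 0x7,  # 7CB jumps if the condition is true
        "eoi": bool(c & 0x8),        # top bit of C, meaning uncertain (EOI?)
        "register": c & 0x7,         # R, condition register to test
        "bit": opcode & 0xF,         # B, bit to test
    }
```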

Possible conditions are (this list is highly incomplete!):

condition register R   bit B   condition
0                              RX condition register
                       0x3     RX FCS good (?)
                       0x6     RX complete
                       0xa     RX crypto engine busy?
                       0xd     RX FIFO full
                       0xe     RX PLCP good
1                              RXE/MAC match condition register on core < 5
2                              TX (?) condition register
                       0x4     MAC enabled
                       0xb     TX underflow (?)
                       0xc     TBTT timer expired (?)
                       0xd     PHY TX error (?)
                       0xe     TX flush requested (? should be checked...)
                       0xf     TX engine busy
3                              PHY condition register
                       0x0     unknown, EOI'ed on each state machine restart
                       0x1     unknown, EOI'ed on each state machine restart
                       0x2     Radar related?
4                              ?
5                              PSM condition register
                       B       condition register 5 is the PSM condition register. On my revision 5 core, only the lower 13 bits are available.
6                              RCM condition register
                       0x0     RX RA match (RA matched during frame RX)
                       0x6     RX BSS match (BSSID matched during frame RX)
7                              ?
                       0xf     always true

subroutines (rev 5-14 cores only)

There are 4 link registers (0-3) available. They must be selected manually. When I write LR[xxx] below that means that xxx is a number from 0-3 (not an immediate or such!), e.g. 001. The link registers can also be accessed as PC Register 0-3 through the special offsets 0x868-0x86b. When you want to write the PC registers, take care to read them back afterwards, otherwise the change will not take effect.

Calls and returns must always be pairwise. It is valid to nest calls when using different link registers, but before reusing a link register with a call, a return must have been made.

call

002  aaa bbb jjj

LR[aaa] := pc+1
goto jjj

return

003  aaa bbb ccc

tmp = LR[ccc]
LR[aaa] := pc+1
goto tmp
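The interplay of call and return with the four link registers can be illustrated with a small model. The class and method names are invented for this example; only the register updates follow the pseudocode above.

```python
class LinkRegisterModel:
    """Toy model of pc and the four link registers LR[0..3]."""

    def __init__(self, pc=0):
        self.pc = pc
        self.lr = [0, 0, 0, 0]

    def call(self, aaa, jjj):
        """002 aaa bbb jjj: save the return address in LR[aaa], jump to jjj."""
        self.lr[aaa] = self.pc + 1
        self.pc = jjj

    def ret(self, aaa, ccc):
        """003 aaa bbb ccc: jump to LR[ccc], storing pc+1 in LR[aaa]."""
        target = self.lr[ccc]
        self.lr[aaa] = self.pc + 1
        self.pc = target
```

Calling from instruction 10 through LR[1] and later returning through LR[1] continues at instruction 11, as expected for a pairwise call and return.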

subroutines (rev 15+ cores only)

There seems to be some stacked calling mechanism. It is not yet known where the stack is stored or how it is set up.

call (stacked)

004  1780 1780 jjjj

save pc (FIXME)
pc := jjjj

The first and second operands are just placeholders.

ret (stacked)

005  1780 1780 0000

pc := restore pc (FIXME)

The operands are just placeholders.

TKIP Sbox lookup

1e0  xxx yyy zzz

This instruction implements the (small) Sbox table lookup needed for TKIP.

if (yyy & 0x1)
    zzz = Sbox[Hi8(xxx)]
else
    zzz = Sbox[Lo8(xxx)]

if (yyy & 0x2)
    zzz = (zzz >> 8) | (zzz << 8)
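A sketch of the lookup logic, with the table passed in as a parameter. The function name is invented, and the table is assumed to hold 256 entries of 16 bits each; the actual Sbox contents are not reproduced here.

```python
def tkip_sbox_lookup(x, y, sbox):
    """Model of the 1e0 instruction.

    sbox is assumed to be a sequence of 256 16-bit entries.
    Bit 0 of y selects the high or low byte of x as index,
    bit 1 of y swaps the two bytes of the result.
    """
    if y & 0x1:
        z = sbox[(x >> 8) & 0xFF]  # Hi8(x)
    else:
        z = sbox[x & 0xFF]         # Lo8(x)
    if y & 0x2:
        z = ((z >> 8) | (z << 8)) & 0xFFFF
    return z & 0xFFFF
```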

nap

001  bc0 bc0 000

This instruction seems to let the microcode wait for events and/or a certain time. The arguments do not seem to have any meaning. The MAC nap time register influences the maximum time spent napping; the value zero means infinite maximum time. The MAC nap time register is counted down, and if the nap was interrupted by other conditions it will contain the remaining time after the nap.


Exported/Archived from the wiki to HTML on 2016-10-27