This page describes the instruction format for the microcode in core revisions 5 and higher. Core revisions 4 and lower use a different format.
Contents
- general
- instruction format
- operands
- instructions
- arithmetic
- logical
- jumps
- subroutines (rev 5-14 cores only)
- subroutines (rev 15+ cores only)
- TKIP Sbox lookup
- nap
general
The processor always works on 16-bit words. Hence, all memory addressing is also done in 16-bit quantities (except for jumps, which are done in instruction numbers or 8-byte quantities). For example, to write to the shared memory at byte offset 0x002 you have to write to 0x001 from the microcode.
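The byte-to-word conversion described above can be sketched as follows. The helper name `shm_word_address` is hypothetical and only illustrates the halving of the byte offset.

```python
def shm_word_address(byte_offset: int) -> int:
    """Convert a shared-memory byte offset into the 16-bit word address
    that the microcode has to use (hypothetical helper)."""
    if byte_offset % 2:
        raise ValueError("byte offset must be aligned to a 16-bit word")
    return byte_offset // 2


# Byte offset 0x002 is word address 0x001, as in the example above.
print(hex(shm_word_address(0x002)))
```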
The processor works completely in little endian.
instruction format
We write instructions as
ooo Xxx yyy zzz
with each character indicating 4 bits, while the actual instruction in binary is then
xxyyyzzz0000oooX
if you treat the firmware as 32-bit values converted to big endian. If you treat it as 64-bit values and then convert to big endian, it becomes easier:
0000oooxxxyyyzzz
Core revisions 15 and up use yet another microcode format. It uses the same instructions, but each operand has 13 bits instead of 12. Written as a 64-bit big-endian value again (this time with each character being one bit):
0000000000000ooooooooooooXxxxxxxxxxxxxYyyyyyyyyyyyyZzzzzzzzzzzzz
Also, some instructions are written in the same way as
ooo Xxx yyy jjj
or
ooo Aaa bbb ccc
These are explained in more detail with the respective instruction. Unless stated otherwise, xxx, yyy and zzz always denote regular operands as described below.
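As a minimal sketch of the two 64-bit layouts above, the following packs an opcode and three operands into a single 64-bit value. The function names are hypothetical, and the sketch only builds the integer value; how that value is serialized into the firmware file is not covered here.

```python
def encode_rev5(opcode: int, x: int, y: int, z: int) -> int:
    """Pack 0000oooxxxyyyzzz: a 12-bit opcode and three 12-bit operands."""
    for value in (opcode, x, y, z):
        if not 0 <= value <= 0xFFF:
            raise ValueError("opcode and operands must fit in 12 bits")
    return (opcode << 36) | (x << 24) | (y << 12) | z


def encode_rev15(opcode: int, x: int, y: int, z: int) -> int:
    """Pack the rev 15+ layout: a 12-bit opcode and three 13-bit operands."""
    if not 0 <= opcode <= 0xFFF:
        raise ValueError("opcode must fit in 12 bits")
    for value in (x, y, z):
        if not 0 <= value <= 0x1FFF:
            raise ValueError("operands must fit in 13 bits")
    return (opcode << 39) | (x << 26) | (y << 13) | z


# "1c0 001 c05 bc0" written as one 64-bit value:
print(f"{encode_rev5(0x1C0, 0x001, 0xC05, 0xBC0):016x}")
```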
operands
For core revision 5 to 14 the operands look like this.
xxx is (binary) | result
0b0mmm mmmm mmmm | m is a memory (shm) address (you can only use one as input per instruction)
0b100. .... .... | register access (you can only use one as input per instruction)
0b101r rroo oooo, 0 <= r <= 6 | memory at oooooo + Base r
0b1011 10.. .... | (does not exist?)
0b1011 11rr rrrr | CPU register r
0b11ii iiii iiii | i is a 10-bit signed immediate (sign extended to 16 bits before operating with)
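The rev 5 to 14 operand table can be read as a prefix decoder. The sketch below is one possible interpretation, with hypothetical names. Note that the prefix 0b1011 10 is covered both by the "Base r" row (as r = 6) and by the "(does not exist?)" row; this sketch reports it as unknown, which is an assumption.

```python
def decode_operand_rev5(op: int):
    """Classify a 12-bit operand for core revisions 5 to 14 (sketch)."""
    if not 0 <= op <= 0xFFF:
        raise ValueError("operand must fit in 12 bits")
    if (op & 0x800) == 0:              # 0b0mmm mmmm mmmm
        return ("shm", op & 0x7FF)
    if (op & 0xE00) == 0x800:          # 0b100. .... ....
        return ("register", op & 0x1FF)
    if (op & 0xC00) == 0xC00:          # 0b11ii iiii iiii
        imm = op & 0x3FF
        if imm & 0x200:                # sign-extend the 10-bit immediate
            imm -= 0x400
        return ("immediate", imm)
    if (op & 0xFC0) == 0xBC0:          # 0b1011 11rr rrrr
        return ("cpu_register", op & 0x3F)
    if (op & 0xFC0) == 0xB80:          # 0b1011 10.. .... (assumed unused)
        return ("unknown", op & 0x3F)
    # Remaining case: 0b101r rroo oooo, memory at offset + Base r
    return ("shm_indirect", (op >> 6) & 0x7, op & 0x3F)


print(decode_operand_rev5(0xC05))  # immediate 5
print(decode_operand_rev5(0xBC0))  # CPU register 0
```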
For core revisions 15 and up, the prefixes are the same but the variable part is longer, although it is not known whether there really are more registers and more memory.
xxx is (binary) | result
0b0mmmm mmmm mmmm | m is a memory (shm) address (you can only use one as input per instruction)
0b100.. .... .... | register access (you can only use one as input per instruction)
0b101rr rooo oooo | memory at ooooooo + Base r
0b10111 0... .... | (does not exist?)
0b10111 1rrr rrrr | CPU register r
0b11iii iiii iiii | i is an 11-bit signed immediate (sign extended to 16 bits before operating with)
instructions
arithmetic
add
1cL xxx yyy zzz
zzz := xxx + yyy + (carry if applicable)
L can have values ORed from the following:
mask | meaning
0x1 | use carry bit
0x2 | set carry bit
This is not the same as the one for sub.
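A minimal model of the add instruction and its L flags, for illustration only. It assumes that "set carry bit" means the carry flag is updated with the carry out of bit 15, and that the flag is left untouched otherwise; neither detail is confirmed here. The function name is hypothetical.

```python
def ucode_add(x: int, y: int, flags: int, carry: int = 0):
    """Model of 1cL: returns (zzz, new_carry) for 16-bit operands.

    flags & 0x1: add the current carry bit to the sum.
    flags & 0x2: store the carry out of bit 15 (assumed semantics).
    """
    total = (x & 0xFFFF) + (y & 0xFFFF) + (carry if flags & 0x1 else 0)
    if flags & 0x2:
        carry = total >> 16
    return total & 0xFFFF, carry


# 32-bit addition 0x0001FFFF + 0x00000001 built from two 16-bit adds:
low, c = ucode_add(0xFFFF, 0x0001, 0x2)        # 1c2: set carry
high, c = ucode_add(0x0001, 0x0000, 0x3, c)    # 1c3: use and set carry
print(hex((high << 16) | low))
```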
sub
1dL xxx yyy zzz
zzz := xxx - yyy - (carry if applicable)
L can have values ORed from the following:
mask | meaning
0x1 | use carry bit
0x2 | set carry bit
This is not the same as the one for add.
multiply (rev 11+ only)
101 xxx yyy zzz
zzz := (xxx * yyy) >> 16
IHR[06d] := xxx * yyy
arithmetic right shift
130 xxx yyy zzz
zzz := xxx >> yyy (filling up with the sign bit)
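To illustrate "filling up with the sign bit", here is a small sketch of a 16-bit arithmetic right shift. The function name is hypothetical and the shift amount is assumed to be in the range 0 to 15.

```python
def asr16(x: int, n: int) -> int:
    """Arithmetic right shift of a 16-bit word: vacated bits copy bit 15."""
    x &= 0xFFFF
    signed = x - 0x10000 if x & 0x8000 else x  # reinterpret as two's complement
    return (signed >> n) & 0xFFFF


print(hex(asr16(0x8000, 4)))  # sign bit is replicated into the top bits
print(hex(0x8000 >> 4))       # logical right shift for comparison
```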
logical
or
160 xxx yyy zzz
zzz := xxx | yyy
and
140 xxx yyy zzz
zzz := xxx & yyy
xor
170 xxx yyy zzz
zzz := xxx ^ yyy
logical right shift
120 xxx yyy zzz
zzz := xxx >> yyy
left shift
110 xxx yyy zzz
zzz := xxx << yyy
shift right over two registers
2MS xxx yyy zzz
mask := (1 << (M+1)) - 1
tmp := (yyy << 16) | xxx
zzz := (tmp >> S) & mask
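The pseudocode for the shift over two registers translates directly into the following sketch. The function name `srx` is used here only for illustration.

```python
def srx(m: int, s: int, x: int, y: int) -> int:
    """Model of 2MS: extract M+1 bits starting at bit S of (yyy:xxx)."""
    mask = (1 << (m + 1)) - 1
    tmp = ((y & 0xFFFF) << 16) | (x & 0xFFFF)
    return (tmp >> s) & mask


# Opcode 27c (M=7, S=12): an 8-bit field straddling both operands.
print(hex(srx(7, 12, 0xA000, 0x0001)))
```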
rotate left
1a0 xxx yyy zzz
zzz := (xxx << yyy) | (xxx >> (16-yyy))
rotate right
1b0 xxx yyy zzz
zzz := (xxx >> yyy) | (xxx << (16-yyy))
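A short sketch of both rotates on 16-bit words. Truncating the result to 16 bits is implied by the word size; reducing the rotate amount modulo 16 is an assumption of this sketch, as are the function names.

```python
def rol16(x: int, n: int) -> int:
    """Rotate a 16-bit word left by n bits (n taken modulo 16, assumed)."""
    x &= 0xFFFF
    n %= 16
    return ((x << n) | (x >> (16 - n))) & 0xFFFF


def ror16(x: int, n: int) -> int:
    """Rotate a 16-bit word right by n bits (n taken modulo 16, assumed)."""
    x &= 0xFFFF
    n %= 16
    return ((x >> n) | (x << (16 - n))) & 0xFFFF


print(hex(rol16(0x8001, 1)), hex(ror16(0x8001, 1)))
```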
clear bits
150 xxx yyy zzz
zzz := xxx & (~yyy)
or with shift and select
3MS xxx yyy zzz
mask := (1 << (M+1)) - 1
mask := (mask << S) | (mask >> (16-S))
tmp := (xxx << S) | (xxx >> (16-S))
zzz := (tmp & mask) | (yyy & ~mask)
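The "or with shift and select" operation effectively inserts an (M+1)-bit field from xxx into yyy at bit position S. A sketch under the assumption that all intermediate values are truncated to 16 bits (function names are hypothetical):

```python
def rol16(v: int, n: int) -> int:
    """Rotate a 16-bit word left by n bits (0 <= n <= 15)."""
    v &= 0xFFFF
    return ((v << n) | (v >> (16 - n))) & 0xFFFF


def orx(m: int, s: int, x: int, y: int) -> int:
    """Model of 3MS: take M+1 bits of xxx, rotate them to bit S, merge into yyy."""
    mask = rol16((1 << (m + 1)) - 1, s)
    tmp = rol16(x, s)
    return (tmp & mask) | (y & ~mask & 0xFFFF)


# Opcode 334 (M=3, S=4): replace bits 4..7 of yyy with the low nibble of xxx.
print(hex(orx(3, 4, 0x000A, 0xFFFF)))
```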
jumps
All but the special jumps can have their meaning inverted by setting the lowest bit of the opcode. For example, jump if less or equal is implemented as jump if not bigger, hence 0d5 (signed) or 0dd (unsigned).
jump if binary and
040 xxx yyy jjj
if (xxx & yyy) pc := jjj
jump if all bits set
050 xxx yyy jjj
Every bit set in xxx must also be set in yyy.
if ((xxx & yyy) == xxx) pc := jjj
jump if equal
0d0 xxx yyy jjj
if (xxx == yyy) pc := jjj
jump if less (signed, two's complement)
0d2 xxx yyy jjj
if (xxx < yyy) pc := jjj
jump if bigger (signed, two's complement)
0d4 xxx yyy jjj
if (xxx > yyy) pc := jjj
jump if difference is negative (two's complement)
0d6 xxx yyy jjj
The carry from the subtraction is ignored.
int16_t tmp := xxx - yyy
if (tmp < 0) pc := jjj
jump if difference is positive (two's complement)
0d8 xxx yyy jjj
The carry from the subtraction is ignored.
int16_t tmp := xxx - yyy
if (tmp > 0) pc := jjj
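The "difference" jumps are not the same as the signed comparisons, because the 16-bit subtraction can wrap around. The sketch below models both predicates as written in the pseudocode above; the function names are hypothetical.

```python
def to_signed16(v: int) -> int:
    """Interpret the low 16 bits of v as a two's complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def jump_if_less_signed(x: int, y: int) -> bool:
    """Model of 0d2: true signed comparison."""
    return to_signed16(x) < to_signed16(y)


def jump_if_difference_negative(x: int, y: int) -> bool:
    """Model of 0d6: sign of the truncated 16-bit difference."""
    return to_signed16(x - y) < 0


# -32768 is less than 1, but -32768 - 1 wraps around to +32767.
print(jump_if_less_signed(0x8000, 0x0001))
print(jump_if_difference_negative(0x8000, 0x0001))
```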
jump if less (unsigned)
0da xxx yyy jjj
if (xxx < yyy) pc := jjj
jump if bigger (unsigned)
0dc xxx yyy jjj
if (xxx > yyy) pc := jjj
special jumps
jump if zero after shift and mask
4MS xxx yyy jjj
if (((((yyy << 16) | xxx) >> S) & ((1 << (M+1)) - 1)) == 0) pc := jjj
e.g. 40X tests bit X of argument yyy
jump if non-zero after shift and mask
5MS xxx yyy jjj
if (((((yyy << 16) | xxx) >> S) & ((1 << (M+1)) - 1)) != 0) pc := jjj
jump on condition register
The register bc0 appears to be just a placeholder in the instruction. This makes sense because the instructions seem to be much faster than for example "jump if 0 < 1", so we can guess that these instructions are fast-tracked through the pipeline without fetch/store cycles or so.
6CB bc0 bc0 zzz
if (!external condition xx) pc = zzz
7CB bc0 bc0 zzz
if (external condition xx) pc = zzz
Where C consists of four bits as below and B is the bit to test.
field | bits | meaning
C (four bits) | 3 | EOI?
C (four bits) | 2-0 | R (condition register to test)
B (four bits) | 3-0 | B (bit to test)
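The C and B fields can be pulled out of a 6CB or 7CB opcode as in the following sketch. The function name and the dictionary keys are hypothetical, and the meaning of the top bit of C is only tentatively "EOI", as marked above.

```python
def decode_condition_jump(opcode: int) -> dict:
    """Split a 12-bit 6CB / 7CB opcode into its fields (sketch)."""
    kind = (opcode >> 8) & 0xF
    if kind not in (0x6, 0x7):
        raise ValueError("not a condition register jump")
    c = (opcode >> 4) & 0xF
    return {
        "jump_if_set": kind == 0x7,   # 7CB jumps if the condition holds
        "eoi": bool(c & 0x8),         # top bit of C, tentatively EOI
        "register": c & 0x7,          # condition register R
        "bit": opcode & 0xF,          # bit B to test
    }


print(decode_condition_jump(0x77F))
```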
Possible conditions are (this list is highly incomplete!):
condition register R | bit B | condition
0 | | RX condition register
0 | 0x3 | RX FCS good (?)
0 | 0x6 | RX complete
0 | 0xa | RX crypto engine busy?
0 | 0xd | RX FIFO full
0 | 0xe | RX PLCP good
1 | | RXE/MAC match condition register on core < 5
2 | | TX (?) condition register
2 | 0x4 | MAC enabled
2 | 0xb | TX underflow (?)
2 | 0xc | TBTT timer expired (?)
2 | 0xd | PHY TX error (?)
2 | 0xe | TX flush requested (? should be checked...)
2 | 0xf | TX engine busy
3 | | PHY condition register
3 | 0x0 | unknown, EOI'ed on each state machine restart
3 | 0x1 | unknown, EOI'ed on each state machine restart
3 | 0x2 | Radar related?
4 | | ?
5 | | PSM condition register
5 | B | condition register 5 is the PSM condition register. On my revision 5 core, only the lower 13 bits are available.
6 | | RCM condition register
6 | 0x0 | RX RA match (RA matched during frame RX)
6 | 0x6 | RX BSS match (BSSID matched during frame RX)
7 | | ?
7 | 0xf | always true
subroutines (rev 5-14 cores only)
There are 4 link registers (0-3) available. They must be selected manually. When I write LR[xxx] below that means that xxx is a number from 0-3 (not an immediate or such!), e.g. 001. The link registers can also be accessed as PC Register 0-3 through the special offsets 0x868-0x86b. When you want to write the PC registers, take care to read them back afterwards, otherwise the change will not take effect.
Calls and returns must always be pairwise. It is valid to nest calls when using different link registers, but before reusing a link register with a call, a return must have been made.
call
002 aaa bbb jjj
LR[aaa] := pc+1
goto jjj
return
003 aaa bbb ccc
tmp = LR[ccc]
LR[aaa] := pc+1
goto tmp
Notes:
- May not occur directly after any jump
- If a PC register has been modified, it will adhere to that
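The call and return pseudocode can be modelled with four link registers as in the sketch below. It is a toy model only: the class and method names are hypothetical, and the restrictions listed in the notes (no return directly after a jump, PC register write-back) are not modelled.

```python
class LinkRegisters:
    """Toy model of the four link registers on rev 5 to 14 cores."""

    def __init__(self):
        self.lr = [0, 0, 0, 0]

    def call(self, pc: int, aaa: int, jjj: int) -> int:
        """002 aaa bbb jjj: save pc+1 in LR[aaa], return the new pc."""
        self.lr[aaa] = pc + 1
        return jjj

    def ret(self, pc: int, aaa: int, ccc: int) -> int:
        """003 aaa bbb ccc: jump to LR[ccc], storing pc+1 in LR[aaa]."""
        target = self.lr[ccc]
        self.lr[aaa] = pc + 1
        return target


regs = LinkRegisters()
pc = regs.call(pc=10, aaa=1, jjj=100)   # subroutine at instruction 100
pc = regs.ret(pc=105, aaa=0, ccc=1)     # return through link register 1
print(pc, regs.lr)
```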
subroutines (rev 15+ cores only)
There seems to be some stacked calling mechanism. It is not yet known where the stack is stored or how it is set up.
call (stacked)
004 1780 1780 jjjj
save pc (FIXME)
pc := jjjj
The first and second operands are just placeholders.
ret (stacked)
005 1780 1780 0000
pc := restore pc (FIXME)
The operands are just placeholders.
TKIP Sbox lookup
1e0 xxx yyy zzz
This instruction implements the (small) Sbox table lookup needed for TKIP.
if (yyy & 0x1)
    zzz = Sbox[Hi8(xxx)]
else
    zzz = Sbox[Lo8(xxx)]
if (yyy & 0x2)
    zzz = (zzz >> 8) | (zzz << 8)
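The lookup logic can be sketched as below. The table passed in is a parameter; the `placeholder` table in the example is made up purely to make the byte selection and byte swap visible, and does not contain the real TKIP Sbox values. The function name is hypothetical.

```python
def tkip_sbox_lookup(sbox, x: int, y: int) -> int:
    """Model of 1e0: index a 256-entry table of 16-bit words.

    y & 0x1 selects the high byte of x as index (otherwise the low byte).
    y & 0x2 swaps the two bytes of the looked-up word.
    """
    index = (x >> 8) & 0xFF if y & 0x1 else x & 0xFF
    z = sbox[index] & 0xFFFF
    if y & 0x2:
        z = ((z >> 8) | (z << 8)) & 0xFFFF
    return z


# Placeholder table (NOT the TKIP Sbox): entry i is (i << 8) | (0xFF - i).
placeholder = [(i << 8) | (0xFF - i) for i in range(256)]
print(hex(tkip_sbox_lookup(placeholder, 0x3412, 0x3)))
```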
nap
001 bc0 bc0 000
This instruction seems to let the microcode wait for events and/or a certain time. The arguments do not seem to have any meaning. The MAC nap time register influences the maximum time spent napping; the value zero means infinite maximum time. The MAC nap time register is counted down during the nap, and when the nap is interrupted by other conditions it contains the remaining time.