Just compare the data sheets - I did this years ago and they were very interesting.
But just a few of the points I can remember off the top of my head are:
- All instructions are the same length, i.e. 32 bits (the upcoming 64-bit version also keeps 32-bit instructions)
- Multiple registers to work with. The ARM2 I'm familiar with had 16, modern ARMs might have doubled that now
- Load/store architecture
- Pipelined from the very first version, so instructions effectively take one clock cycle to complete. To clarify, each instruction took 3 clock cycles, but the short pipeline overlapped them so that one finished every cycle. That gave very good instructions per clock (IPC) and made the CPU very fast
- Two instructions (LDM and STM) that load or store multiple registers at once. These take several cycles, but each word of data takes one cycle
- All instructions are conditional. This makes if-then decisions very fast: any instruction, a branch or an ADD say, executes or not according to the condition flags. An unexecuted instruction still takes a clock cycle
- No indirect addressing modes to complicate the instruction decode. All data processing is done between registers
- No internal microcode required, as the decode logic is relatively simple. Microcode slows down a processor significantly. Instructions are hard-wired instead, making them much faster to execute
- Flexible memory addressing modes enhance efficiency further, allowing one simple instruction to perform a more complex task than it could otherwise
- This architecture lends itself well to superscalar processing (more than one instruction per clock cycle). PowerPC already did this years ago
The above are typical traits of the RISC design philosophy and you would expect them to appear in RISC processors in general, especially the fixed-length instructions and load/store architecture.
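To put rough numbers on the pipelining point, here is a minimal Python sketch. It assumes an idealised 3-stage pipeline with one cycle per stage and no stalls, branches or memory waits; the function name and parameters are made up for illustration and are not taken from any ARM documentation.

```python
def total_cycles(n_instructions, stages=3, pipelined=True):
    """Cycles needed to finish n instructions that each pass through
    `stages` one-cycle stages (idealised: no stalls or branches)."""
    if n_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs `stages` cycles to fill the pipeline;
        # after that, one instruction completes on every cycle.
        return stages + (n_instructions - 1)
    # Without overlap, each instruction runs start to finish on its own.
    return stages * n_instructions


n = 1000
print(total_cycles(n, pipelined=False))  # 3000
print(total_cycles(n, pipelined=True))   # 1002
```

Each instruction still spends 3 cycles in flight, but for long runs the average works out to very nearly one cycle per instruction.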
Actually, that was rather more info than I thought I'd remember.
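The conditional execution point can be sketched with a toy interpreter. This is only an illustration under my own assumptions: the tuple format, the register names and the three condition codes (AL = always, EQ, NE) are a simplified stand-in, not the real ARM encoding.

```python
def condition_passes(cond, flags):
    """Decide whether an instruction with this condition code should execute."""
    if cond == "AL":          # always
        return True
    if cond == "EQ":          # equal: Z flag set
        return flags["Z"]
    if cond == "NE":          # not equal: Z flag clear
        return not flags["Z"]
    raise ValueError(f"unknown condition {cond!r}")


def run(program, regs):
    """Run (cond, op, dest, a, b) tuples; return the number of cycles used."""
    flags = {"Z": False}
    cycles = 0
    for cond, op, dest, a, b in program:
        cycles += 1  # a skipped instruction still occupies its cycle
        if not condition_passes(cond, flags):
            continue
        if op == "CMP":
            flags["Z"] = regs[a] == regs[b]
        elif op == "ADD":
            regs[dest] = regs[a] + regs[b]
        elif op == "SUB":
            regs[dest] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown op {op!r}")
    return cycles


# if r0 == r1: r2 += r3 else: r2 -= r3, with no branch instruction at all
program = [
    ("AL", "CMP", None, "r0", "r1"),
    ("EQ", "ADD", "r2", "r2", "r3"),
    ("NE", "SUB", "r2", "r2", "r3"),
]
regs = {"r0": 5, "r1": 5, "r2": 10, "r3": 4}
cycles = run(program, regs)
print(regs["r2"], cycles)  # 14 3
```

Whichever way the comparison goes, the sequence costs the same three cycles and the pipeline never has to be refilled after a taken branch.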
In what ways is it inefficient? How would you improve the x86 instruction set to make it more efficient?
x86 is more or less the opposite of what I described above. It is a typical CISC design, intended to achieve the highest code density and the most complex instructions possible, given the tiny amount of memory that computers had back in the late '70s.
- Variable-length instructions, from a single byte up to 15 bytes
- Indirect addressing modes
The above two especially make the decode logic very complex and hard to streamline
- Microcoded to deal with the above complexity, which slows down the processor significantly compared to a hard-wired design
- Small number of registers (8 general-purpose ones on 32-bit x86)
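Here is a small Python sketch of why variable-length instructions make decoding harder. The variable-length encoding is a toy one I have invented for the example (the low 4 bits of an instruction's first byte give its total length); it is not the real x86 encoding, which is far messier.

```python
def fixed_boundaries(code, width=4):
    """With fixed-width instructions every start offset is known up front,
    so a decoder could look at several instructions in parallel."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Toy variable-length encoding (hypothetical, NOT real x86): the low
    4 bits of an instruction's first byte give its total length in bytes.
    Each instruction must be examined before the next one can be found."""
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        length = code[pos] & 0x0F
        if length == 0:
            raise ValueError(f"bad length byte at offset {pos}")
        pos += length
    return offsets


print(fixed_boundaries(bytes(12)))  # [0, 4, 8]

code = bytes([0x01, 0x03, 0xAA, 0xBB, 0x02, 0xCC, 0x01])
print(variable_boundaries(code))    # [0, 1, 4, 6]
```

The fixed-width version never reads the code bytes at all, while the variable-width version has to walk them one instruction at a time, and that serial dependency is what makes the decode logic hard to streamline.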
x64 improves on these, but it's still a CISC design. You can see how the RISC design is more efficient from two facts about modern x86 processors:
- For the last decade or so, x86 processors have broken the CISC instructions down into RISC-like ones internally to help speed them up. As Byte magazine (remember them?!) said of x86: CISC won by stealing RISC's clothes. And by "won", they meant becoming the dominant architecture commercially, not on technical merit. This was around 1990. You can bet your boots that Intel didn't want to lose its niche position as near-sole manufacturer of x86 processors, so it put all its corporate might behind x86 to make sure it would succeed
- The SSE instruction set extension looks more like RISC. Again, I'm not familiar with the fine details, but this is what I remember from an article I read about it some time ago
I'm sure there are other things, but this is all I can remember off the top of my head. Having come from an Acorn background, where ARM originated, I'm much more familiar with the technical details of the ARM than the x86.
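As a rough picture of what breaking a CISC instruction into RISC-like ones means, here is a Python sketch. The instruction tuples, the "tmp" register and the micro-op names are all hypothetical; real x86 decoders are proprietary and much more involved.

```python
def crack(instr):
    """Split one CISC-style (op, dst, src) instruction into RISC-like steps.
    A memory destination is written as "[name]" (toy notation)."""
    op, dst, src = instr
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        # Read-modify-write on memory becomes load, register op, store.
        return [
            ("LOAD", "tmp", addr),   # tmp <- memory[addr]
            (op, "tmp", src),        # tmp <- tmp op src
            ("STORE", "tmp", addr),  # memory[addr] <- tmp
        ]
    # Register-to-register instructions are already RISC-like.
    return [instr]


for micro_op in crack(("ADD", "[counter]", "eax")):
    print(micro_op)
```

After this step everything the execution core sees is a simple load, store or register-to-register operation, which is the load/store style described in the ARM list.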
Its performance won't be any different than current CPUs. It will just do things differently, requiring everybody to port or recode their apps. Nothing scales 100% in the computer world.
Yeah, it will be better, see my answer to erocker above for why.
App portability is part of the mountain to climb to gain acceptance, which I have discussed before. However, the point I'm making here is about the raw performance of the ARM processor with turbocharging applied.