
General Description of AST's CDSP Customizable Fixed-Point DSP Core
The CDSP is a general-purpose, high-performance, customizable fixed-point DSP core featuring high execution speeds for both signal-processing algorithms and standard microprocessor applications. It is meant to be used as an embedded cell in ASICs developed on most 0.6 µm and smaller process technologies. It is highly customizable and can be targeted at a large number of technologies thanks to its parameterized, HDL-only design.
The CDSP can be optimized for most of the common DSP algorithms to obtain a highly efficient, low-power and small-area implementation. It is therefore most suitable for low-cost, high-volume applications.
A number of productivity tools have been developed to ease the development and deployment of DSP applications on the CDSP. These include an Assembly Language Integrated Development Environment (aIDE) and a collection of standard DSP functions (CDSPLib). A K&R C Language Integrated Development Environment (cIDE) is currently under development.
Architectural features:
- Single-cycle execution for most instructions.
- Two-operand instruction set with one operand residing in memory and the other in a register
- A dual-operation instruction-word option enabling a sustained rate of two operations per cycle in memory-access-intensive algorithms such as buffered image processing and adaptive filtering
- Four internal data busses enabling up to four internal data transfers per cycle
- Zero-overhead (zero-cycle) block-repeat capability plus a standard looping instruction
- Special bank-based memory architecture enabling efficient usage of data types that are smaller than a processor word
- Very compact code and large addressing space
- Eight logical shifts and four arithmetic shifts
- Configurable hardware multiplier
- Support for one or more MAC units: one MAC input and the MAC output are mapped onto the register file, and the full range of indexed addressing modes is available for the second input
- Configurable butterfly unit enabling execution speeds comparable with those of the cutting-edge parallel DSP processors on the market
- Up to three index registers, each with modulo and bit-reversed post-increment addressing capability
- A constant-memory table-lookup pointer with post-increment/post-decrement options
- Synchronous program memory implementable as a RAM/ROM combination, giving the DSP run-time programmability via the communication ports
- Response time of less than one cycle in wait mode, allowing fast synchronization with predictable asynchronous events
- Option for shadow registers allowing zero-cycle context saving for one or more levels of interrupts
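The modulo and bit-reversed post-increment addressing modes listed above can be modelled in software roughly as follows. This is only a minimal C sketch of the general techniques (circular-buffer wrap and reverse-carry increment); the `index_reg` structure, its field names and both function names are hypothetical and are not taken from the CDSP instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of one index register.
   Field names are illustrative, not CDSP register names. */
typedef struct {
    uint32_t base;   /* start address of the buffer            */
    uint32_t offset; /* current offset inside the buffer       */
    uint32_t length; /* buffer length in words                 */
} index_reg;

/* Modulo post-increment: return the current address, then advance
   the offset by `step`, wrapping around at `length`. */
static uint32_t post_inc_modulo(index_reg *r, uint32_t step)
{
    assert(r->length != 0);
    uint32_t addr = r->base + r->offset;
    r->offset = (r->offset + step) % r->length;
    return addr;
}

/* Bit-reversed post-increment: return the current address, then add
   one to the offset with the carry propagating from the most
   significant bit downwards. `length` must be a power of two. */
static uint32_t post_inc_bitrev(index_reg *r)
{
    assert(r->length != 0 && (r->length & (r->length - 1)) == 0);
    uint32_t addr = r->base + r->offset;
    uint32_t bit = r->length >> 1;
    /* Clear set bits from the top down until a zero bit is found. */
    while (bit != 0 && (r->offset & bit)) {
        r->offset ^= bit;
        bit >>= 1;
    }
    r->offset |= bit; /* set that zero bit (no-op after a full wrap) */
    return addr;
}
```

For an 8-word buffer at base 0, successive calls to `post_inc_bitrev` visit offsets 0, 4, 2, 6, 1, 5, 3, 7, which is the access order a radix-2 FFT needs for its reordering step. In hardware these updates happen in the address generator alongside the data access, whereas the model above spends ordinary instructions on them.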
Customizable features include:
- The size of the processor word (up to 64 bits) and the point-position within the fixed-point registers
- The RAM and ROM sizes
- The number of integer and fixed-point registers
- The number and choice of shadow registers
- The number of index registers and the features of the address generators, including modulo and bit-reversed addressing modes
- The amount of shifting for the shift instructions
- The addressing space (up to 2 gigawords)
- The number, size and operation mode of the communication ports
- The performance of the hardware multiplier, ranging from one result-bit per cycle up to pipelined single-cycle
- Two instruction set implementation options, targeting lower power consumption and higher maximum execution speed respectively
- And more...
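To make the word-size and point-position options concrete, the following C sketch models a fixed-point multiply for one assumed configuration: a 16-bit word with 15 fractional bits. The names `WORD_BITS`, `FRAC_BITS`, `fixed_t`, `acc_t`, `fixed_saturate` and `fixed_mul` are illustrative only, and the rounding and saturation behaviour shown is an assumption rather than documented CDSP behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example configuration (not a CDSP default):
   16-bit processor word, binary point after bit 15. */
#define WORD_BITS 16
#define FRAC_BITS 15

typedef int16_t fixed_t; /* one processor word; must match WORD_BITS */
typedef int32_t acc_t;   /* double-width product                      */

/* Clamp a wide value to the range representable in one word. */
static fixed_t fixed_saturate(acc_t v)
{
    const acc_t max = ((acc_t)1 << (WORD_BITS - 1)) - 1;
    const acc_t min = -max - 1;
    if (v > max) return (fixed_t)max;
    if (v < min) return (fixed_t)min;
    return (fixed_t)v;
}

/* Multiply two fixed-point values. The raw product carries
   2 * FRAC_BITS fractional bits, so it is rounded and shifted back
   by FRAC_BITS. Note: right-shifting a negative value is
   implementation-defined in C; an arithmetic shift is assumed here,
   as provided by mainstream compilers. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    acc_t p = (acc_t)a * (acc_t)b;
    p += (acc_t)1 << (FRAC_BITS - 1); /* round to nearest */
    return fixed_saturate(p >> FRAC_BITS);
}
```

With this configuration 0.5 is stored as 16384, and `fixed_mul(16384, 16384)` yields 8192, that is 0.25. Moving the point position only changes `FRAC_BITS`; changing the word size also requires wider `fixed_t` and `acc_t` types.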
Performance for a typical 0.6 µm, 5 V technology implementation:
- 75 MHz or 133 MHz operation, depending on the instruction-set implementation option
- Sustained performance of 100 MIPS; the resulting execution speed depends on the architecture variant and the specific algorithm being implemented. Peak performance for typical DSP algorithms exceeds:
  - 100 MOPS for the basic architecture
  - 200 MOPS by using the dual-operation instruction-set option
  - 300 MOPS by using the dual-operation instruction-set option and a register-mapped MAC
- As an example, an 8-channel ITU-T G.726 ADPCM algorithm can be implemented on a 60 MHz basic architecture
- A rate of 10,000 256-point FFTs per second can be obtained at 75 MHz by using the dedicated butterfly unit
- The basic architecture, without the MAC(s) and the butterfly unit, has a 130 MIPS/W ratio, leading to less than 300 mW power dissipation for a 40-bit processor running at 50 MHz
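The FFT and ADPCM figures above can be turned into per-item cycle budgets with simple arithmetic. The C sketch below does this under two assumptions that are not stated in the figures themselves: that the FFT is a radix-2 implementation, and that each G.726 channel runs at the standard 8 kHz sampling rate. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles available per work item, truncated to an integer. */
static uint32_t cycles_per_item(uint32_t clock_hz, uint32_t items_per_second)
{
    assert(items_per_second != 0);
    return clock_hz / items_per_second;
}

/* Number of butterflies in a radix-2 FFT of n points:
   (n / 2) * log2(n). n must be a power of two. */
static uint32_t radix2_butterflies(uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    uint32_t stages = 0;
    for (uint32_t m = n; m > 1; m >>= 1) {
        stages++; /* one stage per factor of two */
    }
    return (n / 2) * stages;
}
```

At 75 MHz and 10,000 transforms per second, `cycles_per_item(75000000, 10000)` gives 7500 cycles per 256-point FFT, and `radix2_butterflies(256)` gives 1024 butterflies, so roughly 7.3 cycles per butterfly including all overhead. For the ADPCM example, 8 channels at 8 kHz are 64,000 samples per second, which leaves about 937 cycles per sample at 60 MHz.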