Overview
Cascade is an automated coprocessor synthesis solution. It boosts system performance by creating a loosely coupled, programmable coprocessor that accelerates the execution of compiled binary code offloaded from the Central Processing Unit (CPU). Because it runs the compiled binaries directly, the coprocessor requires no compiler and supports the continued use of the established CPU and its associated investment in design tools and infrastructure.
The Cascade solution delivers the parallel processing resources of a customized processor with dramatically less design effort, and with none of the system re-design costs normally incurred when an additional processor, whether standard or custom, is deployed.
Given user-defined performance requirements and resource constraints, Cascade rapidly does one of the following:
- Co-optimizes coprocessor architecture and software to maximize overall performance, or
- Synthesizes a coprocessor that maximizes the execution speed of legacy software 'as is', providing a true software re-use methodology.
Cascade maximizes system performance by:
- Enabling optimal software partitioning.
- Automatically optimizing cache design with data pre-fetch capability to minimize memory/system latency.
- Automatically minimizing bus communication overhead.
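The combined effect of software partitioning and bus overhead can be illustrated with a simple Amdahl-style estimate. The sketch below is not Cascade's internal cost model: the function name `offload_speedup` and its parameters are hypothetical, and all times are normalized to the original CPU-only run time.

```python
def offload_speedup(offload_fraction, kernel_speedup, bus_overhead_fraction=0.0):
    """Estimate overall speedup when part of a program moves to a coprocessor.

    offload_fraction: share of the original CPU run time that is offloaded (0..1).
    kernel_speedup: how many times faster the coprocessor runs that share.
    bus_overhead_fraction: added CPU/coprocessor transfer time, expressed as a
        fraction of the original run time (an assumed, illustrative quantity).
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if bus_overhead_fraction < 0.0:
        raise ValueError("bus_overhead_fraction must not be negative")

    remaining_cpu_time = 1.0 - offload_fraction          # code left on the CPU
    coprocessor_time = offload_fraction / kernel_speedup  # accelerated share
    new_time = remaining_cpu_time + coprocessor_time + bus_overhead_fraction
    return 1.0 / new_time


if __name__ == "__main__":
    print(round(offload_speedup(0.8, 10.0), 2))
    print(round(offload_speedup(0.8, 10.0, bus_overhead_fraction=0.05), 2))
```

Under these assumptions, offloading 80% of the run time to a coprocessor that is 10 times faster yields roughly a 3.6x overall speedup, and a bus overhead equal to 5% of the original run time reduces that to roughly 3.0x. This is why partitioning and communication overhead matter as much as raw coprocessor speed.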
The resulting coprocessor, optimized for both system performance and hardware resource usage, is a programmable engine that acts as a seamless extension to the CPU. It functions as a standard peripheral that communicates with the CPU via the system bus, and its clock is activated only as required.
Figure 1: Cascade Coprocessor Synthesis Flow
Cascade can synthesize two coprocessor configurations, both of which are ideal for accelerating packet- and frame-based applications:
- A slave coprocessor that is closely controlled by the CPU and communicates regularly with it.
- An autonomous streaming coprocessor with Direct Memory Access (DMA) that allows the CPU to execute other tasks in parallel, or to enter low-power mode. This configuration incurs less communication and hardware overhead than the slave coprocessor.
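The difference in CPU involvement between the two configurations can be sketched with a toy transaction count. Everything below is an assumption made for illustration: the per-frame costs, the function names, and the idea that a slave coprocessor is fed word by word by the CPU are not taken from Cascade documentation.

```python
def slave_cpu_transactions(frames, words_per_frame):
    """CPU bus accesses if the CPU moves every word itself (assumed model).

    Per frame: one write per input word, one read per output word,
    plus one start command and one status read.
    """
    return frames * (2 * words_per_frame + 2)


def streaming_cpu_transactions(frames, setup_writes=4):
    """CPU bus accesses if the coprocessor fetches its own data via DMA.

    Per frame: a few setup writes (assumed here: source address, destination
    address, length, start) plus one completion read. The data itself is
    moved without CPU involvement.
    """
    return frames * (setup_writes + 1)
```

With these assumed costs, processing 10 frames of 64 words takes 1300 CPU bus accesses in the slave model and 50 in the streaming model, which leaves the CPU free for other work in between.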
Cascade generates an optimized coprocessor in synthesizable RTL with synthesis scripts, an instruction- and bit-accurate C functional model, and a testbench that verifies the implementation with the same stimuli and expected responses as those of the CPU, ensuring functional equivalence. The implementation then proceeds through the designer's own system-on-chip (SoC), FPGA or structured ASIC design and verification flows.
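The equivalence check described above can be sketched as a harness that drives two implementations with identical stimuli and compares their responses. This is a minimal illustration, not the testbench Cascade generates: `find_mismatches`, the two checksum functions, and the stimuli are hypothetical stand-ins for the CPU software and the coprocessor functional model.

```python
def find_mismatches(reference, model, stimuli):
    """Run both implementations on identical stimuli and collect differences.

    Returns a list of (index, stimulus, expected, actual) tuples; an empty
    list means the two implementations agreed on every stimulus.
    """
    mismatches = []
    for index, stimulus in enumerate(stimuli):
        expected = reference(stimulus)
        actual = model(stimulus)
        if expected != actual:
            mismatches.append((index, stimulus, expected, actual))
    return mismatches


# Hypothetical stand-ins: a 16-bit additive checksum written two ways.
def cpu_checksum(frame):
    return sum(frame) & 0xFFFF


def coprocessor_model_checksum(frame):
    total = 0
    for byte in frame:
        total = (total + byte) & 0xFFFF
    return total


# Identical stimuli for both sides, including an empty and a large frame.
STIMULI = [bytes(), bytes([1, 2, 3]), bytes([255] * 1000)]
```

Calling `find_mismatches(cpu_checksum, coprocessor_model_checksum, STIMULI)` returns an empty list when the two versions agree. Any returned tuple identifies the stimulus on which they diverge.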
Deployment of a Cascade coprocessor requires:
- No processor design expertise.
- No software re-targeting.
- No memory architecture redevelopment.
- No communications protocol redevelopment.
- No new software development tools.

