This project centered on the design and verification of a 64-tap finite impulse response (FIR) filter core for real-time digital signal processing.
The filter was required to correctly compute the FIR convolution using 16-bit fixed-point arithmetic, support preloaded coefficients, and operate across asynchronous clock domains while maintaining one valid output per input sample at a 10 kHz sampling rate.







The entire FIR filter design was written in Verilog and implemented using a resource-shared architecture built around a folded multiply-accumulate (MAC) unit, allowing a single multiplier and adder to process all 64 taps over multiple cycles. Input samples were transferred across asynchronous clock domains using an asynchronous FIFO, while a shift-register chain stored delayed samples for the convolution computation. Coefficients were stored in dedicated memory and accessed sequentially under the control of a finite state machine (FSM) that coordinated coefficient reads, MAC operations, and output timing.
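The folded MAC idea described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the project's Verilog: the class name FoldedFir, the helper to_int16, and the choice of plain signed 16-bit integers with an unbounded accumulator are assumptions made for illustration, since the text does not state the fixed-point format, the accumulator width, or any rounding. The asynchronous FIFO and the FSM are not modeled.

```python
NUM_TAPS = 64  # tap count stated in the text


def to_int16(value):
    """Wrap an integer into the signed 16-bit range (two's complement)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class FoldedFir:
    """Behavioral model: one multiplier and one adder reused for every tap."""

    def __init__(self, coeffs):
        if len(coeffs) != NUM_TAPS:
            raise ValueError("expected 64 coefficients")
        self.coeffs = [to_int16(c) for c in coeffs]  # preloaded coefficient memory
        self.delay_line = [0] * NUM_TAPS             # shift-register chain

    def push_sample(self, sample):
        """Shift in one sample, run 64 MAC steps, and return the full-precision sum."""
        self.delay_line = [to_int16(sample)] + self.delay_line[:-1]
        # Assumption: the accumulator never overflows. In hardware, 32-bit
        # products plus log2(64) = 6 guard bits (38 bits) would be enough.
        acc = 0
        for tap in range(NUM_TAPS):  # each iteration models one MAC cycle
            acc += self.coeffs[tap] * self.delay_line[tap]
        return acc


def direct_fir(coeffs, samples):
    """Reference convolution y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0."""
    return [
        sum(coeffs[k] * samples[n - k] for k in range(len(coeffs)) if n - k >= 0)
        for n in range(len(samples))
    ]
```

Feeding a unit impulse followed by zeros into push_sample should return the coefficients in order, which is a quick sanity check that the delay line and the tap indexing agree with the convolution sum in direct_fir.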
After functional verification, the design was synthesized to a gate-level netlist using Synopsys Design Compiler. Static timing analysis (STA) was then performed using Synopsys PrimeTime to identify the critical path, verify timing closure, and determine the maximum operating frequency. Post-synthesis simulations were run on the gate-level netlist to confirm that the synthesized hardware matched the verified RTL behavior. Power was estimated with PrimeTime, and area was taken from the Design Compiler synthesis reports.
To validate numerical correctness, a MATLAB reference model of the FIR filter was used alongside the RTL design. Identical randomized input samples and coefficient sets were applied to both the MATLAB model and the Verilog testbench, and the resulting output sequences were captured for direct comparison. The normalized root mean square error (NRMSE) was computed between the hardware output and the MATLAB reference to quantify numerical accuracy across the full dataset.
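The error metric can be sketched as follows. The text does not say how the RMSE was normalized, so this example assumes normalization by the range (maximum minus minimum) of the reference output, expressed as a percentage; normalizing by the mean or the RMS of the reference are other common conventions. The function name nrmse_percent is hypothetical.

```python
import math


def nrmse_percent(reference, measured):
    """RMSE between two sequences, divided by the reference range, in percent."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((r - m) ** 2 for r, m in zip(reference, measured)) / len(reference)
    span = max(reference) - min(reference)  # assumed normalization: reference range
    if span == 0:
        raise ValueError("reference has zero range, so NRMSE is undefined")
    return 100.0 * math.sqrt(mse) / span
```

With this definition, a result of zero means every hardware output sample equals the corresponding reference sample.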


The FIR core achieved a throughput of 10 kS/s, matching the input sampling rate, with a maximum clock frequency of 25.27 MHz, providing substantial timing margin at that sample rate. Power analysis showed an energy consumption of 38,300 pJ/sample (38.3 nJ/sample), with total power dominated by internal switching activity rather than leakage. The synthesized design occupied 0.132 mm², with area primarily attributed to coefficient storage and sequential logic.
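The timing and energy figures above can be related with a few lines of arithmetic. This is a sketch under two assumptions that the text does not confirm: that the folded MAC needs one clock cycle per tap (ignoring FSM and FIFO overhead), and that energy per sample is defined as average power divided by sample rate. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 10_000            # 10 kS/s throughput, from the text
MAX_CLOCK_HZ = 25.27e6             # maximum clock frequency, from the text
ENERGY_PER_SAMPLE_J = 38_300e-12   # 38,300 pJ/sample, from the text
NUM_TAPS = 64                      # tap count, from the text


def cycles_available_per_sample():
    """Clock cycles per sample period at the maximum clock (about 2527)."""
    return MAX_CLOCK_HZ / SAMPLE_RATE_HZ


def min_mac_clock_hz():
    """Lowest MAC clock that fits 64 taps per sample, assuming one MAC per cycle (640 kHz)."""
    return NUM_TAPS * SAMPLE_RATE_HZ


def average_power_w():
    """Average power implied by energy per sample times sample rate (about 383 microwatts)."""
    return ENERGY_PER_SAMPLE_J * SAMPLE_RATE_HZ
```

Under these assumptions, the maximum clock is roughly 39 times the minimum rate a single MAC would need, which is one way to quantify the timing margin.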
Accuracy verification against the MATLAB reference model showed 0.00% NRMSE in both the worst case and on average, confirming that the hardware output matched the reference FIR filter behavior across the full test dataset.

