Digital System Resets

Designing a reset architecture for a digital device such as an ASIC, FPGA, or CPLD can be challenging.  Resets are a common cause of metastability and unpredictable behavior.  Here I will discuss various reset architectures and how to use them properly.

Before you can begin to understand resets you must first understand flip-flops.  Flip-flops are the basic building block of all digital synchronous circuits.  Flip-flops are used to hold state between clock edges.  Flip-flops come in MANY varieties.  Flip-flops usually have between 0 and 2 signals that represent some sort of “reset”.  The 3 most common flip-flops are shown below (clock enables not shown):

Non-Resettable Flip-Flop:

Flip-flops don’t actually need any reset logic built-in.  External logic such as multiplexers can be used to emulate all the functionality of internal reset logic.  However, adding reset logic to the flip-flop directly greatly reduces the overall logic footprint.
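As a rough illustration of the point above, here is a minimal Verilog sketch of a flip-flop with no reset, followed by the same flip-flop with reset behavior emulated by an external multiplexer.  The module names and the ‘rst’ signal are hypothetical and chosen only for this example.

```verilog
// Plain D flip-flop with no reset: Q takes the value of D on each rising clock edge.
module dff_noreset (
    input  wire clk,
    input  wire d,
    output reg  q);

    always @(posedge clk)
        q <= d;
endmodule

// The same flip-flop with reset behavior emulated by an external 2:1 multiplexer.
// 'rst' is a hypothetical signal name used only for this illustration.
module dff_mux_reset (
    input  wire clk,
    input  wire rst,
    input  wire d,
    output wire q);

    // External mux in front of D: selects 0 while rst is high, otherwise passes d.
    wire d_muxed = rst ? 1'b0 : d;

    dff_noreset u_ff (.clk(clk), .d(d_muxed), .q(q));
endmodule
```

The second module behaves like a synchronous reset, but the multiplexer costs extra logic per flip-flop, which is the footprint penalty that built-in reset logic avoids.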

Asynchronous Resettable Flip-Flop:

An asynchronous reset scheme forces a flip-flop to a fixed value whenever a specific signal is active.  The two asynchronous signals are typically called “preset” and “clear”.  Using positive logic, when the “preset” line is high, the output of the flip-flop is immediately forced high regardless of the clock’s state and the input data.  Likewise, when the “clear” line is high, the output is forced low.

This waveform shows a simple asynchronous reset process.  On the first rising clock edge, the output ‘Q’ is set low because the input ‘D’ is low.  On the second rising clock edge, the output now goes high as a result of the input.  Between clock 2 and 3 the asynchronous clear signal goes high.  As soon as the signal reaches a full logic level 1, the output of the flip-flop is immediately forced low.
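In HDL terms, the behavior in this waveform could be described roughly as follows.  This is a sketch assuming positive logic and a rising-edge clock; the module and port names are placeholders.

```verilog
// D flip-flop with an asynchronous, active-high clear.
// A preset variant would have the same structure with q <= 1'b1 in the first branch.
module dff_async_clear (
    input  wire clk,
    input  wire clear,
    input  wire d,
    output reg  q);

    // 'clear' is in the sensitivity list, so it takes effect
    // immediately, without waiting for a clock edge.
    always @(posedge clk or posedge clear) begin
        if (clear)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule
```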

Synchronous Resettable Flip-Flop:

A synchronous reset scheme forces a flip-flop to a fixed value when a specific signal is active during an active clock edge.  The two synchronous signals are typically called “set” and “reset”.  Using positive logic and positive clock edges, when the “set” line is high during a positive clock edge, the flip-flop is forced high regardless of the input data.  Likewise, when the “reset” line is high during a positive clock edge, the flip-flop is forced low.

This waveform shows a simple synchronous reset process.  On the first rising clock edge, the output ‘Q’ is set low because the input ‘D’ is low.  On the second rising clock edge, the output now goes high as a result of the input.  Between clock 2 and 3 the synchronous reset signal goes high.  This change does not affect the flip-flop output value until the third rising clock edge.  At this point the output is driven low even though the input signal is still high.
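A minimal Verilog sketch of a synchronously resettable flip-flop, again assuming positive logic and a rising-edge clock.  Giving “reset” priority over “set” when both are high is an assumption made for this example, not something the text above specifies.

```verilog
// D flip-flop with synchronous, active-high reset and set.
module dff_sync (
    input  wire clk,
    input  wire reset,
    input  wire set,
    input  wire d,
    output reg  q);

    // Only the clock is in the sensitivity list, so reset and set
    // are sampled on the rising edge just like the data input.
    always @(posedge clk) begin
        if (reset)
            q <= 1'b0;   // assumed priority: reset wins over set
        else if (set)
            q <= 1'b1;
        else
            q <= d;
    end
endmodule
```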

What Needs to be Reset?

A good reset design approach is “reset only what needs it”.  Things that need to be reset are flip-flops that must be put in a known state. Common examples are: finite state machine flip-flops; incrementing or decrementing counters; and control pipelines.

In general, data paths do not need to be reset.  Adding a reset to a large data path can cause excessive resource usage and routing delays.  Take care when deciding which flip-flops need to be reset.
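To make “reset only what needs it” concrete, here is a sketch of one pipeline stage in which only the control bit is reset.  The module name, port names, and the default width are hypothetical.

```verilog
// One pipeline stage: the 'valid' control bit is reset, the data register is not.
module pipe_stage #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             rst,        // synchronous, active high
    input  wire             valid_in,
    input  wire [WIDTH-1:0] data_in,
    output reg              valid_out,
    output reg  [WIDTH-1:0] data_out);

    // Control path: must start in a known state, so it is reset.
    always @(posedge clk) begin
        if (rst)
            valid_out <= 1'b0;
        else
            valid_out <= valid_in;
    end

    // Data path: no reset.  Downstream logic is expected to ignore
    // data_out whenever valid_out is low.
    always @(posedge clk)
        data_out <= data_in;
endmodule
```

Here the reset reaches one flip-flop per stage instead of WIDTH+1, which is where the resource and routing savings come from.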

Asynchronous/Synchronous Comparison:

Before deciding what reset architecture to use, let’s first define the advantages and disadvantages of the two styles.

Advantages of Asynchronous Resets:
  • Flip-flops immediately take the value of reset without dependence on a clock edge.
  • No signal synchronization needed for asynchronous input reset signals (like a push button reset).
Disadvantages of Asynchronous Resets:
  • Releasing the reset near an active clock edge can cause metastability.
  • Chip-wide asynchronous resets cause modules to come out of reset at different times due to inconsistent delay paths.

Advantages of Synchronous Resets:
  • All modules come out of reset at the same time and timing assumptions can safely be made about module interfaces.
  • All clock/reset timing is taken care of by standard synthesis.
Disadvantages of Synchronous Resets:
  • Designs with large area will use excessive routing resources while trying to meet timing constraints.
  • Relies on the existence of a clock.  Signals won’t be reset until an active clock edge.

Note: this topic applies to all types of digital devices.  Each device type (FPGA, ASIC, etc.) will have optimal setups, but understanding your options will help you decide how to safely reset your device.

The Asynchronous Reset Problem:

For asynchronous resets, going into a reset state isn’t a problem.  When the software tools synthesize and place components, asynchronous reset assertion is a simple task because it is not related to a specific clock and has no timing constraints.

Asynchronous resets create a problem when the reset signal is being deactivated.  If the reset is released near an active clock edge the results of that clock cycle are unknown.  The following waveform shows this scenario:

At the start of the second clock cycle the clock rises and the clear signal falls.  What should the flip-flop be set to?  Will the input ‘D’ win the fight or will the clear signal?  The answer is that we don’t know.  Not knowing the state of a signal will certainly cause issues.  An even bigger problem is the violation of the flip-flop’s timing requirements around the clock edge (setup and hold for the data input; the equivalent checks for the release of an asynchronous reset are called recovery and removal).  Violating these requirements can result in metastability.

Consider a state machine that has 3 states and is one-hot encoded with 001, 010, and 100.  Now consider the asynchronous deactivation problem.  What if bits 1 and 2 got reset but bit 3 did not?  The state could then be 101 and the circuit’s logic would consider the state machine to be in two states simultaneously.  Obviously this would kill the design.

Some designers attempt to overcome this problem by first synchronizing the reset to the appropriate clock domain then using it as a synchronous reset.  If this new synchronous reset is used globally, you’ve effectively converted your design to a synchronous reset architecture.  If the new reset signal is only used locally, you’ll create problems due to not knowing exactly when adjacent modules are in or out of reset.

The Synchronous Reset Problem:

Unlike asynchronous resets, synchronous resets must travel between flip-flops in one clock period.  During synthesis and place & route, the software tools will ensure that each reset signal will arrive at its destination before the active clock edge that it triggers on.  This may seem like a good thing because the designer now doesn’t need to worry about violating the setup and hold times of the flip-flops being reset.  This is true, but only on a small scale.

Synchronous resets, specifically global synchronous resets, create routing problems that lead to sub-optimal timing results.  Using a global synchronous reset effectively means that every block must see the same reset signal every clock cycle.  Routing one signal to all locations of a chip in one clock cycle requires a massive amount of routing resources or, depending on the clock speed and die size, is impossible.

Consider a large design with 3 major sub-designs.  Each sub-design must communicate with all other sub-designs so it is important to know that each block comes out of reset on the same clock cycle.  This is the main idea behind a global synchronous reset.

The small red block is a module that synchronizes the input reset to the clock in order to provide a synchronous reset to the rest of the chip.  Now consider the results if all 3 blocks directly use the reset as a synchronous reset.  All flip-flops using the reset signal will draw current from the reset source.  In a large design, the fanout of this structure will usually cause the design to fail static timing analysis.


From our discussion thus far, it’s apparent that working with synchronous resets is easier because the software tools will provide proper timing.  The first thing we need to do is synchronize the asynchronous input reset to our clock domain.  The synchronizer below outputs a reset that activates asynchronously and deactivates synchronously.  Using this style of synchronizer gives us the advantages of asynchronous resets and the safety of synchronous resets.
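One common way to build such a synchronizer is sketched below.  The two-stage depth, the active-high polarity, and all names are assumptions made for this example; the circuit in the original figure may differ in detail.

```verilog
// Reset synchronizer: asserts asynchronously, deasserts synchronously.
module reset_sync (
    input  wire clk,
    input  wire arst_in,    // asynchronous reset input, active high
    output wire rst_out);   // reset for the clk domain, active high

    reg [1:0] sync;

    always @(posedge clk or posedge arst_in) begin
        if (arst_in)
            sync <= 2'b11;             // assertion: immediate, no clock needed
        else
            sync <= {sync[0], 1'b0};   // release: a 0 shifts through both stages
    end

    assign rst_out = sync[1];
endmodule
```

When arst_in goes high, rst_out goes high immediately.  After arst_in is released, rst_out stays high until the second rising clock edge, so the release seen by the rest of the design is aligned to the clock.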

Now that we have a good reset signal we need to spread it across the chip efficiently.  We will create ‘M’ parallel reset pipelines of ‘N’ flip-flops.  ‘M’ is the number of major blocks the design contains.  ‘N’ is determined according to clock speed and die size.  It needs to be high enough such that each reset pipeline can meet timing while delivering the reset to the desired location.

This figure shows M=3 and N=6.  The 3 separate pipelines are of equal length so each of the 3 blocks will receive the reset at the same time.  The 6 pipeline stages allow the place & route tools to easily make it across the chip while still meeting timing.  The pipeline stages work just like the synchronizer in that they produce an asynchronous reset assertion and a synchronous reset deassertion.
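The pipelines could be described along the following lines.  This is only one possible reading of the figure: it assumes each pipeline is a shift register whose flip-flops are all preset asynchronously by the synchronized reset, and it requires N to be at least 2.  The module and signal names are hypothetical.

```verilog
// M parallel reset pipelines, each N flip-flops deep.
// Assertion is asynchronous; release shifts out one stage per clock.
module reset_pipeline #(
    parameter M = 3,    // number of major blocks
    parameter N = 6     // stages per pipeline, must be >= 2
) (
    input  wire         clk,
    input  wire         rst_in,     // from the reset synchronizer, active high
    output wire [M-1:0] rst_out);   // one reset per major block

    genvar g;
    generate
        for (g = 0; g < M; g = g + 1) begin : pipe
            reg [N-1:0] stage;

            always @(posedge clk or posedge rst_in) begin
                if (rst_in)
                    stage <= {N{1'b1}};                // assert every stage at once
                else
                    stage <= {stage[N-2:0], 1'b0};     // release ripples stage by stage
            end

            assign rst_out[g] = stage[N-1];
        end
    endgenerate
endmodule
```

Because all M pipelines have the same depth, every rst_out bit deasserts on the same clock edge, N cycles after rst_in is released.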

After the HDL is in place to generate the circuits described above, synthesis and timing constraints must be used in order for this reset architecture to work.

  1. A synthesis directive must be placed on all flip-flops in the pipeline stages informing the synthesizer to keep all flip-flop instances.  By default, the synthesizer will see that the pipeline stages are parallel versions of each other and “optimize” them away.  For Synopsys constraints, the “syn_keep” directive will perform this task.
  2. In order to use the advantage of the asynchronously asserted reset, the reset must be used asynchronously in the HDL.  Because it is asynchronous, the synthesizer will assume no timing dependencies relative to the clock.  However, we must guarantee that the deassertion of the reset is synchronous.  A place & route constraint must be placed between all stages of the pipeline and between the last stage and its destination.  The constraint must ensure that the reset reaches its destination without violating the setup and hold times of the input flip-flops.  If the reset is used synchronously, this step can be skipped.

Other Links:

EETimes: How do I reset my FPGA?

Using HDL the Right Way

For digital design, the fastest way to design a circuit is using a hardware description language (HDL).  All HDLs have one common flaw: they have constructs and syntax that do NOT describe hardware.  This causes fundamental issues for engineers designing ASICs, FPGAs, CPLDs, etc.

To overcome the fundamental problem with HDLs, I propose a few simple steps to allow designers to write code that translates to optimally synthesized logic.  The steps are:

1.  Use an HDL to describe the hardware.

The key word in “hardware description language” is description.  HDLs should be used to describe a digital circuit.  Unfortunately, engineers often use an HDL to create a circuit that they would have no idea how to design were it not for the synthesizer.  This almost always results in a sub-optimal design.  If you don’t know how to make a circuit, why should the synthesizer?

Before writing any HDL code, you should sit down and either make a diagram or have a good mental view of the circuit you are trying to design.  Once you have this, you can use the HDL to syntactically describe the circuit.

2.  Use only HDL syntax that can directly synthesize to logic.

As mentioned earlier, HDLs contain syntax that doesn’t describe digital hardware.  Do not use these constructs.  Only use blatant hardware-based assignments and operators.  This will allow your synthesized design to closely match what is found in the HDL code.

Books like “The Designer’s Guide to VHDL” actually do the designer a disservice.  99% of this book talks about unsynthesizable code while the last 1% is useful synthesizable code.  Digital logic is very easy.  It doesn’t require many types of syntax.  Our digital design world would flow much better if HDLs were only designed to describe hardware.  The unfortunate fact that many HDLs have testbench-like syntax causes less-knowledgeable designers to use these constructs out of ignorance.  This will undoubtedly burn them at some point in their career.

3.  Use a netlist viewer.

After designing a digital circuit using an HDL, synthesize it and use an RTL netlist viewer to verify that the synthesized design contains the proper logic.  This is critical!  Oftentimes the synthesizer will not properly infer logic blocks.  Using a netlist viewer will allow you to double-check the synthesized results.  This will also help you find bugs in your code that may not have been syntax bugs.  Usually when a bug makes it through the synthesis stage, the result will be quite different from what you expected.  Using an RTL netlist viewer creates one more process step, but it will reduce your development time because you’ll find and correct problems in earlier stages.

For you FPGA and CPLD designers, using a technology netlist viewer will give you yet another verification step.  This is very useful when you are using an HDL to describe some device primitive such as block RAMs, clock muxes, tristate drivers, dual-data registers, digital signal processing (DSP) blocks, and many more.

4.  Do not use your hardware HDL for your testbench HDL.

This is a commonly debated topic.  I don’t think you absolutely have to follow my advice to produce good hardware, but I definitely think it makes it easier.  I believe that there is a fundamental problem with HDLs in that they attempt to satisfy the syntactical needs of hardware and testbenches.  This would be better off split into two languages.  Using the same language causes issues because if you made a logic mistake in your hardware, what makes you think you wouldn’t make the same or inverse mistake in your testbench?

For myself I have adopted a pretty simple strategy.  I write all my hardware code in VHDL or Verilog.  I only use basic hardware-like constructs and avoid any use of complex functions that have no simple hardware explanation.  For testing, I use SystemVerilog.  SystemVerilog provides a very cool interface between hardware and computer-language-like programmability.  The typical problem with creating testbenches is that you feel like you are creating another hardware suite.  Using SystemVerilog I create drivers which send and receive object-oriented data structures to and from my top-level hardware design.  These drivers have a hardware side that is attached to my hardware design.  They also have a programmable side which is attached to my testing logic.

Here is an example.  If my hardware design was an IP packet router, I would create a SystemVerilog class that represents an IP packet.  I can use computer-language-like programming to create and monitor the status of these packets.  From this programmable side, I send all the created packets to the driver.  The driver takes the data and communicates with my hardware unit over the physical protocol defined by the hardware.  I would also have a driver for receiving packets.  After all is said and done, I can use typical C++-like programming to verify proper IP routing of my hardware device.  Simple, right?

Examples of what NOT to do:

Example #1:

It is common in communication systems to send a known pattern of bits at the beginning of each frame so that the receiving side can synchronize itself to the bit stream. For communication systems, you often need to be tolerant of a few bit errors.  To search for the sync bits, you just need to XOR the last received bits with the known sync pattern then count the number of ones, which is the number of errors.  For counting the post-XOR ones, I often see a VHDL function declared like this:

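-- Assumes ieee.std_logic_1164, ieee.numeric_std and ieee.math_real are in scope.
function count_ones (a : std_logic_vector) return unsigned is
  -- Result is wide enough to hold the value a'length
  variable b : unsigned(integer(ceil(log2(real(a'length)))) downto 0) := (others => '0');
begin
  for i in a'range loop
    if (a(i) = '1') then
      b := b + 1;
    end if;
  end loop;
  return b;
end function;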

This may look harmless, but try to think of what kind of hardware it will make. All the synthesizers I’ve tried this on make a chain of a’length adders in series, each b’length bits wide.  Obviously this produces absolutely horrible timing results.  There are better ways to count ones.  Don’t get stuck with a sequence of adders.
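One of the better ways is a balanced adder tree, sketched here in Verilog for a fixed 8-bit input.  The module name and the fixed width are assumptions chosen to keep the example short; the same idea extends to wider vectors by adding levels.

```verilog
// Count the ones in an 8-bit vector with a balanced adder tree.
// The longest path is 3 small adders instead of a chain of 8.
module count_ones8 (
    input  wire [7:0] a,
    output wire [3:0] count);

    // Level 1: sum adjacent bit pairs (each result is 0..2)
    wire [1:0] s0 = a[0] + a[1];
    wire [1:0] s1 = a[2] + a[3];
    wire [1:0] s2 = a[4] + a[5];
    wire [1:0] s3 = a[6] + a[7];

    // Level 2: sum pairs of level-1 results (each result is 0..4)
    wire [2:0] t0 = s0 + s1;
    wire [2:0] t1 = s2 + s3;

    // Level 3: final sum (0..8)
    assign count = t0 + t1;
endmodule
```

The adders also grow only as wide as each level needs, rather than every adder in the chain being b’length bits wide.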

Example #2:

Back when I was a digital design rookie, I was trying to figure out how to take a binary number and produce a sequence of BCD values.  For example 10100010(162) would convert to 0001(1), 0110(6), 0010(2).  I found a commonly known algorithm for this.  The 8-bit algorithm is:

1.  If any column (100’s, 10’s, 1’s, etc.) is 5 or greater, add 3 to that column.
2.  Shift all #’s to the left 1 position.
3.  If 8 shifts have been performed, it’s done! Evaluate each column for the BCD values.
4.  Go to step 1.

I then attempted to translate this into hardware.  I wanted a completely combinational implementation for single clock latency.  This is what I naively produced:

module bcd (
    input [7:0] binary,
    output reg [3:0] hundreds,
    output reg [3:0] tens,
    output reg [3:0] ones);

    integer i;
    always @(binary) begin
        // set 100's, 10's, and 1's to zero
        hundreds = 4'd0;
        tens = 4'd0;
        ones = 4'd0;

        // loop 8 times
        for (i=7; i>=0; i=i-1) begin
            // add 3 to columns >= 5
            if (hundreds >= 5)
                hundreds = hundreds + 3;
            if (tens >= 5)
                tens = tens + 3;
            if (ones >= 5)
                ones = ones + 3;

            // shift left one
            hundreds = hundreds << 1;
            hundreds[0] = tens[3];
            tens = tens << 1;
            tens[0] = ones[3];
            ones = ones << 1;
            ones[0] = binary[i];
        end
    end
endmodule

Yes, I know, there are numerous issues in this code. Can anyone look at this code and figure out what it will make?  I can’t!  Even though this code properly produces the BCD sequence, it produces a very large combinational path.  Don’t use it!
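For comparison, here is a sketch of the same algorithm written as hardware you can picture: one add-3-and-shift step per clock.  It gives up the single-clock latency I originally wanted (the result takes 8 clocks after ‘start’), so treat it as one possible trade-off rather than the fix.  The ‘start’, ‘done’, and ‘rst’ signals and the module name are hypothetical.

```verilog
// Sequential binary-to-BCD conversion: one add-3-and-shift step per clock.
module bcd_seq (
    input  wire       clk,
    input  wire       rst,        // synchronous, active high
    input  wire       start,      // pulse high for one clock to begin a conversion
    input  wire [7:0] binary,
    output reg  [3:0] hundreds,
    output reg  [3:0] tens,
    output reg  [3:0] ones,
    output reg        done);      // high once the BCD outputs are valid

    reg [7:0] shift;   // remaining input bits, MSB first
    reg [3:0] count;   // shifts remaining

    // Add 3 to any column that is 5 or greater
    wire [3:0] h_adj = (hundreds >= 5) ? hundreds + 4'd3 : hundreds;
    wire [3:0] t_adj = (tens     >= 5) ? tens     + 4'd3 : tens;
    wire [3:0] o_adj = (ones     >= 5) ? ones     + 4'd3 : ones;

    always @(posedge clk) begin
        if (rst) begin
            // Only the control state is reset; the data registers are loaded by start
            count <= 4'd0;
            done  <= 1'b0;
        end else if (start) begin
            hundreds <= 4'd0;
            tens     <= 4'd0;
            ones     <= 4'd0;
            shift    <= binary;
            count    <= 4'd8;
            done     <= 1'b0;
        end else if (count != 4'd0) begin
            // Shift the adjusted columns and the remaining input left by one bit
            {hundreds, tens, ones, shift} <= {h_adj[2:0], t_adj, o_adj, shift, 1'b0};
            count <= count - 4'd1;
            done  <= (count == 4'd1);   // set on the edge of the final shift
        end
    end
endmodule
```

Each clock now sees only one comparator, one small adder, and a shift per column, which is easy to draw and easy for the tools to time.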