Considering SystemVerilog Tagged Unions


I have recently been thinking a lot about tagged unions in SystemVerilog, since I discovered them a few months ago. In this post I present some use cases where I think tagged unions would be ideal, and explain why I think no one actually uses them.

Despite being initially proposed in 2003, and officially part of the language since the SystemVerilog 1800-2005 standard was released, tagged unions now appear to be a forgotten language feature. I think this is a shame because, ever since I accidentally ran across the tagged union section of the LRM, I keep thinking of new applications where I would like to use them. Don’t get me wrong: there are plenty of reasons you don’t see tagged unions used in the wild today (language ergonomics, tool support, etc.), but that doesn’t mean they couldn’t have been useful.

Quick Introduction to Sum Types

Tagged unions are more commonly referred to as “sum types” in other languages. At least in SystemVerilog, I like to think of them as a combination of two more commonly used data types: unions and enums. Like an enum, the primary value (tag) of a tagged union variable can be one of several values, as defined by the type declaration. However, unlike a normal enum, a tagged union variable can also store a second value alongside the tag, and the type of that value is determined by the tag. Each tag can define a different type for the value held by the union when the variable has that tag. The syntax of tagged unions allows the compiler to enforce that the value assigned to the tagged union matches the type defined by its tag, and, unlike with a regular union, it is always possible to know the type of the stored value by looking at the tag.

By way of example, the following code declares a tagged union type int_or_float_t. When assigned the tag INT_VAL, it is guaranteed by the language to hold an integer value, and when assigned the tag FLOAT_VAL, it always holds a floating-point value (SystemVerilog’s single-precision floating-point type is shortreal).

typedef union tagged {
  int       INT_VAL;
  shortreal FLOAT_VAL;  // SystemVerilog has no `float` keyword; shortreal is the 32-bit float type
} int_or_float_t;
Simple tagged union example.

Unlike with a normal union, if you assign an int_or_float_t variable the INT_VAL tag along with an integer value, the language will not let you read that value back as a float. Attempting to do so is either a compile-time or a run-time error, depending on the syntax used.
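As a sketch of both access styles, the block below puts int_or_float_t into a package and reads it back with a case-matches statement, which checks the tag and binds the member value in one step. The package name int_or_float_pkg and the to_real() helper are hypothetical, and given the uneven simulator support for tagged unions this is written against the LRM rather than checked on a particular tool.

```systemverilog
// Hypothetical package; written against IEEE 1800, not verified on a specific simulator.
package int_or_float_pkg;
  typedef union tagged {
    int       INT_VAL;
    shortreal FLOAT_VAL;  // single-precision floating point
  } int_or_float_t;

  // Hypothetical helper: convert either variant to a real.
  // Each case item only matches its own tag, so the code can never
  // read the member that the current tag does not select.
  function automatic real to_real(int_or_float_t v);
    case (v) matches
      tagged INT_VAL .i:   return real'(i);
      tagged FLOAT_VAL .f: return real'(f);
    endcase
    return 0.0;  // not reached: both tags are handled above
  endfunction
endpackage

module int_or_float_demo;
  import int_or_float_pkg::*;
  int_or_float_t v;

  initial begin
    v = tagged INT_VAL 42;  // the tag and its value are always assigned together
    $display("%f", to_real(v));
    // Plain member access also compiles, but reading v.FLOAT_VAL while the
    // tag is INT_VAL is reported as an error at run time:
    // $display("%f", v.FLOAT_VAL);
  end
endmodule
```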

Why Not to Use Tagged Unions

Before getting into why I think tagged unions should be used more: why have they been essentially forgotten in SystemVerilog? I believe this mainly comes down to two issues: lack of support from major simulator vendors, and poor language ergonomics.

Lacking Support

Of the three major commercial simulators I recently tested, one (Cadence Xcelium) didn’t support tagged unions at all, and the other two (Mentor Questa and Synopsys VCS) had varying degrees of partial support. In my opinion this is the biggest reason we don’t see tagged unions used: if a feature isn’t supported by all of the big three simulators, then it can’t be used in any portable VIP or standard library. Sadly, even the two simulators that did have support supported neither constraining nor randomizing tagged union variables. Being able to randomize both the tag and the value of a tagged union would provide some interesting capabilities, as I will discuss later.

Ergonomics

The other problem with tagged unions is that the syntax for using them is, in some ways… clunky at best. The pattern matching case and if syntax, which is (pretty much) required to check the tag and access the values of a tagged union, is fairly verbose, with lots of additional keywords like tagged and matches. Even assignments to a tagged union variable require an extra tagged keyword. Additionally, unlike other languages that have an equivalent tagged union or sum type, SystemVerilog doesn’t provide a way to “parameterize” or “template” the types held by a tagged union.

Between the lack of support and the general feeling that this feature was added to the language without a huge amount of thought, I can’t really blame anyone for not using tagged unions. However, I am nonetheless disappointed that I can’t use them because they are not supported by the tools I use every day. Having used sum types in other languages, I think they could bring a lot of value to verification code written in SystemVerilog.

Uses of Tagged Unions

Now if we ignore the legitimate issues with tagged unions discussed above, what could they be used for? Note that the proposals below are pretty much all currently theoretical. I haven’t tried to implement these, and some are even impossible today due to missing support in current tools (especially around randomization and constraints). Nevertheless, I think these ideas are worth more investigation, and would love to hear from anyone who has done anything similar in SV.

Error Handling using “Result” Types

Let’s face it, error handling in most SystemVerilog code is bad. The language itself does not provide exceptions or any other built-in error handling mechanism, so most code just doesn’t handle errors. When an error does happen, most code either logs an error message and gives up, or logs an error and carries on as if nothing happened. From a software engineering perspective, this is bad practice to begin with, and it can also lead to unpredictable behavior after the first error in a simulation.

We can do better, though! Rust is another language that doesn’t have exceptions, but it does have sum types. In Rust, error handling is built around the Result type, which is a tagged union (Rust calls these enums). Any function that can fail returns a Result<T, E> value: if the function succeeds, it returns an Ok value carrying a return value of type T; if it fails, it returns an Err value carrying a value of type E that describes the kind of error that occurred.

The nice thing about this error handling technique is that it requires no additional language features (such as exceptions), yet it gives functions a consistent way to return errors to their callers that cannot be silently ignored: the compiler won’t let you get at the success value without either handling the error or explicitly ignoring it. In my experience this pattern works well in Rust, and I think it could be a good pattern for error handling in SystemVerilog.

Below I show a “Result” type implemented in SystemVerilog with tagged unions. Because SystemVerilog doesn’t support parameterizing tagged union types, we can’t create a generic Result<T,E> type, but a simple macro could automate the creation of specific result types. The example deserializes a byte stream into a packet, which requires checking both the packet length and a packet CRC; either check can produce an error.

typedef struct { int val1; int val2; } packet_t;  // Define a packet type
typedef enum { SIZE_ERR, CRC_ERR } error_e;       // Define an error type

// The Result tagged union type
typedef union tagged {
  packet_t Ok;
  error_e  Err;
} packet_result_t;

// Deserialize function, returning packet_result_t type.
// compute_crc() is a helper assumed to be defined elsewhere; it returns
// the 8-bit CRC of the bytes passed to it.
function automatic packet_result_t deserialize(byte data[$]);
    packet_t pkt;

    // Check for errors, returning an error enum value if there is one.
    if (data.size() != 16)
        return tagged Err SIZE_ERR;
    if (compute_crc(data[0:14]) != data[15])
        return tagged Err CRC_ERR;

    // Extract the values, assuming little-endian byte order.
    // <<8 reverses whole bytes; a bare << would reverse individual bits.
    pkt.val1 = {<<8{data[0:3]}};
    pkt.val2 = {<<8{data[8:11]}};

    // Return the packet value
    return tagged Ok pkt;
endfunction
"Result" tagged union example.

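The macro mentioned above, which would stand in for a generic Result<T,E>, could be as simple as the following sketch. The macro name DEFINE_RESULT, the package result_demo_pkg, and the extra int_result_t type are hypothetical; the macro just stamps out the same Ok/Err tagged union shape for whatever pair of types it is given. As with the other examples, this follows the LRM and has not been checked on a particular simulator.

```systemverilog
// Hypothetical macro: generates a Result-style tagged union named NAME
// holding either a success value (OK_T) or an error value (ERR_T).
`define DEFINE_RESULT(NAME, OK_T, ERR_T) \
  typedef union tagged {                 \
    OK_T  Ok;                            \
    ERR_T Err;                           \
  } NAME;

package result_demo_pkg;
  typedef struct { int val1; int val2; } packet_t;
  typedef enum { SIZE_ERR, CRC_ERR } error_e;

  // Expands to the same packet_result_t declaration written out by hand earlier.
  `DEFINE_RESULT(packet_result_t, packet_t, error_e)

  // Any other pair of types works the same way, e.g. an int that can fail.
  `DEFINE_RESULT(int_result_t, int, error_e)
endpackage
```

The trade-off versus a true generic type is that every Ok/Err combination needs its own named typedef, but call sites and pattern matching look identical for all of them.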
The other half of this is how you actually use the returned result value. This is done using the “pattern matching conditional” syntax, as described in section 12.6 of the LRM. There are two forms of pattern matching conditionals, one for case statements and one for if statements. This example shows both being used to handle the “result” from the previously defined deserialize() function.

packet_result_t result;
result = deserialize(bytes);

// If pattern matching syntax. In this case errors are being ignored.
if (result matches (tagged Ok .pkt)) begin
    // Use the packet value
    // Inside this context, the `pkt` name can be used to refer to
    // the packet_t typed "Ok" value.
end

// Case matching syntax. This time we will actually do something with
// the errors.
case (result) matches
    tagged Ok .pkt:
        begin
            // Use the packet. Once again `pkt` can be used to refer
            // to the packet_t value.
        end
    tagged Err .e &&& e == SIZE_ERR:
        begin
            // Handle specifically the packet size error condition.
        end
    tagged Err .e:
        begin
            // Handle any other error value. Once again the `e` value
            // can be used to determine the error type.
        end
endcase
Example showing how the result of the deserialize() function can be used.
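Rust also offers Result::unwrap() for cases where an error should simply end the program. A rough SystemVerilog analogue for packet_result_t might look like this sketch; unwrap_packet() and unwrap_demo_pkg are hypothetical names, and the packet types are repeated so the block stands alone. It is written against the LRM, not tested on a specific simulator.

```systemverilog
package unwrap_demo_pkg;
  typedef struct { int val1; int val2; } packet_t;
  typedef enum { SIZE_ERR, CRC_ERR } error_e;
  typedef union tagged {
    packet_t Ok;
    error_e  Err;
  } packet_result_t;

  // Hypothetical helper modeled on Rust's Result::unwrap(): return the Ok
  // value, or stop the simulation, naming the error, if the result is Err.
  function automatic packet_t unwrap_packet(packet_result_t result);
    packet_t pkt;
    case (result) matches
      tagged Ok .p:  pkt = p;
      tagged Err .e: $fatal(1, "unwrap_packet() called on Err %s", e.name());
    endcase
    return pkt;
  endfunction
endpackage
```

A call such as pkt = unwrap_packet(deserialize(bytes)); keeps the happy path short, while the decision to abort on any error is still explicit at the call site rather than hidden inside deserialize().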

Transaction, Packet, or Opcode subtypes

In design verification, a common thing to do is to declare a class that represents a packet, opcode, or some other kind of “transaction”. These transaction classes contain variables that hold data from the fields of the packet, or the operands of the opcode. But what do you do when the set of data values depends on the type of transaction? In my experience I have seen two approaches: either the transaction class contains a superset of all possible fields, or a sub-class of the primary transaction type is declared for each transaction format, containing only the valid fields. The problem with the superset of fields is that the user has to know which fields are valid in a transaction at any given time, and nothing stops you from using an invalid field by mistake. On the other hand, declaring sub-classes can be very verbose, and requires a lot of explicit casts to work with the values of the transaction.

However, tagged unions give us another option. As an example (inspired by one of the tagged union examples in the SystemVerilog LRM), say we want a transaction that represents a CPU instruction, where each instruction format has a different set of operands. We could define the instruction using a class that contains an anonymous tagged union, as shown below:

import uvm_pkg::*;

class Instruction extends uvm_sequence_item;
    // Main opcode definition
    union tagged {
        // Binary arithmetic instruction format supports add, subtract, multiply, and divide
        // opcodes, with three operands: two source registers and one destination register.
        struct {
            enum { ADD, SUB, MUL, DIV } opcode;
            bit [4:0] r1, r2, rdest;
        } BIN_FMT;

        // Jump format instruction has a single destination literal operand with three
        // different opcodes.
        struct {
            enum { JMP, JEQ, JNE } opcode;
            bit [9:0] target;
        } JMP_FMT;
    } format;

    // Additional metadata fields can still be added to the transaction class.
    time start_time;
endclass : Instruction
Example of using tagged unions to express transactions with different field formats.

With this construction, the Instruction transaction class can hold the operands for every instruction format, but the language will only allow us to reference the fields of the format actually held by the current value of the Instruction transaction. This sounds similar to the sub-class approach; however, where a sub-class implementation would need to cast the Instruction to a BinFmtInstruction to access the r1 field (for example), this implementation lets you refer to any field without casts, as long as the format is correct. For example, if we have already checked that format holds the BIN_FMT tag, the expression inst.format.BIN_FMT.rdest will reference the rdest value of the instruction with no casts.
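The cast-free access described here can be sketched as follows. To keep the block self-contained without UVM, it assumes the anonymous union from the Instruction class is pulled out into a named typedef; instr_pkg, instr_format_t, and describe() are hypothetical names. Like the other sketches, it follows the LRM rather than any one simulator.

```systemverilog
// Hypothetical package holding the instruction formats as a named tagged union.
package instr_pkg;
  typedef union tagged {
    struct {
      enum { ADD, SUB, MUL, DIV } opcode;
      bit [4:0] r1, r2, rdest;
    } BIN_FMT;
    struct {
      enum { JMP, JEQ, JNE } opcode;
      bit [9:0] target;
    } JMP_FMT;
  } instr_format_t;

  // Hypothetical decoder. Each branch binds the struct for its own format,
  // so r1/r2/rdest are only reachable in the BIN_FMT branch and target is
  // only reachable in the JMP_FMT branch; no casts are involved.
  function automatic string describe(instr_format_t fmt);
    case (fmt) matches
      tagged BIN_FMT .b:
        return $sformatf("%s r%0d, r%0d -> r%0d", b.opcode.name(), b.r1, b.r2, b.rdest);
      tagged JMP_FMT .j:
        return $sformatf("%s %0d", j.opcode.name(), j.target);
    endcase
    return "";
  endfunction
endpackage
```

Constructing an instruction uses the same tagged assignment-pattern syntax as any other tagged union, e.g. fmt = tagged BIN_FMT '{opcode: ADD, r1: 1, r2: 2, rdest: 3};.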

Is this actually better than the alternatives though? I believe it is a significant improvement over defining a transaction with a super-set of fields (the most common approach), since it ensures correctness when referencing the transaction fields. I am not sure it’s that much better than defining class sub-types for each packet format, since both require that you explicitly check the transaction format before you can reference values. However, I think the tagged union implementation would make the code easier to read with fewer $cast calls and variables of different subtypes hanging around.

Configuration

The final application where there could be real benefits from using tagged unions is configuring sequences and components, particularly where there is a mode setting and each mode has a different set of options that can be configured. Take the example of a sequence that generates transactions with a configurable traffic pattern. This might normally be implemented like the following:

import uvm_pkg::*;

typedef enum {PERIODIC, RAND, BURST} traffic_pattern_t;
class traffic_seq extends uvm_sequence;
    traffic_pattern_t pattern;
    rand time periodic_interval;
    rand time rand_min;
    rand time rand_max;
    rand time burst_interval;
    rand time burst_duration;
endclass : traffic_seq
Traditional implementation of a configurable traffic sequence.

In this implementation, the periodic_interval variable is only meaningful when pattern == PERIODIC. Likewise, rand_min and rand_max are only used when pattern == RAND. At best, it is confusing that you can set periodic_interval when pattern is set to BURST. A better way to implement the same concept with tagged unions would look like the following:

class traffic_seq;
    union tagged {
        struct { time interval;                } PERIODIC;
        struct { time min;      time max;      } RAND;
        struct { time interval; time duration; } BURST;
    } pattern;
endclass : traffic_seq
Configurable traffic pattern sequence using tagged unions.

With the tagged union implementation, the min and max values only exist (from the compiler’s perspective) when pattern holds the RAND tag. Additionally, you cannot set pattern without also setting its parameters. As an example, to set the pattern to BURST you would use an expression like:

seq.pattern = tagged BURST '{interval: 100ns, duration: 1ns};

I like this tagged union approach because not only is the assignment expression fairly concise and explicit, but it also eliminates a whole class of errors and confusion. The configuration parameters for a given traffic pattern can only be accessed within the context of that pattern; there is no way to accidentally set periodic_interval instead of burst_interval. This tagged union design pattern could be applied to any setting where each option has its own parameters. For example, you might use it for a configuration that controls a random distribution mode (think normal, uniform, geometric, etc.), where each distribution has its own unique arguments.
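To illustrate parameters that are only reachable within their own pattern, here is a sketch of a sanity check a sequence might run on its configuration. The package traffic_pkg, the typedef traffic_cfg_t (the union above, given a name), and pattern_is_valid() are hypothetical, and the rule that a burst’s duration must not exceed its interval is an assumption made for illustration. It follows the LRM and has not been run on a particular simulator.

```systemverilog
package traffic_pkg;
  typedef union tagged {
    struct { time interval;                } PERIODIC;
    struct { time min;      time max;      } RAND;
    struct { time interval; time duration; } BURST;
  } traffic_cfg_t;

  // Hypothetical configuration check. Each case item binds only the
  // parameters of the active pattern: RAND's min/max simply cannot be
  // named from the PERIODIC or BURST branches.
  function automatic bit pattern_is_valid(traffic_cfg_t p);
    case (p) matches
      tagged PERIODIC .c: return c.interval > 0;
      tagged RAND .c:     return c.min <= c.max;
      tagged BURST .c:    return c.duration <= c.interval;  // assumed rule
    endcase
    return 0;
  endfunction
endpackage
```

In a sequence, a check like this could run at the start of body() on the pattern member, so a bad configuration is reported once, up front, with no way to consult parameters belonging to a different pattern.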

Wrap Up

As I mentioned at the beginning, the potential uses for tagged unions in SystemVerilog have been on my mind for a few months. Although I have not been able to do much actual work with them (because Cadence lacks any kind of support), I think there are several good applications for tagged unions that would improve the readability and correctness of verification code through their syntax and compile-time checks. Unfortunately, while tagged unions appear to have a lot of advantages, the lack of tool support and poor syntax ergonomics make them pretty much a non-starter from a practical perspective.

If anyone has actually used SystemVerilog tagged unions, I would love to know how you used them and how they worked out.