I have recently been thinking a lot about tagged unions in SystemVerilog, ever since I discovered them a few months ago. In this post I present some of the ideal use cases for tagged unions, and explain why I think no one actually uses them.
Despite being first proposed in 2003, and officially part of the language since the IEEE 1800-2005 SystemVerilog standard was released, tagged unions now appear to be a forgotten language feature. I think this is a shame, because ever since I accidentally ran across the tagged union section of the LRM, I keep thinking of new applications where I would like to use them. Don’t get me wrong, there are plenty of reasons you don’t see tagged unions used in the wild today (language ergonomics, tool support, etc.), but that doesn’t mean they couldn’t have been useful.
Quick Introduction to Sum Types
Tagged unions are also commonly referred to as “sum types”. At least in SystemVerilog, I like to think of them as a combination of two more commonly used data types: unions and enums. Like an enum, the primary value (the tag) of a tagged union variable can be one of several values, as defined by the type declaration. However, unlike a normal enum, a tagged union variable can store, along with the tag, another value whose type is defined by the tag. Each tag value can define a different type for the value held by the union when the variable has that tag. The syntax of tagged unions allows the compiler to enforce that the type of value assigned to the tagged union matches the type defined by the tag, and, unlike with a regular union, it is always possible to know the type of the value stored in the variable by looking at the tag.
By way of example, consider a tagged union type int_or_float_t which, when assigned the tag INT_VAL, is guaranteed by the compiler to hold an integer value, and when assigned the tag FLOAT_VAL, always holds a float value.
Unlike with a normal union, if you assign an int_or_float_t variable the tag INT_VAL with an associated integer value, the language will not allow you to refer to that integer value as a float. Doing so is either a compile-time or a run-time error, depending on the syntax used.
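A minimal sketch of int_or_float_t, following the tagged union syntax in the IEEE 1800 LRM. The package and module names are illustrative, and shortreal is used as SystemVerilog's 32-bit float. Given the patchy tool support discussed below, it may not compile on every simulator.

```systemverilog
package int_or_float_pkg;
  // Each tag names the member (and therefore the type) that is currently valid.
  typedef union tagged {
    int       INT_VAL;    // valid while the tag is INT_VAL
    shortreal FLOAT_VAL;  // valid while the tag is FLOAT_VAL (32-bit float)
  } int_or_float_t;
endpackage

module int_or_float_demo;
  import int_or_float_pkg::*;

  int_or_float_t v;

  initial begin
    // An assignment sets the tag and its value together.
    v = tagged INT_VAL 42;

    // Dot access is checked against the current tag at run time.
    $display("int value: %0d", v.INT_VAL);
    // $display("%f", v.FLOAT_VAL);  // run-time error: the tag is INT_VAL

    v = tagged FLOAT_VAL 1.5;

    // Pattern matching checks the tag and binds the value in one step;
    // the pattern variable f only exists inside the matching branch.
    if (v matches tagged FLOAT_VAL .f)
      $display("holds a float: %f", f);
    else
      $display("does not hold a float");
  end
endmodule
```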
Why not to use tagged unions
Before getting into why I think tagged unions should be used more, why have they been essentially forgotten in SystemVerilog? I believe this mainly comes down to two issues: lack of support from major simulator vendors and language ergonomics.
Of the three major commercial simulators I recently tested, one (Cadence Xcelium) didn’t support tagged unions at all, and the other two (Mentor Questa and Synopsys VCS) have varying degrees of partial support. In my opinion this is the biggest reason we don’t see tagged unions used: if a feature isn’t supported by all of the big three simulators, then it can’t be used in any portable VIP or standard library. Even the two simulators that did have support, sadly, supported neither constraining nor randomizing tagged union variables. Being able to randomize both the tag and the value of a tagged union would provide some interesting capabilities, as I will discuss later.
The other problem with tagged unions is that the syntax for using them is in some ways… clunky at best. The pattern matching case and if syntax, which is (pretty much) required to check the tag and access the values of a tagged union, is fairly verbose, with lots of additional matches-style keywords. Even assignments to a tagged union variable require an extra tagged keyword. Additionally, unlike other languages that have an equivalent tagged union or sum type, SystemVerilog doesn’t provide a way to “parameterize” or “template” the types held by a tagged union.
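To give a feel for the ergonomics, here is a sketch built around a hypothetical "optional int" type (maybe_int_t, None and Some are illustrative names, not from any library). Note the tagged keyword on every arm and assignment, the leading dot on pattern variables, and the &&& operator for extra conditions.

```systemverilog
package maybe_int_pkg;
  // Hypothetical "optional int": either no value at all, or an int.
  typedef union tagged {
    void None;  // a tag with no associated value
    int  Some;
  } maybe_int_t;

  // Every arm repeats 'tagged', values are bound with a leading '.', and
  // extra conditions need the '&&&' operator. The first matching arm wins.
  function automatic string describe(maybe_int_t m);
    case (m) matches
      tagged None                : return "none";
      tagged Some .x &&& (x < 0) : return $sformatf("negative %0d", x);
      tagged Some .x             : return $sformatf("value %0d", x);
    endcase
    return "";
  endfunction
endpackage

module maybe_int_demo;
  import maybe_int_pkg::*;

  initial begin
    maybe_int_t m;
    m = tagged None;  // even a value-less assignment needs 'tagged'
    $display(describe(m));
    m = tagged Some (-3);
    $display(describe(m));
  end
endmodule
```

Compared with a plain enum and a case statement, every use site carries extra keywords, which is a large part of why the feature feels heavy in everyday code.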
Between the lack of support and the general feeling that this feature was added to the language without a huge amount of thought, I can’t really blame anyone for not using tagged unions. Nonetheless, I am disappointed that I can’t use them, because they are not supported by the tools I use every day. Having used sum types in other languages, I think they could bring a lot of value to verification code written in SystemVerilog.
Uses of Tagged Unions
Now if we ignore the legitimate issues with tagged unions discussed above, what could they be used for? Note that the proposals below are pretty much all currently theoretical. I haven’t tried to implement these, and some are even impossible today due to missing support in current tools (especially around randomization and constraints). Nevertheless, I think these ideas are worth more investigation, and would love to hear from anyone who has done anything similar in SV.
Error Handling using “Result” Types
Let’s face it, error handling in most SystemVerilog code is bad. The language itself does not provide exceptions or any other structured error handling mechanism, so most code simply doesn’t handle errors. When an error does happen, most code either logs an error message and gives up, or logs an error and carries on as if nothing happened. From a software engineering perspective this is bad practice to begin with, and it can also lead to unpredictable behavior after the first error in a simulation.
We can do better, though! Another language that doesn’t have exceptions, but does have sum types, is Rust. In Rust, error handling is built around a Result tagged union type (called an enum in Rust). Any function that can fail returns a Result<T, E> value. If the function succeeds, it returns an Ok value holding a return value of type T. If the function fails for some reason, it returns an Err value holding a value of type E that represents the kind of error that occurred.
The nice thing about this error handling technique is that it requires no additional features from the language (such as exceptions), but provides a consistent way for functions to return errors to their caller in a way that cannot be ignored. The compiler doesn’t allow you to use the return value from the function without either handling errors, or at least explicitly ignoring them. In my experience, this pattern works well in Rust, and I think it could be a good pattern for error handling in SystemVerilog.
Below I show a “Result” type implemented in SystemVerilog with tagged unions. Because SystemVerilog doesn’t support parameterizing tagged union types, we can’t create a generic Result<T, E> type, but a simple macro could be used to automate the creation of result types. In this case, I am declaring the code for deserializing a byte stream into a packet, where it is necessary to check both the packet length and a packet CRC, either of which could result in an error.
The other half of this is how you actually use the returned result value. This is done using the “pattern matching conditional” syntax, as described in section 12.6 of the LRM. There are two statement forms of pattern matching conditionals, one for case statements and one for if statements. This example shows both being used to handle the “result” from the previously defined deserialization function.
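A sketch covering both halves, under loudly stated assumptions: the DECLARE_RESULT macro, the packet_err_t codes, the Packet class, and the byte layout (a length byte, the payload, then an XOR checksum standing in for a real CRC) are all hypothetical. Holding a class handle in the Ok member relies on the LRM allowing dynamic types such as class handles inside tagged unions. deserialize returns a value tagged Ok or Err, and result_demo consumes it with both the case and if pattern matching forms.

```systemverilog
// Hypothetical helper: a typedef cannot take type parameters, so a macro
// stamps out one result type per (T, E) pair.
`define DECLARE_RESULT(NAME, T, E) \
  typedef union tagged {           \
    T Ok;                          \
    E Err;                         \
  } NAME;

package packet_result_pkg;
  typedef enum {ERR_TOO_SHORT, ERR_BAD_LENGTH, ERR_BAD_CRC} packet_err_t;

  class Packet;
    byte unsigned payload[$];
  endclass

  `DECLARE_RESULT(packet_result_t, Packet, packet_err_t)

  // Assumed wire format (illustrative only): byte 0 is the payload length N,
  // bytes 1..N are the payload, byte N+1 is an XOR of bytes 0..N.
  function automatic packet_result_t deserialize(byte unsigned bytes[$]);
    packet_result_t result;
    Packet          pkt;
    byte unsigned   crc = 0;

    if (bytes.size() < 2) begin
      result = tagged Err ERR_TOO_SHORT;
      return result;
    end
    if (bytes.size() != bytes[0] + 2) begin
      result = tagged Err ERR_BAD_LENGTH;
      return result;
    end
    for (int i = 0; i <= bytes[0]; i++) crc ^= bytes[i];
    if (crc != bytes[bytes.size() - 1]) begin
      result = tagged Err ERR_BAD_CRC;
      return result;
    end

    pkt = new();
    for (int i = 1; i <= bytes[0]; i++) pkt.payload.push_back(bytes[i]);
    result = tagged Ok pkt;
    return result;
  endfunction
endpackage

module result_demo;
  import packet_result_pkg::*;

  initial begin
    packet_result_t res;
    byte unsigned   good[$];
    byte unsigned   bad_crc[$];

    good    = {8'h02, 8'hAA, 8'h55, 8'hFD};  // 8'hFD == 8'h02 ^ 8'hAA ^ 8'h55
    bad_crc = {8'h02, 8'hAA, 8'h55, 8'h00};

    // case form: one arm per tag, with the value bound to a pattern variable.
    res = deserialize(good);
    case (res) matches
      tagged Ok .pkt : $display("Ok: %0d payload bytes", pkt.payload.size());
      tagged Err .e  : $display("Err: %s", e.name());
    endcase

    // if form: handle one specific error, using a guard after '&&&'.
    res = deserialize(bad_crc);
    if (res matches tagged Err .e &&& (e == ERR_BAD_CRC))
      $display("CRC mismatch, dropping packet");
    else
      $display("unexpected result");
  end
endmodule
```

One difference from Rust is worth noting: nothing here forces the caller to look at Err first. Reading res.Ok directly still compiles, and the mistake only shows up as a run-time error when the tag is Err.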
Transaction, Packet, or Opcode subtypes
In design verification a common thing to do is to declare a class that represents a packet, opcode, or some other kind of “transaction”. These transaction classes contain variables that hold data from fields of the packet, or operands of the opcode, but what do you do when the transaction can have a different set of data values depending on the type of transaction? In my experience I have seen two approaches used: either the transaction class contains a superset of all possible fields for the transaction, or sub-classes of the primary transaction type are declared for each transaction format, containing only the valid fields. The problem with including the superset of fields is that the user has to know which fields are valid in a transaction at any given time, and there is nothing to stop you from using an invalid field by mistake. On the other hand, declaring sub-classes can be very verbose, and requires a lot of explicit casts to work with values of the transaction.
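To make the two existing approaches concrete, here is a sketch of both. The BIN_FMT and IMM_FMT formats, the field widths, and every name other than BinFmtInstruction, r1 and rdest are hypothetical.

```systemverilog
package instr_alternatives_pkg;
  // Approach 1: one class holding the superset of every format's fields.
  class SupersetInstruction;
    typedef enum {BIN_FMT, IMM_FMT} fmt_e;
    fmt_e      fmt;
    bit [4:0]  r1, r2, rdest;  // r2 is only meaningful for BIN_FMT
    bit [15:0] imm;            // only meaningful for IMM_FMT, yet always readable
  endclass

  // Approach 2: an abstract base class plus one subclass per format.
  virtual class Instruction;
  endclass

  class BinFmtInstruction extends Instruction;
    bit [4:0] r1, r2, rdest;
  endclass

  class ImmFmtInstruction extends Instruction;
    bit [4:0]  r1, rdest;
    bit [15:0] imm;
  endclass

  // Reading a format-specific field through a base handle needs a $cast
  // (and a scratch variable) for every format that might be present.
  function automatic int dest_reg(Instruction inst);
    BinFmtInstruction bin_inst;
    ImmFmtInstruction imm_inst;
    if ($cast(bin_inst, inst)) return bin_inst.rdest;
    if ($cast(imm_inst, inst)) return imm_inst.rdest;
    return -1;
  endfunction
endpackage
```

With the superset class, nothing prevents reading imm while fmt is BIN_FMT; with the subclasses, every format-specific read through an Instruction handle goes through a $cast.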
However, tagged unions give us another option. As an example (inspired by one of the tagged union examples in the SystemVerilog LRM), say we wanted a transaction that represents a CPU instruction, where each instruction format has a different set of operands. We could define the instruction using a class that contains an anonymous tagged union as shown below:
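A sketch of such a class, assuming three hypothetical formats (BIN_FMT, IMM_FMT, JMP_FMT) with field widths chosen only for illustration. The union member is named inst, so inside the class inst.BIN_FMT.rdest reads the destination register once the tag has been checked.

```systemverilog
package tagged_instr_pkg;
  class Instruction;
    // Anonymous tagged union: each format tag carries only its own operands.
    union tagged {
      struct { bit [4:0] r1, r2, rdest; }             BIN_FMT;  // rdest = r1 op r2
      struct { bit [4:0] r1, rdest; bit [15:0] imm; } IMM_FMT;  // rdest = r1 op imm
      struct { bit [25:0] target; }                   JMP_FMT;  // jump to target
    } inst;

    // Destination register, or -1 for formats without one. Once the tag is
    // known, the field is read directly, with no cast.
    function int dest_reg();
      if (inst matches tagged BIN_FMT .*) return inst.BIN_FMT.rdest;
      if (inst matches tagged IMM_FMT .*) return inst.IMM_FMT.rdest;
      return -1;
    endfunction

    // Pattern matching can also check the tag and bind the fields in one step.
    function string disasm();
      case (inst) matches
        tagged BIN_FMT '{r1: .a, r2: .b, rdest: .d} :
          return $sformatf("r%0d = r%0d op r%0d", d, a, b);
        tagged IMM_FMT '{r1: .a, rdest: .d, imm: .i} :
          return $sformatf("r%0d = r%0d op %0d", d, a, i);
        tagged JMP_FMT '{target: .t} :
          return $sformatf("jmp %0d", t);
      endcase
      return "";
    endfunction
  endclass
endpackage
```

For example, after i.inst = tagged BIN_FMT '{r1: 1, r2: 2, rdest: 3}; the call i.disasm() returns "r3 = r1 op r2".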
With this construction, the Instruction transaction class can hold the operands for every instruction format, but the compiler will only allow us to reference the fields for the format actually used by the current value of the Instruction transaction. This sounds similar to the sub-class approach; however, where a sub-class implementation would need to cast the Instruction to a BinFmtInstruction to access the r1 field (for example), in this implementation any field can be referenced without casts as long as the format is correct. For example, if we have already checked that the format is BIN_FMT, the expression inst.BIN_FMT.rdest references the rdest value of the instruction with no casts.
Is this actually better than the alternatives, though? I believe it is a significant improvement over defining a transaction with a superset of fields (the most common approach), since it ensures correctness when referencing the transaction fields. I am not sure it’s that much better than defining class sub-types for each packet format, since both require that you explicitly check the transaction format before you can reference values. However, I think the tagged union implementation would make the code easier to read, with fewer $cast calls and fewer variables of different subtypes hanging around.
The final application where there could be real benefits from using tagged unions is configuring sequences and components, particularly where there is a mode setting and each mode has a different set of options that can be configured. Take the example of a sequence that generates transactions with a configurable traffic pattern. This might normally be implemented like the following:
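A sketch of the conventional version, written as a plain class so it stays self-contained (a real implementation would more likely extend uvm_sequence). pattern, periodic_interval, rand_max and burst_interval come from the text; burst_len, rand_min, the default values and next_delay are hypothetical.

```systemverilog
package traffic_enum_pkg;
  // Hypothetical stand-in for a sequence with a configurable traffic pattern.
  class traffic_seq;
    typedef enum {PERIODIC, BURST, RAND} pattern_e;

    pattern_e    pattern           = PERIODIC;
    int unsigned periodic_interval = 10;   // only used when pattern == PERIODIC
    int unsigned burst_len         = 4;    // only used when pattern == BURST
    int unsigned burst_interval    = 100;  // only used when pattern == BURST
    int unsigned rand_min          = 1;    // only used when pattern == RAND
    int unsigned rand_max          = 20;   // only used when pattern == RAND

    // Idle cycles to wait before the n-th transaction.
    function int unsigned next_delay(int unsigned n);
      case (pattern)
        PERIODIC: return periodic_interval;
        BURST:    return (n % burst_len == 0) ? burst_interval : 0;
        RAND:     return $urandom_range(rand_max, rand_min);
      endcase
      return 0;
    endfunction
  endclass
endpackage
```

Every field is visible regardless of pattern, so an assignment such as seq.periodic_interval = 50 while pattern is BURST is accepted and silently ignored.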
In this implementation, the periodic_interval variable is only valid when pattern == PERIODIC. Likewise, the value of rand_max is only used when pattern == RAND. At best, it is confusing that you can set periodic_interval while pattern is set to BURST. A better way to implement the same concept with tagged unions would look like the following:
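A sketch of a tagged union version of the same hypothetical class. The member names inside each pattern's struct (interval, len, min, max) are illustrative.

```systemverilog
package traffic_tagged_pkg;
  // Each pattern tag carries only the parameters that pattern needs.
  typedef union tagged {
    struct { int unsigned interval; }      PERIODIC;
    struct { int unsigned len, interval; } BURST;
    struct { int unsigned min, max; }      RAND;
  } traffic_pattern_t;

  // Hypothetical stand-in for a sequence with a configurable traffic pattern.
  class traffic_seq;
    traffic_pattern_t pattern;

    function new();
      pattern = tagged PERIODIC '{interval: 10};
    endfunction

    // Idle cycles to wait before the n-th transaction. Each arm can only
    // see the parameters of its own pattern.
    function int unsigned next_delay(int unsigned n);
      case (pattern) matches
        tagged PERIODIC '{interval: .iv}          : return iv;
        tagged BURST    '{len: .l, interval: .iv} : return (n % l == 0) ? iv : 0;
        tagged RAND     '{min: .lo, max: .hi}     : return $urandom_range(hi, lo);
      endcase
      return 0;
    endfunction
  endclass
endpackage
```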
With the tagged union implementation, the max value only exists (from the compiler’s perspective) when pattern is tagged RAND. Additionally, you cannot set pattern without also setting its parameters. For example, to set the pattern to BURST you would use an expression like:
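For instance, in the sketch below (the traffic_pattern_t type and its field names are hypothetical), the tag and its parameters are set in one assignment, and reaching for another pattern's parameter is caught as a run-time error instead of being silently ignored.

```systemverilog
module set_burst_demo;
  // Hypothetical pattern type: one struct of parameters per traffic pattern.
  typedef union tagged {
    struct { int unsigned interval; }      PERIODIC;
    struct { int unsigned len, interval; } BURST;
    struct { int unsigned min, max; }      RAND;
  } traffic_pattern_t;

  traffic_pattern_t pattern;

  initial begin
    // The tag and all of its parameters are set in a single expression.
    pattern = tagged BURST '{len: 8, interval: 100};

    // Members of the current tag can be updated in place...
    pattern.BURST.interval = 200;

    // ...but a member of another tag cannot: this line would be a run-time
    // error, because the tag is BURST rather than PERIODIC.
    // pattern.PERIODIC.interval = 50;

    $display("burst len=%0d interval=%0d", pattern.BURST.len, pattern.BURST.interval);
  end
endmodule
```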
I like this tagged union approach because not only is the assignment expression fairly concise and explicit, but it also eliminates a whole class of errors and confusion. The configuration parameters for a given traffic pattern can only be accessed within the context of that pattern, so there is no way to accidentally set periodic_interval instead of burst_interval. This tagged union design pattern could be applied to any setting where each option has its own parameters. For example, you might use it for a configuration that controls a random distribution mode (think normal, uniform, geometric, etc.), where each distribution has its own unique arguments.
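As a sketch of that idea, here is a hypothetical dist_cfg_t where each mode carries only its own arguments, dispatched to SystemVerilog's built-in $dist_normal, $dist_uniform and $dist_exponential functions. Exponential stands in for geometric here, since the language has no built-in geometric distribution function.

```systemverilog
package dist_cfg_pkg;
  // Hypothetical distribution-mode setting: each mode has its own arguments.
  typedef union tagged {
    struct { int mean, std_dev; } NORMAL;
    struct { int lo, hi; }        UNIFORM;
    struct { int mean; }          EXPONENTIAL;
  } dist_cfg_t;

  // Draws one sample. seed is inout so each call advances the caller's seed,
  // as the $dist_* functions require a seed variable they can update.
  function automatic int sample(dist_cfg_t cfg, inout integer seed);
    case (cfg) matches
      tagged NORMAL      '{mean: .m, std_dev: .s} : return $dist_normal(seed, m, s);
      tagged UNIFORM     '{lo: .a, hi: .b}        : return $dist_uniform(seed, a, b);
      tagged EXPONENTIAL '{mean: .m}              : return $dist_exponential(seed, m);
    endcase
    return 0;
  endfunction
endpackage
```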
As I mentioned at the beginning, the potential uses for tagged unions in SystemVerilog have been on my mind for a few months. Although I have not been able to do much actual work with them (due to Cadence lacking any kind of support), I think there are several good applications for tagged unions that would improve the readability and correctness of verification code through clearer syntax and compile-time checks. Unfortunately, while tagged unions appear to have a lot of advantages, the lack of tool support and poor syntax ergonomics make them pretty much a non-starter from a practical perspective.
If anyone has actually used SystemVerilog tagged unions, I would love to know how you used them and how they worked out.