# Don't Wait For Me

## The dangerous scope of "wait fork" in SystemVerilog

From the perspective of more traditional software programming languages, multi-tasking in SystemVerilog is weird. On one hand, the language includes special syntax (fork and join) that makes it really easy to write code that "runs" in parallel. On the other hand, SystemVerilog "processes" aren't really processes, or even threads, in the traditional sense. Even if you "get" the SystemVerilog model of cooperative multi-tasking, there are a number of pitfalls you can run into. One such gotcha that I recently ran into was caused by what appeared to be an innocuous wait fork statement.

## An Unexpected Infinite Wait

The issue was in an old (I mean really old, pre-UVM) test case that hadn't been run in a long time, and appeared to hang in a task that issued a bus read transaction. From inspecting the simulation waves and logs, I could tell that the bus transaction was completing without issue, but the task was still never returning. The basic structure of the test code was simple: a process is forked off, then the do_read() task is called. This obviously calls the contents of do_read() into question. After digging through a few translation layers (did I mention this code was old?), I found that the code that actually did the bus transaction, which I'll call do_read_internal(), looked something like this:
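What follows is a hedged reconstruction of the structure, not the actual code; the names background_monitor and drive_read_transaction are invented stand-ins for the real tasks:

```systemverilog
// Top-level test structure (reconstruction): a long-lived process is
// forked off, then the read task is called.
task run_test();
  fork
    background_monitor(); // runs for the life of the test (hypothetical)
  join_none
  do_read(32'h1000);      // this call never returned
endtask

// Several translation layers down, the task that actually performed
// the bus transaction looked something like this:
task do_read_internal(input logic [31:0] addr);
  fork
    drive_read_transaction(addr); // issue the bus read (hypothetical)
  join_none
  // ... other bookkeeping ...
  wait fork; // intended to wait only for drive_read_transaction
endtask
```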

Now, if you know what you're looking for, you may already see the issue. But before I give it all away, suffice it to say the problem here is the use of the wait fork statement. So let's talk about what that last wait fork statement is actually doing.

## A Not So Brief Introduction to wait fork

Let's start in the obvious place: what does the SystemVerilog LRM (Language Reference Manual) say about wait fork? This is the summary of the statement that we can find in section 9.6.1 Wait fork statement:

> The wait fork statement blocks process execution flow until all immediate child subprocesses (processes created by the current process, excluding their descendants) have completed their execution.

This seems fairly simple: it is a statement that makes a process wait until all sub-processes that were started from the current process have finished. The problem with this construct is that it's not always obvious which processes "all immediate child subprocesses" refers to. Let's start with a simple example:
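A sketch of what this first example might look like, reconstructed from the description below:

```systemverilog
module top;
  initial begin : main_process
    fork
      begin : subprocess_1
        #10 $display("subprocess_1 done");
      end
      begin : subprocess_2
        #20 $display("subprocess_2 done");
      end
    join_none

    // Blocks until both subprocesses above have finished (time 20).
    wait fork;
    $display("Done");
  end
endmodule
```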

In this case the fork/join_none block is going to spawn two subprocesses, which wait for a period of time and then print a message. Because join_none was used, execution of the main_process block will not wait for these subprocesses to finish before continuing. The wait fork statement will then wait for both of the subprocesses running the subprocess_1 and subprocess_2 blocks to finish, and then print "Done". Easy enough. Now, what if we make it a bit more complicated and move the wait fork into another task?
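A sketch of the modified example, with the wait fork moved into a task:

```systemverilog
module top;
  task task_1();
    fork
      begin : subprocess_3
        #30 $display("subprocess_3 done");
      end
      begin : subprocess_4
        #40 $display("subprocess_4 done");
      end
    join_none

    // Naive expectation: waits for subprocess_3 and subprocess_4 only.
    // Actual behavior: waits for all four subprocesses.
    wait fork;
  endtask

  initial begin : main_process
    fork
      begin : subprocess_1
        #10 $display("subprocess_1 done");
      end
      begin : subprocess_2
        #20 $display("subprocess_2 done");
      end
    join_none

    task_1();
    $display("Done"); // printed at time 40, after all four finish
  end
endmodule
```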

In this case we start subprocesses subprocess_1 and subprocess_2, and don't wait for them to finish, just like in the previous example. However, we now call a task task_1(), which also spawns two new processes, subprocess_3 and subprocess_4, then executes a wait fork statement before returning and printing "Done". If wait fork followed normal scoping rules, one might naively expect it to wait for subprocesses 3 and 4 only, but if we look at the output from this example, we see this is not the case.

The reason that the wait fork waits for all four of the spawned subprocesses to finish is simply that they are all subprocesses of the process from which wait fork was called, the one started from the main_process initial block. Calling a task or entering another SystemVerilog scope doesn't start a new process; in fact, the only way that a new process is spawned (from an existing one) is through a fork statement. Because the code running in task_1 is still part of the same process, "all immediate child subprocesses" includes subprocess_1 and subprocess_2.

This final example demonstrates, in a more obvious way, the bug in my problem code above. In this case, the task task_1 is first called from main_process and starts two subprocesses, infinite_subprocess and subprocess_1, then immediately returns because the block is terminated with join_none. Then task_2 is executed, which forks two more processes (subprocess_2 and subprocess_3) and finally executes a wait fork statement. In this case the wait fork statement is never actually going to finish, because it has to wait for the conspicuously named infinite_subprocess (which never ends), even though it was started from a different task.
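Reconstructed from the description above, the final example might look like this:

```systemverilog
module top;
  task task_1();
    fork
      begin : infinite_subprocess
        forever #10; // never terminates
      end
      begin : subprocess_1
        fork
          begin : infinite_sub_subprocess
            forever #10; // grandchild of main_process, left running
          end
        join_none
        #10;
      end
    join_none
  endtask

  task task_2();
    fork
      begin : subprocess_2
        #20;
      end
      begin : subprocess_3
        #30;
      end
    join_none

    // Hangs forever: infinite_subprocess is also an immediate child
    // of the calling process, even though task_1 started it.
    wait fork;
    $display("Done"); // never reached
  endtask

  initial begin : main_process
    task_1();
    task_2();
  end
endmodule
```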

Finally, note that the wait fork in task_2 does not have to wait for the process infinite_sub_subprocess, because it is not an immediate subprocess of the main_process. The infinite_sub_subprocess process is forked from the subprocess_1 process, but because that fork uses join_none, subprocess_1 does not wait for infinite_sub_subprocess, and leaves the sub-subprocess running, detached from any parent process.

## What Went Wrong?

Going back to my hanging test case: now that we've looked at some examples of how wait fork works, the problem is pretty obvious if you know where to look. The main test code starts an infinite process, and then calls a task which starts its own process and uses wait fork to wait for that process, but unintentionally ends up waiting on an infinite loop to end. The do_read_internal task has no way to know that an infinite process has even been spawned from its process, but wait fork will wait for it anyway.

The thing I find devious about this bug is that it demonstrates how starting a subprocess in one place can cause a completely unrelated piece of code to hang, potentially forever. In fact, it is entirely possible that the code that starts the infinite loop subprocess is in one library and the wait fork statement is in another. All that is required to cause havoc is that they are called from the same process. When I realized this, it made me reconsider the safety of using wait fork; at this point it seems like it's only really safe to use in very specific applications.

## Remediation

Unfortunately, there is no version of wait fork with a more limited scope (such as waiting only for subprocesses started from the current task or scope). This leaves us with three, not great, alternatives to implement the functionality that our do_read_internal was going for: starting a non-blocking process, and then conditionally waiting for it to finish.

The first, and in my opinion best, option in this case is to simply declare an event that gets triggered by the started subprocess on completion. Then, to wait for the subprocess to end, the main process can just wait on the event instead of using wait fork. Aside from needing to declare a new event variable, this works fairly well if there is only one subprocess to wait on. The implementation would look something like the following:
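A minimal sketch of the event-based approach; drive_read_transaction is an invented stand-in for the real bus-driving task:

```systemverilog
task do_read_internal(input logic [31:0] addr);
  event read_done; // signaled by our subprocess when it completes

  fork
    begin
      drive_read_transaction(addr); // hypothetical bus read
      -> read_done;                 // signal completion
    end
  join_none

  // ... other bookkeeping ...

  // Wait only on our own subprocess, regardless of what else the
  // calling process has forked. Note: safe here because a join_none
  // child doesn't start until the parent blocks; if the parent could
  // block elsewhere first, a bit flag with wait() is more robust.
  @read_done;
endtask
```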

Another option is to use the SystemVerilog "fine-grain process control" features (see section 9.7 of the LRM). Essentially, this feature provides an API to get a reference to a process object associated with the current process, which can then be used to wait on or otherwise manage the process. In this case, we can have the subprocess store its process object in a variable, and then use the await() task of the process object to wait for the subprocess to end. Something like the following:
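A sketch of the fine-grain process control approach, using the built-in process class; again, drive_read_transaction is a hypothetical name:

```systemverilog
task do_read_internal(input logic [31:0] addr);
  process read_proc; // handle to our subprocess

  fork
    begin
      read_proc = process::self(); // capture this subprocess's handle
      drive_read_transaction(addr); // hypothetical bus read
    end
  join_none

  // ... other bookkeeping ...

  wait (read_proc != null); // ensure the subprocess has started
  read_proc.await();        // wait for that specific subprocess only
endtask
```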

The two options above work fine when you are starting a single subprocess and then want to conditionally wait for it to finish, but what if you are starting an indeterminate number of processes and want to wait for them all to finish? In this case, the behavior of wait fork waiting for all subprocesses is a really convenient feature. However, as we have seen, when calling wait fork you need to ensure you know which subprocesses it will actually wait for. The only way to really do this for sure is to fork a new process within which we carefully control which processes are started. This does have the overhead of creating an extra process, but that could be a reasonable trade-off in some cases. An example of this implementation is shown below.
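A sketch of the isolation-process approach; the task and process names are invented for illustration:

```systemverilog
task run_all_reads(input int unsigned num_reads);
  fork
    begin : isolation_process
      // Processes forked here are immediate children of
      // isolation_process, not of the caller's process, so the
      // wait fork below cannot see anything the caller forked.
      for (int i = 0; i < num_reads; i++) begin
        automatic int unsigned idx = i;
        fork
          drive_read_transaction(idx); // hypothetical bus read
        join_none
      end
      wait fork; // waits only for the num_reads processes forked above
    end
  join // return once the isolation process (and its children) finish
endtask
```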

## Conclusions

In the end, I chose to take the first option and replace the wait fork with a wait on a done event, and everything works fine. However, I am also now taking a much more critical look at all other uses of wait fork in our repository, as I now have a better appreciation for some of the pitfalls of this statement.

There are certainly places where it is mostly safe. Use of wait fork directly within a fork/join block is generally safe, because it's harder to end up with an unexpected subprocess. Likewise, use within a UVM sequence body() task also appears to be fairly safe, because sequence bodies are always started in a new process. However, even in these fairly well controlled uses, it is important to remember that any task or function that is called could start an unexpected infinite subprocess, and then we're right back to the same problem.

I'm not saying don't use wait fork, but I do think it needs to be used with caution. It is easy to forget, or not be aware of, everything it will wait for.