Verification Guild: Forums |
|
| Author |
Message |
romi Senior


Joined: Feb 28, 2004 Posts: 88 Location: Minnesota
|
Posted: Wed Jun 02, 2004 9:20 am Post subject: Writing testbenches when using an emulator |
|
|
We have always written our testbenches using the synthesizable subset of Verilog and VHDL because we could then easily run the testbench on a simulator (NCSIM) or an emulator (Cobalt/Palladium).
We lose the advantages of high-level modeling when doing this. An alternative would be to maintain two totally different testbenches, one for each environment. This obviously requires considerably more resources. It seems like we are able to do everything we want to with the synthesizable subset, and I'm questioning whether it's worth it to maintain two very different testbenches.
I'm looking for responses about what others have done faced with a similar situation of needing to use both a simulator and emulator to find all bugs and what features we are really missing out on by not modelling at a high-level. Maintainability? Performance? Coverage feedback?
Thanks. |
|
|
 |
alain Junior


Joined: Jan 11, 2004 Posts: 5
|
Posted: Thu Jun 03, 2004 5:44 pm Post subject: |
|
|
Hi,
I was recently visiting a customer (I can't give names obviously) that has a very nice setup: they use high-level C++ transactors to send packets in and out of their networking chip. It gives them very impressive speed in simulation, and just plain fast when switching to emulation.
The trend is to stay away from Verilog for testbench, especially the RTL subset. Go with C++: that will work through the PLI with all your existing simulators, and will pay for itself when switching to emulation. For instance, TestBenchPlus from Zaiq (one of our partners) gives you a nice layer that abstracts the simulation engine (Verilog or emulation) so you don't have to worry about any of these details, you just write your high-level transactions, go really fast and cover all the testcases you care about. Switching between simulation and emulation is literally changing a switch in the GUI.
With that methodology, you can actually run your simulations faster than if you were using in-circuit emulation (meaning connected to real hardware through bridges and what not). Plus you don't have to worry about side-effects of interfacing to slowed down hardware, so everything behaves as if running "at-speed". That's a big win to actually create interesting corner cases for instance.
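As a rough illustration of the abstraction layer described above, here is a minimal C++ sketch in which test code talks only to an abstract port and one setting picks the engine. All names here (`Packet`, `PacketPort`, `SimulatorPort`, `EmulatorPort`, `make_port`, `run_smoke_test`) are hypothetical and are not taken from TestBenchPlus or any other product; the two backends are placeholders that only record packets.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// A packet-level transaction: the only thing test code ever handles.
struct Packet {
    std::uint32_t dest;
    std::vector<std::uint8_t> payload;
};

// Hypothetical abstraction layer. Tests are written against this interface
// and never learn which engine is running the design.
class PacketPort {
public:
    virtual ~PacketPort() = default;
    virtual std::string engine() const = 0;
    void send(const Packet& p) {
        deliver(p);  // engine-specific transport
        ++sent_;
    }
    std::size_t sent() const { return sent_; }
protected:
    virtual void deliver(const Packet& p) = 0;
private:
    std::size_t sent_ = 0;
};

// Placeholder backends. A real simulator port might go through the PLI and a
// real emulator port through a vendor link; here they only record packets.
class SimulatorPort : public PacketPort {
public:
    std::string engine() const override { return "simulator"; }
protected:
    void deliver(const Packet& p) override { log_.push_back(p); }
private:
    std::vector<Packet> log_;
};

class EmulatorPort : public PacketPort {
public:
    std::string engine() const override { return "emulator"; }
protected:
    void deliver(const Packet& p) override { log_.push_back(p); }
private:
    std::vector<Packet> log_;
};

// The "switch": one setting selects the engine, the test code is unchanged.
std::unique_ptr<PacketPort> make_port(bool use_emulator) {
    if (use_emulator) return std::unique_ptr<PacketPort>(new EmulatorPort());
    return std::unique_ptr<PacketPort>(new SimulatorPort());
}

// A test written once, against the abstract port only.
std::size_t run_smoke_test(PacketPort& port) {
    for (std::uint32_t i = 0; i < 4; ++i)
        port.send(Packet{i, {0xAA, 0x55}});
    return port.sent();
}
```

Calling `run_smoke_test(*make_port(false))` and `run_smoke_test(*make_port(true))` exercises the same test against either placeholder engine.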
Alain. |
|
|
 |
alexg Senior

Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
|
Posted: Fri Jun 04, 2004 2:46 pm Post subject: |
|
|
| Quote: | | The trend is to stay away from Verilog for testbench, especially the RTL subset. |
High-level transactors still have to be supported by low-level models or monitors, transferring high-level transactions into/from the cycle-precise protocols.
We may use various languages to generate high-level transactions: C++, SystemC, Vera etc.
However, for the low-level models/monitors, IMHO, Verilog is still the right choice. It is also a good idea to make them synthesisable, for two reasons:
1. Usage in formal tools. Synthesisable BFMs/monitors may greatly simplify both constraints and properties-to-verify by raising the abstraction level from signals to transactions.
2. Emulation. Both BFMs and monitors may be synthesized along with the design, allowing simple access using, for example, the C++ transactors mentioned above.
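To make the BFM/monitor role concrete, here is a small behavioral sketch in C++ of the translation such low-level models perform. It is not synthesizable code and not anyone's actual BFM: the valid/last/data protocol and the names `WriteTxn`, `Cycle`, `drive` and `observe` are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// High-level view: one write transaction carrying a block of bytes.
struct WriteTxn {
    std::vector<std::uint8_t> data;
};

// Low-level view: pin values for one clock cycle of a made-up
// valid/last/data protocol.
struct Cycle {
    bool valid;
    bool last;
    std::uint8_t data;
};

// BFM direction: expand one transaction into cycle-precise pin activity.
std::vector<Cycle> drive(const WriteTxn& t) {
    std::vector<Cycle> pins;
    pins.push_back({false, false, 0});  // one idle cycle before the transfer
    for (std::size_t i = 0; i < t.data.size(); ++i)
        pins.push_back({true, i + 1 == t.data.size(), t.data[i]});
    return pins;
}

// Monitor direction: rebuild transactions from observed pin activity.
std::vector<WriteTxn> observe(const std::vector<Cycle>& pins) {
    std::vector<WriteTxn> seen;
    WriteTxn current;
    for (const Cycle& c : pins) {
        if (!c.valid) continue;          // skip idle cycles
        current.data.push_back(c.data);
        if (c.last) {                    // end of transfer closes the transaction
            seen.push_back(current);
            current.data.clear();
        }
    }
    return seen;
}
```

Feeding the output of `drive` into `observe` returns the original transaction, which is the property a BFM/monitor pair is expected to have.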
Regards,
Alexander Gnusin |
|
|
 |
z Senior

Joined: Jan 09, 2004 Posts: 92
|
Posted: Fri Jun 04, 2004 9:00 pm Post subject: |
|
|
| alain wrote: | | I was recently visiting a customer (I can't give names obviously) that has a very nice setup: they use high-level C++ transactors to send packets in and out of their networking chip. It gives them very impressive speed in simulation, and just plain fast when switching to emulation. |
| alexg wrote: |
High-level transactors still have to be supported by low-level models or monitors, transferring high-level transactions into/from the cycle-precise protocols. |
Alain's claim seemed too good to be true. If the key to the speed is in C++, why has SystemC not been known for being that fast?
Alexander pointed out the key. A networking chip normally has inputs/outputs every clock cycle, and the clock cycle at its interfaces can be very short. These fast interfaces slow down the simulation, especially if the whole verification environment runs at such speed. The simulation speed can be faster if only a thin layer (called transactor) uses such fast clocks. For emulation, it is more important to save data for many clock cycles in a hardware buffer.
This thin layer clearly should be implemented similarly to the design itself. No question about it for emulation. Going through the language interface between Verilog and C/C++/e/Vera every few nanoseconds of simulation can also be bad. If you want to test the difference, try reading a signal (e.g. interrupt) 100 times each clock cycle. Use your simulator's profiling feature if possible.
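The cost being described is the number of trips across the language boundary. The following C++ sketch only counts those trips for the two styles; it measures nothing about a real simulator, and the names `Boundary`, `poll_per_cycle` and `per_transaction` are made up for this example.

```cpp
#include <cstdint>

// Counts every call that would cross the HDL/C++ language boundary.
struct Boundary {
    std::uint64_t crossings = 0;
    bool read_signal() { ++crossings; return false; }  // per-signal access
    void send_transaction() { ++crossings; }           // one call per transaction
};

// Signal-level style: the testbench reads a signal several times every cycle.
std::uint64_t poll_per_cycle(std::uint64_t cycles, unsigned reads_per_cycle) {
    Boundary b;
    for (std::uint64_t c = 0; c < cycles; ++c)
        for (unsigned r = 0; r < reads_per_cycle; ++r)
            b.read_signal();
    return b.crossings;
}

// Transaction-level style: one crossing per transaction, and the thin layer
// handles the cycles in between on its own side of the boundary.
std::uint64_t per_transaction(std::uint64_t cycles, std::uint64_t cycles_per_txn) {
    if (cycles_per_txn == 0) return 0;  // guard against a zero step
    Boundary b;
    for (std::uint64_t c = 0; c < cycles; c += cycles_per_txn)
        b.send_transaction();
    return b.crossings;
}
```

For 1000 cycles, 100 reads per cycle gives 100000 crossings, while one transaction every 50 cycles gives 20. How much wall-clock time each crossing costs depends on the simulator, which is why profiling is the real test.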
Transferring high-level transactions into buffers works well if the flow is steady. If the flow has to respond quickly to feedback signals, the thin layer becomes thicker.
To make the transactions extremely high level, it can be good to put the entire test into a buffer. Then your verification run only involves the design and the thin layer. The rest of the verification environment becomes an off-line test generator. If you do this, your definition of a transaction is likely a section of data in the big buffer. Of course, you can organize the buffer contents in any fashion.
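One possible way to organize such a buffer is a simple length-prefixed layout, sketched below in C++. The layout, the one-byte length limit and the names `Buffer`, `append_txn` and `replay` are assumptions made for this example; the post leaves the organization open.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Off-line generator side: append one transaction as [length][bytes...].
// Assumption for this sketch: a single length byte, so payloads must be
// shorter than 256 bytes. Returns false if the payload does not fit.
bool append_txn(Buffer& buf, const std::vector<std::uint8_t>& payload) {
    if (payload.size() > 255) return false;
    buf.push_back(static_cast<std::uint8_t>(payload.size()));
    buf.insert(buf.end(), payload.begin(), payload.end());
    return true;
}

// Thin-layer side: walk the buffer and recover each transaction in order.
std::vector<std::vector<std::uint8_t>> replay(const Buffer& buf) {
    std::vector<std::vector<std::uint8_t>> txns;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::size_t len = buf[pos++];
        if (pos + len > buf.size()) break;  // truncated record: stop replaying
        txns.emplace_back(buf.begin() + pos, buf.begin() + pos + len);
        pos += len;
    }
    return txns;
}
```

Here a "transaction" is literally a section of the big buffer, and the generator and the replay side share nothing except the layout.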
The only big advantage of generating transactions on the fly is not having to store the transactions for an extended time. This makes a big difference only if you do not need the stored transactions for debugging possible problems. Are there any other reasons?
Another extreme is to make the transactions really abstract. Instead of saving the interface data in the buffer, you can save just a few switch values in the buffer to control the "thin layer" transactors. For example, you can put payload data into a transaction or simply put a "normal pattern" flag in the transaction. Then the test data (the transactions?) are mostly generated on the fly, though the generation decisions behind the switch values can be made off-line. If the switch values are simple enough, it may not hurt to code them manually. |
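A sketch of what such an abstract transaction might look like, in C++: the stored record holds either explicit payload or just a flag and a length, and the thin layer expands the flag into data when the transaction is played. The field names, `expand`, and the incrementing byte pattern are all invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Abstract transaction: a few switch values instead of full interface data.
struct AbstractTxn {
    bool normal_pattern;                // switch: let the thin layer generate data
    std::uint16_t length;               // bytes to generate when the flag is set
    std::vector<std::uint8_t> payload;  // explicit data, used when the flag is clear
};

// Thin-layer side: turn the stored switch values into the actual byte stream.
std::vector<std::uint8_t> expand(const AbstractTxn& t) {
    if (!t.normal_pattern) return t.payload;
    std::vector<std::uint8_t> data(t.length);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = static_cast<std::uint8_t>(i & 0xFF);  // made-up incrementing pattern
    return data;
}
```

A record such as `AbstractTxn{true, 1500, {}}` occupies a few bytes in the buffer yet produces 1500 bytes of interface data on the fly.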
|
|
 |
miket Senior


Joined: Jan 12, 2004 Posts: 31 Location: Ottawa, Ontario, Canada
|
Posted: Mon Jun 07, 2004 7:58 am Post subject: Re: Writing testbenches when using an emulator |
|
|
| romi wrote: |
I'm looking for responses about what others have done faced with a similar situation of needing to use both a simulator and emulator to find all bugs and what features we are really missing out on by not modelling at a high-level. Maintainability? Performance? Coverage feedback?
|
This is one of the most interesting problems in emulation. Verisity and Cadence had a joint effort which attacked this issue. It was called (if memory serves) "eAccelerator". The whole idea was to code your low-level BFMs in a synthesizable subset of "e", which were compiled into the Cadence emulation box along with the DUT. The emulator supported the same (or similar) transaction interface as did the BFMs in simulation. It would be interesting to hear about user experiences with this technology.
Something to think about is whether or not there is any value in solving this problem at all. Some projects use emulation for in-circuit emulation, rather than testbench acceleration. Some advantages of this approach are that the DUT is exercised in an environment which is representative of the real world and that the software folks can get their hands on the design earlier. Perhaps the biggest advantage is that the design is exposed to a new environment which is (should be!) designed to exercise the DUT in different ways. A side effect is that you don't have to solve the issue of how to model transactions in RTL.
Of course, this approach creates its own set of problems, not the least of which is re-creating bugs caught in emulation with the simulation testbench.
Cheers,
---mike |
|
|
 |
alain Junior


Joined: Jan 11, 2004 Posts: 5
|
Posted: Fri Jun 11, 2004 3:59 pm Post subject: |
|
|
| z wrote: | Alain's claim seemed too good to be true. If the key to the speed is in C++, why has SystemC not been known for being that fast?
|
The key to speed is to keep high-level stuff in C++, and low-level bit handling in synthesizable Verilog. The typical implementation of SystemC that you refer to does both low-level and high-level stuff in C++, which is why it is slow. But if you keep SystemC for high-level (channels) and synthesize BFMs for conversion between transactions and pin wiggling, you'll get orders of magnitude more performance.
Once we have stable technology that can quickly and easily synthesize SystemC bits into emulation gates, then we won't need to use two languages at the same time.
Alain. |
|
|
 |
postgenerate Senior


Joined: Jan 18, 2004 Posts: 31 Location: Phoenix
|
Posted: Thu Jun 17, 2004 10:24 pm Post subject: |
|
|
If you're writing a testbench in SystemC, or one of those other proprietary short-term HVL tools, then you are probably already thinking in terms of objects and transactions. That's half the battle right there. In fact, if you think in terms of transactions between the testbench and the DUV, you get 2 benefits even without emulation:
1. less PLI activity = better performance
2. Even better, you may disconnect your testbench licenses from your simulation license and no longer have that 1-for-1 relationship. I have tested this and wrote something about it before, but it's worth mentioning again. Why are you paying for an HVL license when it sits idle 80% of the time in a typical simulation as the DUV simulator crunches the last transaction?
The next step to get to acceleration with an emulator is easy. If you planned up front, all you need to do is ensure that the last transactor, the one talking to the DUV is in synthesizable RTL. Bite the bullet and invest a bit more time in a SCE-MI approach and reap benefits of plug-and-play-ability long into the future.
Sure, you may need to give up some functional coverage in the DUV (though not all of it, if you choose an emulator that handles some degree of assertions). But let's think about that. How much of your functional coverage is really coverage of the DUV fields themselves? I maintain very little. Typically, my functional coverage goals were obtained when 80% of the coverage scenarios could be achieved by covering items in the testbench itself.
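As an illustration of coverage collected in the testbench rather than in the DUV, here is a minimal C++ sketch that bins a transaction field. The class `LengthCoverage`, its four bins and their boundaries are hypothetical and chosen only for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical coverage model: bins are defined over a transaction field
// (packet length), so sampling happens in the testbench and never needs
// visibility into DUV signals.
class LengthCoverage {
public:
    static std::string bin_for(std::size_t length) {
        if (length == 0) return "empty";
        if (length < 64) return "short";
        if (length <= 1500) return "typical";
        return "jumbo";
    }
    // Called by the testbench each time it issues a transaction.
    void sample(std::size_t length) { ++hits_[bin_for(length)]; }
    // Percentage of the four bins that were hit at least once.
    double percent() const { return 100.0 * hits_.size() / 4; }
private:
    std::map<std::string, std::size_t> hits_;
};
```

Because the sampling point is the transaction itself, a model like this runs unchanged whether the design is simulated or emulated.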
Tom West
contact at: www.openverificationfoundation.org |
|
|
 |
rpluth Newbie


Joined: Jul 07, 2004 Posts: 1
|
Posted: Wed Jul 07, 2004 2:59 pm Post subject: |
|
|
I'm coming into this thread a little late, but since someone pointed out our
(Verisity's) solution I thought I'd respond with the salient engineering points
(and try to avoid sounding like a blatantly mindless advocate).
As has been pointed out in this thread there is a maintenance downside to having
both a high-level testbench and a separate emulation testbench. It's certainly
possible to maintain a single verification environment that will work in both
simulated and accelerated/emulated modes. SystemC transactors can work in both
worlds, but only if connected to low-level HDL BFMs/monitors. With the
synthesizable subset of Verilog, it leaves you with a not very sophisticated
language to write these units in -- and that limits your ability to do what you
need to with this approach, which is to keep the interface between the hardware
and software at the transaction (or batches of transaction) level, the
performance-based necessity of which others have pointed out.
That's what you miss out on with the Verilog approach, but you also lose constructs
for coverage and assertions, so the best solution is one that addresses all these
concerns in one unified environment that can be used in simulation and
acceleration. That's been our goal with eCelerator (and continuing now with
SpeXtreme) -- maximizing reuse and efficiency, minimizing wasted effort and
one-off development for acceleration.
Finally, in-circuit emulation is an important methodology and putting the DUT in
the system and proving it with real stimulus can be an important project milestone,
but it has limitations; it's difficult to hit corner cases, not to mention covering
them (knowing when they're hit). Before that point, it's a more efficient use of
time to use a high-level, constrained-random, coverage-driven approach. And more
and more, engineers with emulators are looking to leverage their investment
earlier in the design cycle, by using them as a fast replacement for what they use
simulation for.
Ron Pluth |
|
|
 |
johns Newbie


Joined: Jul 13, 2004 Posts: 1
|
Posted: Wed Jul 14, 2004 4:53 pm Post subject: |
|
|
'Z's posting of Jun 5, 2004, in response to 'alain' and 'alexg', makes some very good points.
The key to fast transaction based verification is to have the transactor itself locally managing the busy, cycle locked activity between itself and the DUT while communicating to it from the testbench using a high speed transaction oriented link. In an emulated environment this becomes especially important to retain verification performance.
But I would also concur with 'alain's comments that SystemC and other HVLs can be ideal languages for generating high level, untimed, abstract transactions that trigger activity within transactors.
| z wrote: | Alain's claim seemed too good to be true. If the key to the speed is in C++, why has SystemC not been known for being that fast?
|
My experience is that SystemC has been known for not being fast primarily when used in an RTL modeling context. RTL modeling in SystemC has shown to be mediocre at best.
However, untimed, abstract, transaction based modeling in SystemC, by contrast, has proven to be very efficient, particularly for computational algorithms that are better off being run on the FPU of a 3 GHz Linux box than in a simulated HDL model of an arithmetic unit running on, say, an emulator.
The trick is to provide a transaction oriented conduit coupling such CPU based computational models with emulator based transactor models that themselves interface with the DUT at the signal level.
This can be done by placing the transactor itself in the emulator so that it directly manages timed signal oriented communication with the DUT at emulation speeds.
Then, on the testbench side of the transactor, use some sort of standardized transaction oriented conduit (PLI, SCE-MI, SystemVerilog-DPI) to communicate with the C (or HVL) testbench models. Such models can communicate abstract transactions to the transactor to trigger its activity with the DUT.
By doing this you've removed a large part of the bottleneck that is seen with timed interfaces to C environments.
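A minimal C++ sketch of such a two-way conduit follows, with the testbench reacting to responses instead of only streaming requests. `Conduit`, `Request`, `Response` and `run_with_retry` are hypothetical names and do not correspond to the PLI, SCE-MI or DPI APIs; the "hardware side" is a stand-in that refuses the first attempt of each request so that there is something to react to.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <set>
#include <utility>

struct Request  { std::uint32_t id; };
struct Response { std::uint32_t id; bool ack; };

// Hypothetical two-way conduit between an untimed testbench and a transactor.
class Conduit {
public:
    using Handler = std::function<void(const Response&)>;
    void on_response(Handler h) { handler_ = std::move(h); }
    void send(const Request& r) { pending_.push_back(r); }
    std::size_t transfers() const { return transfers_; }

    // Stand-in for the transactor: the first attempt of each id is NACKed,
    // any later attempt is ACKed. Runs until no requests are left.
    void run() {
        while (!pending_.empty()) {
            Request r = pending_.front();
            pending_.pop_front();
            bool first_attempt = seen_.insert(r.id).second;
            ++transfers_;
            if (handler_) handler_(Response{r.id, !first_attempt});
        }
    }
private:
    Handler handler_;
    std::deque<Request> pending_;
    std::set<std::uint32_t> seen_;
    std::size_t transfers_ = 0;
};

// Reactive testbench: resend every request that comes back NACKed.
std::size_t run_with_retry(Conduit& c, std::uint32_t count) {
    std::size_t acked = 0;
    c.on_response([&](const Response& r) {
        if (r.ack) ++acked;
        else c.send(Request{r.id});  // react to feedback at transaction level
    });
    for (std::uint32_t i = 0; i < count; ++i) c.send(Request{i});
    c.run();
    c.on_response(nullptr);  // drop the handler before its captures go out of scope
    return acked;
}
```

With three requests, the stand-in produces six transfers and three acknowledgements, and all of the feedback handling stays at the transaction level.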
SystemC also has built-in concurrency which makes it ideal for writing concurrent testbench models that use multiple interfaces to the DUT.
'alain' hit the nail on the head here:
| alain wrote: |
The key to speed is to keep high-level stuff in C++, and low-level bit handling in synthesizable Verilog. The typical implementation of SystemC that you refer to does both low-level and high-level stuff in C++, which is why it is slow. But if you keep SystemC for high-level (channels) and synthesize BFMs for conversion between transactions and pin wiggling, you'll get orders of magnitude more performance.
|
'Z' referred to providing use of buffering in conjunction with abstract transactions. This also is a good idea when the interface lends itself to streaming semantics. However, good transaction level performance is not limited to streaming interfaces. Very high performing reactive transaction level interfaces can also be created using the proper interfacing techniques.
Two such techniques have already been mentioned on this thread. One is tried and true PLI.
Another was mentioned by Tom West:
| postgenerate wrote: |
The next step to get to acceleration with an emulator is easy. If you planned up front, all you need to do is ensure that the last transactor, the one talking to the DUV is in synthesizable RTL. Bite the bullet and invest a bit more time in a SCE-MI approach and reap benefits of plug-and-play-ability long into the future.
|
SCE-MI provides a good transaction level conduit between untimed SystemC and RTL HDL transactors. It has an advantage over PLI in that it easily supports transaction flows in both directions between the abstract (possibly multi-threaded) untimed modeling domain and the timed RTL modeling domain. Additionally SCE-MI was optimized for high speed interfacing between C models and HDL models running on an emulator.
A third interfacing standard that is now part of the SystemVerilog 3.1a standard is the Direct Programming Interface (DPI). This interface provides a simple means of passing transactions between SystemC and SystemVerilog HDL using one of the simplest mechanisms possible: function calls. It should be possible to also write interoperable transactors in a SystemVerilog simulation environment using the DPI.
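To show what "function calls" means in practice, here is a sketch of the C++ side of such an interface: a C-linkage function that an HDL transactor could call to fetch the next transaction. The names `queue_txn` and `next_txn` are hypothetical, and the SystemVerilog import shown in the comment is an assumption that has not been compiled against any tool (SystemVerilog 3.1a spelled the import string differently from later IEEE revisions).

```cpp
#include <cstddef>
#include <vector>

namespace {
struct Txn { int addr; int data; };
std::vector<Txn> g_queue;  // transactions produced by the C++ test
std::size_t g_next = 0;    // index of the next transaction to hand out
}

// Test side: queue a transaction for the HDL transactor to fetch later.
void queue_txn(int addr, int data) { g_queue.push_back(Txn{addr, data}); }

// C-linkage entry point the HDL transactor would call once per transaction.
// Assumed SystemVerilog declaration (illustrative only):
//   import "DPI-C" function int next_txn(output int addr, output int data);
// Returns 1 and fills the outputs if a transaction was available, else 0.
extern "C" int next_txn(int* addr, int* data) {
    if (g_next >= g_queue.size()) return 0;
    *addr = g_queue[g_next].addr;
    *data = g_queue[g_next].data;
    ++g_next;
    return 1;
}
```

The transactor makes one call per transaction and then handles the pin-level activity itself, so the language boundary is crossed at transaction rate rather than clock rate.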
John Stickley
Principal Engineer
Mentor Graphics |
|
|
 |
postgenerate Senior


Joined: Jan 18, 2004 Posts: 31 Location: Phoenix
|
Posted: Sun Jul 18, 2004 9:20 am Post subject: |
|
|
Hi Romi,
I'm with you and Alexg. I believe that Verilog is the best language to write the emulator-side transactors. On the other hand I am investigating a better solution.
I have studied eCelerator in great detail and, without going into this topic, let's just say I still believe Verilog is the best solution.
One promising idea that I am starting to investigate now is this C++ synthesis tool called SPARK from this link:
http://mesl.ucsd.edu/spark/
SPARK is an open source tool that produces VHDL RTL output. Oh well, I can live with VHDL in a mixed-language simulator, or I'll re-synthesize to Verilog.
I have also investigated the old Cynaps tool now marketed by Forte Design Systems. This looked promising, but the Forte guys did not have time to pursue something that was part of their target use model.
Then there is the just announced Agility compiler from Celoxica. I would like to investigate this.
My bets are on SPARK.
tom west
www.openverificationfoundation.org |
|
|
 |