Verification Guild: Forums
alexg Senior
Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
Posted: Fri Feb 11, 2011 9:54 am Post subject: Distributed simulations?

I am looking at distributed simulation as one possible way to speed up simulation of large designs. As a simple example, the design and the testbench could run in parallel on 2 computers, using data structures to communicate with each other. It would be interesting to know whether such expertise exists in the industry.
Regards,
-Alex

qwk000 Senior
Joined: Oct 13, 2004 Posts: 66 Location: Fort Worth, TX
Posted: Fri Feb 11, 2011 10:29 am

I don't know of any simulators that support multi-machine computing. VCS supports multi-processor (parallel) computing, but you are looking beyond just multi-processor, right?

alexg Senior
Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
Posted: Fri Feb 11, 2011 11:05 am

A few more words about distributed simulation as I see it.
There is no need for a specific simulator; any simulator will work.
To further clarify my intent, I'll give another example.
Assume we have 2 design blocks. These blocks communicate with each other using data frames and together perform some function F that has to be verified. Assume also that both blocks are quite large and, together with their block-level testbenches, already consume a significant amount of simulation time. To verify function F, there are 2 possible solutions:
1. Put them together and combine their testbenches.
2. Let the 2 block-level testbenches run in parallel on 2 different machines, feeding the output transactions of the first block to the input of the second one.
The 2nd approach is obviously faster than the 1st one. The block-level testbenches remain almost intact; the only new piece is the data communication between the two testbenches. Using untimed transactions instead of cycle-by-cycle signal values for that communication may significantly reduce how often the two simulators have to exchange data.
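A minimal C sketch of that last point, under stated assumptions: a hypothetical monitor receives one valid byte per clock cycle and forwards only complete 64-byte frames (an assumed frame size) across the link. `link_send` is a stub that merely counts messages; in a real setup it would be the socket or file transport. Under these assumptions, 6400 cycles of data cost 100 link messages instead of 6400 cycle-by-cycle exchanges.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 64 /* assumed frame size in bytes */

typedef struct {
    uint8_t data[FRAME_LEN]; /* bytes of the frame being assembled */
    size_t fill;             /* bytes collected so far */
    size_t msgs_sent;        /* messages that crossed the machine boundary */
} frame_collector;

void collector_init(frame_collector *c) {
    memset(c, 0, sizeof *c);
}

/* Stub for the real transport (socket, shared file or similar): it only
   counts messages. A real version would ship c->data to the other side. */
static void link_send(frame_collector *c) {
    c->msgs_sent++;
}

/* Called once per clock cycle in which the monitor sees a valid byte.
   Only a complete frame, i.e. one untimed transaction, goes over the link. */
void collector_sample(frame_collector *c, uint8_t byte) {
    c->data[c->fill++] = byte;
    if (c->fill == FRAME_LEN) {
        link_send(c);
        c->fill = 0;
    }
}
```

The saving grows with the frame size, but so does the delay before the second simulation sees the data, so where the frame boundary sits is a design choice.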
-Alex

srini Senior
Joined: Jan 23, 2004 Posts: 430 Location: Bengaluru, India
Posted: Fri Feb 11, 2011 3:31 pm

How about using socket-based communication between the 2 machines (running the 2 blocks)? If this is an option, I recall that VCS had an example (perhaps in $VCS_HOME); we even ported it to use SV-DPI.
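For readers unfamiliar with the socket route, the connection setup could look like the C sketch below. It uses standard POSIX socket calls only; it is not the VCS example mentioned above, and the names `link_listen`, `link_accept` and `link_connect` are hypothetical. One simulation listens, the other connects, and each side would call these functions through SV-DPI (`import "DPI-C"`). `TCP_NODELAY` is set because transaction messages tend to be small, and Nagle's algorithm could otherwise hold them back.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Disable Nagle's algorithm so small transaction messages go out at once. */
static void set_nodelay(int fd) {
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Server side (e.g. the simulation holding block B). Port 0 lets the OS
   pick a free port; the bound port is returned through *bound_port. */
int link_listen(unsigned short port, unsigned short *bound_port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}

/* Wait for the peer simulation to connect; returns the data socket. */
int link_accept(int listen_fd) {
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0) set_nodelay(fd);
    return fd;
}

/* Client side (e.g. the simulation holding block A); ip is dotted IPv4. */
int link_connect(const char *ip, unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    set_nodelay(fd);
    return fd;
}
```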
Srini
www.cvcblr.com/blog
_________________
Srinivasan Venkataramanan
Chief Technology Officer, CVC www.cvcblr.com
A Pragmatic Approach to VMM Adoption
SystemVerilog Assertions Handbook
Using PSL/SUGAR 2nd Edition.
Contributor: The functional verification of electronic systems

pavanshanbhag Senior
Joined: Mar 25, 2009 Posts: 380 Location: Bangalore, India
Posted: Sat Feb 12, 2011 11:38 am

AXIOM-EDA has a solution for your problem: a multi-CPU architecture with graphical debugging in a single kernel.
MPSim is the simulator developed by the Axiom folks. It was designed from the beginning to address RTL, testbenches, assertions, coverage and debugging in a single-kernel architecture for maximum performance, productivity and predictability.
It's best to check with them:
info@axiom-da.com
_________________
-Pavan K Shanbhag
“The difference between genius and stupidity, genius knows his limits.” - Albert Einstein

peterpb Junior
Joined: Mar 18, 2004 Posts: 9
Posted: Sun Feb 13, 2011 10:10 am

You might take a look at SimCluster from Avery Design:
http://www.avery-design.com/files/docs/SimClusterDS2010.pdf
FYI, I don't (and didn't) work for Avery Design. I only know that a few years back some engineers tried it, and I heard positive comments about the tool.

pavanshanbhag Senior
Joined: Mar 25, 2009 Posts: 380 Location: Bangalore, India
Posted: Sun Feb 13, 2011 12:28 pm

Quote:
DISTRIBUTED SIMULATION SUPPORTS SYSTEMVERILOG, VERILOG, VHDL, C/C++
Today’s SOCs and embedded systems integrate 3rd party and proprietary hardware and software. System-level verification requires integration of these models into an overall simulation model. Often models come in the form of HDLs, ANSI C/C++, or specialized C++ class libraries such as SystemC. Avery’s distributed simulation supports a heterogeneous environment enabling all model types to be integrated and simulated in a distributed environment.

This is pretty impressive from Avery Design.

alexg Senior
Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
Posted: Sun Feb 13, 2011 12:44 pm

Peter, Pavan,
Thank you for the link to the Avery Design solution.
I have 2 problems with it:
1. I don't want to use automated partitioning of the design/testbench. In other words, I would prefer to partition the design and testbench myself.
2. I would like to create my own means to compress, send and decompress signal-level data. I believe I can do it better than any tool can.
So, I am thinking more about the method mentioned by Srini.
Recently I was playing with Verilog file I/O, and it looks like it provides a relatively clean way to transfer SV packet structures between 2 simulations. Here is the basic idea:
1. Simulators connect through "channels", each able to transfer 1 structure at a time ("blocking" communication).
2. Each "channel" is just a file. Using the functions and tasks can_get, get, can_put and put, the 2 simulators communicate through that file.
You can download a working example of such communication (there is a README file in the tarball).
Here is the link to the file:
http://www.box.net/shared/u7ssqrk2an
It would be good to know your opinion about this method.
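A minimal C sketch of such a one-slot file channel follows. It illustrates the protocol only and does not reproduce the code in the tarball; `chan_can_put`, `chan_put`, `chan_can_get` and `chan_get` are hypothetical counterparts of can_put/put/can_get/get. One detail is deliberately different from a plain write: `chan_put` writes to a temporary file and then `rename()`s it onto the channel name, so a reader never opens a half-written file. `rename()` is atomic on a local POSIX file system; over NFS, how soon the other machine sees the new file still depends on client caching. The "blocking" behaviour would come from the SV side polling `chan_can_get`/`chan_can_put` between time steps.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical one-slot channel: the slot is "full" while `path` exists. */
typedef struct {
    char path[256];     /* file the consumer polls */
    char tmp_path[264]; /* producer writes here first, then renames */
} file_chan;

void chan_init(file_chan *ch, const char *path) {
    snprintf(ch->path, sizeof ch->path, "%s", path);
    snprintf(ch->tmp_path, sizeof ch->tmp_path, "%s.tmp", ch->path);
}

/* Slot empty: the previous structure has already been consumed. */
int chan_can_put(const file_chan *ch) { return access(ch->path, F_OK) != 0; }

/* Slot full: a complete structure is waiting. */
int chan_can_get(const file_chan *ch) { return access(ch->path, F_OK) == 0; }

/* Write len bytes to the temp file, then publish them with rename().
   Returns 0 on success, -1 on failure. Caller checks chan_can_put first. */
int chan_put(const file_chan *ch, const void *data, size_t len) {
    FILE *f = fopen(ch->tmp_path, "wb");
    if (!f) return -1;
    size_t n = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || n != len) return -1;
    return rename(ch->tmp_path, ch->path) == 0 ? 0 : -1;
}

/* Read exactly len bytes, then delete the file to free the slot. */
int chan_get(const file_chan *ch, void *data, size_t len) {
    FILE *f = fopen(ch->path, "rb");
    if (!f) return -1;
    size_t n = fread(data, 1, len, f);
    fclose(f);
    if (n != len) return -1;
    return unlink(ch->path) == 0 ? 0 : -1;
}
```

With a single producer and a single consumer per channel, the existence of the file is the only state needed, which is why no explicit lock appears in this sketch.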
Regards,
-Alex

srini Senior
Joined: Jan 23, 2004 Posts: 430 Location: Bengaluru, India
Posted: Sun Feb 13, 2011 11:55 pm

Hi Alex,
alexg wrote:
Peter, Pavan,
Thank you for the link to the Avery Design Solution.
So, I am thinking more about the method mentioned by Srini.
Here is the link to the file:
http://www.box.net/shared/u7ssqrk2an
It would be good to know your opinion about this method.
Regards,
-Alex

I took a very quick look at it. Have you considered using an OVM/UVM TLM-like interface rather than inventing your own (though simple) one?
Regards
Srini

chm Senior
Joined: Nov 22, 2004 Posts: 43 Location: Unterpremstaetten, Austria
Posted: Mon Feb 14, 2011 4:06 am

Hi Alex,
If I understand you correctly, you want to use a networked file system (presumably NFS) for inter-process communication between two simulators running on different machines (not just different CPUs on one machine).
Technically, your approach is certainly possible; however:
1) your implementation is flawed: there is no file locking, and there is no guarantee that your heuristic "if I can read more than 1 byte, most likely the whole file has been written already" will work;
2) the efficiency will be so poor that this will only reduce your simulation time if there is almost no inter-process communication and lots of functionality to simulate on both sides. Remember that NFS only guarantees close-to-open consistency between clients, which means that the file must be flushed to the server before the sender task may return and before the other machine can reliably read it.
If you want to pursue a DIY approach, I suggest developing a DPI library that implements TCP sockets in C.
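As a starting point for that DIY route, the C sketch below shows message framing over an already-connected stream socket. It assumes a 4-byte big-endian length prefix (an arbitrary choice for illustration), and it loops over `write()`/`read()` because a stream socket may transfer fewer bytes per call than requested. The function names are hypothetical; a DPI wrapper, not shown, would pass the packed SV structure in as a byte array.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes; a stream socket may accept fewer per call. */
static int write_all(int fd, const void *buf, size_t len) {
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; -1 on error or if the peer closes early. */
static int read_all(int fd, void *buf, size_t len) {
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1; /* peer closed in the middle of a message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one transaction: 4-byte big-endian length, then the payload. */
int msg_send(int fd, const void *payload, uint32_t len) {
    uint32_t hdr = htonl(len);
    if (write_all(fd, &hdr, sizeof hdr) < 0) return -1;
    return write_all(fd, payload, len);
}

/* Receive one transaction into buf (capacity cap).
   Returns the payload length, or -1 on error. After a -1 caused by
   len > cap the stream is out of sync and should be closed. */
long msg_recv(int fd, void *buf, uint32_t cap) {
    uint32_t hdr;
    if (read_all(fd, &hdr, sizeof hdr) < 0) return -1;
    uint32_t len = ntohl(hdr);
    if (len > cap) return -1;
    if (read_all(fd, buf, len) < 0) return -1;
    return (long)len;
}
```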
chrisspear Senior
Joined: Jun 15, 2004 Posts: 202 Location: Marlboro, MA
Posted: Mon Feb 14, 2011 10:58 am

Parallel simulation has been tried for decades, and partitioning is always the biggest problem. You need multiple partitions that meet the following requirements:
-Parallel activity: dividing the design into two blocks does no good if block B can only run after block A completes.
-Equal sizes: if block A is more than 3x the size of block B, you'll get little benefit from running them on separate processors.
-Low communication: if the blocks are constantly sharing large amounts of data (a serial activity), you'll get little benefit from running them in parallel.
The designs best suited to parallel simulation today are multi-core CPUs: the cores are equal in size and active in parallel, but they still share a lot of resources, so even they may not give a great speedup.
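A rough way to see why these three requirements matter (a simplified model for illustration, not taken from the post): let T_A and T_B be the simulation time each partition needs on its own, assume the two can run fully concurrently, and lump all communication and synchronization cost into T_comm. The speedup over simulating both partitions in one process is then approximately:

```latex
S \approx \frac{T_A + T_B}{\max(T_A, T_B) + T_{\mathrm{comm}}}

T_A = T_B,\; T_{\mathrm{comm}} = 0 \;\Rightarrow\; S = \frac{2T_B}{T_B} = 2

T_A = 3T_B,\; T_{\mathrm{comm}} = 0 \;\Rightarrow\; S = \frac{4T_B}{3T_B} \approx 1.33
```

If block B can only run after block A completes, the partitions never overlap, the parallel run takes about T_A + T_B + T_comm, and S drops to 1 or below.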
VCS has automatic partitioning and the ability to run common activities, such as waveform dumping, in parallel with the rest of the simulation. Look into this tried-and-tested solution before you go off and try to build your own. (Yes, I do work for Synopsys.)
_________________
Chris Spear
Co-Author: SystemVerilog for Verification - 3rd edition!
http://chris.spear.net/systemverilog

alexg Senior
Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
Posted: Mon Feb 14, 2011 11:38 am

Hi Chris,
Thank you for your answer. Please see my comments on the requirements you've mentioned:
Quote:
-Parallel activity. Dividing the design in two blocks does no good if block B can only run after block A completes
-Equal sizes: If block A is more than 3x block B, you'll get little benefit from running them on separate processors

Parallelism of activity and block sizes are both design architecture issues. It is the chip architect's task to reduce idle time, so parallel processing is a must for good architectures: while block B runs on what block A has just completed, block A immediately starts processing the next data chunk, and so on. So, the better the architecture, the less time parallel simulations take.
Block sizing is more of an issue, since simulation time is not the same as data-processing latency. However, manual partitioning guided by simulation profiling may help here too.
Quote:
-Low communication: If the blocks are constantly sharing large amounts of data (a serial activity), you'll get little benefit from running them in parallel

This is an issue that usually limits the speedup of hardware accelerators and emulators as well.
To solve it, we need to develop verification components that convert serial activity into parallel data structures and vice versa, and then send only the parallel data structures through the link. These verification components (monitors and drivers) may be instantiated both between the design and the testbench and between two blocks in the design (the latter heavily depends on a "parallel-simulation-friendly" SOC architecture). Also, block-level simulations may be used to simulate the complete SOC datapath if such data communication is set up between them.
So, I don't believe tools can help here. It is all about friendly SOC architectures, manual partitions and communication hooks. And the benefit is the capability to simulate larger designs in less time using the same computer network.
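A minimal C sketch of the serial-to-parallel conversion described above: a hypothetical monitor function `deser_push` collects one serial bit per clock and produces a 32-bit word, the thing worth sending over the link, only once every 32 cycles; `ser_bit` is the matching driver-side view. The word width and the MSB-first bit order are assumptions for illustration; a real monitor would follow the actual interface protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor state: collects one serial bit per clock. */
typedef struct {
    uint32_t shift; /* bits collected so far, first bit ends up as MSB */
    unsigned count; /* number of valid bits in shift */
} deser_t;

/* Feed one serial bit. Returns true and stores the word in *out once
   32 bits have been collected; only then is there something to send. */
bool deser_push(deser_t *d, unsigned bit, uint32_t *out) {
    d->shift = (d->shift << 1) | (bit & 1u);
    if (++d->count == 32) {
        *out = d->shift;
        d->shift = 0;
        d->count = 0;
        return true;
    }
    return false;
}

/* Driver side: bit number i (0 = first on the wire) of a word sent MSB first. */
unsigned ser_bit(uint32_t word, unsigned i) {
    return (word >> (31u - i)) & 1u;
}
```

In this sketch 32 per-cycle bit events collapse into one message, which is the ratio that makes the "low communication" requirement achievable for a serial interface.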
Regards,
-Alex

cabriggs Senior
Joined: Jan 12, 2004 Posts: 96 Location: Massachusetts
Posted: Wed Feb 23, 2011 4:22 pm

Ed Arthur of Cisco talked about this at a DV Club meeting a few years ago: http://www.dvclub.org/images/Presentations/Arthur_Q207.pdf
They built a custom layer on top of MPI, and the result is somewhat similar to Avery's SimCluster. This shows you one way to do it yourself.

alexg Senior
Joined: Jan 07, 2004 Posts: 586 Location: Ottawa
Posted: Wed Feb 23, 2011 6:26 pm

Thank you. It's an interesting presentation.
-Alex.

jmcneal Senior
Joined: Jan 12, 2004 Posts: 34 Location: Hillsboro, Oregon
Posted: Tue Mar 01, 2011 2:56 pm

Alex -
As has been pointed out, your NFS solution would be vastly slower than the supported options already listed.
Many years ago I worked at Avery Design and installed SimCluster at several customer sites. You do not have to go with automatic partitioning, but it often makes partitioning much easier, especially for flat gate-level designs. If you already have several blocks that consume roughly the same compute resources, and the communication between them is low-bandwidth or infrequent, you could see a significant speedup.
I've used VCS's parallel simulation recently as well (I currently work at Synopsys).
You get the best bang for the buck when you have a really large process or really slow simulations. These tools aren't a way to take a 2-hour simulation and make it run in 30 minutes, but more a way to take a 6-day simulation and run it in 12 hours. Several times at Avery we got an N-1 speedup (N = number of processors) for very large or very slow simulations.
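For scale, the figures in that paragraph correspond to the following speedups (plain arithmetic on the quoted numbers; pairing the 6-day example with the N-1 rule of thumb is only an illustration, not something the post claims):

```latex
S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}}

\frac{6 \times 24\,\mathrm{h}}{12\,\mathrm{h}} = \frac{144\,\mathrm{h}}{12\,\mathrm{h}} = 12
\qquad
\frac{2\,\mathrm{h}}{30\,\mathrm{min}} = \frac{120\,\mathrm{min}}{30\,\mathrm{min}} = 4

S = N - 1 = 12 \;\Rightarrow\; N = 13, \qquad \frac{S}{N} = \frac{12}{13} \approx 0.92
```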
Quote:
So, I don't believe tools can help here. It is all about friendly SOC architectures, manual partitions and communication hooks. And the benefit is - capability to simulate larger designs with less time using the same computer network.

You're right about friendly architectures making the partitioning easier. But given a waveform of the SOC running, an auto-partitioner can figure out the best places to break the design, since it can analyze which interfaces have lower data rates, etc. So by adding your serial-to-parallel-to-serial block, you let the auto-partitioner identify that interface as a good candidate for partitioning.
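To illustrate the selection criterion described above, here is a deliberately simplified C sketch: given per-interface activity counts extracted from a waveform (the struct and its fields are hypothetical), it returns the interface with the lowest change rate as the cut candidate. A real auto-partitioner would also weigh block sizes and other factors, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical per-interface summary extracted from a waveform dump. */
typedef struct {
    const char *name;      /* interface label, illustrative only */
    unsigned long changes; /* value changes observed on the interface */
    unsigned long cycles;  /* clock cycles in the observed window (> 0) */
} iface_activity;

/* Change rate in value changes per cycle. */
static double iface_rate(const iface_activity *a) {
    return (double)a->changes / (double)a->cycles;
}

/* Index of the interface with the lowest change rate, or -1 if n == 0.
   Ties keep the earlier entry. */
long pick_cut(const iface_activity *ifs, size_t n) {
    if (n == 0) return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (iface_rate(&ifs[i]) < iface_rate(&ifs[best])) best = i;
    }
    return (long)best;
}
```

Usage: for a bit-level link toggling 5000 times in 1000 cycles and a frame-level link changing 20 times in the same window, `pick_cut` prefers the frame-level link, which mirrors the point about adding a serial-to-parallel block.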
-jeff