June 8, 2024

Chip verification tools await AI "rescue"

The widening verification gap is calling for better tools, and artificial intelligence may play a greater role.

Verification engineers are the unsung heroes of the semiconductor industry, but they are on the brink of collapse, urgently needing modern tools and processes to cope with the rapidly increasing pressure.

Verification is no longer just about ensuring that the function is accurately implemented; that alone is an intractable task. Verification has also taken on many new responsibilities. Some come from technological advances, which have brought new problems such as thermal issues. New application fields like automotive have increased the demands for safety and security, and the number of parametric concerns has grown sharply, far beyond simple power assessment. On top of this, the chip industry is approaching another turning point, triggered by the migration to 2.5D and 3D packaging technologies.

Existing verification tools and methods were developed 20 years ago. Since then, the capacity and performance of the tools have improved only slowly, while the scale of designs has grown rapidly. Although portable stimulus (an Accellera standard created to separate verification intent from the engine that executes the verification) has provided some relief, adoption is slow, and a comprehensive methodology around it is lacking.

In addition to the technical factors causing increased pressure in the verification process, human factors also need to be considered. Teams need to complete more work in less time, and the shortage of talent is restricting the development of the industry.

Approaching the Limit

The tools in use today were developed for smaller blocks, at a time when entire systems were roughly the size of today's blocks. Verification can never be 100% complete, which means teams must carefully decide where to focus their efforts and what risks they are willing to accept.

Axiomise CEO Ashish Darbari said, "Design complexity is increasing at an unprecedented rate, with the AI/ML revolution adding new dimensions to the types of designs we are building. These systems have stringent power, performance, and area (PPA) requirements. Adopting better processes and advanced verification methods, such as formal verification, is not enough. The industry still relies heavily on incomplete, stimulus-driven dynamic simulation, which not only lets easy-to-catch bugs leak into silicon, but also has no chance of catching the complex bugs that arise from deep, concurrent state-machine interactions in single-clock or multi-clock domains."

What's worse, IP is supporting ever more operating modes. "A module may have 1,500 specification items," said Chris Mueth, a business development, marketing, and technical expert at Keysight. "Many of them are interdependent, tied to operating modes but also to different voltages, different temperatures, and so on. In 6G modules you have countless modes and frequency bands for transmission, and they are all interdependent. They are pushing the limits of frequency, bandwidth, and data rates. You might think the design is complete, but you could still miss one of the modes, and that could ultimately become a problem. Even in today's digital age, if you don't meet performance requirements, you fail. Everything has become a performance simulation."

Sometimes parametric faults are overlooked. "More and more faults are soft faults, sometimes called parametric faults," said Marc Swinnen, Director of Product Marketing at Ansys. "The chip works, but where it should operate at 1.2 gigahertz it only reaches 1.0 gigahertz. When you look at any large chip, the number of parasitics runs into the hundreds of thousands."

This increases the risk of failure. "When verifying IP, teams ask what environment it will be used in," said Arturo Salz, a researcher at Synopsys. "They cannot verify all possible permutations, so they wait for the system to be ready and postpone most of the verification work to the system level. That is usually a mistake, because IP-level bugs are hard to find at the system level. And it is becoming a bigger problem. With multi-chip designs you won't have that option, because the chiplet may already have been manufactured, and you must verify and test it before integrating it into the next system."

Beyond Limits

Constrained-random techniques made great progress when first introduced, but now they are struggling. "I often compare constrained random to a pool cleaner," said Salz of Synopsys. "You don't have to program the shape of the pool; the movement is random, but the walls of the pool constrain it so it can't climb out. It's inefficient, passing through the center of the pool many more times than the corners, but given enough time it covers the entire pool. By extension, can it sweep the Pacific Ocean? No, it's too big. You need to choose the right method, and constrained random is deployed effectively at the block level. The same goes for formal methods, which may not be able to perform formal checks at the system level."
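The pool-cleaner analogy can be made concrete with a toy sketch, not a real UVM environment: random stimulus is generated inside fixed constraints (the pool walls), and cross-coverage bins track how much of the space has been swept. All names and bin sizes here are illustrative assumptions.

```python
import random

# Toy constrained-random sketch: generate bus transactions within fixed
# constraints and track which cross-coverage bins get hit.
ADDR_BITS = 8
BURSTS = ("SINGLE", "INCR4", "WRAP8")

def random_txn():
    # Constraints: address stays in range, burst drawn from a legal set.
    return {"addr": random.randrange(2 ** ADDR_BITS), "burst": random.choice(BURSTS)}

def run(n_txns, seed=0):
    random.seed(seed)
    hit = set()
    for _ in range(n_txns):
        t = random_txn()
        hit.add((t["addr"] // 64, t["burst"]))  # 4 address bins x 3 burst bins
    return len(hit) / (4 * len(BURSTS))  # fraction of coverage bins hit

# Coverage climbs quickly at first, then crawls toward the last corners.
print(run(10), run(100))
```

Scaling the address space up while keeping the transaction count fixed shows the "Pacific Ocean" effect: the hit fraction collapses, which is why the technique stays at the block level.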

It's not just that these methods don't scale; they also consume the most valuable human resources. "It's no surprise that simulation testbenches miss so many bugs," said Darbari of Axiomise. "Compared to a formal test environment, a UVM testbench takes more time to bring up, even for moderately complex designs. UVM relies heavily on manpower, because it requires a lot of human effort to write sequences, and those sequences ultimately do not stress the DUT rigorously. That shifts the burden onto functional coverage to show where the gaps are. In many cases simulation engineers simply do not have time to understand the design specifications. It is too much to expect verification engineers who have not been trained in RTL design to understand the details of the microarchitecture and architecture."

In short, the problem has outgrown the tools. "I don't think UVM has lost momentum," said Gabriel Pachiana of the Virtual System Development Group in the Adaptive Systems Engineering Department of Fraunhofer IIS. "It is still an excellent tool for its intended use. What we need is to make full use of it and build more verification software on top of it, for example to address the complexity of hardware/software verification."

Shift Left

The term "shift left" is widely used in the industry, and verification is in urgent need of one. That means verifying as early as possible, at a high level of design abstraction. Existing simulators and emulators do not have the necessary performance, and waiting until the RTL stage is too late. Breker CEO Dave Kelf said, "In this context, shift left means applying verification to SystemC algorithmic models or virtual platforms. This greatly simplifies the path from specification to design verification. Therefore, formulating a system verification plan on a virtual platform, and then reapplying it when verifying the system on an emulator or prototype, may provide enough methodological simplification to make effective system verification a reality."

However, the full flow does not yet exist. "If the virtual prototype is the golden model, how do you extend it all the way to the chip and know that the chip is still correct?" asked Siemens EDA strategic verification architect Tom Fitzpatrick. "Whether the underlying engine is a physical prototype, an FPGA prototype, or emulation, having the same system view is very important. Verification engineers will have to start looking at the infrastructure this way. They need to shield everyone on the team from the underlying environment, and this is where portable stimulus comes into play. Because of its abstraction, you can think about testing from an algorithmic perspective, considering what you want to happen and where the data is going, without worrying about the underlying implementation."
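The idea of abstracting test intent away from the execution engine can be sketched as follows. This mimics the concept only; real portable stimulus uses the Accellera PSS language, and the action names and back-end call templates below are invented for illustration.

```python
# Hedged sketch of the portable-stimulus idea: describe the test intent once,
# abstractly, then realize it on different execution back ends.
INTENT = [("init_dma",), ("copy", "memA", "memB"), ("check", "memB")]

def render(intent, backend):
    # Each back end maps the same abstract actions to its own calls.
    templates = {
        "virtual_platform": "vp.call('{0}', {1})",   # hypothetical VP API
        "rtl_sim":          "uvm_do('{0}', {1})",    # hypothetical UVM hook
    }
    return [templates[backend].format(step[0], list(step[1:])) for step in intent]

print(render(INTENT, "virtual_platform"))
print(render(INTENT, "rtl_sim"))
```

The same `INTENT` list drives both targets, which is the point Fitzpatrick makes: the engineer reasons about what should happen and where the data goes, not about the engine underneath.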

It has to start with the virtual prototype. "We need to explore the architecture earlier and better," said Salz. "We need to work through combinations of power, throughput, and latency. Should the cache stay with the CPU, or be moved out to a separate chiplet and enlarged? These are tricky questions. In the past, every company had one person, the architect, who could do this work on a napkin. But that is already beyond what a human can do. Everything will go into the virtual prototype. You will put in virtual models, simulation, emulation, and possibly even already-fabricated chiplets that you can plug in post-silicon. The virtual prototype will run at 3 to 4 gigahertz, which a simulator cannot approach. You get higher throughput, at the cost of losing some timing accuracy."

Some engines are already coupled together. Cadence Director of Product Management Matt Graham said, "The ability to do mixed modeling is constantly improving. Our ability to bring in C models and fast models, and connect that platform to simulation or emulation, keeps getting better. A step further is the concept of the digital twin. Simulation will not gain 100 times the capacity, nor become 100 times faster; we must be realistic about this, and we must find innovative, interesting methods of abstraction. Virtual platforms are one of them. We need to embrace the digital-twin concept by moving prototyping and simulation to earlier stages of the process and finding different ways to provide abstraction."

Reusability within the flow matters. "Another promising direction is to move design and verification work further left in the development process, that is, to catch bugs in the early stages of development," said Pachiana of Fraunhofer. "SystemC and UVM-SystemC are very useful here. Although this adds another layer of development work and consumes project time, the key is to reuse the results of the early work and demonstrate its benefits."

The industry does not like revolutions. "No one will completely change the way they do things," said Fitzpatrick of Siemens. "That is a fact, and it is why progress so far has been incremental; there is a limit to what you can ask of people. This is where portable stimulus comes into play. It is designed to be a revolutionary step within an evolutionary framework. Being able to leverage existing infrastructure while adding capabilities that UVM cannot provide is how it succeeds."

However, the challenge of building models remains. Cadence's Graham said, "Our ability to verify models as we build them has improved a lot. There are more models available now: processor models, protocol models, models of things inside the CPU subsystem such as coherency and performance. This is the next level of abstraction, but to build the right digital twin you need a reliable way to build models."

Clear thinking is needed. "We need to accept it head-on: more simulation cycles and blind functional coverage will not find all the bugs," said Darbari. "I like formal, and I firmly believe it provides the greatest return on investment of any verification technology, because it gives exhaustive proofs and reasons about the what, not the how. However, I have also seen blindly applied formal yield poor results. Thinking about requirements, interface specifications, the relationship between microarchitecture and architecture, and the relationship with software/firmware makes it easier for everyone to see the big picture while also mastering the finer details, and that leads to better verification."

Artificial Intelligence to the Rescue?

Artificial intelligence is already woven into many aspects of design and verification. "Verification has caught the excitement of the AI breakthrough," says Graham. "Customers are asking us, what are you doing with AI? How are you leveraging AI? We need to multiply the effectiveness of all of our engineers, because we don't have enough manpower."

There is some low-hanging fruit. Keysight's Mueth says, "You don't have time to simulate everything you want to simulate in a reasonable time. You can turn to AI to draw correlations in the simulation data, saying 'based on a, b, and c, there is no need to simulate x, y, and z.' This is a classic AI-type problem, but you need a large amount of data to drive the machine learning."

There are several ways to optimize regression. "When you make a change in the design, which tests target that area?" says Salz. "You only need to run a subset of the tests. Reinforcement learning can prune specific tests: if one is very similar to previous tests, don't run it. That way you maximize test diversity. Before, all you had were different random seeds to create test diversity."
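The similarity-based pruning Salz describes can be illustrated with a greedy filter over coverage signatures (the learning component is omitted; this is only the selection step, with invented test names and data).

```python
# Hedged illustration of regression trimming: skip a test whose coverage
# signature is too similar to one already kept, maximizing diversity.
def jaccard(a, b):
    # Similarity of two sets of covered design points.
    return len(a & b) / len(a | b)

def select_diverse(tests, threshold=0.5):
    # tests: {name: set of covered design points}
    kept = {}
    for name, cov in tests.items():
        if all(jaccard(cov, kc) < threshold for kc in kept.values()):
            kept[name] = cov  # sufficiently different from everything kept
    return sorted(kept)

suite = {
    "t_basic":  {1, 2, 3, 4},
    "t_basic2": {1, 2, 3, 5},   # near-duplicate of t_basic -> pruned
    "t_corner": {9, 10, 11},    # hits different points -> kept
}
print(select_diverse(suite))  # -> ['t_basic', 't_corner']
```

A learned policy would go further, weighting tests by how often they historically catch bugs in the changed area, but the diversity criterion is the core of the trimming.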

Time pressure is increasing, and efficiency is becoming the issue. "Twenty years ago there was a significant quality gap," says Graham. "Overall, the industry learned how to close the quality gap. Now the gap is efficiency. That is why everyone is talking about shift left, productivity, time to market, and turnaround time. It is pushing people to look for ways to leverage AI. The productivity gains will not come from the same things they came from 10 or 20 years ago."

Huge benefits will come from entirely different approaches. "Many problems are caused by ambiguities in the specifications," says Salz. "I hope we can use GenAI large language models to parse the specifications, sitting in the co-pilot seat, and then ask, 'is this what you mean?' What GenAI lacks is the ability to generate timing diagrams or UML so that designers or architects can understand the situation. We want to enable the tool to move from language specifications to more formal specifications and automate some of that. We cannot use AI to write the design, at least not yet."

But AI can fill the gaps in model creation. "I have seen several papers on using AI to build these models, whether top-down, where I read the specification and generate a C model from it, or bottom-up observational modeling, where you observe the behavior of the RTL model and then statistically build a model at a higher level of abstraction," says Graham. "We are not there yet. But I think this is one of the potential uses of AI, and it may really help solve very practical problems."

Conclusion

The tooling gap in verification is widening. Existing tools cannot address system-level issues, where the most complex problems hide. Although new languages and tools are being developed to fill this gap, their adoption is slow. The industry seems stuck at the RTL abstraction, which creates bottlenecks in model execution.

To encourage development teams to migrate to higher levels of abstraction, new tools are needed to fill the modeling gaps in a top-down or bottom-up approach. While artificial intelligence may be able to provide assistance, this capability does not currently exist.
