Fixing incredibly slow launch times in Intel’s Rapid Storage Technology / Virtual RAID on CPU application

Intel has a technology called Virtual RAID on CPU (VROC) that lets you set up RAID arrays of NVMe SSDs, with the RAID calculations being offloaded to specialised hardware on the CPU instead of being done in software. When I built my workstation back in 2019, I installed an Asus HYPER M.2 card, which is a PCIe 3.0 x16 carrier card that breaks out into four M.2 slots. The M.2 slots are separated using 4-way bifurcation, which allows one 16-lane slot to be split into four 4-lane slots. With four Corsair MP510 1TB drives in a VROC RAID0, I’m getting 12GB/s of sequential throughput and around 1.2GB/s to 1.5GB/s of random IO. This is still faster than the newly available PCIe 5.0 NVMe SSDs. The use of four separate drives, each with their own DRAM and SLC caches, results in unparalleled real-world performance for the price.

This solution has, mostly, worked great for me. I’m primarily using it for data processing tasks, training corpuses, and other applications that benefit heavily from high random IO performance. It’s a four-way RAID0, so I treat it as having four times the risk of failure and plan accordingly.

The only issue is very intermittent: once every 12-18 months I run into a situation where one of the drives will suddenly disappear and I’ll get an array failure. After a few cycles of taking the card out, reseating the SSDs in their M.2 slots, adjusting the torque on the cover’s screws, and trying again, it eventually comes back just fine. Mark the drive as good, mark the array as good, and it’s happy. No SMART errors, no overheating, no indication that there’s anything wrong with the drives. I’ve tried another HYPER M.2 card and had the same results. My best guess is that the tolerances on the card are a little off and thermal cycles eventually lead to intermittent contact.

A few days ago I ran into my third instance of this problem. It’s frustrating, but what cranks it up to infuriating is Intel’s VROC software. I’m running a workstation board with two Xeons and 24 DIMMs, so rebooting isn’t exactly snappy, but the VROC software takes so long to launch that it makes boot times look like lightning by comparison. This latest experience finally led me to look into the problem and do something about it.

When you click the Intel(R) Virtual RAID on CPU link, it appears to do nothing. Checking in Task Manager, you’ll find an IAStorUI.exe process running, using very little memory, and using no CPU time. It sits there for as much as fifteen whole minutes, apparently doing nothing, then finally the window appears. This is rather ironic, considering that the software controls such high-speed storage.

I used Process Hacker to check on the threads and inspect their call stacks to see where they were waiting. It turns out that one thread is indeed consuming some CPU cycles, but only a small number of them, and is mostly just waiting around:

Screenshot of a thread stack showing that the thread is waiting on the kernel, in a call path that goes through a ManagementObjectCollection iterator and a method called GetVrocDriversVersions.

This call stack tells me two important things:

There’s a .NET assembly with a class called IAStorUtil.SystemReportProvider, and inside there is a method called GetVrocDriversVersions that is taking an extremely long time to complete.
The code that’s taking forever to complete is iterating over the results of a WMI query.

WMI queries can be extremely slow, especially if there are lots of results.

Looking into the application folder, I found that there is an IAStorUtil.dll file in there, and it is a .NET assembly. Loading it into dotPeek, I quickly found the offending code. It looks something like this:

public static IEnumerable<Version> GetVrocDriversVersions()
{
  List<uint> deviceIds;
  PsiDataSource.GetSupportedControllerDeviceIds(out deviceIds);
  var query = new ObjectQuery(
    "SELECT DeviceID,DriverVersion FROM Win32_PnpSignedDriver " +
    "WHERE ClassGuid LIKE \"{4d36e97b-e325-11ce-bfc1-08002be10318}\" AND " +
    "DriverProviderName LIKE \"Intel Corporation\""
  );
  var deviceIdRegex = new Regex("^PCI\\\\VEN_8086&DEV_(?'id'[A-Fa-f0-9]*)");
  using (var driverSearcher = new ManagementObjectSearcher(query))
  {
    foreach (var managementBaseObject in driverSearcher.Get())
    {
      var match = deviceIdRegex.Match(managementBaseObject["DeviceID"].ToString());
      if (match.Success)
      {
        uint id = uint.Parse(match.Groups["id"].ToString(), NumberStyles.AllowHexSpecifier);
        if (deviceIds.Contains(id))
          yield return new Version(managementBaseObject["DriverVersion"].ToString());
      }
    }
  }
}

This code iterates over every instance in the Win32_PnpSignedDriver WMI class that has a particular driver class GUID and driver provider name, uses a regex to extract the DEV numbers from the PCI device ID, and checks if it is in a list of supported controller IDs.

This is implemented as an enumerator pattern, using yield. This normally allows for lazy enumeration and early-out optimisation, but that detail is irrelevant because this function is called from the following static field initialiser:

public static readonly List<Version> DriversVersions = 
  SystemReportProvider.GetVrocDriversVersions()
    .OrderBy(version => version)
    .ToList();

Since there’s an OrderBy call, the source must be exhaustively enumerated, making the yield pattern essentially pointless. This field is also accessed in the IAStorUI application’s startup function, so it has to be initialised before the program can load.
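To make the OrderBy point concrete, here is a minimal sketch. It is not Intel’s code: SlowVersions is a made-up stand-in for the WMI-backed iterator, and the Produced counter exists only to show how many items were actually pulled from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator has actually produced.
    public static int Produced;

    // Hypothetical stand-in for a slow, lazily evaluated source.
    public static IEnumerable<Version> SlowVersions()
    {
        foreach (var text in new[] { "3.0.0.0", "1.0.0.0", "2.0.0.0" })
        {
            Produced++;
            yield return new Version(text);
        }
    }

    public static void Main()
    {
        // Without sorting, First() stops after pulling a single item.
        Produced = 0;
        var first = SlowVersions().First();
        Console.WriteLine($"First(): {first}, items produced: {Produced}");

        // Sorting needs to see every item before it can return the lowest one.
        Produced = 0;
        var lowest = SlowVersions().OrderBy(v => v).First();
        Console.WriteLine($"OrderBy().First(): {lowest}, items produced: {Produced}");
    }
}
```

The first call pulls one item from the source and the second pulls all three, which is why the lazy iterator buys nothing once a sort is involved.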

The main problem is the WMI query. It’s excruciatingly slow, in part because the Win32_PnpSignedDriver class is designed to provide digital signature information about PnP drivers. Information has to be collected from various registry locations, the driver files themselves, security catalogues, etc., processed, and collated to produce the instances. It takes forever. The fact that the query has some conditions doesn’t help the situation because all instances still have to be collected in order to evaluate those conditions. Testing the query in WMI Explorer resulted in an execution time of 9 minutes and 28 seconds.

The code is intended to extract the Product Version fields from any driver executables relating to installed devices with the right PCI device ID. This task does not require extremely slow WMI queries. It can be achieved through the Setup API, e.g. SetupDiGetClassDevsW, but P/Invoke is a bit trickier to inject into .NET binaries so I decided to just manually pull the data from the registry.

PCI devices are enumerated in the HKLM\SYSTEM\CurrentControlSet\Enum\PCI key:

Screenshot of Regedit showing the key in the tree. The subkeys are named with VEN, DEV, SUBSYS, and REV values as per standard PCI device path names.

Each of these subkeys is a PCI device path, and can be processed using the same Regex that Intel used in their code, to extract the DEV number. Each key has subkeys for device instances:

Screenshot of Regedit with a device key opened. The key is for an Intel Volume Management Device NVMe RAID Controller and it shows a service name of “iaVROC”.

The Service value tells us the name of the service that the driver is running under. Windows drivers are actually registered as services in practically the same way as a usermode service would be, but they don’t show up in the Services tab of Task Manager, or in services.msc. You can query them with the sc command, e.g. sc query type= driver.

In this case, though, we can continue to leverage the registry. Service entries are registered in HKLM\SYSTEM\CurrentControlSet\Services. Each subkey matches the service name. The iaVROC service, as named by the device key above, can be found in there:

Screenshot of Regedit showing the iaVROC service key. The ImagePath value contains the path to the driver image file: System32.sys

The ImagePath value contains the relative path from the Windows install location to the driver image file. From there, we can query the version information of the driver binary and get the same information that the application was getting via WMI.
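Before putting the pieces together, the key-name matching step can be tried in isolation. The sketch below uses Intel’s pattern with the leading PCI\ prefix dropped, since the registry key names don’t include it. The helper name TryGetIntelDeviceId and both sample key names are made up for illustration; they are not real device IDs from my system.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PciKeyNames
{
    // Intel vendor ID (8086), followed by a hexadecimal DEV number.
    private static readonly Regex DeviceIdRegex =
        new Regex("^VEN_8086&DEV_(?'id'[A-Fa-f0-9]*)");

    // Returns the DEV number for an Intel key name, or null if the name
    // doesn't match or the DEV part can't be parsed as hex.
    public static uint? TryGetIntelDeviceId(string keyName)
    {
        Match match = DeviceIdRegex.Match(keyName);
        if (!match.Success)
            return null;

        uint id;
        bool parsed = uint.TryParse(match.Groups["id"].Value,
            NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out id);
        return parsed ? id : (uint?)null;
    }

    public static void Main()
    {
        // Hypothetical key names in the VEN/DEV/SUBSYS/REV format.
        string[] keyNames =
        {
            "VEN_8086&DEV_1234&SUBSYS_00000000&REV_01",
            "VEN_10DE&DEV_1234&SUBSYS_00000000&REV_01",
        };
        foreach (string keyName in keyNames)
        {
            uint? id = TryGetIntelDeviceId(keyName);
            Console.WriteLine(id.HasValue
                ? $"{keyName}: DEV 0x{id.Value:X4}"
                : $"{keyName}: not an Intel device key");
        }
    }
}
```

Using TryParse here rather than Parse is a defensive choice on my part: the pattern allows an empty DEV group, which Parse would reject with an exception.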

This whole process can be turned into C# quite easily:

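// Needs: System.Collections.Generic, System.Diagnostics, System.Globalization,
// System.IO, System.Text.RegularExpressions, Microsoft.Win32
static IEnumerable<Version> GetVrocDriversVersionsFAST()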
{
  List<Version> versions = new List<Version>();
  List<uint> deviceIds;
  PsiDataSource.GetSupportedControllerDeviceIds(out deviceIds);
  Regex deviceIdRegex = new Regex("^VEN_8086&DEV_(?'id'[A-Fa-f0-9]*)");
  string winDir = Environment.GetFolderPath(Environment.SpecialFolder.Windows);
  string pciPath = @"SYSTEM\CurrentControlSet\Enum\PCI";
  string svcPath = @"SYSTEM\CurrentControlSet\Services";
  using (var pciKey = Registry.LocalMachine.OpenSubKey(pciPath))
  using (var servicesKey = Registry.LocalMachine.OpenSubKey(svcPath))
  {
    foreach (var pciKeyName in pciKey.GetSubKeyNames())
    {
      Match match = deviceIdRegex.Match(pciKeyName);
      if (!match.Success)
        continue;
      
      uint id = uint.Parse(match.Groups["id"].Value, NumberStyles.AllowHexSpecifier);
      if (!deviceIds.Contains(id))
        continue;
      
      using (var devKey = pciKey.OpenSubKey(pciKeyName))
      {
        foreach (var devSubkeyName in devKey.GetSubKeyNames())
        {
          using (var devSubkey = devKey.OpenSubKey(devSubkeyName))
          {
            var serviceName = devSubkey.GetValue("Service") as string;
            if (serviceName == null)
              continue;
            
            using (var serviceKey = servicesKey.OpenSubKey(serviceName))
            {
              // serviceKey is null if the named service has no registry entry
              var imagePath = serviceKey?.GetValue(@"ImagePath") as string;
              if (imagePath == null)
                continue;
              
              imagePath = Path.Combine(winDir, imagePath);
              if (File.Exists(imagePath))
              {
                var fvi = FileVersionInfo.GetVersionInfo(imagePath);
                versions.Add(new Version(fvi.ProductVersion));
              }
            }
          }
        }
      }
    }
    return versions;
  }
}

This returns exactly the same results as Intel’s WMI code, but takes milliseconds instead of minutes.

The next job was patching the code into the DLL. I had previously used Telerik JustDecompile and Reflexil for this kind of thing, but both of those are now unmaintained. Luckily, dnSpy is an extremely capable replacement for this, and made the job easier than ever before.

All I had to do was load the DLL into dnSpy, find the method I wanted to patch, right click it, select Edit Method, remove the old code, paste my new code in, fix up the using statements, and click OK. It replaced the entire method body with my new code:

Screenshot of dnSpy showing GetVrocDriversVersions in the DLL, but with my code instead of Intel’s.

From there all I had to do was save the DLL and copy it into the application folder. In some cases there’s an additional step required with Strong Name Signing, whereby you have to also patch the references in all the other executables and DLLs to remove the public key hashes so that the patched DLL will be resolved, but in this case Intel didn’t use SN signing on the application executable so it doesn’t matter.
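If you’re not sure whether a given assembly is strong-named, one quick way to check is to read its public key token. This is a minimal sketch rather than part of the patch; the class and method names are my own, and it only inspects the assembly’s own identity, not the references held by other binaries.

```csharp
using System;
using System.Reflection;

public static class StrongNameCheck
{
    // True when the assembly at the given path carries a public key token,
    // i.e. it was built with a strong name.
    public static bool HasStrongName(string assemblyPath)
    {
        byte[] token = AssemblyName.GetAssemblyName(assemblyPath).GetPublicKeyToken();
        return token != null && token.Length > 0;
    }

    public static void Main(string[] args)
    {
        // Pass one or more paths to .NET executables or DLLs.
        foreach (string path in args)
        {
            string result = HasStrongName(path) ? "strong-named" : "not strong-named";
            Console.WriteLine($"{path}: {result}");
        }
    }
}
```

AssemblyName.GetAssemblyName reads the identity from the file’s metadata without loading the assembly into the process, so it’s safe to point at binaries you don’t intend to run.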

All that’s left to do is run the program and see how long it takes.

Screenshot of the Intel Virtual RAID on CPU application. It shows an Intel VROC Standard interface with four NVMe SSDs attached, configured in a RAID0 named “SPEED IS KEY”.

It works! The application now loads instantly instead of taking 10+ minutes.