Optimizing Performance with Windows Azure Premium Storage

Azure recently started the Preview of Premium Storage. Premium Storage provides much better IO performance than standard storage. Standard storage can reach up to 500 IOPS per disk, with a minimum latency of 4-5 ms. Premium Storage, on the other hand, can reach up to 5000 IOPS per disk, depending on the disk size (bigger disks get more IOPS, see Premium Storage: High-Performance Storage for Azure Virtual Machine Workloads). When tuning performance on Premium Storage, you need to be aware of a few things. First, of course, there are the limits of the disks you provision. The table below (source: Microsoft) shows these limits.

Disk Type             P10          P20          P30
Disk Size             128 GB       512 GB       1023 GB
IOPS per Disk         500          2300         5000
Throughput per Disk   100 MB/sec   150 MB/sec   200 MB/sec

The second limit is the machine limit, which is determined by the number of CPUs. Per CPU you get approximately 3200 IOPS and 32 MB/s bandwidth (or disk throughput). See the DS series table in Virtual Machine and Cloud Service Sizes for Azure. I will not go into creating a Premium Storage VM (you can read about that here), but rather into how to see where your machine is “hurting” if the application you are running isn’t performing well.
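The interplay of the two limits can be made concrete with a few lines of Python. This is only a sketch: the numbers are copied from the table and the per-CPU figures above, the function name is made up for illustration, and summing the disk limits assumes the workload spreads its IO evenly over all attached disks, which a real application may not do.

```python
# Per-disk limits copied from the table above: (IOPS, MB/s).
DISK_LIMITS = {
    "P10": (500, 100),
    "P20": (2300, 150),
    "P30": (5000, 200),
}

# Approximate per-CPU allowance of the VM, as stated above.
IOPS_PER_CPU = 3200
MBPS_PER_CPU = 32


def effective_limits(cpus, disks):
    """Return the (IOPS, MB/s) ceiling for a VM with the given disks.

    Assumption: IO is spread evenly, so disk limits simply add up.
    """
    disk_iops = sum(DISK_LIMITS[d][0] for d in disks)
    disk_mbps = sum(DISK_LIMITS[d][1] for d in disks)
    # Whichever side is smaller is the real ceiling.
    return (min(disk_iops, cpus * IOPS_PER_CPU),
            min(disk_mbps, cpus * MBPS_PER_CPU))


print(effective_limits(2, ["P30"]))          # (5000, 64)
print(effective_limits(2, ["P20", "P30"]))   # (6400, 64)
```

With a single P30 on a 2-CPU VM the disk caps the IOPS while the VM caps the throughput. Adding a second disk moves the IOPS ceiling to the VM.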

Configuring Performance Monitor

When looking for performance bottlenecks you’re basically always looking at 4 things:

  • CPU utilization
  • Memory utilization
  • Disk utilization
  • Network utilization

In this post I will focus on the first three, because I’ve mainly seen issues with these. CPU utilization and memory utilization are single metrics in Performance Monitor, but disk utilization consists of the number of reads and writes, and the throughput. Another disk metric is the length of the queue of pending IO operations. If the application reaches the disk limits, the length of the queue goes up. To collect these metrics, you need to create a Data Collector Set in Performance Monitor and run it during testing. Take the following steps to do so:

  1. Start Performance Monitor (Windows + R, type perfmon)
  2. In the tree view on the left navigate to Performance \ Data Collector Sets \ User Defined
  3. Right click on the User Defined item and select New \ Data Collector Set, as shown below
    Creating a new Data Collector Set
  4. In the dialog that follows, enter a name, select Create manually (Advanced), and click Next.
  5. Select Performance counter, and click Next.
  6. Click Add…
  7. Select the following counters:
    Counter                                     Instance(s)
    Processor \ % Processor Time                _Total
    Memory \ Available Mbytes                   N/A
    Logical Disk \ Current Disk Queue Length    _Total and *
    Logical Disk \ Disk Bytes/Sec               _Total and *
    Logical Disk \ Disk Reads/Sec               _Total and *
    Logical Disk \ Disk Writes/Sec              _Total and *
  8. Click OK.
  9. Set Sample Interval to 1 second, and click Next.
  10. Select the location where the data must be saved. On Azure it makes sense to put the logs on the D: drive, which is a local (temporary) disk, instead of on one of the attached disks.
  11. Click Next.
  12. If you want to start and stop the collection of data manually, click Finish. Otherwise, select Open properties for this data collector and click Finish.
  13. In the next dialog you can set a schedule for data collection. A very good idea is to set a Stop Condition, either for a maximum duration or a maximum file size.
  14. When you are done, you will see the new Data Collector under the User Defined Collector Sets.

When you’re ready to test your application, click the Data Collector and press the play button, or right click the Data Collector and select Start. When the test is done press the stop button, or right click the Data Collector and select Stop.
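If you prefer to script the collector instead of clicking through the wizard, Windows ships a command-line tool called logman that can create the same kind of counter log. The Python sketch below only builds the command line. The collector name and output path are hypothetical, and the flags (-si, -f, -o, -c) are written from my recollection of logman's options, so check them against logman create counter /? before relying on them.

```python
import subprocess

# Counter paths matching the counters selected in step 7.
# The (*) wildcard is expected to cover every instance, including _Total.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\LogicalDisk(*)\Current Disk Queue Length",
    r"\LogicalDisk(*)\Disk Bytes/sec",
    r"\LogicalDisk(*)\Disk Reads/sec",
    r"\LogicalDisk(*)\Disk Writes/sec",
]


def logman_create_args(name, output_path):
    """Build a logman command line for a 1-second CSV counter log."""
    return (["logman", "create", "counter", name,
             "-si", "1",      # sample interval in seconds
             "-f", "csv",     # log file format
             "-o", output_path,
             "-c"] + COUNTERS)


def create_and_start(name, output_path):
    """Create and start the collector (only works on the Windows VM)."""
    subprocess.run(logman_create_args(name, output_path), check=True)
    subprocess.run(["logman", "start", name], check=True)


# Hypothetical name and path; D: is the temporary disk mentioned in step 10.
print(" ".join(logman_create_args("PremiumStorageTest", r"D:\PerfLogs\premium")))
```

Calling create_and_start on the VM should be roughly equivalent to pressing the play button. logman stop with the same name ends the collection.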

Analyzing the Data

The data collected by Performance Monitor is stored in CSV format. To use it, import it into Excel, as follows:

  1. Start Excel and create an empty worksheet.
  2. Go to the Data tab and click From Text under the Get External Data section.
    Import CSV
  3. Select the CSV file generated by Performance Monitor and click Import.
  4. Select My data has headers and click Next.
    Import Dialog 1
  5. Select Comma as delimiter and click Next.
    Import Dialog 2
  6. Ensure . is used as decimal separator. Click Advanced… and ensure the values in the popup are as shown below. Then click OK.
    Import Dialog 3
  7. Click Finish.

Once the data is imported, you can create charts that tell you what’s going on. Not all columns are useful at the same time, so you may want to create a copy of the sheet and remove certain columns before creating a chart. For instance, in the chart below I only kept the columns with disk reads and disk writes.

Disk IO chart

As you can see, Logical Disk F tops out at approximately 2400 IOs. A closer look at the test data also shows that all disks together never use more than 3160 IOs, while CPU and memory were not impacted. In a second test I added a P30 disk, and moved the data previously on Disk F to this disk (Disk X). The results of that test are shown in the chart below.

Disk IO chart

Notice that Disk X tops out at approximately 5000 IOs. Total IOs for all disks never exceeded 5300.
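The peak values quoted above can also be pulled out of the CSV with a short script instead of reading them off a chart. The following Python sketch is one way to do it. It assumes the usual Performance Monitor CSV layout (a timestamp in the first column, one quoted counter path per column), and the sample rows are invented purely to show the shape of the data; they are not my test results.

```python
import csv
import io


def disk_io_peaks(csv_text):
    """Return ({column: peak}, peak_total) for the reads/writes columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    # Keep per-disk reads/writes columns; skip _Total to avoid double counting.
    keep = [i for i, name in enumerate(header)
            if ("Disk Reads/sec" in name or "Disk Writes/sec" in name)
            and "(_Total)" not in name]
    peaks = {header[i]: 0.0 for i in keep}
    peak_total = 0.0
    for row in samples:
        total = 0.0
        for i in keep:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # blank or missing sample
            peaks[header[i]] = max(peaks[header[i]], value)
            total += value
        peak_total = max(peak_total, total)
    return peaks, peak_total


# Invented sample data, only to illustrate the file layout.
SAMPLE = r'''"(PDH-CSV 4.0)","\\VM\LogicalDisk(F:)\Disk Reads/sec","\\VM\LogicalDisk(F:)\Disk Writes/sec","\\VM\LogicalDisk(_Total)\Disk Reads/sec"
"01/01/2015 10:00:00.000"," ","10","5"
"01/01/2015 10:00:01.000","1200.5","900","1200.5"
"01/01/2015 10:00:02.000","1500","700","1500"
'''

peaks, peak_total = disk_io_peaks(SAMPLE)
for column, peak in peaks.items():
    print(column, peak)
print("peak total IOs:", peak_total)
```

For a real log, read the file with open(path, newline="") and pass its contents to disk_io_peaks.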

Understanding the Data

A key point is that there is more or less a one-to-one relationship between the IOs measured and the IOPS specifications of the disk as shown in the table at the top. Combined with the knowledge that a VM with 2 CPUs has approximately 6400 IOPS available (2 x 3200), that means there is still room for improvement. You could change the application to use multiple disks, or you can use Windows Storage Spaces to combine physical disks to form a logical disk.
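That reasoning can be written down as a small check. The sketch below is just an illustration: the function names and the 10% tolerance are my own choices, and the first example assumes Disk F was a P20, which the 2400 IOs peak suggests.

```python
def reached(peak, limit, tolerance=0.1):
    """True when the measured peak is within `tolerance` of the limit."""
    return peak >= limit * (1 - tolerance)


def diagnose(disk_peaks, disk_limits, total_peak, vm_limit):
    """Say which IO limit, if any, the measurements point at."""
    capped = sorted(d for d, peak in disk_peaks.items()
                    if reached(peak, disk_limits[d]))
    if reached(total_peak, vm_limit):
        return "VM limit reached: a bigger VM is needed"
    if capped:
        return ("Disk limit reached on " + ", ".join(capped)
                + ": use a faster disk or spread IO over more disks")
    return "No IO cap visible: look at CPU, memory or network"


# First test: Disk F (assumed P20, 2300 IOPS) on a 2-CPU VM (6400 IOPS).
print(diagnose({"F:": 2400}, {"F:": 2300}, 3160, 6400))
# Second test: Disk X (P30, 5000 IOPS) on the same VM.
print(diagnose({"X:": 5000}, {"X:": 5000}, 5300, 6400))
```

Both tests end up in the disk-limited branch, because the totals of 3160 and 5300 stay well below the 6400 IOPS the VM offers.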

If you analyze the disk queue length that was also measured, you will see that quite a few IOs are queued, which indicates that the disk can’t handle more IOs. The total bytes transferred, however, would not reach its maximum. For other types of workloads, you may see different behavior. Applications that stream large files, for instance, will likely have fewer IOs, but will reach VM or disk throughput limits. Because I used disks with a higher throughput than the VM offers, my throughput would be capped at 64 MB/s (2 CPUs at 32 MB/s each). Still other workloads may be more CPU bound, and will show % Processor Time at 100 for extended periods.

Understanding how the workload affects your Azure VM helps you determine what to do when your performance isn’t what you want it to be. In my case adding a faster disk improved the performance. If I want even better performance, I have to add disks and utilize their IO capabilities. Once the test no longer shows a cap on IOs, I have to look elsewhere. Note that if you reach the VM IO cap, you need to get a bigger VM to get better performance, which will also decrease the likelihood that you will reach CPU or memory limits.
