
Managing and Updating Custom Azure VM Images

When you create a Virtual Machine in Azure, you do this from an image. The Azure gallery contains quite a few images with Windows, several flavors of Linux, and some with middleware such as BizTalk Server, SharePoint Server, and Oracle WebLogic. When updates are available for the operating system or middleware, the images are updated, so you don’t have to install the updates yourself after creating a new VM. This is great, because updating can take quite some time. Instead, you can go straight to adding your own software and configuration.

In a typical environment that is what takes the most time, regardless of whether the VM is for development, test, or production purposes. You can speed that up by using scripts and desired state configuration through PowerShell DSC, Puppet, or Chef. Another option is to create custom images with your own configuration. This is particularly effective in scenarios where you need to be able to create additional VMs quickly, for instance when ramping up a test team, or adding developers to a team. Even if you script it, post-configuration of a VM created from the gallery can take quite some time. Installing Visual Studio with its updates, the SDKs you need, and so on can easily take 3-4 hours, and more complicated setups even longer.
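To give an idea of what the scripted approach looks like, here is a minimal PowerShell DSC sketch; the configuration name and the feature it installs are illustrative, not part of any particular setup:

  # Minimal DSC sketch: declare that the IIS role must be present.
  # The names (DevBox, localhost, Web-Server) are illustrative.
  Configuration DevBox
  {
      Node "localhost"
      {
          WindowsFeature IIS
          {
              Ensure = "Present"
              Name   = "Web-Server"
          }
      }
  }
  DevBox                                        # compiles to a MOF in .\DevBox
  Start-DscConfiguration -Path .\DevBox -Wait -Verbose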

Image Types

In Azure you can use two types of images: generalized images and specialized images. The difference between the two is very significant.

Generalized images have been stripped of computer-specific information such as the computer name and system identifier (SID). This ensures (virtual) machines provisioned from that image are unique. It also means that you need to provide a new set of credentials to access the VM. The images you get from the gallery are generalized images. You generalize a Windows image with sysprep, and a Linux image with the waagent command. The biggest problem of doing this is that some software doesn’t work well after you’ve created a VM from a generalized image, because the software configuration is based on some computer-specific information. For SQL Server, for instance, you need to take some additional measures, as explained in Install SQL Server 2014 Using SysPrep. Another example is SharePoint, which will only work if you don’t run the SharePoint Products and Technologies Configuration (PSConfig) Wizard before creating the image. Basically this means you can install SharePoint, but you can’t configure it yet.
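For reference, these are the commands typically used to generalize (run them inside the VM); the Windows one shuts the VM down when it completes:

  REM Windows: generalize with sysprep
  %windir%\system32\sysprep\sysprep.exe /generalize /oobe /shutdown

  # Linux: deprovision with the Azure agent, then shut down
  sudo waagent -deprovision+user
  sudo shutdown -h now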

Specialized images, on the other hand, are basically just copies of the original virtual machine. If you provision a VM based on such an image, you basically get a VM that is an exact copy of the original, including the name, SID, etc. For a single-machine environment, this is not much of an issue. However, in a virtual network and with computers joined to a directory, it is an issue if multiple computers have the same name.

Creating Images

How to create an image has been described in numerous places. This step-by-step guide tells you in detail how to do it for both Windows and Linux VMs. I’ll summarize the steps for both image types, so you have an understanding of what’s involved. Be aware that when you create a custom image, the image is stored in the storage account of the VM you created it from. In addition, any VM you create from the image will use the same storage account. This has two important consequences:

  1. You can only create a VM in the region where the image is stored. If you need it in other regions as well, you need to copy or recreate it there.
  2. You need to be aware of the limits of a storage account. These are listed here. The most important is the 20,000 maximum for Total Request Rate, which basically boils down to a maximum of 20,000 IOPS per storage account.
    Note: these limits apply to standard storage; the limits for Premium Storage are different.

Creating a Generalized Image

  1. Create a VM in Azure based on the gallery image of your choice.
  2. Configure the VM to your liking. Keep in mind that some restrictions apply, as discussed earlier. If you’re unsure, check whether the software you install can survive the generalization process.
  3. Generalize the image (see the step-by-step guide for details).
  4. When generalization is done, the VM will be marked as Stopped in the Azure Portal. You can then do a Shutdown in the Azure Portal, so it won’t incur any more charges.
  5. Capture the image. Make sure you select the option I have run Sysprep on the virtual machine, as shown below.
    Capture Image

Note that after you’ve captured the image the VM is gone.
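If you prefer scripting over the portal, the capture step looks roughly like this with the (classic) Azure PowerShell cmdlets; the service, VM, and image names are placeholders:

  # Capture a generalized image from the stopped, sysprepped VM.
  # "myservice", "myvm", and "myimage" are placeholder names.
  Save-AzureVMImage -ServiceName "myservice" -Name "myvm" `
                    -ImageName "myimage" -OSState Generalized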

Creating a Specialized Image

  1. Create a VM in Azure based on the gallery image of your choice.
  2. Configure the VM to your liking.
  3. Shut down your VM.
  4. Capture the image. Do not select the option I have run Sysprep on the virtual machine.

After the image is captured, you can start the VM again and continue using it.
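Scripted, this comes down to something like the sketch below (placeholder names again); note the -OSState Specialized value:

  # Stop the VM, capture it as a specialized image, and start it again.
  Stop-AzureVM -ServiceName "myservice" -Name "myvm" -Force
  Save-AzureVMImage -ServiceName "myservice" -Name "myvm" `
                    -ImageName "myimage-spec" -OSState Specialized
  Start-AzureVM -ServiceName "myservice" -Name "myvm"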

Using Custom Images

You can select the images you created when creating a VM from the gallery. One of the gallery items is My Images, as shown below. The image information also tells you whether the image is Specialized or Generalized.

Choose Image

When you select a Generalized image, the process of creating a VM is pretty much the same as with an image provided by Azure. The major difference is that you can’t select the storage account, and you can only select the region (or affinity groups or virtual networks in that region) in which that storage account is located. The same applies when you select a Specialized image, but then you also can’t provide the credentials of the administrator account. Those are the same as in the VM the image was created from (so you need to keep that information somewhere).
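In PowerShell, creating a VM from a custom generalized image looks roughly as follows; for a specialized image you would leave out Add-AzureProvisioningConfig, since the credentials come with the image. All names are placeholders:

  # Create a VM from a custom generalized image (placeholder names).
  New-AzureVMConfig -Name "myvm2" -InstanceSize Small -ImageName "myimage" |
      Add-AzureProvisioningConfig -Windows -AdminUsername "vmadmin" -Password "<password>" |
      New-AzureVM -ServiceName "myservice" -Location "West Europe"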

Keeping Images Updated

Keeping specialized images up to date is easy. You create one VM that you only use as a base. VMs you run in your environment are actually copies of the base VM. The base VM is turned off most of the time. You just fire it up when you need to apply updates. When you’ve applied the updates, you shut it down again and capture an updated version of the image. This is particularly useful in scenarios where there is a single machine that may need to be redeployed at some point. A good example is a production environment in which you want to keep a working copy of a VM around, so you can quickly go back to a working state if the running VM breaks.

If your environment is more complex and you need generalized images, the process is slightly more involved. You still create a base VM as explained above. But then you need to take some additional steps.

  1. Capture the base VM (VM 1) as a specialized image.
  2. Create a new VM from the specialized image (VM 2).
  3. Generalize VM 2.
  4. Create a generalized image from VM 2 (which deletes VM 2).
  5. Delete the specialized image.
  6. Update VM 1 when needed.
  7. Repeat steps 1 through 5 to create an updated image.

Regarding step 5: alternatively, you can keep the specialized image and delete the base VM instead, creating a new VM from the specialized image when you need to update. My experience is that it is easier to keep the base VM around.
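A rough PowerShell sketch of steps 1 through 5 (placeholder names; step 3, running sysprep, happens inside VM 2):

  # 1. Capture the base VM as a specialized image.
  Save-AzureVMImage -ServiceName "mysvc" -Name "basevm" `
                    -ImageName "tmp-spec" -OSState Specialized
  # 2. Create VM 2 from the specialized image (no provisioning config;
  #    a specialized image keeps the original credentials).
  New-AzureVMConfig -Name "vm2" -InstanceSize Small -ImageName "tmp-spec" |
      New-AzureVM -ServiceName "mysvc" -Location "West Europe"
  # 3. Run sysprep inside VM 2 to generalize it.
  # 4. Capture VM 2 as a generalized image (this deletes VM 2).
  Save-AzureVMImage -ServiceName "mysvc" -Name "vm2" `
                    -ImageName "myimage-v2" -OSState Generalized
  # 5. Delete the temporary specialized image.
  Remove-AzureVMImage -ImageName "tmp-spec"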

You may wonder why you wouldn’t just delete the base VM and create a VM from the specialized image to perform updates in. The reason is that you can only generalize (Windows) VMs two times, so after the first update you can’t update and generalize again. By keeping the base VM around, you’re always generalizing for the first time.

You typically don’t want to perform updates in the production environment. This is mostly a networking issue, except for the storage account limits discussed earlier. If you have your acceptance environment set up in the same Azure subscription, but in a different VNET, you can update in the acceptance environment and then promote to the production environment. Remember that because everything is tied to the same storage account, this also means the storage account is used for both your acceptance and production environments. Whether this is an issue depends on your specific requirements for the acceptance and production environments.

Alternatively, you can create a separate “Update VNET” in which you only perform updates. Lastly, you can copy images from one storage account to another, even if these are not in the same subscription. In that case you have to copy the underlying blobs and turn them into an image. How to do that is explained here.
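The copy itself comes down to something like this sketch; the account names, keys, container, and blob names are all placeholders:

  # Copy the image VHD to another storage account and register it there.
  $src  = New-AzureStorageContext -StorageAccountName "srcaccount" -StorageAccountKey "<key>"
  $dest = New-AzureStorageContext -StorageAccountName "destaccount" -StorageAccountKey "<key>"
  Start-AzureStorageBlobCopy -SrcContainer "vhds" -SrcBlob "myimage.vhd" -Context $src `
                             -DestContainer "vhds" -DestBlob "myimage.vhd" -DestContext $dest
  Get-AzureStorageBlobCopyState -Container "vhds" -Blob "myimage.vhd" -Context $dest -WaitForComplete
  Add-AzureVMImage -ImageName "myimage" -OS Windows `
                   -MediaLocation "https://destaccount.blob.core.windows.net/vhds/myimage.vhd"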

Optimizing Performance with Windows Azure Premium Storage

Azure recently started the preview of Premium Storage. Premium Storage provides much better IO performance than standard storage. Standard storage can reach up to 500 IOPS per disk, with a minimum latency of 4-5 ms. Premium Storage on the other hand can reach up to 5000 IOPS per disk, depending on the disk size (bigger disks get more IOPS; see Premium Storage: High-Performance Storage for Azure Virtual Machine Workloads). When tuning performance on Premium Storage, you need to be aware of a few things. First, of course, the limits of the disks you provision. The table below (source: Microsoft) shows these limits.

Disk Type            P10         P20         P30
Disk Size            128 GB      512 GB      1023 GB
IOPS per Disk        500         2300        5000
Throughput per Disk  100 MB/sec  150 MB/sec  200 MB/sec

The second limit is the machine limit, which is determined by the number of CPUs. Per CPU you get approximately 3200 IOPS and 32 MB/s of bandwidth (or disk throughput). See the DS series table in Virtual Machine and Cloud Service Sizes for Azure. I will not go into creating a Premium Storage VM (you can read about that here), but rather into how to see where your machine is “hurting” if the application you are running isn’t performing well.

Configuring Performance Monitor

When looking for performance bottlenecks you’re basically always looking at 4 things:

  • CPU utilization
  • Memory utilization
  • Disk utilization
  • Network utilization

In this post I will focus on the first three, because I’ve mainly seen issues with these. CPU utilization and memory utilization are single metrics in performance monitor, but disk utilization consists of the number of reads and writes, and the throughput. Another disk metric is the length of the queued IO operations. If the application reaches the disk limits, the length of the queue goes up. To collect these metrics, you need to create a Data Collector Set in Performance Monitor and run that during testing. Take the following steps to do so:

  1. Start Performance Monitor (Windows + R, type perfmon)
  2. In the tree view on the left navigate to Performance \ Data Collector Sets \ User Defined
  3. Right click on the User Defined item and select New \ Data Collector Set, as shown below
    Creating a new Data Collector Set
  4. In the dialog that follows, enter a name, select Create manually (Advanced), and click Next.
  5. Select Performance counter, and click Next.
  6. Click Add…
  7. Select the following counters:
    Counter                                    Instance(s)
    Processor \ % Processor Time               _Total
    Memory \ Available MBytes                  N/A
    LogicalDisk \ Current Disk Queue Length    _Total and *
    LogicalDisk \ Disk Bytes/sec               _Total and *
    LogicalDisk \ Disk Reads/sec               _Total and *
    LogicalDisk \ Disk Writes/sec              _Total and *
  8. Click OK.
  9. Set Sample Interval to 1 second, and click Next.
  10. Select the location where the data must be saved. On Azure it makes sense to put the logs on the D: drive, which is a local (temporary) disk, instead of on one of the attached disks.
  11. Click Next.
  12. If you want to start and stop the collection of data manually, click Finish. Otherwise, select Open properties for this data collector and click Finish.
  13. In the next dialog you can set a schedule for data collection. A very good idea is to set a Stop Condition, either for a maximum duration or a maximum file size.
  14. When you are done, you will see the new Data Collector under the User Defined Collector Sets.

When you’re ready to test your application, click the Data Collector and press the play button, or right click the Data Collector and select Start. When the test is done press the stop button, or right click the Data Collector and select Stop.
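If you’d rather script the collection than click through Performance Monitor, something along these lines works as well (a sketch; the sample count and output path are just examples):

  # Collect the same counters for 300 seconds at a 1-second interval,
  # and write them to CSV on the temporary D: drive.
  $counters = "\Processor(_Total)\% Processor Time",
              "\Memory\Available MBytes",
              "\LogicalDisk(*)\Current Disk Queue Length",
              "\LogicalDisk(*)\Disk Bytes/sec",
              "\LogicalDisk(*)\Disk Reads/sec",
              "\LogicalDisk(*)\Disk Writes/sec"
  Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 300 |
      Export-Counter -Path "D:\perflog.csv" -FileFormat CSV -Force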

Analyzing the Data

The data collected by Performance Monitor is stored in CSV format. To use it, import it into Excel, as follows:

  1. Start Excel and create an empty worksheet.
  2. Go to the Data tab and click From Text under the Get External Data section.
    Import CSV
  3. Select the CSV file generated by Performance Monitor and click Import.
  4. Select My data has headers and click Next.
    Import Dialog 1
  5. Select Comma as delimiter and click Next.
    Import Dialog 2
  6. Ensure . is used as the decimal separator: click Advanced… and make sure the values in the popup are as shown below. Then click OK.
    Import Dialog 3
  7. Click Finish.
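As an alternative to Excel, you can pull the peak values straight from the CSV with PowerShell; a sketch (the column-name pattern is an assumption, since Performance Monitor prefixes each column with \\MACHINENAME):

  # Find the peak of the _Total "Disk Reads/sec" column in a perfmon CSV.
  $data = Import-Csv "D:\perflog.csv"
  $col  = $data[0].PSObject.Properties.Name |
          Where-Object { $_ -like "*LogicalDisk(_Total)\Disk Reads/sec" }
  ($data | Where-Object { $_.$col -match "^[0-9]" } |   # skip empty samples
      ForEach-Object { [double]$_.$col } |
      Measure-Object -Maximum).Maximum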

Once the data is imported, you can create charts that tell you what’s going on. Not all columns are useful at the same time, so you may want to create a copy of the sheet and remove certain columns before creating a chart. For instance, in the chart below I only kept the columns with disk reads and disk writes.

Disk IO chart

As you can see, Logical Disk F tops off at approximately 2400 IOs. A closer look at the test data also shows that all disks together never use more than 3160 IOs, while CPU and memory were not impacted. In a second test I added a P30 disk, and moved the data previously on Disk F to this disk (Disk X). The results of that test are shown in the chart below.

Disk IO chart

Notice that disk X tops off at approximately 5000 IOs. Total IOs for all disks never reached above 5300.

Understanding the Data

A key point is that there is more or less a one-to-one relationship between the IOs measured and the IOPS specifications of the disks shown in the table at the top. Combined with the knowledge that a 2-CPU VM has approximately 6400 IOPS available, that means there is still room for improvement. You could change the application to use multiple disks, or you can use Windows Storage Spaces to combine physical disks into a logical disk.
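For the Storage Spaces route, a minimal sketch looks like this (pool and disk names are illustrative); setting the number of columns equal to the number of disks stripes IOs across all of them:

  # Pool all attached data disks that can be pooled, and create one
  # simple (striped) virtual disk on top of them.
  $disks = Get-PhysicalDisk -CanPool $true
  New-StoragePool -FriendlyName "DataPool" `
                  -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName `
                  -PhysicalDisks $disks
  New-VirtualDisk -StoragePoolFriendlyName "DataPool" -FriendlyName "DataDisk" `
                  -ResiliencySettingName Simple -UseMaximumSize `
                  -NumberOfColumns $disks.Count |
      Get-Disk | Initialize-Disk -PartitionStyle GPT -PassThru |
      New-Partition -AssignDriveLetter -UseMaximumSize |
      Format-Volume -FileSystem NTFS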

If you analyze the disk queue length that was also measured, you would see that quite a few IOs are queued, an indication the disk can’t handle more IOs. The total bytes transferred would, however, not reach their maximum. For other types of workloads, you may see different behavior. Applications that stream large files, for instance, will likely issue fewer IOs, but will reach VM or disk throughput limits. Because I used disks with higher throughput than available for the VM, my throughput would be capped at 64 MB/s. Still other workloads may be more CPU bound, and will show % Processor Time at 100 for extended periods.

Understanding how the workload affects your Azure VM helps you determine what to do when performance isn’t what you want it to be. In my case adding a faster disk improved the performance. If I want even better performance, I have to add disks and utilize their combined IO capabilities. Once the test no longer shows a cap on IOs, I have to look elsewhere. Note that if you reach the VM IO cap, you need to get a bigger VM to get better performance, which will also decrease the likelihood that you will reach CPU or memory limits.