Everyone Deserves a GPU. Welcome the GRID M10!

Today, NVIDIA announced the newest member of the GRID lineup . . . . . The M10! It joins its family members, the M6 and M60.

The number of applications that require graphics acceleration is growing. Most modern browsers and Microsoft Office applications are graphics accelerated. There is also quite a bit of business-related video content, like training videos and corporate announcements. Add in video conferencing and that's a lot of graphics happening in a business virtual desktop.

This creates a need in a virtual desktop infrastructure for a GPU that can give good graphics performance but that is affordable enough to provision to the large base of knowledge workers. Previously, GPU-enabled virtual desktops may have been reserved for people using 3D design applications, etc. NVIDIA is addressing this need head-on with the M10 card.

The new M10 card should be generally available in Fall of 2016 although there may be some small numbers of sample cards in the market prior to that.

In the image below, you can check out the M10’s specs. The frame buffer per physical GPU on the card is the same as the M6 and M60 card at 8GB. The M10 has 4 physical GPUs per card while the M60 has 2 and the M6 has 1. However, the GPU core count is lower at 640 cores per physical GPU. The M60 has 2048 cores and the M6 has 1536.  So the M10’s increase in physical GPUs and decrease in core count position it to support higher numbers of business users per host.

Below is a simple chart to compare the three different members of the GRID family.  The chart shows the density per card.   Check the GRID HCL at the link below to determine how many cards are supported in specific server models.  Then you can determine the maximum number of users per host.  http://www.nvidia.com/object/grid-certified-servers.html

CompareM10

You can see some additional comparisons in the chart below.

Check with your OEM for their pricing on the cards. The per-concurrent-user licensing is an additional cost. For an in-depth look at GRID 2.0 and the new license model, please read my previous article on GRID 2.0 here: http://bit.ly/1qnHYcg. The license editions that are used for the M6 and M60 also apply to the M10.

The M10 is aimed at the two lower-end “Virtual PC” or “Virtual Apps” license editions. The graphic below shows the subscription fees for both license editions on an M10 card.

I think that the new M10 is a plus for the VDI world. Someday we may not even stop to consider whether or not to put a GPU into a virtual desktop, just like we really don't consider whether or not to put a GPU into a laptop these days. It's just standard practice. I'm eager to see if the M10 will pave the way for that day!

-Richard

Come on down to the UUID Rodeo! – A deep dive into virtual disk architecture for XenDesktop on XenServer

I recently needed to determine if there were any orphaned virtual disks that had been created by XenDesktop and not properly cleaned up. This XenDesktop environment is connected to a Citrix XenServer that has NFS shared storage. I wanted to write this article because navigating to find specific virtual disks in XenServer is not as straightforward as it is in vSphere. This article may also help you better understand XenDesktop’s Linked Clones virtual disk architecture.

For those of you not familiar, Citrix XenDesktop has a method to clone a master image into subsequent virtual desktops. This method is called Machine Creation Services or “MCS.” A group of virtual machines created from the same master image is called a Machine Catalog. These virtual desktops are called “linked clones” because of their virtual disk structure. “Linked Clone” means that each virtual desktop has a delta disk that accepts changes as the VM runs. However, all the delta disks in the Machine Catalog point back to the same, read-only base disk. The base disk is essentially a copy of the master virtual machine’s disk at the time the Machine Catalog was created.

If you are like me, you started your virtualization career working with VMware vSphere. After getting used to how things are done in that ecosystem, well, that’s just how you expect it to be. When I needed to investigate the virtual disks created by MCS on XenServer, it was different enough that I wanted to share the steps.

To begin the investigation of how XenDesktop on XenServer organizes its virtual disks, let’s first look at how it is done with XenDesktop on vSphere. In general, vSphere stores a virtual machine’s files in a folder that is named after the virtual machine. In the screenshot below, we are looking at vSphere’s Datastore Browser. The virtual machine named “XD71_2” is stored in a folder named “XD71_2” and the virtual machine’s files begin with the same name.

In the case of Citrix’s Machine Creation Services, it creates a folder named after the Machine Catalog to hold the base disk for the linked clones. The folder name starts with the name of the Machine catalog, and includes “baseDisk” in the name. The virtual disk (.vmdk) file is also named after the Machine Catalog.

Pretty good, right? It is very straightforward to find what you are looking for. Now, let's look at how XenServer organizes its virtual disks. XenServer places all VHDs (virtual disks) into one directory, and all VHDs are named by UUID. This makes it much harder for the human eye to see the relationships between virtual desktops, machine catalog master-disks, linked-clone base disks, delta disks, and Identity Disks.

**Please note that the following exploration does not cover Machine Catalogs of the "Pooled" type or those with Personal vDisks. Only "Dedicated" Machine Catalogs are covered. "Dedicated" Machine Catalogs are created when the two options of "static" and "dedicated" are chosen in the Machine Catalog Creation wizard. Screenshot below.

DedicatedChoices

Dedicated Machine Catalogs can also be recognized in Citrix Studio by checking that the “User Data” field is set to “On local disk.” See the comparison below of a Dedicated Machine Catalog to a Catalog that uses a Personal vDisk.

StudioViewOfDedicated
As my first step in investigating how Citrix Machine Creation Services organizes virtual disks on XenServer, I ran the following command, which shows all VHDs in the desired Storage Repository (SR). The Storage Repository is referenced by its UUID, "5fe934bb-f1a9-6e3d-38ed-0f49f7f113d5."

vhd-util scan -f -p -m '/var/run/sr-mount/5fe934bb-f1a9-6e3d-38ed-0f49f7f113d5/*.vhd'

The output shown below is structured so that child VHD’s are indented below their parent VHD. Child VHD’s rely on, or point back to, their parent VHD. The “delta” disks of a linked clone are child disks and point back to the base disk.

At the end of each line below, it either says “parent=<UUID>” or “parent=none.” If the VHD has a parent, it notes the parent’s UUID. Alternatively, it says “none” if it is a disk that has no dependency on a parent disk. See the two examples bordered in red below.

CLI_ParentEquals

The output above is from just a handful of machine catalogs and virtual desktops so you can see how investigating your virtual disk architecture in a larger environment could be a little challenging.
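
If you just want to check a single VHD's parent without re-running the full scan, vhd-util can also query one file directly. This is only a quick sketch based on the same tool; double-check the flags against vhd-util's built-in help on your host:

vhd-util query -n /var/run/sr-mount/&lt;SR_UUID&gt;/&lt;VDI_UUID&gt;.vhd -p

It prints the parent VHD's path, or indicates that the VHD has no parent.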

The first thing I wanted to do was determine which virtual machine the first parent disk belonged to. This is the first VHD listed in the output above. Its filename ends in "2c016.vhd." This VHD has no parent, as the end of the line notes "parent=none."

The command below normally returns the name of the virtual machine related to a virtual disk, so I ran it to determine which virtual machine owned this one.

xe vbd-list vdi-uuid=<UUID_of_disk>

xe vbd-list vdi-uuid=461ab151-4e27-496f-acf1-abe7ecb2c016

However, running this command to find the name of the VM associated with this VHD fails; it returns nothing. In fact, a number of the UUIDs in the screenshot above did not return a related virtual machine. With that little information, and this being my first time figuring it out, there were too many UUIDs to do a process of elimination and work out which UUID belonged to which virtual machine. To narrow this down, I created a new XenServer Storage Repository on the SAN, created a new Host Connection in XenDesktop to define this new storage location, and then created a new machine catalog with only one virtual desktop. With this done, I opened two CLI sessions to the same XenServer host and put the output side by side, as shown below.

In the left pane, the “vhd-util” scan command shows all VHD’s in this new Storage Repository and is structured so that child VHD’s are indented below their parent VHD. This is the same command shown at the beginning of this article. In the right pane, the “vdi-list” command shows all VDI’s or “Virtual Disk Images” on the Storage Repository.  (You’ll need to zoom in on this image a bit . . . )

CLIx2
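
For reference, the listing in the right pane comes from xe's vdi-list command. The exact parameters in my screenshot may differ, but an invocation along these lines lists every VDI on a given Storage Repository with its name-label:

xe vdi-list sr-uuid=&lt;SR_UUID&gt; params=uuid,name-label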

The text colors map the UUID in the left pane to the same UUID in the right pane. I will elaborate on each UUID, organized by color below:

GREEN: The UUID’s of the Storage Repository (SR).

BLUE:
The blue VHD listed in the left pane corresponds to the blue VDI in the right pane. This blue Virtual Disk Image (VDI) in the right-pane has a name-label of “base copy.” All the virtual disks ending in “-diff” (look at yellow VDI in right-pane) point back to this base virtual disk. This is the base disk that linked-clones in a “Dedicated” Machine Catalog use. Its name-label is always generically named “base copy,” making it harder to match to a virtual desktop or Machine Catalog.

YELLOW:
 The name-label for these Virtual Disk Images ends in “-diff.” Any changes to the virtual desktop after provisioning are captured in this differential/delta disk.

ORANGE:
With a "Dedicated" machine catalog type, this is the virtual disk that newly spawned virtual desktops will be created from. This disk is not attached to a virtual machine, so it cannot be started. Changes/updates to it can only be made via XenDesktop PowerShell commands. I coined the term "master-disk" to describe this virtual disk.

It is interesting to note that the “master-disk” of the machine catalog is actually a snapshot on top of the (blue) “base copy” for the Linked Clones base disk. This can be seen below because the (orange) “master-disk” in the left-pane lists its parent disk. Its parent disk is the (blue) base disk. When a Machine Catalog creation is initiated, a relatively long copy process can be seen on the hypervisor. This process is copying the bootable, master virtual machine’s disk to the (blue) Machine Catalog’s base disk.

RED: The Identity Disk is always a 16MB file that contains the computer name and Active Directory account password (among other things). This ensures each VM is unique. This disk is not a Linked Clone itself, even if it is associated with a virtual desktop that is a Linked Clone. That’s why it shows that it has no parent disk.

As you can see, the cross-referencing of UUID’s in XenServer is certainly more time-consuming than the friendly names of folders and files in vSphere. Is this like being at the UUID rodeo? You be the judge. I personally would like XenServer to move towards an organization of virtual disks that is easier on the human eye!
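
If you need to hunt for orphaned disks like I did, the cross-referencing can be scripted. Below is a rough sketch built only from the two commands already shown in this article (the SR UUID is the example one from earlier). A VDI with no VBD is not automatically an orphan; base disks legitimately have no VBD, so treat the output as a starting point for investigation, not a delete list.

# List every VHD in the SR mount and flag those with no VBD tying them to a VM.
SR_UUID=5fe934bb-f1a9-6e3d-38ed-0f49f7f113d5
for vhd in /var/run/sr-mount/$SR_UUID/*.vhd; do
    vdi=$(basename "$vhd" .vhd)
    if [ -z "$(xe vbd-list vdi-uuid=$vdi)" ]; then
        echo "No VBD found for VDI $vdi"
    fi
done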

I hope this helps anyone that needs to dive into their XenDesktop virtual disks on XenServer!

-Richard

NVIDIA GRID 2.0 Deep Dive & Configuration!

At VMworld 2015, NVIDIA announced the release of their GRID 2.0 virtual GPU product. It’s the follow-up to the ground-breaking GRID 1.0 technology released in late 2013 that provides true hardware-virtualization of the GPU into multiple virtual desktops. I’ve worked a lot with the GRID 1.0 product and I was excited to get my hands on a GRID 2.0 M60 card!

I am also excited about the recent release on April 4, 2016 of the revised GRID 2.0 license structure and costs! This revision has simplified the license model and lowered the costs and I will explain this in detail.

I think the best way to explore what’s new in GRID 2.0 is to compare it to what we know about GRID 1.0. There is a lot that I want to cover on the differences between GRID 1.0 and 2.0 so let’s dive right in. I’ve broken this article into the following sections:

-Technical comparison of GRID 1.0 and GRID 2.0
-GRID 2.0 licensing requirements
-Hardware details
-Complete hardware and software setup of the environment

Technical Comparison of GRID 1.0 and GRID 2.0:
The two new card types in GRID 2.0 are called "M6" and "M60." The M60 is the higher-performing card and is a PCI card similar to the K1 and K2. It fits inside a rack-mounted server. The M6 card, however, has a "MXM" form factor, which is much smaller and is designed for blade servers. Given all the changes between product lines, I am focusing this comparison on the M60 cards as they are the closest comparison to the GRID 1.0 products.

GRID 1.0 and 2.0 vary in some of the following specs:

-Number of GPU cores (CUDA Cores)
-Clock speed of the GPU cores
-Amount of video RAM (frame buffer) allocated to each user
-Maximum screen resolution and 4k monitor support
-Maximum displays per user
-User density
-CUDA support
-OpenCL support
-Linux support for vGPU
-GPU passthrough support
-Number of H.264 Encoders
-H.265 Support

Linux Support:
GRID 2.0 has some notable improvements over 1.0. With the Virtual Workstation license, vGPU for Linux guest OSes is possible. VMware Horizon View 6.2.1 and onwards can do a true vGPU Linux-based session.

The GRID 2.0 product documentation says that, “GRID vGPU with Red Hat Enterprise Linux 7 / CentOS 7 guest VMs is supported on Tesla M60 and M6, as a technical preview feature on Citrix XenServer 6.5.”

Check out the excerpt below from the XenServer Tech Preview page (as of April 2016): https://www.citrix.com/products/xenserver/whats-new.html

“Enhanced NVIDIA GRID GPU support – an update to existing GRID GPU pass-through support enables deployment of Linux workloads with enhanced graphics.”

The excerpt only notes “pass-through” support here and does not mention vGPU.

And now Citrix XenDesktop’s VDA (Virtual Delivery Agent) supports GPU acceleration in a Linux virtual desktop. This Citrix blog from 4/4/2016, https://www.citrix.com/blogs/2016/04/04/virtualize-linux-3d-applications-with-hdx-3d-pro-for-linux/, says the following:

“HDX 3D Pro support is available with Red Hat Enterprise Linux 7.2 on both Citrix XenServer and VMware vSphere, in GPU pass-through mode. In addition, enables NVIDIA GRID H.264 hardware encoding (which works especially well with the H.264 hardware decoding in the Citrix Receivers for Linux and Windows).”

However, note that this is passthrough GPU in a Linux session and not vGPU. I am sure that vGPU for Linux is in the works at Citrix though.

4K Monitors:
GRID 1.0 can support a 4k monitor only with a K2 passthrough profile (http://support.citrix.com/article/CTX201696).    GRID 2.0 supports 4k monitors for several profile levels as long as you purchase the “Virtual Workstation” licenses.  (More on the licensing shortly).  If 4k monitors are important to your users, you’ll definitely be able to get more 4k monitor users per host with GRID 2.0.

CUDA Support:
GRID 2.0 also supports CUDA in a vGPU session. CUDA is essentially an API that allows developers to program the GPUs. Some applications use CUDA, and it is also widely used in scientific computing. In GRID 1.0, CUDA is accessible only in passthrough profiles. GRID 2.0 allows CUDA to run only in the 8GB vGPU profiles. On the M60 card, that is the M60-8Q, and on the M6 it is the M6-8Q. This is a "vGPU" profile, but one VM consumes the entire physical GPU. In passthrough, one VM consumes an entire physical GPU as well. So why is this better than just doing it via passthrough like it was done in GRID 1.0? One reason is that you can monitor the GPU usage from the hypervisor. Otherwise, the hypervisor knows nothing about the PCI device that it is "passing through" to the VM. You'd then need to get the GPU usage metrics from inside the VM, and that's just a less centralized way of doing things.
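
As a quick sketch of what that hypervisor-side monitoring looks like: nvidia-smi is installed on the host along with the GRID Manager (more on that later in this article) and can be run once or in a simple polling loop. The -l flag refreshes at the given interval in seconds; exact output fields vary by driver version.

nvidia-smi
nvidia-smi -l 5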

OpenCL Support:
OpenCL is now supported in vGPU. In GRID 1.0, you could run OpenCL only in a passthrough profile; in GRID 2.0 it is also possible in vGPU, but only in the 8GB profiles (M60-8Q & M6-8Q).

GPU Cores:
The number of GPU cores, or "CUDA cores," in GRID 2.0 is higher than in GRID 1.0. The K1 cards have 192 cores available to each VM and the K2 cards have 1536 available to each VM. The GRID 2.0 M60 cards have 2048 cores available to each VM. As I noted, I am not focusing my comparison on the GRID 2.0 M6 cards, but they have 1536 cores. The comparison of the GPU cores is shown in the chart below. Also in the chart below, you will see references to "boost" and "maximum boost." This is similar to "Turbo Boost" mode on Intel CPUs, where the clock speeds can temporarily exceed the usual maximum. Overall, the clock speeds in GRID 2.0 have increased, but you can see how that shakes out in the chart below.

Video RAM or Frame Buffer:
The highest amount of video RAM available to each VM in GRID 2.0 is 8GB.  This compares to a maximum in GRID 1.0 of 4GB.   The other GPU profiles do vary in their amount of video RAM but essentially the RAM capacity is doubled in all cases in GRID 2.0.

If you really want to nitpick the specs of the GPU and Video RAM, I’ve compiled all the specs in one place below. Hopefully, someone out there needs to verify these specs to make a purchasing decision! Otherwise, it’s just interesting to see how the cards stack up against each other.

HardwareChart_png

Encoding:
The new Maxwell GPU architecture has six times the encoding capability of the Kepler GPU architecture in the GRID 1.0 release. The calculations below show how the number of supported simultaneous H.264 streams compares between cards. Each physical GPU has a graphics encoder engine in it called "NVENC." This encoder engine is what encodes the graphics stream into H.264, for example. Each graphics card has a varying number of physical GPUs on it, as you can see in the calculations below.

K1 Card: 4 GPUs x 3 H.264 streams per Kepler GPU = 12 simultaneous 1080p30 H.264 streams
K2 Card: 2 GPUs x 3 H.264 streams per Kepler GPU = 6 simultaneous 1080p30 H.264 streams
M6 Card: 1 GPU x 18 H.264 streams per Maxwell GPU = 18 simultaneous 1080p30 H.264 streams
M60 Card: 2 GPUs x 18 H.264 streams per Maxwell GPU = 36 simultaneous 1080p30 H.264 streams

The above specs can be affected by a number of factors. For example, if you are encoding 720p @ 30 FPS, instead of 1080p @ 30 FPS, the Kepler NVENC encoder can do 6 simultaneous streams. That’s double the 3 simultaneous streams for a Kepler GPU at 1080p. Other factors that will affect the concurrent sessions are the types of applications being streamed, the hardware encoder settings, and the CPU’s.

The below diagram gives a conceptual view that the 3D graphics engine (shown below as “3D”) and “NVENC” encoder engine are separate.

Here's an interesting phenomenon in the Kepler architecture GRID cards. If there are many concurrent sessions, the K2 cards are more likely to have performance constrained by the NVENC encoder. The opposite is true of the K1, where the performance would more likely be constrained by the 3D engine during many concurrent sessions. This is because the K2 card has a more powerful 3D graphics engine that can generate more frames per second, etc., and half the encoders to handle that output. The K2 card has 2 physical GPUs on it and therefore only two encoders. Contrast that to the K1 card that has 4 physical GPUs on it with 4 encoders total. The K1 also has a less powerful 3D graphics engine. Therefore the K1 card's performance would be more likely constrained by its 3D engine during many concurrent sessions.

Let me give you some background information on H.264 and the architecture of the GPU to help explain whether having more H.264 encoding capability on the GPU matters to you or not.

H.264 is the same as MPEG-4 AVC (Advanced Video Coding). H.264 is used in Blu-ray, YouTube, Adobe Flash, and Silverlight. It is designed to support low and high bitrates and low and high resolutions. This is a good fit for Citrix's HDX protocol, which is designed to give good performance in a variety of network situations.

H.264 was integrated into XenDesktop when the HDX 3D Pro feature was released in Citrix XenDesktop version 7.0. At NVIDIA’s GPU Tech Conference in 2015, one of Citrix’s product managers referenced in his presentation that H.265 was on the roadmap for the HDX protocol. I don’t know if that is still the case though and that is the only place I have heard it mentioned. When I spoke with him at GTC 2016, he noted the fact that H.264 decoding is already present on most physical endpoints in the world. However, H.265 decoding is not ubiquitous yet so that’s a consideration for pursuing the integration of that protocol. That’s a great point that I had not considered.

The newer H.265 standard is synonymous with HEVC, or High Efficiency Video Coding. It has double the compression efficiency of H.264. That allows the same quality at half the bitrate, or much higher quality at the same bitrate.

The new Maxwell GPU architecture in GRID 2.0 also adds support for encoding with the newer H.265 standard. From NVIDIA's literature, "This improves encoding quality, which will enable richer colors, preserve more details after video encoding, and result in a high quality user experience."

So does any of this matter to you? Does the increase in video encoding capability in GRID 2.0 benefit you? XenDesktop 7.x has been using the CPU to encode graphics. With the release of VMware Horizon View 7, the "Blast Extreme" protocol now offloads encoding from the CPU to the GPU. Here's the release statement: http://blogs.nvidia.com/blog/2016/02/09/nvidia-grid-blast-extreme-vmware-horizon/.

As of now, the increased NVENC encoding capabilities will not benefit you in XenDesktop for Windows, up to version 7.8. However, as I mentioned earlier, the Linux VDA in XenDesktop 7.8 does support encoding offload to the GPU, as noted in this Citrix blog: https://www.citrix.com/blogs/2016/04/04/virtualize-linux-3d-applications-with-hdx-3d-pro-for-linux/.

Encoding offload in Windows must be close behind in general availability and is implied in this recent Citrix blog: https://www.citrix.com/blogs/2016/04/07/whats-new-in-linux-virtual-desktop-1-2/

Here’s the relevant excerpt from the above blog:

"The latest performance enhancements for HDX have been introduced in Linux VDA, ahead of their availability in the Windows VDA. It leverages NVIDIA GRID H.264 hardware encoding on GPU-enabled Linux servers, for greater server scalability and rich graphics performance. On clients that have GPU available, H.264 hardware decoding complements the high definition experience, in the Citrix Receivers for Linux, Chrome OS and Windows."

I want to call out that there is offloading of both the encode and decode operations. The decoding offload occurs on the physical endpoint. Horizon 7 clients can offload decoding on Windows, Linux, iOS, and Android. The Citrix blog above notes that decode offload can work on Citrix Receivers for Linux, Chrome OS and Windows.

Putting all the specs and densities side-by-side:
Given all of the technical changes between 1.0 and 2.0, and all the different vGPU profiles, I have created the below chart that puts all the specs side by side. You can decide if the spreadsheet is annoyingly large, or helpful!

There are two columns in the chart that show the RAM and CPU resources that each virtual desktop will have with each GPU profile.    These two columns are titled:

“Max System RAM per User @ 512 GB/host”
“CPU Over subscription ratio (4 vCPUs / 40 logical cores/host)”

These calculations for RAM and CPU assume that the virtual desktops have 4 vCPUs and 8 GB of RAM. These specs are only for example purposes. You need to assess your own RAM and vCPU needs very carefully. This is a worthwhile exercise to see how your densities play out.

The server density calculations in the chart below are based on two SuperMicro models. Here’s how the card densities break down for each server model. By the way, the GRID hardware compatibility list is here: http://www.nvidia.com/object/grid-certified-servers.html

SuperMicro SuperServer 1027GR-72R2 supports 2 K1 cards

SuperMicro SuperServer 1027GR-72R2 supports 3 K2 cards

SuperMicro SuperServer 1028GQ-TR supports 4 M60 cards.  (This model holds 4 M60 cards in a 1U chassis! That’s pretty amazing.)

The cells highlighted in red indicate that I consider the resource too low or constrained for this environment. Again, I am not recommending these specs. This is simply an example of how you can examine your densities. For example, look at the last row for the "M60" cards in the chart. Because the video RAM is only 512MB (smaller than the others), the "Max users per card" is high. That translates to a very high "Max users per host," at 128 users. If there were 128 users per host, the maximum system RAM for each VM would only be 4GB on a host with 512GB of RAM. (I rounded down to 500 GB in the calculations to account for Dom0's RAM usage that is not available to the VMs.) This RAM level falls below my previously noted level of 8GB for this scenario, so I flagged it in red as insufficient. Depending on your use case though, a smaller 512MB vGPU profile might match up well with a VM with lower system RAM.

The "CPU over-subscription ratio" column in the same row is also marked in red. This configuration gives a 13x vCPU over-subscription. That's also flagged in red for being too high. To clarify, this means that there are 13 times more vCPUs allocated than there are logical cores on the physical host. I say "logical cores" to account for Hyper-Threading. In this example, the host has 2 processors with 10 physical cores each. Multiply that by 2 for Hyper-Threading and you get 40 logical cores per host.
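
To make that arithmetic explicit, using the numbers above:

128 VMs x 4 vCPUs = 512 vCPUs allocated
2 sockets x 10 cores x 2 (Hyper-Threading) = 40 logical cores
512 vCPUs / 40 logical cores = 12.8, or roughly a 13x over-subscription
500 GB of usable RAM / 128 VMs = roughly 3.9 GB, or about 4GB of system RAM per VM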

My advice is to do similar calculations to see how the densities stack up in your environment.  (The calculations for the M6 card on a blade chassis were not performed as I am focusing on the M60 card.)

2_masterChart

GRID 2.0 License Requirements:
GRID 2.0 requires concurrent user licenses and also an on-premises NVIDIA license server to manage the licenses. When the guest OS boots up, it contacts the NVIDIA license server and consumes one concurrent license. When the guest OS shuts down, the license goes back into the pool.

GRID 2.0 also requires the purchase of a 1:1 ratio of concurrent licenses to “SUMS” or “Software Updates Maintenance and Support.”

Contrast this to GRID 1.0 where there were no user licenses required. Only the purchase of the cards was needed. However, the SUMS requirement in GRID 2.0 gives you 24×7 phone support where GRID 1.0 mainly relied on NVIDIA solutions architects to assist with issues.

Another plus to this licensing model is that the perpetual user license will still be valid for GRID 3.0 boards when they are released in the future.

During NVIDIA's GTC conference in 2016, the revised license and SUMS models for GRID 2.0 were released. Originally, there were three software editions (user licenses), noted below.

“Virtual PC”
“Virtual Workstation”
“Virtual Workstation Extended”

NVIDIA has now combined the features in “Virtual Workstation” and “Virtual Workstation Extended” into just “Virtual Workstation.” The “Virtual Workstation Extended” name is now deprecated. You may see a number of previous articles online that show all three editions so I want to call this out to avoid confusion. The chart showing the revised software editions and features is below.

There were also two levels of SUMS: "Basic" and "Production." "Basic" has been dropped, so the only option now is "Production," which includes 24×7 phone support.

I ran quite a number of pricing scenarios comparing GRID 1.0 to various combinations of GRID 2.0 license and SUMS levels and it was relatively complicated to get apples to apples. I am very happy that this is now simplified. Thank you, NVIDIA!

Let’s dive into the above two software editions. The lowest level of software edition, “Virtual PC,” does not include the “NVIDIA Quadro Software Features.”    This is similar to the K100 and K200 profiles in GRID 1.0.  Those profile names did not end in “Q” like the other Quadro-certified GPU profiles did.

What is a Quadro certified driver?  Quadro drivers allow applications to hook into the API’s of the GPU. Non-Quadro drivers give basic functionality. My understanding is that the “certified” part of the term means that NVIDIA has tested a number of applications that use these API’s to ensure that they work correctly.

The other software edition is “Virtual Workstation.” This license option really has all the other advanced features like passthrough, Linux support, CUDA vGPU support, and has the “Quadro” certified drivers I just mentioned. See the “Simplified NVIDIA GRID Software Editions” chart below for a complete list of features.

Now that I’ve told you how the license model has been simplified, here are a couple of additions to it. I think both are welcome though.

First, there is also a "Virtual Apps" software edition. This is for XenApp and Horizon server-OS deployments. It has a lower price point, so I think this is a positive. The Virtual Apps edition does not have the Quadro Software Features. It only has the basic driver set. You can publish a full desktop as well as a published application with this license.

Second, there is an additional purchase option called "Annual Subscription" that combines the perpetual user license and SUMS into one yearly fee. So instead of buying a perpetual license once and paying yearly SUMS renewals, you just pay one yearly fee with everything combined. This is pretty attractive for lowering the cost of getting your GRID deployment off the ground in the first year. Customers can purchase the annual subscription for three years at once.

The chart below compares the different license structures. Also, notice the references in the chart to what the MSRP price "was" when GRID 2.0 was released originally. The prices are quite a bit lower now!

Software Edition      | Perpetual Concurrent License MSRP | Yearly "SUMS" MSRP | "Annual Subscription" MSRP
Virtual Apps          | $20                               | $5                 | $10
Virtual PC            | $100 (Was $375)                   | $25 (Was $94)      | $50
Virtual Workstation   | $450 (Was $6,000)                 | $100 (Was $1,500)  | $250

One other change to the license model is that the on-premises license server no longer does enforcement. It will not prevent another user from coming online or degrade the graphics if it exceeds the purchased licenses. The idea is that the license server is more for usage reporting and capacity planning.

Software Editions (Concurrent User Licenses):



Support, Updates, and Maintenance Subscription Details:

Hardware Details:
As I mentioned above, the M60 is a PCI card similar to the GRID 1.0 K1 and K2 cards.  It fits inside a rack-mount server.  The M6 card has a “MXM” form factor, which is much smaller and is designed for blade servers.   I am focusing on the M60 card as it’s a closer comparison to the GRID 1.0 cards.

The M60 board comes in actively and passively cooled models. The passively cooled model is 300 W and relies on the host chassis' fans. The actively cooled model is 240 W.

Here’s how the four GRID cards compare in terms of power requirements:

K1: 130 W
K2: 225 W
M6: 100 W or 75 W (see further explanation below on 75W and 100W power draw)
M60: 300 W passively cooled, 240 W actively cooled. Passively cooled can also be configured for 225W.

As noted above, the M6 card has both 100W and 75W power draw. The documentation actually shows more detail and that’s listed below. TGP stands for “Total Graphics Power-draw.” It makes sense that the higher clock frequencies would draw more power as shown below. But why are there both 75W and 100W in both the “Base” and “Boost” modes? It’s because the M6 card is designed for blade servers and the OEMs are given the option to configure the power draw of the card to suit their configuration needs.

Base:
722 MHz (TGP: 75 W)
950 MHz (TGP: 100 W)

Boost:
886 MHz (TGP: 75 W)
1051 MHz (TGP: 100 W)

If you want to dig into the power settings further, you can run the "nvidia-smi -q" command on the hypervisor to see the below output. These power stats are specific to one GPU and not for the entire board. The "Max Power Limit" is not the actual realized power limit. The realized power limit is the "Enforced Power Limit." This enforced limit is set in the firmware and it keeps the power for the entire board under the board's limit. For example, the "Max Power Limit" on one physical GPU on the M60 card is 162W. If you double that for the two GPUs on the board, you'd get 324W, which exceeds the board's maximum of 300W.

K2:
PowerK2

M60:
PowerM60
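
If you only want the power section rather than the full "-q" dump, nvidia-smi can narrow the query to a single section. This is a standard nvidia-smi option, though the exact contents of the section vary by driver version:

nvidia-smi -q -d POWER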

Below are the three M60 variations of the hardware:

PG402 SKU 40 (Left-to-right passive airflow)

PG402 SKU 60 (Right-to-left passive airflow)

PG402 SKU 80 (Right-to-left active airflow)

Be careful when purchasing and installing these cards because a card with the wrong airflow direction will fit and power-up just fine in the wrong bay! I’m sure you’d run into cooling and performance problems once in use though.

Below is an additional example of some of the output of hardware stats from the "nvidia-smi -q" command. If you are worried that you put the cards in the wrong bays for cooling, I'd open the chassis back up and take a look! But you could also start your investigation by comparing the current GPU temperature to the "Slowdown" and "Shutdown" temps.

Below are the thermal specifications from the NVIDIA M60 Product Brief. They mainly reiterate the power specs from the command line output below, but they do add the “GPU Maximum operating temperature” and show that the GPU’s will “slowdown” to 50% of normal speed if the “GPU slowdown temperature” is reached.
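
For that kind of check, the query can again be narrowed to just the thermal section (same caveat as above about the output varying by driver version):

nvidia-smi -q -d TEMPERATURE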

The cabling requirements have changed from GRID 1.0. As an example, you can see the K1's power cable port with 6 pins. Contrast that to the M60's 8-pin power cable port in the second picture.

K1’s 6-pin Power Port:

M60’s 8-pin Power Port:

If you are not ordering the GRID cards included with a server from the OEM, then make sure you’ve investigated your power cabling needs. The M60’s compatibility chart is below. In the SuperMicro 1027GR-72R2, I used a dual PCIe 8-pin to CPU 8-pin cable adapter as referenced in the “Note” area of the screenshot. The “PCIe 8-pin” side of the cable plugs into the motherboard and the “CPU 8-pin” side plugs into the M60 card.

I'm not going to write much about the riser cards and mounting brackets because they are even more boring than the power connectors. I'm afraid you'll stop reading the article. Anyone still there . . . . . .?

BRIEFLY: The GRID cards need special riser cards to properly connect to the motherboard. You will need special brackets to mount the GRID card to the riser card. Consult with your OEM for the specific riser cards and bracket. Or, just order the entire chassis full of GRID cards from the OEM and they’ll take care of all this for you.

To wrap up the first half of this article, NVIDIA has made some compelling improvements from their 1.0 product like CUDA support, Linux support, an increase in GPU cores, faster GPU cores, and more video RAM! With the revised license model and lower costs, now’s the time for 2.0! Also, NVIDIA will stop manufacturing GRID 1.0 cards by the end of 2016. OEM’s may have inventories of the cards after that point but I would contact your OEM to determine the last date they will accept GRID 1.0 orders. If you have an existing GRID 1.0 environment, start evaluating 2.0 for a path forward.

Complete Hardware and Software Setup of the Environment:

Okay, ready to get on with the actual setup? Here we go!

Hypervisors hosting the GPUs should have their BIOS’s tuned for high performance. If you want to learn more about the recommended BIOS settings, please read these two sections listed below in my previous article.

“Power Settings in XenServer 6.5”
“Step by Step Installation and Configuration Instructions”
http://blog.itvce.com/2015/02/23/initial-experiences-with-xenserver-6-5-and-nvidia-grid-using-hp-dl380-gen9-hardware-guest-blog-post-by-richard-hoffman/

These instructions are based on an M60 card installed into XenServer 6.5 and use XenDesktop 7.6 as the VDI broker.

Physical Installation of M60 Card
-Install the card in the host and note the direction of the airflow indicator. A passively cooled M60 card is shown below and the airflow indicator is circled in red. The passively cooled M60 card relies on the host’s fans to do the cooling. The fans will blow air into the side with the “Tesla” logo and exit on the side with the airflow indicator. As I mentioned earlier, it’s possible for an M60 card to fit and power up normally in a bay where the chassis fans are blowing air into the exit side. You would probably run into heat and performance issues so make sure you get it right.

-Below you can see a SuperMicro SuperServer 1027GR-72R2 chassis. This model is actually not on the HCL for 2.0 but still gives a good example of the layout of GRID cards in a rack mounted server. You can check the HCL for all GRID 1.0 and 2.0 cards here: http://www.nvidia.com/object/grid-certified-servers.html

This model can accept up to three K2 cards. The bays for the three cards are noted below. In front of each GPU card bay are the fans to cool the cards. The fans are bordered in red.

There are specific riser cards needed for the GRID cards so check the GRID documentation or discuss with your OEM. The GRID cards plug into the riser cards and the riser cards plug into the motherboard.

Registration & Download
-Register yourself at nvidia.com/grideval to get a license code and download the software. The three packages that you’ll need for a XenServer setup are shown in the screenshot below. In the order shown below, they are:

-The GPU Mode Switch utility changes the cards from the default “Compute” mode to “Graphics” mode.
-The NVIDIA GRID Manager software installs on XenServer. The NVIDIA drivers/software that install into Windows are also in this folder.
-The GRID license server installer

License Server:
-The following instructions are for the Windows version of the license server and are installed on Windows Server 2012 R2. (There is also a Linux version of the license server.)
-The GRID 2.0 license server requires Java version 7 or later. Go to java.com and install the latest version.
-Run “Setup.exe,” shown below, that was downloaded earlier. It is in the “NVIDIA-ls-windows-2015.12-0001” folder.

-Click “Next.”

-Accept the license agreement and click “Next.”

-Accept the Apache license agreement and click “Next.”

-Choose the desired installation folder and click “Next.”

-The license server listens on port 7070. This port must be opened in the firewall for other machines to obtain licenses from this server.
-Check the “License server (port 7070)” option.

-The license server’s management interface listens on port 8080. If you want the admin page accessible from other machines, you will need to open up port 8080.
-Check the “Management interface (port 8080)” option.
-Click “Next.”
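
If the license server is running on Windows Server 2012 R2 as in this walkthrough, and you are using the built-in Windows Firewall, inbound rules along these lines would open the two ports. This is just a sketch; the rule names are my own, and your environment may manage firewall rules differently:

netsh advfirewall firewall add rule name="NVIDIA License Server (7070)" dir=in action=allow protocol=TCP localport=7070
netsh advfirewall firewall add rule name="NVIDIA License Server Mgmt (8080)" dir=in action=allow protocol=TCP localport=8080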

-The “Pre-installation Summary” and “Repair Installation” options automatically progressed for me without user input.

-Once complete, press “Done.”

-You now need a license file that is generated from https://nvidia.flexnetoperations.com. Login to this site with the credentials setup during the registration at nvidia.com/grideval.
-Once logged in, click on “Create License Server.”
-Enter the fields as shown below. The MAC address of your local license server’s NIC should be entered into the “License Server ID” field.
-Leave the “ID Type” as “Ethernet.”
-The “Alias” and “Site Names” are friendly names of your choosing.
-Click “Create.”
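
If you need to look up the MAC address to enter into the "License Server ID" field, running "ipconfig /all" (or "getmac /v") on the license server will list the physical address of each NIC.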

-Click on the “Search License Servers” node.
-Click on your License Server ID as shown below, surrounded in red.

-Click “Map Add-Ons” and choose the number of license “units” out of your total pool to allocate to this license server.

-Once the “add-ons” are “mapped,” the interface will look like the below, showing 128 units mapped, for example.

-Click on “Download License File” and save the .bin file to your license server.
**Note that the .bin file must be uploaded into your local license server within 24 hours of generating. Otherwise, a new .bin file needs to be generated.

-On the local license server, browse to http://<FQDN of the license Server>:8080/licserver to display the License Server Configuration page.
-Click “License Management” in the left pane.
-Click "Browse" and locate your recently downloaded .bin license file. Select the .bin file and click "OK."
-Click “Upload” and the message in red, “Successfully applied license file to license server” should appear.

Graphic Mode Configuration
-The Tesla M60 cards come optimized for High Performance Computing and are in “Compute” mode. The below collection of settings defines “compute” mode.

These optimizations for High Performance Computing can cause compatibility problems with OSes and hypervisors when the GPU is used primarily as a graphics device. Therefore, we need to change the GPUs to "graphics mode." The "graphics mode" settings are shown below.

Change to “Graphics” mode:
-Unzip the "NVIDIA-gpumodeswitch-2015-09" package to see the contents shown below.

-Use a tool like WinSCP to copy the “gpumodeswitch” file (the one shown without a file extension) to the XenServer.

-Navigate to the directory where the gpumodeswitch file was placed.

-If you try to run the file, as shown below, it will likely give a “Permission Denied” error.

-To overcome the permissions error, type “chmod 777 gpumodeswitch” and hit Enter, as shown below.

-Then run the following command, which is also shown in the screenshot below.

./gpumodeswitch --gpumode graphics

-Enter “y” and press “Enter.”
-It will confirm “Updating GPU Mode of all eligible adapters to ‘graphics.'”

-There is quite a bit of command output between the above screenshot and where it finishes below. It will confirm success and note the need for a host reboot. Reboot the host.

Installation of GRID Manager on XenServer 6.5 Host:

These instructions are written for engineers familiar with XenServer and using the command line.

-Locate the NVIDIA-vGPU-xenserver-6.5-352.70.x86_64.rpm file in the software packages downloaded earlier. Screenshot below.
-Using WinSCP, or similar, copy this installer to a directory on the XenServer.

-Once the .rpm installer is copied to the host, navigate to the directory containing the installer, and run the following commands, also shown in the screenshot below.

rpm -iv NVIDIA-vGPU-xenserver-6.5-352.70.x86_64.rpm
shutdown -r now

-Once the host has rebooted, verify that the GRID Manager software has installed correctly. Run the following command and it should return output similar to the screenshot below.

lsmod | grep nvidia

-Run the below command as an additional check that the .rpm package is installed.

rpm -q NVIDIA-vGPU-xenserver

-Run “nvidia-smi” and verify that the two physical GPU’s are listed.
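
As an extra sanity check once the host is back up, the xe CLI can also list the GPU groups and the vGPU types the host now recognizes. These are standard xe commands on vGPU-capable XenServer releases, though the exact fields in the output will vary:

xe gpu-group-list
xe vgpu-type-list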



Attach a vGPU Profile to a Virtual Machine:
-Launch XenCenter and connect to the XenServer 6.5 host where the M60 card is installed.

-Find the virtual machine that will be used as the master image of your XenDesktop Machine Catalog.
-Ensure that XenTools is installed in the VM if not already done.
-Right-click on this virtual machine (to be used as your master image) and choose Properties. (The VM should be powered off during this operation)
-Click the GPU node in the left pane of the VM properties window.
-Select a GPU type from the “GPU type” drop-down menu and click “OK.”
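
If you prefer the command line to XenCenter, the same vGPU assignment can be done with xe. This is only a rough sketch; the placeholder UUIDs would come from "xe vm-list", "xe gpu-group-list", and "xe vgpu-type-list" respectively:

xe vgpu-create vm-uuid=&lt;VM_UUID&gt; gpu-group-uuid=&lt;GPU_GROUP_UUID&gt; vgpu-type-uuid=&lt;VGPU_TYPE_UUID&gt;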

-Power up the VM and logon.
-Install the NVIDIA guest OS driver set. Use the appropriate driver for your operating system out of the selection below. The below files are within the folder, “NVIDIA-GRID-XenServer-6.5-352.70-354.56,” that was downloaded earlier.

-Run the installer and unzip to your desired location.

-Accept the license agreement and continue.

-Choose “Custom” and click “Next.”

-Check to “Perform a clean installation” and click “Next.”

-Click “Close” and reboot the desktop.

-After the reboot, log back in, open Device Manager, and confirm the GRID card is present under Display Adapters. It should reflect the GPU profile that you assigned. Below, the "M60-8Q" profile was selected.

License the Virtual Desktop(s) for NVIDIA GRID:
-Right-click on the desktop screen of your master-image virtual machine and choose “NVIDIA Control Panel.”

-Click on the “Manage License” node.
-Enter the FQDN of the local NVIDIA license server created earlier.
-Enter the port number: 7070
-Confirm that the green check-mark appears, noting "Your system is licensed for GRID vGPU." If the license process is not successful, it will advise that in the same area.


Configure XenDesktop 7.6:
These instructions assume that you already have a running XenDesktop environment with a host connection from XenDesktop to XenServer. At this point we are just adding the “Connection and Resources” to the existing Host Connection. “Connection and Resources” contain the storage, networking, and GPU selections. If you need to create a completely new XenServer Host Connection there are plenty of XenDesktop setup documents online.


-In your existing XenDesktop environment, open Citrix Studio, and navigate to “Configuration” and then “Hosting”
-Right-click on “Hosting” and choose “Add Connection and Resources.”

 

-Again, this assumes that you already have an existing Host Connection to your XenServer. So, choose your desired “existing connection.”
-Click “Next.”

-Enter your desired name for these resources.
-Choose the network that your GPU-enabled virtual desktops will run on.
-Click “Yes” to enable “Graphics virtualization.”
-Choose the GPU profile that you want to deploy.
-Click “Finish.”

-Return to the left pane of Citrix Studio, and right-click on “Machine Catalogs” and choose “Create Machine Catalog.”

-Choose “Windows Desktop OS” and click “Next.”

-Choose “Machines that are power managed. . . “
-Choose “Citrix Machine Creation Services (MCS)” unless you are doing this with another method.

-Choose the settings that make sense for your environment. You can see my selections below.

-Select the gold image VM that you previously prepared with the NVIDIA GRID Drivers.
-Click “Next.”

-Choose the desired number of VMs to provision and the vCPUs and RAM.

-Choose to “Create new Active Directory accounts.”
-Choose the desired OU.
-Enter the desired account naming scheme.
-Click “Next.”

-Enter the desired “Machine Catalog name.”
-Click “Finish.”

-Navigate back to the left pane of Citrix Studio.
-Right-click on "Delivery Groups" and choose "Create Delivery Group."

-Click “Next.”

-Choose the Machine Catalog that you just created.
-Choose the number of machines for this Delivery Group.
-Click “Next.”

-Choose “Desktops”
-Click “Next.”

-Click “Add” and then select the user or group to give permission to the Delivery Group.
-Click “Next.”

-This step is discretionary but I usually choose “Automatically. . .” and then check the StoreFront URL.

-Name the Delivery Group as desired.

-You are done! Launch a published desktop now from your XenDesktop StoreFront page. I recommend checking the Display Adapters in Device Manager to ensure the GRID card is present. Also, ensure that the virtual desktop is getting a license from the GRID license server. Perform this verification using the earlier steps in this document for configuring licensing within the master image. If you are not getting a license, your graphics performance will be noticeably poor.

GRID Manager Removal From XenServer:
How do you remove the GRID 2.0 Manager drivers from XenServer? You may need to do this for troubleshooting, reinstallations, etc. First, you need to determine the name of the package. For those of you that remember how to do this in GRID 1.0, that command is below for reference's sake. However, it is slightly changed in GRID 2.0.

GRID 1.0 Command (reference sake only):
rpm -q NVIDIA-vgx-xenserver

GRID 2.0 Command:
rpm -q NVIDIA-vGPU-xenserver

You can also use the below command as an alternative method.

rpm -qa | grep NVIDIA

Now that you can see the installed package name, you can run the following command to uninstall the GRID Manager:

rpm -e NVIDIA-vGPU-xenserver-6.5-352.70

That’s it!  Let the good times roll and please leave comments!

-Richard