Cisco UCS – Cisco Server Computing takes virtualisation a step further

I recently implemented a new Cisco UCS environment using Windows Server 2016 Hyper-V, managed by a Microsoft VMM 2016 and SCCM Current Branch management environment. This was my first introduction to the Cisco UCS platform.

The Hardware

On first look it appeared to be just another blade enclosure environment.

[Image: Cisco UCS 5108 blade server chassis]

In addition to the standard blade chassis, the Cisco UCS environment also requires external management controllers called Fabric Interconnects. This is where all the intelligence for the environment sits, and they can manage multiple chassis.

[Image: Cisco UCS 6248UP 48-port Fabric Interconnect]

While a fabric interconnect can be installed as a single unit, I can’t see why anyone would ever want to do this, so in practice you cluster them as a pair. They are not just the management controllers for the environment but also the conduit for all external communications.

These are active/passive management clusters, so just be aware that a brief management outage occurs when the active role changes. Blade traffic will continue to route, as it uses both the active and passive nodes at all times. If a fabric interconnect goes offline it will just mean that some of the paths are no longer available. As long as you have paths for all services via both fabric interconnects and the servers are configured correctly, you won’t experience any issues.

There are a few caveats there, but unfortunately it is possible to install these units badly. This is not a unit to plug in quickly without any planning.

Connectivity

Cisco have produced validated designs which give step-by-step documentation for installing an environment with specified hardware. The Windows Server 2016 Hyper-V with VMM validated design uses the following hardware:

  • UCS Blade Chassis
  • UCS Standalone Servers
  • UCS Fabric Interconnects
  • Nexus Switches
  • MDS Fibre Channel Switches
  • NetApp SAN

Put together, this gives the following physical design:

[Diagram: Cisco UCS networking design]

It is absolutely possible to drop the MDS switches from this design and use the Nexus switches to provide the Fibre Channel connectivity. Also worth noting is that in this design the NetApps are used for both FC and iSCSI/SMB storage, which is why they also connect to the Nexus switches.

Each blade chassis is connected via multiple links to both fabric interconnects. These provide all external connectivity, including network and storage access as well as the management, which we will get into later.

Each port on the fabric interconnects is then configured as either a server, network or FC port. Server ports are used to discover chassis and standalone UCS servers. Network ports are configured using network templates for external connectivity.

FC ports cannot be placed just anywhere; they are limited to a range of ports whose location differs depending on the fabric interconnect model you are using. The UCS 6248s that I used required the FC ports to sit at the top end of the port range on each fabric interconnect, so if you wanted two FC ports per fabric interconnect they would be assigned to ports 31 and 32 on each unit.
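
If you prefer scripting to clicking through UCS Manager, the port roles can also be set via the Cisco UCS Python SDK (ucsmsdk). The sketch below marks a few ports on fabric A as server ports; the UCSM address, credentials and port numbers are placeholders, and the managed-object class name is from my memory of the SDK, so check it against the ucsmsdk reference before relying on it.

    # Rough sketch: classify ports 1-4 on fabric interconnect A as server ports.
    # Hostname, credentials and the FabricDceSwSrvEp class/DN are assumptions.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.mometa.fabric.FabricDceSwSrvEp import FabricDceSwSrvEp

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    for port in range(1, 5):
        # Server ports are defined under the fabric/server endpoint of each interconnect.
        srv_port = FabricDceSwSrvEp(parent_mo_or_dn="fabric/server/sw-A",
                                    slot_id="1", port_id=str(port))
        handle.add_mo(srv_port, modify_present=True)

    handle.commit()
    handle.logout()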

The Virtualisation magic

This is reasonably standard so far, so why did I say that it takes virtualisation a step further?

Well, each server is not configured directly. In fact, Cisco would rather you forget that you even have servers and instead just think about resources.

Before you do anything you need to set up the external network and FC configuration, as well as discover your servers.

Then everything is based on templates and service profiles. While it is possible to create a server from scratch without any templates, this is not encouraged and would likely result in a giant mess. Instead you need to go through and create templates for everything.

You need to start with the addresses you will be using: MAC addresses, FC addresses (WWNNs and WWPNs) and UUIDs. Next you need to create policies for the boot order, BIOS settings, power settings and network configuration.
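
As an illustration of the address pools, here is a minimal ucsmsdk sketch that creates a MAC pool with a single block. The pool name and address range are made up for the example, and sit under the default org-root organisation; only the UCSM address and credentials are placeholders.

    # Rough sketch: create a MAC address pool and one block of addresses.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.mometa.macpool.MacpoolPool import MacpoolPool
    from ucsmsdk.mometa.macpool.MacpoolBlock import MacpoolBlock

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    # Pools hang off an organisation; org-root is the default top-level org.
    mac_pool = MacpoolPool(parent_mo_or_dn="org-root",
                           name="HyperV-MAC-A",
                           assignment_order="sequential")
    MacpoolBlock(parent_mo_or_dn=mac_pool,
                 r_from="00:25:B5:0A:00:00",
                 to="00:25:B5:0A:00:3F")

    handle.add_mo(mac_pool, modify_present=True)
    handle.commit()
    handle.logout()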

Then you need to configure all of the VLANs and VSANs, which can then be assigned to vNICs and vHBAs, which in turn carry their own adapter configuration.
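
For example, a global VLAN is defined once in the LAN cloud and can then be ticked on any vNIC template. A minimal ucsmsdk sketch, with a placeholder VLAN name and ID:

    # Rough sketch: define a VLAN in the LAN cloud ("fabric/lan") so it is
    # available to vNIC templates on both fabrics.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.mometa.fabric.FabricVlan import FabricVlan

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    vlan = FabricVlan(parent_mo_or_dn="fabric/lan",
                      name="HyperV-VM-Traffic",  # placeholder name
                      id="110",                  # placeholder VLAN ID
                      sharing="none")
    handle.add_mo(vlan, modify_present=True)
    handle.commit()
    handle.logout()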

Then you need to create pools of servers to which the configuration will be assigned.

Next you create the service profile templates, which pull all of the above information together into a single configuration template. You then assign this template to a server pool.

Finally you can configure your servers by deploying the service templates to your server pools. This gives each server a base name plus a starting number which UCS increments.
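
The same “prefix plus incrementing number” deployment can be scripted. The sketch below leans on the lsInstantiateNTemplate method exposed by the ucsmsdk method factory; the template DN and names are placeholders and I’m writing the parameter names from memory, so treat them as assumptions and verify against the SDK documentation.

    # Rough sketch: stamp out three service profiles from a template, letting
    # UCS append an incrementing number to the base name.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.ucsmethodfactory import ls_instantiate_n_template

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    elem = ls_instantiate_n_template(
        cookie=handle.cookie,
        dn="org-root/ls-HyperV-Host-Template",       # placeholder template DN
        in_target_org="org-root",
        in_server_name_prefix_or_empty="HV-HOST-",   # base name; UCS appends 1, 2, 3...
        in_number_of="3",
        in_hierarchical="false")
    for sp in handle.process_xml_elem(elem):
        print(sp.dn)

    handle.logout()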

You would think that this would result in blade 1 in chassis 1 being assigned the first profile, but Cisco really don’t want you to think that much about it. UCS will assign each service profile wherever it sees fit. If you really need to know where a server is physically located then you can look it up, but it’s definitely not front and centre.
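
Looking it up is easy enough, though. Service profiles come back as LsServer objects, and the pn_dn attribute points at the physical blade currently backing each one. Only the UCSM address and credentials below are placeholders.

    # Rough sketch: list every service profile and the blade it is associated with.
    from ucsmsdk.ucshandle import UcsHandle

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    for sp in handle.query_classid("LsServer"):
        # Templates also come back as LsServer objects; only show real instances.
        if sp.type == "instance" and sp.pn_dn:
            print("{0:30} -> {1}".format(sp.name, sp.pn_dn))  # e.g. sys/chassis-1/blade-3

    handle.logout()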

Each blade will end up with what appear to be physical NICs, which are in fact the vNICs defined in the template, as well as FCoE adapters to match the vHBA configuration.

Sounds like a lot of effort. Why bother?

It is a lot of effort up front, but once you’ve got your service templates, expanding the environment is quite amazing. This is particularly the case if you also use SAN boot rather than local disk. Have a hardware failure? Just reassign the service profile to another blade in the environment. The server will reboot and be operational with ALL hardware configuration identical.

Most other blade environments will allow you to swap out a blade, with the new blade taking on the same FC and MAC addresses, but this goes so much further. It also saves a trip to the data centre, as you can move the configuration to a new slot rather than having to replace the server in the same slot.
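
If you do want to pin a profile to a particular replacement blade rather than letting the pool decide, the move can be scripted too. This sketch assumes the lsBinding child object is how a profile gets tied to specific hardware; the profile and blade DNs are placeholders.

    # Rough sketch: bind an existing service profile to a different blade after a
    # hardware failure. UCS then pushes the identical identity (MACs, WWPNs, UUID,
    # BIOS and boot policy) onto the new hardware.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.mometa.ls.LsBinding import LsBinding

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    binding = LsBinding(parent_mo_or_dn="org-root/ls-HV-HOST-2",  # placeholder profile DN
                        pn_dn="sys/chassis-2/blade-5")            # placeholder blade DN
    handle.add_mo(binding, modify_present=True)
    handle.commit()
    handle.logout()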

Need to install a new chassis? Connect four cables and power, discover the chassis, potentially upgrade firmware and then add the new servers to the existing pools. Deploy eight new servers with the existing service templates. Total time: stuff all.
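
Checking what the fabric interconnects have actually discovered is just a query. Again, only the UCSM address and credentials are placeholders here.

    # Rough sketch: list discovered chassis and blades after cabling up new hardware.
    from ucsmsdk.ucshandle import UcsHandle

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    for chassis in handle.query_classid("EquipmentChassis"):
        print(chassis.dn, chassis.model, chassis.serial, chassis.oper_state)

    for blade in handle.query_classid("ComputeBlade"):
        print(blade.dn, blade.model, blade.num_of_cpus, blade.total_memory)

    handle.logout()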

Throw in the IPMI integration with VMM and you can deploy a new bare metal Hyper-V environment in no time at all.

Need to install a new network card? Sure, that’s virtual. Change the service template and trigger a service profile update, and all associated servers will now have the new vNIC.
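
As a sketch of what that template change might look like in ucsmsdk: the new vNIC is just a child object added to the updating service profile template, and every bound profile picks it up. The template, vNIC template and adapter policy names are placeholders, and the VnicEther attribute names are from memory, so verify before use.

    # Rough sketch: add another vNIC to an updating service profile template.
    from ucsmsdk.ucshandle import UcsHandle
    from ucsmsdk.mometa.vnic.VnicEther import VnicEther

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    vnic = VnicEther(parent_mo_or_dn="org-root/ls-HyperV-Host-Template",  # placeholder
                     name="vNIC-LiveMig",
                     nw_templ_name="LiveMig-A",        # existing vNIC template (placeholder)
                     adaptor_profile_name="Windows")   # existing adapter policy (placeholder)
    handle.add_mo(vnic, modify_present=True)
    handle.commit()
    handle.logout()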

What are the limitations?

As so many Facebook relationship statuses say: it’s complicated. Particularly when setting it up for the first time, you are almost guaranteed to be left scratching your head asking why a template just refuses to deploy. Unfortunately the error messages can be a little vague too, with “not enough compute” and “not enough vNIC/vHBA” errors plaguing me during my deployment.

This is definitely not a unit that you quickly install and have operational in a morning; the physical installation is just the start of the deployment process.

The Cisco environment really wants you to let go of where servers are physically located, which can be quite counter-intuitive. If you are a bit too obsessive-compulsive for this chaos then you can manually deploy each server to a service template and manually assign a name, but you just know that someone at Cisco is shedding a tear.

You also have to understand just how much control you are handing over to the Cisco management environment. If you are deploying Hyper-V then you will go looking for how to configure jumbo frames on the physical network adapters. The problem is you just won’t find the setting on the physical adapters, because it’s configured on the vNIC template in the UCS management interface.
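
For example, bumping the MTU for jumbo frames is a one-attribute change on the vNIC template rather than anything you do in Windows. The template DN below is a placeholder, and the lan-conn-templ naming is my assumption of how UCS builds that DN.

    # Rough sketch: set jumbo frames on a vNIC template; bound service profiles
    # inherit the new MTU.
    from ucsmsdk.ucshandle import UcsHandle

    handle = UcsHandle("ucsm.example.local", "admin", "password")  # placeholder UCSM VIP
    handle.login()

    templ = handle.query_dn("org-root/lan-conn-templ-SMB-Storage-A")  # placeholder template
    if templ:
        templ.mtu = "9000"  # jumbo frames for the storage / live migration vNICs
        handle.set_mo(templ)
        handle.commit()

    handle.logout()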

There are also still some rough edges in the environment. While vSphere 6.5 supports UEFI boot with secure boot, this just wouldn’t work for me and ultimately had to be disabled. This was documented as a bug in the release that was current at the time.

Is it worth the effort?

As always, it depends. If you just want a quick build for a static environment then this may not be for you. It’s fancy, but if the steep learning curve delays the deployment and then it’s never used again, it’s a bit of a waste.

I actually really like this hardware for environments that are experiencing change or growth. Everything can be standardised while still allowing for huge growth. No longer will you have five different chassis configurations depending on the engineer assigned to the build.
