Migrate Windows Failover Clusters Between Domains

There are numerous tools for migrating servers between domains, but what happens when you have invested in Windows Failover Clusters? If you're right up to date with Windows Server, Microsoft can now help you out with a new domain migration process, documented at https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-domain-migration. That will help some people, but everyone else is left with two options: either first upgrade to Windows Server 2019, or do it the manual way.

In Microsoft’s words the manual way “involves destroying the cluster and rebuilding it in the new domain.”

OK, so that's not as bad as it sounds at first. As long as you have a process and document a few key things, it should go reasonably smoothly.

The not so bad process

The migration process involves stopping all clustered roles and removing them from the cluster, which then allows the cluster to be destroyed. Once you're at that point you can migrate the individual servers to the new domain. Finally, you re-create the cluster in the new domain and set up all of the resources again.

This might sound onerous, and it may be depending on the type of Windows Failover Cluster that you have, but some workloads are easier than others. For instance, if you're running a Hyper-V cluster you can easily remove all VMs so that they are no longer clustered. If, on the other hand, this is a SQL cluster, that's a little more difficult. Let's look at the example of a Hyper-V cluster to see how the process works.

Now, Failover Clusters are real special snowflakes, with no two alike, so first you will need to sit down and get to know your particular cluster.

You will also need to think about other aspects: Is it managed by VMM? What types of storage do you have? What type of quorum do you have? Which features are you using? Start by documenting it all if you haven't already.

Minimum Windows Failover Cluster Documentation

As a cheat sheet this is the minimum information that you will need to migrate a Windows Failover Cluster between Domains.

  • Cluster Name

  • Cluster IP

  • Cluster Quorum Type

  • Cluster Networks (Include name and IP ranges)

  • Cluster Quorum Disk/Location

  • List all Cluster Disks (Include name, disk number, and disk letter or mount point)

  • Of particular note, be very aware of the cluster disk mount locations. This is something that is easy to forget about but hard to discover after the fact. If you get this wrong, the new cluster won't be able to find the VM information or components, leaving VMs in an Off-Critical state.
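Most of this cheat sheet can be captured with the FailoverClusters PowerShell module before anything is destroyed. A minimal sketch (run it on any cluster node; exact output formatting is up to you):

```powershell
# Sketch: capture the cheat-sheet details before destroying the cluster.
# Run on a cluster node with the FailoverClusters module installed.
Import-Module FailoverClusters

Get-Cluster | Format-List Name, Domain                      # Cluster name
Get-ClusterQuorum | Format-List QuorumResource              # Quorum disk/witness
Get-ClusterNetwork | Format-Table Name, Address, AddressMask, Role
Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk" |
    Format-Table Name, OwnerGroup, State                    # Cluster disks
Get-ClusterSharedVolume | Select-Object Name,
    @{ n = "MountPoint"; e = { $_.SharedVolumeInfo.FriendlyVolumeName } }  # CSV mount paths
```

Save the output somewhere off the cluster; once the cluster is destroyed this information is gone.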

    Migration Time!!

    Once you have this all documented you can move on to the actual migration. Below are the minimum steps that will need to be performed:

    1) Disable the computer account of the cluster in the destination domain if it exists
    2) Stop and delete any cluster specific services (Replication Brokers etc)
    3) Shut down all VMs
    4) Set VM startup to manual
    5) Remove all VMs from the cluster, first checking that they are all located on a single node (this removes the VMs from the cluster; it does not delete the actual VMs)
    6) Move all disks to a single node
    7) Evict all nodes except one from the cluster
    8) Remove all CSV disks
    9) Destroy the cluster
    10) Change DNS on all cluster nodes to use the new domain’s DCs
    11) Change the domain membership of all cluster nodes
    12) Create new cluster on one node
    13) Run the cluster validation
    14) Assign the old cluster name and IP address to the new cluster
    15) Add Quorum Disk and configure quorum settings
    16) Add additional disks in the correct order so that the CSV volume names are correct. Correct manually if required.
    17) Check that Hyper-V machines are now Off and not Off-Critical
    18) Create the Quorum disk for the cluster
    19) Add the subsequent nodes to the cluster making sure to run the cluster validation again
    20) Test moving the cluster name between nodes
    21) Test moving all disks between nodes
    22) Import VMs
    23) Start VMs
    24) Configure Auto-start for VMs
    25) Test moving VMs between nodes
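The cluster-side steps above map onto a handful of FailoverClusters cmdlets. Here is a hedged sketch of steps 5, 8, 9 and 12-14 (the cluster name and IP address are placeholders for your documented values):

```powershell
# Hedged sketch of the key cluster-side steps; adjust to your environment.
Get-ClusterGroup | Where-Object GroupType -eq "VirtualMachine" |
    Remove-ClusterGroup -RemoveResources -Force       # Step 5: uncluster (not delete) the VMs

Get-ClusterSharedVolume | Remove-ClusterSharedVolume  # Step 8: remove CSV disks
Remove-Cluster -Force -CleanupAD                      # Step 9: destroy the cluster

# After the DNS and domain changes (steps 10-11), on the first node:
New-Cluster -Name "OldClusterName" -Node $env:COMPUTERNAME -StaticAddress "10.0.0.10"
Test-Cluster                                          # Step 13: cluster validation
```

The remaining steps (disks, quorum, re-adding nodes, VM import) are best done deliberately and one at a time against your documentation.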

    In practice I’ve scheduled a total cluster outage of two hours for this type of migration but remember that your special snowflake may require more time.

    VMM bare metal build fails due to no matching logical network

    If you deploy a new VMM bare metal build environment you may face an issue where the deployment fails with Error (21219).

    The error description doesn't appear to make a lot of sense either, stating:

    IP or subnet doesn’t match to specified logical network.

    Recommended Action
    Specify matching logical network and IP address or subnet.

    If you face this, then you’ve likely checked and double checked all of your VMM networking and everything looks fine.

    What’s happening?

    This issue is caused by the build process looking up the IP address that you have allocated to the new server and comparing it to the logical network. When it does this, it finds that the IP address doesn't match the subnet configured on the logical network.

    But you’ve checked this already and it DOES. You checked it again now just in case and it still does, so this can’t be your problem right!?!?!

    Well no.

    You see, VMM appears to be looking not just at the logical network for a match, but specifically at the first network site that was created on that logical network. In most cases you created the management network site first, so no big deal. But if you didn't create it first, or you deleted and recreated it for some reason, then it will no longer be the first created site.

    You’ve got to be kidding me. But it’s easy to fix?

    You would think so but notice that there are no re-order buttons on the subnets?

    This means that the only way to "reorder" them is to delete all of the network sites and recreate them. And if you have created any VM networks, or bound the sites to any other configuration object, then you'll be even happier to know you will need to undo all of that configuration too.

    Hopefully you’re deploying a new cluster and not deciding to deploy bare metal build to an existing one.

    In case you think you must have missed this somewhere, it isn’t stated in the documentation. So is it a bug or a feature?

    Either way just remember to create the host management subnet first in future.
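There is no supported way I know of to query the creation order, but you can at least list the network sites on a logical network to see what you're dealing with. A hedged sketch using the VMM cmdlets ("Management-LN" is a placeholder logical network name):

```powershell
# Hedged sketch: list the network sites (logical network definitions) and
# their subnet/VLAN pairs. "Management-LN" is a placeholder name.
$ln = Get-SCLogicalNetwork -Name "Management-LN"
Get-SCLogicalNetworkDefinition -LogicalNetwork $ln |
    Format-Table Name, SubnetVLans
```

If the management subnet isn't the site you created first, plan for the delete-and-recreate exercise described above.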

    Unable to delete Hyper-V Host in VMM due to SQL statement failure

    I had an odd failure when deleting a Hyper-V server from VMM 2016. The job failed with a very generic Error (20413).

    So the next step was to check the log file, which gave me an unexpected error.

    ------------- Error Report -------------

    Error report created 4/26/2018 7:29:26 AM
    CLR is not terminating

    --------------- Bucketing Parameters ---------------


    SCVMM Version=4.0.2244.0
    SCVMM flavor=C-buddy-RTL-AMD64
    Default Assembly Version=4.0.2244.0
    Executable Name=vmmservice.exe
    Executable Version=4.0.2244.0
    Base Exception Target Site=140717336435616
    Base Exception Assembly name=System.Data.dll
    Base Exception Method Name=System.Data.SqlClient.SqlConnection.OnError
    Exception Message=Unable to connect to the VMM database because of a general database failure.
    Ensure that the SQL Server is running and configured correctly, then try the operation again.
    Build bit-size=64

    Great!! "The service can't talk to SQL," I thought, but this message was a little deceiving and the next section was actually more important.

    ------------ exceptionObject.ToString() ------------

    Microsoft.VirtualManager.DB.CarmineSqlException: Unable to connect to the VMM database because of a general database failure.
    Ensure that the SQL Server is running and configured correctly, then try the operation again. —> System.Data.SqlClient.SqlException: The DELETE statement conflicted with the SAME TABLE REFERENCE constraint “FK_tbl_WLC_VHD_VHD”. The conflict occurred
    The statement has been terminated.

    Again they bury the lead. The first part again goes on about not being able to talk to SQL, but then they give you the actual issue: the DELETE statement conflicted with the SAME TABLE REFERENCE constraint "FK_tbl_WLC_VHD_VHD", and the statement was terminated.

    When VMM tries to delete the server it hits the "FK_tbl_WLC_VHD_VHD" foreign key constraint, which still references rows related to the host. This is blocking the deletion of the server object.

    I found some mentions that this may be due to the server belonging to a cluster, which it had, and that VMM may take some time to clean up the references. Well, this server had been removed from the cluster almost 12 hours earlier, so I doubted that waiting longer would do it and decided to clean up the table myself.

    This appeared to be caused by some orphaned objects that were still recorded in the database as being present on the host, even though they were long gone. These existed in the tbl_WLC_PhysicalObject table.

    VMM uses GUIDs to refer to objects in the database, so I first needed to get the GUID for the server, which could then be used to target these entries. This was simple with PowerShell.

    (Get-SCVMHost "Hyper-V-Server-Name").ID

    We then pick up the GUID and insert it into the following SQL query after a quick DB backup.

    DELETE FROM [tbl_WLC_PhysicalObject] WHERE [HostId] = 'VM-Host-GUID'
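    If you'd like to see exactly which rows will go before running that delete, the same GUID can drive a SELECT. A hedged sketch via the SqlServer module (the instance, database, and host names below are placeholders):

```powershell
# Hedged sketch: inspect the orphaned rows before deleting them.
# Instance, database and host names below are placeholders.
$hostId = (Get-SCVMHost "Hyper-V-Server-Name").ID
Invoke-Sqlcmd -ServerInstance "VMM-SQL-Server" -Database "VirtualManagerDB" `
    -Query "SELECT * FROM tbl_WLC_PhysicalObject WHERE HostId = '$hostId'"
```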

    Finally, back to VMM PowerShell to delete the Hyper-V server again. My Hyper-V server was already off the network, so I used -Force to just remove the database references.

    Remove-VMHost "Hyper-V-Server-Name" -Force

    This time the job succeeded.

    VMM Hates SAN Groups Or How To Kill Your Cluster

    A really nice feature of VMM is that you can integrate it with any SAN that has an SMI-S interface and then perform storage tasks, such as adding disks or even deploying VMs based on SAN snapshots. In fact, once you set up an SMI-S SAN, many standard tasks are updated to include SAN activities. This is where things start to go off the rails.

    You see, most SANs use groups to manage access to LUNs. That way, as you add a LUN you only have to add it to a single group and all of the servers in the group can see it.

    Well, VMM doesn't work this way. It thinks in terms of servers. You'll see this if you add a new LUN from VMM: it maps each server to the LUN rather than adding it to any existing group. That's fine, you might think, but things get nasty when you try to remove a server's access.

    You see, VMM may not add servers to groups, but it absolutely knows enough about them to do some serious damage. If you remove a server from a cluster, part of the job is to remove the cluster disk access. This will not only remove any access published directly to the server but also remove any groups that the server is a member of. The side effect is that every other server in those SAN groups loses its disk access too, effectively removing all SAN disks from all cluster nodes.

    I first saw this with a SAN that I had never used before and just thought it might be a bug in that vendor's SMI-S implementation, but I have recently seen the same behaviour with a totally different vendor.

    So in short, groups make a heap of sense from the SAN point of view, but if you are going to use SMI-S with VMM then ONLY assign individual servers to the LUNs.

    VMM Bare Metal Builds and why you should use a Native vLAN

    VMM Bare Metal Builds are an amazing way to ensure that your Hyper-V servers start out consistent. It’s a bit magical but part of that process just works better when you use a native VLAN. But why is that the case?

    First let’s look at the VMM Bare Metal Build process.

    1. The VMM Server connects to the hardware management interface and instructs the server to reset. This is immediate, and if you specified the wrong hardware management address, well, congratulations, you just rebooted a server.
    2. The new server being rebuilt goes through its boot process. Hopefully you have it configured to PXE boot. It will get a DHCP address and then request a PXE server to respond.
    3. The WDS server receives the PXE boot request and checks with the VMM server to see whether the request is authorised. If it is, it responds to the request and sends the WinPE image.
    4. The new server loads the WinPE operating system and connects to the network. This is a brand new network connection and is in no way connected to the PXE boot; you've just booted into an OS, after all.
    5. The new server runs the VMM scripts to discover the hardware inventory and then sends this to the VMM server.
    6. Once the admin inputs the required information (New server name and possibly network information) the new server begins the build process by cleaning the specified disk and downloading the VHDX image.
    7. The new server then reboots. This time the server is not authorised to PXE boot so proceeds to boot off the new VHDX boot image.
    8. The new server then customises the sysprepped operating system, including any static IP address you provided, and performs any additional customisation required by the VMM build process (i.e. adding the Hyper-V and MPIO features and installing the VMM agent).
    9. You should now be left with a server on the network using the configured network settings.

    There are a few things to note here. Each time the server uses PXE or boots into WinPE it's reliant on finding a DHCP server. If you're using tagged, port-channel network connections, and very few people are not these days, then how is this request going to work? The server needs to know which vLAN to tag the request with.

    Now, you can configure most servers in the BIOS to PXE boot with vLAN tagging, and that's great. Now you have your WinPE image; how does WinPE know about the port-channel? That will depend on the NIC driver for your server. Is it even possible to modify it so that, when the driver is loaded, it automatically uses vLAN tagging with the correct vLAN ID? It's possible, but it's something else that needs to be managed. If VMM updates the WinPE image then you need to reconfigure it again.

    Next, when you boot off the VHDX, this also needs to be configured with the correct vLAN ID. I have to admit I have never got to this stage, since the NIC driver in WinPE has always been a blocker for me, but is VMM able to set the correct vLAN ID? You absolutely need to tell VMM which network switch and which logical network to use, but does that mean it will set the vLAN ID correctly? If it doesn't, then this is again another blocker.

    So, as you can see, it may be possible to use vLAN tagging throughout the VMM Bare Metal Build process, but sometimes you need to ask whether it's worth the additional overhead. From managing the server BIOS, to the WinPE drivers and configuration, to the OS customisation, there's a lot going on in this process, and everything needs to work perfectly to result in a fully built server. Is it worth the additional overhead just to avoid setting a network as the native vLAN?

    Windows Core Hyper-V Setup Using PowerShell

    In a previous post I gave some sample PowerShell commands to get a Windows Core server configured with the Hyper-V role and with some base networking. Let's have a look at that script and what it does.

    Install-WindowsFeature -Name Hyper-V, Data-Center-Bridging, Failover-Clustering, Multipath-IO, Hyper-V-PowerShell, RSAT-Clustering-PowerShell, RSAT-Clustering-CmdInterface, RSAT-DataCenterBridging-LLDP-Tools

    First up, we need to install the features the server requires. Notice that we also install the PowerShell management tools; we can't do much locally without them. Yes, you can absolutely get away with running all commands remotely, but there are some changes, like networking, that you might still want to make locally.

    New-NetLbfoTeam -Name "Switch1" -TeamMembers "vNIC1", "vNIC2" -LoadBalancingAlgorithm HyperVPort

    Next we're going to create a Load Balancing and Failover (LBFO) network team. This is the older-style Windows 2008/2012 network team; you could change this to the newer style of team if you really want to.

    New-VMSwitch -Name "VMSwitch1" -NetAdapterName "Switch1"

    This part is easy. We need to create a Hyper-V switch which will be connected to the network team we created in the previous step.

    Add-VMNetworkAdapter -Name "HV-Mgmt" -SwitchName "VMSwitch1" -ManagementOS
    Add-VMNetworkAdapter -Name "HV-CSV" -SwitchName "VMSwitch1" -ManagementOS
    Add-VMNetworkAdapter -Name "HV-LM" -SwitchName "VMSwitch1" -ManagementOS

    Now we can create some virtual network adapters for the Hyper-V host to use. In this case we have a vNIC each for Management, CSV Disk Management, and Live Migration. These adapters are all virtually plugged in to our virtual switch.

    Set-VMNetworkAdapterVlan -VMNetworkAdapterName "HV-CSV" -VlanId 2 -Access -ManagementOS
    Set-VMNetworkAdapterVlan -VMNetworkAdapterName "HV-LM" -VlanId 3 -Access -ManagementOS

    We don't have these three separate network adapters just for the sake of it; they need to be on different networks to isolate the traffic. So here we configure them with different VLAN IDs. These VLANs need to have been configured on the network switch that the Hyper-V server plugs in to.

    So why don't we have a VLAN ID for the management vNIC? Well, you really want to be able to perform bare metal builds of the Hyper-V servers using VMM, and while it's possible to do this with VLAN tagging on the management adapter, it's far easier without it. By setting the management network as the native VLAN on the Hyper-V server's switch port, any untagged traffic will be put into the Hyper-V Management VLAN. This allows the server to PXE boot and load the WinPE environment without using a VLAN ID. The other side of this is that once you are in Windows you still don't set the VLAN ID on the management vNIC. Just leave it blank.
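    If you want to make that intent explicit rather than relying on the default, the management vNIC can be marked as untagged with the standard Hyper-V cmdlet:

```powershell
# Explicitly put the management vNIC on the native (untagged) VLAN.
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "HV-Mgmt" -Untagged
```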

    New-VMSwitch -Name "VM-Switch2" -NetAdapterName "vNIC3","vNIC4" -EnableEmbeddedTeaming $true

    Since we want to be fancy and use the new Windows 2016 Switch Embedded Teaming for the VM networks, the next team is created a different way. We don't need to create the network team first; it's all managed within Hyper-V networking.

    Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet" | Set-NetAdapterAdvancedProperty -RegistryValue "9014"

    Almost at the end now. Hyper-V sees significant performance increases when jumbo frames are enabled. This is particularly noticeable when machines are migrated between hosts, but it also helps any other large network transfer. The problem is that all new network adapters, including the ones we created above, default to having jumbo frames disabled. Turn them on whenever possible. In fact, keep checking that they are still turned on. It's a simple change which results in huge performance benefits.
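    Checking that jumbo frames are still on is easy to script. A small sketch that lists any adapter that has drifted from the expected value:

```powershell
# List adapters whose Jumbo Packet setting is not the expected 9014 bytes.
Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet" |
    Where-Object RegistryValue -ne "9014" |
    Format-Table Name, DisplayName, DisplayValue
```

No output means every adapter is still configured as expected.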

    mpclaim -r -i -a “”

    Finally, if you are using a SAN you'll likely have multiple pathways and require MPIO to be enabled. If you don't enable it, you'll see multiple copies of the same disk and yet will only be using a single path. MPCLAIM will discover any MPIO devices and then reboot the server to enable the configuration.

    Now all you need to do is use sconfig to set the IP address for your new vNICs, change your server name and join the domain. Then you can use all your normal tools remotely.
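    If you'd rather script that last step too, the same settings can be applied with built-in cmdlets. A hedged sketch (the interface alias, addresses, server name and domain below are all placeholders):

```powershell
# Hedged sketch: set the management vNIC IP, DNS, server name and domain join
# from PowerShell instead of sconfig. All values below are placeholders.
New-NetIPAddress -InterfaceAlias "vEthernet (HV-Mgmt)" -IPAddress "10.0.0.21" `
    -PrefixLength 24 -DefaultGateway "10.0.0.1"
Set-DnsClientServerAddress -InterfaceAlias "vEthernet (HV-Mgmt)" -ServerAddresses "10.0.0.5"
Rename-Computer -NewName "HV-Node1"
Add-Computer -DomainName "corp.example.com" -Restart
```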

    Windows Core isn’t so scary after all.


    Windows Server Core – Is it worth the hassle?

    It’s been around for a long time now but how many environments are actually using Windows Server Core? It appears that it’s something that everyone knows they should be using but no one really wants to commit to.

    Now Microsoft has made life harder with Windows Server 2016 by removing the ability to add and remove the GUI, meaning you need to commit up front.

    So should you commit or run to the safety of the desktop experience? As expected that will depend.

    Most servers come with ample memory and CPU to run the Desktop Experience, so there is really little requirement to run Core. But if you want to squeeze that little bit more out of your servers, then maybe it's worth looking further. What else do you need to think about before taking the plunge?

    What do you need the server for?

    There are still a lot of services that rely on the desktop experience to work, and I'm not just talking about Remote Desktop Services. Some print services will still want the desktop, for instance, and there are many application servers that will still want it too.

    If you're looking at a Hyper-V server, then you just know that Microsoft wants you to install Core on it. Feeling guilty for hovering over the desktop experience option yet?

    What is the driver support like?

    This might sound like a strange question, but one of the limitations of Core is around driver management. Device Manager is available remotely, but only in read-only mode.

    You can install, remove, and update drivers using the Core command line, but what if you want to modify driver settings? Hopefully there's a registry key or a configuration file, because it's not guaranteed that the vendor's device management utility will run on Core. Sometimes it will, but…

    You may think that this isn't really an issue, but just think about when you need to tweak network driver settings. Some of these, like Jumbo Frames, are accessible using PowerShell, but not all.

    How often do you need to log on to the server anyway?

    This question needs to be answered in multiple ways. First, what sort of admin staff are there? Do they install as much as they can on their own admin workstations or jump-boxes and do all their admin remotely, or is everything done on the server itself? Do they have an aversion to PowerShell and all other command lines? If you try to force Core on the wrong staff, it's not going to end well.

    The second part of this question is how often you need to use the server. Let's face it, a nice GUI is quite comforting, and if you need to do some manual task on the server every day then you just know you're going to be happier with a full desktop experience. But stop, you say. Why aren't you automating your daily task? If you are, then maybe you're ready for Core after all.

    You’re taking the plunge. How bad is it really?

    That all depends. How do you feel about seeing this when you log on?

    [Screenshot: the Windows Core desktop]

    If you’re a little concerned then let’s make you feel better.

    [Screenshot: the Windows Core desktop with PowerShell]

    So much better, right? As long as you get the drivers sorted, the actual setup isn't too bad now. Remember the bad old days of running VBScript and weird command lines that no one could ever remember just to get a server up and running? Well, they've all gone. Now just run sconfig and you're presented with a fairly user friendly, albeit ASCII, menu.

    [Screenshot: the sconfig menu]

    That will get you over the initial setup, but what about if you need to tweak things? Please don't tell me I have to use REG command lines to edit the registry!!!

    Well, no you don't, and this is the not so dirty little secret about Core. It might be desktop-experience-less, but that doesn't mean GUI-less.

    [Screenshot: a GUI tool running on Windows Core]

    In fact, the mistake that almost everyone makes is exiting the command prompt thinking it will log them off. Nope, that's not how it's done; you're suddenly left with a session with no interface. Don't worry though, just use CTRL-ALT-DEL and you get the ASCII version of the LoginUI.

    [Screenshot: the Windows Core LoginUI]

    From here you can bring up the GUI version of Task Manager and re-run Explorer.

    [Screenshot: Task Manager on Windows Core]

    As you can see, you may find that even if you do have some GUI tool that needs to run on the server, it might still be fine. After all, it is still Windows, just with a little less than normal.

    Are there any advantages to offset the hassle of running Core? Absolutely. One major advantage is that the server will be left alone to just do what it’s intended to do. Admins won’t be logging on to the server and using browsers or other programs built in to the Desktop Experience.

    The installation footprint will be significantly smaller, which means not only lower disk space requirements but also less patching. Yes, Microsoft is now using cumulative updates, so you're likely still going to be patching monthly, but the time to install these updates and the potential impact will be smaller.

    Boot time is also incredibly quick since it has so little to load on boot.

    Finally, it will change how you look at managing a server. It's so easy to just have a manual process you follow for deploying a server, but wouldn't it be nice to instead have a scripted installer? How often do you find a small typo resulting in a slightly different configuration between servers? Core makes it so easy to create scripts for everything you do, which can then be reused in the future.

    So as an example of this say you want to install a Hyper-V server. How hard is it to get a base level using a script? Well here’s a basic script to get you going.

    Install-WindowsFeature -Name Hyper-V, Data-Center-Bridging, Failover-Clustering, Multipath-IO, Hyper-V-PowerShell, RSAT-Clustering-PowerShell, RSAT-Clustering-CmdInterface, RSAT-DataCenterBridging-LLDP-Tools

    New-NetLbfoTeam -Name "Switch1" -TeamMembers "vNIC1", "vNIC2" -LoadBalancingAlgorithm HyperVPort

    New-VMSwitch -Name "Switch1" -NetAdapterName "Switch1"

    Add-VMNetworkAdapter -Name "HV-Mgmt" -SwitchName "Switch1" -ManagementOS
    Add-VMNetworkAdapter -Name "HV-CSV" -SwitchName "Switch1" -ManagementOS
    Add-VMNetworkAdapter -Name "HV-LM" -SwitchName "Switch1" -ManagementOS

    Set-VMNetworkAdapterVlan -VMNetworkAdapterName "HV-CSV" -VlanId 2 -Access -ManagementOS
    Set-VMNetworkAdapterVlan -VMNetworkAdapterName "HV-LM" -VlanId 3 -Access -ManagementOS

    New-VMSwitch -Name "VM-Switch2" -NetAdapterName "vNIC3","vNIC4" -EnableEmbeddedTeaming $true

    Get-NetAdapterAdvancedProperty -DisplayName "Jumbo Packet" | Set-NetAdapterAdvancedProperty -RegistryValue "9014"

    mpclaim -r -i -a ""

    Simple right? So what does it all do? I’ll go through it in the next post.