Blog

Deploy a Test Microsoft 365 Tenancy

Until recently it was easy to set up a lab to test new technologies, thanks to how simple it is to deploy VMs to run test workloads. But what do you do when these services become cloud services? Most companies aren’t keen on the idea of paying for a sandbox environment, even when there is a quantifiable value to having it.

Microsoft have long provided a solution for this through MSDN if you are a developer, but they have now opened up access to their cloud services through developer.microsoft.com.

You can join their Dev Program at Developer Program – Microsoft 365. This allows you to deploy a new tenancy and includes 25 E5 licenses (just without Windows and PSTN services). These licenses expire every 90 days but at this stage they can be renewed, so it is an amazing deal.

These tenancies originally expired after 90 days which was good, but the change to allow renewals is outstanding.

Just remember that since you are the admin of this environment, it still needs to be secured.

Also don’t get too attached. Microsoft could always change their mind and go back to non-renewable tenancies.

Migrating from Hybrid to Native Microsoft Cloud – Overview

While Microsoft would love businesses to consume their services as a native cloud service, most environments operate in a hybrid mode. What does this mean and when would you run in this mode?

How did we get here?

If we look back, most business systems were located on-premises on hardware running inside the company offices. When the Microsoft cloud was first released it was essentially Microsoft running their own server software. Few customers were able to migrate all of their services into the cloud, so Microsoft used Hybrid services to introduce customers to the cloud.

In this mode selected users could use Cloud Services while other users would use on-premises services.

Every Hybrid Cloud Service has a prerequisite of deploying Azure AD (AAD) in hybrid mode. You do this by installing Azure AD Connect on an Active Directory (AD) joined server. This replicates your on-premises AD users into AAD.

Azure AD Connect environment
Azure AD Connect provides the glue between AD and AAD identity providers

Note that the user accounts are replicated. They are not the same account. We either use single sign on, or password replication to make it appear to the user that they are using the same account.

Over time as more users are migrated to the Microsoft Cloud there has been less reliance on the on-premises services. But what is the process to move from a Hybrid to a Native Cloud environment?

Audit your environment

First you need to look at what services you are still using on-premises. Let’s assume that you’ve migrated all of your services to the cloud AND removed the old on-premises services (Exchange, SharePoint, Skype for Business, and Configuration Manager). You’ll likely still be left with AD.

In this case you’ll likely still be creating users in Active Directory and syncing them to AAD if for no other reason than to make sure that you can add them to your AD based groups.

Are your computers still joined to the on-premises Active Directory, or have they been moved to Azure Active Directory? Do you still use any services located in the old Active Directory forest, on file servers for instance? Do you use Password Hash or Federated Authentication?
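As a quick way to answer the first question on an individual machine, the built-in dsregcmd tool reports how the device is joined. This is only a sketch for spot checks, assuming Windows 10 or later. It is not a way to audit a whole fleet.

```powershell
# Run on the workstation being checked (Windows 10 or later).
# AzureAdJoined = YES means the device is joined to AAD,
# DomainJoined  = YES means it is joined to on-premises AD.
dsregcmd /status | Select-String -Pattern 'AzureAdJoined', 'DomainJoined'
```

A device showing YES for both is hybrid joined and still depends on the on-premises directory.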

Once you understand what dependencies you still have on the on-premises environment you can work to remove them.

Migrate your Authentication

If you are still using Federated Authentication then your internal AD is handling every authentication request for your AD users. This will also stop you from having any native cloud users with the same domain name as your hybrid users.

This should be one of the first services you migrate. Once completed all Office 365 login requests will be authenticated in AAD rather than being forwarded to AD.
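If you aren’t sure which of your domains are still federated, you can list the authentication type of each one. The sketch below assumes the Microsoft Graph PowerShell module, which isn’t mentioned above, is installed and that your account is allowed to read domain settings.

```powershell
# Assumption: the Microsoft.Graph PowerShell module is installed.
Connect-MgGraph -Scopes 'Domain.Read.All'

# AuthenticationType is reported as 'Managed' or 'Federated' for each domain.
Get-MgDomain |
    Select-Object Id, AuthenticationType, IsDefault |
    Sort-Object AuthenticationType, Id
```

Any domain still showing Federated is still sending its login requests back to AD.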

Migrate users to Azure AD Devices

Even when your account is AD homed you can still log on using AAD if the computer is natively joined to AAD. Once the machine is removed from AD and added to AAD users will log on using their AAD account. Remember that the accounts are separate, so if you log on to an AAD device then you are logging on using your AAD account. If you log on to an AD device then you are using your AD account.

Same user logging in to different devices
The same user uses a different identity provider by using a different machine

Unfortunately the tools to make this migration are quite limited at this time. If you need to automate this process then Binary Tree Power365 currently should help migrate machines from AD to AAD.

Migrate all non-user AD objects

Before you can move your user objects you will need to move any groups that they are a member of. If you move the account first then the user would no longer be a member, as the group would still be owned by AD.

In Office 365 there are different types of groups which are migrated in different ways. If a group is a distribution group then this migration will need to be performed in Exchange Online. If on the other hand it’s a security group then this is owned by Azure AD.

It might also be worth cleaning up the rest of the directory at this stage. Remember that contacts also need to be moved but you may also have other objects that need to be cleaned up. While you can see the source of most objects in the Azure AD portal it is worth using the Azure AD Connect service to see exactly what is being sent to AAD.

Migrate user accounts

Once your user accounts are finally ready to move you face a problem. Microsoft currently doesn’t have a process to move a user from AD to AAD. Instead you need to stop the user from syncing which will delete the user account in AAD. Once the account is deleted you can then recover it from the AAD recycle bin complete with all data. This will then change the account to be a native cloud account.

Flip the Native Cloud Switch

Once all of your users have been migrated to be native cloud you can then disable the Sync between AD and AAD.

While this will feel like a substantial change this won’t impact your users at all as they will no longer be accessing any AD services.

Hopefully this gives you a good overview of the steps required to move from Hybrid to Native Cloud. In subsequent articles we’ll look at the details for each step and how you can minimise the user impact.

Migrate Windows Failover Clusters Between Domains

There are numerous tools that can be used for migrating servers between domains but what happens when you have invested in Windows Failover Clusters? If you’re right up to date with Windows Server then Microsoft can now help you out with a new domain migration process at https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-domain-migration. This will likely help a couple of people but for everyone else you’re left with the options to either first upgrade to Windows Server 2019, or to do it the manual way.

In Microsoft’s words the manual way “involves destroying the cluster and rebuilding it in the new domain.”

OK so that’s not as bad as it sounds at first. As long as you have a process and make sure you document a few things it should go reasonably simply.

The not so bad process

The migration process involves shutting down all cluster processes and removing them from the cluster. This will then allow the cluster to be destroyed. Once you’re at that point you can migrate the individual servers to the new domain. Finally you re-create the cluster in the new domain and set up all resources again.

This might sound onerous, and it may be depending on the type of Windows Failover Cluster that you have, but some workloads are easier than others. For instance if you’re running a Hyper-V cluster then you can easily remove all VMs so that they are no longer clustered. If on the other hand this is a SQL cluster, well that’s a little more difficult. But let’s look at the example of a Hyper-V cluster to see how this process will be achieved.

Now Failover Clusters are a real special snowflake with no two being alike so first you will need to sit down and get to know your particular version.

You will also need to think about other features. Is it managed by VMM? What types of storage do you have? What type of Quorum do you have? What features are you using? Start by documenting it all if you don’t already have this.

Minimum Windows Failover Cluster Documentation

As a cheat sheet this is the minimum information that you will need to migrate a Windows Failover Cluster between Domains.

  • Cluster Name

  • Cluster IP

  • Cluster Quorum Type

  • Cluster Networks (Include name and IP ranges)

  • Cluster Quorum Disk/Location

  • List all Cluster Disks (Include Name, Disk number, disk letter or mount point)
  • Of particular note, be very aware of the cluster disk mount location. This is something that is easy to forget about but is hard to discover after the fact. If you get this wrong then the new cluster won’t be able to find the VM information or components, leading to an Off-Critical state.
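Most of this cheat sheet can be captured from one of the cluster nodes before you start. The following is a minimal sketch using the FailoverClusters PowerShell module. The selected properties are the ones I would expect to need, so treat the output as a starting point for your documentation and not as a complete record.

```powershell
# Run on a cluster node; the FailoverClusters module is installed with the feature.
Import-Module FailoverClusters

# Cluster name and quorum configuration
Get-Cluster | Format-List Name, Domain
Get-ClusterQuorum | Format-List *

# Cluster IP address(es)
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'IP Address' } |
    Get-ClusterParameter -Name Address

# Cluster networks with their address ranges
Get-ClusterNetwork | Select-Object Name, Address, AddressMask, Role

# Cluster disks, and where each CSV is mounted
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'Physical Disk' } |
    Select-Object Name, OwnerNode, OwnerGroup, State
Get-ClusterSharedVolume | ForEach-Object {
    [pscustomobject]@{
        Name       = $_.Name
        MountPoint = $_.SharedVolumeInfo.FriendlyVolumeName
    }
}
```

Save the output somewhere off the cluster, since the cluster itself won’t exist part way through the migration.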

    Migration Time!!

    Once you have this all documented you can move on to the actual migration. Below are the minimum steps that will need to be performed:

    1) Disable the computer account of the cluster in the destination domain if it exists
    2) Stop and delete any cluster specific services (Replication Brokers etc)
    3) Shut down all VMs
    4) Set VM startup to manual
    5) Remove all VMs from the cluster (this only removes their clustered role and does not delete the actual VMs). Check that they are all located on a single node
    6) Move all disks to a single node
    7) Evict one node from the cluster
    8) Remove all CSV disks
    9) Destroy the cluster
    10) Change DNS on all cluster nodes to use the new domain’s DCs
    11) Change the domain membership of all cluster nodes
    12) Create new cluster on one node
    13) Run the cluster validation
    14) Insert old cluster name and IP address
    15) Add Quorum Disk and configure quorum settings
    16) Add additional disks in the correct order so that the CSV volume names are correct. Correct manually if required.
    17) Check that Hyper-V machines are now Off and not off-critical
    18) Create the Quorum disk for the cluster
    19) Add the subsequent nodes to the cluster making sure to run the cluster validation again
    20) Test moving the cluster name between nodes
    21) Test moving all disks between nodes
    22) Import VMs
    23) Start VMs
    24) Configure Auto-start for VMs
    25) Test moving VMs between nodes
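The rebuild half of this list (roughly steps 12 to 19) can be sketched with the FailoverClusters cmdlets. Every name and address below is hypothetical, and the -DiskWitness parameter assumes Windows Server 2012 R2 or later, so check each line against your own documentation before running anything.

```powershell
# Hypothetical names and addresses; replace with the values you documented earlier.
$clusterName = 'HVCLUSTER01'
$clusterIp   = '10.0.0.50'
$firstNode   = 'HV01'
$otherNodes  = 'HV02', 'HV03'

# Steps 12-14: create the cluster on one node, reusing the old name and IP, then validate.
New-Cluster -Name $clusterName -Node $firstNode -StaticAddress $clusterIp -NoStorage
Test-Cluster

# Steps 15-16: add the disks back, configure quorum, then convert the data disks to CSVs.
Get-ClusterAvailableDisk | Add-ClusterDisk
Set-ClusterQuorum -DiskWitness 'Cluster Disk 1'    # assumed name of the quorum disk
Add-ClusterSharedVolume -Name 'Cluster Disk 2'     # repeat per disk, in the documented order

# Step 19: add the remaining nodes and validate again.
$otherNodes | ForEach-Object { Add-ClusterNode -Name $_ }
Test-Cluster
```

Adding the disks one at a time in the original order is what keeps the CSV volume names matching the paths stored in the VM configuration.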

    In practice I’ve scheduled a total cluster outage of two hours for this type of migration but remember that your special snowflake may require more time.

    Exchange Server Throttled by Back Pressure Due to Internal Message

    You may experience issues resulting in mail failing to be delivered to internal users. This may be difficult to detect using the common tactics for Exchange management.

    In the situation experienced, an internal user had sent an email over 1.5GB in size. Normally this wouldn’t be a problem but due to a mis-configuration the internal receive connectors were set to accept messages up to 2047MB, which is also the maximum message size limit. This resulted in the Exchange service attempting to receive the message rather than generating an NDR response, which would have stopped the message delivery from being retried.

    This resulted in the Transport service being put under pressure and the incoming queue being throttled by back pressure, with no messages being accepted or delivered. If left alone the problem would not auto-resolve unlike a back pressure issue caused by legitimate email load.

    Exchange Mail Flow

    To explain how this happened we need to understand how the transport service works for delivering all email. Microsoft provide the following mail flow diagram to show how this works:

    Exchange Server 2016 Mail Flow Diagram

    No matter where an email originates it needs to traverse the transport service to be delivered. External messages are typically what an admin deals with. These come into the front-end service as an SMTP message. They are checked to see whether they are authorised before being transferred to the transport service and then forwarded into the mailbox delivery service for delivery.

    Messages sent by internal users will also transit the transport service but these take a slightly different route. The client does not send the message via SMTP but rather puts the message into the outbox folder for the mailbox. From here the message is picked up by the store driver submit process, which submits the message to the transport service. This then processes the message and sends it back to the store driver to deliver the message.

    Back Pressure

    An added complication to this process is back pressure detection.

    Exchange Server detects pressure on the transport service and starts to reduce the speed that messages are accepted to ensure that the server remains operational. Even though emails may be delivered at a slower rate the server remains operational in this situation.

    This back pressure is detected based on several metrics. The easiest way to see the current state of these metrics is to run the following PowerShell commands on the Exchange server (add -Server <enter-Exchange-server-name-here> to the first command to query a different server):

    [xml]$bp = Get-ExchangeDiagnosticInfo -Process EdgeTransport -Component ResourceThrottling
    $bp.Diagnostics.Components.ResourceThrottling.ResourceTracker.ResourceMeter

    Most of these counters are fairly self explanatory and relate to free disk and memory on the server but one that may be new to you is UsedVersionBuckets. This is the number of uncommitted message queue database transactions in memory. So what does this mean?

    When a message is being received the Exchange server will be receiving it into memory. Once the complete message is received then it can save it into the mail queue database. While the message is still in memory though the UsedVersionBuckets will increase. This can happen either when the server is receiving many small messages or a small number of very big messages.

    How large messages impact the transport service

    In this case a single very large message was causing this pressure. Every time the message was submitted by the Store Driver it would result in the UsedVersionBuckets soaring to over 3000. At this point the transport service stopped processing any messages into the mailbox submission queue and stalled. While the service could be restarted, once the message was resubmitted the same behaviour was repeated.

    Typical advice includes looking at the message queues to see how many messages are being received and how big they are, but in this case the queues could even be clear. The message hasn’t made it to the queue yet as it’s still being transferred by the store driver.

    In order to find this message you will need to perform a system wide search of the outbox folder of all mailboxes using the PowerShell command:

    Get-Mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -FolderScope Outbox | Where-Object { $_.ItemsInFolder -gt 0 } | Format-Table Identity,ItemsInFolder,FolderSize

    Once the offending message is found it needs to be removed from the client side to make sure that it doesn’t get resubmitted to the store driver. There is no way to do this from the server side.

    Windows Server 2012 R2 fails to install .Net Framework 3.5

    .Net Framework 3.5 is getting old now and really shouldn’t be installed unless it’s required but if you do need this it’s now a real pain to get installed.

    Almost everyone that has had the pleasure of trying to install this feature on Windows 2012 R2 will have found this Microsoft article which basically says that if you’ve already installed KB2966827 or KB2966828 the installation will fail. The fix is to remove them before trying the installation again.

    Well these patches came out in 2014 and there doesn’t appear to be any update to this guidance. If you look at most new servers you won’t see these patches but just try to install .Net Framework 3.5. It didn’t go very well did it?

    When you run the installation it’s trying to find newer file versions which aren’t present on the original source media. It needs these newer versions due to other updates that have been installed. Of course since this is the first time .Net Framework has been installed these files weren’t around to be patched. But it knows that it needs them to remain stable and secure.

    If the server is configured to use Windows Update then it will be able to download these files but otherwise the installation will fail.

    Tracking down the offending patches

    When you run the .Net installation it logs the installation in C:\Windows\Logs\CBS\CBS.log. When the installation fails have a look for something like the following:

    CommitPackagesState: Started persisting state of packages
    2018-01-29 09:03:42, Info                  CBS    Failed call to CryptCATAdminAddCatalog. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to install catalog file \\?\C:\WINDOWS\CbsTemp\30644498_651525325\Package_for_KB4058702~31bf3856ad364e35~amd64~~16299.188.1.0.cat for package [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to install catalog for package: Package_for_KB4058702~31bf3856ad364e35~amd64~~16299.188.1.0 [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to stage package manifest. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to add package. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to persist package: Package_for_KB4058702~31bf3856ad364e35~amd64~~16299.188.1.0 [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    Failed to update states and store all resolved packages. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CSI    [email protected]/1/29:15:03:42.863 CSI Transaction @0x27adb0b1d50 destroyed
    2018-01-29 09:03:42, Info                  CBS    Perf: Resolve chain complete.
    2018-01-29 09:03:42, Info                  CBS    Failed to resolve execution chain. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Error                 CBS    Failed to process Multi-phase execution. [HRESULT = 0x800706be – RPC_S_CALL_FAILED]
    2018-01-29 09:03:42, Info                  CBS    WER: Generating failure report for package: Package_for_KB4058702~31bf3856ad364e35~amd64~~16299.188.1.0, status: 0x800706be, failure source: Resolve, start state: Resolved, target state: Staged, client id: UpdateAgentLCU
    2018-01-29 09:03:42, Info                  CBS    Not able to query DisableWerReporting flag.  Assuming not set… [HRESULT = 0x80070002 – ERROR_FILE_NOT_FOUND]

    In this part of the log you will notice it mentioning KB4058702. This is trying to locate the .Net Framework 3.5 files located in this patch which simply don’t exist. But if you remove this patch and retry you are likely to find another patch being mentioned.
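Rather than reading the log by eye after every attempt, you can pull the KB numbers out of the failure lines. The helper below is a small sketch. The function name is my own, and it assumes the failing lines contain the word "Failed" followed by a Package_for_KBnnnnnnn name, as they do in the extract above.

```powershell
function Get-FailedCbsPackage {
    # Returns the unique KB numbers named on "Failed ..." lines of a CBS log.
    param([string[]]$Line)

    $Line | ForEach-Object {
        if ($_ -match 'Failed.*Package_for_(KB\d+)') { $Matches[1] }
    } | Sort-Object -Unique
}

# Typical use on the affected server:
# Get-FailedCbsPackage -Line (Get-Content C:\Windows\Logs\CBS\CBS.log)
```

Each failed attempt tends to name one more patch, so expect to run this a few times as you work through the list.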

    Ultimately I found that the following patches needed to be removed before .Net Framework 3.5 would install but you may find a slightly different list.

    • KB3195792
    • KB4058702
    • KB4040981
    • KB4014505
    • KB4014581
    • KB3048072
    • KB3142045
    • KB3072307
    • KB3188732
    • KB3188743
    • KB3210132
    • KB2966828

    Once the installation is successful make sure that you reinstall the patches that you removed.
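The remove and retry loop can be scripted along the following lines. This is a sketch only: the KB list and the D: drive letter for the install media are assumptions, and some removals will want a reboot before the feature installation will succeed.

```powershell
# Hypothetical list: use the KB numbers that your own CBS.log reports.
$blockingKbs = 'KB4058702', 'KB4040981'
# Assumption: the Windows Server 2012 R2 media is mounted as D:
$sxsSource = 'D:\sources\sxs'

foreach ($kb in $blockingKbs) {
    if (Get-HotFix -Id $kb -ErrorAction SilentlyContinue) {
        # wusa.exe expects the number without the "KB" prefix
        $number = $kb -replace '^KB', ''
        Start-Process -FilePath wusa.exe -ArgumentList "/uninstall /kb:$number /quiet /norestart" -Wait
    }
}

# Retry the feature installation from the original source media
Install-WindowsFeature -Name NET-Framework-Core -Source $sxsSource
```

Keep the list of what you removed so that the same patches can be put back afterwards.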

    Skype for Business CU fails to install – Error 1603: Server.msp had errors installing

    Microsoft have done a good job making the patching process for Skype for Business as simple as possible but over time it is possible that you may suddenly come across a server that will just not install a CU.

    When you look at the logs the error doesn’t give you a lot of information to work on:

    Executing command: msiexec.exe  /update "Server.msp" /passive /norestart /l*vx "c:\patches\Server.msp-SRV01-[2018-11-28][19-27-42]_log.txt"

    ERROR 1603: Server.msp had errors installing.

    ERROR: SkypeServerUpdateInstaller failed to successfully install all patches

    Right.

    Luckily it does give you a log file in the first line. A REALLY BIG log file.

    If you search for “error” then you will likely find a few but don’t get too worried. In particular one entry points you to yet another log. This is located in the AppData\Local\Temp folder of the user running the upgrade and is called LCSSetup_Commands.txt. Inside this you will find the following information:

    Install-CsDatabase : Command execution failed: Install-CsDatabase was unable to find suitable drives for storing the database files. This is often due to insufficient disk space; typically you should have at least 32 GB of free space before attempting to create databases. However, there are other possible reasons why this command could have failed. For more information, see http://ift.tt/1Og9jlm

    So it seems that Skype for Business won’t patch the database if the disk on the server drops below a certain threshold. This log mentions 32GB but we’ve found that it will go lower than this.

    After a bit of housekeeping the patch will run through successfully.
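To check where you stand before running the CU, you can list the free space on each fixed disk. In this sketch the 32 GB figure from the error message is used purely as a warning level, since the real cut-off appears to be lower.

```powershell
# 32 GB is the figure quoted in the error message, used here only as a warning level.
$thresholdGB = 32

Get-CimInstance -ClassName Win32_LogicalDisk -Filter 'DriveType = 3' |
    Select-Object DeviceID,
        @{ Name = 'FreeGB'; Expression = { [math]::Round($_.FreeSpace / 1GB, 1) } },
        @{ Name = 'BelowThreshold'; Expression = { ($_.FreeSpace / 1GB) -lt $thresholdGB } }
```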

    VMM bare metal build fails due to no matching logical network

    If you deploy a new VMM bare metal build environment you may face an issue where the deployment fails with Error (21219).

    The error description doesn’t appear to make a lot of sense either stating:

    IP or subnet doesn’t match to specified logical network.


    Recommended Action
    Specify matching logical network and IP address or subnet.

    If you face this, then you’ve likely checked and double checked all of your VMM networking and everything looks fine.

    What’s happening?

    This issue is caused by the build process looking up the IP address that you have allocated to the new server, and comparing this to the logical network. When it does this it’s finding that the IP address doesn’t match the subnet configured on the logical network.

    But you’ve checked this already and it DOES. You checked it again now just in case and it still does, so this can’t be your problem right!?!?!

    Well no.

    You see VMM appears to not just be looking at the logical network for a match, but more specifically the first network site that was created on this logical network. Now in most cases you created the management network first so no big deal. But if you didn’t create it first, or you deleted it for some reason and recreated it then it will no longer be the first created site.
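To make the difference concrete, here is a small model of the two checks. It is only an illustration of the behaviour described above and not VMM’s actual code. The function names, site names, subnets and host IP are all made up.

```powershell
function ConvertTo-IPv4Number {
    # Converts a dotted IPv4 address to a number, e.g. 10.0.1.25 -> 167772441
    param([string]$Address)
    [long]$number = 0
    foreach ($octet in [System.Net.IPAddress]::Parse($Address).GetAddressBytes()) {
        $number = ($number * 256) + $octet
    }
    $number
}

function Test-IPv4InSubnet {
    # True when $Address falls inside the subnet given in CIDR form
    param([string]$Address, [string]$Cidr)
    $network, $prefix = $Cidr -split '/'
    # Netmask as a number, e.g. /24 -> 4294967040 (255.255.255.0)
    [long]$mask = [math]::Pow(2, 32) - [math]::Pow(2, 32 - [int]$prefix)
    ((ConvertTo-IPv4Number $Address) -band $mask) -eq ((ConvertTo-IPv4Number $network) -band $mask)
}

# Hypothetical network sites, listed in the order they were created
$sites = @(
    [pscustomobject]@{ Name = 'LiveMigration'; Subnet = '10.0.2.0/24' }
    [pscustomobject]@{ Name = 'Management';    Subnet = '10.0.1.0/24' }
)
$hostIp = '10.0.1.25'

# Checking every site finds a match...
$anyMatch   = [bool]($sites | Where-Object { Test-IPv4InSubnet $hostIp $_.Subnet })
# ...but checking only the first-created site, as VMM appears to, does not.
$firstMatch = Test-IPv4InSubnet $hostIp $sites[0].Subnet
```

With the management site created second, the address is valid for the logical network as a whole but fails the first-site check, which is the situation that produces Error (21219).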

    You’ve got to be kidding me. But it’s easy to fix?

    You would think so but notice that there are no re-order buttons on the subnets?

    This means that the only way to “reorder” them is to delete all of the network sites and recreate them. And if you have created any VM networks or bound them to any other configuration object then you’ll be even happier to know you will need to undo all of this configuration too.

    Hopefully you’re deploying a new cluster and not deciding to deploy bare metal build to an existing one.

    In case you think you must have missed this somewhere, it isn’t stated in the documentation. So is it a bug or a feature?

    Either way just remember to create the host management subnet first in future.

    Configure Hybrid Public Folder with Exchange 2013/2016 (aka Modern Public Folders)

    Public Folders don’t seem to have the usage that they used to, so it’s been a while since we worked with Public Folders in Exchange. So long in fact that what we last configured is now called Legacy Public Folders, with the new version, introduced in Exchange 2013, called Modern Public Folders.

    A Refresher on Exchange Public Folders

    In order to understand the new process of setting up Hybrid mode with Exchange Online you first need to understand some changes to how Public Folders work.

    In Exchange 2010 public folders were stored in dedicated Public Folder Databases. These also had their own log files and had to be managed independently of any User Mailbox Databases.

    With Modern Public Folders they have been moved into Mailboxes which are stored in a standard user database. The environment can contain multiple public folder mailboxes, each of which can contain different parts of the public folder hierarchy.

    When a user accesses a public folder they are actually opening the mailbox that contains that part of the hierarchy. Unlike previous versions the data is only accessible from the server hosting the active database rather than any server hosting a public folder replica.

    Configuring Hybrid Public Folders

    What does this mean for configuring Hybrid mode Public Folders?

    First of all if you searched for something like “Configure Exchange Public Folder Hybrid” and found this Exchange 2019 article referring to Exchange 2010 SP3 or later then you’ve got the wrong article. You need to look for this article which is only on the Exchange Online documents site.

    This newer article ignores all of the steps setting up new Public Folder Mailboxes resulting in just three steps:

    1) Download the following files from Mail-enabled Public Folders – directory sync script

    • Sync-MailPublicFolders.ps1
    • SyncMailPublicFolders.strings.psd1

    2) On Exchange Server, run the following command to synchronize mail-enabled public folders from your local on-premises Active Directory to O365.

    
    Sync-MailPublicFolders.ps1 -Credential (Get-Credential) -CsvSummaryFile:sync_summary.csv
    

    3) Enable the Exchange Online organization to access the on-premises public folders. You will point to all of your on-premises public folder mailboxes.

    
    Set-OrganizationConfig -PublicFoldersEnabled Remote -RemotePublicFolderMailboxes PFMailbox1,PFMailbox2,PFMailbox3
    

    Issues When Configuring Hybrid Mode

    There are a few things to be aware of with this process though, particularly the final step.

    1) Remember that the new Public Folders are stored in User Mailboxes which are associated with AD user accounts. If you aren’t syncing your entire Active Directory forest then the Public Folder Mailbox objects may not be synced to Exchange Online. So where are these stored by default? Well the Users container in your exchange enabled domain of course.

    It’s likely that you haven’t synced this but you CAN move these objects to an OU that is being synced without any impact. Unfortunately this requirement isn’t included in the documentation. If these objects aren’t synced to Exchange Online then you’ll get the following message:

    
    Set-OrganizationConfig -PublicFoldersEnabled Remote -RemotePublicFolderMailboxes PFMailbox1
    Couldn't find object "PFMailbox1". Please make sure that it was spelled correctly or specify a different object.
        + CategoryInfo          : NotSpecified: (:) [Set-OrganizationConfig], ManagementObjectNotFoundException
        + FullyQualifiedErrorId : [Server=SYAPR01MB2717,RequestId=d79eaa00-ff32-4076-8791-54ba22e3cb76,TimeStamp=26/11/201
       8 7:13:26 AM] [FailureCategory=Cmdlet-ManagementObjectNotFoundException] C4302B7C,Microsoft.Exchange.Management.Sy
      stemConfigurationTasks.SetOrganizationConfig
        + PSComputerName        : outlook.office365.com
    

    2) Once you’ve moved the public folder mailbox objects remember that the -RemotePublicFolderMailboxes PFMailbox1,PFMailbox2,PFMailbox3 syntax is referring to the Public Folder Mailboxes and NOT the public folder names. You can find these in the ECP under Public Folder Mailboxes.

    3) You also need to list all public folder mailboxes in the one command. If you add an additional public folder mailbox in the future then include all the mailboxes and not just the new one.

    4) Finally remember that your on-premises address book is different from your online address book. This means that any new mail enabled public folders will only appear in your online address book if you sync them using the Sync-MailPublicFolders.ps1 script. If users can create these objects then you may want to think about scheduling this task.

    Only users who have been created on-premises and migrated to Exchange Online can access the on-premises Public Folder store. Only these users exist in the on-premises address book used to authenticate access.
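For points 2 and 3, you can build the mailbox list from the on-premises side instead of typing it by hand. The sketch below assumes two separate sessions, one on-premises and one connected to Exchange Online, and that the mailbox names resolve the same way in both. The variable name is my own.

```powershell
# On-premises Exchange Management Shell: list the public folder mailboxes.
$pfMailboxes = Get-Mailbox -PublicFolder | Select-Object -ExpandProperty Name
$pfMailboxes

# Exchange Online PowerShell session: pass the complete list every time.
# Assumption: $pfMailboxes has been copied across from the on-premises session.
Set-OrganizationConfig -PublicFoldersEnabled Remote -RemotePublicFolderMailboxes $pfMailboxes
```

Because the parameter replaces the existing value, rebuilding the full list each time avoids accidentally dropping a mailbox.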

    It may not seem that way but ultimately this is a simple service to configure with just a few little gotchas to be aware of.

    Skype for Business Web Sites Fail to Work Using Microsoft Web Application Proxy

    In a classic case of skim reading the documentation we had trouble publishing a Skype for Business environment externally using a Microsoft Web Application Proxy (WAP). This was mainly impacting the Skype for Business mobile client which was failing to log on.

    You could still access all of the standard web sites through a browser such as dialin and meet though.

    This happened due to a missed step when setting up the WAP. If the internal and external URLs are different, you need to disable the translation of URLs in the request headers. Use the following PowerShell commands on the WAP server.

    
    $Rule = (Get-WebApplicationProxyApplication -Name "Insert Rule Name to Modify").ID
    Set-WebApplicationProxyApplication -ID $Rule -DisableTranslateUrlInRequestHeaders:$True
    
    

    Once completed reload the mobile client and it should connect without issues.

    Misconfigured Skype for Business Edge Server Breaks Office 365 Hybrid Federation

    We’ve been moving more customers to Office 365 recently. Not only are they seeing the business case stacking up from a cost point of view but they are also after the cloud only features which are now appearing more frequently. A troubling development with these migrations is the number of broken Skype for Business Edge servers that we are seeing.

    Now these aren’t totally broken but just broken enough that when we try to integrate their on-premises Skype for Business environment with Office 365 services things go wrong.

    How will you detect this?

    This will often show up when trying to get voicemail configured to use hosted voicemail in Exchange Online since this is often the first hybrid service being deployed. When the call is redirected to the Exchange Online server it fails. Looking at the event logs on the front end server it says that the dial plan wasn’t configured correctly.


    Attempts to route to servers in an Exchange UM Dialplan failed

    No server in the dialplan [Hosted__exap.um.outlook.com__tenant.onmicrosoft.com] accepted the call with id [XXXXXXXXXXXXXXXXXXXXXXXXX].

    Cause: Dialplan is not configured properly.

    Resolution:

    Check the configuration of the dialplan on Exchange UM Servers.


    All the configuration looked fine and so we needed to dig into the SIP traffic a little more. We did this using Snooper. We could see the message being handed off to the edge server from the front-end server but then the edge server connection timed out.

    
    Response Data
    504  Server time-out
    ms-diagnostics:  1008;reason="Unable to resolve DNS SRV"
    
    

    This was a little strange as the edge server was working fine for other federation partners, and DNS lookups were working on the edge server.

    What was happening?

    One thing that didn’t look right though was that the internal interface was configured to use the internal DNS server. Referring to the Edge server deployment guide confirmed that this wasn’t correct.

    https://docs.microsoft.com/en-us/skypeforbusiness/deploy/deploy-edge-server/deploy-edge-servers

    Interface configuration without DNS servers in the perimeter network
    1. Install two network adapters for each Edge Server, one for the internal-facing interface, and one for the external-facing interface.

    Note
    The internal and external subnets must not be routable to each other.


    2. On your external interface, you’ll configure one of the following:


    a. Three static IP addresses on the external perimeter network subnet. You’ll also need to configure the default gateway on the external interface, for example, defining the internet-facing router or the external firewall as the default gateway. Configure the adapter DNS settings to point to an external DNS server, ideally a pair of external DNS servers.


    b. One static IP address on the external perimeter network subnet. You’ll also need to configure the default gateway on the external interface, for example, defining the internet-facing router or the external firewall as the default gateway. Configure the adapter DNS settings to point to an external DNS server, or ideally a pair of external DNS servers. This configuration is ONLY acceptable if you have previously configured your topology to have non-standard values in the port assignments, which is covered in the Create your Edge topology for Skype for Business Server article.


    3. On your internal interface, configure one static IP on the internal perimeter network subnet, and don’t set a default gateway. Also leave the adapter DNS settings empty.

    4. Create persistent static routes on the internal interface to all internal networks where clients, Skype for Business Server, and Exchange Unified Messaging (UM) servers reside.

    5. Edit the HOST file on each Edge Server to contain a record for the next hop server or virtual IP (VIP). This record will be the Director, Standard Edition server or Front End pool you configured as the Edge Server next hop address in Topology Builder. If you’re using DNS load balancing, include a line for each member of the next hop pool.
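
    To compare an Edge server against this guidance, one option is to list the DNS servers assigned to each adapter. This is a minimal sketch, assuming the Edge server runs Windows Server 2012 or later (where the DnsClient cmdlets are available). If the internal-facing adapter shows any server addresses, it does not match step 3 above.

```powershell
# List the IPv4 DNS servers configured on every adapter of this server.
# Per the guidance, only the external-facing adapter should have entries.
Get-DnsClientServerAddress -AddressFamily IPv4 |
    Select-Object InterfaceAlias, ServerAddresses
```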

    How to fix it?

    We changed the Edge servers to meet this guidance. On each server we populated the hosts file with every server in the topology, using both short names and FQDNs, and made the external adapter the only adapter with DNS settings, pointing it at DNS servers external to the organisation.

    Voicemail started working once this change was made.
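
    As a rough sketch of that change, the commands below clear DNS from the internal adapter, point the external adapter at external DNS servers, and add hosts entries for a next hop server. The adapter names, IP addresses and host names are all hypothetical placeholders rather than values from the environment described here, and the commands assume an elevated PowerShell session on Windows Server 2012 or later.

```powershell
# Hypothetical adapter names; substitute the names used on your Edge server.
$internalNic = 'Internal'
$externalNic = 'External'

# Clear the DNS servers on the internal-facing adapter.
# On a statically addressed adapter this leaves the DNS list empty.
Set-DnsClientServerAddress -InterfaceAlias $internalNic -ResetServerAddresses

# Point the external-facing adapter at a pair of external DNS servers
# (placeholder addresses from a documentation range).
Set-DnsClientServerAddress -InterfaceAlias $externalNic `
    -ServerAddresses '203.0.113.53', '203.0.113.54'

# Add the next hop server to the hosts file by FQDN and by short name
# (placeholder IP address and names).
$hostsFile = "$env:SystemRoot\System32\drivers\etc\hosts"
Add-Content -Path $hostsFile -Value "10.0.0.10`tfepool01.contoso.local"
Add-Content -Path $hostsFile -Value "10.0.0.10`tfepool01"
```

    Repeat the hosts entries for each server in the topology, and confirm the result with Get-DnsClientServerAddress before testing voicemail again.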

    Why this happens

    So why did this happen? Part of the setup for voicemail located in Office 365 is configuring a hosting provider in Skype for Business.

    
    New-CsHostingProvider -Identity 'Exchange Online' -Enabled $True -EnabledSharedAddressSpace $True -HostsOCSUsers $False -ProxyFqdn "exap.um.outlook.com" -IsLocal $False -VerificationLevel UseSourceVerification
    
    

    This provider has shared address space enabled. This means that endpoints with the same SIP domain name can be located either on-premises or in the cloud. In our case the endpoint is the Exchange Online UM service.

    When a call is routed to Exchange Online UM, Skype for Business looks up the local directory and sees that the user isn’t located on-premises. The call is passed to the Edge server, which performs a lookup of the _sipfederationtls._tcp.domain.com DNS record. Why is it doing this? Basically it’s trying to make a federation request with its own domain, and this is the start of that process. But the _sipfederationtls._tcp.domain.com record only exists externally, so the lookup fails when the Edge server is using the internal DNS server. Since it can’t federate with itself, it doesn’t go on to the next step, which is establishing a connection to Exchange Online.
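
    One way to confirm this from the Edge server is to query the SRV record against both the internal and an external DNS server. This is a sketch only: the internal DNS server address is a hypothetical placeholder, 8.8.8.8 stands in for whichever external resolver you use, and domain.com should be replaced with your own SIP domain.

```powershell
# The federation SRV record for the SIP domain (replace domain.com).
$record = '_sipfederationtls._tcp.domain.com'

# Query the internal DNS server (placeholder address).
# If the record only exists externally, this returns an error.
Resolve-DnsName -Name $record -Type SRV -Server 10.0.0.53 -DnsOnly

# Query an external resolver. This should return the SRV target and port.
Resolve-DnsName -Name $record -Type SRV -Server 8.8.8.8 -DnsOnly
```

    If the first query fails and the second succeeds, the Edge server is resolving federation records through the wrong DNS server.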

    This can also be fixed by adding the DNS record to your internal DNS, but the Edge server would still not be configured correctly. It’s possible that using the internal DNS server would result in something else not working later on. Far better to fix it properly.

    As a matter of interest, if you tried to configure hybrid mode with Skype for Business Online you would also hit issues where your on-premises users couldn’t see presence for, or send messages to, cloud users. The cause is the same as for the Exchange UM issue, because shared address space is also enabled on this hosting provider.

    
    New-CSHostingProvider -Identity SkypeforBusinessOnline -ProxyFqdn "sipfed.online.lync.com" -Enabled $true -EnabledSharedAddressSpace $true -HostsOCSUsers $true -VerificationLevel UseSourceVerification -IsLocal $false -AutodiscoverUrl https://webdir.online.lync.com/Autodiscover/AutodiscoverService.svc/root
    

    Both Exchange Online and Skype for Business Online have hybrid relationships with Skype for Business on-premises. The only difference, apart from the provider endpoint address, is that the Skype for Business Online provider is configured to host users while the Exchange Online provider hosts services.
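
    To see that difference in an existing deployment, the hosting providers can be listed side by side from the Skype for Business Server Management Shell. This is a small sketch that assumes both providers have already been created as shown above.

```powershell
# Compare the hosting providers. Both should show
# EnabledSharedAddressSpace as True, and only the Skype for Business
# Online provider should show HostsOCSUsers as True.
Get-CsHostingProvider |
    Select-Object Identity, ProxyFqdn, EnabledSharedAddressSpace, HostsOCSUsers |
    Format-Table -AutoSize
```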