S4B Clients on Split-Tunnel VPNs.

S4B:  Bypass Split-Tunnel VPNs.

Take Away:  

Skype for Business (S4B) and Lync clients may experience problems when traversing a split-tunnel VPN.  Use the Name Resolution Policy Table (NRPT) and Windows Firewall group policies (GPOs) to bypass split-tunnel VPNs.  This solution is easy to administer and gives remote offices a better multimedia experience.


The DCA office experiences several S4B/Lync issues:  

  • Local S4B/Lync clients cannot host conference calls for external clients.
  • All clients (external and DCA) can connect to conference calls hosted at the company headquarters (JFK).  
  • Local S4B/Lync clients cannot share multimedia content (e.g., screen sharing or video) with external clients.  
  • All clients can share multimedia content when connected to conference calls hosted at JFK HQ.
  • Audio and video quality is poor (e.g., choppy or static) between the DCA and JFK locations.


This business consists of two locations:  JFK is the primary HQ office, and DCA is the branch office.
  • A site-to-site IPsec VPN tunnel connects the DCA and JFK offices.  
  • DCA uses split-tunneling to forward all corporate (i.e., domain) traffic across the VPN.
  • DCA uses its default gateway to forward all other traffic to the Internet.  
  • JFK hosts all Lync servers:  Front End, Access Edge, and Reverse Proxy servers.
  • Both DCA and JFK use Active Directory (AD) integrated DNS servers.
  • External clients allow staff to work from home.

Figure 1.  Example of Lync and organization topology.

ICE Framework:  

S4B-Lync uses network topology to select the best connection path.  It uses a peer-to-peer connection framework called Interactive Connectivity Establishment (ICE).  This framework includes the Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN) protocols.

STUN lets a client behind Network Address Translation (NAT) learn the public IP address that its gateway presents on its behalf, in addition to its own private IP.  Multimedia travels directly between end-points when STUN is used.  S4B/Lync clients prefer to communicate directly (i.e., peer-to-peer) with clients that reside on the same LAN.  N.B., LAN does not refer to a broadcast domain here.  In this situation, LAN includes all internal networks (i.e., subnets) with routes to the Front-End subnet.  Internal clients never use the Access Edge server for internal communication.

Similarly, external clients prefer STUN for communicating multimedia content to other external peers.  The Access Edge server will only bridge external-to-external clients (i.e., TURN) if peer-to-peer communication is not possible.

Lync clients use the TURN protocol when end-points do not share a common LAN.  The TURN process allocates dynamic ports on the Access Edge server, which in turn (pun intended) proxies external multimedia.  TURN is similar to Port Address Translation (PAT), just as the Access Edge server is similar to an Internet gateway.

To recap, S4B/Lync clients prefer direct peer-to-peer multimedia communication.  Internal clients will never use the Access Edge server for internal multimedia communication.  External clients use the Access Edge server to bridge communication whenever peer-to-peer communication is unavailable, including external-to-external and external-to-internal calls.
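
The recap can be expressed as a small decision function.  This is only a sketch of the rules as described in this article, not of the real ICE negotiation; the Endpoint class, the media_path function, and the direct_possible flag are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    internal: bool  # True if the client has a route to the Front End subnet


def media_path(a: Endpoint, b: Endpoint, direct_possible: bool = True) -> str:
    """Return the media path two clients are expected to use (article's rules only)."""
    if a.internal and b.internal:
        # Internal peers never use the Access Edge server.
        return "direct"
    if not a.internal and not b.internal:
        # External peers prefer direct media; the Access Edge relays only as a fallback.
        return "direct" if direct_possible else "relay via Access Edge"
    # A mixed internal/external pair does not share a LAN, so media is relayed.
    return "relay via Access Edge"


jfk = Endpoint("JFK client", internal=True)
home = Endpoint("home worker", internal=False)

print(media_path(jfk, jfk))    # direct
print(media_path(jfk, home))   # relay via Access Edge
print(media_path(home, home))  # direct
```

The function takes a single shared view of who is internal.  The split-tunnel problem arises when the two clients disagree on that classification.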

Split-Tunnel Problems:  

The ICE framework generally provides the best multimedia experience.  However, it does not work well over split-tunnel VPNs.  Split-tunnel VPNs create STUN and TURN mismatches.  For example, the DCA branch office firewall forwards all domain traffic to the JFK primary office; all other traffic goes out the local gateway (i.e., Internet).  DCA and external Lync clients interpret this topology differently (Table 1).

Table 1. 
Default Multimedia Network Traffic Between Lync Clients

                    Destination
Source              JFK     DCA     External Client
External Client     TURN    STUN    STUN

Notes:  DCA uses a split-tunnel VPN to connect to JFK.  STUN represents Lync client-to-client.  TURN represents a multimedia proxy (i.e., Lync Access-Edge) requirement.  Blue represents split-tunnel topology.  Red represents client topology mismatch.  

The primary problem with split-tunnel VPNs is how the S4B/Lync client interprets the topology.  Recall, internal clients always use the Access Edge server for external communication.  Likewise, internal clients never use the Access Edge for internal conversations.  The VPN firewall forwards all domain traffic to the JFK network.  Therefore, DCA clients consider themselves internal, and external clients external.  DCA clients will only use the Access Edge server when communicating with external clients.

External clients have an entirely different interpretation of the topology.  External clients are aware of the DCA Internet gateway, but they remain unaware of its split-tunneling.  External clients will therefore interpret DCA clients as external peers and send multimedia traffic directly to the DCA clients (i.e., STUN).

To recap, external clients are unaware of the DCA split-tunnel.  These external clients attempt to send audio and video (AV) directly to the DCA clients, and expect to receive AV directly from them.  DCA clients, by contrast, send AV to the Access Edge server and expect to receive AV proxied from it.
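
A short sketch makes the disagreement explicit.  It models only how each side classifies itself and its peer, following the description above; expected_path and its parameters are hypothetical names, and nothing here reflects actual Lync client internals.

```python
def expected_path(viewer_is_internal: bool, peer_seen_as_internal: bool) -> str:
    """Path a client expects, based only on how it classifies itself and its peer."""
    if viewer_is_internal != peer_seen_as_internal:
        return "relay"   # internal <-> external media goes through the Access Edge (TURN)
    return "direct"      # both on the same side of the edge: peer-to-peer (STUN)


# The DCA client sees itself as internal and the home worker as external.
dca_view = expected_path(viewer_is_internal=True, peer_seen_as_internal=False)

# The home worker sees itself as external and, unaware of the tunnel, sees DCA as external too.
external_view = expected_path(viewer_is_internal=False, peer_seen_as_internal=False)

mismatch = dca_view != external_view
print(dca_view, external_view, mismatch)  # relay direct True
```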

Figure 2.  Lync directional mismatch.

The split-tunnel VPN causes a secondary problem between JFK and DCA.   These clients use STUN to establish peer-to-peer connections across the VPN.   Users complain about overall client AV quality between these locations.

Multiple layers of encryption decrease overall AV quality.  Lync encrypts multimedia packets with the TLS and SRTP protocols.  The VPN adds packet overhead as it encrypts and encapsulates each packet.  Staff at both locations can expect better AV if DCA S4B-Lync clients bypass split-tunneling (i.e., use TURN).

Figure 3.  Bypass the split-tunnel VPN.


S4B-Lync clients can bypass split-tunneling entirely with two changes:  (a) changes to DNS resolution; and (b) changes to client firewalls.  Recall, both offices belong to a single AD domain, and each office uses recursive AD-integrated DNS servers.  AD replication ensures internal name resolution is the same at each location.  Lync clients use DNS to locate S4B-Lync servers via S4B-Lync Discovery (Table 2). 

Table 2. 
S4B-Lync Client Discovery Preference Order

DNS Prefix          lyncdiscoverinternal    lyncdiscover
Discovery Order     1st preference          2nd preference
Client              Internal clients        External clients
Server              Front-End               Access-Edge

Notes:  Discovery preference assumes organization uses a split-brain DNS topology.  Topology consists of independent internal and external DNS servers.

  All internal clients, including those on the VPN, use internal DNS for Lync Discovery resolution.  External clients use external DNS for their Lync Discovery process.  Therefore, VPN clients can bypass split-tunneling using a process that distinguishes Lync traffic, and resolves it using external name records.  N.B., Internal DNS continues to resolve all other (i.e., non-Lync) requests.  Otherwise, what's the point of having a VPN?    
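
The discovery preference order can be sketched as a simple loop.  The resolver is passed in as a plain function so the example runs without network access; the example.com domain and the IP addresses are placeholders, and the discover function is a hypothetical illustration rather than the client's real discovery code.

```python
from typing import Callable, Optional, Tuple

# Prefixes in preference order, as listed in Table 2.
DISCOVERY_PREFIXES = ("lyncdiscoverinternal", "lyncdiscover")


def discover(sip_domain: str,
             resolve: Callable[[str], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return the first (fqdn, address) pair that resolves, or None."""
    for prefix in DISCOVERY_PREFIXES:
        fqdn = f"{prefix}.{sip_domain}"
        address = resolve(fqdn)
        if address is not None:
            return fqdn, address
    return None


# Split-brain DNS: each side only knows its own record (placeholder data).
internal_dns = {"lyncdiscoverinternal.example.com": "10.10.0.10"}
external_dns = {"lyncdiscover.example.com": "203.0.113.10"}

print(discover("example.com", internal_dns.get))
print(discover("example.com", external_dns.get))
```

A branch client that resolves Lync names against the external records behaves like the second call: it never finds lyncdiscoverinternal and falls through to lyncdiscover.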

Name Resolution Policy Table:
  Split-brain DNS requires a confusing array of zone records.  Most Internet documentation suggests creating pin-point DNS zones to influence Lync traffic.  Instead, consider using NRPT, which simplifies the entire domain resolution process. 

Lync clients can bypass the VPN with an NRPT group policy.  NRPT is configured with two simple rules:
  1. Forward all DNS requests for Lync service names to external DNS servers.
  2. Use the client's DNS settings (i.e., internal) for all other DNS resolution.
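
The two rules amount to a lookup with a default.  The sketch below models exact-match FQDN rules only; the host names, IP addresses, and the pick_dns_servers function are hypothetical, and Windows applies the real NRPT internally rather than through code like this.

```python
from typing import Dict, List


def pick_dns_servers(query: str,
                     nrpt_rules: Dict[str, List[str]],
                     default_servers: List[str]) -> List[str]:
    """Choose DNS servers for a query: NRPT rule first, client settings otherwise."""
    name = query.rstrip(".").lower()
    if name in nrpt_rules:
        # Rule 1: Lync names are sent to the external DNS servers.
        return nrpt_rules[name]
    # Rule 2: everything else uses the client's own (internal) DNS settings.
    return default_servers


# Placeholder rule set; substitute the FQDNs used by your own sip-domain.
rules = {
    "lyncdiscover.example.com": ["203.0.113.53"],
    "sip.example.com": ["203.0.113.53"],
}
internal_servers = ["10.10.0.53"]

print(pick_dns_servers("LyncDiscover.example.com.", rules, internal_servers))
print(pick_dns_servers("fileserver.example.com", rules, internal_servers))
```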
Create the NRPT Group Policy to allow S4B-Lync clients to bypass the VPN:
  1. Create new GPO:  Computer Configuration → Policies → Windows Settings → Name Resolution Policy.

  2. Configure the Advanced Global Policy Settings: 

    Figure 4.  NRPT GPO to bypass split-tunneling.

  3. Change the Query Resolution settings.  Enable "Configure query resolution options".  Enable "Resolve both IPv4 and IPv6 addresses for names".

  4. Create rules that forward Lync FQDNs to external DNS servers.

    a.  To which part of the namespace does this rule apply?  Choose FQDN, then enter the Lync FQDN.
    b.  Click on the Generic DNS Server tab.
    c.  Toggle the Enable DNS settings check box.
    d.  Click the Add button.
    e.  DNS server:  Enter an external recursive DNS server, or the authoritative public (i.e., Internet-facing) DNS server for your organization's sip-domain.
    f.  Click Apply.

   GPOs are applied to AD domains, sites, or Organizational Units (OUs).  In most situations, it makes sense to apply the NRPT GPO to the AD site that correlates with the branch office.

   From Group Policy Management:  Right click on Sites → Left click on Show Sites → Right click on the branch office site → Link an Existing GPO.     

   Alternately, create separate computer OUs per location.  Link the NRPT GPO to the OU that contains all branch office computers. 

Windows Firewall:

   NRPT influences clients to logically bypass the VPN.  However, there may be circumstances when Lync clients discover alternate (i.e., split-tunnel) paths to internal resources.  Lync clients, therefore, require both logical and physical divisions.  Windows Firewall complements the NRPT GPO with two simple rules:
  • Restrict traffic based on application (i.e., S4B).
  • Restrict traffic based on source (i.e., DCA) and destination (i.e., JFK).
Create the Windows Firewall GPO:
  1. Create new GPO:  Computer Configuration → Policies → Windows Settings → Security Settings → Windows Firewall with Advanced Security → Inbound Rules.
  2. Right click on Inbound Rules → New Inbound Rule → Program → Path:  %ProgramFiles%\Microsoft Office\Office15\lync.exe → Block the Connection → Apply rule to Domain.  N.B., use the application path that applies to your installation.  For example, Lync Basic and Lync Professional may use different paths.
  3. Edit the new Inbound Rule:  Right click on the new rule → Click on the Scope tab → Add all internal IP subnets (i.e., primary office) to the Remote IP address field → Click Add → Click OK.

    Figure 5.  Windows Firewall GPO to bypass VPN.

  4.  Link the newly created Firewall GPO to the AD site that correlates with the branch office.  Alternately, link this GPO to the OU that contains the branch office computers.
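
The effect of the firewall rule can be sketched as a predicate: traffic is blocked only when both the program and the remote address match.  The subnet, the expanded program path, and the is_blocked function are assumptions for illustration; Windows Firewall evaluates the real rule itself.

```python
import ipaddress
from pathlib import PureWindowsPath
from typing import Iterable


def is_blocked(program: str, remote_ip: str,
               blocked_program: str, blocked_subnets: Iterable[str]) -> bool:
    """Return True when a connection matches both the program and the scope of the rule."""
    # Windows paths compare case-insensitively.
    same_program = PureWindowsPath(program) == PureWindowsPath(blocked_program)
    address = ipaddress.ip_address(remote_ip)
    in_scope = any(address in ipaddress.ip_network(subnet) for subnet in blocked_subnets)
    return same_program and in_scope


# Assumed values: %ProgramFiles% expanded to its common default, and a made-up JFK subnet.
LYNC = r"C:\Program Files\Microsoft Office\Office15\lync.exe"
JFK_SUBNETS = ["10.10.0.0/16"]

print(is_blocked(LYNC, "10.10.4.20", LYNC, JFK_SUBNETS))    # Lync to JFK over the tunnel
print(is_blocked(LYNC, "203.0.113.10", LYNC, JFK_SUBNETS))  # Lync to a public address
print(is_blocked(r"C:\Windows\System32\svchost.exe", "10.10.4.20", LYNC, JFK_SUBNETS))
```

Only the first call returns True, which is the intent of the GPO: other applications keep using the tunnel, and Lync keeps its path to public addresses.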


   NRPT and firewall GPOs force S4B-Lync clients to bypass split-tunnel VPNs.  These combined GPOs have two primary effects:  (a) DCA-to-external clients prefer STUN (i.e., client-to-client); and (b) DCA-to-JFK clients use TURN (i.e., client-to-Access Edge) for external AV communication (Table 3).  

Table 3. 
Effects of Split-Tunnel GPOs on Multimedia Traffic 

                    Destination
Source              JFK     DCA     External Client
External Client     TURN    STUN    STUN

Notes:  DCA uses a split-tunnel VPN to connect to JFK.  STUN represents Lync client-to-client.  TURN represents a multimedia proxy (i.e., Lync Access-Edge) requirement.  Blue emphasizes branch office traffic.   

That's It!

What is Skype for Business Resiliency?

Skype for Business (S4B)/Lync 2013 Resiliency Outline:  S4B and Lync 2013 resiliency pools are similar to other highly available (HA) and fault-tolerant solutions.  If one pool fails (e.g., a server or network disruption), its clients automatically connect to the second pool.  Even better, the Lync clients maintain their client-to-client sessions (i.e., VoIP and IM) after connecting to the backup pool:
  • UDP Direct.
  • UDP NAT.
  • UDP Relay.
  • TCP Relay.
When the primary server pool fails (e.g., server down) its clients experience a brief hiccup (e.g., 1 second delay) as they connect to the backup pool.  Although this near-HA solution is useful, the system is not without flaws.  The resiliency caveat is that clients enter a limited resilience mode upon connecting to the backup pool.  Clients in resilience mode operate with limited functionality:
  • New users connect in resilience mode.
  • Scheduling features are unavailable.
  • Presence state displays as unknown.
  • The Contact list and Address book are not available (searches for individuals work though).
Network administrators can later use PowerShell to manually fail-over remaining Lync services.  This process changes clients' resilience mode to the fully functional regular mode.  A talented Lync administrator can pair a network monitor application with a custom PowerShell script to fully automate the entire fail-over process.  

What is the Lync CMS?  The central management store (CMS) is a SQL database (SQL Express in Standard Edition) that stores configuration data in XML format.  The SQL instance that hosts the CMS is named RTC.  Lync server tools make changes to the CMS database:
  • Lync PowerShell Console.
  • Lync Server Control Panel (LsCP).
  • Lync Topology Builder.  
Each pool has a single CMS database; however, only one of the CMS instances is active.

What is the replication database?  The CMS database uses a master-slave replication model.  All changes to the active RTC database are replicated to the RTCLocal databases.  Changes are replicated across pools.  Lync uses the RTCLocal instance, not the master RTC, for its client services.  This distinction is important for resiliency purposes.   

What is the Lync Front End Pool?    The Lync Front End pool consists of a single (i.e., Lync Standard) or multiple (i.e., Lync Enterprise) Front End servers that connect to an associated back-end CMS database.  There is only one pool per CMS instance.  Users are assigned to a single pool; never both.

When users sign on to Lync, their clients automatically connect to Front End servers that reside in the users' assigned Front End pool.  The Front End server uses the replication database located in its local pool. 

What are Lync Resiliency Pools?  A Lync resiliency pool pairing consists of exactly two Front End pools (e.g., pool_A and pool_B).  This 1:1 pool ratio provides one primary pool and one backup pool.  If one pool in the pair becomes unavailable, the Front End server in the second pool prevents any change to its CMS database.  However, the second server continues to operate using its replica database, which maintains a static configuration from the time of the loss.  Clients that re-connect use resiliency mode because the back-end service is in a read-only state.  Consider:

  • The resiliency pool is considered active/passive when individuals are all assigned to the same pool.  
  • The resiliency pool is considered active/active when individual Lync users are assigned to separate Front End pools.  
The difference between the two models is significant during the fail-over process.  If one of the Lync servers fails, some clients will enter resiliency mode while others remain in standard mode.
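
The difference can be sketched by listing which users are affected when a pool is lost.  The user names and the clients_in_resiliency_mode function are hypothetical; the rule itself (users assigned to the unavailable pool enter resiliency mode) is the one described in this article.

```python
from typing import Dict, List


def clients_in_resiliency_mode(assignments: Dict[str, str], failed_pool: str) -> List[str]:
    """Return the users whose assigned pool is the one that failed."""
    return sorted(user for user, pool in assignments.items() if pool == failed_pool)


# Active/active: users are spread across both pools.
active_active = {"alice": "pool_A", "bob": "pool_B"}

# Active/passive: every user is assigned to the same pool.
active_passive = {"alice": "pool_A", "bob": "pool_A"}

print(clients_in_resiliency_mode(active_active, "pool_A"))   # ['alice']
print(clients_in_resiliency_mode(active_passive, "pool_A"))  # ['alice', 'bob']
print(clients_in_resiliency_mode(active_passive, "pool_B"))  # []
```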

What is Lync High Availability?  Lync Enterprise separates the front-end and back-end roles for truly highly available services within a single pool.  Each pool allows up to twelve Front End servers and connects up to 80,000 simultaneous users.  In addition, Microsoft SQL 2012 hosts the CMS database using either mirroring or shared storage clustering.  Disruption to any single server within the same pool is negligible, so these pools are considered highly available.  

Lync 2013 Standard edition differs because it consolidates roles; both the Front End and the CMS install on the same server.  Nonetheless, Lync Standard remains a robust solution.  It scales up to 5,000 users, is simple to deploy, and supports all workloads (i.e., VoIP, IM, etc.).  Microsoft provides an alternative to HA through the use of resiliency pools.

How do Lync resiliency pools work?

Back-End Server Perspective:

  1. There are only two pools in a resiliency pairing (e.g., pool_A and pool_B).
  2. There is only one CMS master database per pool.  
  3. The CMS instance replicates to the RTCLocal database.
  4. Lync servers always use RTCLocal, never the CMS master.

Front-End Server Perspective:

  1. Front End servers belong to a single pool.  Each pool has at least one Front End server.
  2. Each user is assigned to a single pool; never both.
  3. Active/Active deployments distribute users across both pools.
  4. Active/Passive deployments assign all users to a single pool.

Lync Client Perspective:

  1. Users connect to Front End servers (i.e., pools) based on administrative configuration.
  2. Clients from the primary pool communicate seamlessly with clients from the secondary pool, and vice versa.
  3. Pools are not important to the end users unless the client is forced into resiliency mode.

What happens when a Lync server goes down?  

  • Changes cannot be made to the CMS database from any pool when an outage occurs.
  • Clients read the configuration data rather than modify it, so the Lync FE server continues to serve them from its second (i.e., replica) database.
  • Users associated with the unavailable server enter resiliency mode.  

How do we fail over Lync clients?

Determine which Front End pool hosts the CMS:

  1. Get-CsService -CentralManagement

     The command will most likely error out because the primary pool, which normally holds the active CMS, is unavailable. 

     However, the CMS can run on either pool, and we can at least determine whether the secondary pool is the active CMS host.  Look for the active attribute in the host Identity field. 
  2. Let's assume the primary pool hosted the active CMS and is no longer available.  Fail over the CMS to the secondary pool:

     Invoke-CsManagementServerFailover -BackupSqlServerFqdn FE02.lab.local -BackupSqlInstanceName RTC -Force

     Verify the secondary pool hosts the active CMS:

     Get-CsService -CentralManagement

     Look for the active attribute in the host Identity field for confirmation.
  3. Once we've confirmed the secondary pool hosts the active CMS, we can fail over the primary pool's users to the secondary pool:

     Invoke-CsPoolFailOver -PoolFqdn <primary pool FQDN> -DisasterMode -Verbose

     Lync clients seamlessly exit resiliency mode and enter standard mode.  The "Limited Functionality" warning disappears!
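
The ordering of these steps is the part worth automating, so here is a minimal sketch that only builds the command list and does not run anything.  The failover_plan function and the pool01.lab.local name are hypothetical; the cmdlets and the FE02.lab.local server are the ones used in the steps above.

```python
from typing import List


def failover_plan(failed_pool_fqdn: str, backup_sql_fqdn: str,
                  cms_on_failed_pool: bool, sql_instance: str = "RTC") -> List[str]:
    """Return the commands to run, in order, for a pool failover."""
    steps: List[str] = []
    if cms_on_failed_pool:
        # The CMS must be active on the surviving pool before users are moved.
        steps.append(
            "Invoke-CsManagementServerFailover "
            f"-BackupSqlServerFqdn {backup_sql_fqdn} "
            f"-BackupSqlInstanceName {sql_instance} -Force"
        )
        steps.append("Get-CsService -CentralManagement")  # verify the new CMS host
    steps.append(f"Invoke-CsPoolFailOver -PoolFqdn {failed_pool_fqdn} -DisasterMode -Verbose")
    return steps


# pool01.lab.local is a made-up name for the failed primary pool.
for command in failover_plan("pool01.lab.local", "FE02.lab.local", cms_on_failed_pool=True):
    print(command)
```

When the CMS already runs on the surviving pool, the plan collapses to the single pool fail-over command.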