WAN File Server Problems - SMB Limitations Over the VPN, Internet, WAN...

Abstract:  

   This research examines the limitations of SMB file transfers over the WAN.  End users complain of slow file browsing, slow file enumeration, and an inability to save Word files from the branch office.  Recommendations are made to resolve the issues.  


WAN File Services: The Influence of Latency and Protocols over the WAN

By Steven M. Jordan
University of Wisconsin-Stout
Last updated November 20th, 2013

Chapter 1:  Introduction

This research is based on a network problem between a corporate office and a branch office.  The corporate office is based in Oconomowoc, WI, and is referred to as ORP.  The branch office is based in Carmel, IN, and is referred to as IDTC.  The ORP corporate office has an in-house datacenter that provides network connectivity to over 100 branch offices throughout the Midwest. 

Scope:

The network is modeled on a hub-and-spoke design.  The ORP datacenter is the central hub and facilitates all network services to the separate branch offices.  Provided network services include terminal, email, and file sharing via Windows 2003 servers.  The primary methods used to connect branch offices include Internet VPNs (virtual private networks), T1 circuits, and DOCSIS (data over cable).  Approximately 10% of IDTC staff access the ORP file server from Windows-based workstations.  The remaining 90% of IDTC network users connect to the ORP file server from thin clients (simple computers) via Citrix Terminal Server.

Problem Statement:

Network users experienced difficulty connecting to the remote ORP file server.  The network problem caused work loss and disruption for staff located at the IDTC branch office.  Reports of the problem were sporadic and the exact cause remained unidentified.  The problem was not experienced by all network users at IDTC.  The end users that had experienced the problem complained of slow network file browsing and slow file enumeration. 

The problem was most symptomatic when large file directories on the ORP file server were accessed.  The directories associated with disruption usually contained hundreds of files and folders.  Workstations became unresponsive, mouse icons displayed an hourglass symbol, and several minutes passed before user functionality returned.  There were also reports of mapped network drives that had disappeared or error messages that read, “Network location is unavailable.” 

Chapter 2:  Problem Determination:


            Network tests were required to diagnose potential problems.  The research targeted three potential problems:
1.      Network latency
2.      Bandwidth
3.      Network protocols 

Latency: 

            Latency is the delay of data flow.  A ping test measures the amount of time data requires to travel across the network and back.  Lower ping reply times indicate faster network connections.  Conversely, high ping times indicate slow network connections.  The ping test between IDTC and ORP measured an average time of 25 ms.  A latency of 25 ms is usually considered sufficient to support network service for a remote office.

            Test results between IDTC and ORP were then compared to latency data from a separate branch office.  The second branch office is referred to as ODTC, and is located in Summit, WI.  The wide area network (WAN) technology is similar at each location.  The major difference between ODTC and IDTC is their geographic distance from ORP.  IDTC is located in Indiana while ODTC is located within 10 miles of ORP (Jordan, 2011, p. 5).

          The second latency test revealed a discrepancy.  Latency to ODTC averaged 1 ms; latency to IDTC averaged 25 ms.  IDTC experienced the highest latency to the datacenter among all branch offices connected with dedicated circuits to ORP (Jordan, 2011, p. 5).

          Time Warner Cable (TWC) provides network connections to each office.  TWC’s network engineer stated that the higher latency is most likely caused by the large geographic distance between ORP and IDTC.  TWC also noted that they partner with a separate telecommunications company (Telco) to provide service across state lines.  There was no method to determine the number of switches the data passed through to complete the connection; each hop (switch) slightly increased the latency (Jordan, 2011, p. 6).

     Microsoft confirms that latency has a negative impact on network performance.  The following table reports the estimated time to enumerate file share content based on available bandwidth, latency, and the volume of content (Microsoft, 2009):


Table 1
File Crawl Rates

Bandwidth                                            1 GB     5 GB    25 GB    100 GB   500 GB
10 Mbps, latency = none, crawl rate = 467 MB/min     12 sec   1 min   5 min    20 min   1 hr 30 min
10 Mbps, latency = 100 ms, crawl rate = 330 MB/min   2 min    9 min   45 min   3 hr     15 hr
(Jordan, 2011, p. 7)

Average file crawl enumeration of the ORP file server was tested from both IDTC and ODTC:

Table 2
Branch Office Latency

Location   Latency   Enumeration
IDTC       23 ms     130 sec
ODTC       1 ms      3 sec
(Jordan, 2011, p. 7)

Microsoft’s published file crawl rates apply to measurements collected between ORP and IDTC. The research demonstrates negative impact when 75 MB of content volume is processed:

Table 3
Critical Mass Data

Data Size Reference                    1 GB = 1,000 MB
File Crawl Formula
File Crawl Formula Applied to IDTC
Expected File Crawl at IDTC*           23 ms = 75 MB
Note. *Calculations are based on 10 Mb of available bandwidth.

Bandwidth:  

Latency measures the delay before data is delivered.  Bandwidth indicates the amount of data that can be delivered in a given time.  Throughput is the amount of data actually delivered.  Throughput is impacted by variables, including the slowest-speed link and external interference (Odem & Knott, 2006). Data throughput and available bandwidth were tested between IDTC and ORP.
Network traffic was generated to compare the throughput rates between IDTC to ORP and ODTC to ORP.  The traffic was generated from the ORP datacenter and transmitted to the print servers at each branch office.  The inbound and outbound bandwidth results were similar up to 6 Mbps.  There was a noticeable performance difference between the branch offices when more than 6 Mbps of data traffic were delivered (Jordan, 2011, p. 8).
Traffic flow to ODTC worked as expected.  Transfers of up to 10 Mbps were completed between ODTC and ORP.  The IDTC transfer rates decreased significantly when more than 6 Mbps of data delivery was attempted.  When more than 6 Mbps of data was sent, the average latency doubled, and the outbound rate decreased from 6 Mbps to under 2 Kbps (slower than an analog modem).

Table 4
Data sent from ORP to IDTC

% of 10 Mb   Data Sent     Receive     Transmit    Latency
30%          3,043 Kbps    2.85 Mb     2.85 Mb     25 ms
40%          4,004 Kbps    5.4 Mb      5.4 Mb      31 ms
60%          5,916 Kbps    2,028 bps   1,313 bps   80 ms
75%          7,523 Kbps    Time Out    Time Out    Time Out
(Jordan, 2011, p. 8)
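
The percentage column in Table 4 can be approximately reproduced from the data-sent column. The following is a minimal sketch, assuming the circuit capacity is 10 Mbps expressed as 10,000 Kbps; the function and constant names are illustrative and do not come from the original test tooling.

```python
LINK_CAPACITY_KBPS = 10_000  # assumed: 10 Mbps circuit expressed in Kbps


def utilization_percent(sent_kbps, capacity_kbps=LINK_CAPACITY_KBPS):
    """Return the share of link capacity consumed, as a percentage."""
    return 100.0 * sent_kbps / capacity_kbps


# Data-sent values taken from Table 4.
for sent in (3043, 4004, 5916, 7523):
    print(f"{sent} Kbps -> {utilization_percent(sent):.1f}% of the link")
```

The computed values land close to the rounded percentages reported in the table.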


          TWC provided network tests between IDTC and ORP to confirm previous results.  The first tests indicated possible network problems.  Packets sent back and forth experienced poor throughput and latency.  TWC repeated the network tests the following day and reached the opposite conclusion.  It was their belief that the first tests were therefore inaccurate.  Actual test results from TWC were inconclusive because of the relay to the second Telco in Indiana.  Because duplex mismatch was still suspected, the routers were replaced at both ORP and IDTC.  Both sites were then able to send and receive a full 10 Mbps of data without issue.  After the network throughput problem was resolved, staff at IDTC continued to experience the original file server problem (Jordan, 2011, p. 11).

Protocols:      

User accounts of the problem were subjective.  Further tests were required to identify the exact problem.  The first tests were conducted on the ORP local area network (LAN).  An ORP workstation was used to connect to the ORP file server.  The results were positive; more than 200 files populated in less than one second.

The second test connected an IDTC thin client with the Citrix terminal server located at ORP.  At ORP, terminal services host simultaneous client sessions and provide individual Windows desktops.  Applications operate entirely from the server.  Only keystrokes, mouse movements, and display data are exchanged between the thin clients and terminal server (Microsoft, 2003).  When connected via terminal server, the network resources are considered local to the ORP LAN.  Test results confirmed that network problems were absent from terminal server sessions.  File server directories populated information in less than one second.


The third test connected an IDTC workstation to the ORP file server.  Windows Explorer was used to browse to remote directories at ORP.  It took more than five minutes for all of the files to populate.  The same test was applied a second time while a network sniffer examined the traffic.  (A network sniffer is a software utility that is used to troubleshoot network problems.)  The network sniffer logged the data conversation between the two endpoints.  The log results documented SMB as the primary network protocol.  SMB is used for file sharing and network printing.  Most documents and spreadsheets that reside on file servers depend on SMB for delivery to client endpoints (MSDN, 2012).  

The logs showed that the SMB protocol sent duplicate data across the WAN when a workstation at IDTC connected to the file server at ORP.  In some instances, the same file information was delivered as many as 14 consecutive times.  The process stopped before the data was fully enumerated and then repeated itself.  The SMB transmission process was stuck in a repetitive loop that resulted in poor network performance and slow enumeration times.

Further research confirmed the inherent limitations of SMB and file server performance over a wide area connection.  Vinodh Dorairajan is credited with coining the term WAFS (Wide Area File Services).  According to Dorairajan, “file sharing protocols tend to be rather chatty” (Dorairajan, 2004).  CIFS (SMB) protocols were designed to work well in a LAN environment but do not work well over the WAN (Jordan, 2011, p. 9).
In most situations, additional bandwidth will not resolve problems inherent to SMB over geographic distance.  Copying a single large file may transfer well over the WAN but folder enumeration may be considerably slower (Microsoft, 2008).  In this case, a single 1 GB file was used to test this theory.  The single file successfully transferred in less than seven minutes.  The IDTC network problem mostly occurred while browsing file directories over the WAN.  Although IDTC throughput had increased by 40%, end users continued to experience the same problems.
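
The effect of a chatty protocol can be illustrated with a deliberately simplified model. The sketch below assumes every request waits for its reply, so the latency cost is the number of round trips multiplied by the round-trip time; the round-trip count and payload size are hypothetical values chosen for illustration, not figures from the network sniffer logs.

```python
def enumeration_seconds(round_trips, rtt_ms, payload_mb, bandwidth_mbps):
    """Estimate the time for a serialized request/response exchange.

    Simplified model (an assumption, not a measurement): each request
    waits for its reply, so latency costs round_trips * RTT, and the
    payload is then charged at the full link rate.
    """
    latency_cost = round_trips * rtt_ms / 1000.0
    transfer_cost = payload_mb * 8 / bandwidth_mbps  # MB -> megabits
    return latency_cost + transfer_cost


# Hypothetical directory listing: 5,000 round trips moving 2 MB of metadata.
for rtt in (1, 23):
    for mbps in (10, 20):
        t = enumeration_seconds(5000, rtt, 2, mbps)
        print(f"RTT {rtt:>2} ms, {mbps} Mbps: {t:6.1f} s")
```

Under these assumptions, doubling the bandwidth saves less than one second, while the move from 1 ms to 23 ms of latency adds nearly two minutes. This is consistent with the observation that extra bandwidth did not help file browsing at IDTC.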

Chapter 3:  Available Technology


Research identified four potential technology solutions to address the SMB problem.

WAFS (Wide Area File Services):

WAFS appliances are specialized servers that are designed to overcome traditional network limitations when data is sent over a WAN.  WAFS increase network efficiency with a combination of data compression and IP spoofing.
IP spoofing is a process normally used by network hackers as a method to gain unauthorized network access.  Each network packet contains a source IP address and a destination IP address.  Routers normally use the destination address to forward data and ignore the source address.  Hackers manipulate the IP packet and fool the remote computer into believing the data was sent from a trusted source (Velasco, 2000).   
WAFS use similar techniques to increase the amount of data sent and received over the WAN.  WAFS use IP spoofing to change the MTU (maximum transmission unit).  The MTU dictates the maximum amount of data that can be transferred per packet.  If the packet is larger than 1,500 bytes it is normally fragmented into smaller packets for transmission (Seifert, 2000).  The additional fragmented packets create network delay.  WAFS overcome the MTU limits by manipulating IP packet headers.  More data is delivered than what traditional MTU standards allow (Citrix, 2012).
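
The cost of fragmentation can be estimated by counting the packets needed for a given payload. This is a minimal sketch of standard IPv4 fragmentation arithmetic, assuming a 20-byte IPv4 header without options; it does not model how any particular WAFS appliance handles packets.

```python
import math

IPV4_HEADER_BYTES = 20  # assumed: IPv4 header without options


def fragment_count(payload_bytes, mtu=1500):
    """Number of IPv4 fragments needed to carry payload_bytes.

    Each fragment carries at most (mtu - header) bytes of payload, rounded
    down to a multiple of 8 because fragment offsets count 8-byte units.
    """
    per_fragment = ((mtu - IPV4_HEADER_BYTES) // 8) * 8
    return max(1, math.ceil(payload_bytes / per_fragment))


for size in (1000, 1480, 4000, 65000):
    print(size, "bytes ->", fragment_count(size), "packet(s)")
```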

WAFS optimization was considered a solution to the SMB problem.  Cisco, Citrix, and Riverbed each offered WAFS appliances but they were all considered too expensive.  The WAFS appliances were cost prohibitive because IDTC had already invested substantial resources for their WAN.  A minimum $50,000 was not available in the current budget.  The network staff decided to review additional technology.

DFS (Distributed File System): 

DFS consolidates file services for the end-user.  Its primary purpose is to allow multiple file servers to serve data from a single UNC (Universal Naming Convention) path.  A UNC path is similar to a URL (uniform resource locator).  A URL is a web address and a UNC path is a file server address.
When multiple file servers are used (without DFS), the end users must access the respective servers from separate UNCs.  For example:
            \\fileserver1\data
            \\fileserver2\data
When DFS is implemented, the data from multiple servers can be accessed from a single domain file share.  The same paths from the previous example can be consolidated into a single UNC.  For example:
            \\uwstout.edu\data\
DFS divides file services between the branch office and the data center.  This solution requires a file server located at both IDTC and ORP.  IDTC related content can be hosted in Indiana and all other content will remain in Wisconsin.  A single UNC will present the separate file servers as a single system.  This method does not resolve the SMB related problems but it helps because it makes IDTC less dependent on the file server at ORP.
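
The namespace idea can be sketched as a lookup table that maps folders under the single UNC root to the servers that host them. The folder names and the assignment of servers to sites below are hypothetical; only the example paths come from the text above, and a real DFS namespace is configured in Windows rather than in application code.

```python
# Hypothetical namespace table: folder under the DFS root -> hosting share.
NAMESPACE_ROOT = r"\\uwstout.edu\data"
FOLDER_TARGETS = {
    "idtc": r"\\fileserver2\data",       # assumed to sit at the branch office
    "corporate": r"\\fileserver1\data",  # assumed to sit in the datacenter
}


def resolve(path):
    """Translate a namespace path into the server path that hosts it."""
    prefix = NAMESPACE_ROOT + "\\"
    if not path.lower().startswith(prefix.lower()):
        raise ValueError("path is outside the namespace")
    folder, _, rest = path[len(prefix):].partition("\\")
    target = FOLDER_TARGETS[folder.lower()]
    return target + ("\\" + rest if rest else "")


print(resolve(r"\\uwstout.edu\data\idtc\reports\q3.docx"))
```

The end user only ever sees the namespace path; the lookup decides which server answers.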

DFSR (Distributed File System Replication):

DFSR provides file server redundancy through replication.  Data located on the first file server can be synchronized with the second file server.  If one file server becomes unavailable the UNC serves files from the second file server.  The process is seamless from the end user’s perspective.
            Replication can also be used to mask SMB-related problems.  This solution requires a file server located at both IDTC and ORP.  DFSR will copy files between the two servers.  Any insertions, removals, and rearrangements of data within the files will be replicated (MSDN, 2012).
            DFSR will not resolve the SMB problem but it provides a functioning alternative.  IDTC staff will not experience delay or enumeration problems with the file server on their LAN.  The drawback to DFSR is the lack of geographic file locking.  A single Windows server prevents multiple users from editing a single file at the same time.  With replicated data on separate servers it becomes possible for users from IDTC and ORP to overwrite each other’s changes (Pyle, 2009).  This dilemma can be limited but not fully eliminated.  The IT staff believed DFSR had potential but they were hesitant to introduce a separate problem.
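
The overwrite risk can be shown with a toy replication routine. This sketch assumes that conflicts are settled by keeping the copy with the later timestamp; the data structures and names are illustrative and do not represent the DFSR implementation.

```python
def replicate(site_a, site_b):
    """Copy the newer version of each file to the other site.

    Assumption for illustration: conflicts are settled by keeping the
    copy with the later timestamp, so the earlier edit is lost.
    Each site maps filename -> (timestamp, content).
    """
    lost = []
    for name in set(site_a) | set(site_b):
        a, b = site_a.get(name), site_b.get(name)
        if a is not None and b is not None and a != b:
            lost.append(min(a, b)[1])  # the older edit gets overwritten
        winner = max(v for v in (a, b) if v is not None)
        site_a[name] = site_b[name] = winner
    return lost


# Two users edit the same file at different sites before replication runs.
orp = {"budget.xlsx": (100, "ORP edit")}
idtc = {"budget.xlsx": (105, "IDTC edit")}
print("overwritten:", replicate(orp, idtc))
```

Because neither server knew the file was open at the other site, the earlier edit is silently replaced.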

BranchCache:

            Microsoft BranchCache is a Windows service designed to increase application performance and reduce WAN traffic when files are accessed from branch offices.  BranchCache stores local copies of remote files.  BranchCache only retrieves data when clients request it over the WAN.  The cached files are stored on local workstations or servers.  When clients from within the LAN request a cached file, the client downloads it from the cache, instead of the remote server across the WAN (Microsoft, 2009).  BranchCache was considered more favorable than DFSR because it addressed the geographic file locking limitations (Microsoft, 2008).
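
The caching behavior described above resembles a read-through cache. The following toy sketch shows only that idea; the class and function names are hypothetical, and it omits the validation against the remote server and the invalidation that a real caching service requires.

```python
class BranchCacheSketch:
    """Toy read-through cache; not the real BranchCache service."""

    def __init__(self, fetch_over_wan):
        self._fetch = fetch_over_wan  # callable: path -> bytes
        self._store = {}
        self.wan_requests = 0

    def read(self, path):
        if path not in self._store:  # first request crosses the WAN
            self._store[path] = self._fetch(path)
            self.wan_requests += 1
        return self._store[path]     # later requests stay on the LAN


def fake_wan_fetch(path):
    """Stand-in for a slow remote read (hypothetical helper)."""
    return f"contents of {path}".encode()


cache = BranchCacheSketch(fake_wan_fetch)
cache.read(r"\\fileserver1\data\plan.docx")
cache.read(r"\\fileserver1\data\plan.docx")
print("WAN requests:", cache.wan_requests)
```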

Implementation:

The IT staff decided to implement the BranchCache technology because of its simple implementation and affordability.  The service was packaged with Windows as an installable feature.  The IT staff passed on WAFS optimization because of the additional expense.  DFS was not chosen because it did not specifically address the limitation of SMB over the WAN.  DFSR could work but it also introduced additional risk. 
BranchCache was only available with Windows 2008 and the file server ran on Windows 2003.  Licensing for Windows 2008 had previously been purchased and upgrades were already planned.  The BranchCache project expedited the server upgrade.  The ORP file server was the first server to be upgraded to the Windows 2008 platform.
The file server upgrade required minimal downtime because the system volume (operating system) was kept separate from the data volume.  The 2003 file server was shut down and the system volume was removed.  A pre-built volume configured with Windows 2008 was then paired with the original data volume.
File service tests were run before BranchCache was installed on the Windows 2008 server.   File browsing was performed between an IDTC Windows 2003 print server and the new ORP file server.  Directory browsing and file enumeration were slow.

A second test was conducted from a separate computer in Indiana.  IT staff accessed a Windows Vista workstation and repeated the enumeration process with the file server in Wisconsin.  The second test had different results.  Directories populated in less than two seconds.  The installation of BranchCache was put on hold to allow further research of the new development.

SMB2:

The Windows 2008 file server improved file services over the WAN.  It was later discovered that Windows 2008 included an improved version of SMB.  SMB2 had significant improvements to allow for fast folder enumerations and file copying over connections with high latency.  In order to use SMB2 both the client and the server must support the protocol (Barreto, 2008).  File services performed poorly from the IDTC Windows 2003 print server because the older SMB protocol was used.  Enumeration tests were quick from the IDTC Windows Vista workstation because it ran SMB2.  The Windows 2008 file server performed best with SMB2. 
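
The requirement that both endpoints support SMB2 can be sketched as picking the highest dialect the two sides have in common. The table below lists the highest dialect each operating system supports; the function is a simplification of the real negotiation exchange and its name is illustrative.

```python
# Highest SMB dialect each operating system supports (major, minor).
MAX_DIALECT = {
    "Windows Server 2003": (1, 0),
    "Windows Vista": (2, 0),
    "Windows Server 2008": (2, 0),
    "Windows 7": (2, 1),
}


def negotiated_dialect(client_os, server_os):
    """Both ends must support a dialect, so the lower maximum wins."""
    return min(MAX_DIALECT[client_os], MAX_DIALECT[server_os])


for client in ("Windows Server 2003", "Windows Vista", "Windows 7"):
    major, minor = negotiated_dialect(client, "Windows Server 2008")
    print(f"{client} -> Windows Server 2008: SMB {major}.{minor}")
```

This matches the test results: the Windows 2003 print server fell back to the original SMB protocol, while the Windows Vista workstation used SMB2.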

Additional workstations were tested to ensure quick enumeration and file transfers over the WAN.  All workstations at IDTC had Windows Vista or Windows 7 operating systems installed.  SMB2 resolved the file service problems caused by high latency between IDTC and ORP.  The network problem was resolved as an unintended consequence of the server upgrade.  BranchCache was no longer needed because file services worked as expected.

Chapter 4:  Future Innovations:

IT staff continued to monitor the network traffic between IDTC and ORP in the weeks that followed.  IDTC staff members with workstation access were also contacted to confirm service was working satisfactorily.  After four weeks the specific problem was considered resolved.  After the trouble ticket was closed the IT staff at ORP continued to search for innovations to improve the network file services to the IDTC branch office. 

QoS (Quality of Service): 


 SMB2 allows for quick and efficient data transfer over the WAN.  A comparison between SMB and SMB2 revealed data can transfer up to six times faster over a high-latency network (Barreto, 2008).  More data is delivered within a shorter time frame.  The increased efficiency presents a potential new problem, however.  Most WANs have fixed bandwidth and can only deliver a limited amount of data at any given time.  Too much data transferred at once may saturate the connection and cause congestion.


Network congestion can be relieved with purchase of additional bandwidth.  The larger pipeline allows for greater volume delivery.  There are instances when additional bandwidth cannot be purchased because of physical or cost limitations.  Leased line fees increase proportionately with increased bandwidth and geographic distance.  The IDTC budget does not allow for additional bandwidth.

Network congestion may repeat the original content crawl problems at IDTC.  Congestion results from common network activities, including web surfing, video streaming, and file services.  Windows 2008 allows QoS policy to identify and prioritize specific traffic (Davies, 2006).  QoS tags SMB2 traffic and avoids service disruption during periods of high activity.  File enumeration will work well but at a potential cost to other network services (e.g. slow web surfing).  The trade-off is acceptable because IDTC places higher value on file services than on web surfing.
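
The prioritization trade-off can be sketched with a strict-priority queue. The traffic classes and their ranking are assumptions for illustration; Windows policy-based QoS is configured through policy settings, and this sketch shows only the ordering idea, not that mechanism.

```python
import heapq
from itertools import count

# Assumed ranking; a lower number is sent first.
PRIORITY = {"smb2": 0, "web": 1, "video": 2}


def drain(packets):
    """Return packet labels in the order a strict-priority scheduler sends them."""
    order = count()  # tie-breaker keeps arrival order within a class
    heap = [(PRIORITY[kind], next(order), label) for kind, label in packets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


arrivals = [("web", "w1"), ("smb2", "s1"), ("video", "v1"), ("smb2", "s2")]
print(drain(arrivals))
```

File traffic leaves the queue first, and web and video traffic wait, which is the cost the text describes.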

SharePoint:

Windows file servers are usually accessed from the Windows desktop environment.  Windows Explorer is used to manually browse through multiple directories to store and retrieve files.  This solution does not always scale well across the WAN.  File management is usually administered by the IT staff.  Management responsibilities include file directory organization and security.  Files can quickly become outdated, unorganized, and unsecured because of management constraints.

Microsoft SharePoint is a document management and collaboration server that addresses some limitations of traditional file servers.  Employees can access SharePoint with a web browser and website address.  SharePoint server eliminates WAN-related file enumeration problems because web browsers do not use the SMB protocol.  SharePoint provides additional benefits, including meta-tags and delegation.  Meta-tags are document keywords that allow for robust search capabilities.  The ability to search with key words is more efficient than manually browsing.  Delegation allows different departments to self-manage their content.  Each business unit can assign staff permissions and make directory changes.  

SMB3: 

Microsoft released Windows Server 2012 with SMB3 in September 2012.  SMB3 was enhanced to further improve network performance.  SMB Directory Leasing is a function of SMB3 that enables clients to cache directory metadata.  The local directory caching reduces round-trip protocol traffic to the file server (Snover, 2012).  SMB3 satisfies all file server requirements for IDTC because file enumerations are faster and bandwidth requirements are reduced.

BranchCacheV2:

            Windows 2012 Server also introduces an enhanced BranchCache.  BranchCache V2 takes advantage of SMB3 improvements.  BranchCache V2 reduces CPU cycles to reduce server resource load.  Reduced WAN traffic and storage requirements are achieved because duplicate data is stored and downloaded only once per branch office.  Only small changes made to a large file are delivered and cached.  The services also divide files into smaller units through hash algorithms for further bandwidth savings.  Both SMB3 and BranchCache deliver and store data with encryption. 
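
The bandwidth savings from hashing file segments can be sketched as follows. This example assumes tiny fixed-size chunks and SHA-256 purely for readability; the actual segment sizes and hashing scheme used by BranchCache are not described in this paper, and all names below are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the demo readable


def chunk_hashes(data, size=CHUNK_SIZE):
    """Split data into chunks and return (sha256 hex digest, chunk) pairs."""
    return [(hashlib.sha256(data[i:i + size]).hexdigest(), data[i:i + size])
            for i in range(0, len(data), size)]


def download(data, branch_cache):
    """Fetch only chunks whose hash is not cached; return bytes sent over the WAN."""
    sent = 0
    for digest, chunk in chunk_hashes(data):
        if digest not in branch_cache:
            branch_cache[digest] = chunk
            sent += len(chunk)
    return sent


office_cache = {}
print(download(b"AAAABBBBCCCC", office_cache))  # first version of the file
print(download(b"AAAAXXXXCCCC", office_cache))  # same file with one chunk changed
```

The second download transfers only the changed chunk, because the unchanged chunks are already identified by their hashes in the branch office cache.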

Chapter 5:  Conclusion

A combination of latency and an outdated SMB protocol caused network disruption for the staff at IDTC.  Research identified the problem with network tests for latency, bandwidth, and protocol performance.  After the problem was defined, existing technology was examined to determine a resolution.  WAFS, DFS, DFSR, and BranchCache were considered potential candidates.  The file server was upgraded to Windows 2008 as preparation for BranchCache installation.  Before BranchCache was installed on the server, it was discovered that file services over the WAN experienced improved performance.  Additional research found the improved performance was a result of the enhanced SMB2 protocol.  Although the network problem between IDTC and ORP was resolved, continued research identified technology that could prevent potential problems and provide additional benefits.   

References:




Barreto, J. (2008, November 11). File Server performance improvements with the SMB2 protocol in Windows Server 2008. Retrieved October 2012, from TechNet: http://blogs.technet.com/b/josebda/archive/2008/11/11/file-server-performance-improvements-with-the-smb2-protocol-in-windows-server-2008.aspx

Citrix. (2012, October). How Branch Repeater Works. Retrieved October 2011, from Citrix: http://www.citrix.com/English/ps2/products/feature.asp?contentID=1686852

Davies, J. (2006, March). Policy-based QoS Architecture in Windows Server 2008 and Windows Vista. Retrieved from Microsoft TechNet: http://technet.microsoft.com/library/bb878009

Dorairajan, V. (2004, May 25). Enabling File Sharing Over the WAN. Retrieved October 2012, from Electrical Engineer Times: http://www.eetimes.com/electronics-news/4144653/Enabling-File-Sharing-over-the-WAN

Jordan, S. (2011). ICT & SMB2. (Unpublished research from ICT-701). Menomonie, WI: UW-Stout.

Microsoft. (2003, March 28). Remote Access Technologies. Retrieved October 2012, from Microsoft TechNet: http://technet.microsoft.com/en-us/library/cc755399(v=ws.10).aspx

Microsoft. (2008, February). Branch Office Infrastructure Solution Architecture Guide. Retrieved October 2012, from Microsoft Branch Office Tech Center: http://download.microsoft.com/download/4/2/e/42e8ee6e-5365-4e79-b3bf-b10fdac3170e/BOIS%20Architecture%20Guide.docx

Microsoft. (2008, August). Optimizing Applications for Remote File Access Over WAN. Retrieved October 2012, from PDC Microsoft Professional Developers Conference: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CCgQFjAB&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2Ff%2F2%2F1%2Ff2146213-4ac0-4c50-b69a-12428ff0b077%2FOptimizing_Applications_for_Remote_File_Access_Over_WAN.pptx&ei=a8

Microsoft. (2009, January). BranchCache Executive Overview. Retrieved October 2012, from Microsoft: http://www.microsoft.com/en-us/download/confirmation.aspx?id=4606

Microsoft. (2009, April 23). Plan for Bandwidth Requirements. Retrieved October 2012, from Microsoft: http://technet.microsoft.com/en-us/library/cc262952(office.12).aspx#section3

MSDN. (2012, October 16). DFSR Overview. Retrieved October 2012, from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/bb540025(v=vs.85).aspx

MSDN. (2012, September 7). Microsoft SMB Protocol and CIFS Protocol Overview. Retrieved October 2012, from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365233(v=vs.85).aspx

Odem, W., & Knott, T. (2006). Networking Basics. Indianapolis: Cisco Press.

Pyle, N. (2009, February 20). Understanding (the Lack of) Distributed File Locking in DFSR. Retrieved October 2012, from Ask the Directory Services Team: http://blogs.technet.com/b/askds/archive/2009/02/20/understanding-the-lack-of-distributed-file-locking-in-dfsr.aspx

Seifert, R. (2000). The Switch Book. New York: John Wiley & Sons, Inc.

Snover, J. (2012, April 19). SMB 2.2 is now SMB 3.0. Retrieved from Microsoft Windows Server Blog: http://blogs.technet.com/b/windowsserver/archive/2012/04/19/smb-2-2-is-now-smb-3-0.aspx

Velasco, V. (2000, November 21). Introduction to IP Spoofing. Retrieved October 2012, from SANS Institute: http://www.sans.org/reading_room/whitepapers/threats/introduction-ip-spoofing_959




