WAN File Server Problems - SMB Limitations Over the VPN, Internet, WAN...
Abstract:
This research examines the limitations of SMB file transfers over the WAN. End users complain of slow file browsing, slow file enumeration, and an inability to save Word files from the branch office. Recommendations are made to resolve the issues.
WAN File Services: The Influences of Latency and Protocols over the WAN
By Steven M. Jordan
University of Wisconsin-Stout
Last updated November 20th, 2013
Chapter 1: Introduction
This
research is based on a network problem between a corporate office and a branch
office. The corporate office is based in
Oconomowoc, WI, and is referred to as ORP.
The branch office is based in Carmel, IN, and is referred to as IDTC. The ORP corporate office has an in-house
datacenter that provides network connectivity to over 100 branch offices
throughout the Midwest.
Scope:
The
network is modeled on a hub-and-spoke design.
The ORP datacenter is the central hub and facilitates all network
services to the separate branch offices.
Provided network services include terminal, email, and file sharing via
Windows 2003 servers. The primary methods
used to connect branch offices include Internet VPNs (virtual private networks),
T1 circuits, and DOCSIS (data over cable).
Approximately 10% of IDTC staff access the ORP file server from Windows-based
workstations. The remaining 90% of IDTC
network users connect to the ORP file server from thin clients (simple
computers) via Citrix Terminal Server.
Problem Statement:
Network
users experienced difficulty connecting to the remote ORP file server. The network problem caused work loss and
disruption for staff located at the IDTC branch office. Reports of the problem were sporadic and the
exact cause remained unidentified. The
problem was not experienced by all network users at IDTC. The end users that had experienced the
problem complained of slow network file browsing and slow file
enumeration.
The
problem was most symptomatic when large file directories on the ORP file server
were accessed. The directories
associated with disruption usually contained hundreds of files and folders. Workstations became unresponsive, the mouse pointer
displayed an hourglass, and several minutes passed before user functionality
returned. There were also reports of
mapped network drives that had disappeared or error messages that read, “Network
location is unavailable.”
Chapter 2: Problem Determination:
Network tests
were required to diagnose potential problems.
The research targeted three potential problems:
1. Network latency
2. Bandwidth
3. Network protocols
Latency:
Latency is the delay of data flow. A ping test measures the time data requires to travel across the network and back. Lower ping times indicate faster network connections. Conversely, high ping times indicate slow network connections. The ping test between IDTC and ORP measured an average round-trip time of 25 ms. A latency of 25 ms is usually considered sufficient to support network service for a remote office.
Test results between IDTC and ORP were then compared to latency
data from a separate branch office. The
second branch office is referred to as ODTC, and is located in Summit, WI. The wide area network (WAN) technology is similar
at each location. The major difference between
ODTC and IDTC is their geographic distance from ORP. IDTC is located in Indiana while ODTC is located
within 10 miles of ORP (Jordan, 2011, p. 5).
The second latency test revealed a discrepancy. Latency to ODTC averaged 1 ms; latency to IDTC averaged 25 ms. IDTC experienced the highest latency to the datacenter among all branch offices connected to ORP with dedicated circuits.
Time Warner Cable (TWC) provides network connections to each office. TWC’s network engineer stated that the higher latency was most likely caused by the large geographic distance between ORP and IDTC. TWC also noted that they partner with a separate telecommunications company (Telco) to provide service across state lines. There was no method to determine the number of switches the data passed through to complete the connection; each hop (switch) slightly increased the latency.
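The effect of distance on latency can be estimated with a short calculation. The following is a minimal sketch, not a measurement from this research: it assumes that signals travel through fiber at roughly two-thirds the speed of light, and the path lengths are hypothetical because the actual circuit route between ORP and IDTC was not known. The first value approximates the 10-mile distance to ODTC.

```python
# Rough lower bound on round-trip time imposed by distance alone.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_VELOCITY_FACTOR = 0.67       # assumption: light travels at about 2/3 c in fiber


def min_rtt_ms(path_km: float) -> float:
    """Minimum round-trip time in milliseconds over a fiber path of path_km."""
    one_way_seconds = path_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_seconds * 1000


# Hypothetical path lengths; the real circuit route was not documented.
for km in (16, 500, 1000):
    print(f"{km:>5} km -> at least {min_rtt_ms(km):.2f} ms round trip")
```

Distance sets only a floor. Any latency measured above that floor comes from switching, queuing, and the hand-off between carriers, which is consistent with TWC's remark that each hop adds delay.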
Microsoft confirms that latency has a
negative impact on network performance.
The following table reports the estimated time to enumerate file share
content based on available bandwidth, latency, and the volume of content (Microsoft, 2009):
Table 1
File Crawl Rates
Bandwidth, latency, crawl rate                      | 1 GB   | 5 GB  | 25 GB  | 100 GB | 500 GB
10 Mbps, latency = none, crawl rate = 467 MB/min    | 12 sec | 1 min | 5 min  | 20 min | 1 hr 30 min
10 Mbps, latency = 100 ms, crawl rate = 330 MB/min  | 2 min  | 9 min | 45 min | 3 hr   | 15 hr
Average file crawl enumeration of the ORP file server was tested from both IDTC and ODTC:
Table 2
Branch Office Latency
Location | Latency | Enumeration
IDTC     | 23 ms   | 130 sec
ODTC     | 1 ms    |
Microsoft’s
published file crawl rates apply to measurements collected between ORP and
IDTC. The research demonstrates negative
impact when 75 MB of content volume is processed:
Table 3
Critical Mass Data
Data Size Reference                | 1 GB = 1,000 MB
File Crawl Formula                 |
File Crawl Formula Applied to IDTC |
Expected File Crawl at IDTC *      | 23 ms = 75 MB
Note. *Calculations are based on 10 Mbps of available bandwidth.
Bandwidth:
Latency measures the delay before data is delivered. Bandwidth indicates the amount of data that can be delivered in a given time. Throughput is the amount of data actually delivered. Throughput is impacted by variables, including the slowest-speed link and external interference (Odem & Knott, 2006). Data throughput and available bandwidth were tested between IDTC and ORP.
Network
traffic was generated to compare the throughput rates between IDTC to ORP and ODTC
to ORP. The traffic was generated from
the ORP datacenter and transmitted to the print servers at each branch
office. The inbound and outbound bandwidth
results were similar up to 6 Mbps. There
was a noticeable performance difference between the branch offices when more
than 6 Mbps of data traffic was delivered (Jordan, 2011, p. 8).
Traffic
flow to ODTC worked as expected. Up to 10
Mbps transfers were completed between ODTC and ORP. The IDTC transfer rates significantly decreased
when more than 6 Mbps of data delivery was attempted. When more than 6 Mbps
of data was sent, the average latency doubled, and the outbound rate decreased
from 6 Mbps to under 2 Kbps (slower than an analog modem).
Table 4
Data sent from ORP to IDTC
% of 10 Mb | Data Sent  | Receive   | Transmit  | Latency
30%        | 3,043 Kbps | 2.85 Mb   | 2.85 Mb   | 25 ms
40%        | 4,004 Kbps | 5.4 Mb    | 5.4 Mb    | 31 ms
60%        | 5,916 Kbps | 2,028 bps | 1,313 bps | 80 ms
75%        | 7,523 Kbps | Time Out  | Time Out  | Time Out
TWC provided network tests between IDTC and
ORP to confirm previous results. The
first tests indicated possible network problems. Packets sent back and forth experienced poor
throughput and latency. TWC repeated the
network tests the following day and reached the opposite conclusion. It was their belief that the first tests were
therefore inaccurate. Actual test
results from TWC were inconclusive because of the relay to the second Telco in
Indiana. Because duplex mismatch was
still suspected, the routers were replaced at both ORP and IDTC. Both sites were then able to send and receive
a full 10 Mbps of data without issue.
After the network throughput problem was resolved, staff at IDTC
continued to experience the original file server problem (Jordan, 2011, p. 11).
Protocols:
User
accounts of the problem were subjective.
Further tests were required to identify the exact problem. The first tests were conducted on the ORP
local area network (LAN). An ORP
workstation was used to connect to the ORP file server. The results were positive; more than 200
files populated in less than one second.
The
second test connected an IDTC thin client with the Citrix terminal server
located at ORP. At ORP, terminal
services host simultaneous client sessions and provide individual Windows
desktops. Applications operate entirely from
the server. Only keystrokes, mouse
movements, and display data are exchanged between the thin clients and terminal
server (Microsoft, 2003).
When connected via terminal server, the network resources are considered
local to the ORP LAN. Test results
confirmed that network problems were absent from terminal server sessions. File server directories populated information
in less than one second.
The
third test connected an IDTC workstation to the ORP file server. Windows Explorer was used to browse to remote
directories at ORP. It took more than
five minutes for all of the files to populate.
The same test was applied a second time while a network sniffer examined
the traffic. (A network sniffer is a
software utility that is used to troubleshoot network problems.) The network sniffer logged the data
conversation between the two endpoints. The
log results documented SMB as the primary network protocol. SMB
is used for file sharing and network printing.
Most documents and
spreadsheets that reside on file servers depend on SMB for delivery to client
endpoints (MSDN, 2012).
Logs found that the SMB protocol sent
duplicate data across the WAN when a workstation at IDTC connected to the file
server at ORP. In some instances, the
same file information was delivered as many as 14 consecutive times. The process stopped before data fully
enumerated and the process repeated itself.
The SMB transmission process was stuck in a repetitive loop that resulted
in poor network performance and slow enumeration times.
Further
research confirmed the inherent limitations of SMB and file server performance
over a wide area connection. Vinodh Dorairajan
is credited with coining the term WAFS (Wide Area File Systems). According to Dorairajan,
“file sharing protocols tend to be rather chatty” (Dorairajan, 2004).
CIFS (SMB) protocols were designed to work well in a LAN environment but
will not work well over the WAN (Jordan, 2011, p. 9).
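The cost of a chatty protocol can be illustrated with a simple model in which every request must wait for its reply before the next request is sent. The sketch below is an illustration only: the number of directory entries and the number of round trips per entry are hypothetical values, not figures taken from the network sniffer logs. The round-trip times are the averages measured for ODTC and IDTC.

```python
def enumeration_seconds(entries: int, round_trips_per_entry: float, rtt_ms: float) -> float:
    """Time spent waiting on the network when every round trip is sequential."""
    return entries * round_trips_per_entry * rtt_ms / 1000


# Hypothetical workload: 500 directory entries, 4 sequential round trips each.
for site, rtt_ms in (("ODTC", 1), ("IDTC", 25)):
    seconds = enumeration_seconds(500, 4, rtt_ms)
    print(f"{site}: {seconds} seconds waiting on round trips")
```

In this model the waiting time grows in direct proportion to latency and is independent of bandwidth, which is why a faster circuit alone does little for directory browsing.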
In
most situations, additional bandwidth will not resolve problems inherent to SMB
over geographic distance. Copying a
single large file may transfer well over the WAN but folder enumeration may be
considerably slower (Microsoft, 2008).
In this case, a single 1 GB file was used to test this theory. The single file successfully transferred in
less than seven minutes. The IDTC
network problem mostly occurred while browsing file directories over the WAN. IDTC throughput had increased by 40%, yet end
users continued to experience the same problems.
Chapter 3: Available Technology
Research
identified four potential technology solutions to address the SMB problem.
WAFS (Wide Area File Servers):
WAFS are specialized servers that are
designed to overcome traditional network limitations when data is sent over a WAN. WAFS increase network efficiency with a
combination of data compression and IP spoofing.
IP spoofing is a process normally used by network
hackers as a method to gain unauthorized network access. Each network packet contains a source IP
address and a destination IP address.
Routers normally use the destination address to forward data and ignore
the source address. Hackers manipulate the
IP packet and fool the remote computer into believing the data was sent from a
trusted source (Velasco, 2000).
WAFS use similar techniques to increase the
amount of data sent and received over the WAN.
WAFS use IP spoofing to change the MTU (maximum transmission unit). The MTU dictates the maximum amount of data that
can be transferred per packet. If a
packet is larger than the MTU (1,500 bytes on standard Ethernet), it is normally fragmented into smaller
packets for transmission (Seifert, 2000). The additional fragmented packets create network
delay. WAFS overcome the MTU limits by
manipulating IP packet headers. More
data is delivered than traditional MTU standards allow (Citrix, 2012).
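The fragmentation overhead described above can be estimated with a small calculation. This is a minimal sketch that assumes IPv4 with a 20-byte header and no options; the payload sizes in the example are hypothetical. It does not model how any particular WAFS appliance handles packets.

```python
import math

IPV4_HEADER_BYTES = 20  # IPv4 header without options


def ipv4_fragments(payload_bytes: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry payload_bytes over a link with the given MTU."""
    # Fragment offsets are counted in 8-byte units, so every fragment except
    # the last must carry a multiple of 8 bytes of payload.
    per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    return max(1, math.ceil(payload_bytes / per_fragment))


# Hypothetical payload sizes, in bytes.
for size in (1480, 1481, 4000, 8980):
    print(f"{size} bytes -> {ipv4_fragments(size)} packet(s)")
```

A payload only one byte over the limit already doubles the packet count, and each extra packet carries its own header and must be reassembled by the receiver.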
WAFS optimization was considered a solution
to the SMB problem. Cisco, Citrix, and
Riverbed each offered WAFS appliances but they were all considered too
expensive. The WAFS appliances were cost
prohibitive because IDTC had already invested substantial resources in their
WAN. The minimum cost of $50,000 was not available
in the current budget. The network staff
decided to review additional technology.
DFS (Distributed File System):
DFS consolidates file services for the end user.
Its primary purpose is to allow multiple file servers to serve data from a
single UNC (Universal Naming Convention) path.
A UNC is similar to a URL (uniform resource locator). A URL is a web address and a UNC is a file
server address.
When
multiple file servers are used (without DFS), the end users must access the
respective servers from separate UNCs.
For example:
\\fileserver1\data
\\fileserver2\data
When
DFS is implemented, the data from multiple servers can be accessed from a
single domain file share. The same paths
from the previous example can be consolidated into a single UNC. For example:
\\uwstout.edu\data\
DFS divides file services between the branch office and the data center. This solution requires a file server located
at both IDTC and ORP. IDTC-related
content can be hosted in Indiana and all other content will remain in
Wisconsin. A single UNC will present the
separate file servers as a single system.
This method does not resolve the SMB-related problems, but it helps
because it makes IDTC less dependent on the file server at ORP.
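The namespace idea can be sketched as a lookup table that maps folders under one root to the server that actually hosts the data. This is a simplified illustration and not how the DFS service is implemented; the folder names idtc and orp are hypothetical, while the root and server names reuse the examples given earlier.

```python
# Hypothetical namespace map: folders under one domain-based root, each
# pointing at the file server that really holds the data.
NAMESPACE = {
    r"\\uwstout.edu\data\idtc": r"\\fileserver2\data",
    r"\\uwstout.edu\data\orp": r"\\fileserver1\data",
}


def resolve(unc_path: str) -> str:
    """Translate a namespace path into the path on the hosting file server."""
    lowered = unc_path.lower()
    for root, target in NAMESPACE.items():
        if lowered == root or lowered.startswith(root + "\\"):
            # Keep whatever follows the namespace folder unchanged.
            return target + unc_path[len(root):]
    raise LookupError(f"no namespace folder covers {unc_path}")


print(resolve(r"\\uwstout.edu\data\idtc\reports\q3.docx"))
```

End users only ever see the single root. Which server answers a request is decided by the map, so IDTC content can sit on a server in Indiana without any change to the paths that staff use.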
DFSR (Distributed File System Replication):
DFSR
provides file server redundancy through replication. Data located on the first file server can be synchronized
with the second file server. If one file
server becomes unavailable the UNC serves files from the second file
server. The process is seamless from the
end user’s perspective.
Replication
can also be used to mask SMB-related problems.
This solution requires a file server located at both IDTC and ORP. DFSR will copy files between the two servers. Any insertions, removals, and rearrangements
of data within the files will be replicated (MSDN, 2012).
DFSR will not resolve the SMB
problem but it provides a functioning alternative. IDTC staff will not experience delay or
enumeration problems with the file server on their LAN. The drawback to DFSR is the lack of
geographic file locking. A single
Windows server prevents multiple users from editing a single file at the same
time. With replicated data on separate
servers it becomes possible for users from IDTC and ORP to overwrite each
other’s changes (Pyle, 2009). This dilemma can be limited but not fully
eliminated. The IT staff believed DFSR
had potential but they were hesitant to introduce a separate problem.
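The overwrite risk can be illustrated with a last-writer-wins rule, in which the most recently saved copy replaces the other when two replicas of the same file conflict. The sketch below is a simplified model of that rule under stated assumptions; the timestamps and file contents are hypothetical, and it does not reproduce the DFSR service itself.

```python
from dataclasses import dataclass


@dataclass
class Version:
    site: str
    modified: float  # seconds since a common reference time
    content: str


def resolve_conflict(a: Version, b: Version) -> Version:
    """Last-writer-wins: keep whichever copy was saved most recently."""
    return a if a.modified >= b.modified else b


# Hypothetical scenario: both sites edit the same document before replication runs.
orp = Version("ORP", modified=100.0, content="budget (ORP edits)")
idtc = Version("IDTC", modified=130.0, content="budget (IDTC edits)")

winner = resolve_conflict(orp, idtc)
print(f"{winner.site} copy survives; the other site's edits are no longer in the live file")
```

Neither user receives a warning while editing, because each server only sees its own local file locks.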
BranchCache:
Microsoft BranchCache is a Windows
service designed to increase application performance and reduce WAN traffic
when content is accessed from branch offices.
BranchCache stores local copies of remote files. BranchCache only retrieves data when clients
request it over the WAN. The cached
files are stored on local workstations or servers. When clients from within the LAN request a
cached file, the client downloads it from the cache instead of the remote
server across the WAN (Microsoft, 2009). BranchCache was considered more favorable than
DFSR because it addressed the geographic file locking limitations (Microsoft, 2008).
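The caching behavior can be sketched as a read-through cache: the first request for a file crosses the WAN, and later requests are served locally. This is a toy model under stated assumptions. The class name, the fetch function, and the file path are hypothetical, and the sketch omits the validation that a real cache performs against the origin server.

```python
from typing import Callable, Dict


class BranchCacheSketch:
    """Toy read-through cache: first read crosses the WAN, repeats are served locally."""

    def __init__(self, fetch_over_wan: Callable[[str], bytes]) -> None:
        self._fetch = fetch_over_wan
        self._cache: Dict[str, bytes] = {}
        self.wan_fetches = 0

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: retrieve the file from the remote server once.
            self._cache[path] = self._fetch(path)
            self.wan_fetches += 1
        return self._cache[path]


def fake_wan_fetch(path: str) -> bytes:
    """Stand-in for a transfer from the remote file server."""
    return f"contents of {path}".encode()


cache = BranchCacheSketch(fake_wan_fetch)
cache.read(r"\\fileserver1\data\plan.docx")
cache.read(r"\\fileserver1\data\plan.docx")
print("WAN fetches:", cache.wan_fetches)
```

Because there is still only one authoritative copy on the origin server, this approach avoids the split-copy conflicts that replication can introduce.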
Implementation:
The
IT staff decided to implement the BranchCache technology because of its simple
implementation and affordability. The
service was packaged with Windows as an installable feature. The IT staff passed on WAFS optimization because
of the additional expense. DFS was not
chosen because it did not specifically address the limitation of SMB over the
WAN. DFSR could work but it also
introduced additional risk.
BranchCache
was only available with Windows 2008 and the file server ran on Windows
2003. Licensing for Windows 2008 had previously
been purchased and upgrades were already planned. The BranchCache project expedited the server
upgrade. The ORP file server was the
first server to be upgraded to the Windows 2008 platform.
The
file server upgrade required minimal downtime because the system volume (operating
system) was kept separate from the data volume.
The 2003 file server was shut down and the system volume was
removed. A pre-built volume configured
with Windows 2008 was then paired with the original data volume.
File service tests were run before BranchCache
was installed on the Windows 2008 server.
File browsing was performed
between an IDTC Windows 2003 print server and the new ORP file server. Directory browsing and file enumeration were
slow.
A second test was conducted from a separate
computer in Indiana. IT staff accessed a
Windows Vista workstation and repeated the enumeration process with the file server
in Wisconsin. The second test had
different results. Directories populated
in less than two seconds. The
installation of BranchCache was put on hold to allow further research of the
new development.
SMB2:
The Windows 2008 file server improved file
services over the WAN. It was later
discovered that Windows 2008 included an improved version of SMB. SMB2 had significant improvements to allow
for fast folder enumerations and file copying over connections with high
latency. In order to use SMB2 both the
client and the server must support the protocol (Barreto, 2008).
File services performed poorly from the IDTC Windows 2003 print server
because the older SMB protocol was used.
Enumeration tests were quick from the IDTC Windows Vista workstation because
it ran SMB2. The Windows 2008 file
server performed best with SMB2.
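The test results follow from how the two endpoints agree on a protocol version: the connection uses the newest dialect that both sides support. The sketch below models that rule as a simple minimum. It is an illustration, not the negotiation exchange itself, and the dialect table is an assumption based on Microsoft's published support for each release; the Windows 7 entry in particular goes beyond what this research tested.

```python
# Highest SMB dialect supported by each Windows release (assumed values).
MAX_DIALECT = {
    "Windows Server 2003": 1.0,
    "Windows Vista": 2.0,
    "Windows Server 2008": 2.0,
    "Windows 7": 2.1,
}


def negotiated_dialect(client_os: str, server_os: str) -> float:
    """The connection uses the newest dialect that both endpoints support."""
    return min(MAX_DIALECT[client_os], MAX_DIALECT[server_os])


for client in ("Windows Server 2003", "Windows Vista", "Windows 7"):
    dialect = negotiated_dialect(client, "Windows Server 2008")
    print(f"{client} -> Windows Server 2008: SMB {dialect}")
```

An older client therefore holds the whole conversation at the older protocol, which matches the slow results from the Windows 2003 print server and the fast results from the Windows Vista workstation.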
Additional workstations were tested to ensure
quick enumeration and file transfers over the WAN. All workstations at IDTC had Windows Vista or
Windows 7 operating systems installed. SMB2
resolved the file service problems caused by high latency between IDTC and
ORP. The network problem was resolved as an unintended consequence of the server upgrade. BranchCache was no
longer needed because file services worked as expected.
Chapter 4: Future Innovations:
IT staff continued to monitor the network
traffic between IDTC and ORP in the weeks that followed. IDTC staff members with workstation access
were also contacted to confirm service was working satisfactorily. After four weeks the specific problem was
considered resolved. After the trouble
ticket was closed the IT staff at ORP continued to search for innovations to
improve the network file services to the IDTC branch office.
QoS (Quality of Service):
SMB2
allows for quick and efficient data transfer over the WAN. A comparison between SMB and SMB2 revealed
data can transfer up to six times faster over a high-latency network (Barreto, 2008). More data is delivered within a shorter time
frame. The increased efficiency presents
a potential new problem, however. Most
WANs have fixed bandwidth and can only deliver a limited amount of data at any
given time. Too much data transferred at
once may saturate the connection and cause congestion.
Network congestion can be relieved with the
purchase of additional bandwidth. The
larger pipeline allows for greater volume delivery. There are instances when additional bandwidth
cannot be purchased because of physical or cost limitations. Leased line fees increase proportionately
with increased bandwidth and geographic distance. The IDTC budget does not allow for additional
bandwidth.
Network congestion may repeat the original
content crawl problems at IDTC.
Congestion results from common network activities, including web surfing,
video streaming, and file services. Windows
2008 allows a QoS policy to identify and prioritize specific traffic (Davies, 2006). A QoS policy can tag SMB2 traffic so that it is prioritized, which helps avoid file service
disruption during periods of high activity.
File enumeration will work well, but at a potential cost to other network
services (e.g., slow web surfing). The
trade-off is acceptable because IDTC places higher value on file services than on
web surfing.
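The trade-off can be illustrated with a strict-priority queue, in which tagged file service traffic is always transmitted before other traffic that is waiting. This is a minimal sketch under stated assumptions: the traffic classes and their priority values are hypothetical, and the actual queuing behavior depends on how the routers along the path treat the tags.

```python
import heapq
from itertools import count

# Hypothetical priorities: a lower number is transmitted first.
PRIORITY = {"smb2": 0, "web": 1, "video": 2}


def drain(packets):
    """Return packets in transmit order: by traffic class, then by arrival order."""
    arrival = count()
    queue = [(PRIORITY[kind], next(arrival), kind) for kind in packets]
    heapq.heapify(queue)
    return [heapq.heappop(queue)[2] for _ in range(len(queue))]


print(drain(["video", "web", "smb2", "web", "smb2"]))
```

File service packets leave first regardless of when they arrived, and the lower classes wait, which is the slower web surfing described above.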
SharePoint:
Windows file servers are usually accessed
from the Windows desktop environment.
Windows Explorer is used to manually browse through multiple directories
to store and retrieve files. This solution
does not always scale well across the WAN.
File management is usually administered by the IT staff. Management responsibilities include file
directory organization and security. Files can quickly become outdated,
unorganized, and unsecured because of management constraints.
Microsoft SharePoint is a document management
and collaboration server that addresses some limitations of traditional file
servers. Employees can access SharePoint
with a web browser and website address.
SharePoint server eliminates WAN-related file enumeration problems
because web browsers do not use the SMB protocol. SharePoint provides additional benefits,
including meta-tags and delegation.
Meta-tags are document keywords that allow for robust search
capabilities. The ability to search with
keywords is more efficient than manually browsing. Delegation allows different departments to self-manage
their content. Each business unit can
assign staff permissions and make directory changes.
SMB3:
Microsoft released Windows Server 2012, which includes
SMB3, in September 2012. SMB3 was enhanced to
further improve network performance. SMB
Directory Leasing is a function of SMB3 that enables clients to cache
directory and file metadata. The local
directory caching reduces round-trip protocol traffic to the file server (Snover, 2012). SMB3 satisfies all file server requirements
for IDTC because file enumerations are faster and bandwidth requirements are reduced.
BranchCache V2:
Windows 2012 Server also introduces
an enhanced BranchCache. BranchCache V2
takes advantage of SMB3 improvements.
BranchCache V2 reduces CPU cycles to reduce server resource load. Reduced WAN traffic and storage requirements
are achieved because duplicate data is stored and downloaded only once per
branch office. Only small changes made
to a large file are delivered and cached.
The services also divide files into smaller units through hash
algorithms for further bandwidth savings.
Both SMB3 and BranchCache deliver and store data with encryption.
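The bandwidth savings from dividing files into hashed units can be sketched as follows. The example is a simplification under stated assumptions: it uses tiny fixed-size chunks and SHA-256 purely for illustration, whereas the chunking scheme of the actual service differs, and the file contents are hypothetical.

```python
import hashlib

CHUNK_BYTES = 4  # tiny fixed-size chunks, for illustration only


def chunk_hashes(data: bytes):
    """Hash each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + CHUNK_BYTES]).hexdigest()
        for i in range(0, len(data), CHUNK_BYTES)
    ]


def chunks_to_download(new_data: bytes, cached_hashes: set) -> int:
    """Count the chunks of new_data that are not already held in the branch cache."""
    return sum(1 for h in chunk_hashes(new_data) if h not in cached_hashes)


old = b"AAAABBBBCCCCDDDD"  # version already cached at the branch office
new = b"AAAABBBBXXXXDDDD"  # same file after a small edit
cached = set(chunk_hashes(old))

needed = chunks_to_download(new, cached)
print(f"{needed} of {len(chunk_hashes(new))} chunks cross the WAN")
```

Only the chunk that changed has an unfamiliar hash, so only that chunk is transferred; identical chunks shared by different files would be recognized in the same way.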
Chapter 5: Conclusion
A combination of latency and outdated SMB
protocol caused network disruption for the staff at IDTC. Research identified the problem with network
tests for latency, bandwidth, and protocol performance. After the problem was defined, existing
technology was examined to determine a resolution. WAFS, DFS, DFSR, and BranchCache were
considered potential candidates. The
file server was upgraded to Windows 2008 as preparation for BranchCache
installation. Before BranchCache was
installed on the server, it was discovered that file services over the WAN
experienced improved performance.
Additional research found the improved performance was a result of the
enhanced SMB2 protocol. Although the
network problem between IDTC and ORP was resolved, continued research identified
technology that could prevent potential problems and provide additional
benefits.
References:
Barreto, J. (2008, November 11). File Server
performance improvements with the SMB2 protocol in Windows Server 2008.
Retrieved October 2012, from TechNet:
http://blogs.technet.com/b/josebda/archive/2008/11/11/file-server-performance-improvements-with-the-smb2-protocol-in-windows-server-2008.aspx
Citrix. (2012, October). How Branch Repeater
Works. Retrieved October 2011, from Citrix: http://www.citrix.com/English/ps2/products/feature.asp?contentID=1686852
Davies, J. (2006, March). Policy-based QoS
Architecture in Windows Server 2008 and Windows Vista. Retrieved from
Microsoft TechNet: http://technet.microsoft.com/library/bb878009
Dorairajan, V. (2004, May 25). Enabling File
Sharing Over the WAN. Retrieved October 2012, from Electrical Engineer
Times:
http://www.eetimes.com/electronics-news/4144653/Enabling-File-Sharing-over-the-WAN
Jordan, S. (2011). ICT & SMB2. (Unpublished research from ICT-701).
Menomonie, WI: UW-Stout.
Microsoft. (2003, March 28). Remote Access
Technologies. Retrieved October 2012, from Microsoft Technet:
http://technet.microsoft.com/en-us/library/cc755399(v=ws.10).aspx
Microsoft. (2008, February). Branch Office
Infrastructure Solution Architecture Guide. Retrieved October 2012, from
Microsoft Branch Office Tech Center:
http://download.microsoft.com/download/4/2/e/42e8ee6e-5365-4e79-b3bf-b10fdac3170e/BOIS%20Architecture%20Guide.docx
Microsoft. (2008, August). Optimizing Applications
for Remote File Access Over WAN. Retrieved October 2012, from PDC
Microsoft Professional Developers Conference:
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CCgQFjAB&url=http%3A%2F%2Fdownload.microsoft.com%2Fdownload%2Ff%2F2%2F1%2Ff2146213-4ac0-4c50-b69a-12428ff0b077%2FOptimizing_Applications_for_Remote_File_Access_Over_WAN.pptx&ei=a8
Microsoft. (2009, January). BranchCache
Executive Overview. Retrieved October 2012, from Microsoft: http://www.microsoft.com/en-us/download/confirmation.aspx?id=4606
Microsoft. (2009, April 23). Plan for Bandwidth
Requirements. Retrieved October 2012, from Microsoft:
http://technet.microsoft.com/en-us/library/cc262952(office.12).aspx#section3
MSDN. (2012, October 16). DFSR Overview.
Retrieved October 2012, from Microsoft Developer Network:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb540025(v=vs.85).aspx
MSDN. (2012, September 7). Microsoft SMB Protocol
and CIFS Protocol Overview. Retrieved October 2012, from Microsoft
Developer Network:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365233(v=vs.85).aspx
Odem, W., & Knott, T. (2006). Networking
Basics. Indianapolis: Cisco Press.
Pyle, N. (2009, February 20). Understanding (the
Lack of) Distributed File Locking in DFSR. Retrieved October 2012, from
Ask the Directory Services Team:
http://blogs.technet.com/b/askds/archive/2009/02/20/understanding-the-lack-of-distributed-file-locking-in-dfsr.aspx
Seifert, R. (2000). The Switch Book. New
York: John Wiley & Sons, Inc.
Snover, J. (2012, April 19). SMB 2.2 is now SMB
3.0. Retrieved from Microsoft Windows Server Blog:
http://blogs.technet.com/b/windowsserver/archive/2012/04/19/smb-2-2-is-now-smb-3-0.aspx
Velasco, V. (2000, November 21). Introduction to
IP Spoofing. Retrieved October 2012, from SANS Institute:
http://www.sans.org/reading_room/whitepapers/threats/introduction-ip-spoofing_959