Fix DPM 2010 Slowness and Unresponsiveness Issues

*Update:  Pruneshadowcopy.ps1 is supposed to delete old snapshot volumes from the server, but it stops working after WMF 3.0 is installed.  Thousands of expired snapshot volumes are never removed from the server.  These stale volumes cause a slew of other problems:  failed Windows updates, ridiculous startup times, excessive registry size, slow logons, slow recoveries, thousands of orphaned devices, etc.

Use these steps to resolve:
  1. Check Pruneshadowcopy.ps1 for exceptions by running it with verbose output (from the folder that contains the script):
    $VerbosePreference = "Continue"; .\pruneshadowcopies.ps1
  2. Confirm WMF 3.0 status (PowerShell version 3.0 means WMF 3.0 is installed):
    $PSVersionTable.PSVersion
  3. Uninstall the WMF 3.0 updates:
    wusa /uninstall /quiet /kb:2506143
    wusa /uninstall /quiet /kb:2506146
  4. The uninstall may fail because of the excessive number of VSS volumes.  If necessary, increase the TrustedInstaller block time increment.
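Steps 2 and 3 can be combined into one check-then-uninstall script. The sketch below is written in Python (an assumption; the post itself runs PowerShell and wusa by hand). It also assumes `powershell.exe` and `wusa.exe` are on the PATH of an elevated Windows session. It uninstalls only when PowerShell reports major version 3, and it reuses the exact wusa switches from step 3. The helper names are hypothetical.

```python
import platform
import subprocess

# KB numbers of the WMF 3.0 packages, as listed in step 3.
WMF3_KBS = ("2506143", "2506146")


def powershell_major_version() -> int:
    """Ask powershell.exe for $PSVersionTable.PSVersion.Major (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "$PSVersionTable.PSVersion.Major"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def wmf3_present(major_version: int) -> bool:
    """WMF 3.0 ships PowerShell 3.0, so major version 3 indicates WMF 3.0."""
    return major_version == 3


def uninstall_commands(kbs=WMF3_KBS) -> list:
    """Build one wusa command line per KB, using the same switches as step 3."""
    return [["wusa", "/uninstall", "/quiet", f"/kb:{kb}"] for kb in kbs]


if __name__ == "__main__" and platform.system() == "Windows":
    if wmf3_present(powershell_major_version()):
        for cmd in uninstall_commands():
            # wusa exits non-zero when a KB is not installed; keep going regardless.
            subprocess.run(cmd, check=False)
```

Only one of the two KBs will normally be present on a given server, so a non-zero exit code from the other uninstall is expected.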

DPM Problem:  Over time, the DPM server retains tens of thousands of "expired" recovery point VSS volumes.  Excessive volumes cause these problems:
  • DPM 2010 is slow and unresponsive.
  • DPM has slow start and login times.  The server locks or freezes for up to an hour.
  • Windows updates fail.
  • SQL queries are excessively slow.  For example, it takes DPM an hour to display all restore points for a protected database.
  • DPM uses all available server memory.

Background:  The DPM server has limited memory (e.g., 16GB).  The server synchronizes a large number of SQL databases every 15 minutes.  The server has become unreliable.

DPM supports a limited number of protected resources, and over-provisioned DPM servers become unstable.  Recovery point volumes (i.e., incremental backups) are not removed by DPM's storage management itself; their removal depends on a cleanup script.

DPM runs a daily PowerShell script that removes expired recovery point volumes (incremental VSS backups).  This process fails when the server is over-provisioned: the script times out if it does not have sufficient available memory.  The result is thousands of stale recovery points, which cause additional problems.
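To see how quickly a failed prune job piles up volumes, here is a back-of-the-envelope sketch in Python. It follows the post's own model (one VSS shadow copy per recovery point, as in Table 1). It assumes that a working prune job removes everything older than the retention range. The function names and the example numbers (10 data sources, hourly recovery points, 14 days of retention) are hypothetical.

```python
# Minutes in one day.
MINUTES_PER_DAY = 24 * 60


def recovery_points_per_day(interval_minutes: int) -> int:
    """Recovery points one data source creates per day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


def shadow_copies_after(days: int, data_sources: int, interval_minutes: int,
                        retention_days: int, pruning_works: bool) -> int:
    """Approximate shadow copies on the server after `days` days.

    Assumes one VSS shadow copy per recovery point. A working prune job keeps
    only the last `retention_days` days; a broken one keeps everything.
    """
    days_kept = min(days, retention_days) if pruning_works else days
    return data_sources * recovery_points_per_day(interval_minutes) * days_kept


if __name__ == "__main__":
    for day in (14, 30, 90):
        ok = shadow_copies_after(day, 10, 60, 14, pruning_works=True)
        broken = shadow_copies_after(day, 10, 60, 14, pruning_works=False)
        print(f"day {day:>3}: prune working {ok:>6}, prune broken {broken:>6}")
```

With a working prune job the count levels off at 3,360. Once pruning silently stops, the count grows linearly with the age of the failure, reaching 21,600 after 90 days.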


Solution:  

  1. Discover and remove expired shadow/recovery point volumes.

       -Run pruneVSS.ps1 from the DPM Management Shell.
       -Run this script for every protected data source.

    N.B., this process creates thousands of orphaned volumes.
  2. Remove orphaned devices with the DevNodeClean tool.
    Options:

    (a)  Download cleanup.exe from my OneDrive.

    (b) Or, download and modify RMHidDev.bat.

    N.B., these tools remove orphaned devices, but the registry file size remains large.
    Figure 1.  Cleanup.exe to view and remove orphaned phantom volumes.
    Cleanup.exe commands:

    Assess/view phantom volumes:
    cleanup.exe ?
    Remove all phantom volumes:
    Cleanup.exe -r
  3. Delete and re-create protection groups to recover hundreds or thousands of available gigabytes (optional).

    -Ensure continuity before deleting recovery volumes:
         *Back up existing recovery points using a second DPM server.
         *Back up to tape.

    -DPM can automatically expand volumes.
    -DPM does not automatically shrink volumes.

    -Attempt to delete shadow volumes with the "Stop protection of member" wizard.

    -Run the "Stop protection of member" wizard a second time if the wizard crashes on the first attempt.
         *Do not remove the replica volume on the second attempt.
  4. Remove recovery volumes from Inactive protection group:

    -Delete replica volumes from the "Inactive protection for previously protected data" group.
         *Run Removeinactivedatasource.ps1 from the DPM console.
         *Choose the "ondisk" source.
  5. Shrink Volumes (optional).  In the first step we deleted thousands of phantom volumes.  This may amount to hundreds of gigabytes or terabytes.  Let's reclaim that storage.

    Let's consider how DPM storage management is flawed.  It creates and expands volumes well enough.  Microsoft did not, however, include an automated process for recovering unused space; this is a manual process.

    The preferred method for shrinking Windows 2008 volumes is with diskpart.  N.B., the Disk Management GUI works as well.

    C:\Users\Administrator>diskpart
    DISKPART> list volume
      Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
      ----------  ---  -----------  -----  ----------  -------  ---------  --------
      Volume 0     C                NTFS   Partition     64 GB  Healthy    System
      Volume 1     G   Extra        NTFS   Partition    100 MB  Healthy
    DISKPART> select volume 1
    Volume 1 is the selected volume.
    DISKPART> shrink querymax
    The maximum number of reclaimable bytes is:   30 MB
    DISKPART> shrink desired=25 minimum=10
    DiskPart successfully shrunk the volume by:   25 MB
    DISKPART> list volume
      Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
      ----------  ---  -----------  -----  ----------  -------  ---------  --------
      Volume 0     C                NTFS   Partition     64 GB  Healthy    System
    * Volume 1     G   Extra        NTFS   Partition     75 MB  Healthy


  6. Shrink the registry with Chkreg

    The system registry stores volume information, including those pesky phantom drives.  Device and volume data live in the SYSTEM hive, which is stored in a single file:

    C:\Windows\System32\config\SYSTEM

    Now consider how the registry file grows larger with every additional phantom drive.  Large registry files are bad for the server: they cause slow boot times (e.g., hours) and may cause updates to fail.

    Worse yet, the registry file generally grows but does not shrink.  It's similar to how DPM handles storage volumes.  Therefore, the registry remains indefinitely bloated.

    Thankfully, Microsoft provides a chkreg tool to manually shrink the registry file.  This procedure requires a special version of Chkreg, available only by contacting Microsoft support.  The tool is also available from my OneDrive.  This tool must be run while the server is offline, so run it from a separate Windows boot disk (e.g., Win2Go).

    Instructions:

    a.  Access the DPM system drive while the machine is off (e.g., boot with Win2Go).
    b.  Back up the registry file by copying it to a temp directory (the drive letter may differ when booted from another disk):
        c:\windows\System32\config\system
    c.  Shrink the registry file:
        Chkreg /F SYSTEM /R
        Chkreg /F SYSTEM /C
    d.  Boot up DPM!

    That's it.  The process is a bit tedious, but DPM runs like a champ afterwards.
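The diskpart session in step 5 can also be run unattended with `diskpart /s <script file>`. This Python sketch writes the same commands to a temporary script and runs it. It assumes an elevated prompt on the DPM server. The volume number and sizes (in MB, as diskpart's `desired` and `minimum` parameters expect) are placeholders; take the real values from your own `list volume` and `shrink querymax` output. The function names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def diskpart_shrink_script(volume: int, desired_mb: int, minimum_mb: int) -> str:
    """Return diskpart commands that shrink one volume, mirroring step 5."""
    # Sanity check: the minimum reduction should not exceed the desired one.
    if minimum_mb > desired_mb:
        raise ValueError("minimum_mb must not exceed desired_mb")
    commands = [
        f"select volume {volume}",
        "shrink querymax",
        f"shrink desired={desired_mb} minimum={minimum_mb}",
        "list volume",
    ]
    return "\n".join(commands) + "\n"


def run_diskpart(script: str) -> str:
    """Run the script with `diskpart /s` (Windows, elevated) and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "shrink.txt"
        path.write_text(script, encoding="ascii")
        result = subprocess.run(["diskpart", "/s", str(path)],
                                capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Placeholder values matching the example session above.
    print(diskpart_shrink_script(volume=1, desired_mb=25, minimum_mb=10))
```

Print and review the generated script before calling `run_diskpart`, because a wrong volume number shrinks the wrong volume.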

Preventative Measures:

First and foremost, uninstall WMF 3.0 from Windows Server 2008 R2!  This prevents the problem from recurring.  It does not fix existing problems related to excessive shadow volumes.

Alternatively, consider installing DPM 2010 on a fresh Windows Server 2012 R2 installation.  2012 R2 has significant storage and network improvements that prevent these issues from occurring in the first place.

Also, pay close attention to DPM server limitations.  Use as much RAM as possible.  I found the server works better after upgrading it to 64GB.      

 DPM 2010 Server Limit Totals:
  1. Maximum 250 storage groups
  2. Maximum 10 TB for 32-bit DPM servers
  3. Maximum 45 TB for 64-bit DPM servers
  4. Maximum 256 data sources per DPM server (64-bit) where each data source needs two volumes
  5. Maximum 128 data sources per DPM server (32-bit)
  6. Maximum 8000 VSS shadow copies
  7. VSS Addressing limits: Add a DPM server for each 5 TB (32-bit) or 22 TB (64-bit)
  8. Maximum 75 protected servers and 150 protected workstations per server
  9. Data sources in another domain or forest that is untrusted: add a new DPM server
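A small capacity check makes it easy to test a server against the limits above. This sketch hard-codes the 64-bit figures from the list: 256 data sources, 75 protected servers, 150 workstations, 45 TB and 8000 shadow copies. The `ServerPlan` fields and the function name are hypothetical inputs you would fill from your own inventory.

```python
from dataclasses import dataclass

# DPM 2010 per-server limits for a 64-bit server, copied from the list above.
LIMITS_64BIT = {
    "data_sources": 256,
    "protected_servers": 75,
    "protected_workstations": 150,
    "storage_tb": 45,
    "shadow_copies": 8000,
}


@dataclass
class ServerPlan:
    """Hypothetical inventory of one DPM server; field names match LIMITS_64BIT keys."""
    data_sources: int
    protected_servers: int
    protected_workstations: int
    storage_tb: float
    shadow_copies: int

    @property
    def volumes(self) -> int:
        # Each data source needs two volumes (replica and recovery point volume).
        return 2 * self.data_sources


def over_limits(plan: ServerPlan, limits: dict = LIMITS_64BIT) -> list:
    """Return the name of every limit the plan exceeds; an empty list means it fits."""
    return [name for name, cap in limits.items() if getattr(plan, name) > cap]
```

For example, `over_limits(ServerPlan(300, 40, 0, 30.0, 12000))` flags `data_sources` and `shadow_copies`. For a 32-bit server, pass a dictionary with the 128-data-source and 10 TB figures instead.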

 DPM 2010 Retention Limits per Protected Resource:
  1. Maximum 64 file recovery points per protected resource.
  2. Maximum 448 application (i.e., SQL) recovery points per protected application.
  3. Maximum 448 days allowed of disk based retention storage.
  4. Minimum 15-minute interval between recovery points.
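
Consider these limitations when planning the backup schedule.  Recovery point frequency is inversely proportional to the retention range: the more often recovery points are created, the shorter the history that fits within the limit (Table 1) (Table 2):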

Table 1.
DPM Protected Application Frequency and Retention Relationship

    Frequency     Max No. VSS   VSS per Day    Max Retention
    15 minutes    448           96             4.6 Days
    30 minutes    448           48             9.3 Days
    45 minutes    448           32             14 Days
    60 minutes    448           24             18.6 Days
    12 hours      448           2              224 Days
    24 hours      448           1              448 Days

Note:  DPM can store up to 448 application recovery points, for a maximum of 448 days.

Table 2.
DPM Protected File Resource Frequency and Retention Relationship

    Frequency     Max No. VSS   VSS per Day    Max Retention
    15 minutes    64            96             0.6 Days
    30 minutes    64            48             1.3 Days
    45 minutes    64            32             2 Days
    60 minutes    64            24             2.6 Days
    12 hours      64            2              32 Days
    24 hours      64            1              64 Days
    1 week        64            1 (per week)   64 Weeks

Note:  DPM can store up to 64 recovery points per file resource, for a maximum of 448 days (64 weekly recovery points).
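Both tables follow from one relationship: maximum retention equals the recovery point limit times the interval, divided by the length of a day, capped at the 448-day disk retention limit. The Python sketch below reproduces the tables. It can also work backward from a desired retention to the shortest allowed interval. The function and constant names are hypothetical.

```python
# Minutes in one day; all intervals below are expressed in minutes.
MINUTES_PER_DAY = 24 * 60

# Limits from the retention list above.
APPLICATION_LIMIT = 448
FILE_LIMIT = 64
MAX_RETENTION_DAYS = 448


def max_retention_days(max_points: int, interval_minutes: int) -> float:
    """Days of history that max_points recovery points cover, capped at 448 days."""
    raw_days = max_points * interval_minutes / MINUTES_PER_DAY
    return min(raw_days, MAX_RETENTION_DAYS)


def min_interval_minutes(max_points: int, retention_days: float) -> float:
    """Shortest interval that still keeps retention_days of history within max_points."""
    return retention_days * MINUTES_PER_DAY / max_points


if __name__ == "__main__":
    intervals = [15, 30, 45, 60, 12 * 60, 24 * 60, 7 * 24 * 60]
    for label, limit in (("application", APPLICATION_LIMIT), ("file", FILE_LIMIT)):
        for minutes in intervals:
            days = max_retention_days(limit, minutes)
            print(f"{label:<11} every {minutes:>5} min -> {days:7.2f} days")
```

The tables round down: for example, 448 points every 15 minutes cover about 4.67 days, listed as 4.6. Working backward, 14 days of application retention needs an interval of at least 45 minutes, which matches Table 1.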



