Transfer SQL TempDB

Task:  

How to transfer the TempDB.

Problem:  

MS SQL ran slowly on a 6Gb/s SAS hard disk drive (HDD).  I moved the user database (DB) to PCIe x4 (i.e., 31.52 Gb/s) MLC solid-state (SSD) storage, and it still ran slowly!

Solution:  

Does the TempDB run on the old HDD or the new MLC SSD?  The TempDB can generate serious disk I/O, so transfer it to the high-speed SSD as well.  N.B., this entire process gave new life to my old DPM2010 server.

  Step 1:  TSQL script to identify TempDB location:


Use master
GO

SELECT 
name AS [LogicalName]
,physical_name AS [Location]
,state_desc AS [Status]
FROM sys.master_files
WHERE database_id = DB_ID(N'tempdb');
GO
   TSQL output:

tempdev C:\Program Files\Microsoft DPM\SQL\MSSQL10.MSDPM2010\MSSQL\DATA\tempdb.mdf
templog C:\Program Files\Microsoft DPM\SQL\MSSQL10.MSDPM2010\MSSQL\DATA\templog.ldf


  Step 2:  TSQL script to move TempDB to new location.

USE master;
GO

ALTER DATABASE tempdb 
MODIFY FILE (NAME = tempdev, FILENAME = 'X:\SQL\Data\tempdb.mdf');
GO

ALTER DATABASE tempdb 
MODIFY FILE (NAME = templog, FILENAME = 'X:\SQL\Data\templog.ldf');
GO

   TSQL output:

The file "tempdev" has been modified in the system catalog. The new path will be used the next time the database is started.
The file "templog" has been modified in the system catalog. The new path will be used the next time the database is started.

  Step 3:  Restart SQL services

  Step 4:  Verify Change - See Step 1.
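
The ALTER DATABASE statements in Step 2 follow a fixed pattern, so they can be generated for any target folder.  The following is a minimal Python sketch, not part of the original procedure; the function name tempdb_move_statements is hypothetical, and the logical names and X:\SQL\Data path are simply the ones used above.  It only builds the T-SQL text; it does not connect to SQL Server.

```python
import ntpath  # Windows path rules, regardless of the OS running this sketch


def tempdb_move_statements(files, target_dir):
    """Return one ALTER DATABASE statement per (logical_name, file_name) pair."""
    statements = []
    for logical_name, file_name in files:
        new_path = ntpath.join(target_dir, file_name)
        statements.append(
            "ALTER DATABASE tempdb MODIFY FILE "
            f"(NAME = {logical_name}, FILENAME = '{new_path}');"
        )
    return statements


# Logical names and file names taken from the Step 1 output above.
files = [("tempdev", "tempdb.mdf"), ("templog", "templog.ldf")]
for statement in tempdb_move_statements(files, r"X:\SQL\Data"):
    print(statement)
```

Paste the printed statements into a query window and run them as in Step 2.  The new paths still only take effect after the SQL services restart.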

Conclusion:

SQL Server runs quickly with the new storage.  Long disk queue lengths, on all disks, are a thing of the past.  Individual results may vary based on load (duh).

That's It!

References:

http://dbadiaries.com/tempdb-best-practices
http://dbadiaries.com/sql-server-tempdb-whats-it-for
http://www.mytechmantra.com/LearnSQLServer/How-to-Move-TempDB-to-New-Drive-in-SQL-Server/

How to Add LTO Tape Drives to DPM

Problem:

Data Protection Manager (DPM) does not work with new Ultrium-5 LTO-5 tape drive. DPM Library only provides options to clean drive or disable drive. DPM does not correctly identify and detect the new tape drive.
Hardware: HP StoreEver LTO-5 Ultrium-5 tape drive. N.B., this solution works with just about any LTO drive.

 Background: 

 Old LTO-3 drive was running out of space. Purchased HP LTO-5 drive to expand available storage from 800GB to 1.5TB:
  • Server has compatible SAS HBA and cable.
  • Device Manager recognizes the new drive after a simple hardware swap.
  • DPM lists the new hardware under its Libraries.
  • Protection groups changed and reflect new hardware. 

 Solution:

 Update the DPMLA.xml file to reflect the new hardware:
C:\Program Files\Microsoft DPM\DPM\Config\DPMLA.xml
XML Before:
<drivelibraryassociation>
<drive drivebayindex="0" scsibus="0" scsilun="0" scsiport="2" scsitargetid="5" serialnumber="HUExxxxxxx">
<library scsibus="-1" scsilun="-1" scsiport="-1" scsitargetid="-1" serialnumber="">
</library></drive></drivelibraryassociation>
XML After:
<drivelibraryassociation>
<drive drivebayindex="0" scsibus="0" scsilun="0" scsiport="2" scsitargetid="5" serialnumber="HUExxxxxxx">
<library scsibus="-1" scsilun="-1" scsiport="-1" scsitargetid="-1" serialnumber="">
</library></drive></drivelibraryassociation>

Tips:

  • Serial is located on the physical tape drive. 
  • Use the HPE Library and Tape Tool to identify SCSI assignments. 
  • Alternately, identify SCSI assignments in the Device Manager → Tape drive → Properties → Details → Location information property.
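
Before editing DPMLA.xml, it can help to print what the file currently records so it can be compared against the values from the tips above.  This is a rough Python sketch that assumes the element and attribute names appear exactly as in the XML shown here (the real file may use different casing); the list_drives helper is hypothetical and the sample text is the snippet from this article, not a live file.

```python
import xml.etree.ElementTree as ET

# Sample copied from the XML shown above; replace with the contents of DPMLA.xml.
SAMPLE = """<drivelibraryassociation>
<drive drivebayindex="0" scsibus="0" scsilun="0" scsiport="2" scsitargetid="5" serialnumber="HUExxxxxxx">
<library scsibus="-1" scsilun="-1" scsiport="-1" scsitargetid="-1" serialnumber="">
</library></drive></drivelibraryassociation>"""


def list_drives(xml_text):
    """Return the serial number and SCSI address recorded for each drive element."""
    root = ET.fromstring(xml_text)
    drives = []
    for drive in root.iter("drive"):
        drives.append({
            "serial": drive.get("serialnumber"),
            "port": int(drive.get("scsiport")),
            "bus": int(drive.get("scsibus")),
            "target": int(drive.get("scsitargetid")),
            "lun": int(drive.get("scsilun")),
        })
    return drives


for entry in list_drives(SAMPLE):
    print(entry)
```

If the printed serial or SCSI address does not match the physical drive, those are the attributes to correct in the file.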

Final Steps:

  • Save changes to DPMLA.xml and restart DPM services. 
  •  Rescan the DPM tape library. 
  • Inventory tape library.
That's It!

2012 R2 Bare Metal Backups Fail from the 2010 DPM Server

Problem:  Server 2012 bare metal recovery (BMR) backups fail from the DPM server.


Errors:
  1. DPM cannot create a backup because Windows Server Backup (WSB) on the protected computer encountered an error (WSB Event ID: 517, WSB Error Code:  0x8078015B). (ID 30229 Details: Internal error code: 0x809909FB)
  2. The backup operation that started at '2015-07-28T15:32:54.838780500Z' has failed with following error code '0x8078015B' (Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.). Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.
  3. C:\Windows\Logs\WindowsServerBackup\Backup.log:  Backup of volume C: has failed. Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.

Suggestions:

  1. Increase the allocated replica volume size by using the Modify Disk Allocation wizard in the DPM GUI.  Run a full consistency check.
  2. WSB bare metal recovery (BMR) backups use Volume Shadow Copy Service (VSS) snapshots on the System Recovery and EFI partitions.  Ensure these partitions have at least 320MB of free space for VSS snapshots.
  3. Do not run WSB on the protected 2012 server; only manage backups from DPM.  Individual backups run from WSB modify the backup catalog and VSS volume associations, and DPM backups begin to fail.  Steps to correct:

      -Use Vssadmin and/or Diskshadow to delete all local snapshots on the protected 2012 member.
      -Delete the WSB catalog from the protected 2012 server: wbadmin delete catalog
      -Perform a consistency check from DPM.
  4. Manually increase the size of the replica volume.  DPM does not calculate an appropriate replica size for bare metal backups.  Microsoft recommends:

           Data Source Size x 3 / 2

      However, I found this formula was insufficient.  Rather, increase storage liberally.  If Microsoft's formula calls for 50GB, try using 100GB!  Consistency checks will work after the resource has sufficient storage in DPM.
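
The sizing arithmetic above is easy to script.  Here is a small Python sketch of it; the safety_factor parameter is my own illustration of "increase storage liberally" (2.0 reproduces the 50GB to 100GB example), not a Microsoft recommendation, and the function name is hypothetical.

```python
def replica_size_gb(data_source_gb, safety_factor=1.0):
    """Replica volume size: Data Source Size x 3 / 2, times an optional margin."""
    return data_source_gb * 3 / 2 * safety_factor


source_gb = 100  # hypothetical size of the protected data source
print("Formula only:     ", replica_size_gb(source_gb), "GB")
print("Doubled allowance:", replica_size_gb(source_gb, safety_factor=2.0), "GB")
```

Treat the doubled figure as a starting point; the replica can be grown again from the DPM GUI if the consistency check still fails.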
    References:
    https://technet.microsoft.com/en-us/library/Bb795684.aspx



    Windows Server Backup Fails

    Problem:  

    Windows Server Backup fails on a Windows 2012 R2 Hyper-V server.  The system fails to create a shadow copy.

    Specific Windows Server Backup Errors:

    -Backup not started.
    -The backup operation of the volume did not start.
    -Location is Invalid

    Event Log Errors:

    -The backup operation attempted at xxxx has failed to start, error code '0x8078000C' ('The specified backup storage location is invalid.').

    -The backup operation that started at xxxx  has failed because the Volume Shadow Copy Service operation to create a shadow copy of the volumes being backed up failed with following error code '0x80780119'.

    Background:  

    The EFI-based server uses a GUID partition table (GPT) disk layout.  A GPT drive includes the following partitions:


    • EFI System Partition:
      -The server boots to the EFI partition.
      -The EFI partition cannot store user data.
      -The typical size for a 512-byte sector drive is 100MB.
       
    • Microsoft Reserved Partition (MSR):
      -Windows uses the MSR (i.e., recovery partition) to store security and recovery tools (e.g., Windows Recovery Environment (RE)).
      -The MSR cannot store user data.
      -The typical size for the MSR partition is 300MB.
    Solution:  System partitions must have enough free space to create shadow copies of the partition -including both the EFI and MSR volumes:

    • Partitions smaller than 500 MB require at least 50 MB of free space.
    • Partitions larger than 500 MB require at least 320 MB of free space.
    • Partitions larger than 1 GB require at least 1 GB of free space.
    Therefore, Windows backups fail when there is not enough free space on either the EFI or MSR volumes.
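
    The free-space rules above can be checked with a few lines of Python.  This is only a sketch: the function names are hypothetical, 1 GB is taken as 1024 MB, and a partition of exactly 500 MB is assumed to fall in the smaller tier because the rules do not say.  The free-space figures in the example are made up; the partition sizes are the typical ones listed in the background.

```python
def min_free_space_mb(partition_mb):
    """Free space (MB) needed to create a shadow copy, per the rules above."""
    if partition_mb > 1024:   # larger than 1 GB
        return 1024
    if partition_mb > 500:    # larger than 500 MB
        return 320
    return 50                 # 500 MB or smaller (boundary is an assumption)


def can_snapshot(partition_mb, free_mb):
    """True when the partition has enough free space for a shadow copy."""
    return free_mb >= min_free_space_mb(partition_mb)


# Typical partition sizes from the background section; free space is hypothetical.
for name, size_mb, free_mb in (("EFI", 100, 70), ("MSR", 300, 20)):
    status = "OK" if can_snapshot(size_mb, free_mb) else "too little free space"
    print(f"{name}: {size_mb} MB partition, {free_mb} MB free -> {status}")
```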

    Recovery Steps:

    1. Free-up space by shrinking one of the primary partitions.
    2. Use unallocated disk space to create a new volume for VSS snapshots. For example:

      Volume Name:  VSS Backup
      Format:  NTFS
      Size:  512MB
      Drive Letter:  None
    3. Identify volume GUIDs:

      d:\vssadmin.exe list volumes
    4. Add additional VSS storage to the MSR.  For example:

          vssadmin.exe add shadowstorage /for=\\?\Volume{247c09ad-xxx-xxx-xxx-xxxxxxxx}\ /On=\\?\Volume{815230ad-xxx-xxx-xxx-xxxxxxx}\ /MaxSize=Unbounded 

      Note: VSSAdmin may need to be run from a non-system volume (e.g., D:\).
    5. Use Diskshadow for related VSS errors (optional):
      d:\Diskshadow 
      DISKSHADOW>delete shadows all        

    It's also worth mentioning that most vssadmin functions can be completed with the Windows GUI:


    C: → Properties → Shadow Copies → Advanced




    That's it!

    References:

    https://technet.microsoft.com/en-us/library/dd799232%28v=ws.10%29.aspx?f=255&MSPPError=-2147217396
    http://answers.microsoft.com/en-us/windows/forum/windows_8-performance/backup-error-0x80780119/78adc0e0-c793-4b6c-95db-af625f3c9fe7



    Manually Install DPM Agent

    For Reference Purposes:  How to manually add a DPM client agent.

    1. Connect to DPM share:

      \\dpm1\agent
    2. Copy agent to local directory.
    3. Run Administrator CMD.
    4. Install agent:

      DPMAgentinstaller.exe

      OR

      DPMAgentInstaller_amd64.exe d
    5. Configure agent with DPM server and firewall:

      "c:\Program Files\Microsoft Data Protection Manager\DPM\bin\SetDpmServer.exe" -dpmServerName dpm1.contoso.com
    6. Attach agent from DPM server (optional):

      Attach-ProductionServer.ps1 dpm1.contoso.com client1.contoso.com DPMAdmin Password contoso.com
    7. Windows 8 clients (optional) require a manual registry update:
       HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent\2.0

      Create DWORD named "ForceFixup" and set its value to "1".

    References:  https://technet.microsoft.com/en-us/library/Bb870935.aspx

    Fix DPM 2010 Slow and Unresponsiveness Issues

    *Update:  Pruneshadowcopy.ps1 is supposed to delete old snapshot volumes from the server -it stops working after WMF 3.0 is installed.  Thousands of expired snapshot volumes are never removed from the server.  These stale volumes cause a slew of other problems:  Windows updates fail, ridiculous startup times, excessive registry size, slow logons, slow recoveries, thousands of orphaned devices, etc....

    Use these steps to resolve:
    1. Check Pruneshadowcopy.ps1 for exceptions:
      $VerbosePreference="Continue"
      pruneshadowcopies.ps1
    2. Confirm WMF 3.0 status:
      $PSVersionTable.PSVersion
    3. Uninstall the WMF 3.0 update:
      wusa /uninstall /quiet /kb:2506143
      wusa /uninstall /quiet /kb:2506146
    4. The uninstall may fail due to problems related to excessive VSS volumes.  Increase the trusted installer block time increment if necessary.
    DPM Problem:  Over time, the DPM server retains tens of thousands of "expired" recovery point VSS volumes.  Excessive volumes cause problems:
    • DPM 2010 is slow and unresponsive.
    • DPM has slow start and login times.  The server locks or freezes for up-to an hour.  
    • Windows updates fail.
    • SQL queries are excessively slow.  For example, it takes DPM an hour to display all restore points for a protected database.
    • DPM uses all available server memory.

    Background:  The DPM server has limited memory (e.g., 16GB).  The server synchronizes a large number of SQL databases every 15 minutes.  The server has become unreliable.

    DPM supports a limited number of protected resources.  Over-provisioned DPM servers become unstable.  Recovery point volumes (i.e., incremental backups) are not automatically removed.

    DPM runs a daily PowerShell script that removes expired recovery point volumes (incremental VSS backups).  This process fails when the server is over-provisioned.  The script times out if it does not have sufficient available memory.  This situation results in thousands of recovery points that cause additional problems.


    Solution:  

    1. Discover and remove expired shadow/recovery point volumes.

         -Run pruneVSS.ps1 from the DPM Management Shell.
         -Run this script for every protected data source.

      N.B., This process creates thousands of orphaned volumes.
    2. Remove orphaned devices with DevNodeClean tool.
      Options:

      (a)  Download cleanup.exe from my OneDrive.

      (b) Or, download and modify RMHidDev.bat.

      N.B., these tools remove orphaned devices but the registry file size remains large.
      Figure 1.  Cleanup.exe to view and remove orphaned phantom volumes.
      Cleanup.exe commands:

      Assess /view phantom volumes:
      cleanup.exe ?
      Remove all phantom volumes:
      Cleanup.exe -r
    3. Delete and re-create protection groups to recover hundreds or thousands of available gigabytes (optional).

      -Ensure continuity before deleting recovery volumes.
           *Backup existing recovery points using a second DPM server.
           *Backup to tape.

      -DPM can automatically expand volumes.
      -DPM does not automatically shrink volumes.

      -Attempt to delete shadow volumes with the "Stop protection of member" wizard.

      -Run the "Stop protection of member" wizard a second time if the wizard crashes on the first attempt.
           *Do not remove the replica volume on the second attempt
    4. Remove recovery volumes from Inactive protection group:

      -Delete replica volumes from the "Inactive protection for previously protected data group".  
           *Run Removeinactivedatasource.ps1 from the DPM console.
           *Choose the "ondisk" source.
    5. Shrink Volumes (optional).  In the first step we deleted thousands of phantom volumes.  This may amount to hundreds of gigabytes or even terabytes.  Let's reclaim that storage.

      Let's consider how DPM storage management is flawed. It creates and expands volumes well enough.  Microsoft did not, however, include an automated process for recovering unused space -this is a manual process.

      The preferable method for shrinking Windows 2008 volumes is with diskpart.  N.B., Disk manager GUI works as well.

      C:\Users\Administrator>diskpart
      DISKPART> list volume
        Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
        ----------  ---  -----------  -----  ----------  -------  ---------  --------
        Volume 0     C                NTFS   Partition     64 GB  Healthy    System
      * Volume 1     G   Extra        NTFS   Partition    100 MB  Healthy
      DISKPART> select volume 1
      Volume 1 is the selected volume.
      DISKPART> shrink querymax
      The maximum number of reclaimable bytes is: 30GB
      DISKPART> shrink desired=25 minimum=10
      DiskPart successfully shrunk the volume by:   25 MB
      DISKPART> list volume
        Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
        ----------  ---  -----------  -----  ----------  -------  ---------  --------
        Volume 0     C                NTFS   Partition     64 GB  Healthy    System
      * Volume 1     G   Extra        NTFS   Partition     75 MB  Healthy
      


    6. Shrink the registry with Chkreg

      The system registry stores volume information -including those pesky phantom drives.  All the data in the system registry is stored inside a single file:

      C:\Windows\System32\config\SYSTEM

      Now consider how the registry file grows larger with every additional phantom drive.  Large registry files are bad for the server.  It causes slow boot-times (e.g., hours) and may cause updates to fail.

      Worse yet, the registry file generally grows but does not shrink.  It's similar to how DPM handles storage volumes.  Therefore, the registry remains indefinitely bloated.

      Thankfully, Microsoft provides a chkreg tool to manually shrink the registry file.  This procedure requires a special version of Chkreg -only available by contacting Microsoft support.  The tool is also available from my OneDrive.   Also, this tool must be run while the server is offline.  Run it from a separate Windows boot disk (e.g., Win2Go).

      Instructions:

      a.  Access DPM system drive while machine is off (e.g., boot with Win2Go).
      b.  Copy registry file to temp directory (i.e., backup):
      c:\windows\System32\config\system
      c.  Shrink registry file:
      #Chkreg /F SYSTEM /R
      #Chkreg /F SYSTEM /C
      d.  Boot up DPM!

      That's It.  The process is a bit tedious but DPM runs like a champ afterwards.

    Preventative Measures:

    First and foremost, uninstall WMF 3.0 from Windows 2008 R2!  This prevents the problem from recurring.  It does not fix existing problems related to excessive shadow volumes.

    Alternately, consider installing DPM 2010 on a fresh Windows 2012 R2 installation.  2012 R2 has significant storage and network improvements that prevent these issues from occurring in the first place.

    Also, pay close attention to DPM server limitations.  Use as much RAM as possible.  I found the server works better after upgrading it to 64GB.      

     DPM 2010 Server Limit Totals:
    1. Maximum 250 storage groups
    2. Maximum 10 TB for 32-bit DPM servers
    3. Maximum 45 TB for 64-bit DPM servers
    4. Maximum 256 data sources per DPM server (64-bit) where each data source needs two volumes
    5. Maximum 128 data sources per DPM server (32-bit)
    6. Maximum 8000 VSS shadow copies
    7. VSS Addressing limits: Add a DPM server for each 5 TB (32-bit) or 22 TB (64-bit)
    8. Maximum 75 protected servers and 150 protected workstations per server
    9. Data sources in another domain / forest that is untrusted… Add a new DPM server

     DPM 2010 Retention Limits per Protected Resource:
    1. Maximum 64 file recovery points per protected resource.
    2. Maximum 448 application (i.e., SQL) recovery points per protected application.
    3. Maximum 448 days allowed of disk based retention storage.
    4. Minimum 15 minute increments between 
    Consider these limitations when planning the backup schedule.  The maximum retention range is inversely proportional to the recovery point frequency (Table 1) (Table 2):

    Table 1.
    DPM Protected Application Frequency and Retention Relationship

    Frequency     Max No. VSS   VSS per Day   Max Retention
    15 minutes    448           96            4.6 Days
    30 minutes    448           48            9.3 Days
    45 minutes    448           32            14 Days
    60 minutes    448           24            18.6 Days
    12 hours      448           2             224 Days
    24 hours      448           1             448 Days

    Note:  DPM can store up-to 448 application recovery points, for a maximum of 448 days.

    Table 2.
    DPM Protected File Resource Frequency and Retention Relationship

    Frequency     Max No. VSS   VSS per Day    Max Retention
    15 minutes    64            96             0.6 Days
    30 minutes    64            48             1.3 Days
    45 minutes    64            32             2 Days
    60 minutes    64            24             2.6 Days
    12 hours      64            2              32 Days
    24 hours      64            1              64 Days
    1 Week        64            1 (per week)   64 Weeks

    Note:  DPM can store up-to 64 recovery points per file resource, for a maximum of 448 days.
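
    The figures in Table 1 and Table 2 come from one division: the recovery point limit divided by the number of recovery points created per day.  A short Python sketch of that arithmetic follows, assuming the 448 and 64 point limits and the 448-day ceiling quoted above; the function name is hypothetical.  Note the tables truncate to one decimal while this sketch rounds, so 4.6 prints as 4.7.

```python
MAX_DISK_RETENTION_DAYS = 448  # disk-based retention ceiling quoted above


def max_retention_days(max_recovery_points, interval_minutes):
    """Days of retention before the recovery point limit is reached."""
    points_per_day = (24 * 60) / interval_minutes
    return min(max_recovery_points / points_per_day, MAX_DISK_RETENTION_DAYS)


# 448 points for application sources, 64 points for file sources.
for label, limit in (("application", 448), ("file", 64)):
    for minutes in (15, 30, 45, 60, 12 * 60, 24 * 60):
        days = max_retention_days(limit, minutes)
        print(f"{label:<12}{minutes:>5} min  {days:7.1f} days")
```

    Run it with a planned synchronization interval to see whether the retention goal fits under the recovery point limit before creating the protection group.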




    Windows Updates Fail on DPM 2010 Server



    Problem:  Windows updates fail on DPM 2010 servers.  Additional symptoms include slow startup and account sign-ins.  Updates fail because of excessive VSS volume information in the registry.  Log on times are slow because it takes a long time for Windows to enumerate thousands of orphaned volumes listed in its registry.

    The slow logon process interferes with updates and software installations.  During the installation, Windows enumerates driver information from the registry.  Installations fail when the enumeration takes longer than 15 minutes.


    Work-Around Solution:  Change the Trusted Installer block time increment from 15 minutes, to 3 hours.
    1. Edit the Registry: regedit.exe
    2. Reg Path:  HKLM\System\CurrentControlSet\Services\TrustedInstaller
    3. Change permissions on the TrustedInstaller key to Full Control for Users.
    4. Edit BlockTimeIncrement. By default this is set to 384 hexadecimal (900 seconds, or 15 minutes); change it to 2a30 hexadecimal (10,800 seconds, or 3 hours).
    5. Reboot and install updates.
    That's it.  Installations may seem unusually slow, but at least they will complete.
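
    The BlockTimeIncrement values are hexadecimal, which makes them easy to mistype.  A quick Python check of the conversion, assuming the value is a count of seconds (which matches 384 hex being 15 minutes); the helper names are hypothetical:

```python
def block_time_minutes(hex_value):
    """Convert a hexadecimal BlockTimeIncrement value (seconds) to minutes."""
    return int(hex_value, 16) / 60


def block_time_hex(minutes):
    """Convert a timeout in minutes to the hexadecimal value for the registry."""
    return format(int(minutes * 60), "x")


print(block_time_minutes("384"))   # default value
print(block_time_minutes("2a30"))  # value used in the work-around
print(block_time_hex(180))         # hex for a 3 hour timeout
```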

    N.B., This solution is a short-term "workaround".  Read this article to fully resolve these DPM issues.

    Install DPM 2010 on a Windows 2012 Server

    Summary:  This solution is a gem for the unlucky folks still running DPM 2010.  Windows 2012 R2 innovations resolve many performance problems associated with DPM 2010:


    • Improved Networking 
    • Improved Storage 
    • Improved iSCSI
    • Fixes VSS Orphan Issues
    • Fixes Volume Enumeration Issues
    • Fixes DPM Startup Hangs


    Problem:  DPM 2010 is not supported on Windows Server 2012.  DPM client agents get DCOM errors in this unsupported configuration.

    Solution:


    1. Install Windows 2012R2
    2. Install DPM 2010
    3. Unjoin Domain.  Ensure Computer Object is not in AD.
    4. Rejoin DPM server to domain.
    5. Add DPM server to the Distributed COM Users AD group.
    6. Manually add DPM agents.


    References:

    http://serverfault.com/questions/563088/dpm2010-windows-2012-dcom-errors-communicating-with-all-agents


    How to Resolve Windows Update Errors

    How to Repair Windows Updates
    This article explains how to fix installation errors for Windows Update installations.

    Problem:  Windows updates fail to install on a Windows 2008 R2 System Center Data Protection Manager (DPM) server.

    Errors:  Specific errors may include, "Installation Failure: Windows failed to install the following update with error 0x80070643".

    Solution:  Delete the Windows Update cache and remove all superseded service pack backup components to resolve the issue.  N.B., This situation is not specific to DPM; and can help with other Windows environments, including Windows 8.1 & Windows 2012 R2.

    1. Stop the Windows Automatic Update Service from the command line:

      net stop wuauserv
    2. Go to the Windows directory from the command line:

      cd\windows
    3. Purge the update cache from the command line:

      rd /s SoftwareDistribution
    4. Start the Windows Automatic Update Service from the command line:

      net start wuauserv
    5. Remove superseded cumulative Service Pack backup components:

      Dism.exe /online /Cleanup-Image /SPSuperseded

    That's it.
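
    For servers that need this reset more than once, the same commands can be kept in a small script.  The following Python sketch is one way to do it, not part of the original procedure: the function names are hypothetical, the C:\Windows path is an assumption, and /q is added to rd so it does not stop to prompt.  It defaults to a dry run that only prints the commands, and it restarts the update service after the cache is purged.

```python
import subprocess


def update_reset_commands(windows_dir=r"C:\Windows"):
    """Commands for the update cache reset, in the order they should run."""
    cache_dir = windows_dir + r"\SoftwareDistribution"
    return [
        ["net", "stop", "wuauserv"],
        ["cmd", "/c", "rd", "/s", "/q", cache_dir],  # /q suppresses the prompt
        ["net", "start", "wuauserv"],
        ["Dism.exe", "/online", "/Cleanup-Image", "/SPSuperseded"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


run(update_reset_commands())  # dry run: prints the plan without changing anything
```

    Call run(update_reset_commands(), dry_run=False) from an elevated prompt on the affected server to actually execute the steps.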


    References:

    http://technet.microsoft.com/en-us/library/dn251565.aspx
    http://support.microsoft.com/Default.aspx?kbid=971058


    Resolve Data Protection Manager (DPM) Recovery Points and Registry Problems.

    By Steven Jordan on 4/16/2014.

    *Update:  This issue is related to Windows Update KB2506143 and KB2506146 (i.e., WMF 3.0).  Uninstall these updates from DPM 2010!

    Problem Statement: The DPM server takes a long time to log on. It can take 15 to 90 minutes to log on after the server restarts. Additionally, Windows updates fail and roll back to their previous state after the server restarts. The integrity of system backups and restorations is at risk because the DPM server has become unreliable.

    Additional Symptoms:

      a.)  Expired recovery points are not removed per DPM policy goals. Roughly half of the protected members show excessive recovery points in the DPM console:

    Figure 1. Example of a protected member with 20478 recovery points
      b.)  Protection groups show excessive volume size. For example, the Exchange protection group indicates the recovery point volume consumes nearly 2TB.

      c.)  PruneshadowcopiesDPM2010.ps1 is a DPM PowerShell script that removes expired recovery points. The script hangs and does not remove expired recovery points.

     d.)  DPM console hangs when deleting inactive protection group members. The GUI is unresponsive and must be manually closed.

     e.)  The registry System file has bloated to over 220MB. System is located in c:\windows\system32\config\.

    Figure 2. Bloated registry.
    Root Cause:

       There were excessive disk based recovery points (i.e., VSS volumes). In our case, DPM (or Windows) had improbably kept tens of thousands of recovery points per protection member. DPM, by design, is only supposed to store up to 64 recovery points for its file members, and up to 448 recovery points for its application (e.g., SQL database) members.

       The problem did not affect every protection group member. Some members (e.g., recent additions) had fewer than 100 recovery points. However, nearly half of all the protection group members had excessive (e.g., over 20,000) recovery points (Figure 1).

       The excessive, or rather expired, recovery points had to be removed. Normally, DPM automatically removes expired recovery points with its PruneshadowcopiesDPM2010.ps1. The default script was not working so I turned to a custom PowerShell script named PruneVSS.ps1.

       PruneVSS.ps1 is a handy tool that removes disk based recovery points based on date. Its interactive session determines protection groups and recovery point date ranges. N.B. The script was originally written by the late, Ruud Baars.

      I had mixed success with Baars' script. It worked great on resources that had fewer than 8,000 recovery points. The script hung indefinitely for protection group members with more than 10,000 recovery points. The situation required extreme measures.

    Inactive Protection Group Members

       The final option nixes the remaining protection group members that continue to retain expired recovery points. This DPM nuclear option removes all disk based recovery points by deleting their associated volumes. It is imperative to plan for continuity before committing. It's best to ensure the secondary DPM server has backups of the primary protection groups and to make a full tape backup before proceeding.

      The afflicted protection group members were transitioned to inactive protection group members. I then attempted to remove the disk based recovery points using the DPM console. Unfortunately, I had limited success using the GUI. I was able to remove the disk based recovery points from a few of the inactive members. For the majority, however, the console simply froze. At this point, I turned to a second custom PowerShell script, named removeinactivedatasource.ps1. This script was a life saver -it removed all remaining disk based recovery points. I ran the script in verbose mode, so I could see its progress. It took about two hours to complete its job.

       I then moved the inactive protection group members back to their original protection groups. N.B., the recovery points must be deleted before re-adding them to their original protection group members; otherwise DPM will continue to use their originally assigned volumes.

       The next day recovery points looked great; less than 100 for each member in the DPM console. DPM's PruneshadowcopiesDPM2010.ps1 also ran without problems. I had high hopes that the problem was solved -except that DPM continued to hang after restarting it. Victory was short lived.

    Secondary Cause

       I had won a battle but not the war. Efforts to fix the recovery point volumes were successful but its cure exposed a secondary sickness: phantom VSS volumes.

      I was fortunate to discover a handful of blogs that had somewhat similar DPM problems. Microsoft explains some of the symptoms in KB982210:
    This issue occurs because there are a large amount of orphaned registry keys.
    The Volume Shadow Copy Service (VSS) snapshots create many registry keys. However, they are not deleted after the VSS snapshot operations are completed. 
    Indeed, the DPM's registry system was bloated with nearly 15,000 VSS volume registry keys.
    Fig 3. Registry bloating from VSS Snapshots

       Scott Forsyth's Blog recommends applying the hotfix from KB982210. The hotfix however, cannot install on a DPM server unless it runs Hyper-V! In fact, most of the focus for this problem centers on Hyper-V backups -but my problem has nothing to do with Hyper-V. Even if I wanted to install Hyper-V, to allow the hotfix installation, the server was in no condition to install a new feature; all updates failed upon restarting the server.

       In our case, DPM uses iSCSI disks for the replica and shadow copy volumes. The alternate approach removes the phantom devices via script and then requires a second tool that shrinks the registry. Both Forsyth and Gary Fenton, recommend running the Microsoft tool called DevNodeClean to remove phantom devices from the registry.

    DevNodeClean 

       DevNodeClean is available from Microsoft support or it can be compiled with Visual Studio per KB934234. Fenton also has a complete version available for download on his blog.

       I ran DevNodeClean and it indeed found orphaned devices -a grand total of 7. It was less than the 10,000 I had expected. The reason DevNodeClean did not work in this instance is because it only checks for orphaned devices on disks, partitions, and volumes; It does not check for phantom volume shadow copies.

    I described the problem to a talented programmer, #SAK, who works at my office. He reviewed DevNodeClean and further developed it so it checks for orphaned VSS volumes. SAK explained his program lists all orphaned VSS volumes from the command prompt: c:\cleanup.exe.

    The program can remove all orphaned VSS volumes by including a switch: c:\cleanup -r

    Success! The SAK cleanup application found and deleted nearly 10,000 orphaned VSS volumes from the registry.  Download the SAK Cleanup tool from my OneDrive.

       N.B., David Candy's Blog has a good alternative to SAK's custom application. The modified RmHidDev.bat also finds and deletes orphaned VSS shadow volumes.

    Tertiary Problem (i.e., third time's a charm):

     The crazy slow logons remained; even after all the expired recovery point volumes were deleted; and all the orphaned VSS volume registry keys had been removed. Gambit's blog explains that DPM's problems persist because of its bloated registry. I confirmed the registry size had not changed:

    Fig. 4. Bloated registry causes log on profile and update issues.

     Microsoft support provides a tool that shrinks the registry, called Chkreg.  N.B., Chkreg is only available by contacting their support team.  Chkreg is also available for download from my OneDrive.  The tool is easy to use; the process is somewhat tedious. Essentially, Chkreg cannot fix the system file while the server is operational.  The server must be turned off and the disk must be accessed using a separate method.

      I shut the server down and used the Windows 2008 installation media to boot into the recovery mode command line.  I then used the recovery command prompt to navigate to c:\windows\System32\config, and copied the system file to a separate location. N.B., the drive letters in the recovery command prompt were different from what Windows normally uses. DiskPart provides current assignments with its list disk, list partition, and list volume commands.

      I removed the Windows CD and re-started the server (and waited an hour). When the server was back up I used the chkreg tool to repair the copy of the registry system. I issued the following commands:

       #Chkreg /F SYSTEM /R
       #Chkreg /F SYSTEM /C

      The new system file was significantly smaller than the original. The system file shrank from 219 MB to approximately 140 MB. I admit, I had hoped the new file size was closer to 10 MB, but at least there was some progress.

      Once more, I restarted the DPM server, and accessed the recovery command prompt with the installation media. I moved the original system file to a new location -as a precaution. I then copied the new (i.e., shrunken) system file back to its original location, c:\windows\system32\config. I restarted the server and waited for DPM to come back online.

    End result: it worked!  I can finally log on to the DPM server in less than 30 seconds.  Shortly thereafter I installed a year's worth of updates. Everything installed OK and the server remains trouble-free.



    References

    http://virtuallyaware.wordpress.com/2010/06/18/hotfix-kb982210-windows-server-2008-r1r2-hang-at-logon-bug-deep-dive/ 
    http://garysgambit.blogspot.com/2010/05/windows-2008-hyper-v-vss-backup-bug.html
    http://weblogs.asp.net/owscott/archive/2011/02/02/slow-boot-from-massive-registry-on-a-hyper-v-server-fix.aspx 
    http://technet.microsoft.com/en-us/library/hh757783.aspx
    http://blogs.technet.com/b/dpm/archive/2008/06/11/cli-script-to-remove-all-datasources-in-inactive-protection-state.aspx 
    http://blogs.technet.com/b/dpm/archive/2011/03/21/rest-in-peace-ruud-baars.aspx
    http://dcandy.wordpress.com/
    http://blogs.technet.com/b/dpm/archive/2011/05/04/easy-dpm-2010-fix-disk-based-recovery-points-are-not-deleted-as-per-retention-goals.aspx

    Remove Inactive Data Sources in DPM with PowerShell


    By Steven Jordan on 4/15/2014.

    There are times when the DPM console GUI cannot remove its inactive data sources.  This PowerShell script removes inactive data sources from DPM, asking for confirmation before each removal.  Choose between disk-based data sources, tape-based data sources, or both.

    Remove-InactiveDatasource.ps1:

     
    
    param([string] $DPMServerName, [string] $RemoveOption)
    
    function Usage()
    {
     write-host
     write-host "Usage::"
     write-host "Remove-InactiveDatasource.ps1 -DPMServerName [DPMServername] -RemoveOption [Remove Options]"
     write-host
     write-host "Run 'Remove-InactiveDatasource.ps1 -detailed' for detailed help"
     write-host
     write-host
    }
    
    if(("-?","-help") -contains $args[0])
    {
     Usage
     exit 0
    }
    
    if(("-detailed") -contains $args[0])
    {
     write-host
     write-host "Detailed Help :  Use this script to remove inactive datasources on disk or tape or both"
     write-host "Valid inputs of RemoveOption"
     write-host "OnDisk : Removes all inactive datasources on Disk only"
     write-host "OnTape : Removes all inactive datasources on Tape only"
     write-host "OnBoth : Removes all inactive datasources on both Disk and Tape"
     write-host
     write-host
     exit 0
    }
    
    if(!$DPMServerName)
    {
         $DPMServerName = read-host "DPMServerName:"
    }
    $dpmServer = Connect-DPMServer $DPMServerName
    if (!$dpmServer)
    {
        write-Error "Failed To Connect To DPM Server::$DPMServerName"
        exit 1
    }
    
    $dsList = get-datasource $dpmservername
    if (!$dsList -or ($dsList.Count -eq 0) )
    {
        # write-host so the message is shown even without -Verbose
        write-host "No Datasources found"
        disconnect-dpmserver $dpmservername
        exit 2
    }
    
    
    if(!$RemoveOption)
    {
     $RemoveOption = read-host "RemoveOption:"
    }
    if($RemoveOption)
    {
     if ("ONDISK" -eq $RemoveOption)
     {
      $RemoveOption = "OnDisk"
     }
     elseIf ("ONTAPE" -eq $RemoveOption)
     {
      $RemoveOption = "OnTape"
     }
     elseIf("ONBOTH" -eq $RemoveOption)
     {
      $RemoveOption = "OnBoth"
     }
     else
     {
      write-Error "Invalid Value::$RemoveOption For Parameter -RemoveOption[OnDisk/OnTape/OnBoth]"
      Disconnect-dpmserver
      exit 1
     }
    }
    else
    {
     Usage
     Disconnect-dpmserver
     exit 1
    }
    
    foreach($ds in $dsList)
    {
     if($RemoveOption -eq "OnDisk" -and 
                ($ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::Disk -or
                 $ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::DiskAndTape)
          )
     {
      write-host "Removing inactive Disk protection of " $ds.Name
      $confirm = read-host "Confirm(y/n):"
      if($confirm -eq "y")
      {
       Remove-DatasourceReplica -Datasource $ds -Disk
       write-host "Inactive disk protection for " $ds.Name " removed" 
      }
     }
    
     if($RemoveOption -eq "OnTape" -and 
                ($ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::Tape -or
                 $ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::DiskAndTape)
          )
     {
      write-host "Removing inactive Tape protection of " $ds.Name
      $confirm = read-host "Confirm(y/n):"
      if($confirm -eq "y")
      {
       Remove-DatasourceReplica -Datasource $ds -Tape
       write-host "Inactive tape protection for " $ds.Name " removed" 
      }  
     }
    
     if($RemoveOption -eq "OnBoth" -and 
              $ds.InactiveProtectionStatus -ne [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::None)
     {
      write-host "Removing inactive Disk and Tape protection of " $ds.Name
      $confirm = read-host "Confirm(y/n):"
      if($confirm -eq "y")
      {
       if($ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::Disk) 
       {
        Remove-DatasourceReplica -Datasource $ds -Disk
       }
       elseif($ds.InactiveProtectionStatus -eq [Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.InactiveProtection]::Tape)
       {
        Remove-DatasourceReplica -Datasource $ds -Tape
       }
       else
       {
        Remove-DatasourceReplica -Datasource $ds -Disk
        Remove-DatasourceReplica -Datasource $ds -Tape
       }
       write-host "Inactive protection for " $ds.Name " removed" 
      }  
     }
    }
    
    Disconnect-dpmserver
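
    Because removal cannot be undone, it can help to list the inactive data sources first. The following is a minimal sketch of such a dry run, not part of the original script. It relies only on the cmdlets and the InactiveProtectionStatus property already used above; the function name Select-InactiveDatasource and the server name DPM01 are hypothetical, and comparing the status as a string (rather than against the enum type) is an assumption made to keep the example short.

```powershell
# Hypothetical filter: return only the objects whose InactiveProtectionStatus
# is something other than 'None' (i.e. Disk, Tape, or DiskAndTape above).
function Select-InactiveDatasource {
    param([object[]] $Datasource)
    foreach ($ds in $Datasource) {
        # Casting the status to a string avoids naming the DPM enum type here.
        if ($null -ne $ds -and "$($ds.InactiveProtectionStatus)" -ne 'None') {
            $ds
        }
    }
}

# Dry run from the DPM Management Shell (DPM01 is a placeholder server name).
# Nothing is removed; the inactive data sources are only listed.
# $dpmServerName = 'DPM01'
# Connect-DPMServer $dpmServerName | Out-Null
# Select-InactiveDatasource (Get-Datasource $dpmServerName) |
#     Select-Object Name, InactiveProtectionStatus
# Disconnect-DPMServer $dpmServerName
```

    Once the list looks right, the removal script itself can be run as its usage text describes, for example: .\Remove-InactiveDatasource.ps1 -DPMServerName DPM01 -RemoveOption OnDisk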