Info

You are currently browsing the archives for the Utilities category.

Calendar
February 2012
S M T W T F S
« Apr    
 1234
567891011
12131415161718
19202122232425
26272829  
Categories

Archive for the Utilities Category

Deduplicating files with LogParser and SQL Server

Perhaps this should be categorized as a stupid people trick.  On the other hand, it’s a good, cheap way to deduplicate files assuming you’re running SQL Server and LogParser.  This post assumes at least a basic knowledge of T-SQL and LogParser command line syntax.

Basically, you use LogParser to create a two-column table containing file paths and MD5 hashes like this:

LogParser “SELECT path, HASHMD5_FILE(Path) AS MD5 INTO T_FILE_INFO FROM D:\some_path\*.*” -i:FS  -o:SQL  -database:files -driver:”SQL Server”  -createTable:ON

The -o parameter directs LogParser to write the output to a SQL Server database table.

The T-SQL  ranking function makes it very easy to find and pick out duplicate rows.  Instead of finding duplicate rows, I use it to generate statements to delete duplicate files.  The entire script looks something like this:


USE [TempDB]
GO

– Create LogParser command
DECLARE @LogParserCMD VARCHAR(8000);
SET @LogParserCMD = ‘LogParser “SELECT PATH, HASHMD5_FILE(Path) AS MD5 INTO #T_FILE_INFO FROM ”E:\some_path\*.*”” -i:FS -recurse:1 -o:SQL  -database:files -driver:”SQL Server”  -createTable:ON’

– Create table to store file paths and MD5 hashes
IF  EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N’[dbo].[#T_FILE_INFO]’) AND TYPE IN (N’U’))
DROP TABLE [dbo].[#T_FILE_INFO]
GO
CREATE TABLE [dbo].[#T_FILE_INFO](
[PATH] [varchar](512) NULL,
[MD5] [varchar](255) NULL
)
ON [PRIMARY]
GO
– run LogParser to populate table.  Either run from CMD prompt or use XP_CMDSHELL

– note: this step can be time consuming, since LogParser must generate an MD5 has for every file specified

EXEC xp_cmdshell @LogParserCMD;

–Get duplicate hashes; generate deletion commands
WITH tblCTE(MD5, PATH, ranking)AS
(SELECT MD5, PATH, Ranking =
DENSE_RANK() OVER(PARTITION BY MD5
ORDER BY NEWID() ASC)FROM #T_FILE_INFO)

SELECT ‘DEL “‘ + PATH + ‘”‘
FROM tblCTE
WHERE Ranking > 1
AND MD5 IS NOT NULL;
– Review the result set and run it from a command prompt

– clean up after yourself
DROP TABLE #T_FILE_INFO

PowerShell Script to Clean Up Old Files Based on Age

Here’s an extremely concise PowerShell script to remove old files.  You could use it to enforce a policy that keeps 90 days of ETL input files on a server.  Files older than that would purged with this script.  The script would be invoked by a SQLAgent CMDExec task or a Windows Task Scheduler job.


#Powershell Script to delete files older than a certain age

 

$intFileAge = 90  #age of files in days

$strFilePath = “c:\archive” #path to clean up

 

#create filter to exclude folders and files newer than specified age

Filter Select-FileAge 

      {

      param($days)

      If ($_.PSisContainer) {}

              # Exclude folders from result set

      ElseIf ($_.LastWriteTime -lt (Get-Date).AddDays($days * -1))

            {$_}

      }

 

      get-Childitem -recurse $strFilePath | Select-FileAge $intFileAge ‘CreationTime’ | remove-item 

Converting an existing Perfmon .BLG file to CSV

You may run across the situation where you need to import an existing Perfmon (or System Monitor) binary log file into and Excel spreadsheet or SQL database.  The obvious first step is to convert it to CSV format.  It took googling numerous forums to figure out how to do this, so I thought that I’d document it here for future reference.

The relog.exe utility, which ships with the OS, has the ability to convert between Perfom log file formats and merge several log files into one file.  It has some nice filtering capabilities including filtering by time, counters, and nth record.  Example syntax for simply converting between file formats:

relog -f csv inputfile.blg -o outputFile.csv

You can display all the options by simply typing relog -? at the command prompt.

MD5 Hash Utilities

FCIV.EXE is a handy little utility to snap a MD5 hash of a single file or multiple files. Optionally it can recurse directories and produce XML output. http://support.microsoft.com/kb/841290

MD5SUM.EXE is a more commonly used utility that runs on a number of different platforms. A good Wikipedia article with links can be found here and a Win32 binary can be found here.

Open Source Diff Tool

Interesting tool: http://winmerge.org/

To quote the authors: WinMerge is an Open Source visual text file differencing and merging tool for Win32 platforms. It is highly useful for determing what has changed between project versions, and then merging changes between versions.

I have previously used to use windiff.exe to diff two files. This is or was part of the Windows Resource Kit. More recently I have used the freeware Context editor http://context.cx, which has a very good diff tool built in.

On a related note, Here’s a cool trick for finding registry differences between two machines: Export the subkey to a file for each machine and use a file diff tool to compare the differences. By default, the registry editor exports unicode, so either your diff tool has to support unicode, or you need to use the optional Win NT4/9x output format for your export.

|