Author Archive

Deduplicating files with LogParser and SQL Server

Perhaps this should be categorized as a stupid people trick.  On the other hand, it’s a good, cheap way to deduplicate files assuming you’re running SQL Server and LogParser.  This post assumes at least a basic knowledge of T-SQL and LogParser command line syntax.

Basically, you use LogParser to create a two-column table containing file paths and MD5 hashes like this:

LogParser “SELECT path, HASHMD5_FILE(Path) AS MD5 INTO T_FILE_INFO FROM D:\some_path\*.*” -i:FS  -o:SQL  -database:files -driver:”SQL Server”  -createTable:ON

The -o parameter directs LogParser to write the output to a SQL Server database table.

The T-SQL  ranking function makes it very easy to find and pick out duplicate rows.  Instead of finding duplicate rows, I use it to generate statements to delete duplicate files.  The entire script looks something like this:


USE [TempDB]
GO

– Create LogParser command
DECLARE @LogParserCMD VARCHAR(8000);
SET @LogParserCMD = ‘LogParser “SELECT PATH, HASHMD5_FILE(Path) AS MD5 INTO #T_FILE_INFO FROM ”E:\some_path\*.*”” -i:FS -recurse:1 -o:SQL  -database:files -driver:”SQL Server”  -createTable:ON’

– Create table to store file paths and MD5 hashes
IF  EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N’[dbo].[#T_FILE_INFO]’) AND TYPE IN (N’U’))
DROP TABLE [dbo].[#T_FILE_INFO]
GO
CREATE TABLE [dbo].[#T_FILE_INFO](
[PATH] [varchar](512) NULL,
[MD5] [varchar](255) NULL
)
ON [PRIMARY]
GO
– run LogParser to populate table.  Either run from CMD prompt or use XP_CMDSHELL

– note: this step can be time consuming, since LogParser must generate an MD5 has for every file specified

EXEC xp_cmdshell @LogParserCMD;

–Get duplicate hashes; generate deletion commands
WITH tblCTE(MD5, PATH, ranking)AS
(SELECT MD5, PATH, Ranking =
DENSE_RANK() OVER(PARTITION BY MD5
ORDER BY NEWID() ASC)FROM #T_FILE_INFO)

SELECT ‘DEL “‘ + PATH + ‘”‘
FROM tblCTE
WHERE Ranking > 1
AND MD5 IS NOT NULL;
– Review the result set and run it from a command prompt

– clean up after yourself
DROP TABLE #T_FILE_INFO

Free SQL Server training videos

The people over at SQLSkills.com have made their MCM training videos available online for free.  They’re pretty incredible.  Subject areas include database internals and performance, high availability and disaster recovery, performance tuning,  security and development support.  Link here:  http://www.sqlskills.com/T_MCMVideos.asp

 

There are also a plethora of wonderful videos over at SQLBits.com.  One of my favorites is a talk by Thomas Kejser on designing I/O systems for SQL Server.

 

SQLWorkshops.com has a collection of wonderful videos on advanced query tuning.   The instructor provides live demos with  runtime statistics to fully demonstrate his tuning techniques.  Free and very worthwhile.

PowerShell Script to Clean Up Old Files Based on Age

Here’s an extremely concise PowerShell script to remove old files.  You could use it to enforce a policy that keeps 90 days of ETL input files on a server.  Files older than that would purged with this script.  The script would be invoked by a SQLAgent CMDExec task or a Windows Task Scheduler job.


#Powershell Script to delete files older than a certain age

 

$intFileAge = 90  #age of files in days

$strFilePath = “c:\archive” #path to clean up

 

#create filter to exclude folders and files newer than specified age

Filter Select-FileAge 

      {

      param($days)

      If ($_.PSisContainer) {}

              # Exclude folders from result set

      ElseIf ($_.LastWriteTime -lt (Get-Date).AddDays($days * -1))

            {$_}

      }

 

      get-Childitem -recurse $strFilePath | Select-FileAge $intFileAge ‘CreationTime’ | remove-item 

Quick & Dirty way to identify orphan files

Here is a quick and dirty script to identify orphan database files - that is, files that are no longer in the system catalog. It uses several undocumented stored procedures and is not particularly clean SQL, but it works.

To do: come up with a version that uses LogParser to do the same thing.

DECLARE @SQL VARCHAR(MAX)
CREATE TABLE #temp1 (DRIVE CHAR(1), MB_FREE INT
)
INSERT INTO #temp1 (DRIVE, MB_FREE
)
EXEC
xp_fixeddrives
CREATE TABLE #TEMP2(SUBDIRECTORY VARCHAR(256),
DEPTH INT, [FILE] INT, FULL_PATH VARCHAR(256), COMMAND VARCHAR(512
))
SELECT @SQL=COALESCE(@SQL,)+ ‘INSERT INTO #TEMP2 (SUBDIRECTORY, DEPTH, [FILE]) EXEC Master.dbo.xp_DirTree “‘ + DRIVE
+
‘:\MSSQL\DATA”, 1, 1; UPDATE #TEMP2 SET FULL_PATH = ”’ + DRIVE +
”’ + ”:\MSSQL\DATA\” + SUBDIRECTORY WHERE FULL_PATH IS NULL; ‘
FROM #temp
1
PRINT
@SQL
EXEC (@SQL
)
DELETE
FROM
#TEMP
2
WHERE [FILE] =
0
OR SUBDIRECTORY LIKE
‘%CER’
OR SUBDIRECTORY LIKE
‘MSSQLSYSTEMRESOURCE%’
OR SUBDIRECTORY LIKE
‘distmdl%’
OR FULL_PATH
IN
(
SELECT
PHYSICAL_NAME
FROM
SYS.MASTER_FILES
)
SELECT FULL_PATH, ‘DEL ‘ + FULL_PATH AS
COMMAND
FROM #TEMP
2
DROP TABLE #temp
1
DROP TABLE #TEMP2

Getting data file space usage

Here is a simple query to get space usage by data file:

 SELECT 
   
a.FILEID

   
[FILE_SIZE_MB] CONVERT(decimal(12,2),ROUND(a.size/128.000,2
)), 
   
[SPACE_USED_MB] CONVERT(decimal(12,2),ROUND(FILEPROPERTY(a.name,'SpaceUsed')/128.000,2
)), 
   
[FREE_SPACE_MB] CONVERT(decimal(12,2),ROUND((a.size-FILEPROPERTY(a.name,'SpaceUsed'))/128.000,2
)) , 
   
NAME LEFT(a.NAME,24
), 
   
FILENAME LEFT(a.FILENAME,55

   
FROM dbo.sysfiles a 

Getting HH:MM:SS value from Seconds

Here is another handy date manipulation function. This one converts the number of seconds, expressed as an int, into the HH:MM:SSS format

SELECT
CASE
WHEN @seconds/3600<10 THEN '0'
ELSE
END
+ RTRIM(@seconds/3600)
+
':' + RIGHT('0'+RTRIM((@seconds % 3600) / 60),2)
+
':' + RIGHT('0'+RTRIM((@seconds % 3600) % 60),2)

Rewritten as a function, we have:

CREATE FUNCTION SecToHHMMSS
(
– Add the parameters for the function here
@seconds INT
)
RETURNS VARCHAR(10)
AS
BEGIN
RETURN
CASE
WHEN @seconds/3600<10 THEN '0'
ELSE
END
+ RTRIM(@seconds/3600)
+
‘:’ + RIGHT(‘0′+RTRIM((@seconds % 3600) / 60),2)
+
‘:’ + RIGHT(‘0′+RTRIM((@seconds % 3600) % 60),2)
END
GO

Moving

Yahoo 360 is a lame duck, so I’m moving my postings over to this web site.

Basic Database Inventory Script

CREATE TABLE #tempresults2
(
database_name NVARCHAR(128),
backup_start_date DATETIME
)
INSERT INTO #tempresults2
SELECT database_name, backup_start_date
FROM msdb.dbo.backupset AS b1
WHERE backup_start_date =
(
SELECT MAX(backup_start_date)
FROM msdb.dbo.backupset AS b2
WHERE b1.database_name = b2.database_name
AND TYPE IN (‘D’, ‘I’)
);
WITH TBL_CTE(name, Size_MB, create_date, compatibility_level, [state], recovery_model, page_verify_option) AS
(
SELECT D.name, CONVERT(DECIMAL(12,2),ROUND(SUM(f.[size])/128.000,2)) AS Size_MB, d.create_date, d.compatibility_level, d.state_desc, d.recovery_model_descd.page_verify_option_desc
FROM SYS.DATABASES D
INNER JOIN SYS.MASTER_FILES F
ON F.database_id = D.database_id
GROUP BY D.name, d.create_date, d.compatibility_level, d.state_desc, d.recovery_model_desc, d.page_verify_option_desc
)
SELECT name, Size_MB, create_date, compatibility_level, [state], recovery_model, page_verify_option,
tr2.backup_start_date AS last_backup, GETDATE() AS inventory_date
FROM TBL_CTE D
LEFT OUTER JOIN #tempresults2 tr2
ON D.name = tr2.database_name;
DROP TABLE #tempresults2

Query to get last database backup dates

Here’s a simple query to find the last backup dates:

 

SELECT backup_start_date, database_name
FROM msdb.dbo.backupset AS b1
WHERE backup_start_date =
(
SELECT MAX(backup_start_date)
FROM msdb.dbo.backupset AS b2
WHERE b1.database_name = b2.database_name
AND type = ‘D’
)

Windows shell script: putting date & time into a file

SET mydate=%DATE:~4,2%_%DATE:~7,2%_%DATE:~10,4%
echo mydate >> somefile.txt

Earlier Postings

…stuff imported from an earlier journal



2/3/07  I stumbled across a number of useful SQL security tools at SQLSecurity.com: http://www.sqlsecurity.com/Tools/FreeTools/tabid/65/Default.aspx


1/23/07  I ran into a problem configuring SQL XML support in IIS 6.0. When I went to do a query It was coming back with a 404 (page not found). The IIS log entry looks like this:

2007-01-24 01:04:07 127.0.0.1 GET /Northwind sql=select+*+from+Customers+For+XML+Auto&root=Customers 80 MACBETH\Administrator 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.2;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727) 404 2 1260

The last part of the string, ‘1260′ is the Win32 status. Doing a ‘net helpmsg 1260 ‘ from the command prompt shows a problem with policies:

Windows cannot open this program because it has been prevented by a software restriction policy.

Doing some searching, I found the following useful articles:

HTTP 404.x-File or Directory Not Found (IIS 6.0)
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/0f4ac79a-dc2b-4a5f-89c1-d57266aa6ffe.mspx?mfr=true

IIS 6.0: ASP.NET Is Not Automatically Installed on Windows Server 2003
http://support.microsoft.com/kb/332124/

“HTTP Error 404 - File or Directory not found” error message when you request dynamic content with IIS 6.0
http://support.microsoft.com/?id=315122

Enabling Web Service Extensions (IIS 6.0)
http://www.microsoft.com/technet/prod … 2f-b029-aea2b0418bea.mspx


12/15/2006  I finally caught up with the times and replaced my AC97 audio with a SoundBlaster Audigy. The difference is quite noticeable. I also recently discovered EAC or Exact Audio Copy, a freeware program that accurately transcribes your audio CDs to WAV files. The combination of EAC and a decent sound card essentially turns your PC into a high-end CD player.


11/15/06  I purchased my fourth pneumatic nailer. This beauty is a framing nailer that shoots 2″ to 3 1/2″ nails. It makes quick work of many tasks around the home. By the way, did you know that “he who dies with the most tools wins”?framing-nailer1.jpg

11/3/06 Today is my last day at Citigroup and two days after my 7th anniversary. Company management decided to eliminate the Diners Club systems development group based here in Denver. My job was among 168 of those affected.  It’s a shame because it was probably the best software development team within Citigroup.


7/15/06   I have decided to install security cameras around the house. One thing that I needed was connectors for 75 ohm RG6 cable. Most RG6 connectors are the ‘F’ style, while most security cameras use BNC connectors. I also needed replacement solder-tab batteries for our Interplak toothbrushes. After checking with several suppliers, I placed my order with All Electronics Wednesday evening. The parts arrived via USPS Saturday afternoon — very quick turnaround!

6/25/06  We have to relocate our gas meter. The utility and the building department have pretty stringent requirements for this, so Friday I took the Denver homeowner’s plumbing exam and pulled a plumbing permit. The plumbing inspection requires that you pressurize the gas line to 10 PSI and connect a gauge. You must demonstrate to the inspector that the gas piping can maintain pressure for at least 15 minutes. Not having a suitable gauge or test fixture, I went General Hardware, purchased a few parts and the built one shown here. The parts include a 0-30 psi gauge, a Schrader valve for applying and bleeding the compressed air, a 3/4″ tee, and the necessary reducers for the gauge and the Schrader valve.testfitting.jpg


6/19/06  A colleague recommended a really useful site for finding useful security tools: http://sectools.org/.  Fyodor, the creator of Nmap, and the maintainer of the site http://www.insecure.org, surveyed security professionals and created a listing of the 100 most popular security tools.  Many of these tools have multiple uses. For example, the Paros proxy is a useful tool if you maintain and troubleshoot web sites.

6/1/06  Are your favorite troubleshooting tools not installed on the machine that you are currently working on? One solution is to install your applications on a USB drive a site called PortableApps.com has a number of applications that have been built to run off of a USB or CDROM drive.

4/20/06  Hooray for the retarded kid! I finished up the OS course and took a DANTES exam. Only 7 semester hours to go!

2/21/06  Two useful measures of energy density are Watt hours/Liter (volumetric energy density) or Watt hours/kilogram (gravimetric energy density).  A gallon of gasoline has an energy density of 9000 watt hours per liter while a lead acid battery has an energy density of 40 watt hours per liter.  This helps explain why electric cars are not yet common.
Liquid hydrogen has a substantially lower volumetric energy density than gasoline, only 2600 watt hours per liter.  Don Lancaster has an excellent tutorial on energy fundamentals: http://tinaja.com/glib/energfun.pdf.


1/6/06  Downloaded and installed GIMP on Linda’s XP computer. GIMP stands for GNU Image Manipulation Program. It seems to be a nice alternative to Photoshop.

1/2/06  Hooray for the retarded kid!  I’m finishing up my last fourteen semester hours of undergraduate work.  Today I am starting a course on operating systems.

12/28/05  Used an updated version of a familiar utility autoruns.exe from the sysinternals.com website.  There are all sorts of places within Windows that a program can be automatically launched.  This includes services as well as programs launched as the logged-in user.  These guys seemed to have figured it all out.

12/27/05  ps command usage on AIX:

  • to display all processes with memory utilization use ps gv
  • to display all processes with memory utilization and command line ps gev

Interesting options from the man page: (none of these are preceeded by a - sign)

  • e  Displays the environment as well as the parameters to the command, up to a limit of 80 characters.
  • ew  Wraps display from the e flag one extra line.
  • eww  Wraps display from the e flag as many times as necessary.
  • g  Displays all processes.
  • u  Displays user-oriented output. This includes the USER, PID, %CPU, %MEM, SZ, RSS, TTY, STAT, STIME, TIME, and COMMAND fields.
  • v  Displays the PGIN, SIZE, RSS, LIM, TSIZ, TRS, %CPU, %MEM fields.
  • w  Specifies a wide-column format for output (132 columns rather than 80). If repeated, (for example, ww), uses arbitrarily wide output. This information is used to decide how much of long commands to print.

12/5/05  Grep Examples (in a Win32 environment)

  • Search a file for lines containing “cat” or “dog”:   grep -i “cat\|dog” file.txt    in this case the escaped | symbol is a logical OR operator
  • Search a file for lines containing both “cat” and “dog”:   grep “cat” file.txt | grep “dog”  in this second case, the | symbol is used for redirection

11/18/05:   Found a useful Win32 utilities site. http://www.nirsoft.net/ Of particular note: MyUninstaller. Way more useful that the stock add/remove programs control panel applet.11/17/05: The Unix chsh utility can be used to change your working shell, but

  • This will only work if you have permissions to the passwd file
  • BOKS managed systems periodically push down the passwd file, which will overwrite your changes

|