Deleting from very large tables can cause uncontrolled transaction log growth. The way around this is to perform an iterative TOP(n) delete where you limit the number of rows to be deleted in each batch. I got this idea from an article by Hugo Kornelis in the first volume of SQL Server MVP Deep Dives. It’s titled Set-based iteration, the third alternative. It’s a worthwhile read.
A basic iterative deletion script might look something like this. We set a batch-size variable, @BatchSize, to the number of rows to delete per iteration. As long as the number of rows deleted equals the batch size, the WHILE loop continues; otherwise it terminates:
DECLARE @BatchSize int, @RowCnt int;
SET @BatchSize = 10000;
SET @RowCnt = @BatchSize;

WHILE @RowCnt = @BatchSize
BEGIN;
    DELETE TOP (@BatchSize)
    FROM dbo.sometable
    WHERE somecolumn < 2395;

    SET @RowCnt = @@ROWCOUNT;
END;
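The same batch-until-short-batch pattern carries over to other databases. As a minimal sketch, here it is in Python against SQLite (hypothetical table and column names mirroring the T-SQL above; SQLite has no DELETE TOP, so a rowid subquery with LIMIT stands in for it):

```python
import sqlite3

# Hypothetical demo data: a table whose rows we purge in batches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (somecolumn INTEGER)")
conn.executemany("INSERT INTO sometable VALUES (?)",
                 [(i,) for i in range(5000)])
conn.commit()

BATCH_SIZE = 1000
row_cnt = BATCH_SIZE

# Loop while each batch comes back full; a short batch means we're done.
while row_cnt == BATCH_SIZE:
    cur = conn.execute(
        "DELETE FROM sometable WHERE rowid IN "
        "(SELECT rowid FROM sometable WHERE somecolumn < 2395 LIMIT ?)",
        (BATCH_SIZE,))
    conn.commit()           # each batch is its own small transaction
    row_cnt = cur.rowcount  # analogous to @@ROWCOUNT

remaining = conn.execute("SELECT COUNT(*) FROM sometable").fetchone()[0]
print(remaining)  # 2605 rows with somecolumn >= 2395 survive
```

Committing after each batch keeps every transaction small, which is the whole point of the technique.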
The script shown above isn’t really suitable for production. Below is a more production-ready script. I’ve incorporated similar scripts into scheduled SQL Agent jobs to maintain a specific data retention target, for example, seven years. The script has a couple of noteworthy features. First, it incorporates a WAITFOR DELAY into the loop. This gives other processes a window to use the physical resources of the server, and gives them a chance to access the table you are deleting from. Second, it incorporates a cutoff time of day, after which the loop terminates. For example, you might use SQL Agent to schedule this process every Sunday at 4:00 PM and set the cutoff time to 8:00 PM, because 8:00 PM is when other maintenance processes start up. Note that you will have to tune the @BatchSize and WAITFOR DELAY parameters to suit your particular situation.
--controlled deletion prevents long-term locks & blowing out transaction log
DECLARE @BatchSize INT ,
    @sPurgeMessage VARCHAR(64) ,
    @RowCnt INT ,
    @StartRowCount BIGINT ,
    @EndRowCount BIGINT ,
    @sMessage VARCHAR(512) ,
    @dStart DATETIME ,
    @StopTime DATETIME ,
    @cutoffDate DATETIME;

--Set the stopping time for 18:00 on the current day
SELECT @StopTime = DATEADD(HOUR, 18, CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME));

-- @cutoffDate is the date value below which rows are purged (2555 days, about seven years)
SELECT @cutoffDate = DATEADD(DAY, -2555, GETDATE());

--informational - get & store starting row count
--BTW, this method of counting rows is fast because it reads metadata rather than scanning the table
SELECT @StartRowCount = SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.SomeGynormousTable')
    AND ( index_id = 0 OR index_id = 1 );

SET NOCOUNT ON;

--set number of deletions per batch; tune this to suit your workload
SET @BatchSize = 10000;
SET @RowCnt = @BatchSize;

WHILE ( @StopTime > GETDATE() ) AND @RowCnt > 0
BEGIN
    SELECT @dStart = GETDATE();

    BEGIN TRAN;
    DELETE TOP ( @BatchSize )
    FROM dbo.SomeGynormousTable
    WHERE [Date] < @cutoffDate;
    SET @RowCnt = @@ROWCOUNT;
    COMMIT;

    SELECT @sPurgeMessage = 'Purged ' + CAST(@RowCnt AS VARCHAR(10))
        + ' rows in '
        + CAST(DATEDIFF(SECOND, @dStart, GETDATE()) AS VARCHAR(6))
        + ' seconds.';
    RAISERROR (@sPurgeMessage, 1, 1) WITH NOWAIT;

    WAITFOR DELAY '00:00:05'; -- give other spids a chance to work
END;

--informational - get & store ending row count
SELECT @EndRowCount = SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.SomeGynormousTable')
    AND ( index_id = 0 OR index_id = 1 );

-- optionally send out a completion notice
SET @sMessage = CAST(( @StartRowCount - @EndRowCount ) AS VARCHAR(32)) + ' rows were deleted';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'someprofile',
    @recipients = 'email@example.com',
    @body = @sMessage,
    @subject = 'purge of dbo.SomeGynormousTable done';
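The cutoff-time and WAITFOR DELAY logic is a general control-flow pattern, not something unique to T-SQL. A minimal sketch of the same loop in Python, with a hypothetical `delete_batch` callable standing in for the DELETE statement (the function name and parameters are illustrative, not from the script above):

```python
import time
from datetime import datetime, timedelta

def purge_in_batches(delete_batch, batch_size=10000,
                     pause_seconds=5, run_for=timedelta(hours=4)):
    """Call delete_batch(batch_size) -> rows deleted, repeating until a
    short batch (table fully purged) or the deadline passes, sleeping
    between batches so other work can get at the table.
    Returns the total number of rows deleted."""
    deadline = datetime.now() + run_for
    total = 0
    while datetime.now() < deadline:
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted < batch_size:
            break                  # short batch: nothing left to purge
        time.sleep(pause_seconds)  # analogous to WAITFOR DELAY '00:00:05'
    return total
```

Checking the deadline at the top of each iteration means a long-running batch can overshoot the cutoff by at most one batch, which is also how the WHILE condition in the T-SQL version behaves.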