Database statistics is a subject near and dear to my heart. As an Architect and consultant I have come across statistics issues quite often. You may ask: why would you come across statistics issues? Here is a list of the most frequent reasons:
- Lack of maintenance plans
- Custom targeted maintenance plans
- Being told that modern SAN infrastructure means you no longer need to worry about index fragmentation or statistics.
- Not updating statistics often enough.
- Highly volatile tables that change frequently within an environment.
As hopefully we all know, having out-of-date statistics can cause serious performance issues: bad plans, erratic query performance, and reindexing processes that do not parallelize correctly. Statistics are often the source of those “ghost in the machine” errors, especially for highly volatile tables. Users report that a particular query is running slow, but by the time you get around to it, it is running fine. Or have you ever run a query with different parameters and gotten wildly different performance? That could be a stats issue.
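You don’t have to take this on faith. Assuming you are on a build that has sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 or later), a quick sketch like this shows when each statistic on a table was last updated and how many modifications have piled up since; the table name here is just a placeholder:
-- Quick check: when were the statistics on a table last updated, and how many
-- modifications have accumulated since? Replace dbo.MyVolatileTable with your own table.
SELECT s.name AS [StatName],
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.MyVolatileTable');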
I started really looking in depth at statistics about 11 years ago when I had an issue that was really bugging me. I kept getting called to look at systems other DBAs in my company managed because they were having performance issues. Often I found that they were not doing proper database maintenance. They all had a hodgepodge of custom processes determined by each DBA. These processes had to be maintained because they were very targeted: if a table or index changed, the process had to be updated. Often it wasn’t, and things just didn’t work. Or maybe they added a new database and forgot to add it to the process. Maybe the SQL Server edition had changed when they moved servers and they could no longer do online reindexing. So I started querying all the databases on all of the instances (over a thousand) to prove a point: we were missing things because we were not using a self-healing process that automatically included new databases, like I did in my environments. (Oh, the self-righteousness.) There was a lot of stuff that was just not working. But when I gathered all of the information into a nicely compiled report there was only one problem… my servers were on the list too.
For the environments I maintained, I had created a custom process that reindexed all of the databases based on fragmentation. I also ran update stats on the tables afterwards. The process generated its own code and automatically included new databases. I called it my “One Ring to Rule Them All”. Sound familiar? Because someone else did it too, and published their code for the world to use. Ola Hallengren has some great scripts for database maintenance. I later switched to using them with some custom wrappers 🙂
I first generated a query to find all of the out-of-date statistics in a database. It found all of the statistics that were still out of date after I had reindexed and reorganized. While digging into this, I realized that I was double dipping when I immediately updated statistics after a rebuild; I just needed to get the remaining items. This was going to significantly reduce my maintenance windows!
SELECT
GETDATE() as [Date],
CONVERT(VARCHAR(100), SERVERPROPERTY('ServerName')) as [Server],
DB_NAME() as [DBName],
sch.name as [SchemaName],
OBJECT_NAME(i.object_id) as [TableName],
i.name as [IndexName],
STATS_DATE(i.object_id, index_id) as [Stats_Date],
si.rows as [RowCount],
si.rowmodctr as [RowModCtr],
cast(si.rowmodctr as Decimal(15,2))/si.rows as [PercentChange],
'UPDATE STATISTICS ['+db_name()+'].['+sch.name+'].['+OBJECT_NAME(i.object_id) +'] (['+i.name+']) WITH SAMPLE 100 PERCENT' as [CMD]
FROM sys.indexes i WITH(NOLOCK)
INNER JOIN sys.sysindexes si WITH(NOLOCK) ON i.object_id = si.id AND i.index_id = si.indid
INNER JOIN sys.tables t WITH(NOLOCK) ON i.object_id=t.object_id
INNER JOIN sys.sysobjects so WITH(NOLOCK) ON t.object_id = so.id
INNER JOIN sys.schemas sch WITH(NOLOCK) ON so.uid = sch.schema_id
WHERE OBJECTPROPERTY(t.OBJECT_ID,'IsUserTable') = 1
AND i.name IS NOT NULL --Remove Heaps
AND si.rows > 1000
AND si.rowmodctr > 100
AND si.rows >100000000 and cast(si.rowmodctr as Decimal(15,2))/si.rows > .2
order by cast(si.rowmodctr as Decimal(15,2))/si.rows DESC
But back to the problem. 99.99% of my environment was good, but I did have some stats that were not getting properly updated. Was there a bug in the code? I manually rebuilt the index and nothing happened; the statistics were still out of date. I then manually updated stats for the table… still out of date. Finally I manually updated the statistic itself. That worked, but I could not do this by hand every time it happened. The mystery of why this happened would remain unsolved for many years; in the meantime, I was going to create a custom statistics update process to fix the issue.
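For reference, updating a single statistic directly, rather than the whole table, looks like this; the table and statistic names below are just placeholders:
-- Update just one statistic on a table with a full scan (names are placeholders).
UPDATE STATISTICS [dbo].[MyTable] ([IX_MyTable_SomeColumn]) WITH FULLSCAN;
-- Compare with the table-level version, which updates every statistic on the table:
UPDATE STATISTICS [dbo].[MyTable] WITH FULLSCAN;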
Recently Ola Hallengren updated his statistics process to do just this in a much more automated fashion than shown above. Erin Stellato has a great blog post on how he changed his update stats process and why you really should not rely on the built-in functionality of SQL Server. So why am I blogging about this, then? What I think is lacking in that process is a variable sample size based on the size of the table. I don’t want to scan a 1,000,000,000 row table at the same percentage as a 65,000 row table.
By varying the scan depth percentage (aka sample size), I could run my process more often. I even passed a parameter to it so that I could update stats throughout the day. How often? Every 15 minutes… Why would I want to do that? Because those highly volatile tables were getting updated when they needed it, I got fewer bad plans. I have turned this on for thousands of instances. The first time I turned it on, a Business Analyst came by my desk and asked what we had done. I asked why. He said everything was fast. We had one extremely volatile table in that VLDB environment that controlled who was working on what; it was basically a queue table. How many rows? Fewer than 65,000, yet that one table in a 20 terabyte database caused a noticeable performance issue for my users. We turned off the custom update statistics process and everything slowed down again, so we left it on. I later rolled it out at another company that was having an issue with bad plans. Months after we rolled it out, a higher-up asked me what was going on with the database. Why? The bad plan issue had gone away.
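If you want to try something similar, a SQL Server Agent job is the simplest way to get the 15 minute cadence. Here is a rough sketch; the job name and the wrapper procedure usp_UpdateStatsVariableSample are placeholders for whatever you wrap the query below in:
USE msdb;
GO
-- Sketch of a job that runs a (placeholder) wrapper procedure every 15 minutes.
EXEC dbo.sp_add_job @job_name = N'Update Stats - Variable Sample';
EXEC dbo.sp_add_jobstep
    @job_name = N'Update Stats - Variable Sample',
    @step_name = N'Run stats update',
    @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'EXEC dbo.usp_UpdateStatsVariableSample;'; -- placeholder procedure
EXEC dbo.sp_add_jobschedule
    @job_name = N'Update Stats - Variable Sample',
    @name = N'Every 15 minutes',
    @freq_type = 4,               -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,        -- units of minutes
    @freq_subday_interval = 15;   -- every 15 minutes
EXEC dbo.sp_add_jobserver @job_name = N'Update Stats - Variable Sample', @server_name = N'(LOCAL)';
GO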
This is what the basic form of the query looks like now that we vary the sample size by table size. We also vary the threshold for updating statistics based on the size of the table.
SELECT
GETDATE() as [Date],
CONVERT(VARCHAR(100), SERVERPROPERTY('ServerName')) as [Server],
DB_NAME() as [DBName],
sch.name as [SchemaName],
OBJECT_NAME(i.object_id) as [TableName],
i.name as [IndexName],
STATS_DATE(i.object_id, index_id) as [Stats_Date],
si.rows as [RowCount],
si.rowmodctr as [RowModCtr],
cast(si.rowmodctr as Decimal(15,2))/si.rows as [PercentChange],
'UPDATE STATISTICS ['+db_name()+'].['+sch.name+'].['+OBJECT_NAME(i.object_id) +'] (['+i.name+']) WITH SAMPLE '+
CASE WHEN si.rows >= 100000000 THEN '1000000 ROWS'
WHEN si.rows >= 10000000 THEN '1 PERCENT'
WHEN si.rows >= 1000000 THEN '5 PERCENT'
ELSE '100 PERCENT' END as CMD
FROM sys.indexes i WITH(NOLOCK)
INNER JOIN sys.sysindexes si WITH(NOLOCK) ON i.object_id = si.id AND i.index_id = si.indid
INNER JOIN sys.tables t WITH(NOLOCK) ON i.object_id=t.object_id
INNER JOIN sys.sysobjects so WITH(NOLOCK) ON t.object_id = so.id
INNER JOIN sys.schemas sch WITH(NOLOCK) ON so.uid = sch.schema_id
WHERE OBJECTPROPERTY(t.OBJECT_ID,'IsUserTable') = 1
AND i.name IS NOT NULL --Remove Heaps
AND si.rows > 1000
AND si.rowmodctr > 100
AND ( (si.rows >100000000 and cast(si.rowmodctr as Decimal(15,2))/si.rows > .02)
OR (si.rows >10000000 and cast(si.rowmodctr as Decimal(15,2))/si.rows > .06)
OR (si.rows > 100000 and si.rows < 10000000 and cast(si.rowmodctr as Decimal(15,2))/si.rows > .08)
OR cast(si.rowmodctr as Decimal(15,2))/si.rows > .1)
order by cast(si.rowmodctr as Decimal(15,2))/si.rows DESC
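The query above only generates the UPDATE STATISTICS commands; it does not run them. As a taste of where Part 2 is headed, here is a minimal single-database sketch (with the joins trimmed down a bit) that captures the generated commands into a temp table and executes them one at a time:
-- Minimal sketch: build and execute the UPDATE STATISTICS commands in the current database.
IF OBJECT_ID('tempdb..#StatsCmds') IS NOT NULL DROP TABLE #StatsCmds;
CREATE TABLE #StatsCmds (Id INT IDENTITY(1,1) PRIMARY KEY, Cmd NVARCHAR(MAX));

-- Build the commands using the same size-based sample and thresholds as the query above.
INSERT INTO #StatsCmds (Cmd)
SELECT 'UPDATE STATISTICS ['+DB_NAME()+'].['+sch.name+'].['+OBJECT_NAME(i.object_id)+'] (['+i.name+']) WITH SAMPLE '+
       CASE WHEN si.rows >= 100000000 THEN '1000000 ROWS'
            WHEN si.rows >= 10000000  THEN '1 PERCENT'
            WHEN si.rows >= 1000000   THEN '5 PERCENT'
            ELSE '100 PERCENT' END
FROM sys.indexes i
INNER JOIN sys.sysindexes si ON i.object_id = si.id AND i.index_id = si.indid
INNER JOIN sys.tables t ON i.object_id = t.object_id
INNER JOIN sys.schemas sch ON t.schema_id = sch.schema_id
WHERE i.name IS NOT NULL -- skip heaps
  AND si.rows > 1000
  AND si.rowmodctr > 100
  AND ( (si.rows > 100000000 AND CAST(si.rowmodctr AS DECIMAL(15,2))/si.rows > .02)
     OR (si.rows > 10000000  AND CAST(si.rowmodctr AS DECIMAL(15,2))/si.rows > .06)
     OR (si.rows > 100000 AND si.rows < 10000000 AND CAST(si.rowmodctr AS DECIMAL(15,2))/si.rows > .08)
     OR  CAST(si.rowmodctr AS DECIMAL(15,2))/si.rows > .1 );

-- Execute each command, logging it first.
DECLARE @cmd NVARCHAR(MAX);
DECLARE cmds CURSOR LOCAL FAST_FORWARD FOR SELECT Cmd FROM #StatsCmds ORDER BY Id;
OPEN cmds;
FETCH NEXT FROM cmds INTO @cmd;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @cmd;
    EXEC sys.sp_executesql @cmd;
    FETCH NEXT FROM cmds INTO @cmd;
END
CLOSE cmds;
DEALLOCATE cmds;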
Part 2 of this post will contain the full block for the statistics update, rolling through all the databases in an environment, as well as strategies for deploying it in your environment. I should have Part 2 posted in a few days.