Today the main actor of this post is this error:
Failed to execute IS server package because of error 0x80131904. Server: ASGARD<c/> Package path: \SSISDB\ASGARD\ASGARD Staging\Orchestrator_Transaction.dtsx<c/> Environment reference Id: 20012. Description: The operation failed because the execution timed out. Source: .Net SqlClient Data Provider Started: 05:00:00 Finished: 05:00:11 Elapsed: 10.484 seconds.
Disclaimer: Solution provided here is rather a kind of a workaround than a real one. It helps to overcome above-mentioned problem in a quick way, but in a long run you need to find a proper solution for the problem. At the end of the post, you’ll find my several propositions how to overcome the problem at a more mature level.
Have you ever received the above-mentioned error when you tried to execute your package that resides in SSISDB? In my project everything was fine until one day when most of SQL Server Agent jobs started to fail because of this error. It is a well-known problem and some recommendation say that adding indexes can help (http://www.ssistalk.com/2013/01/31/ssis-2012-catalog-indexing-recommendations/). These recommendations, although, are for SQL Server 2012. In my project I use SQL Server 2016 SP1 and all these indexes are already applied out of the box.
In my project I execute many SSIS packages (around 15) through SSA jobs in exactly the same time. This led me to check the internals of catalog.create_execution stored procedure that can be found inside SSISDB database. Around line 161 a mechanism of lock acquiring starts:
DECLARE @lock_result int DECLARE @retry int SET @lock_result = -1 SET @retry = 2 WHILE @retry > 0 AND @lock_result < 0 BEGIN EXEC @lock_result = sp_getapplock @Resource = 'MS_ISServer_Create_Execution', @LockTimeOut= 5000, @LockMode = 'Exclusive' SET @retry = @retry - 1 END IF @lock_result < 0 BEGIN RAISERROR(27195, 16, 1) WITH NOWAIT END
The mechanism itself is simple and works like this:
1. Try to acquire lock with lock name MS_ISServer_Create_Execution by using sp_getapplock with timeout of 5 seconds (@LockTimeOut parameter)
2. Wait 5 seconds
3. If lock was acquired then continue, If lock was not acquired, try to acquire lock once again (and wait another 5 seconds)
4. If lock was acquired then continue, If lock was not acquired after the second attempt, throw an error with error number 27195: The operation failed because the execution timed out.
As a part of investigation I checked If there are any heavy-duty SQL statements executed inside this stored procedure and one thing drew my attention: there are a lot of inserts and updates on a internal.execution_parameter_values table (code lines: 424, 458, 502, 592, 629, 671, 680, 689, 700, 745) . I was scratching my head that if the table itself is big enough, then it may lead to some performance problems with the lock acquisition.
Furthermore, I saw that many times statements are filtered in WHERE clause by project_version_lsn column which comes from internal.object_parameters table, so when you have a lot of SSIS projects and/or SSIS packages, the volume of data about versioning of these projects can also be significant.
I checked the number of rows in tables: internal.execution_parameter_values and internal.object_parameters] and in my case the number of rows were 854123 and 273812 appropriately.
This led me to conclusion that these massive INSERT and UPDATE operations and these two tables are bottleneck to acquire lock.
It’s easy to decrease number of rows in both tables without even physically delete rows from these tables with using DELETE statement. In order to do this you need to modify two parameters on the whole SSISDB catalog:
- Retention Period (days) – parameter that sets the number of days of package execution history stored in SSISDB database
- Maximum Number of Versions per Project – parameter that sets the number of last deployed and stored versions of SSIS projects in SSISDB catalog
You can reach these parameters by right clicking on your SSISDB catalog and chose Properties option:
In my case setting these values for the parameters helped:
- Retention Period (days) – 14 (previously 31)
- Maximum Number of Versions per Project – 2 (previously 10)
and since then I haven’t seen any problems regarding timeout during package execution due to failing lock acquiring. Of course the caveat is that you decrease the number of history days stored and the number of stored versions of SSIS projects, so be aware of this.
As I mentioned at the beginning, this is not a fully proper solution to the problem. But If you can live with decreasing of your SSISDB execution history and number of projects’ versions and you don’t want to interfere in the stored procedure code, it may be an option for you.
Possible solutions I see how to solve this problem may be:
- Modify the catalog.create_execution stored procedure, for example by increasing number of lock acquisition attempts, change timeout time, or even redesign the procedure completely by completely get rid of lock acquiring as you can find here: https://speaksql.wordpress.com/2013/06/27/ssis-2012-fail-to-start-execution-timed-out/
- Modify your jobs to be run in a different schedule. For example: instead of executing all packages at 07:00.00, you may want to execute some packages at 07:00.00, some at 07:00.20, some at 07:00.40, etc…
- Write your own package execution scheduler on the top of SSISDB, by using for some T-SQL or C# code. Then you can be better in control of packages’ execution by for example putting them on FIFO queue, etc…