Hunting Deadlocks part 3
In part two, I mentioned that large files could take tens of seconds to process and the processing of a file is wrapped into a single transaction that either fails or succeeds. I also said this problem seemed to occur with outlets that typically polled larger files. I did not mention though that stock transactions appear towards the beginning of the file. Putting this information together leads us to believe an update to a row in STKTAB may take tens of seconds from the update to the commit or rollback. During this time each data block has a transaction marked in the block header. To get our deadlock we now need to have another outlet processing its POS data during the time the first transaction is processing and also have STKTAB records belonging to the same data block. In addition the must be no space to mark the transaction in the block header. So how likely is that?
Well, I guess that between 10 and 15 percent of the outlets belong to the two big brands so that gives a good chance that at some times the processing of two or more such outlets would overlap. But why should they have stock items in the same data blocks and why should the block be too full to allow more than one transaction? First a quick look at the table shows it was created with PCTFREE of 10 and INITRANS of 1, this will be important a little later. Looking at the application code I see that the code to allocate stock items to outlets only supplies the two key columns and leaves the date columns as NULL. The keys only take 5 bytes each, so we are inserting 10 byte rows. But when we update the row the first time we add 14 bytes more for the two dates. But we only had 10% of the block available for updates. No wonder it fills. Perhaps PCTFREE should be more like 60, but then for the rows that have already had the date columns set no further change in row length is possible so 60 would appear to be a bit excessive for the majority of the legacy data in the table.
One feature of the application is method to allocate the same stock items to all of the outlets in a brand. This is basically a nested loop that allocates (inserts rows into the STKTAB) for each outlet in the brand and each item on a list. If the item list is short, say only 3 or 4 items, it is very possible that all (or at least, many) of the rows would be stored in the same data block. And if new stock items are allocated then the business expects them to sell in the near future so updates will be likely and quite probably to the same block.
Well I convinced myself. But what should we do about it?
A quick and dirty interim solution is to copy the data out of the table then rebuild it using an ordered select from the copy table. This will not fix the problem but might make it containable for a short time. We could also ALTER TABLE MOVE it to set a higher value of INITRANS, of course this needs a bit more testing as we making changes to the database. Another option is to change to STKTAB table to an Index Organised Table using Outlet_ID as the leading edge of the index key. This should ensure that all of the rows for a given outlet are close to each outer and, more significantly, less likely to be in the same block as those for a contending outlet. This of course needs even more test especially to see what happens when all of those short rows expand; we want to avoid both wasted space and overflow.