Sunday, 30 September 2012

IMS's HALDBA again

Last time we looked at the first part of Neil Price’s presentation to the Virtual IMS user group back in February. Neil is a Senior DBA and Systems Programmer with TNT Express ICS. Here’s some more highlights from his presentation.

Neil suggested that at his site key-range partitions can be so different from each other that they need to be treated like independent databases. For example if the volume of data in one partition is growing much faster than in the others, it might need a lot more free space so that you don’t have to resize or split it too soon, and you might want to set the free space warning threshold lower, so that you have time to react. Another example is where the average database record length in one partition is much lower than in the others. If you want to be alerted when the record lengths increase, the threshold for this partition would need to be correspondingly lower.

When it comes to tuning – what Neil called “Performance Tweaks” – they have various processes where values are obtained in physical sequential order from the root keys of one database and used to access one or more related databases with the same root key. If the databases involved are all HDAM and use the DFSHDC40 randomizing routine, this works well because the randomizer tends to keep keys in more or less the same sequence regardless of the total number of Root Anchor Points, as long as this is much greater than the number of roots. This means that all the databases are accessed in physical sequential order, which more or less guarantees that each database block is read only once and so minimizes I/O, as well as generating a sequential access pattern that maximizes read hits in disk cache and more recently has enabled them to take advantage of OSAM Sequential Buffering. But once you go to PHDAM, the keys are only kept in this sequence within each partition. Optimal performance is only restored when all the databases involved are HALDBs with identical partitioning arrangements.

Neil Price identified something that he said confuses a lot of people, including some in IBM support, not least because the manuals are unclear. Normally when a KSDS data CI fills up, it split roughly in half. That’s not good for ascending keys such as the timestamps used by most of their HIDAM databases, because the old or “lower” CI will stay half-empty forever. In the same way, a CA split will leave the old CA with half its CIs empty and the rest usually half-empty, in other words only about a quarter full. If the insert rates are high, the KSDS can end up with a very high proportion of unusable free space and might approach the 4GB limit. However if Sequential Mode is used for the inserts, CI and CA splits are done at the point of the insert – which for ascending keys means that it starts a new, empty CI or CA and leaves the old one as it is. This is good for performance – CA splits have been known to take seconds to complete – as well as greatly slowing down the dataset growth.

It’s possible to specify sequential-mode inserts in the IMS buffer pool specification member by coding INSERT=SEQ on the OPTIONS statement, but that would apply to every KSDS. Instead they include a DBD statement for each index they want treated this way and specify FREESPACE=YES, which is the dataset-level equivalent. Sequential-mode inserts also honour the free space specifications for the KSDS. This means that if free space is specified and all inserts are done this way, the free space will never get used and is a complete waste of space. All their inserts are done through the online system, so FREESPACE=YES implies that there should be no free space specified for the cluster!

FREESPACE=YES would work almost as well with constantly descending keys. It might also be beneficial for some indexes whose keys are generally, rather than strictly, ascending or descending, but until they’ve got IMS 12 and can change buffer pool parameters dynamically it’s going to be difficult to determine. Currently they specify FREESPACE=YES for 6 of their PSINDEXes, of which only one has a strictly ascending key and the rest start with a date. There used to be more, but after analysing the VSAM statistics, Neil reverted some to normal inserts and non-zero free space. This reduced the rate of CA splits, ie growth, and also the overall I/O. Neil reminded us that PSINDEX records can be relatively large, which makes the total index much larger than its non-HALDB version. It also means that fewer records are retrieved in one I/O, which is bad for sequential processing.

For Neil’s site, most of their indexes are accessed with a partial key, which in some cases will match hundreds of entries. Neil attempted to minimize the effects by making the VSAM buffer pools bigger, although he knew he wouldn’t be able to get the I/O rates back down to pre-HALDB levels. He reasoned that the buffers ought to hold the same number of VSAM data records as before, so he aimed to increase the number of buffers in line with the increase in the record length. When he calculated the new subpool sizes he discovered that some came out greater than the limit of 32,767 buffers, and in any case there wasn’t enough virtual storage available for such large increases, so he had to scale them all back anyway. But the I/O rates for most of the OSAM subpools are higher than for any of the VSAM ones, so this is maybe not the area with the most potential for I/O reduction.

To sum up, Neil informed the user group that HALDBs can change your life if you’re a DBA! Once you’ve converted your databases, that’s when the fun really begins

You might like to know that the Virtual IMS user group can be found at The next meeting is on 9 October at 11:30am EDT, when the guest speaker will be Aurora Emanuela Dell’Anno, a Senior Engineering Services Architect with CA Technology, who’ll be talking about “IMS performance – taming the beast”. It will be well worth attending – and being a virtual meeting, there’s no need to leave your desk. See the Web site for more details.

No comments: