Lessons Learned Upgrading IBM ISAM v9.0.7.1 to ISVA 10.0.3.1


In October 2021, I started working with a customer on a project to upgrade their ISAM v9.0.7.1 environment to ISVA 10.  The primary motivator was the IBM announcement that ISAM v9 was going out of support at the end of April 2022.

IBM maintains an active compilation of known issues and concerns, encountered in the field, with upgrading to the latest release.  This post covers the specific issues we encountered, some of which led to fixes in 10.0.3.1.

The Challenge

This customer has a complex environment consisting of just under 350 ISAM appliances and hundreds of supporting infrastructure servers in 14 independent ISAM environments. Complexity ranges from single-server implementations in the development and QA environments to multi-server/multi-site implementations in their non-production environments.

With millions of customers and 40,000 employees, upgrading enterprise infrastructure is always considered a high-risk activity. It requires full regression testing by every line of business and application owner.  Because of this level of testing, appliance firmware upgrades (even a minor point upgrade) are only undertaken once per year.

The entire infrastructure is managed with an automation platform based on DevOps tools such as Ansible Tower, Bitbucket, Artifactory, Bamboo, Jira and Confluence.  We introduced automation three years ago to reduce the work effort of developers by 50%.  At the time, developers were responsible for writing deployment scripts, most of which were environment specific.  The migration to Declarative Driven Deployment is the subject of another blog.

Previous Upgrade

As we reviewed our experience upgrading from ISAM v9.0.4 to v9.0.7.1, we incorporated the lessons learned during that upgrade, the first of which was to engage IBM support early in the process.  A major update brings lots of new features and changes in behavior that need to be understood and potentially mitigated.

The biggest surprise in the previous upgrade was an unexpected change to the default session timeout.  The lesson learned was to ensure that we had a definition for every configuration item on both the appliance and runtime components.
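One way to act on that lesson is to compare what the inventory declares against what the appliance actually reports.  The following is only a minimal sketch of the idea: the function names, the flat key/value layout, and the sample settings are all hypothetical and are not taken from the customer's automation.

```python
def undeclared_settings(declared, actual):
    """Return settings present on the appliance that the inventory never defines."""
    return sorted(set(actual) - set(declared))


def drifted_settings(declared, actual):
    """Return {key: (declared, actual)} for settings whose live value differs."""
    return {
        key: (declared[key], actual[key])
        for key in sorted(declared)
        if key in actual and declared[key] != actual[key]
    }


# Hypothetical values, for illustration only.
declared = {"session/timeout": "3600", "session/inactive-timeout": "600"}
actual = {
    "session/timeout": "3600",
    "session/inactive-timeout": "900",
    "session/max-entries": "4096",
}

print(undeclared_settings(declared, actual))
print(drifted_settings(declared, actual))
```

Any setting reported by the first function is one whose value depends on a product default, which is exactly the kind of item that can change silently during an upgrade.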

Getting Started

Back in October 2021, our target version was ISVA 10.0.2.0.  In the first four months we discovered four product defects, for which IBM provided fixpacks and which were subsequently fixed in v10.0.3.1.  We also ran into some interesting problems that traced back to the appliance upgrade history.  Most of the original appliances had been installed at ISAM v8, so many settings had been carried forward from that version, some of which required targeted fixpacks to update inaccessible settings.

Early Issues

One of the first issues we ran into after upgrading was that the Policy Server failed to start because it could not connect to the LDAP server over SSL (IJ37329).  This was caused by the ssl-keyfile-pwd entry in the configuration file.  Removing this entry (allowing the runtime to use the stash file) eliminated the problem.
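For anyone scripting this kind of change, here is a minimal sketch of disabling a single entry in a stanza-style configuration file.  The helper name, the [ldap] stanza, and the sample values are assumptions made for illustration; check which file and stanza hold the entry in your own environment.  The sketch comments the entry out rather than deleting it, so the change stays visible in the file.

```python
def comment_out_entry(config_text, stanza, key):
    """Comment out `key = ...` lines inside `[stanza]`, leaving all other lines as-is."""
    result = []
    in_stanza = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new stanza header: track whether it is the one we care about.
            in_stanza = stripped[1:-1].strip() == stanza
        elif in_stanza and "=" in stripped and not stripped.startswith("#"):
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                line = "# " + line
        result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical file content, for illustration only.
sample = """[ldap]
ssl-enabled = yes
ssl-keyfile = example.kdb
ssl-keyfile-pwd = changeit
[other]
ssl-keyfile-pwd = keep-me
"""

print(comment_out_entry(sample, "ldap", "ssl-keyfile-pwd"))
```

Only the entry in the named stanza is touched; an identically named entry in another stanza is left alone.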

ISVA 10.0.3.0

During the project we did take advantage of the ability to roll back to the previous version of the firmware on the inactive partition.  To upgrade again, we discovered that we needed to add an Advanced Tuning Parameter to allow any version of the firmware to be manually uploaded: sys.direct.update.allowed = true

The next problem was the failure of the Policy GUI (WPM) on the appliance.  This was the result of a permissions issue on the configuration file (IJ37550).

After upgrading our first few environments we discovered we could no longer manage the Network Time Protocol (NTP) settings on the appliance (IJ37516) or do authorization server cleanup (IJ37365).  While not project blockers, these issues were inconvenient until we received fixpacks from IBM.

We were three months into the project and our target version was now 10.0.3.0.  This version brought a change in behavior regarding keystores: CMS keystores were deprecated in favour of the PKCS12 format.  This required a review of our existing Deployment and Build Automation scripts, as we were supporting both ISAM v9 and ISVA v10.0.3 at the time.  Every variable or task that referred to a keystore needed a conditional check based on the appliance version, because v10.0.3 did not accept the file extension in Proxy configuration files.  Fortunately, the keystore definitions in our Ansible Inventory repository included both name and filename properties, so the condition was easy to incorporate.
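The version check itself is simple.  The sketch below shows the logic in Python rather than in the Ansible tasks the project actually used; the function names and the sample keystore entry are hypothetical, and only the name and filename properties come from the description above.

```python
def parse_version(version):
    """Turn '10.0.3.1' into (10, 0, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def keystore_reference(keystore, appliance_version):
    """Pick the keystore value to write into a proxy configuration file."""
    if parse_version(appliance_version) >= (10, 0, 3):
        # v10.0.3 and later: reference the keystore without a file extension.
        return keystore["name"]
    # Earlier versions: keep the full file name, extension included.
    return keystore["filename"]


# Hypothetical inventory entry, for illustration only.
keystore = {"name": "example_keys", "filename": "example_keys.kdb"}

print(keystore_reference(keystore, "9.0.7.1"))   # example_keys.kdb
print(keystore_reference(keystore, "10.0.3.1"))  # example_keys
```

Comparing parsed tuples matters here: a plain string comparison would rank "9.0.7.1" above "10.0.3.1".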

ISVA 10.0.3.1

By the end of March 2022 we were locked in to v10.0.3.1 for upgrade.  This was the latest version we could get into the Pre-Production environment and still have enough time for full regression testing prior to Production go live in July.

We ran into a few problems at this point in the upgrade journey.  Prior to ISVA 10.0.3, the WebSEAL Proxy would silently provide the default certificate to a junction server if mutual SSL was required.  The customer was unknowingly relying on this behavior for some backend servers, not realizing mutual SSL was required.  After upgrading to ISVA 10.0.3.1, all of these junctions failed because there was no longer a default certificate in the new PKCS12 keystore.  We had to update almost 100 junction definitions with the certificate label.
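A pre-upgrade audit can flag junctions that may be affected.  The sketch below assumes the junction definitions have already been exported into a list of dictionaries; the field names (name, type, cert_label) and the sample data are hypothetical.  It only lists SSL junctions that have no certificate label as candidates for review, since the definition alone does not say whether the backend demands a client certificate.

```python
SSL_JUNCTION_TYPES = {"ssl", "sslproxy"}


def junctions_to_review(junctions):
    """Return names of SSL junctions that have no client certificate label set."""
    return sorted(
        junction["name"]
        for junction in junctions
        if junction.get("type") in SSL_JUNCTION_TYPES
        and not junction.get("cert_label")
    )


# Hypothetical junction definitions, for illustration only.
junctions = [
    {"name": "/app-a", "type": "ssl", "cert_label": "app-a-client"},
    {"name": "/app-b", "type": "ssl", "cert_label": ""},
    {"name": "/app-c", "type": "tcp"},
    {"name": "/app-d", "type": "sslproxy"},
]

print(junctions_to_review(junctions))  # ['/app-b', '/app-d']
```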

Also related to the keystore change were SAML partner issues caused by the stricter requirements for certificates (IJ38991).  The fix required both a fixpack and a configuration change to disable certificate path validation.  The other option would have been to import the certificate chain into the rt_profile_keys keystore.

On one of the WebSEAL Proxy instances we ran into a problem where session cookies to the ISVA MGA component were getting lost.  This was the only instance that was managing cookies for the MGA.  It turns out that there is an undocumented parameter ([server] clear-unauth-managed-cookies) that must be set to false to enable the legacy behavior on reauthentication.

DB2 External Databases

In the original installation of ISAM at this customer, only the High Volume Database (HVDB) was externalized.  This external database had to be manually upgraded during the process to propagate schema changes for new and updated features.  Normally this is a simple process using the supplied Java program and update files that can be downloaded from the appliance.

In our case, the externalized database is a bit more complex.  Specifically, the database user does not have privileges to modify the database schema, so the Java program cannot be used.  This required consolidating all required updates into a single DDL file for the DBAs to apply to the database.  This file was updated for each new version during the project.  The changes to the database were backwards compatible with the previous version, so no action was required if the upgrade was backed out.
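The consolidation step can be sketched as follows.  This is an illustration only: the function name, the file names, and the SQL statements are hypothetical, and the sketch assumes the update files carry a sortable prefix that reflects the order in which they must be applied.

```python
def consolidate_ddl(updates):
    """Combine (filename, sql_text) pairs into one DDL script, in filename order."""
    parts = []
    for filename, sql in sorted(updates):
        # Marker comments let the DBAs trace each statement back to its source file.
        parts.append(f"-- BEGIN {filename}")
        parts.append(sql.strip())
        parts.append(f"-- END {filename}")
        parts.append("")
    return "\n".join(parts)


# Hypothetical update files, for illustration only.
updates = [
    ("002_add_column.sql", "ALTER TABLE EXAMPLE ADD COLUMN NOTE VARCHAR(64);\n"),
    ("001_create_table.sql", "CREATE TABLE EXAMPLE (ID INTEGER NOT NULL);\n"),
]

print(consolidate_ddl(updates))
```

Keeping the per-file markers in the combined script makes it easier to regenerate and review the file each time a new version adds updates.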

It was a different experience with the externalized CONFIG database for the newer ISAM appliance clusters.  Not only were the changes more complex, they were also not backwards compatible with the previous version. A backup had to be taken prior to upgrade and restored if the environment was rolled back to the previous version.  In some cases, this was an onerous task because the longer the ISVA 10 environment had been in service, the more changes had been made to configuration, meaning that they had to be re-applied after a rollback.  For this reason, we strongly discouraged rollback after more than a week.

The CONFIG database upgrade script also had to include any changes that were required for fixpacks that updated the MGA configuration.

Conclusion

At most, I only get to work on projects of this magnitude once per year.  It is incredibly satisfying to delve deep into the product and how all the components interact not only with each other, but with our customizations and automation processes.

I can’t leave this blog without recognizing that I work with teams and individuals who are supportive and knowledgeable.  It makes all the difference in the world.

About Post Author

Kevin Jeffery

Kevin has worked in the Services, Utilities and Finance Industries in IT Architecture, Administration and Process Design, and Software Development. With over 20 years of experience in Information Technology, Kevin currently works as a Cyber Security Consultant specializing in IAM deployment and operations automation.
