Datacenter Migration, Authentication Changes, and Solaris – The Perfect Storm

Peter Griffin is calm while the world burns around him

The action of moving an Oracle Primavera P6 database from one server to another should be fairly simple. Changing the authentication type on that database from a virtual LDAP server to native Active Directory should also be simple. Doing both at the same time? Not so much. And even though I generally try to keep each major change in an application isolated from others so as not to cause unnecessary problems, the tight timelines I’ve had to deal with lately have made keeping these sorts of things separate well nigh impossible. The fact that most of our third-party applications are not under vendor support contracts also adds to the fun!

The good news is that I’ve gotten used to thinking about every possible thing that could go wrong with all these moving parts, and have an excellent record in averting disaster. Of course, there’s a first time for everything.

For a couple of months I had been aware that our VLDAP servers were going to be shut down, and all applications using them would have to authenticate their users directly against AD or some other method.

I had inherited responsibility for maintaining our P6 environment earlier this year, but had not had a chance to take any formal training or to dig too far into the documentation from Oracle, as I have many other applications to deal with besides this one. It’s been a power-through-it-with-my-hair-on-fire kind of year.

I was somewhat familiar with the P6 Web Administrator, and I learned that is where the VLDAP authentication settings were. Before the VLDAP server was shut down, I managed to change the settings to point at AD for (what I thought was) all of the P6 databases.

Users were able to get into the application both through a thick client via Citrix and also through the Web client – everything was working. All of the VLDAP servers were turned off but not yet decommissioned. Life was good!

A few days later, the P6 databases – which are Oracle Databases – were to be moved to a different database server which had a different IP address and the original databases were to be placed in a standby mode.

The same morning that the databases were to be moved, a user calls in a trouble ticket. He can’t log on to one of the P6 databases – not the primary one that’s used 95%+ of the time, but one that only had occasional use. The error he received specifically mentions the name of the old VLDAP server that I thought had been completely disconnected from the system.

About the time I get the call that this is happening, the databases get moved and no users are able to get into the system. Fortunately, this move was planned and everyone was expecting an outage.

I ask to have the named VLDAP server turned back on long enough to get this user back into the system and for me to reconfigure the authentication on his database (and any other databases I might have somehow missed).

This is where things get really interesting.

I had found the location of the file that configures the database config information for the thick client, which most people were using. For anyone who is interested, it is found in these two locations on a Windows server:

%PROGRAMDATA%\Oracle\Primavera P6\P6 Professional\PrmBootStrap.xml

and

C:\Users\All Users\Oracle\Primavera P6\P6 Professional\PrmBootStrap.xml.

The database connection info can be changed there.

<?xml version="1.0" encoding="UTF-8"?>
<BootStrap MajorVersion="8" MinorVersion="2" PatchVersion="0" DeplyomentVersion="4">
	<DataBaseAliases DefaultPMAlias="primavp">
		<Alias>
			<Name>primavp</Name>
			<AppType>Project Management</AppType>
			<Connection>
				<DriverName>Oracle</DriverName>
				<BlobSize>-1</BlobSize>
				<DataBase>primavp</DataBase>
				<User_Name>pubuser</User_Name>
				<Password>(encrypted password)</Password>
				<LocaleCode>0000</LocaleCode>
				<PublicGroupId>1</PublicGroupId>
				<ReadCommited>ReadCommited</ReadCommited>
				<RowsetSize>128</RowsetSize>
			</Connection>
		</Alias>
[... an Alias block for each P6 database ...]

I wouldn’t recommend changing the encrypted password in the Alias blocks, nor anything after the DatabaseAliases section.

Be sure to also increase the number in the DeploymentVersion field at the top of the file, otherwise, it won’t be sent to the working directory when P6 is started again.

The working directory for this file for each user is:

%LOCALAPPDATA%\Oracle\Primavera P6\P6 Professional\PrmBootStrap.xml.

This is how the file looks if Oracle Database is used for P6. Should you use MS SQL Server or some other system, YMMV.

Incidentally, I only had to make changes here because the name of the database actually changed – in the old location, it had a suffix that was removed. I found that as long as I could tnsping the database from the computer hosting the thick client, the users could connect. For an added layer of insurance, I added a tnsnames.ora file and added entries using the old database names, but pointed at the new database service names and servers, and finally changed the order to “(TNSNAMES, LDAP)” in the sqlnet.ora file to ensure that tnsnames was queried before the Oracle LDAP servers – just in case there was something I was missing here in changing the P6 XML file.

At this point, all users could get into all databases using the thick client, and the outage did not extend past the expected time.

I then proceeded to go to the P6 Web Administrator to look for the places I missed when changing the authentication settings. This is when a bit of panic set in.

P6 Web Administrator error message

A message like the one above shows up – and not only can I not get into the Web Administrator, I can’t get into the Web client either.

I assume (wrongly, by the way – don’t we know what assuming does by now?!) that I can change some XML or INI files and everything will be back to normal.

At this point, be aware that I had a full backup of the server and all databases, and I always backed up any files I changed so as not to corrupt the environment beyond repair. This is not business as usual – but as we had no working test environment, and no vendor support, I had little choice but (and perhaps most importantly – explicit permission!) to work in production.

Unlike the thick client, which is running on three Windows 2008 R2 servers, the Web client runs on Solaris 10 with Oracle WebLogic. Though I can log into the OS via SSH, I don’t have admin rights on the box.

I decide to run several searches, checking for all files with various strings – primarily the old database server name and the names of the databases.

The search I run uses this command:

find /u02/app/bea64 -type f -exec grep '(DB server name, etc.)' /dev/null {} +

I chose the “bea64” folder as the starting point of the search, as that’s where WebLogic is installed and all P6 files and folders seemed to be under that folder.

For further info about this use of the find command in UNIX: StackExchange.

I found a few instances here and there where the server names could be changed. I even found a single binary file that I tried editing with a hex editor. It didn’t matter what I did – I would stop and restart the Web services and the old server names would still show up in the web logs. I noticed that the logs always had a JDBC connection string – which told me that Oracle LDAP and tnsnames.ora were not being used by WebLogic to find the databases. This was made especially clear when I found that the Oracle client wasn’t fully set up correctly – meaning that I couldn’t tnsping the databases from the bash prompt without a few environment variables being set up by me.

I then found what would prove to be the tool that would save the day – the thick client (not web-based!) P6 Web Administrator! It was located at {p6_eppm_home}/p6/dbconfigpv.sh, where {p6_eppm_home} is the home folder for P6 selected at installation.

I tried running the app: ./dbconfig.sh…

First message:

Running Database Configuration Setup...

…getting excited now! And then…

Exception in thread "main" java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
        at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:159)
        at java.awt.Window.<init>(Window.java:432)
        at java.awt.Frame.<init>(Frame.java:403)
        at java.awt.Frame.<init>(Frame.java:368)
        at javax.swing.SwingUtilities$SharedOwnerFrame.<init>(SwingUtilities.java:1733)
        at javax.swing.SwingUtilities.getSharedOwnerFrame(SwingUtilities.java:1810)
        at javax.swing.JWindow.<init>(JWindow.java:168)
        at javax.swing.JWindow.<init>(JWindow.java:120)
        at com.primavera.admintool.Initializing.<init>(Unknown Source)
        at com.primavera.admintool.AdminApp.main(Unknown Source)

So close!

I hadn’t worked much with UNIX in the last few years, but I remembered the X Windows GUI and decided to find out if I could run this app remotely using X Windows.

I was, up until this point, using PuTTY as my SSH client, and WinSCP to transfer files via SSH. But how to run X Windows?

After doing some searching, I found Cygwin, which I had used several years ago in attempting to set up Ruby on Rails in Windows.

Using this wonderful documentation, I was quickly able to get up and running.

I was able to open the P6 Web administrator remotely using X Windows!

At the local prompt in Cygwin:

$ export DISPLAY=:0.0

Then…

$ ssh -Y username@servername_or_IP_address

Enter your password at the prompt. You should then be logged onto the remote system via SSH. Now, at the remote prompt…

$ xterm &

Nota bene – from the documentation mentioned above: “By default, the OpenSSH server does not allow forwarded X connections. This must be configured on the remote host by adding X11Forwarding yes to the sshd_config configuration file. The OpenSSH server must be restarted or SIGHUP’ed to re-read the configuration file after it is changed.” Also: “The OpenSSH server requires the xauth command to be available to forward X connections. Consequently, it must be installed on the remote host.”

At this point, a new xterm window should open that has another command prompt from the remote system. Run the executable command – in this case, “./dbconfigpv.sh” from its folder, and the GUI-based app should open!

The thick-client P6 Web Administrator opened, I was able to change the JDBC connection strings for each database and also to reset the VLDAP authentication back to AD.

I saved my settings, restarted all services, crossed my fingers and stood on one leg.

BINGO! Everything worked now – even with the VLDAP server turned off!

Connecting to an Oracle Database RAC with ColdFusion 11

Adobe ColdFusion logo

Recently we’ve been moving quite a few servers and databases into our new datacenter, so I’ve been having to learn – more quickly than usual – about the often obscure differences between older and newer versions of various software including Solaris and Oracle Database.

A change to an Oracle RAC system caused the most interesting trouble I’ve had so far during this move.

In ColdFusion 11, there are native JDBC drivers for Oracle that are normally selected in the Data Source Management page using the logical and expected “Oracle” selection.

ColdFusion Data Source Management page

However, I was chagrined to find out that if you use this to try to connect to an Oracle Database that is load balanced with RAC, this selection will not work. Upon trying to connect, you will receive the error:

“[Macromedia][Oracle JDBC Driver][Oracle]ORA-12505 Connection refused, the specified SID (whatever string you put in the SID field) was not recognized by the Oracle server”

Neither Oracle nor Adobe documentation was much help on this. Fortunately, I came across a thread on the Adobe forum where a user was having this problem.

The answer is to, instead of selecting “Oracle”, select “other” in the Driver drop-down list. Then, use this connection string. The last variable (AlternateServers), appears to be optional. I’m not certain whether or not the IP addresses also need to be enclosed in parentheses.

jdbc:macromedia:oracle://(ORACLE SERVER):1521;SERVICENAME=(serviceName);sendStringParametersAsUnicode=false;MaxPooledState ments=1000;AlternateServers=((IP ADDRESS1),(IP address 2))

This is basically how I ended up setting the connection up:

CF other driver for Oracle RAC

Oracle Database Error ORA-01013 in Excel VBA After Migrating Application to New Server

Oracle Database logo

I was testing an old Excel application that I had moved from one server to another and kept getting this error when running it on the new server:

VBA run-time error ORA-01013

The error ORA-01013 (“user requested cancel of current operation”) is commonly a timeout issue. What could be causing a query to timeout on one server, but not another?

In debugging the code, I found that no CommandTimeout property had been set on the Connection object.

Dim cnnRawData As ADODB.Connection
'...
Set cnnRawData = New ADODB.Connection
cnnRawData.ConnectionString = glbConnectString
cnnRawData.Open
cnnRawData.Execute glbSetCurrentSchema

If it is not explicitly set, this property will default to 30 seconds. However, this did not explain why the query did not timeout on the old server.

The connection to the Oracle Database was being made via ODBC, so I opened the 32-bit ODBC Administrator to see if the connections had been set up differently. This is where I realized what the problem was.

On the old server, the connection configuration looked like this:

ODBC config on old server

And on the new server it looked like this:

ODBC config on new server

See the difference? (Aside from the Description field, that is!)

It’s the “Enable Query Timeout” checkbox in the Application tab. By default, when creating a new connection, it is checked. I unchecked the box and saved the connection, and the error went away.

This is not necessarily the best way to do it, however. I might want some queries to timeout, and not others. In this case the solution would be to set the CommandTimeout property for this particular VBA macro to 0:

Dim cnnRawData As ADODB.Connection
'...
Set cnnRawData = New ADODB.Connection
cnnRawData.ConnectionString = glbConnectString
cnnRawData.CommandTimeout = 0
cnnRawData.Open
cnnRawData.Execute glbSetCurrentSchema

This would prevent timeouts on this particular connection, but not any others that might be made using the same ODBC connection.