Thursday, December 6, 2007

Essbase API error fix - (geeky)

Edit: If you are running Windows 2008, Windows 7, Windows Vista (maybe) and later operating systems, see this link for updated information: http://timtows-hyperion-blog.blogspot.com/2011/05/essbase-api-ephemeral-port.html.

Here is something we came across while working on issues reported by a few olapunderground Outline Extractor users. The symptom they were seeing was that ancestor members were not being returned for some of the members in large outlines. I have also made this content downloadable as a PDF document here.

Essbase error 1042006 when API tries to do too many connections from one machine in quick succession

Note: this document was originally obtained from Hyperion tech support sources and has been edited to remove identifying information for the customer and to add context. We have also included notes from our internal testing with the olapunderground Outline Extractor tool. If you have other comments about this document that others may find helpful, please let us know and we will try to incorporate them.
– Tim Tow (timtow@appliedolap.com)


Occasionally, we have heard of the Essbase API failing when it tries to make too many connections in quick succession on the Windows operating system. One of the more frequent places we have seen this occur is in the olapunderground Outline Extractor, which makes anywhere from thousands to hundreds of thousands of calls (or more) to the API to get outline information. The typical scenario seen in that product is that parent member information may be missing for members in the extract file.

The problem relates to the port numbers used on the client. Those ports are ephemeral (“briefly used”) port numbers. The Windows default for TcpTimedWaitDelay is 240 seconds (valid values are 30-300), and the default for MaxUserPort is 5000 (valid values are 5,000-65,534). Together, the default values limit the number of ephemeral ports available, and the API runs out of ports to use for new connections. Adjusting the MaxUserPort and TcpTimedWaitDelay settings in the Windows Registry may fix the error. An alternative is to modify the API code so that it avoids making a massive number of calls to the application in a short period of time. For member manipulation calls, for example, you might retrieve the outline to the client machine and then look up the attributes of the members using the local copy of the outline.

The values of these two settings determine how many connections can be open on the client side and how long those connections last. You can examine how many client ports are in a TIME_WAIT state by using the Netstat tool on the client computer. Run Netstat with the -n flag and count the number of client sockets to your server's IP address that are in a TIME_WAIT state. Note that the MaxUserPort and TcpTimedWaitDelay settings are applicable only for a client computer that is rapidly opening and closing connections to a remote computer.
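
The counting step described above can be automated. The following is a minimal sketch, not part of the original document: it assumes the four-column layout that Windows prints for `netstat -n` (Proto, Local Address, Foreign Address, State), and the function name, IP addresses, and port numbers in the sample are made up for illustration.

```python
def count_time_wait(netstat_output: str, server_ip: str) -> int:
    """Count TCP sockets to server_ip that are in the TIME_WAIT state.

    Assumes the Windows `netstat -n` layout:
    Proto, Local Address, Foreign Address, State.
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        foreign_address, state = fields[2], fields[3]
        # Drop the trailing ":port" to get the remote host.
        remote_host = foreign_address.rsplit(":", 1)[0]
        if remote_host == server_ip and state == "TIME_WAIT":
            count += 1
    return count


# Hypothetical sample output; the addresses and ports are illustrative only.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    192.168.1.10:1034      192.168.1.50:1423      TIME_WAIT
  TCP    192.168.1.10:1035      192.168.1.50:1423      TIME_WAIT
  TCP    192.168.1.10:1036      192.168.1.50:1423      ESTABLISHED
  TCP    192.168.1.10:1037      192.168.1.99:80        TIME_WAIT
"""

print(count_time_wait(SAMPLE, "192.168.1.50"))  # prints 2
```

On a live client, the text could come from `subprocess.run(["netstat", "-n"], capture_output=True, text=True).stdout`. A count that climbs toward the size of the ephemeral port range while an extract is running would be consistent with the problem described here.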

When you use the TCP/IP protocol to open a connection to a computer that is running Essbase, the underlying network library opens a TCP/IP socket to that computer. When it opens this socket, the Essbase network library specifically does not enable the SO_REUSEADDR TCP/IP socket option, for security reasons. When SO_REUSEADDR is enabled, a malicious user can hijack a client port to Essbase and use the credentials that the client supplies to gain access to the computer that is running Essbase. Because the network library does not enable the SO_REUSEADDR socket option, every time you open and close a socket through the network library on the client side, the socket enters a TIME_WAIT state for four minutes (240 seconds using the default TcpTimedWaitDelay). Each connection has one TCP/IP socket, so if you are rapidly opening and closing connections over TCP/IP, you are rapidly opening and closing TCP/IP sockets. If you rapidly open and close approximately 4000 sockets in less than 240 seconds, you will reach the default maximum setting for client anonymous ports, and new socket connection attempts will fail until the existing set of TIME_WAIT sockets times out.
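
The arithmetic behind the "approximately 4000 sockets in less than 240 seconds" figure can be sketched as follows. This is a simplified model added for illustration, not something from the original document: it assumes every port from 1024 up to MaxUserPort is free for outbound use, that each connection closes immediately, and that each closed socket holds its port for exactly TcpTimedWaitDelay seconds. The function names are hypothetical.

```python
FIRST_EPHEMERAL_PORT = 1024  # lower bound of the range, as cited in this post


def ephemeral_port_count(max_user_port: int = 5000) -> int:
    """Number of ports between FIRST_EPHEMERAL_PORT and max_user_port, inclusive."""
    return max_user_port - FIRST_EPHEMERAL_PORT + 1


def max_sustained_rate(max_user_port: int = 5000,
                       timed_wait_delay: int = 240) -> float:
    """Connections per second that can be opened and closed indefinitely."""
    return ephemeral_port_count(max_user_port) / timed_wait_delay


def seconds_until_exhaustion(rate_per_second: float,
                             max_user_port: int = 5000,
                             timed_wait_delay: int = 240):
    """Seconds until the port pool is empty, or None if it never empties."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    seconds = ephemeral_port_count(max_user_port) / rate_per_second
    # Ports start returning to the pool after timed_wait_delay seconds; if the
    # pool lasts that long, released ports keep pace with new requests.
    return seconds if seconds < timed_wait_delay else None


print(ephemeral_port_count())          # ports available with the defaults
print(max_sustained_rate())            # sustainable connections per second
print(seconds_until_exhaustion(100))   # at 100 connections per second
```

Under these assumptions the defaults give 3977 ports, which supports roughly 16 to 17 new connections per second before the pool runs dry. Real clients will see somewhat fewer usable ports, because other programs on the machine draw from the same range.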

In our testing with the olapunderground Outline Extractor, we noted the following items while debugging a reported issue and testing these registry adjustments:

  • We replicated the problem on Windows XP but, despite some limited efforts, did not replicate the issue on Windows 2000; we did not try to replicate the issue on Windows 2003. Our tests were limited to a single outline submitted by an Outline Extractor user.
  • The registry key ‘MaxUserPort’ did not appear to exist by default in the Windows XP registry. We had to create it and, in our test case, a value of 12000 solved the issue. It seems logical, however, that the processor speed of the client machine, combined with the code path of the actual API code, could have a tremendous effect on whether the number of ports becomes an issue.
  • After changing this registry key, we needed to reboot XP for the new setting to take effect.

Note: the material below was apparently provided by a Hyperion engineer; the name has been removed to keep the author's identity anonymous. This section is quite technical and goes into the background of how the ports work.

Registered Ports, ports between 1024 and 49151, are listed by the IANA and on most systems can be used by applications or programs executed by users. Table C.2 specifies the port used by the server process as its contact port. The IANA registers uses of these ports as a convenience to the Internet community. To the extent possible, these same port assignments are used with UDP. The Registered Ports between 1024 and 5000 are also referred to as the Ephemeral Ports. At least on Windows, the TCP stack (OS) re-uses these ports internally on every socket connection, cycling from 1024 to 5000 and wrapping around to 1024 again. This could lead to some interesting problems if sockets are opened and closed very quickly, as there is usually a time delay before a port is made available again...

Second, the number of user-accessible ephemeral ports that can be used to source outbound connections is configurable with the MaxUserPort registry entry (HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters). By default, when an application requests any socket from the system to use for an outbound call, a port between the values of 1024 and 5000 is supplied. You can use the MaxUserPort registry entry to set the value of the highest port number to be used for outbound connections. For example, setting this value to 10000 would make approximately 9000 user ports available for outbound connections. For more details, see RFC 793. See also the MaxFreeTcbs and MaxHashTableSize registry settings (HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters).

Below are also excerpts from the Microsoft website, with links for your reference:
  • TcpTimedWaitDelay

    Determines the time that must elapse before TCP can release a closed connection and reuse its resources. This interval between closure and release is known as the TIME_WAIT state or 2MSL state. During this time, the connection can be reopened at much less cost to the client and server than establishing a new connection.

    Registry key=HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    Data type=REG_DWORD
    Default value=0xF0 (240 seconds = 4 minutes)
    Valid values=0x1E to 0x12C (30 to 300 seconds)
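
As an illustration of how the two settings could be prepared for import, here is a minimal sketch that builds the text of a .reg file. It is an assumption-laden example rather than an official procedure: the key path and valid ranges are the ones quoted in this post, 12000 is the MaxUserPort value that worked in our single test case, and the TcpTimedWaitDelay value of 30 is purely illustrative. The function name is hypothetical.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Valid ranges as quoted in this post.
VALID_RANGES = {
    "MaxUserPort": (5000, 65534),
    "TcpTimedWaitDelay": (30, 300),
}


def build_reg_file(settings: dict) -> str:
    """Return .reg file text that sets the given REG_DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY_PATH}]"]
    for name, value in settings.items():
        low, high = VALID_RANGES[name]  # KeyError for settings not listed above
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        # REG_DWORD values are written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


# Illustrative values only; choose values appropriate for your environment.
print(build_reg_file({"MaxUserPort": 12000, "TcpTimedWaitDelay": 30}))
```

The sketch only produces text; saving it, importing it with the Registry Editor, and rebooting (which we needed on XP) are left to the reader. Back up the registry before changing it.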

7 comments:

Anonymous said...

Great information. Have been looking for an explanation of possible causes of connection errors.

Anonymous said...

Rather cool place you've got here. Thanx for it. I like such themes and anything connected to them. BTW, try to add some pics :).

Anonymous said...

I actually applied this same fix to a Hyperion Reports server that was having issues pulling back members from large outlines to the member selection list in Hyperion Reports. It worked like a charm. Thanks, Tim!

Jiten said...

I am having an issue while extracting one of the dimensions. For some reason it creates a few orphan members without any parent. I checked in the system, and those are actual members. So basically it's not showing the parent information in the generation extract. I need some help to resolve the issue.

Tim Tow said...

The symptoms you described are likely caused by the issue described in this blog post. Did you try the workarounds suggested in this post?

Tim

bobcary@sysmatrix.net said...

Hi Tim, Admire your Hyperion work a lot.
I'm having what may be a port-related problem at a client site. They are on EssBase 6.2 (I know, upgrade). In the past we were able to 'COPYAPP sourceapp destapp' as described in the Admin Guide. Now we get a copy of all app files on the EB srvr, but the EAM copy screen hangs, and we get a 10060 when accessing DB properties or trying s read. The Server log has only one 1042013 err. We can display App-level properties. I've tried the copy thru EssCmd/Maxl, tried kicking up NETDELAY, etc, tried adding SERVERPORTS. We moved the server to a new support service out of state, and the production app works fine. Usage is not heavy, data not large. We had only two server ports, support is opening more, we are licensed for 10. Q1: Does EB dynamically adjust to any open ports as defined in essbase.cfg, or do we reboot the server? Q2: If ports aren't the prob, any other guesses? I just did a new 'create application', seems ok. But after 'create DB' we get the same error 10060 even tho there's no data in the DB.

With regards BobCary@sysmatrix.net

Tim Tow said...

Did you try the port fix described in this article?

Tim