Friday, December 13, 2013

Smart View Internals: Exploring the Plumbing of Smart View

Ever wonder how Smart View stores the information it needs inside an Excel file?  I thought I would take a look to see what I could learn.  First, I created this simple multi-grid retrieve in Smart View.



Note that the file was saved in the Excel 2007 and higher format (with an .xlsx file extension).  Did you know that the xlsx format is really just a specialized zip file?  Seriously.  It is a zip file containing various files that are primarily in xml format.  I saved the workbook, added the .zip extension to the filename, and opened it in WinRar.  Here is what I found.

I opened the xl folder to find a series of files and folders.


Next, I opened the worksheets folder to see what was in there.  Of course, it is a directory of xml files containing the contents of the worksheets.


My Essbase retrieves were on the sheet named Sheet1, so let’s take a look at what is in the sheet1.xml file.   The xml is quite large, so I can’t show all of it here, but needless to say, there is a bunch of information in the file.  The cell contents are only one of the things in the file.  Here is an excerpt that shows the contents of row 5 of the spreadsheet.



This is interesting as it shows the numbers but not the member name.  What is the deal with that?  I noticed there is an attribute, ‘t’, on that node.  I am guessing that the attribute t=”s” means the cell type is a string.  I had noticed that in one of the zip file screenshots, there was a file named sharedStrings.xml.  Hmm...  I took a look at that file and guess what I found?




That’s right!  The 5th item, assuming you start counting at zero like all good programmers do, is Profit.   That number corresponds perfectly with the value specified in the xml for cell B5, which was five (circled in blue in the xml file above).   OK, so when are we going to get to Smart View stuff?  The answer is pretty quick.  I continued looking at sheet1.xml and found these nodes near the bottom.

Hmm, custom properties that contain the name Hyperion?  Bingo!  There were a number of custom property files in the xml file.  Let’s focus on those.

Custom property #1 is identified by the name CellIDs.  The corresponding file, customProperty1.bin, contained only the empty xml node <root />.  Apparently there aren’t any CellIDs in this workbook.

Custom property #2 is identified by the name ConnName.  The file customProperty2.bin contains the string ‘Sample Basic’ which is the name of my connection.

Custom property #3 is named ConnPOV but it appears to contain the connection details in xml format.  Here is an excerpt of the xml.


Custom property #4 is named HyperionPOVXML and the corresponding file contains xml which lines up with the page fields I have in my worksheet.



What is interesting about the POV xml is that I have two different retrieves that both have working POV selectors which are both implemented as list-type data validations in Excel.  I don’t know what happens internally if I save different values for the POV.

Custom property #5 is labeled HyperionXML.  It appears to contain the information about the Essbase retrieval, but it doesn't appear to be the actual retrieval xml because it doesn't contain the numeric data.  My guess is that this xml is used to track what is on the worksheet from a Hyperion standpoint.



There is a lot of information in this simple xml stream, but the most interesting information is contained in the slice element.  Below is a close-up of contents in the slice.



The slice covers 6 rows and 7 columns for a total of 42 cells.  It is interesting that the Smart View team chose to serialize their XML in this manner for a couple of reasons.  First, the pipe delimited format means that every cell must be represented regardless of whether it has a value or not.  This really isn’t too much of a problem unless your spreadsheet range is pretty sparse.  The second thing about this format is that the xml itself is easy and fast to parse, but the resulting strings need to be parsed again to be usable.  For example, the vals node will get split into an array containing 42 elements.  The code must then loop the 42 elements and process them individually.  The other nodes, such as the status, contain other pieces of information about the grid.  The status codes appear to be cell attributes returned by Essbase; these attributes are used to apply formatting to cells in the same way the Excel add-in UseStyles would apply formatting.  There are a couple of things to take away:

  1. In addition to the data on the worksheet itself, there is potentially *a lot* of information stored under the covers in a Smart View file.
  2. String parsing is a computation-intensive operation and can hurt performance.  Multiply that workload by 8 because, depending on the operation and perhaps the provider, all 8 xml nodes above may need to be parsed.

In addition, the number of rows and columns shown in the slice may be important when you are looking at performance.  Smart View must look at the worksheet to determine the size of the range to read in order to send it to Essbase.  In the case of a non-multi-grid retrieve, the range may not be known and, as a result, the grid may be sized based on the UsedRange of the worksheet.  In our work with Dodeca, we have found that workbooks converted from the older xls format to the newer xlsx format, which support a larger number of cells, may have the UsedRange flagged internally to be 65,536 rows by 256 columns.  One culprit appears to be formatting applied to the sheet in a haphazard fashion.  In Dodeca, this resulted in a minor issue which resulted in a larger memory allocation on the server.   Based on the format of the Smart View xml, as compared to the more efficient design of the Dodeca xml format, if this were to happen in Smart View it may cause a larger issue due to the number of cells that would need to be parsed and processed.  Disclaimer: I did not attempt to replicate this issue in Smart View but rather is an educated guess based on my experience with spreadsheet behavior.

Note: The Dodeca xml format does not need to contain information for cells that are blank.  This format reduces the size and the processing cycles necessary to complete the task.  In addition, when we originally designed Dodeca, we tested a format similar to the one used today by Smart View and found it to be slower and less efficient.

Considering all of this information, I believe the xml format would be difficult for the Smart View team to change at this point as it would cause compatibility issues with previously created workbooks.  Further, this discussion should give some visibility to the fact that the Smart View team faces an on-going challenge to maintain compatibility between different versions of Smart View considering that different versions distributed on desktops and different versions of the internal formats that customers may have stored in their existing Excel files.  I don’t envy their job there.

After looking at all of this, I was curious to see what the xml string would look like on a large retrieve, so I opened up Smart View, connected to Sample Basic and drilled to the bottom of the 4 largest dimensions.  The resulting sheet contained nearly 159,000 rows of data.  Interestingly enough, when I looked at the contents of customProperty5.bin inside that xlsx file, the contents were compressed.  It occurred to be a bit strange to me as the xlsx file format is already compressed, but after thinking about it for a minute it makes sense as the old xls file format probably did not automatically compress content, so compression was there primarily to compress the content when saved in the xls file format.

Custom property #6 is labeled NameConnectionMap.  The corresponding property file contains xml that appears to map the range names in the workbook to the actual grid and the connection.


Custom property #7 is labeled POVPosition. The file customProperty7.bin contains the number 4 followed by a NUL character.  Frankly, I have no idea what position 4 means.

Moving on to custom property #8 which is labeled SheetHasParityContent.  This file contains the number 1 followed by a NUL character.  This is obviously a boolean flag that tells the Smart View code that new features, such as support for multiple grids, are present in this file.

Custom property #9 is labeled SheetOptions.  The corresponding file, customProperty9.bin, contains an xml stream that (obviously) contains the Hyperion options for the sheet.


Custom property #10 is labeled ShowPOV and appears to contain a simple Boolean flag much like that in custom property #8.

Finally, custom property #11 is labeled USER_FORMATTING and may not be related to Smart View.

I did look through some of the other files in the .zip and found a few other references to Smart View, but I did not see anything significant.

So, now that we have completed an overview of what is contained in one, very simple, multi-grid file, what have we learned?

  1. There is a bunch of stuff stored under the covers when you save a Smart View retrieve as an Excel file.
  2. With the reported performance issues in certain situations with Smart View, you should now have an idea of where to look to resolve Smart View issues in your environment.

There are a number of files I did not cover in this overview that could also cause performance issues.  For example, Oracle support handled one case where they found over 60,000 Excel styles in the file.  Smart View uses Excel Styles when it applies automatic formatting to member and data cells.  When there are that many styles in the workbook, however, it is logical that Excel would have a lot of overhead searching through its internal list of Style objects to find the right one.  Accordingly, there is a styles.xml file that contains custom styles.  If you have a bunch of Style objects, you could delete the internal styles.xml file.

Note: Be sure to make a copy of your original workbook before you mess with the internal structures.  There is a possibility that you may mess it up and lose everything you have in the workbook. Further, Oracle does not support people going under-the-covers and messing with the workbook, so don’t even bring it up to support if you mess something up.

Wow, that should give you some idea of what may be going on behind the scenes with Smart View.  Even with the experience I have designing and writing the Dodeca web services that talk to Essbase, I wouldn't say that I have a deep understanding of how the information in a Smart View workbook really works.  However, one thing is for certain;  Dodeca does not put stuff like this in your Excel files.  It may be interesting to hear what you find when you explore the internals of your workbooks.

5 comments:

Steph said...

Hello,

nice job!

Could be interesting to see the effect of the function "improved metadata storage" sine it seems to be the default feature on newer versin of smart view!

GlennS said...

This is a great description of how Excel is actully storing all of the smart view properties that we never learn from Oracle. I learned a lot.
Thanks for taking the time to research this and sharing the info

Tim Tow said...

I think there is a lot to learn about how things stored in the workbook. It would be interested to trace what happens with each transaction with the server.

I don't know if this shows the improved metadata storage or not; if someone has an older Smart View workbook (or one from the very latest SV as this is 11.1.2.3), perhaps they can compare what they have to what my workbook has.

Tim

Anonymous said...

Thank you very much for sharing your research and analysis in the blog. This is really a great job done and I also learned much from it. Now I see possibilities to check SmartView behaviour and more...

Regards,
Philip Hulsebosch

Guzilla said...

I'm glad I remembered this post. Pulling my hair out trying to troubleshoot SV weirdness and wishing they'd listened to me and just bought Dodeca.