
How to retrieve Cell Properties from an SSAS MDX query using SSIS

Scenario

This is a short article where I want to share something I discovered recently. I worked on an SSIS package where I had to query an SSAS cube using MDX and pass cell properties such as VALUE and FORMATTED_VALUE down the data path to an output/destination file.

Problem

Typically an MDX query has both dimension properties and cell properties. In my case, the required dimension properties were duly set using the following expression:

DIMENSION PROPERTIES MEMBER_KEY, MEMBER_CAPTION ON COLUMNS
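For context, a full query of that shape might look like the following sketch; the cube, dimension and measure names here are placeholders rather than the ones from the actual project:

SELECT
    [Product].[Category].MEMBERS
        DIMENSION PROPERTIES MEMBER_KEY, MEMBER_CAPTION ON COLUMNS,
    { [Measures].[Sales Amount] } ON ROWS
FROM [MyCube]
CELL PROPERTIES VALUE, FORMATTED_VALUE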

As a result of using this expression, the values returned for the dimension properties were as expected. The challenge was retrieving the cell properties. For instance, a look at the Columns tab of the data source editor shows that the required dimension properties (i.e. MEMBER_CAPTION and MEMBER_KEY) were successfully retrieved. Please note I have obfuscated the dimension and measure names with the blue highlight for security reasons 🙂

Figure 2

The problem, however, was how to retrieve the cell properties VALUE and FORMATTED_VALUE. As can be seen from the picture above, the three measure values for which these are required were not being retrieved, and this was an issue.

Solution

Dear reader, I am pleased to say that the solution to this issue is quite simple, and it lies in the connection string that points to the Analysis Services cube.

The current connection string is defined as;

Data Source=MyServerName;Initial Catalog=MyDatabase;Provider=MSOLAP.4;Integrated Security=SSPI;

The solution is simply to add the "ReturnCellProperties=True" property to the connection string, like so:

Data Source=MyServerName;Initial Catalog=MyDatabase;Provider=MSOLAP.4;Integrated Security=SSPI;ReturnCellProperties=True;

After adding the property, the cell properties (i.e. VALUE and FORMATTED_VALUE) were successfully retrieved, as shown below.

Figure 3

Conclusion

From the above demonstration, it is clear that the "ReturnCellProperties" property is not included by default when the connection string is defined. If you have a requirement to retrieve the cell properties, this property must be set explicitly.

I hope this has been a useful read and, as always, any comments, additions or suggestions are welcome. Cheers.

 

How to Load XML Data Files using the SSIS Script Component.

One of the common source data files that every ETL developer/architect has to grapple with every now and then is the XML source file. This is even truer given how common distributed systems have become and the need for applications built on different platforms and protocols to share data and communicate.
In this article, I will show how to use the SSIS Script Component to consume XML files.

Business Problem and Requirement

I recently got involved in a data warehouse project where one key requirement of the ETL implementation was loading and processing very large XML source data feeds daily (about 50 files, each averaging between 7 and 10 GB in size).

Performance of this process had deteriorated over time, and one of the project deliverables was to ensure that the job ran as fast as possible, extracting certain fields from the XML documents and ignoring the rest.

The Solution

Thankfully there are a number of approaches, such as the XML Task/Source component or shredding XML data with SQL, that can be used to consume XML files, and as with most things in data warehousing, the answer to which option to use for a particular problem is "It depends". Given the business problem and requirements at hand, I would like to share the approach I used to solve it. Although it involves some custom coding, it gives the developer full control over how the required fields are extracted.

To address the business problem, I used a design pattern that combines:

  • The Script Component
  • LINQ to XML
  • The XmlReader .NET class to stream the XML documents

So, dear reader, please roll up your sleeves and let’s get started.

To simulate the process, I am using the sample Books.xml file, which can be downloaded from here.

  1. Start by opening up SQL Server Data Tools for Visual Studio (SSDT).
  2. Use the default package or create a new one, giving it a meaningful name.
  3. Drag and drop a Data Flow Task (DFT) onto the Control Flow pane of the package from step 2 and rename it to something meaningful. (Figure 1)
  4. Double Click the DFT to open up the Data Flow page.
  5. Drag and drop the Script Component transform (I renamed this to "Load XML data") onto the data flow canvas and select "Source" from the Script Component Type dialog. (Figure 4)
  6. Add a file connection manager to your package.
  7. Set Usage Type to Existing File and set the path to our XML source file. (Figure 2)
  8. In this article, I will just extract the required fields and load them into a staging table using an OLE DB destination (I renamed it "Ziek Staging table"). However, you can perform other transformations as required before loading your staging tables. (Figure 3)
  9. Double-click the script component to open up the editor.
  10. Click the Inputs and Outputs page.
  11. Rename your output from "Output 0" to something more meaningful. I renamed this to "BooksData" since my sample data is a collection of books. (Figure 5)
  12. Expand the "BooksData" node, click the "Output Columns" folder and use the "Add Column" button to add the required output fields, ensuring each field is set to the appropriate data type. In this demonstration I am outputting only three fields (Title, Author and Publish_Date). (Figure 6)
  13. On the Connection Managers page, add a reference to the file connection manager created in step 6 and give it a meaningful name, for example "BooksConnection". (Figure 7)
  14. Next, go to the Script page and click the "Edit Script" button to launch the VSTA script editor. This is where we will write the custom code and make use of the powerful .NET classes I mentioned earlier. For the uninitiated this can be a bit daunting, but please follow along as we go through this journey.
  15. To keep this simple, I will add custom code to the "CreateNewOutputRows" method of the ScriptMain class and override the "AcquireConnections" method from the base class. Before continuing, let me briefly explain, in layman's terms, what overriding and base classes are in C#. They are object-oriented programming concepts that let you re-use existing code (in a base class) and replace an existing implementation in the base class with your own custom implementation. You can find further reading here and at this place too.
  16. Once in the script editor, in order to process the XML data with XmlReader and LINQ to XML, we first have to import the required namespaces (System.Xml and System.Xml.Linq); see the sketch after this list. (Figure 8)
  17. To add missing references: in the code editor, right-click the References node, click "Add Reference", and then select the required references from the Reference Manager dialog.

  18. The "AcquireConnections" method retrieves the path to our XML file from the connection manager configured in step 13. The code (shown in Figure 9 in the original screenshots) is sketched after this list.
  19. The "CreateNewOutputRows" method is where most of the work happens, i.e. extracting the required fields and sending them down the data flow. In the original screenshots (Figures 10 and 11) I highlighted some of the comments in red boxes; note that the "BooksDataBuffer" object acts as the "store" from which rows are sent to the destination. A sketch of this method also follows the list.
  20. As a simple test case for this article, I used a ForEach Loop container to dynamically load two sample XML files (I will expand on this in a future post) and, as shown below, the data load succeeded. (Figure 12)
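The original post showed the script only as screenshots (Figures 8 to 11). As a rough sketch rather than the exact original code, the relevant members of ScriptMain might look like the following. It assumes the connection manager is named BooksConnection, the output is BooksData with columns Title, Author and Publish_Date (and that SSIS generates buffer properties verbatim from those column names), and that the XML uses the element names from the sample Books.xml.

using System;
using System.Xml;          // XmlReader
using System.Xml.Linq;     // LINQ to XML (requires a reference to System.Xml.Linq)

public class ScriptMain : UserComponent
{
    private string filePath;

    // Retrieve the path to the XML file from the file connection manager (steps 13 and 18).
    public override void AcquireConnections(object Transaction)
    {
        filePath = Connections.BooksConnection.AcquireConnection(Transaction) as string;
    }

    // Stream the document with XmlReader, materialise one <book> element at a time with
    // LINQ to XML, and send the required fields down the data flow via the BooksData buffer.
    public override void CreateNewOutputRows()
    {
        using (XmlReader reader = XmlReader.Create(filePath))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
                {
                    // ReadFrom loads only the current element and advances the reader past it,
                    // so the whole (potentially very large) file is never held in memory.
                    XElement book = (XElement)XNode.ReadFrom(reader);

                    BooksDataBuffer.AddRow();
                    BooksDataBuffer.Title        = (string)book.Element("title")        ?? string.Empty;
                    BooksDataBuffer.Author       = (string)book.Element("author")       ?? string.Empty;
                    BooksDataBuffer.Publish_Date = (string)book.Element("publish_date") ?? string.Empty;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}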

 

Conclusion

In this article, I have shared how I was able to re-factor an XML data loading job so that it runs in an acceptable time and performs better. Although there are several other methods for loading XML files, if you are not "scared" of writing .NET code, do consider this approach, especially if you need more control over how the XML is parsed, need to process large XML files as fast as possible, and performance is of prime importance.

In a future post, I will show how to check whether an XML file exists and validate it against its XSD schema before processing. Thanks for reading and, as always, any comments or suggestions are welcome.


Why the MS Power Query (PQ) for Excel is an invaluable Business Intelligence tool

Dear reader, I would like to share with you some of the capabilities that Power Query for Excel brings to the BI table. Microsoft Power Query for Excel, previously known as Data Explorer, provides a useful tool for gaining insights into data, data transformation and analysis.

Power Query holds a lot of promise in that it allows the import of data from both structured and unstructured data stores. In this post, I will share how to import data from Facebook, and yes, you heard me: import data from Facebook.

BUSINESS SCENARIO

One of the main reasons for adopting new technology and tools in industry is to help businesses be efficient at what they do and better serve their customers. The use of Microsoft Power Query for Excel is no different. Let's assume a fictitious company, ABC, has recently launched a new product and would like to know what people's perceptions of it are on social media and online forums.

Company ABC may therefore seek to know the following:

1. What customers and/or potential customers perceive about the product.

2. Which category of customers is likely to buy or recommend the product to friends and family based on whether they like the product.

3. Which regions/locations are likely to generate the most sales, etc.

STEPS TO IMPORTING DATA FROM FACEBOOK

  1. If you don’t have Power Query installed, visit http://office.microsoft.com/en-us/excel/download-microsoft-power-query-for-excel-FX104018616.aspx to download and install Power Query.
  2. In the POWER QUERY ribbon tab ==> click From Other Sources ==> From Facebook. (See image below).


  3. In the Facebook dialog box, enter the Facebook username in the "Me", Username or Object ID text box. This could be something like CompanyABC.12345. (See image below.)

  4. In the Connection name list, select a node to connect to; this gives access to all the information available for the selected Facebook category. For example, selecting "Likes" will retrieve all the information available in the Facebook Likes category for the company.

  5. Click OK to retrieve the information into the Excel sheet.

  6. The extracted data can then be shaped to meet your data analysis requirements and help make informed business decisions.

Thanks for reading this far. I am sure you have learnt a thing or two from this post, and I recommend you dig deeper and explore more of this invaluable tool. In a future post, I will show how you can use Excel Power Pivot and Power View to transform and analyse the extracted data.

For further reading:

  1. http://office.microsoft.com/en-us/excel-help/start-page-HA104003813.aspx

How to drop all tables in a SQL Server Database whose name begins with a common value

Scenario

I have discovered from working on projects that you sometimes have to make use of temporary or staging tables for things like loading, parsing and data cleansing. Let's assume that, as per your project or company's database naming convention, these tables are all prefixed with the value "temp_", and that after they have served their purpose you want to drop them. It is easy to drop these tables when there are only a few, say two or three, but how do we deal with a situation where, depending on the complexity of the project, there are a lot of them?

Dropping these tables one at a time would be time-consuming, and that means a loss of productive man-hours. Dear reader, to overcome this I would like to share one of the ways all these temporary tables can be dropped in one go. Assuming all the temporary/staging tables are prefixed with "temp_", the code is sketched below.

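The script was shown as an image in the original post; the following is a minimal sketch of the approach rather than the exact original code. It assumes SQL Server 2008 or later and that none of the staging tables are referenced by foreign keys.

DECLARE @tempTables NVARCHAR(MAX) = N'';

-- Build one DROP TABLE statement for every table whose name begins with 'temp_'
SELECT @tempTables = @tempTables
    + N'DROP TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N';' + CHAR(13)
FROM sys.tables AS t
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.name LIKE N'temp[_]%';   -- [_] escapes the underscore wildcard

PRINT @tempTables;               -- review the generated statements first

EXEC sys.sp_executesql @tempTables;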

When the contents of the @tempTables variable are executed, every temporary table whose name begins with "temp_" is dropped.

SharePoint 2010 Setup Error

Sometimes in life things happen when you least expect them. This was my experience a few weeks ago when I had to install SharePoint 2010 on a new server. I installed all the prerequisites needed for the application, and then I was faced with a very strange error: strange because I had not seen it before and was not expecting it.

The error was as follows (see Figure 1 below):

(Setup is unable to proceed due to the following error(s):
A system restart from a previous installation or update is pending. Restart your computer and run setup to continue.
For the list of pre-requisites needed to install the product please refer to:
http://go.microsoft.com/fwlink/?LinkId=106209
Correct the issue(s) listed above and re-run setup.)

Figure 1

 

 

This was still happening even after restarting the server. Digging further, I turned to the Registry Editor and found the culprit: a PendingFileRenameOperations value (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager) which needed deleting, as shown in Figure 2 below.

Figure 2

 

 

After deleting it, the setup continued without any problems; see Figure 3 below.

Figure 3

 

 

I trust this has been useful and will hopefully come in handy should you have similar problems/errors in the future.

SQL Server 2012 – Enter the EOMONTH Function

In this blog post, I will share a new date function in SQL Server 2012 called EOMONTH (End of Month), which is used to find the last day of the month for a specified date.

In versions prior to SQL Server 2012, date functions like DATEADD and DATEDIFF were invaluable for finding the last day of a month.

Consider a SQL Server 2005/2008 R2 example of how we find the last day of a month:
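The original pre-2012 example was shown as a screenshot; a typical version of it, using DATEADD and DATEDIFF, would look something like this:

DECLARE @date DATETIME
SET @date = '20120802'

-- Jump to the first day of the following month, then step back one day
SELECT DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, @date) + 1, 0)) AS LAST_DAY_OF_MONTH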

 

The date functions in the example above return a date value of 2012-08-31 00:00:00.000.

The same result can be achieved with the EOMONTH function in SQL Server 2012, as shown below:

 

DECLARE @date DATETIME

SET @date = '20120802'

SELECT EOMONTH(@date) AS LAST_DAY_OF_MONTH

This returns 2012-08-31.

 

Thanks for reading and hope you’ve gained something.

Cheers.

 

For additional reading, please see http://msdn.microsoft.com/en-us/library/hh213020.aspx

 

Data Quality: the Road to Accurate Reporting

I am sure you have heard of the phrase "garbage in, garbage out". This maxim is even truer when it comes to preparing, processing and analysing data for reporting purposes. In this blog post, I will focus on some of the processes involved in data quality and then share some best practices for data quality management.

The data quality process is the set of activities that ensure data is correct, accurate and complete. These activities include checking data before it is used for reporting, as well as alerting systems that report bad data so it can be corrected or reviewed.

Best Practices for Data Quality

Cleansing

This is the process of correcting mistakes within a database by amending existing data based on pre-defined business rules. Examples include ensuring there are no duplicate customer records (e.g. for a company in the retail industry), or that all subscriptions refer to existing packages or services (e.g. for a company in the telecoms industry).
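As an illustrative sketch only (the table and column names are made up), duplicate customer records could be removed with a windowed query like this, keeping the earliest record per e-mail address:

-- Rank rows per e-mail address, then delete every duplicate after the first
WITH RankedCustomers AS
(
    SELECT CustomerID,
           ROW_NUMBER() OVER (PARTITION BY EmailAddress ORDER BY CreatedDate) AS RowNum
    FROM dbo.Customer
)
DELETE FROM RankedCustomers
WHERE RowNum > 1;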

Standardization

This ensures that all completed data entries are in a consistent, agreed format. An example is ensuring all employee reference numbers follow a consistent format based on a predefined business rule.
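For example (again with made-up table and column names), employee reference numbers could be standardised to a fixed-width format such as EMP-00042:

-- Pad the numeric part to five digits and add a common prefix
SELECT EmployeeID,
       'EMP-' + RIGHT('00000' + CAST(EmployeeNumber AS VARCHAR(5)), 5) AS StandardisedReference
FROM dbo.Employee;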

Enrichment

This best practice ensures the comprehensiveness of data within a database. It involves enhancing data by comparing it to a lookup or third-party content and appending an attribute (a column in a table) where appropriate. An example is determining the area codes of employees based on their location.
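A sketch of such an enrichment (the lookup table and column names are hypothetical) is a simple join against the reference data:

-- Append the area code from a lookup table based on the employee's city
SELECT e.EmployeeID,
       e.City,
       a.AreaCode
FROM dbo.Employee AS e
LEFT JOIN dbo.AreaCodeLookup AS a
       ON a.City = e.City;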

Matching and Merging

Data matching is the process of determining that one data item is the same as another. It is used to identify duplicate entries within one system or across multiple systems, which are then linked or merged as per the business requirement. This process is invaluable for matching character-based data types, since numeric or datetime data types can easily be matched using the equals sign (=). In subsequent blog posts, I will demonstrate how this can be done using SQL Server Integration Services.

Scoring

This is a technique that is increasingly used to enhance data quality. It involves assigning a number to every row of data in, say, a table. The number gives a clue about the quality of the data in that row and helps decide on the appropriate action to take. For example, an accurate record may have a score of five, while an incomplete or invalid record may have a score of one. Any score in between indicates the degree of confidence the user/business has in the record and which action to take; the action could be an amendment or a review of the record.

In this scheme, a row with complete and accurate data (as defined by the business rules) scores 5, while a row with incomplete data or data in the wrong format scores 1. All scores from 1 to 4 require action from those responsible for ensuring data quality.
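A simple way to produce such a score (the column names and rules here are purely illustrative, and TRY_CONVERT requires SQL Server 2012 or later) is to add up per-column checks:

-- Score each row between 1 and 5 based on simple completeness/format checks
SELECT CustomerID,
       1
       + CASE WHEN FirstName IS NOT NULL AND FirstName <> ''       THEN 1 ELSE 0 END
       + CASE WHEN LastName  IS NOT NULL AND LastName  <> ''       THEN 1 ELSE 0 END
       + CASE WHEN EmailAddress LIKE '%_@_%._%'                    THEN 1 ELSE 0 END
       + CASE WHEN TRY_CONVERT(date, DateOfBirth) IS NOT NULL      THEN 1 ELSE 0 END
       AS DataQualityScore   -- 5 = complete and valid, 1 = fails every check
FROM dbo.Customer;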

Thanks for reading, and please stay tuned as I share how some of the best practices discussed above can be achieved using SSIS in subsequent posts.