Data Warehouse Assignment
Data Warehouse Assignment
WARNING: DO NOT COPY FROM ANYONE! ANY SIMILARITY WILL RESULT IN ZERO, AND I GUARANTEE IT!
Caution: please do not start on this assignment unless you have read Chapters 9 (pages 197 to 235) and 10 (pages 237 to 251) on Dimensional Modeling from the textbook. Thats important!
Please read this entire document carefully before starting to work on the assignment.
You are a Junior BI developer who is working on a data warehouse implementation project. You are visiting Jim Riner, the Sales Manager for the company who is also a major stakeholder of this project to elicit requirements for the project. Jim informs you one of the goals the BI project will help him to achieve is deeper analysis of the sales data. His analysis of the sales data will include analyzing the sales based on the following dimensions: product, sales territory, customer, order, date, and product attributes.
Jim informs you sales data analysis from the Product dimension will be based on the following contexts: product id, categories, subcategories, and product name.
Sales data analysis based on Product Attributes will be based on the following contexts: product id, colors, models, style, class, and list price. By doing the sales analysis based on the Product and Product Attributes dimensions Jim can find out answers to questions like which items are selling more based on their categories, subcategories, colors etc.
Sales data analysis based on the Sales Territory dimension (or context) will be based on the following contexts: territory id, territory name, territory group, country or region codes. For example, Jim might want to know which geographic locations are more profitable than others and what kinds of products are sold there; revenues earned from the more profitable regions than the less profitable reasons.
Sales data analysis from the Customers dimension will be based on the following contexts: customer id, customer name, territory name, country, customer address, customer city, customer zip. The customer dimension should include both retail and individual customers.
Sales data analysis from the Order dimension would be based on the following contexts: oder id, order detail id, customer id.
Sales analysis from the Date dimension would allow Jim to find out which products are selling more during what seasons and which products are not selling well during certain days, weeks, months, years. The Seasonality dimension is also known as Date dimension (DimDate or Dim_Date). The granularity (fields) of this dimension should be at least Date Surrogate Key, Date value, Month, Year, IsHoliday (Yes/No value which would indicate is the date a holiday date like July 4th etc.), Holiday name (like Christmas day, Independence day etc.). Some of the holidays like Christmas has a range of dates like from December 15th until December 26th and the IsHoliday value should be set to Yes for these dates and the Holiday Name should be set to Christmas. The date dimension should include date surrogate key, date, month, year, is holiday, holiday name.
The Excel file called StarSchema.xls on the Canvas contains the star schema diagram that shows the Dimensional tables and the Fact table you will be creating for this data warehouse project. The columns in your tables are listed in that Excel document. Please open the file and review it before starting to work on this project.
In a nutshell, using SSIS packages, you will be creating the tables listed below for this project:
Fact Table:
FactSalesDimensional Tables:
DimSalesTerritoryDimOrderDimCustomerDimOrderDateDimProductDimProductAttributeYou have all the data available in the AdventureWorks database for this project. You will need to create the packages that will create the fact and dimensional tables.
Please rename the AdventureWorks database to AdventureWorks2016CTP3 in your sever (in your computer) before starting to work on this project. Name the destination database that will host the fact and dimensional tables AdventureWorksDW.
There are two types of customers in the AdventureWorks database, retail customers and individual customers. You will need to include both types of customers in the DimCustomer table.
You will need to create a surrogate key for the DimOrderDate table. The surrogate key value will be same as the date value except you will not have any slashes. For example, if the date is 11/18/2017 the surrogate key vale for that date will be 11182017. The DimOrderDate table should have dates from 1980 until 2050. The month and the year column will contain the month value and the year value of the date. IsHoliday and HolidayName could be one date (such as 4th of July) or a range of dates (such as for Christmas, ranging between December 1st until December 25th. You decide which are the holiday dates). Chapter 9 can help you on how to create a Date dimension (pages 221-224).
Deliverables:
You will include all of your packages in one project. You will most likely have multiple packages. Please order them in the sequence they need to be executed. After completing the assignment, upload your packages on Dropbox. The link and further instruction will be provided on this soon. During the grading of the project, I would like to run your packages and populate the data in the AdventureWorksDW database. Please name every destination table (Fact and Dimension tables) and each column in the tables exactly the way specified in the Star Schema diagram found in the Excel file. Any deviation will result in zero in the project.
Please include the truncate SQL commands in every package to delete any destination table before running the package.