Draft a report for a data design and analysis case study
Critically analyse the online retail business case (see below) and write a 1,500-word report that:
- Identifies various sources of data to build an effective data pipeline;
- Identifies challenges in integrating the data from the sources and formulates a strategy to address those challenges; and
- Describes a design for a storage and retrieval system for the data lake that uses commercial and/or open-source big data technologies.
Please refer to the Task Instructions (below) for details on how to complete this task.
A modern data-driven organisation must be able to collect and process large volumes of data and perform analytics at scale on that data. Thus, the establishment of a data pipeline is an essential first step in building a data-driven organisation. A data pipeline ingests data from various sources, integrates that data and stores that data in a ‘data lake’, making that data available to everyone in the organisation.
This Assessment prepares you to identify potential sources of data, address challenges in integrating data and design an efficient ‘data lake’ using the big data principles, practices and technologies covered in the learning materials.
Big Retail is an online retail shop in Adelaide, Australia. Its website, at which its users can explore different products and promotions and place orders, has more than 100,000 visitors per month. During checkout, each customer has three options: 1) to log in to an existing account; 2) to create a new account if they have not already registered; or 3) to check out as a guest. Customers’ account information is maintained by both the sales and marketing departments in their separate databases. The sales department maintains records of the transactions in its database. The information technology (IT) department maintains the website.
Every month, the marketing team releases a catalogue and promotions, which are made available on the website and emailed to the registered customers. The website is static; that is, all the customers see the same content, irrespective of their location, login status or purchase history.
Recently, Big Retail has experienced a significant slump in sales, despite having a cost advantage over its competitors. Significant reductions in the number of visitors to the website and in the conversion rate (i.e., the percentage of visitors who ultimately buy something) have also been observed. To regain its market share and increase its sales, the management team at Big Retail has decided to adopt a data-driven strategy. Specifically, the management team wants to use big data analytics to enable a customised customer experience through targeted campaigns, a recommender system and product association.
The first step in moving towards the data-driven approach is to establish a data pipeline. The essential purpose of the data pipeline is to ingest data from various sources, integrate the data and store the data in a ‘data lake’ that can be readily accessed by both the management team and the data scientists.
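The three stages named above (ingest, integrate, store) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a prescribed design: the case study does not specify any fields, so the record layouts, the email-based matching rule, the function names and the JSON Lines output file are all hypothetical. The sketch merges customer records held separately by the sales and marketing departments and writes the result to a directory standing in for the data lake.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical sample records; the case study does not define any fields.
SALES_CUSTOMERS = [
    {"email": "Ana@Example.com", "name": "Ana Lee", "last_order": "2023-05-01"},
    {"email": "bob@example.com", "name": "Bob Ray", "last_order": "2023-04-12"},
]
MARKETING_CUSTOMERS = [
    {"email": "ana@example.com ", "newsletter": True},
    {"email": "cy@example.com", "newsletter": False},
]


def ingest():
    """Ingest: yield each record tagged with the source it came from."""
    sources = (("sales", SALES_CUSTOMERS), ("marketing", MARKETING_CUSTOMERS))
    for source, rows in sources:
        for row in rows:
            yield source, row


def integrate(tagged_rows):
    """Integrate: merge records that share a normalised email address.

    Assumption: email is a usable matching key across both databases.
    """
    merged = {}
    for source, row in tagged_rows:
        key = row["email"].strip().lower()
        record = merged.setdefault(key, {"email": key, "sources": []})
        record["sources"].append(source)
        for field, value in row.items():
            if field != "email":
                # Keep the first value seen for a field; later sources do not overwrite it.
                record.setdefault(field, value)
    return list(merged.values())


def store(records, lake_dir):
    """Store: write the integrated records as JSON Lines under lake_dir."""
    path = Path(lake_dir) / "customers.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def run_pipeline(lake_dir):
    """Run ingest, integrate and store in sequence."""
    return store(integrate(ingest()), lake_dir)


if __name__ == "__main__":
    with TemporaryDirectory() as lake:
        print(run_pipeline(lake).read_text(encoding="utf-8"))
```

In this sketch the two entries for the same customer differ only in letter case and trailing whitespace, which is one small example of the integration challenges the report is expected to discuss.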
Critically analyse the above case study and write a 1,500-word report. In your report, ensure that you:
- Identify the potential data sources that align with the objectives of the organisation’s data-driven strategy. You should consider both the internal and external data sources. For each data source identified, describe its characteristics. Make reasonable assumptions about the fields and format of the data for each of the sources;
- Identify the challenges that will arise in integrating the data from different sources and that must be resolved before the data are stored in the ‘data lake.’ Articulate the steps necessary to address these issues;
- Describe the ‘data lake’ that you designed to store the integrated data and make the data available for efficient retrieval by both the management team and data scientists. The system should be designed using a commercial and/or an open-source database, tools and frameworks. Demonstrate how the ‘data lake’ meets the big data storage and retrieval requirements; and
- Provide a schematic of the overall data pipeline. The schematic should clearly depict the data sources, data integration steps, the components of the ‘data lake’ and the interactions among all the entities.
- Submit this task via the Assessment link in the main navigation menu in BDA601—Big Data and
The Learning Facilitator will provide feedback via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.
Academic Integrity Declaration
I declare that except where referenced, the work I am submitting for this assessment task is my own work. I have read and am aware of the Academic Integrity Policy and Procedure of Torrens University, Australia, viewable online at http://www.torrens.edu.au/policies-and-forms.
I am also aware that I need to keep a copy of all submitted material and any drafts and I agree to do so.