
Programming for Data Analysts Assignment

Added on: 2023-05-17 05:41:16

Zappy Financial Services (ZFS) is a local company that provides small business loans. Last year, loan applications increased by over 200%, largely because of a concerted online campaign to establish a strong digital presence. Almost all loan applications and business leads are generated from search engines and digital advertisements, reflecting the decision to increase spending on search engine optimisation (SEO) and online advertising through channels such as Google, Facebook, LinkedIn and similar platforms.

Despite this strong digital marketing approach, the loan application process itself remains manual. Applicants complete an online form with details including gender, marital status, number of dependents, education and income. To date, several of these factors have been considered in the approval decision. Every application is reviewed and approved manually by the loans team and, given the recent increase in volumes, this has resulted in skills shortages, longer loan approval times and higher potential operational and control risk. The current operating model constrains further growth. Loan decisions are categorised as either “approved” or “rejected”.

You are employed by ZFS as a lead programmer with coding and data analytics knowledge, as well as a deep appreciation of the need to balance business growth with a robust control environment. You will lead this project and have been tasked with providing a scalable solution that addresses the key resourcing and control risks.

Specifically, the Board has instructed you to develop several partial automation processes that will help the existing loans team and free up their time for more one-on-one customer contact. You need to provide a data-driven solution while working with a variety of key stakeholders with differing objectives, such as marketing, internal audit and compliance.

An in-house database administrator (DBA) has compiled a PDF of past applications, which the loans team hopes to map to previous loan approval outcomes.

The two files provided by the DBA are:  

  • A file in PDF format called ‘Loans_Database_Table.pdf’  
  • An Excel file called ‘Zappy Loan Data.xlsx’  

The first file (the PDF) has been extracted from the previous year's business loan records. It includes a status field for each application, allowing the business to map inputs to outcomes for a possible supervised machine learning exercise.

The Excel file is maintained by the Sales team and is currently saved in a shared folder, which increases the risk of duplicate records and missing values.

You will need to reflect on what you have learned throughout this module and consider the learning outcomes, particularly LO3: Construct a programming solution to solve a defined business problem, as you create your answer.

2.4. Part 1: Construct a Programming Solution (30 marks) (LO3) 

In Part 1, you will deliver an Interactive Python Notebook (an .ipynb file) containing the code used, with comments explaining the scripts, the libraries used and the logic. All such commentary should be written using the notebook's built-in markup language (Markdown).

The notebook you create should highlight some of the key findings in the data and the insights you can provide to the business. The tasks to be completed in the Python notebook include the following:

Task 1: Loan Data Automation  

Create a new .ipynb notebook in Google Colab and load the TWO data files provided by the DBA. From these files, extract the two datasets containing information about past loan records. The values in each column of the loan dataset are coded as follows:

  • Gender: 1-Male, 2-Female 
  • Married: 0-Single, 1-Married 
  • Dependents: 0, 1, 2, 3+ 
  • Graduate: 0-No, 1-Yes 
  • Self-Employed: 0-No, 1-Yes 
  • Credit_History: 0-No, 1-Yes 
  • Property_Area: 1-Urban, 2-Semiurban, 3-Rural
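As an illustration of how these codes might be turned into readable labels once the data is loaded, the sketch below uses pandas. It assumes the column headers in the DBA's files match the names in the list above, and it assumes `Self_Employed` (with an underscore) as the spelling of "Self-Employed"; both should be checked against the actual files. `decode_categories` and `CODE_BOOKS` are hypothetical names.

```python
import pandas as pd

# Code books from the brief. Column names are assumptions about the files'
# headers; "Self_Employed" is an assumed spelling of "Self-Employed".
CODE_BOOKS = {
    "Gender": {1: "Male", 2: "Female"},
    "Married": {0: "Single", 1: "Married"},
    "Graduate": {0: "No", 1: "Yes"},
    "Self_Employed": {0: "No", 1: "Yes"},
    "Credit_History": {0: "No", 1: "Yes"},
    "Property_Area": {1: "Urban", 2: "Semiurban", 3: "Rural"},
}


def decode_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric codes replaced by readable labels.

    Codes not found in the code book (e.g. typos) become NaN, so they
    surface during data cleaning instead of being silently kept.
    """
    decoded = df.copy()
    for column, mapping in CODE_BOOKS.items():
        if column in decoded.columns:
            # Text such as "3" read from a PDF is converted to a number first.
            numeric = pd.to_numeric(decoded[column], errors="coerce")
            decoded[column] = numeric.map(mapping)
    # "Dependents" stays as text because "3+" is not a plain number.
    if "Dependents" in decoded.columns:
        decoded["Dependents"] = decoded["Dependents"].astype("string").str.strip()
    return decoded
```

Working on a copy keeps the raw extract untouched, which makes it easier to compare the original and decoded values when checking the data.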

You should use Python to load the information contained in these datasets into memory. You should also add comments to your notebook explaining the steps taken to load the data, how you treated the PDF and Excel data, the libraries called and the overall procedure. Remember that the notebook will be used to train colleagues in future.
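One possible way to load both files in Colab is sketched below. It assumes the third-party `pdfplumber` library for the PDF (installed with `pip install pdfplumber`) and `pandas.read_excel` (which needs `openpyxl`) for the workbook; both are choices made for this example, not requirements of the brief. It also assumes the PDF holds a single table whose header row may repeat on each page. The helper names are hypothetical.

```python
import pandas as pd


def tables_to_frame(tables):
    """Combine tables extracted from a PDF into one DataFrame.

    `tables` is a list of tables, each a list of rows (lists of cell
    strings), which is the shape pdfplumber's `page.extract_tables()`
    returns. The first row is used as the header; header rows repeated
    on later pages are skipped.
    """
    rows = [row for table in tables for row in table if row]
    if not rows:
        return pd.DataFrame()
    header = [str(cell).strip() for cell in rows[0]]
    body = [row for row in rows[1:] if [str(c).strip() for c in row] != header]
    return pd.DataFrame(body, columns=header)


def load_pdf_loans(path):
    """Read every table in the PDF at `path` (assumes pdfplumber is installed)."""
    import pdfplumber  # imported here so the other helpers work without it

    tables = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables_to_frame(tables)


def load_excel_loans(path):
    """Read the first worksheet of the Sales team's workbook."""
    return pd.read_excel(path, sheet_name=0)
```

In a notebook this might be called as `pdf_loans = load_pdf_loans("Loans_Database_Table.pdf")` and `excel_loans = load_excel_loans("Zappy Loan Data.xlsx")`, after uploading both files to the Colab session. Values read from a PDF arrive as text, so they would still need converting to numbers.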

Task 2: Descriptive analysis  

First, check the datasets and make sure the data from the two files is valid. Ensure your loan data is correctly indexed on the Loan_ID column.
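A minimal sketch of such a check, assuming the column names used earlier and that `Loan_ID` is present in the data: it counts missing and duplicated IDs and values outside the code book, then sets `Loan_ID` as the index. It reports problems rather than fixing them, leaving the fixes to the cleaning step. `validate_and_index` and `VALID_CODES` are hypothetical names.

```python
import pandas as pd

# Allowed codes per column, from the Task 1 code book
# (column names are assumptions about the files' headers).
VALID_CODES = {
    "Gender": {1, 2},
    "Married": {0, 1},
    "Graduate": {0, 1},
    "Self_Employed": {0, 1},
    "Credit_History": {0, 1},
    "Property_Area": {1, 2, 3},
}


def validate_and_index(df: pd.DataFrame):
    """Return (df indexed on Loan_ID, report of validation problems)."""
    report = {}
    # Strip stray spaces so " LP001" and "LP001" count as the same ID.
    ids = df["Loan_ID"].astype("string").str.strip()
    report["missing_ids"] = int(ids.isna().sum())
    report["duplicate_ids"] = int(ids.duplicated(keep=False).sum())
    for column, allowed in VALID_CODES.items():
        if column in df.columns:
            values = pd.to_numeric(df[column], errors="coerce")
            # Count values that are present but not a valid code.
            invalid = values.notna() & ~values.isin(allowed)
            report[f"invalid_{column}"] = int(invalid.sum())
    indexed = df.assign(Loan_ID=ids).set_index("Loan_ID")
    return indexed, report
```

Whether the PDF and Excel datasets should be appended with `pd.concat` or joined on `Loan_ID` depends on how their rows overlap, which the brief does not say; the same check can be run on each dataset before deciding.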

Then, clean the loan data. Explain the steps taken to prepare the data for analysis, such as correcting duplicates, missing values and outliers.
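The cleaning steps could look like the sketch below, which assumes the data is already indexed on `Loan_ID` and uses two hypothetical column names, `ApplicantIncome` and `Loan_Status`, since the brief does not name the income or outcome columns. The choices shown (keep the first row per ID, drop rows without an outcome, fill categorical gaps with the mode and income gaps with the median, flag outliers using 1.5 times the interquartile range) are common defaults, not the only defensible ones.

```python
import pandas as pd


def clean_loans(df, income_col="ApplicantIncome", target_col="Loan_Status"):
    """Remove duplicates, handle missing values and flag income outliers.

    income_col and target_col are hypothetical column names.
    """
    # One row per Loan_ID; rows without an outcome cannot be used to learn
    # from past decisions, so they are dropped rather than guessed.
    cleaned = (
        df[~df.index.duplicated(keep="first")]
        .dropna(subset=[target_col])
        .copy()
    )
    cleaned[income_col] = pd.to_numeric(cleaned[income_col], errors="coerce")

    # Categorical columns: fill gaps with the most common value (mode).
    for column in cleaned.columns.drop([income_col, target_col]):
        if cleaned[column].isna().any() and not cleaned[column].dropna().empty:
            cleaned[column] = cleaned[column].fillna(cleaned[column].mode().iloc[0])

    # Income: the median is less affected by extreme values than the mean.
    cleaned[income_col] = cleaned[income_col].fillna(cleaned[income_col].median())

    # Flag (rather than delete) incomes outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR.
    q1, q3 = cleaned[income_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    cleaned["income_outlier"] = ~cleaned[income_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return cleaned
```

Flagging outliers instead of deleting them keeps high-income applicants, who may be genuine, available for review by the loans team.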

Then, carry out descriptive analysis on the current loan data. Your notebook should contain some basic Exploratory Data Analysis (EDA) of the data.

This should include items such as: 

  • The percentage of female applicants whose loan was approved 
  • The average income of all applicants 
  • The average income of all applicants who are self-employed 
  • The average income of all applicants who are not self-employed 
  • The average income of all graduate applicants 
  • The percentage of graduate applicants whose loan was approved 
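The figures in the list above reduce to boolean masks and means in pandas. The sketch assumes the categorical columns have already been decoded to labels (for example `Gender == "Female"`), and that the outcome column is called `Loan_Status` with the label `"Approved"`; these names, like `ApplicantIncome`, are hypothetical. "Percentage of female applicants approved" is read here as the share of female applicants whose loan was approved.

```python
import pandas as pd


def eda_summary(df, income_col="ApplicantIncome", status_col="Loan_Status",
                approved="Approved"):
    """Compute the headline EDA figures listed in Task 2.

    Column names and the "Approved" label are hypothetical; categorical
    columns are assumed to hold decoded labels such as "Female" or "Yes".
    """
    is_approved = df[status_col] == approved
    female = df["Gender"] == "Female"
    graduate = df["Graduate"] == "Yes"
    self_employed = df["Self_Employed"] == "Yes"
    # Compare with "No" explicitly so missing values are not counted as "No".
    not_self_employed = df["Self_Employed"] == "No"

    return {
        "pct_female_approved": 100 * is_approved[female].mean(),
        "avg_income_all": df[income_col].mean(),
        "avg_income_self_employed": df.loc[self_employed, income_col].mean(),
        "avg_income_not_self_employed": df.loc[not_self_employed, income_col].mean(),
        "avg_income_graduates": df.loc[graduate, income_col].mean(),
        "pct_graduates_approved": 100 * is_approved[graduate].mean(),
    }
```

Calling `pd.Series(eda_summary(cleaned)).round(2)` on the cleaned data would display the results as a small table in the notebook.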

This code should then be copied and pasted into your Part 2 report as Appendix 1.

2.5. Part 2: Report - Business Case (60 marks) (LO1, LO2) 

Using the scenario given in Part 1, develop a business case setting out WHY a programming solution involving data analysis is needed and HOW you are going to carry out your analysis. The report should include the following sections:

a) Introduction: This should cover the current business environment of companies like ZFS, the problems your solution would address, and the impact and benefits your proposed programming solution might have on the business. You should also mention the implications of doing nothing and the kind of human resources needed. Financial information or resources are NOT required.

b) Approach: Describe the approach you would take to implement your solution, that is, the language, software and tools to be used, explaining the reasons for their choice. Also describe the steps required to prepare the data and how visualisation will be used. You should provide a critical discussion of the role of code libraries and include a brief discussion of the need to design and test any written code.
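To make the point about design and test concrete, the sketch below pairs a small, single-purpose function with unit tests written with Python's built-in `unittest` module. The function `parse_dependents`, which converts the `Dependents` value "3+" to 3 and treats anything unrecognised as missing, is a hypothetical example rather than part of the brief; the idea is that each edge case is decided up front and then checked automatically.

```python
import unittest


def parse_dependents(value):
    """Convert a Dependents entry ("0", "1", "2", "3+") to an int.

    "3+" is capped at 3. Blanks, NaN and anything outside 0-3 return None
    so that they are treated as missing during cleaning.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text == "3+":
        return 3
    try:
        number = float(text)  # accepts "2" as well as "2.0" read from Excel
    except ValueError:
        return None
    # NaN is not an integer, so it falls through to None here too.
    if number.is_integer() and 0 <= number <= 3:
        return int(number)
    return None


class ParseDependentsTest(unittest.TestCase):
    def test_plain_counts(self):
        self.assertEqual(parse_dependents("2"), 2)
        self.assertEqual(parse_dependents(0), 0)
        self.assertEqual(parse_dependents(1.0), 1)

    def test_three_plus_is_capped(self):
        self.assertEqual(parse_dependents(" 3+ "), 3)

    def test_invalid_values_become_missing(self):
        self.assertIsNone(parse_dependents(""))
        self.assertIsNone(parse_dependents("four"))
        self.assertIsNone(parse_dependents(None))
        self.assertIsNone(parse_dependents("5"))


if __name__ == "__main__":
    # argv and exit=False let the tests run inside a notebook cell.
    unittest.main(argv=["first-arg-is-ignored"], exit=False)
```

Keeping transformations in small functions like this makes them easy to test in isolation and to reuse when the notebook is handed to colleagues.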

c) Recommendations for future work: This should show the proposed route forward, including an outline plan. Briefly explain how, using the data provided, your solution could be further developed to build a predictive model: a model that can be trained to predict whether a new loan application is likely to be approved or rejected. Your recommendation should include a short explanation of the techniques, libraries and tools to be used, the objective function used to train the model and the metrics (such as precision) used to evaluate your recommended predictive model.
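As a rough illustration of the difference between the objective function a model is trained on and the metric used to evaluate it, the sketch below computes log loss (binary cross-entropy, the objective minimised by logistic regression) and precision and recall in plain Python. In practice scikit-learn provides these as `sklearn.linear_model.LogisticRegression`, `sklearn.metrics.log_loss` and `sklearn.metrics.precision_score`; logistic regression is named here only as one plausible baseline, and the labels are assumed to be encoded as 1 = approved, 0 = rejected.

```python
import math


def log_loss(y_true, p_approved, eps=1e-15):
    """Mean binary cross-entropy: the objective logistic regression minimises."""
    total = 0.0
    for y, p in zip(y_true, p_approved):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never taken
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = approved)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted approvals that were correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual approvals the model found
    return precision, recall


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0]
    probabilities = [0.9, 0.6, 0.7, 0.4, 0.1]
    predicted = [1 if p >= 0.5 else 0 for p in probabilities]
    print(precision_recall(actual, predicted), round(log_loss(actual, probabilities), 2))
```

With the example outcomes and probabilities above, thresholded at 0.5, precision and recall are both 2/3 and the log loss is about 0.48. For ZFS, precision on the approved class indicates how many of the model's suggested approvals would have been approved by the loans team.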

d) Conclusions: A brief conclusion summarising the main points in the report. 

e) Appendix 1 – Code: Copy and paste the contents of your programming notebook as Appendix 1. This does not count towards your word count.

f) Further appendices (to support your report): Again, these do not count towards your word count.

In writing your report, use the insight and knowledge provided in this module, but also draw on sound academic research to support your report.

As you develop your work, you should self-evaluate your draft against the criteria set out in the Marking Guide below (see Section 5).

  • Uploaded By : Katthy Wills
  • Posted on : May 17th, 2023