Group Assignment: Data Analysis and Visualization using Python DSA502
- Subject Code: DSA502
Group Assignment
Group: Form a group of four members to work on this assessment component. Use the datasets provided in Dataset.zip to answer Q1 to Q5 of this assessment component. Use suitable functions to perform the following tasks.
ASSIGNMENT QUESTIONS
Q1. Demographic Trends
Use the demographics_data.csv data set available in the Dataset.rar to perform the following tasks.
You are employed as a Data Scientist by the World Bank and are working on a project to analyze the world's demographic trends.
You are required to produce a scatterplot illustrating Birth Rate and Internet Usage statistics by country. The scatterplot also needs to be categorized by the countries' income groups. You have received an urgent update from your manager.
You must produce a second scatterplot illustrating Birth Rate and Internet Usage statistics by country. This time, however, the scatterplot needs to be categorized by the countries' regions.
Tips
- Import library/API
- Read Dataset
- Data Information
- Visualize/produce plots of Birth Rate against Internet Usage, categorized by:
  - Countries' regions
  - Countries' income groups
- Conclude your findings!
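The tips above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a model answer: the column names (`Birth rate`, `Internet users`, `Income Group`) and the four sample rows are invented for illustration, so check the real headers in demographics_data.csv with `df.info()` and replace the inline sample with `pd.read_csv("demographics_data.csv")`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for demographics_data.csv (assumed column names).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Birth rate": [10.2, 35.3, 46.0, 12.9],
    "Internet users": [78.9, 5.9, 19.1, 57.2],
    "Income Group": ["High income", "Low income", "Low income", "High income"],
})


def scatter_by_category(data, category, x="Internet users", y="Birth rate"):
    """Draw one scatter series per value of `category` and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for label, group in data.groupby(category):
        ax.scatter(group[x], group[y], label=label, alpha=0.7)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs {x} by {category}")
    ax.legend(title=category)
    return ax


ax = scatter_by_category(df, "Income Group")
```

The second scatterplot would reuse the same helper with the region column passed as `category`, assuming the real file has such a column.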
Q2. Countries' Regions Analysis
Use the region_data.csv data set available in the Dataset.rar to perform the following tasks.
The World Bank was very impressed with your deliverables, and they have a new project for you. You must produce a scatterplot depicting Life Expectancy (y-axis) and Fertility Rate (x-axis) statistics by country. The scatterplot also needs to be categorized by the countries' regions. You have been supplied with data for 1960 and 2013, and you are required to produce a visualization for each of these years. Some data has been provided in a CSV file, which contains combined data for both years. All data manipulations must be performed in Python (not in Excel) because this project may be audited later.
You have also been requested to provide insights into how the two periods compare.
Note: Data for this section was sourced from data.worldbank.org
Tips
- Import library/API
- Read Dataset
- Data Information
- Visualize/produce plot of Life Expectancy (y-axis) and Fertility Rate (x-axis) by country
Note: use the data that I've provided.
- Conclude your findings!
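One possible way to lay out the two years for comparison is a pair of panels that share both axes, so that a shift between 1960 and 2013 is visible at a glance. The sketch below is only an illustration: the column names and the four sample rows are assumptions, not the contents of region_data.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; the real data would come from region_data.csv (assumed columns).
data = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "AAA", "BBB"],
    "Region": ["Region 1", "Region 2", "Region 1", "Region 2"],
    "Year": [1960, 1960, 2013, 2013],
    "Fertility Rate": [6.5, 3.0, 4.0, 1.8],
    "Life Expectancy": [40.0, 68.0, 60.0, 80.0],
})

years = sorted(data["Year"].unique())
# squeeze=False always returns a 2-D array of Axes, even for a single year.
fig, axes = plt.subplots(1, len(years), figsize=(12, 5),
                         sharex=True, sharey=True, squeeze=False)
axes = axes[0]

for ax, year in zip(axes, years):
    subset = data[data["Year"] == year]
    # One scatter series per region within the selected year.
    for region, group in subset.groupby("Region"):
        ax.scatter(group["Fertility Rate"], group["Life Expectancy"], label=region)
    ax.set_title(str(year))
    ax.set_xlabel("Fertility Rate")

axes[0].set_ylabel("Life Expectancy")
axes[-1].legend(title="Region")
```

Sharing the axes is a design choice: with independent scales the two panels can look alike even when the values have moved substantially.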
Formula for Life Expectancy
qx = Mx / (Bx + (Mx/2))
where:
Mx = the number of deaths at age x to under x+1 years in the reported period;
Bx = the average population aged x to under x+1 years in the base period;
qx = the probability of death between age x and age x+1.
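As a worked illustration of the formula above, the sketch below computes the death probability qx for one age band. The input figures (50 deaths, an average population of 9,975) are made up for the example and do not come from the dataset.

```python
def death_probability(deaths, avg_population):
    """Return qx = Mx / (Bx + Mx / 2) for a single age band."""
    if deaths < 0 or avg_population <= 0:
        raise ValueError("deaths must be >= 0 and avg_population must be > 0")
    return deaths / (avg_population + deaths / 2)


# Made-up figures: Mx = 50 deaths, Bx = 9975 people on average.
q = death_probability(50, 9975)
print(q)  # 50 / (9975 + 25) = 0.005
```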
Additional Data:
Country_Code <- ("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","BGR","BHR","BHS","BIH","BLR","BLZ","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL","CHN","CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYP","CZE","DEU","DJI","DNK","DOM","DZA","ECU","EGY","ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB","GNQ","GRC","GRD","GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ","ISL","ITA","JAM","JOR","JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY","LCA","LKA","LSO","LTU","LUX","LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MMR","MNE","MNG","MOZ","MRT","MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL","OMN","PAK","PAN","PER","PHL","PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN","SEN","SGP","SLB","SLE","SLV","SOM","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYR","TCD","TGO","THA","TJK","TKM","TLS","TON","TTO","TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN","VIR","VNM","VUT","WSM","YEM","ZAF","COD","ZMB","ZWE")
Life_Expectancy_At_Birth_1960 <- (65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659,65.8634634146342,61.7827317073171,70.8170731707317,68.5856097560976,60.836243902439,41.2360487804878,69.7019512195122,37.2782682926829,34.4779024390244,45.8293170731707,69.2475609756098,52.0893658536585,62.7290487804878,60.2762195121951,67.7080975609756,59.9613658536585,42.1183170731707,54.2054634146342,60.7380487804878,62.5003658536585,32.3593658536585,50.5477317073171,36.4826341463415,71.1331707317073,71.3134146341463,57.4582926829268,43.4658048780488,36.8724146341463,41.523756097561,48.5816341463415,56.716756097561,41.4424390243903,48.8564146341463,60.5761951219512,63.9046585365854,69.5939268292683,70.3487804878049,69.3129512195122,44.0212682926829,72.1765853658537,51.8452682926829,46.1351219512195,53.215,48.0137073170732,37.3629024390244,69.1092682926829,67.9059756097561,38.4057073170732,68.819756097561,55.9584878048781,69.8682926829268,57.5865853658537,39.5701219512195,71.1268292682927,63.4318536585366,45.8314634146342,34.8863902439024,32.0422195121951,37.8404390243902,36.7330487804878,68.1639024390244,59.8159268292683,45.5316341463415,61.2263414634146,60.2787317073171,66.9997073170732,46.2883170731707,64.6086585365854,42.1000975609756,68.0031707317073,48.6403170731707,41.1719512195122,69.691756097561,44.945512195122,48.0306829268293,73.4286585365854,69.1239024390244,64.1918292682927,52.6852682926829,67.6660975609756,58.3675853658537,46.3624146341463,56.1280731707317,41.2320243902439,49.2159756097561,53.0013170731707,60.3479512195122,43.2044634146342,63.2801219512195,34.7831707317073,42.6411951219512,57.303756097561,59.7471463414634,46.5107073170732,69.8473170731707,68.4463902439024,69.7868292682927,64.6609268292683,48.4466341463415,61.8127804878049,39.9746829268293,37.2686341463415,57.0656341463415,60.6228048780488,28.2116097560976,67.6017804878049,42.7363902439024,63.7056097560976,48.3688048780488,35.0037073170732,43.4830975609756,58.7452195121951,37.7736341463415,59.4753414634146,46.8803902439024,58.6390243902439,35.5150487804878,37.1829512195122,46.9988292682927,73.3926829268293,73.549756097561,35.1708292682927,71.2365853658537,42.6670731707317,45.2904634146342,60.8817073170732,47.6915853658537,57.8119268292683,38.462243902439,67.6804878048781,68.7196097560976,62.8089268292683,63.7937073170732,56.3570487804878,61.2060731707317,65.6424390243903,66.0552926829268,42.2492926829268,45.6662682926829,48.1876341463415,38.206,65.6598292682927,49.3817073170732,30.3315365853659,49.9479268292683,36.9658780487805,31.6767073170732,50.4513658536585,59.6801219512195,69.9759268292683,68.9780487804878,73.0056097560976,44.2337804878049,52.768243902439,38.0161219512195,40.2728292682927,54.6993170731707,56.1535365853659,54.4586829268293,33.7271219512195,61.3645365853659,62.6575853658537,42.009756097561,45.3844146341463,43.6538780487805,43.9835609756098,68.2995365853659,67.8963902439025,69.7707317073171,58.8855365853659,57.7238780487805,59.2851219512195,63.7302195121951,59.0670243902439,46.4874878048781,49.969512195122,34.3638048780488,49.0362926829268,41.0180487804878,45.1098048780488,51.5424634146342)
Life_Expectancy_At_Birth_2013 <- (75.3286585365854,60.0282682926829,51.8661707317073,77.537243902439,77.1956341463415,75.9860975609756,74.5613658536585,75.7786585365854,82.1975609756098,80.890243902439,70.6931463414634,56.2516097560976,80.3853658536585,59.3120243902439,58.2406341463415,71.245243902439,74.4658536585366,76.5459512195122,75.0735365853659,76.2769268292683,72.4707317073171,69.9820487804878,67.9134390243903,74.1224390243903,75.3339512195122,78.5466585365854,69.1029268292683,64.3608048780488,49.8798780487805,81.4011219512195,82.7487804878049,81.1979268292683,75.3530243902439,51.2084634146342,55.0418048780488,61.6663902439024,73.8097317073171,62.9321707317073,72.9723658536585,79.2252195121951,79.2563902439025,79.9497804878049,78.2780487804878,81.0439024390244,61.6864634146342,80.3024390243903,73.3199024390244,74.5689512195122,75.648512195122,70.9257804878049,63.1778780487805,82.4268292682927,76.4243902439025,63.4421951219512,80.8317073170732,69.9179268292683,81.9682926829268,68.9733902439024,63.8435853658537,80.9560975609756,74.079512195122,61.1420731707317,58.216487804878,59.9992682926829,54.8384146341464,57.2908292682927,80.6341463414634,73.1935609756098,71.4863902439024,78.872512195122,66.3100243902439,83.8317073170732,72.9428536585366,77.1268292682927,62.4011463414634,75.2682926829268,68.7046097560976,67.6604146341463,81.0439024390244,75.1259756097561,69.4716829268293,83.1170731707317,82.290243902439,73.4689268292683,73.9014146341463,83.3319512195122,70.45,60.9537804878049,70.2024390243902,67.7720487804878,65.7665853658537,81.459756097561,74.462756097561,65.687243902439,80.1288780487805,60.5203902439024,71.6576829268293,74.9127073170732,74.2402926829268,49.3314634146342,74.1634146341464,81.7975609756098,73.9804878048781,80.3391463414634,73.7090487804878,68.811512195122,64.6739024390244,76.6026097560976,76.5326585365854,75.1870487804878,57.5351951219512,80.7463414634146,65.6540975609756,74.7583658536585,69.0618048780488,54.641512195122,62.8027073170732,74.46,61.466,74.567512195122,64.3438780487805,77.1219512195122,60.8281463414634,52.4421463414634,74.514756097561,81.1048780487805,81.4512195121951,69.222,81.4073170731707,76.8410487804878,65.9636829268293,77.4192195121951,74.2838536585366,68.1315609756097,62.4491707317073,76.8487804878049,78.7111951219512,80.3731707317073,72.7991707317073,76.3340731707317,78.4184878048781,74.4634146341463,71.0731707317073,63.3948292682927,74.1776341463415,63.1670487804878,65.878756097561,82.3463414634146,67.7189268292683,50.3631219512195,72.4981463414634,55.0230243902439,55.2209024390244,66.259512195122,70.99,76.2609756097561,80.2780487804878,81.7048780487805,48.9379268292683,74.7157804878049,51.1914878048781,59.1323658536585,74.2469268292683,69.4001707317073,65.4565609756098,67.5223658536585,72.6403414634147,70.3052926829268,73.6463414634147,75.1759512195122,64.2918292682927,57.7676829268293,71.159512195122,76.8361951219512,78.8414634146341,68.2275853658537,72.8108780487805,74.0744146341464,79.6243902439024,75.756487804878,71.669243902439,73.2503902439024,63.583512195122,56.7365853658537,58.2719268292683,59.2373658536585,55.633)
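Since all manipulation has to happen in Python, the additional vectors above need to be joined to the CSV data by country code and year. The sketch below shows one way this might be done with pandas. Only the first three entries of each vector are used to keep it short, and the `region` frame with its column names and fertility values is a made-up stand-in for region_data.csv.

```python
import pandas as pd

# First three entries of each vector from the brief; paste the full vectors in practice.
country_code = ["ABW", "AFG", "AGO"]
life_exp_1960 = [65.5693658536586, 32.328512195122, 32.9848292682927]
life_exp_2013 = [75.3286585365854, 60.0282682926829, 51.8661707317073]

# Long format: one row per country and year.
life = pd.concat([
    pd.DataFrame({"Country Code": country_code, "Year": 1960,
                  "Life Expectancy": life_exp_1960}),
    pd.DataFrame({"Country Code": country_code, "Year": 2013,
                  "Life Expectancy": life_exp_2013}),
], ignore_index=True)

# Made-up stand-in for region_data.csv (assumed column names and values).
region = pd.DataFrame({
    "Country Code": ["ABW", "AFG", "ABW", "AFG"],
    "Region": ["The Americas", "Asia", "The Americas", "Asia"],
    "Year": [1960, 1960, 2013, 2013],
    "Fertility Rate": [4.8, 7.4, 1.7, 5.0],
})

# validate="one_to_one" raises if a (country, year) pair is duplicated on either side.
merged = region.merge(life, on=["Country Code", "Year"],
                      how="left", validate="one_to_one")

data_1960 = merged[merged["Year"] == 1960]
data_2013 = merged[merged["Year"] == 2013]
```

A left join keeps every row of the CSV data, so any country without a life expectancy value shows up as a missing value that can be inspected with `merged.isna().sum()`.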
Q3. Movie Rating Analysis
Use the Movie_Ratings.csv data set available in the Dataset.rar to perform the following tasks.
You have been approached as an analytics consultant by a movie reviews website. They are writing an article analyzing movie ratings by critics and audiences, together with movie budgets, for 2007-2011.
This is the first time they have done this analysis, and they don't know exactly what they need. They have asked you to look into the data and provide five graphs that tell a story about the data. However, there is one chart the CEO specifically requested: a diagram showing how the correlation between audience and critic ratings has evolved over the years, by genre.
Tips:
- Import library/API
- Read Dataset
- Data Information
- Visualize/produce plots of movie ratings by critics and audiences, and of movie budgets, for 2007-2011
- Conclude your findings!
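For the chart the CEO requested, one possible starting point is to compute the correlation between critic and audience ratings separately for each year and genre, then reshape the result so that each genre becomes one line over time. The sketch below assumes column names (`Genre`, `Year`, `Critic Rating`, `Audience Rating`) and uses made-up rows, so adjust it to the real headers in Movie_Ratings.csv.

```python
import pandas as pd

# Made-up rows standing in for Movie_Ratings.csv (assumed column names).
movies = pd.DataFrame({
    "Genre": ["Action"] * 3 + ["Comedy"] * 3 + ["Action"] * 3,
    "Year": [2007] * 3 + [2007] * 3 + [2008] * 3,
    "Critic Rating": [10, 20, 30, 40, 50, 60, 30, 50, 70],
    "Audience Rating": [20, 40, 60, 90, 70, 50, 65, 60, 80],
})


def rating_correlation(data):
    """Pearson correlation of critic vs audience rating per (Year, Genre)."""
    rows = []
    for (year, genre), group in data.groupby(["Year", "Genre"]):
        rows.append({
            "Year": year,
            "Genre": genre,
            "Correlation": group["Critic Rating"].corr(group["Audience Rating"]),
            "Films": len(group),  # small groups give unreliable correlations
        })
    return pd.DataFrame(rows)


corr = rating_correlation(movies)

# One row per year, one column per genre: ready for corr_wide.plot(marker="o").
corr_wide = corr.pivot(index="Year", columns="Genre", values="Correlation")
```

Keeping the film count next to each correlation makes it easier to flag year and genre combinations with too few films to support a conclusion.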
Q4. Loan Analysis
You have been approached as an analytics consultant by a bank in Malaysia to analyze its customer loans. You're required to write a program using Python to perform the following tasks:
- Read the dataset loans_data.csv into Python and assign it to a variable called df.
- What was the largest loan instalment amount?
- What was the average (mean) interest rate?
- Which purpose had the highest average (mean) loan amount?
- Create a new variable risk, which is equal to "high" when the interest rate is 10% or above and "normal" when the interest rate is less than 10%.
- Find the average loan amount for each purpose with at least 1000 loans made. Arrange the results from the highest average loan amount to the lowest.
- Looking only at loans for credit cards and debt consolidation, what is the average loan amount for high-risk and for low-risk (normal) loans?
- Produce a plot for each category of purpose.
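The tasks above map fairly directly onto pandas operations. The following is a rough sketch only: every column name (`purpose`, `int_rate`, `installment`, `loan_amount`) is an assumption, the five rows are made up, and the interest rate is assumed to be stored as a fraction (0.10 for 10%). With the real file, `df` would come from `pd.read_csv("loans_data.csv")`.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for loans_data.csv; all column names are assumptions.
df = pd.DataFrame({
    "purpose": ["credit_card", "credit_card", "debt_consolidation",
                "debt_consolidation", "educational"],
    "int_rate": [0.08, 0.12, 0.10, 0.09, 0.15],
    "installment": [300.0, 450.0, 829.10, 200.0, 150.0],
    "loan_amount": [3000.0, 5000.0, 9000.0, 2000.0, 1500.0],
})

largest_installment = df["installment"].max()
mean_rate = df["int_rate"].mean()
top_purpose = df.groupby("purpose")["loan_amount"].mean().idxmax()

# "high" at 10% or above, otherwise "normal".
df["risk"] = np.where(df["int_rate"] >= 0.10, "high", "normal")


def mean_amount_by_purpose(data, min_loans=1000):
    """Mean loan amount per purpose, keeping purposes with >= min_loans loans."""
    stats = data.groupby("purpose")["loan_amount"].agg(["mean", "count"])
    return stats[stats["count"] >= min_loans].sort_values("mean", ascending=False)


# The brief asks for 1000; 2 is used here only because the sample is tiny.
frequent = mean_amount_by_purpose(df, min_loans=2)

subset = df[df["purpose"].isin(["credit_card", "debt_consolidation"])]
by_risk = subset.groupby("risk")["loan_amount"].mean()
```

For the final task, one simple option is a bar chart of loan counts per purpose via `df["purpose"].value_counts().plot(kind="bar")`. Other chart types may suit the data better.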
Formula to calculate Loan Amount:
Loan Amount = (Monthly payment × log.annual) + (Interest Rate × log.annual)
Example: (829.10 × 11) + (0.119 × 11) = 9,120.10 + 1.309 = $9,121.41
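The formula above can be written as a small helper and checked against the worked example. The function and parameter names are illustrative choices, not names taken from the dataset.

```python
def loan_amount(monthly_payment, log_annual, interest_rate):
    """Loan Amount = (monthly payment * log.annual) + (interest rate * log.annual)."""
    return monthly_payment * log_annual + interest_rate * log_annual


# Values from the example in the brief.
amount = loan_amount(829.10, 11, 0.119)
print(f"${amount:,.2f}")  # $9,121.41
```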
- OBJECTIVES OF THIS COURSEWORK
This assignment will help you to explore and analyse a set of data and reconstruct it into meaningful representations for decision making.
- TYPE
Group Assignment
- COURSEWORK DESCRIPTION
GENERAL REQUIREMENTS
- The Python program should run without errors.
- Validation should be done for each entry from the users to avoid logical errors.
- No duplication is allowed in the dataset.
- You should:
- Include good programming practices such as comments, variable naming conventions and indentation.
- Carry out additional research on the Internet to understand the given dataset when examining the data.
- The analysis should be meaningful and effective in providing information for decision making.
- Any additional features implemented must improve the retrieval effects.
- In a situation where a student:
- Is found to be involved in plagiarism, the offence will be dealt with in accordance with SEGi and UCLAN University regulations on plagiarism.
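The requirements above on duplicate records and on validating user entries could be handled along the following lines. This is a sketch under assumptions: the sample frame is made up, and `parse_year` is a hypothetical helper for one kind of user input (choosing a year), not something the brief prescribes.

```python
import pandas as pd

# Made-up frame containing one exact duplicate row.
df = pd.DataFrame({"Country": ["A", "B", "B"], "Value": [1.0, 2.0, 2.0]})

n_dupes = int(df.duplicated().sum())  # number of repeated rows
clean = df.drop_duplicates().reset_index(drop=True)


def parse_year(text, allowed=(1960, 2013)):
    """Validate a user-supplied year and return it as an int."""
    try:
        year = int(text.strip())
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None
    if year not in allowed:
        raise ValueError(f"year must be one of {allowed}, got {year}")
    return year
```

In a notebook, `parse_year(input("Year: "))` would typically sit inside a loop that repeats the prompt until no `ValueError` is raised.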
DELIVERABLES:
- The complete ipynb (source code) and report must be submitted to Blackboard.
- ipynb (Program Code):
o Name the file with your name and ID number.
o Start the first two lines of your program with your name and ID number. For example:
# NAME
# SUCKD123456
- For each question, give an ID and explain what you want to discover. For example:
# Question 1: Why staff would leave the company.
# Analysis 1-1: Find the relationship between job position with attrition
# Analysis 1-2: Find the relationship between job age and .
# Analysis 1-3: Find the relationship between
- For each extra feature, give an ID and provide an explanation. For example:
# Extra feature 1
# comments about the extra feature
DOCUMENTS: COURSEWORK REPORT
- As part of the assessment, you must submit the project report in printed and softcopy form, which should have the following format:
Cover Page:
All reports must be prepared with a front cover. A protective transparent plastic sheet can be placed in front of the report to protect the front cover. The front cover should be presented with the following details:
- Module
- Coursework Title
- Intake
- Student name and ID
- Date Assigned (the date the report was handed out).
- Date Completed (the date the report is due to be handed in).
Contents:
- Introduction and assumptions (if any)
- Data import / Cleaning / pre-processing / transformation
- Each question must start on a separate page and contain:
- Analysis Techniques - data exploration / manipulation / visualization
- Screenshot of source code with the explanation.
- Screenshot of output/plot with the explanation.
- Outline the findings based on the results obtained.
- The extra feature explanation must be on a separate page and contain:
- Screenshot of source code with the explanation.
- Screenshot of output/plot with the explanation.
- Explain how adding this extra feature can improve the results.
References
- The font size used in the report must be 12pt and the font is Times New Roman. Full source code is not allowed to be included in the report. The report must be typed and clearly printed.
- You may source algorithms and information from the Internet or books. Proper referencing of the resources should be evident in the document.
- All references must be made using the APA (American Psychological Association) referencing style as shown below:
The theory was first propounded in 1970 (Larsen, A.E. 1971), but since then has been refuted; M.K. Larsen (1983) is among those most energetic in their opposition.
/**
* Following source code obtained from (Danang, S.N. 2002)
*/
int noshape=2;
noshape=GetShape();
- List of references at the end of your document or source code must be specified in the following format:
Larsen, A.E. 1971, A Guide to the Aquatic Science Literature, McGraw-Hill, London.
Larsen, M.K. 1983, British Medical Journal [Online], Available from http://libinfor.ume.maine.edu/acquatic.htm (Accessed 19 November 1995)
Danang, S.N., 2002, Finding Similar Images [Online], The Code Project, Available from http://www.codeproject.com/bitmap/cbir.asp, [Accessed 14th September 2006]
- Further information on other types of citation is available in Petrie, A., 2003, UWE Library Services Study Skills: How to reference [online], England, University of Western England, Available from http://www.uweac.uk/library/resources/general/info_study_skills/harvard2.htm, [Accessed 4th September 2003].
ASSIGNMENT ASSESSMENT CRITERIA
The assignment assessment consists of three components: Coding (60%), Documentation (30%) and Format of Report (10%). Details of the division for each component are as follows:
| Component | Criteria | Marks Allocated |
|---|---|---|
| Coding (50%) | Data Exploration | 10% |
| | Data Manipulation | 10% |
| | Data Transformation | 10% |
| | Data Visualization | 20% |
| Documentation (30%) | Structure of the report and references. Content: description and justification of the Python concepts incorporated; program output screenshots and graphs; project description, limitations, and conclusion | 30% |
| Report Format (10%) | Well organized and fulfils the assignment's requirements | 10% |
DEVELOPMENT TOOLS
The program for this assignment should be written in Jupyter Notebook.