
Big Data Analytics Applications CSIP5203


Faculty of Computing, Engineering & Media (CEM) Coursework Brief 2024/25


Module name: Big Data Analytics Applications

Module code: CSIP5203

Title of the Assessment: Assignment 2: Big Data Analytics

This coursework item is: Summative

This summative coursework will be marked anonymously: Yes



The learning outcomes that are assessed by this coursework are:


1. Demonstrate self-direction and originality in analysing vast amounts of unstructured data using massively parallel and scalable cloud computation systems.


2. Critically evaluate data mining and machine learning algorithms using a large-scale data analytics platform.


3. Research into an area of big data ethics, integrate complex and sometimes conflicting ideas into a coherent analysis that demonstrates integrative, synoptic, and analytical skills.


This coursework is: Individual

If other or mixed ... explain here:

This coursework constitutes 60% of the overall module mark.


Date Set:

Date & Time Due (the deadline): 31st January 2025 at 12.00 noon

In accordance with the University Assessment and Feedback Policy, your marked coursework and feedback will be available to you on: 21st February 2025

You should normally receive feedback on your coursework no later than 15 University working days after the formal hand-in date, provided that you have met the submission deadline.



If for any reason this is not forthcoming by the due date, your module leader will let you know why and when it can be expected. The Associate Professor Student should be informed of any issues relating to the return of marked coursework and feedback.


When completed, you are required to submit your coursework via:


1. LearningZone Assignment Link


Late submission of coursework policy:



Late submissions will be processed in accordance with the current University regulations. Please check the regulations carefully to determine what late submission period is allowed for your programme.


Academic Offences and Bad Academic Practices:

Tasks to be undertaken:

Specifications and Assessment

This coursework brief contains the information you need to prepare your final coursework report. This coursework consists of two tasks. The marking schemes (the rubric sheets at the end of this brief) indicate how marks are given for each aspect of the tasks.

TASK 1: Analyse Data with Spark SQL (50%)

For this task, you are required to produce a practical implementation utilising Apache Spark to analyse a large dataset.

A dataset is given in this assessment brief for your analytics and report. However, if you would rather choose your own dataset from an Open Data Portal, this can be accommodated, but you must get clearance for the dataset from the module leader before using it.

Assessment Datasets

Dataset 1: Police Open Source Data

https://data.police.uk/data/



  • Download the dataset for the cities and date range that you are interested in.

  • Lends itself to various forms of analytics, such as trends over time.

  • Analysis of crime within cities and across cities (the downloaded data should cover different cities).

  • Analysis of various types of crimes and their possible prevalence in certain locations etc.

  • Datasets held for Leicester, Cumbria and Nottingham (and many others)



Open Data Portals (some)



  • https://data.gov.uk/

  • gouv.fr

  • https://data.europa.eu/euodp/en/home

  • https://data.world/datasets/financial

Best Practice

  • Well-structured presentation of your findings

  • Introduce your dataset, e.g. mention its source, what it is, the period it covers and so on

  • Annotate your program, and if relevant, you can use snippets from it in your report

  • Ensure that the analytics done are clearly outlined and discussed (both text and visually)

  • Highlight any interesting or unexpected discoveries from the analyses or from the use of Spark

  • Discuss how the analyses produce useful insights from the data

  • Mention your conclusions, possible future work

  • Reference external sources

  • Be creative

  • Ensure that the assessment is your independent work



As part of this task, ensure that the outputs presented in your report are validated with the code submitted in the separate file. The report should include annotated code snippets for clarity, but the full implementation must be present in the code file.
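For illustration only, the following minimal sketch shows one way such an annotated snippet might look: it loads data.police.uk street-level crime CSVs into Spark and runs a single Spark SQL aggregation. The folder layout (police_data/*/*-street.csv) and column names ("Falls within", "Crime type") are assumptions based on the usual street-level download format and must be checked against the files you actually download; this is not a model answer.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PoliceCrimeAnalysis").getOrCreate()

    # Read every monthly street-level CSV in the extracted download at once;
    # Spark infers the schema from the headers. The path pattern is assumed.
    crimes = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv("police_data/*/*-street.csv"))

    crimes.createOrReplaceTempView("crimes")

    # Example Spark SQL query: count crimes by force and crime type.
    by_type = spark.sql("""
        SELECT `Falls within` AS force,
               `Crime type`   AS crime_type,
               COUNT(*)       AS n_crimes
        FROM crimes
        GROUP BY `Falls within`, `Crime type`
        ORDER BY n_crimes DESC
    """)
    by_type.show(10, truncate=False)

Registering the DataFrame as a temporary view is what lets you phrase the analytics as Spark SQL; equivalent DataFrame-API queries are also acceptable, provided the report explains the choice.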

It is suggested that you complete this task by the end of Week 14.

-------------------------------------------------------------------------------------------------------

TASK 2: Implement machine learning algorithms for data analytics (50%)

For this task, you are required to produce a practical implementation utilising suitable machine learning algorithms available in the PySpark Machine Learning Library (MLlib) to classify a given dataset.

Classification is one of the important applications in data science, and it can be applied to many real-life and human-centric problems. The goal of classification is to classify unseen samples into one or more predefined classes. Building a classification model on raw data is a very important and challenging problem: raw data can be unstructured, dirty, and redundant, posing a huge challenge to the performance of the prediction.

To complete the classification task, please apply the following specific steps:



  • Firstly, pre-process the data with data cleaning and convert it into structured data.

  • Secondly, extract/select the features that describe the data.

  • Finally, conduct the classification using the extracted/selected features. (A minimal sketch of these three steps is given after the download note below.)



Please download the TitanicData file from the LearningZone shell: Module Assessments > Assessment 2: Big Data Analytics (60%) > TitanicData.
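As a starting point only, the sketch below walks through the three steps (cleaning, feature extraction/selection, classification) with PySpark MLlib. The file name TitanicData.csv and the column names (Survived, Pclass, Sex, Age, Fare) are assumptions based on the standard Titanic dataset; check them against the file provided on LearningZone, and treat the logistic regression baseline as one of several classifiers you should compare.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("TitanicClassification").getOrCreate()

    # Assumed file name and schema -- verify against the LearningZone download.
    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("TitanicData.csv"))

    # Step 1: data cleaning -- drop rows with missing values in the columns used below.
    df = df.dropna(subset=["Survived", "Pclass", "Sex", "Age", "Fare"])

    # Step 2: feature extraction/selection -- encode the categorical column and
    # assemble the chosen columns into a single feature vector.
    sex_indexer = StringIndexer(inputCol="Sex", outputCol="SexIndex")
    assembler = VectorAssembler(inputCols=["Pclass", "SexIndex", "Age", "Fare"],
                                outputCol="features")

    # Step 3: classification -- a logistic regression baseline; other MLlib models
    # (e.g. DecisionTreeClassifier, RandomForestClassifier) can be swapped in here.
    lr = LogisticRegression(featuresCol="features", labelCol="Survived")

    pipeline = Pipeline(stages=[sex_indexer, assembler, lr])
    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = pipeline.fit(train)
    predictions = model.transform(test)

Fitting the three stages inside a single Pipeline keeps the preprocessing and the model together, so the same transformations are applied consistently to the training and test splits.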

Useful Techniques

Here are some useful techniques to adopt based on the learning materials on LearningZone:



  • Data pre-processing

  • Data cleaning

  • Data normalisation

  • Data visualisation

  • Data analysis (generate your conclusion on the data with different tools)

  • Feature extraction/selection

  • Apply classification models to predict the survival rate on TitanicData

  • Experimental results analysis
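To illustrate the last point, here is a hedged sketch of one way to analyse the experimental results. It builds on the predictions DataFrame produced by the pipeline sketch above; the label column Survived and Spark's default prediction column are assumptions to verify against your own code.

    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # Overall accuracy on the held-out test split.
    evaluator = MulticlassClassificationEvaluator(labelCol="Survived",
                                                  predictionCol="prediction",
                                                  metricName="accuracy")
    print("Test accuracy:", evaluator.evaluate(predictions))

    # A simple confusion matrix: cross-tabulate true labels against predictions,
    # which can then be plotted as a figure for the report.
    predictions.groupBy("Survived", "prediction").count() \
               .orderBy("Survived", "prediction").show()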



Similarly, for this task, the complete implementation must be included in the separate code file. Outputs and results discussed in the report must correspond to the code submitted. Include visual and textual representations of results in the report, while ensuring the corresponding implementation can be reviewed in the separate file.

Use the report template provided on LearningZone to complete your submission for this assignment. You should address, but not limit yourself to, each of the following for the system:



  • Introduction: Define the problem, decompose the problem into smaller tasks and give a basic outline of what you propose to implement.

  • Details of the approach: Include any tools, modules, diagrams, and evidence of the successful execution of your code for each step, together with anything else that is necessary to explain your system/steps clearly, including snapshots of your code. More importantly, justify your choices throughout.

  • Results Analysis: Clearly describe your experimental protocols and results. Statistical analysis is recommended here. Be sure to include example output figures like the confusion matrix.

  • Discussion and conclusions: Summarise the main insights drawn from your analysis, and discuss how you improved your results during your implementation.



The code file submission is a critical component of this coursework. It will be used to verify the outputs presented in your report. While the marking will focus solely on the report, failure to submit the code file will lead to a mark of zero for this assessment.

Ensure your code file is well-organised and functional, and that it aligns with the outputs discussed in your report.

General Specifications and Assessment:



  • Word count: 2,500 (allowance: 10%). This does not include the bibliography and appendices.

  • Your work MUST be presented on A4 paper with numbered pages, using 11pt Arial font and 1.5 line spacing. Formatting, style and presentation of the report will be considered during marking.

  • All deliverables, including the report and the separate code file, must be submitted via the LearningZone platform by the specified deadline. Late submissions will be handled as per the University's regulations.

  • You are encouraged to ensure your code runs correctly before submission, as it will be used to validate the findings and outputs in the report. Avoid using inappropriate resources such as blogs from unauthorised sources, the tabloid press, Wikipedia, Techopedia, etc.


Deliverables to be submitted for assessment:


One technical report file containing the work for both tasks.


One Jupyter Python notebook code file, submitted as a separate file for both tasks, through the LearningZone platform. This file must include the full implementation of the code used to generate the outputs presented in the report.


How the work will be marked:


Marks will be awarded according to the marking grids below, considering the distribution of marks outlined in the body of the coursework brief.


Marking will be based on the report alone.


Failure to submit the code file will result in a zero mark for this assessment, as the evidence of outputs cannot be validated.





CSIP5203 Big Data Analytics Assignment 2: Big Data Analytics marking/grade rubric sheets

Marking Sheet for TASK 1: Analysing Data with Spark SQL


Criteria and grade-band descriptors (bands: Fail 0-9%, Fail 10-19%, Fail 20-29%, Fail 30-39%, Fail 40-49%, Pass 50-59%, Merit 60-69%, Distinction 70-79%, Distinction 80-89%, Distinction 90-100%)


Presentation (5%):
- Appropriate structure, writing style, etc.
- Code file submitted separately to validate outputs in the report.

  • Fail 0-9%: Code file missing, unstructured report, and unclear or inconsistent academic style.
  • Fail 10-19%: Barely recognisable structure; significant deficiencies.
  • Fail 20-29%: Minimal structure and very poor formatting.
  • Fail 30-39%: Disorganised structure with some effort toward formatting.
  • Fail 40-49%: Weak structure and unclear academic style. Formatting errors evident.
  • Pass 50-59%: Basic structure and style; noticeable formatting inconsistencies. Code file submitted but with issues or partially aligns with the report.
  • Merit 60-69%: Clear presentation; minor formatting or stylistic weaknesses. Code file submitted but with issues or partially aligns with the report.
  • Distinction 70-79%: Well-structured and coherent, with strong adherence to academic norms. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.
  • Distinction 80-89%: Highly professional and clear presentation with minor inconsistencies. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.
  • Distinction 90-100%: Exceptional clarity, structure, and academic style. Impeccable adherence to formatting guidelines. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.


Introduction of dataset (5%):
- Introduce your dataset, e.g. mention its source, what it is, the period it covers, what it represents and so on.

  • Fail 0-9%: No attempt to describe the dataset.
  • Fail 10-19%: No meaningful dataset description; irrelevant or erroneous.
  • Fail 20-29%: Inadequate or incorrect dataset description; lacks relevance.
  • Fail 30-39%: Minimal dataset information; limited relevance or accuracy.
  • Fail 40-49%: Weak description with errors or missing critical details.
  • Pass 50-59%: Basic description; lacks depth or detailed connection to the task.
  • Merit 60-69%: Clear dataset description with adequate relevance.
  • Distinction 70-79%: Accurate dataset description with appropriate detail and relevance.
  • Distinction 80-89%: Detailed description of dataset with strong connection to the task.
  • Distinction 90-100%: Comprehensive and insightful description of dataset, including origin, coverage, and significance.


Appropriate annotation of code (5%):
- Annotate your program, and if relevant, you can use snippets from it in your report.

  • Fail 0-9%: No annotations provided.
  • Fail 10-19%: Barely any annotations; hinders code comprehension.
  • Fail 20-29%: Very weak annotations, impeding understanding.
  • Fail 30-39%: Minimal or inconsistent annotations.
  • Fail 40-49%: Limited annotations reducing clarity.
  • Pass 50-59%: Basic annotations; some parts lack explanations.
  • Merit 60-69%: Adequate annotations, though some areas lack depth or clarity.
  • Distinction 70-79%: Code is well-annotated, enhancing clarity.
  • Distinction 80-89%: Clear and consistent annotations with minimal omissions.
  • Distinction 90-100%: Comprehensive and insightful annotations enhancing understanding and usability.


Depth, breadth and variety of data analytics (15%):
- Data analysed appropriately, in accordance with the chosen method, demonstrating depth and breadth.

  • Fail 0-9%: No analytics performed.
  • Fail 10-19%: Negligible or incoherent analytics.
  • Fail 20-29%: Very weak analytics; results are superficial and poorly developed.
  • Fail 30-39%: Minimal analytics performed; lacks critical thinking.
  • Fail 40-49%: Limited analytical approaches, lacking depth or critical insights.
  • Pass 50-59%: Basic analytics presented but lacks depth or variety.
  • Merit 60-69%: Adequate depth with minor gaps or limited variety.
  • Distinction 70-79%: Good analytical depth, demonstrating critical reflection and some originality.
  • Distinction 80-89%: Strong analytical depth with minor omissions; demonstrates critical thinking.
  • Distinction 90-100%: Exceptional analytical depth and variety, demonstrating innovative and insightful approaches.


Data analytics (types and representations of analytics) (15%):
- Ensure that the analytics done are clearly outlined and discussed, both in text form and in visual representations.

  • Fail 0-9%: No representations or relevant text provided.
  • Fail 10-19%: Barely any visualisation or meaningful explanation.
  • Fail 20-29%: Very weak or irrelevant representations.
  • Fail 30-39%: Minimal effort in visualisation or text explanation.
  • Fail 40-49%: Poor representation; visuals/text are unclear or lack insight.
  • Pass 50-59%: Basic representations provided but lack clarity or integration.
  • Merit 60-69%: Adequate representation, with some gaps in clarity or integration.
  • Distinction 70-79%: Clear representation using appropriate visuals and text.
  • Distinction 80-89%: Excellent visuals and textual explanation with minimal weaknesses.
  • Distinction 90-100%: Exceptional integration of visuals and text, conveying insights with high originality.


Discussion/Findings (5%):
- Data analysis and literature are critically discussed with clear arguments.
- Discuss how the analytics produces useful insights from the data.

  • Fail 0-9%: No meaningful discussion or findings presented.
  • Fail 10-19%: Barely any discussion or connection to findings.
  • Fail 20-29%: Very weak discussion; lacks coherence and depth.
  • Fail 30-39%: Minimal discussion; lacks critical insights or literature.
  • Fail 40-49%: Limited discussion; findings weakly connected to arguments or literature.
  • Pass 50-59%: Findings are presented but lack critical evaluation or literature connection.
  • Merit 60-69%: Adequate discussion; lacks depth or minor gaps in literature use.
  • Distinction 70-79%: Clear discussion of findings with evidence of critical thinking and literature support.
  • Distinction 80-89%: Findings are well-discussed, with strong arguments and relevant literature.
  • Distinction 90-100%: Findings are critically evaluated with exceptional insight, supported by literature and clear arguments.

Marking/Grade Sheet for Task 2: Implement machine learning algorithms for data analytics


Criteria and grade-band descriptors (bands: Fail 0-9%, Fail 10-19%, Fail 20-29%, Fail 30-39%, Fail 40-49%, Pass 50-59%, Merit 60-69%, Distinction 70-79%, Distinction 80-89%, Distinction 90-100%)


Introduction (5%):
- Understanding of the given problems
- Problem decomposition into subtasks
- Literature reviews (the most relevant ones)
- Include your innovative thinking after the review

  • Fail 0-9%: No introduction or problem understanding presented.
  • Fail 10-19%: Barely recognisable problem understanding or review.
  • Fail 20-29%: Very weak introduction; lacks coherence and relevance.
  • Fail 30-39%: Minimal effort in decomposition or literature review.
  • Fail 40-49%: Weak problem decomposition; irrelevant or minimal literature review.
  • Pass 50-59%: Basic decomposition with minimal or generic literature review.
  • Merit 60-69%: Adequate problem decomposition; minor weaknesses in literature review.
  • Distinction 70-79%: Clear problem decomposition and appropriate literature review.
  • Distinction 80-89%: Detailed problem decomposition and thorough, relevant literature review.
  • Distinction 90-100%: Exceptional problem decomposition and insightful literature review, showcasing advanced understanding.


Approach (15%):
- A clear description of the chosen approach for each step.
- Justification that the chosen solutions/tools are the most appropriate choice for the given scenario (compare with different models/functions that you may choose as alternatives before the experiments).
- Use a flowchart/diagram to demonstrate the logical data flow of your finalised approach intuitively.
- Code file submitted separately to validate outputs in the report.

  • Fail 0-9%: No approach described or justified. Code file not submitted.
  • Fail 10-19%: Barely any description or justification.
  • Fail 20-29%: Very weak or irrelevant approach description.
  • Fail 30-39%: Minimal effort in explaining or justifying the approach.
  • Fail 40-49%: Weak or unclear approach; no alternative considerations.
  • Pass 50-59%: Basic approach; minimal justification or alternative considerations. Code file submitted but with issues (e.g., not executable or mismatched outputs).
  • Merit 60-69%: Adequate approach with basic justification; limited consideration of alternatives.
  • Distinction 70-79%: Logical and well-described approach, with justification and minor gaps. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.
  • Distinction 80-89%: Clear and well-justified approach with consideration of alternatives. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.
  • Distinction 90-100%: Highly innovative approach, clearly described and justified, with robust comparisons to alternatives. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.


Results analysis (20%):
- A clear description of the processed data after each step.
- Visualise the results after each step with a description.
- Draw your conclusions about the data during processing.
- Clear explanation of the experimental results.

  • Fail 0-9%: No results analysis or evaluation provided.
  • Fail 10-19%: Barely any results analysis or meaningful evaluation.
  • Fail 20-29%: Very weak analysis; lacks coherence and detail.
  • Fail 30-39%: Minimal results analysis with superficial visualisation or evaluation.
  • Fail 40-49%: Limited analysis; visualisation and statistical evaluation could be clearer and stronger.
  • Pass 50-59%: Basic analysis provided; lacks depth in visualisation or statistical evaluation.
  • Merit 60-69%: Adequate analysis; minor gaps in clarity or statistical evaluation.
  • Distinction 70-79%: Clear results analysis with appropriate visualisation and statistical evaluation.
  • Distinction 80-89%: Results are thoroughly analysed and visualised with strong statistical evaluation.
  • Distinction 90-100%: Results are critically analysed with advanced statistical evaluation and impactful visualisation.


Discussion and conclusions (10%):
- Evaluation of the performance of the chosen approach.
- Compare the different models that you have chosen with a detailed discussion, and explain how you improved the models/results over multiple trials.
- Issues encountered are described, and solutions/actions taken are justified.
- Alternative solutions and suggestions for future work.

  • Fail 0-9%: No meaningful conclusions presented.
  • Fail 10-19%: Barely recognisable conclusions or suggestions.
  • Fail 20-29%: Very weak or irrelevant conclusions.
  • Fail 30-39%: Minimal effort in providing conclusions or evaluation.
  • Fail 40-49%: Weak conclusions; minimal evaluation or irrelevant suggestions.
  • Pass 50-59%: Basic conclusions; limited evaluation or future suggestions.
  • Merit 60-69%: Adequate conclusions; lacks depth or minor gaps in evaluation or suggestions.
  • Distinction 70-79%: Clear conclusions with a good evaluation of performance and some future suggestions.
  • Distinction 80-89%: Strong conclusions with critical performance evaluation and clear suggestions for future work.
  • Distinction 90-100%: Conclusions are highly insightful, critically evaluate performance, and propose innovative future solutions.
