
Big Data Analytics Applications CSIP5203


Faculty of Computing, Engineering & Media (CEM) Coursework Brief 2024/25


Module name: Big Data Analytics Applications

Module code: CSIP5203

Title of the Assessment: Assignment 2: Big Data Analytics

This coursework item is: Summative

This summative coursework will be marked anonymously: Yes



The learning outcomes that are assessed by this coursework are:


1. Demonstrate self-direction and originality in analysing vast amounts of unstructured data using massively parallel and scalable cloud computation systems.


2. Critically evaluate data mining and machine learning algorithms using a large-scale data analytics platform.


3. Research into an area of big data ethics, integrate complex and sometimes conflicting ideas into a coherent analysis that demonstrates integrative, synoptic, and analytical skills.


This coursework is: Individual

If other or mixed ... explain here:

This coursework constitutes 60% of the overall module mark.


Date Set:

Date & Time Due (the deadline): 31st January 2025 at 12.00 noon

In accordance with the University Assessment and Feedback Policy, your marked coursework and feedback will be available to you on: 21st February 2025

You should normally receive feedback on your coursework no later than 15 University working days after the formal hand-in date, provided that you have met the submission deadline.



If for any reason this is not forthcoming by the due date, your module leader will let you know why and when it can be expected. The Associate Professor Student should be informed of any issues relating to the return of marked coursework and feedback.


When completed, you are required to submit your coursework via:


1. LearningZone Assignment Link


Late submission of coursework policy:



Late submissions will be processed in accordance with the current University regulations. Please check the regulations carefully to determine what late submission period is allowed for your programme.


Academic Offences and Bad Academic Practices:

Tasks to be undertaken:

Specifications and Assessment

This coursework brief contains the information you need to prepare your final coursework report. This coursework consists of two tasks. The marking schemes (the rubric sheets at the end of this brief) indicate how marks are given for each aspect of the tasks.

TASK 1: Analyse Data with Spark SQL (50%)

For this task, you are required to produce a practical implementation utilising Apache Spark to analyse a large dataset.

A dataset is given in this assessment brief for your analytics and report. However, if you would rather choose your own dataset from an Open Data Portal, this can be accommodated, but you must get clearance for the dataset from the module leader before using it.

Assessment Datasets

Dataset 1: Police Open Source Data

https://data.police.uk/data/



  • Download the dataset for the cities and date range that you are interested in.

  • Lends itself to various forms of analytics, such as trends over time.

  • Analysis of crime within cities and across cities (the downloaded data should cover different cities).

  • Analysis of various types of crimes and their possible prevalence in certain locations etc.

  • Datasets held for Leicester, Cumbria and Nottingham (and many others)



Open Data Portals (some)



  • https://data.gov.uk/

  • gouv.fr

  • https://data.europa.eu/euodp/en/home

  • https://data.world/datasets/financial

Best Practice

  • Well-structured presentation of your findings

  • Introduce your dataset, e.g. mention its source, what it is, the period it covers and so on

  • Annotate your program, and if relevant, you can use snippets from it in your report

  • Ensure that the analytics done are clearly outlined and discussed (both text and visually)

  • Highlight any interesting or unexpected discoveries from the analyses or from the use of Spark

  • Discuss how the analyses produce useful insights from the data

  • Mention your conclusions, possible future work

  • Reference external sources

  • Be creative

  • Ensure that the assessment is your independent work



As part of this task, ensure that the outputs presented in your report are validated with the code submitted in the separate file. The report should include annotated code snippets for clarity, but the full implementation must be present in the code file.
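For illustration only, the following minimal sketch shows one way such an annotated snippet might look: it loads data.police.uk street-level crime CSVs into Spark and runs a single Spark SQL aggregation. The folder layout (police_data/*/*-street.csv) and column names ("Falls within", "Crime type") are assumptions based on the usual street-level download format and must be checked against the files you actually download; this is not a model answer.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PoliceCrimeAnalysis").getOrCreate()

    # Read every monthly street-level CSV in the extracted download at once;
    # Spark infers the schema from the headers. The path pattern is assumed.
    crimes = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv("police_data/*/*-street.csv"))

    crimes.createOrReplaceTempView("crimes")

    # Example Spark SQL query: count crimes by force and crime type.
    by_type = spark.sql("""
        SELECT `Falls within` AS force,
               `Crime type`   AS crime_type,
               COUNT(*)       AS n_crimes
        FROM crimes
        GROUP BY `Falls within`, `Crime type`
        ORDER BY n_crimes DESC
    """)
    by_type.show(10, truncate=False)

Registering the DataFrame as a temporary view is what lets you phrase the analytics as Spark SQL; equivalent DataFrame-API queries are also acceptable, provided the report explains the choice.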

It is suggested that you complete this task by the end of Week 14.

-------------------------------------------------------------------------------------------------------

TASK 2: Implement machine learning algorithms for data analytics (50%)

For this task, you are required to produce a practical implementation utilising suitable machine learning algorithms available in the PySpark Machine Learning Library (MLlib) to classify a given dataset.

Classification is one of the important applications in data science, and it can be applied to many real-life and human-centric problems. The goal of classification is to classify unseen samples into one or more predefined classes. Building a classification model on raw data is a very important and challenging problem: raw data can be unstructured, dirty, and redundant, posing a huge challenge to the performance of the prediction.

To complete the classification task, please apply the following specific steps:



  • Firstly, pre-process the data with data cleaning and convert it into structured data.

  • Secondly, extract/select the features that describe the data.

  • Finally, conduct the classification using the extracted/selected features. (A minimal sketch of these three steps is given after the download note below.)



Please download the TitanicData file from the LearningZone shell: Module Assessments > Assessment 2: Big Data Analytics (60%) > TitanicData.
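As a starting point only, the sketch below walks through the three steps (cleaning, feature extraction/selection, classification) with PySpark MLlib. The file name TitanicData.csv and the column names (Survived, Pclass, Sex, Age, Fare) are assumptions based on the standard Titanic dataset; check them against the file provided on LearningZone, and treat the logistic regression baseline as one of several classifiers you should compare.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("TitanicClassification").getOrCreate()

    # Assumed file name and schema -- verify against the LearningZone download.
    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("TitanicData.csv"))

    # Step 1: data cleaning -- drop rows with missing values in the columns used below.
    df = df.dropna(subset=["Survived", "Pclass", "Sex", "Age", "Fare"])

    # Step 2: feature extraction/selection -- encode the categorical column and
    # assemble the chosen columns into a single feature vector.
    sex_indexer = StringIndexer(inputCol="Sex", outputCol="SexIndex")
    assembler = VectorAssembler(inputCols=["Pclass", "SexIndex", "Age", "Fare"],
                                outputCol="features")

    # Step 3: classification -- a logistic regression baseline; other MLlib models
    # (e.g. DecisionTreeClassifier, RandomForestClassifier) can be swapped in here.
    lr = LogisticRegression(featuresCol="features", labelCol="Survived")

    pipeline = Pipeline(stages=[sex_indexer, assembler, lr])
    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = pipeline.fit(train)
    predictions = model.transform(test)

Fitting the three stages inside a single Pipeline keeps the preprocessing and the model together, so the same transformations are applied consistently to the training and test splits.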

Useful Techniques

Here are some useful techniques to adopt based on the learning materials on LearningZone:



  • Data pre-processing

  • Data cleaning

  • Data normalisation

  • Data visualisation

  • Data analysis (generate your conclusion on the data with different tools)

  • Feature extraction/selection

  • Apply classification models to predict the survival rate on TitanicData

  • Experimental results analysis
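To illustrate the last point, here is a hedged sketch of one way to analyse the experimental results. It builds on the predictions DataFrame produced by the pipeline sketch above; the label column Survived and Spark's default prediction column are assumptions to verify against your own code.

    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # Overall accuracy on the held-out test split.
    evaluator = MulticlassClassificationEvaluator(labelCol="Survived",
                                                  predictionCol="prediction",
                                                  metricName="accuracy")
    print("Test accuracy:", evaluator.evaluate(predictions))

    # A simple confusion matrix: cross-tabulate true labels against predictions,
    # which can then be plotted as a figure for the report.
    predictions.groupBy("Survived", "prediction").count() \
               .orderBy("Survived", "prediction").show()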



Similarly, for this task, the complete implementation must be included in the separate code file. Outputs and results discussed in the report must correspond to the code submitted. Include visual and textual representations of results in the report, while ensuring the corresponding implementation can be reviewed in the separate file.

Use the report template provided on LearningZone to complete your submission for this assignment. You should address, but not limit yourself to, each of the following for the system:



  • Introduction: Define the problem, decompose the problem into smaller tasks and give a basic outline of what you propose to implement.

  • Details of the approach: Include any tools, modules, diagrams, and evidence of the successful execution of your code for each step, together with anything else that is necessary to explain your system/steps clearly, including snapshots of your code. More importantly, justify your choices throughout.

  • Results Analysis: Clearly describe your experimental protocols and results. Statistical analysis is recommended here. Be sure to include example output figures like the confusion matrix.

  • Discussion and conclusions: Summarise the main insights drawn from your analysis, and discuss how you improved your results during your implementation.



The code file submission is a critical component of this coursework. It will be used to verify the outputs presented in your report. While the marking will focus solely on the report, failure to submit the code file will lead to a mark of zero for this assessment.

Ensure your code file is well-organised and functional, and that it aligns with the outputs discussed in your report.

General Specifications and Assessment:



  • Word count: 2,500 (allowance: 10%). This does not include the bibliography and appendices.

  • Your work MUST be presented on A4 paper with numbered pages, using 11pt Arial font and 1.5 line spacing. Formatting, style and presentation of the report will be considered during marking.

  • All deliverables, including the report and the separate code file, must be submitted via the LearningZone platform by the specified deadline. Late submissions will be handled as per the University's regulations.

  • You are encouraged to ensure your code runs correctly before submission, as it will be used to validate the findings and outputs in the report. Avoid using inappropriate resources such as blogs from unauthorised sources, the tabloid press, Wikipedia, Techopedia, etc.


Deliverables to be submitted for assessment:


One technical report file containing the work for both tasks.


One Jupyter Python notebook code file, submitted as a separate file for both tasks, through the LearningZone platform. This file must include the full implementation of the code used to generate the outputs presented in the report.


How the work will be marked:


Marks will be awarded according to the marking grids below, considering the distribution of marks outlined in the body of the coursework brief.


Marking will be based on the report alone.


Failure to submit the code file will result in a zero mark for this assessment, as the evidence of outputs cannot be validated.





CSIP5203 Big Data Analytics Assignment 2: Big Data Analytics marking/grade rubric sheets

Marking Sheet for TASK 1: Analysing Data with Spark SQL


Criteria and grade-band descriptors (bands: Fail 0-9%, Fail 10-19%, Fail 20-29%, Fail 30-39%, Fail 40-49%, Pass 50-59%, Merit 60-69%, Distinction 70-79%, Distinction 80-89%, Distinction 90-100%)


Presentation (5%):
- Appropriate structure, writing style, etc.
- Code file submitted separately to validate outputs in the report.

  • Fail 0-9%: Code file missing, unstructured report, and unclear or inconsistent academic style.
  • Fail 10-19%: Barely recognisable structure; significant deficiencies.
  • Fail 20-29%: Minimal structure and very poor formatting.
  • Fail 30-39%: Disorganised structure with some effort toward formatting.
  • Fail 40-49%: Weak structure and unclear academic style. Formatting errors evident.
  • Pass 50-59%: Basic structure and style; noticeable formatting inconsistencies. Code file submitted but with issues or partially aligns with the report.
  • Merit 60-69%: Clear presentation; minor formatting or stylistic weaknesses. Code file submitted but with issues or partially aligns with the report.
  • Distinction 70-79%: Well-structured and coherent, with strong adherence to academic norms. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.
  • Distinction 80-89%: Highly professional and clear presentation with minor inconsistencies. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.
  • Distinction 90-100%: Exceptional clarity, structure, and academic style. Impeccable adherence to formatting guidelines. Code file submitted, well-structured, and aligns seamlessly with the outputs and discussions in the report.


Introduction of dataset (5%):
- Introduce your dataset, e.g. mention its source, what it is, the period it covers, what it represents and so on.

  • Fail 0-9%: No attempt to describe the dataset.
  • Fail 10-19%: No meaningful dataset description; irrelevant or erroneous.
  • Fail 20-29%: Inadequate or incorrect dataset description; lacks relevance.
  • Fail 30-39%: Minimal dataset information; limited relevance or accuracy.
  • Fail 40-49%: Weak description with errors or missing critical details.
  • Pass 50-59%: Basic description; lacks depth or detailed connection to the task.
  • Merit 60-69%: Clear dataset description with adequate relevance.
  • Distinction 70-79%: Accurate dataset description with appropriate detail and relevance.
  • Distinction 80-89%: Detailed description of dataset with strong connection to the task.
  • Distinction 90-100%: Comprehensive and insightful description of dataset, including origin, coverage, and significance.


Appropriate annotation of code (5%):
- Annotate your program, and if relevant, you can use snippets from it in your report.

  • Fail 0-9%: No annotations provided.
  • Fail 10-19%: Barely any annotations; hinders code comprehension.
  • Fail 20-29%: Very weak annotations, impeding understanding.
  • Fail 30-39%: Minimal or inconsistent annotations.
  • Fail 40-49%: Limited annotations reducing clarity.
  • Pass 50-59%: Basic annotations; some parts lack explanations.
  • Merit 60-69%: Adequate annotations, though some areas lack depth or clarity.
  • Distinction 70-79%: Code is well-annotated, enhancing clarity.
  • Distinction 80-89%: Clear and consistent annotations with minimal omissions.
  • Distinction 90-100%: Comprehensive and insightful annotations enhancing understanding and usability.


Depth, breadth and variety of data analytics (15%):
- Data analysed appropriately, in accordance with the chosen method, demonstrating depth and breadth.

  • Fail 0-9%: No analytics performed.
  • Fail 10-19%: Negligible or incoherent analytics.
  • Fail 20-29%: Very weak analytics; results are superficial and poorly developed.
  • Fail 30-39%: Minimal analytics performed; lacks critical thinking.
  • Fail 40-49%: Limited analytical approaches, lacking depth or critical insights.
  • Pass 50-59%: Basic analytics presented but lacks depth or variety.
  • Merit 60-69%: Adequate depth with minor gaps or limited variety.
  • Distinction 70-79%: Good analytical depth, demonstrating critical reflection and some originality.
  • Distinction 80-89%: Strong analytical depth with minor omissions; demonstrates critical thinking.
  • Distinction 90-100%: Exceptional analytical depth and variety, demonstrating innovative and insightful approaches.


Data analytics (types and representations of analytics) (15%):
- Ensure that the analytics done are clearly outlined and discussed, both in text form and in visual representations.

  • Fail 0-9%: No representations or relevant text provided.
  • Fail 10-19%: Barely any visualisation or meaningful explanation.
  • Fail 20-29%: Very weak or irrelevant representations.
  • Fail 30-39%: Minimal effort in visualisation or text explanation.
  • Fail 40-49%: Poor representation; visuals/text are unclear or lack insight.
  • Pass 50-59%: Basic representations provided but lack clarity or integration.
  • Merit 60-69%: Adequate representation, with some gaps in clarity or integration.
  • Distinction 70-79%: Clear representation using appropriate visuals and text.
  • Distinction 80-89%: Excellent visuals and textual explanation with minimal weaknesses.
  • Distinction 90-100%: Exceptional integration of visuals and text, conveying insights with high originality.


Discussion/Findings (5%):
- Data analysis and literature are critically discussed with clear arguments.
- Discuss how the analytics produces useful insights from the data.

  • Fail 0-9%: No meaningful discussion or findings presented.
  • Fail 10-19%: Barely any discussion or connection to findings.
  • Fail 20-29%: Very weak discussion; lacks coherence and depth.
  • Fail 30-39%: Minimal discussion; lacks critical insights or literature.
  • Fail 40-49%: Limited discussion; findings weakly connected to arguments or literature.
  • Pass 50-59%: Findings are presented but lack critical evaluation or literature connection.
  • Merit 60-69%: Adequate discussion; lacks depth or minor gaps in literature use.
  • Distinction 70-79%: Clear discussion of findings with evidence of critical thinking and literature support.
  • Distinction 80-89%: Findings are well-discussed, with strong arguments and relevant literature.
  • Distinction 90-100%: Findings are critically evaluated with exceptional insight, supported by literature and clear arguments.

Marking/Grade Sheet for Task 2: Implement machine learning algorithms for data analytics


Criteria and grade-band descriptors (bands: Fail 0-9%, Fail 10-19%, Fail 20-29%, Fail 30-39%, Fail 40-49%, Pass 50-59%, Merit 60-69%, Distinction 70-79%, Distinction 80-89%, Distinction 90-100%)


Introduction (5%):
- Understanding of the given problems
- Problem decomposition into subtasks
- Literature reviews (the most relevant ones)
- Include your innovative thinking after the review

  • Fail 0-9%: No introduction or problem understanding presented.
  • Fail 10-19%: Barely recognisable problem understanding or review.
  • Fail 20-29%: Very weak introduction; lacks coherence and relevance.
  • Fail 30-39%: Minimal effort in decomposition or literature review.
  • Fail 40-49%: Weak problem decomposition; irrelevant or minimal literature review.
  • Pass 50-59%: Basic decomposition with minimal or generic literature review.
  • Merit 60-69%: Adequate problem decomposition; minor weaknesses in literature review.
  • Distinction 70-79%: Clear problem decomposition and appropriate literature review.
  • Distinction 80-89%: Detailed problem decomposition and thorough, relevant literature review.
  • Distinction 90-100%: Exceptional problem decomposition and insightful literature review, showcasing advanced understanding.


Approach (15%):
- A clear description of the chosen approach for each step.
- Justification that the chosen solutions/tools are the most appropriate choice for the given scenario (compare with different models/functions that you may choose as alternatives before the experiments).
- Use a flowchart/diagram to demonstrate the logical data flow of your finalised approach intuitively.
- Code file submitted separately to validate outputs in the report.

  • Fail 0-9%: No approach described or justified. Code file not submitted.
  • Fail 10-19%: Barely any description or justification.
  • Fail 20-29%: Very weak or irrelevant approach description.
  • Fail 30-39%: Minimal effort in explaining or justifying the approach.
  • Fail 40-49%: Weak or unclear approach; no alternative considerations.
  • Pass 50-59%: Basic approach; minimal justification or alternative considerations. Code file submitted but with issues (e.g., not executable or mismatched outputs).
  • Merit 60-69%: Adequate approach with basic justification; limited consideration of alternatives.
  • Distinction 70-79%: Logical and well-described approach, with justification and minor gaps. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.
  • Distinction 80-89%: Clear and well-justified approach with consideration of alternatives. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.
  • Distinction 90-100%: Highly innovative approach, clearly described and justified, with robust comparisons to alternatives. Code file submitted, complete, and aligns perfectly with the outputs presented in the report.


Results analysis (20%):
- A clear description of the processed data after each step.
- Visualise the results after each step with a description.
- Draw your conclusions about the data during processing.
- Clear explanation of the experimental results.

  • Fail 0-9%: No results analysis or evaluation provided.
  • Fail 10-19%: Barely any results analysis or meaningful evaluation.
  • Fail 20-29%: Very weak analysis; lacks coherence and detail.
  • Fail 30-39%: Minimal results analysis with superficial visualisation or evaluation.
  • Fail 40-49%: Limited analysis; visualisation and statistical evaluation could be clearer and stronger.
  • Pass 50-59%: Basic analysis provided; lacks depth in visualisation or statistical evaluation.
  • Merit 60-69%: Adequate analysis; minor gaps in clarity or statistical evaluation.
  • Distinction 70-79%: Clear results analysis with appropriate visualisation and statistical evaluation.
  • Distinction 80-89%: Results are thoroughly analysed and visualised with strong statistical evaluation.
  • Distinction 90-100%: Results are critically analysed with advanced statistical evaluation and impactful visualisation.


Discussion and conclusions (10%):
- Evaluation of the performance of the chosen approach.
- Compare the different models that you have chosen with a detailed discussion, and explain how you improved the models/results over multiple trials.
- Issues encountered are described, and solutions/actions taken are justified.
- Alternative solutions and suggestions for future work.

  • Fail 0-9%: No meaningful conclusions presented.
  • Fail 10-19%: Barely recognisable conclusions or suggestions.
  • Fail 20-29%: Very weak or irrelevant conclusions.
  • Fail 30-39%: Minimal effort in providing conclusions or evaluation.
  • Fail 40-49%: Weak conclusions; minimal evaluation or irrelevant suggestions.
  • Pass 50-59%: Basic conclusions; limited evaluation or future suggestions.
  • Merit 60-69%: Adequate conclusions; lacks depth or minor gaps in evaluation or suggestions.
  • Distinction 70-79%: Clear conclusions with a good evaluation of performance and some future suggestions.
  • Distinction 80-89%: Strong conclusions with critical performance evaluation and clear suggestions for future work.
  • Distinction 90-100%: Conclusions are highly insightful, critically evaluate performance, and propose innovative future solutions.
