Your assignment should be submitted through the link provided on Blackboard and should be separated into two formats: first, a single PDF report named Surname_studentid.pdf, and second, a zipped file containing your code, also named Surname_studentid.zip. The zip file should contain working Python code for all parts of the assessment, in either .py or .ipynb format. You should also include a clearly written description of each file and its use in a Read Me.txt file in the same zip file.

Assessment task details and instructions

This coursework will allow you to use the machine learning and data mining techniques covered in this module to analyse datasets that interest you, to draw conclusions based on your analysis, and finally to present your results in the form of a report.

There are three tasks in this assignment. For each task, you should choose a dataset amenable to the data mining task. For example, for Task 1 you should choose a dataset that contains a potential target variable suitable for classification. You should use a different dataset for each task.

For each task, you must fully explain and document your experimental process, including the exploratory data analysis, data preparation and cleaning, and the algorithms selected. A writing frame is provided later in this brief, and you should complete a report for each of the three tasks (please provide all three reports in one PDF document). Overall, your submission should be around 5,500 words, excluding references.

A report framework at the end of this brief gives more guidance on the expected contents of the report.

Task 1 (40 Marks)

a) Apply two classification algorithms of your choice to your chosen dataset using Python. Compare the performance of the two algorithms, justifying your choice of performance metrics. You should critically evaluate your classification models and recommend which, if any, would be appropriate for future deployment. (25 Marks)

Classification algorithms: logistic regression, k-nearest neighbours.

b) Use Azure Machine Learning Designer to apply two classification algorithms to the same dataset as you used for part a). (15 Marks)
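To illustrate what part a) involves, here is a minimal sketch comparing logistic regression and k-nearest neighbours using only the Python standard library, so that every step is visible. It is not a model answer: the two-class data come from a hypothetical generator (`make_blobs`), and the 70/30 hold-out split, k = 5, the learning rate and the epoch count are arbitrary illustrative assumptions. Accuracy is used here only because the toy classes are balanced. In your own work you would more likely use scikit-learn's `LogisticRegression` and `KNeighborsClassifier` on your chosen dataset.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_class, seed=0):
    """Hypothetical toy data: class 0 centred at (0, 0), class 1 at (4, 4), unit spread."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def logistic_predict(w, b, x):
    """Predict class 1 when the modelled probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Shuffle, then hold out 30% (60 of 200 rows) as a test set.
X, y = make_blobs(100)
order = list(range(len(X)))
random.Random(1).shuffle(order)
train_idx, test_idx = order[:140], order[140:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

knn_acc = accuracy(y_test, [knn_predict(X_train, y_train, x) for x in X_test])
w, b = fit_logistic_regression(X_train, y_train)
lr_acc = accuracy(y_test, [logistic_predict(w, b, x) for x in X_test])
print(f"k-NN accuracy: {knn_acc:.3f}, logistic regression accuracy: {lr_acc:.3f}")
```

Because k-NN relies on distances, features on very different scales should normally be normalised before fitting it. The toy features here already share a scale.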

Task 2 (35 Marks)

Apply two clustering algorithms to a dataset of your choice using Python. You should provide an analysis and evaluation of the clusters identified and discuss which clustering method may be better suited to your data.
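One way to evaluate clusters without ground-truth labels is the silhouette coefficient, which compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b). The minimal sketch below applies k-means (Lloyd's algorithm with k-means++ seeding and several restarts) to hypothetical two-dimensional data and uses the mean silhouette to compare values of k. It covers only one algorithm; a second method (for example hierarchical clustering or DBSCAN, both available in scikit-learn) could be scored the same way. The data generator, the spread and `n_init=5` are illustrative assumptions.

```python
import math
import random


def make_blobs(centres, n_per_centre, spread=0.5, seed=0):
    """Hypothetical toy data: Gaussian blobs around the given centres."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centres for _ in range(n_per_centre)]


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to squared distance."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = kmeans_pp_init(points, k, random.Random(seed))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members))
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, n_init=5):
    """Run k-means from several seeds and keep the lowest-inertia solution."""
    return min((kmeans(points, k, seed=s) for s in range(n_init)),
               key=lambda result: inertia(points, *result))


def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b); needs at least two clusters."""
    n = len(points)

    def mean_dist(i, c):
        d = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == c]
        return sum(d) / len(d)

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # convention: a singleton cluster contributes 0
        a = mean_dist(i, labels[i])
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n


points = make_blobs([(0, 0), (6, 0), (0, 6)], n_per_centre=50)
scores = {}
for k in (2, 3, 4):
    labels, centroids = best_of(points, k)
    scores[k] = silhouette(points, labels)
    print(f"k={k}: inertia={inertia(points, labels, centroids):.1f}, silhouette={scores[k]:.3f}")
best_k = max(scores, key=scores.get)
```

Inertia always falls as k grows, so it cannot choose k on its own; the silhouette penalises splitting a natural group, which is why it is used here to compare k.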

Task 3 (25 Marks)

Using Python, apply text mining and sentiment analysis to a text dataset of your choice. Apply the necessary preprocessing steps and analyse your results.
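A minimal lexicon-based pipeline shows the typical preprocessing steps (lower-casing, tokenisation, stop-word removal), followed by a sentiment label and simple term frequencies. The stop-word list, the sentiment lexicon, the one-token negation rule and the example reviews below are small hypothetical stand-ins. For real data you would normally rely on an established resource, such as NLTK's VADER analyser or a trained classifier, and discuss its limitations.

```python
import re
from collections import Counter

# Tiny hypothetical resources; a real project would use a full stop-word list
# and a validated sentiment lexicon rather than these hand-written examples.
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}
NEGATIONS = {"not", "no", "never"}
LEXICON = {"love": 2, "excellent": 2, "great": 2, "good": 1,
           "bad": -1, "poor": -1, "terrible": -2, "hate": -2}


def preprocess(text):
    """Lower-case, keep alphabetic tokens (and apostrophes), drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def sentiment(tokens):
    """Sum lexicon scores; a negation word flips the polarity of the next token only."""
    score, negate = 0, False
    for t in tokens:
        if t in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(t, 0)
        score += -value if negate else value
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


reviews = [
    "I love this product, it is excellent!",
    "Terrible service and bad food.",
    "The delivery was not good.",
    "It arrived on Tuesday.",
]
tokenised = [preprocess(r) for r in reviews]
labels = [sentiment(toks) for toks in tokenised]
term_counts = Counter(t for toks in tokenised for t in toks)
print(labels)  # ['positive', 'negative', 'negative', 'neutral']
print(term_counts.most_common(3))
```

Note that negation words are deliberately kept out of the stop-word list; removing them would turn "not good" into "good" and flip the label.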

Assessment criteria

Assessment criteria are provided alongside this brief.

You should consult the assessment criteria to find out specifically what we will be looking for during marking.

Assessed intended learning outcomes

On successful completion of this assessment, you will be able to:

Knowledge and Understanding

4. Critically assess diverse issues regarding the use of data mining and machine learning in real-world contexts, including ethics.

Design and create reports to present analytical and interpretative information in creative and effective ways.

Devise strategies for making effective use of analytical software such as Python and Azure Machine Learning Studio.

Learn about different algorithms, such as classification, clustering and text mining methods.

Practical, Professional or Subject Specific Skills

Diverse issues regarding the application of data mining techniques to real-world datasets

Discover patterns within a dataset using exploratory data analysis

Use of Python / Azure for data mining

Discover techniques to leverage Python's features and work with its libraries

Reporting and presentation of analytical and interpretative information

Word count/duration (if applicable): Your assessment should be 5,500 words (+/- 10%) in total across the three tasks.

Key elements of the report

You should submit only one PDF file, but for each task you should include the following elements (so, for example, Task 1 will have its own title, introduction, …, references, and similarly Task 2 will have a different title, introduction, …, references).

For each part of your work, please include screenshots of the related code and its corresponding output in your report, and explain them. Failure to do so will result in a deduction of marks.

First page

Include your first name, last name, student ID, school name, and the University of Salford logo.

Title

The title should provide an overview of the focus of your problem and the expected solution (for each task you should have a separate title).

Introduction

This section contains a brief background to the topic and leads to the formulation of the specific question, based on your selected topic. The research question must be focused and clear. You should provide a brief summary of the academic literature relevant to the application of machine learning to your chosen dataset for the task.

Datasets

You are welcome to choose any datasets that interest you and that have enough data to enable meaningful analysis. In making your choice, you should consider what problems you would be able to solve by applying data mining to the dataset. In other words, you should ask yourself: how could I use data mining to answer one or more questions about the dataset?

Briefly describe the datasets you have used, including the independent and dependent variables, their datatypes, and the link or source from which you downloaded or obtained the data.
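When describing a dataset, a quick column profile helps you report datatypes and missing values. The minimal sketch below reads a tiny hypothetical CSV (the columns `age`, `income`, `region` and `churned` are invented for illustration) with the standard `csv` module and infers a type for each column; with pandas, `DataFrame.info()` and `DataFrame.isna().sum()` give similar information. The binary target `churned` is inferred as numeric because it is coded 0/1, which is a reminder to check inferred types by hand.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file;
# "churned" plays the role of the dependent (target) variable.
RAW = """age,income,region,churned
34,52000,North,0
,61000,South,1
45,,North,0
29,38000,East,1
"""
MISSING = {"", "NA", "N/A"}


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile(rows):
    """Per column: inferred type, number of missing cells, number of distinct observed values."""
    summary = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        present = [v for v in values if v.strip() not in MISSING]
        kind = "numeric" if present and all(is_number(v) for v in present) else "categorical"
        summary[col] = {"type": kind, "missing": len(values) - len(present),
                        "distinct": len(set(present))}
    return summary


rows = list(csv.DictReader(io.StringIO(RAW)))
for column, info in profile(rows).items():
    print(column, info)
print("target balance:", Counter(row["churned"] for row in rows))
```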

Explanation and preparation of datasets (Exploratory Data Analysis)

Explain any preparation tasks (e.g., normalisation, dealing with missing values, handling class imbalance) carried out on the datasets.
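The three preparation steps named above can be sketched in plain Python: mean imputation for missing values, min-max normalisation to the range [0, 1], and random oversampling of the minority class. These are simple illustrative choices rather than recommendations, and the numbers are hypothetical. To avoid data leakage, the imputation mean and the min/max should be computed on the training split only and then applied to the test split, and oversampling should be applied to the training data only.

```python
import random
from collections import Counter
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    fill = mean([v for v in column if v is not None])
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        resampled = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(resampled)
        y_out.extend([label] * len(resampled))
    return X_out, y_out


# Hypothetical numbers for illustration only (in practice: training split only).
ages = [34.0, None, 45.0, 29.0]
filled = impute_mean(ages)
scaled = min_max_scale(filled)
X_bal, y_bal = random_oversample([[1], [2], [3], [4], [5]], [0, 0, 0, 0, 1])
print(filled)          # [34.0, 36.0, 45.0, 29.0]
print(scaled)          # [0.3125, 0.4375, 1.0, 0.0]
print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```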

Implementation in Python / Azure Machine Learning Designer

Implement your proposed approach using libraries available in Python, or using Azure for the classification task. This section will include:

A brief description of the algorithms used.

The application of data mining techniques to your selected datasets using Python (or Azure Machine Learning Designer for Task 1b).

Explanation of the experimental procedure, including the setting and optimisation of model hyperparameters during training, and your approach to validation.

Visualisation of the results.
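For the experimental procedure, one common approach to hyperparameter optimisation is k-fold cross-validation over a small grid of candidate values, run on the training data only so that a separate test set remains untouched for the final evaluation. The minimal sketch below tunes the number of neighbours for k-NN with 5-fold cross-validation; the data generator, the grid and the fold count are illustrative assumptions. scikit-learn's `GridSearchCV` automates the same idea.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_class, seed=0):
    """Hypothetical toy data: two overlapping classes centred at (0, 0) and (2, 2)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle row indices once, then deal them round-robin into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_accuracy(X, y, k, n_folds=5):
    """Mean validation accuracy of k-NN, holding out each fold once."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / n_folds


X, y = make_blobs(60)        # stands in for the training split only
grid = (1, 3, 5, 7, 9, 11)   # odd values avoid tied votes with two classes
cv_scores = {k: cv_accuracy(X, y, k) for k in grid}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

Reporting the cross-validated score for every candidate, not just the winner, shows how sensitive the model is to the hyperparameter.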

Results analysis and discussion

Explain and justify the performance metric(s) you use to evaluate the model(s).

A clear and compelling presentation of the results that you obtain, both from the data mining and any other analysis that you may perform.

Evaluate and discuss the results. For tasks that require you to use more than one algorithm, you should compare and discuss the results obtained from each.

You should also consider and discuss any ethical, legal or professional considerations in using machine learning and data mining on the datasets you have selected.
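When justifying performance metrics, it helps to show why accuracy alone can mislead on imbalanced data. The minimal sketch below derives accuracy, precision, recall and F1 from confusion-matrix counts. The labels are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here (scikit-learn exposes a similar choice through its `zero_division` parameter).

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and the metrics derived from them for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 by convention if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


# Imbalanced example: 95 negatives, 5 positives, and a model that always predicts "negative".
always_negative = binary_metrics([0] * 95 + [1] * 5, [0] * 100)
print(always_negative)  # accuracy 0.95, but recall and F1 are 0.0

# A model that finds some of the positives.
m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(m)
```

The first model looks strong on accuracy yet never finds a positive case, which is exactly what recall and F1 expose.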

Conclusions

The key points from the assignment must be synthesised within the conclusion. This must relate back to the introduction and the research question and provide an overall evaluation of the validity of the solution you have proposed.


References

You should list all publications referenced in the report and show evidence of sufficient reading related to your work.

References must follow the Harvard formatting system as in this guide:

APA 7th edition referencing

Appendices (Optional)

Appendices may be used to provide relevant supporting evidence for reference but should only be used if necessary. Students may wish to include in appendices, evidence which confirms the originality of their work or illustrates points of principle set out in the main text.

Links for obtaining your datasets

The following links are provided as examples of data repositories which may be useful to obtain datasets. You are welcome to use them if you wish or you can use any other resource. You may want to choose ones in domains you have existing experience or interest in.

You shouldn't use the datasets we have used in any of the workshops.

We recommend avoiding very large datasets.

Some data repositories:

UCI Machine Learning Repository https://archive.ics.uci.edu/datasets

Harvard Dataverse https://dataverse.harvard.edu/

Google Dataset Search https://toolbox.google.com/datasetsearch

Microsoft Research Open Data https://msropendata.com/

Data.gov.uk https://data.gov.uk/

re3data.org https://www.re3data.org/

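Assessment criteria

Overall level: 0-29%, 30-49%, 50-69%, 70-89%, 90-100%

Grade descriptors, from lowest to highest: Extremely poor, Very poor, Very poor, Poor, Inadequate, Unsatisfactory, Satisfactory, Good, Very good, Very good, Excellent, Outstanding.

For each criterion below, the descriptors run from the lowest band (0-29%) to the highest (90-100%).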

Title, Introduction, Conclusion and References (15%)

No title or very vague title.

Uninformative title, vague introduction, unreliable conclusion; inadequate attempt made at proper referencing, with many errors/omissions.

Satisfactory title; introduction defines the studied problem and the intended tasks well, but relevant literature is lacking; incomplete conclusion; acceptable attempt made at proper referencing, with a number of errors/omissions.

Informative and attractive title; clear setting of the scene and boundaries of the report in the introduction; some relevant literature; conclusion drawn persuasively from the results analysis and discussion. Referencing good, but with some errors and omissions.

Concise and appealing title; introduction presents the focus of the report with excellent clarity; relevant literature; conclusions are reliable and can be trusted by users. Referencing is perfect.

Explanation of datasets, legal and ethical issues (if any) (15%)

Did not perform data preparation steps for ML correctly.

Insufficient collection of primary information; datasets are barely explained. Direct download and no preparation of data for the data mining task.

Adequate engagement with relevant information collection; adequate dataset explanation.

Good information collection, relevant to the assignment; datasets clearly explained. Detailed handling of data, also mentioning some ethical and legal issues.

Information collection of a very high standard, relevant to the assignment. Concise and informative dataset explanation. Outstanding handling of data, also considering important legal and ethical issues.

Implementation in Python (or Azure Machine Learning Designer for Task 1b) (40%)

No implementation.

Implementation not justified for the task considered. Experimental implementation and setup lack detail, with little or no relevant description and discussion of the relevant packages and functions, and no critique of designs.

Basic descriptions of experiments, design, and statistics that could be conducted; little or no critique.

Good descriptions of experiments, design, and statistics that could be conducted; basic critique.

Detailed justification of decisions made for ethical principles throughout data mining algorithm selection and usage. Detailed descriptions of experiments, design, and statistics that could be conducted; critique.

Outstanding justification of decisions made for ethical principles throughout the design, build and use of business intelligence systems and data mining algorithm selection.

Results analysis and discussion (30%)

No results interpretation or discussion was presented in the report.

Results are not presented professionally; little or no results analysis and discussion.

Results are presented using proper means such as tables and graphs; results analysis and discussion is general and shallow.

Results are clearly and informatively presented. Results analysis and discussion are specific and sufficient.

Results are professionally presented at the standard of a journal publication. Results are critically analysed and discussed. Valuable observations and findings are made from the results.


Uploaded By : Pooja Dhaka
Posted on : November 20th, 2024