
NIT6160 Data Warehousing And Mining Project

Added on: 2023-06-09 05:31:51
  • Subject Code: NIT6160
  • Country: Australia
Introduction

Airbnb has successfully disrupted the traditional hospitality industry as more and more travellers choose it as one of their primary accommodation providers. From the Barwon South West dataset provided by Inside Airbnb, we hope to use data mining and machine learning to find something valuable to potential investors and hosts.

Dataset

Inside Airbnb - Barwon South West, Victoria, Australia

The dataset can be downloaded from the link below:
http://insideairbnb.com/get-the-data.html

Files:

Two files will mainly be used for quantitative analysis and data mining in Python.

  1. listings.csv: Detailed listings for Barwon South West
  2. reviews.csv: Detailed Review Data for Barwon South West


Task 1: Data Pre-processing

Pre-processing selects the appropriate columns to work with and cleans the dataset, for example by removing NaN values and fixing data formats.

  1. Deciding which columns to work with: We want to keep as much information from the dataset as possible while removing irrelevant columns. Removing irrelevant columns reduces unnecessary information and helps avoid the curse of dimensionality, which can improve the model's performance.
  2. Cleaning prices and dealing with missing values: Convert the currency-formatted prices to float values and drop the rows with NaN values.
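
The two pre-processing steps above could be sketched roughly as follows. This is a minimal illustration using pandas, not a prescribed solution: the tiny inline table stands in for listings.csv, and the column names (`price`, `bedrooms`, `listing_url`) and the `$1,250.00` price format are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample standing in for listings.csv; the column names and
# the "$1,250.00" price format are assumptions for illustration only.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": ["$120.00", "$1,250.00", None, "$89.50"],
    "bedrooms": [2, 5, 1, None],
    "listing_url": ["u1", "u2", "u3", "u4"],  # treated here as irrelevant
})

# Step 1: keep only the columns chosen for modelling.
keep_cols = ["id", "price", "bedrooms"]
cleaned = listings[keep_cols].copy()

# Step 2a: strip the currency symbol and thousands separators, cast to float.
cleaned["price"] = (
    cleaned["price"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)

# Step 2b: drop every row that still contains a NaN value.
cleaned = cleaned.dropna().reset_index(drop=True)

print(cleaned)
```

Dropping rows is the simplest way to handle missing values; whether imputation would suit the real data better is a judgment call to justify in the report.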

Task 2: Exploratory Data Analysis (EDA) with Data Visualization

Because the focus is on predicting accommodation prices and finding the features that contribute to high prices, we first examine the price distribution using a boxplot and a real street map.


  • Summarize the number of accommodations in each market/ each region
  • Summarize the mean price of accommodations in each market/ each region
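
The two summaries above can be produced in one pass with a pandas group-by. The sketch below is only illustrative: the region column name `neighbourhood_cleansed` is an assumption about the file layout, and the prices are made-up values, not figures from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; the column name "neighbourhood_cleansed"
# is an assumption, and the prices are not real data.
listings = pd.DataFrame({
    "neighbourhood_cleansed": [
        "Greater Geelong", "Surf Coast", "Greater Geelong",
        "Surf Coast", "Warrnambool",
    ],
    "price": [150.0, 300.0, 250.0, 500.0, 180.0],
})

# Count the listings and average the price within each region,
# then order the regions from most to least expensive.
summary = (
    listings.groupby("neighbourhood_cleansed")["price"]
    .agg(listing_count="count", mean_price="mean")
    .sort_values("mean_price", ascending=False)
)

print(summary)
```

The resulting table can be passed straight to a bar chart, for example with `summary["mean_price"].plot.bar()` when matplotlib is installed.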

Task 3: Building the Accommodation Prediction Model

Next, we build a model to predict the price. The samples are divided into a training set (80% of the data samples) and a testing set (20% of the data samples).

  1. Choose a supervised model, such as XGBoost, an artificial neural network (ANN) or another model, to implement your price predictor
  2. Perform an analysis to discuss which kinds of features are most related to the accommodation price
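
One possible shape for the 80/20 split, the predictor and the feature analysis is sketched below. Several things here are assumptions: the data is synthetic, the feature names are hypothetical, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost to avoid an extra dependency. Any other supervised regressor could be swapped in.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 500
feature_names = ["bedrooms", "minimum_nights", "random_noise"]
bedrooms = rng.integers(1, 6, size=n)          # 1 to 5 bedrooms
minimum_nights = rng.integers(1, 8, size=n)    # 1 to 7 nights
random_noise = rng.normal(size=n)              # unrelated to price
X = np.column_stack([bedrooms, minimum_nights, random_noise])

# In this toy setup the price is driven mostly by the bedroom count.
y = 80 * bedrooms - 2 * minimum_nights + rng.normal(0, 10, size=n)

# 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Gradient-boosted trees as a stand-in for XGBoost.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.2f}")

# Impurity-based importances give a first view of which features matter.
importances = model.feature_importances_
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can be misleading when features are strongly correlated, so it may be worth cross-checking them with permutation importance before drawing conclusions about price drivers.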

Task 4 (Advanced Task): Sentiment Analysis

  1. Perform the sentiment analysis for the review comments
  2. Analyse the reasons why people like and dislike the accommodations (open question). The analysis could cover the following aspects: the hotel view, the location, staff attitude and
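
As a starting point for both items, a very simple lexicon-based approach could look like the sketch below. The word lists, aspect keywords and sample comments are all hypothetical and far too small for real use. A trained sentiment model or an established lexicon would normally replace them; this only shows the mechanics.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"great", "clean", "lovely", "friendly", "comfortable", "amazing"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "terrible", "cold"}

# Hypothetical keywords mapping words to the aspects named in the task.
ASPECTS = {
    "view": {"view", "scenery"},
    "location": {"location", "street", "beach"},
    "staff": {"host", "staff"},
}


def tokenize(comment: str) -> list[str]:
    """Lower-case the comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def sentiment_score(comment: str) -> int:
    """Count positive words minus negative words."""
    words = tokenize(comment)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(comment: str) -> str:
    """Map the score to a coarse positive / negative / neutral label."""
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aspects_mentioned(comment: str) -> list[str]:
    """List the aspects whose keywords appear in the comment."""
    words = set(tokenize(comment))
    return [aspect for aspect, keys in ASPECTS.items() if words & keys]


# Made-up review comments standing in for reviews.csv.
reviews = [
    "Lovely view and a great location, the host was friendly.",
    "The room was dirty and the street was noisy.",
    "We stayed two nights.",
]

for text in reviews:
    print(sentiment_label(text), aspects_mentioned(text), "|", text)
```

Grouping the labels by aspect then gives a rough picture of what guests like and dislike, which can feed the open-question discussion.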

Prepare a report

Your report should contain the following:

  • Introduction
  • Methods: The methods applied to solve each task and why you chose them.
  • Results: Include results and screenshots of the above experiments.
  • Discussion and error analysis: Try to interpret the results of your model. Discuss intuitions or hypotheses that can be drawn from visual inspection of the resulting classes or clusters. State any assumptions and discuss issues that might have affected the model's performance.
  • Challenges and problems encountered during the project

  • Uploaded By : Katthy Wills
  • Posted on : June 09th, 2023