Statistical analysis of a Kaggle dataset

Added on: 2024-11-24 18:30:36

Student name: Rabia naseem

Table of Contents

Introduction

Hypothesis Development

Methodology

Data Analysis

Conclusion

Introduction

This assessment uses the "all_perth" data set downloaded from https://www.kaggle.com/, with the following key variables. Price (Y) is a continuous variable. Land area (X1) is a continuous variable, and number of bedrooms (X2) is treated as a continuous variable. Number of bathrooms (X3) is a grouping variable, coded '0' for one bathroom and '1' for more than one. Garage (X4) is a grouping variable, coded '0' for a single garage and '1' for more than one garage. The study focuses on the response variable, property price, which is expected to be influenced by the Xi's. This report takes the opportunity to model and analyse property prices. The analytical query, or business problem, that the data analysis will investigate is:

What aspects affect Perth real estate values, and how can these aspects be efficiently modelled and analysed to give stakeholders relevant information?

Importance:

In the context of business intelligence, the importance of this study covers strategic decision-making, market intelligence, risk management, resource allocation, customer insights and competitive advantage. Firms in the real estate industry must use data analysis to analyse property pricing in order to make well-informed decisions, reduce risks, allocate resources optimally, understand client preferences, and obtain a competitive edge in the market. Data analysis is a cornerstone of business intelligence, directing processes for long-term profitability and success in strategic planning and decision-making. This is why studying property prices is important and relevant in the context of business intelligence.

Hypothesis Development

(a)

Using X3 as the grouping variable, we divide the prices into two groups, X3 = 0 and X3 = 1. We assume that the two groups of prices are independent random samples drawn from two normal populations with equal variances. We test the claim that the average prices of properties with one bathroom and with more than one bathroom differ significantly at the 5% level of significance. Let $\mu_1$ and $\mu_2$ denote the average prices of properties with one bathroom and with more than one bathroom, respectively.

Null Hypothesis

H0: The average price of property is the same for both groups. (H0: $\mu_1 = \mu_2$)

Alternative Hypothesis

Ha: The average price of property is different for the two groups. (Ha: $\mu_1 \neq \mu_2$)
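The test in part (a) can be sketched in a few lines of Python. This is only an illustration of the pooled (equal-variance) two-sample t statistic: the prices below are made up, not taken from the "all_perth" data, and the function name `pooled_t_test` is hypothetical. The critical value 2.306 is the two-sided 5% point of the t distribution with 8 degrees of freedom, which matches the toy sample sizes only.

```python
import math
from statistics import mean, variance


def pooled_t_test(group0, group1):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    n0, n1 = len(group0), len(group1)
    df = n0 + n1 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / df
    std_error = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(group0) - mean(group1)) / std_error, df


# Made-up prices for illustration only (not from the report's data).
one_bath = [410_000, 455_000, 390_000, 520_000, 480_000]      # X3 = 0
multi_bath = [610_000, 575_000, 690_000, 540_000, 720_000]    # X3 = 1

t_stat, df = pooled_t_test(one_bath, multi_bath)

# Two-sided 5% critical value of the t distribution with 8 degrees of freedom.
T_CRIT_DF8 = 2.306
reject_h0 = abs(t_stat) > T_CRIT_DF8

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With real data the critical value (or the p-value) would have to be taken for the actual degrees of freedom, for example from Excel's t-test output as used in this report.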

(b)

Linear regression models are useful tools in business intelligence that help extract meaningful information from data. By using historical trends, these models enable firms to foresee future results through predictive analytics. By interpreting the coefficients of a regression model, businesses gain insight into the relationships between variables and can identify the important drivers of their objectives. These insights help with risk management and resource allocation, among other strategic decision-making processes. Regression analysis also helps with benchmarking and performance monitoring, allowing companies to measure their success and compare it to industry norms. Property prices are expected to depend on land area: a larger land area should tend to raise the price of a property, and a smaller land area should tend to lower it. That is, a positive linear relationship between the two variables is expected. Under this scenario, we set up a simple linear regression model of property price (Y) on land area (X1).

Null Hypothesis

H0: There is no linear relationship between price and land area of properties.

Alternative Hypothesis

Ha: There is a linear relationship between price and land area of properties.
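The model in part (b) is fitted by ordinary least squares. The sketch below shows how the intercept, slope and coefficient of determination are computed; the function name `fit_simple_regression` and the five data points are hypothetical and are not taken from the report's data.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R squared = 1 - (residual sum of squares / total sum of squares).
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - sse / sst


# Made-up land areas (X1) and prices (Y) for illustration only.
land_area = [300, 450, 600, 750, 900]
price = [400_000, 460_000, 500_000, 590_000, 610_000]

intercept, slope, r_squared = fit_simple_regression(land_area, price)
print(f"intercept = {intercept:.1f}, slope = {slope:.3f}, R^2 = {r_squared:.4f}")
```

Under H0 the true slope is zero, so the test asks whether the fitted slope is further from zero than sampling variation alone would explain.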

Methodology

The "all_perth" data set was downloaded from https://www.kaggle.com/, and the following key variables are used. Price (Y) is treated as a continuous variable for this assessment. The continuous variable X1 represents land area, and X2 denotes the number of bedrooms. The number of bathrooms is represented by the grouping variable X3, with a value of '0' for one bathroom and '1' for more than one. The grouping variable X4 (garage) is '0' for a single garage and '1' for more than one garage. The response variable in this study is property price, which is expected to be influenced by the Xi's. For the continuous variables (land area, number of bedrooms and price), histograms, a box plot and a scatter plot are constructed in Excel. A two-sample independent t-test with equal variances is then performed: the prices are split into two groups using X3 as the grouping variable (X3 = 0 and X3 = 1), and both groups are assumed to be independent random samples taken from two normal populations with equal variances. The test investigates the hypothesis that, at the 5% level of significance, the average prices of properties with one bathroom and with more than one bathroom differ significantly, where $\mu_1$ and $\mu_2$ are the average prices of properties with one bathroom and with more than one bathroom, respectively.
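The coding of the grouping variables and the split of prices into two groups can be illustrated as follows. This is a minimal sketch: the report performs these steps in Excel, and the record keys `"price"` and `"bathrooms"`, the helper `code_indicator` and the four records are hypothetical. The report does not say how a count below one is coded, so the sketch rejects such values rather than guessing.

```python
def code_indicator(count):
    """Code a bathroom or garage count as 0 (exactly one) or 1 (more than one)."""
    if count < 1:
        # Not covered by the coding scheme described in the report.
        raise ValueError("count must be at least 1")
    return 0 if count == 1 else 1


# Made-up records for illustration only.
rows = [
    {"price": 420_000, "bathrooms": 1},
    {"price": 650_000, "bathrooms": 2},
    {"price": 480_000, "bathrooms": 1},
    {"price": 710_000, "bathrooms": 3},
]

# Split prices into the X3 = 0 and X3 = 1 groups used by the t-test.
groups = {0: [], 1: []}
for row in rows:
    groups[code_indicator(row["bathrooms"])].append(row["price"])

print(groups)
```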

Regression analysis is another tool for studying a bivariate relationship. Here we perform a simple linear regression of the response variable, property price, on the independent variable, land area. In business intelligence, linear regression models are helpful instruments that aid in deriving important insights from data. These models allow businesses to use predictive analytics to forecast future outcomes by utilizing historical trends.

Data Analysis

For descriptive analytics, we plotted property prices and the other variables using histograms, a box plot and a scatter plot to understand their distributions. The histograms of property prices and some other important variables are as follows:

Figure 1

Figure 2

The histograms in figures 1, 2 and 3 suggest that prices, land areas and number of bedrooms are approximately normally distributed, as each histogram has a roughly bell-shaped form.

Figure 3

Figure 4

The side-by-side box plot in figure 4 shows that the distribution of prices has a lot of outliers. The scatter plot in figure 5 is constructed to observe any relationship between prices and land area. Since the points trend upward from left to right, there is a direct, or positive, relationship between the two variables.
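A box plot conventionally marks as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to made-up prices; it assumes the 1.5 × IQR convention and Python's default "exclusive" quartile method, which may differ slightly from the quartile method used by the Excel chart in this report.

```python
from statistics import quantiles

# Made-up prices for illustration only; the last value is deliberately extreme.
prices = [350_000, 400_000, 420_000, 450_000, 480_000,
          500_000, 520_000, 560_000, 2_400_000]

# Quartiles with the default 'exclusive' method of statistics.quantiles.
q1, _median, q3 = quantiles(prices, n=4)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule; points outside them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, fences = ({lower_fence}, {upper_fence})")
print("outliers:", outliers)
```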

Figure 5

For the grouping variables, number of bathrooms and garage, a stacked bar chart is constructed in figure 6.

Figure 6

Two sample t-test for independent samples

Decision and Conclusion:

The null hypothesis is rejected because the p-value from Table 2 is less than the significance level of 0.05. There is therefore enough statistical evidence to support the claim that the average prices of properties with one bathroom and with more than one bathroom differ significantly at the 5% level of significance. The computations above make use of the following formula texts.

Estimating simple linear regression (SLR) model:

The estimated simple linear regression model from table 3 is:

$\hat{Y} = 597361.289 + 5.3914\,X_1$

The intercept is 597361.289, which is the estimated average price at zero land area and therefore has no practical interpretation in this study. The slope is 5.3914, which is the estimated average change in price for a one-unit increase in land area. The coefficient of determination ($R^2$) is 0.0029, which means that only 0.29% of the total variation in price is explained by land area. The p-value of the F-statistic for testing the overall significance of the simple regression model, from table 3, is 0.0914, which is greater than the 5% level of significance; this model is therefore not useful for estimating and predicting price from land area.
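To see what the reported coefficients imply, the fitted line can be evaluated directly. The sketch below uses only the intercept and slope quoted above; the function name `predicted_price` and the example land areas are hypothetical, and the unit of land area is whatever unit the data set uses, which the report does not state.

```python
# Coefficients reported in the text for the fitted model.
INTERCEPT = 597361.289
SLOPE = 5.3914


def predicted_price(land_area):
    """Fitted price for a given land area (in the data set's own units)."""
    return INTERCEPT + SLOPE * land_area


# Hypothetical land areas chosen only to illustrate the size of the effect.
price_at_500 = predicted_price(500)
price_at_1500 = predicted_price(1500)
difference = price_at_1500 - price_at_500

print(f"fitted price at 500:  {price_at_500:.3f}")
print(f"fitted price at 1500: {price_at_1500:.3f}")
print(f"difference for 1000 extra units: {difference:.1f}")
```

An extra 1000 units of land area moves the fitted price by about 5391, which is small next to an intercept of roughly 597000. This is in line with the very low coefficient of determination.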

Testing significance of regressor in SLR

To test the significance of the regressor in the simple linear regression model (Model 1), we have:

From table 5 above, the p-value of the t-test for the significance of the regression coefficient, or slope, of X1 is 0.0914, which is greater than alpha = 0.05, so the result is not significant. We therefore conclude that there is insufficient statistical evidence that the independent variable X1 makes a significant contribution to the SLR model (Model 1). Because the p-value exceeds 0.05, the 95% confidence interval for the regression coefficient contains zero, which leads to the same conclusion.
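The agreement between the t-test and the confidence interval for the slope follows from how both are built. The symbols $\hat{\beta}_1$ (estimated slope), $\mathrm{SE}(\hat{\beta}_1)$ (its standard error) and $n$ (sample size) are notation introduced here for illustration and do not appear in the report.

```latex
\[
\mathrm{CI}_{1-\alpha}:\quad
\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\,\mathrm{SE}(\hat{\beta}_1),
\qquad
t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}
\]
\[
0 \in \mathrm{CI}_{1-\alpha}
\iff |t| \le t_{\alpha/2,\,n-2}
\iff p\text{-value} \ge \alpha
\]
```

With a p-value of 0.0914 and $\alpha = 0.05$, the right-hand condition holds, so zero lies inside the 95% interval. In simple linear regression the F-statistic equals $t^2$, which is why the overall F-test and the slope t-test report the same p-value.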

Conclusion

For this assessment we used the "all_perth" data set, downloaded from https://www.kaggle.com/. The quantitative variables in this data set include land area, number of bedrooms, and property price. In addition, the number of bathrooms and the garage indicator are our two categorical, or grouping, variables. We visualised these variables using various graphical tools and discovered some intriguing trends that are covered in section 1. Our continuous variables, with a few outliers, have a bell-shaped distribution that roughly corresponds to a normal distribution.

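At the 5% level of significance, we conducted a two-sample independent t-test to ascertain whether the average price of homes with more than one bathroom differs significantly from the average price of homes with only one bathroom, and a simple linear regression analysis of price on land area. The t-test rejected the null hypothesis of equal average prices, while the regression of price on land area was not significant (p-value 0.0914).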

  • Uploaded By : Pooja Dhaka
  • Posted on : November 24th, 2024