Predicting Household Energy Consumption Using Weather and Sensor Data DSC502

Subject Code :
DSC502


Question 1: The dataset contains house temperature and humidity conditions monitored with a ZigBee wireless sensor network. The system has 9 wireless nodes, each of which transmitted temperature and humidity readings on 80 randomly selected days of 2016. The nodes are in the kitchen (R1), living room (R2), laundry room (R3), office (R4), bathroom (R5), outside the building (R6), ironing room (R7), teenager's room (R8), and parents' room (R9). The energy consumption of appliances and lights for the same dates was logged with m-bus energy meters. Weather data from the nearest airport weather station (wind speed, pressure, Tdewpoint, temperature, and humidity) is also included, along with the date and time of each record.

Part a) Build a model to predict the total energy consumption of the house based on weather data. You may need to implement preprocessing before fitting an appropriate model.

I created the total energy consumption of the house by adding:

(TemperatureR1 + HumidityR1 + TemperatureR2 + HumidityR2 + TemperatureR3 + HumidityR3 + TemperatureR4 + TemperatureR5 + HumidityR5 + TemperatureR6 + HumidityR6 + TemperatureR7 + TemperatureR8 + HumidityR8 + TemperatureR9 + HumidityR9).

Missing values were checked; there are no missing values in the data.

Unusual observations were checked by validating the data; no such cases were found.

I checked the distribution of the dependent variable, total energy consumption of the house, using a histogram.

The histogram shows that total energy consumption in the house is approximately normally distributed, so multiple linear regression is a good candidate for this data.

Validation Tests: Efficiency of the Model

The model summary table shows that model 4 has R = .816 and R Square = .666, so 66.6% of the variability in the total energy consumption in the house is accounted for by including date, Tdewpoint, time in hours, and station humidity in the model.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .524a   .275       .266                28.85352
2       .772b   .596       .585                21.68195
3       .802c   .643       .629                20.50150
4       .816d   .666       .648                19.98641                     1.270

a. Predictors: (Constant), Date

b. Predictors: (Constant), Date, Tdewpoint

c. Predictors: (Constant), Date, Tdewpoint, Time in hours

d. Predictors: (Constant), Date, Tdewpoint, Time in hours, Station Humidity

e. Dependent Variable: Total energy consumption in the house
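The relationship between R, R Square, and Adjusted R Square in the model summary can be checked by hand. The sketch below assumes the standard adjusted R Square formula with n = 80 observations and p predictors; the function name is illustrative and not part of the SPSS output.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R Square for a model with p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n_days = 80          # number of observations reported in the ANOVA table (df total + 1)
r_model4 = 0.816     # R reported for model 4
r2_model4 = 0.666    # R Square reported for model 4

# R Square is the square of R, not R itself.
print(round(r_model4 ** 2, 3))
# Adjusted R Square penalises the four predictors in model 4.
print(round(adjusted_r_squared(r2_model4, n_days, 4), 3))
```

Both printed values should agree with the model 4 row of the table to three decimals.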

Validation Tests: Overall Model Test

The ANOVA table shows that the overall model is significant at the 5% level of significance, because the p-value is less than 0.05. This indicates that at least one of the predictor variables in the model (date, Tdewpoint, time in hours, and station humidity) is a statistically significant predictor of the total energy consumption in the house.

ANOVA (a)

Model 4      Sum of Squares   df   Mean Square   F        Sig.
Regression   59609.159        4    14902.290     37.306   .000b
Residual     29959.258        75   399.457
Total        89568.416        79

a. Dependent Variable: Total energy consumption in the house

b. Predictors: (Constant), Date, Tdewpoint, Time in hours, Station Humidity
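The F value in the ANOVA table is the ratio of the regression mean square to the residual mean square. A minimal sketch of that arithmetic, using only the sums of squares and degrees of freedom reported above (the function name is an assumption made for illustration):

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Return (mean square regression, mean square residual, F)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual


# Values copied from the model 4 ANOVA table.
ms_reg, ms_res, f_value = f_statistic(59609.159, 4, 29959.258, 75)
print(round(ms_reg, 3), round(ms_res, 3), round(f_value, 3))
```

The p-value itself comes from the F distribution with (4, 75) degrees of freedom, which SPSS reports as .000.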

Interpretation of Model Results

From the table of multiple linear regression results, the fitted model is:

y = 114547.855 - (8.343E-6)x1 + 5.703x2 - 0.859x3 + 0.441x4

where y is the total energy consumption in the house, x1 is date, x2 is Tdewpoint, x3 is time in hours, and x4 is station humidity.

For a one-unit increase in the date variable, the total energy consumption in the house decreases by 8.343E-6 Wh, holding the other predictors constant.

For a one-unit increase in Tdewpoint, the total energy consumption in the house increases by 5.703 Wh.

For a one-unit increase in station humidity, the total energy consumption in the house increases by 0.441 Wh.

For a one-hour increase in time, the total energy consumption in the house decreases by 0.859 Wh.

Coefficients (unstandardized), with 95.0% Confidence Interval for B

Model 4            B            Std. Error   t        Sig.   Lower Bound   Upper Bound
(Constant)         114547.855   12242.314    9.357    .000   90159.914     138935.795
Date               -8.343E-6    .000         -9.325   .000   .000          .000
Tdewpoint          5.703        .683         8.348    .000   4.342         7.064
Time in hours      -.859        .379         -2.268   .026   -1.614        -.104
Station Humidity   .441         .198         2.229    .029   .047          .836
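The confidence bounds in the coefficients table follow from B plus or minus a critical t value times the standard error. The sketch below assumes a two-sided 95% critical value of about 1.992 for 75 residual degrees of freedom (an approximation supplied here, not a number printed by SPSS) and reproduces the bounds for three of the predictors.

```python
T_CRIT_95_DF75 = 1.992  # assumed approximate t critical value, 75 df, two-sided 95%


def confidence_interval(b, std_error, t_crit=T_CRIT_95_DF75):
    """Return the (lower, upper) confidence bounds for a regression coefficient."""
    margin = t_crit * std_error
    return b - margin, b + margin


# (B, Std. Error) pairs copied from the coefficients table.
coefficients = {
    "Tdewpoint": (5.703, 0.683),
    "Time in hours": (-0.859, 0.379),
    "Station Humidity": (0.441, 0.198),
}

for name, (b, se) in coefficients.items():
    lower, upper = confidence_interval(b, se)
    print(f"{name}: t = {b / se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The recomputed t values differ from the table in the third decimal because the table's B and standard errors are already rounded.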

The normal P-P plot of the standardized residuals shows that the points fall along the straight line, which indicates that the normality assumption is satisfied.

The scatter plot of residuals versus predicted values shows the points spread evenly in a horizontal band around zero, which indicates that the homogeneity of variance assumption is satisfied.

Part b) Which one of the following factors affects the total energy consumption of the lights and appliances?

Station humidity

Station temperature

Wind speed

Pressure

Tdewpoint

Date

Time

I created the total energy consumption of the household by adding the energy consumption of appliances and lights, which were logged for the same dates with m-bus energy meters.

Missing values were checked; there are no missing values in the data.

Unusual observations were checked by validating the data; no such cases were found.

I checked the distribution of the dependent variable, total energy consumption of the household, using a histogram.

The histogram shows that total energy consumption is positively skewed, so a transformation is recommended before using it as the dependent variable in a regression analysis. For a positively skewed variable, a log transformation is the usual recommendation, so I applied it.
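The effect of a log transformation on a positively skewed variable can be illustrated with a small sketch. The numbers below are made up purely for illustration and are not taken from the dataset; the skewness function uses the simple moment-based definition.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, positively skewed values (NOT from the assignment data).
raw = [1, 2, 3, 4, 50]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

A skewness closer to zero after the transformation suggests the logged variable is a better fit for a linear regression that assumes roughly normal residuals.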

Validation Tests: Efficiency of the Model

The model summary table shows that model 2 has R = .430 and R Square = .185, so 18.5% of the variability in the (log-transformed) total energy consumption from lights and appliances is accounted for by including time in hours and Tdewpoint in the model.

Model Summary (c)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .363a   .132       .120                .26350
2       .430b   .185       .164                .25691                       1.772

a. Predictors: (Constant), Time in hours

b. Predictors: (Constant), Time in hours, Tdewpoint

c. Dependent Variable: Total energy consumption from lights and appliances

Validation Tests: Overall Model Test

The ANOVA table shows that the overall model is significant at the 5% level of significance, because the p-value is less than 0.05. This indicates that at least one of the predictor variables in the model (time in hours and Tdewpoint) is a statistically significant predictor of the total energy consumption from lights and appliances.

ANOVA (a)

Model 2      Sum of Squares   df   Mean Square   F       Sig.
Regression   1.154            2    .577          8.746   .000c
Residual     5.082            77   .066
Total        6.237            79

a. Dependent Variable: Total energy consumption from lights and appliances

b. Predictors: (Constant), Time in hours

c. Predictors: (Constant), Time in hours, Tdewpoint

In the table of multiple linear regression results, both predictors are significant at the 5% level (time in hours, p = .010; Tdewpoint, p = .027). The fitted model is:

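log(y) = 1.697 + 0.012x1 + 0.017x2

where y is the total energy consumption from lights and appliances, x1 is time in hours, and x2 is Tdewpoint.

As time increases by one hour, the log of the total energy consumption from lights and appliances increases by 0.012, keeping the other variables in the model constant. Taking the log as the natural logarithm, this corresponds to multiplying consumption by e^0.012 = 1.012, that is, an increase of about 1.2%.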

Coefficients (unstandardized), with 95.0% Confidence Interval for B

Model 2         B       Std. Error   t        Sig.   Lower Bound   Upper Bound
(Constant)      1.697   .060         28.322   .000   1.577         1.816
Time in hours   .012    .005         2.659    .010   .003          .021
Tdewpoint       .017    .007         2.249    .027   .002          .032
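A coefficient from a log-transformed model is easier to read as a percentage change in the original units. The sketch below converts the two slopes from the table; the source does not state which logarithm base SPSS was asked to use, so both the natural log and base 10 are shown as assumptions.

```python
import math


def percent_change(coefficient, base=math.e):
    """Percent change in y for a one-unit increase in x when log_base(y) is modelled."""
    return (base ** coefficient - 1) * 100


# Slopes copied from the coefficients table.
slopes = {"Time in hours": 0.012, "Tdewpoint": 0.017}

for name, b in slopes.items():
    natural = percent_change(b)           # assuming ln(y) was modelled
    base10 = percent_change(b, base=10)   # assuming log10(y) was modelled
    print(f"{name}: {natural:.2f}% (natural log) or {base10:.2f}% (log base 10)")
```

Under either assumption the effect is multiplicative: each extra unit of the predictor scales consumption by a constant factor rather than adding a fixed number of Wh.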

The normal P-P plot of the standardized residuals shows that the points fall along the straight line, which indicates that the normality assumption is satisfied.

Part c) Use an appropriate technique to identify whether weekdays affect appliance energy consumption of the house or not.

I used one-way ANOVA to check whether the appliance energy consumption of the house is affected by the day of the week.
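One-way ANOVA compares the variation between group means with the variation within groups. A minimal sketch of that computation follows; the three small groups are hypothetical values chosen for easy arithmetic, not appliance readings from the dataset.

```python
def one_way_anova(groups):
    """Return (SS between, df between, SS within, df within, F) for a list of groups."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_value


# Hypothetical groups (for example, three weekdays), NOT the real data.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(example_groups))
```

With seven weekday groups and 80 observations, the same function would give 6 and 73 degrees of freedom.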

The results show that the appliance energy consumption of the house does not vary significantly across weekdays (F = 1.726, p = .127, which is greater than 0.05); this indicates that weekdays do not affect the appliance energy consumption of the house.

ANOVA

Appliances Energy (Wh)

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   88433.836        6    14738.973     1.726   .127
Within Groups    623484.914       73   8540.889
Total            711918.750       79

Part d) Given the data collected from wireless sensors, build single metric for house temperature based on the data collected from temperature sensors. Discuss your results.

MISSING

Question 2: Given the data collected for the following features of 20 different batteries:

Charge rate (Continuous)

Discharge rate (Continuous)

Depth of discharge (Continuous)

Temperature (Categorical)

End of charge volt (Continuous)

Failed or not failed (Binary)


Part a) Which technique can be applied to predict the probability of battery failure based on these features?

Logistic regression can be used to predict the probability of battery failure based on the given features.

Part b) Explain in detail model evaluation and validation steps of this technique.

To evaluate the logistic regression model, I use the following techniques:

Hosmer-Lemeshow Test

Classification matrix (using prediction accuracy)

Wald test

Deviance test

AIC, BIC
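Several of the measures listed above are simple functions of the fitted model's log-likelihood. The sketch below uses the standard definitions of deviance, AIC, and BIC; the log-likelihood value and parameter count are hypothetical placeholders, since no fitted battery model is given in the source.

```python
import math


def deviance(log_likelihood):
    """Deviance of a fitted model: -2 times its log-likelihood."""
    return -2 * log_likelihood


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical values for illustration only.
example_log_likelihood = -10.0
example_params = 3    # e.g. intercept plus two predictors
example_batteries = 20

print(deviance(example_log_likelihood))
print(aic(example_log_likelihood, example_params))
print(round(bic(example_log_likelihood, example_params, example_batteries), 3))
```

Lower AIC and BIC indicate a better trade-off between fit and complexity when comparing candidate models fitted to the same data.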

To validate the logistic regression model, I split the data into two parts:

Training data: used to fit the model.

Testing data: used to predict the dependent variable and compare the predictions with the actual values.
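The split-and-compare validation described above can be sketched in plain Python. The helper names, the 25% test fraction, and the example labels are assumptions for illustration; the predictions would normally come from the fitted logistic regression model rather than being typed in.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (training rows, testing rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def classification_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels (1 = failed)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of cases on the diagonal of the classification matrix."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


train_rows, test_rows = train_test_split(list(range(20)))
print(len(train_rows), len(test_rows))

# Hypothetical actual and predicted labels for a small test set.
matrix = classification_matrix([1, 0, 1, 0], [1, 0, 0, 0])
print(matrix, accuracy(matrix))
```

With only 20 batteries a single split leaves very few test cases, so the accuracy estimate from this kind of check will be noisy.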

Question 3: Compare clustering and classification and explain their similarities and differences.


Classification and clustering are data mining methods for analyzing data sets and dividing them based on some specific classification rules or the association between objects. Classification categorizes data using provided training data. Clustering, on the other hand, categorizes data using various similarity measures.

Their Similarities

Both classification and clustering are used to categorize or divide objects into groups based on one or more characteristics.

Both classification and clustering share one common aim: identifying similarities among data.

Their Differences

The main difference between classification and clustering is that classification assigns objects to predefined classes, whereas clustering identifies similarities between objects and groups them based on those characteristics in common that distinguish them from other groups of objects. These groups are referred to as "clusters."

Classification is a supervised learning method, whereas clustering is an unsupervised learning method.

Because classification has labels, training and testing datasets are required for validating the model, whereas clustering does not require training and testing datasets.

In the classification method, a training sample is provided, whereas in the clustering method, no training data is provided.
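The difference can be made concrete with a small one-dimensional sketch: a nearest-centroid classifier that learns from labelled training data, next to a two-cluster k-means that sees only the values. Both functions and the data are simplified illustrations chosen for this comparison, not methods named in the question.

```python
def fit_centroids(points, labels):
    """Classification: use the provided labels to compute one centroid per class."""
    totals = {}
    for x, label in zip(points, labels):
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


def classify(x, centroids):
    """Assign a new point to the predefined class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means_1d(points, iterations=10):
    """Clustering: find two groups from the values alone, with no labels."""
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in points:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical data for illustration.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(points, labels)   # needs labels
print(classify(4.1, centroids))
centers, groups = two_means_1d(points)      # needs no labels
print(centers, groups)
```

The classifier returns a named class because the names were supplied in training; the clustering returns anonymous groups that still have to be interpreted.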

Question 4: Assume that you have access to 1000 records of a population with the following variables:

X1: Gender (F/M)

X2: Higher Education (Y/N)

X3: Age (Numerical)

X4: Income (Numerical)

X5: Years of Work Experience (Numerical)

X6: Organization (Categorical)

X7: City (Categorical)

X8: Healthy Life (Y/N)

X9: Marriage Status (Categorical)

X10: Exercise (Y/N)

X11: Diagnosed Cancer (Y/N)

X12: Job (Categorical)

X13: Weight (Numerical)

X14: Test Score (Numerical)

List three questions that you can investigate using this data. Identify an appropriate data analytic technique that can be applied to answer each question. List independent and dependent variables that you will select for this analysis.

Answer: Based on the 1000 population records and the variables above, I can investigate the following questions:

Predict the probability that a person has a healthy life, based on the following features: age, exercise, gender, higher education, job, income, and marital status.

A logistic regression model is appropriate for this question.

The independent variables are: age, exercise, gender, higher education, job, income, and marital status.

The dependent variable is healthy life, which is categorized as Yes or No.

Predict the probability that a person has been diagnosed with cancer, based on the following features: age, higher education, job, marital status, weight, and exercise.

A logistic regression model is appropriate for this question.

The independent variables are: age, higher education, job, marital status, weight, and exercise.

The dependent variable is diagnosed cancer, which is categorized as Yes or No.

Predict the average income of the population, based on the following features: age, gender, marital status, higher education, years of work experience, job, city, and organization.

A multiple regression model is appropriate for this question.

The independent variables are: age, gender, marital status, higher education, years of work experience, job, city, and organization.

The dependent variable is income, a numerical variable measured in currency.
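Categorical predictors such as job, city, and organization are usually converted to dummy (0/1) variables before they enter a regression model. The sketch below shows one common way to do this, dropping the first level as the reference category; the function name and the city names are hypothetical.

```python
def one_hot(values, drop_first=True):
    """Return (kept levels, rows of 0/1 indicators) for a categorical variable."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical city values for four records.
cities = ["Perth", "Sydney", "Perth", "Darwin"]
kept_levels, dummy_rows = one_hot(cities)
print(kept_levels)
print(dummy_rows)
```

Dropping one level avoids perfect collinearity with the intercept; the coefficients of the remaining dummies are then read relative to the dropped reference level.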


Uploaded By : Nivesh
Posted on : December 21st, 2024