Bayesian analysis project: Vinho Verde

Perform a logistic regression on a real dataset. The wine dataset is attached as a CSV file.

This dataset is related to red variants of the Portuguese "Vinho Verde" wine. The dataset is described in the publication by Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties.

The input variables (based on physicochemical tests) are:

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

The output variable (based on sensory data) is quality (a score between 0 and 10).

Analysis:

[2 points] Submit R code that runs without errors; this is assessed independently of its correctness. Additionally, you must include snippets of code in the report itself when answering each of the questions below.

[2 points] Read the dataset into R. Check if there are missing values (NA) and, in case there are, remove them.
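
One way to approach this step is sketched below. Because the attached file is not reproduced here, the sketch writes a tiny hypothetical CSV to a temporary path and reads it back; the path `csv_path`, the three columns and the comma separator are all assumptions, so point `read.csv()` at the real file and set `sep` if it uses a different delimiter.

```r
# Hypothetical stand-in for the attached file: a tiny CSV written to a temp path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "alcohol,pH,quality",
  "9.4,3.51,5",
  "9.8,,5",
  "12.0,3.40,7",
  "NA,3.16,6"
), csv_path)

# Read the data; replace csv_path with the real file name when available.
wine <- read.csv(csv_path)

# Count missing values per column, then drop every row that contains an NA.
na_per_column <- colSums(is.na(wine))
print(na_per_column)

wine_clean <- na.omit(wine)
print(nrow(wine_clean))
```

`na.omit()` removes whole rows, so a single missing cell discards that wine entirely; it is worth checking how many rows are lost before continuing.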

[2 points] We want to implement a logistic regression, so we need a response variable that takes the value 0 or 1. Consider a wine "good" if its quality is 6.5 or above.
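
A minimal sketch of the recoding, using a short made-up vector of quality scores rather than the real data. The name `good` is an assumption chosen for illustration.

```r
# Made-up quality scores standing in for the dataset's quality column.
quality <- c(3, 5, 6, 7, 8, 6, 7)

# 1 if the score is 6.5 or above, 0 otherwise.
good <- as.integer(quality >= 6.5)

# Check the class balance of the new response.
print(table(good))
```

Since quality is recorded as a whole number, the 6.5 threshold is equivalent to requiring a score of 7 or more.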

[4 points] Run a frequentist analysis on the logistic model, using the glm() function. Which coefficients are significant?
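
The fit and the extraction of significant coefficients might look like the sketch below. It runs on simulated data with only three covariates, so the numbers it prints say nothing about the real wines; the simulated effect sizes and the 5% significance level are assumptions, as the brief does not fix a level.

```r
set.seed(123)

# Simulated stand-in data (assumption): three covariates and a binary response.
n <- 500
wine <- data.frame(
  alcohol = rnorm(n, 10.4, 1),
  volatile.acidity = rnorm(n, 0.53, 0.18),
  total.sulfur.dioxide = runif(n, 6, 150)
)
eta <- -9 + 0.9 * wine$alcohol - 3 * wine$volatile.acidity -
  0.01 * wine$total.sulfur.dioxide
wine$good <- rbinom(n, 1, plogis(eta))

# Logistic regression of the binary response on all other columns.
fit <- glm(good ~ ., data = wine, family = binomial)

# Coefficient table: estimate, standard error, z value and p-value.
coef_table <- coef(summary(fit))
print(coef_table)

# Names of the terms whose p-value falls below the assumed 5% level.
significant <- rownames(coef_table)[coef_table[, "Pr(>|z|)"] < 0.05]
print(significant)
```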

[5 points] Estimate the probabilities of a "success": fix each other covariate at its mean level, compute the probability that a wine scores "good" as total.sulfur.dioxide varies, and plot the results.
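
A sketch of this calculation with `predict()`, again on simulated data with only two covariates (an assumption made to keep the example short). With the full dataset, every covariate other than total.sulfur.dioxide would be set to its column mean in the same way.

```r
set.seed(1)

# Simulated stand-in data (assumption): two covariates and a binary response.
n <- 400
wine <- data.frame(
  alcohol = rnorm(n, 10.4, 1),
  total.sulfur.dioxide = runif(n, 6, 150)
)
wine$good <- rbinom(
  n, 1,
  plogis(-9 + 0.8 * wine$alcohol - 0.01 * wine$total.sulfur.dioxide)
)
fit <- glm(good ~ ., data = wine, family = binomial)

# Grid: alcohol held at its mean, total.sulfur.dioxide swept over its range.
grid <- data.frame(
  alcohol = mean(wine$alcohol),
  total.sulfur.dioxide = seq(min(wine$total.sulfur.dioxide),
                             max(wine$total.sulfur.dioxide),
                             length.out = 100)
)

# type = "response" returns probabilities rather than log-odds.
grid$p_good <- predict(fit, newdata = grid, type = "response")

plot(grid$total.sulfur.dioxide, grid$p_good, type = "l",
     xlab = "total.sulfur.dioxide", ylab = "P(good)")
```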

[15 points] Perform a Bayesian analysis of the logistic model for the dataset, i.e. approximate the posterior distributions of the regression coefficients, following these steps:

Write an R function for the log posterior distribution.

Fix the number of simulations at 10^4.

Choose 4 different initialisations for the coefficients.

For each initialisation, run a Metropolis-Hastings algorithm.

Plot the chains for each coefficient (the 4 chains on the same plot) and comment.

(Question 6 HINT: Generate separate plots for all coefficients (not just the significant ones); each plot should show 4 separate chains, one per initialisation. When initialising at different locations, think about the purpose of doing so to help you decide how to choose the 4 initialisations. There's no single correct way to do it, and you are free to choose, so long as it serves the purpose of considering multiple chains.)
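
The steps of Question 6 can be sketched as follows. This is one possible setup, not a prescribed one: the brief does not specify a prior or a proposal, so the independent N(0, 10^2) priors, the Gaussian random-walk proposal, the step size of 0.15, the four starting points and the simulated data are all assumptions. The helper names `log_post` and `run_mh` are hypothetical.

```r
set.seed(42)

# Simulated stand-in data (assumption): intercept plus two covariates.
n <- 300
X <- cbind(1, matrix(rnorm(2 * n), n, 2))
y <- rbinom(n, 1, plogis(drop(X %*% c(-1, 1, -0.5))))

# Log posterior up to a constant: Bernoulli log-likelihood plus an
# assumed independent N(0, prior_sd^2) prior on each coefficient.
log_post <- function(beta, X, y, prior_sd = 10) {
  eta <- drop(X %*% beta)
  # log(1 + exp(eta)) written in a numerically stable form.
  loglik <- sum(y * eta - (pmax(eta, 0) + log1p(exp(-abs(eta)))))
  loglik + sum(dnorm(beta, 0, prior_sd, log = TRUE))
}

# Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
run_mh <- function(init, n_sim, step, X, y) {
  p <- length(init)
  chain <- matrix(NA_real_, n_sim, p)
  current <- init
  current_lp <- log_post(current, X, y)
  accepted <- 0
  for (i in seq_len(n_sim)) {
    proposal <- current + rnorm(p, 0, step)
    proposal_lp <- log_post(proposal, X, y)
    # Accept with probability min(1, posterior ratio).
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
      accepted <- accepted + 1
    }
    chain[i, ] <- current
  }
  list(chain = chain, accept_rate = accepted / n_sim)
}

# Four deliberately dispersed starting points (assumption).
inits <- list(c(0, 0, 0), c(3, 3, 3), c(-3, -3, -3), c(3, -3, 3))
runs <- lapply(inits, run_mh, n_sim = 1e4, step = 0.15, X = X, y = y)

# One trace plot per coefficient, with the four chains overlaid.
for (j in seq_len(ncol(X))) {
  traces <- sapply(runs, function(r) r$chain[, j])
  matplot(traces, type = "l", lty = 1, xlab = "iteration",
          ylab = paste0("beta[", j - 1, "]"))
}
```

Starting the chains far apart makes it easier to see in the trace plots whether they all settle in the same region, which is the purpose the hint points to. The step size usually needs tuning on the real data, for instance by checking `accept_rate`.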

[5 points] Approximate the posterior predictive distribution of an unobserved variable characterized by

fixed acidity: 7.5

volatile acidity: 0.6

citric acid: 0.0

residual sugar: 1.70

chlorides: 0.085

free sulfur dioxide: 5

total sulfur dioxide: 45

density: 0.9965

pH: 3.40

sulphates: 0.63

alcohol: 12

Plot the approximate posterior predictive distribution.

(Question 7 HINT: When plotting the distribution of the response variable, is the output what you would expect given what you know about the properties of the response variable? It is a discrete random variable, so it has a probability mass function rather than a probability density function.)
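
A sketch of the posterior predictive step for the wine described above. The matrix `beta_draws` is a PLACEHOLDER filled with arbitrary normal draws so that the example runs on its own; in the actual analysis it would hold the retained Metropolis-Hastings draws, one row per draw and one column per coefficient, in the same order as `x_new`. The R-style covariate names are also assumptions.

```r
set.seed(7)

# Covariates of the new wine, with a leading 1 for the intercept.
x_new <- c(
  intercept = 1, fixed.acidity = 7.5, volatile.acidity = 0.6,
  citric.acid = 0.0, residual.sugar = 1.70, chlorides = 0.085,
  free.sulfur.dioxide = 5, total.sulfur.dioxide = 45, density = 0.9965,
  pH = 3.40, sulphates = 0.63, alcohol = 12
)

# PLACEHOLDER posterior draws (assumption): replace with real MCMC output.
n_draws <- 5000
beta_draws <- matrix(rnorm(n_draws * length(x_new), 0, 0.05), nrow = n_draws)

# Success probability implied by each posterior draw.
p_good <- plogis(drop(beta_draws %*% x_new))

# One Bernoulli outcome per draw gives a sample from the posterior predictive.
y_rep <- rbinom(n_draws, 1, p_good)

# The response is binary, so the result is a two-point mass function.
pmf <- table(factor(y_rep, levels = 0:1)) / n_draws
barplot(pmf, names.arg = c("not good (0)", "good (1)"),
        ylab = "posterior predictive probability")
```

A bar plot is used rather than a density estimate because the predicted response only takes the values 0 and 1.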

[5 points] Use the metrop() function available in the mcmc package to perform the same analysis on the posterior distribution you approximated for Question 6. Again choose 10^4 simulations and compare the results with those obtained with your own code. (A visual comparison of the chains is enough to get full marks.)
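
A sketch of the `metrop()` call, assuming the mcmc package is installed. The simulated data, the N(0, 10^2) prior inside the hypothetical `log_post` function, the starting point and the `scale` value are all assumptions. `metrop()` expects a function returning the log unnormalised density, and forwards extra named arguments (here `X` and `y`) to it.

```r
set.seed(42)

# Simulated stand-in data (assumption): intercept plus two covariates.
n <- 300
X <- cbind(1, matrix(rnorm(2 * n), n, 2))
y <- rbinom(n, 1, plogis(drop(X %*% c(-1, 1, -0.5))))

# Log posterior up to a constant, with an assumed N(0, 10^2) prior.
log_post <- function(beta, X, y, prior_sd = 10) {
  eta <- drop(X %*% beta)
  loglik <- sum(y * eta - (pmax(eta, 0) + log1p(exp(-abs(eta)))))
  loglik + sum(dnorm(beta, 0, prior_sd, log = TRUE))
}

out <- NULL
if (requireNamespace("mcmc", quietly = TRUE)) {
  # nbatch = 1e4 with the default blen = 1 stores one row per iteration.
  out <- mcmc::metrop(log_post, initial = c(0, 0, 0), nbatch = 1e4,
                      scale = 0.15, X = X, y = y)
  print(out$accept)  # overall acceptance rate

  # Trace plot of all coefficients, for visual comparison with the
  # chains produced by the hand-written sampler.
  matplot(out$batch, type = "l", lty = 1, xlab = "iteration",
          ylab = "coefficient value")
}
```

Using the same log posterior and a comparable proposal scale in both samplers keeps the visual comparison fair.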

Uploaded By : Pooja Dhaka
Posted on : November 13th, 2024