
Multivariate Data Analysis Project

Added on: 2025-03-21 18:30:18

Background

You have been hired by an Australian company that manufactures equipment used by divers harvesting abalone, a kind of marine snail valued primarily for its meat but also for its eggs. For their next-generation product, they want to develop an integrated set of calipers and diving goggles that will allow the diver to quickly measure the dimensions (length, diameter, and height) of an abalone and immediately view statistically predicted information about it in a heads-up display, aiding the decision whether to harvest it or to leave it alone for the time being.

In particular, to assess the profitability of harvesting a particular abalone, they are interested in predicting the shucked weight of the abalone (the amount of meat for human consumption), its viscera weight (biomass useful for other purposes), and the relationship between the two, as well as the abalone's sex. (This is because female abalone can also be sold for their eggs, which are more valuable.) To assess the sustainability impact of harvesting the abalone, they are also interested in predicting the age and the sex of an abalone. (In particular, harvesting female abalone has a greater sustainability impact, in that it has a stronger effect on the size of the next generation of abalone.) It is also helpful to predict how profitable a particular abalone is likely to be given the market prices.

Furthermore, the requirement that the microcircuitry built into the diving equipment be rugged, compact, energy-efficient, and inexpensive means that its computing power is limited. Your client would therefore prefer prediction methods that (once fitted) are computationally cheap for predicting new observations: deep learning techniques are out, but a (potentially transformed) linear or quadratic predictor, or a support vector machine that doesn't use too many support vectors, would be suitable.

Data

Data were collected about a large sample of abalone and can be found in abalone.csv. Researchers collected the following information about each specimen:

  • Sex: Male, Female, or Infant
  • Length (mm): longest shell measurement
  • Diameter (mm): perpendicular to length
  • Height (mm): with meat in shell
  • Whole weight (grams): whole abalone
  • Shucked weight (grams): weight of meat
  • Viscera weight (grams): gut weight (after bleeding)
  • Shell weight (grams): after being dried
  • Rings: number of rings (can be used to estimate the mollusc's age: adding 1.5 gives the age in years)
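The ring-to-age rule in the data description can be written down directly. The following is a minimal sketch in Python, used here for illustration only (the assignment itself asks for R); the function name `age_from_rings` is hypothetical.

```python
def age_from_rings(rings: int) -> float:
    """Estimated age in years: ring count plus 1.5, per the data description."""
    if rings < 0:
        raise ValueError("ring count cannot be negative")
    return rings + 1.5


# An abalone with 9 rings is estimated to be 10.5 years old.
print(age_from_rings(9))
```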

Questions

Question 1: Sustainability

Propose, justify, and assess (compare) methods (at least two) for predicting the sex of the abalone (with Infant being considered its own sex for this purpose) based on its exterior measurements (length, diameter, and height). Of interest are:

1. predicting the sex of the abalone in general;

2. predicting specifically Infants as opposed to others (to avoid harvesting them);

3. predicting specifically Females as opposed to others (when profitability is prioritised); and

4. predicting specifically Males as opposed to others (when sustainability is prioritised).
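One way to compare candidate classifiers on all four points of interest is to report overall accuracy alongside one-versus-rest sensitivity and specificity for each sex. The sketch below is in Python for illustration only (the assignment asks for R). It assumes the labels are coded "M", "F", and "I", which may not match the coding in abalone.csv, and the short label vectors are made up purely to show the calculation.

```python
def overall_accuracy(truth, predicted):
    """Share of specimens whose predicted sex equals the true sex."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def one_vs_rest(truth, predicted, target):
    """Sensitivity and specificity for `target` against all other classes."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    tn = sum(t != target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    sensitivity = tp / (tp + fn)  # share of true targets that were caught
    specificity = tn / (tn + fp)  # share of non-targets correctly left out
    return sensitivity, specificity


# Made-up labels for illustration; not taken from abalone.csv.
truth = ["M", "F", "I", "I", "M", "F", "I", "M"]
predicted = ["M", "M", "I", "I", "F", "F", "M", "M"]

print("accuracy:", overall_accuracy(truth, predicted))
for sex in ("I", "F", "M"):
    print(sex, one_vs_rest(truth, predicted, sex))
```

Computing these figures on held-out data (or by cross-validation) rather than on the training sample gives a fairer comparison between methods.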

Question 2: Profitability

Propose, justify, and assess methods for predicting the shucked and viscera weights (transformed, if necessary) of an abalone based on its exterior measurements (length, diameter, and height).
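As an illustration of why a transformation may help, weight often scales roughly like a power of a linear dimension, and a power law becomes a straight line on the log-log scale. The sketch below (Python for illustration only; the assignment asks for R) fits such a line by ordinary least squares with a single predictor. The data are synthetic, generated from an exact cubic law chosen for the example, and are not values from abalone.csv; whether a log transform is appropriate for the real data has to be checked with diagnostics.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic illustration only: weight = 2e-5 * length**3, with no noise.
lengths = [60.0, 80.0, 100.0, 120.0, 140.0]
weights = [2e-5 * length ** 3 for length in lengths]

intercept, slope = fit_line(
    [math.log(length) for length in lengths],
    [math.log(weight) for weight in weights],
)
# The slope recovers the exponent and exp(intercept) the multiplier.
print(slope, math.exp(intercept))
```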

In particular, because prices fluctuate, the relative profitability of meat and viscera can vary over time. This means that it would be helpful to let the user get a profitability index for an abalone given that day's prices without having to reprogram it from scratch. Develop an algorithm (i.e., a series of steps or a function) that takes as its inputs:

  • length, diameter, and height of an abalone;
  • $v_{\text{shucked}}$, the dollar value of 1 gram of shucked weight; and
  • $v_{\text{viscera}}$, the dollar value of 1 gram of viscera weight;

and produces:

  • an estimate of the value of that abalone, $S = v_{\text{shucked}} X_{\text{shucked}} + v_{\text{viscera}} X_{\text{viscera}}$, where $X_{\text{shucked}}$ is the abalone's shucked weight in grams and $X_{\text{viscera}}$ is the abalone's viscera weight in grams; and
  • a prediction interval that contains the true value of the abalone some specified percentage (e.g., 90 percent) of the time. (Pay attention to the difference between prediction interval and confidence interval.)

Note that, due to the computational constraints, this algorithm must rely on precomputed summaries of the data (e.g., means and covariance matrices); one that requires refitting the prediction model for every new pair of prices $v_{\text{shucked}}$ and $v_{\text{viscera}}$ is not a valid solution. On the other hand, since the sample size is quite large, you may assume that any parameters you estimate are not meaningfully different from their true population parameters.
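One possible shape for such an algorithm is sketched below in Python, for illustration only (the assignment asks for R). It assumes a multivariate linear regression of the two weights on the three dimensions, fitted once in advance, with approximately normal errors on the scale where value is a linear combination of the weights. Under that assumption the estimate is the price-weighted sum of the two predicted weights, and, treating the fitted parameters as known, the prediction variance is the quadratic form of the price vector in the residual covariance matrix. The names `COEF` and `RESID_COV` are hypothetical, and the numbers in them are placeholders, not estimates from abalone.csv.

```python
from math import sqrt
from statistics import NormalDist

# HYPOTHETICAL precomputed summaries: placeholder numbers, NOT fitted to abalone.csv.
# Rows: shucked, viscera. Columns: intercept, length, diameter, height.
COEF = [
    [-60.0, 0.50, 0.40, 0.60],
    [-30.0, 0.25, 0.20, 0.30],
]
# Residual covariance matrix of (shucked, viscera) weights.
RESID_COV = [
    [100.0, 40.0],
    [40.0, 25.0],
]


def abalone_value(length, diameter, height, v_shucked, v_viscera, level=0.90):
    """Return (estimate, lower, upper) for the dollar value of one abalone."""
    z = (1.0, length, diameter, height)
    pred_shucked = sum(b * zi for b, zi in zip(COEF[0], z))
    pred_viscera = sum(b * zi for b, zi in zip(COEF[1], z))
    estimate = v_shucked * pred_shucked + v_viscera * pred_viscera
    # Prediction variance v' Sigma v; parameter uncertainty is ignored,
    # which the brief permits because the sample is large.
    variance = (
        v_shucked ** 2 * RESID_COV[0][0]
        + 2 * v_shucked * v_viscera * RESID_COV[0][1]
        + v_viscera ** 2 * RESID_COV[1][1]
    )
    half_width = NormalDist().inv_cdf((1 + level) / 2) * sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


estimate, lower, upper = abalone_value(100, 80, 30, v_shucked=0.10, v_viscera=0.02)
print(estimate, lower, upper)
```

Only the coefficient matrix and the residual covariance matrix need to be stored on the device, so a change in prices changes the inputs to the function and nothing else. If the weights are modelled on a transformed scale instead, the value is no longer linear in the predictions and the interval needs a different construction.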

Other Rules and Guidelines

Assumptions and requirements

You are responsible for ensuring that the assumptions and requirements of the techniques you use are met, as well as for cleaning data of outliers (if appropriate) and making appropriate transformations; be sure to document them and provide the appropriate graphical and other diagnostics in the R notebook.
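A first pass at data cleaning can flag records that fail basic physical checks before any model is fitted. The sketch below is in Python for illustration only (the assignment asks for R). The field names and the two checks are assumptions chosen for the example, and the three records are made up rather than taken from abalone.csv. Flagged rows are candidates for review, not automatic deletion.

```python
def find_implausible(rows):
    """Return (index, reason) pairs for records failing basic physical checks."""
    problems = []
    for i, row in enumerate(rows):
        if any(row[key] <= 0 for key in ("length", "diameter", "height")):
            problems.append((i, "non-positive dimension"))
        elif row["shucked"] + row["viscera"] > row["whole"]:
            problems.append((i, "parts heavier than whole"))
    return problems


# Made-up records for illustration; field names are hypothetical.
rows = [
    {"length": 100, "diameter": 80, "height": 30, "whole": 200, "shucked": 90, "viscera": 40},
    {"length": 90, "diameter": 70, "height": 0, "whole": 150, "shucked": 60, "viscera": 30},
    {"length": 85, "diameter": 65, "height": 25, "whole": 100, "shucked": 80, "viscera": 30},
]

for index, reason in find_implausible(rows):
    print(index, reason)
```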

Techniques

Obviously, there are some techniques that you have learned outside of this course, such as logistic regression, that are also suitable for some of these tasks. Feel free to attempt them and mention them in your report if they turn out to be superior, but you must also provide evidence that you have attempted to apply the appropriate techniques covered in this course and mention them in your report.

Sources

The data set is based (with modifications) on Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn and Wes B. Ford (1994) The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait, Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288). Retrieved from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/abalone). Some of the questions are motivated by this article: https://www.theatlantic.com/health/archive/2010/03/how-to-sex-an-abalone-a-sea-snails-story/37198/

Data Analysis Report Instructions:

1,000 words equivalent (not including graphics, computer code and output).

The requirement comprises two parts:

Part 1: Select the appropriate statistical techniques (attached lecture notes) to answer the research questions and carry out the analysis in R, including verification of the assumptions and requirements of the techniques.

Part 2: Write a concise report summarising your analysis and providing the answers to the substantive questions based on your analysis, aimed at a non-specialist audience.

Presentation style/format

Part 1: The submission must be an R notebook (an HTML file incorporating R commands and their output, also having the original R Markdown file embedded in it). It should be possible to reproduce your analysis computations by running the R Markdown file.

The textual annotations around the commands and the output must justify why a particular function, transformation, or other statistical technique was used. Similarly, the output must be annotated with the conclusions drawn from it.

Part 2: Your report should be about 1,000 words per dataset (the word count need not be strictly followed).

Your report should be rendered into a machine-readable PDF. You may compose it in R markdown. It should be written using correct spelling, grammar, and punctuation. You may use subsections as needed.

Quantitative information should be well-formatted, clearly described, and appropriately communicated (e.g., using figures and tables that are appropriately labeled, and not pasting computer output directly).

  • Uploaded By : Pooja Dhaka
  • Posted on : March 21st, 2025