
Stage 2 Folio task, Mathematical Methods

Added on: 2024-12-23 12:00:37

The purpose of this investigation task is to demonstrate the knowledge, skills and concepts you have learned by investigating a statistical problem of your own choosing.

You will collect data from a reliable source or conduct a survey to obtain data. Organise and display the data appropriately and perform statistical calculations in order to interpret and use the data.

It is up to you to choose the topic/question you shall investigate; however, you should consider whether data is readily available. Topics of a more social or community nature often lend themselves well to a meaningful analysis and discussion.

Use two sets of data, representing the values of two continuous random variables that are linearly correlated to each other. Critically analyse your results and describe what conclusions you can draw from your investigations.

You should consider:

the appropriateness of the size of the data used

the effect of outliers on your discoveries

calculation of statistics and statements of your sample compared to the population

using the sample to predict elements of the population data

comparative analysis of data

use of graphs to support analysis

any assumptions and limitations of the investigation.

PART A (Finding a probability density function (PDF) for the 1st set of data; say variable, X)

The purpose is to create the PDF that approximates the distribution of the values of X.

Some points to consider:

Apply different methods in finding a PDF for the variable X (see examples provided in class).

State any assumptions made and any limitations of the methods used in finding the PDF.

Explain and comment on the appropriateness of the methods used and of the results obtained.

Describe, by giving examples, how the PDF obtained could be used in interpreting the data set of values for the variable X.

Apply the PDF created in solving problems of personal interest.

PART B (Finding a probability density function (PDF) for the 2nd set of data; say variable, M)

Determine the linear correlation between the variables M and X, i.e. M = aX + b

Analyse variable M, similarly to the analysis for variable X.

Describe how a PDF for variable M could be found by using the PDF of variable X and the linear relationship.

Generalisation:

Find the PDF of a continuous variable $M = aX + b$, of values $m = ax + b$,

given the PDF of the variable X taking values x, where $x_1 \le x \le x_2$.

Complete a report for the mathematical investigation.

The investigation report should be a maximum of 15 single-sided A4 pages if written, or the equivalent in multimodal form.

The report may take a variety of forms, but would usually include the following:

an outline of the problem and context

the method required to find a solution, in terms of the mathematical model or strategy used

the application of the mathematical model or strategy, including

relevant data and/or information

mathematical calculations and results, using appropriate representations

the analysis and interpretation of results, including consideration of the reasonableness and limitations of the results

the results and conclusions in the context of the problem.

A bibliography and appendices, as appropriate, may be used.

The format of an investigation report may be written or multimodal.

The report, excluding bibliography and appendices if used, must be a maximum of 15 A4 pages if written, or the equivalent in multimodal form. The maximum page limit is for single-sided A4 pages with minimum font size 10. Page reduction, such as 2 A4 pages reduced to fit on 1 A4 page, is not acceptable. Conclusions, interpretations and/or arguments that are required for the assessment must be presented in the report, and not in an appendix. Appendices are used only to support the report, and do not form part of the assessment decision.

Assessment Design Criteria

Concepts and Techniques

The specific features are as follows:

CT1 Knowledge and understanding of concepts and relationships.

CT2 Selection and application of mathematical techniques and algorithms to find solutions to problems in a variety of contexts.

CT3 Application of mathematical models.

Reasoning and Communication

The specific features are as follows:

RC1 Interpretation of mathematical results.

RC2 Drawing conclusions from mathematical results, with an understanding of their reasonableness and limitations.

RC3 Use of appropriate mathematical notation, representations, and terminology.

RC4 Communication of mathematical ideas and reasoning to develop logical arguments.

Performance Standards for Stage 2 Specialist Mathematics

Grade A

Concepts and Techniques:

Comprehensive knowledge and understanding of concepts and relationships.

Highly effective selection and application of mathematical techniques and algorithms to find efficient and accurate solutions to routine and complex problems in a variety of contexts.

Successful development and application of mathematical models to find concise and accurate solutions.

Appropriate and effective use of electronic technology to find accurate solutions to routine and complex problems.

Reasoning and Communication:

Comprehensive interpretation of mathematical results in the context of the problem.

Drawing logical conclusions from mathematical results, with a comprehensive understanding of their reasonableness and limitations.

Proficient and accurate use of appropriate mathematical notation, representations, and terminology.

Highly effective communication of mathematical ideas and reasoning to develop logical and concise arguments.

Effective development and testing of valid conjectures, with proof.

Grade B

Concepts and Techniques:

Some depth of knowledge and understanding of concepts and relationships.

Mostly effective selection and application of mathematical techniques and algorithms to find mostly accurate solutions to routine and some complex problems in a variety of contexts.

Some development and successful application of mathematical models to find mostly accurate solutions.

Mostly appropriate and effective use of electronic technology to find mostly accurate solutions to routine and some complex problems.

Reasoning and Communication:

Mostly appropriate interpretation of mathematical results in the context of the problem.

Drawing mostly logical conclusions from mathematical results, with some depth of understanding of their reasonableness and limitations.

Mostly accurate use of appropriate mathematical notation, representations, and terminology.

Mostly effective communication of mathematical ideas and reasoning to develop mostly logical arguments.

Mostly effective development and testing of valid conjectures, with substantial attempt at proof.

Grade C

Concepts and Techniques:

Generally competent knowledge and understanding of concepts and relationships.

Generally effective selection and application of mathematical techniques and algorithms to find mostly accurate solutions to routine problems in a variety of contexts.

Successful application of mathematical models to find generally accurate solutions.

Generally appropriate and effective use of electronic technology to find mostly accurate solutions to routine problems.

Reasoning and Communication:

Generally appropriate interpretation of mathematical results in the context of the problem.

Drawing some logical conclusions from mathematical results, with some understanding of their reasonableness and limitations.

Generally appropriate use of mathematical notation, representations, and terminology, with reasonable accuracy.

Generally effective communication of mathematical ideas and reasoning to develop some logical arguments.

Development and testing of generally valid conjectures, with some attempt at proof.

Grade D

Concepts and Techniques:

Basic knowledge and some understanding of concepts and relationships.

Some selection and application of mathematical techniques and algorithms to find some accurate solutions to routine problems in some contexts.

Some application of mathematical models to find some accurate or partially accurate solutions.

Some appropriate use of electronic technology to find some accurate solutions to routine problems.

Reasoning and Communication:

Some interpretation of mathematical results.

Drawing some conclusions from mathematical results, with some awareness of their reasonableness or limitations.

Some appropriate use of mathematical notation, representations, and terminology, with some accuracy.

Some communication of mathematical ideas, with attempted reasoning and/or arguments.

Attempted development or testing of a reasonable conjecture.

Grade E

Concepts and Techniques:

Limited knowledge or understanding of concepts and relationships.

Attempted selection and limited application of mathematical techniques or algorithms, with limited accuracy in solving routine problems.

Attempted application of mathematical models, with limited accuracy.

Attempted use of electronic technology, with limited accuracy in solving routine problems.

Reasoning and Communication:

Limited interpretation of mathematical results.

Limited understanding of the meaning of mathematical results, their reasonableness, or limitations.

Limited use of appropriate mathematical notation, representations, or terminology, with limited accuracy.

Attempted communication of mathematical ideas, with limited reasoning.

Limited attempt to develop or test a conjecture.

Introduction

A continuous random variable can take infinitely many different values within a range. Consider a continuous random variable X with probability density function f(x), which describes how probability is distributed over the values of X. Plotting the PDF gives a graph like the one shown below:

[Figure: graph of a probability density function]

A probability density function (PDF) is a function that describes the relative likelihood of the random variable taking a given value. The probability that the variable falls within a range is given by the integral of the variable's density over that range. It can be represented by the area under the density function, above the horizontal axis, and between the lowest and greatest values of the range.

This investigation analyses the correlation between the marks students obtained and their hours of study. The data set has a sample size of 110 and was taken from OpenIntro. The investigation aims to determine the number of hours of study required to achieve the maximum mark, and to apply other forms of statistical analysis to determine the correlation between the two sets of data.

Part A:

Method 1:

The data for the two continuous random variables, hours of study and grade, were taken from the OpenIntro website (openintro, 2012). They were collected at a private US university as part of an anonymous survey in an introductory statistics course. The data frame has 110 observations on the following 2 variables.

List 1 - study hours: number of hours the student spends studying per week

List 2 - gpa: grade point average (GPA) of the student, ranging from 0 to 4

Step 1: The data set used in this investigation was downloaded as a CSV file so that the data could be transferred from a laptop to a Casio graphics calculator.

Because the data set is large, the data were grouped using different interval widths to make the data set smaller. The intervals chosen for List 1 (study hours) were 1, 2, 4 and 10. The intervals chosen for List 2 (GPA) were 1, 0.125 and 0.5.

Step 2: Once an interval is chosen, use the graphics calculator to draw the histogram for that interval from the raw data.

Step 3: After the histogram is drawn, use trace mode to find the midpoint and the frequency of each bar. Enter the midpoints into List 3 and the frequencies into List 4.

Step 4: Find the relative frequency by dividing the frequency list by the total sample size, that is List 4 ÷ 110, and store the result in List 5.

Step 5: Find the cumulative frequency using the Cuml mode. The cumulative frequency is based on the relative frequency in List 5 and is stored in List 6.

Step 6: A logistic function is needed in order to find the probability density function. It is found using the logistic regression function of the graphics calculator.

Step 7: Copy the logistic regression to graph Y5. The derivative of the logistic function is the probability density function: $Y_6 = \frac{d}{dx} Y_5$.

Table 1: Variable 1 - GPA

[Table of calculator screenshots with columns Histogram and Probability density function, one row per interval]

Interval 1: Figure 1, Figure 3, Figure 5

Interval 0.125: Figure 2, Figure 4, Figure 6

Interval 0.5: three unnumbered images
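The grouping and frequency steps (Steps 2 to 5) were carried out on a graphics calculator. As a rough equivalent, they can be sketched in Python. The function names and the sample values here are hypothetical and for illustration only; they are not the OpenIntro data. Unlike a calculator histogram, this sketch simply omits empty intervals.

```python
import math
from collections import Counter
from itertools import accumulate


def bin_data(values, width):
    """Group raw values into intervals of the given width.

    Returns (midpoints, frequencies) sorted by midpoint, playing the
    role of List 3 and List 4 in the steps above. Empty intervals are
    omitted.
    """
    counts = Counter(math.floor(v / width) for v in values)
    bins = sorted(counts)
    midpoints = [(k + 0.5) * width for k in bins]
    frequencies = [counts[k] for k in bins]
    return midpoints, frequencies


def relative_and_cumulative(frequencies):
    """Return relative frequencies (List 5) and their running total (List 6)."""
    total = sum(frequencies)
    relative = [f / total for f in frequencies]
    cumulative = list(accumulate(relative))
    return relative, cumulative


# Hypothetical GPA values, for illustration only.
sample = [2.7, 3.1, 3.2, 3.4, 3.5, 3.5, 3.6, 3.7, 3.8, 3.9]

midpoints, freq = bin_data(sample, 0.5)
rel, cum = relative_and_cumulative(freq)

for m, f, r, c in zip(midpoints, freq, rel, cum):
    print(f"midpoint={m:.2f} frequency={f} relative={r:.2f} cumulative={c:.2f}")
```

The cumulative list always ends at 1, which is why a logistic curve that levels off near 1 is a natural candidate for fitting it.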

The probability density function:

Interval 1:

$F(x) = \frac{1.0000467}{1 + 13610729.7928\,e^{-6.426x}}$

$\frac{d}{dx}\left[\frac{1.0000467}{1 + 13610729.7928\,e^{-6.426x}}\right] = 1.0000467\,\frac{d}{dx}\left[\frac{1}{1 + 13610729.7928\,e^{-6.426x}}\right] = 1.0000467\,\frac{d}{dx}\left(1 + 13610729.7928\,e^{-6.426x}\right)^{-1}$

$\text{PDF} = \frac{87466634.1496\,e^{-6.426x}}{\left(1 + 13610729.7928\,e^{-6.426x}\right)^2}$

Interval 0.125:

$F(x) = \frac{1.024}{1 + 4014470448.062\,e^{-6.274x}}$

$\text{PDF} = \frac{25791270493.32837\,e^{-6.274x}}{\left(1 + 4014470448.062\,e^{-6.274x}\right)^2}$

Interval 0.5:

$F(x) = \frac{1.00218}{1 + 7495123383.51\,e^{-7.284x}}$

$\text{PDF} = \frac{54713494689.1084\,e^{-7.284x}}{\left(1 + 7495123383.51\,e^{-7.284x}\right)^2}$

From the results in the table above, obtained after drawing the histograms and finding the probability density function of the GPA variable for the different intervals, the following conclusions can be drawn. A histogram is a type of chart that shows the distribution of values in a data set. The x-axis displays the values in the data set and the y-axis shows the frequency of each value. Depending on the values in the data set, a histogram can take many different shapes. The histograms with intervals 1 and 0.125 are roughly bell-shaped: each resembles a bell curve with a single peak in the middle of the distribution. This type of distribution is a normal distribution. As Figures 1 and 2 show, the histogram with the smaller interval has more columns, and each column is narrower.

The graph of the probability density function with interval 1 (Figure 3) shows that, for a sample of 110 people, a larger interval groups more people into each category, resulting in a more skewed and less detailed approximation to a normal distribution curve. The second graph, with interval 0.125 (Figure 4), shows that a smaller interval produces a more detailed normal distribution graph.

For the first graph, with an interval of 1, the area under the probability density function is 0.82. For the second graph, with the smaller interval of 0.125, the area is larger, at 0.85. The area under a PDF should equal 1, so the smaller interval gives the closer approximation.
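The differentiation used in Method 1 can be checked numerically. The sketch below assumes the general logistic form $F(x) = c / (1 + a e^{-kx})$, whose derivative is $c\,a\,k\,e^{-kx} / (1 + a e^{-kx})^2$, and uses the interval 1 coefficients for GPA reported above. The function names and the test point `x0` are illustrative choices, not part of the original calculator work.

```python
import math


def logistic_cdf(x, c, a, k):
    """Fitted logistic curve F(x) = c / (1 + a * exp(-k * x))."""
    return c / (1.0 + a * math.exp(-k * x))


def logistic_pdf(x, c, a, k):
    """Analytic derivative of the logistic curve with respect to x."""
    e = a * math.exp(-k * x)
    return c * k * e / (1.0 + e) ** 2


def numeric_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Interval 1 coefficients for GPA, as reported in the text.
C, A, K = 1.0000467, 13610729.7928, 6.426

# Coefficient in the numerator of the PDF: c * a * k.
numerator_coefficient = C * A * K

x0 = 2.5  # arbitrary test point
analytic = logistic_pdf(x0, C, A, K)
numeric = numeric_derivative(lambda x: logistic_cdf(x, C, A, K), x0)

print(f"c*a*k    = {numerator_coefficient:.4f}")
print(f"analytic = {analytic:.6f}")
print(f"numeric  = {numeric:.6f}")
```

Because $F(x)$ tends to $c$ as $x$ grows, the total area under this density over the whole real line is $c$ rather than exactly 1, so a fitted value such as $c = 1.024$ gives one indication of how far the fit is from a true PDF.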

Method 2:

The probability density function of a continuous random variable is the derivative of the variable's cumulative distribution function.

Based on theory:

If X is normally distributed, then its probability density function is

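$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

Where: $\mu$ is the mean and $\sigma$ is the standard deviation of X.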

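The graphics calculator was used on the raw data to find the mean and the standard deviation needed for the PDF of the variable GPA.

Let X be the variable GPA, with mean 3.6 and standard deviation 0.275.

$f(x) = \frac{1}{0.275\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-3.6}{0.275}\right)^2}$

$\int_{2.6}^{4.3} \frac{1}{0.275\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-3.6}{0.275}\right)^2}\,dx = 0.994 \approx 1$

So $f(x) = \frac{1}{0.275\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-3.6}{0.275}\right)^2}$ is a probability density function.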

Here the PDF is used to interpret the data set of values for the variable X, the GPA of students. The PDF for interval 0.125 is used as an example because of its accuracy. It shows that a GPA of 3.75 was obtained by a group of 25 students, who studied for 10 hours on average. One might expect that many more hours of study would always result in a higher GPA, but the data suggest otherwise. For example, a student with a GPA of 3.8 spent 69 hours studying.

The probability of a GPA of more than 4:

$P(X > 4) = \text{normalCD}(4, 10^{99}, 0.275, 3.6) \approx 0.072$
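The two calculator results for the normal model of GPA can be reproduced without a calculator. This is a minimal sketch using the error function from Python's standard library; `normal_cdf` is a helper written for this illustration, and the mean 3.6 and standard deviation 0.275 are the values used in the formula above.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


MU, SIGMA = 3.6, 0.275  # mean and standard deviation of GPA from the text

# Area under the normal PDF over the stated domain 2.6 < x < 4.3.
area_in_range = normal_cdf(4.3, MU, SIGMA) - normal_cdf(2.6, MU, SIGMA)

# Upper-tail probability P(X > 4), the counterpart of normalCD(4, 10^99, ...).
p_above_4 = 1.0 - normal_cdf(4.0, MU, SIGMA)

print(f"area on (2.6, 4.3) = {area_in_range:.4f}")
print(f"P(X > 4)           = {p_above_4:.4f}")
```

Both values should land close to the calculator figures quoted above, with small differences in the last digit coming from rounding.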

Assumptions and limitations of both methods

Not all probabilities can be zero, because the total probability must add up to one. For example, if X only takes values in the interval $0 \le x \le 1$, then X lies somewhere in that interval with probability one, and the probability that X lies outside that interval is zero.

The type of distribution depends on the conditions surrounding the variable. Common distribution types are normal, triangular, uniform and lognormal. According to the graphs drawn (Figures 2 and 4), variable X in this investigation is approximately normally distributed.

The first method can be applied to any distribution. However, its limitation is that it takes more time: as seen above, the logistic function must be found before the probability density function can be obtained.

The second method, which applies the theoretical function to find the probability density function, is faster but applies only when the variable X is an observation from a normal distribution.

Part B:

Table 2: Variable 2 - study hours

[Table of calculator screenshots with columns Histogram and Probability density function, for intervals 1, 4 and 10]

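The probability density function:

Interval 1:

$F(x) = \frac{0.967}{1 + 13.703\,e^{-0.187x}}$

$f(x) = \frac{2.478\,e^{-0.187x}}{\left(1 + 13.703\,e^{-0.187x}\right)^2}$

Interval 4:

$F(x) = \frac{1.163}{1 + 5.85\,e^{-0.114x}}$

$f(x) = \frac{0.777\,e^{-0.114x}}{\left(1 + 5.85\,e^{-0.114x}\right)^2}$

Interval 10:

$F(x) = \frac{0.993}{1 + 1.484\,e^{-0.125x}}$

$f(x) = \frac{0.184\,e^{-0.125x}}{\left(1 + 1.484\,e^{-0.125x}\right)^2}$

Let M be the variable study hours.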

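Apply the probability density function:

$f(x) = \frac{1}{14.1\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-18}{14.1}\right)^2}$

$\int f(x)\,dx = 0.87 \ne 1$

So $f(x) = \frac{1}{14.1\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-18}{14.1}\right)^2}$ is not a probability density function.

Table 3: The linear correlation between variable X and M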

[Table 3: three images showing the linear regression between X and M]

The linear equation between X and M is obtained as shown in Table 3.
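The report obtains the line from the calculator's linear regression. For readers without one, a least-squares fit of the form $M = aX + b$ can be sketched as follows. The function name and the four data points are hypothetical (the points are chosen to lie exactly on a line so the result is easy to check); they are not the study data.

```python
import math


def linear_regression(xs, ys):
    """Least-squares slope a, intercept b and correlation r for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Hypothetical points lying exactly on y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b, r = linear_regression(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
```

With real data, a value of r close to 0 would indicate that the linear relationship is weak, which matters when the line is later used to transform one PDF into another.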

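$M = aX + b = 0.0014X + 3.581$

$X = \frac{M - b}{a} = \frac{M - 3.581}{0.0014}$

$f\left(\frac{M - b}{a}\right)$ is a geometric transformation of the function $f(X)$.

The probability density function of variable X:

$f(x) = \frac{1}{0.275\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-3.6}{0.275}\right)^2}$ with domain $2.6 < x < 4.3$

The domain of variable M:

$0.0014(2.6) + 3.581 < M < 0.0014(4.3) + 3.581$

$3.585 < M < 3.587$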
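The transformation above can be completed with the standard change-of-variables result: if $M = aX + b$ with $a \ne 0$, then the density of M is $g(m) = \frac{1}{|a|} f\left(\frac{m - b}{a}\right)$ on the transformed domain. The factor $1/|a|$ keeps the total area unchanged. The sketch below applies this to the coefficients and the normal PDF stated in the report; the helper names and the use of Simpson's rule for the area check are choices made for this illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def transformed_pdf(pdf_x, a, b):
    """Return the density of M = a*X + b given the density of X (a != 0)."""
    if a == 0:
        raise ValueError("a must be non-zero")

    def pdf_m(m):
        return pdf_x((m - b) / a) / abs(a)

    return pdf_m


def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


# Values taken from the report.
A_COEF, B_COEF = 0.0014, 3.581
X1, X2 = 2.6, 4.3


def pdf_x(x):
    return normal_pdf(x, 3.6, 0.275)


pdf_m = transformed_pdf(pdf_x, A_COEF, B_COEF)

# Endpoints of the domain of M (sorted so the code also works for a < 0).
m1, m2 = sorted((A_COEF * X1 + B_COEF, A_COEF * X2 + B_COEF))

area_x = simpson(pdf_x, X1, X2)
area_m = simpson(pdf_m, m1, m2)

print(f"domain of M: ({m1:.5f}, {m2:.5f})")
print(f"area under f on ({X1}, {X2}) = {area_x:.4f}")
print(f"area under g on domain of M = {area_m:.4f}")
```

The two areas agree, which is the check that g is a valid density whenever f is. Because the slope 0.0014 is so small, the domain of M is very narrow and the density g is correspondingly tall.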

  • Uploaded By : Pooja Dhaka
  • Posted on : December 23rd, 2024