
Exploratory Data Analysis DATA2001

Added on: 2025-05-09 09:58:09

Exploratory Data Analysis (EDA) process

EDA¹ is about building an understanding of your data. It involves a systematic process that applies all the theory and concepts you have learnt in this module. According to Peng and Matsui (2017)², the goals of EDA are:

  • To determine if there are any problems with the data (e.g. missing values or duplicates).
  • To determine if the question you are asking can be answered with the data you have.
  • To develop a sketch of the answer to your question.
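
The first goal can be illustrated with a minimal sketch that counts missing values and exact duplicate rows. The toy records, the function names `count_missing` and `count_duplicates`, and the choice to treat `None` and empty strings as missing are all assumptions made for this example; a real analysis would more likely use a data-frame library and its own definition of "missing".

```python
from collections import Counter

# Hypothetical toy dataset: each dict is one instance (row), each key a variable.
rows = [
    {"id": 1, "age": 34, "city": "Perth"},
    {"id": 2, "age": None, "city": "Sydney"},
    {"id": 3, "age": 51, "city": ""},
    {"id": 2, "age": None, "city": "Sydney"},  # exact duplicate of the second row
]


def count_missing(rows, missing=(None, "")):
    """Return {variable: number of missing values}.

    Assumption: None and the empty string mark a missing value.
    """
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            counts[name] += value in missing  # True adds 1, False adds 0
    return dict(counts)


def count_duplicates(rows):
    """Return how many rows repeat an earlier row exactly."""
    seen = Counter(tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows)
    return sum(n - 1 for n in seen.values())


if __name__ == "__main__":
    print("missing per variable:", count_missing(rows))
    print("duplicate rows:", count_duplicates(rows))
```

Counting is only the starting point: the handout also asks what could explain the missing values or duplicates before deciding how to handle them.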

The EDA process³ covers the following steps:

  1. Identify the question(s) to be addressed/explored: this involves asking questions such as why you are doing EDA, what questions you are hoping to answer, who will be using your answers, for what purpose, and how. Always start with a question or questions; do not expect insights to emerge from the data without guiding questions.
  2. Learn about your dataset (i.e. variables/features, types, instances/samples, etc.) before you even start applying any data analysis methods. Without knowing what the variables and instances in your dataset represent, you cannot expect to reach any meaningful answer to the question(s) driving the data analysis. Sometimes it is even necessary to understand the process by which the data was created (see footnote 3 below). In this step you ought to investigate whether there are any problems with the dataset that may have negative consequences for the analysis⁴. An in-depth knowledge of your dataset will help you identify a need to reformat some of the variables/features (e.g. from numerical to categorical) and/or derive new variables/features by combining the existing variables/features.
  3. Check for missing values and duplicates. Are there any missing values and/or duplicates, and what could explain them? Decide what you are going to do about missing values and/or duplicates.
  4. Generate univariate and bivariate descriptive statistics for your variables. Investigate those statistics (e.g. how the variables are distributed and their shape, i.e. skewness, modality, kurtosis) before you move to the next step. The findings from this investigation should inform the next two steps.
  5. Create visualizations of your data (univariate and bivariate) to get a visual representation of how the data of all your variables are distributed. Use findings from the previous step to enrich your analysis. The findings of this step should help you generate hypotheses to be tested in the next step.

1 John W. Tukey called EDA detective work and graphical detective work.

2 https://bookdown.org/rdpeng/artofdatascience/

3 The EDA process outlined here assumes that the data required to address the question have been identified, mined, collected, prepared, cleansed, combined, and formatted so that they are in a rectangular form (where each column is a variable/feature, and each row is a complete and unique instance) ready for analysis (see Tidy Data.pdf). However, preparing data for analysis is often the most time- and resource-consuming step in the entire data analysis process. To learn more, check the following: link1 and link2.

4 For example, the dataset may violate some or all principles of Tidy Data (see Tidy Data.pdf).

  6. Formulate and test hypotheses, e.g. about all potentially informative associations between variables identified in the previous two steps, or any other hypothesis you may want to test (e.g. statistical significance of correlations and associations between different variables).
  7. Discuss, compare, and synthesise your findings into an answer in the context of domain knowledge, practice, and reason. This step aims to make sense of and synthesise (blend or combine) all the findings from your analysis into an answer(s) to the analysis question(s) identified in Step 1.
  8. Share the answer(s) to your analysis question(s) by using the agreed format. Make sure your answer(s) addresses all the questions raised by your target audience (see Step 1).
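
The descriptive-statistics and hypothesis-testing steps above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the method prescribed by the module: the function names (`shape`, `pearson`, `permutation_p_value`), the toy `hours` and `score` values, and the use of a permutation test for the significance of a correlation are all choices made for this example. Skewness and kurtosis use the simple population moment formulas; statistical packages may apply small-sample corrections and report slightly different values.

```python
import math
import random
import statistics


def shape(values):
    """Return mean, skewness and excess kurtosis (population moment formulas)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value for H0: no linear association between x and y.

    Shuffling y breaks any pairing with x, so the shuffled correlations
    approximate what chance alone would produce.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy variables, invented for illustration only.
    hours = [1, 2, 3, 4, 5, 6, 7, 8]
    score = [52, 55, 61, 60, 68, 70, 75, 79]
    print("univariate shape of score:", shape(score))
    print("bivariate correlation:", pearson(hours, score))
    print("permutation p-value:", permutation_p_value(hours, score))
```

A small p-value only says the association is unlikely to be due to chance in this sample; interpreting it still belongs to Step 7, where findings are weighed against domain knowledge.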
  • Uploaded By : Nivesh
  • Posted on : May 09th, 2025