
Spark Structured API Assignment

Added on: 2022-11-30 11:12:29

INSTRUCTIONS

In the lecture on the Spark Structured API, we did not specify the schema of our dataset. We relied on the Spark engine's default behaviour, which generally loads columns as strings. We can instead define the schema explicitly with an object of the StructType class, which holds an array of StructField entries, one per column.
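As an illustration of the StructType and StructField classes described above, here is a minimal sketch of a schema for a daily stock-price file. The column names and types (symbol, date, open, high, low, close, volume) are assumptions made for this example, since the assignment text does not list them; check them against the header of the actual dataset.

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, DoubleType, LongType}

// Hypothetical column names and types for a daily stock-price file;
// adjust them to match the real dataset.
val stockSchema = StructType(Array(
  StructField("symbol", StringType, nullable = false),
  StructField("date", StringType, nullable = true),   // dates are kept as strings in this assignment
  StructField("open", DoubleType, nullable = true),
  StructField("high", DoubleType, nullable = true),
  StructField("low", DoubleType, nullable = true),
  StructField("close", DoubleType, nullable = true),
  StructField("volume", LongType, nullable = true)
))

// Print the schema as a tree to check the names and types.
stockSchema.printTreeString()
```

Each StructField takes a column name, a data type, and a flag saying whether null values are allowed.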

The code that loads the YouTube dataset used in the lectures with a schema has been provided as a guide.

Note that Spark recognizes dates only when they follow a specific format (dates in other formats can be loaded by specifying both the schema and a date format, but that is a bit beyond this course). For simplicity, you can treat dates as strings in this assignment. Once you have loaded the stock datasets with the correct schema in Spark, answer the following questions. For any question that requires you to execute a command, take a screenshot(s) of the command and its output.
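To make the note on dates concrete, the sketch below shows two ways to select rows from one year: matching a prefix of the string, and parsing the string with an explicit format. The sample rows and the yyyy-MM-dd format are assumptions for illustration only; the real dataset may use a different format.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, year}

val spark = SparkSession.builder().appName("date-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample rows with the date stored as a string.
val sample = Seq(("AAA", "2004-03-15"), ("BBB", "2005-07-01")).toDF("symbol", "date")

// Option 1: keep the date as a string and match on its year prefix.
val byPrefix = sample.filter(col("date").startsWith("2004"))

// Option 2: parse the string into a DateType column using an explicit format,
// then extract the year. The format string is an assumption about the data.
val parsed = sample.withColumn("trade_date", to_date(col("date"), "yyyy-MM-dd"))
val byYear = parsed.filter(year(col("trade_date")) === 2004)

byPrefix.show()
byYear.show()
```

The prefix approach only works when the year is the leading part of the string; the parsed approach is more general but depends on getting the format right.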

Load the large stocks dataset (400 MB) into HDFS and use it to create a Scala DataFrame with the correct schema specified.
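A rough sketch of reading a file into a DataFrame with an explicit schema follows. It assumes the data is a CSV file with a header row and uses the same hypothetical column names as before; a small temporary file with made-up rows stands in for the real dataset so that the example runs locally. On the cluster, the path would instead point at wherever the file was placed in HDFS.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType, DoubleType, LongType}

val spark = SparkSession.builder().appName("load-sketch").master("local[*]").getOrCreate()

// Hypothetical schema; the real column names and types may differ.
val stockSchema = StructType(Seq(
  StructField("symbol", StringType),
  StructField("date", StringType),
  StructField("open", DoubleType),
  StructField("high", DoubleType),
  StructField("low", DoubleType),
  StructField("close", DoubleType),
  StructField("volume", LongType)
))

// Stand-in for the real file: two made-up rows written to a temporary CSV.
val tmp = Files.createTempFile("stocks-sample", ".csv")
val rows = "symbol,date,open,high,low,close,volume\n" +
  "AAA,2004-03-15,10.0,12.5,9.5,11.0,1500000\n" +
  "BBB,2004-03-15,20.0,21.0,19.0,20.5,800000\n"
Files.write(tmp, rows.getBytes("UTF-8"))

// With the real data, replace the path with the file's HDFS location.
val stocks = spark.read
  .option("header", "true")   // assumes the first line holds column names
  .schema(stockSchema)        // apply the explicit schema instead of string defaults
  .csv(tmp.toUri.toString)

stocks.printSchema()
stocks.show()
```

Calling printSchema() after loading is a quick way to confirm that numeric columns were not read as strings.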

  1. Write a command to find the stocks with an average daily volume larger than 1 million shares.
  2. Write a query to find the top 3 stocks by volume for the year 2004.
  3. Write a query to find the top 3 stocks by volume whose symbols start with the first letter of your name (for example, for Saber, symbols starting with S). If there are no stocks starting with your letter, choose another letter.
  4. Write a query to find all the stock symbols whose closing price is larger than your age.
  5. Write a query to find the top 10 stocks with the largest intraday price change (difference between high and low prices during a trading day) and also display the amount of the change.
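
The questions above mostly combine a few DataFrame operations: filtering rows, grouping and aggregating, adding a derived column, and sorting with a limit. The sketch below shows those building blocks on made-up rows. The column names are assumptions, and "by volume" is interpreted here as total traded volume, which the question does not state explicitly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, desc, sum}

val spark = SparkSession.builder().appName("query-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample data with hypothetical column names.
val stocks = Seq(
  ("AAA", "2004-03-15", 12.5, 9.5, 11.0, 1500000L),
  ("AAA", "2004-03-16", 12.0, 11.0, 11.5, 900000L),
  ("BBB", "2004-03-15", 21.0, 19.0, 20.5, 800000L),
  ("CCC", "2005-01-03", 55.0, 50.0, 54.0, 2500000L)
).toDF("symbol", "date", "high", "low", "close", "volume")

// Aggregate per symbol, then filter on the aggregated value.
val highVolume = stocks
  .groupBy("symbol")
  .agg(avg("volume").as("avg_volume"))
  .filter(col("avg_volume") > 1000000)

// Restrict the rows first, then rank symbols by an aggregate and keep the top 3.
val top2004 = stocks
  .filter(col("date").startsWith("2004"))   // dates are plain strings here
  .groupBy("symbol")
  .agg(sum("volume").as("total_volume"))
  .orderBy(desc("total_volume"))
  .limit(3)

// Add a derived column, then sort by it and keep the largest values.
val biggestSwings = stocks
  .withColumn("intraday_change", col("high") - col("low"))
  .select("symbol", "date", "intraday_change")
  .orderBy(desc("intraday_change"))
  .limit(10)

highVolume.show()
top2004.show()
biggestSwings.show()
```

A filter on a symbol's first letter or on the closing price follows the same shape, using col("symbol").startsWith(...) or a comparison on col("close").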

Deliverables:

Answer all questions in a well-laid-out single PDF or Word document (don’t just submit a bunch of screenshots). For any query commands, make sure to include the screenshot(s) of the command being executed and the corresponding output. Do not submit multiple files. Do not submit any compressed files (for example zip or rar).

 


  • Uploaded By : Katthy Wills
  • Posted on : November 30th, 2022