Distributed Data Processing Framework - Apache Spark: Report Writing
- Country: Australia
Tasks:
- Using Spark, write a program to count how many times each word appears in “book.txt”.
Example:
Input: “A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.”
Output: system: 2, distributed: 1, …..
- Using Spark, write a program to count how many times each letter appears in “book.txt”.
- Using Spark, write a program to convert the words to lowercase and write the result to the file “words_lower.txt”.
- Using Spark, write a program to replace spaces with “-” in “book.txt” and write the result to “words-.txt”.
- Using Spark, compute the sum of the numbers given in “numbers.txt” inside the numbers.zip archive.
- Additionally, you are given the files Numbers2.txt, Numbers4.txt, Numbers8.txt, Numbers16.txt, and Numbers32.txt.
- Compute the sum of the numbers in each of these files and plot a bar graph: file size on the x-axis, and the time Spark takes to compute the result on the y-axis.
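One way to approach the word-count task is sketched below in PySpark. The file name “book.txt” comes from the assignment; the tokenizer (lowercasing and splitting on non-letters) and the output format are assumptions, since the assignment does not fix them.

```python
import re

def tokenize(line):
    """Split a line into lowercase words, stripping punctuation.

    The exact tokenization rule is an assumption; adjust the regex
    if the assignment expects punctuation to be kept.
    """
    return re.findall(r"[a-z']+", line.lower())

def main():
    # Spark driver: requires a PySpark installation (pip install pyspark).
    from pyspark import SparkContext
    sc = SparkContext(appName="WordCount")
    counts = (sc.textFile("book.txt")          # one record per line
                .flatMap(tokenize)             # one record per word
                .map(lambda w: (w, 1))         # pair each word with a count of 1
                .reduceByKey(lambda a, b: a + b))  # sum the counts per word
    for word, n in counts.collect():
        print(f"{word}: {n}")
    sc.stop()

# To run: call main() under an `if __name__ == "__main__":` guard and
# submit the script with spark-submit.
```

On the example sentence from the assignment, this pipeline would report “system” twice and every other word once, matching the expected output.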
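The letter-count task follows the same map/reduce shape, with characters instead of words as the records. This is a sketch; whether to ignore case and non-alphabetic characters is an assumption.

```python
def letters(line):
    """Keep only alphabetic characters, lowercased (case-folding is an assumption)."""
    return [ch for ch in line.lower() if ch.isalpha()]

def main():
    from pyspark import SparkContext
    sc = SparkContext(appName="LetterCount")
    counts = (sc.textFile("book.txt")
                .flatMap(letters)                  # one record per letter
                .map(lambda c: (c, 1))
                .reduceByKey(lambda a, b: a + b))  # sum the counts per letter
    for letter, n in sorted(counts.collect()):
        print(f"{letter}: {n}")
    sc.stop()

# To run: call main() under an `if __name__ == "__main__":` guard and
# submit the script with spark-submit.
```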
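The two rewrite tasks (lowercasing and replacing spaces with “-”) are simple per-line maps. Note that Spark's `saveAsTextFile` writes a *directory* of part files rather than a single file; the `coalesce(1)` call below keeps the output in one part file, which is one reasonable reading of “write it to the file”.

```python
def to_lower(line):
    """Lowercase an entire line."""
    return line.lower()

def dash_spaces(line):
    """Replace every space in a line with '-'."""
    return line.replace(" ", "-")

def main():
    from pyspark import SparkContext
    sc = SparkContext(appName="RewriteBook")
    lines = sc.textFile("book.txt")
    # saveAsTextFile creates a directory named after its argument,
    # containing part-00000 etc.; coalesce(1) yields a single part file.
    lines.map(to_lower).coalesce(1).saveAsTextFile("words_lower.txt")
    lines.map(dash_spaces).coalesce(1).saveAsTextFile("words-.txt")
    sc.stop()

# To run: call main() under an `if __name__ == "__main__":` guard and
# submit the script with spark-submit.
```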
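For the summation and timing task, the sketch below sums each NumbersN.txt file with Spark and times each run, then plots the bar graph. The file names come from the assignment; the number format (whitespace-separated values, one or more per line) and the use of matplotlib for plotting are assumptions.

```python
import time

def parse_numbers(line):
    """Parse whitespace-separated numbers from one line (format is an assumption)."""
    return [float(tok) for tok in line.split()]

def main():
    from pyspark import SparkContext
    import matplotlib.pyplot as plt  # assumed plotting library
    sc = SparkContext(appName="NumberSums")
    files = ["Numbers2.txt", "Numbers4.txt", "Numbers8.txt",
             "Numbers16.txt", "Numbers32.txt"]
    times = []
    for name in files:
        start = time.perf_counter()                          # wall-clock timing
        total = sc.textFile(name).flatMap(parse_numbers).sum()
        times.append(time.perf_counter() - start)
        print(f"{name}: sum = {total}")
    plt.bar(files, times)                 # x-axis: file, y-axis: time taken
    plt.xlabel("File (increasing size)")
    plt.ylabel("Computation time (s)")
    plt.savefig("spark_times.png")
    sc.stop()

# To run: call main() under an `if __name__ == "__main__":` guard and
# submit the script with spark-submit.
```

Timing each file inside the same SparkContext, as above, avoids counting Spark's startup cost in every measurement; the first job may still be slower due to JVM warm-up, which is worth noting in the report.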
Report
Write a 1-page report on Spark, covering its main features and use cases: for instance, what kinds of data it can process, and what RDDs are.