Distributed Data Processing Framework - Apache Spark: Report Writing
- Country: Australia
Tasks:
- Using Spark, write a program to count how many times each word occurs in book.txt.
Example:
Input: A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
Output: system: 2, distributed: 1, ...
- Using Spark, write a program to count how many times each letter appears in book.txt.
- Using Spark, write a program to convert the words in book.txt to lowercase and write the result to the file words_lower.txt.
- Using Spark, write a program to replace spaces with hyphens (-) in book.txt and write the result to words-.txt.
- Using Spark, compute the sum of the numbers given in numbers.txt, which is contained in the numbers.zip file.
- Additionally, you are given files Numbers2.txt, Numbers4.txt, Numbers8.txt, Numbers16.txt, and Numbers32.txt.
- Compute the sum of the numbers in each individual file and plot a bar graph. Plot the size of the file on the x-axis and the time taken by Spark to compute the result on the y-axis.
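
One possible starting point for the tasks above is sketched here in PySpark. It is a minimal sketch under stated assumptions, not a reference solution: the tokenisation rule in `words`, the whitespace-separated number format assumed by `parse_numbers`, and the helper names `words`, `letters`, `parse_numbers` and `run` are all illustrative choices that the task description does not specify. PySpark is imported inside `run` so the helper functions can be tried without a Spark installation.

```python
import os
import re
import time
from operator import add

# Assumed tokenisation: runs of letters, digits and apostrophes, case-folded.
WORD_RE = re.compile(r"[a-z0-9']+")


def words(line):
    """Split one line of text into lowercase words."""
    return WORD_RE.findall(line.lower())


def letters(line):
    """Return the alphabetic characters of a line, lowercased."""
    return [ch for ch in line.lower() if ch.isalpha()]


def parse_numbers(line):
    """Parse whitespace-separated numbers (assumed file format)."""
    return [float(tok) for tok in line.split()]


def run(book_path="book.txt",
        number_paths=("Numbers2.txt", "Numbers4.txt", "Numbers8.txt",
                      "Numbers16.txt", "Numbers32.txt"),
        save_outputs=False):
    """Run the word, letter and sum tasks; requires PySpark to be installed."""
    from pyspark import SparkContext  # lazy import: only needed here

    sc = SparkContext.getOrCreate()
    book = sc.textFile(book_path)

    # Word and letter frequencies: emit (key, 1) pairs, then add per key.
    word_counts = book.flatMap(words).map(lambda w: (w, 1)).reduceByKey(add)
    letter_counts = book.flatMap(letters).map(lambda c: (c, 1)).reduceByKey(add)

    if save_outputs:
        # saveAsTextFile creates a *directory* with this name containing
        # part files, and fails if the directory already exists.
        book.map(lambda line: line.lower()).saveAsTextFile("words_lower.txt")
        book.map(lambda line: line.replace(" ", "-")).saveAsTextFile("words-.txt")

    # Transformations are lazy, so the timed region must include the
    # action (sum) that actually triggers the computation.
    timings = []
    for path in number_paths:
        start = time.perf_counter()
        total = sc.textFile(path).flatMap(parse_numbers).sum()
        elapsed = time.perf_counter() - start
        timings.append((path, os.path.getsize(path), total, elapsed))

    return word_counts.collect(), letter_counts.collect(), timings
```

Calling `run()` assumes the input files have been extracted into the working directory. The returned `timings` list holds one `(path, size_in_bytes, total, seconds)` tuple per file, which could be passed to a plotting library such as matplotlib to draw the bar graph of file size against elapsed time.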
Report
Write a one-page report on Spark covering its main features and use cases. For instance, what kinds of data can it process? What are RDDs?