Implement your own Girvan-Newman algorithm using the Spark Framework
1. Overview
In this assignment you will explore the Spark GraphFrames library and implement your own Girvan-Newman algorithm on the Spark Framework to detect communities in graphs, using datasets from the Network Data Repository. The goal is to practice applying the Girvan-Newman algorithm to detect communities efficiently within a distributed environment.
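For reference, Girvan-Newman works by repeatedly computing the betweenness of every edge (how many shortest paths pass through it), removing the edge with the highest betweenness, and reading off the resulting connected components as communities. Below is a minimal single-machine sketch in plain Python; the graph, function names, and the small example graph are illustrative only, and your task 2 submission must instead express this computation with Spark RDDs.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style BFS accumulation of shortest-path edge betweenness
    for an unweighted, undirected graph given as {node: set(neighbors)}."""
    bw = {}
    for src in adj:
        dist, sigma = {src: 0}, {src: 1.0}     # hop counts and shortest-path counts
        order, preds = [], {v: [] for v in adj}
        q = deque([src])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:              # first time w is reached
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                    sigma[w] = sigma.get(w, 0.0) + sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):              # accumulate dependencies leaf-first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                e = tuple(sorted((v, w)))
                bw[e] = bw.get(e, 0.0) + c
                delta[v] += c
    return {e: b / 2.0 for e, b in bw.items()}  # each pair is counted from both ends

def connected_components(adj):
    """BFS flood fill; each component is a candidate community."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    q.append(w)
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    """One iteration: remove the single edge with the highest betweenness."""
    bw = edge_betweenness(adj)
    u, v = max(bw, key=bw.get)
    adj = {k: set(vs) for k, vs in adj.items()}
    adj[u].discard(v)
    adj[v].discard(u)
    return adj

# Two triangles joined by a bridge (2, 3): the bridge carries all cross
# traffic, so it has the highest betweenness and is removed first, which
# splits the graph into the two triangle communities.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

In practice you would repeat `girvan_newman_step` and keep the partition that maximizes modularity; this sketch only shows one removal round.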
2. Requirements
2.1 Programming Requirements
a. You must use Python to implement all tasks.
b. You can use the Spark DataFrame and GraphFrames library for task1, but for task2 you can ONLY use Spark RDD and standard Python libraries.
2.2 Programming Environment: Python 3.7+ and Spark 3.3.2
2.3 Write your own code.
2.4 What you need to turn in.
Your submission must be a zip file following the naming convention:
firstname_lastname_hw1.zip
You should pack the following required (and optional) files in the zip file (see Figure 1):
a.1 [REQUIRED] two Python notebooks if you used Google Colab, named: (all lowercase)
firstname_lastname_task1.ipynb
firstname_lastname_task2.ipynb
a.2 [REQUIRED] two Python scripts if you did not use Google Colab, named: (all lowercase)
firstname_lastname_task1.py
firstname_lastname_task2.py
a.3 [REQUIRED] three output text files, named: (all lowercase)
firstname_lastname_task1_community_python.txt
firstname_lastname_task2_edge_betweenness_python.txt
firstname_lastname_task2_community_python.txt
a.4 [OPTIONAL] one jar file, named: (all lowercase)
firstname_lastname_hw1.jar
a.5 [OPTIONAL] You can include other scripts to support your programs, named with the prefix: (all lowercase)
firstname_lastname_filename.py