
Web Scraping, Feature Engineering, and Dataset Creation Progress Assignment

Added on: 2023-07-12 05:08:53
Order Code: clt317585
Country: Australia

DATASETS:

The Tranco list (legitimate URLs) and PhishTank (phishing URLs) datasets were used.

LIBRARIES INSTALLED (up to Review 2):

  • beautifulsoup4
  • Matplotlib
  • Pandas
  • Requests
  • urllib3

CONCEPTS AND PROGRESS (up to Review 2): WEB SCRAPING, FEATURE ENGINEERING, AND DATASET CREATION


  • Using the BeautifulSoup module, a soup object was created, and a feature vector was built by calling every feature function on that soup object. The feature-extraction step was defined, and the data was saved to a directory.
  • The .csv datasets were loaded into DataFrames using the pandas library. URLs were retrieved from both datasets, and the features described above were extracted. For now, data was extracted from only 1000 URLs of both datasets to create structured data.
  • New DataFrames containing the values of these features (structured data) were created for both the legitimate and phishing datasets.
  • These DataFrames were exported as .csv files. These files are our new datasets, on which various machine learning algorithms will be applied.
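The soup-object step (one feature vector per page, produced by calling every feature function on the page's soup) can be sketched as follows. This is a minimal illustration, assuming BeautifulSoup 4 with its built-in `html.parser` backend. The five feature functions (`has_title`, `number_of_inputs`, `number_of_links`, `has_password_input`, `number_of_iframes`) and `create_vector` are hypothetical names chosen for illustration, not the project's actual feature set.

```python
from bs4 import BeautifulSoup

# Hypothetical feature functions: each takes a BeautifulSoup object and returns
# a number. They are illustrative only, not the feature set used in this project.

def has_title(soup: BeautifulSoup) -> int:
    """1 if the page has a non-empty <title> tag, else 0."""
    return int(soup.title is not None and bool(soup.title.get_text(strip=True)))

def number_of_inputs(soup: BeautifulSoup) -> int:
    """Count <input> elements (fields that collect user data)."""
    return len(soup.find_all("input"))

def number_of_links(soup: BeautifulSoup) -> int:
    """Count <a> tags that actually carry an href attribute."""
    return len(soup.find_all("a", href=True))

def has_password_input(soup: BeautifulSoup) -> int:
    """1 if any <input type="password"> is present, else 0."""
    return int(soup.find("input", attrs={"type": "password"}) is not None)

def number_of_iframes(soup: BeautifulSoup) -> int:
    """Count embedded <iframe> elements."""
    return len(soup.find_all("iframe"))

# Fixed order, so every vector has the same layout (one position per feature).
FEATURE_FUNCTIONS = [has_title, number_of_inputs, number_of_links,
                     has_password_input, number_of_iframes]

def create_vector(soup: BeautifulSoup) -> list[int]:
    """Call every feature function on one soup object and collect the results."""
    return [feature(soup) for feature in FEATURE_FUNCTIONS]

if __name__ == "__main__":
    # In the real pipeline the HTML would come from a Requests call such as
    # requests.get(url, timeout=...).content; a literal string keeps this offline.
    html = """<html><head><title>Login</title></head><body>
    <form><input type="text" name="user"><input type="password" name="pw"></form>
    <a href="https://example.com">home</a><a>no href</a>
    </body></html>"""
    soup = BeautifulSoup(html, "html.parser")
    print(create_vector(soup))  # [1, 2, 1, 1, 0]
```

Keeping the functions in one ordered list means every vector has the same layout, which is what allows the vectors to be stacked into a DataFrame later.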
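The pandas side of the pipeline (load both .csv lists, take the first 1000 URLs, build one DataFrame of structured data per class, and save it as .csv) might look roughly like this. It is a sketch under stated assumptions: the Tranco file is assumed to have no header and rows of `rank,domain`, the PhishTank file is assumed to have a header with a `url` column, and `https://` is prepended to Tranco domains. Every function name, the 0/1 label encoding, and the stand-in extractor are hypothetical. The per-URL extractor is passed in as a callable, so a soup-based feature function can be plugged in instead.

```python
import io
from pathlib import Path
from typing import Callable, Sequence

import pandas as pd

N_URLS = 1000  # the write-up extracts features from 1000 URLs for now

def load_tranco_urls(source, n: int = N_URLS) -> list[str]:
    """Read a Tranco-style CSV (assumed: no header, rows of 'rank,domain')."""
    df = pd.read_csv(source, header=None, names=["rank", "domain"], nrows=n)
    # Tranco lists bare domains, so a scheme is prepended (an assumption).
    return ("https://" + df["domain"].astype(str)).tolist()

def load_phishtank_urls(source, n: int = N_URLS) -> list[str]:
    """Read a PhishTank-style CSV (assumed: header row containing a 'url' column)."""
    df = pd.read_csv(source, usecols=["url"], nrows=n)
    return df["url"].astype(str).tolist()

def build_feature_frame(urls: Sequence[str],
                        extract: Callable[[str], Sequence[float]],
                        feature_names: Sequence[str],
                        label: int) -> pd.DataFrame:
    """Apply `extract` to each URL and collect one row of structured data per URL."""
    rows = []
    for url in urls:
        try:
            rows.append([url, *extract(url)])
        except Exception:
            # Unreachable or malformed pages are skipped instead of aborting the run.
            continue
    df = pd.DataFrame(rows, columns=["url", *feature_names])
    df["label"] = label  # hypothetical encoding: 0 = legitimate, 1 = phishing
    return df

def save_dataset(df: pd.DataFrame, directory, filename: str) -> Path:
    """Write the structured data to <directory>/<filename> as CSV, without the index."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    df.to_csv(out_path, index=False)
    return out_path

if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real Tranco and PhishTank .csv files.
    tranco_csv = io.StringIO("1,google.com\n2,example.org\n")
    phish_csv = io.StringIO("phish_id,url\n101,http://login.example-bank.test/verify\n")

    # Hypothetical URL-string extractor; the project instead builds each
    # vector from the page's soup object.
    def url_only_features(url: str) -> list[int]:
        return [len(url), url.count(".")]

    names = ["url_length", "num_dots"]
    legit_df = build_feature_frame(load_tranco_urls(tranco_csv), url_only_features, names, label=0)
    phish_df = build_feature_frame(load_phishtank_urls(phish_csv), url_only_features, names, label=1)
    print(legit_df)
    print(phish_df)
```

To write a result to a directory, a call such as `save_dataset(legit_df, "datasets", "legitimate.csv")` would produce `datasets/legitimate.csv` (both names hypothetical). Skipping URLs whose extraction raises keeps one dead link from stopping a 1000-URL run, at the cost of possibly ending up with fewer rows than URLs.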

RESULTS:

Datasets for legitimate and phishing websites were successfully created using web scraping, feature engineering, and pandas DataFrames.


  • Uploaded By : Katthy Wills
  • Posted on : July 12th, 2023