AMITY UNIVERSITY ONLINE, NOIDA, UTTAR PRADESH
In partial fulfillment of the requirement for the award of degree of Bachelor of Computer Application
Credit Card Fraud Detection using Data Analysis
Guide Details:
Name: Shailendra Tiwari
Designation: Consulting Specialist
Submitted By:
Rahul Devendra Rai
A9922521001123(el)
ABSTRACT
Credit card fraud is a common problem in the finance sector, causing large financial losses and reputational damage for banks and customers. With the increase in digital payments, the complexity and number of fraud cases have grown, calling for better detection and prevention methods. This research explores the use of data analysis to improve the detection of credit card fraud, focusing on the methods, tools, and case studies that highlight the effectiveness of these strategies.
Traditionally, identifying fraudulent credit card activity depended on rule-based systems, in which preset rules were used to flag suspicious transactions. However, these older methods often fail to adapt to the changing tactics used by fraudsters. The evolving character of fraud requires more adaptable systems that are able to learn and change. This is where data analysis combined with machine learning makes a substantial difference.
Data analytics applies complex algorithms to large volumes of transaction records to find patterns and anomalies that could suggest fraud. The first step of this procedure involves collecting and preprocessing data: data from sources such as transaction logs and customer profiles is cleaned and prepared for analysis. This step is important because the quality of the data has a large influence on how well analytical models perform.
Exploratory Data Analysis (EDA) is key to understanding the dataset. Through visualization methods and statistical analysis, EDA helps uncover patterns and trends that might indicate fraudulent activity. Common signs of fraud are unusual spending habits, many transactions in a brief time, and purchases from locations far apart.
Machine learning models, both supervised and unsupervised, are essential in current fraud detection systems. Supervised learning techniques such as logistic regression, decision trees, random forests, support vector machines, and neural networks rely on historical data that is labeled as either fraudulent or legitimate. These models are trained to distinguish usual from suspicious transactions based on the characteristics of the data.
In contrast, unsupervised learning does not need labeled data. Techniques such as clustering (for example, K-means and DBSCAN) and anomaly detection help to spot unusual features in the transaction data, which can suggest fraud. These techniques are good at finding new fraud types that have not been seen before.
The performance of these models is assessed using measures such as precision, recall, F1-score, and ROC-AUC. These measures quantify the accuracy and reliability of the models in spotting fraudulent transactions while reducing false positives and false negatives. Achieving a good balance between them is crucial because too many false positives cause customer dissatisfaction, while false negatives let fraud go unnoticed.
The research also shows the tools and technologies used to build and deploy these models. Programming languages like Python and R are commonly used for data manipulation and machine learning. Libraries such as Pandas, NumPy, and SciPy help in data processing, while tools like Scikit-learn, TensorFlow, and Keras are important for developing machine learning models. Visualization tools like Matplotlib and Seaborn help in displaying data, aiding both EDA and model assessment.
Case studies offer useful views on using data analytics to spot fraud. XYZ Bank set up a machine learning-based fraud detection system that lowered fraud. The bank dealt with issues such as imbalanced data and model interpretability by using methods such as oversampling and feature importance analysis. ABC Financial Services brought real-time analytics into its fraud detection process, which allows quick detection and blocking of questionable transactions. This move not only improved security but also raised customer trust and satisfaction.
Keywords: Credit card fraud, Data analytics, Machine learning, Fraud detection, Supervised learning, Unsupervised learning, Transaction data, Anomaly detection, Financial security, Real-time analytics
DECLARATION
I, Rahul Devendra Rai, a student pursuing BCA VIth Semester at Amity University Online, hereby declare that the project work entitled Credit Card Fraud Detection using Data Analysis has been prepared by me during the academic year 2023-2024 under the guidance of Shailendra Tiwari, MBA (Computer Science), Welingkar Institute of Management. I assert that this project is a piece of original bona fide work done by me. It is the outcome of my own effort and has not been submitted to any other university for the award of any degree.
Signature of Student
CERTIFICATE
This is to certify that Rahul Devendra Rai of Amity University Online has carried out the project work presented in this project report entitled Credit Card Fraud Detection using Data Analysis Report for the award of BCA under my guidance. The project report embodies results of original work, and studies are carried out by the student himself/herself. Certified further, that to the best of my knowledge the work reported herein does not form the basis for the award of any other degree to the candidate or to anybody else from this or any other University/Institution.
Signature
(Shailendra Tiwari)
(Consulting Specialist)
Chapter 1: Introduction
Credit card fraud is a major problem for the financial sector. It causes substantial financial losses and makes people less confident in digital payments. As the world's economy becomes more digital, credit card fraud is becoming more common and complex, which requires better methods to detect and stop it. This study looks at using data analytics with Python to improve credit card fraud detection. The topic was chosen because of its importance and because data techniques can change how financial crime is fought.
The choice of this topic is motivated by several strong reasons. First, the financial effect of credit card fraud is huge. Industry reports show that credit card fraud losses around the world were more than $28 billion in 2020, with forecasts showing ongoing growth if good countermeasures are not put in place. These losses harm both financial institutions and consumers, who might deal with unauthorized charges and exposed personal details. The increasing financial stakes emphasize the need for new solutions to detect and stop fraud more effectively.
Older, rule-based ways of finding fraud have not done well against fraudsters who keep changing their tactics. These rules identify suspicious transactions using fixed criteria, but fraudsters quickly find ways around them. This approach cannot keep up with the changing and detailed nature of today's fraud, causing many false alarms and missed incidents. The shortcomings of the old methods show the need for smarter systems that can adjust, which data analysis offers.
Data analytics uses many methods, from statistical modeling to machine learning, to create a powerful system for finding and predicting fraudulent actions. By examining large amounts of transaction data, data analytics can discover patterns and anomalies that might indicate fraud. This ability is very important because of the huge number of credit card transactions made every day. For example, Visa processes more than 150 million transactions every day. At this volume, checking by hand is not possible, which highlights the need for automated, data-driven systems to detect fraud.
Adding Python-specific tools to fraud detection systems is an important improvement over old methods. Python is a flexible and well-used programming language that offers a full set of libraries and frameworks made to handle data, analyze it, and use machine learning.
Pandas is a strong library for manipulating and examining data. It offers data structures like DataFrames, which are well suited to managing big datasets such as transaction logs. Pandas assists in data cleaning, transformation, and exploration, helping analysts prepare data for further analysis. For example, Pandas can be used to remove unneeded data, manage missing values, and change raw data into a structured format fit for analysis.
NumPy and SciPy are necessary for numerical and scientific calculations. NumPy provides support for big multi-dimensional arrays and matrices, plus it has a variety of math functions to work on these arrays. SciPy extends NumPy by incorporating many algorithms to optimize, integrate, interpolate, solve eigenvalue problems, and perform other key numerical analysis tasks. These libraries are important in doing complicated math calculations needed for fraud detection algorithms.
Matplotlib and Seaborn are essential tools for data visualization. Matplotlib is a plotting library that has an object-oriented API to embed plots in apps. Seaborn, built on Matplotlib, provides a high-level interface to draw appealing and useful statistical graphics. These visualization tools are key to conducting Exploratory Data Analysis (EDA) to spot patterns, trends, and oddities in transaction data. Visualizations like histograms, scatter plots, and heatmaps help to grasp data distribution and pinpoint suspicious activities.
Scikit-learn is a central tool for machine learning in Python. It offers simple and powerful methods to analyze data and supports various learning algorithms, both supervised and unsupervised. Scikit-learn helps to set up models like logistic regression, decision trees, random forests, support vector machines, and clustering methods, all key to systems that detect fraud. To illustrate, logistic regression can estimate how likely a transaction is to be fraudulent using past data, and clustering methods can spot groups of transactions that are abnormal.
TensorFlow and Keras are tools built to develop deep learning and neural network models. TensorFlow, made by Google, offers a full set of features for machine learning, while Keras, which works with TensorFlow, makes it easier to build and train neural networks. These tools help make advanced models that can recognize complex patterns in transaction data to improve the spotting of clever and new fraud methods. Deep learning models, like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are good at capturing detailed relationships in data and are effective in spotting fraud.
This study explores how Python-specific tools can improve credit card fraud detection by providing a detailed review of methods, tools, and case studies. By studying successful uses and tackling the challenges, this research aims to add to the ongoing work to fight credit card fraud and safeguard financial assets. The transformative potential of data analytics in this area highlights its importance and relevance as a key area of study in the battle against financial crime.
A representative case would be a bank adopting a fraud detection system built on machine learning. The bank can use Python libraries to process and examine transaction data, then employ a random forest classifier to find fraudulent transactions. Even with issues like data imbalance and model interpretability, such a system can substantially reduce fraudulent activity. This example illustrates the practical use of Python tools in real settings.
Chapter 2: Literature Review
Credit card fraud has long been a primary problem for the financial industry, causing extensive monetary losses and eroding customer trust in digital transactions. As the financial system becomes more digitally connected, the sophistication and frequency of fraudulent activities are on the rise. Traditional fraud detection methods are proving insufficient, prompting growing interest in data analytics, especially with Python, as a more effective way to detect and fight fraud.
2.1 Historical Context and Traditional Methods
In the past, credit card fraud detection relied heavily on rule-based systems. These systems work by applying predefined rules to flag potentially fraudulent transactions. For instance, a transaction might be flagged if it exceeds a certain amount or if multiple transactions occur in rapid succession. While these systems were somewhat effective, they are limited by their rigid nature. Fraudsters quickly adapt to the rules, rendering them ineffective. This static approach often results in high rates of false positives and false negatives, underscoring the need for more flexible and intelligent systems.
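The two rules mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the rule-based idea, not a system described in any cited source: the thresholds `AMOUNT_LIMIT`, `BURST_COUNT`, and `BURST_WINDOW`, the function name `flag_transactions`, and the sample history are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
AMOUNT_LIMIT = 1000.00               # flag any single transaction above this amount
BURST_COUNT = 3                      # flag when this many transactions ...
BURST_WINDOW = timedelta(minutes=5)  # ... fall within this time window


def flag_transactions(transactions):
    """Return the indices of transactions that break a fixed rule.

    `transactions` is a list of (timestamp, amount) tuples for one card,
    sorted by timestamp.
    """
    flagged = set()
    for i, (ts, amount) in enumerate(transactions):
        # Rule 1: the amount exceeds a fixed limit.
        if amount > AMOUNT_LIMIT:
            flagged.add(i)
        # Rule 2: BURST_COUNT transactions inside BURST_WINDOW, ending at i.
        start = i - BURST_COUNT + 1
        if start >= 0 and ts - transactions[start][0] <= BURST_WINDOW:
            flagged.update(range(start, i + 1))
    return sorted(flagged)


history = [
    (datetime(2020, 1, 1, 9, 0), 25.00),
    (datetime(2020, 1, 1, 12, 0), 1500.00),  # breaks rule 1
    (datetime(2020, 1, 1, 18, 0), 10.00),
    (datetime(2020, 1, 1, 18, 1), 12.00),
    (datetime(2020, 1, 1, 18, 2), 11.00),    # third purchase within two minutes
    (datetime(2020, 1, 2, 9, 0), 999.00),    # just under the limit, so it passes
]
flagged = flag_transactions(history)
```

The last transaction shows the rigidity discussed above: an amount kept just below the fixed limit is never flagged, however unusual it may be for that cardholder.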
2.2 The Rise of Data Analytics
Data analytics has transformed many industries, including finance. It involves examining large datasets to uncover patterns, correlations, and trends that are not immediately obvious. In credit card fraud detection, data analytics can identify subtle indicators of fraudulent behavior that rule-based systems might miss.
Python has emerged as a leading programming language for data analytics because of its ease of use, versatility, and wide range of libraries designed for data manipulation, machine learning, and visualization. Libraries like Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, and Keras offer comprehensive tools for building sophisticated fraud detection models.
2.3 Key Python Libraries for Fraud Detection
Pandas is an essential tool for data manipulation and analysis. It provides data structures such as DataFrames, which are well suited to handling large datasets like transaction logs. Pandas simplifies data cleaning, transformation, and exploration, making it easier to prepare data for analysis.
NumPy and SciPy are essential for numerical and scientific computing. NumPy supports large multi-dimensional arrays and matrices, together with a collection of mathematical functions to operate on these arrays. SciPy builds on NumPy by adding a variety of algorithms for optimization, integration, interpolation, and other tasks necessary for numerical analysis.
Matplotlib and Seaborn are effective visualization libraries. Matplotlib is a plotting library that lets you embed plots into applications. Seaborn, built on top of Matplotlib, offers a high-level interface for drawing attractive and informative statistical graphics. These tools are crucial for exploratory data analysis (EDA), helping to identify trends, patterns, and anomalies in transaction data.
Scikit-learn is a key library for machine learning, providing a wide range of algorithms for both supervised and unsupervised learning. It supports models like logistic regression, decision trees, random forests, and support vector machines, which are central to fraud detection systems.
TensorFlow and Keras are used for deep learning and neural network models. TensorFlow, developed by Google, provides a comprehensive ecosystem for machine learning, while Keras, running on top of TensorFlow, simplifies the construction and training of neural networks. These libraries enable the development of complex models that can learn intricate patterns in transaction data, improving the detection of subtle and emerging fraud tactics.
2.4 Applications of Machine Learning in Fraud Detection
Machine learning models, both supervised and unsupervised, are vital to modern fraud detection systems. Supervised learning models are trained on labeled datasets, in which transactions are marked as either fraudulent or legitimate. This allows the model to learn patterns associated with fraud and apply this knowledge to new transactions. Techniques such as logistic regression, decision trees, and neural networks are commonly used in supervised learning for fraud detection.
Unsupervised learning, in contrast, does not require labeled data. Methods such as clustering (e.g., K-means, DBSCAN) and anomaly detection are used to identify outliers in the transaction data. These techniques are especially useful for detecting new types of fraud that have not been previously encountered. By grouping similar transactions and identifying those that deviate from the norm, unsupervised learning models can highlight potential fraud cases for investigation.
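To make the idea of label-free outlier detection concrete, the sketch below uses a plain z-score test, one of the simplest forms of anomaly detection. It is not K-means or DBSCAN, and it is not the method used later in this report; the function name `zscore_outliers`, the threshold of 2.5, and the sample amounts are assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(amounts, threshold=2.5):
    """Return indices of values more than `threshold` standard deviations from the mean.

    No fraud labels are needed: a value is flagged only because it deviates
    strongly from the rest of the data.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Hypothetical transaction amounts for one cardholder.
amounts = [22.5, 31.0, 18.75, 40.2, 27.3, 35.1, 24.8, 29.9, 33.4, 21.6, 950.0, 26.7]
outliers = zscore_outliers(amounts)  # index 10 (the 950.00 purchase) is the only value flagged
```

A flagged index is only a candidate for investigation. In small samples a single extreme value inflates the standard deviation, which is one reason clustering-based methods are often preferred in practice.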
2.5 Real-World Applications
Several case studies highlight the practical applications and benefits of using Python for credit card fraud detection. For example, a bank (referred to here as ABC Bank) can build a machine learning-based fraud detection system using Python: libraries like Pandas and Scikit-learn preprocess and analyze the transaction data, and a random forest classifier is then deployed to identify fraudulent transactions. Such a system can significantly reduce fraudulent activity, demonstrating the effectiveness of Python tools in real-world settings.
2.6 Challenges and Future Directions
Despite advancements in fraud detection through data analytics, several challenges remain. One significant challenge is data imbalance, where fraudulent transactions make up a small fraction of the overall data. This imbalance can skew model training, leading to higher false positive rates. Techniques such as oversampling, undersampling, and synthetic data generation (e.g., SMOTE, available in Python through the imbalanced-learn library) are employed to address this issue.
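As a minimal sketch of the oversampling idea, the example below duplicates randomly chosen minority-class samples until both classes have the same size. This is plain random oversampling, not SMOTE (which synthesizes new points by interpolation); the function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(samples, seed=42):
    """Balance a binary dataset by repeating fraud samples at random.

    `samples` is a list of (features, label) pairs, with label 1 for fraud
    and 0 for legitimate transactions.
    """
    rng = random.Random(seed)
    fraud = [s for s in samples if s[1] == 1]
    legit = [s for s in samples if s[1] == 0]
    if not fraud or len(fraud) >= len(legit):
        return list(samples)  # nothing to balance
    # Draw fraud samples with replacement until both classes match in size.
    extra = [rng.choice(fraud) for _ in range(len(legit) - len(fraud))]
    balanced = legit + fraud + extra
    rng.shuffle(balanced)
    return balanced


# Toy data: 98 legitimate transactions and 2 fraudulent ones.
samples = [((float(i),), 0) for i in range(98)] + [((500.0,), 1), ((750.0,), 1)]
balanced = random_oversample(samples)
```

Resampling of this kind should be applied to the training split only; if the test split is resampled as well, the reported metrics no longer reflect the real class distribution.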
The evolving nature of fraud tactics also necessitates continuous updating and retraining of models. As fraudsters develop new techniques, detection systems must adapt to maintain their effectiveness. Ensuring data privacy and compliance with regulations like GDPR is another critical challenge, requiring secure handling and processing of sensitive customer data.
Looking ahead, the future of credit card fraud detection lies in the adoption of more sophisticated techniques. Artificial Intelligence (AI) and deep learning hold promise for further improving the accuracy and adaptability of fraud detection systems. Moreover, emerging technologies like blockchain could provide additional layers of security by offering transparent and tamper-evident transaction records.
In conclusion, data analytics, especially with Python, provides a powerful framework for improving credit card fraud detection. The versatility and extensive libraries available in Python make it a good choice for developing sophisticated fraud detection models. By leveraging these tools, financial institutions can better protect customers and reduce financial losses. Continued innovation and research in this field are vital to stay ahead of sophisticated fraud schemes and ensure the security and integrity of digital transactions.
Chapter 3: Methodology
The first step in our methodology was to load the dataset from Kaggle, which contained credit card transactions, including both legitimate and fraudulent transactions, from January 1, 2019, to December 31, 2020. This dataset comprised transactions from 1000 customers across 800 merchants. After loading the dataset into the analysis environment using the Pandas library, we combined the training and test data into a single dataframe to work with a complete dataset. The total length of the combined dataset was verified to be 1,852,394 records.
Following data loading, we explored the dataset structure and examined the column types to understand the data better. Conducting Exploratory Data Analysis (EDA) using Seaborn and Matplotlib, we visualized the distribution of legitimate and fraudulent transactions. This analysis revealed a significant imbalance in the dataset, with fraudulent transactions constituting less than 1% of the total transactions.
The next phase involved data preparation, which is crucial for effective model training. This phase included several steps:
- Data Cleaning: We removed irrelevant or redundant data, handled missing values, and converted categorical variables to numerical values using one-hot encoding.
- Feature Engineering: We created new features to potentially improve model performance and aggregated transaction data over time to capture more patterns.
- Scaling: Numerical features were scaled to ensure they had a mean of 0 and a standard deviation of 1, which is important for algorithms like Support Vector Machines (SVM) and Logistic Regression that are sensitive to the scale of data.
- Balancing the Dataset: Given the severe imbalance in the dataset, we used class weighting to handle this issue. Class weights were computed to penalize the misclassification of the minority class, ensuring that the model focused on detecting fraudulent transactions.
For modeling, we trained and evaluated several machine learning models, including Logistic Regression, Support Vector Machine (SVM), and XGBoost:
- Logistic Regression: This linear model was trained with class weighting to handle the imbalance in the dataset.
- Support Vector Machine (SVM): An SVM model was trained with class weighting to find the optimal hyperplane separating fraudulent and non-fraudulent transactions.
- XGBoost: This gradient boosting algorithm was trained using the `scale_pos_weight` parameter to manage the class imbalance efficiently.
Each model was evaluated based on its accuracy, precision, recall, and F1-score. These metrics provided a comprehensive understanding of the model's performance, especially in dealing with the imbalanced dataset.
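The four metrics can be computed directly from the counts of true and false positives and negatives. The sketch below uses hypothetical counts (not results from this project) to show why accuracy alone is misleading on imbalanced data; the function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 10,000 transactions, 50 of them fraudulent.
# A "model" that labels everything as legitimate:
always_legit = classification_metrics(tp=0, fp=0, tn=9950, fn=50)
# A detector that catches 40 of the 50 frauds at the cost of 60 false alarms:
detector = classification_metrics(tp=40, fp=60, tn=9890, fn=10)
```

With these counts the do-nothing model reaches 99.5% accuracy with zero recall, while the detector has slightly lower accuracy (99.3%) but 80% recall, which is why precision, recall, and F1-score carry more weight than accuracy for this task.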
3.1 Data Load
The dataset for this project was sourced from Kaggle and includes credit card transactions, both legitimate and fraudulent, spanning from January 1, 2019, to December 31, 2020. The dataset contains transactions from 1000 customers across 800 merchants.
The initial step involved loading the dataset into the analysis environment using the Pandas library, which is a powerful tool for data manipulation and analysis. The code snippet below demonstrates the process of loading the data:
```python
import pandas as pd
# Load the training and test splits
df_train = pd.read_csv('credit_card_fraud_train.csv')
df_test = pd.read_csv('credit_card_fraud_test.csv')
# Combine the training and test data into a single DataFrame
df_complete = pd.concat([df_train, df_test])
```
The total length of the combined dataset is 1,852,394 records.
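One detail worth checking after combining the two files is the row index. The sketch below uses two tiny stand-in DataFrames (the values are made up, and `train_part` and `test_part` are hypothetical names) to show that `pd.concat` keeps each part's original row labels unless `ignore_index=True` is passed, and that the combined length equals the sum of the parts.

```python
import pandas as pd

# Tiny stand-ins for the real training and test files.
train_part = pd.DataFrame({"amt": [4.97, 107.23, 220.11], "is_fraud": [0, 0, 1]})
test_part = pd.DataFrame({"amt": [2.86, 29.84], "is_fraud": [0, 0]})

# Default behaviour: the row labels of each part are kept, so they repeat.
combined = pd.concat([train_part, test_part])
has_duplicate_labels = combined.index.has_duplicates

# ignore_index=True renumbers the rows from 0 to n-1.
combined = pd.concat([train_part, test_part], ignore_index=True)

# Sanity check: the combined length equals the sum of the parts.
assert len(combined) == len(train_part) + len(test_part)
```

Repeated row labels are harmless for counting, but they can cause surprises later when rows are selected by label with `.loc`.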
Train data: DataFrame preview from `.head()`, which shows only the first five rows:
Test data: DataFrame preview from `.head()`, which shows only the first five rows:
Combined data: DataFrame preview from `.head()`, which shows only the first five rows:
3.1.1 Data Understanding and Exploration
The next step in the methodology was to gain an understanding of the data. This involved exploring the dataset to comprehend its structure, the types of data it contains, and identifying any anomalies or patterns.
Exploratory Data Analysis (EDA) was conducted using various libraries, including Seaborn and Matplotlib for data visualization. These visualizations helped in understanding the distribution of the data, identifying outliers, and spotting trends. Key statistics such as mean, median, standard deviation, and the distribution of target variables were analyzed.
Data Structure and Columns:
The dataset includes the following columns:
- `Unnamed: 0`: Index
- `trans_date_trans_time`: Transaction date and time
- `cc_num`: Credit card number
- `merchant`: Merchant name
- `category`: Transaction category
- `amt`: Transaction amount
- `first`: First name of the cardholder
- `last`: Last name of the cardholder
- `gender`: Gender of the cardholder
- `street`: Street address of the cardholder
- `city`: City of the cardholder
- `state`: State of the cardholder
- `zip`: ZIP code of the cardholder
- `lat`: Latitude of the cardholder's address
- `long`: Longitude of the cardholder's address
- `city_pop`: Population of the cardholder's city
- `job`: Job of the cardholder
- `dob`: Date of birth of the cardholder
- `trans_num`: Transaction number
- `unix_time`: Transaction time in Unix format
- `merch_lat`: Latitude of the merchant's location
- `merch_long`: Longitude of the merchant's location
- `is_fraud`: Fraudulent transaction indicator (1 if fraud, 0 otherwise)
```python
import seaborn as sns
import matplotlib.pyplot as plt
# Distribution of legitimate and fraudulent transactions
sns.countplot(x='is_fraud', data=df_complete)
plt.title('Distribution of Legitimate and Fraudulent Transactions')
plt.show()
```
An initial analysis revealed that the dataset is highly imbalanced, with fraudulent transactions making up less than 1% of the total transactions. This imbalance can lead to models that predict the majority class (non-fraudulent transactions) with high accuracy but fail to identify fraudulent transactions effectively.
Key observations:
- The dataset is highly imbalanced: legitimate transactions make up more than 99% of the records and fraudulent transactions less than 1%.
- This imbalance necessitates the use of techniques to handle the disparity, ensuring the model's performance does not favor the majority class.
```python
df_complete['is_fraud'].value_counts(normalize=True)
```
The heatmap below shows how the numerical columns of the dataset correlate with one another:
Heatmap Explanation:
Overview
The heatmap presented in the image is a correlation matrix for various features in the dataset. Correlation measures the statistical relationship between two variables and is denoted by a value ranging from -1 to 1. In this context, the heatmap helps in identifying the relationships between different features and how strongly they correlate with the target variable `is_fraud`.
3.1.2 Understanding the Heatmap
1. Color Coding:
- The heatmap uses a color gradient ranging from dark red to light yellow.
- Dark red indicates a high negative correlation (-1).
- Light yellow indicates a high positive correlation (1).
- Dark shades near the center of the color bar (closer to zero) indicate low or no correlation.
2. Axes:
- Both axes list the features of the dataset, allowing you to compare the correlation between any two features.
3. Interpretation:
- The cells in the heatmap represent the correlation coefficient between the corresponding features.
- A positive correlation means that as one feature increases, the other tends to increase.
- A negative correlation means that as one feature increases, the other tends to decrease.
- A correlation close to zero implies little to no linear relationship between the features.
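Each cell of such a heatmap holds a Pearson correlation coefficient. The sketch below computes it from first principles on made-up values, to show what coefficients of +1, -1, and 0 look like; the function name `pearson` and the sample lists are illustrative assumptions, not data from the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two spread terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


amounts = [10.0, 20.0, 30.0, 40.0, 50.0]
rising = [1.0, 2.0, 3.0, 4.0, 5.0]     # moves with amounts
falling = [5.0, 4.0, 3.0, 2.0, 1.0]    # moves against amounts
unrelated = [2.0, 1.0, 3.0, 1.0, 2.0]  # no linear relationship with amounts

r_rising = pearson(amounts, rising)        # close to +1
r_falling = pearson(amounts, falling)      # close to -1
r_unrelated = pearson(amounts, unrelated)  # close to 0
```

The coefficient captures linear association only, so a value near zero does not rule out a non-linear relationship between two features.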
Key Observations
1. `is_fraud` Correlations:
- The target variable `is_fraud` shows the highest correlation with the `amt` variable. This suggests that the transaction amount (`amt`) has a significant impact on predicting fraud. Higher transaction amounts might be more indicative of fraudulent activity.
- Other variables show lower correlations with `is_fraud`, indicating that they may not be as strong predictors of fraud on their own.
2. `amt` (Transaction Amount):
- `amt` shows a notable correlation with `is_fraud`, which reinforces its importance in the fraud detection model.
- The correlation of `amt` with other variables is also noticeable but not as significant as with `is_fraud`.
3. Other Features:
- Variables like `cc_num`, `zip`, `lat`, `long`, `city_pop`, `unix_time`, `merch_lat`, and `merch_long` have varying degrees of correlation with each other and with the target variable.
- Some correlations are higher, such as between geographic features (`lat` and `long` with `merch_lat` and `merch_long`), indicating possible relationships that might be useful for feature engineering or additional analysis.
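One way the geographic columns could be turned into a single feature is the great-circle distance between the cardholder's address (`lat`, `long`) and the merchant (`merch_lat`, `merch_long`). The sketch below uses the haversine formula; this distance feature is a suggestion for illustration, not a step reported in this project, and the function name `haversine_km` and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# Hypothetical cardholder and merchant coordinates.
cardholder = (40.0, -75.0)   # (lat, long)
merchant = (40.5, -74.5)     # (merch_lat, merch_long)
distance_km = haversine_km(*cardholder, *merchant)
```

A purchase made unusually far from the cardholder's address is one of the warning signs mentioned in the abstract, so a distance column of this kind may be more informative to a model than the four raw coordinates.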
3.1.3 Importance of Correlation Analysis:
Correlation analysis is critical in feature selection and engineering because it helps identify which features are likely to contribute most to the model's predictive power. In this case, the heatmap suggests focusing on `amt` for fraud detection while also considering interactions between other variables.
By understanding these correlations, we can better preprocess the data, create more meaningful features, and select the most relevant ones for modeling, thereby improving the overall performance of the fraud detection system.
3.2 Data Preparation
Data preparation is a crucial step in the machine learning pipeline. It involves cleaning the data, handling missing values, encoding categorical variables, and scaling numerical features.
1. Data Cleaning: This step involved removing any irrelevant or redundant data, handling missing values, and correcting any inconsistencies.
```python
# Handle missing values by dropping incomplete rows
df_complete = df_complete.dropna()
# Convert categorical variables to numerical columns using one-hot encoding
# (high-cardinality text columns such as names should be dropped beforehand)
df_complete = pd.get_dummies(df_complete, drop_first=True)
```
2. Feature Engineering: Feature engineering was performed to create new features that could potentially improve the model's performance. This included creating interaction terms, polynomial features, and aggregating transaction data over time.
- Create new features to potentially improve model performance.
- Aggregate transaction data over time to capture more patterns.
- For example, extracting the time of day from transaction timestamps:
```python
def get_tod(hour):
    """Map an hour of the day (0-23) to a coarse time-of-day label."""
    if 4 < hour <= 12:
        return 'morning'
    elif 12 < hour <= 20:
        return 'afternoon'
    else:
        return 'night'
```
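A short usage sketch follows, assuming the timestamps are strings in `YYYY-MM-DD HH:MM:SS` form (the format is an assumption, and the sample values are made up). The function is repeated so that the example runs on its own.

```python
from datetime import datetime


def get_tod(hour):
    """Map an hour of the day (0-23) to a coarse time-of-day label."""
    if 4 < hour <= 12:
        return 'morning'
    elif 12 < hour <= 20:
        return 'afternoon'
    else:
        return 'night'


# Hypothetical transaction timestamps.
timestamps = [
    "2019-01-01 00:00:18",
    "2019-06-15 09:30:00",
    "2020-12-31 12:00:00",
    "2020-12-31 20:59:59",
    "2020-03-10 21:15:00",
]

# Parse each timestamp, take the hour, and label it.
labels = [
    get_tod(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour)
    for ts in timestamps
]
# labels == ['night', 'morning', 'morning', 'afternoon', 'night']
```

Note the boundary behaviour of the function as written: hour 12 counts as morning, hour 20 as afternoon, and hours 0 to 4 as night.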
3. Handling Date and Time:
- Convert `trans_date_trans_time` to standard datetime format and extract useful features.
- Convert `dob` to the year and calculate the age of the cardholder:
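```python
# Parse both date columns, then derive the cardholder's approximate age at transaction time
df_complete['trans_date_trans_time'] = pd.to_datetime(df_complete['trans_date_trans_time'])
df_complete['dob'] = pd.to_datetime(df_complete['dob'])
df_complete['age'] = df_complete['trans_date_trans_time'].dt.year - df_complete['dob'].dt.year
```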
4. Scaling:
- Not all features follow a Gaussian distribution, so we use MinMax scaling to normalize the data:
```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df_complete)
```
Numerical features were scaled to ensure that they have a mean of 0 and a standard deviation of 1. This step is crucial for algorithms that are sensitive to the scale of data, such as Support Vector Machines (SVM) and Logistic Regression.
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_complete)
```
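The two scalers shown above are alternatives that produce different results. The sketch below reproduces both formulas in plain Python on made-up amounts, so the difference is visible; the function names `min_max_scale` and `standardize` and the sample values are illustrative assumptions.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical transaction amounts with one large value.
amounts = [5.0, 20.0, 35.0, 50.0, 1000.0]
scaled_01 = min_max_scale(amounts)  # smallest value maps to 0.0, largest to 1.0
z_scores = standardize(amounts)     # values centred on 0 with unit spread
```

In both cases a single large amount compresses the remaining values into a narrow band. Whichever scaler is chosen, its parameters are best estimated on the training split only and then reused on the test split, so that no information from the test data leaks into training.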
5. Balancing the Dataset:
- Fraudulent transactions are significantly fewer than legitimate ones, leading to an imbalanced dataset. We use class weighting to handle this issue:
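```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
# 'balanced' weights are inversely proportional to class frequency (index 0: legitimate, 1: fraud)
class_weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=df_complete['is_fraud'])
```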
- This approach assigns higher weights to the minority class to penalize misclassification and ensure the model focuses on detecting fraudulent transactions.
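For reference, the "balanced" heuristic amounts to `n_samples / (n_classes * class_count)` for each class. The sketch below reproduces that arithmetic in plain Python on a made-up label list; the function name `balanced_class_weights` and the 995-to-5 split are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 995 legitimate (0) and 5 fraudulent (1) transactions.
labels = [0] * 995 + [1] * 5
weights = balanced_class_weights(labels)

# Ratio of negative to positive samples, the value the XGBoost documentation
# suggests as a typical starting point for scale_pos_weight.
neg_pos_ratio = labels.count(0) / labels.count(1)
```

Here the fraud class receives a weight of 100 against roughly 0.5 for the legitimate class, and the ratio of the two weights equals the negative-to-positive ratio of 199. Passing only the fraud-class weight as `scale_pos_weight` therefore gives a smaller value than that ratio, which may be worth keeping in mind when tuning.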
Dropping the unwanted columns:
6. Combining Data:
- Before proceeding to model training, combine the training and testing datasets to ensure a comprehensive analysis:
```python
df_complete = pd.concat([df_train, df_test])
```
7. Standardization and Conversion:
- Standardize the numerical features to have a mean of 0 and a standard deviation of 1.
- Convert categorical variables into numerical using one-hot encoding.
Modeling:
Several machine learning models were tested to determine the best approach for fraud detection. The models included Logistic Regression, Support Vector Machine (SVM), and XGBoost. Each model was evaluated based on its performance metrics such as accuracy, precision, recall, and F1-score.
Logistic Regression: This is a linear model that is widely used for binary classification problems. It predicts the probability of an instance belonging to a particular class.
```python
from sklearn.linear_model import LogisticRegression
# Train logistic regression model (X_train and y_train are assumed to come from an earlier train/test split)
log_reg = LogisticRegression(class_weight='balanced', random_state=42)
log_reg.fit(X_train, y_train)
```
Support Vector Machine (SVM): SVM is a powerful classification algorithm that finds the hyperplane that best separates the data into classes.
```python
from sklearn.svm import SVC
# Train SVM model
svm_model = SVC(class_weight='balanced', random_state=42)
svm_model.fit(X_train, y_train)
```
XGBoost: This is a gradient boosting algorithm that is highly efficient and effective for classification tasks.
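```python
from xgboost import XGBClassifier
# Train XGBoost model
# scale_pos_weight up-weights the fraud class; the XGBoost documentation suggests
# sum(negative instances) / sum(positive instances) as a typical value
xgb_model = XGBClassifier(scale_pos_weight=class_weights[1], random_state=42)
xgb_model.fit(X_train, y_train)
```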
Model Evaluation: The models were evaluated using metrics such as accuracy, precision, recall, and F1-score. These metrics provide a comprehensive understanding of the model's performance, especially in handling imbalanced datasets.
```python
from sklearn.metrics import classification_report
# Evaluate the model on the test set
y_pred = log_reg.predict(X_test)
print(classification_report(y_test, y_pred))
```
Model Evaluation for Logistic Regression:
Model Evaluation for Support Vector Machine (SVM):
Model Evaluation for XGBoost:
Chapter 4: Results
The results of the project were obtained by evaluating the performance of the different models using the test dataset. Each model's effectiveness was assessed based on its ability to accurately predict fraudulent transactions while minimizing false positives and false negatives.
Logistic Regression Results
Logistic Regression provided a baseline model with decent performance metrics. The accuracy, precision, recall, and F1-score for the logistic regression model were calculated as follows:
Metrics:
- Accuracy: This metric measures the proportion of correctly classified instances out of the total instances.
- Precision: Precision is the ratio of true positive predictions to the total predicted positives. It indicates the accuracy of the positive predictions.
- Recall: Recall is the ratio of true positive predictions to the total actual positives. It measures the model's ability to identify positive instances.
- F1-Score: The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both concerns.
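The four definitions above can be checked by hand from a confusion matrix. The sketch below uses hypothetical counts chosen for illustration; they are not results from this project.

```python
# Hypothetical confusion-matrix counts for 10,000 transactions (illustrative only)
tp, fp, fn, tn = 80, 20, 40, 9860

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # share of flagged transactions that are truly fraud
recall = tp / (tp + fn)      # share of actual fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With these counts accuracy is 0.994 even though one third of the fraud cases are missed, which is why precision, recall, and F1-score are more informative than accuracy on imbalanced data.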
SVM Results
The SVM model showed improved performance over logistic regression, particularly in terms of precision and recall. The SVM results were as follows:
Metrics:
- Precision: Higher than logistic regression, indicating better accuracy in predicting fraudulent transactions.
- Recall: Improved recall, meaning the model was better at identifying fraudulent transactions.
- F1-Score: Balanced precision and recall, making it a reliable metric for the model's overall performance.
XGBoost Results
XGBoost provided the best performance among the tested models, achieving high scores across all metrics.
Metrics:
- Accuracy: Highest accuracy among all models, indicating superior overall performance.
- Precision: Excellent precision, meaning fewer false positives.
- Recall: High recall, indicating a strong ability to identify fraudulent transactions.
- F1-Score: The highest F1-score, demonstrating a well-balanced model.
Threshold Analysis
In fraud detection, the threshold for classifying a transaction as fraudulent is crucial. Adjusting the threshold impacts the precision and recall of the model. A lower threshold increases recall but decreases precision, while a higher threshold does the opposite.
Threshold analysis was conducted to find the optimal balance between precision and recall. This was done by varying the threshold and observing its impact on the metrics.
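```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
# Assumption: y_pred_proba holds the predicted fraud probabilities,
# taken here from the logistic regression model
y_pred_proba = log_reg.predict_proba(X_test)[:, 1]
# Compute precision-recall pairs for different probability thresholds
precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)
# Plot precision-recall curve
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()
```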
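One possible way to turn the precision-recall pairs into a single operating threshold is to pick the threshold that maximizes the F1-score. The sketch below assumes this criterion; the project does not state which criterion was used, and the labels and scores are hypothetical toy values.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted fraud probabilities (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# precision and recall have one more entry than thresholds;
# drop the final (precision=1, recall=0) point so the arrays line up
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / (p + r + 1e-12)  # small constant avoids division by zero

best_threshold = thresholds[int(np.argmax(f1))]
y_pred = (scores >= best_threshold).astype(int)

print("best threshold:", best_threshold)
print("predictions:", y_pred)
```

If missed fraud is costlier than a false alarm, a criterion that weights recall more heavily (for example an F-beta score with beta greater than 1) may be a better fit than F1.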
Threshold analysis using confusion matrices:
Confusion matrix for Logistic Regression model:
Confusion matrix for SVM model:
Confusion matrix for XGBoost model:
Conclusion
The project successfully demonstrated the application of data analysis and machine learning techniques in detecting credit card fraud. The XGBoost model was identified as the best-performing model, providing high accuracy, precision, recall, and F1-score. The threshold analysis further enhanced the model's performance by optimizing the trade-off between precision and recall. This approach ensures a robust and reliable fraud detection system, capable of minimizing financial losses due to fraudulent transactions.
By applying data preprocessing techniques, feature engineering, and model evaluation strategies, the project provided a comprehensive framework for fraud detection. The methodology and results highlighted the importance of handling imbalanced datasets, selecting appropriate evaluation metrics, and fine-tuning models to achieve the best performance in real-world applications.
Data link: https://www.kaggle.com/datasets/kartik2112/fraud-detection/data
Chapter 5: Discussion
Adaptability of Credit Card Fraud Detection Data Analysis
Credit card fraud is a critical issue that poses significant threats to financial institutions and consumers worldwide. The rapid growth of online transactions and digital payments has made it easier for fraudsters to exploit vulnerabilities in financial systems. Traditional fraud detection methods, such as rule-based systems, have proven inadequate in combating increasingly sophisticated fraud techniques. This has led to the adoption of data analysis and machine learning methods, which offer a more dynamic and adaptable approach to fraud detection. This discussion explores the adaptability of credit card fraud detection through data analysis, highlighting its benefits, limitations, and real-life application challenges.
The Need for Adaptability in Fraud Detection
The dynamic nature of credit card fraud necessitates adaptable detection systems. Fraudsters constantly evolve their techniques to bypass security measures, making static, rule-based systems obsolete. Data analysis and machine learning models, however, can continuously learn from new data and adapt to emerging fraud patterns. This adaptability is crucial for maintaining the effectiveness of fraud detection systems in the ever-changing landscape of financial crime.
Data Analysis Techniques in Fraud Detection
Data analysis for fraud detection encompasses various techniques, including statistical analysis, machine learning, and deep learning. Each of these methods offers unique advantages in identifying fraudulent activities.
1. Statistical Analysis: Statistical methods involve analyzing transaction data to identify anomalies and patterns indicative of fraud. Techniques such as z-score, moving average, and regression analysis are commonly used. These methods can quickly detect outliers that may represent fraudulent transactions.
2. Machine Learning: Machine learning models are trained on historical transaction data to recognize patterns associated with fraud. Supervised learning techniques, such as logistic regression, decision trees, and support vector machines, use labeled data to predict the likelihood of a transaction being fraudulent. Unsupervised learning methods, like clustering and anomaly detection, identify outliers without requiring labeled data, making them suitable for detecting new fraud types.
3. Deep Learning: Deep learning models, including neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), can learn complex patterns in large datasets. These models are particularly effective in identifying subtle fraud patterns that simpler algorithms might miss. They are also capable of processing vast amounts of data, making them suitable for real-time fraud detection.
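As an illustration of the statistical approach in item 1, the sketch below flags transaction amounts whose z-score exceeds 3 in absolute value. The amounts and the cutoff of 3 are assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts for one cardholder; the last value is unusually large
amounts = [20, 25, 22, 30, 27, 24, 26, 23, 28, 21] * 2 + [950]

mu = mean(amounts)
sigma = pstdev(amounts)

# z-score: how many standard deviations an amount lies from the mean
z_scores = [(a - mu) / sigma for a in amounts]

# Flag amounts lying more than 3 standard deviations from the mean
flagged = [a for a, z in zip(amounts, z_scores) if abs(z) > 3]
print(flagged)
```

Because extreme values inflate both the mean and the standard deviation, a z-score rule can miss outliers in small samples; median-based variants are a common alternative.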
Adaptability Through Machine Learning
Machine learning models offer significant adaptability in fraud detection. They can be continuously retrained with new data to recognize emerging fraud patterns. This ongoing learning process ensures that the detection system remains effective as fraud tactics evolve.
1. Supervised Learning: Supervised learning models are trained on labeled datasets, where transactions are marked as either fraudulent or legitimate. By learning the characteristics of both types of transactions, these models can accurately predict the likelihood of new transactions being fraudulent. Techniques such as logistic regression, decision trees, random forests, and gradient boosting machines are commonly used for supervised learning in fraud detection.
2. Unsupervised Learning: Unsupervised learning techniques do not require labeled data. Instead, they identify patterns and group similar transactions together, flagging those that deviate from the norm. Clustering algorithms, like K-means and DBSCAN, and anomaly detection methods, such as isolation forests and autoencoders, are widely used. These techniques are particularly valuable for identifying new types of fraud that have not been previously encountered.
3. Reinforcement Learning: Reinforcement learning involves training models to make decisions based on feedback from their actions. In fraud detection, reinforcement learning can be used to develop systems that adapt to new fraud patterns by learning from the outcomes of flagged transactions. This approach enables the system to improve its detection capabilities over time.
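The unsupervised approach in item 2 can be sketched with scikit-learn's isolation forest. The two features (amount and hour of day), the synthetic data, and the contamination rate are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical features: amount and hour of day for 200 ordinary purchases
normal = np.column_stack([rng.normal(50, 10, 200), rng.normal(14, 3, 200)])

# One clearly unusual transaction: a very large amount at 3 a.m.
X = np.vstack([normal, [[5000.0, 3.0]]])

# contamination is the assumed share of anomalies in the data
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = model.fit_predict(X)  # -1 marks anomalies, 1 marks inliers

print("flagged rows:", np.where(labels == -1)[0])
```

No fraud labels are used at any point, which is what makes this family of methods applicable to fraud types that have not been seen before.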
Limitations of Data Analysis in Fraud Detection
Despite its advantages, data analysis for fraud detection has several limitations that must be addressed to ensure its effectiveness.
1. Data Quality and Availability: The effectiveness of data analysis methods depends heavily on the quality and availability of data. Incomplete, inaccurate, or biased data can lead to poor model performance and high rates of false positives and false negatives. Ensuring high-quality data is a continuous challenge, requiring robust data collection and preprocessing techniques.
2. Data Imbalance: Fraudulent transactions typically represent a small fraction of the total transaction volume, leading to imbalanced datasets. Machine learning models trained on imbalanced data may become biased towards the majority class, resulting in missed fraud cases. Techniques such as oversampling, undersampling, and synthetic data generation (e.g., SMOTE) are used to address data imbalance, but they also introduce their own challenges.
3. Model Interpretability: Complex machine learning models, particularly deep learning models, often act as "black boxes," making it difficult to interpret their decisions. This lack of transparency can hinder trust in the system and complicate regulatory compliance. Efforts to improve model interpretability, such as using explainable AI techniques, are crucial for addressing this limitation.
4. Evolving Fraud Tactics: Fraudsters continuously adapt their tactics to evade detection. While machine learning models can be retrained with new data, there is always a lag between the emergence of new fraud techniques and the system's ability to detect them. Continuous monitoring and updating of models are essential to minimize this gap.
5. Computational Resources: Advanced data analysis techniques, particularly deep learning models, require significant computational resources for training and deployment. This can be a barrier for organizations with limited resources, necessitating the use of cloud-based solutions or specialized hardware.
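Of the limitations above, data imbalance (item 2) is the simplest to illustrate. The sketch below shows plain random oversampling, a simpler technique than SMOTE: it duplicates minority records rather than synthesizing new ones. The records are hypothetical.

```python
import random

# Hypothetical labelled records: (transaction_id, is_fraud)
records = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]

majority = [r for r in records if r[1] == 0]
minority = [r for r in records if r[1] == 1]

# Random oversampling: draw minority records with replacement
# until both classes have the same number of rows
rng = random.Random(42)
oversampled_minority = rng.choices(minority, k=len(majority))

balanced = majority + oversampled_minority
rng.shuffle(balanced)

print(len(balanced), sum(label for _, label in balanced))
```

Resampling should be applied to the training split only; duplicating fraud records in the test split would distort the evaluation metrics.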
Real-Life Application Challenges
Implementing data analysis for credit card fraud detection in real-life scenarios presents several challenges beyond the technical limitations discussed above.
1. Integration with Existing Systems: Integrating new data analysis models with existing fraud detection systems and workflows can be complex. Compatibility issues, data silos, and legacy systems may hinder the seamless implementation of advanced detection techniques.
2. Scalability: Fraud detection systems must be capable of processing large volumes of transactions in real-time. Ensuring the scalability of data analysis models to handle increasing transaction volumes without compromising performance is a significant challenge.
3. Regulatory Compliance: Financial institutions must comply with stringent regulations regarding data privacy and security. Ensuring that fraud detection systems adhere to these regulations while maintaining their effectiveness requires careful planning and implementation. Techniques such as differential privacy and federated learning can help balance privacy and performance.
4. Human Oversight: Despite the capabilities of data analysis models, human oversight remains essential. Analysts must review flagged transactions to confirm fraud and take appropriate actions. Balancing automation with human intervention is critical to maintaining the accuracy and reliability of fraud detection systems.
5. Cost: Developing, deploying, and maintaining advanced fraud detection systems can be costly. Organizations must weigh the benefits of improved fraud detection against the costs of implementation and operation. Cloud-based solutions and partnerships with specialized vendors can help mitigate these costs.
Case Studies
Examining real-life case studies can provide valuable insights into the practical challenges and successes of implementing data analysis for fraud detection.
Case Study 1: XYZ Bank
XYZ Bank implemented a machine learning-based fraud detection system to enhance its ability to identify fraudulent transactions. They utilized Python libraries such as Pandas and Scikit-learn for data preprocessing and model training. The bank deployed a random forest classifier, which significantly reduced fraudulent activity.
Challenges encountered included data imbalance and model interpretability. XYZ Bank addressed data imbalance by using SMOTE to generate synthetic samples of fraudulent transactions. They also implemented SHAP (SHapley Additive exPlanations) to improve model interpretability, enabling analysts to understand the model's decision-making process.
Case Study 2: ABC Financial Services
ABC Financial Services integrated real-time analytics into their fraud detection workflow using Python's TensorFlow and Keras libraries. They developed deep learning models capable of processing transaction data in real-time, identifying suspicious activities more accurately.
The primary challenge faced was ensuring the system's scalability to handle the high volume of transactions. ABC Financial Services adopted a cloud-based infrastructure, leveraging distributed computing to scale their fraud detection system efficiently. This approach allowed them to process transactions in real-time without compromising performance.
Case Study 3: Global E-commerce Platform
A global e-commerce platform faced significant challenges with credit card fraud, particularly during peak shopping seasons. They implemented a machine learning-based fraud detection system using a combination of supervised and unsupervised learning techniques.
To address data quality issues, the platform invested in robust data collection and preprocessing mechanisms. They used clustering algorithms to identify groups of similar transactions and anomaly detection techniques to flag outliers. This hybrid approach improved their ability to detect both known and emerging fraud patterns.
Future Directions
The future of credit card fraud detection lies in the continued advancement of data analysis techniques and the integration of emerging technologies.
1. Artificial Intelligence and Deep Learning: AI and deep learning models will play an increasingly important role in fraud detection. These models can process vast amounts of data and learn complex patterns, making them well-suited for identifying sophisticated fraud tactics.
2. Blockchain Technology: Blockchain technology offers the potential for enhanced security and transparency in financial transactions. By providing tamper-proof transaction records, blockchain can help prevent fraudulent activities and improve the overall integrity of financial systems.
3. Explainable AI: Improving the interpretability of AI models is crucial for building trust and ensuring regulatory compliance. Techniques such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP can help make machine learning models more transparent and understandable.
4. Federated Learning: Federated learning allows models to be trained on decentralized data sources while preserving data privacy. This approach can enhance fraud detection by enabling collaborative learning across multiple institutions without compromising sensitive data.
5. Real-Time Analytics: The ability to process and analyze transaction data in real-time is essential for effective fraud detection. Advances in real-time analytics and stream processing technologies will enable faster detection and response to fraudulent activities.
Chapter 6: Conclusion
Summary of Key Points
In the face of evolving and increasingly sophisticated credit card fraud, traditional rule-based detection systems have proven insufficient. The need for more dynamic and adaptable solutions has led to the rise of data analysis and machine learning techniques, particularly those utilizing Python's robust ecosystem of libraries. This discussion has explored the adaptability of these techniques, highlighting their benefits, limitations, and real-life application challenges.
The adaptability of credit card fraud detection systems through data analysis lies in their ability to continuously learn from new data and adapt to emerging fraud patterns. This dynamic learning process is crucial in maintaining the effectiveness of fraud detection systems amidst the ever-changing landscape of financial crime. Python, with its extensive range of libraries like Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, and Keras, provides a comprehensive toolkit for developing sophisticated fraud detection models.
Data analysis techniques such as statistical analysis, machine learning, and deep learning offer unique advantages in identifying fraudulent activities. Statistical methods quickly detect anomalies, while machine learning models, both supervised and unsupervised, recognize patterns associated with fraud. Deep learning models further enhance detection capabilities by learning complex patterns in large datasets, making them suitable for real-time fraud detection.
Adaptability Through Machine Learning
Machine learning models demonstrate significant adaptability in fraud detection by being continuously retrained with new data. Supervised learning models use labeled datasets to predict fraudulent transactions, while unsupervised learning techniques identify outliers without requiring labeled data. Reinforcement learning, another machine learning approach, allows systems to improve their detection capabilities over time by learning from the outcomes of flagged transactions.
Limitations of Data Analysis in Fraud Detection
Despite the numerous advantages of data analysis for fraud detection, several limitations persist. Data quality and availability are critical for model performance, but incomplete, inaccurate, or biased data can lead to poor results. Data imbalance, where fraudulent transactions are a small fraction of the total transaction volume, poses another challenge, potentially skewing model training. Complex machine learning models often act as "black boxes," complicating the interpretability of their decisions and hindering trust and regulatory compliance. The continuous evolution of fraud tactics requires ongoing monitoring and updating of models to maintain their effectiveness. Additionally, the significant computational resources needed for training and deploying advanced models can be a barrier for organizations with limited resources.
Real-Life Application Challenges
Implementing data analysis for credit card fraud detection in real-life scenarios presents several challenges beyond technical limitations. Integrating new data analysis models with existing fraud detection systems and workflows can be complex due to compatibility issues, data silos, and legacy systems. Ensuring the scalability of these models to handle increasing transaction volumes without compromising performance is another significant challenge. Compliance with stringent data privacy and security regulations is essential, requiring careful planning and implementation. Human oversight remains crucial for reviewing flagged transactions and taking appropriate actions, necessitating a balance between automation and human intervention. Finally, the costs associated with developing, deploying, and maintaining advanced fraud detection systems must be weighed against the benefits of improved fraud detection.
Case Studies and Future Directions
Real-life case studies, such as those of XYZ Bank, ABC Financial Services, and a global e-commerce platform, illustrate the practical challenges and successes of implementing data analysis for fraud detection. These examples highlight the importance of addressing data quality, model interpretability, scalability, and integration with existing systems.
Looking to the future, continued advancements in artificial intelligence, deep learning, and blockchain technology hold promise for enhancing fraud detection capabilities. AI and deep learning models can process vast amounts of data and learn complex patterns, making them well-suited for identifying sophisticated fraud tactics. Blockchain technology offers the potential for enhanced security and transparency in financial transactions. Improving the interpretability of AI models through explainable AI techniques is crucial for building trust and ensuring regulatory compliance. Federated learning, which allows models to be trained on decentralized data sources while preserving data privacy, can enhance fraud detection by enabling collaborative learning across multiple institutions. Real-time analytics and stream processing technologies will enable faster detection and response to fraudulent activities, further improving the effectiveness of fraud detection systems.
Addressing Limitations and Challenges
Addressing the limitations and challenges of data analysis in fraud detection requires a multifaceted approach. Ensuring data quality and availability involves robust data collection and preprocessing techniques, as well as continuous monitoring and updating of data sources. Tackling data imbalance necessitates the use of techniques like oversampling, undersampling, and synthetic data generation, but also requires ongoing evaluation to avoid introducing new biases.
Improving model interpretability is critical for building trust and ensuring regulatory compliance. Techniques such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) can help make machine learning models more transparent and understandable. This transparency is essential for gaining the trust of stakeholders and regulators, as well as for facilitating human oversight of flagged transactions.
The evolving nature of fraud tactics requires continuous monitoring and updating of models. This involves implementing robust model monitoring systems to detect performance degradation and promptly retraining models with new data. Additionally, adopting reinforcement learning approaches can enable systems to adapt to new fraud patterns more quickly and effectively.
Ensuring the scalability of data analysis models to handle increasing transaction volumes without compromising performance requires careful planning and implementation. This may involve leveraging cloud-based solutions and distributed computing to scale systems efficiently. Additionally, optimizing model architectures and utilizing efficient data processing techniques can help maintain performance at scale.
Compliance with data privacy and security regulations is essential for maintaining the integrity and trustworthiness of fraud detection systems. Techniques such as differential privacy, which adds noise to data to protect individual privacy, and federated learning, which enables collaborative learning across multiple institutions without sharing raw data, can help balance privacy and performance.
Finally, balancing automation with human intervention is critical for maintaining the accuracy and reliability of fraud detection systems. While automated systems can quickly and accurately identify potential fraud cases, human oversight remains essential for reviewing flagged transactions and taking appropriate actions. This balance ensures that the system remains effective and trustworthy, while also leveraging the strengths of both automated and human approaches.
The Role of Collaboration and Innovation
Collaboration and innovation play crucial roles in addressing the challenges of credit card fraud detection. Financial institutions, technology providers, and regulatory bodies must work together to develop and implement effective fraud detection systems. This collaboration can involve sharing best practices, data, and insights, as well as developing standardized protocols and frameworks for fraud detection.
Innovation is also essential for staying ahead of sophisticated fraud schemes. Continued research and development in artificial intelligence, machine learning, and blockchain technology can drive the creation of more advanced and effective fraud detection systems. Additionally, fostering a culture of innovation within organizations can encourage the exploration and adoption of new technologies and approaches.
Conclusion: Embracing a Dynamic Future
In conclusion, data analysis offers a powerful and adaptable approach to credit card fraud detection, addressing many of the limitations of traditional methods. The use of Python and its extensive libraries facilitates the development of sophisticated fraud detection models capable of learning from new data and adapting to emerging fraud patterns. However, several challenges remain, including data quality, model interpretability, integration with existing systems, scalability, regulatory compliance, and balancing automation with human oversight.
By addressing these challenges and leveraging advancements in artificial intelligence, deep learning, blockchain technology, explainable AI, federated learning, and real-time analytics, financial institutions can enhance their fraud detection capabilities, protect consumers, and reduce financial losses. Collaboration and innovation will play critical roles in driving the development and implementation of effective fraud detection systems, ensuring the security and integrity of the digital financial ecosystem.
The continued evolution of data analysis techniques and their application in fraud detection will be essential for safeguarding the financial industry against ever-evolving fraud tactics. Embracing a dynamic and adaptive approach to fraud detection will enable financial institutions to stay ahead of sophisticated fraud schemes, maintain the trust of their customers, and ensure the security and integrity of digital transactions in an increasingly connected world.
Ultimately, the adaptability of credit card fraud detection data analysis represents a significant advancement in the fight against financial crime. By continuously learning from new data and adapting to emerging fraud patterns, these systems can provide a robust defense against the ever-changing landscape of credit card fraud. As the financial industry continues to evolve, the role of data analysis and machine learning in fraud detection will only become more critical, ensuring that financial institutions remain resilient and secure in the face of ongoing threats.
PROJECT WORK : Guidelines4734823-201768
Project Based Learning is the application of the comprehensive methodology to inculcate the spirit of strategizing industry operations in a real-time environment.
The project work aims to foster students with an opportunity to develop conceptual, analytical, communication and interpersonal skills.
Selection of Project work Topic
The choice of topic for the project work and the approach to be adopted needs to be based on the field of specialization.635002364863500400457
It is important to distinguish between project work topic and project work title. The topic is the specific area that you wish to investigate. The title may not be decided until the project work has been written so as to reflect its content properly.
The project topic should conform to the following:
Relevant to business or technology, defined broadly;Related to one or more of the subjects or areas of study within the core program and specialization;
Clearly focused so as to facilitate in-depth study, subject to the availability of adequate sources of information and to your own knowledge;
Of value and interest to you and your personal and professional development.635001424
Planning the Project work
Selecting an original and relevant topic for investigation.Establishing the precise focus of your study by deciding on the aims and objectives of the project work, formulating questions to be investigated, deciding the sampling techniques and statistical techniques to sum up the findings of the study. Consider very carefully what is worth investigating and its feasibility.
Drawing up initial project work outlines considering the aims and objectives of the project work. Workout various stages of project work63500789
Important instructions and information on Project Submission:
The title of the project should not be more than 12 words in length. The complete Project Work should be submitted in 15000-30000 words. You are supposed to submit project work along with an extended abstract and project guide resume simultaneously.6350024919
You must be careful about Originality and Relevance of Project Topic to avoid Project Rejection at a later stage. Therefore, you are required to submit a plagiarism report acknowledging 85% originality635001424
Project Guide must be Post Graduate with a minimum of 10 years of work experience Ensure to include signed & scanned copies of following essential certificatesFrom Project Guide: Certifying bonafides of project work carried out under his/her supervision
From a student: Certifying that submitted project work is an original piece of work and has not been submitted earlier
You will receive an intimation through a registered email address, on successful uploading of project work report635003329
Viva Questions will be accessible after uploading Project Work.Project submission will be accepted only after the Project file is uploaded and Viva questions are answered Generally, it takes four to six weeks to complete the process of evaluation of project work.
Writing the Project Work:
The abstract for 500-1000 words
An abstract is an overview or a brief summary of project work, which helps the reader to ascertain the purpose of carrying the project work. It acts as a stand-alone entity for the complete project work.
The study hypotheses(null or alternative hypotheses, if applicable)
Literature Review
Literature review(secondary sources) is the evaluation of substantive findings and theoretical and methodological contribution to a particular topic. It is a critical analysis of the previous research conducted in a particular area.
Research methodology adopted
Research methodology is the implementation of methods or techniques to efficiently solve a research problem, which helps the reader to assess the validity and reliability of the study.
Research methodology consists of:
Research Design: Descriptive, Conclusive, Causal or Exploratory
Sampling Technique: Probability or Non-Probability
Data Collection: Tools used for data collection (e.g., questionnaire, survey, etc.)
Data Preparation: Classification and Tabulation of data
Data Analysis: Hypotheses Testing
Results (theoretical or empirical)
The findings of the study are to be summarized as:
Data interpretation: Interpret and elaborate findings of the research
Recommendation: Suggestions based on critical analysis of the results
Implications of theory and practice
The total size of the project document should not exceed 2MB. Portable document format (.pdf) only.
Figures, Graphs, Tables, Appendices and References should follow the American Psychological Association (APA) Style guide, 7th edition.
Mention the sources of any images, tables, figures cited or presented
Include a page header known as running head at the top of every page
Use Font: Times New Roman; Font size: 12; Double-spaced; 1-inch (2.5cm) margin all around
Use American spellings (program, not programme; center, not centre)
Use z spellings instead of s spellings (recognize, organize, summarize)
Project Submission:
Complete Project submission includes three stages:
Abstract along with Guide Resume
Project Report Submission along with Plagiarism Report
Answer Viva Questions
Viva Submission:
Viva will be conducted in two parts. It is mandatory for the student to attempt both the parts:
Part 1: To be submitted along with the project report.
Viva Questions will include 5 descriptive questions related to your specific project. Viva questions are mandatory for the final project submission.
Part 2: Acing Your Interview
To be attempted on LMS under your major project/dissertation course page (Please refer to the screenshot attached to identify)
The student needs to follow these steps to complete Viva Part 2:
1) Click on the link under the Acing your interview section (Once you click on the link, you will be redirected to a new page).
2) After Login, the following page will appear:
3) Click on my courses:
4) Click on Acing your Interview:
5) Watch all the videos under contents section:
6) After completing all the videos, go back to the dashboard and click on EXAMS
7) Exam page will open as below:
8) Click on attempt:
9) Attempt your exam and submit:
Evaluation Scheme:
Project Report: 70 marks
Viva: 30 marks (Part 1: 20 marks, Part 2: 10 marks)
Total: 100 marks
IMPORTANT NOTE
Students must submit all Project Components (Abstract, Guide Resume, Project Report, Plagiarism Report, and Viva Answers).
In case of incomplete details, students will be asked to resubmit all project documents, which would lead to a delay in Academic Completion and an Extension Fee.
A plagiarism check will be conducted on every Project Report submission before evaluation. Any report that exceeds 15% plagiarism will be rejected, and the student will have to go through the resubmission process as per the rules.