Abdul Manan Abbasi

View the Project on GitHub mananabbasi/Portfolio

🎯 Data Scientist Portfolio

🌟 Welcome to my Data Scientist Portfolio! 🚀
A showcase of my work in data analysis, machine learning, and visualization, highlighting my ability to extract insights and build data-driven solutions.


📌 Table of Contents


🌟 About Me

I am a data scientist dedicated to solving complex problems using data-driven methods. Specializing in statistics, machine learning, data visualization, and big data processing, I transform raw data into valuable insights. Always eager to learn, I continuously expand my knowledge and adapt to new technologies in the field.


🛠 Skills

Technical Skills:

Soft Skills:


⚙ Tools & Technologies

I am proficient in a range of tools and technologies that help me effectively analyze data and develop insights.


🚀 Projects

Dissertation

This dissertation explores the growth and volatility of the cryptocurrency market, focusing on Bitcoin, and examines how social media sentiment and app adoption influence market behavior. The study analyzes global trends in cryptocurrency exchange app adoption from 2015 to 2022, considering regional and demographic differences. It also investigates the impact of over 500,000 Bitcoin-related tweets (2021–2023) using NLP, sentiment analysis, and LDA techniques, with a focus on how influential figures shape public sentiment and Bitcoin price movements.

The research aims to better understand the drivers of cryptocurrency market volatility and price fluctuations. By combining historical Bitcoin price data with social media sentiment, the study will develop predictive models to forecast future trends and volatility, offering valuable insights for traders, investors, and policymakers.

1. Introduction

This dissertation focuses on analyzing cryptocurrency adoption, social media sentiment, and market behavior, specifically for Bitcoin. It involves the collection and preprocessing of various datasets, including global cryptocurrency adoption trends, Bitcoin-related tweets from 2021 to 2023, and minute-by-minute Bitcoin price data. The data is cleaned, standardized, and aligned to ensure consistency, and then explored through visualizations such as global adoption maps, time series plots, and sentiment distributions.

The research employs Natural Language Processing (NLP) to analyze tweet sentiment and extract key topics, and it applies feature engineering techniques to Bitcoin data, creating indicators such as moving averages and volatility. To identify patterns in the data, clustering techniques are also used.
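As a minimal sketch of the feature engineering mentioned above (not the dissertation's actual code), the snippet below derives a simple moving average and a rolling volatility from a list of closing prices. The function names, the window sizes, and the choice of log returns as the basis for volatility are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def rolling_mean(values, window):
    """Simple moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_volatility(prices, window):
    """Rolling sample standard deviation of log returns (window >= 2)."""
    rets = log_returns(prices)
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        # rets[i - window : i] are the `window` returns ending at price i
        out[i] = stdev(rets[i - window : i])
    return out


# Hypothetical closing prices, for illustration only
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
ma_3 = rolling_mean(closes, 3)
vol_2 = rolling_volatility(closes, 2)
```

On minute-level data the same idea would normally be applied after resampling to a coarser interval, so that each window covers a meaningful period.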

For modeling, two regression models—XGBoost and Random Forest—are selected due to their ability to handle complex, non-linear relationships. These models will be trained using data from 2021, 2022, and 2023, and evaluated based on metrics such as RMSE, R-squared, volatility, and MAPE. The goal is to develop predictive models for cryptocurrency trends and volatility, offering insights for traders, investors, and policymakers.

2. Dataset

I have gathered reliable data from multiple platforms, including CoinMarketCap, Mendeley Data, Twitter, and Kaggle, to create four distinct datasets, each serving a specific purpose.

Datasets Overview

1. Cryptocurrency Adoption and Exchange Activity (2015–2022): This dataset tracks the growth of cryptocurrency users, app downloads, and activity across various cryptocurrency exchanges from 2015 to 2022. Data is sourced from the Bank for International Settlements.

2. Bitcoin Tweets (2021–2023): This collection includes Bitcoin-related tweets from 2021, 2022, and 2023, which will be analyzed using Natural Language Processing (NLP) and sentiment analysis to assess public sentiment and its impact on Bitcoin price movements.

3. Influencer Tweets: This dataset contains tweets from 52 influential figures in the cryptocurrency space. The data is sourced from Mendeley Data - Influencers Tweets.

4. Bitcoin Historical Price Data (2012–Present): This dataset includes minute-by-minute Bitcoin price data from 2012 to the present, sourced from Kaggle Bitcoin Historical Data.

These datasets will provide a comprehensive basis for analyzing global cryptocurrency adoption trends, social media sentiment, and Bitcoin price behavior. By integrating these sources, the analysis aims to uncover insights into market trends, investor psychology, and the impact of social media on Bitcoin price fluctuations.

Trends from 2015–2022, global app usage, download spikes, and key event impacts.

3.1 Introduction

This analysis explores cryptocurrency exchange app adoption across the G20, the G7, and 33 countries, using a comprehensive dataset spanning from August 2015 to June 2022. The dataset is organized across ten sheets and captures various dimensions, including user activity, demographics, and the relationship between Bitcoin price trends and app usage. It also offers insights into how external events have influenced cryptocurrency adoption and overall market behavior.

3.2 Dataset Source

The dataset can be accessed via the following link: Bank for International Settlements Data

3.3 Dataset Overview

The dataset provides monthly indicators such as downloads, daily active users (DAUs), user demographics, and correlations with Bitcoin price movements. It offers a global perspective on crypto adoption trends and behavioral patterns. The data is distributed across ten distinct sheets:

3.4 Key Visualizations

The analysis includes several key visualizations that help in understanding global cryptocurrency adoption trends, sentiment analysis, and the relationship between Bitcoin price movements and social media activity. Below are the primary types of visualizations included in this project:

  1. Global Cryptocurrency Adoption Trends
    • Choropleth Maps: Visualize the global distribution of cryptocurrency adoption by country, highlighting regions with the highest growth in user activity, downloads, and exchange activity from 2015 to 2022. (Figure: Global Adoption)
  2. Bitcoin Price and Trading Volume
    • Line Plots and Candlestick Charts: Track Bitcoin price fluctuations and trading volume over time, highlighting major events and market movements. (Figure: Bitcoin Price vs Downloads)
  3. Active Users Per 100k Sentiment Trends

4. Correlation Between Active Users and Downloads across G7 Countries

6. Top 10 Countries for Crypto Adoption

7. Other Visualizations

These visualizations provide a comprehensive view of the dynamics influencing cryptocurrency markets and adoption, allowing for a deeper understanding of market behavior and sentiment shifts.

Conducting Statistical Testing to Check Impact of Events on Bitcoin Price

The Wilcoxon signed-rank test is applied over event windows to evaluate how key announcements moved prices. It is a non-parametric test, suitable when the data is not normally distributed.
• Null Hypothesis (H₀): There is no significant difference in Bitcoin prices before and after the event.
• Alternative Hypothesis (H₁): There is a significant difference in Bitcoin prices before and after the event.

Since the p-value is less than 0.05, we reject the null hypothesis. This indicates that there is a statistically significant difference between Bitcoin prices before and after the event.

Result from Wilcoxon signed-rank test

• Bitcoin prices before and after the event are significantly different, based on the Wilcoxon signed-rank test.
• Bitcoin price is the biggest driver of engagement, influencing app usage by around 50%, with external events like China’s crypto crackdown and Kazakhstan’s unrest also playing significant roles.
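The test logic can be illustrated with a small, self-contained sketch. It computes the signed-rank statistic by hand and uses a normal approximation for the two-sided p-value, without tie or continuity corrections; this is an assumption for illustration and will differ slightly from a statistics package, especially on small samples. The price values are hypothetical, not taken from the dissertation data.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """Return (W, approximate two-sided p-value) for paired samples."""
    # Paired differences; zero differences are discarded
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks across ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation to the null distribution of W
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = 2 * NormalDist().cdf((w - mu) / sigma)
    return w, min(p, 1.0)


# Hypothetical prices in the windows before and after an event
before = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
after = [101.0, 103.0, 105.0, 107.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0]
w_stat, p_value = wilcoxon_signed_rank(before, after)
reject_h0 = p_value < 0.05
```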

Event Impact on Bitcoin: A chart showing the effect of major events on Bitcoin’s price and volatility.

4. Social Media Sentiment and Behavioral Insights: Analyzing Bitcoin Tweets (2021–2023)

4.1 Introduction

In this section, I shift focus from app usage trends to examining the influence of social media sentiment on cryptocurrency adoption and behavior. Specifically, I analyze Bitcoin-related tweets collected monthly from 2021 to 2023. To uncover patterns in public opinion and community dynamics, I applied Natural Language Processing (NLP) techniques, including sentiment analysis and unsupervised clustering. These methods help categorize sentiment trends over time, identify key discussion topics, and explore how these patterns align with Bitcoin price movements and major news events.

4.2 Dataset Source

All datasets were collected from Kaggle:

4.3 Dataset Overview

The datasets consist of three years’ worth of Bitcoin-related tweets, covering 2021, 2022, and 2023. Each dataset contains metadata such as:

4.4 Key Visualizations: Social Media Sentiment and Tweet Volumes

The following visualizations help analyze Bitcoin-related tweet volume and sentiment over time, revealing patterns in public opinion and engagement with the cryptocurrency:

  1. Combined Daily Tweet Volume
    This graph displays the daily tweet volume for Bitcoin across 2021, 2022, and 2023, helping to track spikes in tweet activity.

  2. Combined Hourly Tweet Volume
    This chart shows the hourly distribution of Bitcoin-related tweets, providing insights into peak times for discussions.

  3. Daily Tweet Count (2021–2023)
    A daily count of Bitcoin-related tweets for the years 2021, 2022, and 2023, helping to identify key days of high activity.

  4. Monthly Tweet Volume (2021–2023)
    This plot illustrates the monthly volume of Bitcoin-related tweets, showing seasonal trends and long-term shifts in interest.

  5. Top 10 Days for Tweet Volume (2021–2023)
    A bar chart highlighting the top 10 days with the highest tweet volumes for Bitcoin, reflecting significant events.

  6. Tweet Count per Hour (2021–2023)
    This visualization displays the tweet volume per hour for Bitcoin-related tweets across the three years.

These visualizations provide a comprehensive overview of the tweet volume dynamics surrounding Bitcoin, allowing for a better understanding of how public engagement evolves over time.

MiniBatch K-Means Clustering Algorithm

Detecting tweet clusters (e.g., memes, analysis, FUD, bullish signals).

These visualizations help to understand the underlying clusters of Bitcoin-related tweets and the most prominent topics within each cluster:

  1. Number of Clusters
    This visualization shows the number of clusters identified in the Bitcoin tweet dataset, which helps categorize tweets into distinct topics based on sentiment and content.

  2. PCA Visualization of Clusters
    A Principal Component Analysis (PCA) plot visualizing the tweet clusters in a 2D space, helping to identify relationships and groupings between different tweet topics.

  3. Top Words in Clusters
    This word cloud highlights the most frequent terms within each cluster, giving insight into the major themes and topics discussed in Bitcoin-related tweets.

These visualizations provide valuable insights into how Bitcoin-related discussions are organized and what topics dominate conversations on social media.
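To show the idea behind mini-batch k-means, here is a small pure-Python sketch that updates cluster centers from random batches using a per-center learning rate of 1/count. It runs on toy 2D points rather than vectorized tweets, and it uses a deterministic farthest-first initialization; both choices are assumptions for illustration and are not the setup used in the project, where a library implementation on text features would be the usual route.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def minibatch_kmeans(points, k, batch_size=4, n_iter=100, seed=0):
    """Return k cluster centers learned from random mini-batches."""
    rng = random.Random(seed)
    centers = farthest_first_init(points, k)
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centers
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in batch]
        for p, c in zip(batch, labels):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center learning rate
            centers[c] = [(1 - eta) * q + eta * x for q, x in zip(centers[c], p)]
    return centers


# Toy data: two well-separated groups (hypothetical, not tweet features)
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
centers = minibatch_kmeans(points, k=2)
```

Because each update touches only a small batch, the cost per iteration stays constant as the dataset grows, which is the main reason to prefer this variant on hundreds of thousands of tweets.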

Sentiment Analysis

Using VADER and TextBlob to derive user sentiment patterns.
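As a rough sketch of how a VADER-style compound score can be turned into the positive, neutral, and negative labels used below, the snippet applies the ±0.05 cut-offs suggested in the VADER documentation. Whether the project used exactly these thresholds is an assumption, and the scores shown are made up for illustration.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_distribution(scores):
    """Share of each sentiment label across a non-empty list of scores."""
    counts = Counter(label_from_compound(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("scores must not be empty")
    return {label: counts[label] / total
            for label in ("positive", "neutral", "negative")}


# Hypothetical compound scores for four tweets
example_scores = [0.6, -0.3, 0.0, 0.2]
distribution = sentiment_distribution(example_scores)
```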

1 Overall Sentiment Distribution

Overall Sentiment

This chart shows the general distribution of tweet sentiments—positive, negative, and neutral—across the entire dataset. It helps establish a high-level understanding of public opinion.

2 Sentiment Across Tweets (2021–2023)

Sentiment Across Tweets

This visualization highlights how sentiment fluctuated across individual tweets throughout the years 2021 to 2023. It’s useful for identifying peaks or drops in sentiment linked to real-world events.

3 Sentiment Trend Across Years

Sentiment Trend

This line graph displays the average sentiment per year, showing how public sentiment evolved over time. Trends can provide insights into broader societal or platform-specific changes.

4 Word Cloud of Frequent Terms

Word Cloud

The word cloud highlights the most frequently used words in the tweet dataset. Larger words represent higher frequency, offering a glimpse into common topics and keywords.

5. Influencer Users’ Tweets on Cryptocurrency (Feb 2021 – Jun 2023) Tweet Dataset

Tracking popular crypto influencers, tweet timelines, and user engagement.

5.1 Introduction

After analyzing sentiment trends from the 500,000 public tweets collected between 2021 and 2023, the next step involves exploring insights from 52 selected influencer accounts.
Since this dataset already includes a cleaned text column, additional NLP preprocessing will not be performed.


5.2 Dataset Source

🔗 Access the dataset on Mendeley Data


5.3 Dataset Overview

This dataset includes tweets from 52 individuals discussing cryptocurrency over a span of more than two years.
It features:

This curated dataset provides a rich foundation for examining influencer behavior and public influence trends in the crypto space.

5.4 Key Visualizations

This section presents visual insights based on tweets from 52 key influencers in the cryptocurrency domain, covering sentiment impact, engagement behavior, tweet characteristics, and overall trends.

1 Influencer Engagement Metrics

Influencer Engagement Metrics

This chart displays engagement levels (likes, retweets, replies) from influential accounts. It helps identify which influencers generate the most interaction and community response.

2 Post Volume vs. Sentiment

Post Volume and Sentiment

This visualization compares the number of tweets posted by influencers with their associated sentiment. It offers insight into whether higher tweet activity correlates with more positive or negative sentiment.


3 Sentiment Impact on Tweet Engagement

Sentiment Impact

This chart explores how sentiment (positive, neutral, negative) affects the engagement rate of tweets. Do positive tweets get more likes? Are negative tweets more viral?


4 Tweet Length Distribution

Tweet Length

This figure shows the distribution of tweet lengths among influencers. It helps in understanding whether shorter or longer tweets are more common and how length might relate to engagement or sentiment.

5 Tweet Posting Trends Over Time

Tweet Trend

This time series graph reveals tweet activity patterns from influencers over the analyzed period. It highlights key periods of increased activity or dormancy, potentially linked to major events.

Topic Modeling: Analyzing Tweet Themes and Engagement

LDA modeling to explore key topics and trends within influencer tweets.

Results from Topic Modelling

Topics Engagement and Trends

This section analyzed tweets from 52 influential crypto figures between February 2021 and June 2023, revealing spikes in engagement during key events, particularly in late 2021, late 2022, and early 2023. While positive sentiment prevailed overall, the most-liked tweet was negative, emphasizing the viral impact of emotionally charged content. Topic modeling showed market analysis, Ethereum/DeFi, and Bitcoin updates were most engaging, with BTC discussions having more negative sentiment and a growing interest in whale tracking and transaction-level insights.

Decomposing Bitcoin volatility and trend cycles using clustering.

6.1 Introduction

In this analysis, I employ advanced Exploratory Data Analysis (EDA) and clustering techniques to uncover the key trends in Bitcoin’s market data. By analyzing minute-by-minute price data from various Bitcoin exchanges, I aim to identify patterns in Bitcoin’s price movements and explore how market sentiment influences these trends.
Using clustering, I seek to gain a deeper understanding of the factors driving Bitcoin’s market behavior, providing valuable insights into its volatility and trading dynamics.


6.2 Dataset Source

🔗 Access the dataset on Kaggle


6.3 Dataset Overview

This dataset contains historical minute-by-minute data for Bitcoin’s price across various exchanges. It includes:

The dataset spans a significant period and offers a detailed view of Bitcoin’s price volatility and trading trends, making it ideal for EDA and clustering techniques.

6.4 Key Visualizations

1 Bitcoin Average Price Daily

This chart shows the average Bitcoin price on a daily basis. It provides an overview of the general trend of Bitcoin’s price over time.


2 Bitcoin Average Price Over the Year

This graph shows how Bitcoin’s average price changed throughout the year. It helps to identify seasonal trends and year-on-year comparisons.


3 Bitcoin Close Price Over Time

This visualization tracks Bitcoin’s closing price over a defined period, showcasing key fluctuations and trends in its daily close prices.


4 Bitcoin Price and Rolling Statistics

This plot visualizes Bitcoin’s price along with its rolling statistics (like moving averages), providing insights into short-term price trends and volatility.


5 Bitcoin Price for Three Different Years (2021, 2022, 2023)

This graph compares Bitcoin’s price trends across three years: 2021, 2022, and 2023. It highlights key differences in price behavior between these years.


6 Bitcoin Trading Volume Over Time

This chart shows the trading volume of Bitcoin over time, reflecting the level of market activity and investor interest during various periods.


7 Correlation Heatmap

This heatmap illustrates the correlation between different Bitcoin market variables (such as price, volume, market cap). It helps in understanding how these factors interact.


8 Monthly Average Bitcoin Price

This plot shows the monthly average price of Bitcoin, helping to identify long-term trends and how Bitcoin’s price behaves across different months of the year.


9 Volume and Close Price

This chart shows the relationship between Bitcoin’s trading volume and its closing price. It highlights how trading volume can affect price movements and indicate market sentiment.

K-Means Clustering

Labeling volatility phases, identifying price regimes and anomaly detection.

PCA Components for Bitcoin

This plot visualizes the Principal Component Analysis (PCA) components for Bitcoin. It helps in understanding how different features contribute to the overall variance in Bitcoin’s price data.

Cluster Summary

This summary plot provides insights into the clustering results. It shows how the different clusters of Bitcoin market data are distributed, giving an overview of the key groupings and their characteristics.

Number of Clusters for Bitcoin

Seasonal and weekly patterns reveal price peaks in mid-February, March, and November, with notable dips in summer and on Sundays. Clustering analysis identifies two market behaviors: Cluster 0, with high, stable prices and low volume, and Cluster 1, with lower prices, higher volumes, and more volatility.

7. Combined Analysis of Twitter Sentiment and Bitcoin Market Data

Aligning tweets and price by timestamp to extract correlation patterns.

7.1 Introduction

In this section, I integrate the cleaned Twitter dataset from 2021 to 2023, complete with sentiment labels and intensity scores, with Bitcoin market data to explore potential relationships between public sentiment and price dynamics. By aligning both datasets on the date column, I aim to uncover how fluctuations in social media sentiment correspond with Bitcoin’s price movements, trading volume, and volatility.
This combined analysis forms a crucial bridge between behavioral signals and financial trends, setting the stage for predictive modeling in the following section.
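The alignment step can be sketched as follows: tweet-level compound scores are averaged per day, joined to daily closing prices on the shared date key, and summarized with a Pearson correlation. The record layout, function names, and values are assumptions for illustration; the project itself may have performed this join with a dataframe library.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def daily_mean_sentiment(tweets):
    """tweets: iterable of (date_string, compound_score) pairs."""
    by_day = defaultdict(list)
    for day, score in tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def align_on_date(sentiment_by_day, close_by_day):
    """Inner join on date: keep only days present in both sources."""
    days = sorted(set(sentiment_by_day) & set(close_by_day))
    return [(d, sentiment_by_day[d], close_by_day[d]) for d in days]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical records, for illustration only
tweets = [
    ("2021-01-01", 0.2), ("2021-01-01", 0.4),
    ("2021-01-02", -0.1),
    ("2021-01-03", 0.5),
    ("2021-01-04", 0.9),  # no matching price, dropped by the join
]
closes = {"2021-01-01": 29000.0, "2021-01-02": 28000.0, "2021-01-03": 31000.0}

rows = align_on_date(daily_mean_sentiment(tweets), closes)
r = pearson([s for _, s, _ in rows], [c for _, _, c in rows])
```

An inner join silently drops days that appear in only one source, so it is worth checking how many rows survive before interpreting the correlation.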

7.2 Dataset Source

I will use the dataset described in Chapter 3.2, which was created by merging tweets from various months in 2021, 2022, and 2023 into a single collection. Before combining, the tweets were carefully filtered to remove spam and irrelevant content.

7.3 Dataset Overview

The integrated dataset contains:

By combining these two datasets, this analysis provides an opportunity to explore how social media sentiment may correlate with Bitcoin’s market movements.

7.4 Key Visualizations

1 2021 Sentiment and Price Distribution

This chart visualizes the sentiment distribution and corresponding Bitcoin price data for 2021. It helps to see how public sentiment affected Bitcoin’s price during this period.


2 2022 Sentiment and Price Distribution

This visualization tracks sentiment and Bitcoin price distribution in 2022, providing insights into the relationship between market sentiment and price fluctuations over this year.


3 2023 Sentiment and Price Distribution

Here, sentiment distribution for 2023 is compared with Bitcoin’s price trends. The analysis highlights how sentiment may have impacted Bitcoin’s price in this year.


4 Average Compound Score and Sentiment

This graph displays the average sentiment compound scores and their corresponding intensity over time, providing insights into overall sentiment shifts across the three years.


5 Correlation Across All Three Time Periods

This heatmap visualizes the correlation between sentiment and Bitcoin’s market data (price, volume, volatility) for the years 2021, 2022, and 2023. It helps to identify any changes or consistent relationships over time.


6 Sentiment vs. Average Closing Price

This chart explores the relationship between sentiment and Bitcoin’s average closing price. It aims to uncover any correlation between positive/negative sentiment and Bitcoin’s daily closing price.


7 Sentiment vs. Trading Volume

This visualization compares sentiment scores with Bitcoin’s trading volume, helping to assess whether increased sentiment intensity leads to more trading activity.


8 Sentiment vs. Trading Volume (Alternate View)

This alternate view of sentiment vs. trading volume provides additional insights into how public sentiment correlates with Bitcoin’s trading volume over time.

Statistical Testing

T-test

• Null Hypothesis (H₀): There is no significant difference in the average Bitcoin closing price between positive sentiment and negative sentiment.
• Alternative Hypothesis (H₁): There is a significant difference in the average Bitcoin closing price between positive sentiment and negative sentiment.

📉 T-Test Results

T-Test Results

Decision rule: reject the null hypothesis if the p-value is below 0.05, which suggests that the average Bitcoin closing price differs significantly between positive and negative sentiment.
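A two-sample comparison of this kind can be sketched as below. The text does not say which t-test variant was used, so this example assumes Welch's unequal-variance form, and it approximates the p-value with a standard normal distribution, which is only reasonable for large samples. The prices are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def approx_two_sided_p(t):
    """Large-sample approximation: treats t as standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical closing prices grouped by the day's dominant sentiment
positive_days = [30100.0, 30500.0, 29900.0, 30800.0, 30300.0]
negative_days = [29000.0, 29400.0, 28800.0, 29100.0, 29300.0]

t_stat, dof = welch_t(positive_days, negative_days)
p_value = approx_two_sided_p(t_stat)
reject_h0 = p_value < 0.05
```

With samples as small as the toy lists here, the exact p-value should come from the t distribution with `dof` degrees of freedom rather than from the normal approximation.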

📈 Plotting the Results

Plotting the Results

The boxplot shows that Bitcoin closing prices are slightly higher on positive sentiment days compared to negative ones. While both sentiment types show similar price ranges and outliers, the median price is higher with positive sentiment, suggesting a mild link between optimistic tweets and stronger Bitcoin performance.

8. Predictive Modeling Using Combined Sentiment and Bitcoin Price Data

8.1 Introduction

After combining the Bitcoin Twitter sentiment dataset with the Bitcoin price dataset, I obtained a unified dataset that captures both market behavior and public sentiment over time. This dataset will serve as the foundation for building predictive models aimed at forecasting Bitcoin price movements.

To determine the most suitable modeling approach, I will begin by conducting normality tests on the features and target variable. This step will help assess whether data distribution supports the use of linear models or calls for more flexible, non-linear alternatives. If the data appears stable and well-behaved, I will consider using XGBoost due to its accuracy and ability to handle structured data efficiently. On the other hand, if the data shows signs of volatility or noise, Random Forest will be the preferred choice, as it tends to perform well with complex, fluctuating patterns. This model selection process will ensure that the final approach aligns with the nature of the data and yields the most reliable predictive results.


8.2 Dataset Source

Following the integration of the Bitcoin Twitter sentiment dataset with the Bitcoin price dataset in Chapter 3.5, I obtained a unified dataset that reflects both market dynamics and public sentiment over time. This combined dataset will form the basis for developing predictive models to forecast Bitcoin price movements.


8.3 Dataset Overview for Modelling

The unified dataset includes the following features:

This combined dataset will be used to develop models aimed at predicting Bitcoin’s price movements, considering the impact of social media sentiment on the market.

Model Selection and Training

Modeling pipeline using cross-validation and grid search.

QQ Plot for Normality Test

This QQ plot helps assess whether the data follows a normal distribution, which is important for determining the suitability of linear models for predictive modeling.
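For readers unfamiliar with how a QQ plot is built, the sketch below computes its points: sorted sample values paired with the quantiles of a normal distribution fitted to the sample. The plotting positions (i + 0.5) / n are one common convention, chosen here as an assumption; plotting libraries may use a slightly different formula.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical_quantile, observed_value) pairs for a QQ plot."""
    n = len(sample)
    ordered = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions strictly between 0 and 1
    probs = [(i + 0.5) / n for i in range(n)]
    theoretical = [fitted.inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))


# Hypothetical sample; points near the diagonal suggest approximate normality
points = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
```

If the points bend away from the line y = x in the tails, the sample has heavier or lighter tails than a normal distribution, which is typical of financial returns.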

Shapiro-Wilk Test for Normality

The Shapiro-Wilk test results provide another measure of normality. If the p-value is above the chosen significance level (commonly 0.05), the null hypothesis of normality cannot be rejected, meaning the data is consistent with a normal distribution.

Non-Linear Models

Result: No Linearity

This visualization indicates that there is no clear linear relationship between the features and the target variable. This suggests that linear models may not be the most suitable approach for this dataset. Non-linear models such as XGBoost or Random Forest may be more appropriate for capturing the complex patterns in the data.

Since the data does not meet the assumption of normality, I will use two non-linear models and compare their results.

Model 1: XGBoost Regressor Model

Powerful gradient boosting approach for price prediction.

Feature Importance in XGBoost

This plot illustrates the feature importance in the XGBoost model. It highlights the most significant features contributing to the prediction of Bitcoin price movements, allowing for insights into which factors, such as sentiment or trading volume, have the largest influence on price changes.


Time Series Forecasting for 2021 using XGBoost

This visualization shows the time series forecasting results for 2021 using the XGBoost model. It compares the predicted Bitcoin prices with the actual prices, providing insights into the model’s forecasting accuracy for the year.


Time Series Forecasting for 2022 using XGBoost

This plot presents the time series forecasting results for 2022 using the XGBoost model. It shows the predicted Bitcoin prices for the year and compares them with the actual market prices to evaluate the model’s performance.


Time Series Forecasting for 2023 using XGBoost

This visualization displays the time series forecasting results for 2023. It highlights the predicted Bitcoin prices and compares them with the actual prices to assess the model’s forecasting ability for this year.

Model 2: Random Forest Regressor Model

Bagging ensemble for capturing sentiment-driven volatility.

Feature Importance in Random Forest

This plot illustrates the feature importance in the Random Forest model. It highlights the most significant features contributing to the prediction of Bitcoin price movements, showing which factors have the largest impact on price changes.

Time Series Forecasting for 2021 using Random Forest Regression

This visualization shows the time series forecasting results for 2021 using the Random Forest model. It compares the predicted Bitcoin prices with the actual prices, providing insights into the model’s forecasting accuracy for the year.

Time Series Forecasting for 2022 using Random Forest Regression

This plot presents the time series forecasting results for 2022 using the Random Forest model. It shows the predicted Bitcoin prices for the year and compares them with the actual market prices to evaluate the model’s performance.

Time Series Forecasting for 2023 using Random Forest Regression

This visualization displays the time series forecasting results for 2023. It highlights the predicted Bitcoin prices and compares them with the actual prices to assess the model’s forecasting ability for this year.

Comparing Both Models

Metric evaluation: RMSE, MAE, R², and related comparisons.

Random Forest Regression Model Performance

This table shows the performance of the Random Forest Regression model. It highlights key regression metrics such as RMSE, MAE, and R², which are used to assess the model’s effectiveness in predicting Bitcoin prices.


XGBoost Model Performance

This table displays the performance of the XGBoost model. It includes key metrics to evaluate how well the model forecasts Bitcoin prices, comparing its results against actual price data for accuracy and reliability.
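For reference, the regression metrics named in this comparison can be computed as in the sketch below. The function names and the sample values are hypothetical; the numbers are not results from either model.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted prices
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

RMSE and MAE are in price units, so they are only comparable across periods with similar price levels; MAPE and R² are scale-free and easier to compare between the 2021, 2022, and 2023 runs.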

Model Recommendations

XGBoost performed best in stable periods (Periods 1 & 2), delivering higher accuracy and better fit. Random Forest excelled in Period 3, where volatility was highest, showing greater resilience to market fluctuations. Overall, XGBoost is ideal for stable market conditions, while Random Forest is better for volatile environments.

9. Conclusion

9.1 Insights & Findings

9.1.1 Cryptocurrency Adoption & Market Behavior

9.1.2 Social Media Sentiment & Public Perception

9.1.3 Bitcoin Price Trends & Volatility

9.1.4 Predictive Modeling Insights


9.2 Future Recommendations

9.2.1 For Investors & Traders


9.3 Key Recommendations

Based on the findings, the following strategic recommendations are made for cryptocurrency platforms, investors, and stakeholders:


Statistical Analysis Projects with R

🔹 A. Statistical Analysis & Advanced Statistics using R

Duration: Sep 2024 - Dec 2025

🔗 Dataset: Concrete Compressive Strength. 📂 Github Repository: GitHub Repository 📂 Code File: Code

Objective:
To perform statistical analysis on the dataset, uncover insights, and support data-driven decision-making by understanding relationships between variables.

Process:

  1. Data Cleaning: Handled missing values and outliers, ensuring data consistency.
  2. Exploratory Data Analysis (EDA): Analyzed distributions and relationships using visualizations.
  3. Hypothesis Testing: Conducted t-tests and chi-square tests to validate assumptions.
  4. Regression Analysis: Built linear regression models and evaluated performance.
  5. ANOVA: Compared means across groups to identify significant differences.
  6. Visualization: Created visualizations using ggplot2 for clear insights.

Tools Used:

📊 Key Visualizations

Numerical Variable Distribution
Categorical Variable Distribution
Normality Check
Correlation Analysis
Simple Linear Regression (SLR) Assumptions
Regression Model Results
Generalized Additive Model (GAM)

Outcome:


🔹 B. Time Series Forecasting Models in R

Duration: Nov 2024 - Dec 2024
🔗 Dataset: Vital Statistics in the UK.

📂 Code File: Vital Statistics in the UK - Time Series Modelling

Objective:
To develop accurate time series forecasting models for predicting future trends, enabling stakeholders to optimize business strategies.

Process:

  1. Data Preparation: Collected and cleaned historical time series data.
  2. Model Selection: Evaluated ARIMA, SARIMA, and ETS models for accuracy.
  3. Model Evaluation: Used RMSE and MAE to measure model performance.
  4. Visualization: Created visualizations to compare predicted vs actual values.
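The two evaluation metrics named above behave differently, which is worth seeing on a toy case. This is a minimal sketch with made-up numbers (not results from the project): both forecasts have the same MAE, but RMSE penalizes the one with a single large miss.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; squares errors before averaging."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecast_a = [11.0, 11.0, 15.0, 15.0]  # four errors of size 1
forecast_b = [10.0, 12.0, 14.0, 20.0]  # one error of size 4

for name, forecast in [("a", forecast_a), ("b", forecast_b)]:
    print(name, mae(actual, forecast), rmse(actual, forecast))
# a 1.0 1.0
# b 1.0 2.0
```

Reporting both metrics therefore shows whether a model's error is spread evenly or concentrated in a few periods.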

Tools Used:

📊 Key Visualizations

Additive Model with Increasing or Decreasing Trends
Forecasting Results
Forecast Errors

Outcome:


Advanced SQL Database Development Projects

🔹 A. Hospital Database Development (Microsoft SQL Server)

Duration: Jan 2024 - Apr 2024
📂 Code Files:

Objective:
Designed and implemented a scalable, high-performance hospital database to manage large volumes of data.

Process:

Tools Used:

Database Diagram for Restaurant

Database Diagram for Hospital

Total Appointments
The total number of appointments is shown in the following visualization.

Outcome:


Data Mining & Machine Learning Projects in Python



🔹 A. Real-Time Data Classification Models in Python

Duration: Sep 2024 - Present

Objective:
Developed ML models to predict client subscription to term deposits using a real-world dataset.

Process:

Outcome:

Tools Used: Python, Scikit-learn, Pandas, Matplotlib

📌 1- Classification Models On Banking Datasets:

📂 Code Files:

📊 Key Visualizations from Banking Dataset:

Data Exploration
MLDM Calls

Model Performance
KNN Classification
Decision Tree Classification

Actionable Recommendations for Banking Dataset:

📌 2- Classification Models On Agriculture Datasets:

📊 Key Visualizations from Agriculture Dataset:
Data Exploration for Agriculture Dataset
Rice Data Exploration

Model Performance on Agriculture Dataset
Model Comparison

Actionable Recommendations for Agriculture Dataset:

📌 3- Classification Models on Obesity Dataset

🔗 Dataset: [Estimation of Obesity Levels Based on Eating Habits and Physical Condition](https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition)

📂 Code Files:

📊 Key Visualizations from Obesity Dataset:
Gender Distribution

Age Distribution

Confusion Matrix

Actionable Recommendations for Obesity Dataset:

🔹 B. Customer Segmentation Using K-Means and Hierarchical Clustering

Duration: Sep 2024 - Present

Objective:
Developed ML models for customer segmentation using clustering techniques to identify distinct groups and optimize marketing strategies.
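To make the K-Means part concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. The customer features, values, and starting centroids are hypothetical; the projects themselves may well rely on a library implementation instead.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_index(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
             (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]

centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Real segmentation work would also scale the features and choose the number of clusters, for example with the elbow method.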

Process:

Outcome:

Tools Used:

📌 1- Clustering Models On Credit Card Marketing Dataset:

🔗 Dataset: Credit Card Marketing Dataset

📂 Code Files:

📊Key Visualizations from Credit Card Marketing Dataset:

Number of Clusters
Correlation Analysis
Dendrogram

Recommendations for Credit Card DataSet:

📊Key Visualizations from Obesity Dataset:
Skewness
Heatmap
Clusters for this Dataset

Recommendations for Obesity DataSet:

📌 3 - Clustering Models on Online Shoppers Purchasing Intention Dataset

🔗 Dataset: Online Shoppers Purchasing Intention Dataset

📂 Code Files:

📊 Key Visualizations from Online Shoppers Purchasing Intention Dataset:
Skewness
Revenue Distribution
Customer Clusters

Recommendations for Online Shoppers Purchasing Intention DataSet:


🔹 C. Sentiment Analysis and Text Classification on Real Time Dataset

Duration: Sep 2024 - Present

Objective:
Developed ML models to analyze sentiment and classify customer reviews of McDonald’s US stores, movie reviews, and Twitter trends from around the world.

Process:

Outcome:

Tools Used:

📌 1- Sentiment Analysis on the entire U.S. McDonald’s reviews dataset

🔗 Dataset: US McDonald’s Stores Reviews Dataset

📂 Code Files:

📊 Key Visualizations from McDonald’s Dataset:
Sentiment Distribution
Positive Reviews
Negative Reviews & Word Clouds

Recommendations from McDonald’s Dataset:

📌 2- Sentiment Analysis on Twitter reviews dataset

🔗 Dataset: Twitter Sentiment Analysis Dataset

📊 Key Visualizations from Twitter Dataset:

Sentiments Across the World

Tweets Heat Map

Top Words in Positive, Negative, and Neutral Emotions

Word Cloud Across Sentiments

Confusion Matrix

Recommendations from Twitter Dataset:

📌 3- Sentiment Analysis on Movie Reviews

🔗 Dataset: Movies_Reviews_Modified_Version1
📂 Code File:

Recommendations from Movies Reviews Dataset:

A. Classification Models on Azure ML Designer

Duration: Sep 2024 - Present
🔗 Dataset: Banking Dataset
📂 Code File: Azure ML Designer Code

Process:

  1. Upload & Clean Data: Import dataset, handle missing values, and split data into training and testing sets.
  2. Model Selection & Training: Choose classification algorithms (e.g., Logistic Regression, Decision Tree, SVM), and train the model.
  3. Model Evaluation: Evaluate using accuracy, precision, and recall.
  4. Hyperparameter Tuning & Deployment: Optionally tune the model and deploy it for real-time predictions.
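The evaluation step relies on three metrics that are easy to confuse. The sketch below computes them by hand for a binary classifier; the labels are made up for illustration and are not output from the Azure ML pipeline.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of the predicted positives, how many were right?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the actual positives, how many were found?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = client subscribed, 0 = did not subscribe.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

When subscribers are rare, as is common in marketing data, precision and recall are usually more informative than accuracy alone.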

Outcome:

Tools Used:

📊 Key Visualizations:


Databricks Projects with PySpark

🔹 A. Databricks Projects Using PySpark (RDD, DataFrames, and SQL)

Duration: Jan 2024 - Present

Objective:
Efficiently process and analyze large-scale datasets using PySpark on Databricks, creating optimized data pipelines for big data challenges.

🔗 Dataset:

Process:

  1. Data Import & Transformation:
    • Used RDDs and DataFrames to load and clean large datasets.
    • Applied Spark SQL for data aggregation and transformation.
  2. Optimization:
    • Optimized pipelines with partitioning, caching, and parallel processing.
  3. Big Data Processing:
    • Processed large datasets for real-time insights.

📊 Key Visualizations:

2. Working with DataFrames

3. Working with SQL

This repository highlights various techniques for cleaning, visualizing, and analyzing data using RDDs, DataFrames, and SQL in Databricks Notebooks.

Outcome:

Tools Used:


🔹 B. Steam Data Analysis: Visualization & ALS Evaluation in Databricks

Duration: Jan 2024 - Present

Objective:
Analyze large-scale Steam datasets using PySpark on Databricks, with data visualization and ALS (Alternating Least Squares) for recommendation system evaluation.

🔗 Dataset:

📂 Code File:
GitHub: Data Analysis, Visualization & ALS Evaluation

Process:

  1. Data Import & Cleaning: Loaded and cleaned Steam dataset using PySpark.
  2. EDA & Visualization: Visualized game trends and pricing with Matplotlib and Seaborn.
  3. ALS Model: Built and tuned a recommendation system using ALS, optimized with RMSE.
  4. Predictions & Recommendations: Generated personalized game suggestions.

Outcome:

Tools Used:

📊 Key Visualizations:

Power BI Dashboard Development Projects

🔹 A. Power BI Dashboards: Real-Time Insights by Region and Country Group

Duration: Jan 2024 - Present

🔗 Dashboards Repository: GitHub Repository

Objective:
Create interactive, real-time dashboards for monitoring business performance and enabling data-driven decision-making.

Process:

Tools Used:

📊Key Visualizations

1. Population Growth: Real-Time Insights by Region and Country Group
🔗 Dataset: World Population Prospects 2024
Dashboard 1
Dashboard 2

Outcome:
The dashboard provides a comprehensive overview of global population trends, showing how population dynamics evolved between 1960 and 2022.

2. HIV Data Dashboard Outcomes by Region and Country Group
🔗 Dataset: World Population Prospects 2024
Final Dashboard

Outcome:

3. Sales Report Dashboard
🔗 Dataset: World Trend
Final Dashboard

Outcome:

4. European Energy Transition (2000-2020)
🔗 Dataset: Energy Dataset 2000 - 2020
Final Dashboard

5. European Energy Transition
🔗 Dataset: Energy Dataset
Final Dashboard

Outcome:

Current Projects

I am currently working on my dissertation on cryptocurrency: global trends, acceptance around the world, and future price predictions for the top currencies based on historical data. The research involves creating visualizations and reports to analyze the evolution of cryptocurrencies and their impact on the global market.

Research Focus:

Part-Time Work Experience

Alongside my dissertation, I work part-time with Eagle Cars and Tiger Taxi, helping them generate weekly and monthly reports for better decision-making. This experience lets me apply my data analysis and reporting skills in real-world scenarios.

💼 Work Experience


I have gained hands-on experience in various data science roles, where I applied my skills to solve real-world business challenges.

Data Visualization Analyst (Part-Time)

Eagle Cars & Tiger Taxis | Oct 2024 - Present | Clitheroe, UK

Data Scientist (Full-Time)

WebDoc | May 2023 - Dec 2023 | Islamabad, Pakistan

Data Insights Analyst (Full-Time)

Zones, IT Solutions | Sep 2021 - May 2023 | Islamabad, Pakistan


🎓 Education


Here’s my academic background that laid the foundation for my career in data science.

M.S. in Data Science (Expected May 2025)

University of Salford, UK

B.S. in Software Engineering (Graduated May 2022)

Bahria University, Pakistan


🎯 Activities


In addition to my professional and academic pursuits, I am actively involved in extracurricular activities.

President, Dawah Society - Salford University (2024)


🔧 How to Use This Repository


Clone this repository to explore my projects and codebase:
```bash
git clone https://github.com/mananabbasi
```

📞 Contact

You can get in touch with me through the following channels:

📧 Email: mananw25@gmail.com
🔗 LinkedIn: LinkedIn Profile
🐙 GitHub: GitHub Profile