🌟 Welcome to my Data Scientist Portfolio! 🚀
A showcase of my work in data analysis, machine learning, and visualization, highlighting my ability to extract insights and build data-driven solutions.
I am a data scientist dedicated to solving complex problems using data-driven methods. Specializing in statistics, machine learning, data visualization, and big data processing, I transform raw data into valuable insights. Always eager to learn, I continuously expand my knowledge and adapt to new technologies in the field.
I am proficient in a range of tools and technologies that help me effectively analyze data and develop insights.
This dissertation explores the growth and volatility of the cryptocurrency market, focusing on Bitcoin, and examines how social media sentiment and app adoption influence market behavior. The study analyzes global trends in cryptocurrency exchange app adoption from 2015 to 2022, considering regional and demographic differences. It also investigates the impact of over 500,000 Bitcoin-related tweets (2021–2023) using NLP, sentiment analysis, and LDA techniques, with a focus on how influential figures shape public sentiment and Bitcoin price movements.
The research aims to better understand the drivers of cryptocurrency market volatility and price fluctuations. By combining historical Bitcoin price data with social media sentiment, the study will develop predictive models to forecast future trends and volatility, offering valuable insights for traders, investors, and policymakers.
This dissertation focuses on analyzing cryptocurrency adoption, social media sentiment, and market behavior, specifically for Bitcoin. It involves the collection and preprocessing of various datasets, including global cryptocurrency adoption trends, Bitcoin-related tweets from 2021 to 2023, and minute-by-minute Bitcoin price data. The data is cleaned, standardized, and aligned to ensure consistency, and then explored through visualizations such as global adoption maps, time series plots, and sentiment distributions.
The research employs Natural Language Processing (NLP) to analyze tweet sentiment and extract key topics, and it applies feature engineering techniques to Bitcoin data, creating indicators such as moving averages and volatility. To identify patterns in the data, clustering techniques are also used.
For modeling, two regression models, XGBoost and Random Forest, are selected for their ability to handle complex, non-linear relationships. These models will be trained using data from 2021, 2022, and 2023 and evaluated with metrics such as RMSE, R-squared, and MAPE, with performance also compared across periods of differing volatility. The goal is to develop predictive models for cryptocurrency trends and volatility, offering insights for traders, investors, and policymakers.
I have gathered reliable data from multiple platforms, including CoinMarketCap, Mendeley Data, Twitter, and Kaggle, to create four distinct datasets, each serving a specific purpose.
1. Cryptocurrency Adoption and Exchange Activity (2015–2022): This dataset tracks the growth of cryptocurrency users, app downloads, and activity across various cryptocurrency exchanges from 2015 to 2022. Data is sourced from the Bank for International Settlements.
2. Bitcoin Tweets (2021–2023): This collection includes Bitcoin-related tweets from 2021, 2022, and 2023, which will be analyzed using Natural Language Processing (NLP) and sentiment analysis to assess public sentiment and its impact on Bitcoin price movements.
3. Influencer Tweets: This dataset contains tweets from 52 influential figures in the cryptocurrency space. The data is sourced from Mendeley Data - Influencers Tweets.
4. Bitcoin Historical Price Data (2012–Present): This dataset includes minute-by-minute Bitcoin price data from 2012 to the present, sourced from Kaggle Bitcoin Historical Data.
These datasets will provide a comprehensive basis for analyzing global cryptocurrency adoption trends, social media sentiment, and Bitcoin price behavior. By integrating these sources, the analysis aims to uncover insights into market trends, investor psychology, and the impact of social media on Bitcoin price fluctuations.
Trends from 2015–2022, global app usage, download spikes, and key event impacts.
3.1 Introduction
This analysis explores cryptocurrency exchange app adoption across the G20, G7, and 33 countries, using a comprehensive dataset spanning August 2015 to June 2022. The dataset is organized across ten sheets and captures various dimensions, including user activity, demographics, and the relationship between Bitcoin price trends and app usage. It also offers insights into how external events have influenced cryptocurrency adoption and overall market behavior.
3.2 Dataset Source
The dataset can be accessed via the following link: Bank for International Settlements Data
3.3 Dataset Overview
The dataset provides monthly indicators such as downloads, daily active users (DAUs), user demographics, and correlations with Bitcoin price movements. It offers a global perspective on crypto adoption trends and behavioral patterns. The data is distributed across ten distinct sheets:
3.4 Key Visualizations
The analysis includes several key visualizations that help in understanding global cryptocurrency adoption trends, sentiment analysis, and the relationship between Bitcoin price movements and social media activity. Below are the primary types of visualizations included in this project:
Seasonal Analysis Plots: Identify seasonal patterns in user behavior and Bitcoin price trends.
Heatmaps: Analyze correlations between tweet volume, sentiment, and Bitcoin price during specific time periods.
Active Users vs Downloads: This plot compares the global trend of active users versus downloads.
Impact on Bitcoin Volatility: How major events in the market affect Bitcoin volatility.
Loss and Gain Analysis: Comparing losses and gains in the cryptocurrency market.
Cryptocurrency Exchange Performance: Visualization of the performance of different cryptocurrency exchanges.
Number of Downloads: A visualization showing the number of downloads across platforms over time.
These visualizations provide a comprehensive view of the dynamics influencing cryptocurrency markets and adoption, allowing for a deeper understanding of market behavior and sentiment shifts.
Using the Wilcoxon signed-rank test and event windows to evaluate how key announcements moved prices. The Wilcoxon signed-rank test is a non-parametric test for paired samples, used when the data is not normally distributed:
• Null Hypothesis (H₀): There is no significant difference in Bitcoin prices before and after the event.
• Alternative Hypothesis (H₁): There is a significant difference in Bitcoin prices before and after the event.
Since the p-value is less than 0.05, we reject the null hypothesis. This indicates that there is a statistically significant difference between Bitcoin prices before and after the event.
Result from Wilcoxon signed-rank test
• The before-event and after-event Bitcoin prices are significantly different, based on the Wilcoxon signed-rank test.
• Bitcoin price is the biggest driver of engagement, influencing app usage by around 50%, with external events like China’s crypto crackdown and Kazakhstan’s unrest also playing significant roles.
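A minimal sketch of the event-window test described above, using SciPy's `wilcoxon`. It assumes a daily price series indexed by date and pairs the i-th day of a fixed-length window before the event with the i-th day after it. The 14-day window, the hypothetical helper `event_window_test`, and the synthetic prices are illustrative assumptions, not the project's actual setup.

```python
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon


def event_window_test(prices: pd.Series, event_date: str, window: int = 14, alpha: float = 0.05):
    """Paired Wilcoxon signed-rank test on prices in equal-length windows around an event.

    Hypothetical helper: `prices` is assumed to be a daily series indexed by date.
    The event day itself is excluded; days are paired by position within each window.
    """
    event = pd.Timestamp(event_date)
    before = prices[prices.index < event].tail(window)
    after = prices[prices.index > event].head(window)
    if len(before) < window or len(after) < window:
        raise ValueError("not enough observations on both sides of the event")
    stat, p_value = wilcoxon(before.to_numpy(), after.to_numpy())
    # H0 is rejected when p < alpha: prices differ before vs. after the event
    return stat, p_value, p_value < alpha


# Illustrative synthetic data: a level drop after 2021-05-20
dates = pd.date_range("2021-05-01", periods=40, freq="D")
rng = np.random.default_rng(0)
values = np.where(dates < pd.Timestamp("2021-05-20"), 50_000.0, 40_000.0) + rng.normal(0, 500, len(dates))
prices = pd.Series(values, index=dates)

stat, p, reject = event_window_test(prices, "2021-05-20", window=14)
print(f"W={stat:.1f}, p={p:.5f}, reject H0: {reject}")
```

Positional pairing is a simplification; with real data, the window length and whether to test prices or returns are design choices worth stating explicitly.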
Event Impact on Bitcoin: A chart showing the effect of major events on Bitcoin’s price and volatility.
4.1 Introduction
In this section, I shift focus from app usage trends to examining the influence of social media sentiment on cryptocurrency adoption and behavior. Specifically, I analyze Bitcoin-related tweets collected monthly from 2021 to 2023. To uncover patterns in public opinion and community dynamics, I applied Natural Language Processing (NLP) techniques, including sentiment analysis and unsupervised clustering. These methods help categorize sentiment trends over time, identify key discussion topics, and explore how these patterns align with Bitcoin price movements and major news events.
4.2 Dataset Source
All datasets were collected from Kaggle:
4.3 Dataset Overview
The datasets consist of three years’ worth of Bitcoin-related tweets, covering 2021, 2022, and 2023. Each dataset contains metadata such as:
4.4 Key Visualizations
The following visualizations help analyze Bitcoin-related tweet volume and sentiment over time, revealing patterns in public opinion and engagement with the cryptocurrency:
Combined Daily Tweet Volume
This graph displays the daily tweet volume for Bitcoin across 2021, 2022, and 2023, helping to track spikes in tweet activity.
Combined Hourly Tweet Volume
This chart shows the hourly distribution of Bitcoin-related tweets, providing insights into peak times for discussions.
Daily Tweet Count (2021–2023)
A daily count of Bitcoin-related tweets for the years 2021, 2022, and 2023, helping to identify key days of high activity.
Monthly Tweet Volume (2021–2023)
This plot illustrates the monthly volume of Bitcoin-related tweets, showing seasonal trends and long-term shifts in interest.
Top 10 Days for Tweet Volume (2021–2023)
A bar chart highlighting the top 10 days with the highest tweet volumes for Bitcoin, reflecting significant events.
Tweet Count per Hour (2021–2023)
This visualization displays the tweet volume per hour for Bitcoin-related tweets across the three years.
These visualizations provide a comprehensive overview of the tweet volume dynamics surrounding Bitcoin, allowing for a better understanding of how public sentiment evolves over time.
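The volume views above reduce to counting tweets per time bucket. A minimal pandas sketch, assuming each tweet row has a timestamp column named `date` (a hypothetical column name); `tweet_volume` is an illustrative helper, not the project's code.

```python
import pandas as pd


def tweet_volume(tweets: pd.DataFrame, time_col: str = "date"):
    """Daily, monthly, and hour-of-day tweet counts plus the 10 busiest days."""
    ts = pd.DatetimeIndex(pd.to_datetime(tweets[time_col]))
    ones = pd.Series(1, index=ts).sort_index()      # one entry per tweet
    daily = ones.resample("D").sum()                # daily tweet volume
    monthly = ones.resample("MS").sum()             # monthly volume, labelled by month start
    hourly = ones.groupby(ones.index.hour).sum()    # hour-of-day profile (0-23)
    top_days = daily.nlargest(10)                   # busiest days
    return daily, monthly, hourly, top_days


tweets = pd.DataFrame({"date": ["2021-02-08 10:15", "2021-02-08 10:40",
                                "2021-02-08 21:05", "2021-02-09 09:00"]})
daily, monthly, hourly, top_days = tweet_volume(tweets)
print(top_days.head(3))
```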
Detecting tweet clusters (e.g., memes, analysis, FUD, bullish signals).
These visualizations help to understand the underlying clusters of Bitcoin-related tweets and the most prominent topics within each cluster:
Number of Clusters
This visualization shows the number of clusters identified in the Bitcoin tweet dataset, which helps categorize tweets into distinct topics based on sentiment and content.
PCA Visualization of Clusters
A Principal Component Analysis (PCA) plot visualizing the tweet clusters in a 2D space, helping to identify relationships and groupings between different tweet topics.
Top Words in Clusters
This word cloud highlights the most frequent terms within each cluster, giving insight into the major themes and topics discussed in Bitcoin-related tweets.
These visualizations provide valuable insights into how Bitcoin-related discussions are organized and what topics dominate conversations on social media.
Using VADER and TextBlob to derive user sentiment patterns.
1 Overall Sentiment Distribution
This chart shows the general distribution of tweet sentiments—positive, negative, and neutral—across the entire dataset. It helps establish a high-level understanding of public opinion.
2 Sentiment Across Tweets (2021–2023)
This visualization highlights how sentiment fluctuated across individual tweets throughout the years 2021 to 2023. It’s useful for identifying peaks or drops in sentiment linked to real-world events.
3 Sentiment Trend Across Years
This line graph displays the average sentiment per year, showing how public sentiment evolved over time. Trends can provide insights into broader societal or platform-specific changes.
4 Word Cloud of Frequent Terms
The word cloud highlights the most frequently used words in the tweet dataset. Larger words represent higher frequency, offering a glimpse into common topics and keywords.
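A minimal sketch of how the two scorers can be combined, assuming the `vaderSentiment` and `textblob` Python packages. The ±0.05 cut-offs on VADER's compound score follow the thresholds suggested in VADER's documentation; whether this project used the same cut-offs, and the helper names below, are assumptions.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(texts):
    """Score tweets with VADER and TextBlob (needs the `vaderSentiment` and `textblob` packages)."""
    # Lazy imports keep label_from_compound usable without the optional packages
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()  # build once, reuse for every tweet
    rows = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        blob = TextBlob(text).sentiment
        rows.append({
            "text": text,
            "vader_compound": compound,
            "vader_label": label_from_compound(compound),
            "textblob_polarity": blob.polarity,          # -1 (negative) to 1 (positive)
            "textblob_subjectivity": blob.subjectivity,  # 0 (objective) to 1 (subjective)
        })
    return rows
```

Calling `score_tweets(["Bitcoin is going to the moon!", "Another crash, I'm out."])` returns one dictionary per tweet, which converts directly into a DataFrame for the distribution and trend plots listed above.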
Tracking popular crypto influencers, tweet timelines, and user engagement.
5.1 Introduction
After analyzing sentiment trends from the 500,000 public tweets collected between 2021 and 2023, the next step involves exploring insights from 52 selected influencer accounts.
Since this dataset already includes a cleaned text column, additional NLP preprocessing will not be performed.
5.2 Dataset Source
🔗 Access the dataset on Mendeley Data
5.3 Dataset Overview
This dataset includes tweets from 52 individuals discussing cryptocurrency over a span of more than two years.
It features:
This curated dataset provides a rich foundation for examining influencer behavior and public influence trends in the crypto space.
5.4 Key Visualizations
This section presents visual insights based on tweets from 52 key influencers in the cryptocurrency domain, covering sentiment impact, engagement behavior, tweet characteristics, and overall trends.
1 Influencer Engagement Metrics
This chart displays engagement levels (likes, retweets, replies) from influential accounts. It helps identify which influencers generate the most interaction and community response.
2 Post Volume vs. Sentiment
This visualization compares the number of tweets posted by influencers with their associated sentiment. It offers insight into whether higher tweet activity correlates with more positive or negative sentiment.
3 Sentiment Impact on Tweet Engagement
This chart explores how sentiment (positive, neutral, negative) affects the engagement rate of tweets. Do positive tweets get more likes? Are negative tweets more viral?
4 Tweet Length Distribution
This figure shows the distribution of tweet lengths among influencers. It helps in understanding whether shorter or longer tweets are more common and how length might relate to engagement or sentiment.
5 Tweet Posting Trends Over Time
This time series graph reveals tweet activity patterns from influencers over the analyzed period. It highlights key periods of increased activity or dormancy, potentially linked to major events.
LDA modeling to explore key topics and trends within influencer tweets.
Results from Topic Modelling
Topics Engagement and Trends
This section analyzed tweets from 52 influential crypto figures between February 2021 and June 2023, revealing spikes in engagement during key events, particularly in late 2021, late 2022, and early 2023. While positive sentiment prevailed overall, the most-liked tweet was negative, emphasizing the viral impact of emotionally charged content. Topic modeling showed market analysis, Ethereum/DeFi, and Bitcoin updates were most engaging, with BTC discussions having more negative sentiment and a growing interest in whale tracking and transaction-level insights.
Decomposing Bitcoin volatility and trend cycles using clustering.
6.1 Introduction
In this analysis, I employ advanced Exploratory Data Analysis (EDA) and clustering techniques to uncover the key trends in Bitcoin’s market data. By analyzing minute-by-minute price data from various Bitcoin exchanges, I aim to identify patterns in Bitcoin’s price movements and explore how market sentiment influences these trends.
Using clustering, I seek to gain a deeper understanding of the factors driving Bitcoin’s market behavior, providing valuable insights into its volatility and trading dynamics.
6.2 Dataset Source
🔗 Access the dataset on Kaggle
6.3 Dataset Overview
This dataset contains historical minute-by-minute data for Bitcoin’s price across various exchanges. It includes:
The dataset spans a significant period and offers a detailed view of Bitcoin’s price volatility and trading trends, making it ideal for EDA and clustering techniques.
6.4 Key Visualizations
1 Bitcoin Average Price Daily
This chart shows the average Bitcoin price on a daily basis. It provides an overview of the general trend of Bitcoin’s price over time.
2 Bitcoin Average Price Over the Year
This graph shows how Bitcoin’s average price changed throughout the year. It helps to identify seasonal trends and year-on-year comparisons.
3 Bitcoin Close Price Over Time
This visualization tracks Bitcoin’s closing price over a defined period, showcasing key fluctuations and trends in its daily close prices.
4 Bitcoin Price and Rolling Statistics
This plot visualizes Bitcoin’s price along with its rolling statistics (like moving averages), providing insights into short-term price trends and volatility.
5 Bitcoin Price for Three Different Years (2021, 2022, 2023)
This graph compares Bitcoin’s price trends across three years: 2021, 2022, and 2023. It highlights key differences in price behavior between these years.
6 Bitcoin Trading Volume Over Time
This chart shows the trading volume of Bitcoin over time, reflecting the level of market activity and investor interest during various periods.
7 Correlation Heatmap
This heatmap illustrates the correlation between different Bitcoin market variables (such as price, volume, market cap). It helps in understanding how these factors interact.
8 Monthly Average Bitcoin Price
This plot shows the monthly average price of Bitcoin, helping to identify long-term trends and how Bitcoin’s price behaves across different months of the year.
9 Volume and Close Price
This chart shows the relationship between Bitcoin’s trading volume and its closing price. It highlights how trading volume can affect price movements and indicate market sentiment.
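A minimal pandas sketch of how minute-level data can be rolled up into the daily, rolling, and calendar views listed above. It assumes the Kaggle file's `Timestamp` column holds Unix seconds and that `Close` and `Volume` columns are present; the window length, the log-return volatility definition, and the hypothetical `daily_features` helper are illustrative choices, not necessarily those used in the project.

```python
import numpy as np
import pandas as pd


def daily_features(minute_df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    df = minute_df.set_index(pd.to_datetime(minute_df["Timestamp"].to_numpy(), unit="s"))
    daily = df.resample("D").agg({"Close": "last", "Volume": "sum"}).dropna()
    daily["avg_price"] = df["Close"].resample("D").mean()           # daily average price
    daily["ma"] = daily["Close"].rolling(window).mean()              # moving average of close
    daily["log_return"] = np.log(daily["Close"]).diff()
    daily["volatility"] = daily["log_return"].rolling(window).std()  # rolling std of log returns
    return daily


# Synthetic minute data for 10 days (illustrative only)
idx = pd.date_range("2021-01-01", periods=60 * 24 * 10, freq="min")
minute = pd.DataFrame({
    "Timestamp": (idx - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1),  # Unix seconds
    "Close": np.linspace(30_000, 40_000, len(idx)),
    "Volume": 1.0,
})
daily = daily_features(minute, window=3)
weekday_profile = daily.groupby(daily.index.dayofweek)["Close"].mean()  # 0 = Monday
monthly_avg = daily["Close"].resample("MS").mean()
print(daily.tail(3))
```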
Labeling volatility phases, identifying price regimes and anomaly detection.
PCA Components for Bitcoin
This plot visualizes the Principal Component Analysis (PCA) components for Bitcoin. It helps in understanding how different features contribute to the overall variance in Bitcoin’s price data.
Cluster Summary
This summary plot provides insights into the clustering results. It shows how the different clusters of Bitcoin market data are distributed, giving an overview of the key groupings and their characteristics.
Number of Clusters for Bitcoin
Seasonal and weekly patterns reveal price peaks in mid-February, March, and November, with notable dips in summer and on Sundays, while clustering analysis identifies two market behaviors: Cluster 0 with high, stable prices and low volume, and Cluster 1 with lower prices, higher volumes, and more volatility.
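A minimal sketch of the regime-clustering step, assuming standardized daily features (close, volume, volatility), PCA for the component view, and k-means with the silhouette score for choosing the number of clusters. The section does not state the selection criterion or feature set, so these choices, the hypothetical `cluster_regimes` helper, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_regimes(features: pd.DataFrame, k_range=range(2, 6), seed=42):
    X = StandardScaler().fit_transform(features)      # put price, volume, volatility on one scale
    pca = PCA(n_components=2).fit(X)                  # variance explained by the first two components
    scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X))
              for k in k_range}
    best_k = max(scores, key=scores.get)              # highest silhouette wins
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(X)
    summary = features.assign(cluster=labels).groupby("cluster").mean()  # per-cluster profile
    return pca.explained_variance_ratio_, scores, labels, summary


# Synthetic "calm" and "turbulent" days (illustrative only)
rng = np.random.default_rng(7)
calm = pd.DataFrame({"close": rng.normal(60_000, 500, 50), "volume": rng.normal(1_000, 50, 50),
                     "volatility": rng.normal(0.01, 0.001, 50)})
turbulent = pd.DataFrame({"close": rng.normal(30_000, 500, 50), "volume": rng.normal(5_000, 50, 50),
                          "volatility": rng.normal(0.05, 0.001, 50)})
features = pd.concat([calm, turbulent], ignore_index=True)
explained, scores, labels, summary = cluster_regimes(features)
print(summary)
```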
Aligning tweets and price by timestamp to extract correlation patterns.
7.1 Introduction
In this section, I integrate the cleaned Twitter dataset from 2021 to 2023, complete with sentiment labels and intensity scores, with Bitcoin market data to explore potential relationships between public sentiment and price dynamics. By aligning both datasets on the date column, I aim to uncover how fluctuations in social media sentiment correspond with Bitcoin’s price movements, trading volume, and volatility.
This combined analysis forms a crucial bridge between behavioral signals and financial trends, setting the stage for predictive modeling in the following section.
7.2 Dataset Source
I will use the dataset described in Chapter 3.2, which was created by merging tweets from various months in 2021, 2022, and 2023 into a single collection. Before combining, the tweets were carefully filtered to remove spam and irrelevant content.
7.3 Dataset Overview
The integrated dataset contains:
By combining these two datasets, this analysis provides an opportunity to explore how social media sentiment may correlate with Bitcoin’s market movements.
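A minimal pandas sketch of the date alignment described in 7.1: tweet-level sentiment is aggregated to one row per day and inner-joined with daily price data. The column names (`date`, `compound`, `label`, `Close`, `Volume`), the chosen daily aggregates, and the sample rows are assumptions for illustration.

```python
import pandas as pd


def merge_sentiment_price(tweets: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """Aggregate tweet sentiment per day, then inner-join it with daily price data on `date`."""
    tweets = tweets.assign(date=pd.to_datetime(tweets["date"]).dt.normalize())  # drop time of day
    daily_sentiment = tweets.groupby("date").agg(
        compound_mean=("compound", "mean"),
        tweet_count=("compound", "size"),
        positive_share=("label", lambda s: (s == "positive").mean()),
    ).reset_index()
    prices = prices.assign(date=pd.to_datetime(prices["date"]).dt.normalize())
    return daily_sentiment.merge(prices, on="date", how="inner")  # keep days present in both


tweets = pd.DataFrame({
    "date": ["2021-03-01 10:00", "2021-03-01 15:30", "2021-03-02 09:00",
             "2021-03-03 12:00", "2021-03-05 08:00"],
    "compound": [0.6, -0.2, 0.4, -0.5, 0.3],
    "label": ["positive", "negative", "positive", "negative", "positive"],
})
prices = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"],
    "Close": [49_000.0, 48_500.0, 47_800.0, 50_300.0],
    "Volume": [1_200.0, 1_500.0, 1_900.0, 1_100.0],
})
merged = merge_sentiment_price(tweets, prices)
print(merged[["compound_mean", "Close", "Volume"]].corr())
```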
7.4 Key Visualizations
1 2021 Sentiment and Price Distribution
This chart visualizes the sentiment distribution and corresponding Bitcoin price data for 2021. It helps to see how public sentiment affected Bitcoin’s price during this period.
2 2022 Sentiment and Price Distribution
This visualization tracks sentiment and Bitcoin price distribution in 2022, providing insights into the relationship between market sentiment and price fluctuations over this year.
3 2023 Sentiment and Price Distribution
Here, sentiment distribution for 2023 is compared with Bitcoin’s price trends. The analysis highlights how sentiment may have impacted Bitcoin’s price in this year.
4 Average Compound Score and Sentiment
This graph displays the average sentiment compound scores and their corresponding intensity over time, providing insights into overall sentiment shifts across the three years.
5 Correlation Across All Three Time Periods
This heatmap visualizes the correlation between sentiment and Bitcoin’s market data (price, volume, volatility) for the years 2021, 2022, and 2023. It helps to identify any changes or consistent relationships over time.
6 Sentiment vs. Average Closing Price
This chart explores the relationship between sentiment and Bitcoin’s average closing price. It aims to uncover any correlation between positive/negative sentiment and Bitcoin’s daily closing price.
7 Sentiment vs. Trading Volume
This visualization compares sentiment scores with Bitcoin’s trading volume, helping to assess whether increased sentiment intensity leads to more trading activity.
8 Sentiment vs. Trading Volume (Alternate View)
This alternate view of sentiment vs. trading volume provides additional insights into how public sentiment correlates with Bitcoin’s trading volume over time.
T-test
Null Hypothesis (H₀): There is no significant difference in the average Bitcoin closing price between positive sentiment and negative sentiment.
Alternative Hypothesis (H₁): There is a significant difference in the average Bitcoin closing price between positive sentiment and negative sentiment.
Result: reject the null hypothesis. A p-value below 0.05 indicates that the average Bitcoin closing price differs significantly between positive and negative sentiment.
The boxplot shows that Bitcoin closing prices are slightly higher on positive sentiment days compared to negative ones. While both sentiment types show similar price ranges and outliers, the median price is higher with positive sentiment, suggesting a mild link between optimistic tweets and stronger Bitcoin performance.
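A minimal SciPy sketch of this comparison, assuming Welch's two-sample t-test (unequal variances); the section does not say which t-test variant was used, and the helper name and prices below are synthetic and hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def sentiment_price_ttest(close_positive, close_negative, alpha=0.05):
    """Welch's t-test: does mean closing price differ between positive- and negative-sentiment days?"""
    stat, p_value = ttest_ind(close_positive, close_negative, equal_var=False)
    return stat, p_value, p_value < alpha


rng = np.random.default_rng(3)
close_pos = rng.normal(45_000, 1_000, 200)   # synthetic closes on positive-sentiment days
close_neg = rng.normal(44_000, 1_000, 200)   # synthetic closes on negative-sentiment days
stat, p, reject = sentiment_price_ttest(close_pos, close_neg)
print(f"t={stat:.2f}, p={p:.2e}, reject H0: {reject}")
```

Because the normality tests later in this project find the data non-normal, `scipy.stats.mannwhitneyu` would be a natural non-parametric cross-check on the same two samples.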
8.1 Introduction
After combining the Bitcoin Twitter sentiment dataset with the Bitcoin price dataset, I obtained a unified dataset that captures both market behavior and public sentiment over time. This dataset will serve as the foundation for building predictive models aimed at forecasting Bitcoin price movements.
To determine the most suitable modeling approach, I will begin by conducting normality tests on the features and the target variable. This step will help assess whether the data distribution supports the use of linear models or calls for more flexible, non-linear alternatives. If the data appears stable and well-behaved, I will consider XGBoost for its accuracy and efficiency on structured data. If instead the data shows signs of volatility or noise, Random Forest will be the preferred choice, as it tends to perform well with complex, fluctuating patterns. This selection process is intended to align the final approach with the nature of the data and yield the most reliable predictions.
8.2 Dataset Source
Following the integration of the Bitcoin Twitter sentiment dataset with the Bitcoin price dataset in Chapter 3.5, I obtained a unified dataset that reflects both market dynamics and public sentiment over time. This combined dataset will form the basis for developing predictive models to forecast Bitcoin price movements.
8.3 Dataset Overview for Modelling
The unified dataset includes the following features:
This combined dataset will be used to develop models aimed at predicting Bitcoin’s price movements, considering the impact of social media sentiment on the market.
Modeling pipeline using cross-validation and grid search.
QQ Plot for Normality Test
This QQ plot helps assess whether the data follows a normal distribution, which is important for determining the suitability of linear models for predictive modeling.
Shapiro-Wilk Test for Normality
The Shapiro-Wilk test provides another measure of normality. If the p-value is above the chosen significance level (typically 0.05), the null hypothesis of normality cannot be rejected, meaning the data is consistent with a normal distribution.
Result: No Linearity
This visualization indicates that there is no clear linear relationship between the features and the target variable. This suggests that linear models may not be the most suitable approach for this dataset. Non-linear models such as XGBoost or Random Forest may be more appropriate for capturing the complex patterns in the data.
Since the data does not meet the normality assumption, I will use non-linear models for the analysis, training two of them and comparing their results.
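A minimal SciPy sketch of the two normality checks above: `scipy.stats.shapiro` for the Shapiro-Wilk test and `scipy.stats.probplot` for the QQ-plot coordinates (passing `plot=ax` with a Matplotlib axis draws the plot). The helper name and the synthetic skewed sample are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def normality_check(values, alpha=0.05):
    """Shapiro-Wilk test plus QQ-plot coordinates for a 1-D sample."""
    w, p = stats.shapiro(values)   # SciPy warns that p may be inaccurate for N > 5000
    # probplot returns the QQ points and a least-squares line; r near 1 suggests normality
    (theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(values, dist="norm")
    return {"W": w, "p": p, "looks_normal": p > alpha, "qq_r": r,
            "qq_points": (theoretical_q, ordered_values)}


rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # right-skewed synthetic sample
result = normality_check(skewed)
print(result["W"], result["p"], result["looks_normal"])
```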
Powerful gradient boosting approach for price prediction.
Feature Importance in XGBoost
This plot illustrates the feature importance in the XGBoost model. It highlights the most significant features contributing to the prediction of Bitcoin price movements, allowing for insights into which factors, such as sentiment or trading volume, have the largest influence on price changes.
Time Series Forecasting for 2021 using XGBoost
This visualization shows the time series forecasting results for 2021 using the XGBoost model. It compares the predicted Bitcoin prices with the actual prices, providing insights into the model’s forecasting accuracy for the year.
Time Series Forecasting for 2022 using XGBoost
This plot presents the time series forecasting results for 2022 using the XGBoost model. It shows the predicted Bitcoin prices for the year and compares them with the actual market prices to evaluate the model’s performance.
Time Series Forecasting for 2023 using XGBoost
This visualization displays the time series forecasting results for 2023. It highlights the predicted Bitcoin prices and compares them with the actual prices to assess the model’s forecasting ability for this year.
Bagging ensemble for capturing sentiment-driven volatility.
Feature Importance in Random Forest
This plot illustrates the feature importance in the Random Forest model. It highlights the most significant features contributing to the prediction of Bitcoin price movements, showing which factors have the largest impact on price changes.
Time Series Forecasting for 2021 using Random Forest Regression
This visualization shows the time series forecasting results for 2021 using the Random Forest model. It compares the predicted Bitcoin prices with the actual prices, providing insights into the model’s forecasting accuracy for the year.
Time Series Forecasting for 2022 using Random Forest Regression
This plot presents the time series forecasting results for 2022 using the Random Forest model. It shows the predicted Bitcoin prices for the year and compares them with the actual market prices to evaluate the model’s performance.
Time Series Forecasting for 2023 using Random Forest Regression
This visualization displays the time series forecasting results for 2023. It highlights the predicted Bitcoin prices and compares them with the actual prices to assess the model’s forecasting ability for this year.
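The modeling card mentions cross-validation and grid search; a minimal scikit-learn sketch of that pipeline for the Random Forest model follows. It assumes `TimeSeriesSplit` folds (so each fold trains only on earlier observations), an RMSE scoring objective, and a small illustrative parameter grid; the helper name, feature names, and synthetic data are hypothetical. XGBoost's `XGBRegressor` implements the scikit-learn estimator interface, so it could be swapped into the same search with its own grid.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def fit_random_forest(X: pd.DataFrame, y: pd.Series, n_splits: int = 3, seed: int = 42):
    param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}   # illustrative grid
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),   # expanding window: never trains on the future
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    importances = pd.Series(search.best_estimator_.feature_importances_, index=X.columns)
    return search.best_estimator_, search.best_params_, importances.sort_values(ascending=False)


rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "compound_mean": rng.normal(0, 0.2, n),            # daily average sentiment
    "volume": rng.normal(1_000, 100, n),               # daily trading volume
    "close_lag1": np.linspace(30_000, 60_000, n),      # previous day's close
})
y = X["close_lag1"] + 2_000 * X["compound_mean"] + rng.normal(0, 300, n)
model, best_params, importances = fit_random_forest(X, y)
print(best_params)
print(importances)
```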
Metric evaluation: RMSE, MAE, and R² comparisons, among others.
Random Forest Regression Model Performance
This table shows the performance of the Random Forest Regression model. It reports regression metrics such as RMSE, MAE, and R², which are used to assess the model’s effectiveness in predicting Bitcoin prices.
XGBoost Model Performance
This table displays the performance of the XGBoost model. It includes key metrics to evaluate how well the model forecasts Bitcoin prices, comparing its results against actual price data for accuracy and reliability.
XGBoost performed best in stable periods (Periods 1 & 2), delivering higher accuracy and better fit. Random Forest excelled in Period 3, where volatility was highest, showing greater resilience to market fluctuations. Overall, XGBoost is ideal for stable market conditions, while Random Forest is better for volatile environments.
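A minimal NumPy sketch of the regression metrics used to compare the two models (RMSE, MAE, R², and MAPE). The helper name and toy values are illustrative; computing the same dictionary separately for each period would give a per-period comparison like the one summarized above.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                  # penalizes large misses
    mae = np.mean(np.abs(err))                         # average absolute miss, in price units
    mape = np.mean(np.abs(err / y_true)) * 100         # percent error; undefined if any y_true == 0
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # undefined for constant y_true
    return {"RMSE": rmse, "MAE": mae, "MAPE_%": mape, "R2": r2}


metrics = regression_metrics([100, 110, 120, 130], [102, 108, 123, 128])
print(metrics)
```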
9.1 Insights & Findings
9.1.1 Cryptocurrency Adoption & Market Behavior
9.1.2 Social Media Sentiment & Public Perception
9.1.3 Bitcoin Price Trends & Volatility
9.1.4 Predictive Modeling Insights
9.2 Future Recommendations
9.2.1 For Investors & Traders
9.3 Key Recommendations
Based on the findings, the following strategic recommendations are made for cryptocurrency platforms, investors, and stakeholders:
Duration: Sep 2024 - Dec 2025
🔗 Dataset: Concrete Compressive Strength
📂 GitHub Repository: GitHub Repository
📂 Code File: Code
Objective:
To perform statistical analysis on the dataset, uncover insights, and support data-driven decision-making by understanding relationships between variables.
Process:
Tools Used:
Numerical Variable Distribution
Categorical Variable Distribution
Normality Check
Correlation Analysis
Simple Linear Regression (SLR) Assumptions
Regression Model Results
Generalized Additive Model (GAM)
Outcome:
Duration: Nov 2024 - Dec 2024
🔗 Dataset: Vital Statistics in the UK.
📂 Code File: Vital Statistics in the UK - Time Series Modelling
Objective:
To develop accurate time series forecasting models for predicting future trends, enabling stakeholders to optimize business strategies.
Process:
Tools Used:
Additive Model with Increasing or Decreasing Trends
Forecasting Results
Forecast Errors
Outcome:
Duration: Jan 2024 - Apr 2024
📂 Code Files:
Objective:
Designed and implemented a scalable, high-performance hospital database to manage large volumes of data.
Process:
Tools Used:
Database Diagram for Restaurant
Database Diagram for Hospital
Total Appointments
The total number of appointments is shown in the following visualization:
Outcome:
Duration: Sep 2024 - Present
Objective:
Developed ML models to predict client subscription to term deposits using a real-world dataset.
Process:
Outcome:
Tools Used: Python, Scikit-learn, Pandas, Matplotlib
📌 1- Classification Models On Banking Datasets:
📂 Code Files:
📊 Key Visualizations from Banking Dataset:
Data Exploration
Model Performance
Actionable Recommendations For Banking DataSet:
📌 2- Classification Models On Agriculture Datasets:
📊 Key Visualizations from Agriculture Dataset:
Data Exploration for Agriculture Dataset
Model Performance on Agriculture Dataset
Actionable Recommendations For Agriculture DataSet:
📌 3- Classification Models on Obesity Dataset
🔗 Dataset: [Estimation of Obesity Levels Based on Eating Habits and Physical Condition](https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition)
📂 Code Files:
📊 Key Visualizations from Obesity Dataset:
Gender Distribution
Age Distribution
Confusion Matrix
Actionable Recommendations For Obesity DataSet:
Duration: Sep 2024 - Present
Objective:
Developed ML models for customer segmentation using clustering techniques to identify distinct groups and optimize marketing strategies.
Process:
Outcome:
Tools Used:
📌 1- Clustering Models On Credit Card Marketing Dataset:
🔗 Dataset: Credit Card Marketing Dataset
📂 Code Files:
📊 Key Visualizations from Credit Card Marketing Dataset:
Number of Clusters
Correlation Analysis
Dendrogram
Recommendations for Credit Card DataSet:
Credit Limit: Promote card usage among customers with high limits but low usage, and raise limits for customers with high usage but low credit limits.
📌 2- Clustering Models On Obesity Dataset:
📊 Key Visualizations from Obesity Dataset:
Skewness
HeatMap
Clusters for this Dataset
Recommendations for Obesity DataSet:
📌 3 - Clustering Models on Online Shoppers Purchasing Intention Dataset
🔗 Dataset: Online Shoppers Purchasing Intention Dataset
📂 Code Files:
📊 Key Visualizations from Online Shoppers Purchasing Intention Dataset:
Skewness
Revenue Distribution
Customer Clusters
Recommendations for Online Shoppers Purchasing Intention DataSet:
Duration: Sep 2024 - Present
Objective
Developed ML models to analyze sentiment and classify customer reviews from McDonald’s US stores, movies, and Twitter trends across the world.
Process:
Outcome:
Tools Used:
📌 1- Sentiment Analysis on the entire U.S. McDonald’s reviews dataset
🔗 Dataset: US McDonald’s Stores Reviews Dataset
📂 Code Files:
📊 Key Visualizations from McDonald’s Dataset:
Sentiment Distribution
Positive Reviews
Negative Reviews & Word Clouds
Recommendations from McDonald’s Dataset:
📌 2- Sentiment Analysis on Twitter dataset
🔗 Dataset: Twitter Sentiment Analysis Dataset
📊 Key Visualizations from Twitter Dataset
Sentiments Across the World
Tweets Heat Map
Top Words in Positive, Negative, and Neutral Emotions
Word Cloud Across Sentiments
Confusion Matrix
Recommendations from Twitter Dataset:
📌 3- Sentiment Analysis on Movie Reviews
🔗 Dataset: Movies_Reviews_Modified_Version1
📂 Code File:
Recommendations from Movie Reviews Dataset:
Duration: Sep 2024 - Present
🔗 Dataset:
Banking Dataset
📂 Code File:
Azure ML Designer Code
Process:
Outcome:
Tools Used:
📊 Key Visualizations:
Model Comparison:
Model Performance:
#### 🔹 A. Databricks Projects Using PySpark (RDD, DataFrames, and SQL)
Duration: Jan 2024 - Present
Objective:
Efficiently process and analyze large-scale datasets using PySpark on Databricks, creating optimized data pipelines for big data challenges.
🔗 Dataset:
This repository contains the code for the Data Science Complete Project, built with Big Data tools and techniques. Links to the individual code files:
GitHub Repository
Main repository containing all project resources.
RDD Notebook
Notebook demonstrating operations and analysis with RDD.
DataFrame Notebook
Notebook for data manipulation and analysis using DataFrames.
Process:
📊 Key Visualizations:
1. Working with RDDs
Data Cleaning with RDDs
Visualized the data cleaning process using RDD transformations.
Completed Studies Analysis
Overview of completed studies using RDD.
Sponsor Type Analysis
Visualization of different sponsor types in the data.
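RDD cleaning and analysis steps like these typically follow a map, filter, and reduceByKey pattern. So that the sketch runs without a Spark cluster, it mimics that pattern in plain Python. The three-column row layout (study id, status, sponsor type) and the function names are hypothetical. The equivalent PySpark RDD chain appears in the docstring.

```python
from collections import Counter


def parse_row(line):
    """Map step: split a comma-separated row into stripped fields.

    Returns None for malformed rows (wrong column count or blank fields)
    so the filter step can drop them.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 3 or not all(fields):
        return None
    return fields


def completed_by_sponsor(lines):
    """Count completed studies per sponsor type.

    PySpark RDD equivalent (sketch, assuming `sc` is a SparkContext):
        sc.parallelize(lines).map(parse_row)
          .filter(lambda r: r is not None)
          .filter(lambda r: r[1].lower() == "completed")
          .map(lambda r: (r[2], 1))
          .reduceByKey(operator.add)
          .collectAsMap()
    """
    rows = (parse_row(line) for line in lines)                           # map
    valid = (row for row in rows if row is not None)                     # filter: drop malformed rows
    completed = (row for row in valid if row[1].lower() == "completed")  # filter: status
    counts = Counter(row[2] for row in completed)                        # (key, 1) + reduceByKey
    return dict(counts)
```

Because generators are lazy, nothing is computed until the final `Counter` consumes them. This loosely parallels how Spark defers transformations until an action such as `collectAsMap()` runs.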
2. Working with DataFrames
Data Cleaning with DataFrames
Visualized the process of cleaning and transforming data within DataFrames.
Creating DataFrame Schema
Schema Design
Visualization of the DataFrame schema structure.
3. Working with SQL
This repository highlights various techniques for cleaning, visualizing, and analyzing data using RDDs, DataFrames, and SQL in Databricks Notebooks.
Outcome:
Tools Used:
Duration: Jan 2024 - Present
Objective:
Analyze large-scale Steam datasets using PySpark on Databricks, with data visualization and ALS (Alternating Least Squares) to build and evaluate a recommendation system.
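ALS factorizes a sparse user-item matrix by alternately fixing one set of latent factors and solving for the other in closed form. The sketch below is a deliberately simplified rank-1 version in plain Python, so that each update reduces to a one-line ridge-regression solution. Spark's `pyspark.ml.recommendation.ALS` generalizes this to a higher `rank`, with `regParam` and `maxIter` parameters, and `RegressionEvaluator(metricName="rmse")` reports the same RMSE metric. The function names and the toy ratings in the test are hypothetical.

```python
import math


def als_rank1(ratings, n_iter=20, reg=0.01):
    """Rank-1 alternating least squares on observed ratings only.

    ratings: dict mapping (user, item) -> rating.
    Returns (p, q): one latent factor per user and per item, so the
    predicted rating for (u, i) is p[u] * q[i].
    """
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    p = {u: 1.0 for u in by_user}
    q = {i: 1.0 for i in by_item}
    for _ in range(n_iter):
        # Fix q and solve each user's ridge regression in closed form:
        # p[u] = sum(r * q[i]) / (sum(q[i]**2) + reg) over the items u rated.
        for u, obs in by_user.items():
            p[u] = sum(r * q[i] for i, r in obs) / (sum(q[i] ** 2 for i, _ in obs) + reg)
        # Then fix p and solve for each item factor the same way.
        for i, obs in by_item.items():
            q[i] = sum(r * p[u] for u, r in obs) / (sum(p[u] ** 2 for u, _ in obs) + reg)
    return p, q


def rmse(ratings, p, q):
    """Root-mean-square error of p[u] * q[i] against the given ratings."""
    squared = [(r - p[u] * q[i]) ** 2 for (u, i), r in ratings.items()]
    return math.sqrt(sum(squared) / len(squared))
```

For a fair evaluation, the RMSE should be computed on held-out ratings from a train/test split, not on the ratings used for fitting.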
🔗 Dataset:
📂 Code File:
GitHub: Data Analysis, Visualization & ALS Evaluation
Process:
Outcome:
Tools Used:
📊 Key Visualizations:
Duration: Jan 2024 - Present
🔗 Dashboards Repository: GitHub Repository
Objective:
Create interactive, real-time dashboards for monitoring business performance and enabling data-driven decision-making.
Process:
Tools Used:
1. Population Growth: Real-Time Insights by Region and Country Group
Dataset: World Population Prospects 2024
Dashboard 1
Dashboard 2
Outcome:
- The dashboard provides a comprehensive overview of global population trends, showing how population dynamics evolved between 1960 and 2022.
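Population-growth views commonly summarize change between two years as a compound average annual growth rate, r = (P_end / P_start)^(1/n) - 1, where n is the number of years. Here is a minimal sketch. The function name is hypothetical, and the actual 1960 and 2022 population figures would come from the World Population Prospects 2024 dataset, not from this code.

```python
def annual_growth_rate(start_value, end_value, start_year, end_year):
    """Compound average annual growth rate between two years."""
    years = end_year - start_year
    if years <= 0:
        raise ValueError("end_year must be after start_year")
    if start_value <= 0:
        raise ValueError("start_value must be positive")
    return (end_value / start_value) ** (1 / years) - 1
```

A compound rate is used instead of total change divided by years because population growth builds on each year's base.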
2. HIV Data Dashboard: Outcomes by Region and Country Group
🔗 Dataset: World Population Prospects 2024
Outcome:
3. Sales Report Dashboard
🔗 Dataset: World Trend
Outcome:
4. European Energy Transition (2000-2020)
🔗 Dataset: Energy Dataset 2000-2020
5. European Energy Transition
🔗 Dataset: Energy Dataset
Outcome:
Currently, I am working on my dissertation on cryptocurrency: global trends, acceptance around the world, and future price predictions for the top cryptocurrencies based on historical data. The research involves creating visualizations and reports to analyze how cryptocurrencies have evolved and their impact on the global market.
Research Focus:
Part-Time Work Experience:
Alongside my dissertation, I am working part-time with Eagle Cars and Tiger Taxi, helping them generate weekly and monthly reports for better decision-making. This experience allows me to apply my data analysis and reporting skills in real-world scenarios.
I have gained hands-on experience in various data science roles, where I applied my skills to solve real-world business challenges.
Eagle Cars & Tiger Taxis | Oct 2024 - Present | Clitheroe, UK
WebDoc | May 2023 - Dec 2023 | Islamabad, Pakistan
Zones, IT Solutions | Sep 2021 - May 2023 | Islamabad, Pakistan
Here’s my academic background that laid the foundation for my career in data science.
University of Salford, UK
Bahria University, Pakistan
In addition to my professional and academic pursuits, I am actively involved in extracurricular activities.
Clone this repository to explore my projects and codebase:
```bash
git clone https://github.com/mananabbasi
```
You can get in touch with me through the following channels:
📧 Email: mananw25@gmail.com
🔗 LinkedIn: LinkedIn Profile
🐙 GitHub: GitHub Profile