Here’s what I’ve done

About

  • Introduction
  • Work Experience
  • Skills
  • Education
  • Projects
  • Certification

Introduction


I am Youssef Dir, a recent graduate in Business Data Science with a passion for data analysis, machine learning, and web development. I have experience working with Python, SQL, RStudio, Snowflake, and Power BI, and I aim to apply my skills to real-world projects.

Work Experience

Data Scientist - Internship (6 months)

  • Avanci Mvgroup is a consulting company specializing in marketing that helps its customers grow
    their business with data-driven strategies and solutions.

    As a data scientist, I worked on the following projects and tasks:

    Creation and Management of a Simulated Database - SQL

    I generated realistic data with the Faker library (Python) to simulate a dataset tailored for
    business analyses. The data was then loaded into SQL Server, where I managed the relationships
    between tables and optimized the structure for efficient access. Once the data was imported, I ran
    a cleaning process to handle missing values, detect inconsistencies, and prepare the data for
    in-depth exploratory analysis. This ensures data integrity and facilitates its use in analytical
    and decision-making contexts.
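As an illustration, the workflow above can be sketched in a few lines. This is a minimal sketch and not the project's code: it uses Python's standard `random` and `sqlite3` modules as stand-ins for Faker and SQL Server, and the table names, columns, and cleaning rules (`customers`, `orders`, filling regions with `'Unknown'`, dropping negative amounts) are hypothetical choices made for the example.

```python
import random
import sqlite3


def build_simulated_db(n_customers=50, seed=42):
    """Create an in-memory database with simulated customers and orders."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            region      TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount      REAL
        );
        -- Index the foreign key so lookups by customer stay efficient.
        CREATE INDEX idx_orders_customer ON orders(customer_id);
        """
    )
    # None simulates a missing region.
    regions = ["North", "South", "East", "West", None]
    customers = [
        (i, f"Customer {i}", rng.choice(regions))
        for i in range(1, n_customers + 1)
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

    orders = []
    for order_id in range(1, 3 * n_customers + 1):
        roll = rng.random()
        if roll < 0.10:
            amount = None  # missing value
        elif roll < 0.15:
            amount = -round(rng.uniform(1, 100), 2)  # inconsistent value
        else:
            amount = round(rng.uniform(5, 500), 2)
        orders.append((order_id, rng.randint(1, n_customers), amount))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.commit()
    return conn


def quality_report(conn):
    """Count missing regions and missing or negative order amounts."""
    missing_regions = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE region IS NULL"
    ).fetchone()[0]
    bad_amounts = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL OR amount < 0"
    ).fetchone()[0]
    return {"missing_regions": missing_regions, "bad_amounts": bad_amounts}


def clean(conn):
    """Fill missing regions and drop orders with unusable amounts."""
    conn.execute("UPDATE customers SET region = 'Unknown' WHERE region IS NULL")
    conn.execute("DELETE FROM orders WHERE amount IS NULL OR amount < 0")
    conn.commit()


conn = build_simulated_db()
report_before = quality_report(conn)
clean(conn)
report_after = quality_report(conn)
```

Comparing `report_before` with `report_after` gives a simple check that the cleaning step removed every flagged issue before the exploratory analysis starts.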

    Development of Dashboards and Data Visualization - Streamlit

    Integrating visualization tools such as Plotly and Matplotlib allows for
    the effective presentation of insights, including customer segmentation and business
    performance. I created advanced visualizations, such as interactive maps to analyze
    regional audiences, charts for customer segmentation, and performance tracking graphs.
    Additionally, I implemented a dedicated tab for data quality visualization, providing
    a comprehensive view of data integrity so that any issues can be addressed
    promptly for accurate decision-making.

    Predictive Analysis and Advanced Modeling

    I performed statistical analysis and data cleaning, addressing missing values and detecting
    anomalies to ensure high-quality data. I identified key features by evaluating their relative
    importance in predictions and used PCA to detect redundancy between variables, optimizing
    feature selection. To handle imbalanced classes, I implemented a sampling strategy to balance
    the dataset. I then trained Random Forest and XGBoost models to predict customer behaviors and
    characteristics, evaluating their performance using metrics such as accuracy and ROC AUC.
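Two of the steps above, class balancing and ROC AUC evaluation, can be sketched without any external library. This is only an illustration under stated assumptions: the text does not say which sampling strategy was used, so random oversampling of the minority class is shown here as one common option, and the function names `random_oversample` and `roc_auc` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 90 negatives and 10 positives, each row is a one-feature list.
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

# Four hand-written predictions to exercise the metric.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Oversampling should be applied to the training split only, so that duplicated rows never leak into the data used to compute accuracy or ROC AUC.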

  • PROBTP

    Data Analyst - Internship (3 months)


    • PROBTP is a French insurer and the leading professional group in social protection, as well as the 8th largest in
      health and welfare. It serves businesses, craftsmen, employees, and retirees in the construction and building
      sectors. Across health, welfare, insurance, savings, holidays, and retirement, its mission is to support its members
      at every stage of their lives. The company invests in multiple real estate funds in France, which report key financial
      indicators such as realization value, subscription value, and subscription price. The subscription value is the price at
      which investors can purchase shares in the fund; it includes both the value of the assets (realization value) and an
      issuance premium, which covers additional costs such as administrative and acquisition fees. The subscription price is the
      amount an investor pays to acquire one share in the fund. It is determined by the fund's management and can vary depending
      on the number of shares issued and market conditions.
    • Reading of PDF Files - Excel VBA

      Development of a VBA script that automates the reading of PDF files, extracts key information, and
      organizes it into structured tables.

      SCPI Monitoring Tools

      Development of a VBA-based tool to track SCPI (Civil Real Estate Investment Company)
      values, including purchase values, realization values, subscription prices, etc.

    • Skills

      Python

    Streamlit Proficiency: Development and deployment of data-driven interactive applications.
    Data Manipulation: Skilled in Pandas and NumPy for data processing, and in Parquet and SQLAlchemy for SQL database implementation.
    Data Visualization: Crafting advanced visualizations with Plotly and Seaborn.
    Dependency Management: Leveraging Poetry for dependency isolation and management, alongside containerization for application portability.
    Web Scraping: Extracting web data using BeautifulSoup.
    Predictive Analysis: Applying SciPy and Seaborn for modeling and prediction.

    RStudio

    Data Manipulation: Proficiency in Tidyverse packages such as {dplyr}, {purrr}, and {stringr} for efficient data handling.
    Data Visualization: Crafting advanced visualizations using {ggplot2} and {plotly}.
    Web Development: Building modern, production-ready web applications with {shiny}.
    Machine Learning: Predictive modeling with {tidymodels}.
    Exploratory and Factor Analysis: Conducting PCA (Principal Component Analysis), CA (Correspondence Analysis),
    MCA (Multiple Correspondence Analysis), and FAMD (Factor Analysis of Mixed Data).
    Productivity Tools: Leveraging {lintr}, {styler}, and {usethis} for code formatting, linting, and task automation.
    Sentiment Analysis: Implementing sentiment analysis with dedicated packages (e.g., {tidytext}).

    SQL - MySQL

    Table Creation: Proficiency in designing and creating SQL tables to structure data.
    Joins: Expertise in using joins (INNER, LEFT, RIGHT, FULL) to combine and analyze relational data.
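A short illustration of table creation and joins is given below. It is a sketch only: it runs the SQL through Python's built-in `sqlite3` module rather than MySQL, and the `customers` and `orders` tables are hypothetical. Only INNER and LEFT joins are shown, because support for RIGHT and FULL joins depends on the database engine and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Chloe');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 45.5);
    """
)

# INNER JOIN: only customers that have at least one order appear.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# LEFT JOIN: every customer appears, with zero totals when there are no orders.
left_rows = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id), COALESCE(SUM(o.amount), 0)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()
```

The difference between the two results is the customer without orders: the INNER JOIN drops that row, while the LEFT JOIN keeps it with a count of zero.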

    Snowflake - Certification

    Data Warehousing Workshop: Proficiency in databases, warehouses, SQL worksheets, and external stages for
    data management and analysis.
    Data Engineering Workshop: Proficiency in timezones, timestamp formats, CTAS, streams/CDC, tasks, JSON parsing,
    merge statements, window functions, dashboards, Snowpipe, and metadata management.
    Data Application Builders Workshop: Using Streamlit in Snowflake to build interactive data
    applications, leveraging Python and pandas for data manipulation, with an understanding of the basics of variables,
    APIs, API keys, CLIs, and the SnowSQL CLI for efficient development and database operations.
    Collaboration, Marketplace & Cost Estimation Workshop: Strong foundation in SQL basics, intermediate SQL, and
    block scripting/control-of-flow, with expertise in Snowflake-specific features such as COPY INTO, file formats, cost categories,
    deployments on Azure and GCP, and the Snowflake Data Marketplace for data sharing.
    Data Lake Workshop: Proficiency in handling timezones and timestamp formats, CTAS statements, Snowflake
    Streams/CDC, Tasks, and Snowpipe for continuous loading, alongside advanced SQL skills such as parsing JSON with paths
    and casts, SQL MERGE statements, window functions, and storing metadata in columns.

    Tools

    Git - GitLab: Strong command of Git, including version control, branching, merging, committing changes,
    resolving conflicts, and collaborating on repositories.
    Quarto: Creating websites, package documentation, and presentations with dynamic reveal.js slideshows.
    Excel: Proficient in dynamic pivot tables and charts, managing dynamic lists and named ranges,
    and automating graphics and tasks using VBA.
    Power BI: Building dashboards and KPIs to visualize and track performance effectively.



    Projects


    Education


    Master 2 Business Data Science

    Big Data, Data Mining, Machine Learning, Web Scraping,
    Exploratory Data Analysis (Python, RStudio), PCA, MCA, FAMD, Microeconometrics, APIs, Pricing, SQL Data Management.

    Licence 2 Applied Economics

    Econometrics, Statistics, Probability, R, Python,
    Microeconomics, Excel (VBA), HTML - CSS, Macroeconomics, Mathematics.