Reference no: EM133777532
Data Analytics Project
Assessment - Presentation
Purpose - Assess your ability to present your problem, investigations, results and conclusions in an organised, clear and objective way.
Description - Oral presentation.
Research Topic
The working title of the project remains unchanged: "Correlation between Employee Satisfaction in Work and Productivity." The project aims to discover how factors such as job satisfaction, work-life balance, and demographic characteristics contribute to workforce attrition and lost productivity. The title conveys the concept and direction of the research, so there has been no departure from the title originally proposed in the earlier assignment.
Introduction
The scope of the project has been broadened with the inclusion of clustering analysis, which segments employees into different risk groups, adding depth to our initial inquiry.
Employee attrition is a pressing issue that affects workforce stability, productivity, and overall organizational health. High turnover rates can disrupt business operations, increase recruitment and training costs, and negatively impact team morale. Understanding the reasons behind employee turnover is crucial for human resources (HR) departments aiming to retain talent and minimize productivity losses. This challenge has prompted organizations to explore various strategies for improving employee retention, with HR analytics playing an increasingly important role in predicting and mitigating attrition risks.
In this project, we use HR analytics data to quantify and explore factors such as job satisfaction, work-life balance, business travel, and career progression that contribute to employee attrition. By analyzing employee data, we aim to uncover patterns and relationships that explain why employees choose to leave their jobs and how organizations can intervene to prevent turnover. The data-driven approach provides a more objective basis for understanding employee behavior, allowing companies to make informed decisions regarding employee management and policy development.
Following are some of the key objectives proposed at the outset of the project:
Predictors of Attrition: Determine the degree of association between employees' demographic and job-related characteristics and their intent to leave, thereby identifying high-risk groups.
Job Satisfaction Analysis: Investigate how job satisfaction and work-life balance influence employee performance and turnover, and what this means for long-term retention.
Career Growth: Examine how promotions, opportunities for advancement, and tenure affect retention, and whether a lack of growth drives attrition.
Business Travel: Assess the impact of frequent business trips on employee well-being, job satisfaction, and turnover rates.
These objectives remain unchanged, but we have broadened the scope of inquiry by adding clustering techniques that group employees by attrition risk using job satisfaction, distance from home, and other demographic characteristics. Clustering lets us segment employees into meaningful categories and gives deeper insight into the profiles of employees most likely to leave the organization. This added analysis will yield more targeted recommendations for improving employee retention.
Methodology
The project combines descriptive and predictive analytics to explore the relationship between employee satisfaction and attrition. The goal is to explain how factors such as job satisfaction, work-life balance, business travel, and demographic characteristics drive employees to stay with or leave the organization.
Data Source: The analysis uses an employee dataset of 1,470 records and 35 variables, including Age, business travel frequency, job satisfaction, work-life balance, and Attrition. It covers employees' demographics, education, performance ratings, compensation, and job characteristics, giving a well-rounded view of the workforce and supporting the identification of patterns related to attrition.
Descriptive Statistics: The first part of the analysis is exploratory. It reviews the dataset using summary statistics (means, medians, standard deviations) and the distributions of key variables. This provides an overview of the employee population and flags areas that warrant closer study. Visualizations such as histograms, box plots, and bar charts present variables of interest, for instance age, daily rate, job satisfaction, and attrition rates, and let us see meaningful differences across employee groups.
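The descriptive step can be sketched in Pandas as follows. This is a minimal illustration on a handful of made-up records, not the project's data; the column names (Age, DailyRate, JobSatisfaction, Attrition) are assumptions about how the variables are labeled.

```python
import pandas as pd

# Toy records for illustration only; the real dataset has 1,470 rows.
# Column names are assumptions about how the variables are labeled.
df = pd.DataFrame({
    "Age": [29, 41, 35, 52, 24, 38],
    "DailyRate": [800, 1100, 950, 1300, 600, 1000],
    "JobSatisfaction": [1, 4, 3, 4, 2, 3],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Count, mean, standard deviation, and quartiles (the 50% row is the median).
summary = df[["Age", "DailyRate", "JobSatisfaction"]].describe()
print(summary)

# Compare a key variable between employees who left and those who stayed.
by_group = df.groupby("Attrition")["JobSatisfaction"].agg(["mean", "median", "count"])
print(by_group)
```

The grouped table is the numeric counterpart of a box plot split by attrition status.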
Correlation Analysis: To find relationships between variables, we compute correlation matrices. These help identify significant correlations between attrition and factors such as distance from home, job satisfaction, and years with the current manager. For instance, we expect to confirm that lower job satisfaction and greater distance from home are associated with higher attrition rates. The correlation analysis also informs variable selection for predictive modeling, helping us focus on the factors most relevant to employee turnover.
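A correlation matrix of this kind might be computed as in the sketch below. The data are invented for illustration and the column names are assumptions; the one real requirement is that the Yes/No outcome is first encoded as 1/0 so that it can be correlated with the numeric variables.

```python
import pandas as pd

# Toy data for illustration only; column names are assumptions.
df = pd.DataFrame({
    "JobSatisfaction": [1, 4, 3, 4, 2, 3, 1, 4],
    "DistanceFromHome": [25, 3, 8, 2, 18, 10, 22, 5],
    "YearsWithCurrManager": [0, 7, 4, 9, 1, 3, 1, 6],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "Yes", "No"],
})

# Encode the binary outcome as 1 (left) / 0 (stayed).
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Pearson correlation matrix over the numeric columns.
corr = df.drop(columns="Attrition").corr()

# Rank candidate predictors by the absolute size of their correlation with attrition.
ranked = corr["AttritionFlag"].drop("AttritionFlag").sort_values(key=abs, ascending=False)
print(ranked)
```

Ranking by absolute value keeps strongly negative predictors, such as job satisfaction, near the top of the list used for variable selection.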
Predictive Modeling:
To ensure model accuracy and reliability, we will apply cross-validation techniques and split the data into training and testing sets. This approach will help us validate the effectiveness of our models.
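One possible way to set up the split and the cross-validation with Scikit-learn is sketched here. The data are synthetic, and the 80/20 split, the five folds, and the ROC AUC scoring are illustrative assumptions rather than settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data: 300 employees, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
# Leaving is made more likely when the first feature is low.
y = (X[:, 0] + rng.normal(scale=1.0, size=300) < -1.0).astype(int)

# Hold out 20% for final testing; stratify to keep the share of leavers similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold stratified cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print(scores.mean())
```

Stratifying matters here because leavers are a minority class; an unstratified fold could end up with very few of them.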
Logistic Regression: A logistic regression model will be developed to predict the probability of attrition from employee characteristics.
Because the outcome is binary (an employee either stays or leaves), logistic regression is a natural fit: it models the probability of leaving directly. Predictors include age, job satisfaction, years at the company, and business travel frequency, among others. The fitted coefficients quantify the effect of each variable on the odds of attrition and so indicate which variables are important drivers of turnover.
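As a rough sketch of how the effect of each variable can be read off a fitted model, the example below fits a logistic regression to synthetic data and converts the coefficients to odds ratios. The data-generating rule, the column names, and the choice to standardize the predictors are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
# Synthetic predictors; column names are assumptions.
df = pd.DataFrame({
    "Age": rng.integers(20, 60, size=n),
    "JobSatisfaction": rng.integers(1, 5, size=n),   # 1-4 scale
    "YearsAtCompany": rng.integers(0, 20, size=n),
})

# Synthetic outcome: lower satisfaction raises the log-odds of leaving.
logit = -1.0 - 0.8 * (df["JobSatisfaction"].to_numpy() - 2.5)
p = 1.0 / (1.0 + np.exp(-logit))
y = (rng.random(n) < p).astype(int)

# Standardize so coefficients are comparable across predictors.
X = StandardScaler().fit_transform(df)
model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) = multiplicative change in the odds of leaving
# per one standard deviation increase in the predictor.
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=df.columns)
print(odds_ratios)

# Predicted probability of leaving for each employee.
leave_prob = model.predict_proba(X)[:, 1]
```

An odds ratio below 1 means the predictor lowers the odds of leaving; a value near 1 means little effect.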
Decision Trees: In parallel, a decision tree model is being developed to classify employees by their risk of leaving. Decision trees offer an intuitive way of segmenting the workforce by branching on variables such as job satisfaction, work-life balance, and promotions. The model will show human resource managers which combinations of factors lead to higher attrition risk, so that they can focus their retention efforts accordingly.
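The branching idea can be illustrated with a small tree fitted to synthetic data. The rule that generates the outcome, the column names, and the depth and leaf-size limits are assumptions chosen to keep the printed tree readable.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 400
# Synthetic predictors; column names are assumptions.
df = pd.DataFrame({
    "JobSatisfaction": rng.integers(1, 5, size=n),          # 1-4 scale
    "WorkLifeBalance": rng.integers(1, 5, size=n),          # 1-4 scale
    "YearsSinceLastPromotion": rng.integers(0, 10, size=n),
})

# Synthetic rule: low satisfaction AND poor work-life balance means "leaves",
# with 5% of labels flipped at random to mimic noise.
risk = ((df["JobSatisfaction"] <= 2) & (df["WorkLifeBalance"] <= 2)).to_numpy()
noise = rng.random(n) < 0.05
y = (risk ^ noise).astype(int)

# A shallow tree keeps the rules short enough to discuss with HR managers.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
tree.fit(df, y)

# Print the learned branches as nested if/else rules.
rules = export_text(tree, feature_names=list(df.columns))
print(rules)
```

Each path from the root to a leaf is one combination of factors, which is what makes the model easy to translate into retention actions.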
Clustering Analysis: A K-Means clustering algorithm will be used to divide employees into clusters based on characteristics such as job satisfaction, performance, and work-life balance. Comparing attrition across the resulting clusters lets us label them as higher or lower risk, which adds another dimension to the analysis of attrition. Subgroups of the workforce that are more likely to leave can then be identified and targeted with appropriate retention interventions.
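A minimal K-Means sketch follows. It uses two well-separated synthetic groups, and both the number of clusters and the decision to standardize the features are assumptions for illustration. Note that K-Means only assigns cluster labels; whether a cluster is high or low risk has to be established afterwards by comparing attrition across clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Two synthetic employee groups; columns stand for job satisfaction,
# performance rating, and work-life balance (assumed features).
low = rng.normal(loc=[1.5, 3.0, 1.8], scale=0.3, size=(50, 3))
high = rng.normal(loc=[3.6, 3.2, 3.4], scale=0.3, size=(50, 3))
X = np.vstack([low, high])

# K-Means is distance based, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Cluster sizes.
print(np.bincount(labels))
```

On real data the number of clusters would need to be chosen, for example by comparing inertia or silhouette scores across several values of k.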
Software & Tools: The analysis is done in Python because of its libraries for data manipulation (Pandas), machine learning (Scikit-learn), and data visualization (Matplotlib/Seaborn). These tools bring flexibility and efficiency for handling large datasets, performing advanced statistical analyses, and creating insightful visualizations. Python's extensive ecosystem also allows the preprocessing, modeling, and evaluation steps to be integrated in a single workflow, which makes it well suited to this project.
Progress Report:
Good progress has been made toward the aims of the project. The analysis is well underway, with substantial work completed in data exploration, correlation analysis, predictive modeling, and clustering. Below are the key areas where progress was made:
The preliminary correlation analysis shows statistically significant relationships; for example, the p-value for job satisfaction is below 0.05, indicating that its association with attrition is unlikely to be due to chance.
Data Exploration: Preliminary exploratory data analysis (EDA) has been completed on the dataset of 1,470 employees and 35 variables. Summary statistics, distributions, and visualizations were produced for key variables such as age, job satisfaction, and work-life balance.
Attrition Rate: Approximately 16% of employees in the dataset have left the company, which makes it important to understand the trends behind attrition.
Business Travel: Preliminary results indicate that employees who travel frequently for work are more likely to leave the company. Their attrition rate is markedly higher than that of employees who travel rarely or not at all.
Work-Life Balance: Employees who score lower on work-life balance leave the company more often. On a scale of 1-4, the average work-life balance score is 2.5 for those who left and 3.1 for those who stayed.
Age: Attrition rates are generally higher among employees below 35 years of age, which may indicate lower satisfaction or job security in the younger population.
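The figures reported above are all group-wise rates or means, which can be obtained along the following lines. The ten records are invented for illustration and do not reproduce the project's numbers; the column names and the travel category labels are assumptions.

```python
import pandas as pd

# Toy records for illustration only; labels and column names are assumptions.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "No", "No", "No"],
    "BusinessTravel": [
        "Travel_Frequently", "Travel_Rarely", "Non-Travel", "Travel_Rarely",
        "Travel_Frequently", "Travel_Frequently", "Travel_Rarely", "Non-Travel",
        "Travel_Rarely", "Travel_Rarely",
    ],
    "WorkLifeBalance": [2, 3, 4, 3, 1, 3, 3, 4, 2, 3],
    "Age": [28, 45, 39, 51, 31, 33, 42, 36, 48, 30],
})

# True where the employee left; the mean of a boolean column is a rate.
left = df["Attrition"].eq("Yes")

overall_rate = left.mean()
rate_by_travel = left.groupby(df["BusinessTravel"]).mean()
wlb_by_status = df.groupby("Attrition")["WorkLifeBalance"].mean()
rate_by_age = left.groupby(df["Age"] < 35).mean()  # index: False / True

print(overall_rate, rate_by_travel, wlb_by_status, rate_by_age, sep="\n\n")
```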
Correlation Analysis: Preliminary correlation analysis was performed to study the relationship of various employee characteristics with attrition. The results point to the key drivers of turnover.
Negative Correlations: Job satisfaction (-0.27), proximity to the workplace (-0.16; equivalently, greater distance from home goes with higher attrition), and years with the current manager are negatively correlated with attrition. The higher these values, the lower the chance of an employee quitting. For instance, employees who live closer to the workplace or who have stayed longer with their managers are less likely to quit.
Positive Correlations: Frequent business travel (0.19) and lower work-life balance scores (0.24) are positively correlated with attrition, meaning that employees who experience these conditions are more likely to quit. For instance, business travel may be a key driver of employee dissatisfaction, especially among employees who have families or other personal commitments.
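A significance check for a correlation between a binary outcome and a numeric score could look like the sketch below, which uses SciPy's point-biserial correlation. The twelve records are invented, and whether the project used this particular test is an assumption. The example also separates the two quantities involved: the coefficient r measures the strength of the association, while the p-value only indicates how unlikely such a value would be if there were no association.

```python
from scipy.stats import pointbiserialr

# Toy data for illustration only: 1 = left, 0 = stayed.
attrition = [1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
job_sat = [1, 4, 3, 4, 2, 3, 1, 4, 3, 2, 2, 4]   # 1-4 scale

# Point-biserial correlation is Pearson's r with one binary variable.
r, p_value = pointbiserialr(attrition, job_sat)

print(f"r = {r:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Association is statistically significant at the 5% level.")
```

With a sample as large as 1,470 employees, even a modest coefficient can have a very small p-value, so both numbers are worth reporting.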
Predictive Modeling:
Logistic Regression: A logistic regression model is being developed to predict attrition from predictors such as age, job satisfaction, distance from home, and business travel. Initial testing indicates that job satisfaction and distance from home are two of the strongest predictors of whether an employee will leave. Employees with a job satisfaction score below 2 are more than twice as likely to leave as other employees.
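A statement such as "more than twice as likely to leave" is a relative risk, that is, a ratio of two attrition rates. The sketch below works through the arithmetic with hypothetical counts (not the project's results) and contrasts it with the odds ratio, which is the quantity a logistic regression coefficient actually yields. Both helper functions are illustrative names, not part of any library.

```python
def relative_risk(left_a, total_a, left_b, total_b):
    """Attrition rate in group A divided by the attrition rate in group B."""
    return (left_a / total_a) / (left_b / total_b)


def odds_ratio(left_a, total_a, left_b, total_b):
    """Odds of leaving in group A divided by the odds of leaving in group B."""
    odds_a = left_a / (total_a - left_a)
    odds_b = left_b / (total_b - left_b)
    return odds_a / odds_b


# Hypothetical counts: group A = low job satisfaction, group B = everyone else.
rr = relative_risk(left_a=30, total_a=100, left_b=40, total_b=400)
or_ = odds_ratio(left_a=30, total_a=100, left_b=40, total_b=400)

print(round(rr, 2))   # 30% vs 10% attrition
print(round(or_, 2))  # odds of 30:70 vs 40:360
```

The two measures diverge as attrition becomes more common, so it helps to state which one a reported multiple refers to.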
Decision Trees: The decision tree model, currently in training, classifies employees into different levels of attrition risk. It highlights key decision points, such as years with the current manager and work-life balance, as major factors in the predicted risk. Initial testing yields a high accuracy rate, with over 80% of employees who are likely to leave classified correctly. The tree is also useful for highlighting target segments, such as frequent travelers and employees with low job satisfaction.
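Because only about 16% of employees left, a model that predicts "stays" for everyone would already be right about 84% of the time, so accuracy is best read next to recall and precision for the leaver class. The sketch below illustrates this with hypothetical predictions for 25 test employees; none of the numbers come from the project's models.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical test set: 1 = left, 0 = stayed (4 of 25 leave, i.e. 16%).
y_true = [1] * 4 + [0] * 21

# Trivial baseline: predict that nobody leaves.
y_baseline = [0] * 25

# Hypothetical model: catches 3 of the 4 leavers and raises 2 false alarms.
y_model = [1, 1, 1, 0] + [1, 1] + [0] * 19

baseline_acc = accuracy_score(y_true, y_baseline)
baseline_recall = recall_score(y_true, y_baseline)
model_acc = accuracy_score(y_true, y_model)
model_recall = recall_score(y_true, y_model)        # share of leavers caught
model_precision = precision_score(y_true, y_model)  # share of flags that are right

# Rows are true classes (stayed, left); columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_model).ravel()

print(baseline_acc, baseline_recall)
print(model_acc, model_recall, model_precision)
```

Here the baseline reaches 84% accuracy while identifying no leavers at all, which is why recall on the leaver class is the more informative figure for retention work.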
Clustering Analysis: K-Means clustering was used to segment employees with similar characteristics. Several meaningful clusters emerged from the analysis; some of them are described below.
Cluster 1: Employees with low job satisfaction and a high frequency of business travel. This cluster carries the highest flight risk: over 40% of its employees have left the company.
Cluster 2: Employees with very high job satisfaction and long tenure with their current manager. This cluster has the lowest attrition risk, accounting for less than 5% of all leavers.
Cluster 3: Younger employees (below 30 years) with average job satisfaction and constrained career growth. This cluster carries a moderate attrition risk, since younger employees may look elsewhere for better development opportunities.
These clusters give HR concrete action items for targeting retention strategies at high-risk groups, such as frequent travelers or employees with low satisfaction scores.
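The cluster descriptions above use two different measures: the attrition rate within a cluster and a cluster's share of all leavers. A sketch of how both could be tabulated once each employee has a cluster label is given below; the assignments and outcomes are hypothetical and the column names are assumptions.

```python
import pandas as pd

# Hypothetical cluster labels and outcomes for 15 employees.
df = pd.DataFrame({
    "Cluster": [1] * 5 + [2] * 5 + [3] * 5,
    "Attrition": ["Yes", "Yes", "No", "No", "No"]
    + ["No"] * 5
    + ["Yes", "No", "No", "No", "No"],
})

left = df["Attrition"].eq("Yes")

profile = pd.DataFrame({
    # Number of employees in each cluster.
    "size": df.groupby("Cluster").size(),
    # Share of the cluster's own employees who left.
    "attrition_rate": left.groupby(df["Cluster"]).mean(),
    # Share of all leavers who belong to the cluster.
    "share_of_leavers": left.groupby(df["Cluster"]).sum() / left.sum(),
})
print(profile)
```

A small cluster can have a high attrition rate yet contribute few leavers overall, so reporting both columns avoids mixing the two readings.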
Visualization: To help communicate these findings, the following visualizations have been developed:
Scatter plots: These show the relationship between variables, for example distance from home and attrition; in other words, the farther an employee lives from work, the higher the tendency toward attrition.
Heatmap: Shows the relationships among job satisfaction, time with the current manager, and attrition, making it easy to see at a glance which relationship is the strongest.
Bar charts: Present comparative attrition rates across departments, identifying Sales as the highest and Research & Development as the lowest.
These visualizations have been instrumental in highlighting trends and communicating key findings in an easy-to-understand manner.
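As an example of how one of these charts might be produced, the sketch below draws attrition rate by department with Matplotlib. The twelve records are hypothetical, the column names are assumptions, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical records for illustration only.
df = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["Human Resources"] * 3 + ["Research & Development"] * 5,
    "Attrition": ["Yes", "Yes", "No", "No"]
    + ["Yes", "No", "No"]
    + ["Yes", "No", "No", "No", "No"],
})

# Attrition rate per department, highest first.
rates = (
    df["Attrition"].eq("Yes")
    .groupby(df["Department"])
    .mean()
    .sort_values(ascending=False)
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(rates.index, rates.values * 100)
ax.set_ylabel("Attrition rate (%)")
ax.set_title("Attrition by department (toy data)")
fig.tight_layout()
fig.savefig("attrition_by_department.png", dpi=150)
```

Sorting the bars by rate makes the highest and lowest departments readable at a glance during the presentation.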
Literature Review Updates
More literature has been reviewed since Assignment 1, especially on the impact of business travel and job satisfaction on employee attrition. It has been found that employees with increased travel frequencies are more likely to burn out and hence tend to have higher turnover rates. The literature also discusses the quality of the relationship between managers and employees as a factor in staff retention. This agrees with our finding that longer tenure with the current manager is negatively related to attrition.
Conclusion
The project is on schedule and progressing well; initial analyses have confirmed some of the expected relationships among job satisfaction, work-life balance, and employee attrition. Over the next couple of weeks, we will finish the predictive models, refine the clustering analysis, and prepare the final project submission.