Please refer to the Project video for complete context about the “Advanced Statistics” problem.
The objective of the project is to use the dataset ‘Factor-Hair-Revised.csv’ to build an optimal regression model to predict satisfaction. You are expected to:
- Perform exploratory data analysis on the dataset. Include relevant charts and graphs. Check for outliers and missing values. (8 marks)
- Is there evidence of multicollinearity? Show your analysis. (6 marks)
- Perform simple linear regression for the dependent variable with every independent variable (6 marks)
- Perform PCA/Factor analysis by extracting 4 factors. Interpret the output and name the Factors (20 marks)
- Perform multiple linear regression with customer satisfaction as the dependent variable and the four factors as independent variables. Comment on the model output and validity. Your remarks should make the results meaningful for everybody, including non-technical readers. (20 marks)
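The first three tasks above (EDA checks, multicollinearity, and one simple regression per predictor) could be sketched in base R roughly as follows. This is only an illustrative outline on simulated data: the column names `x1` to `x4` and `satisfaction` are hypothetical stand-ins, not the actual columns of ‘Factor-Hair-Revised.csv’, and the variance inflation factor is just one of several reasonable multicollinearity diagnostics.

```r
# Simulated stand-in data. Column names are hypothetical and do NOT come
# from Factor-Hair-Revised.csv; replace with read.csv() on the real file.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
x3 <- rnorm(n)
x4 <- rnorm(n)
satisfaction <- 5 + 0.8 * x1 + 0.5 * x3 + rnorm(n, sd = 0.5)
dat <- data.frame(x1, x2, x3, x4, satisfaction)

# EDA: basic summary, missing values, and boxplot-rule outliers per column
print(summary(dat))
missing_counts <- colSums(is.na(dat))
outlier_counts <- sapply(dat, function(v) length(boxplot.stats(v)$out))

# Multicollinearity: correlation matrix of the predictors and variance
# inflation factors (the diagonal of the inverse correlation matrix)
predictors <- setdiff(names(dat), "satisfaction")
cor_mat <- cor(dat[, predictors])
vif <- diag(solve(cor_mat))
print(round(cor_mat, 2))
print(round(vif, 2))

# Simple linear regression of satisfaction on each predictor in turn
slr <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "satisfaction"), data = dat)
})
names(slr) <- predictors
slr_r2 <- sapply(slr, function(m) summary(m)$r.squared)
print(round(slr_r2, 3))
```

A VIF well above the others (here expected for `x1` and `x2`, which are correlated by construction) is the kind of evidence the multicollinearity question asks for; charts such as `boxplot(dat)` or `pairs(dat)` would accompany these numbers in the report.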
Please note the following:
- You have to submit two files:
  - Business Report: Submit the answers to all the questions in sequential order. Your answers should include detailed explanations and inferences for every question. The report should not be filled with code. You will be evaluated based on the business report. It should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs and tables.
  - R code file: This is mandatory and will be used for reference during evaluation.
- You must cite the sources of any data presented. Do not cite blogs, Wikipedia, etc.
- Any assignment found to be copied or plagiarized from other group(s) will not be graded and will be marked zero.
- Please ensure timely submission, as assignments submitted after the deadline will not be accepted.
Scoring guide (Rubric): Project 2 Rubric
Criteria - Points
1.1 EDA – Basic data summary, univariate and bivariate analysis, graphs - 4
1.2 EDA – Check for outliers and missing values and check the summary of the dataset - 4
2. Check for multicollinearity – Plot the graph based on multicollinearity - 6
3. Simple linear regression (with every variable) - 6
4.1 Perform PCA/FA and interpret the eigenvalues (apply Kaiser Normalization Rule) - 10
4.2 Output interpretation – Explain why only 4 factors are asked for in the question and whether choosing 4 factors is correct. Name the factors with correct explanations. - 10
5.1 Create a data frame with a minimum of 5 columns, 4 of which are the different factors and the 5th of which is Customer Satisfaction - 3
5.2 Perform multiple linear regression with Customer Satisfaction as the dependent variable and the four factors as independent variables - 4
5.3 MLR summary interpretation and significance (R, R², adjusted R², degrees of freedom, F-statistic, coefficients along with p-values) - 8
5.4 Output interpretation (making it meaningful for everybody) - 5
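As a rough illustration of rubric items 4.1 to 5.3, the sketch below runs the steps on simulated data in base R. Several things here are assumptions rather than part of the brief: the data and the names `item1` to `item8`, `factor1` to `factor4` are hypothetical; the Kaiser rule is read as "retain components with eigenvalue greater than 1"; and unrotated principal component scores from `prcomp` stand in for the four factors, whereas a rotated factor solution may be what the course expects.

```r
# Simulated data: 8 observed items driven by 4 latent factors.
# All names are hypothetical and not taken from Factor-Hair-Revised.csv.
set.seed(1)
n <- 100
latent <- matrix(rnorm(n * 4), ncol = 4)
items  <- latent[, rep(1:4, each = 2)] + matrix(rnorm(n * 8, sd = 0.5), ncol = 8)
colnames(items) <- paste0("item", 1:8)
satisfaction <- 5 + 0.7 * latent[, 1] + 0.4 * latent[, 2] + rnorm(n, sd = 0.5)

# 4.1: eigenvalues of the correlation matrix and the eigenvalue > 1 rule
eigenvalues <- eigen(cor(items))$values
n_keep <- sum(eigenvalues > 1)
print(round(eigenvalues, 3))
print(n_keep)

# PCA on standardized items; keep the first four component scores
pca <- prcomp(items, scale. = TRUE)
factors <- as.data.frame(pca$x[, 1:4])
names(factors) <- paste0("factor", 1:4)

# 5.1: data frame with four factor columns plus the response
model_df <- cbind(factors, satisfaction = satisfaction)

# 5.2: multiple linear regression on the four factors
fit <- lm(satisfaction ~ ., data = model_df)
s   <- summary(fit)

# 5.3: quantities the rubric asks to be interpreted
multiple_r <- sqrt(s$r.squared)
f_stat     <- s$fstatistic                  # value, numdf, dendf
f_p_value  <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
                 lower.tail = FALSE)
print(c(R = multiple_r, R2 = s$r.squared, adjR2 = s$adj.r.squared))
print(f_stat)
print(f_p_value)
print(coef(s))                              # estimates, std. errors, t, p-values
```

Comparing `n_keep` with the four factors requested is one way to address item 4.2, and the loadings in `pca$rotation` are what naming the factors would be based on.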