BUA 6315: Business Analytics for Decision Making
Final Project Data Mining Handout: Dataset 1

If you are using the college admission data, follow the instructions below to complete the data mining prompts in Sections 3 and 4 of the final project. Use the entire dataset, which has 17,339 data points. Transform College GPA into a Transfer dummy using the following function: =IF(College_GPA="", 0, 1).

To complete the data mining prompts in Sections 3 and 4 of your final project, you must subset the data to the following two colleges, Business & Economics and Mathematics & Science, and apply the data mining techniques that you learned in Chapter 9, Chapter 10, and Chapter 11 of your textbook.

Part I: Supervised Data Mining

This part will help you prepare your data for the prompts that involve supervised data mining in Sections 3 and 4 of your final project submission. For more information about what must be included in your final report, see the Final Project document, available in Blackboard.

Step 1: Methodology

Choose either the KNN algorithm or the Decision Tree model, based on the insights you want to gain from the data. You will need to be able to explain the motivation for using the model you have chosen in Section 3 of your final project submission.

Step 2: Analysis and Results

Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected in Step 1, in order to address the prompts related to data mining in Section 4 of your final project submission.

KNN Algorithm

Note that you will perform the KNN algorithm for each college.

1. For each college, perform KNN analysis on the data set to predict whether an applicant will eventually decide to enroll at the college, using predictor variables such as gender, race, SAT/ACT, HSGPA, and parent’s education level.
   Note: You need to transform all categorical variables to numerical variables by creating dummy variables, if there are any. For example, gender needs to be transformed.
2. For each college, partition the data with 50% for the training set, 30% for the validation set, and 20% for the test set.
3. For each college, report the accuracy, specificity, sensitivity, and precision rates for the test data set in a table. Interpret your results.
4. For each college, inspect the performance charts and report the area under the ROC curve (AUC). Comment on the performance of the KNN classification model based on AUC.

Decision Tree

Note that you will construct a decision tree model for each college.

1. For each college, create a classification tree model to predict which college is most likely to accept a given university applicant based on the applicant’s gender, race, high school GPA, SAT/ACT score, and parent’s education level.
   Note: You need to transform all categorical variables to numerical variables by creating dummy variables, if there are any. For example, gender needs to be transformed.
2. For each college, partition the data with 50% for the training set, 30% for the validation set, and 20% for the test set.
3. For each college, report the accuracy, specificity, sensitivity, and precision rates for the test data set in a table. Interpret your results.
4. For each college, inspect the performance charts and report the area under the ROC curve (AUC). Comment on the performance of the classification tree based on AUC.

Part II: Unsupervised Data Mining

This part will help you prepare your data for the prompts that involve unsupervised data mining in Sections 3 and 4 of your final project submission. For more information about what must be included in your final report, see the Final Project document, available in Blackboard.

Step 1: Methodology

Choose either the Hierarchical or the K-Means clustering model, based on the insights you want to gain from the data. You will need to be able to explain the motivation for using the model you have chosen in Section 3 of your final project submission.

Step 2: Analysis and Results

Choose ONE of the following sets of instructions to prepare your data, depending on the model you selected in Step 1, in order to address the prompts related to data mining in Section 4 of your final project submission.

Hierarchical Clustering

Note that you will perform clustering for each college.

1. Perform agglomerative hierarchical clustering to group college applicants who are admitted and enrolled in the Business & Economics college according to numerical values such as high school GPA, SAT/ACT score, college GPA, and parents’ education. You need to subset the data first to include only applicants in the Business & Economics college who are both admitted and enrolled.
2. Use the Euclidean distance and average linkage clustering to cluster the data into three clusters.
3. Do you need to standardize the data? Explain your reasoning.
4. Describe each cluster and write a report based on the clustering results. Here you can take the average of the numerical variables and summarize your results in a table, as explained in the video “Using Analytic Solver to Perform Agglomerative Clustering”.
5. Include a table that summarizes your results for each cluster. You can find a sample summary table below.

K-Means Clustering

Note that you will perform clustering for each college.

1. Perform k-means clustering to group college applicants who are admitted and enrolled in the Business & Economics college according to numerical values such as high school GPA, SAT/ACT score, college GPA, and parents’ education, using k=3. You need to subset the data first to include only applicants in the Business & Economics college who are both admitted and enrolled.
2. Do you need to standardize the data? Explain your reasoning.
3. Describe each cluster and write a report based on the clustering results. Here you can take the average of the numerical variables and summarize your results in a table, as explained in the video “Using Analytic Solver to Perform Agglomerative Clustering”.
4. Include a table that summarizes your results for each cluster. You can find a sample summary table below.

Sample summary table:

Clusters    HSGPA    SAT/ACT  College GPA  Mother’s Education  Father’s Education
Cluster 1   Average  Average  Average      Average             Average
Cluster 2   Average  Average  Average      Average             Average
Cluster 3   Average  Average  Average      Average             Average
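
To help you think about the standardization question in the clustering steps, the sketch below shows how Euclidean distance behaves before and after converting each variable to z-scores, and how per-cluster averages for the summary table can be computed. It is written in Python rather than Analytic Solver and is only an illustration: the four applicants, their values, and the cluster labels are made up, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical applicants as (HSGPA, SAT score); values are invented for illustration.
applicants = [
    (3.9, 1100),
    (2.1, 1110),
    (3.8, 1160),
    (2.0, 1300),
]

def euclidean(a, b):
    """Straight-line distance between two applicants."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Convert each column to z-scores (mean 0, population standard deviation 1)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def cluster_averages(rows, labels):
    """Average every variable within each cluster, as in the sample summary table."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: tuple(round(mean(col), 2) for col in zip(*members))
            for label, members in sorted(groups.items())}

z = standardize(applicants)

# On the raw scale, SAT points dominate: applicant 0 looks closer to applicant 1
# (SAT differs by 10) than to applicant 2, despite a 1.8-point GPA gap.
raw_01 = euclidean(applicants[0], applicants[1])
raw_02 = euclidean(applicants[0], applicants[2])

# After standardizing, both variables contribute on a comparable scale,
# and applicant 2 becomes the nearer neighbor of applicant 0.
z_01 = euclidean(z[0], z[1])
z_02 = euclidean(z[0], z[2])

# Assumed cluster labels, as if they came from a clustering run.
averages = cluster_averages(applicants, [1, 2, 1, 2])
```

In this made-up data the nearest neighbor of the first applicant changes once the variables are standardized, which is the kind of evidence you can use when explaining your reasoning. Report cluster averages on the original scale so that the table is easy to interpret.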
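
For the supervised steps in Part I, the following sketch mirrors the Transfer dummy formula, the 50/30/20 partition, and the four rates you are asked to report for the test set. It is a Python illustration under stated assumptions, not the Analytic Solver procedure: the function names, the random seed, and the example actual and predicted labels are all hypothetical.

```python
import random

def transfer_dummy(college_gpa):
    """Mirror of =IF(College_GPA="", 0, 1): 0 for a blank GPA, otherwise 1."""
    return 0 if college_gpa is None or college_gpa == "" else 1

def partition(rows, seed=1):
    """Shuffle the rows, then split them 50% training, 30% validation, 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # the seed value is an arbitrary choice
    n_train = len(shuffled) * 5 // 10
    n_valid = len(shuffled) * 3 // 10
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return training, validation, test

def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def classification_rates(actual, predicted):
    """Compute the four rates from 0/1 labels, where 1 is the target class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # share of actual 1s predicted as 1
        "specificity": ratio(tn, tn + fp),   # share of actual 0s predicted as 0
        "precision": ratio(tp, tp + fp),     # share of predicted 1s that are actual 1s
    }

# Hypothetical example: 100 row identifiers and eight invented test-set outcomes.
training, validation, test = partition(range(100))
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
rates = classification_rates(actual, predicted)
```

With these invented labels there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so accuracy is 6/8, sensitivity and precision are both 2/3, and specificity is 4/5. Your own table should use the test-set results from your chosen model for each college.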