Module Code: 55-600268
Assignment Title: Assignment 1
Individual/Group: Group

1.1 Learning Outcomes

This assignment assesses your ability to:

1.2 Assessment Criteria

This module will be assessed via a case study, which involves analysing the data set described below. The first assignment contributes 40% of the final mark and is a 10-minute group presentation that should address the questions outlined below. The marks are indicated next to each question.

1.3 Submission Details

One group member is required to submit the PowerPoint presentation through the “001 Group Presentation - Submission Point” on Blackboard by the given deadline.

1.4 Presentation Details

Although you should answer the questions below, bear in mind that in real life this scenario would be presented to the bank manager. You should therefore provide suitable output on your slides and be prepared to interpret this output in your presentation. You should have an introductory slide, a slide with some specific conclusions and another with applications to the bank. Remember that the bank manager is unlikely to have any knowledge of data mining or statistics and will not understand or be interested in SAS code or Enterprise Miner settings. You are therefore required to interpret the outputs so that the bank manager can understand them. The presentation should last no more than 10 minutes and each group member is expected to present. The presentation will be stopped at 10 minutes, followed by approximately 3 minutes of questions.

1.6 Problem Outline

For this assignment you are required to analyse a data set concerning financial transactions and details for customers at a Czech bank.

1.7 Data Provided

The final query is saved as a SAS dataset for use in Enterprise Miner. It is called czechbk15.sas7bdat. It is available on the SHU server in the path:

E:SHUUsers!SharedDataRichABI2223

You will need to create a library to access the data.

1.8 Details of the Query and Resulting Data

In this assignment you will investigate whether there are any groups of accounts with similar properties. You will also build a model to predict which accounts have a second account holder attached. For this purpose a subset of variables has been selected from the final combination of tables for each account. For each account, these variables represent the credits and the different types of withdrawals that take place:

Withdrawal (taking money out): there are two separate variables for each of the following methods of withdrawing money:

For each of these types of payments, the number of payments (variable name ending in –n) and the value of payments (variable name ending in –t) have been recorded for a period of five years.

Finally, additional information is held about each account:

Account ID, age of the primary account holder, whether they have a credit card with this bank, number of days the account has been open, whether they have a loan, whether there is a second user of the account, and the gender of the main account holder (sex). There is one nominal variable: the frequency of their bank statements, which is monthly, weekly or after each transaction. Make sure you fully understand what these variables represent. For a full list, see Appendix 1.

For this assignment we will be using only the following variables in the data set. While you are working on the assignment, set all the other variables to rejected so that you do not have to keep changing them.

1.9 Analysis Required

The cluster analysis (which we will carry out in the next assignment) requires fields that are as symmetrical as possible, so you should first investigate each of the interval fields in the data.

a) Explain which transformations the software has actually picked. Were any of the interval variables not transformed? If so, why do you think this is? (Hint: you may wish to include the SAS transformations table as a screenshot on your slides.) (2 marks)

b) Produce further plots of the transformed variables and use these to present evidence of whether the transformations have been successful. (Refer to the lower branch of Figure 1.0 for guidance on the Enterprise Miner stream you need for this.) Comment clearly on your results. (Please note that it may not be possible to make all variables totally symmetrical.) Then state, for each interval variable, whether subsequent analysis should use the original (untransformed) variable or the new transformed variable, and list which set of interval variables you would use for clustering. (Hint: you may wish to show the plots of the original and transformed variables side by side in your slides.) (8 marks)
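As a rough illustration of what "more symmetrical" means in part b), the sketch below computes a simple moment-based skewness before and after a log transformation. It is written in Python rather than SAS, the data values are invented, and the choice of log(x + 1) is an assumption made for the example. It is not necessarily the transformation Enterprise Miner will select for any variable.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetrical sample,
    positive when the right-hand tail is longer."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed amounts (NOT taken from czechbk15).
example_values = [10, 20, 25, 30, 40, 55, 80, 150, 400, 2500]

# Assumed transformation for illustration: log(x + 1), which is defined at x = 0.
logged_values = [math.log(v + 1) for v in example_values]

raw_skew = skewness(example_values)
log_skew = skewness(logged_values)

print(f"skewness before transformation: {raw_skew:.2f}")
print(f"skewness after log(x + 1):      {log_skew:.2f}")
```

A skewness closer to zero after the transformation supports using the transformed variable. The plots remain the main evidence to present to a non-technical audience.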

The bank would also like to understand the characteristics of customers who have chosen to have a second account.

Now run the Decision Tree node.

a) Using the tree diagram, fully interpret the derived tree and discuss the Fit Statistics. (Hint: include a screenshot of your Tree and Fit Statistics in your slides.) (6 marks)

b) If you have a customer who has the following characteristics, would they be likely to have a second account? Discuss the results in terms of practicality to the bank. (Hint: you might wish to show the path followed through the decision tree in your slides.)

Age = ., creditn = 0.01, creditt = 200, stmentn = 0.02, stmentt = 10, card = y, cardwdn = 0, cardwdt = 0, insuren = 0, insuret = 0, overdtn = 0.42, overdtt = 600, days = 800, frequency = monthly, householdn = 0, householdt = 0, othbwdn = 1000, othbwdt = 500, loanpayn = 6000, loanpayt = 98894, sex = M, cashwdn = 0, cashwdt = 0 (3 marks)

c) If the bank wished to use this model to look at the important factors that influence a customer’s decision to have a second account, what reservations might you have? (3 marks)
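To show what "the path followed through the decision tree" means in part b), here is a minimal Python sketch that walks one record through a toy tree. Everything in it is hypothetical: the split variables, thresholds, leaf labels and the rule for sending missing values down a nominated branch are invented for illustration. They are not the tree, or the missing-value handling, that Enterprise Miner will produce for this data.

```python
# Hypothetical toy tree: splits, thresholds and leaf labels are invented
# and do NOT come from the czechbk15 analysis.
TOY_TREE = {
    "variable": "days", "threshold": 1000, "missing_goes": "low",
    "low": {
        "variable": "age", "threshold": 40, "missing_goes": "high",
        "low": {"leaf": "no second holder"},
        "high": {"leaf": "second holder"},
    },
    "high": {"leaf": "no second holder"},
}


def trace(tree, record):
    """Follow one record from the root to a leaf, recording each branch taken."""
    path = []
    node = tree
    while "leaf" not in node:
        value = record.get(node["variable"])
        if value is None:
            # Assumed rule: missing values follow the node's nominated branch.
            branch = node["missing_goes"]
        elif value < node["threshold"]:
            branch = "low"
        else:
            branch = "high"
        path.append(f"{node['variable']} -> {branch}")
        node = node[branch]
    return path, node["leaf"]


# A few fields from the example customer; age is missing (shown as '.' in SAS).
customer = {"age": None, "days": 800, "card": "y", "sex": "M"}

path, prediction = trace(TOY_TREE, customer)
print(" | ".join(path))
print("toy prediction:", prediction)
```

On your slides, the equivalent is to highlight each split the customer passes through in your own tree, noting in particular which branch a missing Age follows.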

2.0 Presentation

You should put all of your findings in a PowerPoint presentation. The slides should look clear, neat and professional and contain the correct information. The group will deliver the presentation in a 10-15 minute slot, where each group member is expected to present. The presentation should last no more than 10 minutes and there will be 3 minutes at the end for questions. Further marks will be allocated for your group’s response to these questions. (5 marks)

Total Marks available: 40 marks

