formatpoints

Format scorecard points and scaling

Description

sc = formatpoints(sc,Name,Value) modifies the scorecard points and scaling using optional name-value pair arguments. For example, use optional name-value pair arguments to change the scaling of the scores or the rounding of the points.

Examples

This example shows how to use formatpoints to scale by providing the points, odds levels, and PDO (points to double the odds). By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points by the formatpoints function.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    _________
    {'CustAge'   }    {'[-Inf,33)'   }     -0.15894
    {'CustAge'   }    {'[33,37)'     }     -0.14036
    {'CustAge'   }    {'[37,40)'     }    -0.060323
    {'CustAge'   }    {'[40,46)'     }     0.046408
    {'CustAge'   }    {'[46,48)'     }      0.21445
    {'CustAge'   }    {'[48,58)'     }      0.23039
    {'CustAge'   }    {'[58,Inf]'    }        0.479
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.031252
    {'ResStatus' }    {'Home Owner'  }      0.12696
    {'ResStatus' }    {'Other'       }      0.37641
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.076317
    {'EmpStatus' }    {'Employed'    }      0.31449
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.45716
      ⋮
MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the points, odds levels, and PDO (points to double the odds). Suppose that you want a score of 500 points to have odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points would have odds of 4).

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    ______
    {'CustAge'   }    {'[-Inf,33)'   }    52.821
    {'CustAge'   }    {'[33,37)'     }    54.161
    {'CustAge'   }    {'[37,40)'     }    59.934
    {'CustAge'   }    {'[40,46)'     }    67.633
    {'CustAge'   }    {'[46,48)'     }    79.755
    {'CustAge'   }    {'[48,58)'     }    80.905
    {'CustAge'   }    {'[58,Inf]'    }    98.838
    {'CustAge'   }    {'<missing>'   }       NaN
    {'ResStatus' }    {'Tenant'      }    62.031
    {'ResStatus' }    {'Home Owner'  }    73.444
    {'ResStatus' }    {'Other'       }    91.438
    {'ResStatus' }    {'<missing>'   }       NaN
    {'EmpStatus' }    {'Unknown'     }    58.781
    {'EmpStatus' }    {'Employed'    }    86.971
    {'EmpStatus' }    {'<missing>'   }       NaN
    {'CustIncome'}    {'[-Inf,29000)'}    31.309
      ⋮
MinScore = 355.5051
MaxScore = 671.6403
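The linear transformation behind 'PointsOddsAndPDO' can be checked with plain arithmetic. The following is a minimal sketch, assuming that the unscaled score is the log-odds from the fitted model and that the scaled score equals shift + slope * (unscaled score). The variable names are illustrative, and the unscaled scores are copied from the output above.

```matlab
% Target: 500 points at odds of 2, with the odds doubling every 50 points.
targetPoints = 500;
targetOdds   = 2;
pdo          = 50;

% Assumed scaling: scaledScore = shift + slope * log(odds).
slope = pdo / log(2);                            % points per unit of log-odds
shift = targetPoints - slope * log(targetOdds);  % score at odds of 1

% Unscaled minimum and maximum scores reported by displaypoints above.
unscaled = [-1.3100 3.0726];
scaled   = shift + slope * unscaled;

fprintf('slope = %.4f, shift = %.4f\n', slope, shift);
fprintf('scaled MinScore = %.2f, scaled MaxScore = %.2f\n', scaled(1), scaled(2));
```

Under these assumptions the scaled minimum and maximum agree with the MinScore and MaxScore values shown above, up to the rounding of the displayed unscaled scores.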

This example shows how to use formatpoints to scale by providing the Worst and Best score values. By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    _________
    {'CustAge'   }    {'[-Inf,33)'   }     -0.15894
    {'CustAge'   }    {'[33,37)'     }     -0.14036
    {'CustAge'   }    {'[37,40)'     }    -0.060323
    {'CustAge'   }    {'[40,46)'     }     0.046408
    {'CustAge'   }    {'[46,48)'     }      0.21445
    {'CustAge'   }    {'[48,58)'     }      0.23039
    {'CustAge'   }    {'[58,Inf]'    }        0.479
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.031252
    {'ResStatus' }    {'Home Owner'  }      0.12696
    {'ResStatus' }    {'Other'       }      0.37641
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.076317
    {'EmpStatus' }    {'Employed'    }      0.31449
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.45716
      ⋮
MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the 'Worst' and 'Best' score values. The range provided below is a common score range. Display the points information again to verify that they are now scaled and also display the scaled minimum and maximum scores.

sc = formatpoints(sc,'WorstAndBestScores',[300 850]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    ______
    {'CustAge'   }    {'[-Inf,33)'   }    46.396
    {'CustAge'   }    {'[33,37)'     }    48.727
    {'CustAge'   }    {'[37,40)'     }    58.772
    {'CustAge'   }    {'[40,46)'     }    72.167
    {'CustAge'   }    {'[46,48)'     }    93.256
    {'CustAge'   }    {'[48,58)'     }    95.256
    {'CustAge'   }    {'[58,Inf]'    }    126.46
    {'CustAge'   }    {'<missing>'   }       NaN
    {'ResStatus' }    {'Tenant'      }    62.421
    {'ResStatus' }    {'Home Owner'  }    82.276
    {'ResStatus' }    {'Other'       }    113.58
    {'ResStatus' }    {'<missing>'   }       NaN
    {'EmpStatus' }    {'Unknown'     }    56.765
    {'EmpStatus' }    {'Employed'    }    105.81
    {'EmpStatus' }    {'<missing>'   }       NaN
    {'CustIncome'}    {'[-Inf,29000)'}    8.9706
      ⋮
MinScore = 300.0000
MaxScore = 850

As expected, the values of MinScore and MaxScore correspond to the desired worst and best scores.
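The shift and slope implied by a worst and best score can be recovered by solving the two-point linear mapping. This is a minimal sketch, assuming the scaled score is shift + slope * (unscaled score); the variable names are illustrative, and the unscaled minimum and maximum are copied from the output above.

```matlab
worstScore  = 300;
bestScore   = 850;
minUnscaled = -1.3100;   % unscaled MinScore from displaypoints
maxUnscaled =  3.0726;   % unscaled MaxScore from displaypoints

% Two equations, two unknowns:
%   worstScore = shift + slope * minUnscaled
%   bestScore  = shift + slope * maxUnscaled
slope = (bestScore - worstScore) / (maxUnscaled - minUnscaled);
shift = worstScore - slope * minUnscaled;

% Mapping the unscaled extremes recovers the requested range.
scaledMin = shift + slope * minUnscaled;
scaledMax = shift + slope * maxUnscaled;

fprintf('slope = %.4f, shift = %.4f\n', slope, shift);
fprintf('scaledMin = %.4f, scaledMax = %.4f\n', scaledMin, scaledMax);
```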

This example shows how to use formatpoints to scale by providing the Shift and Slope values. By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points by the formatpoints function.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    _________
    {'CustAge'   }    {'[-Inf,33)'   }     -0.15894
    {'CustAge'   }    {'[33,37)'     }     -0.14036
    {'CustAge'   }    {'[37,40)'     }    -0.060323
    {'CustAge'   }    {'[40,46)'     }     0.046408
    {'CustAge'   }    {'[46,48)'     }      0.21445
    {'CustAge'   }    {'[48,58)'     }      0.23039
    {'CustAge'   }    {'[58,Inf]'    }        0.479
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.031252
    {'ResStatus' }    {'Home Owner'  }      0.12696
    {'ResStatus' }    {'Other'       }      0.37641
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.076317
    {'EmpStatus' }    {'Employed'    }      0.31449
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.45716
      ⋮
MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the 'Shift' and 'Slope' values. In this example, the choice of shift and slope is arbitrary. Display the points information again to verify that they are now scaled and also display the scaled minimum and maximum scores.

sc = formatpoints(sc,'ShiftAndSlope',[300 6]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    ______
    {'CustAge'   }    {'[-Inf,33)'   }    41.904
    {'CustAge'   }    {'[33,37)'     }    42.015
    {'CustAge'   }    {'[37,40)'     }    42.495
    {'CustAge'   }    {'[40,46)'     }    43.136
    {'CustAge'   }    {'[46,48)'     }    44.144
    {'CustAge'   }    {'[48,58)'     }    44.239
    {'CustAge'   }    {'[58,Inf]'    }    45.731
    {'CustAge'   }    {'<missing>'   }       NaN
    {'ResStatus' }    {'Tenant'      }     42.67
    {'ResStatus' }    {'Home Owner'  }    43.619
    {'ResStatus' }    {'Other'       }    45.116
    {'ResStatus' }    {'<missing>'   }       NaN
    {'EmpStatus' }    {'Unknown'     }    42.399
    {'EmpStatus' }    {'Employed'    }    44.744
    {'EmpStatus' }    {'<missing>'   }       NaN
    {'CustIncome'}    {'[-Inf,29000)'}    40.114
      ⋮
MinScore = 292.1401
MaxScore = 318.4355
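With 'ShiftAndSlope' the transformation can be checked by hand. The sketch below assumes that the scaled score is shift + slope * (unscaled score) and that, when base points are not reported separately, the shift is spread evenly over the 7 predictors in the model. Both assumptions are inferred from the outputs above rather than taken from the function's implementation, and the variable names are illustrative.

```matlab
shift = 300;
slope = 6;
numPredictors = 7;   % predictors retained by fitmodel above

% Unscaled 'CustAge' bin points copied from the first displaypoints output.
custAgeUnscaled = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];

% Assumed per-bin scaling: each predictor carries an equal share of the shift.
custAgeScaled = shift/numPredictors + slope*custAgeUnscaled;

% Total scores are scaled with the full shift.
scaledMinMax = shift + slope*[-1.3100 3.0726];

disp(custAgeScaled)
disp(scaledMinMax)
```

Because the slope is only 6, the seven 'CustAge' bins span fewer than four scaled points, which is why the scaled scores fall in such a narrow range.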

This example shows how to use formatpoints to separate the base points from the rest of the points assigned to each predictor variable. The formatpoints name-value pair argument 'BasePoints' serves this purpose.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    _________
    {'CustAge'   }    {'[-Inf,33)'   }     -0.15894
    {'CustAge'   }    {'[33,37)'     }     -0.14036
    {'CustAge'   }    {'[37,40)'     }    -0.060323
    {'CustAge'   }    {'[40,46)'     }     0.046408
    {'CustAge'   }    {'[46,48)'     }      0.21445
    {'CustAge'   }    {'[48,58)'     }      0.23039
    {'CustAge'   }    {'[58,Inf]'    }        0.479
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.031252
    {'ResStatus' }    {'Home Owner'  }      0.12696
    {'ResStatus' }    {'Other'       }      0.37641
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.076317
    {'EmpStatus' }    {'Employed'    }      0.31449
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.45716
      ⋮
MinScore = -1.3100
MaxScore = 3.0726

When the name-value pair argument 'BasePoints' is set to true, the points information table reports the base points separately in the first row. The minimum and maximum possible scores are not affected by this option.

sc = formatpoints(sc,'BasePoints',true);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=38×3 table
      Predictors           Bin            Points
    ______________    ______________    _________
    {'BasePoints'}    {'BasePoints'}      0.70239
    {'CustAge'   }    {'[-Inf,33)' }     -0.25928
    {'CustAge'   }    {'[33,37)'   }     -0.24071
    {'CustAge'   }    {'[37,40)'   }     -0.16066
    {'CustAge'   }    {'[40,46)'   }    -0.053933
    {'CustAge'   }    {'[46,48)'   }      0.11411
    {'CustAge'   }    {'[48,58)'   }      0.13005
    {'CustAge'   }    {'[58,Inf]'  }      0.37866
    {'CustAge'   }    {'<missing>' }          NaN
    {'ResStatus' }    {'Tenant'    }     -0.13159
    {'ResStatus' }    {'Home Owner'}     0.026616
    {'ResStatus' }    {'Other'     }      0.27607
    {'ResStatus' }    {'<missing>' }          NaN
    {'EmpStatus' }    {'Unknown'   }     -0.17666
    {'EmpStatus' }    {'Employed'  }      0.21415
    {'EmpStatus' }    {'<missing>' }          NaN
      ⋮
MinScore = -1.3100
MaxScore = 3.0726
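The relationship between the two points tables can be reproduced with simple arithmetic. This sketch assumes that the unscaled base points equal the model intercept and that, when 'BasePoints' is false, the intercept is spread evenly across the 7 predictors. These assumptions are inferred from the outputs above, and the variable names are illustrative.

```matlab
intercept     = 0.70239;   % (Intercept) estimate from the fitmodel output
numPredictors = 7;

% Unscaled 'CustAge' points with the base points spread across predictors.
custAgeWithBase = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];

% Removing each predictor's share of the intercept gives the points that are
% reported when the base points are shown in their own row.
baseShare          = intercept / numPredictors;
custAgeWithoutBase = custAgeWithBase - baseShare;

fprintf('base share per predictor = %.5f\n', baseShare);
disp(custAgeWithoutBase)
```

Because the same total is only moved between rows, the minimum and maximum possible scores stay the same.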

This example shows how to use formatpoints to round points. Rounding is usually applied after scaling; otherwise, if the points for a particular predictor all fall in a small range, rounding could give different bins the same rounded points. Also, rounding all the points may slightly change the minimum and maximum total points.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    _________
    {'CustAge'   }    {'[-Inf,33)'   }     -0.15894
    {'CustAge'   }    {'[33,37)'     }     -0.14036
    {'CustAge'   }    {'[37,40)'     }    -0.060323
    {'CustAge'   }    {'[40,46)'     }     0.046408
    {'CustAge'   }    {'[46,48)'     }      0.21445
    {'CustAge'   }    {'[48,58)'     }      0.23039
    {'CustAge'   }    {'[58,Inf]'    }        0.479
    {'CustAge'   }    {'<missing>'   }          NaN
    {'ResStatus' }    {'Tenant'      }    -0.031252
    {'ResStatus' }    {'Home Owner'  }      0.12696
    {'ResStatus' }    {'Other'       }      0.37641
    {'ResStatus' }    {'<missing>'   }          NaN
    {'EmpStatus' }    {'Unknown'     }    -0.076317
    {'EmpStatus' }    {'Employed'    }      0.31449
    {'EmpStatus' }    {'<missing>'   }          NaN
    {'CustIncome'}    {'[-Inf,29000)'}     -0.45716
      ⋮
MinScore = -1.3100
MaxScore = 3.0726

Scale points, and display the points information. By default, no rounding is applied.

sc = formatpoints(sc,'WorstAndBestScores',[300 850]);
PointsInfo = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    ______
    {'CustAge'   }    {'[-Inf,33)'   }    46.396
    {'CustAge'   }    {'[33,37)'     }    48.727
    {'CustAge'   }    {'[37,40)'     }    58.772
    {'CustAge'   }    {'[40,46)'     }    72.167
    {'CustAge'   }    {'[46,48)'     }    93.256
    {'CustAge'   }    {'[48,58)'     }    95.256
    {'CustAge'   }    {'[58,Inf]'    }    126.46
    {'CustAge'   }    {'<missing>'   }       NaN
    {'ResStatus' }    {'Tenant'      }    62.421
    {'ResStatus' }    {'Home Owner'  }    82.276
    {'ResStatus' }    {'Other'       }    113.58
    {'ResStatus' }    {'<missing>'   }       NaN
    {'EmpStatus' }    {'Unknown'     }    56.765
    {'EmpStatus' }    {'Employed'    }    105.81
    {'EmpStatus' }    {'<missing>'   }       NaN
    {'CustIncome'}    {'[-Inf,29000)'}    8.9706
      ⋮

Use the name-value pair argument 'Round' to apply rounding for all points and then display the points information again.

sc = formatpoints(sc,'Round','AllPoints');
PointsInfo = displaypoints(sc)
PointsInfo=37×3 table
      Predictors            Bin            Points
    ______________    ________________    ______
    {'CustAge'   }    {'[-Inf,33)'   }      46
    {'CustAge'   }    {'[33,37)'     }      49
    {'CustAge'   }    {'[37,40)'     }      59
    {'CustAge'   }    {'[40,46)'     }      72
    {'CustAge'   }    {'[46,48)'     }      93
    {'CustAge'   }    {'[48,58)'     }      95
    {'CustAge'   }    {'[58,Inf]'    }     126
    {'CustAge'   }    {'<missing>'   }     NaN
    {'ResStatus' }    {'Tenant'      }      62
    {'ResStatus' }    {'Home Owner'  }      82
    {'ResStatus' }    {'Other'       }     114
    {'ResStatus' }    {'<missing>'   }     NaN
    {'EmpStatus' }    {'Unknown'     }      57
    {'EmpStatus' }    {'Employed'    }     106
    {'EmpStatus' }    {'<missing>'   }     NaN
    {'CustIncome'}    {'[-Inf,29000)'}       9
      ⋮
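The caution about rounding before scaling can be illustrated with the 'CustAge' points from this example. This is a minimal sketch that applies MATLAB's round to values copied from the tables above; it does not call formatpoints, and the variable names are illustrative.

```matlab
% 'CustAge' bin points copied from the unscaled and scaled tables above.
custAgeUnscaled = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];
custAgeScaled   = [46.396 48.727 58.772 72.167 93.256 95.256 126.46];

% Rounding the unscaled points collapses every bin to the same value.
roundedUnscaled = round(custAgeUnscaled);
allBinsCollapse = all(roundedUnscaled == 0);

% Rounding after scaling keeps the bins distinguishable.
roundedScaled   = round(custAgeScaled);
numDistinctBins = numel(unique(roundedScaled));

fprintf('all unscaled bins round to zero: %d\n', allBinsCollapse);
fprintf('distinct rounded scaled values: %d of %d\n', ...
    numDistinctBins, numel(custAgeScaled));
```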

This example shows that rounding scorecard points can modify the original risk ranking of a credit scorecard. You can control rounding by using formatpoints with the optional name-value pair argument 'Round'.

Credit scores rank customers by risk. If higher scores are given to better, less risky customers, then higher scores must correspond to lower default probabilities. The rounding behavior depends on the value of the 'Round' name-value pair argument:

  • When 'Round' is set to 'None' (the default), no rounding is applied to points or scores, and the risk ranking is completely consistent with the calibrated model.

  • When 'Round' is set to 'FinalScore', rounding is applied only to the final scores. In this case: a) Customers with different scores (different risk) may have the same rounded score. b) Customers with the same rounded score may have different default probabilities. c) Customers with higher rounded scores always have lower default probabilities than customers with lower rounded scores.

  • When 'Round' is set to 'AllPoints', rounding is applied to all points in the scorecard (all bins, all predictors). In this case: a) Customers with different scores (different risk) may have the same rounded score, or their ranking may even be reversed (the customer with the lower original score may have a higher rounded score). b) Customers with the same rounded score may have different default probabilities. c) Customers with higher rounded scores may in some cases have higher default probabilities than customers with lower rounded scores.

Create a creditscorecard

To demonstrate the rounding behavior, first create a creditscorecard object.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','ResponseVar','status');
sc = autobinning(sc);
sc = modifybins(sc,'CustIncome','CutPoints',20000:5000:60000);
sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1487.9719, Chi2Stat = 35.469392, PValue = 2.5909009e-09
2. Adding TmWBank, Deviance = 1465.7998, Chi2Stat = 22.172089, PValue = 2.4927133e-06
3. Adding AMBalance, Deviance = 1455.206, Chi2Stat = 10.593833, PValue = 0.0011346548
4. Adding EmpStatus, Deviance = 1446.3918, Chi2Stat = 8.8142314, PValue = 0.0029889009
5. Adding CustAge, Deviance = 1440.6825, Chi2Stat = 5.709236, PValue = 0.016875883
6. Adding ResStatus, Deviance = 1436.1363, Chi2Stat = 4.5462043, PValue = 0.032991806
7. Adding OtherCC, Deviance = 1431.9546, Chi2Stat = 4.1817827, PValue = 0.040860699

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70247    0.064046    10.968    5.4345e-28
    CustAge         0.60579     0.24405    2.4822      0.013058
    ResStatus        1.4463     0.65427    2.2105       0.02707
    EmpStatus       0.90501     0.29262    3.0928     0.0019828
    CustIncome      0.70869     0.20535    3.4512    0.00055815
    TmWBank          1.0839     0.23244    4.6631    3.1145e-06
    OtherCC          1.0906     0.52936    2.0602      0.039377
    AMBalance        1.0148     0.32273    3.1445     0.0016636

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 91.5, p-value = 6.12e-17

Apply the 'Round' Options

Apply each of the three 'Round' options to the creditscorecard object.

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);
% No Rounding
points1 = displaypoints(sc);
[S1,P1] = score(sc);
defProb1 = probdefault(sc);
sc = formatpoints(sc,'Round','AllPoints');
% 'AllPoints' Rounding
points2 = displaypoints(sc);
[S2,P2] = score(sc);
defProb2 = probdefault(sc);
sc = formatpoints(sc,'Round','FinalScore');
% 'FinalScore' Rounding
points3 = displaypoints(sc);
[S3,P3] = score(sc);
defProb3 = probdefault(sc);

Compare the 'Round' Options

Visualize the default probabilities versus the scores.

figure
hold on
scatter(S1, defProb1, 'g*')
scatter(S2, defProb2, 'ro')
scatter(S3, defProb3, 'b+')
legend('No Rounding','AllPoints','FinalScore')
axis([388 394 0.695 0.705])
xlabel('Credit score')
ylabel('Default probability')
title('Default probabilities and Credit scores')
grid

[Figure: scatter plot titled "Default probabilities and Credit scores", showing default probability against credit score for the No Rounding, AllPoints, and FinalScore options.]

Inspect the points and total scores for each 'Round' option, in table format.

ind = [208 363 694 886];
ProbDefault = defProb1(ind)
ProbDefault = 4×1
    0.6997
    0.6989
    0.6982
    0.6972
% ScoreNoRounding = S1(ind)
PointsNoRounding = P1(ind,:);
PointsNoRounding.Total = S1(ind)
PointsNoRounding=4×8 table
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    Total
    _______    _________    _________    __________    _______    _______    _________    ______
      52.9      61.555       58.503        24.647      51.551     50.416        89.4      388.97
     67.65      61.555       58.503        24.647      51.551     75.723       49.64      389.27
    54.234      61.555       58.503        24.647      51.551     75.723      63.271      389.48
      52.9      92.441       58.503        24.647      61.277     50.416       49.64      389.82
% ScoreAllPoints = S2(ind)
PointsAllPoints = P2(ind,:);
PointsAllPoints.Total = S2(ind)
PointsAllPoints=4×8 table
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    Total
    _______    _________    _________    __________    _______    _______    _________    _____
      53          62           59            25          52         50          89         390
      68          62           59            25          52         76          50         392
      54          62           59            25          52         76          63         391
      53          92           59            25          61         50          50         390
% ScoreFinalScore = S3(ind)
PointsFinalScore = P3(ind,:);
PointsFinalScore.Total = S3(ind)
PointsFinalScore=4×8 table
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    Total
    _______    _________    _________    __________    _______    _______    _________    _____
      52.9      61.555       58.503        24.647      51.551     50.416        89.4       389
     67.65      61.555       58.503        24.647      51.551     75.723       49.64       389
    54.234      61.555       58.503        24.647      51.551     75.723      63.271       389
      52.9      92.441       58.503        24.647      61.277     50.416       49.64       390

The original creditscorecard model, without rounding, was calibrated to the data with a logistic regression. The ranking and probabilities have a statistical foundation.

Rounding, however, effectively modifies the creditscorecard model. When only the final score is rounded, this leads to some "ties" in rounded scores, but at least the risk ranking across scores is preserved (because if s1 <= s2, then round(s1) <= round(s2)).

However, when you round all points, a score may gain extra points by chance. For example, in the second row in the table (row 363 of the original data), the points for all predictors are rounded up, in several cases by almost 0.5. The original score is 389.27. Rounding the final score makes it 389. However, rounding all points makes it 392, which is three points higher than rounding the final score.
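The difference between the two rounding modes comes down to whether rounding happens before or after the points are summed. The sketch below reproduces the totals from the tables above using the displayed points for the four observations; it uses only sum and round, not formatpoints or score, and the variable names are illustrative.

```matlab
% Points per predictor for observations 208, 363, 694, and 886,
% copied from the PointsNoRounding table (one row per observation).
P = [52.9   61.555 58.503 24.647 51.551 50.416 89.4;
     67.65  61.555 58.503 24.647 51.551 75.723 49.64;
     54.234 61.555 58.503 24.647 51.551 75.723 63.271;
     52.9   92.441 58.503 24.647 61.277 50.416 49.64];

noRounding = sum(P,2);          % unrounded total score
finalScore = round(sum(P,2));   % 'FinalScore': sum first, then round
allPoints  = sum(round(P),2);   % 'AllPoints': round each term, then sum

% Rounding a sum preserves the ordering of the unrounded scores;
% summing rounded terms does not have to.
rankingKeptFinalScore = issorted(finalScore);
rankingKeptAllPoints  = issorted(allPoints);

disp([noRounding finalScore allPoints])
```

The unrounded scores increase from the first row to the last. The 'FinalScore' totals keep that order (with ties), while the 'AllPoints' totals place the second observation above the fourth even though its unrounded score is lower.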

This example shows how to use formatpoints to score missing or out-of-range data. When data is scored, some observations can be either missing (NaN or <undefined>) or out of range. You need to decide whether points are assigned to these cases. Use the name-value pair argument 'Missing' to do so.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Indicate that the minimum allowed value for'CustAge'is zero. This makes any negative values for age invalid or out-of-range.

sc = modifybins(sc,'CustAge','MinValue',0);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Suppose there are missing or out-of-range observations in the data that you want to score. Notice that by default, the points and scores assigned to missing or out-of-range values are NaN.

% Set up a data set with missing and out of range data for illustration purposes
newdata = data(1:5,:);
newdata.CustAge(1) = NaN; % missing
newdata.CustAge(2) = -100; % invalid
newdata.ResStatus(3) = '<undefined>'; % missing
newdata.ResStatus(4) = 'House'; % invalid
disp(newdata)
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______
      1         NaN          62         Tenant         Unknown        50000         55         Yes       1055.9        0.22        0
      2        -100          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0
      3          47          30         <undefined>    Employed       37000         61         No        877.23        0.29        0
      4          50          75         House          Employed       53000         20         Yes       157.37        0.08        0
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0
[Scores,Points] = score(sc,newdata);
disp(Scores)
       NaN
       NaN
       NaN
       NaN
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank      OtherCC    AMBalance
    _______    _________    _________    __________    _________    ________    _________
        NaN    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
        NaN      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445          NaN      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039          NaN      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

Use the name-value pair argument 'Missing' to replace NaN with points corresponding to a zero Weight-of-Evidence (WOE).

sc = formatpoints(sc,'Missing','ZeroWOE');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    0.9667
    1.0859
    0.8978
    1.5513
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank      OtherCC    AMBalance
    _______    _________    _________    __________    _________    ________    _________
    0.10034    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
    0.10034      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445      0.10034      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039      0.10034      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

Alternatively, use the name-value pair argument 'Missing' to replace the missing value with the minimum points for the predictors that have the missing values.

sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    0.7074
    0.8266
    0.7662
    1.4197
    1.4535
disp(Points)
    CustAge     ResStatus    EmpStatus    CustIncome     TmWBank      OtherCC    AMBalance
    ________    _________    _________    __________    _________    ________    _________
    -0.15894    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
    -0.15894      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
     0.21445    -0.031252      0.31449     0.081611       0.39607    -0.19168    -0.017472
     0.23039    -0.031252      0.31449      0.43693     -0.044811     0.15842      0.35551
       0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

As a third alternative, use the name-value pair argument 'Missing' to replace the missing value with the maximum points for the predictors that have the missing values.

sc = formatpoints(sc,'Missing','MaxPoints');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    1.3454
    1.4646
    1.1739
    1.8273
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank      OtherCC    AMBalance
    _______    _________    _________    __________    _________    ________    _________
      0.479    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
      0.479      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445      0.37641      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039      0.37641      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

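Verify that the minimum and maximum points assigned to the missing data correspond to the minimum and maximum points for the corresponding predictors. The points for 'CustAge' are reported in the first seven rows of the points information table. For 'ResStatus', the example inspects rows 8 through 10. Note that in the output shown below, row 8 holds the 'CustAge' row for missing data (0.479 under 'MaxPoints'), so the maximum over rows 8 through 10 reflects that row rather than a 'ResStatus' bin. The 'ResStatus' maximum, 0.37641, is the value that appears in the 'MaxPoints' Points table above.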

PointsInfo = displaypoints(sc); PointsInfo(1:7,:)
ans=7×3 table
    Predictors         Bin           Points
    ___________    ____________    _________

    {'CustAge'}    {'[0,33)'  }     -0.15894
    {'CustAge'}    {'[33,37)' }     -0.14036
    {'CustAge'}    {'[37,40)' }    -0.060323
    {'CustAge'}    {'[40,46)' }     0.046408
    {'CustAge'}    {'[46,48)' }      0.21445
    {'CustAge'}    {'[48,58)' }      0.23039
    {'CustAge'}    {'[58,Inf]'}        0.479
min(PointsInfo.Points(1:7))
ans = -0.1589
max(PointsInfo.Points(1:7))
ans = 0.4790
PointsInfo(8:10,:)
ans=3×3 table
     Predictors           Bin            Points
    _____________    ______________    _________

    {'CustAge'  }    {'<missing>' }        0.479
    {'ResStatus'}    {'Tenant'    }    -0.031252
    {'ResStatus'}    {'Home Owner'}      0.12696
min(PointsInfo.Points(8:10))
ans = -0.0313
max(PointsInfo.Points(8:10))
ans = 0.4790
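Hard-coded row ranges such as 1:7 or 8:10 depend on how many rows each predictor occupies, including any extra row for missing data. A more robust check selects rows by predictor name. The sketch below uses a small illustrative excerpt built from values reported above, not the full points table; with the real table, the same mask would be written as strcmp(PointsInfo.Predictors,'ResStatus').

```matlab
% Illustrative excerpt of a points table (values copied from the outputs above;
% the third ResStatus value is the maximum seen in the 'MaxPoints' Points table).
Predictors = {'CustAge'; 'CustAge'; 'ResStatus'; 'ResStatus'; 'ResStatus'};
Points     = [-0.15894;  0.479;     -0.031252;   0.12696;     0.37641];

% Select rows by predictor name rather than by hard-coded row numbers.
isRes = strcmp(Predictors, 'ResStatus');

minRes = min(Points(isRes));
maxRes = max(Points(isRes));
fprintf('ResStatus points range: [%.6f, %.5f]\n', minRes, maxRes);
```

Selecting by name keeps the check valid if the number of bins for an earlier predictor changes.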

This example describes the assignment of points for missing data when the 'BinMissingData' option is set to true.

  • Predictors that have missing data in the training set have an explicit <missing> bin with corresponding points in the final scorecard. These points are computed from the Weight-of-Evidence (WOE) value for the <missing> bin and the logistic model coefficients. For scoring purposes, these points are assigned to missing values and to out-of-range values.

  • Predictors with no missing data in the training set have no <missing> bin, therefore no WOE can be estimated from the training data. By default, the points for missing and out-of-range values are set to NaN, and this leads to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.

Create a creditscorecard object using the CreditCardData.mat file to load the dataMissing with missing values.

load CreditCardData.mat
head(dataMissing,5)
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

      1          53          62         <undefined>    Unknown        50000         55         Yes       1055.9        0.22        0
      2          61          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0
      3          47          30         Tenant         Employed       37000         61         No        877.23        0.29        0
      4         NaN          75         Home Owner     Employed       53000         20         Yes       157.37        0.08        0
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0
fprintf('Number of rows: %d\n',height(dataMissing))
Number of rows: 1200
fprintf('Number of missing values CustAge: %d\n',sum(ismissing(dataMissing.CustAge)))
Number of missing values CustAge: 30
fprintf('Number of missing values ResStatus: %d\n',sum(ismissing(dataMissing.ResStatus)))
Number of missing values ResStatus: 40

Use creditscorecard with the name-value argument 'BinMissingData' set to true to bin the missing numeric or categorical data in a separate bin. Apply automatic binning.

sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true); sc = autobinning(sc); disp(sc)
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Set a minimum value of zero for CustAge and CustIncome. With this, any negative age or income information becomes invalid or "out-of-range". For scoring purposes, out-of-range values are given the same points as missing values.

sc = modifybins(sc,'CustAge','MinValue',0); sc = modifybins(sc,'CustIncome','MinValue',0);

Display and plot bin information for numeric data for 'CustAge' that includes missing data in a separate bin labeled <missing>.

[bi,cp] = bininfo(sc,'CustAge'); disp(bi)
         Bin         Good    Bad     Odds       WOE       InfoValue
    _____________    ____    ___    ______    ________    __________

    {'[0,33)'   }     69      52    1.3269    -0.42156      0.018993
    {'[33,37)'  }     63      45       1.4    -0.36795      0.012839
    {'[37,40)'  }     72      47    1.5319     -0.2779     0.0079824
    {'[40,46)'  }    172      89    1.9326    -0.04556     0.0004549
    {'[46,48)'  }     59      25      2.36     0.15424     0.0016199
    {'[48,51)'  }     99      41    2.4146     0.17713     0.0035449
    {'[51,58)'  }    157      62    2.5323     0.22469     0.0088407
    {'[58,Inf]' }     93      25      3.72     0.60931      0.032198
    {'<missing>'}     19      11    1.7273    -0.15787    0.00063885
    {'Totals'   }    803     397    2.0227         NaN      0.087112
plotbins(sc,'CustAge')


Display and plot bin information for categorical data for 'ResStatus' that includes missing data in a separate bin labeled <missing>.

[bi,cg] = bininfo(sc,'ResStatus'); disp(bi)
         Bin          Good    Bad     Odds        WOE       InfoValue
    ______________    ____    ___    ______    _________    __________

    {'Tenant'    }    296     161    1.8385    -0.095463     0.0035249
    {'Home Owner'}    352     171    2.0585     0.017549    0.00013382
    {'Other'     }    128      52    2.4615      0.19637     0.0055808
    {'<missing>' }     27      13    2.0769     0.026469    2.3248e-05
    {'Totals'    }    803     397    2.0227          NaN     0.0092627
plotbins(sc,'ResStatus')


For the 'CustAge' and 'ResStatus' predictors, there is missing data (NaNs and <undefined>) in the training data, and the binning process estimates a WOE value of -0.15787 and 0.026469 respectively for missing data in these predictors, as shown above.

For EmpStatus and CustIncome there is no explicit bin for missing values because the training data has no missing values for these predictors.

bi = bininfo(sc,'EmpStatus'); disp(bi)
        Bin         Good    Bad     Odds       WOE       InfoValue
    ____________    ____    ___    ______    ________    _________

    {'Unknown' }    396     239    1.6569    -0.19947    0.021715
    {'Employed'}    407     158    2.5759      0.2418    0.026323
    {'Totals'  }    803     397    2.0227         NaN    0.048038
bi = bininfo(sc,'CustIncome'); disp(bi)
           Bin           Good    Bad     Odds         WOE       InfoValue
    _________________    ____    ___    _______    _________    __________

    {'[0,29000)'    }     53      58    0.91379     -0.79457       0.06364
    {'[29000,33000)'}     74      49     1.5102     -0.29217     0.0091366
    {'[33000,35000)'}     68      36     1.8889     -0.06843    0.00041042
    {'[35000,40000)'}    193      98     1.9694    -0.026696    0.00017359
    {'[40000,42000)'}     68      34          2    -0.011271    1.0819e-05
    {'[42000,47000)'}    164      66     2.4848      0.20579     0.0078175
    {'[47000,Inf]'  }    183      56     3.2679      0.47972      0.041657
    {'Totals'       }    803     397     2.0227          NaN       0.12285

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default). For predictors that have missing data, there is an explicit <missing> bin, with a corresponding WOE value computed from the data. When using fitmodel, the corresponding WOE value for the <missing> bin is applied when performing the WOE transformation.

[sc,mdl] = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1442.8477, Chi2Stat = 4.4974731, PValue = 0.033944979
6. Adding ResStatus, Deviance = 1438.9783, Chi2Stat = 3.86941, PValue = 0.049173805
7. Adding OtherCC, Deviance = 1434.9751, Chi2Stat = 4.0031966, PValue = 0.045414057

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________

    (Intercept)    0.70229     0.063959     10.98    4.7498e-28
    CustAge        0.57421      0.25708    2.2335      0.025513
    ResStatus       1.3629      0.66952    2.0356       0.04179
    EmpStatus      0.88373       0.2929    3.0172      0.002551
    CustIncome     0.73535       0.2159     3.406    0.00065929
    TmWBank         1.1065      0.23267    4.7556    1.9783e-06
    OtherCC         1.0648      0.52826    2.0156      0.043841
    AMBalance       1.0446      0.32197    3.2443     0.0011775

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 88.5, p-value = 2.55e-16

Scale the scorecard points by the "points, odds, and points to double the odds (PDO)" method using the 'PointsOddsAndPDO' argument of formatpoints. Suppose that you want a score of 500 points to have odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points would have odds of 4).

Display the scorecard showing the scaled points for predictors retained in the fitting model.

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]); PointsInfo = displaypoints(sc)
PointsInfo=38×3 table
     Predictors           Bin          Points
    _____________    ______________    ______

    {'CustAge'  }    {'[0,33)'    }    54.062
    {'CustAge'  }    {'[33,37)'   }    56.282
    {'CustAge'  }    {'[37,40)'   }    60.012
    {'CustAge'  }    {'[40,46)'   }    69.636
    {'CustAge'  }    {'[46,48)'   }    77.912
    {'CustAge'  }    {'[48,51)'   }     78.86
    {'CustAge'  }    {'[51,58)'   }     80.83
    {'CustAge'  }    {'[58,Inf]'  }     96.76
    {'CustAge'  }    {'<missing>' }    64.984
    {'ResStatus'}    {'Tenant'    }    62.138
    {'ResStatus'}    {'Home Owner'}    73.248
    {'ResStatus'}    {'Other'     }    90.828
    {'ResStatus'}    {'<missing>' }    74.125
    {'EmpStatus'}    {'Unknown'   }    58.807
    {'EmpStatus'}    {'Employed'  }    86.937
    {'EmpStatus'}    {'<missing>' }       NaN
      ⋮

Notice that points for the <missing> bin for CustAge and ResStatus are explicitly shown (as 64.9836 and 74.1250, respectively). These points are computed from the WOE value for the <missing> bin and the logistic model coefficients.
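The points reported for missing CustAge values can be reproduced approximately from quantities displayed above. This is a sketch only: it assumes the default of not separating base points and uses the scaling relations from the Algorithms section. The coefficients are the rounded values printed by fitmodel and bininfo, so agreement is limited to about two decimal places. Variable names are illustrative.

```matlab
% Scaling implied by 'PointsOddsAndPDO' = [500 2 50].
TargetPoints = 500;
Odds = 2;
PDO  = 50;
Slope = PDO/log(2);
Shift = TargetPoints - Slope*log(Odds);

% Rounded values copied from the fitmodel and bininfo outputs above.
b0         = 0.70229;    % intercept
bCustAge   = 0.57421;    % CustAge coefficient
woeMissing = -0.15787;   % WOE of the CustAge bin for missing data
p          = 7;          % number of predictors in the model

% Points when base points are spread across the p predictors (assumed default).
ptsMissing = (Shift + Slope*b0)/p + Slope*bCustAge*woeMissing;
fprintf('CustAge points for missing data: %.3f\n', ptsMissing);
```

The result is close to the 64.984 shown in the scorecard; the small difference comes from using rounded coefficients.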

For predictors that have no missing data in the training set, there is no explicit <missing> bin. By default, the points are set to NaN for missing data, and they lead to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.

For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data. Also introduce some invalid, or out-of-range, values. For numeric data, values below the minimum (or above the maximum) allowed are considered invalid, such as a negative value for age (recall 'MinValue' was earlier set to 0 for CustAge and CustIncome). For categorical data, invalid values are categories not explicitly included in the scorecard, for example, a residential status not previously mapped to scorecard categories, such as "House", or a meaningless string such as "abc123".

tdata = dataMissing(11:18,mdl.PredictorNames); % Keep only the predictors retained in the model
% Set some missing values
tdata.CustAge(1) = NaN;
tdata.ResStatus(2) = '<undefined>';
tdata.EmpStatus(3) = '<undefined>';
tdata.CustIncome(4) = NaN;
% Set some invalid values
tdata.CustAge(5) = -100;
tdata.ResStatus(6) = 'House';
tdata.EmpStatus(7) = 'Freelancer';
tdata.CustIncome(8) = -1;
disp(tdata)
    CustAge     ResStatus      EmpStatus     CustIncome    TmWBank    OtherCC    AMBalance
    _______    ___________    ___________    __________    _______    _______    _________

      NaN      Tenant         Unknown          34000         44         Yes        119.8
       48      <undefined>    Unknown          44000         14         Yes       403.62
       65      Home Owner     <undefined>      48000          6         No        111.88
       44      Other          Unknown            NaN         35         No        436.41
     -100      Other          Employed         46000         16         Yes       162.21
       33      House          Employed         36000         36         Yes       845.02
       39      Tenant         Freelancer       34000         40         Yes       756.26
       24      Home Owner     Employed            -1         19         Yes       449.61

Score the new data and see how points are assigned for missing CustAge and ResStatus, because we have an explicit bin with points for <missing>. However, for EmpStatus and CustIncome the score function sets the points to NaN.

[Scores,Points] = score(sc,tdata); disp(Scores)
481.2231 520.8353 NaN NaN 551.7922 487.9588 NaN NaN
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922
     78.86      74.125       58.807        82.439      61.061     75.622      89.922
     96.76      73.248          NaN        96.969      51.132     50.914      89.922
    69.636      90.828       58.807           NaN      61.858     50.914      89.922
    64.984      90.828       86.937        82.439      61.061     75.622      89.922
    56.282      74.125       86.937        70.107      61.858     75.622      63.028
    60.012      62.138          NaN        67.893      61.858     75.622      63.028
    54.062      73.248       86.937           NaN      61.061     75.622      89.922

Use the name-value argument 'Missing' in formatpoints to choose how to assign points to missing values for predictors that do not have an explicit <missing> bin. In this example, use the 'MinPoints' option for the 'Missing' argument. The minimum points for EmpStatus in the scorecard displayed above are 58.8072, and for CustIncome the minimum points are 29.3753.

sc = formatpoints(sc,'Missing','MinPoints'); [Scores,Points] = score(sc,tdata); disp(Scores)
481.2231 520.8353 517.7532 451.3405 551.7922 487.9588 449.3577 470.2267
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922
     78.86      74.125       58.807        82.439      61.061     75.622      89.922
     96.76      73.248       58.807        96.969      51.132     50.914      89.922
    69.636      90.828       58.807        29.375      61.858     50.914      89.922
    64.984      90.828       86.937        82.439      61.061     75.622      89.922
    56.282      74.125       86.937        70.107      61.858     75.622      63.028
    60.012      62.138       58.807        67.893      61.858     75.622      63.028
    54.062      73.248       86.937        29.375      61.061     75.622      89.922
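The effect of 'MinPoints' on the third test row can be sketched with plain arithmetic. The points below are copied from the two Points displays above. Because the displayed points are rounded, the total matches the reported score of 517.7532 only approximately. This mimics the behavior by hand and does not call formatpoints or score.

```matlab
% Third row of the Points table before setting 'Missing' (EmpStatus is NaN).
row3 = [96.76 73.248 NaN 96.969 51.132 50.914 89.922];

% Default behavior: a NaN in any predictor propagates to the total score.
scoreDefault = sum(row3);

% 'MinPoints' behavior, mimicked by hand: substitute the minimum
% EmpStatus points (third column) for the missing entry.
minEmpStatus = 58.807;
row3(3) = minEmpStatus;
scoreMinPoints = sum(row3);

fprintf('Default: %g, MinPoints: %.3f\n', scoreDefault, scoreMinPoints);
```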

Input Arguments


Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: sc = formatpoints(sc,'BasePoints',true,'Round','AllPoints','WorstAndBestScores',[100, 700])

Note

ShiftAndSlope, PointsOddsAndPDO, and WorstAndBestScores are scaling methods, and you can use only one of these name-value pair arguments at a time. The other three name-value pair arguments (BasePoints, Missing, and Round) are not scaling methods and can be used together or with any one of the three scaling methods.

Indicator for separating base points, specified as the comma-separated pair consisting of 'BasePoints' and a logical scalar. If true, the scorecard explicitly separates base points. If false, the base points are spread across all variables in the creditscorecard object.

Data Types: logical

Indicator for points assigned to missing or out-of-range information when scoring, specified as the comma-separated pair consisting of 'Missing' and a character vector with a value of NoScore, ZeroWOE, MinPoints, or MaxPoints, where:

  • NoScore: Missing and out-of-range data do not get points assigned, and points are set to NaN. Also, the total score is set to NaN.

  • ZeroWOE: Missing or out-of-range data get assigned a zero Weight-of-Evidence (WOE) value.

  • MinPoints: Missing or out-of-range data get the minimum possible points for that predictor. This penalizes the score if higher scores are better.

  • MaxPoints: Missing or out-of-range data get the maximum possible points for that predictor. This penalizes the score if lower scores are better.

    Note

    When using the creditscorecard name-value argument 'BinMissingData' with a value of true, missing data for numeric and categorical predictors is binned in a separate bin labeled <missing>. The <missing> bin only contains missing values for a predictor and does not contain invalid or out-of-range values for a predictor.

Data Types: char

Indicator for whether to round points or scores, specified as the comma-separated pair consisting of 'Round' and a character vector with the value 'AllPoints', 'FinalScore', or 'None', where:

  • None: No rounding is applied.

  • AllPoints: Apply rounding to each predictor's points before adding up the total score.

  • FinalScore: Round the final score only (rounding is applied after all points are added up).

For more information and an example of using the 'Round' name-value pair argument, see Rounding and Default Probabilities.

Data Types: char

Indicator for shift and slope scaling parameters for the credit scorecard, specified as the comma-separated pair consisting of 'ShiftAndSlope' and a numeric array with two elements [Shift, Slope]. Slope cannot be zero. The ShiftAndSlope values are used to scale the scoring model.

Note

ShiftAndSlope, PointsOddsAndPDO, and WorstAndBestScores are scaling methods, and you can use only one of these name-value pair arguments at a time. The other three name-value pair arguments (BasePoints, Missing, and Round) are not scaling methods and can be used together or with any one of the three scaling methods.

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Indicator for target points (Points) for a given odds level (Odds) and the desired number of points to double the odds (PDO), specified as the comma-separated pair consisting of 'PointsOddsAndPDO' and a numeric array with three elements [Points,Odds,PDO]. Odds must be a positive number. The PointsOddsAndPDO values are used to find scaling parameters for the scoring model.

Note

The points to double the odds (PDO) may be positive or negative, depending on whether higher scores mean lower risk, or vice versa.

ShiftAndSlope, PointsOddsAndPDO, and WorstAndBestScores are scaling methods, and you can use only one of these name-value pair arguments at a time. The other three name-value pair arguments (BasePoints, Missing, and Round) are not scaling methods and can be used together or with any one of the three scaling methods.

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Indicator for worst (highest risk) and best (lowest risk) scores in the scorecard, specified as the comma-separated pair consisting of 'WorstAndBestScores' and a numeric array with two elements [WorstScore,BestScore]. WorstScore and BestScore must be different values. These WorstAndBestScores values are used to find scaling parameters for the scoring model.

Note

WorstScore means the riskiest score, and its value could be lower or higher than the 'best' score. In other words, the 'minimum' score may be the 'worst' score or the 'best' score, depending on the desired scoring scale.

ShiftAndSlope, PointsOddsAndPDO, and WorstAndBestScores are scaling methods, and you can use only one of these name-value pair arguments at a time. The other three name-value pair arguments (BasePoints, Missing, and Round) are not scaling methods and can be used together or with any one of the three scaling methods.

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Output Arguments


Credit scorecard model, returned as an updated creditscorecard object. For more information on using the creditscorecard object, see creditscorecard.

Algorithms

The score of an individual i is given by the formula

Score(i) = Shift + Slope*(b0 + b1*WOE1(i) + b2*WOE2(i) + ... + bp*WOEp(i))

where bj is the coefficient of the j-th variable in the model, and WOEj(i) is the Weight of Evidence (WOE) value for the i-th individual corresponding to the j-th model variable. Shift and Slope are scaling constants, discussed further below. The scaling constants can be controlled with formatpoints.

If the data for individual i is in the i-th row of a given dataset, to compute a score, the data(i,j) is binned using existing binning maps and converted into a corresponding Weight of Evidence value WOEj(i). Using the model coefficients, the unscaled score is computed as

s = b0 + b1*WOE1(i) + ... + bp*WOEp(i).

For simplicity, assume in the description above that the j-th variable in the model is the j-th column in the data input, although, in general, the order of variables in a given dataset does not have to match the order of variables in the model, and the dataset could have additional variables that are not used in the model.

The formatting options can be controlled using formatpoints. When the base points are reported separately (see the formatpoints parameter BasePoints), the base points are given by

Base Points = Shift + Slope*b0,
and the points for the j-th predictor, i-th row, are given by
Points_ji = Slope*(bj*WOEj(i)).

By default, the base points are not reported separately, in which case

Points_ji = (Shift + Slope*b0)/p + Slope*(bj*WOEj(i)),
where p is the number of predictors in the scorecard model.
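Both conventions distribute the same total differently. The sketch below uses made-up coefficients, WOE values, and scaling constants (all numbers are hypothetical) to show that separating the base points and spreading them over the p predictors lead to the same score.

```matlab
% Hypothetical model: intercept, coefficients, and one row of WOE values.
b0  = 0.7;
b   = [0.6 1.4 0.9];
woe = [-0.42 0.02 0.24];
p   = numel(b);

% Hypothetical scaling constants.
Shift = 450;
Slope = 72;

% Base points reported separately.
basePoints     = Shift + Slope*b0;
pointsSeparate = Slope*(b.*woe);

% Base points spread evenly across the predictors (the default).
pointsSpread = (Shift + Slope*b0)/p + Slope*(b.*woe);

scoreSeparate = basePoints + sum(pointsSeparate);
scoreSpread   = sum(pointsSpread);
fprintf('Score (separate): %.4f, score (spread): %.4f\n', scoreSeparate, scoreSpread);
```

Only the per-predictor breakdown changes between the two conventions; the total equals Shift + Slope*(b0 + sum(b.*woe)) in both cases.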

By default, no rounding is applied to the points by the score function (Round is None). If Round is set to AllPoints using formatpoints, then the points for individual i for variable j are given by

points if rounding is 'AllPoints': round( Points_ji )
and, if base points are reported separately, they are also rounded. This yields integer-valued points per predictor, hence also integer-valued scores. If Round is set to FinalScore using formatpoints, then the points per predictor are not rounded, and only the final score is rounded
score if rounding is 'FinalScore': round(Score(i)).
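The two rounding modes can give different totals. As a small illustration, the following sketch applies both by hand to the points of the first test row from the scaled example on this page (the values are the rounded numbers shown in that Points table); it does not call formatpoints itself.

```matlab
% Points for the first test row in the scaled example (as displayed).
pts = [64.984 62.138 58.807 67.893 61.858 75.622 89.922];

scoreAllPoints  = sum(round(pts));   % mimics 'AllPoints': round each, then add
scoreFinalScore = round(sum(pts));   % mimics 'FinalScore': add, then round

fprintf('AllPoints: %d, FinalScore: %d\n', scoreAllPoints, scoreFinalScore);
```

Here rounding each predictor first gives 482, while rounding only the total gives 481, because several per-predictor fractions round up.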

Regarding the scaling parameters, the Shift and Slope parameters can be set directly with the ShiftAndSlope parameter of formatpoints. Alternatively, you can use the formatpoints parameter WorstAndBestScores. In this case, the parameters Shift and Slope are found internally by solving the system

Shift + Slope*smin = WorstScore,
Shift + Slope*smax = BestScore,
where WorstScore and BestScore are the first and second elements in the formatpoints parameter WorstAndBestScores, and smin and smax are the minimum and maximum possible unscaled scores:
smin = b0 + min(b1*WOE1) + ... + min(bp*WOEp),
smax = b0 + max(b1*WOE1) + ... + max(bp*WOEp).
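The system above is a 2-by-2 linear system in Shift and Slope. A minimal sketch of solving it follows; the values of smin and smax are hypothetical, and the target range [100 700] is the one used in the name-value example on this page.

```matlab
% Hypothetical extreme unscaled scores for some fitted scorecard.
smin = -1.5;
smax =  3.5;

% Target range, as in 'WorstAndBestScores',[100 700].
WorstScore = 100;
BestScore  = 700;

% Solve [1 smin; 1 smax]*[Shift; Slope] = [WorstScore; BestScore].
sol   = [1 smin; 1 smax] \ [WorstScore; BestScore];
Shift = sol(1);
Slope = sol(2);

fprintf('Shift = %.2f, Slope = %.2f\n', Shift, Slope);
```

Equivalently, Slope = (BestScore - WorstScore)/(smax - smin) and Shift = WorstScore - Slope*smin, which is well defined because WorstScore and BestScore must differ and smax exceeds smin for any nontrivial scorecard.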

A third alternative to scale scores is the PointsOddsAndPDO parameter in formatpoints. In this case, assume that the unscaled score s gives the log-odds for a row, and the Shift and Slope parameters are found by solving the following system

Points = Shift + Slope*log(Odds),
Points + PDO = Shift + Slope*log(2*Odds),
where Points, Odds, and PDO ("points to double the odds") are the first, second, and third elements in the PointsOddsAndPDO parameter.
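Subtracting the first equation from the second gives PDO = Slope*log(2), so both constants have closed forms. The sketch below works this out for the values used in the example on this page, [500 2 50], and inverts the scaling to recover odds from a score; oddsAt is an illustrative helper, not a toolbox function.

```matlab
% Values used in the example: 500 points at odds of 2, doubling every 50 points.
TargetPoints = 500;
Odds = 2;
PDO  = 50;

% Subtracting the two equations gives PDO = Slope*log(2).
Slope = PDO/log(2);
Shift = TargetPoints - Slope*log(Odds);

% Map a scaled score back to odds by inverting Score = Shift + Slope*log(odds).
oddsAt = @(score) exp((score - Shift)/Slope);

fprintf('Shift = %.4f, Slope = %.4f\n', Shift, Slope);
fprintf('Odds at 500: %.4f, at 550: %.4f\n', oddsAt(500), oddsAt(550));
```

For these inputs Shift is 450 and Slope is about 72.1348, so a score of 550 corresponds to odds of 4, as stated in the example.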

Whenever a given dataset has a missing or out-of-range value data(i,j), the points for predictor j, for individual i, are set to NaN by default, which results in a missing score for that row (a NaN score). Using the Missing parameter for formatpoints, you can modify this behavior and set the corresponding Weight-of-Evidence (WOE) value to zero, or set the points to the minimum points or the maximum points for that predictor.

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Version History

Introduced in R2014b
