E032

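The orsci-dm data analysis and data mining package provides distance discriminant analysis. Its model objects can be called directly to perform train and test operations. These objects are: (1) TDistanceDiscriminant_TwoClass, for two-class distance discriminant analysis; (2) TDistanceDiscriminant_MultiClass(...), for multi-class distance discriminant analysis; (3) TDistanceDiscriminant_auto, which runs the whole computation automatically so that the caller need not handle the details; (4) TDistanceDiscriminant_step, which gives the caller finer step-by-step operation and control. To expose more of the algorithm, this document calls the individual steps separately, which shows the overall process more clearly and helps in learning distance discriminant analysis.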
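
For orientation, the quantities that the step-by-step program computes can be written as follows. This is a sketch in standard textbook notation, not taken from the orsci documentation. The symbols $\bar{x}_A$, $\bar{x}_B$, $S_G$, $S$, and $h$ are chosen here to mirror the variable names in the listing, $x$ is a row vector, and the pooled form of $S$ is an assumption that happens to reproduce the S matrix printed in the run output.

```latex
\begin{aligned}
d(x, G) &= \sqrt{(x - \bar{x}_G)\, S_G^{-1}\, (x - \bar{x}_G)^{\mathsf T}},
          \qquad G \in \{A, B\} \\
S &= \frac{(n_A - 1)\, S_A + (n_B - 1)\, S_B}{n_A + n_B - 2}
          \qquad \text{(used as } S_A = S_B = S \text{ when the covariances test equal)} \\
h(x) &= d^2(x, A) - d^2(x, B),
          \qquad x \to A \ \text{if } h(x) < 0, \quad x \to B \ \text{if } h(x) > 0
\end{aligned}
```

A sample is therefore assigned to the class whose mean is nearer in Mahalanobis distance; $h(x) = 0$ leaves the sample undecided.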

 

#include "stdafx.h"

#include "orsciSTAT.h"
using namespace dm;
// Note: cout and endl are used unqualified below; this assumes orsciSTAT.h makes them visible.


int main()
{
    cout << " orsci: Distance Discriminant Analysis..." << endl;
    cout << " http://www.orsci.cn" << endl;
    cout << endl;

    // Training samples for the two classes (equal-covariance case).
    mdouble A = "1.14, 1.78; 1.18, 1.96; 1.20, 1.86; 1.26, 2.00; 1.28, 2.00; 1.30, 1.96";
    mdouble B = "1.24, 1.72; 1.36, 1.74; 1.38, 1.64; 1.38, 1.82; 1.38, 1.90; 1.40, 1.70; 1.48, 1.82; 1.54, 1.82; 1.56, 2.08";

    rowdouble x0 = "1.24, 1.80"; // from the book
    rowdouble x1 = "1.50, 2.10"; // from the book

    rowdouble x2 = "1.28, 1.86"; // exercise

    mdouble covA, covInvA, covB, covInvB;
    rowdouble meanA, meanB;

    meanA = vmt::mean(A, 0);
    meanB = vmt::mean(B, 0);

    //dm::TDistanceDiscriminant_TwoClass mModel; // A model object could do all of this directly; the steps are shown separately below.

    // Test whether the two population covariance matrices are equal.
    const double alphaLevel = 0.05;
    const bool flagCovEqual = dm::test_CovMatrixEqual(A, B, covA, covInvA, covB, covInvB, alphaLevel);
    cout << "Equal-covariance test flag for the two populations: " << flagCovEqual << endl;

    cout << "A = " << endl;
    cout << A << endl;

    cout << "mean(A, 0) = " << endl;
    cout << meanA << endl;

    if (flagCovEqual)
    {
        cout << "The population covariance matrices of A and B tested equal!" << endl;
        cout << "S = " << endl;
        cout << covA << endl; // Note: in this case covA holds the value of S.
        cout << "SInv = " << endl;
        cout << covInvA << endl;
    }
    else
    {
        cout << "Covariance matrix of A:" << endl;
        printf(covA, 8); // Note: had the covariances tested equal, covA would hold the estimate of the common covariance rather than a value computed from A alone.

        cout << "Inverse of the covariance matrix of A:" << endl;
        printf(covInvA, 6); // Print the matrix with 6 decimal places.
    }

    cout << "B = " << endl;
    cout << B << endl;

    cout << "mean(B, 0) = " << endl;
    cout << meanB << endl;

    if (flagCovEqual)
    {
        cout << "The population covariance matrices of A and B tested equal!" << endl;
    }
    else
    {
        cout << "Covariance matrix of B:" << endl;
        printf(covB, 8); // Note: had the covariances tested equal, covB would hold the estimate of the common covariance rather than a value computed from B alone.

        cout << "Inverse of the covariance matrix of B:" << endl;
        printf(covInvB, 6); // Print the matrix with 6 decimal places.
    }

    cout << endl;
    if (flagCovEqual) cout << "Case 1: computing with equal covariance matrices..." << endl;
    else cout << "Case 2: computing with unequal covariance matrices..." << endl;

    // x0: Mahalanobis distance to each class and the discriminant h(x0)
    const double distx0_A = dm::dist_Mahalanobis(x0, meanA, covInvA);
    const double distx0_B = dm::dist_Mahalanobis(x0, meanB, covInvB);
    cout << "x0 = " << endl;
    cout << x0 << endl;
    cout << "x0 to A = " << distx0_A << endl;
    cout << "x0 to B = " << distx0_B << endl;
    cout << "h(x0) = " << distx0_A * distx0_A - distx0_B * distx0_B << endl;

    // x1
    const double distx1_A = dm::dist_Mahalanobis(x1, meanA, covInvA);
    const double distx1_B = dm::dist_Mahalanobis(x1, meanB, covInvB);
    cout << "x1 = " << endl;
    cout << x1 << endl;
    cout << "x1 to A = " << distx1_A << endl;
    cout << "x1 to B = " << distx1_B << endl;
    cout << "h(x1) = " << distx1_A * distx1_A - distx1_B * distx1_B << endl;

    // x2
    const double distx2_A = dm::dist_Mahalanobis(x2, meanA, covInvA);
    const double distx2_B = dm::dist_Mahalanobis(x2, meanB, covInvB);
    cout << "x2 = " << endl;
    cout << x2 << endl;
    cout << "x2 to A = " << distx2_A << endl;
    cout << "x2 to B = " << distx2_B << endl;
    cout << "h(x2) = " << distx2_A * distx2_A - distx2_B * distx2_B << endl;

    cout << endl;
    cout << "press any key to stop..." << endl;
    char pp;
    cin >> pp;
    return 0;
}

Output

(I) Run output

orsci: Distance Discriminant Analysis...
http://www.orsci.cn

Equal-covariance test flag for the two populations: 1
A =
rowCount = 6 colCount = 2
1.14 1.78
1.18 1.96
1.2 1.86
1.26 2
1.28 2
1.3 1.96

mean(A, 0) =
1.22667 1.92667
The population covariance matrices of A and B tested equal!
S =
rowCount = 2 colCount = 2
0.00754872 0.00664615
0.00664615 0.0133812

SInv =
rowCount = 2 colCount = 2
235.421 -116.928
-116.928 132.808

B =
rowCount = 9 colCount = 2
1.24 1.72
1.36 1.74
1.38 1.64
1.38 1.82
1.38 1.9
1.4 1.7
1.48 1.82
1.54 1.82
1.56 2.08

mean(B, 0) =
1.41333 1.80444
The population covariance matrices of A and B tested equal!

Case 1: computing with equal covariance matrices...
x0 =
1.24 1.8
x0 to A = 1.60238
x0 to B = 2.62594
h(x0) = -4.32792
x1 =
1.5 2.1
x1 to A = 3.24022
x1 to B = 2.71647
h(x1) = 3.11983
x2 =
1.28 1.86
x2 to A = 1.44616
x2 to B = 2.51544
h(x2) = -4.23604

press any key to stop...

(II) Notes:

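(1) Distance discriminant analysis is simple to use and suited to classifying continuous attributes; it is a linear classification method when the classes share a common covariance matrix, as in this example.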

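(2) The dm data analysis and data mining package provides several interfaces for distance discriminant analysis, including distanceDiscriminant_auto(...), which returns the result directly so that the caller need not be concerned with the implementation details. Here, however, the steps are called separately to show the overall process of the algorithm, which makes it easier to learn.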

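(3) The orsci-dm package supports data analysis and data mining computation; the companion software orsci can be downloaded and used alongside it.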
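
Note (1) describes the method as linear. A short derivation shows why this holds in the equal-covariance case; it is a standard expansion, written here with $x$ as a row vector, $\bar{x}_A$ and $\bar{x}_B$ as the class means, and $S$ as the common covariance matrix (symbols chosen for this note, with $S^{-1}$ symmetric).

```latex
\begin{aligned}
h(x) &= (x - \bar{x}_A)\, S^{-1} (x - \bar{x}_A)^{\mathsf T}
      - (x - \bar{x}_B)\, S^{-1} (x - \bar{x}_B)^{\mathsf T} \\
     &= -2\, x\, S^{-1} (\bar{x}_A - \bar{x}_B)^{\mathsf T}
      + \bar{x}_A S^{-1} \bar{x}_A^{\mathsf T}
      - \bar{x}_B S^{-1} \bar{x}_B^{\mathsf T} \\
     &= -2 \left( x - \frac{\bar{x}_A + \bar{x}_B}{2} \right)
        S^{-1} (\bar{x}_A - \bar{x}_B)^{\mathsf T}
\end{aligned}
```

The quadratic term $x S^{-1} x^{\mathsf T}$ cancels, so $h(x)$ is linear in $x$ and the decision boundary $h(x) = 0$ is a straight line through the midpoint of the two class means. When the two covariance matrices differ, the quadratic terms no longer cancel and the boundary is in general a quadratic curve.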

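Books: Jiang Wei (姜维). Data Analysis and Data Mining (《数据分析与数据挖掘》); Data Analysis and Data Mining in Practice (《数据分析与数据挖掘实践》).
Software: orsci-dm development kit (C++, Delphi, and C).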