DMX - An Introduction to SQL Server Data Mining, Part 2
SQL Server provides the following data mining schema rowsets, which can be queried for metadata:
DMSCHEMA_MINING_SERVICES
DMSCHEMA_MINING_SERVICE_PARAMETERS
DMSCHEMA_MINING_MODELS
DMSCHEMA_MINING_COLUMNS
DMSCHEMA_MINING_MODEL_CONTENT
DMSCHEMA_MINING_FUNCTIONS
DMSCHEMA_MINING_STRUCTURES
DMSCHEMA_MINING_STRUCTURE_COLUMNS
DMSCHEMA_MINING_MODEL_XML
DMSCHEMA_MINING_MODEL_PMML
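In SQL Server 2008 and later, these schema rowsets can also be read with an ordinary query against the $system schema, for example SELECT * FROM $system.DMSCHEMA_MINING_SERVICES. The minimal Python sketch below only builds that query text and rejects names that are not in the list above. The helper name schema_rowset_query is hypothetical, and actually running the query requires an Analysis Services client library, which is not shown.

```python
# The ten data mining schema rowsets listed above.
MINING_SCHEMA_ROWSETS = (
    "DMSCHEMA_MINING_SERVICES",
    "DMSCHEMA_MINING_SERVICE_PARAMETERS",
    "DMSCHEMA_MINING_MODELS",
    "DMSCHEMA_MINING_COLUMNS",
    "DMSCHEMA_MINING_MODEL_CONTENT",
    "DMSCHEMA_MINING_FUNCTIONS",
    "DMSCHEMA_MINING_STRUCTURES",
    "DMSCHEMA_MINING_STRUCTURE_COLUMNS",
    "DMSCHEMA_MINING_MODEL_XML",
    "DMSCHEMA_MINING_MODEL_PMML",
)


def schema_rowset_query(rowset: str) -> str:
    """Return query text that reads every column of one mining schema rowset.

    Hypothetical helper: it only builds the string and never contacts a server.
    """
    name = rowset.upper()
    if name not in MINING_SCHEMA_ROWSETS:
        raise ValueError(f"unknown mining schema rowset: {rowset}")
    return f"SELECT * FROM $system.{name}"


# Usage: print the queries for the first two rowsets.
for r in MINING_SCHEMA_ROWSETS[:2]:
    print(schema_rowset_query(r))
```

Validating against a fixed list keeps arbitrary text out of the FROM clause, since the rowset name cannot be passed as a query parameter.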
These queries show that SQL Server provides 9 algorithms (the service_name column of DMSCHEMA_MINING_SERVICES):
service_name
Microsoft_Association_Rules
Microsoft_Clustering
Microsoft_Decision_Trees
Microsoft_Naive_Bayes
Microsoft_Neural_Network
Microsoft_Sequence_Clustering
Microsoft_Time_Series
Microsoft_Linear_Regression
Microsoft_Logistic_Regression
In practice there are only 7: Microsoft_Linear_Regression is a variant of Microsoft_Decision_Trees, and Microsoft_Logistic_Regression is a variant of Microsoft_Neural_Network.
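The count of 7 can be reproduced from the service list. In this Python sketch, the VARIANT_OF mapping encodes only the two relationships stated above; the names SERVICES, VARIANT_OF and base_algorithm are illustrative and are not part of any SQL Server API.

```python
# Service names as returned in the service_name column above.
SERVICES = [
    "Microsoft_Association_Rules",
    "Microsoft_Clustering",
    "Microsoft_Decision_Trees",
    "Microsoft_Naive_Bayes",
    "Microsoft_Neural_Network",
    "Microsoft_Sequence_Clustering",
    "Microsoft_Time_Series",
    "Microsoft_Linear_Regression",
    "Microsoft_Logistic_Regression",
]

# Services that are variants of another algorithm, as stated in the text.
VARIANT_OF = {
    "Microsoft_Linear_Regression": "Microsoft_Decision_Trees",
    "Microsoft_Logistic_Regression": "Microsoft_Neural_Network",
}


def base_algorithm(service: str) -> str:
    """Map a service to the algorithm it is built on (itself if not a variant)."""
    return VARIANT_OF.get(service, service)


# Collapse variants onto their base algorithm and count what remains.
distinct = sorted({base_algorithm(s) for s in SERVICES})
print(len(SERVICES), len(distinct))  # 9 7
```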
You can also see which input content types each algorithm supports, for example:
service_name supported_input_content_types
Microsoft_Association_Rules Cyclical,Discrete,Discretized,Key,Table,Ordered
Microsoft_Association_Rules therefore does not support continuous data: Continuous is not in its list.
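The supported_input_content_types value is a comma-separated string, so checking whether an algorithm accepts a given content type reduces to a membership test. A Python sketch using the Microsoft_Association_Rules value shown above; the helper names content_types and accepts are hypothetical.

```python
def content_types(value: str) -> frozenset:
    """Split a comma-separated content-type list into a set of names."""
    return frozenset(part.strip() for part in value.split(",") if part.strip())


def accepts(value: str, content_type: str) -> bool:
    """True if the content-type list contains content_type (case-insensitive)."""
    wanted = content_type.strip().lower()
    return any(t.lower() == wanted for t in content_types(value))


association_rules = "Cyclical,Discrete,Discretized,Key,Table,Ordered"
print(accepts(association_rules, "Discrete"))    # True
print(accepts(association_rules, "Continuous"))  # False
```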
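The rowsets also list the functions each algorithm supports. For example, Microsoft_Naive_Bayes supports the following functions (Predict and StructureColumn each appear twice, presumably because the rowset returns one row per function signature):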
function_name
Predict
Predict
PredictAdjustedProbability
PredictAssociation
PredictHistogram
PredictNodeId
PredictProbability
PredictSupport
$AdjustedProbability
$NodeId
$Probability
$Support
BottomCount
BottomPercent
BottomSum
IsDescendent
RangeMax
RangeMid
RangeMin
TopCount
TopPercent
TopSum
IsTrainingCase
IsTestCase
Exists
StructureColumn
StructureColumn
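To get this list for a single algorithm rather than for every service, the query can be restricted on the service name. The Python sketch below builds such a query. It assumes that the $system query syntax accepts DISTINCT and a simple WHERE equality, and that DMSCHEMA_MINING_FUNCTIONS exposes SERVICE_NAME and FUNCTION_NAME columns; under those assumptions DISTINCT would collapse the repeated Predict and StructureColumn rows. The helper name functions_query is hypothetical.

```python
import re

# Plain identifiers only, e.g. Microsoft_Naive_Bayes.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def functions_query(service_name: str) -> str:
    """Build a query listing each function name of one mining service once."""
    # Rejecting anything but a plain identifier means the name can be placed
    # inside the string literal without any escaping rules.
    if not _IDENTIFIER.fullmatch(service_name):
        raise ValueError(f"not a plain service name: {service_name!r}")
    return (
        "SELECT DISTINCT FUNCTION_NAME "
        "FROM $system.DMSCHEMA_MINING_FUNCTIONS "
        f"WHERE SERVICE_NAME = '{service_name}'"
    )


print(functions_query("Microsoft_Naive_Bayes"))
```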
You can likewise see the parameters each algorithm accepts. Microsoft_Clustering, for example, accepts the following parameters:
parameter_name
CLUSTER_COUNT
CLUSTER_SEED
CLUSTERING_METHOD
MAXIMUM_INPUT_ATTRIBUTES
MAXIMUM_STATES
MINIMUM_SUPPORT
MODELLING_CARDINALITY
SAMPLE_SIZE
STOPPING_TOLERANCE
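These parameters are supplied in the USING clause when a mining model is created, as in USING Microsoft_Clustering (CLUSTER_COUNT = 8). The Python sketch below builds that clause and rejects parameter names that are not in the list above. It accepts numeric values only and does not check value ranges; the helper name using_clause and the example values are illustrative assumptions.

```python
# Parameter names reported for Microsoft_Clustering (list above).
CLUSTERING_PARAMETERS = frozenset({
    "CLUSTER_COUNT", "CLUSTER_SEED", "CLUSTERING_METHOD",
    "MAXIMUM_INPUT_ATTRIBUTES", "MAXIMUM_STATES", "MINIMUM_SUPPORT",
    "MODELLING_CARDINALITY", "SAMPLE_SIZE", "STOPPING_TOLERANCE",
})


def using_clause(algorithm: str, allowed: frozenset, params: dict) -> str:
    """Build 'USING <algorithm> (NAME = value, ...)' after checking names and types."""
    unknown = sorted(set(params) - allowed)
    if unknown:
        raise ValueError(f"unsupported parameters for {algorithm}: {unknown}")
    for name, value in params.items():
        # Numeric values only; range checking is left to the server.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be numeric, got {value!r}")
    if not params:
        return f"USING {algorithm}"
    args = ", ".join(f"{name} = {value}" for name, value in params.items())
    return f"USING {algorithm} ({args})"


print(using_clause("Microsoft_Clustering", CLUSTERING_PARAMETERS,
                   {"CLUSTER_COUNT": 8, "CLUSTER_SEED": 0}))
# USING Microsoft_Clustering (CLUSTER_COUNT = 8, CLUSTER_SEED = 0)
```

Checking names against the list returned by DMSCHEMA_MINING_SERVICE_PARAMETERS catches typos before the statement is sent to the server.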