
sklearn.feature_selection.RFE

Date: 2024-04-22 19:50:04

sklearn.feature_selection.RFE(estimator, *, n_features_to_select=None, step=1, verbose=0, importance_getter='auto')

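RFE ranks features using recursive feature elimination.

The method relies on an external estimator that assigns a weight to each feature; think of the coefficients (the β values) of a linear regression. The goal of RFE is to select features by recursively considering smaller and smaller feature sets.

Specifically:

1. First, the estimator is trained on the initial feature set, and the importance of each feature is obtained through a specific attribute (such as coef_ or feature_importances_) or through a callable.

2. The least important features are pruned from the current feature set.

3. This procedure is repeated recursively on the pruned set until the desired number of features is reached.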
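The three steps above can be sketched by hand. The following is only an illustration, not sklearn's implementation: the helper name manual_rfe is hypothetical, it assumes the estimator exposes coef_, and it scores each feature by its squared coefficient.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.svm import SVR


def manual_rfe(estimator, X, y, n_features_to_select, step=1):
    """Illustrative recursive feature elimination for estimators exposing coef_."""
    n_features = X.shape[1]
    support = np.ones(n_features, dtype=bool)  # True = feature still kept
    ranking = np.ones(n_features, dtype=int)   # kept features stay at rank 1

    while support.sum() > n_features_to_select:
        remaining = np.flatnonzero(support)
        # Step 1: train on the current feature set and score every feature.
        estimator.fit(X[:, remaining], y)
        importances = (np.atleast_2d(estimator.coef_) ** 2).sum(axis=0)
        # Step 2: prune the least important feature(s), never overshooting.
        n_drop = min(step, support.sum() - n_features_to_select)
        support[remaining[np.argsort(importances)[:n_drop]]] = False
        # Step 3: every eliminated feature drops one more rank, then repeat.
        ranking[~support] += 1
    return support, ranking


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
support, ranking = manual_rfe(SVR(kernel="linear"), X, y, n_features_to_select=5)
print(support)
print(ranking)
```

Features removed earlier end up with a larger rank, while the features that survive every round keep rank 1.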

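Parameters:

(1) estimator: A supervised learning estimator with a fit method that provides information about feature importance (for example, through coef_ or feature_importances_).

(2) n_features_to_select: int or float, default=None. The number of features to keep. If None, half of the features are kept. An int gives the absolute number of features to keep; a float between 0 and 1 gives the fraction of features to keep.

(3) step: int or float, default=1. If greater than or equal to 1, step is the (integer) number of features to remove at each iteration. If within (0.0, 1.0), step is the percentage of features to remove at each iteration, rounded down.

(4) verbose: int, default=0. Controls the verbosity of the output.

(5) importance_getter: str or callable, default='auto'.

1) If 'auto' (the default), feature importance is taken from the estimator's coef_ or feature_importances_ attribute.

2) If a str, it gives the attribute name/path used to extract feature importance (implemented with attrgetter).

For example: <1> regressor_.coef_ in the case of a TransformedTargetRegressor;

<2> named_steps.clf.feature_importances_ in the case of a sklearn.pipeline.Pipeline whose last step is named clf.

3) If a callable, it overrides the default feature importance getter. The callable is passed the fitted estimator and should return the importance of each feature.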
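As a rough illustration of the string and callable forms of importance_getter, the sketch below uses a synthetic dataset, a Pipeline whose last step is named clf, and a helper called abs_coef. All of these are assumptions made for the example and are not part of the original text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

# String form: an attribute path resolved on the fitted estimator.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
rfe_path = RFE(pipe, n_features_to_select=3, importance_getter="named_steps.clf.coef_")
rfe_path.fit(X, y)


# Callable form: receives the fitted estimator, returns one importance per feature.
def abs_coef(fitted_estimator):
    return np.abs(fitted_estimator.coef_).ravel()


rfe_callable = RFE(LogisticRegression(), n_features_to_select=3, importance_getter=abs_coef)
rfe_callable.fit(X, y)

print(rfe_path.get_support(indices=True))
print(rfe_callable.get_support(indices=True))
```

The two selectors need not agree, because the pipeline standardizes the features before the coefficients are read.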

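Attributes:

(1) classes_: ndarray of shape (n_classes,). The class labels, available only when estimator is a classifier.

For a non-classifier, accessing it raises an error such as:

AttributeError: 'SVR' object has no attribute 'classes_'

(2) estimator_: The fitted estimator used to select features.

(3) n_features_: int. The number of selected features.

(4) n_features_in_: int. The number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

(5) feature_names_in_: ndarray of shape (n_features_in_,). The names of the features seen during fit. Defined only when X has feature names that are all strings.

(6) ranking_: ndarray of shape (n_features,). The feature ranking: ranking_[i] is the rank of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

For example, with ten features in total and five to be selected, ranking_ might look like the following. Every selected feature has rank 1.

array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

(7) support_: ndarray of shape (n_features,). The mask of selected features.

For the same example, the corresponding support_ is:

array([ True, True, True, True, True, False, False, False, False, False])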

An example from the original API documentation:

The following example shows how to retrieve the 5 most informative features in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, n_features_to_select=5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True, False, False, False, False, False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
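A small variation on this example shows how the fitted attributes relate to one another. It is a sketch only: passing the float 0.3 for n_features_to_select is chosen here for illustration and assumes a scikit-learn version that accepts floats for this parameter.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# A float asks for a fraction of the features: 0.3 of 10 features keeps 3.
selector = RFE(SVR(kernel="linear"), n_features_to_select=0.3, step=1).fit(X, y)

print(selector.n_features_)  # number of selected features
print(selector.ranking_)     # rank 1 marks the selected features
print(selector.support_)     # boolean mask, True where ranking_ == 1
```

With step=1 the features are removed one at a time, so the eliminated features receive the distinct ranks 2 through 8.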

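Methods:

The descriptions below are taken from the original sklearn API documentation:

(1) decision_function(X): Compute the decision function of X.

Parameters: X: {array-like or sparse matrix} of shape (n_samples, n_features)

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix is converted to a sparse csr_matrix.

Returns: score: array, shape = [n_samples, n_classes] or [n_samples]

The decision function of the input samples. The order of the classes corresponds to that in the attribute classes_. Regression and binary classification produce an array of shape [n_samples].

*Not every estimator has a decision function. With SVR, for example:

'SVR' object has no attribute 'decision_function'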
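The output shapes described above can be checked with a short sketch. LogisticRegression and the iris dataset are choices made for this example and do not appear in the original text.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVR

X, y = load_iris(return_X_y=True)

# Three classes: one column of scores per class.
multi = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
multi_scores = multi.decision_function(X)

# Two classes only: a one-dimensional array of scores.
mask = y != 2
binary = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X[mask], y[mask])
binary_scores = binary.decision_function(X[mask])

# The method is only exposed when the wrapped estimator provides it.
regressor = RFE(SVR(kernel="linear"), n_features_to_select=2).fit(X, y)
has_decision_function = hasattr(regressor, "decision_function")

print(multi_scores.shape, binary_scores.shape, has_decision_function)
```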

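(2) fit(X, y, **fit_params): Fit the RFE model, then fit the underlying estimator on the selected features.

Parameters: 1) X: {array-like, sparse matrix} of shape (n_samples, n_features). The training input samples.

2) y: array-like of shape (n_samples,). The target values.

3) **fit_params: dict. Additional parameters passed to the fit method of the underlying estimator. They can usually be omitted.

Returns: self: object. The fitted estimator.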

(3) fit_transform(X, y=None, **fit_params): Fit to data, then transform it. Fits the transformer to X and y with the optional parameters fit_params and returns a transformed version of X.

Parameters (much the same as for fit):

1) X: array-like of shape (n_samples, n_features). Input samples.

2) y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None. Target values (None for unsupervised transformations).

3) **fit_params: dict. Additional fit parameters.

Returns: X_new: ndarray of shape (n_samples, n_features_new). The transformed array.

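(4) get_feature_names_out(input_features=None): Mask the feature names according to the selected features.

Parameters: input_features: array-like of str or None, default=None. The input features.

There are two cases: 1) If input_features is None, feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"]. 2) If input_features is an array-like and feature_names_in_ is defined, input_features must match feature_names_in_.

Returns: feature_names_out: ndarray of str objects. The transformed feature names.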

(5) get_params(deep=True): Get the parameters of this estimator.

Parameters: deep: bool, default=True. If True, the parameters of this estimator and of the sub-objects it contains that are themselves estimators are returned.

Returns: params: dict. Parameter names mapped to their values.

*Example:

In[]: selector.get_params()
Out[]: {'estimator__C': 1.0,
 'estimator__cache_size': 200,
 'estimator__coef0': 0.0,
 'estimator__degree': 3,
 'estimator__epsilon': 0.1,
 'estimator__gamma': 'scale',
 'estimator__kernel': 'linear',
 'estimator__max_iter': -1,
 'estimator__shrinking': True,
 'estimator__tol': 0.001,
 'estimator__verbose': False,
 'estimator': SVR(kernel='linear'),
 'importance_getter': 'auto',
 'n_features_to_select': None,
 'step': 1,
 'verbose': 0}

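(6) get_support(indices=False): Get a mask, or the integer indices, of the selected features.

Parameters: indices: bool, default=False

If True, the return value is an array of integers rather than a boolean mask.

Returns: support: array

An index that selects the retained features from a feature vector.

1) If indices=False, this is a boolean array of shape [# input features] in which an element is True if the corresponding feature is selected for retention.

2) If indices=True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

*The difference between False and True:

In[]: selector.get_support(indices=False)
Out[]: array([ True, True, True, True, True, False, False, False, False, False])

In[]: selector.get_support(indices=True)
Out[]: array([0, 1, 2, 3, 4], dtype=int64)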

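(7) inverse_transform(X): Reverse the transformation operation.

Parameters: X: array of shape [n_samples, n_selected_features]

Returns: X_r: array of shape [n_samples, n_original_features]. X with columns of zeros inserted where features would have been removed by transform.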
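The round trip through transform and inverse_transform can be sketched as follows, again on the Friedman #1 data from the API example. The variable names are chosen for this illustration.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFE(SVR(kernel="linear"), n_features_to_select=5).fit(X, y)

X_reduced = selector.transform(X)                   # keeps only the selected columns
X_restored = selector.inverse_transform(X_reduced)  # back to the original width

print(X_reduced.shape, X_restored.shape)
# Removed features come back as all-zero columns; the original values are lost.
print(np.flatnonzero((X_restored == 0).all(axis=0)))
```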

(8) predict(X): Reduce X to the selected features and predict using the estimator. (Details omitted.)

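(9) predict_log_proba(X): Predict class log-probabilities for X.

Parameters: X: array of shape [n_samples, n_features]

Returns: p: array of shape (n_samples, n_classes). The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

(10) predict_proba(X): Predict class probabilities for X.

Parameters: X: {array-like or sparse matrix} of shape (n_samples, n_features). The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix is converted to a sparse csr_matrix.

Returns: p: array of shape (n_samples, n_classes). The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.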
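Both probability methods are only available when the wrapped estimator is a classifier that provides them. The sketch below assumes LogisticRegression on the iris dataset, neither of which comes from the original text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)

proba = selector.predict_proba(X)          # one column per entry of classes_
log_proba = selector.predict_log_proba(X)  # element-wise log of the same values

print(selector.classes_)
print(proba[:3])
```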

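(11) score(X, y, **fit_params): Reduce X to the selected features and return the score of the estimator.

Parameters:

X: array of shape [n_samples, n_features]. The input samples.

y: array of shape [n_samples]. The target values.

**fit_params: dict. Parameters to pass to the score method of the underlying estimator.

Returns: score: float. The score of the underlying estimator, computed with the selected features returned by rfe.transform(X) and y.

*In practice the return value is just a single score:

In[]: selector.score(X,y)
Out[]: 0.675124732630447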

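(12) set_output(*, transform=None): Set the output container.

Parameters: transform: {"default", "pandas"}, default=None

Configures the output of transform and fit_transform:

"default": the default output format of the transformer;

"pandas": DataFrame output;

None: the transform configuration is unchanged.

Returns: self: estimator instance. The estimator instance.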

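(13) set_params(**params): Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Parameters: **params: dict. Estimator parameters.

Returns: self: estimator instance. The estimator instance.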
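Because RFE wraps another estimator, the <component>__<parameter> form applies to it as well. A minimal sketch, with parameter values chosen arbitrarily for illustration:

```python
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

selector = RFE(SVR(kernel="linear"), n_features_to_select=5)

# Update one of RFE's own parameters and one parameter of the wrapped SVR.
selector.set_params(n_features_to_select=3, estimator__C=0.5)

params = selector.get_params()
print(params["n_features_to_select"], params["estimator__C"])
```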

(14) transform(X): Reduce X to the selected features.

Parameters: X: array of shape [n_samples, n_features]. The input samples.

Returns: X_r: array of shape [n_samples, n_selected_features]. The input samples with only the selected features.

An example that accompanies the API documentation:

RFE can be used to rank the importance of the pixels in an image:

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
import matplotlib.pyplot as plt

# Load the digits dataset
digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Create the RFE object and rank each pixel
svc = SVC(kernel="linear", C=1)
rfe = RFE(estimator=svc, n_features_to_select=1, step=1)
rfe.fit(X, y)
ranking = rfe.ranking_.reshape(digits.images[0].shape)

# Plot pixel ranking
plt.matshow(ranking, cmap=plt.cm.Blues)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()
