First, we import the necessary libraries:
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
Next, we load the Iris dataset and do some light preprocessing:
```python
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['class'] = iris.target
# Map the integer labels (0, 1, 2) to the species names
df['class'] = df['class'].map({
    0: iris.target_names[0],
    1: iris.target_names[1],
    2: iris.target_names[2],
})
df.describe()
```
Then we separate the data into features and labels, and split it into training and test sets:
```python
x = iris.data
y = iris.target.reshape(-1, 1)
# Hold out 30% for testing; stratify keeps the class proportions equal in both sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=35, stratify=y
)
```
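As a quick sanity check on the split above, the sketch below counts how many samples of each class land in each set. It repeats the loading and splitting steps so it runs on its own; the variable names `train_counts` and `test_counts` are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data
y = iris.target.reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=35, stratify=y
)

# Count samples per class in each set (bincount needs a 1-D integer array)
train_counts = np.bincount(y_train.ravel())
test_counts = np.bincount(y_test.ravel())
print(x_train.shape, x_test.shape)
print(train_counts, test_counts)
```

Because Iris has 50 samples per class, a stratified 70/30 split should put the same number of samples from every class into each set.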
Now let's implement the core of the KNN classifier:
```python
def l1_distance(a, b):
    # Manhattan distance between each row of a and the vector b
    return np.sum(np.abs(a - b), axis=1)


def l2_distance(a, b):
    # Euclidean distance between each row of a and the vector b
    return np.sqrt(np.sum((a - b) ** 2, axis=1))


class KNN(object):
    def __init__(self, n_neighbors=1, dist_func=l1_distance):
        self.n_neighbors = n_neighbors
        self.dist_func = dist_func

    # Train the model (KNN simply stores the training data)
    def fit(self, x, y):
        self.x_train = x
        self.y_train = y

    # Predict labels for new data
    def predict(self, x):
        y_predict = np.zeros((x.shape[0], 1), dtype=self.y_train.dtype)
        # Loop over the input points
        for i, x_test in enumerate(x):
            distances = self.dist_func(self.x_train, x_test)
            # Sort by distance, nearest first
            sorted_indices = np.argsort(distances)
            # Take the labels of the k nearest points
            k_nearest_labels = self.y_train[sorted_indices[:self.n_neighbors]]
            # Find the most frequent class among them
            most_frequent_label = np.argmax(np.bincount(k_nearest_labels.flatten()))
            # Store the prediction
            y_predict[i] = most_frequent_label
        return y_predict
```
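To make the distance and voting steps concrete, here is a minimal sketch that walks through them for a single query point. The toy points, labels, and the names `x_toy`, `y_toy`, and `query` are made up for illustration (they are not from the Iris data) and are chosen so the result is easy to verify by hand.

```python
import numpy as np

# Hypothetical toy training set: two points near the origin (class 0),
# three points far away (class 1)
x_toy = np.array([[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_toy = np.array([0, 0, 1, 1, 1])
query = np.array([1, 0])
k = 3

# L1 (Manhattan) distance from the query to every training point
distances = np.sum(np.abs(x_toy - query), axis=1)

# Indices of the k closest training points
nearest = np.argsort(distances)[:k]

# Majority vote among the labels of those k points
votes = np.bincount(y_toy[nearest])
predicted = int(np.argmax(votes))

print(distances[nearest], y_toy[nearest], predicted)
```

With `k = 3`, the three nearest neighbors carry the labels 0, 0, and 1, so the vote goes to class 0 even though class 1 has more training points overall. Note that `np.argmax` breaks ties in favor of the smallest label, which is one reason an odd `k` is a common choice.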
That covers the core of the KNN classifier: loading the data, preprocessing it, and implementing the algorithm itself. Hope this helps.