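How to use the column transformer and column selector from sklearn.compose
Apply StandardScaler and OneHotEncoder in a single step
When using the selector, set dtype_include to np.number or object
When using OneHotEncoder, pass sparse=False to get a dense array (the parameter is named sparse_output from scikit-learn 1.2 onward)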
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.model_selection import train_test_split

fuel = pd.read_csv('../input/dl-course-data/fuel.csv')
X = fuel.copy()
# Remove target
y = X.pop('FE')

# Scale numeric columns and one-hot encode object (string) columns
preprocessor = make_column_transformer(
    (StandardScaler(),
     make_column_selector(dtype_include=np.number)),
    # sparse_output=False returns a dense array (scikit-learn >= 1.2; older versions use sparse=False)
    (OneHotEncoder(sparse_output=False),
     make_column_selector(dtype_include=object)),
)
X = preprocessor.fit_transform(X)
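Since fuel.csv may not be at hand, the same pattern can be tried on a tiny made-up table. This is only a sketch: the DataFrame and its column names are hypothetical, and the `dense` helper dict is an assumption added here so the snippet runs whether the installed scikit-learn calls the option `sparse` or `sparse_output`.

```python
import inspect

import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for fuel.csv
df = pd.DataFrame({
    "EngDispl": [2.0, 3.5, 1.6, 5.0],
    "NumCyl": [4, 6, 4, 8],
    "Transmission": pd.Series(["auto", "manual", "auto", "auto"], dtype=object),
})

# Pick whichever dense-output keyword this scikit-learn version accepts
params = inspect.signature(OneHotEncoder).parameters
dense = {"sparse_output": False} if "sparse_output" in params else {"sparse": False}

preprocessor = make_column_transformer(
    # Numeric columns (int and float) are standardized
    (StandardScaler(), make_column_selector(dtype_include=np.number)),
    # Object columns are one-hot encoded into dense 0/1 columns
    (OneHotEncoder(**dense), make_column_selector(dtype_include=object)),
)

Xt = preprocessor.fit_transform(df)
print(Xt.shape)  # 2 scaled columns + 2 one-hot columns
```

The two numeric columns come out with zero mean, and the single object column expands into one 0/1 column per category.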
OneHotEncoder
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
       [None, 2]], dtype=object)
>>> enc.get_feature_names_out(['gender', 'group'])
array(['gender_Female', 'gender_Male', 'group_1', 'group_2', 'group_3'], ...)
One can always drop the first column for each feature:
>>> drop_enc = OneHotEncoder(drop='first').fit(X)
>>> drop_enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> drop_enc.transform([['Female', 1], ['Male', 2]]).toarray()
array([[0., 0., 0.],
       [1., 1., 0.]])
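The handle_unknown='ignore' setting used in the session above is what lets the unseen group value 4 pass through as all zeros. As a rough sketch of the difference, the snippet below compares it with the default handle_unknown='error' on the same data; the variable names `strict`, `lenient`, `raised`, and `row` are made up for this illustration.

```python
from sklearn.preprocessing import OneHotEncoder

X = [['Male', 1], ['Female', 3], ['Female', 2]]

# Default behaviour (handle_unknown='error'): an unseen category raises
strict = OneHotEncoder().fit(X)
try:
    strict.transform([['Male', 4]])
    raised = False
except ValueError:
    raised = True

# handle_unknown='ignore': the unseen value 4 is encoded as all zeros
lenient = OneHotEncoder(handle_unknown='ignore').fit(X)
row = lenient.transform([['Male', 4]]).toarray()

print(raised)
print(row)
```

This matters mostly when the encoder is fit on training data only and the test data later contains a category the encoder never saw.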