티스토리 뷰
반응형
sklearn.compose에 있는 컬럼 트랜스포머와 셀렉터 사용법
StandardScaler, OneHotEncoder 동시적용
selector 사용시 dtype_include 지정 np.number or object
OneHotEncoder 사용시 sparse False옵션
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.model_selection import train_test_split
fuel = pd.read_csv('../input/dl-course-data/fuel.csv')
X = fuel.copy()
# Remove target
y = X.pop('FE')
preprocessor = make_column_transformer(
(StandardScaler(),
make_column_selector(dtype_include=np.number)),
(OneHotEncoder(sparse=False),
make_column_selector(dtype_include=object)),
)
X = preprocessor.fit_transform(X)
OneHotEncoder
from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
[0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
[None, 2]], dtype=object)
>>> enc.get_feature_names_out(['gender', 'group'])
array(['gender_Female', 'gender_Male', 'group_1', 'group_2', 'group_3'], ...)
One can always drop the first column for each feature:
>>> drop_enc = OneHotEncoder(drop='first').fit(X)
>>> drop_enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> drop_enc.transform([['Female', 1], ['Male', 2]]).toarray()
array([[0., 0., 0.],
[1., 1., 0.]])
반응형
'python & DS' 카테고리의 다른 글
RandomForestClassifier (1) | 2023.11.18 |
---|---|
pandas AI (0) | 2023.11.18 |
dbscan (0) | 2023.08.24 |
missing value 처리 방식 실제사례 (0) | 2023.08.14 |
IoT3 (0) | 2022.07.31 |
공지사항
최근에 올라온 글
최근에 달린 댓글
- Total
- Today
- Yesterday
링크
TAG
- geometry
- 통계
- Paper
- amazon
- Neo4j
- LECTURE
- 파이썬
- Shell
- 기본
- mongodb
- MachineLearning
- LeetCode
- tag hello
- quiz
- extract archive multiple files
- linkedin-skill-assessments-quizzes
- nvidia #gan
- ML
- pytorch
- 자바
- 알고리즘
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | |
7 | 8 | 9 | 10 | 11 | 12 | 13 |
14 | 15 | 16 | 17 | 18 | 19 | 20 |
21 | 22 | 23 | 24 | 25 | 26 | 27 |
28 | 29 | 30 | 31 |
글 보관함