Logistic Regression on the UCI Wine Quality Data with Python
Downloading the dataset
https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
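If you would rather not download the file by hand, pandas can read a CSV directly from a URL. The small helper below only builds the file URL from the directory above. The helper name `wine_csv_url` is hypothetical, and it assumes the directory still hosts `winequality-red.csv` and `winequality-white.csv`.

```python
from urllib.parse import urljoin

# Directory linked above (assumed to still host the two CSV files).
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"


def wine_csv_url(color: str) -> str:
    """Hypothetical helper: URL of the red or white wine CSV inside BASE_URL."""
    if color not in ("red", "white"):
        raise ValueError("color must be 'red' or 'white'")
    return urljoin(BASE_URL, f"winequality-{color}.csv")


print(wine_csv_url("white"))
```

With network access, `pd.read_csv(wine_csv_url("white"), sep=";")` should load the same data that the code below reads from a local `winequality-white.csv`.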
UCI Wine Quality Data set
Attribute information (attributes 1 to 11 are the input features; 12 is the output variable)
- 1 - fixed acidity
- 2 - volatile acidity
- 3 - citric acid
- 4 - residual sugar
- 5 - chlorides
- 6 - free sulfur dioxide
- 7 - total sulfur dioxide
- 8 - density
- 9 - pH
- 10 - sulphates
- 11 - alcohol
- 12 - quality (score between 0 and 10)
Implementation
Inspecting the CSV data
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# The file is semicolon-separated; load every column as float.
wine_data = pd.read_csv('winequality-white.csv', delimiter=';', dtype=float)
wine_data.head(10)
Splitting features and target, and recoding the quality variable
# All columns except the last are features; the last column (quality) is the target.
x_data = wine_data.iloc[:, 0:-1]
y_data = wine_data.iloc[:, -1]
# Recode the score: 0 if it is below 7, 1 if it is 7 or higher.
y_data = np.array([1 if i>=7 else 0 for i in y_data])
x_data.head(5)
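The list comprehension above can also be written as a single vectorized comparison. The sketch below applies the same ">= 7" rule and reports the share of 1s; checking that share is worthwhile because accuracy is hard to interpret when one class dominates. The function name `binarize_quality` is hypothetical and the scores are made up for illustration.

```python
import numpy as np


def binarize_quality(scores, threshold=7):
    """Map quality scores to 1 if score >= threshold, else 0 (same rule as above)."""
    scores = np.asarray(scores)
    return (scores >= threshold).astype(int)


# Made-up scores, only to illustrate the mapping.
labels = binarize_quality([5, 6, 7, 8, 3, 6])
positive_rate = labels.mean()  # share of "good" (1) labels
print(labels, positive_rate)
```

On the real data, `binarize_quality(y_data).mean()` would give the proportion of wines labeled 1.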
# Split into training (70%) and test (30%) sets.
train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)
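As a rough picture of what the hold-out split does: shuffle the row indices, then put the first 30% in the test set. The sketch below is a simplified stand-in (the name `shuffle_split` is hypothetical); it rounds the test share up, but it does not reproduce scikit-learn's exact shuffling, so its indices will not match `random_state=42`. If the 0/1 proportions should be preserved in both parts, `train_test_split` also accepts `stratify=y_data`.

```python
import math

import numpy as np


def shuffle_split(n_samples, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) for a random hold-out split (simplified sketch)."""
    n_test = math.ceil(test_size * n_samples)  # round the test share up
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)          # shuffled row indices
    return perm[n_test:], perm[:n_test]


train_idx, test_idx = shuffle_split(10)
print(len(train_idx), len(test_idx))  # 7 3
```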
Training the logistic regression model
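# The default max_iter=100 may stop the lbfgs solver before it converges on these unscaled features.
log_reg = LogisticRegression(max_iter=1000)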
log_reg.fit(train_x, train_y)
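To show what `fit` optimizes, here is a minimal gradient-descent sketch that minimizes the mean log-loss on synthetic data. This is a swapped-in technique, not what scikit-learn runs: its default is the lbfgs solver with an L2 penalty (C=1.0), so coefficients will differ. The features are standardized on purpose; features on very different scales, as in the raw wine data, make gradient-based solvers converge more slowly. All function names here are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic_gd(X, y, lr=0.1, n_iter=2000):
    """Unregularized logistic regression fitted by batch gradient descent (sketch)."""
    X1 = add_intercept(X)
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ w)               # predicted P(y = 1)
        grad = X1.T @ (p - y) / len(y)    # gradient of the mean log-loss
        w -= lr * grad
    return w


def predict(w, X, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold."""
    return (sigmoid(add_intercept(X) @ w) >= threshold).astype(int)


# Synthetic standardized data (not the wine data): the label follows x0 - x1 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

w = fit_logistic_gd(X, y)
train_acc = np.mean(predict(w, X) == y)
print("weights:", w, "train accuracy:", train_acc)
```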
Performance evaluation
y_pred = log_reg.predict(test_x)
print("Train accuracy:", log_reg.score(train_x, train_y))
# Fraction of correct test predictions (equivalent to log_reg.score(test_x, test_y)).
print("Test accuracy:", np.mean(y_pred == test_y))
from sklearn.metrics import classification_report
# Per-class precision, recall, F1 and support on the test set
print(classification_report(test_y, y_pred))
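Each row of `classification_report` is built from true-positive, false-positive and false-negative counts for one class. The sketch below recomputes precision, recall, F1 and support by hand on toy labels (the function name and data are hypothetical). For class 1 (quality >= 7), recall is the share of good wines the model actually finds. Where a ratio would divide by zero, the sketch simply returns 0.0.

```python
import numpy as np


def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return {class: (precision, recall, f1, support)} computed from TP/FP/FN counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rows = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, really c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, really another class
        fn = np.sum((y_pred != c) & (y_true == c))  # missed members of c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[c] = (float(precision), float(recall), float(f1), int(np.sum(y_true == c)))
    return rows


# Toy labels for illustration only.
y_true_demo = [0, 0, 0, 1, 0, 1, 0, 0]
y_pred_demo = [0, 0, 0, 0, 0, 1, 0, 1]
report = per_class_report(y_true_demo, y_pred_demo)
print(report[1])  # (0.5, 0.5, 0.5, 2)
```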
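MLE (maximum likelihood fit with statsmodels)
# Unlike scikit-learn, sm.Logit adds no intercept by default, so add a constant column first.
train_x_const = sm.add_constant(train_x)
logit = sm.Logit(train_y, train_x_const).fit()
print(logit.summary())
Odds ratios
# Exponentiated coefficients are odds ratios.
print(np.exp(logit.params))
params = logit.params
conf = logit.conf_int()
conf['OR'] = params
conf.columns = ['2.5%', '97.5%', 'OR']
# Exponentiating the 95% confidence bounds of the coefficients gives the bounds for the odds ratios.
print(np.exp(conf))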
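`sm.Logit(...).fit()` maximizes the log-likelihood numerically (to my knowledge statsmodels uses Newton's method by default for this model). The NumPy sketch below is a from-scratch Newton-Raphson version on synthetic data, not statsmodels' own code. Standard errors come from the inverse of X^T W X with W = diag(p(1 - p)); the odds ratios and 95% Wald intervals are exp(beta) and exp(beta ± 1.96·se), which is what exponentiating `conf_int()` produces. Function names and the simulated coefficients are hypothetical.

```python
import numpy as np


def logit_mle(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson MLE for logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # X^T W X (observed information)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def odds_ratios(beta, se, z=1.959963984540054):
    """Odds ratios exp(beta) with Wald 95% bounds exp(beta -/+ z * se)."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)


# Synthetic data (not the wine data); the ones column plays the role of sm.add_constant.
rng = np.random.default_rng(1)
x = rng.normal(size=(400, 2))
true_beta = np.array([-0.5, 1.0, -2.0])
X = np.column_stack([np.ones(400), x])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta, se = logit_mle(X, y)
or_, lo, hi = odds_ratios(beta, se)
print("coef:", beta, "OR:", or_)
```

An odds ratio above 1 means the odds of a "good" wine rise as that feature increases, holding the others fixed; an interval that includes 1 means the direction is not clearly established.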