•  Batch Normalization
  • If the learning rate is set too high, gradients explode/vanish or training falls into poor local minima. This is caused by the scale of the parameters. With Batch Normalization, backpropagation becomes insensitive to the parameter scale, so a larger learning rate can be used, which makes training faster (see the numeric sketch after this list).
  • Batch Normalization also has a regularization effect of its own.
  • It is applied before the activation function.
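A minimal NumPy sketch (my own illustration, not from the paper) of the scale point above: normalizing the pre-activations over the mini-batch gives the same output whether the weights are W or 10*W, so the effect of parameter scale disappears.

import numpy as np

def normalize(z, eps=1e-5):
    # standardize each feature over the mini-batch dimension (axis 0)
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

x = np.random.randn(32, 4)           # mini-batch of 32 samples, 4 features
W = np.random.randn(4, 8)
print(np.allclose(normalize(x @ W), normalize(x @ (10.0 * W))))  # True (up to eps)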

Algorithm
  1. Compute the mean and variance over each mini-batch.
  2. Normalize using that mean and variance.
  3. Multiply by gamma and add beta to scale and shift the result.

*gamma and beta are trainable parameters, so they are updated through backprop.
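A minimal NumPy sketch of the three steps above (gamma and beta are shown at their usual initial values; in a real network they are learned by backprop):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                     # 1. mini-batch mean
    var = x.var(axis=0)                     #    mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # 2. normalize
    return gamma * x_hat + beta             # 3. scale (gamma) and shift (beta)

x = np.random.randn(32, 100)                # mini-batch of 32 samples, 100 features
gamma, beta = np.ones(100), np.zeros(100)   # trainable parameters
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # per-feature mean ~0, std ~1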





Training
  • During training, normalize using the mean and variance computed over each mini-batch.
  • Store the mean and variance computed for each mini-batch.
  • These statistics are accumulated as a moving average.
Test
  • At test time, normalize using the averaged mean and variance accumulated during training.
  • In addition, the stored variance is multiplied by m/(m-1). Because the population variance is estimated from mini-batch variances rather than from the full training data, Bessel's correction is applied as a statistical adjustment.
-> Bessel's correction: an adjustment that makes the expected value of the sample variance equal the population variance.
-> If the population variance is estimated by simply dividing by m, the estimate is systematically smaller than the true population variance; dividing by m-1 instead removes this bias (a small sketch follows below).
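A rough sketch of how this fits together (illustrative only; the momentum value and batch size m are assumptions, and gamma/beta are omitted): during training the batch statistics are folded into a moving average, and at test time the stored variance gets the m/(m-1) correction.

import numpy as np

momentum, eps, m = 0.9, 1e-5, 32              # assumed momentum and mini-batch size
running_mean, running_var = np.zeros(100), np.ones(100)

def bn_train_step(x):
    global running_mean, running_var
    mu, var = x.mean(axis=0), x.var(axis=0)   # biased batch variance (divides by m)
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return (x - mu) / np.sqrt(var + eps)      # training-time normalization

def bn_test(x):
    unbiased_var = running_var * m / (m - 1)  # Bessel's correction
    return (x - running_mean) / np.sqrt(unbiased_var + eps)

for _ in range(100):                          # simulate training batches
    bn_train_step(np.random.randn(m, 100) * 2 + 3)
print(bn_test(np.random.randn(10, 100) * 2 + 3).std())  # roughly 1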

Batch Norm in Convolutions
  • In convolutional layers, batch norm is likewise applied before the activation function. A layer normally computes Wx+b, but the beta term of batch norm plays the same role as the bias b in Wx+b, so the bias is dropped (see the check below).
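A quick NumPy check of that claim (my own illustration): subtracting the per-channel mean cancels any constant bias, so batch norm of (conv output + b) equals batch norm of the conv output alone.

import numpy as np

def bn_per_channel(z, eps=1e-5):
    # normalize over batch and spatial dimensions (N, H, W) for each channel
    mu = z.mean(axis=(0, 1, 2), keepdims=True)
    var = z.var(axis=(0, 1, 2), keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

conv_out = np.random.randn(8, 28, 28, 32)    # stand-in for a conv output (N, H, W, C)
b = np.random.randn(32)                       # per-channel bias
print(np.allclose(bn_per_channel(conv_out), bn_per_channel(conv_out + b)))  # True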

Caution when writing Tensorflow code.
  • tf.layers.batch_normalization(training=) takes a training parameter -> set it to True during training and to False for test and validation. However, if you only set this and run the model, the test and validation results come out wrong. The reason is that the moving mean and variance computed by the batch norm ops during training have to be updated, and this does not happen automatically. In other words, the update ops must be run manually, as shown below!

# Define a placeholder for the training flag.
batch_prob = tf.placeholder(tf.bool)

tf.layers.batch_normalization(training=batch_prob)

# Run the moving mean/variance update ops before every optimizer step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Also pass the placeholders needed by the update ops through feed_dict.
feed_dict = {X: batch[0], Y: batch[1], batch_prob: True}

MNIST Code

import mnist_function
import tensorflow as tf
import pandas as pd
import numpy as np

# set data count
count = [400, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]
# load data set
train_x, train_y, test_x, test_y , vali_x, vali_y = mnist_function.data_set(count)
print('train test vali')
print(len(train_x),len(test_x),len(vali_x))

#train data shuffle
train_x, train_y = mnist_function.data_shuffle(train_x, train_y)
# set parameters
batch_size = 32
learning_rate = 0.001
training_epochs = 5

# Network Model.
tf.set_random_seed(777)

#keep_prob = tf.placeholder(tf.float32)
batch_prob = tf.placeholder(tf.bool)

X = tf.placeholder(tf.float32, [None, 784])
X_img = tf.reshape(X, [-1, 28, 28, 1])
Y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')
L1 = tf.layers.batch_normalization(L1, center=True, scale=True, training=batch_prob)
L1 = tf.nn.relu(L1)
L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

W2 = tf.Variable(tf.random_normal([3, 3, 32, 64], stddev=0.01))
L2 = tf.nn.conv2d(L1, W2, strides=[1, 1, 1, 1], padding='SAME')
L2 = tf.layers.batch_normalization(L2, center=True, scale=True, training=batch_prob)
L2 = tf.nn.relu(L2)
L2 = tf.nn.max_pool(L2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

W3 = tf.Variable(tf.random_normal([3, 3, 64, 128], stddev=0.01))
L3 = tf.nn.conv2d(L2, W3, strides=[1, 1, 1, 1], padding='SAME')
L3 = tf.layers.batch_normalization(L3, center=True, scale=True, training=batch_prob)
L3 = tf.nn.relu(L3)
L3 = tf.nn.max_pool(L3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
L3_flat = tf.reshape(L3, [-1, 128 * 4 * 4])

W4 = tf.get_variable("W4", shape=[128 * 4 * 4, 100], initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([100]))

L4 = tf.layers.batch_normalization(L3_flat , center=True, scale=True, training=batch_prob)
L4 = tf.nn.relu(tf.matmul(L4 , W4) + b4)

W5 = tf.get_variable("W5", shape=[100, 10], initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
L5 = tf.layers.batch_normalization(L4, center=True, scale=True, training=batch_prob)
logits = tf.matmul(L5, W5) + b5
y_pred = tf.nn.softmax(logits)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print('Learning start')

train_total_loss = []
train_total_acc = []
validation_total_acc = []
validation_total_loss = []

for epoch in range(training_epochs):
    avg_loss = 0
    avg_acc = 0

    feature_train_li = []
    train_y_li = []

    total_batch = int(len(train_x) / batch_size)

    start_index = 0
    finish_index = batch_size
    for i in range(total_batch):
        batch = mnist_function.next_batch(start_index, finish_index, train_x, train_y)

        start_index += batch_size
        finish_index += batch_size
        # , keep_prob: 0.7
        feed_dict = {X: batch[0], Y: batch[1], batch_prob: True}
        train_loss, _, feature_train, y_label_train, train_acc = sess.run([cost, optimizer, L4, Y, accuracy], feed_dict=feed_dict)

        avg_loss += train_loss / total_batch
        avg_acc += train_acc / total_batch

        if(epoch + 1 == training_epochs):
            feature_train_li.append(feature_train)
            train_y_li.append(y_label_train)

    # vali_x, vali_y
    vali_loss, vali_acc = sess.run([cost, accuracy], feed_dict={X: vali_x, Y: vali_y, batch_prob: False})

[Result plots: with Batch Normalization (left) vs. without (right).]

Tensorflow API - https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization

Paper - https://arxiv.org/pdf/1502.03167.pdf




NIN tensorflow cifar-10
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.datasets.cifar10 import load_data


def next_batch(num, data, labels):
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]

    return np.asarray(data_shuffle), np.asarray(labels_shuffle)


def CNN_Layers(x):

    x_image = x

    # MLP Layer 1
    W1 = tf.Variable(tf.random_normal([5, 5, 3, 192], stddev=0.01))
    b1 = tf.Variable(tf.random_normal([192], stddev=0.01, dtype=tf.float32))
    L1 = tf.nn.conv2d(x_image, W1, strides=[1, 1, 1, 1], padding='SAME') + b1
    L1 = tf.nn.relu(L1)

    W2 = tf.Variable(tf.random_normal([1, 1, 192, 160], stddev=0.05, dtype=tf.float32))
    b2 = tf.Variable(tf.constant(0, shape=[160], dtype=tf.float32))
    L2 = tf.nn.conv2d(L1, W2, strides=[1, 1, 1, 1], padding='SAME') + b2
    L2 = tf.nn.relu(L2)

    W3 = tf.Variable(tf.random_normal([1, 1, 160, 96], stddev=0.05, dtype=tf.float32))
    b3 = tf.Variable(tf.constant(0, shape=[96], dtype=tf.float32))
    L3 = tf.nn.conv2d(L2, W3, strides=[1, 1, 1, 1], padding='SAME') + b3
    L3 = tf.nn.relu(L3)

    L3 = tf.nn.max_pool(L3, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    L3 = tf.nn.dropout(L3, keep_prob)  # use the keep_prob placeholder so dropout can be disabled at test time

    # MLP Layer 2
    W4 = tf.Variable(tf.random_normal([5, 5, 96, 192], stddev=0.05, dtype=tf.float32))
    b4 = tf.Variable(tf.random_normal([192], stddev=0.01, dtype=tf.float32))
    L4 = tf.nn.conv2d(L3, W4, strides=[1, 1, 1, 1], padding='SAME') + b4
    L4 = tf.nn.relu(L4)

    W5 = tf.Variable(tf.random_normal([1, 1, 192, 192], stddev=0.05, dtype=tf.float32))
    b5 = tf.Variable(tf.constant(0, shape=[192], dtype=tf.float32))
    L5 = tf.nn.conv2d(L4, W5, strides=[1, 1, 1, 1], padding='SAME') + b5
    L5 = tf.nn.relu(L5)

    W6 = tf.Variable(tf.random_normal([1, 1, 192, 192], stddev=0.05, dtype=tf.float32))
    b6 = tf.Variable(tf.constant(0, shape=[192], dtype=tf.float32))
    L6 = tf.nn.conv2d(L5, W6, strides=[1, 1, 1, 1], padding='SAME') + b6
    L6 = tf.nn.relu(L6)

    L6 = tf.nn.max_pool(L6, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    L6 = tf.nn.dropout(L6, keep_prob)  # use the keep_prob placeholder so dropout can be disabled at test time

    # MLP Layer 3
    W7 = tf.Variable(tf.random_normal([3, 3, 192, 192], stddev=0.05, dtype=tf.float32))
    b7 = tf.Variable(tf.random_normal([192], stddev=0.01, dtype=tf.float32))
    L7 = tf.nn.conv2d(L6, W7, strides=[1, 1, 1, 1], padding='SAME') + b7
    L7 = tf.nn.relu(L7)

    W8 = tf.Variable(tf.random_normal([1, 1, 192, 192], stddev=0.05, dtype=tf.float32))
    b8 = tf.Variable(tf.constant(0, shape=[192], dtype=tf.float32))
    L8 = tf.nn.conv2d(L7, W8, strides=[1, 1, 1, 1], padding='SAME') + b8
    L8 = tf.nn.relu(L8)

    W9 = tf.Variable(tf.random_normal([1, 1, 192, 10], stddev=0.05, dtype=tf.float32))
    b9 = tf.Variable(tf.constant(0, shape=[10], dtype=tf.float32))
    L9 = tf.nn.conv2d(L8, W9, strides=[1, 1, 1, 1], padding='SAME') + b9
    L9 = tf.nn.relu(L9)
    # global average pooling over the 8x8 feature map -> one value per class channel
    output = tf.nn.avg_pool(L9, ksize=[1, 8, 8, 1], strides=[1, 1, 1, 1], padding='VALID')

    output = tf.reshape(output, [-1, 1 * 1 * 10])
    logits = output
    y_pred = tf.nn.softmax(logits)

    return y_pred, logits

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
y = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)

(x_train, y_train), (x_test, y_test) = load_data()
y_train_one_hot = tf.squeeze(tf.one_hot(y_train, 10), axis=1)
y_test_one_hot = tf.squeeze(tf.one_hot(y_test, 10), axis=1)

y_pred, logits = CNN_Layers(x)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_step = tf.train.RMSPropOptimizer(1e-4).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())

    for i in range(10000):
        batch = next_batch(128, x_train, y_train_one_hot.eval())

        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})
            loss_print = loss.eval(feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})

            print("Step: %d, Accuracy: %f, loss: %f" % (i, train_accuracy, loss_print))
        sess.run(train_step, feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})

    test_batch = next_batch(10000, x_test, y_test_one_hot.eval())
    print("Test Data Set Accuracy: %f" % accuracy.eval(feed_dict={x: test_batch[0], y: test_batch[1], keep_prob: 1.0}))



Dataset download

https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/


UCI Wine Quality Data set

Attribute information

  • 1 - fixed acidity
  • 2 - volatile acidity
  • 3 - citric acid
  • 4 - residual sugar
  • 5 - chlorides
  • 6 - free sulfur dioxide
  • 7 - total sulfur dioxide
  • 8 - density
  • 9 - pH
  • 10 - sulphates
  • 11 - alcohol
  • 12 - quality (score between 0 and 10)

Code implementation


Inspect the CSV data.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc

wine_data = pd.read_csv('winequality-white.csv', delimiter=';', dtype=float)

wine_data.head(10)



Split the data and recode the quality variable.


x_data = wine_data.iloc[:, 0:-1]

y_data = wine_data.iloc[:, -1]


# Recode the score: 0 if below 7, 1 if 7 or above.

y_data = np.array([1 if i >= 7 else 0 for i in y_data])

x_data.head(5)


# Split into train and test sets.

train_x, test_x, train_y, test_y = train_test_split(x_data, y_data, test_size=0.3, random_state=42)




Building the GaussianNB model


from sklearn.naive_bayes import GaussianNB


gnb = GaussianNB()

gnb.fit(train_x,train_y)


Performance evaluation


#Predict

y_pred_train = gnb.predict(train_x)

y_pred_test = gnb.predict(test_x)

y_pred_test2 = gnb.predict_proba(test_x)


print("Train Data:", accuracy_score(train_y, y_pred_train))

print("Test Data" , accuracy_score(test_y, y_pred_test))


# Confusion matrix

confusion = confusion_matrix(test_y,y_pred_test)

print("confusion_matrix\n{}".format(confusion))


y_true, y_pred = test_y, gnb.predict(test_x)

print(classification_report(y_true, y_pred))



# Roc Curve

fpr, tpr, thresholds = roc_curve(test_y, y_pred_test2[:,1], pos_label=1)

roc_auc = auc(fpr, tpr)

plt.title('Receiver Operating Characteristic')

plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)

plt.legend(loc = 'lower right')

plt.plot([0, 1], [0, 1],'r--')

plt.xlim([0, 1])

plt.ylim([0, 1])

plt.ylabel('True Positive Rate')

plt.xlabel('False Positive Rate')

plt.show()






Building the BernoulliNB model


from sklearn.naive_bayes import BernoulliNB


bnb = BernoulliNB()

bnb.fit(train_x,train_y)


Performance evaluation


#Predict

y_pred_train = bnb.predict(train_x)

y_pred_test = bnb.predict(test_x)

y_pred_test2 = bnb.predict_proba(test_x)


print("Train Data:", accuracy_score(train_y, y_pred_train))

print("Test Data" , accuracy_score(test_y, y_pred_test))


# Confusion matrix

confusion = confusion_matrix(test_y,y_pred_test)

print("confusion_matrix\n{}".format(confusion))


y_true, y_pred = test_y, bnb.predict(test_x)

print(classification_report(y_true, y_pred))



# Roc Curve

fpr, tpr, thresholds = roc_curve(test_y, y_pred_test2[:,1], pos_label=1)

roc_auc = auc(fpr, tpr)

plt.title('Receiver Operating Characteristic')

plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)

plt.legend(loc = 'lower right')

plt.plot([0, 1], [0, 1],'r--')

plt.xlim([0, 1])

plt.ylim([0, 1])

plt.ylabel('True Positive Rate')

plt.xlabel('False Positive Rate')

plt.show()




PROXY Pattern

  • Controls access to another object through a surrogate.
  • Provides the same interface as the object it stands in for.
  • Variants: remote proxy, virtual proxy, protection proxy, and smart reference (smart pointer).




#include <iostream>

using namespace std;

class Subject{
public:
    virtual void request() = 0;
    virtual ~Subject() = default;  // virtual destructor so delete through Subject* is safe
};

class RealSubject : public Subject{
public:
    void request() { cout << "RealSubject Request" << endl; }
};

class Proxy : public Subject{
public:
    Proxy() : _realsubject(nullptr) {}
    ~Proxy() { if(_realsubject) delete _realsubject; }
    void request(){
        // create the real subject lazily, only when it is first needed
        if(!_realsubject)
            _realsubject = new RealSubject;
        _realsubject->request();
    }
private:
    RealSubject* _realsubject;
};

int main(){
    Subject* Proxyptr = new Proxy();
    Proxyptr->request();
    delete Proxyptr;
    return 0;
}

FLYWEIGHT Pattern

  • Supports a large number of fine-grained objects through sharing.
  • Useful for saving memory.
  • Typically implemented with a BTree or map.


#include <iostream>
#include <map>

using namespace std;

class Flyweight{
public:
    virtual void operation() = 0;
};

class UnsharedConcreteFlyweight : public Flyweight{
public:
    void operation() override { cout << "Unshared" << endl; }
};

class ConcreteFlyweight : public Flyweight{
public:
    void operation() override { cout << "Share" << endl; }
};

class FlyweightFactory{
public:
    Flyweight* getFlyweight(int key){
        if(_map.find(key) == _map.end()){
            _map[key] = new ConcreteFlyweight;
        }
        return _map[key];
    }
private:
    map<int, Flyweight*> _map;
};

int main(){
    FlyweightFactory factory;
    Flyweight* _flyweight = factory.getFlyweight(1);
    _flyweight->operation();
    return 0;
}

