8-5 Neural Network (NN) Training with the High-Level TensorFlow API: II (Keras Preparation Stage)

codingart (66)in #kr • 7 years ago (edited)

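The loaded data can be fed to machine learning directly, but standardizing (normalizing) it first can make the results more accurate.
After standardizing the loaded MNIST training and test data, we delete the original arrays, since they take up a lot of computer memory.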

Note that X_train and X_test hold a very large number of values, whether before or after standardization, so take care not to print them.

From here on we build the neural network (NN). We prepare two placeholders, tf_x and tf_y, so that the standardized data prepared above can be fed in at the Session execution stage, and we build a multilayer perceptron. In the layers, however, we avoid the sigmoid function and use the hyperbolic tangent function instead.

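For a real variable z over -∞ < z < +∞, the sigmoid function takes values between 0 and 1, while the hyperbolic tangent takes values between -1 and 1. The derivatives of both are bell-shaped, but the maximum of the sigmoid derivative is 0.25, whereas that of the hyperbolic tangent is 1.0. With 10 hidden layers, the multiplication effect therefore shrinks the sigmoid factor to about 10^-6 (0.25^10 ≈ 9.5 × 10^-7), so it practically loses its meaning in the backpropagation computation, while the hyperbolic tangent fares better. This issue was settled by Prof. Hinton with the ReLU function. Since the slope of ReLU is discontinuous at z = 0, note that it can be replaced by z∙Sigmoid(z), whose derivative is continuous, whose shape is similar, and which gives similar results.
The final output layer uses Softmax to obtain the probabilities.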
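
The numbers quoted above can be checked with a few lines of NumPy. This is only a sketch on a grid of z values; the grid range and the helper name sigmoid are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 2001)   # grid that passes through z = 0

d_sigmoid = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of sigmoid
d_tanh = 1.0 - np.tanh(z) ** 2                # derivative of tanh

print('max sigmoid derivative:', d_sigmoid.max())
print('max tanh derivative:', d_tanh.max())
print('0.25 ** 10 =', 0.25 ** 10)

# z * sigmoid(z): the smooth alternative to ReLU mentioned in the text
smooth = z * sigmoid(z)
relu = np.maximum(z, 0.0)
print('largest gap to ReLU on the grid:', np.abs(smooth - relu).max())
```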

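Set the neural network parameters in the header section. Each standardized sample has 784 features, a number taken from shape[1] of X_train_centered. The value of shape[0] is the number of samples, at most 60000, so the corresponding placeholder dimension is set to None. The number of values fed into the final Softmax probability calculation is the number of classes to be classified, so we set n_classes = 10. To obtain the same result on every run, the random seed is set to an arbitrary integer. Even so, keep in mind that the results will vary if the batch samples are drawn from MNIST at random.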
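
To see what the seed does and does not guarantee, here is a small NumPy-only sketch (it does not touch TensorFlow's own seed). Resetting the seed replays the same random sequence, while drawing again without a reset continues the sequence and gives a different ordering. The variable names first, second and third are hypothetical.

```python
import numpy as np

random_seed = 123

np.random.seed(random_seed)
first = np.random.permutation(10)

np.random.seed(random_seed)          # resetting the seed replays the sequence
second = np.random.permutation(10)

third = np.random.permutation(10)    # no reset: the sequence simply continues

print(np.array_equal(first, second))
print(first, third)
```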

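Now set up the computational graph. After the header section and before entering the Session, TensorFlow requires that a Graph be constructed, and a distinctive feature is that the Graph can be visualized with TensorBoard when needed. PyTorch, by contrast, does not ask for a separate Graph construction step; the code reads like ordinary Python and the graph is left dynamic. Then again, what code does not have a computational graph?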

If the data fed through the placeholder is already a one-dimensional array of 784 elements per sample, no separate reshaping is needed, so of tf.layers.flatten and tf.layers.dense only the latter is required.
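
For comparison, this is roughly what flattening would amount to if the images arrived as 28 x 28 arrays instead. The sketch uses a plain NumPy reshape rather than tf.layers.flatten, and the batch of 5 synthetic images is an assumption for illustration.

```python
import numpy as np

# Hypothetical batch of 5 grayscale images, 28 x 28 pixels each
images_2d = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28)

# Flattening keeps the batch axis and merges the remaining axes
images_flat = images_2d.reshape(images_2d.shape[0], -1)

print(images_2d.shape, '->', images_flat.shape)
```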

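In the hidden layers of a wide and deep neural network, the number of rows of the two-dimensional weight matrix must match the number of elements of the input vector it multiplies, whereas the number of columns is a kind of DOF (degree of freedom). In other words, it can be chosen freely, so it can be determined empirically by running the computation several times.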
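
The shape rule can be made concrete with a NumPy forward pass. This is a sketch under assumptions, not the TensorFlow layer implementation: the random weights, the 0.01 scale and the batch of 4 samples are placeholders, while the widths 784, 50, 50 and 10 follow the code in this post.

```python
import numpy as np

rng = np.random.RandomState(123)
n_features, n_hidden, n_classes = 784, 50, 10

x = rng.randn(4, n_features)   # a hypothetical batch of 4 samples

# rows = size of the incoming vector, columns = freely chosen layer width
W1 = rng.randn(n_features, n_hidden) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.randn(n_hidden, n_hidden) * 0.01
b2 = np.zeros(n_hidden)
W3 = rng.randn(n_hidden, n_classes) * 0.01
b3 = np.zeros(n_classes)

h1 = np.tanh(x @ W1 + b1)      # (4, 784) @ (784, 50) -> (4, 50)
h2 = np.tanh(h1 @ W2 + b2)     # (4, 50) @ (50, 50)   -> (4, 50)
logits = h2 @ W3 + b3          # (4, 50) @ (50, 10)   -> (4, 10)

print(W1.shape, W2.shape, W3.shape)
print(h1.shape, h2.shape, logits.shape)
```

Changing n_hidden alters only the column count of W1 and the shapes downstream of it; the row count of W1 stays tied to the 784 input features.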

The present network code has two hidden layers, h1 and h2, whose outputs are passed through the hyperbolic tangent, i.e. tanh, as the activation. The last layer needs no activation function because Softmax is applied to it.
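
Why the last layer can stay linear is easy to see in a small NumPy sketch: Softmax turns the raw outputs (logits) into probabilities that sum to 1, and it does not change which class has the largest value. The softmax helper and the example logits are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, -1.0]])
probs = softmax(logits)

print(probs.sum(axis=1))
print(np.argmax(logits, axis=1), np.argmax(probs, axis=1))
```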

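Introduce the cost function and set up the optimizer that applies gradient descent. We also pass the learning rate to the optimizer and define the operator that will initialize the global variables at the Session execution stage.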
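
To illustrate what the cost and one optimizer step do, here is a NumPy sketch of a mean softmax cross-entropy and a single gradient descent update. It assumes a tiny linear softmax model on random data instead of the network in this post, and the learning rate of 0.1, the helper names and the data sizes are all hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y_onehot):
    # Mean over samples of -sum_k y_k * log(p_k)
    probs = softmax(X @ W)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

rng = np.random.RandomState(1)
X = rng.randn(20, 5)                # 20 samples, 5 features
y = rng.randint(0, 3, size=20)      # integer class labels 0..2
y_onehot = np.eye(3)[y]             # one-hot encoding, shape (20, 3)
W = np.zeros((5, 3))

learning_rate = 0.1
cost_before = cross_entropy(W, X, y_onehot)

# Gradient of the mean cross-entropy with respect to W
grad = X.T @ (softmax(X @ W) - y_onehot) / X.shape[0]
W = W - learning_rate * grad        # one gradient descent step

cost_after = cross_entropy(W, X, y_onehot)
print(cost_before, cost_after)
```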

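Write a routine that prepares batch data for the Session run. A batch size of around 100 is common, but the user may choose it. For the X data we use the X_train_centered prepared above; the y data is class information and has nothing to do with standardization, so y_train is used as is. Shuffling does not seem strictly necessary here; if it is needed, set shuffle=True.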
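
As an alternative sketch of such a routine, shuffling can also be done by permuting row indices, which keeps X and y paired without stacking them into one array. The function name batch_generator_sketch, its seed parameter and the demo arrays are hypothetical and are not part of the code in this post.

```python
import numpy as np

def batch_generator_sketch(X, y, batch_size=100, shuffle=False, seed=None):
    indices = np.arange(X.shape[0])
    if shuffle:
        # Shuffle the row indices so that X and y stay aligned
        np.random.RandomState(seed).shuffle(indices)
    for start in range(0, X.shape[0], batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

# Demo data: row i of X_demo is [2*i, 2*i + 1] and its label is i
X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)
y_demo = np.arange(10)

batches = list(batch_generator_sketch(X_demo, y_demo, batch_size=4,
                                      shuffle=True, seed=0))
print([bx.shape[0] for bx, _ in batches])
```

The last batch is simply shorter when the number of samples is not a multiple of the batch size.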

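Once the preparation for the Session is complete, run it. Launch the computational Graph g and initialize the global variables with init_op.
Then, inside the epoch loop, training_costs is declared as a list and the generator produces batches of 64 samples each, as set in the code. For every batch, train_op updates the weights by backpropagation and the cost is computed; the cost values are collected and, when the batch loop ends, averaged and printed. The batch loop is then run again, for a total of 50 epochs of training.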
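
The bookkeeping of this loop can be worked out with simple arithmetic. The sketch below assumes the full 60000 MNIST training samples together with the batch size and epoch count used in the code.

```python
import math

n_samples, batch_size, n_epochs = 60000, 64, 50

batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * n_epochs

print('batches per epoch:', batches_per_epoch)
print('size of the last batch:', last_batch_size)
print('weight updates over all epochs:', total_updates)
```

So the printed average for each epoch is taken over one cost value per batch, not over 64 values.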

Building on this step-by-step modularization of the neural network workflow, we will next look at how Keras handles the same process.
#Pre Neural Network for Keras

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import sys
import gzip
import shutil
import os
import struct
import tensorflow.contrib.keras as keras

#Training neural networks efficiently with high-level TensorFlow APIs
#Building multilayer neural networks using TensorFlow's Layers API

# unzips mnist

if (sys.version_info > (3, 0)):
    writemode = 'wb'
else:
    writemode = 'w'

zipped_mnist = [f for f in os.listdir('./') if f.endswith('ubyte.gz')]
for z in zipped_mnist:
    with gzip.GzipFile(z, mode='rb') as decompressed, open(z[:-3], writemode) as outfile:
        outfile.write(decompressed.read())

def load_mnist(path, kind='train'):
    """Load MNIST data from path"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte' % kind)

    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
        labels = np.fromfile(lbpath,
                             dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII",
                                               imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)
        # scale pixel values from [0, 255] to [-1, 1]
        images = ((images / 255.) - .5) * 2

    return images, labels

# loading the data

X_train, y_train = load_mnist('.', kind='train')
print('Rows: %d, Columns: %d' %(X_train.shape[0], X_train.shape[1]))

X_test, y_test = load_mnist('.', kind='t10k')
print('Rows: %d, Columns: %d' %(X_test.shape[0], X_test.shape[1]))

# mean centering and normalization:

mean_vals = np.mean(X_train, axis=0)
std_val = np.std(X_train)

X_train_centered = (X_train - mean_vals)/std_val
X_test_centered = (X_test - mean_vals)/std_val

del X_train, X_test

print(X_train_centered.shape, y_train.shape)
print(X_test_centered.shape, y_test.shape)

n_features = X_train_centered.shape[1]
n_classes = 10
random_seed = 123
np.random.seed(random_seed)

g = tf.Graph()
with g.as_default():
    tf.set_random_seed(random_seed)
    tf_x = tf.placeholder(dtype=tf.float32, shape=(None, n_features), name='tf_x')

    tf_y = tf.placeholder(dtype=tf.int32, shape=None, name='tf_y')
    y_onehot = tf.one_hot(indices=tf_y, depth=n_classes)

    h1 = tf.layers.dense(inputs=tf_x, units=50, activation=tf.tanh, name='layer1')
    h2 = tf.layers.dense(inputs=h1, units=50, activation=tf.tanh, name='layer2')

    logits = tf.layers.dense(inputs=h2, units=10, activation=None, name='layer3')

    predictions = {'classes': tf.argmax(logits, axis=1, name='predicted_classes'),
                   'probabilities': tf.nn.softmax(logits, name='softmax_tensor')}

# define cost function and optimizer:

with g.as_default():
    cost = tf.losses.softmax_cross_entropy(
        onehot_labels=y_onehot, logits=logits)

    optimizer = tf.train.GradientDescentOptimizer(
        learning_rate=0.001)

    train_op = optimizer.minimize(loss=cost)

    init_op = tf.global_variables_initializer()

def create_batch_generator(X, y, batch_size=128, shuffle=False):
    X_copy = np.array(X)
    y_copy = np.array(y)

    if shuffle:
        data = np.column_stack((X_copy, y_copy))
        np.random.shuffle(data)
        X_copy = data[:, :-1]
        y_copy = data[:, -1].astype(int)

    for i in range(0, X.shape[0], batch_size):
        yield (X_copy[i:i+batch_size, :], y_copy[i:i+batch_size])

# create a session to launch the graph

sess = tf.Session(graph=g)
sess.run(init_op)

for epoch in range(50):
    training_costs = []
    batch_generator = create_batch_generator(X_train_centered, y_train,
                                             batch_size=64)
    for batch_X, batch_y in batch_generator:
        ## prepare a dict to feed data to our network:
        feed = {tf_x: batch_X, tf_y: batch_y}
        _, batch_cost = sess.run([train_op, cost], feed_dict=feed)
        training_costs.append(batch_cost)
    print(' -- Epoch %2d ''Avg. Training Loss: %.4f' % (
        epoch+1, np.mean(training_costs)))

# do prediction on the test set:

feed = {tf_x : X_test_centered}
y_pred = sess.run(predictions['classes'], feed_dict=feed)
print('Test Accuracy: %.2f%%' % ( 100*np.sum(y_pred == y_test)/y_test.shape[0]))
