Recognizing handwritten hiragana with Caffe

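Have I given up on clustering Girls' Generation?

My earlier posts did not explain the Caffe network well enough, so today I will go through it in detail.

I was planning to wrap up the Caffe posts, but the Girls' Generation clustering posts never made it very clear what was actually going on, and I doubt many people are thrilled to have a way of telling the members of Girls' Generation apart. So I will explain things again with a different theme.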

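This time the theme is handwritten character recognition, and I will describe how to configure the learner.
I will try to explain things so that they also apply to image clustering on other themes, so please read on if that interests you.

If you have not seen the Girls' Generation face recognition posts, please see the summary post below, which links to everything I have written about Caffe so far.

Summary of and supplement to my Caffe posts so far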
Ry0 Note http://ry0.github.io/blog/2015/09/30/using-summary-of-caffe/

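First, set up a workspace

Inside the caffe folder there should be a folder called examples. Create the workspace there. The CIFAR-10 model will serve as the template for the new model, so copy its files over.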

cd caffe/examples
mkdir <your_folder_name>
cp cifar10/cifar10_quick* <your_folder_name>

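From here on, work inside the folder you just created. The paths in the rest of this post use handwriting_recognition as its name.

Next, build the dataset

Training is impossible without one, so either get an existing dataset from somewhere or build your own.
The posts linked above also cover how to build a dataset.
Since the example here is handwritten hiragana recognition, there are 73 classes, one per hiragana character (including the voiced and semi-voiced ones). The labels run in order from 0 for 「あ」 to 72 for 「ぽ」.

Following that procedure, I created the training and test data in leveldb format: tegaki_cifar10_train_leveldb and tegaki_cifar10_test_leveldb.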

I created the mean image as follows:

cd ../../ # move to the top of the Caffe folder
build/tools/compute_image_mean.bin -backend=leveldb ./examples/handwriting_recognition/tegaki_cifar10_train_leveldb ./examples/handwriting_recognition/tegaki_mean.binaryproto

If this succeeds, tegaki_mean.binaryproto is generated in examples/handwriting_recognition/.
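To make it a little clearer what a mean image is, here is a minimal pure-Python sketch. It assumes tiny grayscale images stored as nested lists. The function names mean_image and subtract_mean are made up for this illustration and are not part of Caffe. The idea is only that every pixel position is averaged over the whole training set, and that the result is later subtracted from each input image.

```python
def mean_image(images):
    """Return the per-pixel mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    total = [[0.0] * width for _ in range(height)]
    for img in images:
        for y in range(height):
            for x in range(width):
                total[y][x] += img[y][x]
    count = len(images)
    return [[value / count for value in row] for row in total]


def subtract_mean(image, mean):
    """Subtract the mean image from one image, pixel by pixel."""
    return [[p - m for p, m in zip(row, mean_row)]
            for row, mean_row in zip(image, mean)]


# Two hypothetical 2x2 images standing in for a training set.
toy_images = [
    [[0, 255], [255, 0]],
    [[255, 255], [0, 0]],
]
mean = mean_image(toy_images)
print(mean)  # [[127.5, 255.0], [127.5, 0.0]]
print(subtract_mean(toy_images[0], mean))  # [[-127.5, 0.0], [127.5, 0.0]]
```

The real tool reads the leveldb and writes a binaryproto file, so this is only a picture of the arithmetic, not a replacement for the command.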

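Editing cifar10_quick_solver.prototxt

In the listings below, a line starting with - (red in the original post) is replaced by the line starting with + (green).
First, rename cifar10_quick_solver.prototxt to whatever you like. I used tegaki_cifar10_quick_solver.prototxt.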

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10

# The train/test net protocol buffer definition
- net: "examples/cifar10/cifar10_quick_train_test.prototxt"
+ net: "examples/handwriting_recognition/tegaki_cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
- test_iter: 100
+ test_iter: 20
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
- base_lr: 0.001
+ base_lr: 0.0001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
- max_iter: 4000
+ max_iter: 10000
# snapshot intermediate results
- snapshot: 4000
+ snapshot: 5000
- snapshot_prefix: "examples/cifar10/cifar10_quick"
+ snapshot_prefix: "examples/handwriting_recognition/tegaki_cifar10_quick_with_dropout"
# solver mode: CPU or GPU
solver_mode: GPU

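test_iter
The number of iterations (batches) of test data used for one accuracy evaluation during training.
base_lr
How far learning advances in a single step. I have read that a smaller value gives better accuracy, so I lowered it by one order of magnitude.
max_iter
The number of training iterations. I set it higher than in the CIFAR-10 model.
snapshot
How often intermediate training results are written out.
snapshot_prefix
The name prefix used for those intermediate files.
solver_mode
Change this to CPU if you are not using a GPU.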
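As a quick sanity check on these numbers, the arithmetic can be sketched in a few lines of Python. The dictionary below simply copies the values from the solver file above, and test_batch_size is assumed to be 100, the batch_size of the TEST data layer in this post's network definition. The variable names are only for illustration.

```python
# Values copied from the edited solver file.
solver = {"test_iter": 20, "max_iter": 10000, "snapshot": 5000}

# Assumption: batch_size of the TEST data layer in the network definition.
test_batch_size = 100

# One accuracy evaluation runs test_iter forward passes of one batch each.
images_per_evaluation = solver["test_iter"] * test_batch_size

# Iterations at which a snapshot is written with these settings.
snapshot_iterations = list(
    range(solver["snapshot"], solver["max_iter"] + 1, solver["snapshot"])
)

print(images_per_evaluation)  # 2000
print(snapshot_iterations)    # [5000, 10000]
```

So if your test set has a different size, test_iter times the test batch size is the number to match against it.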

Editing cifar10_quick_train_test.prototxt

Edit this file in the same way. I renamed cifar10_quick_train_test.prototxt to tegaki_cifar10_quick_train_test.prototxt.

These are the changes starting from line 1.

name: "CIFAR10_quick"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
-    mean_file: "examples/cifar10/mean.binaryproto"
+    mean_file: "examples/handwriting_recognition/tegaki_mean.binaryproto"
  }
  data_param {
-    source: "examples/cifar10/cifar10_train_lmdb"
+    source: "examples/handwriting_recognition/tegaki_cifar10_train_leveldb"
    batch_size: 100
-    backend: LMDB
+    backend: LEVELDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
-    mean_file: "examples/cifar10/mean.binaryproto"
+    mean_file: "examples/handwriting_recognition/tegaki_mean.binaryproto"
  }
  data_param {
-    source: "examples/cifar10/cifar10_test_lmdb"
+    source: "examples/handwriting_recognition/tegaki_cifar10_test_leveldb"
    batch_size: 100
-    backend: LMDB
+    backend: LEVELDB
  }
}

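Inside transform_param you can also augment the data, for example by cropping the images at random (e.g. crop_size: 30) or by using horizontally flipped copies for training (e.g. mirror: true). I did not use either this time, but tricks like these are worth adding when you need to get the most out of a small dataset.
The Girls' Generation model does use them, so take a look at it for reference.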
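For readers who want to see what random cropping and mirroring do to an image, here is a rough pure-Python sketch. It is not Caffe's implementation. The function random_crop_and_mirror and the toy 4x4 image are hypothetical, and images are assumed to be nested lists of pixel values.

```python
import random


def random_crop_and_mirror(image, crop_size, mirror, rng):
    """Cut a random crop_size x crop_size patch and flip it half of the time."""
    height, width = len(image), len(image[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must fit inside the image")
    top = rng.randint(0, height - crop_size)    # randint includes both ends
    left = rng.randint(0, width - crop_size)
    patch = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    if mirror and rng.random() < 0.5:
        patch = [row[::-1] for row in patch]    # horizontal flip
    return patch


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[y * 4 + x for x in range(4)] for y in range(4)]
patch = random_crop_and_mirror(image, crop_size=3, mirror=True, rng=rng)
for row in patch:
    print(row)
```

Each call can return a different patch of the same source image, which is why this kind of setting stretches a small dataset.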

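This part starts at line 162. I added a dropout layer to prevent overfitting.
The most important point is the final num_output: set it to the number of classes for your task.
The last change removes the include { phase: TEST } block from the accuracy layer so that accuracy is also reported during training.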

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
+layer {
+  name: "drop1"
+  type: "Dropout"
+  bottom: "ip1"
+  top: "drop1"
+  dropout_param {
+    dropout_ratio: 0.5
+  }
+}
layer {
  name: "ip2"
  type: "InnerProduct"
-  bottom: "ip1"
+  bottom: "drop1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
-    num_output: 10
+    num_output: 73
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
-  include {
-    phase: TEST
-  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
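In case the drop1 layer above looks like magic, here is a small sketch of the idea behind dropout in plain Python. It assumes the common "inverted dropout" formulation, in which the surviving values are scaled up by 1 / (1 - ratio) during training and the layer does nothing at test time. The function dropout and its arguments are names I made up for this sketch.

```python
import random


def dropout(values, ratio, train, rng):
    """Zero each value with probability `ratio` while training; pass through otherwise."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not train:
        return list(values)            # test time: nothing is dropped
    scale = 1.0 / (1.0 - ratio)        # keeps the expected sum unchanged
    return [v * scale if rng.random() >= ratio else 0.0 for v in values]


rng = random.Random(0)
ip1_output = [1.0] * 64                # 64 matches num_output of ip1 above
dropped = dropout(ip1_output, ratio=0.5, train=True, rng=rng)
print(dropped[:8])
print(dropout([1.0, 2.0], ratio=0.5, train=False, rng=rng))  # [1.0, 2.0]
```

With dropout_ratio: 0.5, roughly half of the 64 outputs of ip1 are zeroed on each training pass, so ip2 cannot rely on any single one of them.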

There is a command that renders a prototxt file as a block diagram in PNG format.

python/draw_net.py examples/handwriting_recognition/tegaki_cifar10_quick_train_test.prototxt caffeNet.png --rankdir BT


Editing cifar10_quick.prototxt

I renamed cifar10_quick.prototxt to tegaki_cifar10_quick.prototxt.

Only the final output was changed, to the number of classes I want to predict.

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
-    num_output: 10
+    num_output: 73
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
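The prob layer turns the 73 raw outputs of ip2 into probabilities. As a minimal sketch of that step, here is a softmax in plain Python followed by picking the most likely label. The scores list is invented for the example, and the only label facts used are the ones from this post (73 classes, label 0 is 「あ」).

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical ip2 output: 73 scores, with class 0 clearly the largest.
scores = [0.0] * 73
scores[0] = 5.0

probs = softmax(scores)
best_label = max(range(len(probs)), key=probs.__getitem__)
print(best_label)  # 0, the label this post assigns to 「あ」
```

An application built on the trained model would do the same kind of lookup on the real prob output to decide which character was written.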

Running the training

Now run the training. If you get an error, check again that the file names are correct.

cd ../../ # move to the top of the Caffe folder
build/tools/caffe train --solver examples/handwriting_recognition/tegaki_cifar10_quick_solver.prototxt

Results

Accuracy on the test data came out above 90%, which is a reasonably good result.

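Building a simple app

That alone is not much fun, so I built an app with OpenCV in which you write a character with the mouse and Caffe works out which hiragana it is.

All of the source code and the trained model are on GitHub, so have a look if you are interested.

The README also explains how to run the app, but the commands are: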

cd handwriting_recognition/python
python handwriting-recognition.py

A trained Caffe model is included, so you can try the app without preparing a dataset.
Note that OpenCV is required. It also works with OpenCV 3.0.

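Closing

I tried to write this as carefully as I could. I hope it helps you get at least a little more comfortable with Caffe. If the hiragana recognition program or anything else does not work, please send me feedback.

If you want to know more about what is actually happening when the model is built, please refer to the material that SIG2D has published.
Read it properly and you will definitely understand more than you would from my posts.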

SIG2D’14 葉月ちゃんでも出来る Deep Learning
SIG2D http://sig2d.org/blog/2015/07/02/sig2d14/

These days everyone's attention is shifting to TensorFlow, which Google released, so I wonder what will become of Caffe.
If I have time I would like to try TensorFlow too.
Thank you for reading to the end.