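The previous post covered how to build, install, and configure SSD. This post covers how to train a model on a dataset and test how well the model performs.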
Dataset

First, download the dataset. Here we use VOC 2007/2012, which is 2.7 GB in total to download and 2.9 GB after extraction.
    cd $HOME/data
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    tar -xvf VOCtrainval_11-May-2012.tar
    tar -xvf VOCtrainval_06-Nov-2007.tar
    tar -xvf VOCtest_06-Nov-2007.tar
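Note that the files must be saved in the data directory under your home directory, so that the location matches the path used by the scripts in the next step.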
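Before moving on, it can help to confirm that the archives unpacked where the next step will look for them. The following is a minimal sketch, assuming the archives extract into a VOCdevkit directory that contains VOC2007 and VOC2012, each with Annotations, ImageSets, and JPEGImages subdirectories. The function name missing_voc_dirs is hypothetical and is not part of the SSD scripts.

```python
import os

# Subdirectories expected under each year (an assumption based on the
# usual VOCdevkit layout, not something the SSD scripts report).
EXPECTED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_voc_dirs(data_root, years=("VOC2007", "VOC2012")):
    """Return the expected VOCdevkit directories that do not exist."""
    missing = []
    for year in years:
        for sub in EXPECTED_SUBDIRS:
            path = os.path.join(data_root, "VOCdevkit", year, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```

Calling missing_voc_dirs(os.path.join(os.path.expanduser("~"), "data")) should return an empty list if the layout matches these assumptions.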
LMDB

Create the LMDB files:

    cd $CAFFE_ROOT
    ./data/VOC0712/create_list.sh
    ./data/VOC0712/create_data.sh

Creating the lists prints:
    hyper372@hyper372-ai:~/Documents/caffe$ ./data/VOC0712/create_list.sh
    Create list for VOC2007 trainval...
    Create list for VOC2012 trainval...
    Create list for VOC2007 test...
    I0605 10:07:47.896740 5859 get_image_size.cpp:61] A total of 4952 images.
    I0605 10:07:49.007894 5859 get_image_size.cpp:100] Processed 1000 files.
    I0605 10:07:50.128299 5859 get_image_size.cpp:100] Processed 2000 files.
    I0605 10:07:51.242344 5859 get_image_size.cpp:100] Processed 3000 files.
    I0605 10:07:52.354455 5859 get_image_size.cpp:100] Processed 4000 files.
    I0605 10:07:53.439630 5859 get_image_size.cpp:105] Processed 4952 files.
Creating the database prints:
    hyper372@hyper372-ai:~/Documents/caffe$ ./data/VOC0712/create_data.sh
    /home/hyper372/Documents/caffe/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/home/hyper372/Documents/caffe/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/hyper372/data/VOCdevkit/ /home/hyper372/Documents/caffe/data/VOC0712/../../data/VOC0712/test.txt /home/hyper372/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
    I0605 10:21:20.892621 6031 convert_annoset.cpp:122] A total of 4952 images.
    I0605 10:21:20.893379 6031 db_lmdb.cpp:35] Opened lmdb /home/hyper372/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
    I0605 10:21:23.375008 6031 convert_annoset.cpp:195] Processed 1000 files.
    I0605 10:21:25.777346 6031 convert_annoset.cpp:195] Processed 2000 files.
    I0605 10:21:28.468402 6031 convert_annoset.cpp:195] Processed 3000 files.
    I0605 10:21:31.168674 6031 convert_annoset.cpp:195] Processed 4000 files.
    I0605 10:21:33.670279 6031 convert_annoset.cpp:201] Processed 4952 files.
    /home/hyper372/Documents/caffe/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/home/hyper372/Documents/caffe/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/hyper372/data/VOCdevkit/ /home/hyper372/Documents/caffe/data/VOC0712/../../data/VOC0712/trainval.txt /home/hyper372/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb
    I0605 10:21:34.663084 6100 convert_annoset.cpp:122] A total of 16551 images.
    I0605 10:21:34.663497 6100 db_lmdb.cpp:35] Opened lmdb /home/hyper372/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb
    I0605 10:21:37.976790 6100 convert_annoset.cpp:195] Processed 1000 files.
    I0605 10:21:41.071249 6100 convert_annoset.cpp:195] Processed 2000 files.
    I0605 10:21:44.191231 6100 convert_annoset.cpp:195] Processed 3000 files.
    I0605 10:21:47.320384 6100 convert_annoset.cpp:195] Processed 4000 files.
    I0605 10:21:50.551687 6100 convert_annoset.cpp:195] Processed 5000 files.
    I0605 10:21:53.697355 6100 convert_annoset.cpp:195] Processed 6000 files.
    I0605 10:21:56.773370 6100 convert_annoset.cpp:195] Processed 7000 files.
    I0605 10:21:59.869189 6100 convert_annoset.cpp:195] Processed 8000 files.
    I0605 10:22:02.992766 6100 convert_annoset.cpp:195] Processed 9000 files.
    I0605 10:22:06.083061 6100 convert_annoset.cpp:195] Processed 10000 files.
    I0605 10:22:09.214797 6100 convert_annoset.cpp:195] Processed 11000 files.
    I0605 10:22:12.303718 6100 convert_annoset.cpp:195] Processed 12000 files.
    I0605 10:22:15.529724 6100 convert_annoset.cpp:195] Processed 13000 files.
    I0605 10:22:18.636446 6100 convert_annoset.cpp:195] Processed 14000 files.
    I0605 10:22:21.745776 6100 convert_annoset.cpp:195] Processed 15000 files.
    I0605 10:22:24.911562 6100 convert_annoset.cpp:195] Processed 16000 files.
    I0605 10:22:26.647538 6100 convert_annoset.cpp:201] Processed 16551 files.
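A quick sanity check on this output is that each conversion run finishes with as many processed files as it announced at the start. The sketch below pairs each "A total of N images" line with the last "Processed N files" line that follows it. It assumes the log format shown above, and the function name conversion_counts is hypothetical.

```python
import re

TOTAL_RE = re.compile(r"A total of (\d+) images")
DONE_RE = re.compile(r"Processed (\d+) files")


def conversion_counts(log_text):
    """Return one (announced_total, last_processed) pair per conversion run."""
    runs = []
    for line in log_text.splitlines():
        total = TOTAL_RE.search(line)
        if total:
            # A new run starts whenever a total is announced.
            runs.append([int(total.group(1)), 0])
            continue
        done = DONE_RE.search(line)
        if done and runs:
            # Keep overwriting so the final value is the last count seen.
            runs[-1][1] = int(done.group(1))
    return [tuple(run) for run in runs]
```

For the log above, a complete conversion would give matching numbers in each pair, one pair for the test set and one for trainval.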
Base model

Download the base model file from https://github.com/conner99/VGGNet/ and place it in the .../caffe/models/VGGNet directory.
Training

From the caffe directory, run:

    python3 examples/ssd/ssd_pascal.py

Training starts and prints:
    ... 02:18.929926 9181 solver.cpp:259] Train net output
    I0606 10:02:18.929934 9181 sgd_solver.cpp:138] Iteration 30, lr = 0.0001
    I0606 10:02:19.643611 9181 solver.cpp:243] Iteration 40, loss = 15.662
    I0606 10:02:19.643646 9181 solver.cpp:259] Train net output
    I0606 10:02:19.643652 9181 sgd_solver.cpp:138] Iteration 40, lr = 0.0001
    I0606 10:02:20.352501 9181 solver.cpp:243] Iteration 50, loss = 15.9099
    I0606 10:02:20.352522 9181 solver.cpp:259] Train net output
    I0606 10:02:20.352526 9181 sgd_solver.cpp:138] Iteration 50, lr = 0.0001
    I0606 10:02:21.057381 9181 solver.cpp:243] Iteration 60, loss = 12.4434
    I0606 10:02:21.057401 9181 solver.cpp:259] Train net output
    I0606 10:02:21.057406 9181 sgd_solver.cpp:138] Iteration 60, lr = 0.0001
    I0606 10:02:21.766348 9181 solver.cpp:243] Iteration 70, loss = 10.3261
    I0606 10:02:21.766368 9181 solver.cpp:259] Train net output
    I0606 10:02:21.766372 9181 sgd_solver.cpp:138] Iteration 70, lr = 0.0001
    I0606 10:02:22.482571 9181 solver.cpp:243] Iteration 80, loss = 14.9099
    I0606 10:02:22.482591 9181 solver.cpp:259] Train net output
    I0606 10:02:22.482596 9181 sgd_solver.cpp:138] Iteration 80, lr = 0.0001
    I0606 10:02:23.193933 9181 solver.cpp:243] Iteration 90, loss = 13.2082
    I0606 10:02:23.193953 9181 solver.cpp:259] Train net output
    I0606 10:02:23.193957 9181 sgd_solver.cpp:138] Iteration 90, lr = 0.0001
Overall, mbox_loss/loss decreases over time, although individual iterations fluctuate.
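Because the per-iteration loss is noisy, the trend is easier to judge by comparing averages over a few early and a few late log entries. The following is a minimal sketch, assuming the "Iteration N, loss = X" line format shown in the training log. The function names and the window size are hypothetical choices for illustration.

```python
import re

# Matches lines such as "... Iteration 40, loss = 15.662".
LOSS_RE = re.compile(
    r"Iteration (\d+), loss = (\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def parse_losses(log_text):
    """Return (iteration, loss) pairs in the order they appear in the log."""
    return [(int(it), float(loss)) for it, loss in LOSS_RE.findall(log_text)]


def mean_loss(points):
    """Average the loss values of a list of (iteration, loss) pairs."""
    return sum(loss for _, loss in points) / len(points)


def loss_decreased(points, window=4):
    """Compare the mean of the last `window` points with the first `window`."""
    return mean_loss(points[-window:]) < mean_loss(points[:window])
```

Lines that report only the learning rate are ignored, since the pattern requires the "loss =" part. A NaN loss would also not match, so this sketch only covers healthy runs.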
When training finishes, the output ends with:
    ...
    I0606 12:50:29.691890 9181 sgd_solver.cpp:138] Iteration 119950, lr = 1e-06
    I0606 12:50:30.420228 9181 solver.cpp:243] Iteration 119960, loss = 2.89373
    I0606 12:50:30.420271 9181 solver.cpp:259] Train net output
    I0606 12:50:30.420276 9181 sgd_solver.cpp:138] Iteration 119960, lr = 1e-06
    I0606 12:50:31.145910 9181 solver.cpp:243] Iteration 119970, loss = 3.60434
    I0606 12:50:31.145931 9181 solver.cpp:259] Train net output
    I0606 12:50:31.145933 9181 sgd_solver.cpp:138] Iteration 119970, lr = 1e-06
    I0606 12:50:31.867653 9181 solver.cpp:243] Iteration 119980, loss = 2.38721
    I0606 12:50:31.867677 9181 solver.cpp:259] Train net output
    I0606 12:50:31.867681 9181 sgd_solver.cpp:138] Iteration 119980, lr = 1e-06
    I0606 12:50:32.584725 9181 solver.cpp:243] Iteration 119990, loss = 4.15097
    I0606 12:50:32.584744 9181 solver.cpp:259] Train net output
    I0606 12:50:32.584748 9181 sgd_solver.cpp:138] Iteration 119990, lr = 1e-06
    I0606 12:50:33.244800 9181 solver.cpp:596] Snapshotting to binary proto file models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel
    I0606 12:50:33.442319 9181 sgd_solver.cpp:307] Snapshotting solver state to binary proto file models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.solverstate
    I0606 12:50:33.523787 9181 solver.cpp:332] Iteration 120000, loss = 3.751
    I0606 12:50:33.523806 9181 solver.cpp:433] Iteration 120000, Testing net (#0)
    I0606 12:50:33.523845 9181 net.cpp:693] Ignoring source layer mbox_loss
    I0606 12:52:32.019611 9181 solver.cpp:546] Test net output
    I0606 12:52:32.019716 9181 solver.cpp:337] Optimization Done.
    I0606 12:52:32.019721 9181 caffe.cpp:254] Optimization Done.
Training results

The training results are in the .../caffe/models/VGGNet/VOC0712/SSD_300x300/ directory:
    hyper372@hyper372-ai:~/Documents/caffe/models/VGGNet/VOC0712/SSD_300x300$ ll
    total 410856
    drwxrwxr-x 2 hyper372 hyper372 4096 6月 6 12:50 ./
    drwxrwxr-x 3 hyper372 hyper372 4096 6月 5 10:25 ../
    -rw-rw-r-- 1 hyper372 hyper372 26298 6月 6 10:02 deploy.prototxt
    -rw-rw-r-- 1 hyper372 hyper372 669 6月 6 10:02 solver.prototxt
    -rw-rw-r-- 1 hyper372 hyper372 27125 6月 6 10:02 test.prototxt
    -rw-rw-r-- 1 hyper372 hyper372 28593 6月 6 10:02 train.prototxt
    -rw-rw-r-- 1 hyper372 hyper372 105154337 6月 6 12:50 VGG_VOC0712_SSD_300x300_iter_120000.caffemodel
    -rw-rw-r-- 1 hyper372 hyper372 105143086 6月 6 12:50 VGG_VOC0712_SSD_300x300_iter_120000.solverstate
    -rw-rw-r-- 1 hyper372 hyper372 105154337 6月 6 11:53 VGG_VOC0712_SSD_300x300_iter_80000.caffemodel
    -rw-rw-r-- 1 hyper372 hyper372 105143085 6月 6 11:53 VGG_VOC0712_SSD_300x300_iter_80000.solverstate
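On a MECHREVO (机械革命) 深海幽灵 Z3 Air-S with a 2060, training with a batch size of 1 took 3 hours, and training with a batch size of 8 took 18 hours 20 minutes. Anything larger ran out of memory.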
Model testing

Object detection

Run:

    python3 examples/ssd/ssd_detect.py

The output is:
    ...
    I0606 13:52:13.823376 10354 net.cpp:228] pool1 does not need backward computation.
    I0606 13:52:13.823379 10354 net.cpp:228] relu1_2 does not need backward computation.
    I0606 13:52:13.823401 10354 net.cpp:228] conv1_2 does not need backward computation.
    I0606 13:52:13.823403 10354 net.cpp:228] relu1_1 does not need backward computation.
    I0606 13:52:13.823405 10354 net.cpp:228] conv1_1 does not need backward computation.
    I0606 13:52:13.823410 10354 net.cpp:228] data_input_0_split does not need backward computation.
    I0606 13:52:13.823411 10354 net.cpp:228] input does not need backward computation.
    I0606 13:52:13.823413 10354 net.cpp:270] This network produces output detection_out
    I0606 13:52:13.823448 10354 net.cpp:283] Network initialization done.
    I0606 13:52:13.887385 10354 net.cpp:761] Ignoring source layer data
    I0606 13:52:13.887403 10354 net.cpp:761] Ignoring source layer data_data_0_split
    I0606 13:52:13.907215 10354 net.cpp:761] Ignoring source layer mbox_loss
    [[0.43371916, 0.041391477, 0.72588027, 0.50166625, 15, 0.642155, 'person']]
    481 323
    [0.43371916, 0.041391477, 0.72588027, 0.50166625, 15, 0.642155, 'person']
    [209, 13, 349, 162]
    [209, 13]
    person
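The last few lines of this output can be read as a normalized box being scaled to pixel coordinates. The sketch below reproduces that arithmetic under an assumption inferred from the numbers rather than stated in the source: the first four values are xmin, ymin, xmax, ymax in the range 0 to 1, and 481 and 323 are the image width and height. The function name to_pixel_box is hypothetical.

```python
def to_pixel_box(detection, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax, ...] detection to pixels."""
    xmin, ymin, xmax, ymax = detection[:4]
    return [
        int(round(xmin * width)),
        int(round(ymin * height)),
        int(round(xmax * width)),
        int(round(ymax * height)),
    ]


# Values copied from the detection output above.
det = [0.43371916, 0.041391477, 0.72588027, 0.50166625, 15, 0.642155, "person"]
print(to_pixel_box(det, 481, 323))  # [209, 13, 349, 162]
```

Under this reading, the remaining fields would be the label index (15), the confidence (0.642155), and the label name, and [209, 13] would be the top-left corner of the box.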
The image used for detection is in .../caffe/examples/images/; the path can be changed in the source code.
The detection result is written to the .../caffe directory.
With a batch size of 8, the model detects more objects.
Video detection

Run:
    python3 examples/ssd/ssd_pascal_webcam.py
The webcam opens automatically and the detection result is shown.
With a batch size of 8, the detection confidence is 0.95.
Model scoring

Run:

    python3 examples/ssd/score_ssd_pascal.py

Part of the output:
    ...
    I0606 13:58:26.945226 11671 net.cpp:228] relu2_1 does not need backward computation.
    I0606 13:58:26.945227 11671 net.cpp:228] conv2_1 does not need backward computation.
    I0606 13:58:26.945230 11671 net.cpp:228] pool1 does not need backward computation.
    I0606 13:58:26.945230 11671 net.cpp:228] relu1_2 does not need backward computation.
    I0606 13:58:26.945232 11671 net.cpp:228] conv1_2 does not need backward computation.
    I0606 13:58:26.945233 11671 net.cpp:228] relu1_1 does not need backward computation.
    I0606 13:58:26.945235 11671 net.cpp:228] conv1_1 does not need backward computation.
    I0606 13:58:26.945237 11671 net.cpp:228] data_data_0_split does not need backward computation.
    I0606 13:58:26.945238 11671 net.cpp:228] data does not need backward computation.
    I0606 13:58:26.945240 11671 net.cpp:270] This network produces output detection_eval
    I0606 13:58:26.945272 11671 net.cpp:283] Network initialization done.
    I0606 13:58:26.945401 11671 solver.cpp:75] Solver scaffolding done.
    I0606 13:58:26.946391 11671 caffe.cpp:155] Finetuning from models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel
    I0606 13:58:27.101177 11671 net.cpp:761] Ignoring source layer mbox_loss
    I0606 13:58:27.103570 11671 caffe.cpp:251] Starting Optimization
    I0606 13:58:27.103576 11671 solver.cpp:294] Solving VGG_VOC0712_SSD_300x300_train
    I0606 13:58:27.103578 11671 solver.cpp:295] Learning Rate Policy: multistep
    I0606 13:58:27.276463 11671 solver.cpp:332] Iteration 0, loss = 5.70725
    I0606 13:58:27.276504 11671 solver.cpp:433] Iteration 0, Testing net (#0)
    I0606 13:58:27.285435 11671 net.cpp:693] Ignoring source layer mbox_loss
    I0606 14:00:27.669929 11671 solver.cpp:546] Test net output
    I0606 14:00:27.670023 11671 solver.cpp:337] Optimization Done.
    I0606 14:00:27.670027 11671 caffe.cpp:254] Optimization Done.
The score is 0.576414, which is a failing grade. With a batch size of 8, the score is 0.715659, which is acceptable.
Errors

These are the errors I ran into myself.
NameError: name 'xrange' is not defined. Did you mean: 'range'?

The xrange function exists only in Python 2. Replace it with range.
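If editing every call site is inconvenient, a compatibility alias near the top of the affected script has the same effect. This is a minimal sketch of a different technique from the direct rename described above, not something the SSD scripts contain.

```python
try:
    xrange  # Python 2: the built-in exists, so nothing needs to change
except NameError:
    xrange = range  # Python 3: range is already lazy, like Python 2's xrange

# Code written for Python 2 can keep calling xrange unchanged.
print(list(xrange(3)))
```

The direct rename is still the cleaner fix when the script only has to run on Python 3.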
TypeError: '>' not supported between instances of 'builtin_function_or_method' and 'int'

Open .../caffe/python/caffe/model_libs.py and comment out line 16.
TypeError: 1.0 has type float, but expected one of: int, long

Open .../caffe/python/caffe/model_libs.py and change line 156 to:

    pad = int((3 + (dilation - 1) * 2) - 1) // 2

Line 375:

    pad = int((kernel_size + (dilation - 1) * (kernel_size - 1)) - 1) // 2

Line 417:

    pad = int((kernel_size + (dilation - 1) * (kernel_size - 1)) - 1) // 2

In all three places, / is changed to //.
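The reason this matters is that in Python 3 the / operator always returns a float, even for two integers, while // performs floor division and keeps the result an int. The sketch below shows the padding computation in isolation; the function name conv_pad is hypothetical and only mirrors the expression used in the three edited lines.

```python
def conv_pad(kernel_size, dilation=1):
    """Padding that keeps the output size unchanged for a dilated convolution."""
    # Effective kernel extent once the dilation gaps are included.
    effective = kernel_size + (dilation - 1) * (kernel_size - 1)
    # Floor division keeps the result an int under Python 3.
    return (effective - 1) // 2


print(conv_pad(3))              # 1
print(conv_pad(3, dilation=6))  # 6
print(type((3 - 1) / 2))        # <class 'float'>
print(type((3 - 1) // 2))       # <class 'int'>
```

With /, the same expression would produce 1.0, which is the float value the error message complains about.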
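Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

This error can have many causes, such as a mismatch between the GPU and its driver or insufficient GPU memory, and one commonly suggested fix is deleting the ~/.nv directory. What worked for me was adding engine: CAFFE to every convolution_param in train.prototxt.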
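Adding the field by hand to every convolution layer is tedious, so a small text substitution can do it. The following is a minimal sketch, assuming each block opens with the text "convolution_param {" and that no block already sets an engine (running it twice would add the field twice). The function name add_caffe_engine is hypothetical.

```python
import re

# Matches the opening of every convolution_param block.
BLOCK_OPEN_RE = re.compile(r"(convolution_param\s*\{)")


def add_caffe_engine(prototxt_text, indent="    "):
    """Insert 'engine: CAFFE' as the first field of each convolution_param."""
    return BLOCK_OPEN_RE.sub(
        lambda m: m.group(1) + "\n" + indent + "engine: CAFFE",
        prototxt_text,
    )


sample = (
    'layer {\n'
    '  name: "conv1_1"\n'
    '  type: "Convolution"\n'
    '  convolution_param {\n'
    '    num_output: 64\n'
    '  }\n'
    '}\n'
)
print(add_caffe_engine(sample))
```

To apply it to a real file, read the prototxt into a string, pass it through the function, and write the result back, keeping a copy of the original first.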
Check failed: a<=b <0 vs -1.19209e-007>

Open .../caffe/src/caffe/util/math_functions.cpp and comment out line 250.
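Data layer prefetch queue empty

This problem is caused by the change made for the previous error. Open .../caffe/src/caffe/util/sampler.cpp and add the following at line 109:

    if (bbox_width >= 1.0) {
        bbox_width = 1.0;
    }
    if (bbox_height >= 1.0) {
        bbox_height = 1.0;
    }

This ensures that the values stay within bounds.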
2 vs. 0 Out of memory

There is not enough memory. Reduce batch_size in .../caffe/examples/ssd/ssd_pascal.py.
(10 vs. 0) invalid device ordinal

The GPU index is wrong. The official script uses four GPUs, but I only have one. Change line 332 of .../caffe/examples/ssd/ssd_pascal.py to use a single GPU.
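The error appears when the script asks for a GPU index that the machine does not have. The sketch below illustrates the idea under a loud assumption: that the GPUs are selected through a comma-separated string of ids such as "0,1,2,3". It does not reproduce line 332, and the function names are hypothetical.

```python
def parse_gpus(gpus):
    """Split a comma-separated GPU id string into a list of id strings."""
    return [g.strip() for g in gpus.split(",") if g.strip()]


def invalid_gpu_ids(gpus, available):
    """Return the requested ids that do not exist on a machine with `available` GPUs."""
    return [g for g in parse_gpus(gpus) if int(g) >= available]


# Four GPUs requested on a single-GPU machine: ids 1, 2 and 3 do not exist.
print(invalid_gpu_ids("0,1,2,3", available=1))
# Requesting only GPU 0 is valid on the same machine.
print(invalid_gpu_ids("0", available=1))
```

GPU ids start at 0, so a machine with one GPU only accepts the id 0.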
(4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR

In my case the memory had run out. I killed the processes that nvidia-smi showed using the most memory.
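This function is called by caffe.io.load_image().
In .../caffe/examples/ssd/ssd_detect.py, change line 68 from

    image = caffe.io.load_image(image_file)

to

    image = cv2.imread(image_file)                  # OpenCV loads images as BGR
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert to RGB channel order
    image = image / 255                             # scale pixel values to [0, 1]

This requires importing the cv2 library.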
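mbox_loss = nan (* 1 = nan loss) or loss = nan

The loss value has overflowed. In .../caffe/examples/ssd/ssd_pascal.py, reduce the value of base_lr on line 232 to 1/10 of its current value. If that does not help, reduce it to 1/10 again.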
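The retry procedure amounts to dividing the learning rate by 10 each time the log reports a NaN loss. Here is a minimal sketch of both steps; the function names are hypothetical, and the starting value 0.001 is a placeholder, not the value from the script.

```python
def has_nan_loss(log_text):
    """Return True if the log reports a NaN loss in either form quoted above."""
    return "loss = nan" in log_text


def reduced_base_lr(base_lr, attempts):
    """Return base_lr after dividing it by 10 once per failed attempt."""
    return base_lr / (10 ** attempts)


# Placeholder starting value; use the base_lr found in your own script.
start = 0.001
for attempt in range(3):
    print(attempt, reduced_base_lr(start, attempt))
```

Note that "mbox_loss = nan" also contains the text "loss = nan", so a single substring check covers both messages.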
Couldn't find any detections

Same fix as above.
Source code

My modified, working source code is available at caffe-ssd.