PyTorch-YOLOv3 fails with an error? Here is the one thing to check
2019-07-12

Introduction

Last time I expanded the OS disk. When I then tried to run PyTorch-YOLOv3, I got stuck on an error, so here is how I resolved it.

What I did

Following this README, I ran test.py with the following command.

python3 test.py --weights_path weights/yolov3.weights

It failed with the following error.

$ python3 test.py --weights_path weights/yolov3.weights

Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
Compute mAP...
Detecting objects:   0%|                                                                                                             | 0/625 [00:00<?, ?it/s]Traceback (most recent call last):
  File "test.py", line 98, in <module>
    batch_size=8,
  File "test.py", line 36, in evaluate
    for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dluser/PyTorch-YOLOv3/utils/datasets.py", line 96, in __getitem__
    img, pad = pad_to_square(img, 0)
  File "/home/dluser/PyTorch-YOLOv3/utils/datasets.py", line 23, in pad_to_square
    img = F.pad(img, pad, "constant", value=pad_value)
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2159, in pad
    return ConstantPadNd.apply(input, pad, value)
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/nn/_functions/padding.py", line 40, in forward
    c_output = c_output.narrow(i, p[0], c_output.size(i) - p[0])
TypeError: narrow(): argument 'start' (position 2) must be int, not numpy.int64
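
The last frame points at the cause: the padding amounts appear to reach PyTorch as numpy.int64 values, while narrow() in this PyTorch build accepts only a built-in Python int. As a minimal sketch of one generic workaround (not necessarily the fix proposed in the issue), the padding can be computed with built-in integers, or coerced with operator.index. The helper names square_padding and as_builtin_ints are hypothetical, and the (left, right, top, bottom) layout assumes the order torch.nn.functional.pad uses for the last two dimensions.

```python
import operator


def square_padding(height, width):
    """Return (left, right, top, bottom) padding that makes the image square."""
    # abs() on built-in ints stays a built-in int (np.abs would return a NumPy scalar).
    diff = abs(height - width)
    before, after = diff // 2, diff - diff // 2
    # Pad top/bottom when the image is at least as wide as tall, else left/right.
    return (0, 0, before, after) if height <= width else (before, after, 0, 0)


def as_builtin_ints(values):
    """Coerce integer-like values (e.g. numpy.int64) to built-in int."""
    # operator.index() accepts anything implementing __index__ and rejects floats.
    return tuple(operator.index(v) for v in values)


if __name__ == "__main__":
    print(square_padding(300, 416))
    print(as_builtin_ints([0, 0, 58, 58]))
```

Using operator.index rather than int() is a deliberate choice here: it refuses floats instead of silently truncating them.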

I tried the fix suggested in this issue, which only produced a different error:

$ python3 test.py --weights_path weights/yolov3.weights

Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
Compute mAP...
Detecting objects:   0%|                                                                                                             | 0/625 [00:00<?, ?it/s]Traceback (most recent call last):
  File "test.py", line 98, in <module>
    batch_size=8,
  File "test.py", line 36, in evaluate
    for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dluser/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dluser/PyTorch-YOLOv3/utils/datasets.py", line 115, in __getitem__
    x1 += pad[0]
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.LongTensor for argument #4 'other'

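I researched and tried various other things without success, and eventually arrived at the suspicion that this might be a PyTorch version problem.
So I updated PyTorch to the latest version with the following command. (Note: when doing this in your own environment, check your Python and CUDA versions and install the build that matches.)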

$ conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
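
Before picking an install command, it helps to see what the environment actually provides. The following is a minimal sketch using only the standard library; it reports the Python version and whether the NVIDIA command-line tools nvidia-smi and nvcc are on PATH. The function name describe_environment is hypothetical, and finding a tool on PATH does not by itself tell you the CUDA version.

```python
import platform
import shutil
import sys


def describe_environment():
    """Collect basic facts that influence which PyTorch build to install."""
    return {
        "python": platform.python_version(),
        "python_major_minor": sys.version_info[:2],
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
        "nvcc": shutil.which("nvcc"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the tools are present, running nvidia-smi or nvcc --version by hand shows the driver and toolkit details.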

After reverting the change I had made based on the issue, the test ran successfully!

$ python3 test.py --weights_path weights/yolov3.weights --n_cpu 6

Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=6, nms_thres=0.5, weights_path='weights/yolov3.weights')
Compute mAP...
Detecting objects: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [06:55<00:00,  1.51it/s]
Computing AP: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:01<00:00, 55.33it/s]
Average Precisions:
+ Class '0' (person) - AP: 0.6907157868140823
+ Class '1' (bicycle) - AP: 0.46869626369935824
+ Class '2' (car) - AP: 0.5847854090492266
+ Class '3' (motorbike) - AP: 0.6173425471546101
+ Class '4' (aeroplane) - AP: 0.7368216071089109
+ Class '5' (bus) - AP: 0.7521598809811552
+ Class '6' (train) - AP: 0.754366135549987
+ Class '7' (truck) - AP: 0.4188454158138422
+ Class '8' (boat) - AP: 0.4055367418456285
+ Class '9' (traffic light) - AP: 0.4443524890561703
+ Class '10' (fire hydrant) - AP: 0.7803236133317674
+ Class '11' (stop sign) - AP: 0.7203250980406222
+ Class '12' (parking meter) - AP: 0.5318708513711929
+ Class '13' (bench) - AP: 0.3334771229094851
...

For reference, the PyTorch version that ended up installed in my environment was 1.1.0.

Conclusion

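This time I got burned by assuming it would work because the Requirements list no versions, but in the end I got PyTorch-YOLOv3 running.
From now on, when something mysteriously fails to run, I want to start by checking package versions.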
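
As a starting point for that habit, here is a minimal sketch that lists the installed versions of a few packages with importlib.metadata (standard library, Python 3.8 or later). The function name installed_versions is hypothetical, and the package list is an assumption chosen for illustration, not the repository's actual requirements.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list; adjust to the packages your project depends on.
    packages = ["torch", "torchvision", "numpy", "tqdm"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Inside a Python session where PyTorch is importable, torch.__version__ gives the same answer for PyTorch itself.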