Evaluation code

While reading papers recently I came across metrics like these, so I looked into how they are produced.

These evaluation metrics come from aitodpycocotools. The official GitHub repository is jwwangchn/cocoapi-aitod (COCO API - Dataset @ http://cocodataset.org/).

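Generating these metrics works much like evaluating with pycocotools, so let's first look at how pycocotools produces its metrics (adapted from the cocoapi evaluation demo: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb):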

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_true = COCO(annotation_file='/path/to/annotation.json')
coco_pre = coco_true.loadRes('/path/to/prediction.json')
cocoevaluator = COCOeval(cocoGt = coco_true, cocoDt = coco_pre, iouType = "bbox")
cocoevaluator.evaluate()
cocoevaluator.accumulate()
cocoevaluator.summarize()

Evaluating with aitodpycocotools works the same way; only the two import lines at the top need to change:

from aitodpycocotools.coco import COCO
from aitodpycocotools.cocoeval import COCOeval

coco_true = COCO(annotation_file='/mnt/sdb2/ray/AI-TOD/annotations/aitodv2_test.json')
coco_pre = coco_true.loadRes('output/tod/prediction.json')
cocoevaluator = COCOeval(cocoGt = coco_true, cocoDt = coco_pre, iouType = "bbox")
cocoevaluator.evaluate()
cocoevaluator.accumulate()
cocoevaluator.summarize()

Getting prediction.json

In the code above, prediction.json is the only file you have to generate yourself. It is a list of dictionaries, where each dictionary describes one predicted box. The format is as follows (see the COCO - Common Objects in Context website):

[{"image_id": int, 
  "category_id": int, 
  "bbox": [x,y,width,height], 
  "score": float}, {......}, ...]

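Note that x and y above are the coordinates of the box's top-left corner, i.e. xmin and ymin. Taking the DETR-style model I use as an example, it produces 300 predicted boxes for every image in the test set. Writing the four fields above for each box gives the JSON file, and the evaluation metrics are then generated successfully: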
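The step of writing each box's four fields into the JSON file could be sketched roughly as follows. This is only a minimal illustration, not the author's actual script: the helper names `build_predictions` and `save_predictions` are hypothetical, and the boxes are assumed to already be in absolute-pixel [x, y, width, height] form.

```python
import json


def build_predictions(image_id, boxes_xywh, scores, labels):
    """Turn one image's detections into COCO-style result dictionaries.

    Assumption: boxes_xywh holds [xmin, ymin, width, height] in pixels.
    """
    results = []
    for box, score, label in zip(boxes_xywh, scores, labels):
        results.append({
            "image_id": int(image_id),
            "category_id": int(label),
            # Cast to plain floats so json.dump can serialize the values.
            "bbox": [float(v) for v in box],
            "score": float(score),
        })
    return results


def save_predictions(results, path):
    """Write the flat list of result dictionaries to a JSON file."""
    with open(path, "w") as f:
        json.dump(results, f)
```

In use, the results of every test image would be concatenated into one list (for example with `all_results.extend(build_predictions(...))`) and saved once, and that file is what gets passed to `loadRes`.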

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=1500 ] = 0.133
Average Precision  (AP) @[ IoU=0.25      | area=   all | maxDets=1500 ] = -1.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1500 ] = 0.347
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1500 ] = 0.074
Average Precision  (AP) @[ IoU=0.50:0.95 | area=verytiny | maxDets=1500 ] = 0.035
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  tiny | maxDets=1500 ] = 0.128
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1500 ] = 0.181
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1500 ] = 0.242
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.043
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1500 ] = 0.238
Average Recall     (AR) @[ IoU=0.50:0.95 | area=verytiny | maxDets=1500 ] = 0.056
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  tiny | maxDets=1500 ] = 0.229
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1500 ] = 0.314
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1500 ] = 0.371
Optimal LRP             @[ IoU=0.50      | area=   all | maxDets=1500 ] = 0.883
Optimal LRP Loc         @[ IoU=0.50      | area=   all | maxDets=1500 ] = 0.311
Optimal LRP FP          @[ IoU=0.50      | area=   all | maxDets=1500 ] = 0.444
Optimal LRP FN          @[ IoU=0.50      | area=   all | maxDets=1500 ] = 0.629
# Class-specific LRP-Optimal Thresholds #
[0.51 0.47 0.59 0.53 0.44 0.51 0.42 0.4 ]

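I don't know why the second line comes out as -1, but the other metrics should be correct: I also ran the evaluation with pycocotools, and its results differ only slightly from those above. The difference most likely comes from the maxDets setting. Here is the pycocotools output for comparison.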

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.329
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.072
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.120
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.239
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.043
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.136
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.226
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.357
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Because the objects in the AI-TOD dataset are all small, the area=large metrics are reported as -1.

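Notes

These are mainly the pitfalls I ran into.

  • The model predicts box coordinates as [cx, cy, w, h]. I wrote them into the JSON file without converting them, so every metric in the evaluation came out as 0 (except the second line, which was still -1). There are three common box formats, xyxy, cxcywh and xywh, and the JSON file must use xywh.
  • The second line of the aitodpycocotools output being -1 does not affect usage, but I have not yet figured out why it happens.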
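The coordinate conversion mentioned in the first pitfall can be sketched as below. The function names are hypothetical, and the optional `img_w`/`img_h` scaling is an assumption for models that emit coordinates normalized to [0, 1]; leave both at 1.0 if the boxes are already in pixels.

```python
def cxcywh_to_xywh(box, img_w=1.0, img_h=1.0):
    """Convert [cx, cy, w, h] to [xmin, ymin, w, h].

    img_w and img_h rescale normalized coordinates to pixels
    (assumption: pass the image size only if the model output is normalized).
    """
    cx, cy, w, h = box
    return [(cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h]


def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [xmin, ymin, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]
```

A quick check on one or two boxes, comparing the converted values against the ground-truth annotations of the same image, is an easy way to catch a format mix-up before running the full evaluation.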

 
