OBJECT RECOGNITION FOR AUGMENTED REALITY APPLICATIONS

Volume 4 (1), June 2021, Pages 15-28

Vladislav Li1, Georgios Amponis2, Jean-Christophe Nebel1, Vasileios Argyriou1, Thomas Lagkas2 and Panagiotis Sarigiannidis3


1 Department of Networks and Digital Media, Kingston University, London, UK

2 Department of Computer Science, International Hellenic University, Greece

3 Department of Electrical and Computer Engineering, University of Western Macedonia, Kozani, Greece


Abstract

Advances in neural networks and deep learning, together with increases in computing capacity, have significantly improved the performance of algorithms that extract semantic information from scenes. This paper investigates the performance of several object classification and recognition frameworks and proposes a novel framework that uses Super-Resolution as a preprocessing step and YOLO/Retina as the deep neural network component. The resulting scene analysis framework was fine-tuned and benchmarked on the COCO dataset, with encouraging results. The framework can potentially be used not only for still-image recognition but also for video processing.

Keywords:

Object Recognition, Scene Analysis, Super Resolution, Machine Learning, High-Performance Computing, Feature Extraction.

DOI: https://doi.org/10.32010/26166127.2021.4.1.15.28