Perception of multimodal objects in NLP through computer vision

Authors

  • Sakib Hosen Himel, Department of Computer Science and Engineering, Daffodil International University, Dhaka-1207, Bangladesh
  • Mahidul Islam Rana, Department of Computer Science and Engineering, Daffodil International University, Dhaka-1207, Bangladesh

DOI:

https://doi.org/10.25081/rrst.2023.15.8022

Keywords:

MobileNet, SSD-V3, Object detection, NLP, Computer vision, COCO dataset

Abstract

This project combines voice interaction with object detection. Users speak to the artificial intelligence, and it replies with a system voice. A spoken command acts as the trigger for identifying the category of an object shown to the camera module. The user first shows an object to the camera and asks the system to identify it. The object detection system then captures a frame from the camera, extracts features from it, and predicts which class the object belongs to. To do so, the application searches its database for structural data that matches the object. When the extracted information approximately matches a category, the application suggests that category by speaking its name. On request, the application can also provide some basic information about the object. Our general-purpose approach can be effective in interpreting the structure and properties of objects in different networks through natural language processing.
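
The interaction loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the trigger phrases, the `Detection` type, the 0.5 confidence threshold, and the `capture_frame`, `detect`, and `speak` callables are all hypothetical stand-ins for the camera module, the detector, and the system voice.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence

# Hypothetical trigger phrases; the abstract does not list the actual commands.
TRIGGER_PHRASES = ("what is this", "identify this")


@dataclass
class Detection:
    label: str         # category name, for example a COCO class
    confidence: float  # detector score in [0, 1]


def is_trigger(utterance: str) -> bool:
    """Return True when the transcribed utterance contains a trigger phrase."""
    text = utterance.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES)


def best_category(detections: Sequence[Detection],
                  threshold: float = 0.5) -> Optional[str]:
    """Pick the highest-scoring label at or above the threshold, else None."""
    candidates = [d for d in detections if d.confidence >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.confidence).label


def handle_utterance(utterance: str,
                     capture_frame: Callable[[], Any],
                     detect: Callable[[Any], Sequence[Detection]],
                     speak: Callable[[str], None],
                     threshold: float = 0.5) -> Optional[str]:
    """Run one voice-triggered identification round.

    Returns the spoken reply, or None when the utterance is not a trigger.
    """
    if not is_trigger(utterance):
        return None
    frame = capture_frame()                      # grab one frame from the camera
    label = best_category(detect(frame), threshold)
    if label is None:
        reply = "I could not identify the object."
    else:
        reply = f"This looks like a {label}."
    speak(reply)                                 # answer through the system voice
    return reply


if __name__ == "__main__":
    # Stub camera, detector, and voice so the sketch runs without hardware.
    handle_utterance(
        "Hey, what is this?",
        capture_frame=lambda: "frame",
        detect=lambda frame: [Detection("cup", 0.91), Detection("bottle", 0.40)],
        speak=print,
    )
```

Passing the camera, detector, and voice in as callables keeps the control flow testable on its own; in a real system they would presumably wrap a camera capture call, an SSD MobileNet model trained on COCO, and a text-to-speech engine.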

Published

10-01-2023

How to Cite

Himel, S. H., & Rana, M. I. (2023). Perception of multimodal objects in NLP through computer vision. Recent Research in Science and Technology, 15, 1–7. https://doi.org/10.25081/rrst.2023.15.8022

Section

Articles