| Graduate student: | 林政威 Cheng-Wei Lin |
|---|---|
| Thesis title: | 顯著物件與尺度不變特徵轉換特徵包比對之影像搜尋研究 The Study of Salient Object and BOF with SIFT for Image Retrieval |
| Advisor: | 薛義誠 Yih-Chearng Shiue |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Information Management |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 |
| Language: | Chinese |
| Number of pages: | 63 |
| Keywords (Chinese): | 影像檢索、基於內容之影像檢索、尺度不變特徵轉換、特徵包、K均數分群演算法 |
| Keywords (English): | Image retrieval, Content-Based Image Retrieval, Scale Invariant Feature Transform, Bag of Features, K-means clustering algorithm |
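Retrieving digital images effectively has become an important research topic in the field of image retrieval. In the 1990s, content-based image retrieval relied mainly on extracting low-level image features, but a semantic gap remains between low-level visual features and high-level semantic concepts. This study proposes an image retrieval system that combines a bag-of-features (BOF) model built on Scale Invariant Feature Transform (SIFT) features with the concept of salient objects in images. An object image serves as the query, so that images are searched by the objects they contain, and a working image search system is implemented.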
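The study uses salient object detection to identify the salient object in each image and to reduce the influence of background noise on the object. For each image processed by salient object detection, SIFT is used to extract image features, and the K-means clustering algorithm then clusters the feature vectors of all images to obtain each image's BOF vector. SIFT features are likewise extracted from the object image; using the codebook computed from the dataset images, the number of the object image's SIFT features that fall into each visual word is counted to obtain the object image's BOF vector.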
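The codebook step can be illustrated with a minimal sketch of Lloyd's K-means. This is not the thesis implementation: the function names, the 2-D toy vectors (real SIFT descriptors have 128 dimensions), and the evenly spaced initialization are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(vector, centers):
    """Index of the center closest to `vector`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(vector, centers[i]))


def build_codebook(descriptors, k, iterations=20):
    """Cluster descriptors with Lloyd's K-means; the k centers act as visual words.

    Assumption: centers start at evenly spaced descriptors so the sketch is
    deterministic. Practical implementations usually use random or
    k-means++ initialization instead.
    """
    centers = [list(descriptors[i * len(descriptors) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group every descriptor under its nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_center(d, centers)].append(d)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Hypothetical toy descriptors forming two obvious groups.
toy_descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = build_codebook(toy_descriptors, k=2)
print(codebook)
```

In the setting described above, `descriptors` would be the pooled SIFT descriptors of all salient images in the dataset, and `k` would be the codebook size.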
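The experiments use 1,000 images in ten categories drawn from the MSRA-A image dataset. Experiment 1 found that, for salient object detection, rectangular salient images performed better. Experiment 2 examined how codebook size affects retrieval accuracy and found that retrieval worked best with 200 clusters. Experiment 3 examined whether object images can be applied to image search, and the results show that searching for target images by object concept does achieve object-based image search. The sensitivity analysis shows that providing a greater variety of object images through a transformation function yields more accurate retrieval results.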
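One plausible reading of a "rectangular salient image" is the image cropped to the bounding rectangle of the detected salient region. The abstract does not define the term, so the sketch below rests on that assumption, and the binary mask, pixel values, and function names are hypothetical.

```python
def salient_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero region.

    `mask` is a 2-D list in which nonzero entries mark salient pixels.
    Returns None when no pixel is salient.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut the rectangle `box` (inclusive bounds) out of a 2-D image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 4x5 saliency mask and an image whose pixel value is row*10 + col.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
image = [[r * 10 + c for c in range(5)] for r in range(4)]

box = salient_bounding_box(mask)
rectangular_salient_image = crop(image, box)
print(box, rectangular_salient_image)
```

Cropping to a rectangle keeps some background around an irregular object but removes most of the surrounding scene before feature extraction.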
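The results confirm that images can be searched by object concept, and that combining salient objects with BOF and SIFT improves retrieval accuracy compared with earlier approaches that do not incorporate salient object detection. Finally, an image search system was implemented with the revised query method and the improved retrieval accuracy.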
Searching digital images effectively has become increasingly important in the field of image retrieval (IR). In the 1990s, content-based image retrieval indexed images by their low-level features, but a semantic gap exists between low-level features and high-level semantic concepts. This study proposes an image retrieval system based on bag-of-features (BOF) with scale invariant feature transform (SIFT) combined with salient objects, which searches by the objects contained in an image, and implements a working image retrieval system.
This research detects the salient object in each image through salient object detection, which reduces the influence of background noise. After salient object detection, SIFT features are extracted from each salient image in the image database and clustered with the K-means clustering algorithm to form the codebook. SIFT features are then extracted from the object image, each feature is assigned to the nearest cluster center (visual word) in the codebook, and the object image's SIFT features are quantized with this visual vocabulary. Finally, the object image is represented as a set of visual words.
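The quantization of a query object image against an existing codebook can be sketched as follows. The codebook, the database histograms, and the 2-D descriptors are hypothetical toy values. Cosine similarity is also an assumption, since the abstract does not state which similarity measure the system uses for ranking.

```python
import math


def quantize(descriptors, codebook):
    """Count how many descriptors fall nearest to each visual word."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        distances = [sum((x - c) ** 2 for x, c in zip(d, word))
                     for word in codebook]
        histogram[distances.index(min(distances))] += 1
    return histogram


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def rank_images(query_histogram, database):
    """Return database image names ordered from most to least similar."""
    return sorted(
        database,
        key=lambda name: cosine_similarity(query_histogram, database[name]),
        reverse=True,
    )


# Hypothetical codebook of three visual words and three indexed images.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
database = {"img_a": [5, 1, 0], "img_b": [0, 4, 2], "img_c": [1, 1, 6]}

query_descriptors = [(0.5, 0.2), (0.1, 0.9), (9.5, 10.2)]
query_histogram = quantize(query_descriptors, codebook)
ranking = rank_images(query_histogram, database)
print(query_histogram, ranking)
```

Because the query is quantized with the codebook learned from the dataset images, the query histogram and the stored histograms share the same bins and can be compared directly.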
In the experiments, the image database is a subset of the MSRA-A image dataset. It contains 1,000 images divided equally into 10 categories. The first experiment showed that rectangular salient images perform better than original salient images for salient object detection. The second experiment, which studied the influence of codebook size on the retrieval performance of the system, showed that the best size for this dataset is 200. The third experiment showed that the object concept is useful for finding similar images that contain the object. The sensitivity analysis showed that providing a variety of query images through transformation of the object image achieves better retrieval performance.
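Retrieval performance on a labeled dataset of this kind is commonly measured as the fraction of top-ranked results that share the query's category. The abstract does not give the exact metric or cutoff, so the precision-at-k sketch below is an assumption, and the labels are made-up toy values rather than results from the thesis.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved images whose label matches the query."""
    top_k = retrieved_labels[:k]
    if not top_k:
        return 0.0
    return sum(1 for label in top_k if label == query_label) / len(top_k)


def mean_precision_at_k(runs, k):
    """Average precision-at-k over (query_label, retrieved_labels) pairs."""
    scores = [precision_at_k(labels, query, k) for query, labels in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical category labels of ranked results for two queries.
runs = [
    ("dog", ["dog", "dog", "cat", "dog", "car"]),
    ("car", ["car", "cat", "car", "car", "car"]),
]
print(precision_at_k(runs[0][1], "dog", 5))
print(mean_precision_at_k(runs, 5))
```

Computing such a score once per codebook size is one way to compare settings such as 100, 200, or 400 visual words on the same set of queries.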
In conclusion, object images can improve the accuracy of image retrieval based on BOF with SIFT combined with salient objects. Finally, the study implements an image retrieval system with the changed query method and the improved retrieval precision.