
Author: Chih-Wen Su (蘇志文)
Title: Content-based Video Retrieval Techniques for MPEG Video
(Chinese title: 以視訊內容為基礎應用於MPEG影片之視訊檢索技術)
Advisors: Hong-Yuan Liao (廖弘源), Kuo-Chin Fan (范國清)
Committee members:
Degree: Doctor
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Graduation academic year: 94 (2005-2006)
Language: English
Pages: 100
Chinese keywords: 換景偵測 (shot change detection), 視訊檢索 (video retrieval)
English keywords: shot change detection, video retrieval
    With the arrival of the digital era, audio, video, and other kinds of information can be stored in more efficient and convenient forms, but this has also led to an overwhelming accumulation of audiovisual data. Annotating the content of a large video collection entirely by hand requires enormous manpower, and it is difficult to give every segment of a video an objective and exhaustive textual description, so traditional text-based queries cannot satisfy the need to search video by content. To locate specific segments within a large collection of digital video by their visual content, a computer must first perform fast, automatic shot change detection, then extract visual features from each segmented shot, annotate those features objectively in an automatic or semi-automatic manner, and organize the results into a database. When a user wants to query some video content, the computer can match the query against this pre-built database quickly and effectively, achieving genuine content-based video querying and browsing.
    In view of this, this thesis proposes two content-based video retrieval techniques for MPEG video. First, we study the dissolve, the most commonly used gradual transition and also the one that existing techniques have the most difficulty detecting correctly. Using the luminance change at each pixel site, we compute the percentage of sites whose values vary linearly along the time axis and compare it with a threshold estimated from the cumulative binomial distribution, so that dissolve-type gradual transitions can be clearly identified. We further resample the frame sequence so that detection accuracy does not degrade with the duration of the dissolve. The same principle also applies to detecting fade-in and fade-out transitions. Both theory and experiments confirm that our method is not only fast and effective but also achieves an error rate lower than comparable approaches, substantially reducing the false detections caused by object and camera motion.
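As an illustration of the linear-change test described above, the sketch below fits a line to each pixel site's luminance over a candidate window and counts how many sites fit well. The window length, tolerance, and the fixed ratio threshold are illustrative stand-ins (the thesis derives its threshold from a cumulative binomial distribution), not the actual parameters:

```python
import numpy as np

def linear_pixel_ratio(frames, tol=2.0):
    """Fraction of pixel sites whose luminance varies ~linearly across a
    candidate dissolve window. frames: T x H x W luminance array."""
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    t = np.arange(T)
    t_mean = t.mean()
    y_mean = frames.mean(axis=0)
    # Closed-form least-squares line fit per pixel site over the window.
    slope = (((frames - y_mean) * (t - t_mean)[:, None, None]).sum(axis=0)
             / ((t - t_mean) ** 2).sum())
    fitted = y_mean + slope * (t - t_mean)[:, None, None]
    rmse = np.sqrt(((frames - fitted) ** 2).mean(axis=0))
    return (rmse < tol).mean()

def looks_like_dissolve(frames, ratio_threshold=0.7, tol=2.0):
    # Flag the window as a dissolve when enough pixel sites change
    # linearly in time (here a fixed ratio threshold for simplicity).
    return linear_pixel_ratio(frames, tol) >= ratio_threshold
```

A true dissolve blends two frames with linearly varying weights, so nearly every pixel site passes the line fit, while motion or random content does not.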
    Second, targeting MPEG video, the most widely deployed format, we analyze the motion vector information already embedded in the bitstream to automatically generate the trajectories of moving objects. This approach has the following advantages: (1) it directly reuses the existing motion vectors, making annotation much faster; (2) it also handles videos that contain multiple moving objects; (3) it is not restricted to footage shot by a static camera. In addition, we develop a fast trajectory matching strategy: with a coordinate representation different from previous work, most dissimilar trajectories can be rejected after comparing only a few control points, which greatly accelerates retrieval.
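The idea of chaining compressed-domain motion vectors into trajectories can be sketched as follows. The data layout (one dict of macroblock-grid motion vectors per frame) and the zero-net-displacement filter are our own simplifications, not the thesis's actual representation:

```python
def link_motion_flows(mv_fields):
    """Chain per-macroblock motion vectors across consecutive frames into
    motion flows (trajectories).

    mv_fields: one dict per frame, mapping a macroblock grid position
    (bx, by) to its motion vector (dx, dy) in grid units. This layout is an
    illustrative simplification of what an MPEG decoder would expose.
    """
    flows = []
    if not mv_fields:
        return flows
    for start in mv_fields[0]:
        pos, path = start, [start]
        for field in mv_fields:
            if pos not in field:
                break  # the chain leaves the known macroblock grid
            dx, dy = field[pos]
            pos = (pos[0] + dx, pos[1] + dy)
            path.append(pos)
        # Discard chains with no net displacement: a crude stand-in for
        # filtering out motions that form no real flow.
        if len(path) > 1 and path[-1] != path[0]:
            flows.append(path)
    return flows
```

Because the vectors are read directly from the bitstream, no object segmentation or optical flow computation is needed before annotation.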


    Gradual shot change detection is one of the most important research issues in the field of video indexing/retrieval. Among the numerous types of gradual transitions, the dissolve-type gradual transition is considered the most common one, but it is also the most difficult one to detect. In most of the existing dissolve detection algorithms, the false/miss detection problem caused by motion is very serious. In this thesis, we present a novel dissolve-type transition detection algorithm that can correctly distinguish dissolves from disturbance caused by motion. We carefully model a dissolve based on its nature and then use the model to filter out possible confusion caused by the effect of motion.
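The dissolve model underlying this approach can be stated compactly. Writing $I_1$ and $I_2$ for the outgoing and incoming shots, a dissolve of duration $T$ mixes them with linearly varying weights (a standard formulation consistent with the linear-change test described in the abstract; the symbols here are ours, not the thesis's notation):

```latex
D(\mathbf{x}, t) = \left(1 - \frac{t}{T}\right) I_1(\mathbf{x})
                 + \frac{t}{T}\, I_2(\mathbf{x}), \qquad 0 \le t \le T
```

so the intensity at every pixel $\mathbf{x}$ is a linear function of time with slope $\bigl(I_2(\mathbf{x}) - I_1(\mathbf{x})\bigr)/T$. Fade-out and fade-in are the special cases $I_2 \equiv 0$ and $I_1 \equiv 0$, respectively.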
    Furthermore, we propose the use of motion vectors embedded in MPEG bitstreams to generate so-called “motion flows”, which are applied to perform quick video retrieval. By using the motion vectors directly, we do not need to consider the shape of a moving object and its corresponding trajectory. Instead, we simply “link” the local motion vectors across consecutive video frames to form motion flows, which are then annotated and stored in a video database. In the video retrieval phase, we propose a new matching strategy to execute the video retrieval task. Motions that do not belong to the mainstream motion flows are filtered out by our proposed algorithm. The retrieval process can be triggered by query-by-sketch (QBS) or query-by-example (QBE). The experimental results show that our method is indeed efficient and accurate in the video retrieval process.
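The coarse-to-fine matching strategy described above can be sketched as follows; the control-point count, distance measure, and rejection threshold are illustrative assumptions rather than the thesis's actual design:

```python
def trajectory_distance(a, b):
    """Mean Euclidean distance between two equal-length point sequences."""
    return sum(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
               for p, q in zip(a, b)) / len(a)

def resample(traj, n):
    """Pick n points at evenly spaced indices (control points)."""
    step = (len(traj) - 1) / (n - 1)
    return [traj[round(i * step)] for i in range(n)]

def coarse_to_fine_match(query, candidates, n_control=4,
                         reject_dist=20.0, n_fine=50):
    """Rank candidate trajectories against the query. Candidates whose few
    coarse control points already lie far from the query's are rejected
    without a full comparison; only survivors get the fine-level match."""
    q_coarse = resample(query, n_control)
    q_fine = resample(query, n_fine)
    results = []
    for cand in candidates:
        if trajectory_distance(q_coarse, resample(cand, n_control)) > reject_dist:
            continue  # coarse rejection: skip the expensive fine comparison
        results.append((trajectory_distance(q_fine, resample(cand, n_fine)), cand))
    return sorted(results)
```

Because most dissimilar trajectories are rejected at the coarse stage, the expensive fine comparison runs on only a small fraction of the database.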

    Table of Contents

    1. Introduction
       1.1 Motivation
       1.2 Overview of CBIR
       1.3 Overview of CBVR
           1.3.1 Shot Change Detection
           1.3.2 Features for CBVR
           1.3.3 QBE Versus QBS
       1.4 Organization of the Thesis
    2. Background
       2.1 MPEG Standards
           2.1.1 Intra-Frame Coding
           2.1.2 Inter-Frame Coding
       2.2 Content-based Video Retrieval Techniques
           2.2.1 Strategies of Shot Boundary Detection
           2.2.2 Categories of Visual Features for CBVR
           2.2.3 State-of-the-art
       2.3 Concluding Remarks
    3. A Motion-Tolerant Dissolve Detection Algorithm
       3.1 Introduction
       3.2 Modeling a Dissolve Transition
       3.3 Threshold Selection
       3.4 Discussion of False and Misdetection of Dissolve
           3.4.1 Misdetection Caused by a Long Dissolve Duration
           3.4.2 Color Shading
           3.4.3 Illumination Problem
       3.5 Experimental Results
       3.6 Concluding Remarks
    4. Motion Flow-based Video Retrieval
       4.1 Introduction
       4.2 Constructing Motion Flows from MPEG Bitstreams
           4.2.1 Shot Change Detection
           4.2.2 Camera Motion Estimation
           4.2.3 Generating Motion Flow
       4.3 Coarse-to-fine Trajectory Comparison
       4.4 Experimental Results
       4.5 Concluding Remarks
    5. Conclusion and Future Work
    References

