
Graduate student: Yung-Chien Chen (陳永健)
Thesis title: A Comprehensive Motion Videotext Detection, Localization and Extraction Method (適用於數位視訊中移動字幕之偵測、定位以及擷取之方法)
Advisor: Tsung-Han Tsai (蔡宗漢)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Graduation academic year: 93 (2004-2005)
Language: English
Pages: 72
Chinese keywords: 內涵式視訊搜尋系統、字幕擷取 (content-based video retrieval system, caption extraction)
English keywords: videotext extraction, content-based video retrieval
  • As the demand for content-based multimedia indexing and summarization grows, extracting semantically meaningful features has become an important task. In digital video, on-screen text is a very useful feature: it clearly conveys the content of the video, and it is not difficult to extract. Moreover, while speech recognition and visual image analysis remain imperfect, text recognition systems are already mature and complete. For this reason, most research on video indexing begins with text recognition.
    In this thesis, we propose a detection and extraction algorithm targeted at moving text. Compared with algorithms for static captions, little research has addressed moving text. We first use the Sobel detector to find pixels that may belong to text edges, then use vertical and horizontal projection profiles to localize the correct text regions, and finally apply Otsu's method to determine a threshold that separates text from background. Unfortunately, a few non-text pixels are still classified as text. We therefore use our proposed modified seed-fill algorithm to remove the falsely classified non-text regions and improve the recognition rate. Experimental results show that the proposed algorithm performs well on various types of video.
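The thesis itself does not publish code; the following is a minimal NumPy sketch of the detection and localization stage described above (Sobel edge map followed by a projection profile). The kernel is the standard Sobel operator; the magnitude threshold and the row-density threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def sobel_edge_map(gray, threshold=100):
    """Sobel gradient magnitude; pixels above `threshold` are kept as
    candidate text-edge pixels (the threshold value is illustrative)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                    # correlate with both 3x3 kernels
        for j in range(3):
            win = padded[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy) > threshold

def localize_text_rows(edge_map, min_density=0.2):
    """Horizontal projection profile: rows whose fraction of edge pixels
    exceeds `min_density` are grouped into candidate text bands."""
    rows = edge_map.mean(axis=1) > min_density
    bands, start = [], None
    for y, flag in enumerate(rows):       # group consecutive flagged rows
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(rows) - 1))
    return bands
```

The same projection can be applied column-wise inside each band to obtain the vertical extent of a text box, mirroring the vertical/horizontal statistics described in the abstract.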


    Text in video is a very compact and accurate clue for video indexing and summarization. Most video text detection and extraction methods deal with static videotext on video frames; few handle motion videotext well, since moving text is hard to extract reliably. In this thesis, we propose a low-computation text detection and localization method for scrolling videotext, which carries much useful information, together with a videotext extraction method. Detection is carried out by edge detection, and a projection-profile method localizes the text region. Extraction consists of adaptive thresholding followed by our proposed modified seed-fill algorithm. Detailed experimental results on a large number of video images are reported.
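The extraction stage (adaptive thresholding plus seed-fill cleanup) can be sketched as follows. The Otsu routine is the standard histogram-based method the abstract names; the thesis's "modified" seed-fill is not specified here, so the cleanup function below uses a plain border-seeded flood fill as a stand-in, on the assumption that mis-detected background regions tend to touch the border of the localized text box while characters do not.

```python
import numpy as np
from collections import deque

def otsu_threshold(gray):
    """Otsu's method: choose the gray level that maximizes the
    between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var, w0, sum0 = 0, -1.0, 0.0, 0.0
    for t in range(256):
        w0 += hist[t]                     # class-0 weight up to level t
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2    # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def remove_border_components(binary):
    """Stand-in cleanup (an assumption, not the thesis's exact algorithm):
    seed-fill from every 'on' pixel touching the image border and erase
    the connected components they belong to."""
    out = binary.copy()
    h, w = out.shape
    q = deque((y, x) for y in range(h) for x in range(w)
              if out[y, x] and (y in (0, h - 1) or x in (0, w - 1)))
    while q:
        y, x = q.popleft()
        if 0 <= y < h and 0 <= x < w and out[y, x]:
            out[y, x] = False             # erase and spread 4-connectedly
            q.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return out
```

In use, each localized text box would be binarized with `gray > otsu_threshold(gray)` (or its inverse for dark text) and then passed through the cleanup step before OCR.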

    Chapter 1 Introduction
      1.1 Motivation
      1.2 MPEG-7 Standard
        1.2.1 Structural Elements of Videotext DS
        1.2.2 Semantic Elements of Videotext DS
      1.3 Thesis Organization
    Chapter 2 Background and Related Work
      2.1 Background
      2.2 Text Detection Method
        2.2.1 Texture-Based Method
        2.2.2 Color-Based Method
        2.2.3 Edge-Based Method
          2.2.3.1 Under Compressed Domain
          2.2.3.2 Under Pixel Domain
      2.3 Text Localization Method
        2.3.1 First Approach
        2.3.2 Second Approach
          2.3.2.1 SSD-Based Module Image Match
          2.3.2.2 Contour-Based Text Stabilization
        2.3.3 Third Approach
      2.4 Text Extraction Method
        2.4.1 Multiple Frame Integration
        2.4.2 Interpolation
    Chapter 3 Proposed Videotext Detection, Localization and Extraction Algorithm
      3.1 Overview of Proposed Algorithm
        3.1.1 Design Strategy
        3.1.2 Flowchart of the Proposed Algorithm
      3.2 Videotext Detection and Localization Method
      3.3 Videotext Extraction Method
    Chapter 4 Experimental Results
      4.1 Experimental Environment
      4.2 Experimental Results
    Chapter 5 Conclusions
    References

    [1] Q. Huang, Z. Liu, A. Rosenberg, D. Gibbon, and B. Shahraray, “Automated generation of news content hierarchy by integrating audio, video, and text information,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 6, 1999, pp. 3025–3028.
    [2] W. Qi, L. Gu, H. Jiang, X.-R. Chen, and H.-J. Zhang, “Integrating visual, audio and text analysis for news video,” in Proc. Int. Conf. Image Process., vol. 3, 2000, pp. 520–523.
    [3] MPEG-7 Description Schemes, ISO/IEC/JTC1/SC29/WG11/N2844, July 1999.
    [4] MPEG Requirements Group, MPEG-7 Requirements Document, Doc. ISO/MPEG N2461, MPEG Atlantic City Meeting, October 1998.
    [5] MPEG-7 Description Schemes (V0.6), ISO/IEC/JTC1/SC29/WG11/M5040, Version 0.6-a, September 1999.
    [6] C. Dorai, R. Bolle, N. Dimitrova, and L. Agnihotri, “MPEG-7 Videotext Description Scheme,” Doc. ISO/MPEG M5206, MPEG Melbourne Meeting, October 1999.
    [7] H. Li, D. Doermann, and O. Kia, “Automatic text detection and tracking in digital video,” IEEE Trans. Image Process., vol. 9, no. 1, Jan. 2000, pp. 147–156.
    [8] Y. Zhong, H.-J. Zhang, and A. K. Jain, “Automatic caption localization in compressed video,” in Proc. Int. Conf. Image Process., vol. 2, 1999, pp. 96–100.
    [9] R. Lienhart and A. Wernicke, “Localizing and segmenting text in images, videos and web pages,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, Apr. 2002, pp. 256–268.
    [10] I. Sobel, “An isotropic 3×3 image gradient operator,” in Machine Vision for Three-Dimensional Scenes, H. Freeman, Ed. New York: Academic, 1990, pp. 376–379.
    [11] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybernet., vol. SMC-9, no. 1, Jan. 1979, pp. 62–66.
    [12] N. Dimitrova, L. Agnihotri, C. Dorai, and R. Bolle, “MPEG-7 Videotext Descriptor for Superimposed Text in Images and Video,” Signal Process.: Image Commun., vol. 16, Oct. 2000, pp. 137–155.
    [13] T. Sato, T. Kanade, E. K. Hughes, and M. A. Smith, “Video OCR for digital news archive,” in Proc. IEEE Workshop Content-Based Access Image Video Database, 1998, pp. 52–60.
    [14] A. K. Jain and B. Yu, “Automatic text location in images and video frames,” Pattern Recognit., vol. 31, no. 12, 1998, pp. 2055–2076.
    [15] L. Agnihotri and N. Dimitrova, “Text detection for video analysis,” in Proc. IEEE Workshop Content-Based Access Image Video Libraries, 1999, pp. 109–113.
    [16] V. Y. Mariano and R. Kasturi, “Locating uniform-colored text in video frames,” in Proc. 15th Int. Conf. Pattern Recognit., vol. 4, 2000, pp. 539–542.
    [17] D. Chen, K. Shearer, and H. Bourlard, “Text enhancement with asymmetric filter for video OCR,” in Proc. 11th Int. Conf. Image Anal. Process., 2001, pp. 192–197.
    [18] B. T. Chun, Y. Bae, and T.-Y. Kim, “Text extraction in videos using topographical features of characters,” in Proc. IEEE Int. Fuzzy Syst. Conf., vol. 2, 1999, pp. 1126–1130.
    [19] X. Gao and X. Tang, “Automatic news video caption extraction and recognition,” in Proc. 2nd Int. Conf. Intell. Data Eng. Automated Learning (LNCS 1983), K. S. Leung et al., Eds., Hong Kong, 2000, pp. 425–430.
    [20] V. Wu, R. Manmatha, and E. M. Riseman, “Textfinder: An automatic system to detect and recognize text in images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 11, Nov. 1999, pp. 1224–1229.
    [21] A. Wernicke and R. Lienhart, “On the segmentation of text in videos,” in Proc. IEEE Int. Conf. Multimedia Expo, vol. 3, Jul. 2000, pp. 1511–1514.
    [22] M. Cai, J. Song, and M. R. Lyu, “A new approach for video text detection,” in Proc. Int. Conf. Image Process., Rochester, NY, Sep. 2002, pp. 117–120.
    [23] C. Wolf, J.-M. Jolion, and F. Chassaing, “Text localization, enhancement and binarization in multimedia documents,” in Proc. 16th Int. Conf. Pattern Recognit., vol. 2, Aug. 2002, pp. 1037–1040.
    [24] S. Antani, D. Crandall, and R. Kasturi, “Robust extraction of text in video,” in Proc. 15th Int. Conf. Pattern Recognit., vol. 1, 2000, pp. 831–834.
    [25] M. R. Lyu, J. Song, and M. Cai, “A comprehensive method for multilingual video text detection, localization, and extraction,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 2, Feb. 2005, pp. 243–255.
    [26] S. Kwak, K. Chung, Y. Choi, “Video Caption Image Enhancement for an Efficient Character Recognition”, in Proc. 15th Int. Conf. Pattern Recognit., vol. 2, 2000, pp. 2606–2609.
