| Graduate student: | 卓晉億 Chin-yi Chou |
|---|---|
| Thesis title: | 基於文字與主播偵測之新聞視訊分析系統 A TV News Analysis Scheme based on Text and Anchorperson Identification |
| Advisor: | 蘇柏齊 Po-chyi Su |
| Oral examination committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Graduation academic year: | 97 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 79 |
| Chinese keywords: | commercial, text detection, SVM, TV news, digital TV |
| English keywords: | commercial, text detection, SVM, TV news, Digital videos |
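As digital technology matures, large amounts of audiovisual information are widely distributed and permanently preserved thanks to digitization and steadily improving compression techniques. Users can now obtain large amounts of multimedia information through many different channels, but searching such a huge volume of multimedia data manually, or annotating it by hand for classification, is very time-consuming. Techniques and tools that help users search for and extract multimedia information efficiently have therefore become an important research topic.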
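This study proposes tools that assist content extraction and classification for news videos. Text is one of the most important features of news video content: a few words can annotate a news item precisely, so recognizing the on-screen text effectively helps in understanding the news content. In Taiwan's news channels, however, the on-screen text includes news headlines, weather forecasts, stock market quotes, and scrolling tickers. This content is complex, and the fonts, typefaces, and text sizes vary widely. Current text-recognition software can only recognize the few fonts it has been trained on and does not work on the text of most Taiwanese news channels, so extracting regions suitable for analysis from complex news frames becomes a problem to be solved. In addition, commercials inserted into news broadcasts interfere with content analysis and must be removed effectively. This study detects, extracts, and processes representative text regions and proposes solutions to the problems above.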
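The abstracts do not specify which low-level features are used to find candidate text regions in a news frame. As a minimal sketch, assuming that caption text produces dense, strong edges, the code below computes the Sobel gradient magnitude and keeps fixed-size blocks whose mean edge strength exceeds a threshold. The fixed threshold is a hypothetical stand-in for the trained SVM classifier mentioned in the English abstract; the 8-pixel block size, the threshold of 200, and the function names are illustrative assumptions, not the thesis's implementation.

```python
import math

# 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray):
    """Sobel gradient magnitude of a grayscale frame given as a list of pixel rows.
    Border pixels, where the 3x3 kernel does not fit, are left at 0."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag


def candidate_text_blocks(gray, block=8, threshold=200.0):
    """Return (block_row, block_col) indices of block x block tiles whose mean
    edge magnitude exceeds `threshold` (hypothetical stand-in for an SVM).
    Partial tiles at the right and bottom borders are skipped."""
    mag = sobel_magnitude(gray)
    h, w = len(gray), len(gray[0])
    hits = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            total = sum(mag[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            if total / (block * block) > threshold:
                hits.append((by // block, bx // block))
    return hits


# Synthetic 16x32 frame: flat left half, 2-pixel-wide vertical stripes on the
# right half as a crude stand-in for a caption band.
frame = [[255 if x >= 16 and ((x - 16) // 2) % 2 == 0 else 0 for x in range(32)]
         for _ in range(16)]
print(candidate_text_blocks(frame))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

In a real system the per-block mean would be replaced by a richer feature vector per block fed to a trained classifier; the thresholded edge density only illustrates why edge-based features separate text bands from flat background.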
With the proliferation of multimedia data, demand for effective and efficient video retrieval is growing. Among the various kinds of digital video, TV news plays an important role in broadcasting and serves as a major source of daily information for many people. In Taiwan, there are several TV news stations, and the same news stories are broadcast again and again, so watching them can be a waste of time. Since digital recording facilities are now widely available, we propose a classification scheme that clusters recorded TV news video segments so that viewers can choose to watch related archived news and even retrieve useful information from it.
In the proposed scheme, we use the text in TV news to cluster videos. Text analysis of Taiwan's TV news requires additional processing because the on-screen text areas carry many kinds of information, including captions, weather reports, and stock market indices, which makes it challenging to locate the regions we are actually interested in. Furthermore, video OCR is not yet mature and does not work well on Taiwan's TV news broadcasts because each news channel uses its own distinctive fonts. We apply low-level feature extraction and an SVM to locate possible regions of interest, which should also help to distinguish news segments from commercials. The anchorperson scenes are then located to divide each news story into two parts: one in which the anchorperson describes the news, and one showing the news content itself. Next, we extract the captions in the second part, where the text is more stable and representative. After the extracted text areas are refined, a cross-correlation process finds similar patterns in the captions of different video segments so that related segments can be linked. Experimental results are presented to demonstrate the feasibility of this potential solution.
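The abstract names cross-correlation as the step that links segments with similar caption patterns but does not give its exact form. A minimal sketch, assuming zero-mean normalized cross-correlation (NCC) computed over a horizontal sliding window on equal-height caption images stored as lists of pixel rows; `best_match`, `related`, and the 0.8 decision threshold are hypothetical illustrations rather than the thesis's code.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2-D patterns.
    Returns a value in [-1, 1], or 0.0 if either pattern is constant."""
    va = [p for row in a for p in row]
    vb = [p for row in b for p in row]
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    num = sum((x - ma) * (y - mb) for x, y in zip(va, vb))
    den = math.sqrt(sum((x - ma) ** 2 for x in va) * sum((y - mb) ** 2 for y in vb))
    return num / den if den else 0.0


def best_match(template, caption):
    """Slide `template` horizontally across `caption` (same height, caption at
    least as wide) and return (best_score, best_offset)."""
    tw, cw = len(template[0]), len(caption[0])
    best = (-1.0, -1)
    for off in range(cw - tw + 1):
        window = [row[off:off + tw] for row in caption]
        score = ncc(template, window)
        if score > best[0]:
            best = (score, off)
    return best


def related(template, caption, threshold=0.8):
    """Hypothetical decision rule: link two segments if a caption pattern from
    one reaches a peak correlation of at least `threshold` in the other."""
    return best_match(template, caption)[0] >= threshold


# Binarized toy patterns: the template appears in the caption at offset 2.
template = [[0, 1, 1, 0], [1, 0, 0, 1]]
caption = [[0, 0, 0, 1, 1, 0, 0], [0, 0, 1, 0, 0, 1, 0]]
print(best_match(template, caption))  # (1.0, 2)
print(related(template, caption))     # True
```

Subtracting the means and normalizing makes the score insensitive to uniform brightness and contrast changes between broadcasts, which is one reason NCC is a common choice for comparing text patterns across frames.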