| Graduate student: | 邱建明 Jen-Min Chiu |
|---|---|
| Thesis title: | 結合影像與文字辨識的網路色情過濾 Internet Pornography Filtering With Combination of Image-Based and Text-Based Classification |
| Advisor: | 曾黎明 Li-Ming Tseng |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 資訊工程學系 Department of Computer Science & Information Engineering |
| Graduation academic year: | 92 |
| Language: | Chinese |
| Number of pages: | 64 |
| Chinese keywords: | 網站過濾 (web site filtering), 色情影像偵測 (pornographic image detection), 文件分類 (document classification) |
| Foreign-language keywords: | Pornographic Image Analysis, Document Classification, Web filtering |
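The rapid growth of the Internet lets information and knowledge circulate more widely and more efficiently. But easily obtained information also means that inappropriate content spreads more freely online. As computer education becomes more widespread, more and more people gain access to the network, and negative material spread through the Internet, such as pornography, violence, drug use, and racial hatred, becomes more penetrating than it would be through physical channels because the access environment is unguarded. Therefore, within limits that do not infringe on freedom of speech, it is necessary to apply some degree of filtering to the web content reachable from, and the access behavior within, network environments that mainly serve elementary and junior high school education.
In web filtering research, the blacklist is one popular technique, and the way the list is obtained varies by method; in general, the options include manual inspection, keyword analysis, and automatic searching by program. This thesis develops a combined analysis method based on the image and text characteristics of pornographic websites. For pornographic pictures, it uses image processing and pattern analysis techniques, such as color analysis, texture analysis, medial axis extraction, and Shape From Shading, to analyze whether an image contains skin-toned regions and whether those regions can represent a nude human body. For text, it applies information retrieval and document classification techniques to measure the number and frequency of pornography-related keywords. Finally, by weighing the feature vectors extracted from both sides and computing the similarity between them, it performs cluster analysis on the list to further separate pornographic from non-pornographic URLs and raise the overall precision of the list.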
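The two low-level measurements mentioned in the abstract above, the share of skin-toned pixels in an image and the number and frequency of pornography-related keywords in a page, can be sketched as follows. This is only an illustration under stated assumptions: the fixed RGB thresholds are a commonly used skin-color heuristic rather than the color analysis the thesis actually uses, and the function names, the pixel format (a list of `(r, g, b)` tuples), and the keyword list are all hypothetical.

```python
import re


def is_skin_rgb(r, g, b):
    """Fixed-threshold RGB skin test (an assumed heuristic, not the thesis's rule)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels):
    """Fraction of pixels classified as skin-toned; pixels is a list of (r, g, b)."""
    if not pixels:
        return 0.0
    skin = sum(1 for (r, g, b) in pixels if is_skin_rgb(r, g, b))
    return skin / len(pixels)


def keyword_features(text, keywords):
    """Return (count, frequency) of keyword hits in text.

    count is the total number of tokens that match any keyword;
    frequency is count divided by the total number of tokens.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    keyword_set = {k.lower() for k in keywords}
    count = sum(1 for t in tokens if t in keyword_set)
    frequency = count / len(tokens) if tokens else 0.0
    return count, frequency
```

A high skin fraction alone says nothing about whether a body is present, which is why the abstract pairs the color step with texture and shape analysis; the sketch covers only the first, cheapest stage.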
With the explosive growth of the Internet, information and knowledge can spread widely and efficiently. Computer education has also become available to all in recent years, letting more and more people access a variety of material on the Internet. At the same time, this implies a flood of inappropriate Internet content. In an unguarded environment, objectionable topics such as pornography, violence, and hate messages will reach those who should not access these web sites. Thus, it is necessary to apply a filtering scheme to offensive content without harming free speech.
Blacklists are a popular approach in current web filtering research, and there are various methods of collecting a blacklist, e.g., keyword analysis and human inspection. But some false positives always remain. In this paper we develop a compound method, based on the multiple characteristics of pornography sites in image and text, to refine the blacklist. For erotic images, we use image processing techniques: color segmentation, coarse detection, medial axis extraction, and shape from shading. For text in web documents, we use techniques from Information Retrieval and Document Classification to measure the number and frequency of erotic keywords. After extracting the two forms of feature vector, we measure the similarity of two documents by the angle between their feature vectors. Finally, the refining task is cast as a graph partitioning problem, and the blacklist is divided into two groups: pornographic sites and non-pornographic sites.
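The last two steps of the abstract, similarity measured by the angle between feature vectors and refinement cast as graph partitioning, can be sketched as below. The abstract does not say which partitioning criterion or solver is used, so this sketch assumes a normalized-cut cost and finds the best two-way split by exhaustive search, which is only practical for a handful of sites. All function names and the example vectors are hypothetical, and the features are assumed to be non-negative.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between two feature vectors, in degrees."""
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))  # guard against rounding
    return math.degrees(math.acos(c))


def normalized_cut(weights, group_a, group_b):
    """Normalized-cut cost of splitting the graph into group_a and group_b."""
    n = len(weights)
    cut = sum(weights[i][j] for i in group_a for j in group_b)
    assoc_a = sum(weights[i][j] for i in group_a for j in range(n))
    assoc_b = sum(weights[i][j] for i in group_b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def bipartition(vectors):
    """Split sites into two groups by exhaustive search over normalized cuts."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two sites to partition")
    # Edge weight between two sites is the cosine similarity of their vectors.
    weights = [
        [cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    best = None
    for size in range(1, n // 2 + 1):
        for group_a in combinations(range(n), size):
            group_b = tuple(i for i in range(n) if i not in group_a)
            cost = normalized_cut(weights, group_a, group_b)
            if best is None or cost < best[0]:
                best = (cost, set(group_a), set(group_b))
    return best[1], best[2]


# Hypothetical vectors: [erotic keyword count, skin-region count, neutral term count]
sites = [[9, 7, 1], [8, 9, 0], [0, 1, 10], [1, 0, 8]]
group_one, group_two = bipartition(sites)
```

Because cosine similarity ignores vector length, two sites whose features differ only in scale look identical to this measure. The partition also only separates the list into two groups; deciding which group is the pornographic one needs a further step that the sketch does not cover.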