| Graduate student: | 潘宇男 Yu-Nan Pan |
|---|---|
| Thesis title: | 應用於H.264編碼器的高效率動態評估與去區塊濾波器的架構設計 Highly Efficient Motion Estimation and Deblocking Filter Architecture Design of H.264 Encoder |
| Advisor: | 蔡宗漢 Tsung-Han Tsai |
| Oral defense committee: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 163 |
| Chinese keywords: | 動態評估, 去區塊濾波器, H.264 |
| Keywords: | Motion Estimation, H.264, Deblocking filter |
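MPEG-4 AVC/JVT/H.264 has many interesting features. However, these features make real-time compression with H.264 difficult to implement. Most previous designs have focused on mobile TV and high-definition TV, and they mainly address how to meet the specification and save memory bandwidth. In general, there are two ways to reach these targets: one is an application-specific integrated circuit (ASIC), and the other is a processor such as an ARM or a DSP.
In this thesis, we propose a highly efficient motion estimation and deblocking filter architecture design for a power-aware H.264 encoder. In the first part, we propose an algorithm that accelerates motion estimation and a highly efficient motion estimation hardware design. Motion estimation in H.264 provides seven block sizes to improve rate-distortion performance. This technique is more efficient than using a fixed block size. However, the computational load of motion estimation rises linearly with the number of available block sizes. We therefore first propose a highly efficient hybrid motion estimation algorithm. It uses our proposed block decision algorithm, which uses edge information to choose the best of the seven block sizes, combined with our proposed predict hexagon search algorithm. With this algorithm the computational load is greatly reduced, which makes it suitable for specifications such as HDTV or QFHD. In tests on real movies, it is 300 to 405 times faster than the full search of JM10.2, and roughly 3 to 247 times faster than common fast algorithms. With the algorithm complete, we propose a highly efficient motion estimation hardware design that incorporates it. Compared with other hardware, it provides a larger search area and lower power. The proposed hardware needs only 19.4 MHz for real-time SDTV compression with four reference frames, and only 116.6 MHz for real-time QFHD compression with one reference frame. Its size is 300K gates, and its memory usage is 12.6 KB.
Second, we propose a new processing order and hardware for the deblocking filter. The new processing order is built on parallel processing, which shortens processing time and reduces memory accesses. Compared with other hardware, the proposed hardware saves about 38% to 80% of memory accesses. Based on this efficient hardware, processing performance improves and the operating frequency for standard compression formats can be lowered. The HDTV format needs an operating frequency of only 11.5 MHz, and high-resolution QFHD needs only 46.6 MHz. Our implementation requires about 20.14K gates and about 64×32 bits of memory. Power consumption at 46.6 MHz is about 7.7 mW. For the whole H.264 encoder, we propose a hardware/software integration concept that incorporates the hardware proposed above. Finally, we propose a power-aware algorithm for the whole H.264 encoder, which saves about 9% to 87% of power consumption compared with the original power consumption.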
The MPEG-4 AVC/JVT/H.264 video coding standard has many attractive features. However, these features make real-time implementation difficult. Most previous work has focused on achievable specifications such as mobile TV and HDTV, concentrating on how to meet the video specification and memory bandwidth requirements. Generally, there are two ways to reach these targets: one is an Application-Specific Integrated Circuit (ASIC), and the other is a pure processor such as an ARM or a DSP.
In this thesis, we propose a highly efficient Motion Estimation (ME) and deblocking filter architecture design for a power-aware H.264 encoder. In the first part, we propose a speed-improved ME algorithm and a high-efficiency architecture design. ME in H.264 employs seven permitted block sizes to improve rate-distortion performance. This feature achieves a significant coding gain over coding a macroblock (MB) with a fixed block size. However, ME is computationally intensive, with complexity that increases linearly with the number of allowed block sizes. We first propose a high-performance hybrid ME algorithm for H.264/AVC. The hybrid algorithm combines the proposed mode decision algorithm, Edge Information Mode Decision (EIMD), which uses edge information to choose the best of the seven block modes, with the proposed Predict Hexagon Search (PHS). The proposed ME algorithm greatly reduces computational complexity and is therefore suitable for high-resolution applications such as HDTV (1920×1080) or QFHD (3840×2160). On three real movies, the proposed algorithm is about 300 to 405 times faster than the full search of JM10.2, and about 3 to 247 times faster than other popular fast algorithms. With the ME algorithm developed, we then propose an architecture for the combined fast motion estimation algorithm with PHS and EIMD. Compared with other popular ME architectures, the proposed architecture offers a large search range and a low operating frequency. It needs an operating frequency of only 19.4 MHz for real-time execution at the general SDTV specification (720×480) with four reference frames and a search range of 256×256, and only 116.6 MHz for real-time execution at the ultra-high QFHD specification (3840×2160) with one reference frame and a search range of 256×256. The gate count of the proposed architecture is 300K, and the memory usage is 12.6 KB.
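The abstract does not spell out how PHS works, so the following is only a minimal sketch of a generic hexagon-based block-matching search of the kind PHS builds on. It is not the thesis's algorithm. The function names, the `start` parameter (standing in for a predicted motion vector), the 4×4 block size, and the synthetic frames are all assumptions made for illustration.

```python
# Generic hexagon-based search (illustrative only; not the thesis's PHS).
# Offsets are (dx, dy). The centre comes first so that ties keep the centre.
LARGE_HEX = [(0, 0), (-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by)
    in `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def hexagon_search(cur, ref, bx, by, n=4, search_range=4, start=(0, 0)):
    """Return the motion vector (dx, dy) found by a coarse hexagon walk
    followed by one small-pattern refinement. `start` is a hypothetical
    hook for a predicted vector and must itself be a valid displacement."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Stay inside the search window and inside the reference frame.
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    def best_of(center, pattern):
        cands = [(center[0] + px, center[1] + py) for px, py in pattern]
        cands = [c for c in cands if valid(*c)]
        return min(cands, key=lambda c: sad(cur, ref, bx, by, c[0], c[1], n))

    center = start
    while True:
        best = best_of(center, LARGE_HEX)
        if best == center:      # centre is the local minimum: stop walking
            break
        center = best           # SAD strictly decreased, so this terminates
    return best_of(center, SMALL_CROSS)


# Synthetic example: a smooth ramp, with the current frame displaced by (2, -1).
ref = [[10 * x + y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + (y - 1) for x in range(16)] for y in range(16)]
print(hexagon_search(cur, ref, bx=6, by=6))  # (2, -1)
```

On this ramp the walk moves once from (0, 0) to (2, 0) and the refinement step then lands on (2, -1), evaluating 16 candidate positions instead of the 81 a full search over the same ±4 window would check. Real video content is not this well behaved, which is why practical fast searches rely on a good starting prediction.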
Second, we propose a new processing order and an architecture design for the deblocking filter. The proposed order, the double-cross processing order, is built on a parallel flow that improves processing speed and reduces memory access. The proposed architecture saves about 38% to 80% of memory accesses compared with other designs. This efficient architecture improves processing performance and lowers the operating frequency needed for standardized video specifications. For the general HDTV 1080p specification (1920×1080 @30fps), the operating frequency of the proposed architecture is only 11.5 MHz. For the high-resolution QFHD specification (3840×2160 @30fps), it is only 46.6 MHz. The implementation takes about 20.14K gates, and the memory requirement is 64×32 bits. The power dissipation for the QFHD specification is 7.7 mW at an operating frequency of 46.6 MHz. For the whole H.264 encoder, we propose a HW/SW co-design scheme that uses our previously proposed ME and deblocking filter hardware. Finally, we propose a power-aware scheme for the whole H.264 encoder. The proposed power-aware design can save about 9% to 87% of power consumption when a power budget is applied.
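The operating frequencies quoted above can be read as a cycle budget per 16×16 macroblock. The sketch below works that arithmetic out. The helper names are hypothetical, and the 30 fps rate is stated in the abstract only for the deblocking filter figures, so applying it to the ME figures is an assumption.

```python
MB_SIZE = 16  # H.264 macroblocks are 16x16 luma samples


def macroblocks_per_frame(width, height):
    """Macroblock count, rounding each dimension up (1080 rows pad to 68 MB rows)."""
    return -(-width // MB_SIZE) * -(-height // MB_SIZE)


def cycles_per_macroblock(clock_hz, width, height, fps=30):
    """Clock cycles available per macroblock at real-time rate (fps assumed)."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


# Deblocking filter figures from the abstract (30 fps is stated there).
dbf_hd = cycles_per_macroblock(11.5e6, 1920, 1080)
dbf_qfhd = cycles_per_macroblock(46.6e6, 3840, 2160)

# ME figures from the abstract (30 fps is an assumption here).
me_qfhd = cycles_per_macroblock(116.6e6, 3840, 2160)
me_sdtv_per_ref = cycles_per_macroblock(19.4e6, 720, 480) / 4  # four reference frames

print(f"deblocking: {dbf_hd:.1f} (HDTV), {dbf_qfhd:.1f} (QFHD) cycles/MB")
print(f"ME: {me_qfhd:.1f} (QFHD), {me_sdtv_per_ref:.1f} (SDTV, per reference frame) cycles/MB")
```

Under these assumptions the deblocking filter figures come to roughly 47 to 48 cycles per macroblock at both resolutions, and the ME figures to roughly 120 cycles per macroblock per reference frame, so the quoted frequencies scale consistently with the pixel rate.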