| Graduate Student: | Yu-ting Liu (劉郁廷) |
|---|---|
| Thesis Title: | 基於時頻感知域經由深度信念網路之吉他彈奏技巧辨識 (Recognition of Guitar Playing Techniques with Deep Belief Networks based on Spectral-Temporal Receptive Fields) |
| Advisors: | Pao-chi Chang (張寶基), Jia-Ching Wang (王家慶) |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science – Department of Communication Engineering |
| Year of Publication: | 2015 |
| Graduation Academic Year: | 103 |
| Language: | Chinese |
| Number of Pages: | 71 |
| Chinese Keywords: | auditory model, guitar playing technique, classification, recognition, neural network, deep learning |
| English Keywords: | STRF, Guitar Playing Technique, Classification, Recognition, Neural Network, Deep Belief Network |
The guitar is a very common instrument, widely used in popular music, rock, folk, and other genres, and learning it has become a hobby for many people. Different playing techniques produce different sounds and convey different emotions, which together make up a piece of music.

The variations among guitar playing techniques are quite subtle, so classifying and recognizing them is a challenging task. To someone unfamiliar with the guitar, the techniques sound very much alike, whereas a guitar player can tell them apart by ear alone.

To handle these subtle variations, this study proposes using Deep Belief Networks (DBNs) to learn audio features, including Mel-frequency cepstral coefficients (MFCCs) and spectro-temporal receptive fields (STRFs), a model of the auditory cortex. Through different initialization methods and a newly proposed deep network architecture, the system learns to identify the most discriminative features to improve recognition, and results on complete recordings are compared against onset segments alone. Experimental results show that the proposed method improves the recognition rate by up to 11.74% on onset segments, while on complete recordings the recognition rate is even higher, reaching 0.9819. This indicates that effective use of feature parameters and classifiers, rather than a large number of parameters, yields more accurate classification.
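The abstract describes learning features with a Deep Belief Network. The thesis's actual architecture (HCDBN) is not reproduced here; as a generic illustration of the DBN building block it is based on, the following NumPy sketch greedily pretrains a stack of restricted Boltzmann machines with one-step contrastive divergence on toy data. All class names, layer sizes, and hyperparameters are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary-binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible bias
        self.b_h = np.zeros(n_hidden)   # hidden bias
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden probabilities given the data
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden, reconstruct visible, recompute hidden
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # One-step contrastive-divergence gradient approximation
        batch = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error

# Greedy layer-wise pretraining of a 2-layer DBN on toy binary "features"
rng = np.random.default_rng(1)
data = (rng.random((200, 64)) < 0.3).astype(float)
layers = [RBM(64, 32), RBM(32, 16)]
x = data
for rbm in layers:
    for _ in range(50):
        rbm.cd1_step(x)
    x = rbm.hidden_probs(x)  # hidden activations become the next layer's input
print(x.shape)  # (200, 16)
```

In a full DBN pipeline, the pretrained weights would then initialize a feedforward network that is fine-tuned with backpropagation on the labeled technique classes.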
The guitar is a very common instrument that is widely used in popular music, rock, folk, and more. Different guitar playing techniques produce different sounds and express different emotions, which together make up a piece of music. Some playing techniques differ only subtly, so recognizing them is a significant challenge. This thesis proposes a guitar-playing-technique recognition system that includes a novel STRF-based feature extraction algorithm and a novel deep learning model called HCDBN. In experiments, the proposed system improves the recognition rate by 11.74% over the baseline on the onset version of the dataset and achieves a 98.19% recognition rate on the whole-recording version. This thesis also builds an onset-detection-based guitar technique recognition system that can be applied to real-world guitar solo music.
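The system above relies on locating note onsets before classifying the technique applied at each one. The thesis's own onset detector is not specified in this abstract; as a sketch of one common textbook approach, the following computes positive spectral flux on a short-time magnitude spectrogram and picks local peaks as onset candidates. The function name, frame sizes, and threshold are illustrative assumptions.

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    """Detect note onsets as peaks of positive spectral flux.

    Returns onset times in seconds. A generic method; the thesis's
    actual detector may differ.
    """
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    mags = np.empty((n_frames, frame // 2 + 1))
    for i in range(n_frames):
        seg = signal[i * hop : i * hop + frame] * window
        mags[i] = np.abs(np.fft.rfft(seg))
    # Positive spectral flux: sum of per-bin magnitude increases between frames
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12  # normalize to [0, 1]
    # Simple peak picking: local maxima above the threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    return np.array([(i + 1) * hop / sr for i in peaks])

# Toy example: half a second of silence followed by a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 440 * t)])
times = spectral_flux_onsets(sig, sr)
```

On this toy signal the detector reports an onset near the 0.5 s mark, where the tone begins; in the full system, a short window after each detected onset would be cut out and passed to the classifier.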