| Graduate Student: | Yu-ting Liu (劉郁廷) |
|---|---|
| Thesis Title: | 基於時頻感知域經由深度信念網路之吉他彈奏技巧辨識 (Recognition of Guitar Playing Techniques with Deep Belief Networks based on Spectral-Temporal Receptive Fields) |
| Advisors: | Pao-chi Chang (張寶基), Jia-Ching Wang (王家慶) |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science – Department of Communication Engineering |
| Year of Publication: | 2015 |
| Graduation Academic Year: | 103 |
| Language: | Chinese |
| Number of Pages: | 71 |
| Chinese Keywords: | auditory model, guitar playing technique, classification, recognition, neural network, deep learning |
| English Keywords: | STRF, Guitar Playing Technique, Classification, Recognition, Neural Network, Deep Belief Network |
The guitar is a very common instrument, widely used in popular music, rock, folk, and other genres, and learning it has become a hobby for many people. Different playing techniques produce different sounds and convey different emotions, which together make up a piece of music.

The variations among guitar playing techniques are quite subtle, so classifying and recognizing them is a challenging task. To someone unfamiliar with the guitar, the techniques sound very much alike, whereas a guitar player can tell them apart by ear alone.

To handle these subtle variations, this study proposes using Deep Belief Networks (DBNs) to learn audio features, including Mel-frequency cepstral coefficients (MFCCs) and spectro-temporal receptive fields (STRFs), a model of the auditory cortex. Through different initialization methods and a newly proposed deep network architecture, the system learns to identify the most discriminative features to improve recognition, and results on complete recordings are compared against onset segments alone. Experimental results show that the proposed method improves the recognition rate by up to 11.74% on onset segments, while on complete recordings the recognition rate is even higher, reaching 0.9819. This indicates that effective use of feature parameters and classifiers, rather than a large number of parameters, yields more accurate classification.
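The abstract describes learning features with a Deep Belief Network. The thesis's actual architecture (HCDBN) is not reproduced here; as a generic illustration of the DBN building block it is based on, the following NumPy sketch greedily pretrains a stack of restricted Boltzmann machines with one-step contrastive divergence on toy data. All class names, layer sizes, and hyperparameters are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary-binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible bias
        self.b_h = np.zeros(n_hidden)   # hidden bias
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden probabilities given the data
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden, reconstruct visible, recompute hidden
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # One-step contrastive-divergence gradient approximation
        batch = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error

# Greedy layer-wise pretraining of a 2-layer DBN on toy binary "features"
rng = np.random.default_rng(1)
data = (rng.random((200, 64)) < 0.3).astype(float)
layers = [RBM(64, 32), RBM(32, 16)]
x = data
for rbm in layers:
    for _ in range(50):
        rbm.cd1_step(x)
    x = rbm.hidden_probs(x)  # hidden activations become the next layer's input
print(x.shape)  # (200, 16)
```

In a full DBN pipeline, the pretrained weights would then initialize a feedforward network that is fine-tuned with backpropagation on the labeled technique classes.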
The guitar is a very common instrument that is widely used in popular music, rock, folk, and more. Different guitar playing techniques produce different sounds and express different emotions, which together make up a piece of music. Some playing techniques differ only subtly, so recognizing them is a significant challenge. This thesis proposes a guitar-playing-technique recognition system that includes a novel STRF-based feature extraction algorithm and a novel deep learning model called HCDBN. In experiments, the proposed system improves the recognition rate by 11.74% over the baseline on the onset version of the dataset and achieves a 98.19% recognition rate on the whole-recording version. This thesis also builds an onset-detection-based guitar technique recognition system that can be applied to real-world guitar solo music.
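The system above relies on locating note onsets before classifying the technique applied at each one. The thesis's own onset detector is not specified in this abstract; as a sketch of one common textbook approach, the following computes positive spectral flux on a short-time magnitude spectrogram and picks local peaks as onset candidates. The function name, frame sizes, and threshold are illustrative assumptions.

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    """Detect note onsets as peaks of positive spectral flux.

    Returns onset times in seconds. A generic method; the thesis's
    actual detector may differ.
    """
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    mags = np.empty((n_frames, frame // 2 + 1))
    for i in range(n_frames):
        seg = signal[i * hop : i * hop + frame] * window
        mags[i] = np.abs(np.fft.rfft(seg))
    # Positive spectral flux: sum of per-bin magnitude increases between frames
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12  # normalize to [0, 1]
    # Simple peak picking: local maxima above the threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    return np.array([(i + 1) * hop / sr for i in peaks])

# Toy example: half a second of silence followed by a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 440 * t)])
times = spectral_flux_onsets(sig, sr)
```

On this toy signal the detector reports an onset near the 0.5 s mark, where the tone begins; in the full system, a short window after each detected onset would be cut out and passed to the classifier.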