| Graduate student: | 蘇筱凌 Hsiao-Ling Su |
|---|---|
| Thesis title: | 於對話中定位特定發音之研究 – 以滿意為例 Locating Satisfaction in Vocal Dialogue |
| Advisor: | 許秉瑜 |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Management - Department of Business Administration |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 |
| Language: | Chinese |
| Number of pages: | 54 |
| Chinese keywords: | 關鍵字搜尋、顧客滿意度、梅爾倒頻譜係數、交叉注意力機制、語音辨識 |
| English keywords: | Keyword search, Customer satisfaction, Mel-frequency cepstral coefficients, Cross-attention mechanism, Speech recognition |
For a business, understanding how satisfied customers are with its products or services is crucial to raising repurchase rates and willingness to recommend. An efficient speech recognition method that can accurately analyze customer service calls is therefore urgently needed. However, locating the point at which a customer voices satisfaction within a long speech signal is a challenging task.
This research combines keyword search with cross-attention to locate expressions of satisfaction in spoken dialogue. It uses a dataset of specific utterances recorded from different speakers, together with an industry dataset of telephone interview recordings. By analyzing and cross-matching these recordings, the goal is to find the positions in a long speech signal where positive or negative satisfaction is voiced. The data first undergo preprocessing and acoustic feature extraction. The processed features are then fed into a cross-attention model, which computes attention scores between the two sets of feature vectors and takes the position with the highest attention score as the location of the satisfaction utterance.
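The localization step described above can be sketched as follows. This is a minimal illustration and not the thesis's model: it assumes both inputs are already frame-level feature matrices, uses plain scaled dot-product attention with no learned projections, and reads the "shift stride" as the hop between candidate windows. The names `window_scores` and `locate`, and the random stand-in features, are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_scores(query, long_feats, stride):
    """Score every window of long_feats against the keyword features.

    query:      (Tq, D) features of the short reference utterance
    long_feats: (T, D)  features of the long recording
    Returns the window start frames and one score per window.
    """
    tq, d = query.shape
    starts = np.arange(0, long_feats.shape[0] - tq + 1, stride)
    scores = np.empty(len(starts))
    for i, s in enumerate(starts):
        window = long_feats[s:s + tq]             # acts as the keys
        logits = query @ window.T / np.sqrt(d)    # (Tq, Tq) scaled dot products
        attn = softmax(logits, axis=-1)           # cross-attention weights
        # Assumed scoring rule: attention-weighted similarity, averaged over query frames.
        scores[i] = (attn * logits).sum(axis=-1).mean()
    return starts, scores

def locate(query, long_feats, stride=10, k=5):
    # Return the k window start frames with the highest scores, best first.
    starts, scores = window_scores(query, long_feats, stride)
    order = np.argsort(scores)[::-1][:k]
    return starts[order]

# Synthetic stand-in data: plant the keyword at frame 150 of a random recording.
rng = np.random.default_rng(0)
keyword = rng.standard_normal((20, 30))    # 20 frames, 30 feature dimensions
recording = rng.standard_normal((400, 30))
recording[150:170] = keyword
top = locate(keyword, recording, stride=10, k=5)
```

Here `top` holds the five best-scoring start frames, which is the kind of ranked candidate list a hit-ratio metric evaluates. A trained model would replace the raw dot products with learned query and key projections.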
The experimental results show that the number of filter banks and the shift stride are important factors affecting the hit ratio. Among the parameter settings tested, 30 filter banks with a shift stride of 10 performed best, reaching an HR@5 of 95.08%, an HR@3 of 84.15%, and an HR@1 of 60.11%.
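For readers unfamiliar with the metric, HR@k (hit ratio at k) is commonly computed as the fraction of test cases whose true position appears among the top k ranked candidates. The sketch below assumes that common definition and an exact-match criterion for a hit; the abstract does not state the thesis's matching tolerance. The function name and the toy data are hypothetical.

```python
def hit_ratio_at_k(ranked_candidates, true_positions, k):
    """Fraction of cases whose true position is among the top-k candidates.

    ranked_candidates: one list per test case, best candidate first
    true_positions:    the ground-truth position for each test case
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_positions)
        if truth in ranked[:k]
    )
    return hits / len(true_positions)

# Toy example with four test cases (made-up positions, not thesis data).
ranked = [[150, 140, 160], [30, 10, 20], [70, 80, 90], [5, 15, 25]]
truth = [150, 20, 100, 15]
hr1 = hit_ratio_at_k(ranked, truth, 1)   # only the first case hits at rank 1
hr3 = hit_ratio_at_k(ranked, truth, 3)   # three of the four cases hit within rank 3
```

Because the top-k list always contains the top-(k-1) list, HR@k can only stay the same or rise as k grows, which matches the ordering of the reported HR@1, HR@3, and HR@5 values.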