
Graduate student: 范凱翔
Kai-shiang Fan
Thesis title: 應用序列樣式探勘於軟體版本歷史之研究
A Study of Applying Sequential-Pattern Mining to Software Version Histories
Advisor: 林熙禎
Shi-jen Lin
Oral examination committee:
Degree: Master
Department: College of Management - Department of Information Management
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Number of pages: 47
Keywords: Software Engineering, Version Control System (VCS), Sequential-pattern mining, Software evolution
  • Modern software evolves continuously. Although a version control system (VCS)
    records detailed information about the changes between software versions, our
    understanding of increasingly complex software structures remains limited.
    Moreover, a single project easily runs to thousands or even tens of thousands of
    lines of code, which makes software maintenance a difficult problem. Existing
    research that applies data mining to VCS data relies mostly on association rules,
    which tend to overlook the order in which program changes occur. This study
    therefore adds the time dimension: it collects data from CVS (Concurrent Versions
    System) repositories that are openly available on the Internet and analyzes them
    with sequential-pattern mining to find latent sequential patterns. Compared with
    earlier similar studies, the approach identifies more precisely the "entities" in
    a software project that need to change. A classification rule is then used to
    examine the value of the mined sequential patterns to users, so that the results
    can serve as a reference for future software maintenance.
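
The contrast the abstract draws between association rules and sequential patterns can be illustrated with a minimal sketch. It is not the thesis's implementation: it only brute-force counts length-2 patterns, and it assumes that a "history" is a time-ordered list of commits, each commit being a set of changed files. The file names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cochange_support(histories):
    """Unordered view: number of histories in which both files change at all."""
    support = Counter()
    for commits in histories:
        # Sort so that each unordered pair has one canonical (a, b) key.
        touched = sorted(set().union(*commits))
        support.update(combinations(touched, 2))
    return support


def ordered_support(histories):
    """Ordered view: number of histories in which the first file changes
    in an earlier commit than the second file."""
    support = Counter()
    for commits in histories:
        pairs = set()  # count each ordered pair at most once per history
        for i, j in combinations(range(len(commits)), 2):
            pairs.update((a, b) for a in commits[i] for b in commits[j] if a != b)
        support.update(pairs)
    return support


# Hypothetical change histories (assumption: commits are already time-ordered).
histories = [
    [{"Parser.java"}, {"Lexer.java"}, {"ParserTest.java"}],
    [{"Parser.java", "Ast.java"}, {"ParserTest.java"}],
    [{"ParserTest.java"}, {"Lexer.java"}],
]

cochange = cochange_support(histories)
ordered = ordered_support(histories)

# Both files change together in two histories, but the order differs each time.
print(cochange[("Lexer.java", "ParserTest.java")])   # 2
print(ordered[("Lexer.java", "ParserTest.java")])    # 1
print(ordered[("ParserTest.java", "Lexer.java")])    # 1

# Here the order is consistent: Parser.java always changes before its test.
print(ordered[("Parser.java", "ParserTest.java")])   # 2
print(ordered[("ParserTest.java", "Parser.java")])   # 0
```

In this toy data the unordered count cannot distinguish a pair that always changes in the same order from one whose order varies, which is the information the abstract says association rules tend to overlook.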

    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Research Method
        1.5 Thesis Organization
    Chapter 2 Literature Review
        2.1 Types of Software Engineering Data
        2.2 Applications of Data Mining in Software Engineering
            2.2.1 Association Rules
            2.2.2 Sequential Rules
            2.2.3 Comparison of Association Rules and Sequential Rules
            2.2.4 Other Methods
        2.3 Sequential Pattern Mining
            2.3.1 Apriori-like
            2.3.2 Pattern-growth
            2.3.3 Sequence Clustering
        2.4 Summary
    Chapter 3 System Platform Design
        3.1 System Architecture
        3.2 Data Collection
        3.3 Data Preprocessing
        3.4 Sequential Pattern Mining
        3.5 Discussion of Sequential Patterns
    Chapter 4 Experimental Results and Discussion
        4.1 Experimental Environment
        4.2 Experimental Subjects
        4.3 Experimental Design
            4.3.1 Data Collection
            4.3.2 Data Preprocessing
            4.3.3 Sequential Pattern Mining and Classification Analysis
        4.4 Experimental Results and Discussion
            4.4.1 XAMJ project
            4.4.2 Sqlexplorer project
            4.4.3 Findbugs project
            4.4.4 OpenXava project
    Chapter 5 Conclusions and Future Research Directions
        5.1 Conclusions
        5.2 Future Research Directions
    References
        References in Chinese
        References in English
        Web Resources

