
Author: 葉庭 (Ting Yeh)
Thesis Title: 探究強化學習與停止策略於活動來源頁面探勘之設計
(On the Design of RL Algorithms and Termination Strategies for Focused Crawling - A Case Study for Event Source Page Discovery)
Advisor: 張嘉惠 (Chia-Hui Chang)
Oral Defense Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111
Language: Chinese
Pages: 44
Keywords: Reinforcement Learning, Web Mining, Multi-task Learning
  • This study develops an intelligent crawler system to collect information from event source web pages. Our goal is to save users the time spent searching for events in a browser and to provide structured event information, meeting the modern demand for discovering distinctive local events. In our previous work, we used a reinforcement learning policy gradient method for event source page mining. However, we found two problems with the two-stage training: the first stage could only be trained with a fixed step size, and the fine-tuning in the second stage showed no significant improvement. To address these issues, we control episode termination with a variable step size from the initial training phase onward. This allows more flexible training that adapts to different scenarios and environmental changes, improving the model's performance and results. To this end, we design an asset-controlled stopping strategy and adopt different reinforcement learning algorithms. We also define the original two-stage training framework more rigorously, expanding the training strategies into four different methods. By comparing against our previous work, we determine whether the newly designed stopping strategy reduces click cost, identify which of the applied reinforcement learning algorithms best suits our task, and finally select the most suitable training strategy. The results show that the new stopping strategy achieves lower click cost and higher performance with the DQN algorithm: click cost decreases from 1.4% to 1.2%, and performance improves from 72% to 78.2%. After comparing the training strategies, we conclude that using labeled data together with a reward function based on correct answers is more suitable for our task.


    The purpose of this study is to develop an intelligent web crawler system that collects information from event source web pages. Our objective is to save users time spent browsing for events and to provide structured event information, meeting the modern demand for finding distinctive local events. In our previous work, we used a reinforcement learning policy gradient method for event source page mining. However, we identified two issues with the two-stage training process: first, the first stage only allowed training with a fixed step size; second, the fine-tuning in the second stage did not yield significant improvement. To address these problems, we control episode termination with a variable step size during the initial training phase. This provides more flexible training that adapts to different scenarios and environmental changes, thereby improving the model's performance and outcomes. To achieve this, we introduce an asset-control stopping strategy and employ several reinforcement learning algorithms. Moreover, we define the original two-stage training framework more rigorously, expanding the training strategies into four different methods. By comparing against our previous work, we determine whether the newly designed stopping strategy reduces click cost, identify the reinforcement learning algorithm best suited to our task, and ultimately select the most suitable training strategy. The results show that the new stopping strategy achieves lower click cost and higher performance with the DQN algorithm: click cost decreases from 1.4% to 1.227%, and performance improves from 72% to 78.2%. After comparing the training strategies, we conclude that using labeled data with a reward function based on correct answers is more suitable for our task.
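    The asset-control stopping idea described above can be sketched as an episode loop: the crawler starts each episode with a budget ("asset"), pays a cost for every click, earns a reward when it reaches an event source page, and the episode terminates once the asset is exhausted, giving a variable number of steps per episode instead of a fixed step size. The sketch below is a minimal illustration under assumed names and values (`INITIAL_ASSET`, `CLICK_COST`, `EVENT_REWARD`, `run_episode` are all hypothetical), not the thesis's actual implementation.

    ```python
    # Illustrative sketch of asset-controlled episode termination.
    # All constants and function names are assumptions for clarity.
    INITIAL_ASSET = 10.0   # starting budget per episode (assumed value)
    CLICK_COST = 1.0       # asset spent per clicked link (assumed value)
    EVENT_REWARD = 5.0     # asset gained on finding an event source page (assumed)

    def run_episode(select_link, is_event_source, start_page, max_steps=100):
        """Crawl from start_page until the asset budget runs out."""
        asset = INITIAL_ASSET
        page = start_page
        clicks = 0
        found = []
        for _ in range(max_steps):
            page = select_link(page)     # the learned policy picks the next link
            clicks += 1
            asset -= CLICK_COST          # every click consumes asset
            if is_event_source(page):
                asset += EVENT_REWARD    # success replenishes the budget
                found.append(page)
            if asset <= 0:               # asset exhausted: episode terminates
                break
        return clicks, found
    ```

    Under this scheme the episode length varies with the policy's success: a policy that keeps finding event source pages sustains its asset and crawls longer, while an unproductive one stops early, which is what allows click cost to be controlled during the initial training phase.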

    Abstract (Chinese) i
    Abstract ii
    Acknowledgements iii
    Table of Contents iv
    List of Figures v
    List of Tables vi
    1. Introduction 1
      1-1 Problem Description 1
      1-2 Motivation and Research Goals 2
      1-3 Contributions 2
    2. Related Work 4
      2-1 Focused Crawler 4
      2-2 Reinforcement Learning 4
      2-3 Focused Crawling with Deep Reinforcement Learning 5
    3. Method 7
      3-1 Task Overview 7
      3-2 State Feature Extraction 7
      3-3 Reinforcement Learning Training Strategies 7
      3-4 Models and Methods 9
        3-4-1 Model Overview 10
        3-4-2 Stopping Strategy Methods 13
        3-4-3 DQN / Deep SARSA Training Methods and Algorithms 13
        3-4-4 Policy Gradient Training Method and Algorithm 18
        3-4-5 Advantage Actor-Critic Training Method and Algorithm 20
    4. Experiments 22
      4-1 Dataset Collection and Labeling Rules 22
      4-2 Experiments and Performance Analysis 23
        4-2-1 Evaluation Methods 23
        4-2-2 Asset Method Settings 23
        4-2-3 Comparison of Stopping Strategies across Algorithms 25
        4-2-4 Comparison of Training Strategy D across Reinforcement Learning Algorithms 27
        4-2-5 Ablation Study 29
    5. Conclusion 31
    References 32

