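ERP Log Analysis: A Case Study of Company A (ERP日誌分析-以A公司為例) | National Central University Electronic Theses and Dissertations System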


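Graduate student:	Zhi-Neng Wu (吳志能)
Thesis title:	ERP Log Analysis: A Case Study of Company A (ERP日誌分析-以A公司為例)
Advisor:	柯士文
Degree:	Master
Department:	College of Management, Executive Master of Information Management (Department of Information Management, in-service program)
Publication year:	2020
Graduation academic year:	108 (ROC academic year)
Language:	Chinese
Pages:	74
Chinese keywords:	大數據分析 (big data analysis)
English keywords:	Audit Of Accountant, big data analysis tool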

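The problems facing the case company in this study are similar to those at other companies. It invests considerable money, manpower, and resources only to preserve a pile of logs that no one knows how to use, kept solely to prove to internal and external auditors that regulations are indeed being followed. When it comes to actually extracting useful information from these logs, the company does not know where to start. Commercial big data analysis software on the market is expensive, and the data may also have to be handed to the vendor for analysis; exposing the personal data involved is the last thing a company wants.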

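In recent years, technological innovation has been the main driver of economic growth and national progress. Artificial intelligence in particular is rapidly changing industrial development worldwide and is an important future trend. To keep pace with global technology, Taiwan has selected AI as the main development program for its next generation. The case company found that its peers have already begun to position themselves and have started on both talent training and hardware investment. The company must therefore begin to engage with this field; after all, the technological arms race never stops.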

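So which data should be collected, and once the data are in hand, what should be analyzed next? Business operations depend on funding: every investment of manpower and resources is a cost, the cost invested must yield a benefit, and only a benefit with room to grow leads to further investment plans. The case company considered the many external audits, large and small, that it faces each year, such as customer audits, customs audits, and ISO14000 information security management audits. These audits share items that target key operational systems such as ERP. Auditors invariably ask about account usage records, which reflect whether the company's internal controls are sound. Past records show that audits often end with findings. The answers to the auditors' questions lie in those millions of records, yet the company often cannot produce them during the audit itself. What follows is a series of corrective action plans, and that is the mild outcome: some European and American customers regard this as a major deficiency and cancel orders outright, a loss that is hard to estimate.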

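The case company decided to focus on collecting ERP log data. Externally, when auditors raise questions about ERP user records, the answers can be extracted from these data. Audits with zero findings bring tangible benefits: stable orders and a high-standard company reputation. Internally, managers currently have no way of knowing how many users are actually online in the system, so when planning the next year's account budget, decisions on purchasing additional accounts often have no data to support them. Buying too many accounts wastes money; buying too few leads to constant user complaints and delays in daily operations. For these two reasons, the case company expects that a big data analysis of the current distribution of account usage will bring tangible benefits. It asked around about how its peers had run their big data analysis projects, and over the past year it also invited instructors from the Industrial Technology Research Institute (ITRI) and university professors to teach its staff big data courses. It found that all of them use Python as the big data analysis tool: Python is open source and requires no additional license fees, and its packages are mature. Because it is widely used in industry, reference examples are easy to find and people share a common language when communicating. The company therefore decided to use Python as the tool for this log analysis.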

In recent years, the software and hardware behind artificial intelligence have improved quickly. Companies try to collect whatever data they have; manufacturers, for example, collect data from their machines. The raw data collected can exceed one hundred terabytes. But the work often stops at collection: if the collected data are not cleaned and transformed into meaningful data, they cannot improve business income. These companies consider using commercial software to analyze their big data, but the high license cost and the risk of data exposure are their main concerns.

Our research focuses on collecting and analyzing ERP audit logs. The company we study has in recent years faced accountants' audits. The auditors ask questions about the security of the ERP system: for example, when users logged in, from which machine, and what changes they made in the system. The correct answers to these questions can be found in the ERP audit logs, which record the raw data behind them. The company, however, does not know how to use the raw data to answer the auditors, and as a result it has been penalized in audits.

The company decided to use open-source big data tools to solve the audit issue. It analyzes the raw audit logs to obtain useful results, such as the number of users currently online in the ERP system, the computers from which users log in, and how many times each ERP program is used. The company noted that other companies have already used big data to reduce daily work. Finally, the company chose the open-source language Python as its main big data analysis tool. Python has become the most popular big data analysis tool in recent years: its community is large and its packages are well suited to big data analysis.
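The kind of summary described above (logins per account, computers used per account, and usage counts per ERP program) could be sketched roughly as follows. This is only an illustration using the Python standard library: the record layout, the account and computer names, and the helper functions `summarize` and `accounts_over` are all hypothetical and are not taken from the thesis or its data.

```python
from collections import defaultdict

# Hypothetical audit-log records: (account, computer, program).
# The layout and the values are illustrative assumptions only.
SAMPLE_LOG = [
    ("U001", "PC-A", "login"),
    ("U001", "PC-B", "login"),
    ("U002", "PC-A", "login"),
    ("U001", "PC-A", "report"),
    ("U003", "PC-C", "login"),
]


def summarize(log):
    """Return login counts per account, distinct computers per account,
    and usage counts per program."""
    logins = defaultdict(int)
    computers = defaultdict(set)
    programs = defaultdict(int)
    for account, computer, program in log:
        programs[program] += 1
        computers[account].add(computer)
        if program == "login":
            logins[account] += 1
    computer_counts = {acct: len(pcs) for acct, pcs in computers.items()}
    return dict(logins), computer_counts, dict(programs)


def accounts_over(computer_counts, threshold):
    """List accounts seen on more than `threshold` distinct computers."""
    return sorted(a for a, n in computer_counts.items() if n > threshold)


logins, computer_counts, programs = summarize(SAMPLE_LOG)
shared = accounts_over(computer_counts, 1)
```

With a real log, a threshold check such as `accounts_over(computer_counts, 20)` would flag accounts that appear on an unusually large number of machines, which is the sort of account-usage question an auditor might raise.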

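Table of Contents
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1    Introduction
1.1    Research Background
1.2    Research Motivation
1.3    Research Objectives
1.4    Thesis Structure
Chapter 2    Literature Review
2.1    Literature Review
2.1.1    Auditing Standards References
2.1.2    Domestic Thesis References
2.2    Data Analysis Process
2.3    Feature Engineering
2.3.1    What Is Feature Engineering
2.3.2    Importance of Feature Engineering
2.4    Feature Extraction
2.4.1    Numeric Data
2.4.2    Categorical Data
2.4.3    Temporal Data
2.4.4    Text Data
2.5    Feature Scaling
2.5.1    Why Feature Scaling Is Needed
2.5.2    Algorithms for Which Feature Scaling Applies
2.5.3    Feature Scaling Methods
2.5.3.1    Normalization
2.5.3.2    Z-Score Standardization
2.5.3.3    Differences Between Normalization and Z-Score Standardization
2.6    Chapter Summary
Chapter 3    Research Method
3.1    Research Framework
3.2    Research Method and Interviews
3.3    Requirements Summary
3.4    Research Steps
3.5    Research Scope and Limitations
Chapter 4    Results and Analysis
4.1    Analysis of Results for Requirement 1
4.2    Analysis of Results for Requirement 2
4.3    Analysis of Results for Requirement 3
4.4    Analysis of Results for Requirement 4
Chapter 5    Conclusion
5.1    Research Conclusions
5.2    Research Contributions
5.3    Future Work
References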

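List of Figures
Figure 2-1 CRISP-DM
Figure 2-2 CRISP-DM steps
Figure 2-3 Machine learning pipeline
Figure 2-4 Data science pipeline
Figure 2-5 A generic dataset snapshot
Figure 2-6 Data preparation
Figure 2-7 Binning
Figure 2-8 Log Transform
Figure 2-9 Shirt size as an ordinal categorical attribute
Figure 2-10 One Hot Encoding
Figure 2-11 Raw date data
Figure 2-12 Data split by temporal features
Figure 2-13 Text Data Raw Data
Figure 2-14 Text Data Split Title
Figure 2-15 A graphical representation of Euclidean Distance
Figure 2-16 Euclidean distance
Figure 2-17 Feature scaling in Andrew Ng's machine learning course
Figure 2-18 Normalization
Figure 2-19 Z-Score Standardization
Figure 3-1 Percentage share of events
Figure 3-2 Actual number of event records
Figure 3-3 Audit log table
Figure 3-4 User login accounts
Figure 3-5 (Research framework diagram)
Figure 4-1 Monthly data volume
Figure 4-2 Daily data volume
Figure 4-3 Weekly data volume
Figure 4-4 Monthly data size
Figure 4-5 Daily data volume in December
Figure 4-6 Log data volume on each Monday in December
Figure 4-7 Loading on the first Monday of December
Figure 4-8 Loading on the first Monday of January
Figure 4-9 Accounts used and number of login computers in December
Figure 4-10 Monthly login counts per account, grouped
Figure 4-11 ERP account usage by plant in December
Figure 4-12 Statistics of ERP accounts used by plant in December
Figure 4-13 ERP computers used by plant in December
Figure 4-14 Statistics of ERP computers used in December
Figure 4-15 Accounts logging in from more than 20 computers, by month
Figure 4-16 Monthly list of unused accounts
Figure 4-17 Login counts grouped by plant in December
Figure 4-18 Login counts by plant in December: Plant 11
Figure 4-19 Login counts by plant in December: Plant 21
Figure 4-20 Login counts by plant in December: Plant 22
Figure 4-21 Login counts by plant in December: Plant 11, list
Figure 4-22 User analysis of account 1103MP001
Figure 4-23 User analysis of account 1103MP001: List 1
Figure 4-24 User analysis of account 1103MP001: List 2
Figure 4-25 User analysis of account 1103MP001: List 3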

List of Tables
Table 1 Nominal Features
Table 2 Ordinal Features
Table 3 Age and annual income data
Table 4 Normalization
Table 5 Z-Score Standardization
Table 6 Case interview content
Table 7 Event field descriptions


                                

