Contents

1. A Data Mining Approach to Predict E-Commerce Customer Behaviour

2. A Hybrid Classification Method Based on Machine Learning Classifiers to Predict Performance in Educational Data Mining

3. A Process Mining Approach for Supporting IoT Predictive Security

4. A Review on Predicting Cardiovascular Diseases Using Data Mining Techniques

5. A data mining approach to predict academic performance of students using ensemble techniques

6. A data mining approach to predict falls in humanoid robot locomotion

7. A literature survey on various classifications of data mining for predicting heart disease

8. A survey of location prediction using trajectory mining

9. A wavelet analysis based data processing for time series of data mining predicting

10. Accurate and Transparent Path Prediction Using Process Mining

11. Accurate and Transparent Path Prediction Using Process Mining

12. Adaptive Routing Mechanism in SDN to Limit Congestion

13. An Efficient Association Rule Mining Method to Predict Diabetes Mellitus: KNHANES 2013–2015

14. An association rule mining approach in predicting flood areas

15. An investigation on educational data mining to analyze and predict the students' academic performance using visualization

16. Application of data mining software to predict the alum dosage in coagulation process: A case study of Vientiane, Lao PDR

17. Application of optimum binning technique in data mining approaches to predict students' final grade in a course

18. Apply of sum of difference method to predict placement of students using educational data mining

19. Applying process mining to support management of predictive analytics/data mining projects in a decision making center

20. Artificial intelligence in medicine

21. Automatic learning of predictive CEP rules: Bridging the gap between data mining and complex event processing

22. Collective intelligent information and database systems

23. Comparative study of different data mining techniques in predicting forest fire in Lebanon and mediterranean

24. Completion Time and Next Activity Prediction of Processes Using Sequential Pattern Mining

25. Context-Aware Data Mining vs Classical Data Mining: Case Study on Predicting Soil Moisture

26. DTFA Rule Mining-Based Model to Predict Students' Performance

27. DTI Helps to Predict Parkinson's Patients' Symptoms Using Data Mining Techniques

28. Data Science for Urban Sustainability: Data Mining and Data-Analytic Thinking in the Next Wave of City Analytics

29. Data mining approach to predict and analyze the cardiovascular disease

30. Data mining approaches for packaging yield prediction in the post-fabrication process

31. Data mining to predict and prevent errors in health insurance claims processing

32. Decision tree approach to predict lung cancer the data mining technology

33. Dropout prediction in MOOCs: A comparison between process and sequence mining

34. Enabling Process Innovation via Deviance Mining and Predictive Monitoring

35. Ensemble predictive process mining-taking predictive process mining to the next level

36. Hybrid heuristics miner based on time series prediction for streaming process mining

37. In silico approaches to predict drug-transporter interaction profiles: Data mining, model generation, and link to cholestasis

38. Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining

39. Large scale predictive process mining and analytics of university degree course data

40. Long term temperature monitoring and thermal processes prediction within mining dumps

41. Machine learning paradigms

42. Medical data mining for heart diseases and the future of sequential mining in medical field

43. Mining Student Information System Records to Predict Students' Academic Performance

44. Mining educational data to predict students' academic performance

45. Mining predictive process models out of low-level multidimensional logs

46. Next step recommendation and prediction based on process mining in adaptive case management

47. On the advantage of using dedicated data mining techniques to predict colorectal cancer

48. On the use of process mining and machine learning to support decision making in systems design

49. Performance Analysis of Collaborative Data Mining vs Context Aware Data Mining in a Practical Scenario for Predicting Air Humidity

50. Pervasive decision support to predict football corners and goals by means of data mining

51. Predicting learner performance using data-mining techniques and ontology

52. Predicting nosocomial infection by using data mining technologies

53. Predicting preterm birth in maternity care by means of data mining

54. Predicting triage waiting time in maternity emergency care by means of data mining

55. Predicting user interaction in enterprise social systems using process mining

56. Prediction modeling for ingot manufacturing process utilizing data mining roadmap including dynamic polynomial neural network and bootstrap method

57. Prediction of customer movements in large tourism industries by the means of process mining

58. Proceedings of 2nd International Conference on Communication, Computing and Networking

59. Proceedings of the International Symposium for Production Research 2018

60. Process inspection by attributes using predicted data

61. Prädiktives monitoring von geschäftsprozessen zur beherrschung von risiken in weltweiten netzen auf basis von process mining und simulation

62. Queue mining - Predicting delays in service processes

63. Real-time decision support using data mining to predict blood pressure critical events in intensive medicine patients

64. Redesigning Learning for Greater Social Impact

65. Review of Machine Learning and Data Mining Methods to Predict Different Cyberattacks

66. Rough Set Data Mining Algorithms and Pursuit Eye Movement Measurements Help to Predict Symptom Development in Parkinson's Disease

67. Rule mining techniques to predict prokaryotic metabolic pathways

68. Smart Sustainable Cities of the Future

69. Supervised Educational Data Mining to Discover Students' Learning Process to Improve Students' Performance

70. Text mining models to predict brain deaths using X-Rays clinical notes

71. Traffic prediction in wireless mesh networks using process mining algorithms

72. A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data

73. A New Method of Predicting the Height of the Fractured Water-Conducting Zone Due to High-Intensity Longwall Coal Mining in China

74. A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures

75. A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs

76. A new data mining-based framework to test case prioritization using software defect prediction

77. A panel of Transcription factors identified by data mining can predict the prognosis of head and neck squamous cell carcinoma

78. Activity prediction in process mining using the WoMan framework

79. Ambient Intelligence for Health

80. An empirical study of using sequential behavior pattern mining approach to predict learning styles

81. Application of a Conceptual Ecological Model to Predict the Effects of Sand Mining around Chilsan Island Group in the West Coast of Korea

82. Arabic Text Classification Based on Word and Document Embeddings

83. Biological Networks and Pathway Analysis

84. Chat mining: Predicting user and message attributes in computer-mediated communication

85. Computational and Statistical Methods in Intelligent Systems

86. Data Mining on MRI-Computational Texture Features to Predict Sensory Characteristics in Ham

87. Data mining service recommendation based on dataset features

88. Data mining-based flatness pattern prediction for cold rolling process with varying operating condition

89. Demand Side Management Using Bacterial Foraging Optimization Algorithm

90. Development of a Web-Based Radiation Toxicity Prediction System Using Metarule-Guided Mining to Predict Radiation Pneumonitis and Esophagitis in Lung Cancer Patients

91. Handling Concept Drift for Predictions in Business Process Mining

92. Integrating in-process software defect prediction with association mining to discover defect pattern

93. Intelligent Systems Design and Applications

94. Mining biometric data to predict programmer expertise and task difficulty

95. Mining expressed sequence tags of rapeseed (Brassica napus L.) to predict the drought responsive regulatory network

96. Mining software change data stream to predict changeability of classes of object-oriented software system

97. Mining the Situation: Spatiotemporal Traffic Prediction with Big Data

98. Mining usage scenarios in business processes: Outlier-aware discovery and run-time prediction

99. Mining user interaction patterns in the darkweb to predict enterprise cyber incidents

100. Multimedia services quality prediction based on the association mining between context and QoS properties

101. Optimization of MRI Acquisition and Texture Analysis to Predict Physico-chemical Parameters of Loins by Data Mining

102. Pattern-Based Opinion Mining for Stock Market Trend Prediction

103. Predicting Within-24h Visualisation of Hospital Clinical Reports Using Bayesian Networks

104. Predicting ground vibration induced by rock blasting using a novel hybrid of neural network and itemset mining

105. Predicting simulation parameters of biological systems using a Gaussian process model

106. Predicting the distribution of ground fissures and water-conducted fissures induced by coal mining: a case study

107. Predicting unusual energy consumption events from smart home sensor network by data stream mining with misclassified recall

108. Proceedings of 2016 SAI Intelligent Systems Conference (IntelliSys)

109. Proceedings of International Conference on Recent Advancement on Computer and Communication

110. Product Lifecycle Management for Digital Transformation of Industries

111. Quaternion Watershed Transform

112. Queue mining for delay prediction in multi-class service processes

113. Study of a roof water inrush prediction model in shallow seam mining based on an analytic hierarchy process using a grey relational analysis method

114. Time prediction based on process mining

115. Unbiased data mining identifies cell cycle transcripts that predict non-indolent Gleason score 7 prostate cancer

116. Use of data mining to predict significant factors and benefits of bilateral cochlear implantation

117. Workload Prediction of Business Processes – An Approach Based on Process Mining and Recurrent Neural Networks


Abstracts

[1] A Data Mining Approach to Predict E-Commerce Customer Behaviour (2019)

Abstract: With rapid advancements in industry, technology and applications, many concepts have emerged in the manufacturing environment. The term "Industry 4.0" was coined to highlight a new industrial revolution. Maintenance is a key operational function, since it is related to all manufacturing processes and focuses not only on avoiding equipment breakdown but also on improving business performance. Monitoring manufacturing systems for maintenance helps to identify equipment condition and failures before breakdowns occur. Total Productive Maintenance (TPM) is a widely used maintenance strategy for gaining a competitive advantage in industry. In this context, this paper focuses on the incompletely understood link between the key technologies of Industry 4.0 and TPM. Conjoint analysis, a multi-attribute decision-making method based on experimental design, is implemented to quantify the impacts of Industry 4.0's key technologies on the pillars of TPM. Moreover, the interaction between the key technologies of Industry 4.0 and the pillars of TPM reveals several opportunities for achieving synergies, thus leading to a successful implementation of future interconnected smart factories. The major contribution of the paper is to provide a guideline and technology roadmap for investment decisions for industries that are in the transformation phase towards the future smart factory, and to offer a space for further scientific discussion.

DOI: 10.1007/978-3-319-92267-6_3

[2] A Hybrid Classification Method Based on Machine Learning Classifiers to Predict Performance in Educational Data Mining (2019)

Abstract: Machine learning algorithms can be applied in educational data mining (EDM) to extract knowledge. Educational data mining is an important practice of automatically extracting and segmenting useful information from educational data sources. This paper compares and studies hybrid classification models built from machine learning algorithms such as decision trees, clustering, artificial neural networks and Naïve Bayes. It introduces the concepts behind popular algorithms for researchers new to this area. The paper discusses a hybrid classification model that combines machine learning algorithms through voting and can be used to analyze student performance. We used the open-source data mining tool Weka for a practical experiment on a student data set, serving the purposes of prediction, classification, visualization, etc. The findings reveal that hybrid classification methods are more efficient for prediction on student-related data.

DOI: 10.1007/978-981-13-1217-5_67
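
The voting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's Weka setup: the three rule-based "classifiers", the feature names (`attendance`, `midterm`, `assignments_done`) and the thresholds are hypothetical stand-ins for trained models, and the tie-breaking rule is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers.

    Ties are broken in favour of the label that appears first
    in the list (an assumption made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def hybrid_predict(classifiers, sample):
    """Ask every base classifier for a label and combine them by hard voting."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for trained base classifiers.
def by_attendance(s):
    return "pass" if s["attendance"] >= 0.75 else "fail"


def by_midterm(s):
    return "pass" if s["midterm"] >= 50 else "fail"


def by_assignments(s):
    return "pass" if s["assignments_done"] >= 6 else "fail"


student = {"attendance": 0.80, "midterm": 45, "assignments_done": 7}
result = hybrid_predict([by_attendance, by_midterm, by_assignments], student)
print(result)  # two of the three base classifiers vote "pass"
```

Hard voting of this kind only helps when the base classifiers make different mistakes, which is why hybrid models usually mix different algorithm families.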

[3] A Process Mining Approach for Supporting IoT Predictive Security (2020)

Abstract: The growing interest in the Internet of Things (IoT) is supported by the large-scale deployment of sensors and connected objects. These are integrated with other Internet resources in order to build more complex, value-added systems and applications. While significant efforts have been made to protect them, security management remains a major challenge for these systems, due to their complexity, their heterogeneity and the limited resources of their devices. In this paper we introduce a process mining approach for detecting misbehavior in such systems. It makes it possible to characterize the behavioral models of IoT-based systems and to detect potential attacks, even in the case of heterogeneous protocols and platforms. We then describe and formalize its underlying architecture and components, and detail a proof-of-concept prototype. Finally, we evaluate the performance of this solution through extensive experiments based on real industrial datasets.

DOI: 10.1109/NOMS47738.2020.9110411

[4] A Review on Predicting Cardiovascular Diseases Using Data Mining Techniques (2020)

Abstract: The main objective of this work is to analyze various data mining techniques in the health care field that can be employed in predicting cardiovascular diseases and diagnosing them efficiently. The cardiovascular system is the first organ system to become fully functional in utero. Cardiovascular disease is one of the most common causes of death worldwide. People of all ages are affected by this disease, particularly the elderly. It is essential to predict the groups of people commonly affected and to identify risk factors such as age, sex and lifestyle, which will help in the early diagnosis and prevention of heart disease. At present a huge number of people are susceptible to heart disease, and hence it is quite difficult to predict accurately. Proper mining methods can save an enormous number of people from death due to heart disease. This paper analyses various types of heart disease and the prediction techniques used in heart disease prediction.

DOI: 10.1007/978-3-030-43192-1_43

[5] A data mining approach to predict academic performance of students using ensemble techniques (2020)

Abstract: Recently, Educational Data Mining (EDM) has emerged as a new area of research due to the growth of various statistical methods used to explore data in educational settings. One application of EDM is the prediction of student performance. Applying data mining methods in an educational setting can uncover hidden knowledge and patterns that help administrators make decisions to enhance the educational system. In a web-based education system, the behavioral features of learners are very significant in showing the interaction between students and the LMS. In this paper, our aim is to propose a new performance prediction model for students, based on data mining methods, that includes new features known as students' behavioral features. The proposed predictive model is evaluated using classifiers such as Naïve Bayesian (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), Discriminant Analysis (Disc) and Pairwise Coupling (PWC). Additionally, to enhance the classifiers' performance, ensemble methods such as AdaBoost, Bag and RUSBoost were used to improve the accuracy of the student performance model. The results show that there is a strong relationship between students' behavior and their academic performance. The proposed model achieved an accuracy of 84.2% with behavioral features and 72.6% without them. Moreover, an accuracy of 94.1% was obtained when the ensemble methods were applied to the classifiers. These results show the reliability of the proposed model.

DOI: 10.1007/978-3-030-16657-1_70

[6] A data mining approach to predict falls in humanoid robot locomotion (2016)

Abstract: The inclusion of perceptual information in the operation of a dynamic robot (interacting with its environment) can provide valuable insight about its environment and increase the robustness of its behaviour. In this regard, the concept of Associative Skill Memories (ASMs) has made a great contribution to the effective and practical use of sensor data under a simple and intuitive framework [2, 13]. Inspired by [2], this paper presents a data mining solution to the fall prediction problem in humanoid biped robotic locomotion. Sensor data from a large number of simulations were recorded, and four data mining algorithms were applied with the aim of creating a classifier that properly identifies failure conditions. Using Support Vector Machines on sensor data from a large number of simulation trials, it was possible to build an accurate and reliable offline fall predictor, achieving accuracy, sensitivity and specificity values up to 95.6%

DOI: 10.1007/978-3-319-27149-1_22
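
The three figures of merit named in this abstract (accuracy, sensitivity, specificity) follow directly from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the labels (1 for "fall", 0 for "no fall") and the sample data are made up for illustration and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels.

    Assumes at least one positive and one negative example in y_true,
    so that neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),   # share of all trials classified correctly
        "sensitivity": tp / (tp + fn),        # share of real falls that were predicted
        "specificity": tn / (tn + fp),        # share of stable trials left unflagged
    }


# Hypothetical trial outcomes: 1 = fall, 0 = no fall.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

For a fall predictor, sensitivity matters most when a missed fall is costly, while specificity limits false alarms that would interrupt normal walking.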

[7] A literature survey on various classifications of data mining for predicting heart disease (2020)

Abstract: The heart is a very important, hard-working organ in the human body, pumping blood to supply nutrients and oxygen throughout the body. Predicting the occurrence of heart disease is an important task in the medical field. Data mining algorithms are very helpful in the detection of cardiovascular disease. This paper surveys data mining classification techniques that have been proposed to help health professionals diagnose cardiovascular diseases. We start with an overview of data mining techniques and describe the various classification models used for the early detection of heart disease. We then review the research works proposed on using data mining classification techniques in this area.

DOI: 10.1007/978-3-030-34515-0_76

[8] A survey of location prediction using trajectory mining (2015)

Abstract: This paper surveys and analyzes the prediction of the location of moving objects, a topic that has gained popularity over the years. A trajectory specifies the path of an object's movement. The number of applications using location-based services (LBS) is increasing; these need to know the location of moving objects, and trajectory mining plays a vital role here. Trajectory mining techniques use the geographical location, semantics and properties of the moving object to predict its location and behavior. This paper analyzes the various strategies for predicting future locations and constructing trajectory patterns. The mechanisms are analyzed on the basis of various factors, including accuracy and the ability to predict the distant future. The location prediction problem can involve known or unknown reference points; semantic-based prediction gives accurate results, whereas probability-based prediction is used for unknown reference points.

DOI: 10.1007/978-81-322-2126-5_14

[9] A wavelet analysis based data processing for time series of data mining predicting (2006)

Abstract: This paper presents a wavelet method for time series in business-field forecasting. An autoregressive moving average (ARMA) model is used; it can model the near-periodicity, nonstationarity and nonlinearity that exist in short-term business time series. Through wavelet denoising, wavelet decomposition and wavelet reconstruction, the hidden period and the nonstationarity in the time series are extracted and separated by the wavelet transform. The characteristics of the wavelet decomposition series are applied to BP networks and an autoregressive moving average (ARMA) model. The results show that the proposed method provides more accurate results than conventional techniques, such as those using only BP networks or ARMA models. © Springer-Verlag Berlin Heidelberg 2006.

DOI: 10.1007/11731139_91
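
The decompose, denoise and reconstruct pipeline named in this abstract can be sketched with a single-level Haar transform. The abstract does not say which wavelet or thresholding rule the authors used, so the Haar wavelet, the hard threshold and the sample series below are all assumptions chosen for brevity.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose, interleaving the recovered sample pairs."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def haar_denoise(signal, threshold):
    """Zero out small detail coefficients (hard threshold), then reconstruct."""
    approx, detail = haar_decompose(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_reconstruct(approx, kept)


# Hypothetical short series: small fluctuation in the first pair, none in the second.
series = [4.0, 6.0, 10.0, 10.0]
smoothed = haar_denoise(series, threshold=2.0)
print(smoothed)  # each pair is replaced by its mean when its detail is below the threshold
```

In a forecasting setup like the one described, the approximation and detail series would then each be fed to a separate model (for example ARMA or a neural network) and the forecasts recombined.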

[10] Accurate and Transparent Path Prediction Using Process Mining (2019)

Abstract: Anticipating the next events of an ongoing series of activities has many compelling applications in various industries. It can be used to improve customer satisfaction, to enhance operational efficiency, and to streamline health-care services, to name a few. In this work, we propose an algorithm that predicts the next events by leveraging business process models obtained using process mining techniques. Because we are using business process models to build the predictions, it allows business analysts to interpret and alter the predictions. We tested our approach with more than 30 synthetic datasets as well as 6 real datasets. The results have superior accuracy compared to using neural networks while being orders of magnitude faster.

DOI: 10.1007/978-3-030-28730-6_15
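
As a rough illustration of predicting the next event from a mined process model, the sketch below counts directly-follows relations in an event log and predicts the most frequent successor. This is a deliberately simple stand-in, not the algorithm proposed in the paper; the activity names and the log are hypothetical.

```python
from collections import Counter, defaultdict


def build_directly_follows(traces):
    """Count how often each activity is directly followed by each other activity."""
    follows = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, activity):
    """Return the most frequent successor of `activity`, or None if it has none."""
    if activity not in follows:
        return None
    return follows[activity].most_common(1)[0][0]


# Hypothetical event log: one list of activities per case.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
]
model = build_directly_follows(log)
print(predict_next(model, "check"))
print(dict(model["check"]))  # the counts behind the prediction can be inspected
```

Because the model is just a table of counts, an analyst can read it and edit it, which is the kind of transparency the abstract contrasts with neural networks.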

[11] Accurate and Transparent Path Prediction Using Process Mining (2019)

Abstract: Anticipating the next events of an ongoing series of activities has many compelling applications in various industries. It can be used to improve customer satisfaction, to enhance operational efficiency, and to streamline health-care services, to name a few. In this work, we propose an algorithm that predicts the next events by leveraging business process models obtained using process mining techniques. Because we are using business process models to build the predictions, it allows business analysts to interpret and alter the predictions. We tested our approach with more than 30 synthetic datasets as well as 6 real datasets. The results have superior accuracy compared to using neural networks while being orders of magnitude faster.

DOI: 10.1007/978-3-030-28730-6_15

[12] Adaptive Routing Mechanism in SDN to Limit Congestion (2019)

Abstract: Twitter is a microblogging website where users read and write short messages on various topics every day. Political analysis using social media is attracting the attention of many researchers seeking to understand public opinion and trends, especially at election time. In this paper, we propose a novel approach based on semantic and context-aware rules to detect public opinion and predict election results. We crawled political tweets during the general election in India and evaluated our approach against the election results. Experimental results show the effectiveness of the proposed rules in determining the sentiment of political tweets.

DOI: 10.1007/978-981-13-3329-3

[13] An Efficient Association Rule Mining Method to Predict Diabetes Mellitus: KNHANES 2013–2015 (2020)

Abstract: Diabetes mellitus is a growing epidemic that has led to several complications in recent years. The prevention of diabetes mellitus is therefore an important public health topic in the medical domain. In this study, we try to find the risk factors of diabetes mellitus, for use in its prediction, by means of association rule mining. In our experiment, we used a complex sampling-based feature selection approach and a correlation-based feature selection approach to extract relevant features of diabetes mellitus, and generated rules based on the discovered risk factors. We then evaluated the discovered rules on the basis of support and confidence, and found useful rules. We expect our results to be helpful for predicting diabetes mellitus more easily in real life.

DOI: 10.1007/978-981-13-9714-1_26

[14] An association rule mining approach in predicting flood areas (2017)

Abstract: This study focuses on the application of association rule mining to flood data in Terengganu. Flooding is one of the natural disasters that happen every year during the monsoon season and cause damage to people, infrastructure and the environment. This paper aims to find the correlation between water level and flood area in order to develop a model to predict floods. The Malaysian Drainage and Irrigation Department supplied the dataset, which comprised flood area, water level and rainfall data. The association rule mining technique generates the best rules from the dataset using the Apriori algorithm, which was applied to find the frequent itemsets. Using the Apriori algorithm, the 10 best rules were generated with a 100% confidence level and 40% minimum support after candidate generation and pruning. The results show the usability of data mining in this field; they can help give early warning to potential victims and buy time for saving lives and property.

DOI: 10.1007/978-3-319-51281-5_44
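
The Apriori steps named in this abstract (candidate generation, pruning, and filtering by support and confidence) can be sketched in a few lines. The transactions and item names below are invented for illustration and are not the Terengganu data; the 40% minimum support and 100% confidence thresholds are the ones quoted in the abstract.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for strong rules."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, confidence))
    return rules


# Hypothetical observations, one set of discretized readings per day.
days = [
    {"water_high", "rain_heavy", "flood"},
    {"water_high", "rain_heavy", "flood"},
    {"water_high", "flood"},
    {"water_low", "rain_light"},
    {"water_low", "rain_heavy"},
]
frequent = frequent_itemsets(days, min_support=0.4)
rules = association_rules(frequent, min_confidence=1.0)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), supp, conf)
```

Support measures how often a pattern occurs at all, while confidence measures how reliably the consequent follows the antecedent; a rule with 100% confidence but very low support may still rest on only a handful of records.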

[15] An investigation on educational data mining to analyze and predict the students' academic performance using visualization (2019)

Abstract: Presently, educational institutions compile and store huge volumes of data, such as students' enrollment details, academic history, attendance records and examination results. Traditional data mining approaches cannot be directly applied for visualization, so we use the Pandas library to preprocess the academic data and the matplotlib and seaborn libraries to visualize it, in order to get better results and to understand and predict outcomes from the data more easily.

DOI: 10.1007/978-981-13-3329-3_17

[16] Application of data mining software to predict the alum dosage in coagulation process: A case study of Vientiane, Lao PDR (2018)

Abstract: This paper presents alum dosage prediction in the coagulation process using the Weka data mining software. The data were collected from Dongmarkkaiy Water Treatment Plant (DWTP), Vientiane capital, Lao PDR, from 1 January 2008 to 31 October 2016, for a total of 2,891 records. We compared the results of the multilayer perceptron (MLP), M5Rules, M5P and REPTree methods using the root mean square error (RMSE) and mean absolute error (MAE). Three independent input variables were used: turbidity, pH and alkalinity. The dependent variable was the alum added for the coagulation process. Our experimental results indicated that the MLP method predicted the alum dosage most accurately.

DOI: 10.3233/978-1-61499-834-1-110
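
Entry [16] ranks its models by RMSE and MAE. As a reminder of what the two metrics measure, here is a small sketch in plain Python; the dosage values are made up for illustration and are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical alum dosages and model predictions (illustrative only).
actual = [30.0, 25.0, 40.0, 35.0]
predicted = [28.0, 27.0, 37.0, 35.0]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

Both metrics are in the units of the target variable, and a lower value means a better fit, which is presumably the sense in which the abstract describes MLP as the best method.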

[17] “Application of optimum binning technique in data mining approaches to predict students final grade in a course” (2015)

Abstract: In Bangladesh, demand for higher education has risen continuously over the last decade; therefore, the need to improve the education system is pressing. Data mining techniques can be applied in educational settings to extract useful information that helps both students and instructors. In this research, we present and analyze techniques that predict students' final outcome (with respect to grade) for a particular course. We validate our method by conducting experiments on grade data for courses at North South University, one of the leading universities in higher education in Bangladesh. Our preliminary findings are encouraging. We further improve and extend our ideas by discretizing the continuous attributes using equal-width binning and error minimization techniques. Experimental results demonstrate that the improved technique with discretization outperforms other forms, such as probability estimation, by almost 7-10%.

DOI: 10.1007/978-3-319-13153-5_16
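
Entry [17] discretizes continuous attributes with equal-width binning. The sketch below shows only that step, under the assumption that each attribute is split into a fixed number of bins between its observed minimum and maximum; the marks are hypothetical, and the paper's error-minimization variant is not reproduced here.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into a single bin.
        return [0] * len(values)
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical course marks (illustrative only).
scores = [35, 48, 52, 67, 71, 88, 95]
bins = equal_width_bins(scores, 3)
print(bins)  # [0, 0, 0, 1, 1, 2, 2]
```

With a range of 35 to 95 and three bins, each interval is 20 marks wide, so the bin index can then stand in for the raw mark as a categorical feature.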

[18] “Apply of sum of difference method to predict placement of students using educational data mining” (2016)

Abstract: The purpose of higher education organizations is to offer superior education to their students. The ability to forecast students' achievement is valuable in several ways for an organization's education system. The scores that students obtain in exams can be used to create a training set for learning algorithms. With students' academic attributes, such as internal marks, lab marks and age, their performance can be easily predicted. Once predictions are available, students can be given suitable assistance to improve their performance. Educational Data Mining (EDM) provides such information to educational organizations from educational data. EDM offers various methods for predicting students' performance, which can improve students' future results. In this paper, EDM is used to predict the placement performance of final-year students from attributes such as academic records, age and achievement. As a result, higher education organizations will be able to offer superior education to their students.

DOI: 10.1007/978-81-322-2755-7_39

[19] Applying process mining to support management of predictive analytics/data mining projects in a decision making center (2019)

Abstract: A Decision Making Center (DMC) environment facilitates the understanding of complex problems and simplifies the decision-making process of policy makers and stakeholders by playing out several what-if scenarios. One of the key elements in this environment is the management of the Predictive Analytics and Data Mining (PADM) techniques required for real-time and accurate representation of data in scenarios. However, there has been a gap in understanding the best practices that facilitate the execution of PADM in environments such as a DMC. In this paper we aim to address this gap. Specifically, we apply process mining techniques to reveal: 1) the process flow and executed activities in PADM projects and 2) the relationship and time distribution of activities within a PADM project execution. To this end, two successful projects were analyzed in order to find relationships among activities and time metrics during execution. The present study aims to support managers in the execution of large-scale PADM projects.

DOI: 10.1109/ICSAI48974.2019.9010135

[20] Artificial intelligence in medicine (2020)

Abstract: Artificial Intelligence (AI) is based on accurate decision-making processes which can be carried out independently by a machine. AI may be subdivided into strong AI (with consciousness and intentionality) and weak AI (lacking both and programmed to perform specific tasks only). With AI currently making rapid progress in all domains, including healthcare, physicians face possible competitors. Various types of AI programs are already available as consultants to the physician, and these help in medical diagnostics and treatment. At the time of writing, extant programs constitute weak AI. This paper will explore the development of AI and robotics in medicine, and will refer to Star Trek's “Emergency Medical Hologram”, who is portrayed as a strong AI program. This paper will also briefly explore the issues pertaining to AI in the medical field and will show that weak AI should not only suffice in the demesne of healthcare, but may actually be more desirable than strong AI.

DOI: 10.1016/j.earlhumdev.2020.105017

[21] Automatic learning of predictive CEP rules: Bridging the gap between data mining and complex event processing (2017)

Abstract: Due to the undeniable advantage of prediction and proactivity, many research areas and industrial applications are accelerating their pace to keep up with data science and predictive analytics. However, due to three well-known facts, reactive Complex Event Processing (CEP) technology might lag behind when prediction becomes a requirement. First fact: the one and only inference mechanism in this domain is totally guided by CEP rules. Second fact: the only way to define a CEP rule is to have a human expert write it manually. Third fact: experts tend to write reactive CEP rules because, regardless of the level of expertise, it is nearly impossible to manually write predictive CEP rules. Taken together, these facts mean that CEP is, and will stay, a reactive computing technique. Therefore, in this article, we present a novel data mining-based approach that automatically learns predictive CEP rules. The approach proposes a new learning algorithm in which complex patterns are learned from multivariate time series. Then, at run-time, a seamless transformation into the CEP world takes place. The result is a ready-to-use CEP engine with enrolled predictive CEP rules. Many experiments on publicly available data sets demonstrate the effectiveness of our approach.

DOI: 10.1145/3093742.3093917

[22] Collective intelligent information and database systems (2017)

Abstract: Collective intelligence is most often understood as group intelligence which arises on the basis of intelligences of the group members. This paper presents an overview of application of collective intelligence methods in knowledge engineering and in processing collective data. It also introduces papers included in this issue.

DOI: 10.3233/JIFS-169115

[23] Comparative study of different data mining techniques in predicting forest fire in Lebanon and mediterranean (2018)

Abstract: Forest fire is one of the most complex phenomena; it can cause great economic losses and seriously disrupt the ecological environment. Forest fires have destroyed many green acres in Lebanon because of the lack of governmental policies to manage them. This paper presents an overview of the exciting applications of data mining techniques in different fields. This study aims to predict forest fires in North Lebanon, in order to reduce fire occurrence, based on 4 meteorological parameters (temperature, humidity, precipitation and wind speed) using different data mining techniques: neural networks, decision tree (J48), fuzzy logic, support vector machine (SVM) and linear discriminant analysis (LDA). A comparative study is then made to find the best-performing technique for managing such a natural crisis. The decision tree (J48) recorded the best accuracy in forest fire prediction (97.8%).

DOI: 10.1007/978-3-319-56994-9_51

[24] Completion Time and Next Activity Prediction of Processes Using Sequential Pattern Mining (2014)

Abstract: Process mining is a research discipline that aims to discover, monitor and improve real processes using event logs. In this paper we describe a novel approach that (i) identifies partial process models by exploiting sequential pattern mining and (ii) uses the additional information about the activities matching a partial process model to train nested prediction models from event logs. The models can be used to predict the next activity and the completion time of a new (running) process instance. We compare our approach with a model based on Transition Systems implemented in the ProM5 Suite and show that the attributes in the event log can improve the accuracy of the model without decreasing performance. The experimental results show that our algorithm improves on ProM5 by a large margin in predicting the completion time of a process, while it presents competitive results for next activity prediction.

DOI: 10.1007/978-3-319-11812-3_5

[25] Context-Aware Data Mining vs Classical Data Mining: Case Study on Predicting Soil Moisture (2020)

Abstract: The purpose of this article is to investigate whether including context data (context-awareness) in a classical data mining process enhances the overall results. To that end, the efficiency of the predictions was analyzed and compared for a classical data mining (DM) process versus a context-aware data mining (CADM) process. The two processes were applied to existing data collected from multiple weather stations to predict future soil moisture. The classical data mining process considers historical data on soil moisture and temperature in a time interval, while the context-aware process also adds collected context information on air temperature for that location. The results show advantages of CADM over classical DM.

DOI: 10.1007/978-3-030-20055-8_19

[26] “DTFA Rule Mining-Based Model to Predict Students Performance” (2018)

Abstract: Nowadays data mining plays an important role in various fields, such as education, agriculture and medicine. Education is an important part of human life. With the help of data mining in education, called educational data mining, we can achieve quality education, which is essential for the growth of both students and the country. On the basis of quality education, we can assess and improve students' performance, and different data mining techniques can be used to that end. This paper assesses students' performance on the basis of 12th and graduation-level marks, previous semester marks (PSM), mid-semester marks (MSM), attendance (ATT) and end-semester marks (ESM). Using these attributes, we determine students' performance in the end semester. For classification we used a decision tree algorithm, and for fuzzy association mining we used a modified Apriori-like method. We then merge the rules generated by the decision tree algorithm and the Apriori-like algorithm.

DOI: 10.1007/978-981-10-8198-9_26

[27] “DTI Helps to Predict Parkinsons Patients Symptoms Using Data Mining Techniques” (2019)

Abstract: Deep Brain Stimulation (DBS) is commonly used to treat, inter alia, movement disorder symptoms in patients with Parkinson's disease, dystonia or essential tremor. The procedure stimulates a targeted region of the brain through implanted leads that are powered by a device called an implantable pulse generator (IPG). In most operations, the targeted region is the subthalamic nucleus (STN). The STN is a nucleus in the midbrain with a size of 3 mm × 5 mm × 9 mm that consists of parts with different physiological functions. The purpose of the study was to predict the Parkinson's patients' symptoms, as defined by the Unified Parkinson's Disease Rating Scale (UPDRS), that may occur after DBS treatment. Parameters were obtained from 3DSlicer (Harvard Medical School, Boston, MA), which allowed us to track connections between the stimulated part of the STN and the cortex based on DTI (diffusion tensor imaging).

DOI: 10.1007/978-3-030-14802-7_53

[28] Data Science for Urban Sustainability: Data Mining and Data-Analytic Thinking in the Next Wave of City Analytics (2018)

Abstract: As a research direction, big data analytics has recently attracted scholars and scientists from diverse disciplines, as well as practitioners from a variety of professional fields, given their prominence in various urban domains, especially urban design and planning, transportation engineering, mobility, energy, public health, and socioeconomic forecasting. Indeed, there has recently been much enthusiasm about the immense possibilities created by the data deluge and its new sources to better operate, manage, and plan cities to improve their contribution to the goals of sustainable development as a result of thinking about and understanding sustainability problems in a data-analytic fashion. This data deluge is increasingly enriching and reshaping our experiences of how such cities can be advanced. Big data analytics is indeed offering many new opportunities for well-informed decision-making and enhanced insights with respect to our knowledge of how fast and best to improve urban sustainability. This unprecedented shift has been brought up by data science, an interdisciplinary field which involves scientific systems, processes, and methods used to extract useful knowledge from data in structured or unstructured forms. Data mining and knowledge discovery in databases as processes are by far the most widely used techniques for extracting useful knowledge from colossal datasets for enhanced decision-making and insights in relation predominantly to business intelligence. 
However, in city-related academic and scientific research, it is argued that “small data” studies—questionnaire surveys, focus groups, case studies, participatory observations, audits, interviews, content analyses, and ethnographies—are associated with high cost, infrequent periodicity, quick obsolescence, reflexivity, incompleteness, and inaccuracy, i.e., capture a relatively limited sample of data that are tightly focused, time and space specific, restricted in scope and scale, and relatively expensive to generate and analyze, to provide additional depth and insight with respect to urban phenomena. Accordingly, much of our knowledge of urban sustainability has been gleaned from studies that are characterized by data scarcity. The potential of big data lies in transforming the knowledge of smart sustainable cities through the creation of a data deluge that seeks to provide much more sophisticated, wider scale, finer grained, real-time understanding, and control of various aspects of urbanity. Therefore, this chapter aims to synthesize, illustrate, and discuss a systematic framework for urban (sustainability) analytics based on Cross-Industry Standard Process for Data Mining (CRISP-DM) in response to the emerging wave of city analytics in the context of smart sustainable cities. This framework, which can be tested and used in empirical applications in the city domain, has an innovative potential to advance urban analytics by providing a novel way of thinking data-analytically about urban sustainability problems. It provides fertile insights into how to conduct “big data” studies in the field of urban sustainability. The intent is to enable well-informed or knowledge-driven decision-making and enhanced insights in relation to diverse urban domains with regard to operations, functions, strategies, designs, practices, and policies for increasing the contribution of smart sustainable cities to the goals of sustainable development. 
This chapter can serve to bring together city analysts, data scientists, urban planners and scholars, and ICT experts on common ground in their endeavor to transform and advance the knowledge of smart sustainable cities in terms of sustainability.

DOI: 10.1007/978-3-319-73981-6_4

[29] Data mining approach to predict and analyze the cardiovascular disease (2017)

Abstract: This paper presents an experimental analysis of data provided by the UCI machine learning repository. Weka, the open source machine learning tool provided by Waikato University, reveals hidden facts in the datasets when supervised, mathematically proven algorithms, i.e., J48 and Naïve Bayes, are applied. J48 is an extension of the ID3 algorithm with additional features such as continuous attribute value ranges and derivation of rules. The datasets were analyzed using two approaches, i.e., first with selected attributes and then with all attributes. The performance of the two algorithms reveals their accuracy and helps predict the various reasons behind the increasing problem of cardiovascular diseases.

DOI: 10.1007/978-981-10-3153-3_12

[30] Data mining approaches for packaging yield prediction in the post-fabrication process (2013)

Abstract: In the post-fabrication process for semiconductors, it is critical to predict the yield. This process consists of a series of electrical and physical tests following semiconductor fabrication, tests that generate a significant volume of parametric data. While past research has investigated yield prediction using parametric test data, most studies have difficulty correctly predicting the low and high yield because of the wide range of variables and the large data set. Also, in the case of the packaging yield, prediction is inaccurate as this yield does not directly correlate with the parametric test data. Therefore, this study proposes a framework in which the packaging yield is classified using the parametric test data of the previous step of the packaging test. This study involves three stages. In the first, data preprocessing is conducted due to the large data set. To learn a data mining model using much more data, parametric test data generated in the die level need to be changed into the wafer level. In the second stage, a random forest algorithm is used to select significant variables affecting the packaging yield. Finally, the third stage uses a nonlinear support vector machine (SVM) to classify the low and high yield. Through the three stages, this study demonstrates that this proposed algorithm has a superior performance. © 2013 IEEE.

DOI: 10.1109/BigData.Congress.2013.55

[31] Data mining to predict and prevent errors in health insurance claims processing (2010)

Abstract: Health insurance costs across the world have increased alarmingly in recent years. A major cause of this increase is payment errors made by the insurance companies while processing claims. These errors often result in extra administrative effort to re-process (or rework) the claim, which accounts for up to 30% of the administrative staff in a typical health insurer. We describe a system that helps reduce these errors using machine learning techniques by predicting claims that will need to be reworked, generating explanations to help the auditors correct these claims, and experimenting with feature selection, concept drift, and active learning to collect feedback from the auditors to improve over time. We describe our framework, problem formulation, evaluation metrics, and experimental results on claims data from a large US health insurer. We show that our system results in an order of magnitude better precision (hit rate) over existing approaches, which is accurate enough to potentially result in over $15-25 million in savings for a typical insurer. We also describe interesting research problems in this domain as well as design choices made to make the system easily deployable across health insurance companies. © 2010 ACM.

DOI: 10.1145/1835804.1835816

[32] Decision tree approach to predict lung cancer the data mining technology (2015)

Abstract: The purpose of this research is to use data mining techniques to explore potential information in the National Health Insurance Research Database and to analyze patients with lung cancer, which has the highest mortality in Taiwan, as a reference for the analysis of Bureau of National Health Insurance files and the clinical indices of hospitals. Using the statistical software “SPSS Clementine 12.0”, this research merged “The detail file of hospital medical expenses inventory” and “The basic data file of medical institution” between 06 and 09 as the study data, screened hospitalized lung cancer patients as the study sample, and used the Two-Step clustering technique to produce effective results. Among hospitalized lung cancer patients and those who died, males account for a higher percentage than females; lung cancer patients usually have comorbidities, especially pneumonia; most lung cancer patients go to medical centers; and most patients book as out-patients. The treatment of lung cancer patients should take note of the factors that lead to pneumonia, and because most patients book as out-patients, drug treatment can be regarded as the mainstay of treatment.

DOI: 10.1007/978-94-017-9618-7_26

[33] Dropout prediction in MOOCs: A comparison between process and sequence mining (2018)

Abstract: Recently, Massive Open Online Courses (MOOCs) have experienced rapid development. However, one of the major issues of online education is the high dropout rate of participants. Many studies have attempted to explore this issue, using quantitative and qualitative methods for student attrition analysis. Nevertheless, there is a lack of studies which (1) predict the actual moment of dropout, providing opportunities to enhance MOOC student retention by offering timely interventions; and (2) compare the performance of such prediction algorithms. In this paper, we aim to predict student dropout in MOOCs using process and sequence mining techniques, and provide a comparative analysis of these techniques. We perform a case study based on data from the KU Leuven online course “Trends in e-Psychology”, available on the edX platform. The results reveal that while process mining is better suited to descriptive analysis, sequence mining techniques provide better features for predictive purposes.

DOI: 10.1007/978-3-319-74030-0_18

[34] Enabling Process Innovation via Deviance Mining and Predictive Monitoring (2015)

Abstract: A long-standing challenge in the field of business process management is how to deal with processes that exhibit high levels of variability, such as customer lead management, product design or healthcare processes. One thing that is understood about these processes is that they require process designs and support environments that leave considerable freedom so that process workers can readily deviate from pre-established paths. At the same time, consistent management of these processes requires workers and process owners to understand the implications of their actions and decisions on the performance of the process. We present two emerging techniques, deviance mining and predictive monitoring, that leverage information hidden in business process execution logs in order to provide guidance to stakeholders so that they can steer the process towards consistent and compliant outcomes and higher process performance. Deviance mining deals with the analysis of process execution logs offline in order to identify typical deviant executions and to characterize deviance that leads to better or to worse performance. Predictive monitoring meanwhile aims at predicting, at runtime, the impact of actions and decisions of process participants on the probable outcomes of ongoing process executions. Together, these two techniques enable evidence-based management of business processes, where process workers and analysts continuously receive guidance to achieve more consistent and compliant process outcomes and a higher performance.

DOI: 10.1007/978-3-319-14430-6_10

[35] Ensemble predictive process mining-taking predictive process mining to the next level (2019)

Abstract: This work explores the possibility of integrating the concept of ensemble machine learning (EML) into predictive process mining (PPM). Researchers in the field of PPM seek to enhance the accuracy of predictions. Usually, new techniques and improved algorithms are developed. EML offers a new perspective of combining multiple (different) algorithms to produce a better result than single algorithms. To achieve our goal, we apply the design science research methodology to identify the research gap and develop artifacts that realize the integration of EML into PPM. Based on this implementation we will produce a comprehensive framework to provide the best fitting combination of EML concepts with PPM approaches with regards to typical event-log characteristics.

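
Illustrative note: the ensemble idea summarized in this abstract (combining several different algorithms to obtain a better result than any single one) can be sketched with a simple majority vote over next-activity predictors. The three base predictors and the toy trace below are hypothetical stand-ins chosen for illustration; the abstract does not name a specific combination scheme, so this is only one possible reading.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A predictor maps a running trace (a prefix of activities) to a predicted next activity.
Predictor = Callable[[Sequence[str]], str]


def majority_vote(predictors: List[Predictor], prefix: Sequence[str]) -> str:
    """Return the next activity predicted by most base predictors.

    Ties are broken in favour of the earliest predictor in the list.
    """
    votes = [p(prefix) for p in predictors]
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in order so the tie-break is deterministic.
    return next(v for v in votes if counts[v] == best)


# Hypothetical base predictors (assumptions for illustration only).
def last_activity_rule(prefix: Sequence[str]) -> str:
    return "approve" if prefix[-1] == "review" else "review"


def length_rule(prefix: Sequence[str]) -> str:
    return "approve" if len(prefix) >= 3 else "review"


def constant_rule(prefix: Sequence[str]) -> str:
    return "reject"


ensemble = [last_activity_rule, length_rule, constant_rule]
prediction = majority_vote(ensemble, ["submit", "check", "review"])
print(prediction)
```

In practice the base predictors would be trained models, and weighted voting or stacking could replace the plain majority vote.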

[36] Hybrid heuristics miner based on time series prediction for streaming process mining (2016)

Abstract: To cope with the time attribute and variations of event distribution in dynamically evolving processes, a streaming process mining approach based on time series prediction and a hybrid heuristic miner is proposed. A heuristic miner is improved based on the post-task of each activity in event logs to optimize the initial particle distribution for Particle Swarm Optimization. Furthermore, an aging factor based on the time series attribute is designed for adaptive global optimization. In addition, a time-related Process Decision Indicator (PDI) is defined as an observable pattern to identify domain-independent evolution indicators in the process model. The experimental results show that our algorithm is more effective and scalable for streaming process mining.

DOI: 10.1109/CSCWD.2016.7565997

[37] In silico approaches to predict drug-transporter interaction profiles: Data mining, model generation, and link to cholestasis (2019)

Abstract: Transport proteins play a crucial role in drug distribution, disposition, and clearance by mediating cellular drug influx and efflux. Inhibition of these transporters may lead to drug-drug interactions or even drug-induced liver injury, such as cholestasis, which constitutes a major challenge in the drug development process. Thus, computer-based (in silico) models that can predict the pharmacological and toxicological profiles of small molecules with respect to liver transporters may help in the early prioritization of compounds and hence may lower the high attrition rates. In this chapter, we provide a protocol for in silico prediction of cholestasis by generating validated predictive models. In addition to two-dimensional molecular descriptors, we include transporter inhibition predictions as descriptors and evaluate their influence on the performance of the cholestasis models.

DOI: 10.1007/978-1-4939-9420-5_26

[38] Influence of Historical Meteorological Data Processing in a Mobile Application of Weather Prediction, based on Data Mining (2017)

Abstract: In the city of Andahuaylas, Peru, an agricultural region, weather conditions, and especially temperature, are decisive for the success or failure of agricultural campaigns, while information on UV radiation conditions the protection of health and life in general. This information is shown by free internet services, but inaccurately, because it comes from a location remote from the city of Andahuaylas and does not provide detailed predictions for proper decision making. The present investigation therefore aimed to evaluate the influence of historical meteorological data processing on the efficiency of a mobile application for predicting temperature and UV radiation, which provides real, updated and predicted weather information for the city of Andahuaylas. For this purpose, the historical and current data of the meteorological station of the National University José María Arguedas of Andahuaylas were used; they were analyzed using data mining to obtain efficient prediction models, which were subsequently implemented in the mobile application. To measure the efficiency of the prediction, we compared the mean absolute errors of the models used by the National Service of Meteorology and Hydrology of Peru for the cities of Arequipa, Iquitos and Lima, with values of 1.24, 2.66 and 1.485 respectively, with the mean absolute error of the prediction model of the mobile application, which has a value of 1.18. This verifies the efficiency of the proposed model and is the expected result.

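
Illustrative note: the comparison in this abstract rests on the mean absolute error (MAE), the average of the absolute differences between observed and predicted values. The minimal Python sketch below shows the computation on toy temperatures (hypothetical values, not data from the study) and then ranks the four MAE figures quoted in the abstract. The function and variable names are assumptions made for this example.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy temperatures in degrees Celsius (hypothetical, not from the study).
observed = [18.0, 21.5, 19.0, 16.5]
forecast = [17.0, 22.0, 20.5, 16.5]
toy_mae = mean_absolute_error(observed, forecast)  # (1.0 + 0.5 + 1.5 + 0.0) / 4

# MAE values as reported in the abstract above.
reported_mae = {"Arequipa": 1.24, "Iquitos": 2.66, "Lima": 1.485, "Mobile app": 1.18}
best_model = min(reported_mae, key=reported_mae.get)
print(toy_mae, best_model)
```

A lower MAE means predictions sit closer to observations on average, which is why the abstract reads 1.18 as evidence in favour of the mobile application's model.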

[39] Large scale predictive process mining and analytics of university degree course data (2017)

Abstract: For students, in particular freshmen, the degree pathway from semester to semester is not that transparent, although students have a reasonable idea of what courses they are expected to take each semester. An often-pondered question by students is: “what can I expect in the next semester?” More precisely, given the commitment and engagement I presented in this particular course and the respective performance I achieved, can I expect a similar outcome in the next semester in the particular course I selected? Are the demands and expectations in this course much higher so that I need to adjust my commitment and engagement and overall workload if I expect a similar outcome? Is it better to drop a course to manage expectations rather than to (predictably) fail, and perhaps have to leave the degree altogether? Degree and course advisors and student support units find it challenging to provide evidence-based advice to students. This paper presents research into educational process mining and student data analytics in a whole-university-scale approach with the aim of providing insight into the degree pathway questions raised above. The beta version of our course-level degree pathway tool has been used to shed light, for university staff and students alike, on our university's 1,300 degrees and the associated 6 million course enrolments over the past 20 years.

DOI: 10.1145/3027385.3029446

[40] Long term temperature monitoring and thermal processes prediction within mining dumps (2011)

Abstract: The paper deals with the description and results of long-term monitoring of thermal processes within the old coal dumps in the Ostrava region. It presents methods and means of temperature measurement by specially installed temperature probes, including further wireless data transfer to a dispatching center. The paper also discusses the possibility of creating a predictive algorithm to predict changes of thermal processes inside the dumps, particularly increasing or decreasing temperature gradients and the direction of propagation. © 2011 IEEE.

DOI: 10.1109/IDAACS.2011.6072821

[41] Machine learning paradigms (2017)

Abstract: In this chapter, we discuss the machine learning paradigm of Support Vector Machines, which incorporates the principles of Empirical Risk Minimization and Structural Risk Minimization. Support Vector Machines constitute a state-of-the-art classifier which is used as a benchmark algorithm to evaluate the classification accuracy of Artificial Immune System-based machine learning algorithms. In this chapter, we also present a special class of Support Vector Machines that are especially designed for the problem of One-Class Classification, namely the One-Class Support Vector Machines.

DOI: 10.1007/978-3-319-47194-5_5

[42] Medical data mining for heart diseases and the future of sequential mining in medical field (2019)

Abstract: Data mining in general is the act of extracting interesting patterns and discovering non-trivial knowledge from a large amount of data. Medical data mining can be used to understand events that happened in the past, e.g. studying a patient's vital signs to understand his complications and discover why he has died, or to predict the future by analyzing the events that have happened. In this chapter we present an overview of studies that use data mining to predict heart failure and heart disease classes. We also focus on one of the trendiest data mining fields, namely Sequential Mining, which is a very promising paradigm. Because of its important results in many fields, this chapter also covers all its extensions, from Sequential Pattern Mining to Sequential Rule Mining and Sequence Prediction. Pattern Mining is the discovery of important and unexpected patterns or information and was introduced in 1990 with the well-known Apriori. Sequential Pattern Mining aims to extract and analyze frequent subsequences from sequences of events or items with time constraints. The importance of a sequence can be measured based on different factors such as its frequency of occurrence, its length and its profit. In 1995, Agrawal et al. introduced a new Apriori algorithm supporting time constraints, named AprioriAll. The algorithm studied transactions through time in order to extract frequent patterns from the sequences of products related to a customer. The time dimension is a very important factor in analyzing medical data, making it necessary to present a positioning of Sequential Mining in the medical domain.

DOI: 10.1007/978-3-319-94030-4_4
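
Illustrative note: the abstract describes sequential pattern mining as extracting frequent subsequences from sequences of events. The core measurement behind "frequent" is the support of a pattern, i.e. the fraction of sequences that contain it as an ordered (not necessarily contiguous) subsequence. The Python sketch below shows only this support-counting step on a hypothetical toy database; it is not AprioriAll, it ignores time constraints, and all event names are invented for the example.

```python
from typing import List, Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(database: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    hits = sum(contains_subsequence(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical patient event sequences (assumptions for illustration only).
patients = [
    ["hypertension", "chest_pain", "heart_failure"],
    ["chest_pain", "hypertension", "heart_failure"],
    ["hypertension", "heart_failure"],
    ["chest_pain", "arrhythmia"],
]

support_a = support(patients, ["hypertension", "heart_failure"])
support_b = support(patients, ["chest_pain", "heart_failure"])
print(support_a, support_b)
```

A full miner would generate candidate patterns level by level and keep only those whose support meets a chosen minimum threshold.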

[43] Mining Student Information System Records to Predict Students' Academic Performance (2020)

Abstract: Educational Data Mining (EDM) is an emerging field that is concerned with mining and exploring the useful patterns in educational data. The main objective of this study is to predict students' academic performance based on a new dataset extracted from a student information system. The dataset was extracted from a private university in the United Arab Emirates (UAE). It includes 34 attributes and 56,000 records related to students' information. The empirical results indicated that the Random Forest (RF) algorithm was the most appropriate data mining technique for predicting students' academic performance. The results also revealed that the most important attributes with a direct effect on students' academic performance belong to four main categories, namely student demographics, student previous performance information, course and instructor information, and student general information. The evidence from this study would assist higher educational institutions by allowing instructors and students to identify the weaknesses and factors affecting students' performance, and could act as an early warning system for predicting students' failures and low academic performance.

DOI: 10.1007/978-3-030-14118-9_23

[44] Mining educational data to predict students' academic performance (2015)

Abstract: Data mining is the process of extracting useful information from a huge amount of data. One of the most common applications of data mining is the use of different algorithms and tools to estimate future events based on previous experiences. In this context, many researchers have been using data mining techniques to support and solve challenges in higher education. There are many challenges facing this level of education, one of which is helping students to choose the right course to improve their success rate. An early prediction of students' grades may help to solve this problem and improve students' performance, selection of courses, success rate and retention. In this paper we use different classification techniques in order to build a performance prediction model, which is based on previous students' academic records. The model can be easily integrated into a recommender system that can help students in their course selection, based on their own grades and those of other, graduated students. Our model uses two of the most recognised decision tree classification algorithms: ID3 and J48. The advantages of such a system are presented along with a comparison in performance between the two algorithms.

DOI: 10.1007/978-3-319-21024-7_28

[45] Mining predictive process models out of low-level multidimensional logs (2014)

Abstract: Process mining techniques have been gaining attention, especially as concerns the discovery of predictive process models. Traditionally focused on workflows, they usually assume that process tasks are clearly specified and referred to in the logs. This, however, limits their application in many real-life BPM environments (e.g. issue tracking systems) where the traced events do not match any predefined task, yet keep lots of context data. In order to make the application of predictive process mining to such logs more effective and easier, we devise a new approach, combining the discovery of different execution scenarios with the automatic abstraction of log events. The approach has been integrated in a prototype system, supporting the discovery, evaluation and reuse of predictive process models. Tests on real-life data show that the approach achieves compelling prediction accuracy w.r.t. state-of-the-art methods, and finds interesting activities and process variant descriptions. © 2014 Springer International Publishing.

DOI: 10.1007/978-3-319-07881-6_36

[46] Next step recommendation and prediction based on process mining in adaptive case management (2015)

Abstract: Adaptive Case Management (ACM) is a new paradigm that facilitates the coordination of knowledge work through case handling. Current ACM systems, however, lack support for providing sophisticated user guidance in the form of next step recommendations and predictions about the future of a case. In recent years, process mining research has developed approaches to make recommendations and predictions based on event logs readily available in process-aware information systems. This paper builds upon those approaches and integrates them into an existing ACM solution. The research goal is to design and develop a prototype that gives next step recommendations and predictions based on process mining techniques in ACM systems. The proposed models recommend actions that shorten the case running time, mitigate deadline transgressions, support case goals and have been used in former cases with similar properties. They further give case predictions about the remaining time, possible deadline violations, and whether the current case path supports given case goals. A final evaluation shows that the prototype is indeed capable of making proper recommendations and predictions. In addition, starting points for further improvement are discussed.

DOI: 10.1145/2723839.2723842

[47] On the advantage of using dedicated data mining techniques to predict colorectal cancer (2015)

Abstract: Electronic Medical Records (EMRs) provide a wealth of data that can be used to generate predictive models for diseases. A fair number of studies have used EMRs to generate such models for specific diseases, but most of them are based on more traditional techniques used in the medical domain, such as logistic regression. This paper studies the benefit of using advanced data mining techniques for Colorectal Cancer (CRC). CRC is the second most common cancer in the EU and is known to be a disease with highly non-specific predictors, making it difficult to generate good predictive models. In addition, the EMR data itself has its own challenges, including its sparsity, the differences in how physicians code the data, the temporal nature of the data, and the imbalance in the data. Results show that state-of-the-art data mining techniques, including temporal data mining, are able to generate better predictive models than those currently available in the literature.

DOI: 10.1007/978-3-319-19551-3_16

[48] On the use of process mining and machine learning to support decision making in systems design (2016)

Abstract: Research on process mining and machine learning techniques has recently received a significant amount of attention from the product development and management communities. Indeed, these techniques allow automatic discovery of both processes and activities and are thus high value-added services that help reuse knowledge to support decision-making. This paper proposes a double-layer framework aiming to identify the most significant process patterns to be executed depending on the design context. Simultaneously, it proposes the most significant parameters for each activity of the considered process pattern. The framework is applied to a specific design example and is partially implemented.

DOI: 10.1007/978-3-319-54660-5_6

[49] Performance Analysis of Collaborative Data Mining vs Context Aware Data Mining in a Practical Scenario for Predicting Air Humidity (2019)

Abstract: Prediction in data mining is a difficult process but useful in various areas. The purpose of this article is to draw a parallel between the classical data mining process and two new approaches to data mining: collaborative data mining and context-aware data mining. Data gathered from seven meteorological stations in Transylvania served as the baseline for the research. Processes for predicting air humidity were designed and analyzed using the same machine learning algorithms and data. The results obtained show that collaborative and context-aware data mining approaches bring better results than the standalone approach, and highlight some of the algorithms that are more suitable for each approach. The combination of the two notions could be another example of a successful approach for future research.

DOI: 10.1007/978-3-030-31362-3_5

[50] Pervasive decision support to predict football corners and goals by means of data mining (2016)

Abstract: Football is nowadays considered one of the most popular sports. In the betting world, it has acquired an outstanding position, moving millions of euros during a single football match. The lack of profitability for football betting users has been stressed as a problem. This lack gave rise to this research proposal, which analyses whether there is a way to support users in increasing the profits on their bets. Data mining models were induced with the purpose of supporting gamblers in increasing their profits in the medium/long term. While we are conscious that the models can fail, the results achieved by four of the seven targets in the models are encouraging and suggest that the system can help to increase profits. All defined targets have two possible classes to predict, for example, whether there are more or less than 7.5 corners in a single game. The data mining models of the targets more or less than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which is a pervasive decision support system. This system was developed to be an interface for any user, both for an expert user and for a user who has no knowledge of football games.

DOI: 10.1007/978-3-319-31307-8_57

[51] Predicting learner performance using data-mining techniques and ontology (2017)

Abstract: The high rates of learner dropout and failure in the different courses that many universities and educational institutions offer through e-learning or online learning systems have been a serious concern. Analyzing and studying learners' learning data in order to predict their future performance can help both tutors and e-learning systems to determine learners' progress or status and spot those with low performance. They can thus offer learners personalized learning resources and activities tailored to each one in order to maximize their learning outcomes and overcome their learning gaps. This paper presents a methodology that uses semantic web technologies as well as data mining techniques to predict learners' future performance based on data produced by learners through their interaction with an LMS (Learning Management System) and social networks.

DOI: 10.1007/978-3-319-48308-5_63

[52] Predicting nosocomial infection by using data mining technologies (2015)

Abstract: The existence of nosocomial infection prevision systems in healthcare environments can contribute to improving the quality of the healthcare institution and also to reducing the costs of treating the patients who acquire these infections. The analysis of the information available makes it possible to prevent these infections efficiently and to build knowledge that can help to identify their eventual occurrence. This paper presents the results of the application of predictive models to real clinical data. Good models, induced by the Data Mining (DM) classification techniques Support Vector Machines and Naïve Bayes, were achieved (sensitivities higher than 91.90%). Therefore, these models, which are able to predict these infections, may allow their prevention and, consequently, a reduction in nosocomial infection incidence. They should act as a Clinical Decision Support System (CDSS) capable of reducing nosocomial infections and the associated costs, improving healthcare and increasing patients' safety and well-being.

DOI: 10.1007/978-3-319-16528-8_18
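
Illustrative note: this abstract reports model quality as sensitivity, the share of actual positive cases that the classifier flags. Its usual companion, specificity, is the share of actual negative cases that are correctly left unflagged. The Python sketch below computes both from binary labels; the label vectors are hypothetical toy data and the function name is an assumption made for this example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1          # infection present and predicted
        elif truth == 1:
            fn += 1          # infection present but missed
        elif guess == 0:
            tn += 1          # no infection, none predicted
        else:
            fp += 1          # no infection, false alarm
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity


# Hypothetical toy labels (not data from the study).
actual_labels    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(actual_labels, predicted_labels)
print(sens, spec)
```

For infection screening, high sensitivity is typically prioritized because a missed case tends to cost more than a false alarm.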

[53] Predicting preterm birth in maternity care by means of data mining (2015)

Abstract: Worldwide, around 9% of children are born with less than 37 weeks of labour, causing risk to the premature child, who is not prepared to develop a number of basic functions that begin soon after birth. In order to ensure that those risk pregnancies are properly monitored by obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment using data from 3376 patients (women) admitted to the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was developed, assisting physicians in the decision-making process regarding patients' observation. It was possible to obtain promising results, achieving sensitivity and specificity values of 96% and 98%.

DOI: 10.1007/978-3-319-23485-4_12

[54] Predicting triage waiting time in maternity emergency care by means of data mining (2016)

Abstract: Healthcare organizations often benefit from information technologies as well as embedded decision support systems, which improve the quality of services and help prevent complications and adverse events. In Centro Materno Infantil do Norte (CMIN), the maternal and perinatal care unit of Centro Hospitalar of Oporto (CHP), an intelligent pre-triage system is implemented, aiming to prioritize patients in need of gynaecology and obstetrics care in two classes: urgent and consultation. The system is designed to avoid emergency problems such as incorrect triage outcomes and extensive triage waiting times. The current study intends to improve the triage system, and therefore optimize the patient workflow through the emergency room, by predicting the triage waiting time between the patient's triage and their medical admission. For this purpose, data mining (DM) techniques are applied to selected information provided by the information technologies implemented in CMIN. The DM models achieved accuracy values of approximately 94% with a five-range target distribution, which not only allows confident prediction models to be obtained but also identifies the variables that act as direct inducers of the triage waiting times.

DOI: 10.1007/978-3-319-31307-8_60

[55] Predicting user interaction in enterprise social systems using process mining (2019)

Abstract: In this paper, we explore the potential of probabilistic process mining for Enterprise Collaboration Systems (ECS) to predict the behavior of system users. Towards this objective, we discuss the applicability, limitations and challenges of probabilistic process mining in the context of ECS. We argue that probabilistic process mining can be a valuable method for researchers as well as practitioners (managers of collaboration platforms). We create and examine two process models using a probabilistic finite automata algorithm on event data of an enterprise collaboration system to show the feasibility of probabilistic process mining in ECS. Our research illustrates the most probable sequence of user activities in the selected system and demonstrates a way to predict the communities that a user will likely be active in.

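
Illustrative note: the abstract refers to a probabilistic finite automata algorithm for finding the most probable sequence of user activities. As a much simpler stand-in, the Python sketch below estimates first-order transition probabilities from event traces and returns the most probable next activity. This is a plain Markov-chain approximation, not the algorithm used in the paper, and the activity names and traces are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

TransitionModel = Dict[str, Dict[str, float]]


def fit_transitions(traces: List[Sequence[str]]) -> TransitionModel:
    """Estimate P(next activity | current activity) from observed traces."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        # Count each pair of consecutive activities.
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def most_probable_next(model: TransitionModel, activity: str) -> Optional[str]:
    """Return the likeliest follow-up activity, or None if never observed."""
    options = model.get(activity)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical user sessions in a collaboration platform.
event_log = [
    ["login", "read_post", "comment"],
    ["login", "read_post", "like"],
    ["login", "create_post"],
    ["login", "read_post", "comment"],
]

model = fit_transitions(event_log)
next_after_login = most_probable_next(model, "login")
print(next_after_login, model["read_post"])
```

A probabilistic finite automaton generalizes this idea by letting states capture more history than the single most recent activity.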

[56] Prediction modeling for ingot manufacturing process utilizing data mining roadmap including dynamic polynomial neural network and bootstrap method (2005)

Abstract: The purpose of this study was to develop a process management system to manage ingot fabrication and the quality of the ingot. The ingot is the first manufactured material of wafers. Trace parameters were collected on-line but measurement parameters were measured by sampling inspection. The quality parameters were applied to evaluate the quality. Therefore, preprocessing was necessary to extract useful information from the quality data. First, statistical methods were used for data generation, and then modeling was performed, using the generated data, to improve the performance of the models. The function of the models is to predict the quality corresponding to control parameters. © Springer-Verlag Berlin Heidelberg 2005.

DOI: 10.1007/11539117_81

[57] Prediction of customer movements in large tourism industries by the means of process mining (2018)

Abstract: Customer movements in large tourism industries (such as public transport systems, attraction parks or ski resorts) can be understood as business processes. Their processes describe the flow of persons through the networked systems, while Information Systems log the different steps. The prediction of how large numbers of customers will behave in the near future is a complex and yet unsolved challenge. However, the possible business benefits of predictive analytics in the tourism industry are manifold. We propose to approach this task with the yet unexploited application of predictive process mining. In a prototypical use case, we work together with two major European ski resorts. We implement a predictive process mining algorithm towards the goal of predicting near future lift arrivals of skiers within the ski resort in real-time. Furthermore, we present the results of our prototypical implementation and draw conclusions for future research in the area.

下载地址 | 返回目录 | []

[58] Proceedings of 2nd International Conference on Communication, Computing and Networking (2019)

Abstract: “The proceedings contain 268 papers. The topics discussed include: learning and making sense of project phenomena in information systems education; a comparative study on structure of the motivation for information security by security incident experiences; a study of correlation between transitions and sound effects in a fairy tale movie; a survey on HCI considerations in the software development life cycle: from practitioners' perspective; database design for global patient monitoring applications using WAP; determinism in speech pitch relation to emotion; mobile technology for irrigation problems in rural India; proportional fairness of call blocking probability; relationship of blink, affect, and usability of graph reading tasks; star economy in the user generated content: a new perspective for digital ecosystems; and the deployment of PDA accessible clinical-log for medical education in PBL-approach.”

摘要: “会议记录包含268篇论文。讨论的主题包括:在信息系统教育中学习和理解项目现象;通过安全事件经验对信息安全动机结构进行比较研究;对过渡与声音效果之间的相关性进行研究软件开发生命周期中的HCI注意事项调查:从从业者的角度;使用WAP进行全球患者监测应用程序的数据库设计;语音音调与情感的确定性;印度农村灌溉问题的移动技术;比例呼叫阻塞概率的公平性;图表读取任务的眨眼,影响和可用性之间的关系;用户生成的内容中的明星经济:数字生态系统的新视角;以及在PBL方法中为医学教育部署PDA可访问的临床日志。”

下载地址 | 返回目录 | [10.1007/978-981-13-1217-5]

[59] Proceedings of the International Symposium for Production Research 2018 (2019)

Abstract: “With rapid advancements in industry, technology and applications, many concepts have emerged in the manufacturing environment. It is generally known that the term ‘Industry 4.0’ was introduced to highlight a new industrial revolution. Maintenance is a key operational function since it is related to all manufacturing processes and focuses not only on avoiding equipment breakdown but also on improving business performance. Monitoring of manufacturing systems for maintenance helps to identify equipment condition and failures before equipment breakdowns. Total Productive Maintenance (TPM) is a widely used maintenance strategy for gaining a competitive advantage in the industry. In this context, this paper focuses on the incompletely perceived link between Industry 4.0’s key technologies and TPM. Conjoint analysis, which is a multi-attribute decision-making method based on experimental design, is implemented to quantify the impacts of Industry 4.0’s key technologies on the pillars of TPM. Moreover, the interaction between key technologies of Industry 4.0 and pillars of TPM demonstrates several opportunities for achieving synergies, thus leading to a successful implementation of future interconnected smart factories. The major contribution of this paper is to provide a guideline and technology roadmap for investment decisions for industries that are in the transformation phase towards the future smart factory, and to offer a space for further scientific discussion.”

摘要: “随着工业,技术和应用的快速发展,制造环境中出现了许多概念。众所周知,“工业4.0”一词的提出是为了突出一场新的工业革命。维护是一项关键的运营职能,因为它与所有制造过程有关,不仅着眼于避免设备故障,而且着眼于提高业务绩效。监控制造系统以进行维护有助于在设备故障之前识别设备状况和故障。全面生产维护(TPM)是一种广泛使用的维护策略,用于在行业中获得竞争优势。在此背景下,本文着重探讨工业4.0关键技术与TPM之间尚未被充分认识的联系。联合分析是一种基于实验设计的多属性决策方法,本文用它来量化工业4.0关键技术对TPM各支柱的影响。此外,工业4.0的关键技术与TPM支柱之间的交互展示了实现协同增效的多种机会,从而有助于成功实现未来互连的智能工厂。本文的主要贡献在于,为处于向未来智能工厂转型阶段的行业提供投资决策的指南和技术路线图,并为进一步的科学讨论提供了空间。”

下载地址 | 返回目录 | [10.1007/978-3-319-92267-6]

[60] Process inspection by attributes using predicted data (2015)

Abstract: “SPC procedures for process inspection by attributes are usually designed under the assumption of directly observed quality data. However, in many practical cases these direct observations are very costly or even hardly possible. For example, in the production of pharmaceuticals, costly and time-consuming chemical analyses are required to determine product quality even in the simplest case, when we have to decide whether the inspected product conforms to quality requirements. The situation is even more difficult when quality tests are destructive and long-lasting, as is the case in reliability testing. In such situations we try to predict the actual quality of inspected items using the values of predictors that are easily measured. In the paper we consider a situation in which traditional prediction models based on the assumption of the multivariate normal distribution cannot be applied. Instead, for prediction purposes we propose to use some techniques known from data mining. In this study, which has an introductory character and is mainly based on the results of computer simulations, we show how the usage of popular data mining techniques, such as binary regression, linear discrimination analysis, and decision trees, may influence the results of process inspection.”

摘要: “按属性进行过程检查的SPC程序通常是在直接观察到的质量数据的假设下设计的。但是,在许多实际情况下,这种直接观察非常昂贵,甚至几乎不可能。例如,在药品生产中,即使在最简单的情况下(即我们必须决定被检查的产品是否符合质量要求),也需要进行成本高昂且耗时的化学分析来确定产品的质量。当质量测试具有破坏性且耗时很长时(例如可靠性测试),情况就更加困难了。在这种情况下,我们尝试使用易于测量的预测变量的值来预测被检项目的实际质量。在本文中,我们考虑了无法应用基于多元正态分布假设的传统预测模型的情况。相反,出于预测目的,我们建议使用数据挖掘中已知的一些技术。本研究具有介绍性,并且主要基于计算机模拟的结果,我们展示了使用流行的数据挖掘技术(例如二元回归,线性判别分析和决策树)可能如何影响过程检查的结果。”

下载地址 | 返回目录 | [10.1007/978-3-319-18781-5_7]

[61] Prädiktives monitoring von geschäftsprozessen zur beherrschung von risiken in weltweiten netzen auf basis von process mining und simulation (2019)

Abstract: Das Monitoring betrieblicher Prozesse ist eine Routineaufgabe, bei der eine Fülle von Daten über die Prozessausführung anfallen. Diese Daten können dazu genutzt werden, Abläufe von zukünftigen, noch nicht beobachtbaren Prozessdurchläufen vorherzusagen. Auf diese Weise können eine Vielzahl verschiedener Prozessparameter wie Durchlaufzeiten, nächste Prozessschritte oder Risikofaktoren vorhergesagt werden. Diese Methoden beruhen darauf, dass eine Fülle von Daten vorliegen. Mithilfe von Simulationsansätzen können Daten künstlich erzeugt werden, wobei zwei Zwecke verfolgt werden: Erstens werden auf diese Weise Trainingsdaten für maschinelles Lernen verfügbar. Zweitens können auf diese Weise Process Mining und Simulationsansätze verknüpft werden, um tiefere Einsichten in reale und zukünftige Abläufe zu gewinnen. Diese Ideen werden in dem Beitrag weiter entfaltet und anhand eines Fallbeispiels näher konkretisiert. Der Beitrag schließt mit einem Überblick über aktuelle Probleme wie der Umgang mit fehlerhaften Daten oder eine mangelnde Erklärbarkeit der Prognoseansätze.

摘要: Monitoring operational processes is a routine task that generates a wealth of data about process execution. These data can be used to predict the course of future process runs that cannot yet be observed. In this way, a wide range of process parameters, such as throughput times, next process steps, or risk factors, can be predicted. These methods depend on a wealth of data being available. Simulation approaches can be used to generate data artificially, which serves two purposes: first, training data for machine learning become available; second, process mining and simulation approaches can be combined to gain deeper insights into actual and future process flows. The contribution develops these ideas further and makes them concrete by means of a case example. It closes with an overview of current problems, such as the handling of erroneous data and the insufficient explainability of the prediction approaches.

下载地址 | 返回目录 | []

[62] Queue mining - Predicting delays in service processes (2014)

Abstract: Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Recently, work on process mining showed how management of these processes, and engineering of supporting systems, can be guided by models extracted from the event logs that are recorded during process operation. In this work, we establish a queueing perspective in operational process mining. We propose to consider queues as first-class citizens and use queueing theory as a basis for queue mining techniques. To demonstrate the value of queue mining, we revisit the specific operational problem of online delay prediction: using event data, we show that queue mining yields accurate online predictions of case delay. © 2014 Springer International Publishing.

摘要: 信息系统已被广泛采用,以支持电信,金融和卫生部门等各个领域的服务流程。最近,有关流程挖掘的工作表明,如何通过从在流程操作期间记录的事件日志中提取的模型来指导这些流程的管理以及支持系统的工程设计。在这项工作中,我们建立了操作流程挖掘中的排队视角。我们建议将队列视为一等公民,并使用排队论作为队列挖掘技术的基础。为了演示队列挖掘的价值,我们重新审视了在线延迟预测的特定操作问题:使用事件数据,我们证明了队列挖掘可产生准确的案例延迟在线预测。 © 2014 Springer International Publishing。

下载地址 | 返回目录 | [10.1007/978-3-319-07881-6_4]

[63] Real-time decision support using data mining to predict blood pressure critical events in intensive medicine patients (2015)

Abstract: Patient blood pressure is an important vital sign that helps physicians make decisions and better understand the patient's condition. In Intensive Care Units, blood pressure can be monitored continuously through bedside monitors and sensors. Intensivists only have access to vital sign values when they look at the monitor or consult the values collected hourly. Most important is the sequence of the values collected, i.e., a set of highest or lowest values can signify a critical event and bring future complications to a patient, such as hypotension or hypertension. These complications can lead to a set of dangerous diseases and side effects. The main goal of this work is to predict the probability that a patient will have a blood pressure critical event in the next hours by combining a set of patient data collected in real time and using Data Mining classification techniques. As output, the models indicate the probability (%) that a patient will have a Blood Pressure Critical Event in the next hour. The achieved results proved very promising, with sensitivity of around 95%.

摘要: 患者血压是医生做出决定并更好地了解患者状况的重要生命信号。在重症监护病房中,由于患者可以通过床旁监护仪和使用传感器进行连续监护,因此可以监控血压。强化医生仅在看监护仪或每小时查询一次所收集的生命值时,才可以获取生命体征值。最重要的是所收集的值的顺序,即一组最高或最低值可以表示严重事件,并给患者带来未来并发症,例如低血压或高血压。这种并发症可以利用一系列危险疾病和副作用。这项工作的主要目标是通过组合实时收集的一组患者数据并使用数据挖掘分类技术,预测下几个小时患者发生血压严重事件的可能性。作为输出,模型表明患者在下一小时发生血压严重事件的概率(%)。所获得的结果显示出非常有希望的结果,其灵敏度约为95%。

下载地址 | 返回目录 | [10.1007/978-3-319-26508-7_8]

[64] Redesigning Learning for Greater Social Impact (2018)

Abstract: This study purports to determine the mediating role of SE participation on four variables, namely goal orientation, study habits, connectedness, and academic performance, involving 267 elementary and high school student respondents; simultaneous relationships between and among these variables were examined through structural equation modeling. Results revealed that the higher the student's desire to improve his study habits and the less he is connected to his family and friends, the more likely he will participate. However, the more he is connected to his family and friends and the higher his goal orientation, the more likely his academic performance will improve. Academic achievement in this study is not supported by participation in shadow education or private tutorials, but a more connected environment and a more positive goal orientation will lead to improvement in academic performance.

摘要: 本研究旨在确定SE参与对四个变量的中介作用,即目标取向,学习习惯,连接性和学术表现,涉及267名中小学生,并且考察了这些变量之间的同时关系。通过结构方程建模。结果显示,学生对改善学习习惯的渴望越高,与家人和朋友的联系越少,他参与的可能性就越大。但是,他与家人和朋友之间的联系越紧密,目标取向越高,他的学业成绩就越有可能提高。这项研究的学术成就不受参与影子教育或私人辅导的支持,而是环境之间的联系更加紧密而更积极的目标定位将导致学习成绩的提高。

下载地址 | 返回目录 | [10.1007/978-981-10-4223-2]

[65] Review of Machine Learning and Data Mining Methods to Predict Different Cyberattacks (2021)

Abstract: Cybersecurity deals with various types of cybercrimes, but it is essential to identify the similarities in existing cybercrimes using data mining and machine learning technologies. This review paper reveals various data mining algorithms and machine learning algorithms, which can help to create specific schemas of different cyberattacks. Machine learning algorithms can be helpful in training a system to identify anomalies and specific patterns in order to predict cyberattacks. Data mining plays a critical role in providing a predictive solution to rectify possible cybercrime and modus operandi and to explore defense systems against them. This is the era of big data, so it is very difficult to analyze and investigate irregular activity in cyberspace. Data mining methods allow the system to analyze hidden knowledge and to train an expert system to support alerting and decision-making processes. This review paper explores various data mining methods such as classification, association, and clustering, while machine learning includes different methods such as supervised, semi-supervised, and unsupervised learning.

摘要: 网络安全处理各种类型的网络犯罪,但是使用数据挖掘和机器学习技术来识别现有网络犯罪中的相似性至关重要。这篇综述文章揭示了各种数据挖掘算法和机器学习算法,它们可以帮助创建不同网络攻击的一些特定模式。机器学习算法有助于训练系统识别异常的特定模式以预测网络攻击。数据挖掘在提供预测性解决方案以纠正可能的网络犯罪和作案手法并探索针对它们的防御系统方面发挥着关键作用。这是大数据时代,因此很难分析和调查网络空间中的不规则活动。数据挖掘方法,同时允许系统分析隐藏的知识并训练专家系统以预警和决策过程。本文回顾了各种数据挖掘方法,例如分类,关联和聚类,而机器学习包括监督,半监督和无监督学习方法等不同方法。

下载地址 | 返回目录 | [10.1007/978-981-15-4474-3_5]

[66] “Rough Set Data Mining Algorithms and Pursuit Eye Movement Measurements Help to Predict Symptom Development in Parkinson's Disease” (2018)

Abstract: “This article presents research on pursuit eye movement tests conducted on patients with Parkinson's disease in various stages of the disease and phases of treatment. The aim of the described experiment was to develop algorithms allowing for measurement of the parameters of pursuit eye movement in order to reference calculated results to the previously collected neurological data of patients. An additional objective of the experiment was to develop an example of a data-mining procedure allowing for classification of neurological symptoms based on oculometric measurements. Definition of such a correlation enables assignment of a particular patient to a given neurological group on the basis of the parameter values of pursuit eye movements. By using the created decision table, we have achieved good results in predicting the neurological parameter UPDRS, with total accuracy of 93.3% and total coverage of 60%. This allows for better evaluation of the stage of the disease and its progression. It also might provide an additional tool for determining the efficacy of different disease treatments.”

摘要: “本文介绍了针对帕金森氏病患者在疾病的各个阶段和治疗阶段进行的追踪眼球运动测试的研究。所描述实验的目的是开发一种算法,该算法可以测量追踪眼球运动的参数,以便将计算结果与先前收集的患者神经系统数据相对照。该实验的另一个目标是开发一个数据挖掘程序的示例,允许根据眼动测量结果对神经系统症状进行分类。这种相关性的定义使得可以基于追踪眼动的参数值将特定患者分配给给定的神经病学组。利用建立的决策表,对神经病学参数UPDRS的预测取得了良好的效果,总准确度为93.3%,总覆盖率为60%。这样可以更好地评估疾病的阶段及其进展,也可能提供其他工具来确定不同疾病治疗的功效。”

下载地址 | 返回目录 | [10.1007/978-3-319-75420-8_41]

[67] Rule mining techniques to predict prokaryotic metabolic pathways (2017)

Abstract: It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path: Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved and thus provide information that lets us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in the UniProtKB/TrEMBL database.

摘要: 越来越明显的是,需要使用计算方法来识别和定位新基因组中的途径。我们介绍了一种自动注释系统(ARBA4Path关联基于规则的路径注释器),该系统利用规则挖掘技术来预测各种原核生物的代谢途径。事实证明,蛋白质域的特定组合(记录在我们的规则中)强烈决定了蛋白质所涉及的途径,因此提供的信息使我们可以非常准确地将途径成员(精确度为0.999,召回率为0.966)分配给给定蛋白质原核生物分类群。我们的系统可用于增强自动生成的注释以及对功能未知的蛋白质进行注释的质量。预测模型以人类可读规则的形式表示,可以有效地用于在UniProtKB / TrEMBL数据库中为许多蛋白质添加缺少的途径信息。

下载地址 | 返回目录 | [10.1007/978-1-4939-7027-8_12]

[68] Smart Sustainable Cities of the Future (2018)

Abstract: The Internet of Things (IoT) is one of the key components of the ICT infrastructure of smart sustainable cities as an emerging urban development approach due to its great potential to advance environmental sustainability. As one of the prevalent ICT visions or computing paradigms, the IoT is associated with big data analytics, which is clearly on a penetrative path across many urban domains for optimizing energy efficiency and mitigating environmental effects. This pertains mainly to the effective utilization of natural resources, the intelligent management of infrastructures and facilities, and the enhanced delivery of services in support of the environment. As such, the IoT and related big data applications can play a key role in catalyzing and improving the process of environmentally sustainable development. However, topical studies tend to deal largely with the IoT and related big data applications in connection with economic growth and the quality of life in the realm of smart cities, and largely ignore their role in improving environmental sustainability in the context of smart sustainable cities of the future. In addition, several advanced technologies are being used in smart cities without making any contribution to environmental sustainability, and the strategies through which sustainable cities can be achieved fall short in considering advanced technologies. Therefore, the aim of this paper is to review and synthesize the relevant literature with the objective of identifying and discussing the state-of-the-art sensor-based big data applications enabled by the IoT for environmental sustainability and related data processing platforms and computing models in the context of smart sustainable cities of the future. Also, this paper identifies the key challenges pertaining to the IoT and big data analytics, as well as discusses some of the associated open issues. 
Furthermore, it explores the opportunity of augmenting the informational landscape of smart sustainable cities with big data applications to achieve the required level of environmental sustainability. In doing so, it proposes a framework which brings together a large number of previous studies on smart cities and sustainable cities, including research directed at a more conceptual, analytical, and overarching level, as well as research on specific technologies and their novel applications. The goal of this study suits a mix of two research approaches: topical literature review and thematic analysis. In terms of originality, no study has been conducted on the IoT and related big data applications in the context of smart sustainable cities, and this paper provides a basis for urban researchers to draw on this analytical framework in future research. The proposed framework, which can be replicated, tested, and evaluated in empirical research, will add additional depth to studies in the field of smart sustainable cities. This paper serves to inform urban planners, scholars, ICT experts, and other city stakeholders about the environmental benefits that can be gained from implementing smart sustainable city initiatives and projects on the basis of the IoT and related big data applications.

摘要: 物联网(IoT)是智能可持续城市ICT基础设施的关键组成部分之一,因为它具有促进环境可持续性的巨大潜力,因此它是新兴的城市发展方法。 IoT作为ICT远景或计算范例之一,与大数据分析相关联,大数据分析显然在许多城市领域都处于渗透性道路上,以优化能源效率并减轻环境影响。这主要涉及自然资源的有效利用,基础设施和设施的智能管理以及为环境提供支持的服务的增强。因此,IoT和相关的大数据应用程序可以在催化和改善环境可持续发展的过程中发挥关键作用。但是,主题研究往往主要涉及与经济增长和智慧城市领域的生活质量相关的物联网和相关大数据应用,而在很大程度上忽略了它们在改善可持续智慧城市环境中对环境可持续性的作用。未来。此外,在智慧城市中使用了几种先进技术,而对环境的可持续性没有任何贡献,而在考虑先进技术时,实现可持续城市的战略还不够。因此,本文的目的是回顾和综合相关文献,以识别和讨论物联网为环境可持续性及相关数据处理平台和计算提供的基于传感器的最新技术未来智能可持续发展城市的环境模型。此外,本文还指出了与物联网和大数据分析有关的关键挑战,并讨论了一些相关的公开问题。此外,它还探索了利用大数据应用程序扩展智能可持续发展城市信息环境的机会,以实现所需水平的环境可持续性。为此,它提出了一个框架,该框架汇集了先前有关智慧城市和可持续城市的大量研究,包括针对概念,分析和总体水平更高的研究,以及针对特定技术及其新颖应用的研究。本研究的目标适合两种研究方法的结合:专题文献综述和主题分析。在创意方面,尚未进行关于智能可持续城市中的物联网和相关大数据应用的研究,并且本文为城市研究人员在未来研究中借鉴此分析框架提供了基础。可以在经验研究中进行复制,测试和评估的拟议框架将为智能可持续城市领域的研究增加更多深度。本文旨在向城市规划人员,学者,ICT专家和其他城市利益相关者介绍基于物联网和相关大数据应用的智能可持续城市举措和项目所带来的环境效益。

下载地址 | 返回目录 | [10.1007/978-3-319-73981-6]

[69] “Supervised Educational Data Mining to Discover Students' Learning Process to Improve Students' Performance” (2018)

Abstract: “Online learning has a social impact on students. Some students might experience impoverished learning due to high social isolation, less face-to-face interaction, difficulty in performing teamwork activities, low understanding, and lack of concentration on learning activities. However, some students can thrive on online learning. Knowledge discovery from students' event logs generated from their online learning can help to understand their level of understanding of and concentration on the learning subject. Subsequently, this knowledge can be used to predict students' academic performance in terms of their final grade. This paper presents a supervised educational data mining approach, namely a J48 decision tree, to classify students based on their learning process and activities extracted from event logs in an online learning system in order to predict their final grade. The generated model can be used to offer helpful recommendations to improve students' learning process and academic performance, especially for students who experience impoverished learning, and it also allows instructors to provide appropriate feedback and advice to the students in a timely manner.”

摘要: “在线学习对学生具有社会影响。由于高度的社会孤立性,较少的面对面互动,难以开展团队合作活动,了解不足以及对学习活动的专注度,一些学生可能会经历学习困难。学生可以在网上学习中蓬勃发展,通过从他们的在线学习中获得的事件日志中发现知识,可以帮助理解他们对学习主题的理解和专注程度,随后,这些知识可以用来预测学生最终的学习成绩本文提出了有监督的教育数据挖掘方法,即J48决策树方法,根据学生的学习过程和从在线学习系统中事件日志中提取的活动来预测学生的最终成绩,对学生进行分类,从而为学生提供最终的成绩建议。改善学生的学习过程和学习成绩,特别是经历贫困学习的学生,这也使教员能够及时向学生提供适当的反馈和建议。”

下载地址 | 返回目录 | [10.1007/978-981-10-4223-2_23]

[70] Text mining models to predict brain deaths using X-Rays clinical notes (2017)

Abstract: The prediction of events is a task associated with the Data Science area. In healthcare, it is extremely useful for predicting critical events that may occur in people or in a specific area. Text Mining is a technique that consists of retrieving information from text files. In the medical field, Data Mining and Text Mining solutions can help to prevent the occurrence of certain events in a patient. This project involves the use of Text Mining to predict brain death from X-ray clinical notes, creating reliable predictive models from non-structured text. The project was developed using real data provided by Centro Hospitalar do Porto. The results achieved are very good, reaching a sensitivity of 98% and a specificity of 88%.

摘要: 事件的预测是与数据科学领域相关的任务。在健康方面,此方法对于预测可能在人或特定区域中发生的关键事件非常有用。文本挖掘是一种从文本文件中检索信息的技术。在医疗领域,数据挖掘和文本挖掘解决方案可以帮助防止患者发生某些事件。该项目涉及使用文本挖掘通过使用X射线临床注释来预测脑死亡。该项目正在使用非结构化文本创建可靠的预测模型。该项目是使用波尔图中心医院提供的真实数据开发的。达到的结果非常好,灵敏度达到98%,特异性达到88%。

下载地址 | 返回目录 | [10.1007/978-3-319-58130-9_15]
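
Note on entry [70]: the abstract reports sensitivity and specificity. As a reminder of how these two figures are usually computed from binary predictions, here is a minimal sketch. It is not the authors' code; the function name and the toy labels are illustrative assumptions, and 1 is assumed to mean "event occurred".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = event."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # event correctly predicted
        elif truth == 1:
            fn += 1          # event missed
        elif pred == 0:
            tn += 1          # non-event correctly predicted
        else:
            fp += 1          # false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of true events that the model flags, and specificity is the share of non-events that it correctly leaves unflagged, so the two are read together when judging a clinical alarm model.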

[71] Traffic prediction in wireless mesh networks using process mining algorithms (2018)

Abstract: Prediction of the traffic flow in particular systems will expedite the discovery of an optimal path for packet transmission in dynamic wireless networks. The main goal is to predict traffic overload while the network topology is changing. Machine learning techniques and process mining enable prediction of the traffic produced by several moving nodes. Several related approaches are reviewed, and the idea of a process mining approach is proposed.

摘要: 在特定系统中对业务流的预测将加快发现动态无线网络中数据包传输的最佳路径的速度。主要目标是在更改网络拓扑时预测流量过载。机器学习技术和过程挖掘可以预测由多个移动节点产生的流量。观察到几种相关方法。提出了过程挖掘方法的思想。

下载地址 | 返回目录 | [10.23919/FRUCT.2012.8253111]

[72] A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data (2016)

Abstract: Coronary artery disease (CAD) is caused by atherosclerosis in coronary arteries and results in cardiac arrest and heart attack. For diagnosis of CAD, angiography is used, which is a costly, time-consuming and highly technical invasive method. Researchers are, therefore, prompted to seek alternative methods such as machine learning algorithms that could use noninvasive clinical data for diagnosing the disease and assessing its severity. In this study, we present a novel hybrid method for CAD diagnosis, including risk factor identification using correlation based feature subset (CFS) selection with particle swarm optimization (PSO) search method and K-means clustering algorithms. Supervised learning algorithms such as multi-layer perceptron (MLP), multinomial logistic regression (MLR), fuzzy unordered rule induction algorithm (FURIA) and C4.5 are then used to model CAD cases. We tested this approach on clinical data consisting of 26 features and 335 instances collected at the Department of Cardiology, Indira Gandhi Medical College, Shimla, India. MLR achieves the highest prediction accuracy of 88.4%. We tested this approach on the benchmark Cleveland heart disease data as well. In this case also, MLR outperforms other techniques. The proposed hybridized model improves the accuracy of classification algorithms from 8.3% to 11.4% for the Cleveland data. The proposed method is, therefore, a promising tool for identification of CAD patients with improved prediction accuracy.

摘要: 冠状动脉疾病(CAD)由冠状动脉的动脉粥样硬化引起,并导致心脏骤停和心脏病发作。为了诊断CAD,使用了血管造影术,这是一种昂贵的耗时且技术含量高的侵入性方法。因此,研究人员被提示寻找其他方法,例如机器学习算法,该方法可以使用非侵入性临床数据进行疾病诊断和评估其严重性。在这项研究中,我们提出了一种用于CAD诊断的新型混合方法,包括使用基于相关特征的子集(CFS)选择和粒子群优化(PSO)搜索方法以及K-means聚类算法的风险因素识别。然后使用监督学习算法(例如多层感知器(MLP),多项式逻辑回归(MLR),模糊无序规则归纳算法(FURIA)和C4.5)对CAD案例进行建模。我们在印度西姆拉的英迪拉·甘地医学院心内科收集的由26个特征和335个实例组成的临床数据上测试了该方法。 MLR达到了88.4%的最高预测准确性。我们还在基准Cleveland心脏病数据上测试了该方法。在这种情况下,MLR也优于其他技术。提出的混合模型将Cleveland数据的分类算法的准确性从8.3%提高到了11.4%。因此,所提出的方法是用于识别具有改善的预测准确性的CAD患者的有前途的工具。

下载地址 | 返回目录 | [10.1007/s10916-016-0536-z]

[73] A New Method of Predicting the Height of the Fractured Water-Conducting Zone Due to High-Intensity Longwall Coal Mining in China (2019)

Abstract: “Violent movement of the roof rock and severe damage to the overlying strata occur in the large mined-out space left by rapidly advancing, high-intensity longwall coal mine extraction, as the goaf forms. Knowing the height of the fractured water-conducting zone (FWCZ) above the goaf is vital in the safety analysis of coal mining, particularly under a water body. The processes of overburden failure transfer (OFT) were analyzed for such high-intensity mining, divided into two stages: transmission development, and transmission termination. Rock failure criteria were used in theoretical calculations of the maximum lengths of ‘suspended’ (i.e., unsupported) rock strata, and of the maximum ‘overhang’ (i.e., cantilever) length of each stratum. Based on this, mechanical models of the unsupported strata and the overhanging strata were established. A new theoretical method of predicting the height of the FWCZ in this form of coal mining is put forward, based on OFT processes. A high-intensity mining panel (the 8100 longwall face at the Tongxin Coal Mine, Datong Coal Mining Group) was taken as an example. The proposed theoretical method, a numerical simulation method and an engineering analogy method were used to predict the height of the FWCZ. Comparison with in situ measurements at the Tongxin mine showed that the theoretical and numerical simulation results were in close agreement with measured data, verifying the rationality of the proposed approach.”

摘要: “随着采空区的形成,在快速推进的高强度长壁煤矿开采留下的大量采空区中,顶板岩石剧烈运动,并对上覆地层造成严重破坏。知道裂缝导水的高度采空区上方的采空区(FWCZ)对于煤矿开采的安全分析至关重要,特别是在水体下方,对这种高强度开采的覆岩破坏传递(OFT)过程进行了分析,分为两个阶段:传输发展和在计算“悬垂”(即无支撑)岩层的最大长度以及各层的最大“悬垂”(即悬臂)长度的理论计算中,使用了岩石破坏准则。建立了无支撑地层和悬挑地层的特征,提出了一种基于OFT过程预测这种形式的煤矿井下FWCZ高度的新理论方法。以顶板(大同煤矿同心煤矿的8100长壁工作面)为例。提出的理论方法,数值模拟方法和工程类比方法被用于预测FWCZ的高度。与同心矿场的实地测量结果进行比较,结果表明,理论和数值模拟结果与实测数据吻合良好,证明了该方法的合理性。”

下载地址 | 返回目录 | [10.1007/s00603-018-1567-1]

[74] A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures (2016)

Abstract: A real-life reliability system is proposed by fusing the field warranty failure data with the failure modes extracted from unstructured repair verbatim data by using the ontology-based natural language processing technique to facilitate accurate estimation of component reliability. Traditionally, the reliability estimation process uses the warranty data, but it provides limited support to handle the “failure confounding” problem, whereby different failure modes associated with a component failure are confounded into a single failure mode. The resulting reliability estimation lacks the required level of precision. Because our model takes into account textual failure modes associated with component failures, it enhances the overall reliability estimation. The performance of our system is evaluated with the baseline system for predicting absolute errors by using the real-life data from the automotive domain, e.g., headlamp failure, collected at different miles exposures. In the best case, the absolute errors predicted by our model showed an improvement of 97% with respect to the baseline model (without considering the failure modes), while in the worst case, it was 71%.

摘要: 通过使用基于本体的自然语言处理技术,将现场保修故障数据与从非结构化维修逐字记录数据中提取的故障模式融合在一起,提出了一种现实生活中的可靠性系统,从而有助于准确地估计组件的可靠性。传统上,可靠性估计过程使用保修数据,但它只能提供有限的支持来处理“故障混杂”问题,从而将与组件故障相关的不同故障模式混淆为单个故障模式。结果得出的可靠性估算缺乏所需的精度水平。因为我们的模型考虑了与组件故障相关的文本故障模式,所以可以提高整体可靠性估计。我们使用基线系统评估了我们系统的性能,以使用来自汽车领域的真实数据(例如前照灯故障)在不同英里曝光下收集的真实数据来预测绝对误差。在最佳情况下,我们的模型预测的绝对误差相对于基准模型(不考虑故障模式)显示出了97%的改进,而在最坏情况下,则为71%。

下载地址 | 返回目录 | [10.1007/s10115-014-0806-3]

[75] A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs (2016)

Abstract: Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. The lion's share of process mining research has focused on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor's experience on the treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in the literature, proposing a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.

摘要: 过程挖掘可以被视为基于模型的过程分析与面向数据的分析技术之间的缺失链接。过程挖掘研究的大部分工作一直集中在过程发现(从原始数据创建过程模型)和重放技术上,以检查一致性和分析瓶颈。这些技术已帮助组织解决合规性和性能问题。但是,为了进行更精细的分析,必须关联不同的过程特征。例如,偏离规范过程是否会导致额外的延迟和成本?在流程的初始阶段,被拒绝的案件处理方式是否有所不同?医生的经验对治疗过程有什么影响?这些和其他问题可能涉及与不同角度(控制流,数据流,时间,组织,成本,合规性等)有关的过程特征。之前已经研究了特定的问题(例如,预测剩余的处理时间),但是到目前为止,缺少通用方法。提出的框架统一了文献中提出的多种相关分析方法,并提出了可以执行这些分析的通用解决方案。该方法已在ProM中实现,并结合了流程和数据挖掘技术。在本文中,我们还将通过与荷兰最大的“行政工厂”之一的UWV(雇员保险局)进行的案例研究来证明其适用性。

Download link | Back to contents | [10.1016/j.is.2015.07.003]

[76] A new data mining-based framework to test case prioritization using software defect prediction (2017)

Abstract: Test cases do not have the same importance when used to detect faults in software; therefore, it is more efficient to test the system with the test cases that have the ability to detect the faults. This research proposes a new framework that combines data mining techniques to prioritize the test cases. It enhances fault prediction and detection using two different techniques: 1) the data mining regression classifier that depends on software metrics to predict defective modules, and 2) the k-means clustering technique that is used to select and prioritize test cases to identify the fault early. Our approach to test case prioritization yields good results in comparison with other studies. The authors used the Average Percentage of Faults Detected (APFD) metric to evaluate the proposed framework, which results in 19.9% for all system modules and 25.7% for defective ones. Our results indicate that it is effective to start the testing process with the most defective modules instead of testing all modules arbitrarily.


Download link | Back to contents | [10.4018/IJOSSP.2017010102]
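
The APFD metric named in entry [76] is commonly defined as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test in the ordering that detects fault i. The following is a minimal sketch of that formula, not the authors' implementation; the function name and the data layout (a mapping from test id to the set of faults it detects) are assumptions made for illustration.

```python
def apfd(order, detects):
    """Average Percentage of Faults Detected for one test ordering.

    order:   list of test ids in execution order (hypothetical layout).
    detects: dict mapping test id -> set of fault ids that test reveals.
    Assumption: only faults revealed by at least one test in `order` count.
    """
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            # Keep only the earliest position at which each fault is revealed.
            first_position.setdefault(fault, position)

    n, m = len(order), len(first_position)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Illustrative data only: t2 reveals two faults, t3 reveals none.
example = {"t1": {"f1"}, "t2": {"f2", "f3"}, "t3": set()}
print(apfd(["t2", "t1", "t3"], example))  # fault-revealing tests first
print(apfd(["t3", "t1", "t2"], example))  # fault-revealing tests last
```

Running the fault-revealing tests first gives the higher score, which is the behavior a prioritization scheme tries to maximize.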

[77] A panel of Transcription factors identified by data mining can predict the prognosis of head and neck squamous cell carcinoma (2019)

Abstract: Background: Transcription factors (TFs) are responsible for the regulation of various activities related to cancer, such as cell proliferation, invasion, and migration. It is thought that measuring TF levels could assist in developing strategies for cancer diagnosis and prognosis. However, due to the lack of effective genome-wide tests, this cannot be carried out in clinical settings. Methods: A complete assessment of RNA-seq data in samples of a head and neck squamous cell carcinoma (HNSCC) cohort in The Cancer Genome Atlas (TCGA) database was carried out. From the expression data of six TFs, a risk score model was developed and further validated in the GSE41613 and GSE65858 series. Potential functional roles were identified for the six TFs via gene set enrichment analysis. Results: Based on our multi-TF signature, patients are stratified into high- and low-risk groups with significant variations in overall survival (OS) (median survival 2.416 vs. 5.934 years, log-rank test P < 0.001). The sensitivity and specificity evaluation of our multi-TF signature for 3-year OS in TCGA, GSE41613 and GSE65858 was 0.707, 0.679 and 0.605, respectively, demonstrating good reproducibility and robustness for predicting overall survival of HNSCC patients. Through multivariate Cox regression analyses (MCRA) and stratified analyses, we confirmed that the predictive capability of this risk score (RS) was not dependent on other factors such as clinicopathological parameters. Conclusions: With the help of an RS obtained from a panel of TF expression signatures, effective OS prediction and stratification of HNSCC patients can be carried out.


Download link | Back to contents | [10.1186/s12935-019-1024-6]

[78] Activity prediction in process mining using the WoMan framework (2019)

Abstract: Process Management techniques are useful in domains where the availability of a (formal) process model can be leveraged to monitor, supervise, and control a production process. While their classical application is in the business and industrial fields, other domains may profitably exploit Process Management techniques. Some of these domains (e.g., people's behavior, General Game Playing) are much more flexible and variable than classical ones and thus raise the problem of predicting which activities will be carried out next, a problem that is not so compelling in classical fields. When the process model is learned automatically from examples of process executions, which is the task of Process Mining, the prediction performance may also provide indirect indications of the correctness and reliability of the learned model. This paper proposes and compares two strategies for activity prediction using the WoMan framework for workflow management. The former proved to be able to handle complex processes; the latter is based on the classic and consolidated Naïve Bayes approach. An experimental validation allows us to draw considerations on the pros and cons of each, used both in isolation and in combination.


Download link | Back to contents | [10.1007/s10844-019-00543-2]
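
Entry [78] mentions a Naïve Bayes strategy for predicting the next activity. The abstract does not say how WoMan encodes a running case, so the sketch below is a generic illustration under stated assumptions rather than the paper's method: each training example pairs a trace prefix (treated as a bag of activities) with the activity that followed it, and probabilities use add-one (Laplace) smoothing. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_next_activity_model(traces):
    """Count how often each activity follows a prefix containing each activity."""
    label_counts = Counter()               # how often each activity occurs as "next"
    feature_counts = defaultdict(Counter)  # next activity -> counts of prefix activities
    vocabulary = set()
    for trace in traces:
        vocabulary.update(trace)
        for i in range(1, len(trace)):
            label_counts[trace[i]] += 1
            feature_counts[trace[i]].update(trace[:i])
    return label_counts, feature_counts, vocabulary


def predict_next_activity(model, prefix):
    """Return the activity with the highest Naive Bayes log-score, or None."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for activity in prefix:
            # Add-one smoothing keeps unseen activities from zeroing the score.
            score += math.log((feature_counts[label][activity] + 1) / denominator)
        if score > best_score:
            best, best_score = label, score
    return best


# Illustrative event log: two cases follow a-b-c, one follows a-d-e.
log = [["a", "b", "c"], ["a", "b", "c"], ["a", "d", "e"]]
model = train_next_activity_model(log)
print(predict_next_activity(model, ["a", "b"]))  # prints "c"
```

Treating the prefix as an unordered bag discards ordering information, which is one reason such a baseline can be weaker than a model-based predictor on complex processes.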

[79] Ambient Intelligence for Health (2015)

Abstract: The Northern Ireland Connected Health Innovation Centre has been established to promote business led collaborative Connected Health research between academia, industry and government. Established in June 2013, the Centre currently has 26 members from backgrounds including hardware development, software development, data analytics, and domiciliary care provision. The projects have also utilized partnerships with public health and charity-based organizations. To date, the Centre has completed four calls for proposals, resulting in tangible outputs in the form of novel hardware and software, field trials, data, and shared insight. This paper describes the Centre in detail, provides two case studies demonstrating completed projects, and discusses the challenges, lessons learned and impact thus far.


Download link | Back to contents | [10.1007/978-3-319-26508-7]

[80] An empirical study of using sequential behavior pattern mining approach to predict learning styles (2018)

Abstract: The learning style of a learner is an important parameter in his learning process. Therefore, learning styles should be considered in the design, development, and implementation of e-learning environments to increase learners' performance. Thus, it is important to be able to automatically determine the learning styles of learners in an e-learning environment. In this paper, we propose a sequential pattern mining approach to extract frequent sequential behavior patterns, which can separate learners with different learning styles. In this research, the system uses the Myers-Briggs Type Indicator (MBTI) to recognize learners' learning styles. The approach has been implemented and tested in an e-learning environment, and the results show that the learning styles of learners can be predicted with high accuracy. We show that learners with similar learning styles have similar sequential behavior patterns in interaction with an e-learning environment. Many frequent sequential behavior patterns were extracted, some of which have a meaningful relation with MBTI dimensions.


Download link | Back to contents | [10.1007/s10639-017-9667-1]
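
Entry [80] relies on extracting frequent sequential behavior patterns. The abstract does not name the mining algorithm, so the following is only a simplified sketch: it counts contiguous action subsequences of a fixed length and keeps those that occur in at least a given fraction of learners' sessions. Full sequential pattern miners such as GSP or PrefixSpan also allow gaps between items; the function name and the action labels are assumptions for illustration.

```python
from collections import Counter


def frequent_patterns(sequences, length, min_support):
    """Return {pattern: support} for contiguous patterns of the given length.

    Support is the fraction of sequences containing the pattern at least once.
    """
    counts = Counter()
    for seq in sequences:
        # Use a set so a pattern is counted once per sequence.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        counts.update(seen)
    threshold = min_support * len(sequences)
    return {p: c / len(sequences) for p, c in counts.items() if c >= threshold}


# Hypothetical click-stream sessions of three learners.
sessions = [
    ["forum", "quiz", "video"],
    ["forum", "quiz"],
    ["video", "quiz"],
]
print(frequent_patterns(sessions, length=2, min_support=0.6))
```

Patterns that are frequent for one learning-style group but rare for another are the ones that would be useful as classification features.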

[81] Application of a Conceptual Ecological Model to Predict the Effects of Sand Mining around Chilsan Island Group in the West Coast of Korea (2018)

Abstract: Chilsan Island Group, located on the west coast of South Korea, has been recognized as a critical breeding and nursery ground for endangered seabirds because food is abundant and human activity is low. Chilsan Island Group has been under protection as a Natural Monument (Article no. 389) since 1997, but attempts were made in 2004 to exploit the coastal sands around the islands. A conceptual ecological model (CEM) was employed with an energy flow diagram (EFD) to predict the effects of sand mining on the islands' ecosystem. The results showed that sand mining activities caused long-term damage to benthic ecosystems and threatened seabird communities by reducing fish populations throughout the food web. The changes in energy flow in the ecosystem due to sand mining operations predict that a reduction in secondary production of benthic animal communities might reduce energy transfer to humans and seabirds by 30%. Different ranges of uncertainty provided by the Monte Carlo simulation lead to the conclusion that sand mining could bring about undesirable consequences. One concern is that such negative consequences would have a much more severe impact on seabirds than on humans because seabirds cannot compete effectively against humans for the limited food sources. In this study, the complex interrelationships between ecosystem members were simplified and a CEM was established to clearly identify which elements would be affected. The CEM with EFD was found to be helpful in promoting communication between stakeholders, and it is expected to be widely used as a decision-making tool by public officials.


Download link | Back to contents | [10.1007/s12601-018-0042-y]

[82] Arabic Text Classification Based on Word and Document Embeddings (2016)

Abstract: Recently, Word Embeddings have been introduced as a major breakthrough in Natural Language Processing (NLP) to learn viable representations of linguistic items based on contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using Word and document Embeddings as the representational basis rather than relying on text preprocessing and bag-of-words representation. We demonstrate that document Embeddings outperform text preprocessing techniques either by learning them using Doc2Vec or averaging word vectors using a simple method for document Embedding construction. Moreover, the results show that the classification accuracy is less sensitive to word and document vectors learning parameters.


Download link | Back to contents | [10.1007/978-3-319-48308-5]
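
Entry [82] describes building a document embedding by averaging word vectors. A minimal sketch of that idea follows; it assumes pre-trained vectors are available as a plain dictionary and that out-of-vocabulary tokens are simply skipped, neither of which is specified in the abstract. The function name and the toy vectors are hypothetical.

```python
def average_embedding(tokens, word_vectors):
    """Average the vectors of known tokens; return None if no token is known.

    word_vectors: dict mapping token -> list of floats (assumed pre-trained).
    """
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    # Component-wise mean over all known tokens in the document.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy two-dimensional vectors, for illustration only.
toy_vectors = {"kitab": [1.0, 0.0], "qalam": [0.0, 1.0]}
print(average_embedding(["kitab", "qalam", "unknown"], toy_vectors))  # [0.5, 0.5]
```

The resulting fixed-length vector can be passed to any standard classifier, which is what makes this a simple alternative to bag-of-words features.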

[83] Biological Networks and Pathway Analysis (2017)

Abstract: In this volume, expert practitioners present a compilation of methods of functional data analysis (often referred to as “systems biology”) and its applications in drug discovery, medicine, and basic disease research. It covers such important issues as the elucidation of protein, compound and gene interactions, as well as analytical tools, including networks, interactome and ontologies, and clinical applications of functional analysis. As a volume in the highly successful Methods in Molecular Biology series, this work provides detailed description and hands-on implementation advice. Reputable, comprehensive, and cutting-edge, Biological Networks and Pathway Analysis presents both “wet lab” experimental methods and computational tools in order to cover a broad spectrum of issues in this fascinating new field.


Download link | Back to contents | [10.1007/978-1-4939-7027-8]

[84] Chat mining: Predicting user and message attributes in computer-mediated communication (2008)

Abstract: The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used for defining the chat mining problem. A term-based approach is used to investigate the user and message attributes in the context of vocabulary use, while a style-based approach is used to examine the chat messages according to the variations in the authors' writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is exploited, and the effect of author attributes on computer-mediated communications is discussed. © 2008 Elsevier Ltd. All rights reserved.


Download link | Back to contents | [10.1016/j.ipm.2007.12.009]

[85] Computational and Statistical Methods in Intelligent Systems (2019)

Abstract: © Springer Nature Switzerland AG 2019. Over recent years, a lot of money has been spent by the oil and gas industry to maintain pipeline integrity, specifically in handling CO2 internal corrosion. In fact, current solutions in pipeline corrosion maintenance are extremely costly to the companies. The empirical solutions also lack intelligence in adapting to different environments. In the absence of a suitable algorithm, the time taken to determine the corrosion occurrence is lengthy, as a lot of testing is needed to choose the right solution. If corrosion is not detected at an early stage, the pipes will burst, leading to catastrophic consequences for the company in terms of costs and environmental effect. This creates a demand for machine learning in predicting corrosion occurrence. This paper evaluates machine learning algorithms for predicting the CO2 internal corrosion rate, because there are still gaps in the study of suitable machine learning algorithms for corrosion prediction. The selected algorithms for this paper are Artificial Neural Network, Support Vector Machine and Random Forest. As there is limited data available for corrosion studies, a synthetic dataset was generated. The synthetic dataset was generated via a random Gaussian function and incorporated the de Waard-Milliams model, an empirical determination model for CO2 internal corrosion. Based on the experiment conducted, Artificial Neural Network shows a more robust result in comparison to the other algorithms.


Download link | Back to contents | [10.1007/978-3-030-00211-4]

[86] Data Mining on MRI-Computational Texture Features to Predict Sensory Characteristics in Ham (2016)

Abstract: In this study, a data mining technique was applied to computational texture features obtained from the analysis of magnetic resonance imaging (MRI) of hams, with the main objective of determining sensory attributes of dry-cured ham non-destructively. For that, fresh and dry-cured hams were scanned and then the MRI images were analyzed by three methods of computational texture features. Data mining was applied to the computational texture features from fresh and dry-cured hams to obtain prediction equations for the sensory attributes of dry-cured hams. The correlation coefficient (R) was used to analyze the results. Accurate prediction was found for 13 sensory attributes as a function of computational texture features of fresh ham, and three from dry-cured ham. In addition, a sensory analysis of dry-cured hams was also carried out to validate the predicted results. Similar values were found between the predicted attributes and those determined by sensory analysis. Thus, it is possible to predict sensory attributes of dry-cured hams by applying data mining to computational texture features of MRI from fresh and dry-cured hams. This opens the possibility of determining sensory attributes of dry-cured hams non-destructively, even before the curing process starts.


Download link | Back to contents | [10.1007/s11947-015-1662-1]

[87] Data mining service recommendation based on dataset features (2019)

Abstract: Quality of service (QoS)-based web service selection has been studied in the service computing community for some time. However, characteristics of the input dataset that is going to be processed by the web service are not usually considered in the selection process, even though they might have an impact on the QoS values of the service; e.g., latency when processing a bigger dataset is higher than when processing a smaller one, and one service may take longer to process a certain dataset than another service. To address this issue, in this work, we take into consideration the dataset features in the QoS-based service recommendation process and we focus on data mining services because their QoS values could be highly dependent on dataset features. We propose two approaches for data mining service recommendations and compare their performances. In the first approach, we use a meta-learning algorithm to incorporate dataset features in the recommendation process and study the use of different machine learning algorithms (both classification models and regression models) as meta-learners in recommending data mining services for the given dataset. We also investigate the impact of the number of dataset features on the performance of the meta-learners. In the second approach, we propose a novel technique of using factor analysis for web service recommendation. We use a decomposition technique to identify latent features of the input dataset and then recommend services by exploiting these latent variables. Our proposed approach of web service recommendation based on latent features was shown to be a more robust model with an accuracy of 85% compared to meta-feature-based recommendation.


Download link | Back to contents | [10.1007/s11761-019-00272-y]

[88] Data mining-based flatness pattern prediction for cold rolling process with varying operating condition (2014)

Abstract: Data-rich environments in modern rolling processes provide a great opportunity for more effective process control and greater overall quality improvement. Flatness is a key geometrical feature of strip products in a cold rolling process. In order to achieve good flatness, it is necessary to reveal the factors that often influence the flatness quality, to develop a general flatness pattern prediction model that can handle the varying operating conditions during the rolling of products with different specifications, and to realize an effective flatness feedback control strategy. This paper develops a practical data mining-based flatness pattern prediction method for cold rolling processes with varying operating conditions. Firstly, the high-dimensional process measurements are projected onto a low-dimensional space (i.e., the latent variable space) using the locality preserving projection method; at the same time, the Legendre orthogonal polynomials are used to extract the basic flatness patterns by projecting the high-dimensional flatness measurements into several flatness characteristic coefficients. Secondly, a mixture probabilistic linear regression model is adopted to describe the relationships between the latent variables and the flatness characteristic coefficients. A case study is conducted on a real steel rolling process. Results show that the developed method has not only satisfactory prediction performance but also good potential to improve process understanding and strip flatness quality.


Download link | Back to contents | [10.1007/s10115-013-0716-9]

[89] Demand Side Management Using Bacterial Foraging Optimization Algorithm (2016)

Abstract: Demand side management (DSM) is one of the most significant functions involved in the smart grid. It gives customers an opportunity to make suitable decisions related to energy consumption, which helps energy suppliers to decrease the peak load demand and to change the load profile. The existing demand side management strategies not only use specific techniques and algorithms but are also restricted to a small range of controllable loads. The proposed demand side management strategy uses a load shifting technique to handle a large number of loads. The bacterial foraging optimization algorithm (BFOA) is implemented to solve the minimization problem. Simulations were performed on a smart grid that consists of different types of loads in residential, commercial and industrial areas. The simulation results show that the proposed strategy attains substantial savings and reduces the peak load demand of the smart grid.


Download link | Back to contents | [10.1007/978-81-322-2755-7]

[90] Development of a Web-Based Radiation Toxicity Prediction System Using Metarule-Guided Mining to Predict Radiation Pneumonitis and Esophagitis in Lung Cancer Patients (2019)

Abstract: Radiation toxicity grades must be reviewed based on existing knowledge-based clinical evidence or be assessed by referencing the clinical research literature to minimize radiation toxicity associated with radiotherapy. The aim of this study was to develop a radiation toxicity prediction system using metarule-guided mining of the clinical research literature to predict radiation pneumonitis and esophagitis in lung cancer patients. A semantic pattern database was built using 100 clinical research articles. Semantic patterns of prognostic factors and toxicity grades were extracted by metarule-guided mining. Feature analysis and correlation investigation of prognostic factors and toxicity grades were performed by using dendrogram and heatmap. A web-based user interface for the prediction system was designed. Patient prognostic factors were used in this prediction system to predict toxicity grade results. Radiation toxicity grades for patients with radiation pneumonitis and esophagitis were calculated through our prediction system. Age and chemotherapy were prognostic factors as were toxicity grades 1–3 and 5 based on the metarule-guided mining system. The odds ratios had similar trends to those in the existing meta-analysis literature. The radiation toxicity prediction system that we developed can potentially be used as a clinical decision support system for patient-specific radiation treatment after weighting of prognostic factors, performing a correlation analysis, and performing a validity evaluation.


Download link | Back to contents | [10.3938/jkps.75.319]

[91] Handling Concept Drift for Predictions in Business Process Mining (2020)

Abstract: Predictive services nowadays play an important role across all business sectors. However, deployed machine learning models are challenged by data streams that change over time, which is described as concept drift. The prediction quality of models can be largely influenced by this phenomenon. Therefore, concept drift is usually handled by retraining the model. However, current research lacks a recommendation on which data should be selected for retraining the machine learning model. Therefore, we systematically analyze different data selection strategies in this work. Subsequently, we instantiate our findings on a use case in process mining that is strongly affected by concept drift. We show that we can improve accuracy from 0.5400 to 0.7010 with concept drift handling. Furthermore, we depict the effects of the different data selection strategies.


Download link | Back to contents | []

[92] Integrating in-process software defect prediction with association mining to discover defect pattern (2009)

Abstract: Rather than detecting defects at an early stage to reduce their impact, defect prevention means that defects are prevented from occurring in advance. Causal analysis is a common approach to discover the causes of defects and take corrective actions. However, selecting defects to analyze among large amounts of reported defects is time consuming and requires significant effort. To address this problem, this study proposes a defect prediction approach where the reported defects and performed actions are utilized to discover the patterns of actions which are likely to cause defects. The approach proposed in this study is adapted from Action-Based Defect Prediction (ABDP), an approach that uses classification with the decision tree technique to build a prediction model, and performs association rule mining on the records of actions and defects. An action is defined as a basic operation used to perform a software project, while a defect is defined as a software flaw and can arise at any stage of the software process. The association rule mining finds the maximum rule set with specific minimum support and confidence, and thus the discovered knowledge can be utilized to interpret the prediction models and software process behaviors. The discovered patterns can then be applied to predict the defects generated by the subsequent actions and take necessary corrective actions to avoid defects. The proposed defect prediction approach applies association rule mining to discover defect patterns, and multi-interval discretization to handle the continuous attributes of actions. The proposed approach is applied to a business project, giving excellent prediction results and revealing the efficiency of the proposed approach. The main benefit of using this approach is that the discovered defect patterns can be used to evaluate subsequent actions for in-process projects, and reduce variance of the reported data resulting from different projects. Additionally, the discovered patterns can be used in causal analysis to identify the causes of defects for software process improvement. © 2008 Elsevier B.V. All rights reserved.


Download link | Back to contents | [10.1016/j.infsof.2008.04.008]
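
Entry [92] filters association rules by minimum support and confidence. The sketch below shows how those two measures are computed for a single rule of the form "actions imply defect"; it is an illustration of the standard definitions, not the ABDP implementation, and the record layout and item names are assumptions.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= set(r)) / len(records)


def confidence(records, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent); 0.0 if undefined."""
    sup_antecedent = support(records, antecedent)
    if sup_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(records, both) / sup_antecedent


# Hypothetical action/defect records, one set per performed action.
records = [
    {"skip_review", "defect"},
    {"skip_review", "defect"},
    {"skip_review"},
    {"code_review"},
]
print(support(records, {"skip_review", "defect"}))       # 0.5
print(confidence(records, {"skip_review"}, {"defect"}))  # about 0.667
```

A rule is kept only when both values meet the chosen thresholds, so a rule that is often right but rarely applicable (high confidence, low support) is discarded.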

[93] Intelligent Systems Design and Applications (2020)

Abstract: Cooperative Intelligent Transportation Systems (C-ITS) is one of the most prominent solutions to facilitate many new exciting applications concerning road safety, mobility, environment, and driving comfort. This technology is now on the verge of actual deployments. However, security threats, privacy, and trust management remain the most significant concerns. C-ITS relies highly on node cooperation and trust as vehicles take the decision based on the information received from the roadside network infrastructure. This information should be accurate and reliable to ensure proper functioning of the system. However, the presence of a misbehaving or compromised node in the system can lead to catastrophic results for both safety and traffic efficiency. It is therefore essential to detect misbehavior and defend the C-ITS against it. Although various studies have proposed misbehavior detection at the vehicular plane, the study that explores the machine learning capabilities to detect misbehavior at infrastructure plane is not present. Thus, in this paper, we propose a solution to detect misbehavior at the infrastructure plane of C-ITS that employs the predictive capabilities of Deep Learning. We compare the performance of Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) models of Deep Neural Network.


[下载地址](http://dx.doi.org/10.1007/978-3-030-16657-1_60) | 返回目录 | [10.1007/978-3-030-16657-1]

[94] Mining biometric data to predict programmer expertise and task difficulty (2018)

Abstract: Programming mistakes frequently waste software developers' time and may introduce bugs into their software, causing serious risks for their customers. Earlier work has traditionally attempted to spot such bug risks using the correlation between various software process metrics and defects. This study departs from previous work by examining a more direct method: using psycho-physiological sensor data to detect the difficulty of program comprehension tasks and the programmer's level of expertise. In a study with 38 expert and novice programmers, we investigated how well an electroencephalography device and an eye tracker can be used to predict programmer expertise (novice/expert) and task difficulty (easy/difficult). Using data from both sensors, we could predict task difficulty and programmer level of expertise with 64.9% and 97.7% precision and 68.6% and 96.4% recall, respectively. The results show that it is possible to predict the perceived difficulty of a task and the expertise level of developers using psycho-physiological sensor data. In addition, we found that while a single biometric sensor shows good results, the combination of both sensors leads to the best overall performance.

摘要: 编程错误经常浪费软件开发人员的时间,并可能导致将错误引入其软件中,从而给他们的客户带来严重的风险。较早的工作历来试图利用各种软件过程指标和缺陷之间的相关性来发现此类错误风险。但是,本研究与以前的工作不同,它研究了一种更直接的方法,即使用心理生理传感器数据来检测程序理解任务的难度和程序员的专业水平。通过对38位专家和新手程序员进行研究,我们考察了脑电图和眼动仪在预测程序员专业水平(新手/专家)和任务难度(容易/困难)方面的效果。使用两个传感器的数据,我们可以分别以64.9%和97.7%的精确率以及68.6%和96.4%的召回率预测任务难度和程序员的专业水平。结果表明,可以使用心理生理传感器数据来预测开发人员感知到的任务难度和专业水平。此外,我们发现虽然使用单个生物识别传感器显示出良好的结果,但两个传感器的组合能带来最佳的整体性能。

下载地址 | 返回目录 | [10.1007/s10586-017-0746-2]
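
The entry above reports its results as precision and recall. As a reminder of what those two figures measure, here is a minimal sketch for a binary label such as novice/expert. The label lists are made-up illustration data, not the study's measurements.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall of `y_pred` for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for five programmers.
y_true = ["expert", "expert", "novice", "novice", "expert"]
y_pred = ["expert", "novice", "novice", "expert", "expert"]

precision, recall = precision_recall(y_true, y_pred, positive="expert")
print(f"precision={precision:.3f} recall={recall:.3f}")
```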

[95] Mining expressed sequence tags of rapeseed (Brassica napus L.) to predict the drought responsive regulatory network (2015)

Abstract: It is of great significance to understand the regulatory mechanisms by which plants deal with drought stress. Two EST libraries derived from rapeseed (Brassica napus) leaves in non-stressed and drought stress conditions were analyzed in order to obtain the transcriptomic landscape of drought-exposed B. napus plants, and also to identify and characterize significant drought responsive regulatory genes and microRNAs. The functional ontology analysis revealed a substantial shift in the B. napus transcriptome to govern cellular drought responsiveness via different stress-activated mechanisms. The activity of transcription factor and protein kinase modules generally increased in response to drought stress. The 26 regulatory genes consisting of 17 transcription factor genes, eight protein kinase genes and one protein phosphatase gene were identified showing significant alterations in their expressions in response to drought stress. We also found the six microRNAs which were differentially expressed during drought stress supporting the involvement of a post-transcriptional level of regulation for B. napus drought response. The drought responsive regulatory network shed light on the significance of some regulatory components involved in biosynthesis and signaling of various plant hormones (abscisic acid, auxin and brassinosteroids), ubiquitin proteasome system, and signaling through Reactive Oxygen Species (ROS). Our findings suggested a complex and multi-level regulatory system modulating response to drought stress in B. napus.

摘要: 了解植物应对干旱胁迫的调控机制非常重要。为了获得干旱暴露的甘蓝型油菜植物的转录组图谱,并鉴定和表征重要的干旱响应调控基因和microRNA,分析了两个在非胁迫和干旱胁迫条件下从油菜(甘蓝型油菜)叶片衍生的EST文库。功能本体分析揭示了甘蓝型油菜转录组中的重大变化,以通过不同的胁迫激活机制来控制细胞干旱响应性。转录因子和蛋白激酶模块的活性通常响应干旱胁迫而增加。鉴定出由17个转录因子基因,8个蛋白激酶基因和1个蛋白磷酸酶基因组成的26个调节基因,它们的表达响应干旱胁迫而发生了显着变化。我们还发现在干旱胁迫期间差异表达的六个microRNA,这支持了甘蓝型油菜干旱反应存在转录后水平的调控。干旱响应性调控网络揭示了一些调控成分的重要性,这些成分参与各种植物激素(脱落酸,生长素和油菜素固醇)的生物合成和信号传导,泛素蛋白酶体系统以及通过活性氧(ROS)的信号传导。我们的研究结果表明,一个复杂且多层次的调节系统可调节甘蓝型油菜对干旱胁迫的响应。

下载地址 | 返回目录 | [10.1007/s12298-015-0311-5]

[96] Mining software change data stream to predict changeability of classes of object-oriented software system (2016)

Abstract: In software development, changes are performed continuously, and a stream of change-commits is generated during the evolution of software systems. Such streams of change-commits for classes are vital to understanding how classes have co-evolved or become change-coupled. This paper presents a framework for mining the stream of class changes to predict the future changeability behavior of classes. We present two changeability measures for classes, namely the Change-Coupling Index and the Class Change-Impact Set. For these measures, the stream of change-commits is first mined to extract the change-coupling among the classes, and then the changeability measures of the classes are computed. The proposed measures are empirically validated, and several research questions are answered to validate the usefulness of the change-stream. The obtained results are promising and show that the proposed change-data-stream based changeability prediction can be very useful for the maintenance of software systems.

摘要: 在软件开发中,在软件系统的演进过程中,不断地执行更改,并生成变更提交流。这种针对类的变更提交流对于理解类如何协同演化或发生变更耦合至关重要。本文旨在提供一个框架,用于挖掘类变更流,以预测类的未来可变更性行为。我们提供了类的可变更性度量,即变更耦合指数(Change-Coupling Index)和类变更影响集(Class Change-Impact Set)。对于这些度量,首先挖掘变更提交流以提取类之间的变更耦合,其次计算类的可变更性度量。所提出的度量在经验上得到了验证,并回答了一些研究问题,以验证变更流的有用性。获得的结果是有希望的,并表明所提出的基于变更数据流的可变更性预测对软件系统的维护非常有用。

下载地址 | 返回目录 | [10.1007/s12530-016-9151-y]
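
The entry above mines a stream of change-commits for change-coupling among classes. The abstract does not define the Change-Coupling Index or the Class Change-Impact Set, so the sketch below uses a plain co-change count and a thresholded neighbour set as stand-ins. The commit data, class names, and `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical commit stream: each commit is the set of classes it changed.
commits = [
    {"Order", "Invoice"},
    {"Order", "Invoice", "Customer"},
    {"Customer"},
    {"Order", "Invoice"},
]


def co_change_counts(commits):
    """Count how often each pair of classes changes in the same commit."""
    pair_counts = Counter()
    for changed in commits:
        for a, b in combinations(sorted(changed), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def impact_sets(pair_counts, min_count=2):
    """For each class, the classes that co-changed with it at least `min_count` times."""
    impacted = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            impacted[a].add(b)
            impacted[b].add(a)
    return dict(impacted)


pairs = co_change_counts(commits)
impact = impact_sets(pairs)
print(pairs.most_common(1))
print(impact)
```

Sorting each commit before pairing keeps every pair in one canonical order, so `("Invoice", "Order")` and `("Order", "Invoice")` are never counted separately.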

[97] Mining the Situation: Spatiotemporal Traffic Prediction with Big Data (2015)

Abstract: With the vast availability of traffic sensors from which traffic information can be derived, a lot of research effort has been devoted to developing traffic prediction techniques, which in turn improve route navigation, traffic regulation, urban area planning, etc. One key challenge in traffic prediction is how much to rely on prediction models that are constructed using historical data in real-time traffic situations, which may differ from that of the historical data and change over time. In this paper, we propose a novel online framework that could learn from the current traffic situation (or context) in real-time and predict the future traffic by matching the current situation to the most effective prediction model trained using historical data. As real-time traffic arrives, the traffic context space is adaptively partitioned in order to efficiently estimate the effectiveness of each base predictor in different situations. We obtain and prove both short-term and long-term performance guarantees (bounds) for our online algorithm. The proposed algorithm also works effectively in scenarios where the true labels (i.e., realized traffic) are missing or become available with delay. Using the proposed framework, the context dimension that is the most relevant to traffic prediction can also be revealed, which can further reduce the implementation complexity as well as inform traffic policy making. Our experiments with real-world data in real-life conditions show that the proposed approach significantly outperforms existing solutions.

摘要: 由于可以从中获取交通信息的交通传感器的广泛可用性,已经进行了大量的研究工作来开发交通预测技术,从而改善了路线导航,交通管制,城市规划等。交通预测中的一个关键挑战是,在实时交通情况下,应在多大程度上依赖使用历史数据构建的预测模型,因为实时情况可能与历史数据有所不同,并且会随时间变化。在本文中,我们提出了一种新颖的在线框架,该框架可以实时学习当前的交通状况(或上下文),并通过将当前的状况与使用历史数据训练的最有效的预测模型进行匹配来预测未来的交通。随着实时交通数据的到来,对交通上下文空间进行自适应划分,以便有效地估计每个基本预测器在不同情况下的有效性。我们获得并证明了我们在线算法的短期和长期性能保证(界限)。所提出的算法在缺少真实标签(即,实际交通量)或标签延迟可用的情况下也能有效地工作。使用所提出的框架,还可以揭示与交通预测最相关的上下文维度,这可以进一步降低实现复杂度并为交通政策制定提供依据。我们在真实条件下对真实数据的实验表明,所提出的方法明显优于现有的解决方案。

下载地址 | 返回目录 | [10.1109/JSTSP.2015.2389196]
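
The entry above matches the current traffic context to whichever base predictor has been most effective in that context. The sketch below illustrates only that selection idea under strong simplifications: contexts are fixed labels rather than an adaptively partitioned space, and every base predictor is scored against the realized value on each update. The class name, predictor names, and speeds are hypothetical; this is not the paper's algorithm and it carries none of its performance bounds.

```python
from collections import defaultdict


class ContextualSelector:
    """Pick, per context, the base predictor with the lowest mean absolute error so far."""

    def __init__(self, predictors):
        self.predictors = predictors          # name -> callable(features) -> float
        self.error_sum = defaultdict(float)   # (context, name) -> cumulative abs error
        self.count = defaultdict(int)         # (context, name) -> number of updates

    def choose(self, context):
        def mean_error(name):
            n = self.count[(context, name)]
            # Predictors not yet evaluated in this context score 0.0, so they are tried first.
            return self.error_sum[(context, name)] / n if n else 0.0
        return min(self.predictors, key=mean_error)

    def predict(self, context, features):
        name = self.choose(context)
        return name, self.predictors[name](features)

    def update(self, context, features, actual):
        # Score every base predictor against the realized value.
        for name, predictor in self.predictors.items():
            self.error_sum[(context, name)] += abs(predictor(features) - actual)
            self.count[(context, name)] += 1


# Two hypothetical base predictors of road speed (km/h).
predictors = {
    "historical_mean": lambda features: 50.0,
    "last_value": lambda features: features["last_speed"],
}
selector = ContextualSelector(predictors)
for speed in (20.0, 22.0, 21.0):
    selector.update("rush_hour", {"last_speed": speed}, actual=speed + 1.0)

name, prediction = selector.predict("rush_hour", {"last_speed": 23.0})
print(name, prediction)
```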

[98] Mining usage scenarios in business processes: Outlier-aware discovery and run-time prediction (2011)

Abstract: A prominent goal of process mining is to build automatically a model explaining all the episodes recorded in the log of some transactional system. Whenever the process to be mined is complex and highly-flexible, however, equipping all the traces with just one model might lead to mixing different usage scenarios, thereby resulting in a spaghetti-like process description. This is, in fact, often circumvented by preliminarily applying clustering methods on the process log in order to identify all its hidden variants. In this paper, two relevant problems that arise in the context of applying such methods are addressed, which have received little attention so far: (i) making the clustering aware of outlier traces, and (ii) finding predictive models for clustering results. The first issue impacts on the effectiveness of clustering algorithms, which can indeed be led to confuse real process variants with exceptional behavior or malfunctions. The second issue instead concerns the opportunity of predicting the behavioral class of future process instances, by taking advantage of context-dependent “non-structural” data (e.g., activity executors, parameter values). The paper formalizes and analyzes these two issues and illustrates various mining algorithms to face them. All the algorithms have been implemented and integrated into a system prototype, which has been thoroughly validated over two real-life application scenarios. © 2011 Elsevier B.V.

摘要: 过程挖掘的主要目标是自动构建一个模型,该模型解释某个交易系统的日志中记录的所有情节。但是,每当要挖掘的过程复杂且高度灵活时,仅用一个模型描述所有轨迹就可能导致混合不同的使用场景,从而产生类似于意大利面条的过程描述。实际上,通常通过在过程日志上预先应用聚类方法来识别所有隐藏的变体来避免这种情况。在本文中,解决了在使用此类方法的背景下出现的两个相关问题,到目前为止,这些问题很少受到关注:(i)使聚类能够识别异常轨迹,以及(ii)查找聚类结果的预测模型。第一个问题影响聚类算法的有效性,确实会导致将真实的过程变体与异常行为或故障相混淆。相反,第二个问题涉及通过利用上下文相关的“非结构”数据(例如,活动执行者,参数值)来预测未来流程实例的行为类别的机会。本文对这两个问题进行了形式化和分析,并举例说明了应对这些问题的各种挖掘算法。所有算法均已实现并集成到系统原型中,该原型已在两个实际应用场景中得到了充分验证。 © 2011 Elsevier B.V.

下载地址 | 返回目录 | [10.1016/j.datak.2011.07.002]

[99] Mining user interaction patterns in the darkweb to predict enterprise cyber incidents (2019)

Abstract: With the rise in security breaches over the past few years, there has been an increasing need to mine insights from social media platforms to raise alerts of possible attacks in an attempt to defend conflict during competition. In this study, we attempt to build a framework that utilizes unconventional signals from the darkweb forums by leveraging the reply network structure of user interactions with the goal of predicting enterprise-related external cyber attacks. We use both unsupervised and supervised learning models that address the challenges that come with the lack of enterprise attack metadata for ground-truth validation as well as insufficient data for training the models. We validate our models on a binary classification problem that attempts to predict cyber attacks on a daily basis for an organization. Using several controlled studies on features leveraging the network structure, we measure the extent to which the indicators from the darkweb forums can be successfully used to predict attacks. We use information from 53 forums in the darkweb over a span of 17 months for the task. Our framework to predict real-world organization cyber attacks of three different security events suggests that focusing on the reply path structure between groups of users based on random walk transitions and community structures has an advantage in terms of better performance over solely relying on forum or user posting statistics prior to attacks.

摘要: 随着过去几年中安全漏洞的增加,越来越需要从社交媒体平台挖掘见解,以发出有关可能的攻击的警报,以捍卫竞争中的冲突。在这项研究中,我们试图通过利用用户交互的回复网络结构来构建利用暗网论坛中非常规信号的框架,以预测与企业相关的外部网络攻击。我们同时使用无监督和有监督的学习模型,以解决因缺乏用于真实性验证的企业攻击元数据以及训练模型所需的数据不足而带来的挑战。我们在二元分类问题上验证我们的模型,该问题试图每天预测组织的网络攻击。通过对利用网络结构的特征进行的一些受控研究,我们测量了暗网论坛中的指标可成功用于预测攻击的程度。我们使用暗网中53个论坛在17个月内的信息来完成任务。我们用于预测现实世界中组织遭受的三种不同安全事件的网络攻击的框架表明,与仅依赖攻击之前的论坛或用户发帖统计数据相比,专注于基于随机游走转移和社区结构的用户组之间的回复路径结构在性能方面更具优势。

下载地址 | 返回目录 | [10.1007/s13278-019-0603-9]

[100] Multimedia services quality prediction based on the association mining between context and QoS properties (2016)

Abstract: With the increasing presence and adoption of multimedia services over the Internet, quality has become an essential factor in choosing a service from a set of functionally similar services. To address the time-consuming, inaccurate and expensive multimedia quality evaluations in traditional methods, this paper proposes a context-aware matrix factorization (CAMF) approach to collaborative and personalized quality prediction for multimedia services. We first mine the association rules between context and QoS properties oriented to multimedia services to determine how a QoS value to be predicted depends on context properties. Then the service QoS experiences collected from other users are filtered according to whether their context is similar to that of the current user. Based on the collected QoS data, a context-aware approach is designed for personalized service QoS value prediction. Experimental results demonstrate that this approach can significantly improve the accuracy of QoS prediction.

摘要: 随着Internet上多媒体服务的日益普及和采用,质量已成为从一组功能相似的服务中选择一种服务的重要因素。为了解决传统方法中耗时,不准确且昂贵的多媒体质量评估,本文提出了一种上下文感知矩阵分解(CAMF)方法,以进行多媒体服务的协作和个性化质量预测。我们首先挖掘面向多媒体服务的上下文和QoS属性之间的关联规则,以确定要预测的QoS值如何依赖于上下文属性。然后根据与当前用户相似的上下文过滤掉从其他用户那里收集的服务QoS体验。基于收集的QoS数据,设计了一种上下文感知方法来进行个性化服务QoS值预测。实验结果表明,该方法可以大大提高QoS预测的准确性。

下载地址 | 返回目录 | [10.1016/j.sigpro.2015.01.013]

[101] Optimization of MRI Acquisition and Texture Analysis to Predict Physico-chemical Parameters of Loins by Data Mining (2017)

Abstract: The main objective of this study was to configure the acquisition and analysis of low-field magnetic resonance imaging (MRI) to predict physico-chemical characteristics of Iberian loin, evaluating the use of different MRI sequences (spin echo, SE; gradient echo, GE; turbo 3D, T3D), computational texture feature methods (GLCM, NGLDM, GLRLM, GLCM + NGLDM + GLRLM), and data mining techniques (multiple linear regression, MLR; isotonic regression, IR). Moderate to very good correlation coefficients and low mean absolute error were found when applying MLR or IR on any method of computational texture features from MRI acquired with SE or GE. For the T3D sequence, accurate results are only obtained by applying IR on the GLCM or GLCM + NGLDM + GLRLM methods. Considering not only the accuracy of the methodology but also the time consumed and the resources required, the use of SE sequences for MRI acquisition, the GLCM method for MRI texture analysis, and MLR could be indicated for predicting the physico-chemical characteristics of loin.

摘要: 这项研究的主要目标是配置低场磁共振成像(MRI)的采集和分析,以预测伊比利亚里脊的理化特征,评估不同MRI序列的使用(自旋回波,SE;梯度回波,GE; turbo 3D,T3D),计算纹理特征方法(GLCM,NGLDM,GLRLM,GLCM + NGLDM + GLRLM)和数据挖掘技术(多重线性回归,MLR;等渗回归,IR)。当在通过SE或GE获得的MRI的任何计算纹理特征方法上应用MLR或IR时,发现中等至非常好的相关系数和较低的平均绝对误差。对于T3D序列,只有通过在GLCM或GLCM + NGLDM + GLRLM方法上应用IR才能获得准确的结果。不仅要考虑方法的准确性,还要考虑所花费的时间和所需的资源,可以使用SE序列进行MRI采集,GLCM方法进行MRI纹理分析以及MLR来预测里脊的理化特性。

下载地址 | 返回目录 | [10.1007/s11947-016-1853-4]

[102] Pattern-Based Opinion Mining for Stock Market Trend Prediction (2008)

Abstract: Stock market reports in on-line news are widely used by amateurs to make quick investment decisions. Financial analysts often give opinions about trends of stock markets based on past and present economic event indicators. These opinions commonly appear in text form and are abundant over the Internet. It is tedious and time consuming for users to browse through such text manually, let alone to understand the embedded opinions. To overcome this shortcoming, automatic trend prediction methods have been proposed. Under conventional methods, reports are represented using bag of words and trend prediction is treated as a 3-way trend classification problem, i.e. trend as up, down or stable. In this paper, we propose a new pattern-based opinion mining method for market trend prediction. Experiments show that (1) pattern-based classification is more effective than its word-based counterpart for feature representation; and (2) opinion mining outperforms event-based classification for trend prediction. The task of opinion mining gets more difficult when the users are exposed to opinions from more than one analyst. The question becomes: whose opinions should he or she trust? This lays down our second research objective, i.e. to study different opinion incorporation strategies. Intuitively, one would trust the opinion supported by the majority. However, we show that on the contrary, the user is better off trusting the most credible analyst.

摘要: 在线新闻中的股票市场报告已被业余爱好者广泛用于快速做出投资决策。金融分析师经常根据过去和现在的经济事件指标对股票市场趋势发表意见。这些观点通常以文本形式出现,并且在互联网上大量存在。用户手动浏览这些文本既烦琐又费时,更不用说理解其中蕴含的观点了。为克服这一缺点,已经提出了自动趋势预测方法。在传统方法下,报告用词袋来表示,趋势预测被视为一个三分类问题,即趋势为“上升”,“下降”或“稳定”。本文提出了一种基于模式的新观点挖掘方法来预测市场趋势。实验表明:(1)基于模式的分类在特征表示上比基于单词的分类更有效;(2)观点挖掘在趋势预测方面优于基于事件的分类。当用户接触多个分析师的意见时,观点挖掘的任务将变得更加困难。问题变成:他/她应该信任谁的意见?这确定了我们的第二个研究目标,即研究不同的意见整合策略。凭直觉,人们会相信大多数人支持的意见。但是,我们证明,恰恰相反,用户最好信任最可信的分析师。

下载地址 | 返回目录 | [10.1142/s1793840608001949]

[103] Predicting Within-24h Visualisation of Hospital Clinical Reports Using Bayesian Networks (2015)

Abstract: This book constitutes the refereed proceedings of the 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, held in Coimbra, Portugal, in September 2015. The 45 revised full papers presented together with 36 revised short papers were carefully reviewed and selected from a total of 131 submissions. EPIA 2015, following the standard EPIA format, covers a wide range of AI topics as follows: ambient intelligence and affective environments, artificial Intelligence in medicine, artificial intelligence in transportation systems, artificial life and evolutionary algorithms, computational methods in bioinformatics and systems biology, general artificial intelligence, intelligent information systems, intelligent robotics, knowledge discovery and business intelligence, multi-agent systems: theory and applications, social simulation and modelling, text mining and applications.

摘要: 这本书是2015年9月在葡萄牙科英布拉举行的第17届葡萄牙人工智能大会EPIA 2015的经评审的论文集。收录的45篇经修订的完整论文和36篇经修订的简短论文经过了认真审查,并从总共131份投稿中选出。遵循标准EPIA格式的EPIA 2015涵盖了以下广泛的AI主题:环境智能和情感环境,医学人工智能,运输系统人工智能,人工生命和进化算法,生物信息学和系统生物学的计算方法,通用人工智能,智能信息系统,智能机器人技术,知识发现和商业智能,多主体系统:理论与应用,社会仿真与建模,文本挖掘与应用。

下载地址 | 返回目录 | [10.1007/978-3-319-23485-4]

[104] Predicting ground vibration induced by rock blasting using a novel hybrid of neural network and itemset mining (2020)

Abstract: Blasting operation is considered as one of the cheapest methods to break the rock into small pieces in surface and underground mines. Ground vibration is a side effect of blasting and can result in damage to, or failure of, nearby structures. Therefore, it is imperative to predict ground vibration in the blasting sites. The primary objective of this paper is to propose a new model to predict ground vibration based on itemset mining (IM) and neural networks (NN), called IM–NN. It is worth mentioning that no research has tested the efficiency of IM–NN to predict ground vibration yet. IM–NN is composed of three steps; firstly, frequent and confident patterns (itemsets) were extracted by using IM. Secondly, for each test instance, the most appropriate instances were selected based on the extracted patterns. Thirdly, NN was only trained by the selected instances. To achieve the objective of this research, a dataset including 92 instances was collected from blasting events of two surface mines in Iran, Kerman province. To demonstrate the acceptability of IM–NN, the classical NN as well as several empirical equations were also developed in this study. The results indicated that IM–NN with the correlation squared (R2) of 0.944 has better performance than NN with R2 of 0.898 and may be a promising alternative to the NN for predicting ground vibration. Thus, the use of IM was a good idea to optimize and improve the NN performance.

摘要: 爆破作业被认为是在露天和地下矿山中将岩石破碎成小块的最便宜的方法之一。地面振动是爆破的副作用,可能导致附近建筑物损坏或失效。因此,必须预测爆破现场的地面振动。本文的主要目的是提出一种基于项目集挖掘(IM)和神经网络(NN)的预测地面振动的新模型,称为IM–NN。值得一提的是,还没有研究测试IM–NN预测地面振动的效率。 IM–NN由三个步骤组成;首先,使用IM提取频繁且高置信度的模式(项目集)。其次,对于每个测试实例,根据提取的模式选择最合适的实例。第三,NN仅用选定的实例进行训练。为了实现本研究的目的,从伊朗克尔曼省的两个露天矿的爆破事件中收集了包括92个实例的数据集。为了证明IM–NN的可接受性,在这项研究中还开发了经典NN以及几个经验方程。结果表明,决定系数(R2)为0.944的IM–NN比R2为0.898的NN具有更好的性能,可能是预测地面振动时替代NN的有希望的方案。因此,使用IM是优化和改善NN性能的好主意。

下载地址 | 返回目录 | [10.1007/s00521-020-04822-w]

[105] Predicting simulation parameters of biological systems using a Gaussian process model (2012)

Abstract: Finding optimal parameters for simulating biological systems is usually a very difficult and expensive task in systems biology. Brute force searching is infeasible in practice because of the huge (often infinite) search space. In this article, we propose predicting the parameters efficiently by learning the relationship between system outputs and parameters using regression. However, the conventional parametric regression models suffer from two issues, thus are not applicable to this problem. First, restricting the regression function as a certain fixed type (e.g. linear, polynomial, etc.) introduces too strong assumptions that reduce the model flexibility. Second, conventional regression models fail to take into account the fact that a fixed parameter value may correspond to multiple different outputs due to the stochastic nature of most biological simulations, and the existence of a potentially large number of other factors that affect the simulation outputs. We propose a novel approach based on a Gaussian process model that addresses the two issues jointly. We apply our approach to a tumor vessel growth model and the feedback Wright-Fisher model. The experimental results show that our method can predict the parameter values of both of the two models with high accuracy. © 2012 Wiley Periodicals, Inc.

摘要: 寻找用于模拟生物系统的最佳参数通常是系统生物学中非常困难且昂贵的任务。由于巨大的搜索空间(通常是无限的),在实践中蛮力搜索是不可行的。在本文中,我们建议通过使用回归学习系统输出和参数之间的关系来有效地预测参数。但是,传统的参数回归模型存在两个问题,因此不适用于该问题。首先,将回归函数限制为某个固定类型(例如线性,多项式等)会引入过强的假设,从而降低了模型的灵活性。第二,传统的回归模型无法考虑以下事实:由于大多数生物学模拟的随机性以及存在可能影响模拟输出的其他大量潜在因素,固定参数值可能对应于多个不同的输出。我们提出了一种基于高斯过程模型的新颖方法,可以共同解决这两个问题。我们将我们的方法应用于肿瘤血管生长模型和反馈Wright-Fisher模型。实验结果表明,该方法可以高精度地预测两个模型的参数值。 © 2012 Wiley Periodicals, Inc.

下载地址 | 返回目录 | [10.1002/sam.11163]

[106] Predicting the distribution of ground fissures and water-conducted fissures induced by coal mining: a case study (2016)

Abstract: Introduction: The use of Top Coal Caving for exploiting the thick coal seam with shallow buried depth most likely has a strong negative impact on the stability. Case description: Anjialing No. 1 Underground Mine is located in Shuozhou City, Shanxi Province of China. The 4# Coal Seam of this coal mine is a thick coal seam with shallow buried depth, with a thickness of 12 m and a depth of 180 m on average. This paper focuses on predicting the distribution of ground fissures and water-conducted fissures induced by the exploitation of the 4# Coal Seam. Discussion and evaluation: We first create a 3D computational model, and then use FLAC3D software to simulate the mining of the coal seam. We then calculate the displacements and tensile strain of the ground surface and strata, and predict the distribution of the ground fissures and water-conducted fissures. Finally, we further analyze the possibility of the perviousness and air leakage of the coal mine on the basis of the predicted distribution of fissures. Conclusions: The prediction results indicate that: (1) the water-conducted fissures are strongly developed and go through the Neogene aquifuge in some regions; thus, they may lead to potential perviousness of the coal mine; (2) part of these water-conducted fissures connect with the ground fissures, and this behavior may cause the risk of air leakage.

摘要: 引言:使用放顶煤开采浅埋深的厚煤层很可能对稳定性产生强烈的负面影响。案例描述:安家岭一号地下矿山位于中国山西省朔州市。该煤矿的4#煤层是较厚的煤层,埋深较浅,其平均厚度为12 m,平均深度为180 m。本文的重点是预测由4#煤层的开采引起的地裂缝和导水裂缝的分布。讨论和评估:我们首先创建3D计算模型,然后使用FLAC3D软件来模拟煤层的开采。然后,我们计算地表和地层的位移和拉伸应变,并预测地裂缝和导水裂缝的分布。最后,在预测的裂隙分布的基础上,我们进一步分析了煤矿透水和漏气的可能性。结论:预测结果表明:(1)导水裂缝强烈发育并在某些区域穿过新近纪隔水层;因此,它可能导致煤矿潜在的透水; (2)这些导水裂缝的一部分与地面裂缝相连;并且此行为可能会导致漏气的风险。

下载地址 | 返回目录 | [10.1186/s40064-016-2609-3]

[107] Predicting unusual energy consumption events from smart home sensor network by data stream mining with misclassified recall (2018)

Abstract: With the popularity and affordability of ZigBee wireless sensor technology, IoT-based smart controlling systems for home appliances have become prevalent in smart home applications. From the data analytics point of view, one important objective of analyzing such IoT data is to gain insights from the energy consumption patterns, thereby trying to fine-tune the energy efficiency of the appliance usage. The data analytics usually functions at the back end, crunching over a large archive of big data accumulated over time to learn the overall pattern from the sensor data feeds. The other objective of the analytics, which may often be more crucial, is to predict and identify whether an abnormal consumption event is about to happen, for example, a sudden draw of energy that leads to a hot spot in a city's power grid or a blackout at home. This dynamic prediction is usually done at the operational level, on a moving data stream, by data stream mining methods. In this paper, an improved version of the very fast decision tree (VFDT) is proposed, which learns from misclassified results in order to filter noisy data out of learning and maintain sharp classification accuracy of the induced prediction model. Specifically, a new technique called misclassified recall (MR), which is a pre-processing step for self-rectifying misclassified instances, is formulated. In energy data prediction, most misclassified instances are due to data transmission errors or faulty devices. The former case happens intermittently, and the errors from the latter cause may persist for a long time. By caching the data at the MR pre-processor, the one-pass online model learning can be effectively shielded in case of intermittent problems in the wireless sensor network; likewise, the stored data can be investigated afterwards should the problem persist for long. Simulation experiments are conducted over a dataset about predicting exceptional appliance energy use in a low-energy building. The reported results validate the efficacy of the new methodology, VFDT + MR, in comparison to a collection of popular data stream mining algorithms from the literature.

摘要: 随着ZigBee无线传感器技术的普及和负担得起,基于IoT的家用电器智能控制系统在智能家居应用中变得越来越普遍。从数据分析的角度来看,分析此类物联网数据的一个重要目标是从能源消耗模式中获得见解,从而尝试微调设备使用的能源效率。数据分析通常在后端处理大量随时间累积的大数据档案,以从传感器数据馈送中了解整体模式。分析的另一个目标(通常可能更为关键)是预测和识别异常消费事件是否即将发生。例如,突然的能量消耗会导致城市电网中出现热点,或者导致家庭停电。动态预测通常是通过数据流挖掘方法在移动数据流的操作级别上完成的。本文提出了一种改进的超快速决策树(VFDT),它可以从错误分类的结果中学习,从而过滤掉学习中的噪声数据并保持诱导预测模型的清晰分类精度。具体来说,制定了一种称为误分类召回(MR)的新技术,该技术是自我纠正误分类实例的预处理步骤。在能源数据预测中,大多数错误分类的实例是由于数据传输错误或设备故障造成的。前一种情况是间歇性发生的,后一种原因的错误可能会持续很长时间。通过在MR预处理器中缓存数据,可以在无线传感器网络出现间歇性问题的情况下有效地屏蔽一次在线模型学习。同样,如果问题持续很长时间,则可以在以后检查存储的数据。在数据集上进行了有关预测低能耗建筑物中异常电器能耗的模拟实验。与从文献中收集的流行数据流挖掘算法相比,报告的结果验证了新方法VFDT + MR的有效性。

下载地址 | 返回目录 | [10.1007/s12652-018-0685-7]

[108] Proceedings of 2016 SAI Intelligent Systems Conference (IntelliSys) (2016)

Abstract: Process mining techniques focus on extracting insight in processes from event logs. In many cases, events recorded in the event log are too fine-grained, causing process discovery algorithms to discover incomprehensible process models or process models that are not representative of the event log. We show that when process discovery algorithms are only able to discover an unrepresentative process model from a low-level event log, structure in the process can in some cases still be discovered by first abstracting the event log to a higher level of granularity. This gives rise to the challenge to bridge the gap between an original low-level event log and a desired high-level perspective on this log, such that a more structured or more comprehensible process model can be discovered. We show that supervised learning can be leveraged for the event abstraction task when annotations with high-level interpretations of the low-level events are available for a subset of the sequences (i.e., traces). We present a method to generate feature vector representations of events based on XES extensions, and describe an approach to abstract events in an event log with Conditional Random Fields using these event features. Furthermore, we propose a sequence-focused metric to evaluate supervised event abstraction results that fits closely to the tasks of process discovery and conformance checking. We conclude this paper by demonstrating the usefulness of supervised event abstraction for obtaining more structured and/or more comprehensible process models using both real life event data and synthetic event data.

摘要: 过程挖掘技术着重于从事件日志中提取过程中的可见性。在许多情况下,事件日志中记录的事件的粒度太细,导致流程发现算法发现无法理解的流程模型或不代表事件日志的过程模型。我们表明,当流程发现算法仅能从低级事件日志中发现不具代表性的流程模型时,在某些情况下,仍然可以通过首先将事件日志抽象到更高的粒度级别来发现流程中的结构。这就提出了挑战,要求弥合原始低级事件日志和该日志上所需的高级视角之间的差距,从而可以发现更结构化或更易于理解的过程模型。我们显示,当具有低层事件的高级解释的注释可用于序列的子集(即轨迹)时,监督学习可用于事件抽象任务。我们提出了一种基于XES扩展生成事件的特征矢量表示的方法,并描述了使用这些事件特征使用条件随机字段在事件日志中抽象事件的方法。此外,我们提出了一种以序列为中心的度量来评估受监督事件抽象结果,该结果非常适合于过程发现和一致性检查的任务。我们通过证明监督事件抽象在使用现实事件数据和综合事件数据获得更结构化和/或更易理解的过程模型方面的有用性来结束本文。

下载地址 | 返回目录 | [10.1007/978-3-319-56994-9]

[109] Proceedings of International Conference on Recent Advancement on Computer and Communication (2018)

Abstract: A patterning method for the generation of epitaxial CoSi2 nanostructures was developed based on anisotropic diffusion of Co/Si atoms in a stress field during rapid thermal oxidation (RTO). The stress field is generated along the edge of a mask consisting of a thin SiO2 layer and a Si3N4 layer. During RTO of the masked silicide structure, a well-defined separation of the silicide layer forms along the edge of the mask. The technique was used to make 50-nm channel-length metal-oxide-semiconductor field-effect transistors (MOSFETs). These highly uniform gaps define the channel region of the fabricated device. Two types of MOSFETs have been fabricated: symmetric transistor structures, using the separated silicide layers as Schottky source and drain, and asymmetric transistors, with n+ source and Schottky drain. The asymmetric transistors were fabricated by an ion implantation into the unprotected CoSi2 layer and a subsequent out-diffusion to form the n+ source. The detailed fabrication process as well as the I-V characteristics of both the symmetric and asymmetric transistor structures will be presented. © 2004 American Institute of Physics.

摘要: 基于快速热氧化(RTO)过程中Co/Si原子在应力场中的各向异性扩散,开发了一种用于生成外延CoSi2纳米结构的图案化方法。沿由薄SiO2层和Si3N4层组成的掩模的边缘产生应力场。在被掩盖的硅化物结构的RTO期间,沿着掩膜的边缘形成了明确定义的硅化物层分离。该技术用于制造50 nm沟道长度的金属氧化物半导体场效应晶体管(MOSFET)。这些高度均匀的间隙限定了所制造的器件的沟道区域。已经制造出两种类型的MOSFET:使用分离的硅化物层作为肖特基源极和漏极的对称晶体管结构,以及具有n+源极和肖特基漏极的非对称晶体管。通过将离子注入到未保护的CoSi2层中并随后向外扩散以形成n+源,来制造不对称晶体管。将介绍对称和非对称晶体管结构的详细制造过程以及I-V特性。 © 2004 American Institute of Physics.

下载地址 | 返回目录 | [10.1007/978-981-10-8198-9]

[110] Product Lifecycle Management for Digital Transformation of Industries (2016)

Abstract: In this review article, some key challenges in drug delivery are first introduced and methods that have been applied in attempts to solve them enumerated. Particularly intractable problems are highlighted: these include issues of solubility, targeting and drug degradation. The technique of electrospinning is subsequently introduced, and the influence of processing parameters on the fibers produced discussed. The potential of electrospun nanofibers in drug delivery is then explored, with examples given from the recent literature to illustrate how fibers can be used to overcome hurdles in drug solubility, degradation and targeting. Future perspectives and challenges are also considered.

摘要: 在这篇评论文章中,首先介绍了药物输送中的一些关键挑战,并列举了试图解决这些挑战的方法。突出了特别棘手的问题:这些问题包括溶解性,靶向性和药物降解问题。随后介绍了电纺丝技术,并讨论了加工参数对生产的纤维的影响。然后,研究了电纺纳米纤维在药物递送中的潜力,并从最近的文献中给出了一些实例,以说明如何使用纤维克服药物溶解性,降解和靶向方面的障碍。还考虑了未来的观点和挑战。

下载地址 | 返回目录 | [10.1007/978-3-319-54660-5]

[111] Quaternion Watershed Transform (2019)

Abstract: Blockchain is a distributed network based ledger that is secured by the methods of cryptographic proof. It enables the creation of self-executable digital contracts i.e. smart contracts. This technology is working in collaboration with major areas of research including governance, IoT, health, banking and education. It has anticipated revolutionary ways, which helps us to overcome the problems of governance such as human error, voting, privacy of data, security and food safety. In governance, there is a need to ameliorate the services and facilities with the assistance of blockchain technology. This paper aims to explore the issues of governance which can be resolved with the assistance of Blockchain features. Furthermore this paper also provides the future work directions. © 2018 The Science and Information (SAI) Organization Limited.

摘要: Blockchain是一个基于分布式网络的分类帐,通过加密证明的方法加以保护。它可以创建可自我执行的数字合同,即智能合同。该技术正在与主要研究领域(包括治理,物联网,健康,银行和教育)合作。它预见了革命性的方式,可以帮助我们克服人为错误,投票,数据隐私,安全和食品安全等治理问题。在治理中,需要借助区块链技术来改善服务和设施。本文旨在探讨可以借助区块链功能解决的治理问题。此外,本文还提供了未来的工作方向。 © 2018 The Science and Information (SAI) Organization Limited.

DOI: 10.1007/978-3-030-14802-7

[112] Queue mining for delay prediction in multi-class service processes (2015)

Abstract: Information systems have been widely adopted to support service processes in various domains, e.g., in the telecommunication, finance, and health sectors. Information recorded by systems during the operation of these processes provides an angle for operational process analysis, commonly referred to as process mining. In this work, we establish a queueing perspective in process mining to address the online delay prediction problem, which refers to the time that the execution of an activity for a running instance of a service process is delayed due to queueing effects. We present predictors that treat queues as first-class citizens and either enhance existing regression-based techniques for process mining or are directly grounded in queueing theory. In particular, our predictors target multi-class service processes, in which requests are classified by a type that influences their processing. Further, we introduce queue mining techniques that derive the predictors from event logs recorded by an information system during process execution. Our evaluation based on large real-world datasets, from the telecommunications and financial sectors, shows that our techniques yield accurate online predictions of case delay and drastically improve over predictors neglecting the queueing perspective.

摘要: 信息系统已被广泛采用,以支持电信,金融和卫生部门等各个领域的服务流程。系统在这些过程的操作过程中记录的信息为操作过程分析(通常称为过程挖掘)提供了一个角度。在这项工作中,我们建立了进程挖掘中的排队透视图,以解决在线延迟预测问题,该问题指的是由于排队效应而导致服务进程的运行实例的活动执行延迟的时间。我们提供了将队列视为一流公民的预测变量,或者增强了现有的基于回归的过程挖掘技术,或者直接基于排队论。特别是,我们的预测变量以多类服务流程为目标,其中按影响其处理的类型对请求进行分类。此外,我们引入了队列挖掘技术,该技术从信息系统在流程执行期间记录的事件日志中导出预测变量。我们基于来自电信和金融领域的大型现实数据集的评估表明,我们的技术可以提供准确的在线案例延迟预测,并且相对于忽略排队论的预测指标,它们的性能得到了显着提高。

DOI: 10.1016/j.is.2015.03.010
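
The queueing-theory side of the abstract for [112] can be illustrated with a queue-length-based estimate: a new arrival that finds q requests waiting in front of n servers, each completing μ requests per time unit, waits roughly (q + 1) / (nμ). The sketch below is a minimal illustration under that assumption and is not claimed to be the predictors evaluated in the paper. It ignores request classes, and the names QueueSnapshot, estimate_service_rate and queue_length_delay are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class QueueSnapshot:
    """State of one queue at prediction time (all fields are illustrative)."""
    waiting: int         # requests already waiting ahead of the new arrival
    servers: int         # number of parallel servers working on this queue
    service_rate: float  # completions per server per time unit


def estimate_service_rate(service_intervals: Iterable[Tuple[float, float]]) -> float:
    """Estimate the per-server service rate from (start, end) pairs in an event log."""
    durations = [end - start for start, end in service_intervals]
    if not durations or sum(durations) <= 0:
        raise ValueError("need at least one service interval with positive total duration")
    # Rate is the reciprocal of the mean observed service duration.
    return len(durations) / sum(durations)


def queue_length_delay(snapshot: QueueSnapshot) -> float:
    """Expected delay of a new arrival: (q + 1) / (n * mu)."""
    if snapshot.servers <= 0 or snapshot.service_rate <= 0:
        raise ValueError("servers and service_rate must be positive")
    return (snapshot.waiting + 1) / (snapshot.servers * snapshot.service_rate)


# Toy usage with made-up log entries (start, end), in minutes.
rate = estimate_service_rate([(0.0, 2.0), (1.0, 3.0), (4.0, 6.0)])
delay = queue_length_delay(QueueSnapshot(waiting=4, servers=5, service_rate=rate))
```

A multi-class variant would keep one snapshot per request type and account for priorities between classes; that extension is left out here.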

[113] Study of a roof water inrush prediction model in shallow seam mining based on an analytic hierarchy process using a grey relational analysis method (2018)

Abstract: Mining-induced water inrush is a sudden and destructive underground disaster caused by a mining disturbance. This disaster occurs frequently in the northern region of Shaanxi province in China due to overburden fractures in shallow seam mining, which pose a great threat to residents' safety. It is therefore essential to construct an accurate prediction model. This study first applies selection hierarchy analysis to the main controlling factors of roof water inrush to study their weights using an analytic hierarchy process (AHP) including five factors: surface water catchment features, wateriness of the aquifer, water-resistant characteristics of aquiclude, combined influence of overburden, and mining disturbance characteristics. The grey relational analysis (GRA) method is used to calculate the correlation degree of each water inrush. The AHP-GRA method presents a comprehensive evaluative model combining the advantages of both approaches to analyze mining safety. Qualitative and quantitative indicators of the roof water inrush prediction model in shallow seam mining are established. Secondly, risk prediction of roof water inrush points and comprehensive water inrush is determined using engineering examples from the Hanjiawan coal mine. Results indicate that during safety mining, water inflow data are consistent with our prediction, thereby substantiating the model's accuracy and providing a new method for predicting roof water inrush in shallow seam mining.

摘要: 采矿引起的突水是由采矿扰动引起的突发性破坏性地下灾害。由于浅埋煤层开采中覆岩产生裂隙,该灾害在中国陕西省北部地区频繁发生,对居民安全构成巨大威胁。因此,构建准确的预测模型至关重要。本研究首先对顶板突水的主要控制因素进行层次选择分析,并使用层次分析法(AHP)研究其权重,其中包括五个因素:地表水汇水特征,含水层富水性,隔水层隔水特性,覆岩综合影响和采动扰动特征。采用灰色关联分析(GRA)方法计算各突水点的关联度。AHP-GRA方法结合两种方法的优势,提供了一种用于分析采矿安全的综合评价模型。建立了浅埋煤层开采顶板突水预测模型的定性和定量指标。其次,利用Hanjiawan煤矿的工程实例,确定了顶板突水点和综合突水的风险预测。结果表明,在安全开采过程中,涌水量数据与预测一致,从而证实了该模型的准确性,并为浅埋煤层开采顶板突水预测提供了一种新方法。

DOI: 10.1007/s12517-018-3498-2
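
The AHP-GRA combination summarized for [113] can be sketched generically: AHP turns a pairwise comparison matrix into factor weights, and grey relational analysis scores each site by how closely its normalized factor values track a reference sequence. The code below is an illustration only. It uses the row geometric mean approximation for the AHP weights and the usual grey relational coefficient with distinguishing coefficient rho = 0.5, both of which are assumptions rather than details taken from the paper, and all function names and input values are hypothetical.

```python
import math
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priority weights via the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def grey_relational_grades(reference: Sequence[float],
                           candidates: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           rho: float = 0.5) -> List[float]:
    """Weighted grey relational grade of each candidate against the reference.

    Inputs are assumed to be normalized to a common scale already.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Absolute deviation of every candidate value from the reference value.
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


# Five factors, in the order listed in the abstract: catchment, aquifer wateriness,
# aquiclude resistance, overburden influence, mining disturbance.
# Equal importance is assumed here purely for illustration.
weights = ahp_weights([[1.0] * 5 for _ in range(5)])
grades = grey_relational_grades(
    reference=[1.0, 1.0, 1.0, 1.0, 1.0],          # hypothetical worst-case profile
    candidates=[[1.0, 1.0, 1.0, 1.0, 1.0],        # made-up site A
                [0.0, 0.0, 0.0, 0.0, 0.0]],       # made-up site B
    weights=weights,
)
```

A higher grade means the site's factor profile is closer to the reference, so with a worst-case reference it indicates higher inrush risk.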

[114] Time prediction based on process mining (2011)

Abstract: Process mining allows for the automated discovery of process models from event logs. These models provide insights and enable various types of model-based analysis. This paper demonstrates that the discovered process models can be extended with information to predict the completion time of running instances. There are many scenarios where it is useful to have reliable time predictions. For example, when a customer phones her insurance company for information about her insurance claim, she can be given an estimate for the remaining processing time. In order to do this, we provide a configurable approach to construct a process model, augment this model with time information learned from earlier instances, and use this to predict e.g., the completion time. To provide meaningful time predictions we use a configurable set of abstractions that allow for a good balance between “overfitting” and “underfitting”. The approach has been implemented in ProM and through several experiments using real-life event logs we demonstrate its applicability. © 2010 Elsevier B.V. All rights reserved.

摘要: 过程挖掘允许从事件日志中自动发现过程模型。这些模型提供了见解,并可以进行各种类型的基于模型的分析。本文演示了可以使用信息扩展发现的流程模型,以预测正在运行的实例的完成时间。在许多情况下,进行可靠的时间预测很有用。例如,当客户打电话给她的保险公司以获取有关其保险索赔的信息时,可以为她提供剩余处理时间的估计。为了做到这一点,我们提供了一种可配置的方法来构建过程模型,并使用从较早实例中获悉的时间信息来扩展该模型,并使用其来预测例如完成时间。为了提供有意义的时间预测,我们使用一组可配置的抽象,以便在“过度拟合”和“欠拟合”之间取得良好的平衡。该方法已在ProM中实施,并且通过使用现实事件日志的多次实验,我们证明了其适用性。 © 2010 Elsevier B.V. 保留所有权利。

DOI: 10.1016/j.is.2010.09.001
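
The idea summarized for [114], a process model whose states are annotated with time information from earlier instances, can be sketched as follows. Each prefix of a historical trace is mapped to an abstract state, the remaining time observed from that point is recorded, and a running instance is predicted by the mean remaining time of its current state. This is a minimal illustration and not the ProM implementation. The two abstraction options shown (a horizon over the last k activities, and sequence versus set) are assumptions about what "configurable abstractions" could look like, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

Trace = Sequence[Tuple[str, float]]  # ordered (activity, timestamp) pairs


def abstract_state(activities: Sequence[str], horizon: Optional[int] = None,
                   as_set: bool = False) -> Hashable:
    """Map a trace prefix to a state: last `horizon` activities, as sequence or set."""
    if horizon is not None and horizon < 1:
        raise ValueError("horizon must be None or a positive integer")
    recent: List[str] = list(activities) if horizon is None else list(activities)[-horizon:]
    return frozenset(recent) if as_set else tuple(recent)


def learn_remaining_times(traces: Iterable[Trace], horizon: Optional[int] = None,
                          as_set: bool = False) -> Dict[Hashable, float]:
    """Annotate each abstract state with the mean remaining time seen in history."""
    samples = defaultdict(list)
    for trace in traces:
        if not trace:
            continue
        end_time = trace[-1][1]
        activities = [activity for activity, _ in trace]
        for i, (_, timestamp) in enumerate(trace):
            state = abstract_state(activities[: i + 1], horizon, as_set)
            samples[state].append(end_time - timestamp)
    return {state: mean(values) for state, values in samples.items()}


def predict_remaining(model: Dict[Hashable, float], running: Sequence[str],
                      horizon: Optional[int] = None,
                      as_set: bool = False) -> Optional[float]:
    """Return the learned mean remaining time, or None for an unseen state."""
    return model.get(abstract_state(running, horizon, as_set))


# Toy history with made-up timestamps (hours since case start).
history = [
    [("A", 0.0), ("B", 4.0), ("C", 10.0)],
    [("A", 0.0), ("B", 2.0), ("C", 6.0)],
]
model = learn_remaining_times(history, horizon=1)
estimate = predict_remaining(model, ["A", "B"], horizon=1)
```

A short horizon or the set abstraction merges many prefixes into few states and risks underfitting, while the full sequence abstraction gives one state per distinct prefix and risks overfitting, which is the balance the abstract refers to.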

[115] Unbiased data mining identifies cell cycle transcripts that predict non-indolent Gleason score 7 prostate cancer (2019)

Abstract: Background: Patients with newly diagnosed non-metastatic prostate adenocarcinoma are typically classified as at low, intermediate, or high risk of disease progression using blood prostate-specific antigen concentration, tumour T category, and tumour pathological Gleason score. Classification is used to both predict clinical outcome and to inform initial management. However, significant heterogeneity is observed in outcome, particularly within the intermediate risk group, and there is an urgent need for additional markers to more accurately hone risk prediction. Recently developed web-based visualization and analysis tools have facilitated rapid interrogation of large transcriptome datasets, and querying broadly across multiple large datasets should identify predictors that are widely applicable. Methods: We used camcAPP, cBioPortal, CRN, and NIH NCI GDC Data Portal to data mine publicly available large prostate cancer datasets. A test set of biomarkers was developed by identifying transcripts that had: 1) altered abundance in prostate cancer, 2) altered expression in patients with Gleason score 7 tumours and biochemical recurrence, 3) correlation of expression with time until biochemical recurrence across three datasets (Cambridge, Stockholm, MSKCC). Transcripts that met these criteria were then examined in a validation dataset (TCGA-PRAD) using univariate and multivariable models to predict biochemical recurrence in patients with Gleason score 7 tumours. Results: Twenty transcripts met the test criteria, and 12 were validated in TCGA-PRAD Gleason score 7 patients. Ten of these transcripts remained prognostic in Gleason score 3 + 4 = 7, a sub-group of Gleason score 7 patients typically considered at a lower risk for poor outcome and often not targeted for aggressive management. All transcripts positively associated with recurrence encode or regulate mitosis and cell cycle-related proteins. The top performer was BUB1, one of four key MIR145-3P microRNA targets upregulated in hormone-sensitive as well as castration-resistant PCa. SRD5A2 converts testosterone to its more active form and was negatively associated with biochemical recurrence. Conclusions: Unbiased mining of large patient datasets identified 12 transcripts that independently predicted disease recurrence risk in Gleason score 7 prostate cancer. The mitosis and cell cycle proteins identified are also implicated in progression to castration-resistant prostate cancer, revealing a pivotal role for loss of cell cycle control in the latter.

摘要: 背景:使用血液前列腺特异性抗原浓度,肿瘤T类别和肿瘤病理学Gleason评分,通常将患有新诊断的非转移性前列腺腺癌的患者分为疾病进展的低,中或高风险。分类既可用于预测临床结果,又可用于指导初始治疗。但是,在结果中观察到明显的异质性,尤其是在中等风险组中,并且迫切需要其他标记来更准确地进行风险预测。最近开发的基于Web的可视化和分析工具促进了对大型转录组数据集的快速查询,并且在多个大型数据集之间进行广泛的查询应能确定可广泛应用的预测因子。方法:我们使用camcAPP,cBioPortal,CRN和NIH NCI GDC Data Portal对可公开获得的大型前列腺癌数据集进行数据挖掘。通过识别满足以下条件的转录本开发了一组测试生物标志物:1)在前列腺癌中丰度发生改变,2)在Gleason评分为7的肿瘤且发生生化复发的患者中表达发生改变,3)在三个数据集(Cambridge,Stockholm,MSKCC)中表达与距生化复发的时间相关。然后使用单变量和多变量模型在验证数据集(TCGA-PRAD)中检查符合这些标准的转录本,以预测Gleason评分为7的肿瘤患者的生化复发。结果:20个转录本符合测试标准,其中12个在TCGA-PRAD的Gleason评分7患者中得到验证。这些转录本中有10个在Gleason评分3 + 4 = 7中仍具有预后意义,该亚组属于Gleason评分7患者,通常被认为不良预后风险较低,并且通常不作为积极治疗的对象。与复发呈正相关的所有转录本均编码或调节有丝分裂和细胞周期相关蛋白。表现最佳的是BUB1,它是在激素敏感性和去势抵抗性PCa中均上调的四个关键MIR145-3P microRNA靶标之一。SRD5A2将睾酮转化为其更活跃的形式,并且与生化复发呈负相关。结论:对大型患者数据集的无偏倚挖掘确定了12个转录本,这些转录本可独立预测Gleason评分7前列腺癌的疾病复发风险。鉴定出的有丝分裂和细胞周期蛋白也与进展为去势抵抗性前列腺癌有关,揭示了细胞周期控制丧失在后者中的关键作用。

DOI: 10.1186/s12894-018-0433-5

[116] Use of data mining to predict significant factors and benefits of bilateral cochlear implantation (2015)

Abstract: Data mining (DM) is a technique used to discover pattern and knowledge from a big amount of data. It uses artificial intelligence, automatic learning, statistics, databases, etc. In this study, DM was successfully used as a predictive tool to assess disyllabic speech test performance in bilateral implanted patients with a success rate above 90%. 60 bilateral sequentially implanted adult patients were included in the study. The DM algorithms developed found correlations between unilateral medical records and Audiological test results and bilateral performance by establishing relevant variables based on two DM techniques: the classifier and the estimation. The nearest neighbor algorithm was implemented in the first case, and the linear regression in the second. The results showed that patients with unilateral disyllabic test results below 70% benefited the most from a bilateral implantation. Finally, it was observed that its benefits decrease as the inter-implant time increases.

摘要: 数据挖掘(DM)是一种用于从大量数据中发现模式和知识的技术。它使用人工智能,自动学习,统计数据,数据库等。在这项研究中,DM成功地用作评估双侧植入患者双音节语音测试性能的预测工具,成功率高于90%。该研究包括60例依次植入的双侧成年患者。通过基于两种DM技术(分类器和估计)建立相关变量,开发出的DM算法发现了单方面病历与听觉测试结果与双边性能之间的相关性。在第一种情况下实现了最近邻居算法,在第二种情况下实现了线性回归。结果表明,单侧双音节测试结果低于70%的患者从双边植入中受益最大。最后,观察到随着植入时间的延长,其益处降低。

DOI: 10.1007/s00405-014-3337-3

[117] Workload Prediction of Business Processes – An Approach Based on Process Mining and Recurrent Neural Networks (2020)

Abstract: Recent advances in the interconnectedness and digitization of industrial machines, known as Industry 4.0, pave the way for new analytical techniques. Indeed, the availability and the richness of production-related data enables new data-driven methods. In this paper, we propose a process mining approach augmented with artificial intelligence that (1) reconstructs the historical workload of a company and (2) predicts the workload using neural networks. Our method relies on logs, representing the history of business processes related to manufacturing. These logs are used to quantify the supply and demand and are fed into a recurrent neural network model to predict customer orders. The corresponding activities to fulfill these orders are then sampled from history with a replay mechanism, based on criteria such as trace frequency and activities similarity. An evaluation and illustration of the method is performed on the administrative processes of Heraeus Materials SA. The workload prediction on a one-year test set achieves an MAPE score of 19% for a one-week forecast. The case study suggests a reasonable accuracy and confirms that a good understanding of the historical workload combined to articulated predictions are of great help for supporting management decisions and can decrease costs with better resources planning on a medium-term level.

摘要: 工业机器的互连性和数字化的最新进展,即工业4.0,为新的分析技术铺平了道路。实际上,与生产相关的数据的可用性和丰富性启用了新的数据驱动方法。在本文中,我们提出了一种增强了人工智能的过程挖掘方法,该方法(1)重构公司的历史工作量,(2)使用神经网络预测工作量。我们的方法依赖于日志,该日志表示与制造相关的业务流程的历史记录。这些日志用于量化供需,并被输入到递归神经网络模型中以预测客户订单。然后,根据诸如跟踪频率和活动相似性之类的标准,使用重播机制从历史记录中采样满足这些订单的相应活动。在Heraeus Materials SA的管理过程中对该方法进行了评估和说明。一年测试集上的工作负载预测在一周预测中的MAPE得分达到19 %。案例研究提出了合理的准确性,并证实了对历史工作量与明确的预测相结合的良好理解对于支持管理决策有很大帮助,并且可以通过中期计划更好的资源计划来降低成本。
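
The MAPE score quoted for [117] is the mean absolute percentage error, that is, the average of |actual - forecast| / |actual| over the test points, expressed as a percentage. The sketch below shows that arithmetic on made-up numbers. The function name and the sample values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Made-up weekly workload figures, purely for illustration.
score = mape(actual=[100.0, 200.0], forecast=[90.0, 220.0])
```

Because each error is divided by the actual value, MAPE is undefined for weeks with zero actual workload and weights errors on small actual values heavily.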



The Chinese abstracts (摘要) are provided for reference only.