Day 1 :
Indiana University, USA
Time : 10:15-11:00
Thomas Sterling is a Professor of Intelligent Systems Engineering at the Indiana University School of Informatics and Computing. He serves as the Chief Scientist and Associate Director of the Center for Research in Extreme Scale Technologies (CREST). Since receiving his PhD from MIT in 1984 as a Hertz Fellow, he has been engaged in research associated with parallel computing system structures and semantics. He is the co-author of 6 books, holds 6 patents, and was the recipient of the 2013 Vanguard Award.
Data analytics in its many forms has rapidly expanded to engage scientific, industrial, and societal application domains, and as more problem spaces yield to this expanding genre of computing, the demand for capability grows with them. Simultaneously, high performance computing (HPC) systems and methods are experiencing significant change in form and function with the asymptotic convergence toward nano-scale semiconductor feature sizes and the end of Moore’s law, even as exascale performance is anticipated in the early years of the next decade. Historically these two processing domains have been largely independent, but a growing consensus is now driving them together, aligning their respective modalities and catalyzing a synergistic convergence. A major premise of the US Presidential Executive Order leading to the National Strategic Computing Initiative stipulates that the merger of big data and numerically intensive computing be a constituent of the national exascale charter. This presentation will describe the significant shift in system architecture and operational methodologies that will be required to respond simultaneously to the challenges of the end of Moore’s law and to graph processing approaches, potentially dynamic, that will augment the more conventional matrix-vector oriented computation. It will discuss the likely importance of dynamic adaptive resource management and task scheduling, essential to dramatic improvements in scalability and efficiency for exascale computing, and how these changes will be applied to knowledge discovery.
University of Derby, UK
Time : 11:20-12:05
Fionn Murtagh is Professor of Data Science; previously he worked on Big Data in Education, Astrophysics and Cosmology. He was the Director of national research funding across many domains, including Computing & Engineering, Energy, Nanotechnology and Photonics. He has been Professor of Computer Science, including Head of Department and Head of School, at many universities. He was the Editor-in-Chief of the Computer Journal for more than 10 years, and is a member of the editorial boards of many other journals.
Geometric data analysis allows for “letting the data speak” and integrates qualitative and quantitative analytics. Its scope and potential are considerable in many fields. The case studies here are large-scale social media analytics, related to an area of social practice and an area of health and well-being. The interesting survey of Keiding and Louis, “Perils and potentials of self-selected entry to epidemiological studies and surveys”, points to very interesting issues in big data analytics; my contribution is in the discussion part of this paper. Through the geometry and topology of data and information, with inclusion of context, of chronology and of frame-models, we address such issues of sampling and representativity. The case studies to be discussed in this presentation are related to mental health and to social entertainment events and contexts, in the latter case with many millions of Twitter tweets in many languages. Particular consideration is given to the use and implementation of our analytical perspectives. This includes determining the information content of our data clouds, mapping onto Euclidean-distance endowed semantic factor spaces, and the ultrametric or hierarchical topology that is characteristic of all forms of complex systems.
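The ultrametric topology mentioned in this abstract can be made concrete with a small sketch. The points, values and single-linkage choice below are illustrative assumptions, not the speaker's data: in an agglomerative hierarchy, the merge height at which two points first join a common cluster defines a cophenetic distance, and those distances satisfy the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z)).

```python
# Toy sketch (illustrative data): single-linkage agglomeration on a small
# 1-D point set, recording the merge height at which each pair first joins
# a common cluster. Those heights form an ultrametric.
from itertools import combinations

points = {"a": 0.0, "b": 0.2, "c": 1.0, "d": 1.1}

def single_linkage_ultrametric(pts):
    clusters = [{k} for k in pts]
    coph = {}                         # pair -> cophenetic (merge-height) distance
    while len(clusters) > 1:
        # find the closest pair of clusters (single linkage)
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(abs(pts[x] - pts[y]) for x in clusters[i] for y in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        for x in clusters[i]:
            for y in clusters[j]:
                coph[frozenset((x, y))] = d   # merge height = ultrametric distance
        clusters[i] |= clusters[j]
        del clusters[j]
    return coph

coph = single_linkage_ultrametric(points)
# the strong triangle inequality holds for every triple
for x, y, z in combinations(points, 3):
    assert coph[frozenset((x, z))] <= max(coph[frozenset((x, y))],
                                          coph[frozenset((y, z))])
```

The dictionary `coph` is the hierarchical (ultrametric) geometry that the talk contrasts with the Euclidean factor-space view.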
InfoCodex AG-Semantic Technologies, Switzerland
Time : 12:05-12:50
Carlo A Trugenberger earned his PhD in Theoretical Physics in 1988 at the Swiss Federal Institute of Technology, Zürich, and his Master’s in Economics in 1997 from Bocconi University, Milan. An international academic career in theoretical physics (MIT, Los Alamos National Laboratory, CERN Geneva, Max Planck Institute Munich) led him to the position of Associate Professor of Theoretical Physics at Geneva University. In 2001, he decided to quit academia and exploit his expertise in information theory, neural networks and machine intelligence to design an innovative semantic technology, co-founding the company InfoCodex AG-Semantic Technologies, Switzerland.
The majority of big data is unstructured, and of this majority the largest chunk is text. While data mining techniques are well developed and standardized for structured, numerical data, the realm of unstructured data is still largely unexplored. The general focus lies on information extraction, which attempts to retrieve known information from text. The Holy Grail, however, is knowledge discovery, where machines are expected to unearth entirely new facts and relations that were not previously known by any human expert. Indeed, understanding the meaning of text is often considered one of the main characteristics of human intelligence. The ultimate goal of semantic artificial intelligence is to devise software that can understand the meaning of free text, at least in the practical sense of providing new, actionable information condensed out of a body of documents. As a stepping stone on the road to this vision, I will introduce a totally new approach to drug research, namely that of identifying relevant information by employing a self-organizing semantic engine to text-mine large repositories of biomedical research papers, a technique pioneered by Merck with the InfoCodex software. I will describe the methodology and a first successful experiment for the discovery of new biomarkers and phenotypes for diabetes and obesity on the basis of PubMed abstracts, public clinical trials and Merck internal documents. The reported approach shows much promise and has the potential to fundamentally impact pharmaceutical research, as a way to shorten time-to-market of novel drugs and to recognize dead ends early.
- Data Mining Applications in Science, Engineering, Healthcare and Medicine | Artificial Intelligence | Optimization and Big Data | Data Mining analysis | Business analytics | OLAP technologies | Big Data algorithm | ETL (Extract, Transform and Load) | New visualization techniques
Yedidi Narasimha Murty
Electronic Arts, USA
Cleveland State University, USA
Time : 13:40-14:10
Iftikhar U Sikder is an Associate Professor jointly appointed in the Department of Information Science and the Department of Electrical Engineering and Computer Science at Cleveland State University, USA. His research interests include soft computing, granular computing, data mining and collaborative decision support systems. His papers have appeared in the Journal of Risk Analysis, Expert Systems with Applications, International Journal of Mobile Communications, Information Resources Management Journal, International Journal of Management & Decision Making, and International Journal of Aerospace Survey and Earth Sciences. He has authored many book chapters and presented papers at many national and international conferences.
This paper presents a novel approach to spatial classification and prediction of land cover classes using rough set and evidence theory. In particular, it presents an approach to characterizing uncertainty in multisource supervised classification problems. The evidential structure of spatial classification is founded on the notion of equivalence relations in rough sets. It allows expressing spatial concepts in terms of an approximation space wherein a decision class can be approximated through the partition of the boundary region. A key advantage of this approach is that it allows incorporating the context of spatial neighborhood interactions and the combination of multiple pieces of spatial evidence. The empirical results demonstrate that the model significantly improves classification accuracy. A comparison with a radial basis function-based artificial neural network shows that the predictive performance of the proposed model is significantly better than that of the neural network model.
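The evidence-combination step described here rests on Dempster's rule from Dempster-Shafer theory. As a minimal sketch (the hypotheses, sources and mass values below are invented for illustration and are not from the paper), two mass functions over land-cover hypotheses can be fused as follows:

```python
# Dempster's rule of combination (illustrative sketch, invented masses).
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b                 # intersection of hypothesis sets
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2       # mass assigned to contradictory pairs
    k = 1.0 - conflict                    # normalize away the conflict
    return {h: w / k for h, w in combined.items()}

# two hypothetical evidence sources' beliefs over {forest, urban}
m_spectral = {frozenset({"forest"}): 0.6,
              frozenset({"forest", "urban"}): 0.4}
m_context  = {frozenset({"forest"}): 0.5,
              frozenset({"urban"}): 0.3,
              frozenset({"forest", "urban"}): 0.2}
fused = dempster_combine(m_spectral, m_context)
```

After fusion, belief concentrates on the singleton hypothesis supported by both sources; in the paper's setting the sources would be, for example, spectral evidence and neighborhood-context evidence.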
S A T E - Systems and Advanced Technologies Engineering Srl, Italy
Title: Simulation, modeling and diagnostics: From the health of spacecraft to the health of astronauts
Time : 14:10-14:40
Jacopo Biancat graduated in Control System Engineering from the University of Padua (2007). In 2007, he started working in the field of industrial automation. He joined S A T E in 2008, leading the Research and Development group. In 2013, he moved to the company Attain IT, taking the position of Innovation Manager. He has experience in technical coordination of research projects, data analysis, knowledge extraction, system identification and diagnostics, and software development. He has published 7 technical papers and has filed 14 patents.
The use of simulation and modeling allows gaining knowledge of the system under observation. This may help during the design phase of any system, e.g. an industrial plant, because it allows a better understanding of the relationships among the several subsystems and processes involved; it may also be used for diagnostic purposes, to automatically distinguish normal observations of a system (e.g. a spacecraft) from any possible new phenomenon and to identify possible unknown relationships among subsystems or processes. Among the several approaches available in diagnostics, those based on data mining techniques are on the rise, because they allow the interpretation of large amounts of heterogeneous data using no or very little a priori knowledge, which saves resources and time. Given their generality, these methods may be tailored to a variety of sectors addressing different scopes. SATE successfully applied them, under contract with the European Space Agency, both in the space sector, to telemetry data for monitoring the health of satellites, and in the medical sector, to standard medical data, aiming at the improvement of astronaut medical autonomy during space missions. These works led to the development of two software prototypes, KETTY (Knowledge Extraction Tool from TelemetrY) and CLUE (Tool for Clinical Laboratory data Understanding and knowledge Extraction), tailored to the analysis and extraction of knowledge from telemetry data and from medical laboratory data, respectively.
Principal Mining Applications and Performance Engineer | Mining | Client Solutions - WesTrac CAT
Title: Mining performance management applying data analytics and artificial intelligence methodologies
Time : 14:40-15:10
Burcin Ozturk Demirhanoz has worked professionally in the mining and machinery industry for more than 12 years, in Europe and in Australia. She completed her BSc in Mining Engineering (ITU, Istanbul) and an MEng in Mining Engineering, Mine Management, as well as Business Administration (UNSW, Sydney). She is currently working as a Principal Mining Applications and Performance Engineer at WesTrac CAT, leading mining applications performance optimization projects for best practices in the Western Australia region for some of the biggest mining companies in the world. She is also Six Sigma trained, Black Belt and Project Manager qualified. Her industry research interests are mining performance analysis and modelling applying AI (Artificial Intelligence) methodologies within data analytics and machine learning.
In the mining industry, efficient and cost-effective project development is critical to success, since mining is a long-term business and global economic concerns demand new approaches more than ever before. New investment decisions for projects nowadays turn on innovative technology for mining applications and performance management. One serious task for observation and monitoring methodologies is to ally data science with detailed engineering through a scientific hypothesis-testing approach, not only optimizing the algorithms but also generating new hypotheses to monitor and improve efficiency. Modern wireless-based management systems and applications for mining equipment fleets are capable of collecting vast amounts of equipment health and mining performance data. However, when performance and machine health deviate from desired target levels, it can sometimes be difficult to determine the root cause. This is because data relating to the operating environment or maintenance actions taken often reside in different databases, drawing on different fields including database design, statistics, pattern recognition, machine learning, and data visualization. This “silo” approach to data often inhibits the extent to which evidence-based root causes can be determined and cost models generated in advance of actual events. This study hypothesizes that there is significant value to be had by integrating data from different sources and using this to determine and manage root causes of performance and machine health problems in advance. It aims to demonstrate this potential for value by undertaking a number of case studies using data collected across a number of Western Australian mining operations.
Hongfei Li is a Principal Data Scientist and Manager of the Data Science Team in IBM Analytics, NY, USA. She obtained her PhD in Statistics from the Department of Statistics at the Ohio State University and has published many papers in top journals. She has given presentations at many conferences in the areas of Statistics, Machine Learning, etc.
IBM announced the acquisition of The Weather Company (TWC) in Feb 2016. TWC enables IBM to collect a larger variety and higher velocity of data sets from billions of IoT sensors around the world, while also serving out real-time information and insights to tens of millions of users worldwide. The IBM data science team has turned data into deeper insight, confident decisions and faster in-time actions, with extensive analytics capabilities. We have mitigated the impact of weather and uncovered new opportunities for business. I will use several examples to illustrate the real industry applications of weather insights. For example, insurers can use weather data to reduce claims and respond more effectively to policyholders. Utilities can predict, respond to and mitigate outages faster. Government agencies can better plan for weather disasters to protect citizens and key infrastructure.
Hong Kong Polytechnic University, Hong Kong
Time : 16:00-16:30
Hongqin Fan has completed his PhD from the University of Alberta, Canada in 2007. His areas of expertise are in Data Mining, Construction Equipment Management, and Construction Information Technology. He is currently an Associate Professor in the Department of Building and Real Estate at the Hong Kong Polytechnic University, Hong Kong. He has published more than 30 papers in the field of Data Mining, Computer Applications and decision support in Construction Engineering and Management.
With increasing automation and computerization of engineering domain applications, outlier mining has become increasingly important in detecting abnormal behaviors in engineering systems, as well as observations indicating malpractice and poor management. A resolution-based outlier (RB-outlier) notion and an RB-outlier mining algorithm are introduced to provide better solutions to outlier detection in engineering applications, which differ substantially from other domain areas. The RB-outlier notion is defined based on the concept of resolution change, i.e. changing the scale of the data plots progressively from a high-resolution close view, where no point has neighbors, to a low-resolution distant view, where all the points are condensed into one cluster. The features of each data point in terms of its neighborhood are captured and accumulated during this process to measure its degree of outlyingness. The RB-outlier mining algorithm generates outlier results by taking both local and global features of a dataset into account, without requiring the input of domain-specific parameters that are usually unknown a priori. The RB-outliers can be used conveniently to rank and label top-n outliers for further investigation. Experimental tests on some engineering applications, including construction equipment fleet management and construction site operations, demonstrated the algorithm’s effectiveness and efficiency; its flexibility and robustness, moreover, allow it to be easily built into any real-time monitoring system or decision support system for efficient outlier detection “on the fly”.
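The resolution-change idea above can be sketched in a few lines. This is a simplified reading for illustration, not the authors' exact RB-outlier algorithm, and the data are invented: grow a neighborhood radius step by step and count, for each point, how many steps it spends without any neighbor; points that stay isolated across many resolutions score as outliers.

```python
# Simplified sketch of resolution-based outlier scoring (1-D, invented data;
# not the authors' exact RB-outlier algorithm).
def rb_outlier_scores(xs, steps=20):
    span = max(xs) - min(xs)
    scores = [0] * len(xs)
    for s in range(1, steps + 1):
        r = span * s / steps          # radius from close view to distant view
        for i, x in enumerate(xs):
            has_neighbour = any(abs(x - y) <= r
                                for j, y in enumerate(xs) if j != i)
            if not has_neighbour:
                scores[i] += 1        # still isolated at this resolution
    return scores

data = [1.0, 1.1, 1.2, 1.3, 9.0]      # 9.0 is the obvious outlier
scores = rb_outlier_scores(data)
top = max(range(len(data)), key=lambda i: scores[i])
```

Note the two properties the abstract emphasizes: no domain-specific parameters beyond the data's own span are needed, and the accumulated scores rank points directly for top-n labeling.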
Sophia Tsoka is Senior Lecturer in Bioinformatics at the Department of Informatics, King’s College London. Prior to joining the college, she was Staff Scientist and Medical Research Council Fellow at the European Bioinformatics Institute in Cambridge, UK. Her expertise involves genome and disease data mining, analysis of protein interactions and community detection in complex networks. Recently, she has reported applications of these methodologies in the analysis of skin inflammation due to allergy and autoimmunity, including analysis of microbial communities in skin microbiome data.
Insight into molecular interactions at systems level is critical for biological discovery and the understanding of disease mechanisms. Bioinformatics and Systems Biology strategies aim to develop appropriate computational and mathematical characterization of biological systems, in order to provide a holistic view of system properties and dynamics. I will discuss recent work in developing data mining protocols to target protein interactions, so as to link network topological properties to the underlying molecular features. Such community detection approaches are based on combinatorial optimization principles, involve data from various high throughput experiments and span weighted, consensus, dynamic networks and overlapping communities. The use of such methodologies will be illustrated in the context of gene expression and microbiome analysis in skin inflammatory disorders, so as to reveal the implication of specific biochemical pathways and the interplay of host-microbiome interactions.
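The combinatorial optimization objective behind such community detection can be illustrated with Newman-Girvan modularity. The five-node interaction network and partition below are hypothetical, chosen only to show the quantity being optimized:

```python
# Modularity Q of a partition of a small undirected network (toy example):
# Q = (1/2m) * sum over node pairs of [A_uv - k_u*k_v/(2m)] for pairs in
# the same community. Higher Q means denser-than-expected communities.
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p4", "p5")]
partition = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1}

def modularity(edges, partition):
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    nodes = list(partition)
    for u in nodes:
        for v in nodes:
            a = 1.0 if (u, v) in edges or (v, u) in edges else 0.0
            if partition[u] == partition[v]:
                q += a - deg.get(u, 0) * deg.get(v, 0) / (2.0 * m)
    return q / (2.0 * m)

q = modularity(edges, partition)
```

Community-detection methods of the kind described in the talk search the space of partitions for one that maximizes an objective such as this (possibly weighted, consensus-based, or allowing overlaps).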