Scientific Program

Conference Series Ltd invites all participants from across the globe to attend the 3rd International Conference on Big Data Analysis and Data Mining in London, UK.

Day 2:

Keynote Forum

Mikhail Moshkov

King Abdullah University of Science and Technology, Saudi Arabia

Keynote: Extensions of dynamic programming for decision tree study

Time: 9:30-10:15

Biography:

Mikhail Moshkov is a Professor in the CEMSE Division at King Abdullah University of Science and Technology, Saudi Arabia. He earned his Master’s degree from Nizhni Novgorod State University, his Doctorate from Saratov State University, and his Habilitation from Moscow State University. In 2003, he worked at the Institute of Computer Science, University of Silesia, Poland. His main areas of research are Complexity of Algorithms, Combinatorial Optimization, and Machine Learning. He has published 5 research papers with Springer.

Abstract:

In the presentation, we consider extensions of the dynamic programming approach to the study of decision trees as algorithms for problem solving, as a means of knowledge extraction and representation, and as classifiers that, for a new object given by values of conditional attributes, define a value of the decision attribute. These extensions allow us to: (i) describe the set of optimal decision trees; (ii) count the number of these trees; (iii) perform sequential optimization of decision trees relative to different criteria; (iv) find the set of Pareto optimal points for two criteria; and (v) describe relationships between two criteria. The results include: minimization of the average depth of decision trees sorting eight elements (a question open since 1968); improvement of upper bounds on the depth of decision trees for diagnosis of 0-1 faults in read-once combinatorial circuits; existence of totally optimal (with minimum depth and minimum number of nodes) decision trees for Boolean functions; a study of the time-memory tradeoff for decision trees for corner point detection; a study of relationships between the number and maximum length of decision rules derived from decision trees; a study of the accuracy-size tradeoff for decision trees, which allows us to construct sufficiently small and accurate decision trees for knowledge representation; and decision trees that, as classifiers, often outperform decision trees constructed by CART. The end of the presentation is devoted to an introduction to KAUST.
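
The dynamic programming idea behind these extensions can be illustrated on a toy scale. The Python sketch below is an illustration under stated assumptions, not the speaker's implementation: a decision table is given as a list of attribute-value tuples plus a list of decisions, each subtable is identified by the set of rows it contains, and results are memoized per subtable. It computes the minimum depth of a decision tree and counts how many trees reach that depth, a simplified form of items (i) and (ii). The function name `min_depth_and_count` is hypothetical, the table is assumed to be consistent (no identical rows with different decisions), and trees are assumed never to split a subtable whose rows all share one decision.

```python
from functools import lru_cache


def min_depth_and_count(rows, decisions):
    """Return (minimum depth, number of minimum-depth trees) for a consistent decision table.

    rows[i] holds the conditional attribute values of object i and decisions[i]
    its decision. Subtables are identified by the frozenset of row indices they
    contain, and results are memoized per subtable (dynamic programming).
    """
    n_attrs = len(rows[0])

    def is_degenerate(sub):
        # A subtable is degenerate when all its rows share one decision.
        return len({decisions[i] for i in sub}) <= 1

    def split(sub, a):
        # Partition the subtable by the values of attribute a.
        groups = {}
        for i in sub:
            groups.setdefault(rows[i][a], []).append(i)
        return [frozenset(g) for g in groups.values()]

    def useful_splits(sub):
        # Only attributes taking at least two values on the subtable are tested.
        for a in range(n_attrs):
            parts = split(sub, a)
            if len(parts) > 1:
                yield parts

    @lru_cache(maxsize=None)
    def min_depth(sub):
        if is_degenerate(sub):
            return 0  # a single leaf
        return min(1 + max(min_depth(p) for p in parts) for parts in useful_splits(sub))

    @lru_cache(maxsize=None)
    def count(sub, d):
        # Number of trees for this subtable whose depth is at most d.
        if is_degenerate(sub):
            return 1
        if d == 0:
            return 0
        total = 0
        for parts in useful_splits(sub):
            product = 1
            for p in parts:
                product *= count(p, d - 1)
            total += product
        return total

    root = frozenset(range(len(rows)))
    depth = min_depth(root)
    return depth, count(root, depth)
```

For the XOR table `[(0, 0), (0, 1), (1, 0), (1, 1)]` with decisions `[0, 1, 1, 0]`, the call returns `(2, 2)`: depth 2 is required, and there are two optimal trees, one testing each attribute first.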

Keynote Forum

Robert S Laramee

Swansea University, UK

Keynote: Data mining with data visualization

Time: 10:15-11:00

Biography:

Robert S Laramee received a Bachelor’s degree in Physics from the University of Massachusetts, Amherst. In 2000, he received a Master’s degree in Computer Science from the University of New Hampshire, Durham. He was awarded a PhD from the Vienna University of Technology, Austria, at the Institute of Computer Graphics and Algorithms in 2005. From 2001 to 2006, he was a Researcher at the VRVis Research Center (www.vrvis.at) and a Software Engineer at AVL (www.avl.com) in the Department of Advanced Simulation Technologies. He is currently an Associate Professor in the Department of Computer Science at Swansea University, Wales. His research interests are in the areas of Big Data Visualization, Visual Analytics, and Human-Computer Interaction. He has published more than 100 peer-reviewed scientific papers.

Abstract:

Some people believe that we live in the age of information. I believe it is much more accurate to say that we live in the age of data. With the rapid advancement of big data storage technologies and the ever-decreasing cost of hardware, our ability to derive and store data is unprecedented. However, a large gap remains between our ability to generate and store large collections of complex, time-dependent data and our ability to derive useful information and knowledge from them. Data visualization leverages our most powerful sense, vision, to derive knowledge and gain insight into large, multivariate data sets that describe complicated and often time-dependent behavior. This talk presents data mining from the perspective of data visualization through three very different applications, Computational Fluid Dynamics (CFD), marine biology and rugby, showcasing some of visualization’s strengths, weaknesses and goals. Data visualization is critical to successful data mining and to extracting knowledge and insight from big data.

  • Big Data Applications | Data Mining Methods and Algorithms | Big Data Technologies | Open Data
Location: London, UK

Chair

Fionn Murtagh

University of Derby, UK

Session Introduction

Geervani Koneti

Tata Consultancy Services Ltd, India

Title: Application of feature selection algorithms for life sciences
Biography:

Geervani Koneti graduated in Computer Science from the Indian Institute of Technology Jodhpur, India. She works as a Researcher at Innovation Labs Hyderabad, Tata Consultancy Services Limited, India. Her current research interests are in integrated data analytics, with a focus on developing new computational methods, automation and platforms that have the potential to improve R&D productivity.

Abstract:

The pharmaceutical industry faces increased attrition of drug candidates at late stages of drug development, mainly due to poor safety and efficacy profiles. Thus, there is growing interest within the pharmaceutical industry in exploring and exploiting innovative approaches to reduce attrition rates and, consequently, the overall cost of discovering a drug. One such approach, which can now be realized more effectively than in the past, is integrated and big data analytics. To address R&D productivity, we are developing a data analytics platform that facilitates predictive model building, informed decision making using analytics, and effortless transition across the various stages of drug discovery and development. While the exponential increase in data presents its own challenges, we are enhancing our models to use advanced computing technology to perform the required analytics in real time. One example we will discuss is the parallelization of exhaustive search for feature selection and modelling. In the absence of advanced computing resources, the frequently employed alternative is heuristic methods, which often trade accuracy for time. However, the reproducibility of most of these methods is questionable, and few pharmaceutical companies are devising strategies to counter this issue. Another topic we will discuss in the talk is how to improve the reproducibility and reliability of modelling techniques, specifically the random-based feature selection methods used for model building.
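
As a rough illustration of parallel exhaustive feature selection with reproducible output, the sketch below scores every feature subset of a fixed size and ranks them. The scorer (leave-one-out 1-nearest-neighbour accuracy), the function names and the use of threads are assumptions made for this example; the abstract does not specify the models or the computing platform. For CPU-bound scorers, a `ProcessPoolExecutor` with a module-level scoring function would be the more realistic choice.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, math.inf
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in subset)
            if d < best_d:  # strict '<' keeps the first neighbour on ties
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def exhaustive_search(X, y, k, workers=4):
    """Score all size-k feature subsets in parallel; return (score, subset) best first."""
    subsets = list(itertools.combinations(range(len(X[0])), k))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever the completion order.
        scores = list(pool.map(lambda s: loo_1nn_accuracy(X, y, s), subsets))
    # Break score ties by the subset tuple so the ranking is fully deterministic.
    return sorted(zip(scores, subsets), key=lambda t: (-t[0], t[1]))
```

Because results come back in input order and ties are broken by the subset tuple, the ranking is identical for any number of workers, which is one simple way to keep an exhaustive search reproducible.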

Daniele Menezes Nascimento
Biography:

Daniele Menezes Nascimento is currently a PhD candidate at the McGill School of Information Studies in Montreal, Canada. Her doctoral project centers on exploring computing technologies that mediate citizen participation in public decision-making to improve liveability in developing cities. Previously, she completed an MSc in Urban Informatics at Osaka City University, Japan, and an MBA in Strategic Marketing Management at Fundação Getúlio Vargas, Brazil. More generally, her research interests include Urban Data, Urban Informatics, Community Informatics and Knowledge Management.

Abstract:

A large quantity of data is generated when citizens voice their opinions online. These data represent citizens’ knowledge about their cities and have been used by public and private organizations to understand and support citizens’ needs. To support these needs, citizens as well as governments have developed Participatory Citizenship Applications (PCTs): projects and platforms involving Information and Communication Technologies (ICTs) that help gather, create and share information about urban issues such as potholes, misplaced garbage and violent situations. PCTs have helped raise awareness and solve problems in the city by mediating information and knowledge between citizens and the private and public sectors. However, our understanding of the nature of these platforms is still limited. In this research, I intend to advance our understanding of these platforms by examining the data they produce, their information flows and their impact on improving urban liveability in Brazil.

Burcin Ozturk Demirhanoz
Biography:

Burcin Ozturk Demirhanoz has worked professionally in the mining and machinery industry for more than 14 years in Europe and Australia. She completed her BSc in Mining Engineering (ITU, Istanbul) and her MEng in Mining Engineering, Mine Management and also Business Administration (UNSW, Sydney). She currently works as a Principal Mining Applications and Performance Engineer at WesTrac CAT, leading mining applications performance projects in the Western Australia region for some of the biggest mining companies in the world. She is also a PhD candidate at UNSW, Sydney, and her research interests are mining performance analysis and modelling, applying AI (Artificial Intelligence) methodologies within data analytics and machine learning.

Abstract:

In the mining industry, efficient and cost-effective project development is critical to success, both because mining is a long-term business and because global economic concerns call for new approaches more than ever before. Projects nowadays depend on new investment decisions in innovative technology for mining applications and performance management. One of the serious tasks faced by observation and monitoring methodologies is to ally data science with detailed engineering through a scientific hypothesis-testing approach: not only optimizing the algorithms but also generating new hypotheses to monitor and improve efficiency. Modern wireless-based management systems and applications for mining equipment fleets are capable of collecting vast amounts of equipment health and mining performance data. However, when performance and machine health deviate from desired target levels, it can sometimes be difficult to determine the root cause. This is because data relating to the operating environment or to maintenance actions often reside in different databases, and analyzing them draws on different fields, including database design, statistics, pattern recognition, machine learning and data visualization. This “silo” approach to data often limits the extent to which evidence-based root causes can be determined and cost models generated in advance rather than after the fact. This study hypothesizes that there is significant value in integrating data from different sources and using it to determine and manage the root causes of performance and machine health problems in advance. It aims to demonstrate this potential value through a number of case studies using data collected across several Western Australian mining operations.

Khawar Shakeel
Biography:

Khawar Shakeel completed his Master of Science and Master of Philosophy in Computer Science at the University of Gujrat. He has served the University of Gujrat as a Computer Programmer and Database Administrator. He is a research student, and his areas of interest are Machine Learning, Data Mining and Information Retrieval.

Abstract:

Educational Data Mining (EDM) is a topical research area with many open questions. It supports decision making by applying Data Mining (DM) techniques to education-related data to address matters that would be difficult to handle without them. These techniques try to extract valuable patterns that may inform strategic policy making and characterize the behavior of both teachers and students from an educational point of view. This evidence can help identify which approaches should be avoided, which teaching tactics can be improved for each group of students and how, and which students are likely to perform well or poorly, so that specific groups of students can be supported at an early stage. Administrators could quickly use this information as guidance for course redesign and as a basis for introducing new assessment criteria. Technically, the study found slightly higher accuracy with the boosting ensemble technique than with bagging, and the execution of the proposed model confirmed this claim.

Jiping Liu

Chinese Academy of Surveying and Mapping, China

Title: Research on spatial outlier mining algorithm based on distributed computing

Time: 11:20-11:50

Biography:

Jiping Liu received his MSc degree in Computer Aided Cartography from Wuhan Technical University of Surveying and Mapping and his PhD (2004) in Cartography and GIS from the PLA Information Engineering University. He did his postdoctoral studies at Tsinghua University. He is now a Professor at the Chinese Academy of Surveying and Mapping. He has published 2 books and more than 100 papers in reputed journals. Since 2011, he has also served as a Director of the E-government Information Commission of the China Association for Geographic Information Society and as a Member of the Commission on Theoretical Cartography, International Cartographic Association. His research interests are in the areas of Spatial Data Mining, Government Geographic Information Service and Image Processing.

Abstract:

Because existing spatial outlier mining algorithms cannot meet the needs of large-scale spatial data mining, this paper presents a spatial outlier mining algorithm based on a distributed system. First, the paper proposes using a space-filling curve to partition the data set and to speed up the nearest-neighbor search for each target point. Second, the spatial outlier factor is defined using information entropy: taking into account the impact of the different attributes of multidimensional data on the outliers, the algorithm automatically calculates the weight of each attribute from the original features of the data. At the same time, the influence of spatial factors on the outlier factor is defined by inverse distance weighting. Experiments show that the algorithm is much more efficient than the traditional algorithm and that the accuracy of outlier mining exceeds 90 percent.
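
The three ingredients named in the abstract can be sketched in Python under explicit assumptions: a Z-order (Morton) space-filling curve to group nearby points, the standard entropy weight method for attribute weights, and inverse distance weighting of neighbour deviations. How the paper combines them into its outlier factor is not stated, so the formula in the hypothetical `outlier_factors` (a weighted mean of entropy-weighted attribute differences to the k nearest candidates) is a stand-in, and the distribution across machines is omitted. Coordinates are assumed to be non-negative integers and attribute values non-negative.

```python
import math


def morton_code(x, y, bits=16):
    """Z-order (Morton) key: interleave the bits of non-negative integer grid coordinates."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def entropy_weights(attrs):
    """Entropy weight method: attributes whose values are more uneven get larger weights.

    Assumes at least two objects and non-negative attribute values.
    """
    n, m = len(attrs), len(attrs[0])
    raw = []
    for j in range(m):
        col = [row[j] for row in attrs]
        total = sum(col)
        if total == 0:
            raw.append(0.0)
            continue
        probs = [v / total for v in col if v > 0]
        e = -sum(p * math.log(p) for p in probs) / math.log(n)
        raw.append(max(0.0, 1.0 - e))  # clamp tiny negative rounding errors
    s = sum(raw)
    return [w / s for w in raw] if s > 0 else [1.0 / m] * m


def outlier_factors(points, attrs, k=4, window=8):
    """Inverse-distance-weighted mean of entropy-weighted attribute differences.

    Candidate neighbours come from a window of the Morton-sorted order,
    so the k-nearest-neighbour search is approximate.
    """
    weights = entropy_weights(attrs)
    order = sorted(range(len(points)), key=lambda i: morton_code(*points[i]))
    factors = [0.0] * len(points)
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        candidates = [order[p] for p in range(lo, hi) if p != pos]
        candidates.sort(key=lambda q: math.dist(points[i], points[q]))
        num = den = 0.0
        for q in candidates[:k]:
            w = 1.0 / max(math.dist(points[i], points[q]), 1e-9)  # inverse distance weight
            diff = sum(wj * abs(a - b) for wj, a, b in zip(weights, attrs[i], attrs[q]))
            num += w * diff
            den += w
        factors[i] = num / den if den else 0.0
    return factors
```

In a distributed setting, contiguous ranges of the Morton-sorted order could be assigned to different workers, since points close on the curve tend to be close in space; the windows at partition boundaries would then need to overlap, which this sketch does not handle.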

Peter Löwe
Biography:

Peter Löwe studied Geography at the University of Würzburg, Germany, and UT Austin, USA. His PhD focused on mining weather radar data for soil erosion studies in South Africa. After developing tsunami early warning systems at the German Research Centre for Geosciences, he now works in the remote sensing industry as Head of Development at the Leibniz Information Centre for Science and Technology in Hannover, Germany.

Abstract:

The output of today’s scientists consists of much more than traditional research papers: they produce comprehensive digital collections of objects which, alongside digital texts, include digital resources such as research data, audio-visual media, digital lab journals, images, statistics and software code. The continuously growing volume and variety of this scientific and technical information (STI), becoming available with ever-increasing velocity, poses challenges to research libraries. This presentation provides an overview of how these challenges are being addressed by the German National Library of Science and Technology (TIB). TIB, one of the largest specialized libraries worldwide, acquires and archives content from around the world pertaining to all areas of engineering, architecture, chemistry, information technology, mathematics and physics. TIB’s information portal provides access to more than 160 million data sets from specialized databases, publishers and library catalogues. The TIB research and development department drives application-focused research and development towards innovative library services and supportive infrastructure that accompany the lifecycle of scientific knowledge generation and transfer. The talk will give an overview of the evolving TIB service offerings and of emerging capabilities from ongoing research efforts in the wider field of open science and non-textual information.

 

Wangjun He

Chinese Academy of Surveying and Mapping, China

Title: A spatial overlay method for massive vector data based on Spark

Time: 12:20-12:50

Biography:

Wangjun He is currently working at the Chinese Academy of Surveying and Mapping, China.

Abstract:

With the growing volume of geographical data, typical spatial overlay methods for vector data in current GIS platforms are unable to cope with voluminous vector data. Thus, this paper presents a novel spatial overlay method for vector data based on a distributed in-memory computing framework. First, following the principle of distributed computing, i.e., map and reduce, the vector data were divided into several grids. In this way, the vector data were partitioned for parallel computing, and unnecessary calculations between spatially distant objects can be avoided. Second, an STR-tree data structure was constructed in each grid to address the uneven distribution of data within each grid; the STR-tree also improves the efficiency of overlay operations within the same grid. A final comparison between this method and other typical methods shows that it can significantly improve overlay performance for large-scale vector data.
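
The grid partitioning and per-grid STR indexing described above can be sketched on a single machine as the candidate-filtering stage of an overlay. This is a sketch under stated assumptions, not the authors' Spark code: features are represented only by axis-aligned bounding boxes `(minx, miny, maxx, maxy)`, the "map" and "reduce" steps are plain Python loops, the STR index is a single level of leaves, and reporting each pair only in the cell that holds the lower-left corner of the overlap (to avoid duplicates for boxes spanning several cells) is an added assumption. All function names and the cell size are hypothetical.

```python
import math
from collections import defaultdict


def intersects(a, b):
    """Axis-aligned bounding boxes (minx, miny, maxx, maxy) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_cells(box, cell):
    """All grid cells (cx, cy) that a bounding box touches."""
    for cx in range(math.floor(box[0] / cell), math.floor(box[2] / cell) + 1):
        for cy in range(math.floor(box[1] / cell), math.floor(box[3] / cell) + 1):
            yield (cx, cy)


def str_leaves(items, capacity=4):
    """One level of Sort-Tile-Recursive packing of (id, box) items into leaf nodes."""
    if not items:
        return []
    n_leaves = math.ceil(len(items) / capacity)
    slice_size = math.ceil(math.sqrt(n_leaves)) * capacity
    by_x = sorted(items, key=lambda it: it[1][0] + it[1][2])  # sort by x-centre
    leaves = []
    for s in range(0, len(by_x), slice_size):
        vertical_slice = sorted(by_x[s:s + slice_size], key=lambda it: it[1][1] + it[1][3])
        for t in range(0, len(vertical_slice), capacity):
            group = vertical_slice[t:t + capacity]
            bbox = (min(b[0] for _, b in group), min(b[1] for _, b in group),
                    max(b[2] for _, b in group), max(b[3] for _, b in group))
            leaves.append((bbox, group))
    return leaves


def overlay_candidates(layer_a, layer_b, cell=10.0):
    """Sorted pairs (i, j) whose bounding boxes overlap, found cell by cell."""
    # "Map" step: assign every box of both layers to each grid cell it touches.
    grid = defaultdict(lambda: ([], []))
    for i, box in enumerate(layer_a):
        for c in grid_cells(box, cell):
            grid[c][0].append((i, box))
    for j, box in enumerate(layer_b):
        for c in grid_cells(box, cell):
            grid[c][1].append((j, box))
    pairs = []
    # "Reduce" step: in each cell, index layer B with STR leaves and probe with layer A.
    for (cx, cy), (a_items, b_items) in grid.items():
        leaves = str_leaves(b_items)
        for i, abox in a_items:
            for leaf_box, group in leaves:
                if not intersects(abox, leaf_box):
                    continue
                for j, bbox in group:
                    if not intersects(abox, bbox):
                        continue
                    # Report only in the cell holding the overlap's lower-left corner.
                    rx, ry = max(abox[0], bbox[0]), max(abox[1], bbox[1])
                    if (math.floor(rx / cell), math.floor(ry / cell)) == (cx, cy):
                        pairs.append((i, j))
    return sorted(pairs)
```

Comparing the result with a brute-force double loop over both layers is a quick correctness check; the exact polygon intersection for each candidate pair would follow as a separate refinement step.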

  • YRF (Young Researchers Forum)
Aliyu Usman Ahmad
Biography:

Aliyu Usman Ahmad is currently a second-year PhD student at the University of Aberdeen, UK, working on automated big data analysis methods. He is a recipient of the University’s Elphinstone Scholarship of Excellence. He holds an MSc in Software Development from Coventry University, UK, and a BSc in Software Engineering from the University of East London.

Abstract:

The effective modeling of high-dimensional data with hundreds to thousands of input features remains a challenging task in machine learning. One of the major challenges is implementing effective methods for identifying a set of relevant features buried among high-dimensional irrelevant noise, i.e., choosing a subset xn of the complete set of input features x = {x1, x2, ..., xm} such that xn predicts the output y with accuracy comparable to that of the complete input set x, in order to tackle the curse of dimensionality. Feature selection has been studied by the statistics and machine learning communities for a very long time, with no fully automated solution to date. In this work, we introduce a method that measures the relevance of each individual input feature during the competition phase of self-organizing map (SOM) neural network training using the quantization error, together with an automated method that uses this relevance information to prune irrelevant inputs and guide SOM training with the relevant inputs for higher performance. A number of synthetic datasets with different properties were created to test this method and to compare it against several existing feature weighting methods; we demonstrated the effect of irrelevant features on SOM training and on the performance of these methods, with the proposed method achieving higher performance.
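
A minimal sketch of measuring feature relevance from the quantization error in the SOM competition phase follows. The abstract does not give the relevance formula, so the one used here (one minus the mean per-feature squared error between an input and its best-matching unit, on standardized data) is an assumption, as are the 2 x 2 map, the learning-rate and neighbourhood schedules, the `prune` threshold and all function names.

```python
import math
import random


def standardize(data):
    """Scale every feature to zero mean and unit variance."""
    n, m = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / n) or 1.0 for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in data]


def best_matching_unit(x, weights):
    """Competition phase: index of the map node closest to input x."""
    return min(range(len(weights)),
               key=lambda k: sum((a - w) ** 2 for a, w in zip(x, weights[k])))


def train_som(data, rows=2, cols=2, epochs=20, seed=0):
    """Online SOM training with linearly decaying learning rate and Gaussian neighbourhood."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1 - frac) + 0.01
            sigma = 1.0 * (1 - frac) + 0.1
            bmu = best_matching_unit(x, weights)
            for k, (r, c) in enumerate(grid):
                d2 = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (a - w) for w, a in zip(weights[k], x)]
            step += 1
    return weights


def feature_relevance(data, weights):
    """Relevance per feature from its share of the quantization error at the BMU.

    On standardized data, a feature the map models well has a small error
    (relevance near 1); a noise feature keeps an error near its variance (near 0).
    """
    m = len(data[0])
    qe = [0.0] * m
    for x in data:
        bmu = weights[best_matching_unit(x, weights)]
        for j in range(m):
            qe[j] += (x[j] - bmu[j]) ** 2
    return [1.0 - q / len(data) for q in qe]


def prune(relevance, threshold=0.5):
    """Indices of the features kept for further SOM training."""
    return [j for j, r in enumerate(relevance) if r >= threshold]
```

On data where two features form four well-separated clusters and two others are uniform noise, the clustered features should receive relevance values near 1 and the noise features values near 0, so `prune` would keep only the first two.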

R Jerves-Cobo
Biography:

R Jerves-Cobo is a PhD candidate at the Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University, Belgium. He is also a researcher in the Water and Soil Management Program (PROMAS) at the University of Cuenca, Ecuador. His research interests are Water Quality Modeling, Water Quality Management and Wastewater Treatment. For his PhD, he is currently developing an integrated model of the water quality of the Cuenca river systems in Ecuador.

Abstract:

This research introduces decision tree models (DTMs) as tools to predict compliance with microbial pathogen regulations related to water use. Before water is used for drinking, farming or recreational purposes, its quality must comply with several standards in order to safeguard both society and the environment. The data were collected in the Machangara River (Southern Andes, Ecuador) in February and March 2012 and comprise 33 samples of macroinvertebrates and physical-chemical-microbiological parameters taken at different locations along the basin according to land use. Thirty-nine different families of macroinvertebrates were identified at the sampled locations. The impact of microbial pathogens on macroinvertebrates was analyzed; to this end, DTMs were developed to derive rules for the presence and abundance of some benthic families. These DTMs provide a quick way of checking compliance with the Ecuadorian regulations for water use related to microbial pathogens. The models, built and optimized with the WEKA package, were evaluated using statistical and ecological criteria, with user convenience in mind, to make them as clear and simple as possible. During evaluation, the number of false negatives in the confusion matrix of the DTMs was reduced by using a Cost-Sensitive Classifier. The models with the lowest confusion entropy were selected. As a result, three different models were obtained, which could be used as a first assessment of different levels of pollution due to microbial pathogens in rivers.
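
The role of a cost-sensitive classifier in reducing false negatives can be illustrated with the minimum-expected-cost decision rule, one common way such classifiers work. This is a sketch, not the WEKA implementation used in the study: the probability of non-compliance `p` is assumed to come from a decision tree (for example, the class frequencies at a leaf), the function names are hypothetical, and the costs `cost_fn=5.0` and `cost_fp=1.0` are illustrative values, not figures from the abstract.

```python
def min_expected_cost_labels(p_positive, cost_fn=5.0, cost_fp=1.0):
    """Label 1 (non-compliant) when predicting 0 would cost more in expectation.

    Predicting 0 costs p * cost_fn in expectation (a missed violation);
    predicting 1 costs (1 - p) * cost_fp (a false alarm).
    """
    return [1 if p * cost_fn > (1 - p) * cost_fp else 0 for p in p_positive]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn
```

For instance, with probabilities `[0.9, 0.4, 0.3, 0.2, 0.1, 0.6]` and true labels `[1, 1, 1, 0, 0, 0]`, equal costs give `(tp, fp, fn, tn) = (1, 1, 2, 2)`, whereas `cost_fn=5.0` lowers the decision threshold to p > 1/6 and gives `(3, 2, 0, 1)`: both false negatives disappear at the price of one extra false positive.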

  • Video presentation

Session Introduction

Kahkashan Tabassum

Princess Nourah Bint Abdulrahman University, Saudi Arabia

Title: Security issues, challenges and opportunities in mobile cloud computing
Biography:

Kahkashan Tabassum is an Associate Professor at Princess Nourah Bint Abdulrahman University, Saudi Arabia.

Abstract:

The growth of mobile applications, together with the concept of cloud computing as it has evolved since 2009, has made mobile cloud computing a potential technology offering abundant mobile services. Mobile cloud computing has gained enormous popularity because of the popularity and high usage of mobile devices and wireless networks. The integration of these two technologies has transformed today’s communication world, despite the obstacles encountered in their use. Integrating mobile and cloud computing can help overcome problems such as restricted battery life, limited bandwidth, heterogeneity, scalability and security. Mobile cloud computing could dominate the world of computing if it can solve the security issues within the field; security therefore entails a wide range of issues, challenges and future prospects. The few existing works and reviews of current systems suggest that mobile cloud computing will be extremely useful in these areas if it can provide data security. Such advances in research could transform many areas that use mobile computing. For instance, healthcare organizations could develop secure healthcare systems, extending the benefits of healthcare services as far as possible. The work presented here highlights the security issues, challenges and future prospects associated with secure mobile cloud computing, in order to help interested people in various organizations (e.g., healthcare organizations) decide whether to adopt mobile cloud computing technology and understand its benefits within the field.