Day 2 :
Keynote Forum
Mikhail Moshkov
King Abdullah University of Science and Technology, Saudi Arabia
Keynote: Extensions of dynamic programming for decision tree study
Time : 9:30-10:15
Biography:
Mikhail Moshkov is a Professor in the CEMSE Division at King Abdullah University of Science and Technology, Saudi Arabia. He earned his Master’s degree from Nizhni Novgorod State University, received his Doctorate from Saratov State University, and his Habilitation from Moscow State University. In 2003, he worked at the Institute of Computer Science, University of Silesia, in Poland. His main areas of research are Complexity of Algorithms, Combinatorial Optimization, and Machine Learning. He has published 5 research papers with Springer.
Abstract:
In the presentation, we consider extensions of the dynamic programming approach to the study of decision trees as algorithms for problem solving, as a way of extracting and representing knowledge, and as classifiers which, for a new object given by values of conditional attributes, define a value of the decision attribute. These extensions allow us to: (i) describe the set of optimal decision trees; (ii) count the number of these trees; (iii) optimize decision trees sequentially relative to different criteria; (iv) find the set of Pareto optimal points for two criteria; and (v) describe relationships between two criteria. The results include the minimization of average depth for decision trees sorting eight elements (a question that had been open since 1968); improvement of upper bounds on the depth of decision trees for diagnosis of 0-1 faults in read-once combinatorial circuits; existence of totally optimal (with minimum depth and minimum number of nodes) decision trees for Boolean functions; study of the time-memory tradeoff for decision trees for corner point detection; study of relationships between the number and maximum length of decision rules derived from decision trees; study of the accuracy-size tradeoff for decision trees, which allows us to construct sufficiently small and accurate decision trees for knowledge representation; and decision trees that, as classifiers, often outperform decision trees constructed by CART. The end of the presentation is devoted to an introduction to KAUST.
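As a rough illustration of the dynamic programming idea in this abstract, the sketch below computes the minimum depth of a decision tree for a small decision table by solving each subtable once and reusing the result. It is a minimal sketch and not the speaker's algorithm: the function name `min_depth`, the table layout (a list of attribute tuples plus a parallel list of decisions) and the choice of depth as the only criterion are assumptions made for the example.

```python
from functools import lru_cache


def min_depth(rows, decisions):
    """Minimum depth of a decision tree for a consistent decision table.

    rows: list of equal-length tuples of attribute values (hypothetical layout).
    decisions: list with one decision value per row.
    """
    n_attrs = len(rows[0])

    @lru_cache(maxsize=None)
    def solve(subset):
        # A subtable whose rows share one decision needs no further tests.
        if len({decisions[i] for i in subset}) <= 1:
            return 0
        best = None
        for a in range(n_attrs):
            # Split the subtable by the value of attribute a.
            groups = {}
            for i in subset:
                groups.setdefault(rows[i][a], []).append(i)
            if len(groups) < 2:
                continue  # attribute is constant here, so testing it cannot help
            depth = 1 + max(solve(frozenset(g)) for g in groups.values())
            if best is None or depth < best:
                best = depth
        if best is None:
            raise ValueError("inconsistent table: equal rows, different decisions")
        return best

    return solve(frozenset(range(len(rows))))


# Two Boolean attributes; the decision is their XOR, so both must be tested.
table = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(min_depth(table, [0, 1, 1, 0]))  # 2
print(min_depth(table, [0, 0, 1, 1]))  # 1 (the first attribute alone decides)
```

Memoizing on the set of row indices is what makes this dynamic programming rather than plain recursion: a subtable reached through different orders of tests is solved only once.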
Keynote Forum
Robert S Laramee
Swansea University, UK
Keynote: Data mining with data visualization
Time : 10:15-11:00
Biography:
Robert S Laramee received a Bachelor’s degree in Physics from the University of Massachusetts, Amherst. In 2000, he received a Master’s degree in Computer Science from the University of New Hampshire, Durham. He was awarded a PhD from the Vienna University of Technology, Austria at the Institute of Computer Graphics and Algorithms in 2005. From 2001 to 2006 he was a Researcher at the VRVis Research Center (www.vrvis.at) and a Software Engineer at AVL (www.avl.com) in the Department of Advanced Simulation Technologies. Currently, he is an Associate Professor in the Department of Computer Science at Swansea University, Wales. His research interests are in the areas of Big Data Visualization, Visual Analytics, and Human-Computer Interaction. He has published more than 100 peer-reviewed papers in scientific.
Abstract:
Some people believe that we live in the age of information. I believe it’s much more accurate to say we live in the age of data. With the rapid advancement of big data storage technologies and the ever-decreasing costs of hardware, our ability to derive and store data is unprecedented. However, a large gap remains between our ability to generate and store large collections of complex, time-dependent data and our ability to derive useful information and knowledge from it. Data visualization leverages our most powerful sense, vision, in order to derive knowledge and gain insight into large, multivariate data sets that describe complicated and often time-dependent behavior. This talk presents data mining from the perspective of data visualization with three very different applications: Computational Fluid Dynamics (CFD), marine biology and rugby, showcasing some of visualization’s strengths, weaknesses and goals. Data visualization is critical to successful data mining and to extracting knowledge and insight from big data.
- Data Mining Applications in Science, Engineering, Healthcare and Medicine | Artificial Intelligence | Optimization and Big Data | Data Mining analysis | Business analytics | OLAP technologies | Big Data algorithm | ETL (Extract, Transform and Load) | New visualization techniques
Location: London, UK
Chair
Yedidi Narasimha Murty
Electronic Arts, USA
Session Introduction
Iftikhar U Sikder
Cleveland State University, USA
Title: Application of Rough Sets and Dempster-Shafer's Evidence Theory in Spatial Data Mining
Time : 13:40-14:10
Biography:
Iftikhar U Sikder is an Associate Professor jointly appointed in the Department of Information Science and the Department of Electrical Engineering and Computer Science at Cleveland State University, USA. His research interests include soft computing, granular computing, data mining and collaborative decision support systems. His papers have appeared in the Journal of Risk Analysis, Expert Systems with Applications, International Journal of Mobile Communications, Information Resources Management Journal, International Journal of Management & Decision Making, and International Journal of Aerospace Survey and Earth Sciences. He has authored many book chapters and presented papers at many national and international conferences.
Abstract:
This paper presents a novel approach to spatial classification and prediction of land cover classes using rough set and evidence theory. In particular, it presents an approach to characterizing uncertainty in multisource supervised classification problems. The evidential structure of spatial classification is founded on the notion of equivalence relations in rough sets. It allows spatial concepts to be expressed in terms of an approximation space, wherein a decision class can be approximated through the partition of the boundary region. A key advantage of this approach is that it allows the context of spatial neighborhood interactions and the combination of multiple pieces of spatial evidence to be incorporated. The empirical results demonstrate that the model classifier significantly improves the accuracy of classification. A comparison with the radial basis function-based artificial neural network algorithm shows that the predictive performance of the proposed model is significantly better than that of the neural network model.
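The approximation space mentioned in this abstract can be illustrated with the standard rough-set lower and upper approximations. The sketch below is only an illustration of those textbook definitions, not the author's classifier: the cell names, the `slope` and `ndvi` attributes and the `forest` target class are all hypothetical.

```python
from collections import defaultdict


def approximations(objects, attrs, target):
    """Rough-set lower and upper approximation of `target`.

    objects: dict mapping an object name to a dict of attribute values.
    attrs: attribute names that define the equivalence (indiscernibility) relation.
    target: set of object names belonging to the decision class.
    """
    # Objects with identical values on `attrs` fall into one equivalence class.
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attrs)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


# Hypothetical raster cells described by two discretized attributes.
cells = {
    "c1": {"slope": "low", "ndvi": "high"},
    "c2": {"slope": "low", "ndvi": "high"},
    "c3": {"slope": "high", "ndvi": "low"},
    "c4": {"slope": "low", "ndvi": "low"},
}
forest = {"c1", "c3"}

lower, upper = approximations(cells, ["slope", "ndvi"], forest)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here `c1` and `c2` are indiscernible but only `c1` is forest, so both land in the boundary region, which is where the uncertainty discussed in the abstract arises.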
Jacopo Biancat
S A T E - Systems and Advanced Technologies Engineering Srl, Italy
Title: Simulation, modeling and diagnostics: From the health of spacecrafts to the health of astronauts
Time : 14:10-14:40
Biography:
Jacopo Biancat graduated in Control System Engineering from the University of Padua (2007). In 2007, he started working in the field of Industrial Automation. He joined S A T E in 2008, leading the Research and Development group. In 2013, he moved to the company Attain IT, taking the position of Innovation Manager, with experience in technical coordination of research projects, data analysis, knowledge extraction, system identification and diagnostics, and software development. He has published 7 technical papers and has filed 14 patents.
Abstract:
The use of simulation and modeling allows gaining knowledge of the system under observation. This may help during the design phase of any system, e.g. an industrial plant, because it allows a better understanding of the relationships among the several subsystems and processes involved. It may also be used for diagnostic purposes, to automatically identify normal observations of a system (e.g. a spacecraft) and any possible new phenomenon, and to identify possible unknown relationships among subsystems or processes. Among the several possible approaches available in diagnostics, those based on data mining techniques are on the rise, because they allow the interpretation of large amounts of heterogeneous data using no or very little a priori knowledge, which saves resources and time. Given their generality, these methods may be tailored to a variety of sectors addressing different scopes. SATE successfully applied them, under contract with the European Space Agency, both in the space sector, to telemetry data to monitor the health of satellites, and in the medical sector, to standard medical data, aiming at the improvement of astronaut medical autonomy during space missions. These works led to the development of two software prototypes, KETTY (Knowledge Extraction Tool from TelemetrY) and CLUE (Tool for Clinical Laboratory data Understanding and knowledge Extraction), tailored to the analysis and extraction of knowledge, the former from telemetry data and the latter from medical laboratory data.
Burcin Ozturk Demirhanoz
Principal Mining Applications and Performance Engineer | Mining | Client Solutions - WesTrac CAT
Title: Mining performance management applying data analytics and artificial intelligence methodologies
Time : 14:40-15:10
Biography:
Burcin Ozturk Demirhanoz has worked professionally in the mining and machinery industry for more than 12 years in Europe and Australia. She completed her BSc in Mining Engineering (ITU, Istanbul) and her MEng in Mining Engineering, Mine Management and also Business Administration (UNSW, Sydney). She is currently working as a Principal Mining Applications and Performance Engineer at WesTrac CAT, leading mining applications performance optimization projects for best practices in the Western Australia region for some of the biggest mining companies in the world. She is also Six Sigma trained, Black Belt and Project Manager qualified. Her industry research interests are mining performance analysis and modelling applying AI (Artificial Intelligence) methodologies within data analytics and machine learning.
Abstract:
In the mining industry, efficient and cost-effective project development is critical to success, both because mining is a long-term business and because global economic concerns demand new approaches more than ever before. Projects now depend on new investment decisions in innovative technology for mining applications and performance management. One of the serious tasks facing observation and monitoring methodologies is to ally data science with detailed engineering by applying a scientific hypothesis-testing approach: not only optimizing the algorithms but also generating new hypotheses to monitor and improve efficiency. Modern wireless-based management systems and applications for mining equipment fleets are capable of collecting vast amounts of equipment health and mining performance data. However, when performance and machine health deviate from desired target levels, it can sometimes be difficult to determine the root cause. This is because data relating to the operating environment or maintenance actions taken often reside in different databases and draw on different fields, including database design, statistics, pattern recognition, machine learning, and data visualization. This “silo” approach to data often limits the extent to which evidence-based root causes can be determined and cost modeling can be generated in advance of actuals. This study hypothesizes that there is significant value to be had by integrating data from different sources and using it to determine and manage the root causes of performance and machine health problems in advance. It aims to demonstrate the potential for value by undertaking a number of case studies using data collected across a number of Western Australian mining operations.
Hongfei Li
IBM Analytics, United States
Title: Weather insights using big data analytics
Time : 15:10-15:40
Biography:
Hongfei Li is a Principal Data Scientist and Manager of the Data Science Team at IBM Analytics, NY, USA. She obtained her PhD in Statistics from the Department of Statistics at the Ohio State University and has published many papers in top journals. She has given presentations at many conferences in the areas of Statistics, Machine Learning, etc.
Abstract:
IBM announced the acquisition of The Weather Company (TWC) in Feb 2016. TWC makes it possible to collect a larger variety and higher velocity of data sets from billions of IoT sensors around the world while also serving out real-time information and insights to tens of millions of users worldwide. The IBM data science team has turned data into deeper insight, confident decisions and faster in-time actions, with extensive analytics capabilities. We have mitigated the impact of weather and uncovered new opportunities for business. I will use several examples to illustrate the real industry applications of weather insights. For example, insurers can use weather data to reduce claims and respond more effectively to policyholders. Utilities can predict, respond to and mitigate outages faster. Government agencies can better plan for weather disasters to protect citizens and key infrastructure.
Hongqin Fan
Hong Kong Polytechnic University, Hong Kong
Title: Resolution-based outlier factor and outlier mining algorithm for engineering applications
Time : 16:00-16:30
Biography:
Hongqin Fan completed his PhD at the University of Alberta, Canada in 2007. His areas of expertise are Data Mining, Construction Equipment Management, and Construction Information Technology. He is currently an Associate Professor in the Department of Building and Real Estate at the Hong Kong Polytechnic University, Hong Kong. He has published more than 30 papers in the fields of Data Mining, Computer Applications and Decision Support in Construction Engineering and Management.
Abstract:
With increasing automation and computerization of engineering domain applications, outlier mining has become increasingly important in detecting abnormal behaviors in engineering systems and observations of malpractice and poor management skills. A resolution-based outlier (RB-outlier) notion and an RB-outlier mining algorithm are introduced to provide better solutions to outlier detection in engineering applications, which differ substantially from other domain areas. The RB-outlier notion is defined based on the concept of resolution change, i.e. changing the scale of the data plots progressively from a high-resolution close view where no point has neighbors to a low-resolution distant view where all the points are condensed in one cluster. The features of each data point in terms of its neighborhood are captured and accumulated during this process to measure its degree of outlyingness. The RB-outlier mining algorithm generates outlier results by taking both local and global features of a dataset into account without requiring input of domain-specific parameters, which are usually unknown a priori. The RB-outliers can be used conveniently to rank and label top-n outliers for further investigation. Experimental tests on some engineering applications, including construction equipment fleet management and construction site operations, demonstrated its effectiveness and efficiency. In addition, the flexibility and robustness of the proposed algorithm allow it to be easily built into any real-time monitoring system or decision support system for efficient outlier detection “on the fly”.
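The resolution-change idea described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the abstract does not state the scoring formula, so the accumulation rule used here (adding `(previous cluster size - 1) / current cluster size` at each resolution step, with lower totals meaning more outlying) is an assumption, as are the function names and the sample points.

```python
import math


def cluster_sizes(points, radius):
    """Size of the cluster containing each point when points within
    `radius` of each other are linked (connected components)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    counts = {}
    for i in range(n):
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return [counts[find(i)] for i in range(n)]


def resolution_scores(points, radii):
    """Accumulate a per-point score while moving from a close view
    (small radius) to a distant view (large radius).
    Assumed rule: points that stay isolated for longer collect less."""
    scores = [0.0] * len(points)
    previous = [1] * len(points)  # at the closest view every point is alone
    for radius in sorted(radii):
        current = cluster_sizes(points, radius)
        for i in range(len(points)):
            scores[i] += (previous[i] - 1) / current[i]
        previous = current
    return scores


# Four tightly grouped points and one distant point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = resolution_scores(points, radii=[1, 2, 20])
ranking = sorted(range(len(points)), key=lambda i: scores[i])
print(ranking[0])  # index of the most outlying point under this rule
```

Because the score is built up across all resolutions, it reflects both local neighbourhoods (small radii) and the global layout (large radii), and the only input is the list of radii rather than a domain-specific threshold.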
Sophia Tsoka
King’s College London, UK
Title: Mining biological networks in health and disease
Time : 16:30-17:00
Biography:
Sophia Tsoka is a Senior Lecturer in Bioinformatics at the Department of Informatics, King’s College London. Prior to joining the college, she was a Staff Scientist and Medical Research Council Fellow at the European Bioinformatics Institute in Cambridge, UK. Her expertise involves genome and disease data mining, analysis of protein interactions and community detection in complex networks. Recently, she has reported applications of these methodologies in the analysis of skin inflammation due to allergy and autoimmunity, including analysis of microbial communities in skin microbiome data.
Abstract:
Insight into molecular interactions at systems level is critical for biological discovery and the understanding of disease mechanisms. Bioinformatics and Systems Biology strategies aim to develop appropriate computational and mathematical characterization of biological systems, in order to provide a holistic view of system properties and dynamics. I will discuss recent work in developing data mining protocols to target protein interactions, so as to link network topological properties to the underlying molecular features. Such community detection approaches are based on combinatorial optimization principles, involve data from various high throughput experiments and span weighted, consensus, dynamic networks and overlapping communities. The use of such methodologies will be illustrated in the context of gene expression and microbiome analysis in skin inflammatory disorders, so as to reveal the implication of specific biochemical pathways and the interplay of host-microbiome interactions.
- Big Data Applications | Data Mining Methods and Algorithms | Big data technologies | Open data
Location: London, UK
Chair
Fionn Murtagh
University of Derby, UK
Session Introduction
Geervani Koneti
Tata Consultancy Services Ltd, India
Title: Application of feature selection algorithms for life sciences
Biography:
Geervani Koneti is a graduate in Computer Science from the Indian Institute of Technology, Jodhpur, India. She is working as a Researcher at the Innovation Labs Hyderabad, Tata Consultancy Services Limited, India. Her current research interests are in integrated data analytics, with a focus on the development of new computational methods, automation and platform development that have the potential to address R&D productivity.
Abstract:
The pharmaceutical industry is faced with increased attrition of drug candidates at late stages of drug development, mainly due to poor drug safety and efficacy profiles. Thus, there is growing interest within the pharmaceutical industry in exploring and exploiting innovative approaches to reduce attrition rates and, consequently, the overall cost of discovering a drug. One such approach that can now be realized more effectively than in the past is integrated and big data analytics. In an effort to address R&D productivity, we are in the process of developing a data analytics platform to facilitate predictive model building, informed decision making using analytics, and effortless transition across the various stages of drug discovery and development. While the exponential increase in data presents its own challenges, we are enhancing our models to use advanced computing technology to perform the required analytics in real time. One such example we will discuss is the parallelization of exhaustive search for feature selection and modelling. In the absence of advanced computing resources, the frequently employed alternative is the use of heuristic methods, which often trade accuracy for time. However, it is important to note that the reproducibility of most of these methods is questionable, and a few pharmaceutical companies are devising strategies to counter this issue too. Another topic we will discuss in the talk is how to improve the reproducibility and reliability of the modelling techniques, specifically the random-based feature selection methods used for model building.
Daniele Menezes Nascimento
McGill University, Canada
Title: Exploring knowledge flows in citizen e-participation data
Biography:
Daniele Menezes Nascimento is currently a PhD candidate at the McGill School of Information Studies in Montreal, Canada. Her doctoral project centers on exploring computing technologies that mediate citizen participation in public decision-making to improve liveability in developing cities. Previously, she completed an MSc in Urban Informatics at Osaka City University, Japan, and an MBA in Strategic Marketing Management at Fundação Getúlio Vargas, Brazil. More generally, her research interests include Urban Data, Urban Informatics, Community Informatics and Knowledge Management.
Abstract:
A large quantity of data is generated when citizens voice their opinions online. This data represents citizens’ knowledge about their cities and has been used by public and private organizations to understand and support citizens’ needs. In order to support these needs, citizens as well as governments have developed Participatory Citizenship Applications (PCTs), the projects and platforms involving Information and Communication Technologies (ICTs) which help to gather, create and share information about urban issues (such as potholes, misplaced garbage, reports of violent situations, etc.). PCTs have helped to raise awareness and solve problems in the city by mediating information and knowledge between citizens, the private sector and the public sector. However, our understanding of the nature of these platforms is still limited. In this research, I intend to advance our understanding of these platforms by looking into the data that is produced, the information flows and their impact on improving urban livability in Brazil.
Biography:
Burcin Ozturk Demirhanoz has worked professionally in the mining and machinery industry for more than 14 years in Europe and Australia. She completed her BSc in Mining Engineering (ITU, Istanbul) and her MEng in Mining Engineering, Mine Management and also Business Administration (UNSW, Sydney). She is currently working as a Principal Mining Applications and Performance Engineer at WesTrac CAT, leading mining applications performance projects in the Western Australia region for some of the biggest mining companies in the world. She is currently a PhD candidate at UNSW, Sydney, and her research interests are mining performance analysis and modelling applying AI (Artificial Intelligence) methodologies within data analytics and machine learning.
Khawar Shakeel
University of Gujrat, Pakistan
Title: Educational data mining to inspect low academic performance areas of the students using ensemble classification
Biography:
Khawar Shakeel completed his Master of Science and Master of Philosophy in Computer Science at the University of Gujrat. He has served the University of Gujrat as a Computer Programmer and Database Administrator. He is a Research Student and his areas of interest are Machine Learning, Data Mining and Information Retrieval.
Abstract:
Educational Data Mining (EDM) is a topical field these days, with many areas still to be researched. It supports decision making by applying Data Mining (DM) techniques to education-related data to deal with matters that would be intractable without them. These techniques try to extract valuable patterns that may lead to strategic policy making and to determine the behaviors of both teacher and student from an educational point of view. This evidence therefore helps to find which approaches must be avoided, which teaching tactics can be improved for each group of students and how, and to predict which students will perform well or poorly, so that specific groups of students can be helped in time at an early stage. Administrators may quickly be able to use this new information as direction for course redesign and as an indication for introducing new assessment criteria. Technically, the study found slightly higher accuracy using a boosting ensemble technique than using bagging; the execution of the proposed model confirmed the claim.
Jiping Liu
Chinese Academy of Surveying and Mapping, China
Title: Research on spatial outlier mining algorithm based on distributed computing
Time : 11:20-11:50
Biography:
Jiping Liu received his MSc degree in Computer Aided Cartography from Wuhan Technical University of Surveying and Mapping, and his PhD degree (2004) in Cartography and GIS from the PLA Information Engineering University. He carried out his post-doctoral studies at Tsinghua University. He is now a Professor at the Chinese Academy of Surveying and Mapping. He has published 2 books and more than 100 papers in reputed journals. He has also been serving as a Director of the E-government Information Commission of China Association for Geographic Information Society and a Member of the Commission on Theoretical Cartography, International Cartographic Association since 2011. His research interests are in the areas of Spatial Data Mining, Government Geographic Information Service and Image Processing.
Abstract:
Because existing spatial outlier mining algorithms cannot adapt to the needs of large-scale spatial data mining, this paper presents a spatial outlier mining algorithm based on a distributed system. Firstly, this paper proposes the use of a space-filling curve to partition the data set and speed up the nearest neighbor search for the target point. Secondly, the theory of information entropy is used to define the spatial outlier factor; taking into account the impact of the different attributes of multidimensional data on the outliers, the algorithm can automatically calculate the weight of each attribute according to the original features of the data. At the same time, the influence of spatial factors on the outlier factor is defined by the inverse distance weight. Experiments show that the efficiency of this algorithm is much higher than that of the traditional algorithm, and the accuracy of outlier mining is more than 90 percent.
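The two weighting ideas in this abstract, entropy-derived attribute weights and inverse distance weighting of neighbours, can be combined in a small single-machine sketch. The abstract does not give the paper's exact outlier factor, so everything below is an assumption for illustration: the standard entropy weight method for the attribute weights, a factor defined as the weighted absolute difference between a point and the inverse-distance-weighted mean of its `k` nearest neighbours, and brute-force neighbour search in place of the space-filling curve and the distributed system.

```python
import math


def entropy_weights(table):
    """Entropy weight method: attributes whose values vary more across
    rows receive larger weights. Assumes non-negative values, a positive
    column sum, at least two rows and at least one non-uniform column."""
    n, m = len(table), len(table[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in table]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        diversities.append(1.0 - entropy / math.log(n))
    norm = sum(diversities)
    return [d / norm for d in diversities]


def outlier_factors(coords, table, k=2):
    """Assumed factor: entropy-weighted absolute difference between each
    point and the inverse-distance-weighted mean of its k nearest neighbours."""
    weights = entropy_weights(table)
    factors = []
    for i, (xy, row) in enumerate(zip(coords, table)):
        # Brute-force k nearest neighbours as (distance, index) pairs.
        neighbours = sorted(
            (math.dist(xy, coords[j]), j) for j in range(len(coords)) if j != i
        )[:k]
        inv = [1.0 / max(d, 1e-12) for d, _ in neighbours]
        factor = 0.0
        for a, w in enumerate(weights):
            expected = sum(
                iw * table[j][a] for iw, (_, j) in zip(inv, neighbours)
            ) / sum(inv)
            factor += w * abs(row[a] - expected)
        factors.append(factor)
    return factors


# Five hypothetical stations on a line; the middle one has an unusual first attribute.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
table = [[10, 5], [10, 6], [50, 5], [10, 6], [10, 5]]

weights = entropy_weights(table)
factors = outlier_factors(coords, table, k=2)
print(weights)
print(max(range(len(factors)), key=lambda i: factors[i]))  # index of the top outlier
```

In this toy data the first attribute carries nearly all the weight because the second attribute is almost uniform, which is the sense in which the weights are derived from the data rather than supplied by the user.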
Peter Löwe
Leibniz University of Hanover, Germany
Title: Libraries in the big data era: Strategies and challenges in archiving and sharing research data
Time : 11:50-12:20
Biography:
Peter Löwe studied Geography at the Universities of Würzburg, Germany and UT Austin, USA. His PhD study focused on mining of weather radar data for Soil Erosion Studies in South Africa. After developing Tsunami Early Warning Systems at the German Research Centre for Geosciences, now he is working in the Remote Sensing Industry as the Head of Development at the Leibniz Information Centre for Science and Technology in Hannover, Germany.
Abstract:
The output of today’s scientists consists of much more than traditional research papers; they produce comprehensive digital collections of objects which, alongside digital texts, include digital resources such as research data, audio-visual media, digital lab journals, images, statistics and software code. The continuously growing volume and variety of this scientific-technical content (STI), becoming available with ever-increasing velocity, pose challenges to research libraries. This presentation provides an overview of how these challenges are being addressed by the German National Library of Science and Technology (TIB). TIB, one of the largest specialized libraries worldwide, acquires and archives content from around the world pertaining to all areas of engineering, architecture, chemistry, information technology, mathematics and physics. The TIB’s information portal provides access to more than 160 million data sets from specialized databases, publishers and library catalogues. The TIB research and development department drives application-focused research and development towards innovative library services and supportive infrastructure to accompany the lifecycle of scientific knowledge generation and transfer. An overview of the evolving TIB service offerings and emerging new capabilities from ongoing research efforts in the greater field of open science and non-textual information will be given.
Wangjun He
Chinese Academy of Surveying and Mapping, China
Title: A spatial overlay method for massive vector data based on Spark
Time : 12:20-12:50
Biography:
Wangjun He is currently working at the Chinese Academy of Surveying and Mapping, China.
Abstract:
With the growth of geographical data, typical spatial overlay methods for vector data in current GIS platforms are unable to adapt to voluminous vector data. Thus, this paper presents a novel spatial overlay method for vector data based on a distributed in-memory computing framework. Firstly, according to the principle of distributed computing, i.e., map and reduce, the vector data were divided into several grids. In this way, several partitions were made for the vector data with the aim of parallel computing. Moreover, with this method, unnecessary calculations between spatial objects that are far apart can be avoided. Secondly, an STR-tree data structure was constructed in each grid to solve the problem of uneven data distribution within each grid. With the STR-tree data structure, the efficiency of the overlay operation within the same grid can also be improved. The final comparison between this method and other typical methods shows that this method can significantly improve the performance of the overlay operation for large-scale vector data.
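The grid partitioning step described above can be illustrated on a single machine. The sketch below is not the authors' Spark implementation: a plain dictionary stands in for the distributed partitions, a brute-force comparison inside each cell stands in for the STR-tree, geometries are reduced to axis-aligned bounding boxes, and all function names and the sample layers are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cells_for(box, cell_size):
    """Grid cells (col, row) touched by a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(cols, rows)


def boxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlay_candidates(layer_a, layer_b, cell_size):
    """Pairs (i, j) of intersecting boxes, compared only within shared cells."""
    grid = defaultdict(lambda: ([], []))
    # "Map" step: key every box by each grid cell it touches.
    for idx, box in enumerate(layer_a):
        for cell in cells_for(box, cell_size):
            grid[cell][0].append(idx)
    for idx, box in enumerate(layer_b):
        for cell in cells_for(box, cell_size):
            grid[cell][1].append(idx)
    # "Reduce" step: each cell is processed independently of the others.
    pairs = set()  # a set removes pairs found in more than one shared cell
    for ids_a, ids_b in grid.values():
        for i in ids_a:
            for j in ids_b:
                if boxes_intersect(layer_a[i], layer_b[j]):
                    pairs.add((i, j))
    return sorted(pairs)


layer_a = [(0, 0, 2, 2), (10, 10, 12, 12)]
layer_b = [(1, 1, 3, 3), (20, 20, 21, 21), (11, 11, 15, 15)]
print(overlay_candidates(layer_a, layer_b, cell_size=5))
```

Objects that never share a cell are never compared, which is the saving the abstract refers to; a per-cell spatial index would further reduce the work inside crowded cells.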
- YRF
Session Introduction
Aliyu Usman Ahmad
University of Aberdeen, UK
Title: Automatic identification of irrelevant features for clustering with artificial neural network map on synthetic datasets
Time : 14:30-15:00
Biography:
Aliyu Usman Ahmad is currently a 2nd year PhD student at the University of Aberdeen, UK. He is working on Automated Big Data Analysis Methods. He is a beneficiary of the University’s Elphinstone Scholarship of Excellence and holds an MSc in Software Development from Coventry University, UK, and a BSc in Software Engineering from the University of East London.
Abstract:
The effective modeling of high-dimensional data with hundreds to thousands of input features remains a challenging task in machine learning. One of the major challenges is identifying the set of relevant features buried in high-dimensional irrelevant noise: choosing a subset xn of the complete set of input features x = {x1, x2, ..., xm} such that xn predicts the output y with accuracy comparable to that of the complete input set x, thereby tackling the curse of dimensionality. Feature selection has been studied by the statistics and machine learning communities for a very long time, with no fully automated solution to date. In this work, we introduce a method that uses the quantization error to measure the relevance of each individual input feature value during the competition phase of self-organizing map (SOM) neural network training, together with an automated method that uses this relevance information to prune irrelevant inputs and guide the training of the SOM with the relevant inputs for higher performance. A number of synthetic datasets with different properties were created to test this method and to compare it against a number of existing feature weighting methods. We demonstrate the effect of irrelevant features on self-organizing training and on the performance of these methods, with the proposed method achieving higher performance.
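As a rough illustration of the idea of scoring features by their share of the quantization error, the following sketch splits the squared distance between each sample and its best-matching unit (BMU) into per-feature contributions. It is not the authors' method: the function names, the toy data, and the assumption of an already-trained codebook are all hypothetical.

```python
def best_matching_unit(sample, codebook):
    """Index of the codebook vector closest to `sample` (squared Euclidean distance)."""
    def squared_distance(unit):
        return sum((s - w) ** 2 for s, w in zip(sample, unit))
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))


def per_feature_quantization_error(samples, codebook):
    """Mean squared gap between each sample and its BMU, split by feature."""
    n_features = len(samples[0])
    totals = [0.0] * n_features
    for sample in samples:
        unit = codebook[best_matching_unit(sample, codebook)]
        for j in range(n_features):
            totals[j] += (sample[j] - unit[j]) ** 2
    return [total / len(samples) for total in totals]


# Toy data (hypothetical): feature 0 separates two clusters, feature 1 is noise.
samples = [[0.0, 0.1], [0.0, 0.9], [1.0, 0.2], [1.0, 0.8]]
codebook = [[0.0, 0.5], [1.0, 0.5]]  # assumed to be already-trained SOM units

errors = per_feature_quantization_error(samples, codebook)
print(errors)  # feature 0 contributes no error; feature 1 carries all of it
```

In this toy case the map units reproduce feature 0 exactly but cannot reproduce feature 1, so a pruning rule based on a per-feature error threshold (an assumption here) would flag feature 1 as a candidate for removal.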
R Jerves-Cobo
Ghent University, Belgium
Title: Macroinvertebrate based mathematical models for the prediction of microbial pathogens in rivers
Time : 15:00-15:30
Biography:
R Jerves-Cobo is a PhD candidate at the Laboratory of Environmental Toxicology and Aquatic Ecology at Ghent University in Belgium. He is also a researcher in the Water and Soil Management Program (PROMAS) at the University of Cuenca in Ecuador. His research interests are Water Quality Modeling, Water Quality Management, and Wastewater Treatment. He is currently working on integrated modeling of the water quality of the Cuenca river systems in Ecuador as his PhD topic.
Abstract:
This research introduces decision tree models (DTMs) as tools to predict compliance with the microbial pathogen regulations that apply to water use. Prior to its use for drinking, farming, or recreational purposes, water must comply with several quality standards in order to safeguard both society and the environment. The required data was collected in the Machangara River (Southern Andes, Ecuador) in February and March of 2012 and comprises 33 samples of macroinvertebrates and physical, chemical, and microbiological parameters taken at different locations along the basin according to land use. Thirty-nine different families of macroinvertebrates were identified at the sampled locations. The impact of microbial pathogens on macroinvertebrates has been analyzed and studied. With this aim, DTMs are used to develop rules for the presence and abundance of some benthic families. These DTMs provide a quick way of checking compliance with the Ecuadorian regulations for water use related to microbial pathogens. The models, built and optimized with the WEKA package, were evaluated on statistical and ecological criteria, with user convenience in mind, to make them as clear and simple as possible. During the evaluation process, the number of False Negatives in the Confusion Matrix of the DTMs was reduced by the use of a Cost-Sensitive Classifier. The models with the lowest values of confusion entropy were selected. As a result, three different models were obtained, which could be used as a first assessment of different levels of pollution due to microbial pathogens in rivers.
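The effect of a cost-sensitive decision on false negatives can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' WEKA configuration: the function names, the probabilities, and the cost values are all hypothetical, and the rule shown is the generic "predict the class with the lowest expected misclassification cost".

```python
def min_expected_cost_label(p_positive, cost_fn, cost_fp):
    """Return 1 (non-compliant) or 0 (compliant), whichever has lower expected cost."""
    expected_cost_if_negative = p_positive * cost_fn          # risk of a false negative
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp  # risk of a false positive
    return 1 if expected_cost_if_positive <= expected_cost_if_negative else 0


def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical labels (1 = exceeds the pathogen limit) and tree leaf probabilities.
y_true = [1, 1, 1, 0, 0, 0]
p_pos = [0.9, 0.4, 0.3, 0.35, 0.1, 0.2]

equal_cost = [min_expected_cost_label(p, cost_fn=1.0, cost_fp=1.0) for p in p_pos]
fn_penalized = [min_expected_cost_label(p, cost_fn=3.0, cost_fp=1.0) for p in p_pos]

print(confusion_counts(y_true, equal_cost))
print(confusion_counts(y_true, fn_penalized))
```

With equal costs the toy data yields two false negatives; tripling the false-negative cost removes both at the price of one false positive, which is the trade-off a cost-sensitive classifier is meant to control.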
Daniele Menezes Nascimento
McGill University, Canada
Title: Exploring knowledge flows in citizen e-participation data
Time : 15:50-16:20
Biography:
Daniele Menezes Nascimento is currently a PhD candidate at the McGill School of Information Studies in Montreal, Canada. Her Doctoral project centers on exploring computing technologies that mediate citizen participation in public decision-making to improve liveability in developing cities. Previously, she completed an MSc in Urban Informatics at Osaka City University, Japan, and an MBA in Strategic Marketing Management at Fundação Getúlio Vargas, Brazil. More generally, her research interests involve Urban Data, Urban Informatics, Community Informatics and Knowledge Management.
Abstract:
A large quantity of data is generated when citizens voice their opinions online. This data represents citizens’ knowledge about their cities and has been used by public and private organizations to understand and support citizens’ needs. In order to support these needs, citizens as well as governments have developed Participatory Citizenship Applications (PCTs): projects and platforms involving Information and Communication Technologies (ICTs) that help to gather, create and share information about urban issues (such as potholes, dislocated garbage, violent situations, etc.). PCTs have helped to raise awareness and solve problems in the city by mediating information and knowledge between citizens, the private sector and the public sector. However, our understanding of the nature of these platforms is still limited. In this research, I intend to advance our understanding of these platforms by looking into the data that is produced, the information flows, and their impact on improving urban livability in Brazil.
Khawar Shakeel
University of Gujrat, Pakistan
Title: Educational data mining to inspect low academic performance areas of the students using ensemble classification
Time : 16:20-16:50
Biography:
Khawar Shakeel has completed his Master of Science and Master of Philosophy in Computer Science at the University of Gujrat. He has served the University of Gujrat as a Computer Programmer and Database Administrator. He is a Research Student and his areas of interest are Machine Learning, Data Mining and Information Retrieval.
Abstract:
Educational Data Mining (EDM) is a topical field these days, with many areas still to be researched. It supports decision making by applying Data Mining (DM) techniques to education-related data in order to deal with matters that would be hard to address without them. These techniques try to extract valuable patterns that may lead to strategic policy making and help determine the behaviors of both teachers and students from an educational point of view. This evidence can show which approaches must be avoided, which teaching tactics can be improved for each group of students and how, and which students are expected to perform well or poorly, so that specific groups of students can be supported in time at an early stage. Administrators may quickly be able to use this new information as guidance for course redesign and as an indication for introducing new assessment criteria. Technically, the study found slightly higher accuracy with the boosting ensemble technique than with bagging; the execution of the proposed model confirmed this claim.
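The difference between the two ensemble techniques compared above can be sketched in a few lines. This is a generic illustration and not the study's model: bagging is represented by bootstrap resampling plus a majority vote, boosting by one round of the standard AdaBoost weight update, and all names and toy values are hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement; every row is equally likely."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Bagging: combine the base models' predictions by a plain majority vote."""
    return Counter(predictions).most_common(1)[0][0]


def adaboost_reweight(weights, correct):
    """Boosting: one AdaBoost round; raise the weight of misclassified rows.

    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)  # weight of this base model
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


# Four hypothetical students; the current base model misclassifies the last one.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(new_weights)

resampled = bootstrap_sample(["s1", "s2", "s3", "s4"], random.Random(0))
print(resampled, majority_vote(["poor", "good", "poor"]))
```

After the update, the single misclassified row carries half of the total weight, so the next base model concentrates on it; bagging, by contrast, never changes the sampling probabilities between models.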
- Video presentation
Session Introduction
Kahkashan Tabassum
Princess Nourah Bint Abdulrahman University, Saudi Arabia
Title: Security issues, challenges and opportunities in mobile cloud computing
Biography:
Kahkashan Tabassum is an Associate Professor at Princess Nourah Bint Abdulrahman University, Saudi Arabia.
Abstract:
The growth of mobile applications, together with the cloud computing concept that has been evolving since 2009, has produced a promising technology that offers abundant mobile services. Mobile cloud computing has gained enormous popularity because of the popularity and high usage of mobile devices and wireless networks for computing. The integration of these two technologies has changed much of today's communication world, in spite of the obstacles encountered whenever they are in use. The integration of mobile and cloud computing can overcome problems such as restricted battery life, limited bandwidth, heterogeneity, scalability and security. Mobile cloud computing can dominate the world of computing if it can solve the security issues within the field; where security is a concern, it therefore comes with a wide range of issues, challenges and future prospects. It is evident from the few works and reviews of existing systems that mobile cloud computing will be extremely useful in these areas if it can provide data security. This advancement in research will transform many areas that use mobile computing. For instance, healthcare organizations could be developed further into secure healthcare systems, enhancing the benefits of healthcare services as far as possible. The work presented here highlights the security issues, challenges and future prospects associated with secure mobile cloud computing, in order to help interested people in various organizations (e.g., healthcare organizations) decide whether to adopt mobile cloud computing technology and understand its benefits within the field.