Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 4th International Conference on Big Data Analysis and Data Mining in Paris, France.

Day 1:

Keynote Forum

Fillia Makedon

The University of Texas at Arlington

Keynote: Vocational Computing: A Data Mining Application for the Workplace

Time : 10:00-10:40AM

Biography:

Dr. Fillia Makedon is the Jenkins-Garrett Professor at the University of Texas at Arlington (UTA). She received her Ph.D. in Computer Science from Northwestern University in 1982. Between 1991 and 2006 she was a Professor of Computer Science at Dartmouth College, where she founded and directed the Dartmouth Experimental Visualization Laboratory (DEVLAB). Between 2006 and 2014 she served as chair of the CSE Department at UTA. Prior to that, in 2005-2006, she was a Program Director at the National Science Foundation. Before Dartmouth, Prof. Makedon was an Assistant and then Associate Professor at the University of Texas at Dallas (UTD), where she founded and directed the Computer LEArning Research Center (CLEAR). She has supervised over 27 Ph.D. theses and numerous Master's theses. Makedon has received many NSF research awards in areas including trust management, brain computing, data mining, parallel computing, visualization, knowledge management, cyberphysical systems, major research instrumentation, and cyberhuman systems. She has been a senior investigator and co-PI on NIH, DOJ and foundation grants. She received the Dartmouth Senior Research Professor Award and three Fulbright awards, and is the author of over 350 peer-reviewed research publications. She is a faculty affiliate of the Dartmouth ISTS security institute and currently directs the HERACLEIA Human Centered Laboratory, which develops pervasive technologies for human monitoring. She is a member of several journal editorial boards and chair of the international PETRA conference.

Abstract:

According to the US Dept. of Labor, thousands of workers die on the job each year because of accidents or lack of training in using new technologies. Computational methods can be used to provide evidence-based quantitative assessments of worker ability and to identify needs for training. Data mining methods can be applied to the analysis of multi-sensing interaction data collected while a person performs a certain work task. We describe the iWork smart service work-assessment system, which recommends personalized interventions based on multimodal data mining of activity data. The service assesses the mental, cognitive and physical skills of a worker for improved placement and informed decision-making. The proposed service takes advantage of recent advancements in robotics, sensing technologies, and intelligent communication platforms to enhance the human ability to learn through interactive experiences. The service trains assistive workplace robots to provide personalized help in completing difficult cognitive and/or physical tasks in the workplace. A new machine learning methodology is described and demonstrated.

Keynote Forum

Petra Perner

Institute of Computer Vision and Applied Computer Sciences, Germany

Keynote: Maintenance of Engineering Systems by Big Data

Time : 10:40-11:20AM

Biography:

Petra Perner (IAPR Fellow) is the director of the Institute of Computer Vision and Applied Computer Sciences IBaI. She received her Diploma degree in electrical engineering and her PhD degree in computer science for work on “Data Reduction Methods for Industrial Robots with Direct Teach-in-Programing”. Her habilitation thesis was about “A Methodology for the Development of Knowledge-Based Image-Interpretation Systems”. She has been the principal investigator of various national and international research projects. She has received several research awards for her work, including three business awards for bringing intelligent image interpretation and data mining methods into business. Her research interests are image analysis and interpretation, machine learning, data mining, big data, image mining and case-based reasoning.

Abstract:

The ubiquitous availability of high-quality data gathered by European industry makes it possible to optimize manufacturing processes further and to stay competitive. However, while the data are rich enough to include the elements needed for optimization, the ever-increasing volume, velocity and variety of the data make mining them effectively increasingly difficult. This paper addresses the special challenges in developing scalable algorithms and infrastructures for creating responsive analytical capabilities that produce timely predictions and monitoring alerts in industrial environments. We will describe a platform that can handle the special needs of the data and offers a rich enough set of data mining techniques. Case-based reasoning is used to combine streaming data of different types (sensor data, time series, maintenance logs, etc.). Special time-series algorithms will be developed to allow efficient analysis of the machine data. The platform will be deployed and validated in three industrial cases where data-driven maintenance is expected to have a significant impact: high-tech medical equipment, high-tech manufacturing of hard disks, and structural health monitoring.

Keynote Forum

Mikhail Moshkov

King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Keynote: Extensions of Dynamic Programming: Applications for Decision Trees

Time : 11:35-12:15PM

Biography:

Mikhail Moshkov has been a professor in the CEMSE Division at King Abdullah University of Science and Technology, Saudi Arabia, since October 1, 2008. He earned his master's degree from Nizhni Novgorod State University, received his doctorate from Saratov State University, and obtained his habilitation from Moscow State University. From 1977 to 2004, Dr. Moshkov was with Nizhni Novgorod State University. From 2003 he worked in Poland at the Institute of Computer Science, University of Silesia, and from 2006 also at the Katowice Institute of Information Technologies. His main areas of research are complexity of algorithms, combinatorial optimization, and machine learning. Dr. Moshkov is the author or coauthor of five research monographs published by Springer.

 

Abstract:

In the presentation, we consider extensions of the dynamic programming approach to the investigation of decision trees as algorithms for problem solving, as a way of knowledge extraction and representation, and as classifiers which, for a new object given by values of conditional attributes, define a value of the decision attribute. These extensions allow us (i) to describe the set of optimal decision trees, (ii) to count the number of these trees, (iii) to perform sequential optimization of decision trees relative to different criteria, (iv) to find the set of Pareto optimal points for two criteria, and (v) to describe relationships between two criteria. The applications include the minimization of average depth for decision trees sorting eight elements (a question open since 1968), improvement of upper bounds on the depth of decision trees for diagnosis of 0-1-faults in read-once combinatorial circuits over a monotone basis, existence of totally optimal (with minimum depth and minimum number of nodes) decision trees for Boolean functions, study of the time-memory tradeoff for decision trees for corner point detection, study of relationships between the number and maximum length of decision rules derived from decision trees, and study of the accuracy-size tradeoff for decision trees, which allows us to construct sufficiently small and accurate decision trees for knowledge representation, as well as decision trees that, as classifiers, often outperform decision trees constructed by CART. The end of the presentation is devoted to an introduction to KAUST.
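The flavor of point (iv) can be conveyed with a small illustrative sketch (an editorial reconstruction, not the speaker's implementation): a dynamic program over restrictions of a Boolean function that returns the Pareto set of (depth, number of nodes) pairs over all decision trees computing the function.

```python
from itertools import product

def pareto_merge(points):
    """Keep only non-dominated (depth, nodes) pairs (minimizing both)."""
    return [p for p in sorted(set(points))
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

def pareto_trees(points, f, cache=None):
    """Pareto set of (depth, node count) over all decision trees that
    compute f on the given set of Boolean input points.
    Leaves count as nodes; a leaf has depth 0."""
    if cache is None:
        cache = {}
    key = frozenset(points)
    if key in cache:
        return cache[key]
    if len({f(p) for p in points}) == 1:   # constant on this subdomain -> leaf
        cache[key] = [(0, 1)]
        return cache[key]
    n = len(next(iter(points)))
    candidates = []
    for i in range(n):                     # try splitting on each variable
        lo = [p for p in points if p[i] == 0]
        hi = [p for p in points if p[i] == 1]
        if not lo or not hi:
            continue
        for d0, s0 in pareto_trees(lo, f, cache):
            for d1, s1 in pareto_trees(hi, f, cache):
                candidates.append((1 + max(d0, d1), 1 + s0 + s1))
    cache[key] = pareto_merge(candidates)
    return cache[key]

domain = list(product([0, 1], repeat=2))
xor = lambda p: p[0] ^ p[1]
proj = lambda p: p[0]
print(pareto_trees(domain, xor))   # [(2, 7)] -- XOR needs both variables
print(pareto_trees(domain, proj))  # [(1, 3)] -- one test suffices
```

The same recursion, with different combining rules, supports counting optimal trees and sequential multi-criteria optimization.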

  • Data Mining Applications in Science, Engineering, Healthcare and Medicine
Speaker
Biography:

Prof. Dr. S. N. Mohanty received his PhD from IIT Kharagpur, India, in 2014, with an MHRD scholarship from the Government of India. He has recently joined the School of Computer Science & Engineering at KIIT University as an Assistant Professor. His research areas include data mining, big data analysis, cognitive science, fuzzy decision making, brain-computer interfaces, cognition, and computational intelligence. Prof. S. N. Mohanty received two Best Paper Awards during his PhD at IIT Kharagpur: one from an international conference in Beijing, China, and the other from the International Conference on Soft Computing Applications organized by IIT Roorkee in 2013. He has published five papers in international journals of repute and has been elected a member of the Institute of Engineers and the IEEE Computer Society. He is also a reviewer for the IJAP and IJDM international journals.

Abstract:

The process of selecting a cell-phone for purchase is a multi-criteria decision-making (MCDM) problem with conflicting and diverse objectives. This study discusses various techniques using a machine learning approach. To begin, participants responded to a questionnaire on the latest features available in a cell-phone. Seven independent input variables (cost, talk-time, rear camera, weight, size, memory and operating system) were then derived from the participants' responses. Linguistic terms such as low, medium and high were used to represent each of the input variables. Using the Mamdani approach, both a traditional fuzzy reasoning tool (FLC) and a neuro-fuzzy system (ANFIS) were designed for a three-input, one-output process. The neuro-fuzzy system was trained using a back-propagation algorithm. Compared with the traditional fuzzy reasoning tool and an artificial neural network (ANN) approach, the neuro-fuzzy system provided better accuracy for selecting a cell-phone for personal use.
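To make the Mamdani step concrete, here is a minimal sketch of that style of inference for two of the inputs (cost and talk-time) with a suitability score as output. The membership functions and rules are hypothetical, chosen only for illustration, and the ANFIS back-propagation training is not shown.

```python
def tri(x, a, b, c):
    """Triangular membership function; a == b or b == c gives a shoulder."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return 1.0 if b == a else (x - a) / (b - a)
    return 1.0 if c == b else (c - x) / (c - b)

def suitability(cost, talk_time):
    """Mamdani inference: min for rule firing, max for aggregation,
    centroid defuzzification over a discretized output domain [0, 10]."""
    # Hypothetical membership functions (not from the study)
    cost_low  = tri(cost, 0, 0, 50)          # cost on a 0-100 scale
    cost_high = tri(cost, 50, 100, 100)
    talk_high = tri(talk_time, 10, 24, 24)   # talk-time in hours
    # Rule firing strengths
    r_high = min(cost_low, talk_high)   # cheap & long-lasting -> suitable
    r_low  = cost_high                  # expensive -> unsuitable
    num = den = 0.0
    for i in range(101):
        y = i * 0.1
        mu = max(min(r_high, tri(y, 5, 10, 10)),   # clipped output sets
                 min(r_low,  tri(y, 0, 0, 5)))
        num += y * mu
        den += mu
    return num / den if den else 5.0    # neutral score if no rule fires

print(suitability(20, 18))   # cheap phone, long talk-time -> high score
print(suitability(90, 4))    # expensive, short talk-time -> low score
```

An ANFIS replaces the hand-picked membership parameters above with ones tuned by back-propagation against the questionnaire data.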

Speaker
Biography:

Dr. Alla Sapronova completed her PhD at the age of 29 at Moscow State University, Russia, and her postdoctoral studies at UniFob, University of Bergen, Norway. She is the Head of Data Science at the Center for Big Data Analysis, Uni Research, a multidisciplinary research institute in Bergen, Norway. In the last five years she has published more than 15 papers in reputed journals and has been serving as an external examiner for the University of Bergen, Norway, and Nelson Mandela Metropolitan University, South Africa.
 

Abstract:

Classification, the process of assigning data into labeled groups, is one of the most common operations in data mining. Classification can be used in predictive modeling to learn the relation between a desired feature-vector and labeled classes. When the data set contains an arbitrarily large amount of missing data, and/or the number of data samples is not adequate to the data's complexity, it is important to define a strategy that allows the highest possible classification accuracy to be reached. In this work the authors present results on the accuracy of a classification-based predictive model for three different strategies: input pruning, semi-automatic selection of various classification methods, and data volume increase. The authors suggest that a satisfactory level of model accuracy can be reached when preliminary input pruning is used.

The presented model connects fishing data with environmental variables. Even with a limited number of samples, the model is able to resolve the type of fish with up to 92% accuracy.

The results of using various classification methods are shown, and suggestions are made towards defining an optimal strategy for building an accurate predictive model, as opposed to the common trial-and-error method. Different strategies for input pruning that preserve the information content are described.
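The effect input pruning can have is easy to demonstrate on a toy example (hand-made data, not the authors' fisheries model): a 1-nearest-neighbour classifier is misled by a high-variance noise feature, and a simple filter that keeps the features with the largest between-class mean separation recovers the correct prediction.

```python
def dist(p, q, feats):
    """Euclidean distance restricted to the listed feature indices."""
    return sum((p[i] - q[i]) ** 2 for i in feats) ** 0.5

def knn1(train_X, train_y, x, feats):
    """1-nearest-neighbour prediction using only the listed features."""
    return min(zip(train_X, train_y), key=lambda t: dist(t[0], x, feats))[1]

def prune(train_X, train_y, keep=1):
    """Keep the features whose class means are furthest apart."""
    classes = sorted(set(train_y))
    def score(j):
        means = [sum(x[j] for x, y in zip(train_X, train_y) if y == c) /
                 train_y.count(c) for c in classes]
        return abs(means[0] - means[1])
    return sorted(range(len(train_X[0])), key=score, reverse=True)[:keep]

# Feature 0 separates the classes; feature 1 is large-scale noise.
X = [(0, 10), (1, 95), (10, 90), (9, 12)]
y = ['A', 'A', 'B', 'B']
query = (4.5, 90)                      # true class: A

print(knn1(X, y, query, [0, 1]))       # 'B' -- the noise feature misleads
print(knn1(X, y, query, prune(X, y)))  # 'A' -- pruning recovers it
```

Real input pruning must also guard against discarding weakly informative features, which is what the information-preservation strategies in the talk address.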

Bennett B. Borden

Drinker Biddle & Reath, Washington, DC

Title: Predicting Corporate Misconduct
Speaker
Biography:

Bennett B. Borden is a partner at Drinker Biddle & Reath and its Chief Data Scientist, the only Chief Data Scientist who is also a practicing attorney. Bennett is a globally recognized authority on the legal, technology and policy implications of information. Bennett’s ground-breaking research into the use of machine learning and unstructured data for organizational insight is now being put to work in data-driven early warning systems that help clients detect and prevent corporate fraud and other misconduct. Bennett received his Master of Science in Data Analytics from New York University and his JD from Georgetown University.

Abstract:

Preventing and detecting corporate misconduct, and dealing with its consequences, costs $3 trillion worldwide every year. If we can predict when someone will purchase a product or click on an ad, or how they will vote for a candidate, why can’t we predict when he or she will engage in some form of fraud or other misconduct? Well, perhaps we can. In this session, Chief Data Scientist Bennett Borden of the law firm Drinker Biddle & Reath will present his work on developing algorithms to predict corporate misconduct, how this technology is being used today, and how it will likely be used in the future.

Speaker
Biography:

Megan founded Ixio in 2012 after seeing a need for strong, data-led modelling and analytics in business. As Chief Scientist at Ixio Analytics, Megan leads the advanced modelling programs and coordinates the technical requirements for clients. Her background in Evolutionary Biology allows her to bring a rigorous scientific approach to analytical problem solving. She takes a keen interest in environmental issues and is an avid surfer.       

Abstract:

The operations division of a large multinational company offering subscription-based services across Africa used Ixio Analytics to create a predictive model for call volumes in their inbound support call centre. The model previously used was overestimating call volumes, costing the operations division heavily in agent staffing and resulting in inefficient call scheduling. This situation had persisted for several years.

Ixio Analytics used call volume data at half hour intervals from January 2012 to create a predictive time series model. We used an ensemble modelling approach, combining a time series forecast model and data splitting at key time intervals. This was the first known application of this modelling method to call volume forecasting.

The model took into account seasonal, random and trend components in the data. Total call volumes for every 30 minute period, as well as call volumes for various types of calls (such as billing) were predicted.

The Ixio model has achieved 94% accuracy since implementation, a significant improvement over the previous model, which achieved only 70% accuracy. The Ixio model has been implemented and used in the company’s workforce planning. The accuracy of its predictions has enabled efficient workforce planning and more effective call scheduling. This is currently saving the company approximately USD 5 million annually. A subsequent iteration of the model now includes event data that has further improved performance.
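The ensemble itself is proprietary, but the seasonal baseline such a forecast builds on can be sketched in a few lines (an illustration with synthetic data, not Ixio's model): predict each half-hour slot from the mean of that slot's past values, and score the forecast with a percentage-error metric.

```python
def seasonal_naive(history, period=48):
    """Forecast one seasonal cycle ahead: each slot (here one of 48
    half-hour slots per day) is predicted by the mean of that slot's
    past values across all complete cycles in the history."""
    cycles = len(history) // period
    assert cycles >= 1, "need at least one full cycle of history"
    return [sum(history[i + k * period] for k in range(cycles)) / cycles
            for i in range(period)]

def mape(actual, forecast):
    """Mean absolute percentage error, a common call-centre accuracy metric."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual) * 100

# Two identical synthetic days: 48 half-hour call counts per day
day = [100 + 5 * i for i in range(48)]
history = day * 2
forecast = seasonal_naive(history, period=48)
print(forecast[10])         # 150.0 -- matches the repeating daily pattern
print(mape(day, forecast))  # 0.0 on this noise-free toy series
```

A production model layers trend and random components, per-call-type splits, and event data on top of this kind of seasonal structure.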

 

Speaker
Biography:

Pierre Hansen is a professor of Operations Research in the Department of Decision Sciences at HEC Montréal. His research is focused on combinatorial optimization, metaheuristics and graph theory. With Nenad Mladenovic, he developed the Variable Neighborhood Search metaheuristic, a general framework for building heuristics for a variety of combinatorial optimization and graph theory problems. Pierre Hansen received the EURO Gold Medal in 1986 as well as other prizes. He is a member of the Royal Society of Canada and the author or co-author of close to 400 scientific papers. His first paper on VNS has been cited almost 3000 times.

Abstract:

Many problems can be expressed as global or combinatorial optimization problems; however, due to the vast increase in the availability of databases, realistically sized instances cannot be solved in reasonable time. Therefore, one must often be content with approximate solutions obtained by heuristics. These heuristics can be studied systematically within some general frameworks or metaheuristics (genetic search, tabu search, simulated annealing, neural networks, ant colonies and others). Variable Neighborhood Search (VNS) proceeds by systematic change of neighborhoods, both in the descent phase towards a local minimum and in a perturbation phase to get out of the corresponding valley. VNS heuristics have been developed for many classical problems such as the TSP, quadratic assignment, p-median, and others. Instances of the latter problem with 89,600 entities in the Euclidean plane have been solved with an ex-post error not larger than 3%.
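The VNS scheme of alternating descent and perturbation can be sketched on a toy TSP instance (an illustration only, not the authors' implementation): neighbourhood N_k shakes the incumbent with k random swaps, a simple swap descent follows, and the search returns to N_1 on every improvement.

```python
import random

def tour_len(tour, pts):
    """Length of the closed tour through the indexed points."""
    return sum(((pts[tour[i]][0] - pts[tour[i - 1]][0]) ** 2 +
                (pts[tour[i]][1] - pts[tour[i - 1]][1]) ** 2) ** 0.5
               for i in range(len(tour)))

def local_search(tour, pts):
    """Repeated first-improvement descent over pairwise city swaps."""
    improved = True
    while improved:
        improved = False
        for i in range(len(tour)):
            for j in range(i + 1, len(tour)):
                cand = tour[:]
                cand[i], cand[j] = cand[j], cand[i]
                if tour_len(cand, pts) < tour_len(tour, pts):
                    tour, improved = cand, True
    return tour

def shake(tour, k):
    """Neighbourhood N_k: perturb the incumbent with k random swaps."""
    t = tour[:]
    for _ in range(k):
        i, j = random.randrange(len(t)), random.randrange(len(t))
        t[i], t[j] = t[j], t[i]
    return t

def vns(pts, k_max=3, iters=30):
    best = local_search(list(range(len(pts))), pts)
    for _ in range(iters):
        k = 1
        while k <= k_max:          # systematic change of neighborhoods
            cand = local_search(shake(best, k), pts)
            if tour_len(cand, pts) < tour_len(best, pts):
                best, k = cand, 1  # success: return to the first neighbourhood
            else:
                k += 1             # failure: try a larger perturbation
    return best

random.seed(0)
pts = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1)]
best = vns(pts)
print(round(tour_len(best, pts), 3))  # optimum for this instance is 6.0 (the grid perimeter)
```

Because the incumbent is only replaced on strict improvement, VNS never returns a worse tour than its starting descent; the shake is what lets it escape the valley of a local minimum.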

In the last two decades, several discovery systems for graph theory have been proposed (Cvetkovic's Graph; Fajtlowicz's Graffiti; Caporossi and Hansen's AutoGraphiX (AGX)). AGX uses VNS to systematically find extremal graphs for many conjectures judged to be of interest. Aouchiche systematically studied relations between 20 graph invariants taken in pairs, considering the four basic operations (-, +, /, x). Conjectures were found in about 700 out of 1520 cases. A majority of these 1520 cases were easy and solved automatically by AGX. Examination of the extremal graphs found suggests open conjectures to be solved by graph-theoretical means. This has led to several tens of papers by various authors, mainly from Serbia and China.

Fred Jacquet

Product Marketing Manager | Big Data Analytics - Modern Data Platform

Title: Data is not wrong, but your use of it might be
Speaker
Biography:

Fred Jacquet has over 20 years’ experience in the IT industry, working in Evangelist, CTO and Architect roles within a variety of leading data-driven companies. His main areas of expertise are Business Intelligence, Data Integration and Big Data. He is committed to helping organizations on their mission to become successfully data-driven through evangelization, education and enablement.

Abstract:

Some companies were born in the Data Intelligence Era; others need to redefine their IT infrastructure to stay competitive in a world of improved analytics, scale, speed, production and economics. The old data warehousing and business intelligence tools helped find answers to “What happened?” but cannot answer the new questions asked by more and more forward-thinking companies. Traditional warehousing tools such as ETL, RDBMS and OLAP databases cannot provide answers to questions such as “What’s happening right now?” or “What will happen?”. The new era of analytics demands speed, scale and reduced costs from every IT team. This presentation will take you through the considerations and steps of modernizing your data warehouse to become ready for big data analytics, and will help you evaluate whether the data lake is right for you and your business needs. Don’t be one of the many companies that failed to grasp this opportunity to leapfrog their competition. After all, over 50% of the companies originally on the Fortune 500 list have vanished since 2000 because they failed to innovate and keep up with changes in the market.

Speaker
Biography:

Boris Mirkin holds PhD in Computer Science and DSc in Systems Engineering degrees from Russian universities. He has published a dozen monographs and a hundred refereed papers. In 1991-2010 he traveled extensively, taking visiting research appointments in France, the USA and Germany, and a teaching appointment at Birkbeck, University of London, UK. He develops methods for clustering and interpretation of complex data within the “data recovery” perspective. Currently these approaches are being extended to the automation of text analysis, including the use of hierarchical ontologies.

Abstract:

 

Clustering is a set of major data analysis techniques. The square-error clustering criterion underlies the most popular clustering methods, including k-means partitioning and Ward agglomeration. For k-means, the square-error criterion to be minimized is the sum of squared Euclidean distances from all the objects to their respective cluster centers/means, W(S,c), where S is the sought partition of the set of objects and c is the set of within-cluster means. The method’s popularity stems from the simplicity of computation and interpretation. Yet there is a catch: the user must specify both the number of clusters and the initial locations of cluster centers, which can sometimes be an issue. To tackle the problem, the current author proposes using the complementary criterion. It is not difficult to prove that there is a complementary criterion, B(S,c), to be maximized, such that W(S,c)+B(S,c)=T, where T is the data scatter. The complementary criterion B(S,c) is the sum of individual cluster contributions, each equal to the product of the cluster’s cardinality and the squared Euclidean distance from the cluster’s center to 0. Therefore, the complementary criterion leads to a set of anomalous clusters, which can be found either one by one or in parallel. Our experiments show that methods emerging in this perspective are competitive with, and frequently superior to, other initialization methods.
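The identity W(S,c)+B(S,c)=T is easy to verify numerically. The sketch below (a toy partition, not the author's experiments) computes all three quantities with the cluster means as centers, taking the data scatter T as the sum of squared distances of the objects to 0.

```python
def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(t * t for t in v)

def criteria(clusters):
    """Given a partition as a list of clusters (lists of points),
    return (W, B, T) for the square-error criterion with the
    cluster means as centers."""
    W = B = T = 0.0
    for cluster in clusters:
        n, d = len(cluster), len(cluster[0])
        center = [sum(p[j] for p in cluster) / n for j in range(d)]
        # W: within-cluster sum of squared distances to the center
        W += sum(sq_norm([p[j] - center[j] for j in range(d)]) for p in cluster)
        # B: cardinality times squared distance from the center to 0
        B += n * sq_norm(center)
        # T: data scatter (squared distances of objects to 0)
        T += sum(sq_norm(p) for p in cluster)
    return W, B, T

# Toy 2-D partition into two clusters
S = [[(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)],
     [(8.0, 9.0), (9.0, 8.0)]]
W, B, T = criteria(S)
print(W, B, T)
print(abs(W + B - T) < 1e-9)   # True: W(S,c) + B(S,c) = T
```

Since B rewards large clusters far from 0, maximizing it over pre-centered data is what drives the extraction of anomalous clusters.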

Speaker
Biography:

Morgan C. Wang received his Ph.D. from Iowa State University in 1991. He is the founding Director of the Data Mining Program and Professor of Statistics at the University of Central Florida. He has published one book (Integrating Results through Meta-Analytic Review Using SAS Software, SAS Institute, 1999) and over 80 papers in refereed journals and conference proceedings on topics including interval analysis, meta-analysis, computer security, business analytics, health care analytics and data mining. He is an elected member of the International Statistical Association and a member of the American Statistical Association and the International Chinese Statistical Association.

Abstract:

An automatic prediction model building system was developed. This system has five components: a data exploration component, a data preparation component, a model building component, a model validation and selection component, and an automatic result generation component. All components reside inside the data warehouse and can be used by company personnel without model-building training. A case study in which this system was used to solve a problem for an insurance firm in China will also be discussed in this presentation.
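The five-stage flow can be sketched as a chain of functions (a hypothetical minimal sketch, not the actual in-warehouse system): explore the data, prepare it, build candidate models, validate and select on a holdout, and generate the result automatically.

```python
def explore(rows):
    """Data exploration: per-column min/max summaries."""
    cols = list(zip(*[x for x, _ in rows]))
    return [{'min': min(c), 'max': max(c)} for c in cols]

def prepare(rows, summary):
    """Data preparation: min-max scale features to [0, 1]."""
    return [(tuple((v - m['min']) / (m['max'] - m['min'])
                   if m['max'] > m['min'] else 0.0
                   for v, m in zip(x, summary)), y)
            for x, y in rows]

def build_models(train):
    """Model building: two trivial candidate classifiers."""
    def majority(x):
        labels = [y for _, y in train]
        return max(set(labels), key=labels.count)
    def threshold(x):           # predict 1 when the first feature is high
        return 1 if x[0] > 0.5 else 0
    return {'majority': majority, 'threshold': threshold}

def validate_and_select(models, holdout):
    """Model validation and selection: pick the best holdout accuracy."""
    acc = {name: sum(m(x) == y for x, y in holdout) / len(holdout)
           for name, m in models.items()}
    return max(acc, key=acc.get), acc

def report(best, acc):
    """Automatic result generation."""
    return f"selected '{best}' with holdout accuracy {acc[best]:.2f}"

data = [((i, 10 - i), int(i > 5)) for i in range(10)]
scaled = prepare(data, explore(data))
best, acc = validate_and_select(build_models(scaled[:7]), scaled[7:])
print(report(best, acc))
```

Keeping every stage as a function over the same row format is what lets such a pipeline run end to end inside the warehouse without manual model-building steps.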