Scientific Program

Conference Series Ltd invites all participants from across the globe to attend the 2nd International Conference on Big Data Analysis and Data Mining in San Antonio, USA.

Day 1:

  • Data Mining Methods and Algorithms

Session Introduction

Pilar Rey del Castillo

Instituto de Estudios Fiscales, Spain

Title: Big Data for Official Statistics
Speaker
Biography:

Pilar Rey del Castillo is a statistician with 35 years of experience. She holds a degree in Mathematics from the Autonomous University of Madrid, a master's degree in Time Series from the Bank of Spain and a PhD in Computer Science and Artificial Intelligence from the Technical University of Madrid. A Senior Statistician of the Spanish administration, she has performed different functions in the Spanish National Statistical Institute, the Spanish Sociological Research Center and, currently, the Spanish Fiscal Studies Institute. She also worked as an assistant professor at Carlos III University from 1994 to 1998 and, from 2011 until 2015, as an expert at Eurostat, European Commission.

Abstract:

The availability of copious data about many human, social and economic phenomena is nowadays seen as an opportunity for the production of official statistics. National statistical organizations and other institutions are increasingly involved in new projects to develop what is sometimes seen as a possible change of paradigm in the way statistical figures are produced. Nevertheless, there are hardly any production systems using Big Data sources. Issues of confidentiality, data ownership, representativeness and others make it difficult to obtain results in the short term. Using Call Detail Records from Ivory Coast as an illustration, this paper shows some of the issues that must be dealt with when producing statistical indicators from Big Data sources. A specific method to evaluate quality when using the data to compute figures of daily commutes between home and work is also proposed.

Abdulmohsen Algarni

King Khalid University, Saudi Arabia

Title: Selecting Training Documents for Better Learning
Speaker
Biography:

Abdulmohsen Algarni received his PhD from the Faculty of Information Technology at Queensland University of Technology, Brisbane, Australia in 2011. He is currently an assistant professor in the Department of Computer Science, King Khalid University. His research interests include web intelligence, data mining, text intelligence, information retrieval and information systems.

Abstract:

In general, there are two types of feedback documents: positive feedback documents and negative feedback documents. Term-based approaches can extract many features from text documents, but most include noise. It is clear that all feedback documents contain some noisy knowledge that affects the quality of the extracted features, and the amount of noise differs from one document to another. Therefore, reducing noisy data in the training documents helps to reduce noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noisy data than useful data) can help to improve the effectiveness of a classifier. Based on that observation, we found that short documents are more important than long documents. Testing this idea, we found that exploiting the advantages of short training documents to improve the quality of extracted features gives promising results. Moreover, we found that not all training documents are useful for training the classifier.

Speaker
Biography:

Azamat Kibekbaev received his B.S. (2011) and M.S. (2013) in Industrial Engineering from Fatih University, and is currently pursuing his Ph.D. (begun 2013) in Industrial Engineering at Özyeğin University. He is particularly interested in data mining applications in banking and healthcare analytics.

Abstract:

This paper aims to predict the incomes of bank customers. In this large-scale income prediction benchmarking study, we examine the performance of various state-of-the-art regression algorithms (e.g. ordinary least squares regression, beta regression, robust regression, ridge regression, MARS, ANN, LS-SVM and CART, as well as two-stage models that combine multiple techniques) applied to five real-life datasets. A total of 16 techniques are compared using 10 different performance measures, such as R², hit rate and preciseness. We find that traditional linear regression performs comparably to more sophisticated non-linear and two-stage models. The experiments also indicate that many regression techniques (such as MARS, M5P, ANN and LS-SVM) yield performances that are quite competitive with each other.
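
Two of the performance measures the abstract names can be illustrated with a minimal sketch (not the paper's pipeline): fit an ordinary least squares baseline on invented toy data and score it with R² and a "hit rate", here taken as the fraction of predictions within a relative tolerance of the true value.

```python
# Minimal illustrative sketch, assuming toy data: one-feature least squares,
# scored with R-squared and a tolerance-based hit rate.

def ols_fit(xs, ys):
    """Simple one-feature least squares: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def hit_rate(ys, preds, tol=0.10):
    """Fraction of predictions within `tol` relative error of the true value."""
    hits = sum(abs(p - y) <= tol * abs(y) for y, p in zip(ys, preds))
    return hits / len(ys)

# Toy data: "income" roughly linear in a single customer feature, plus noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.3, 13.9, 16.1]
slope, intercept = ols_fit(xs, ys)
preds = [slope * x + intercept for x in xs]
print(round(r_squared(ys, preds), 3), round(hit_rate(ys, preds), 3))
```

The same scoring functions apply unchanged to any of the 16 techniques, which is what makes a common benchmark across models possible.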

Speaker
Biography:

Wan Sik Nam was born in Seoul, Republic of Korea. He is currently pursuing a B.S. degree at the School of Industrial Management Engineering, Korea University, and works for Samsung Electronics as a semiconductor manufacturing engineer. His research interests include yield prediction models for semiconductor products that predict wafer yield using variables generated by virtual metrology in semiconductor manufacturing.


Biography:

Arvind Pandiyan is currently pursuing his MS in Computer Science at UT Dallas and graduated from PES Institute of Technology in 2014. His research interests include Data Mining, Machine Learning and Big Data Analysis.

Abstract:

Dynamic Time Warping (DTW) is one of the prevailing distance measures used for time series, though it is computationally costly. DTW provides an optimal alignment between two time series, exploiting the similarity between them. In this paper, we present techniques that can be employed to improve similarity search in irregular time series data. We identify the drawbacks of the classical approach of converting an irregular time series to a regular one before the similarity search, and implement appropriate solutions to overcome them. Simulations with real and synthetic data sets show that the proposed techniques perform well on irregular time series data sets.
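
The classical DTW the abstract builds on can be sketched as a dynamic program; its O(n·m) cost matrix is the source of the computational expense mentioned above. The series below are invented for illustration.

```python
# Classical Dynamic Time Warping via dynamic programming (textbook form,
# not the paper's improved method).

def dtw_distance(a, b):
    """DTW distance between two numeric sequences (may differ in length)."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Two similar series sampled at different rates: DTW stays small, while a
# point-by-point comparison would not even be defined for unequal lengths.
s1 = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
s2 = [0.0, 1.0, 3.0, 1.0, 0.0]
print(dtw_distance(s1, s2))
```

The ability to align sequences of unequal length is also why DTW is a natural starting point for irregular time series, where samples are unevenly spaced.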

Speaker
Biography:

Dr. Bhabani Shankar Prasad Mishra has been working as an Associate Professor in the School of Computer Engineering at KIIT University, Bhubaneswar, Odisha since 2006. He received his B.Tech in Computer Science in 2003 with honours and distinction, and completed his M.Tech in 2005, receiving gold and silver medals from the university. He received his PhD degree in Computer Science from F.M. University, Balasore, Odisha in 2011, and completed his postdoctoral research in the Soft Computing Laboratory, Yonsei University, Seoul, South Korea under the Technology Research Program for Brain Science through the National Research Foundation, Ministry of Education, Science and Technology, South Korea. His research interests include Evolutionary Computation, Neural Networks, Pattern Recognition, Data Warehousing and Data Mining, and Big Data. He has published about 30 research papers in refereed journals and conferences, has authored one book and edited two others. He also serves as an editorial member of various journals.

Abstract:

Many optimization problems in the real world are multi-objective in nature, and the Non-dominated Sorting Genetic Algorithm (NSGA-II) is commonly used as a problem-solving tool. However, multi-objective problems with non-convex and discrete Pareto fronts can take enormous computation time to converge to the true Pareto front; hence the classical (non-parallel) NSGA-II may fail to solve them in a tolerable amount of time. In this context, we argue that parallel processing techniques are a suitable tool to overcome this difficulty. In this paper we study three different models, i.e., trigger, island and cone separation, to parallelize NSGA-II for the multi-objective 0/1 knapsack problem. Further, we emphasize two factors that scale the parallelism: convergence and time. The experimental results confirm that the cone separation model shows a clear edge over the trigger and island models in terms of processing time and approximation to the true Pareto front.
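
The building block all three parallel models share is non-dominated sorting: extracting the Pareto front from a population of candidate solutions. A minimal sketch for a bi-objective 0/1 knapsack (both objectives maximized, with invented objective vectors) looks like this; it illustrates Pareto dominance only, not the paper's parallel code.

```python
# Non-dominated (Pareto) front extraction, the core comparison step of NSGA-II.

def dominates(p, q):
    """p dominates q if p >= q in every objective and > in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Objective vectors (profit_1, profit_2) of candidate knapsack packings
# (values invented for illustration).
candidates = [(10, 2), (8, 8), (2, 11), (7, 7), (10, 1), (3, 3)]
front = pareto_front(candidates)
print(sorted(front))
```

The cone separation model the abstract favors partitions the objective space into angular cones so that each processor evolves its own region of this front; the dominance test itself is unchanged.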

  • Data Mining Tasks and Processes
Location: USA
Speaker
Biography:

Dr. Podili V. S. Srinivas is a Professor in the Department of Computer Science & Engineering at Gokaraju Rangaraju Institute of Engineering & Technology, Hyderabad, Telangana. He obtained his Ph.D in Computer Science and Engineering in the area of Computer Networks from JNTUH, Hyderabad in 2009, his M.Tech from JNTUH, Hyderabad in 2003 and his graduation from the Institution of Engineers (India) in 1990. He has 23 years of experience in total, 2 years in industry and 21 years in academia. Professor Srinivas has published 72 research papers in refereed international journals and conferences in India and abroad. His areas of interest include Computer Networks, Cloud Computing, Big Data and IoT. He has delivered many tutorial and invited talks and presented research papers at many international conferences and workshops.


  • Data Mining Tools and Software
Location: USA
Speaker
Biography:

Dike O.A. completed his M.Sc in Statistics at the age of 39 at Abia State University, Uturu, where he is also a doctoral student in Statistics. He is the Head of the Department of Mathematics/Statistics at Akanu Ibiam Federal Polytechnic, Unwana, Nigeria. He has published more than 10 papers in reputed journals, serves as a reviewer for the Central Bank of Nigeria (CBN) Journal of Applied Statistics and is a member of the Editorial Board of the School of Science Journal.

Abstract:

In this paper, we studied the effect of the square root transformation on a Gamma-distributed error component of a multiplicative error model with mean 1.0, with a view to establishing the condition for a successful transformation. The probability density function (pdf) and the first and second moments of the square-root-transformed error component (e_t*) were established. From the results of the study, it was found that the square-root-transformed error component is normal with unit mean and variance approximately 1/4 times that of the original error (e_t) before transformation, except when the shape parameter is equal to one. Furthermore, the Anderson-Darling test for normality on the simulated error terms confirmed normality for e_t* at (P < 0.05). This shows that the square root transformation normalizes a non-normal Gamma-distributed error component. Finally, numerical illustrations were used to support the established results. Thus, a successful square root transformation is achieved when (1/4)σ² < 1.0, which implies σ² < 4.
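
The variance claim can be checked by simulation: for a Gamma error e_t with mean 1 and variance σ², the square-root-transformed error should have mean close to 1 and variance close to σ²/4. The shape value below is illustrative only, not taken from the paper.

```python
# Monte Carlo sketch of the stated moment result for sqrt-transformed
# Gamma errors (illustrative shape parameter, not the paper's).
import math
import random

random.seed(42)
shape = 9.0                 # Gamma(shape, scale) with scale = 1/shape ...
scale = 1.0 / shape         # ... has mean 1 and variance 1/shape
sigma2 = 1.0 / shape        # variance of the untransformed error e_t

errors = [random.gammavariate(shape, scale) for _ in range(200_000)]
roots = [math.sqrt(e) for e in errors]   # e_t* = sqrt(e_t)

mean_r = sum(roots) / len(roots)
var_r = sum((r - mean_r) ** 2 for r in roots) / len(roots)
# mean_r should be near 1, var_r near sigma2 / 4
print(round(mean_r, 3), round(var_r, 4), round(sigma2 / 4, 4))
```

This matches the first-order Taylor expansion sqrt(x) ≈ 1 + (x - 1)/2 around the mean 1, under which Var(e_t*) ≈ Var(e_t)/4.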

Luís Sousa

University of Porto, Portugal

Title: Models for the prediction of rockburst indexes
Speaker
Biography:

Prof. Sousa has more than 40 years of engineering experience and extensive international experience on a wide range of projects. He is a Full Professor at the University of Porto in Portugal and is multilingual. He has authored or co-authored over 20 books and hundreds of journal articles, presentations and reports. He was President of SKEC Engineering Consulting, is a consultant for the Laboratory of Deep Underground Engineering, Beijing, and a consulting engineer in Switzerland, China, Oman and Portugal. He is now a professor at China University of Mining and Technology, Beijing, and Sichuan University, Chengdu, China.

Abstract:

In underground engineering, rockburst is characterized by the violent explosion of a rock block causing a sudden rupture in the rock; it is quite common at great depths and is responsible for many accidents worldwide every year. It is critical to understand the phenomenon of rockburst, focusing on the patterns of occurrence, so these events can be avoided and/or managed, saving costs and possibly lives. The failure mechanism of rockburst needs to be better understood. Laboratory experiments are under way at the Laboratory for Geomechanics and Deep Underground Engineering of Beijing. A large number of rockburst tests were performed and their information collected, stored in a database and analyzed. Data mining techniques (Multiple Regression, Artificial Neural Networks and Support Vector Machines) were applied to the database in order to develop predictive models for the rockburst maximum stress (σRB) and the rockburst risk index (IRB), two indexes that are very important in rockburst prediction and characterization. The database comprised 139 laboratory rockburst tests. The results for σRB emphasized the importance of the uniaxial compressive strength of the rock and the horizontal in situ stresses. All the developed models presented excellent results; however, the model based on the Support Vector Machines algorithm performed best. The models developed for IRB presented excellent results when the Artificial Neural Network algorithm was used. With the developed models it is possible to predict these parameters with high accuracy using data from the rock mass and the specific project.

Speaker
Biography:

Ogunjobi Olivia Abiola is a Senior Business Analyst with the Dangote Group and a B.Sc Statistics graduate from the University of Ilorin, Kwara State, Nigeria. She is currently working on the business strategy of a new project, Innovation (a new brand), and has been with the Dangote Group since 2009. She possesses excellent numeric skills, the ability to multitask, and excellent communication and interpersonal skills; she is innovative, target-oriented, articulate and very effective working with people of different backgrounds and temperaments. She is very hard-working and delivers on the job within timelines.

Abstract:

Data mining tools are software components and theories that allow users to extract information from data. They provide individuals and companies with the ability to gather large amounts of data and use it to make determinations about a particular user or group of users. Some of the most common uses of data mining tools are in marketing, sales, fraud protection and surveillance. The manual extraction of data has existed for hundreds of years, but the automation of data mining has become prevalent since the dawn of the computer age, with various computer sciences emerging during the 20th century to support the development of data mining tools. The overall goal of these tools is to uncover hidden patterns. For example, if a marketing company finds that a person takes a monthly trip from New York City to Los Angeles, it becomes beneficial for that company to advertise details of the destination to that individual. Data mining is a process that analyzes large amounts of data to find new and hidden information that improves business efficiency; it is used to gain competitive advantage, to help the business grow and to predict market trends. It is used to analyse shopping patterns within stores based on POS (Point of Sale) information, answering questions such as how much a customer is likely to spend over a period of time and how frequently customers purchase, and to identify the best type and most effective means of advertisement for a product. It also improves decision-making processes, leading to better inventory management and financial forecasting, helps to determine the business trend of a company, supports planning, budgeting and forecasting, and overall enhances business growth and profitability.
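
The shopping-pattern analysis described above is classically done by mining frequent itemsets from POS transactions. A minimal sketch of the first Apriori-style step, counting frequent item pairs, is below; the transactions and the support threshold are invented for illustration.

```python
# Count co-occurring item pairs across POS baskets and keep the frequent ones:
# the first step of an Apriori-style market-basket analysis.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
]

min_support = 3  # a pair must appear in at least 3 of the 5 baskets
pair_counts = Counter()
for basket in transactions:
    # sort so each pair has one canonical key regardless of basket order
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)
```

Frequent pairs like these feed directly into the business questions listed above, e.g. which products to advertise or shelve together.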

Speaker
Biography:

Dr. Ahmed AL-Masri received his Ph.D degree in the field of Artificial Intelligence applications from University Putra Malaysia. He has more than 6 years' experience in teaching, programming and research, and has been involved in many projects using artificial neural networks, such as forecasting, online monitoring, smart grids, security assessment in electrical power systems and dynamic system stability. He acts as a reviewer for various international and national journals and is a member of the Institute of Electrical and Electronics Engineers (IEEE). His professional expertise lies in the design and analysis of artificial intelligence systems, security assessment, parallel processing, virtualization, cloud computing and system automation.

Abstract:

Big Data analytics is one of the great challenges for machine learning (ML) algorithms, as most real-life applications involve a massive knowledge base of big data. At the same time, an artificial intelligence system with a data knowledge base should be able to compute results accurately and quickly. This paper focuses on the challenges and solutions of using ML with Big Data. Data processing is a mandatory step to transform unstructured Big Data into a meaningful and optimized data set for any ML module, and an optimized data set is necessary to support distributed processing and real-time applications. This work also reviews the technologies currently used for Big Data analysis and ML computation. The review emphasizes that choosing an appropriate solution for a given application can increase the performance of ML. New developments, especially in cloud computing and data transfer speeds, give further advantages to the practical use of artificial intelligence applications.

Speaker
Biography:

Dominik Ślęzak received his Ph.D. in 2002 from the University of Warsaw and his D.Sc. in 2011 from the Polish Academy of Sciences. In 2005 he co-founded Infobright Inc., where he holds the position of chief scientist. He is also an associate professor at the Institute of Mathematics, University of Warsaw. He has delivered invited talks at over 20 international conferences, is co-author of over 150 papers and co-inventor of 5 granted US patents, and serves as associate editor for several scientific journals. In 2014 he served as general program chair of the IEEE/WIC/ACM Web Intelligence Congress, and in 2012-2014 as president of the International Rough Set Society.

Abstract:

We outline the current development of the Knowledge Pit platform (knowledgepit.fedcsis.org), aimed at the organization of online data mining competitions. We summarize the competitions held so far and those planned for the near future. We discuss how assumptions about the characteristics of complex classification problems and their modern solutions have affected the architecture of our platform, with respect to the size and dimensionality of the considered data sets, comparative evaluation of submitted classifiers and their ensembles, and the final practical utilization of the best submitted solutions. As case studies, we investigate data-mining-related challenges emerging in our current research projects concerning risk management in coal mines and fire & rescue actions.

  • Data Warehousing
Location: USA