Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 4th International Conference on Big Data Analysis and Data Mining in Paris, France.

Submit your Abstract
or e-mail it to:

[email protected]
[email protected]
[email protected]

Day 1:

Keynote Forum

Michael Valivullah

NASS, USA
Biography:

Michael Valivullah is currently serving as the Chief Technology Officer (CTO) at US Department of Agriculture’s (USDA) National Agricultural Statistics Service (NASS). He served as Director of Information Technology (IT) and Chief Information Officer (CIO) at NASS and the Directorate of Science and Technology in the US Department of Homeland Security (DHS). He is a member of the Senior Executive Service (SES) in the US Federal Government. He has worked for 14 years in the public sector and over 15 years in the private sector and non-profits.

Abstract:

The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) is on a mission to provide timely, accurate and useful statistics on US agriculture. Precision agriculture (PA) is a farming management principle that measures and responds to variability in crop conditions and animal health using sensors, robots, satellites and global positioning systems (GPS). The advent of precision agriculture has provided agricultural producers an unprecedented amount of data for use in data mining and big data analytics in farming operations. As more and more farmers use automatic and remote sensing tools to collect data in order to be more productive, efficient and profitable, it is in the best interest of NASS to collect data from these sources (sensors, agribots, farm data hubs, drones, etc.) and not burden farm producers by asking for the same data in NASS surveys. Automatic (machine-to-machine) data collection will also eliminate manual data entry errors. NASS needs to develop new survey and data collection processes, along with algorithms to validate, analyze and process these data. Data analysis will require data scientists familiar with sophisticated algorithms, artificial intelligence, decision models, predictive analytics, etc. NASS also needs data dissemination tools with sophisticated big data processing capabilities and data visualization abilities. This presentation will discuss opportunities and challenges in dealing with big data in precision agriculture.

Keynote Forum

En-Bing Lin

Central Michigan University, USA

Keynote: Big data analysis and approximations in information systems
Biography:

En-Bing Lin is a Professor of Mathematics at Central Michigan University, USA. He has been associated with several institutions, including Massachusetts Institute of Technology, University of Wisconsin-Milwaukee, University of California, Riverside, University of Toledo, UCLA, and University of Illinois at Chicago. He received his PhD from Johns Hopkins University. His research interests include Data Analysis, Applied and Computational Mathematics, and Mathematical Physics. He has supervised a number of graduate and undergraduate students. He serves on the editorial boards of several journals and has organized many special sessions at regional IEEE conferences.

Abstract:

With the increasing use of advanced technology, the amount of data in our world has been exploding. Big data analytics can examine large data sets and uncover hidden patterns. On the other hand, poor quality of big data results in inaccurate insights or compliance failures that give rise to partially complete information systems. In order to obtain complete information systems, we use Rough Set Theory (RST), which was introduced by Pawlak in 1982 as a way to deal with data analysis based on approximation methods in information systems. The theory has many applications in a number of different areas, such as engineering, environment, banking, medicine, bioinformatics, pattern recognition, data mining, machine learning and others. RST is intrinsically a study of equivalence relations on the universe (a set of objects). In fact, rough sets can be used to represent ambiguity, vagueness and general uncertainty. Given some relations between objects in the set, we can construct lower and upper approximations of the objects. We intend to use some advanced computing methods to determine lower and upper approximations and find several properties of the characteristics of objects within RST, as well as to extend RST to generalized RST. This line of research has to do with some developments in big data analytics. Traditional algorithms cannot satisfy the needs of big data computing. In this presentation, we will show some advanced computing methods that can solve our problems effectively. We will also present several examples to illustrate the concepts introduced in this presentation.
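
As a concrete illustration of the lower and upper approximations mentioned above, here is a minimal sketch in Python (a generic textbook construction with an invented toy information system, not the speaker's code): objects are grouped into equivalence classes by their values on the chosen attributes, and a target concept X is then approximated from below and from above.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """objects: dict mapping object id -> dict of attribute values.
    attributes: attributes that define the indiscernibility relation.
    target: set of object ids (the concept X to approximate)."""
    # Equivalence classes: objects indiscernible on the chosen attributes.
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(oid)

    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:      # class lies entirely inside X
            lower |= eq_class
        if eq_class & target:       # class intersects X
            upper |= eq_class
    return lower, upper             # boundary region = upper - lower

# Toy information system: objects 1, 2 and 4 are indiscernible on {a, b}.
objs = {1: {"a": 0, "b": 1}, 2: {"a": 0, "b": 1},
        3: {"a": 1, "b": 0}, 4: {"a": 0, "b": 1}}
print(approximations(objs, ["a", "b"], {1, 2}))   # (set(), {1, 2, 4})
```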

Keynote Forum

Petra Perner

Institute of Computer Vision and Applied Computer Sciences, Germany

Keynote: Maintenance of Engineering Systems by Big Data

Time: 10:00-10:30

Biography:

Petra Perner (IAPR Fellow) is the Director of the Institute of Computer Vision and Applied Computer Sciences IBaI. She received her Diploma degree in electrical engineering and her PhD degree in computer science for the work on “Data Reduction Methods for Industrial Robots with Direct Teach-in Programming”. Her habilitation thesis was about “A Methodology for the Development of Knowledge-Based Image-Interpretation Systems”. She has been the principal investigator of various national and international research projects. She has received several research awards for her research work and three business awards for bringing intelligent image interpretation methods and data mining methods into business. Her research interests are image analysis and interpretation, machine learning, data mining, big data, image mining and case-based reasoning.

Abstract:

The ubiquitous availability of high-quality data gathered by European industry allows manufacturing processes to be optimized even further and helps companies stay competitive. However, while the data are rich enough to include the elements needed for optimization, their ever-increasing volume, velocity and variety make mining them effectively increasingly difficult. The paper addresses the special challenges in developing scalable algorithms and infrastructures for creating responsive analytical capabilities that produce timely predictions and monitoring alerts in industrial environments. We will describe a platform that can handle the special needs of these data and offers a rich enough set of data mining techniques. Case-based reasoning is used to combine streaming data of different types (sensor data, time series, maintenance logs, etc.) as well. Special time series algorithms will be developed to allow efficient analysis of the machine data. The platform will be deployed and validated in three industrial cases where data-driven maintenance is expected to have a significant impact: high-tech medical equipment, high-tech manufacturing of hard disks and structural health monitoring.

Keynote Forum

Fionn Murtagh
Biography:

Fionn Murtagh is Professor of Data Science and has been Professor of Computer Science, including Department Head, at many universities. Fionn was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years and is an Editorial Board member of many journals. With over 300 refereed articles and 30 books authored or edited, his fellowships and memberships of scholarly academies include: Fellow of the British Computer Society (FBCS), the Institute of Mathematics and Its Applications (FIMA), the International Association for Pattern Recognition (FIAPR), the Royal Statistical Society (FRSS) and the Royal Society of Arts (FRSA); Elected Member of the Royal Irish Academy (MRIA) and Academia Europaea (MAE); and Senior Member of the IEEE.

Abstract:

The benefits and also the challenges of Big Data analytics can be addressed in innovative ways. It is known that analytical focus is important. Consider, just as an analogy for our analytics, how a microscope or a telescope brings about observation and measurement at very fine scales and at very gross scales; we can take that analogy as being associated with the resolution scale of our analysis. Another challenge is the bias in Big Data, but we may calibrate our analytical process with a Big Data framework or infrastructure. A further challenge, of an ethical nature, is how representativity replaces the individual. So we want "to rehabilitate the individual". Important opportunities arise from contextualization. That can be associated with the resolution scale of our analytics, and it can also be supported by full account being taken of appropriate contexts. The innovation that stems from the different facets of our analytical procedures can be of great benefit. Here we seek to discuss many such themes, always in the context of interesting and important case studies. The main case studies here include the following: analytics of mental health and associated well-being; social media analytics based on Twitter; and questionnaire and survey analytics with many respondents. Ultimately what is sought is not just scalability alone, but also new and insightful, revealing and rewarding perspectives, returns and benefits. A book of ours, Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics, is to be published in April 2017.

  • Plenary Session
Speaker
Biography:

Karmen Kern Pipan started her career in the private sector in the field of informatics and telecommunications and later continued in public administration, holding different positions at the Metrology Institute of the Republic of Slovenia as Quality Manager and Head of the Department for Quality and Business Excellence. Currently, she works as Secretary and Project Manager in the IT Directorate of the Ministry of Public Administration of the Republic of Slovenia. As an expert, she is involved in the fields of Data Management, Business Intelligence and Big Data Analytics to improve data-based decision making in the Slovenian public administration. She collaborates extensively with successful Slovenian companies and the international professional community to identify good practices in this field. She leads an inter-ministerial task force for the preparation of the Public Administration Development Strategy 2015-2020. She has published several papers at reputed national and international conferences and in journals.

Abstract:

The pilot project, Big data analysis for HR efficiency improvement, has been established as part of a development-oriented strategy supporting ICT as an enabler of a data-driven public administration in the Republic of Slovenia. It has been run within the Ministry of Public Administration of the Republic of Slovenia in collaboration with EMC Dell as an external partner. The pilot project was launched with the aim of learning what a big data tool installed on the Slovenian State Cloud Infrastructure could enable in terms of researching the HR data of our ministry to improve our efficiency. Anonymized internal data sources covering time management, the HR database, the finance database and public procurement were therefore combined with external resources, using employees' postal codes and weather data, to identify potential for improvement and possible patterns of behavior. The results showed that there is considerable potential for improvement in the field of HR and for lowering costs in the field of public procurement within our ministry.

Speaker
Biography:

Boris Mirkin holds a PhD in Computer Science and a DSc in Systems Engineering from Russian universities. He has published a dozen monographs and a hundred refereed papers. In 1991-2010 he traveled extensively, taking visiting research appointments in France, the USA and Germany, and a teaching appointment at Birkbeck, University of London, UK. He develops methods for clustering and interpretation of complex data within the “data recovery” perspective. Currently these approaches are being extended to the automation of text analysis, including the use of hierarchical ontologies.

Abstract:

 

Clustering is a set of major data analysis techniques. The square-error clustering criterion underlies the most popular clustering methods, including k-means partitioning and Ward agglomeration. For k-means, the square-error criterion to be minimized is the sum of squared Euclidean distances from all the objects to their respective cluster centers/means, W(S,c), where S is the sought partition of the set of objects and c is the set of within-cluster means. The method’s popularity stems from the simplicity of computation and interpretation. Yet there is a catch: the user has to specify both the number of clusters and the initial locations of cluster centers, which can sometimes be an issue. To tackle the problem, the current author proposes using the complementary criterion. It is not difficult to prove that there is a complementary criterion, B(S,c), to be maximized, such that W(S,c)+B(S,c)=T, where T is the data scatter. The complementary criterion B(S,c) is the sum of individual cluster contributions, each equal to the product of the cluster’s cardinality and the squared Euclidean distance from the cluster’s center to 0. Therefore, the complementary criterion leads to a set of anomalous clusters, which can be found either one-by-one or in parallel. Our experiments show that methods emerging in this perspective are competitive with, and frequently superior to, other initialization methods.
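
The stated identity can be checked numerically. The short sketch below (my illustration with random data and an arbitrary three-cluster partition, not the speaker's code) verifies that W(S,c) + B(S,c) = T, where T is the data scatter, i.e. the sum of squared norms of all objects.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # 200 objects, 5 features
labels = np.arange(200) % 3          # an arbitrary partition S into 3 clusters

W = B = 0.0
for k in range(3):
    Xk = X[labels == k]
    ck = Xk.mean(axis=0)             # within-cluster mean
    W += ((Xk - ck) ** 2).sum()      # within-cluster squared distances
    B += len(Xk) * (ck ** 2).sum()   # |cluster| * squared distance of mean to 0

T = (X ** 2).sum()                   # data scatter
print(np.isclose(W + B, T))          # True: maximizing B minimizes W
```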

Speaker
Biography:

Kazumi Nakamatsu received his Dr. Sc. from Kyushu University, Japan. He is a Professor at the University of Hyogo, Japan. He has contributed over 150 journal/conference papers and book chapters, and edited/authored 12 books published by prominent publishers. He has chaired various international conferences/workshops and has been a program committee member/chair of academic conferences. He serves as Editor-in-Chief of the International Journal of Reasoning-based Intelligent Systems and as an editorial member/associate editor of many international journals. He has given numerous invited lectures at conferences and academic organizations and has received several conference/best paper awards at international conferences.

Abstract:

Paraconsistent logic is a well-known formal logic that can deal with contradiction consistently within the framework of a logical system. One paraconsistent logic, called annotated logic, was proposed by Prof. Newton da Costa et al., and its logic program was later developed by Prof. V. S. Subrahmanian et al. as a tool for dealing with data in knowledge bases. Some years later, a kind of paraconsistent annotated logic program for dealing with non-monotonic reasoning, such as default reasoning, was developed by Kazumi Nakamatsu. More recently, a paraconsistent annotated logic program called Extended Vector Annotated Logic Program with Strong Negation (abbr. EVALPSN), which can deal with conflict resolution, defeasible deontic reasoning, plausible reasoning, etc., has been developed and has already been applied to various intelligent control and safety verification systems such as pipeline valve control, traffic signal control and railway interlocking safety verification. Furthermore, most recently, a specific version of EVALPSN called Before-after EVALPSN (abbr. Bf-EVALPSN), which can deal with before-after relations between processes (time intervals), has been developed.

In this lecture, I introduce, with a small example, how EVALPSN and Bf-EVALPSN deal with contradictory data and how they can be applied to intelligent control or safety verification of sensed data.

Speaker
Biography:

Pavel Kisilev serves as the CTO of Artificial Intelligence at the Huawei Research Center in Israel. He received his PhD in Electrical Engineering from the Technion, Israel Institute of Technology, in 2002. Before joining Huawei, he was a Lead Scientist at IBM Research from 2011-2016, a Senior Research Scientist at HP Labs from 2003-2011, and a Research Associate at the Technion. His research interests include computer vision, deep learning, general statistical methods, and inverse problems. He is the author of over 50 filed patents, three book chapters, and nearly 50 papers in top journals and conferences in computer science.

Abstract:

Deep Learning (DL) has become the method of choice in many fields of computer science, including computer vision, Natural Language Processing (NLP), autonomous systems, and many others. While DL methods are known for their superior performance, a large number of training examples is required to achieve it. Furthermore, the quality of the training examples largely affects the performance of DL training and the quality of the learnt model. In particular, if the training examples represent the real-world phenomena well, a good model can be expected to be learnt. If, on the other hand, the examples are highly correlated and represent only sparse knowledge about the phenomena, the learnt model quality will be low. In this talk, we present several general principles and methods to diagnose data quality, as well as the suitability of a DL architecture to model the data at hand. We also propose several methods to pre-process raw data to better suit the requirements of DL systems. We show several examples of applications of our framework to various datasets, including known large image datasets with millions of images, binary sequence sets, gene datasets, and others. We show the efficacy of the proposed methods in analyzing and predicting the performance of DL methods on given data.

Bennett B. Borden

Drinker Biddle & Reath, USA

Title: Predicting Corporate Misconduct
Speaker
Biography:

Bennett B Borden is a Chief Data Scientist at Drinker Biddle & Reath. He is a globally recognized authority on the legal, technology and policy implications of information. His ground-breaking research into the use of machine-based learning and unstructured data for organizational insight is now being put to work in data-driven early warning systems for clients to detect and prevent corporate fraud and other misconduct. He received his Master of Science in Data Analytics at New York University and his JD at Georgetown University.

Abstract:

Preventing, detecting and dealing with the consequences of corporate misconduct costs $3 trillion worldwide every year. If we can predict when someone will purchase a product, click on an ad, or how they will vote for a candidate, why can’t we predict when he or she will engage in some form of fraud or other misconduct? Well, perhaps we can. In this session, Chief Data Scientist Bennett Borden, from the law firm Drinker Biddle & Reath, will present his work on developing algorithms to predict corporate misconduct, how this technology is being used today and how it will likely be used in the future.

Nikolaos Freris

New York University Abu Dhabi, UAE

Title: Exact data mining from inexact data
Speaker
Biography:

Nick Freris is an Assistant Professor of Electrical and Computer Engineering (ECE) and Director of the Cyber-Physical Systems Laboratory (CPSLab) at New York University Abu Dhabi. He received his Diploma in ECE from the National Technical University of Athens in 2005, his MS in ECE in 2007, his MS in Mathematics in 2008, and his PhD in ECE from the University of Illinois at Urbana-Champaign. His work was recognized with the 2014 IBM High Value Patent award. He is a senior member of the IEEE and a member of SIAM and ACM.

Abstract:

Big data pertain to multiple facets of modern science and technology, including biology, physics, social networks, financial analysis, smart cities and many more. Despite the overwhelming amount of accessible data and the abundance of mining schemes, data mining faces a key challenge in that the data are hardly ever available in their original form. Common operations such as compression, anonymization and right protection may significantly affect the accuracy of the mining outcome. We will discuss the fundamental balance between data transformation and data utility under prevalent mining operations such as search, k-nearest neighbors and clustering. Specifically, we will illustrate classes of data transformation and information extraction methods for which it is actually feasible to acquire the exact mining outcome even when operating on the transformed domain. This talk will feature three specific problems: optimal distance estimation for compressed data series; nearest-neighbor-preserving watermarking; and cluster-preserving compression. We provide provable guarantees of mining preservation, and further highlight the efficacy and efficiency of our proposed methods on a multitude of datasets: weblogs, VLSI images, stock prices, videos, and images from anthropology, natural sciences and handwriting.
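
To make the flavor of this concrete, here is an illustrative sketch (my own construction under simple assumptions, not the speaker's algorithms): each series is stored as the first k coefficients of an orthonormal transform plus the energy of the discarded tail, and the true Euclidean distance between two series is then bracketed by bounds computed from the compressed representations alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 256, 16
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # an orthonormal basis

def compress(x):
    c = Q.T @ x                                # orthonormal transform
    return c[:k], np.linalg.norm(c[k:])        # kept coefficients + tail energy

def distance_bounds(cx, ex, cy, ey):
    head = np.linalg.norm(cx - cy)             # distance on kept coefficients
    lower = head                               # Parseval: the tail only adds energy
    upper = np.sqrt(head**2 + (ex + ey)**2)    # triangle inequality on the tails
    return lower, upper

x, y = rng.normal(size=n), rng.normal(size=n)
lo, hi = distance_bounds(*compress(x), *compress(y))
print(lo <= np.linalg.norm(x - y) <= hi)       # True
```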

  • Data Mining Applications in Science, Engineering, Healthcare and Medicine | Data Mining Methods and Algorithms | Artificial Intelligence | Big Data Applications | Big Data Algorithm | Data Privacy and Ethics | Data Mining Analysis | Business Analytics | Optimization and Big Data | New visualization techniques | Clustering
Speaker

Chair

Michael Valivullah

NASS, USA

Speaker

Co-Chair

Petra Perner

Institute of Computer Vision and Applied Computer Sciences, Germany

Session Introduction

Tommi Kärkkäinen

University of Jyväskylä, Finland

Title: Scalable robust clustering method for large and sparse data

Time: 15:00-15:20

Speaker
Biography:

Tommi Kärkkäinen completed his PhD at the University of Jyväskylä in 1995 and has worked as a Full Professor in the Faculty of Information Technology since 2002. He has served, and continues to serve, in many administrative positions of responsibility at the faculty and university levels. He has published over 150 research papers, led dozens of R&D projects, and supervised over 20 PhD theses.

Abstract:

Clustering is the most common unsupervised, descriptive analysis technique for revealing hidden patterns and profiles in a dataset. A large number of different clustering algorithms exists, but approaches that specifically address the clustering of sparse datasets are still scarce, even though real-world datasets are often characterized by missing values with an unknown sparsity pattern. Typical approaches in the knowledge discovery process are either to completely omit the observations with missing values or to use some imputation method to fill in the holes in the data. However, the throw-data-away approach does not utilize all possible data, and imputation necessarily introduces assumptions about the unknown density of the data. Moreover, by the well-known curse-of-dimensionality results, such assumptions are no longer valid in high-dimensional spaces. The purpose of this presentation is to describe and summarize a line of research that addresses sparse clustering problems with the available data strategy and robust prototypes. The strategy allows one to utilize all available data without any additional assumptions. The actual prototype-based clustering algorithm, k-spatial-medians, relies on the computation of a robust prototype as the cluster centroid, again assuming a non-Gaussian within-cluster error in contrast to the classical k-means method. As with any prototype-based algorithm, the initialization step of the locally improving relocation algorithm plays an important role and should be designed to handle sparse data. Such an approach is proposed, and the scalability of a distributed implementation of the whole algorithm is tested with openly available large and sparse datasets.
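
A minimal sketch of the available data strategy as described above (my reading, not the authors' implementation; the toy objects and prototypes are invented): distances from partially observed objects to cluster prototypes are computed over the observed components only, so the assignment step of a prototype-based algorithm runs without deleting or imputing anything.

```python
import numpy as np

def available_data_distances(X, prototypes):
    """X: (m, d) data with np.nan marking missing entries.
    prototypes: (k, d) cluster prototypes.
    Returns an (m, k) matrix of partial squared distances."""
    m, k = X.shape[0], prototypes.shape[0]
    D = np.zeros((m, k))
    observed = ~np.isnan(X)
    for i in range(m):
        mask = observed[i]                        # observed coordinates only
        diff = X[i, mask] - prototypes[:, mask]   # shape (k, number observed)
        D[i] = (diff ** 2).sum(axis=1)
    return D

# Two sparse objects, two prototypes: assignment uses only observed values.
X = np.array([[0.9, np.nan, 1.2],
              [0.5, 1.9, np.nan]])
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 2.0]])
print(available_data_distances(X, P).argmin(axis=1))   # [0 1]
```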

Fred Jacquet

Product Marketing Manager | Big Data Analytics - Modern Data Platform

Title: Data is not wrong, but your use of it might be
Speaker
Biography:

Fred Jacquet has over 20 years’ experience in the IT industry, working in Evangelist, CTO and Architect roles within a variety of leading data-driven companies. His main areas of expertise are Business Intelligence, Data Integration and Big Data. He is committed to helping organizations in their mission to become successfully data-driven through evangelization, education and enablement.

Abstract:

Some companies are born in the Data Intelligence Era; others need to redefine their IT infrastructure to stay competitive in a world of improved analytics, scale, speed, production and economics. The old data warehousing and Business Insights tools helped find the answers to “What happened?” but are not able to answer the new questions asked by more and more forward-thinking companies. Traditional warehousing tools such as ETL, RDBMS and OLAP databases cannot provide answers to questions such as “What’s happening right now?” or “What will happen?”. The new era of analytics demands speed, scale and reduced costs from every IT team. This presentation will take you through the considerations and steps of modernizing your data warehouse to become ready for Big Data Analytics, and of evaluating whether the Data Lake is right for you and your business needs. Don’t be one of the many companies that failed to grasp this opportunity to leapfrog their competition. After all, over 50% of the companies originally on the Fortune 500 list have vanished since 2000 because they failed to innovate and keep up with changes in the market.

Speaker
Biography:

Morgan C. Wang received his PhD from Iowa State University in 1991. He is the founding Director of the Data Mining Program and Professor of Statistics at the University of Central Florida. He has published one book (Integrating Results through Meta-Analytic Review Using SAS Software, SAS Institute, 1999) and over 80 papers in refereed journals and conference proceedings on topics including interval analysis, meta-analysis, computer security, business analytics, health care analytics and data mining. He is an elected member of the International Statistical Association and a member of the American Statistical Association and the International Chinese Statistical Association.

Abstract:

An automatic prediction model building system was developed. This system has five components: a data exploration component, a data preparation component, a model building component, a model validation and selection component, and an automatic result generation component. All components reside inside the data warehouse and can be used by company personnel without model-building training. A case study using this system to solve a problem for an insurance firm in China will also be discussed in this presentation.

Speaker
Biography:

Witold Dzwinel holds a Full Professor position at the AGH University of Science and Technology, Department of Computer Science, in Krakow. His research activities focus on “Computer modeling and simulation methods employing discrete particles”. Simultaneously, he does research on interactive visualization of big data and machine learning algorithms. He is the author or co-author of about 190 papers in computational science, computational intelligence and physics.

Abstract:

Data embedding (DE) and graph visualization (GV) methods are very congruent tools used in exploratory data analysis for the visualization of complex data such as high-dimensional data and complex networks, respectively. However, the high computational complexity and memory loads of existing DE and GV algorithms (based on the t-SNE concept on the one hand, and force-directed methods on the other) considerably hinder the visualization of truly large and big data consisting of as many as M ~ 10^6+ data objects and N ~ 10^3+ dimensions. In this paper, we demonstrate the high computational efficiency and robustness of our approach to data embedding and interactive data visualization. We show that by employing only a small fraction of the distances between data objects, one can obtain a very satisfactory reconstruction of the topology of N-D data in 2D in linear time O(M). The IVHD (Interactive Visualization of High-Dimensional Data) method quickly and properly reconstructs the N-D data topology in a fraction of the computational time required by state-of-the-art DE methods such as bh-SNE and all its clones. Our method can be used for both metric and non-metric (e.g., large graph) data visualization. Moreover, we demonstrate that even poor approximations of the nearest neighbor (NN) graph representing high-dimensional data can yield acceptable data embeddings. Furthermore, some incorrectness in the nearest neighbor list can often be useful for improving the quality of data visualization. This robustness of IVHD, together with its high memory and time efficiency, meets perfectly the requirements of big and distributed data visualization, where finding the accurate nearest neighbor list represents a great computational challenge.
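
The toy sketch below conveys only the general idea of embedding with a small fraction of neighbor relations (binary target distances to a few nearest neighbors and to random points); it is my simplification, not the IVHD algorithm, and the data sizes and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))                 # high-dimensional data
Y = rng.normal(scale=1e-2, size=(300, 2))      # 2D embedding to be optimized

# Each object's few nearest neighbors in the original space.
k = 4
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
nn = np.argsort(D, axis=1)[:, 1:k + 1]

lr = 0.05
for epoch in range(100):
    for i in range(len(X)):
        # Pull towards nearest neighbors (target 0), push away from a random point (target 1).
        for j in list(nn[i]) + [int(rng.integers(len(X)))]:
            d = np.linalg.norm(Y[i] - Y[j]) + 1e-9
            target = 0.0 if j in nn[i] else 1.0
            Y[i] -= lr * (d - target) * (Y[i] - Y[j]) / d

print(Y[:3])                                   # first three embedded points
```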

Speaker
Biography:

Nicola Wesner completed his PhD in Economics at Paris X Nanterre University in 2001 and has been an Associate Actuary since 2011. He is the Head of the Pension department at Mazars Actuariat, an audit and advisory firm. He has published many papers in reputed journals and specialized periodicals on various subjects such as econometrics, quantitative finance, insurance and pensions, and data mining.

Abstract:

This paper presents a very simple and intuitive multi-objective optimization method that makes use of interactive visualization techniques. The approach stands mid-way between the brush-and-link technique, a visual method used in operational research for the exploratory analysis of multidimensional data sets, and interactive multi-criteria decision methods that use the concept of a reference point. Multiple views of the potential solutions on scatterplots allow the user to directly search for acceptable solutions in bi-objective spaces, whereas a Venn diagram displays information about the relative scarcity of potential acceptable solutions under distinct criteria. These very intuitive data visualization techniques allow for comprehensive interpretation and make it possible to communicate the results efficiently. More generally, the combination of information visualization with data mining allows the user to specify what he is looking for, yields easily reportable results and respects human responsibility. An application to the visual steering of genetic algorithms in a multi-criteria strategic asset allocation optimization problem is presented.
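
The selection step behind such an interface can be sketched as follows (an illustrative toy, not the paper's implementation; the thresholds stand in for the user's interactive choices): keep the non-dominated solutions of a bi-objective minimization problem, then count how many remain acceptable under each criterion and under both, which is the kind of information the Venn diagram summarizes.

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.uniform(size=(500, 2))            # objective values of 500 candidates

def non_dominated(F):
    keep = []
    for i, f in enumerate(F):
        # f is dominated if some other point is no worse everywhere and better somewhere.
        dominated = ((F <= f).all(axis=1) & (F < f).any(axis=1)).any()
        if not dominated:
            keep.append(i)
    return np.array(keep)

pareto = F[non_dominated(F)]
brush = {"objective 1": pareto[:, 0] <= 0.3,      # user-chosen acceptability thresholds
         "objective 2": pareto[:, 1] <= 0.4}
both = brush["objective 1"] & brush["objective 2"]
print({name: int(mask.sum()) for name, mask in brush.items()},
      "acceptable under both:", int(both.sum()))
```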

Speaker
Biography:

Prof. Dr. S. N. Mohanty received his PhD from IIT Kharagpur, India, in 2014, with an MHRD scholarship from the Government of India. He recently joined the School of Computer Science & Engineering at KIIT University as an Assistant Professor. His research areas include Data Mining, Big Data Analysis, Cognitive Science, Fuzzy Decision Making, Brain-Computer Interface, Cognition, and Computational Intelligence. He received two Best Paper Awards during his PhD at IIT Kharagpur: one at an international conference in Beijing, China, and the other at the International Conference on Soft Computing Applications organized by IIT Roorkee in 2013. He has published five papers in international journals of repute and has been elected a Member of the Institute of Engineers and the IEEE Computer Society. He is also a reviewer for the IJAP and IJDM international journals.

Abstract:

The process of selecting a cell-phone for purchase is a multi-criteria decision-making (MCDM) problem with conflicting and diverse objectives. This study discusses various techniques using a machine learning approach. To begin, participants responded to a questionnaire covering the different latest features available in a cell-phone. Seven independent input variables (cost, talk-time, rear camera, weight, size, memory and operating system) were then derived from the participants' responses. Linguistic terms such as low, medium and high were used to represent each of the input variables. Using the Mamdani approach, both a traditional fuzzy reasoning tool (FLC) and a neuro-fuzzy system (ANFIS) were designed for a three-input, one-output process. The neuro-fuzzy system was trained using a back-propagation algorithm. Compared to the traditional fuzzy reasoning tool and an artificial neural network (ANN) approach, the neuro-fuzzy system could provide better accuracy for selecting a cell-phone for personal use.
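
For readers unfamiliar with the Mamdani step, the fragment below is a purely illustrative sketch (the membership functions, universes and single rule are hypothetical, not the study's system): a crisp input is fuzzified with triangular low/medium/high sets and a rule is fired with min as the AND operator.

```python
def triangular(x, a, b, c):
    """Triangular membership function with corners a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x, universe):
    """Map a crisp value to low/medium/high memberships over its universe."""
    lo, hi = universe
    mid = (lo + hi) / 2
    return {"low": triangular(x, lo - 1e-9, lo, mid),
            "medium": triangular(x, lo, mid, hi),
            "high": triangular(x, mid, hi, hi + 1e-9)}

cost = fuzzify(250, (0, 1000))        # hypothetical cost in currency units
talk_time = fuzzify(18, (0, 24))      # hypothetical talk-time in hours

# Example rule: IF cost is low AND talk-time is high THEN suitability is high.
firing_strength = min(cost["low"], talk_time["high"])
print(round(firing_strength, 3))      # degree to which the rule applies
```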

Speaker
Biography:

Jenny Lundberg completed her PhD in Computer Science in 2011 at BTH, the most profiled IT university in Sweden. She is a Senior Lecturer at Linnaeus University and a Researcher at Lund University. She has extensive international research and education collaboration experience. Her research interest is in health applications, and she works in close cooperation with clinical researchers in healthcare using big data, e-health and m-health approaches and techniques.

Abstract:

The Industry 4.0 era gives us extensive IoT opportunities to provide evidence- and context-based data, opening the way for new approaches and methods to meet societal challenges. The handling of chronic diseases is a global issue and poses challenges to current health systems. The incidence of the chronic disease diabetes is of epidemic character. In 2015, 415 million people in the world had diabetes, and it is estimated that by 2040, 642 million people will have diabetes. More specifically, diabetes is a heterogeneous group of conditions that all result in, and are defined by, a rise in plasma glucose level if not well treated. If untreated, it leads to death sooner or later; if later, with a lot of unpleasant complications over time. There are two main types of diabetes: type 1 (10% of all cases) and type 2 (85-90% of all cases). Diabetes places extremely high demands on the individual in terms of self-care, and a lot of complications can occur. It is a well-known fact that this creates serious health conditions and high social costs. Potentially, this can be prevented with new methods for better support of self-care. Given recent advances in mobile computing, sensor technology and big data, these developments can be used to better understand diabetes, measurements and data. To overcome some of the problems in this area, open data such as social media, specially designed apps, sensors and wearables can be used to find proactive ways and methods of diabetes treatment.

  • Young Researchers Forum

Session Introduction

Kruy Seng

Lingnan University, Hong Kong

Title: Cost-sensitive deep neural networks to address class imbalanced problem
Speaker
Biography:

Kruy Seng is currently an MPhil student in the Department of Computing and Decision Sciences at Lingnan University. His research interests include Data Mining in Business Applications and Machine Learning.

Abstract:

Class imbalance is one of the common issues in data mining. Most data in real-life applications are imbalanced; that is, the number of examples belonging to one class is significantly greater than those of the others. Abnormal behaviors or rare events are the main reasons that cause the distribution of classes to be imbalanced. The scarcity of minority class examples leads standard classifiers to focus on the majority class examples while ignoring the others. As a result, classification becomes more challenging. A number of classification algorithms were proposed in the past decades; however, they are accuracy-oriented and unable to address the class imbalance problem. Much research has been conducted to address this problem, but it is still an intensive research topic. In this work, we propose an approach called class-based cost-sensitive deep neural networks to perform classification on imbalanced data. The misclassification costs of each type of error are treated differently and are incorporated into the training process of deep neural networks. We also generalize the method by reducing the effort of hyperparameter selection, adopting an evolutionary algorithm to search for the optimal cost vector setting and network structure. Finally, experiments will be conducted to analyze the performance and compare it with other existing methods.
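
A minimal sketch of the core mechanism (my own PyTorch illustration, not the authors' code; the class counts, network and inverse-frequency weighting are assumptions) is to weight the cross-entropy loss so that errors on the rare class cost more.

```python
import torch
import torch.nn as nn

# Hypothetical imbalanced labels: 950 majority vs. 50 minority examples.
class_counts = torch.tensor([950.0, 50.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss(weight=weights)   # class-based misclassification costs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(1000, 20)
y = torch.cat([torch.zeros(950, dtype=torch.long),
               torch.ones(50, dtype=torch.long)])

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(X), y)      # errors on the rare class are penalized more
    loss.backward()
    optimizer.step()
print(float(loss))
```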

Prajakta Diwanji

FHNW University of Applied Sciences and Arts Northwestern Switzerland

Title: Data driven intelligent analytics in education domain
Speaker
Biography:

Prajakta Diwanji is working as a Researcher in Information Systems at the University of Applied Sciences and Arts Northwestern Switzerland (FHNW). She is a first-year Doctoral student at the University of Camerino, Italy. Her research interest is in the area of intelligent data analytics in the education domain. She completed her Master’s degree in Business Information Systems at FHNW, Switzerland, and a Master’s in Computer Science at the University of Pune, India. She has more than seven years of work experience in the IT industry, where she has taken up several challenging roles. During this tenure, she has worked with international companies such as Roche Pharma, Switzerland, and IBM, India.

Abstract:

In recent times, there has been steady growth in students’ personal as well as academic data in the education field. Many universities and institutes have adopted information systems such as virtual learning environments, learning management systems and social networks that collect students’ digital footprints. These data are large in both volume and diversity. Learning analytics offers tools to facilitate the understanding of different parameters related to students’ engagement/motivation, learning behavior, performance, teaching content and learning environment. Such information could help teachers better prepare for classroom sessions and deliver personalized or adaptive learning experiences. This, in turn, could enhance student performance. The current literature states that there is a shift of focus from classroom-based learning to more anytime, anywhere learning, as well as from the teacher as the sole knowledge contributor to the agent or learner as a contributor to learning. The use of intelligent digital tutors/chatbots has taken the learning process to a new level of student engagement, interaction and learning. Such intelligent data analysis tools/systems make use of data analysis techniques such as machine learning and natural language processing, along with artificial/cognitive intelligence techniques. This research work identifies the current challenges faced by universities in learning/teaching processes in a real-world context and tries to address these problems using data-driven intelligent analysis methods. The main goal is to prepare students as well as lecturers effectively for classroom lectures; to understand the learning needs of students beforehand; and to address those needs proactively in a timely manner.

Speaker
Biography:

Yuko Sasa is a young researcher in the field of social robotics, finishing a PhD at the LIG computer science lab, financed by the Labex Persyval-lab. She completed a Master's in Computational Linguistics in 2012 and a Master's in Gerontechnology in 2013 at Grenoble Alps University. Her supervisors are V. Aubergé (LIG), G. Feng (Gipsa-Lab) and Y. Sagisaka (Waseda University). She is involved in several academic committees and was selected for international research programs such as the French-American Doctoral Exchange Seminar on Cyber-Physical Systems (FadEx, French Embassy) and the Research Opportunities Week (ROW, Technical University of Munich).

Abstract:

The availability of real-life data through IoT and ICT technologies, together with the increasing computational power of computers, has developed the paradigm of Big Data: processing an enormous amount of information to automate various systems such as speech technologies and robotics. The motivation resides in the “intelligence” of data to handle the naturalness and variability of human behaviors. Current machine learning techniques, such as DNNs, let computational mathematics approximate and generalize cognitive processes rather than model them. The quantity of data then aims to cover knowledge that is only poorly explicit, and may rely on too many implicit mechanisms locked up in black boxes. The Domus-LIG Experimental Living-Lab methodology and the EmOz platform, a Wizard-of-Oz tool interfaced with a robot, were both developed to induce, observe and collect spontaneous and ecological interactional data of human-robot communication, particularly with socio-affective values. The robot is thus a measuring instrument for the strongly hypothesized effects of human multimodal speech features. This methodology works in agile processing loops to control the data contents and format them in short and rapid processes. This evolving corpus is the basis for iterative machine learning dedicated to automatic recognition systems. Different studies leading to the EEE (EmOz Elderly Expressions) and GEE (Gesture EmOz Expressions) corpora illustrate this approach. This bootstrap is an invitation to discuss the possible mechanisms for moving from Small Smart Data to relevant Big Data.

Speaker
Biography:

Febriana Misdianti is a Postgraduate student at the University of Manchester. She completed her Bachelor's degree in Computer Science at Universitas Indonesia and has two years of work experience at startup companies in Jakarta and Singapore. She has won many competitions related to computer science and has published a paper on data security in a reputable journal.

Abstract:

The k-nearest neighbor (k-NN) classifier is widely used for classifying data in various domains. However, the k-NN classifier has a high computational cost because it uses a linear search through all training data. In a naïve implementation, k-NN goes through all n training examples to compute their distance from the input data in d dimensions (O(nd)). Then it loops again over all n training examples to find the k smallest distances (O(nk)). So the overall time complexity of k-NN is O(nd+nk). Thus, it is not suitable for classifying multidimensional data with a huge training set. Meanwhile, k-NN needs a large number of samples in order to work well. Several ideas have been proposed to improve the time performance of k-NN in predicting a test point. One of the popular ideas is to reduce the number of training samples in the model, which cuts the testing time because the number of data points that need to be explored becomes smaller. The aim of this experiment is to implement k-NN editing algorithms that cut the number of training data so that predicting an input becomes faster. This experiment implements three editing algorithms, namely Wilson’s editing, holdout editing and the Multiedit algorithm, and compares their performance.
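
As a reference point for the first of the three algorithms, here is a compact sketch of Wilson's editing in its generic textbook form (my illustration with invented two-class Gaussian data, not necessarily the author's exact setup): every training example that is misclassified by its own k nearest neighbors is removed, and classification then proceeds against the reduced set.

```python
import numpy as np
from collections import Counter

def wilson_editing(X, y, k=3):
    keep = []
    for i in range(len(X)):
        # k nearest neighbors of X[i] among the other training points
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        nn = np.argsort(d)[:k]
        vote = Counter(y[nn]).most_common(1)[0][0]
        if vote == y[i]:            # keep only points their neighbors agree on
            keep.append(i)
    return X[keep], y[keep]

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xe, ye = wilson_editing(X, y)
print(len(X), "->", len(Xe), "training examples after editing")
```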