Day 2:
University of Derby, UK
Fionn Murtagh is Professor of Data Science and was previously Professor of Computer Science, including Department Head, at many universities. He was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years and is an Editorial Board member of many journals. He has authored or edited over 300 refereed articles and 30 books. He is a Fellow of the British Computer Society (FBCS), the Institute of Mathematics and Its Applications (FIMA), the International Association for Pattern Recognition (FIAPR), the Royal Statistical Society (FRSS), and the Royal Society of Arts (FRSA); an Elected Member of the Royal Irish Academy (MRIA) and Academia Europaea (MAE); and a Senior Member of the IEEE.
The benefits and the challenges of Big Data analytics can be addressed in innovative ways. Analytical focus is known to be important. As an analogy, consider how a microscope or a telescope enables observation and measurement at very fine and at very gross scales; we can associate that analogy with the resolution scale of our analysis. Another challenge is bias in Big Data, but we may calibrate our analytical process against a Big Data framework or infrastructure. A further challenge, of an ethical nature, is how representativity replaces the individual, so we want "to rehabilitate the individual". Important opportunities arise from contextualization, which can be associated with the resolution scale of our analytics and can also be supported by taking full account of the appropriate contexts. The innovation that stems from the different facets of our analytical procedures can be of great benefit. Here we discuss many such themes, always in the context of interesting and important case studies. Our main case studies include the following: analytics of mental health and associated well-being; social media analytics based on Twitter; and questionnaire and survey analytics with many respondents. Ultimately, what is sought is not scalability alone, but also new, insightful, revealing and rewarding perspectives, returns and benefits. A book of ours is to be published in April 2017: Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics.
National Research University Higher School of Economics, Russia
Professor Fuad Aleskerov is a leading scientist in mathematics, multicriteria choice, and decision-making theory. He is the Head of the International Laboratory of Decision Choice and Analysis and the Head of the Department of Mathematics for Economics of the National Research University Higher School of Economics (Moscow, Russia). He has published 10 books and many articles in leading academic journals. He is a member of several scientific societies, a member of the editorial boards of several journals, and the founder and head of many conferences and workshops. He has been an invited speaker at numerous international conferences, workshops, and seminars.
The high computational complexity of the most accurate algorithms in search, ranking, and recommendation applications is a critical problem when we deal with large datasets. Even quadratic complexity may be inadmissible. Thus, the task is to develop efficient algorithms by consistently reducing the information and by using linear algorithms in the first steps.
The problem of whether functions of several variables can be expressed as a superposition of functions of fewer variables was first formulated by Hilbert in 1900 as Hilbert's thirteenth problem. The answer to this general question for the class of continuous functions was given in 1957 by Arnold and Kolmogorov. For the class of choice functions, this matter has been studied only by our team.
A new effective method for search, ranking, and recommendation problems in large datasets is proposed, based on the superposition of choice functions. The developed algorithms have low computational complexity, so they can be applied to big data. One of the main features of the method is its ability to identify the set of efficient options when one deals with a large number of options or criteria. Another feature is the ability to adjust its computational complexity. Applying the developed algorithms to the Microsoft LETOR dataset showed 35% higher efficiency compared to standard techniques (for instance, SVM).
The proposed methods can be applied, for instance, to the selection of effective options in search and recommendation systems, decision support systems, Internet networks, traffic classification systems and other relevant fields.
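The superposition idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the two choice functions (`above_average` and `pareto`), the helper `superpose`, and the sample data are hypothetical names and values chosen for illustration. The sketch assumes each option is a tuple of criterion scores where higher is better, and it applies a linear-time filter first so that the quadratic-time step only sees the reduced set.

```python
from typing import Callable, List, Tuple

# Assumption: an option is a tuple of scores, one per criterion, higher is better.
Option = Tuple[float, ...]
ChoiceFunction = Callable[[List[Option]], List[Option]]


def above_average(criterion: int) -> ChoiceFunction:
    """Hypothetical linear-time choice function: keep the options whose
    score on one criterion is at least the mean score on that criterion."""
    def choose(options: List[Option]) -> List[Option]:
        if not options:
            return []
        mean = sum(o[criterion] for o in options) / len(options)
        return [o for o in options if o[criterion] >= mean]
    return choose


def pareto(options: List[Option]) -> List[Option]:
    """Quadratic-time choice function: keep the options that no other
    option dominates (at least as good everywhere, strictly better somewhere)."""
    def dominates(a: Option, b: Option) -> bool:
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [o for o in options if not any(dominates(p, o) for p in options)]


def superpose(*functions: ChoiceFunction) -> ChoiceFunction:
    """Compose choice functions left to right: each one chooses from
    the set selected by the previous one."""
    def choose(options: List[Option]) -> List[Option]:
        for f in functions:
            options = f(options)
        return options
    return choose


if __name__ == "__main__":
    options = [(1, 9), (5, 5), (9, 1), (6, 6), (2, 2), (8, 3), (7, 2)]
    # Cheap filter on criterion 0 first, expensive Pareto step second.
    pipeline = superpose(above_average(0), pareto)
    print(pipeline(options))
```

Note that the composed result can differ from applying `pareto` to the full set: here `(1, 9)` is undominated overall but is removed by the first filter. Choosing which cheap filters to apply, and in what order, is one way the overall cost could be adjusted in a scheme like this.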
- Data Mining Tasks and Processes
Independent Consultant (Architect, Developer) Big Data & Data Science, USA
Sumit has more than 22 years of experience in the software industry, in various roles spanning companies from startups to enterprises. He is a big data, visualisation and data science consultant, a software architect, and a big data enthusiast who builds end-to-end data-driven analytic systems. Sumit has worked for Microsoft (SQL Server development team), Oracle (OLAP development team) and Verizon (Big Data analytics team) in a career spanning 22 years. Currently, he works for multiple clients, advising them on their data architectures and big data solutions, and does hands-on coding with Spark, Scala, Java and Python. Sumit has spoken at Big Data conferences in Boston, Chicago, Las Vegas and Vancouver. He has extensive experience in building scalable systems across the stack, from the middle tier and data tier to visualization for analytics applications, using Big Data and NoSQL databases. Sumit has deep expertise in database internals, data warehouses, dimensional modeling, and data science with Scala, Java, Python and SQL. Sumit started his career as part of the SQL Server development team at Microsoft in 1996-97 and then worked as a Core Server Engineer for Oracle Corporation in their OLAP development team in Boston, MA, USA. Sumit has also worked at Verizon as an Associate Director for Big Data Architecture, where he strategized, managed, architected and developed platforms and solutions for analytics and machine learning applications. Sumit has also served as Chief Architect at ModelN/LeapfrogRX (2006-2013), where he architected the middle-tier core analytics platform with an open source OLAP engine (Mondrian) on J2EE and solved some complex dimensional ETL, modelling and performance optimization problems.
With the rapid adoption of Hadoop in the enterprise, it has become all the more important to build SQL engines on Hadoop for all kinds of workloads and for almost all kinds of end users and use cases. From low-latency analytics-based SQL, to ACID-based semantics on Hadoop for operational systems, to SQL for handling unstructured and streaming data, SQL is fast becoming the lingua franca of the big data world too. The talk focuses on the exciting tools, technologies and innovations, their underlying architectures, and the exciting road ahead in this space. This is a fiercely competitive landscape, with vendors and innovators trying to capture mindshare and a piece of the pie through a whole suite of innovations, from index-based SQL solutions on Hadoop, to OLAP with Apache Kylin and Tajo, to BlinkDB and MapD.
- Why SQL on Hadoop
- Challenges of SQL on Hadoop
- SQL on Hadoop Architectures for Low Latency Analytics (Drill, Impala, Presto, SparkSQL, JethroData)
- SQL on Hadoop Architecture for Semi-Structured Data
- SQL on Hadoop Architecture for Streaming Data and Operational Analytics
- Innovations (OLAP on Hadoop, Probabilistic SQL Engines, GPU Based SQL Solutions)
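As a rough illustration of the "probabilistic SQL engine" item in the outline above, the following minimal sketch estimates an aggregate from a uniform random sample and reports an approximate error bound, instead of scanning every row. It is not how BlinkDB or any other named engine is implemented; the function `approximate_mean`, its parameters, and the sample data are assumptions made for this example.

```python
import math
import random
from typing import Sequence, Tuple


def approximate_mean(values: Sequence[float], sample_size: int,
                     seed: int = 0, z: float = 1.96) -> Tuple[float, float]:
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of
    an approximate 95% confidence interval under a normal approximation
    (z = 1.96). No finite-population correction is applied.
    """
    if sample_size < 2 or sample_size > len(values):
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)                    # seeded for repeatable runs
    sample = rng.sample(values, sample_size)     # sampling without replacement
    mean = sum(sample) / sample_size
    # Unbiased sample variance, then the standard error of the mean.
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    half_width = z * math.sqrt(variance / sample_size)
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical column of 100,000 values; only 1,000 of them are read.
    column = list(range(100_000))
    estimate, half_width = approximate_mean(column, sample_size=1_000)
    print(f"mean is approximately {estimate:.1f} +/- {half_width:.1f}")
```

The trade-off is between latency and accuracy: a larger `sample_size` narrows the interval but reads more data.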
Drinker Biddle & Reath, Washington, DC
Bennett B. Borden is a partner at Drinker Biddle & Reath and its Chief Data Scientist, the only Chief Data Scientist who is also a practicing attorney. Bennett is a globally recognized authority on the legal, technology and policy implications of information. Bennett's ground-breaking research into the use of machine-based learning and unstructured data for organizational insight is now being put to work in data-driven early warning systems that help clients detect and prevent corporate fraud and other misconduct. Bennett received his Master of Science in Data Analytics from New York University and his JD from Georgetown University.
Analytic models are playing an increasing role in the development, delivery and availability of goods and services. Who gets access to what goods or services, and at what price, is increasingly influenced by algorithms. This may not matter when we're talking about a $0.25 coupon for a candy bar, but what about public goods and services like education, healthcare, and energy distribution? What about predicting who will get a job or how we will police our society? In this session, we will explore the socioeconomic impact of algorithms, the ethics of big data, and how to work ethics into our analytics projects.
Jenny Lundberg completed her PhD in Computer Science in 2011 at BTH, the most profiled IT university in Sweden. She is employed as a senior lecturer at Linnaeus University and as a researcher at Lund University. She has extensive experience of international research and education collaboration. Her research interest is in health applications, and she works in close cooperation with clinical researchers in healthcare using Big Data and e- and m-health approaches and techniques. Taking an active approach to including computing competence at an early stage of education, and so fostering technology for all, is another important focus area for her.
The Industry 4.0 era gives us extensive IoT opportunities to provide evidence-based and context-based data, opening the way for new approaches and methods to meet societal challenges. The handling of chronic diseases is a global concern and poses challenges to current health systems. The incidence of the chronic disease diabetes is of epidemic character. In 2015, 415 million people in the world had diabetes, and it is estimated that in 2040, 642 million people will have diabetes (http://www.idf.org/about-diabetes/facts-figures). More specifically:
• Diabetes is a heterogeneous group of conditions that all result in, and are defined by, plasma glucose chronically rising above normal levels if not well treated. If untreated, death follows sooner or later; if later, with many unpleasant complications over time.
• There are two main types of diabetes of interest: type 1 (10% of all cases) and type 2 (85-90% of all cases).
Diabetes places extremely high demands on the individual in terms of self-care, and many complications can occur. It is well known that this creates serious health conditions and high social costs. Potentially, this can be prevented with new methods that better support self-care. Recent advances in mobile computing, sensor technology, and Big Data can be used to better understand diabetes, its measurements and its data. To overcome some of the problems in this area, open data, social media, specially designed apps, sensors and wearables can be used to find proactive ways and methods of treating diabetes.