Biography: Alex V. Vasenkov
Abstract
This talk will focus on Big Data for Research and Development (R&D). Big Data has several competing definitions, which creates confusion about the subject. There is even more confusion about synthetic big data, which can be defined as a collection of research articles, Ph.D. theses, patents, test reports, and product description reports. Such data exhibit emerging attributes, namely high volume, velocity, and variety together with uncertain veracity, that make their analysis difficult. There is a growing need for a framework that synergistically integrates search, or information retrieval (IR), with information extraction (IE). Traditional IR-based text search can be used to explore large collections of synthetic data quickly. However, it cannot locate specific R&D concepts in such collections or establish connections between those concepts. IR models also lack the ability to learn concepts and the relationships between them. IE models, by contrast, are too specific and typically require customization for each domain of interest. A novel framework will be presented, and its feasibility for mining synthetic data will be demonstrated. It was found that the analysis of synthetic data can be partially or fully automated to find labeled information and connecting concepts. The framework can help individuals identify non-obvious solutions to R&D problems. It can also serve as an input for innovation and categorize prior art relevant to a technological concept or to a patent application under consideration.