Pengchu Zhang
Sandia National Laboratories, USA
Title: Enhancement of Enterprise Search with Neural Language Model
Biography
Abstract
A significant problem that reduces the effectiveness of enterprise search is the use of query terms that do not exist in the enterprise data. Consequently, enterprise search either returns no results or returns answers that match only the exact query terms and ignore related terms. This results in a high rate of false positives in terms of information relevance. Recent developments in neural language models (NLM), specifically the word2vec model introduced by Google researchers, have drawn a great deal of attention in the last two years. This model uses a neural network with multiple layers to represent words as vectors in a vector space. The vector representation of a word carries both semantic and syntactic meaning. Semantically similar terms lie close together in the vector space, as measured by their Euclidean distances. Enterprise search can use these "contextual" relationships between words to intelligently increase the breadth and quality of search results. Applying the NLM in our enterprise search promises to significantly improve the findability and relevance of returned information. We expand the query term(s) into a set of related terms using term vectors trained on corporate data repositories as well as on Wikipedia. The expanded set of terms is then used to search the indexed enterprise data. The most relevant data rises in the ranking, including documents that may not contain the original query terms. In this presentation, we will also discuss the potential and limitations of applying NLM in search and in other aspects of enterprise knowledge management.
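The query expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors, the sample documents, and the names `VECTORS`, `DOCS`, `expand_query`, and `search` are all hypothetical, and the term-count ranking is a stand-in for a real enterprise search index. In practice the vectors would come from a word2vec model trained on corporate data and Wikipedia.

```python
import math

# Hypothetical 2-D term vectors standing in for trained word2vec vectors.
VECTORS = {
    "car": (1.0, 0.0),
    "automobile": (0.9, 0.1),
    "vehicle": (0.8, 0.3),
    "banana": (0.0, 1.0),
    "fruit": (0.1, 0.9),
}

# Hypothetical indexed documents.
DOCS = {
    "doc1": "automobile maintenance schedule",
    "doc2": "banana import records",
    "doc3": "vehicle fleet and car rental policy",
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def expand_query(terms, vectors, k=2):
    """Return the original terms plus the k nearest neighbours of each term."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # a term with no vector cannot be expanded
        neighbours = sorted(
            (other for other in vectors if other != term),
            key=lambda other: euclidean(vectors[term], vectors[other]),
        )
        for other in neighbours[:k]:
            if other not in expanded:
                expanded.append(other)
    return expanded


def search(terms, docs):
    """Rank documents by how many of the given terms they contain."""
    scores = {}
    for doc_id, text in docs.items():
        words = set(text.split())
        hits = sum(1 for term in terms if term in words)
        if hits:
            scores[doc_id] = hits
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    query = ["car"]
    print(search(query, DOCS))
    print(search(expand_query(query, VECTORS), DOCS))
```

With these toy values, the unexpanded query matches only the document that contains "car", while the expanded query also retrieves the document that mentions "automobile" but never the original term.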