Knowledge Upload Service Using Semantic Based Categorization

Artificial Intelligence (AI) is one of the most active areas of Computer Science research and development today. Applications of AI are diverse and include, but are not limited to, image recognition, voice recognition, industrial manufacturing, autonomous driving, and human interaction in service industries. Here we propose a knowledge upload system that utilizes different AI techniques. The knowledge recording part of the system works as follows. First, the human voice is captured using a microphone. Next, the captured voice is converted into text using voice recognition techniques. Finally, the recognized text is categorized as personal or general using semantic techniques. The knowledge retrieval part of the system, on the other hand, retrieves the thoughts on a particular subject of a person whose knowledge has been uploaded. A knowledge upload service can benefit each generation, since its members can draw on the thoughts of people from earlier generations. In this way, precious human knowledge can be saved for the benefit of mankind. AI is normally general rather than specific to a person, but the system proposed here can be regarded as a step towards Personal Artificial Intelligence (PAI).


INTRODUCTION
The mind of a person is one of the most important parts of the body. It is the human mind that has driven the advancement of technology for the betterment of human society; therefore, its knowledge should be preserved. The knowledge of a human mind is its precious asset. AI tries to mimic the natural intelligence of the human mind. Although there has been progress in the field of AI, there is still a long road ahead. The importance of AI can be judged from the fact that almost all major technology companies are working on various AI technologies these days.
In this paper we propose to use different AI techniques, such as Speech Recognition [1] and Natural Language Processing (NLP) [2], to upload the knowledge of different people so that family, friends, and the general public can later query the uploaded knowledge and get answers. Additionally, the paper proposes using semantic techniques for text categorization. Text categorization can be performed by two approaches. One approach is based on machine learning and requires the creation of a training dataset. The approach used in this paper, on the other hand, is semantic and does not require a training dataset. Creating a training dataset can be difficult and time consuming, and it may require the paid services of a data operator. The semantic approach avoids the need for this dataset. This paper provides a brief overview of the proposed system architecture and mentions different tools that can be used in Java to achieve the proposed goal.
Java is a platform-independent programming language; that is, the same source code can run on different platforms such as Microsoft Windows, Apple Mac OS X, and Linux.
The best case for the system is achieved when it passes the Turing Test while answering the queries.
It should be noted that our approach to preserving human knowledge differs from approaches that connect complex machinery directly to the mind in order to transfer knowledge. The negative effects of such approaches on human health cannot be neglected.
The rest of the paper is organized as follows. Section 2 discusses the related work, and Section 3 presents the system architecture. Section 4 describes the methodology, including the algorithms, and Section 5 covers the evaluation. Finally, the paper is concluded in Section 6.

RELATED WORK
The subject of knowledge upload is closely related to the concept of mind upload. Research has been done on mind upload; for example, the author of [3] discusses the advantages that can be achieved through it. The concept of the Digital Mind in that paper covers both the Human Mind and Artificial General Intelligence.
Similarly, the authors of [4] propose connecting two uploaded minds in order to gain further advantages. They discuss the benefits of this approach as well as the barriers to achieving it.
A much cited paper in the mind upload domain is [16] which provides the philosophical analysis of the mind uploading concept. This paper is an excerpt from "The

SYSTEM ARCHITECTURE
Our proposed system architecture consists of two parts.
The first part is termed as 'Knowledge Recording' while the second part can be termed as 'Knowledge Retrieval'.
The first part will be used by the person whose knowledge is to be uploaded, while the second part will be used by people who want to query the already uploaded knowledge and get good answers to their questions. Figs. 1-2 provide an overview of the proposed system for part one and part two respectively.
In part one, the knowledge recording module, different speeches of the person whose knowledge is to be uploaded are recorded on different occasions. Fig. 3 is a screenshot of the knowledge upload demo application.
For each recorded speech, the next step is the conversion of speech into text using a speech recognition library. The recognized text is then categorized into the 'Personal' or 'General' category, since an individual speech can be, at a broader level, either personal (for example, on a family-related topic) or general (for example, a political discussion). Next, the categorized speech of the person is uploaded to a server machine over the Internet. This completes part one of the proposed architecture.
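The recording flow described above (speech, then text, then category, then upload) can be sketched in Java as follows. This is only an illustrative sketch under stated assumptions: the class `RecordingSketch`, its `Entry` type, and the two pluggable functions are hypothetical names, the recognizer and categorizer in `main` are trivial stand-ins rather than the libraries used in the system, and an in-memory list stands in for the remote server.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class RecordingSketch {

    // One uploaded speech: its recognized text and its category.
    static final class Entry {
        final String text;
        final String category;

        Entry(String text, String category) {
            this.text = text;
            this.category = category;
        }
    }

    // Hypothetical speech-to-text step (a real system would call a speech recognition library).
    final Function<byte[], String> recognizer;
    // Hypothetical categorizer that returns "Personal" or "General".
    final Function<String, String> categorizer;
    // Assumption: an in-memory list stands in for the server that stores uploads.
    final List<Entry> server = new ArrayList<>();

    RecordingSketch(Function<byte[], String> recognizer, Function<String, String> categorizer) {
        this.recognizer = recognizer;
        this.categorizer = categorizer;
    }

    // Runs the three steps in order: recognize, categorize, upload.
    Entry record(byte[] audio) {
        String text = recognizer.apply(audio);
        String category = categorizer.apply(text);
        Entry entry = new Entry(text, category);
        server.add(entry);
        return entry;
    }

    public static void main(String[] args) {
        RecordingSketch sketch = new RecordingSketch(
            // Fake recognizer: treats the bytes as already being UTF-8 text.
            audio -> new String(audio, StandardCharsets.UTF_8),
            // Fake categorizer: a single keyword check, not the semantic method of the paper.
            text -> text.toLowerCase().contains("family") ? "Personal" : "General");

        Entry e = sketch.record("My family visited Kyoto".getBytes(StandardCharsets.UTF_8));
        System.out.println(e.category + ": " + e.text);
    }
}
```

Keeping the recognizer and categorizer as plain functions means either step can be replaced without touching the flow itself.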
In part two, the knowledge retrieval module, different people such as family members, friends, or the general public ask questions against the uploaded knowledge of a person in that person's absence, in order to get the person's opinion on those questions. To achieve this, the questioner first records a query. The query is then converted into text using the procedure already discussed. Next, we determine whether the query is personal or general. Suppose the query is general: we then retrieve every general speech text of the person whose knowledge has been uploaded and compute the semantic similarity between each speech text and the query text. The speech text with the maximum similarity score is returned as the answer. A personal query follows the same pattern. This completes part two of the proposed architecture. Fig. 4 is a screenshot of the knowledge retrieval demo application.
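The retrieval step can be illustrated with a minimal Java sketch: given a query that has already been categorized, look up the stored speech texts of that category and return the one with the highest similarity score. The class and method names are hypothetical, and the similarity function here is a simple word-overlap (Jaccard) score used purely as a placeholder; it is an assumption and not the semantic similarity algorithm of the paper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetrievalSketch {

    // Splits text into a set of lowercase word tokens.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Placeholder similarity (assumption): Jaccard overlap of the two token sets.
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    // Returns the stored speech of the given category with the maximum
    // similarity to the query, or null if that category has no speeches.
    static String answer(String query, String category, Map<String, List<String>> store) {
        String best = null;
        double bestScore = -1.0;
        for (String speech : store.getOrDefault(category, Collections.emptyList())) {
            double score = similarity(query, speech);
            if (score > bestScore) {
                bestScore = score;
                best = speech;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        store.put("General", Arrays.asList(
            "The election results will change the tax policy",
            "Football is the most popular sport in the world"));
        store.put("Personal", Arrays.asList(
            "My grandmother taught me how to cook"));

        System.out.println(answer("What do you think about tax policy", "General", store));
    }
}
```

Restricting the search to one category first keeps a personal query from being answered with a general speech, which matches the two-step order described above.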

METHODOLOGY
We implemented our complete system in the Java programming language, which is platform independent. First, we name the different packages along with the technologies that we utilized in our system (Table 1).
To simulate a testing environment, we made a collection of text for 13 different categories. The purpose of this text collection is to convert it into voice using text-to-speech functionality available on the Internet and to test the semantic categorization module of the system. Table 2 lists the category names along with the internal number assigned to each category. We propose more than 80 categories for the semantic categorization system, but Table 2 lists only some of them, namely those more relevant to the proposed system.
Next, we list only the main algorithms. In a broad sense, similarity can be of two types: lexical or semantic.
Under lexical similarity, "dog" can be similar only to "dog". Under semantic similarity, on the other hand, a "dog" can be similar to a "cat", as both are of type animal. Therefore, in some cases we can get better results using the semantic similarity approach described in Algorithm-I. [...] lexically same then in that case also the mapping is 1.
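The difference between the two kinds of similarity can be shown with a small Java sketch built around the "dog" and "cat" example. The class name, the method names, and the toy `TYPE_OF` map are hypothetical and serve only as an illustration; the map is an assumption standing in for a semantic knowledgebase and is not Algorithm-I itself.

```java
import java.util.HashMap;
import java.util.Map;

public class SimilaritySketch {

    // Hypothetical toy type map (assumption): a stand-in for a semantic knowledgebase.
    static final Map<String, String> TYPE_OF = new HashMap<>();
    static {
        TYPE_OF.put("dog", "animal");
        TYPE_OF.put("cat", "animal");
        TYPE_OF.put("car", "vehicle");
    }

    // Lexical mapping: 1 only when the two tokens are the same word.
    static int lexical(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1 : 0;
    }

    // Semantic mapping: 1 when the tokens are lexically the same
    // or when both are known and share the same type.
    static int semantic(String a, String b) {
        if (lexical(a, b) == 1) {
            return 1;
        }
        String typeA = TYPE_OF.get(a.toLowerCase());
        String typeB = TYPE_OF.get(b.toLowerCase());
        return (typeA != null && typeA.equals(typeB)) ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println("lexical(dog, cat)  = " + lexical("dog", "cat"));
        System.out.println("semantic(dog, cat) = " + semantic("dog", "cat"));
        System.out.println("semantic(dog, car) = " + semantic("dog", "car"));
    }
}
```

In this sketch "dog" and "cat" map to 0 lexically and to 1 semantically, while "dog" and "car" map to 0 in both cases because their types differ.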
The functionality of Algorithm-III is utilized for semantic similarity.

EVALUATION
In this section we will try to evaluate the two modules described earlier. First we tested the semantic categorization part of the knowledge recording module.

We utilized YAGO2s as a semantic knowledgebase for enrichment of tokens. YAGO2s is available for download in TTL (turtle) format. We need to convert the knowledgebase from TTL to TDB (Tuples Database) format.
Next using the text to speech functionality available on

FUTURE WORK
In the future, we want to add support for the Japanese language to the system. Japanese is the mother tongue of the main author. Our current principal supported language is English. Fewer tools are available for Japanese than for English, so we need to find high-accuracy Java libraries for Japanese speech recognition and NLP. In addition, we can use linked data for semantic text categorization. Linked open data includes upper-level ontologies connected with domain-specific knowledgebases, which helps in covering more domains.
Extensive system evaluation is also an important future task. Additionally, the overall system accuracy needs to be improved.