Measuring discrimination information for human knowledge representation

Persian title of the paper: بنیان مبتنی بر قانون قدرت برای اندازه گیری اطلاعات تبعیضی برای نمایش دانش انسانی
English title of the paper: Power law based foundation for the measurement of discrimination information for human knowledge representation
Journal/Conference: Future Generation Computer Systems
Related fields of study: Information Technology Engineering, Computer Engineering
Related specializations: Internet and Wide Area Networks
Persian keywords: الگوریتم ها، طراحی، نظریه، اطلاعات تبعیضی، قانون قدرت، نظریه اطلاعات
English keywords: Algorithms, Design, Theory, Discrimination information, Power law, Information theory
Manuscript type: Research Article
Digital identifier (DOI): http://dx.doi.org/10.1016/j.future.2016.10.021
Affiliation: The Third Research Institute of the Ministry of Public Security, 201142, Shanghai, China
Pages in the English paper: 11
Publisher: Elsevier
Publication type: Journal
Indexing: ISI
Year of publication: 2019
Impact factor: 7.007 in 2018
H-index: 93 in 2019
SJR: 0.835 in 2018
ISSN: 0167-739X
Quartile: Q1 in 2018
Format of the English paper: PDF
Translation status: Not translated
Price of the English paper: Free
Is this a base paper: No
Product code: E12085
Table of contents (English)

Abstract

1. Introduction

2. Related work

3. The power law function of keywords

4. DI and power law function: theoretic analysis

5. Identifying the general keywords

6. Identifying the minimum rank keyword

7. Computing DI

8. Document clustering based on DI

9. Conclusion

References

Excerpt from the paper (English)

Abstract

The discrimination information (DI) of a keyword plays an important role in information retrieval and data mining. However, measuring DI remains a challenge because existing methods cannot reconcile the conflict between accuracy and complexity. In this paper, a new model is proposed that needs no prior knowledge and whose computing complexity is O(nm) for a collection of m documents with n keywords. First, we define three types of keywords according to the document frequency spectrum, which divides the spectrum of keywords into two monotonic spectra that support a qualitative analysis of DI. Second, to decrease the complexity, the power law function of keywords’ document frequencies is built. Third, we propose an algorithm that classifies keywords using the distances between adjacent points on the linear regression line. Finally, a piecewise function is used to compute DI according to the monotonic spectra, which transforms DI into a scalable value that can be used directly, thereby significantly reducing the computing complexity of DI. Moreover, a new keyword weighting scheme based on DI is employed for document clustering, which shows that DI has good prospects in information retrieval.
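The abstract mentions a power law function of document frequencies and a linear regression line, but this excerpt does not give the paper's actual formulation. The following is only a minimal sketch of one common way to obtain such a line, assuming the document frequency of the keyword at rank r behaves roughly like C * r^(-alpha) and fitting it by ordinary least squares in log-log space. The function name `fit_power_law` and the sample data are hypothetical and are not taken from the paper.

```python
from math import exp, log


def fit_power_law(doc_freqs):
    """Fit df(rank) ~ scale * rank**(-alpha) by least squares on log-log data.

    Assumption for this sketch: all document frequencies are positive and
    there are at least two keywords. This is not the paper's own algorithm.
    """
    if len(doc_freqs) < 2 or min(doc_freqs) <= 0:
        raise ValueError("need at least two positive document frequencies")

    # Rank keywords from most to least frequent (rank 1 = highest frequency).
    ranked = sorted(doc_freqs, reverse=True)
    xs = [log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [log(freq) for freq in ranked]

    # Ordinary least squares for the line y = intercept + slope * x.
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # In log-log space the slope is -alpha and the intercept is log(scale).
    return exp(intercept), -slope


# Synthetic frequencies that follow df = 1000 / rank exactly (hypothetical data).
sample = [1000.0 / rank for rank in range(1, 51)]
scale, alpha = fit_power_law(sample)
```

On real collections the points rarely lie exactly on the line, and the deviations from the fitted line are what a method like the one described in the abstract would have to examine.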

Introduction

It is widely recognized that different keywords possess different amounts of discrimination information (DI) in a knowledge base system. For example, "Computer" possesses a lower DI than "CPU" in the computer field, and "Example Learning" possesses a higher DI than "Intelligence" in the area of artificial intelligence. DI has a wide range of applications, including semantic annotation of Web pages [1–3], discovery of semantic communities [4–6], document clustering/classification [7–9], and e-learning technology [10,11,4]. In addition, DI is important for web search [12–15], where it can be used for query expansion to help users find more relevant information. Therefore, how to compute DI is a basic problem for information retrieval and data mining. In [16], Salton et al. regarded DI as a measure of the variation in the average similarity between documents in a collection. A good discriminator is a keyword whose assignment reduces the average similarity between documents. In contrast, a poor discriminator increases the inter-document similarity. Unfortunately, the computing complexity of this DI is proportional to O(nm^2) for a collection of m documents with n keywords, which makes it impractical to use directly on a collection containing a large number of documents. Cai [17] uses information theory to compute DI. In that work, the discrimination information of a keyword refers to the amount of information the keyword conveys in support of a certain category of documents and against other categories. An informative keyword should have a high capability for categorizing documents.
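The similarity-based idea attributed to Salton et al. above can be illustrated with a small sketch. It assumes documents are term-frequency vectors, uses cosine similarity, and scores each keyword as the average pairwise similarity without that keyword minus the average with it, so a positive score marks a good discriminator. The function names and the toy collection are hypothetical, and the exact formulation in [16] may differ.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_similarity(docs):
    """Mean cosine similarity over all unordered document pairs."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_values(docs):
    """Score keyword k as avg similarity without k minus avg similarity with k."""
    baseline = average_similarity(docs)
    values = []
    for k in range(len(docs[0])):
        # Drop column k from every document vector and re-measure similarity.
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(average_similarity(reduced) - baseline)
    return values


# Toy collection (hypothetical); columns are "computer", "cpu", "gpu".
docs = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]
values = discrimination_values(docs)
# "computer" occurs in every document and gets a negative score, while
# "cpu" and "gpu" each separate the documents and get a positive score.
```

Because this sketch recomputes every pairwise cosine from scratch for each keyword, it is even more expensive than the O(nm^2) cost quoted above, which illustrates why a cheaper measure is attractive for large collections.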