Abstract
The discrimination information (DI) of a keyword plays an important role in information retrieval and data mining. However, measuring DI remains a challenge because existing methods cannot resolve the conflict between accuracy and computational complexity. In this paper, a new model is proposed that requires no prior knowledge and has a computational complexity of O(nm) for a collection of m documents with n keywords. Firstly, we define three types of keywords according to the document frequency spectrum, which divides the spectrum of keywords into two monotonic sub-spectrums and permits a qualitative analysis of DI. Secondly, in order to reduce the complexity, a power law function of the keywords' document frequencies is built. Thirdly, we propose an algorithm that classifies keywords by using the distances between adjacent points on the linear regression line. Finally, a piecewise function is used to compute DI over the monotonic sub-spectrums, which transforms DI into a scalable value that can be used directly, thereby reducing the computational complexity of DI significantly. Moreover, a new DI-based keyword weighting scheme is employed for document clustering, which shows that DI has good prospects in the information retrieval area.
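As a rough illustration of the power-law step summarized above, the sketch below ranks keywords by document frequency and fits a straight line in log-log space. The use of an ordinary least-squares fit and all variable names are assumptions for illustration, not the paper's exact procedure.

```python
# A minimal sketch, assuming a plain array of keyword document frequencies.
# Fits df(r) ~ C * r^(-b) by linear regression on log(rank) vs log(df).
import numpy as np

def fit_power_law(doc_freqs):
    df = np.sort(np.asarray(doc_freqs, dtype=float))[::-1]  # rank keywords by document frequency
    ranks = np.arange(1, len(df) + 1)
    mask = df > 0                                            # log() requires positive frequencies
    slope, intercept = np.polyfit(np.log(ranks[mask]), np.log(df[mask]), 1)
    return -slope, np.exp(intercept)                         # exponent b and constant C
```

The fitted line in log-log space is what the classification step operates on: distances between adjacent points on this regression line are used to separate the keyword types.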
Introduction
It has been widely recognized that different keywords possess different amounts of discrimination information (DI) in a knowledge base system. For example, "Computer" possesses a lower DI than "CPU" in the computer field, and "Example Learning" possesses a higher DI than "Intelligence" in the area of artificial intelligence. In practice, DI has a wide range of applications, including semantic annotation of Web pages [1–3], discovery of semantic communities [4–6], document clustering/classification [7–9], e-learning technology [10,11,4], etc. In addition, DI is important for web search [12–15], where it can be used for query expansion to help users find more relevant information. Therefore, how to compute DI is a basic problem in information retrieval and data mining. In [16], Salton et al. regarded DI as a measure of the variation in the average similarity between documents in a collection: a good discriminator is an assigned keyword that reduces the average similarity between documents, whereas a poor discriminator increases the inter-document similarity. Unfortunately, the computational complexity of this measure is O(nm²) for a collection of m documents with n keywords, which makes it impractical to apply directly to large document collections. Cai [17] uses information theory to compute DI. In that work, the discrimination information of a keyword refers to the amount of information conveyed by the keyword in support of a certain category of documents while rejecting the other categories; an informative keyword should have a high capability of categorizing documents.
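To make the O(nm²) cost of the similarity-based definition [16] concrete, the following sketch computes a Salton-style discrimination value as the change in average pairwise cosine similarity when a keyword's column is removed from the document-term matrix. The dense matrix representation and the function names are assumptions for illustration.

```python
# A minimal sketch, assuming a dense document-term matrix X (m documents x n keywords).
import numpy as np

def avg_pairwise_cosine(X):
    """Average cosine similarity over all document pairs (the "space density")."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                         # guard against empty documents
    U = X / norms
    S = U @ U.T                                     # m x m cosine similarity matrix
    m = X.shape[0]
    return (S.sum() - np.trace(S)) / (m * (m - 1))  # exclude self-similarities

def discrimination_values(X):
    """DV_k = density without keyword k minus density with all keywords.
    A good discriminator (DV_k > 0) lowers the average inter-document similarity."""
    q_all = avg_pairwise_cosine(X)
    n = X.shape[1]
    dv = np.empty(n)
    for k in range(n):                              # one O(m^2) pass per keyword -> O(nm^2) overall
        dv[k] = avg_pairwise_cosine(np.delete(X, k, axis=1)) - q_all
    return dv
```

The per-keyword recomputation of the m × m similarity matrix is exactly what drives the O(nm²) cost that the proposed model avoids.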