Multi-lingual geoparsing

Persian title of the article: تجزیه جغرافیایی چند زبانه براساس ترجمه ماشینی
English title of the article: Multi-lingual geoparsing based on machine translation
Journal/Conference: Future Generation Computer Systems
Related fields of study: Computer Engineering
Related specializations: Algorithms and Computation
Persian keywords: شناسایی موجودیت نامدار، موقعیت، تجزیه جغرافیایی، چند زبانه، ترجمه ماشینی، صف بندی کلمه
English keywords: Named entities recognition, Location, Geoparse, Multi-lingual, Machine translation, Word Alignment
Article format: Research Article
Digital identifier (DOI): http://dx.doi.org/10.1016/j.future.2017.07.057
University: State Key Laboratory of Software Engineering, Computer School, Wuhan University, China
Pages in the English article: 11
Publisher: Elsevier
Publication type: Journal
Article type: ISI
Publication year: 2019
Impact factor: 7.007 (2018)
H-index: 93 (2019)
SJR: 0.835 (2018)
ISSN: 0167-739X
Quartile: Q1 (2018)
Format of the English article: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base article: No
Product code: E12090
Table of contents (English)

Abstract

1. Introduction

2. Related work

3. Our multi-lingual geoparser, LanguageBridge

4. Data

5. Evaluation of our LanguageBridge prototype for multi-lingual geoparsing

6. Conclusion

Acknowledgments

References

Excerpt from the article (English)

Abstract

Our method for multi-lingual geoparsing uses monolingual tools and resources along with machine translation and alignment to return location words in many languages. Not only does our method save the time and cost of developing geoparsers for each language separately, but it also allows a wide range of language capabilities within a single interface. We evaluated our method in our LanguageBridge prototype on location named entities using newswire, broadcast news and telephone conversation data in English, Arabic and Chinese from the Linguistic Data Consortium (LDC). Our results for geoparsing Chinese and Arabic text using our multi-lingual geoparsing method are comparable to our results for geoparsing English text with our English tools. Furthermore, in our experiments, the accuracy of our tools on machine-translated text approaches their accuracy on the same data translated manually, further showing the robustness of locations to machine translation.

Introduction

Named Entity Recognition is central to many Natural Language Processing (NLP) tasks, including information retrieval, question answering, data mining and text analysis. Often, finding named entities in different languages is approached by developing tools in each language separately. NLP tools for English are widely developed and used, and can be downloaded easily from the Internet. However, minority languages such as Mongolian and Vietnamese have few useful NLP tools. In this paper, our method aims to reduce development time for Named Entity Recognition tools by processing in a single language via machine translation. We assume that our method extends to person and organization named entities, although our research focus is on named entities for location.

Named entities for location. Named Entity Recognition typically encompasses named entities for person, organization and location. Our focus for experimentation is on named entities for location, which we alternately refer to as toponyms. That is because our ultimate goal is to produce not only the locations, but also the geographic coordinates for each location. Our results can be displayed on a geographic map, if desired.

Logic of method. The previous version of our English geoparser can find location named entities in high quality English text, as well as in English text produced by machine translation from other languages. Our method is based on a finding in our previous research: finding locations in Spanish tweets with a geoparser trained for Spanish was less accurate than geoparsing an English translation of the same Spanish tweets with a geoparser trained for English [1]. Similar results were found when using machine translation and English tools to find named entities in source texts in Swahili and Arabic [2]. In fact, statistical machine translation is often used for cross-language information retrieval [3].
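
The abstract describes combining machine translation, an English geoparser and word alignment to return location words in the source language. The following is a minimal sketch of only the final projection step, and it is not the LanguageBridge implementation. It assumes, hypothetically, that the aligner emits `source-target` index pairs in the common "i-j" text format and that the English geoparser's output is reduced to a set of token indices; the function names and the Spanish example sentence are illustrative only.

```python
from typing import List, Set, Tuple


def parse_alignment(pairs_text: str) -> List[Tuple[int, int]]:
    """Parse 'source-target' index pairs such as '0-0 1-2' into tuples.

    The 'i-j' text format is an assumption about the aligner's output.
    """
    pairs = []
    for item in pairs_text.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_locations(
    source_tokens: List[str],
    alignment: List[Tuple[int, int]],
    english_location_idx: Set[int],
) -> List[str]:
    """Return the source-language tokens aligned to English location tokens."""
    # Collect every source index linked to a tagged English token,
    # deduplicated and in source order.
    hits = sorted({s for s, t in alignment if t in english_location_idx})
    return [source_tokens[i] for i in hits]


# Hypothetical example: a Spanish sentence and its English translation.
source = ["El", "terremoto", "sacudió", "Lima", "ayer"]
english = ["The", "earthquake", "shook", "Lima", "yesterday"]
alignment = parse_alignment("0-0 1-1 2-2 3-3 4-4")
english_location_idx = {3}  # stand-in for the English geoparser's output

print(project_locations(source, alignment, english_location_idx))
```

In this sketch the translation and geoparsing stages are replaced by hand-written stand-ins, so it shows only how an alignment lets a location found in the English text be mapped back to the original wording.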