Plugging Text Processing and Mining in a Cloud Computing Framework

Authors: Akil Rajdho, Marenglen Biba
Affiliations: Department of Computer Science, University of New York in Tirana, Tirana, Albania; School of Computing and Mathematical Sciences, University of Greenwich, London, UK
Record processed: 2013-10-10
Source: Scientific and technical report (other)

Full text available on request (22 pages)

Keywords: electronic information; cloud computing; text processing; text mining; data collection; data storage
Abstract: Computational methods have evolved over the years, giving developers and researchers more sophisticated and faster ways to solve hard data-processing tasks. However, with new data collection and storage technologies, the amount of gathered data increases every day, making its analysis an increasingly complex task. One of the main forms of stored data is plain, unstructured text, and one of the most common ways of analyzing this kind of data is text mining. Text mining is similar to other types of data mining, but unlike data in properly structured forms (such as XML), the data handled in text mining is at best semi-structured. To derive valuable information, text mining systems therefore have to execute many complex natural language processing algorithms. In this chapter we focus on text processing tools that deal with stemming algorithms. Stemming is the step that finds the stem (or root) of a word, and it is essential in every text processing procedure. Stemming algorithms are complex and require high computational effort. In this chapter we present an Apache Mahout plugin for a stemming algorithm that makes it possible to execute the algorithm in a cloud computing environment. We investigate the performance of the algorithm in the cloud and show that the new approach significantly reduces the execution time of the original algorithm over a large dataset of text documents.
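The abstract describes stemming as reducing each word to its stem, and describes moving that work into a cloud framework where many documents are processed independently. The sketch below illustrates both ideas under loudly stated assumptions. The record does not say which stemming algorithm the authors used, so a toy English suffix-stripping stemmer with a hypothetical suffix list (`SUFFIXES`) stands in for it. Python's built-in `map()` stands in for the map phase that a MapReduce-style cluster would spread across nodes. All function names here are illustrative and are not the authors' Mahout plugin or any Mahout API.

```python
# Minimal map/reduce-style stemming sketch (NOT the authors' Mahout plugin).
# Assumptions: a toy English suffix-stripping stemmer with a hypothetical
# suffix list stands in for the report's stemming algorithm, and map()
# stands in for the map phase a cluster would distribute across nodes.
import re
from collections import Counter
from typing import Iterable

# Hypothetical suffixes, checked in order, so "ational" is tried before
# "ation" and "es" before "s".
SUFFIXES = ("ational", "ation", "ness", "ing", "ed", "ly", "es", "s")
MIN_STEM = 3  # never strip a word down to fewer than 3 letters


def stem(word: str) -> str:
    """Strip the first matching suffix that leaves a long-enough stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def map_document(text: str) -> Counter:
    """Map step: tokenize one document and count the stems it contains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens)


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge per-document stem counts into corpus totals."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def stem_corpus(documents: Iterable[str]) -> Counter:
    # Each map_document call depends only on its own document, which is
    # what allows a cloud framework to run these calls on separate nodes.
    return reduce_counts(map(map_document, documents))


if __name__ == "__main__":
    docs = ["Mining texts", "Processing texts and documents"]
    print(stem_corpus(docs).most_common(3))
```

The design point is the split: the costly per-word stemming lives entirely in the map step, which has no shared state, so adding machines shortens wall-clock time, while the reduce step only merges small count tables.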
