Real-time transliteration using an InfoSphere Streams custom Java operator and ICU4J: Integrating a Java transliteration module with a custom Java operator of InfoSphere Streams
Keywords: real-time transliteration; Java operator; ICU4J; InfoSphere Streams
Abstract: With the ever-growing importance of Internet monitoring and sentiment analysis, there is an immediate need to identify patterns in big data by performing text analytics. One challenge in this work is that a country can have multiple languages, and text-analytics rules are not available for all of them, which makes it hard to run the analytics effectively. In India, for example, the official language differs from state to state, and data is available both in English and in local languages. This article describes how to bring consistency to the transliteration process and how to use IBM® InfoSphere® Streams® to prepare linguistic data and apply text analytics or pattern-recognition logic.
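To make the consistency goal concrete: ICU4J performs this kind of script conversion through its `Transliterator` class (for example, `Transliterator.getInstance("Devanagari-Latin")`). The sketch below is a dependency-free stand-in that uses only the JDK, not ICU4J. It assumes a deliberately tiny, hypothetical mapping table and an ASCII output scheme ("aa" for the long vowel), and it ignores conjuncts beyond the virama, nasalization marks, and Hindi schwa deletion. What it shows is the core idea: Devanagari and Latin input are normalized into a single Latin form, so that one set of text-analytics rules can be applied to both.

```java
import java.util.Map;

// Minimal stdlib-only sketch of rule-based Devanagari-to-Latin transliteration.
// ASSUMPTION: the mapping tables and the ASCII output scheme are hypothetical
// illustrations; they are NOT ICU4J's Devanagari-Latin rules.
final class DevanagariSketch {
    private static final char VIRAMA = '\u094D'; // halant: suppresses the inherent vowel

    // Consonants carry an inherent "a" unless followed by a vowel sign or virama.
    private static final Map<Character, String> CONSONANTS = Map.ofEntries(
        Map.entry('\u0915', "k"),  // KA
        Map.entry('\u0917', "g"),  // GA
        Map.entry('\u0924', "t"),  // TA
        Map.entry('\u0926', "d"),  // DA
        Map.entry('\u0928', "n"),  // NA
        Map.entry('\u092A', "p"),  // PA
        Map.entry('\u092C', "b"),  // BA
        Map.entry('\u092D', "bh"), // BHA
        Map.entry('\u092E', "m"),  // MA
        Map.entry('\u0930', "r"),  // RA
        Map.entry('\u0932', "l"),  // LA
        Map.entry('\u0938', "s"),  // SA
        Map.entry('\u0939', "h")); // HA

    // Dependent vowel signs (matras) that replace the inherent vowel.
    private static final Map<Character, String> VOWEL_SIGNS = Map.of(
        '\u093E', "aa", '\u093F', "i", '\u0940', "ii", '\u0941', "u",
        '\u0942', "uu", '\u0947', "e", '\u094B', "o");

    // Independent (word-initial or standalone) vowels.
    private static final Map<Character, String> VOWELS = Map.of(
        '\u0905', "a", '\u0906', "aa", '\u0907', "i", '\u0908', "ii",
        '\u0909', "u", '\u090A', "uu", '\u090F', "e", '\u0913', "o");

    static String transliterate(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String consonant = CONSONANTS.get(c);
            if (consonant != null) {
                out.append(consonant);
                char next = i + 1 < input.length() ? input.charAt(i + 1) : '\0';
                if (next == VIRAMA) {
                    i++;                          // consume virama: bare consonant
                } else if (VOWEL_SIGNS.containsKey(next)) {
                    out.append(VOWEL_SIGNS.get(next));
                    i++;                          // consume the vowel sign
                } else {
                    out.append('a');              // inherent vowel
                }
            } else if (VOWELS.containsKey(c)) {
                out.append(VOWELS.get(c));
            } else {
                out.append(c);                    // pass through Latin text, digits, spaces, unmapped marks
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("\u0928\u092E\u0938\u094D\u0924\u0947")); // namaste
        System.out.println(transliterate("Delhi \u0926\u093F\u0932\u094D\u0932\u0940")); // Delhi dillii
    }
}
```

Mixed input such as "Delhi दिल्ली" comes out entirely in Latin script ("Delhi dillii"), so a single set of pattern rules sees one script. In a Streams custom Java operator, a function like this would presumably be called on each incoming tuple's text attribute before the analytics logic runs. The operator wiring is not shown here, and a production version would delegate to ICU4J rather than to a hand-written table.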