Real-world text mining using machine learning

Posted at 9:39. Categories: machine learning, sentiment analysis, text mining, Events.

On April 21, 2012, as part of the seminar on Natural Language Processing, Jan Žižka (Mendel University, Brno, Czech Republic) will give a talk.
He will speak about using machine learning to extract information from texts. The talk will cover the application of various algorithms and the interpretation of their results.
Separately, he will present the results of applying these methods to real-world data, using the analysis of customer reviews of hotels as an example.
The talk will be given in English.

Abstract from the speaker:

Today, huge volumes of text data are available, especially on the Internet. Very often, the data is unstructured, and the text is written freely by Internet users in natural languages. Such data is expected to contain interesting or valuable information that can be used for different goals in many application areas. Because the volume of data is so large, it is very difficult or impossible to process it «manually» within an acceptable time. Fortunately, modern informatics enables us to apply sophisticated methods from artificial intelligence, especially the family of algorithms called machine learning. Machine learning methods applied to text mining are based on inductive learning from existing examples.

In the first part, the talk gives a brief introduction to some machine learning methods applied to text mining. The main problems concern the appropriate preprocessing of the data, the design of the mining procedure (including the selection of suitable algorithms), and the interpretation of the results.
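To illustrate what "inductive learning from existing examples" can look like in practice, here is a minimal sketch in Python. It is not the method presented in the talk: it is a multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on a few hypothetical, hand-written hotel reviews labeled `pos` or `neg`. Preprocessing is deliberately simple (lowercasing and keeping runs of letters); the names `tokenize`, `train_nb`, and `predict`, and the training data, are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Preprocessing: lowercase, then keep runs of Unicode letters
    # ([^\W\d_] = word characters minus digits and underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def train_nb(examples):
    """Learn class and per-class word counts from labeled (text, label) pairs."""
    class_docs = Counter()   # number of training documents per label
    word_counts = {}         # label -> Counter of word frequencies
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n_docs / total_docs)      # log prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data (not from the described research).
train = [
    ("The room was clean and the staff friendly", "pos"),
    ("Great location, friendly staff, clean room", "pos"),
    ("Dirty room and rude staff", "neg"),
    ("Noisy, dirty and overpriced", "neg"),
]
model = train_nb(train)
print(predict(model, "clean room, friendly reception"))
print(predict(model, "dirty and noisy room"))
```

With this toy data, the first review is classified as `pos` and the second as `neg`. Real review collections would need much more careful preprocessing and evaluation, which is exactly the kind of problem the talk addresses.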

In the second part, some interesting results obtained from real-world data will be presented. The data consists of customer reviews expressing opinions and sentiments about the services provided by hotels all over the world. The reviews were written by hundreds of thousands of customers in many languages. The described research focused on revealing typical words and phrases in several languages, including English, Spanish, French, German, Japanese, Russian, Czech, and others.
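One simple way to reveal words that are typical of one group of reviews compared with another (for example, positive vs. negative) is to rank words by the smoothed log-ratio of their relative frequencies in the two groups. The sketch below is an assumption for illustration, not the procedure used in the described research, which the abstract does not specify; the helper `typical_words` and the review lists are hypothetical. Because the tokenizer keeps any Unicode letters, the code runs unchanged on, e.g., Russian or Czech text, although languages written without spaces, such as Japanese, would need proper word segmentation first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of Unicode letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def typical_words(reviews_a, reviews_b, top_k=3):
    """Rank words by the add-one smoothed log-ratio of their
    relative frequency in group A versus group B (hypothetical helper)."""
    counts_a = Counter(tok for r in reviews_a for tok in tokenize(r))
    counts_b = Counter(tok for r in reviews_b for tok in tokenize(r))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    scores = {
        w: math.log((counts_a[w] + 1) / total_a)
        - math.log((counts_b[w] + 1) / total_b)
        for w in vocab
    }
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]


# Hypothetical toy reviews (not from the described research).
positive = [
    "Friendly staff and clean room",
    "Very clean, friendly and quiet",
    "Clean room, great breakfast",
]
negative = [
    "Dirty room and rude staff",
    "Noisy and dirty",
    "The breakfast was cold",
]
print(typical_words(positive, negative))
print(typical_words(negative, positive))
```

On this toy data the positive reviews yield `['clean', 'friendly', 'great']` and the negative ones `['dirty', 'cold', 'noisy']`. Extending the same idea to phrases would mean counting word n-grams instead of single tokens.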

We may be able to set up an online broadcast. This will be announced before the seminar on the seminar's Twitter account: !/nlpseminar

Author: tlando
