A Study of New Media Information Analysis and Event Detection: Using Disaster as an Example
Shiu-Feng Shih
When a disaster occurs, information must be diffused and transmitted in real time if it is to support disaster prevention and recovery. With the growth of network infrastructure, mass media have also become providers of disaster information on the internet. However, information retrieved through search engines often fails to reflect the current status of an unfolding disaster. Traditional channels such as disaster response centers also struggle to handle the influx of disaster information, which usually exceeds what human staff can process. New tools are therefore needed to classify information from new media quickly and automatically, to provide reliable information to disaster response centers, and to support policy decision-making. In this study, we use data collected from five different channels during Typhoon Morakot. After text preprocessing and content classification by experts, we compare the datasets through their frequency distributions, classification structures, and word co-occurrence networks. We then use the vector space model, ignoring part of speech and grammar, to train a one-against-one support vector machine (OAO-SVM) classifier and evaluate the performance of automated classification. The results show that the chronology of internet data reveals several stages in the progression of a disaster, allowing us to monitor its development through each channel. The word relations in the co-occurrence networks show that channels written by experts use fewer repeated words and exhibit higher heterogeneity than channels written by amateurs. The OAO-SVM training results indicate that classifiers trained on expert-maintained channels perform better than those trained on amateur-written channels. In cross-channel tests, classifiers perform better on channels that share the same properties.
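The word co-occurrence comparison described above can be illustrated with a minimal sketch. It rests on assumptions the study does not state: posts are already segmented into tokens, two words co-occur when they appear in the same post, each edge is weighted by the number of such posts, and the type-token ratio serves as one possible proxy for how often a channel repeats words. The function names and sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Sketch only: "co-occurrence = appearing in the same post" and the
# type-token ratio proxy are assumptions, not the study's definitions.

def build_cooccurrence(documents):
    """Map each sorted word pair to the number of posts containing both words."""
    edges = Counter()
    for tokens in documents:
        # Deduplicate within a post so each pair counts at most once per post.
        unique = sorted(set(tokens))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Number of distinct neighbouring words for each word in the network."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def type_token_ratio(documents):
    """Distinct words / total words; lower values mean more repetition."""
    tokens = [t for doc in documents for t in doc]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

if __name__ == "__main__":
    # Hypothetical, already-segmented posts; not data from the study.
    posts = [
        ["typhoon", "flood", "road"],
        ["flood", "rescue", "road"],
        ["typhoon", "rain", "flood"],
    ]
    edges = build_cooccurrence(posts)
    print(edges[("flood", "road")])          # 2
    print(degree(edges)["flood"])            # 4
    print(round(type_token_ratio(posts), 3))  # 0.556
```

Computing these measures separately for each channel's posts gives comparable numbers across channels; a graph library could replace the plain `Counter` edge list for larger networks.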
When we merge datasets from channels with the same properties to train a classifier, we find that the classifier performs well as long as the training data are of sufficiently high quality. If data quality is insufficient, increasing the amount of training data can improve classification performance. We believe that the techniques developed and the results of this analysis can help design more efficient and accurate social sensors in the future.
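The classification pipeline described in this abstract (a bag-of-words vector space model feeding a one-against-one SVM, trained on merged same-property channels) can be sketched in plain Python. The study does not specify its term weighting, SVM solver, or parameters, so this sketch assumes L2-normalized term-frequency vectors, a linear soft-margin SVM trained by full-batch subgradient descent, and majority voting over the pairwise classifiers. The class labels, channel names, and posts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Sketch only: TF weighting, the subgradient solver, lam/eta/epochs, and the
# tie-breaking rule are assumptions, not the settings used in the study.

def build_vocab(docs):
    """Map each distinct token to a column index (sorted for determinism)."""
    return {w: i for i, w in enumerate(sorted({t for d in docs for t in d}))}

def vectorize(tokens, vocab):
    """Bag-of-words term-frequency vector, L2-normalised; word order is ignored."""
    v = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:  # words unseen in training are dropped
            v[vocab[t]] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_binary_svm(X, y, lam=0.01, eta=0.1, epochs=300):
    """Linear soft-margin SVM: minimise lam/2*||w||^2 + mean hinge loss
    by full-batch subgradient descent. Labels y are +1 or -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # point lies inside the margin
                for j, xij in enumerate(xi):
                    grad_w[j] -= yi * xij / n
                grad_b -= yi / n
        w = [wj - eta * g for wj, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def train_ovo(docs, labels):
    """One-against-one: one binary SVM per unordered pair of classes."""
    vocab = build_vocab(docs)
    X = [vectorize(d, vocab) for d in docs]
    models = {}
    for ci, cj in combinations(sorted(set(labels)), 2):
        idx = [k for k, c in enumerate(labels) if c in (ci, cj)]
        ys = [1 if labels[k] == ci else -1 for k in idx]
        models[(ci, cj)] = train_binary_svm([X[k] for k in idx], ys)
    return vocab, models

def predict_ovo(tokens, vocab, models):
    """Each pairwise model casts one vote; ties go to the alphabetically first class."""
    x = vectorize(tokens, vocab)
    votes = Counter()
    for (ci, cj), (w, b) in models.items():
        votes[ci if dot(w, x) + b >= 0 else cj] += 1
    return max(sorted(votes), key=votes.get)

if __name__ == "__main__":
    # Hypothetical segmented posts from two channels with the same properties.
    channel_a = [(["bridge", "collapsed", "road"], "damage"),
                 (["rescue", "team", "helicopter"], "rescue"),
                 (["rain", "wind", "typhoon"], "weather")]
    channel_b = [(["house", "collapsed", "mud"], "damage"),
                 (["rescue", "shelter", "volunteers"], "rescue"),
                 (["heavy", "rain", "warning"], "weather")]
    merged = channel_a + channel_b  # merge same-property channels before training
    docs, labels = zip(*merged)
    vocab, models = train_ovo(list(docs), list(labels))
    print(predict_ovo(["collapsed", "bridge"], vocab, models))  # damage
```

For k classes, one-against-one trains k(k-1)/2 binary classifiers, each on the documents of only two classes, which keeps every binary problem small. Merging channels here simply concatenates their labeled posts before the vocabulary is built, so the vector space covers words from both sources. A real system would more likely rely on an established SVM library than on this hand-rolled solver.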
Category: New Media Content Analysis