Student Initiative Project
AY2015 GSDM Student Initiative Report
Group:Big Data
Utilization of Big Data in the Public Policy Domain: Text Mining for Social Policy Solving and National Security

2016/05/30

1)Keywords
Artificial Intelligence, Big data, Public policy
 
2)Main applicants (Name, Department, School, Title)
・KOVACEVIC, Goran, D1, Dept. of Electrical Engineering and Information Systems,
 Graduate School of Engineering
・HAYASHI, Teruaki, D2, Dept. of Systems Innovation, Graduate School of Engineering
・HONG, Giwon, M2, Dept. of Systems Innovation, Graduate School of Engineering
・HU, Fangyuan, D1, Dept. of Systems Innovation, Graduate School of Engineering
・MAEDA, Takashi Nicholas, M2, Dept. of Systems Innovation, Graduate School of
 Engineering
・MUROGA, Kiho, D1, Division of Economics, Graduate School of Economics
・NONAKA, Naoki, D2, Dept. of Technology Management for Innovation, Graduate
 School of Engineering
・SUGIMOTO, Aoi, D1, Dept. of Global Agricultural Sciences, Graduate School of
 Agricultural and Life Sciences
・SUZUKI, Takashi, D1, Dept. of Global Agricultural Sciences, Graduate School of
 Agricultural and Life Sciences
・UCHIDA, Gyo, D1, Division of Economics, Graduate School of Economics
 
3)Project Faculty member(s)
・HANAI, Kazuyo, Graduate School of Public Policy, Project Research Associate
・YARIME, Masaru, Graduate School of Public Policy, Science & Technology Innovation
 governance, Project Associate Professor
 
4)Departments and schools related to this project
Graduate School of Agricultural and Life Sciences
Graduate School of Economics
Graduate School of Engineering
Graduate School of Public Policy
 
5)Content
【Background and objectives】
Recently, terms such as ‘big data’ and ‘artificial intelligence’ have attracted much attention. One reason is the ubiquity of the Internet and smartphones, which make it possible to track the activities of individuals. Another is that big data analysis helps to uncover perspectives that matter for business decision making. Advances in computer performance and analysis tools have also had a major impact. In academia, big data is used in a wide variety of ways, for example (1) prediction of influenza epidemics from queries submitted to search engines [1], (2) earthquake detection from Twitter [2], and (3) consumer analysis based on e-commerce data. The methodologies used in these studies are widely applicable, and quantitative analysis of data that has traditionally been analyzed qualitatively can lead to new insights.
【Content and methods】
The project proceeded as follows. First, project members gathered and learned basic text mining skills. We then applied text mining methods to the REF data described below.
To keep up with social issues in cooperation with stakeholders, universities need to understand the impact of their research activity on society beyond academia. The Research Excellence Framework (REF) data from the United Kingdom (UK) consists of 6,670 case studies from different universities and provides opportunities for detailed analysis. Members examined how university research activities have an impact on society and how academic fields are connected.
【Participants】
・KOVACEVIC, Goran, D1, Dept. of Electrical Engineering and Information Systems,
 Graduate School of Engineering
・HAYASHI, Teruaki, D2, Dept. of Systems Innovation, Graduate School of Engineering
・HONG, Giwon, M2, Dept. of Systems Innovation, Graduate School of Engineering
・HU, Fangyuan, D1, Dept. of Systems Innovation, Graduate School of Engineering
・MAEDA, Takashi Nicholas, M2, Dept. of Systems Innovation, Graduate School of
 Engineering
・MASUDA, Akiyuki, D1, Department of Systems Innovation, Graduate School of
 Engineering
・MIYANO, Sayumi, M1, Graduate Schools for Law and Politics, Graduate School of
 Legal and Political Studies
・MUROGA, Kiho, D1, Division of Economics, Graduate School of Economics
・NONAKA, Naoki, D2, Dept. of Technology Management for Innovation, Graduate
 School of Engineering
・SUGIMOTO, Aoi, D1, Dept. of Global Agricultural Sciences, Graduate School of
 Agricultural and Life Sciences
・SUZUKI, Takashi, D1, Dept. of Global Agricultural Sciences, Graduate School of
 Agricultural and Life Sciences
・UCHIDA, Gyo, D1, Division of Economics, Graduate School of Economics
・WEI, Keiti, D1, Graduate Schools for Law and Politics, Graduate School of Legal and
  Political Studies
・YARIME, Masaru, Graduate School of Public Policy, Science & Technology Innovation
 governance, Project Associate Professor
・HANAI, Kazuyo, Graduate School of Public Policy, Project Research Associate
[1] Ginsberg, J., M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski, & L. Brilliant (2009), “Detecting influenza epidemics using search engine query data”, Nature, 457 (7232), 1012-1014.
[2] Sakaki, T., M. Okazaki, & Y. Matsuo (2010), “Earthquake shakes Twitter users: real-time event detection by social sensors”, In Proceedings of the 19th international conference on World Wide Web, pp. 851-860, ACM.
 
6)Interdisciplinary and social aspect
By analyzing REF data, which provides detailed information about how academic research has an impact on society in the UK, we can gain perspectives on how to approach social issues from various research fields.
 
7)Output
All team members learned the basic methods of text mining and some applied machine learning techniques (bag-of-words, tf-idf, word2vec, Latent Dirichlet Allocation, KeyGraph, and so forth), and applied these methods to the REF data. By applying them to real data, members learned the limitations and possibilities of the analysis tools, and the importance of collaboration between humans and computers. The following is part of our analysis and discussion.
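As an illustration of the first two techniques named above, the following is a minimal sketch of bag-of-words counting and tf-idf weighting in plain Python. It is not the code used in the project: the tokenizer, the function names, the tf-idf variant (relative term frequency times the natural log of inverse document frequency), and the three example sentences are all assumptions made for this example.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]

def bag_of_words(docs):
    # One word-count table per document; word order is discarded.
    return [Counter(tokenize(doc)) for doc in docs]

def tf_idf(docs):
    bows = bag_of_words(docs)
    n_docs = len(bows)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for bow in bows:
        df.update(bow.keys())
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights

# Hypothetical example sentences, not taken from the REF data.
docs = [
    "Research on flood risk informed national flood policy.",
    "Clinical research changed national health policy.",
    "Design research influenced museum exhibitions.",
]
weights = tf_idf(docs)
```

In this sketch a word that appears in every document, such as "research", receives a weight of zero, while a word concentrated in one document, such as "flood", is weighted highly. This is the property that makes tf-idf useful for picking out the characteristic words of each case study.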

Topic Analysis
Members extracted the topics (agenda, title, keywords) of project reports, represented as sets of words. We compared the categories labeled by humans with categories computed using LDA (Latent Dirichlet Allocation). A total of 166,692 words were extracted from 6,640 projects and divided into 36 categories, the same number as the categories labeled by humans. For each project we obtained a list of candidate categories with their probabilities, and found differences in labeling between humans and computers.
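The comparison step described above can be sketched as follows, assuming that a topic model has already produced one probability list per project. The sketch does not implement LDA itself, and the function names, labels, and probabilities are hypothetical. Because topic numbers produced by a model are arbitrary, each topic is first matched to the human category it most often coincides with, and projects whose human label differs from that category are reported.

```python
from collections import Counter

def most_probable_topic(topic_probs):
    # Index of the topic with the highest probability for one project.
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])

def compare_labels(human_labels, doc_topic_probs):
    assigned = [most_probable_topic(probs) for probs in doc_topic_probs]
    # Cross-tabulate human categories against computed topics.
    table = Counter(zip(human_labels, assigned))
    # Match each topic to the human category it most often coincides with.
    topic_to_label = {}
    for (label, topic), _count in table.most_common():
        topic_to_label.setdefault(topic, label)
    # Projects where the matched category differs from the human label.
    disagreements = [
        i for i, (label, topic) in enumerate(zip(human_labels, assigned))
        if topic_to_label[topic] != label
    ]
    return table, topic_to_label, disagreements

# Hypothetical labels and topic probabilities for five projects, two topics.
human_labels = ["Health", "Health", "Law", "Law", "Health"]
doc_topic_probs = [
    [0.8, 0.2],
    [0.7, 0.3],
    [0.1, 0.9],
    [0.6, 0.4],
    [0.9, 0.1],
]
table, topic_to_label, disagreements = compare_labels(human_labels, doc_topic_probs)
```

In this toy input, the fourth project is labeled "Law" by humans but falls into the topic dominated by "Health" projects, so it is the one reported as a disagreement. Such cases are the ones worth reading by hand.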

Relationship Analysis
Members analyzed the relationships among the research fields in the REF data using KeyGraph. We visualized the 36 categories labeled by humans (Fig.1). We found that environment, medical, health, and medicine-related fields are closely correlated. In contrast, business, law, society, and science-related fields have few links with other fields. Interdisciplinary fields such as sociology and area studies also have few linkages with other fields. Art and design fields sit on the bridge between the liberal arts and the general sciences. This result suggests that fields that are not currently connected could cooperate with each other, and that innovations between them may be expected.

Fig.1 Visualized Categories of REF data Using KeyGraph
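The idea of linking fields through shared vocabulary can be illustrated with a much simpler measure than KeyGraph. The sketch below is not the KeyGraph algorithm and not the project's code: it assumes each category is summarized by a set of characteristic terms, links two categories when the Jaccard coefficient of their term sets reaches a threshold, and lists categories left without any link. The category names, terms, and the threshold of 0.2 are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    # Share of terms common to both sets among all terms in either set.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def category_links(category_terms, threshold=0.2):
    # Link every pair of categories whose term overlap reaches the threshold.
    links = []
    for c1, c2 in combinations(sorted(category_terms), 2):
        score = jaccard(category_terms[c1], category_terms[c2])
        if score >= threshold:
            links.append((c1, c2, score))
    # Strongest links first.
    return sorted(links, key=lambda link: -link[2])

def isolated(category_terms, links):
    # Categories that take part in no link at all.
    linked = {name for c1, c2, _score in links for name in (c1, c2)}
    return sorted(set(category_terms) - linked)

# Hypothetical characteristic terms, not taken from the REF data.
category_terms = {
    "Clinical Medicine": {"patient", "treatment", "health", "trial"},
    "Public Health": {"health", "patient", "policy", "population"},
    "Law": {"court", "legislation", "policy", "rights"},
    "Art and Design": {"exhibition", "museum", "design", "public"},
}
links = category_links(category_terms)
unlinked = isolated(category_terms, links)
```

With these toy term sets only the two health-related categories are linked, and the remaining categories come out as unlinked. This mirrors, in a very reduced form, the kind of pattern read off the figure.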

 
8)Contribution to GSDM education
The contribution of this SIP to GSDM can be summarized as follows.
-Each member learned basic text mining skills, which are very hard to learn alone for people who are not familiar with programming. Text mining skills give researchers who mainly do qualitative research opportunities for quantitative analysis. This can be seen as an example of interdisciplinary collaboration, which is one of the key ideas of GSDM.
-Group members worked continuously as a group for six months. Most members were involved in the SIP continuously and proactively over a relatively long term. Working continuously as a group is one of the key factors in developing leadership. This SIP is one of the rare cases in GSDM activities in which students acted continuously and proactively, and this track record can be a contribution to GSDM.
-We presented the project achievements at IEL. In addition to presenting our achievements, we gave IEL participants the chance to discuss new research possibilities in the form of a workshop.
 
9)Expenses: None
 