Dr. Suat ATAN

Experiments on R, Python and Data Science

16 Jan 2020

Syrian Refugee Crisis in Turkey

News Analytics Report: Syrian Refugee Crisis in Turkey

Definition

Turkey now holds the world’s largest refugee population. According to a new World Bank policy brief, “Turkey’s Response to the Syrian Refugee Crisis and the Road Ahead”, the Turkish Government (GoT) puts the total number of registered Syrians under Temporary Protection (SuTPs) at 2,225,147 [1].

In this preliminary analysis, I used text mining tools. Using a custom algorithm that I developed, I collected 4,056 Turkish-language news stories about Syrians from various news outlets.

Objective

I want to identify the main themes of these news stories and see whether they offer any insights. News analytics based on text mining can map how the media perceives “Syrians” and reveal the main themes of the stories about them.

Tools and Techniques

  • R Language
  • RStudio IDE
  • Web Scraping Libraries
  • Text Mining libraries
  • SQLite Database

Number of news stories

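# Path to the SQLite database that holds the collected stories
db_default_adres <- "db/suriyeliler.sqlite"
# Load all collected headlines into a data frame (one row per story)
text_df <- derlenen_tum_haber_basliklari_df(db_adres = db_default_adres)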
nrow(text_df)
## [1] 4056
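
The helper `derlenen_tum_haber_basliklari_df()` is not shown in this report. As a rough illustration only, a loader of this kind might look like the sketch below. It assumes the DBI and RSQLite packages and a hypothetical table `haberler` with columns `baslik` (headline) and `sayfa_url` (source). These names are made up for the example and are not taken from the actual database.

```r
# Sketch only: the table and column names used here are assumptions.
load_headlines <- function(db_path, table = "haberler") {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  # Close the connection even if reading fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbReadTable(con, table)
}

# Demo on a throwaway database so the sketch runs without the real file.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  tmp_db <- tempfile(fileext = ".sqlite")
  con <- DBI::dbConnect(RSQLite::SQLite(), tmp_db)
  DBI::dbWriteTable(con, "haberler", data.frame(
    baslik = c("Suriyeli aile okula gitti", "Gemi limana geldi"),
    sayfa_url = c("source-a.example", "source-b.example"),
    stringsAsFactors = FALSE
  ))
  DBI::dbDisconnect(con)

  demo_df <- load_headlines(tmp_db)
  print(nrow(demo_df))
}
```

Calling `nrow()` on the returned data frame gives the story count, which is how the figure above is obtained from `text_df`.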

Sources of News

Distribution of news stories across the news sources.

il_il_haber <- sayfa_url_bazinda_haber_sayilarini_getir(db_adres = db_default_adres)
il_il_haber %>% 
  ggplot(aes(x=reorder(sayfa_url,-n),y=n)) + geom_col()  +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x="News Source",y="Number of News Stories")

Most Common Terms in All News

text_word <- basliklari_kelimelere_kir(text_df)
cok_gorulen_kelimeler <- derlenen_toplam_haberlerde_cok_gorulen_kelimeler(text_word)
cok_gorulen_kelimeler #%>% top_n(1000) %>% datatable()
## # A tibble: 295 x 2
##    kok         n
##    <fct>   <int>
##  1 suriyel  1151
##  2 göçme     662
##  3 kaçak     513
##  4 yakala    489
##  5 çocuk     373
##  6 iç        248
##  7 il        191
##  8 öldürül   170
##  9 savaş     169
## 10 ger       159
## # … with 285 more rows

News Vocabulary

nrow(cok_gorulen_kelimeler)
## [1] 295
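
Neither `basliklari_kelimelere_kir()` nor `derlenen_toplam_haberlerde_cok_gorulen_kelimeler()` is shown in this report. The sketch below illustrates the general idea in base R under stated assumptions: headlines are lower-cased and split on whitespace and punctuation, a small hypothetical stop-word list is removed, and a crude prefix truncation stands in for real Turkish stemming. The `kok` (root) column in the output suggests that a proper stemmer was used, but which one is not stated, so `count_terms()` and its parameters are illustrative only.

```r
# Sketch only: the stop words and the truncation "stemmer" are assumptions.
count_terms <- function(headlines, stop_words = character(0), stem_len = 7) {
  # Lower-case, then split on runs of whitespace or punctuation
  tokens <- unlist(strsplit(tolower(headlines), "[[:space:][:punct:]]+"))
  # Drop empty strings and stop words
  tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]
  # Keep the first stem_len characters as a crude stand-in for a real stemmer
  stems <- substr(tokens, 1, stem_len)
  # Count each stem and sort from most to least frequent
  counts <- sort(table(stems), decreasing = TRUE)
  data.frame(kok = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

toy_headlines <- c(
  "Suriyeli aile okula gitti",
  "Suriyeliler ve okul",
  "Gemi limana geldi"
)
term_counts <- count_terms(toy_headlines, stop_words = c("ve"))
print(term_counts)
```

In this toy input, "Suriyeli" and "Suriyeliler" collapse into the single stem `suriyel`, which is why stemming matters for counting inflected Turkish words. The number of rows in the result is the vocabulary size reported by `nrow()`.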

Re-Categorization of the Most Common Terms

Diagram of Recategorized Terms

# In the spreadsheet, add a column named "kategori" (category) at the far right.
# Terms in the irrelevant category, marked with "x", are filtered out.
duzeltilmis_data <- read.xlsx("config/duzeltilecek_data_suriyeliler.xlsx",sheetName = "Sheet1")
duzeltilmis_data %>% 
  group_by(kategori) %>% 
  summarise(nn = sum(n)) %>% 
  arrange(desc(nn)) %>% 
  filter(kategori != "x") %>%
  ggplot(aes(x=reorder(kategori,-nn),y=nn)) + geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x="Category",y="Frequency")

Word Cloud

library("wordcloud2")
cok_gorulen_kelimeler_buyuk <- derlenen_toplam_haberlerde_cok_gorulen_kelimeler(text_word,esik=15)
cok_gorulen_kelimelerden_kelime_bulutu_uret(cok_gorulen_kelimeler_buyuk, azami_kelime_adedi = 300)
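
The two helpers used for the word cloud are not shown. Judging by the argument names, `esik` (threshold) presumably drops rare terms and `azami_kelime_adedi` (maximum number of words) caps the size of the cloud. That reading is an assumption. A minimal sketch of the same idea follows, with a hypothetical `prepare_cloud_data()` helper; `wordcloud2()` is called only if the package is installed.

```r
# Sketch only: prepare_cloud_data() is a hypothetical helper, not the author's code.
prepare_cloud_data <- function(term_counts, min_n = 15, max_words = 300) {
  # Keep terms that occur at least min_n times (assumed meaning of "esik")
  kept <- term_counts[term_counts$n >= min_n, , drop = FALSE]
  # Most frequent first, then cap the number of words
  kept <- kept[order(-kept$n), , drop = FALSE]
  kept <- head(kept, max_words)
  # wordcloud2() reads the word from the first column and its frequency from the second
  data.frame(word = as.character(kept$kok), freq = kept$n,
             stringsAsFactors = FALSE)
}

# Three counts from the frequency table in this report, plus one made-up rare term
toy_counts <- data.frame(
  kok = c("il", "suriyel", "ornek", "yakala"),
  n   = c(191L, 1151L, 3L, 489L),
  stringsAsFactors = FALSE
)
cloud_data <- prepare_cloud_data(toy_counts)
print(cloud_data)

if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(cloud_data)
  # print(cloud) renders the interactive widget
}
```

Filtering before plotting keeps one-off terms from cluttering the cloud, so the frequent stems dominate visually.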

Conclusion

Naturally, the two most common terms concern immigrants and asylum seekers, and war. These keywords are closely connected, because this crisis is a consequence of the war. The third most common term group is judicial cases and fugitives. Unfortunately, there are many stories about murders, deaths, fights, corpses, funerals and the like. This issue stems from the social integration problems of refugees. Other conspicuous terms in the list are:

  • Child
  • Ship
  • School
  • Humanitarian Aid
  • Hospital
  • Istanbul

These terms reflect the main themes of this humanitarian crisis. Despite efforts to alleviate the problem, the crisis persists and appears to be affecting children.

