Unika Atma Jaya Library

Detail

Towards Morphological Resource of F-Words to Improve Automatic Censorship System

By:

Prihantoro

Type: Article from Proceedings
In collection: KOLITA 13 : Konferensi Linguistik Tahunan Atma Jaya Ketiga Belas : Tingkat Internasional, Jakarta, 8-9 April 2015, pages 193-198.
Topics: F-words; Corpus; Information Retrieval; Morphology; Compound; Censorship
Fulltext: (193-198) Prihantoro - Towards Morphological . . . - 020415.pdf (125.64KB)

Availability

PKBB Library
- Call number: 406 KLA 13
- Non-reserve copies: none
- Reserve copies: 1

Article content

The F-word is a frequently used insult, not only in spoken conversation but also in writing. This paper describes the morphological features of F-words so that they can serve as a machine-readable morphological resource for improving the accuracy of automatic retrieval. Such a resource can in turn improve the accuracy of censorship systems for online digital texts, for example on social media such as Facebook, Twitter, and Instagram. The data are drawn from the Corpus of Contemporary American English (COCA). I used the asterisk metasymbol (*) as a wildcard to retrieve the morphological contexts of the F-word. COCA successfully recognizes inflections and derivations. However, it treats each compound as a single word. This problem stems from the orthography: unspaced and hyphenated compounds are each counted as one token. Beyond POS tagging, a compound module must be designed to overcome this problem; it would perform deep morphological analysis on character strings that are absent from a simple lexical resource, and strings licensed as compounds or multi-word units would then be treated separately.
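
The two steps the abstract describes, wildcard retrieval and a compound check for strings missing from a simple lexicon, could be sketched roughly as follows. This is only an illustration and not the paper's implementation: the function names `wildcard_to_regex`, `retrieve`, and `split_compound`, the token list, and the toy lexicon are all hypothetical, the neutral stem "fire" stands in for the actual F-word, and the asterisk is assumed to match any run of letters, digits, or hyphens.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a wildcard query (assumed: * = any run of word characters or
    hyphens) into a compiled, case-insensitive regular expression."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + r"[\w-]*".join(parts) + "$", re.IGNORECASE)


def retrieve(pattern, tokens):
    """Return the tokens that match the wildcard query, in input order."""
    regex = wildcard_to_regex(pattern)
    return [token for token in tokens if regex.match(token)]


def split_compound(token, lexicon):
    """Return the two or more lexicon words that make up a hyphenated or
    unspaced compound, or None if the token is a simple word or unknown."""
    lowered = token.lower()
    if lowered in lexicon:
        return None  # already a known simple word: no deeper analysis
    if "-" in lowered:
        parts = lowered.split("-")
        return parts if all(part in lexicon for part in parts) else None
    # Unspaced compound: try every two-way split against the lexicon.
    for i in range(1, len(lowered)):
        left, right = lowered[:i], lowered[i:]
        if left in lexicon and right in lexicon:
            return [left, right]
    return None


# Hypothetical data; "fire" is a neutral stand-in stem.
tokens = ["fire", "fired", "firing", "fireman", "fire-proof", "campfire", "water"]
lexicon = {"fire", "fired", "firing", "man", "proof", "camp", "water"}

hits = retrieve("*fire*", tokens)
for hit in hits:
    print(hit, split_compound(hit, lexicon))
```

In this sketch the inflected form "fired" is found in the lexicon and left alone, while "fireman", "fire-proof", and "campfire" are each split into their parts instead of being counted as single tokens. A real compound module would need a full lexicon and a policy for ambiguous splits.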
