List of Language Tags

The official BLIP Lab approved language tag list lives here. When you tag anything as a particular language, pick only from the following list of approved languages. Never use a tag that is not on this list. Do not, for example, tag “Mandarin” as “Chinese”.

Every utterance must be tagged with a corresponding language tag, without exception.

General Rules about Language Tagging

The transcription template file contains a Controlled Vocabulary of language and matrix language tags. The list will appear when a cell is generated in the Language or Matrix Tier of a given participant. Select the appropriate tag from the drop-down list.

tts_languagecv.png

For your quick reference, here is a table of the relevant language tags. Please scroll down for more detailed information.

Tag | Description | Examples
Standard Language Tag | Normal language tags, such as “English” or “Mandarin”. Capitalize appropriately. The complete list is found on the BLIP Lab Wiki. | English, Mandarin, Tamil, Malay, etc.
Vocal Sounds | Sounds made by the mouth that are not considered speech. | “dadadadada”
Non Vocal Sounds | Sounds that are not made by the mouth. | :s:clicking, :s:clapping
Languageless | Non-words that serve a pragmatic function and do not belong to any specific language. | Hmm, Um, Oh, Ooh, Mm, Ah
Red Dot | Words used and understood by local speakers that may no longer be tagged to a particular language (especially their language of origin). | Prata, Mammam (Eat), Sheeshee (Pee)

General Language Tags

Chinese Languages

  • Mandarin
  • Hokkien
  • Teochew
  • Hainanese
  • Cantonese
  • Hakka

Malay Languages

  • Malay
  • Other Bahasa (Javanese, Indonesian, etc.)

Indian Languages

  • Tamil
  • Malayalam
  • Telugu
  • Punjabi
  • Hindi

Other languages

  • Arabic
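
The rule that every utterance carries exactly one approved tag can be checked mechanically. The following is a minimal sketch, not part of the lab's tooling: the `(text, tag)` pair structure and the function name `find_tagging_errors` are hypothetical, and the exact spelling of each tag in the template's drop-down list is an assumption based on the lists on this page.

```python
# Approved tags, copied from the lists on this page. The exact spelling used
# in the template's drop-down list is an assumption (for example, the page
# writes "Non Vocal Sounds" both with and without a hyphen).
STANDARD_TAGS = {
    "English",  # named as a standard tag in the quick-reference table
    "Mandarin", "Hokkien", "Teochew", "Hainanese", "Cantonese", "Hakka",
    "Malay", "Other Bahasa",
    "Tamil", "Malayalam", "Telugu", "Punjabi", "Hindi",
    "Arabic",
}
SPECIAL_TAGS = {"Vocal Sounds", "Non-Vocal Sounds", "Languageless", "Red Dot"}
APPROVED_TAGS = STANDARD_TAGS | SPECIAL_TAGS


def find_tagging_errors(utterances):
    """Return (index, message) pairs for missing or unapproved tags.

    `utterances` is a list of (text, tag) pairs; this layout is hypothetical.
    """
    errors = []
    for index, (text, tag) in enumerate(utterances):
        if not tag:
            errors.append((index, f"no language tag for {text!r}"))
        elif tag.startswith("#!#"):
            # Assumption: the "unsure" tags described later on this page are
            # allowed here and are resolved by a reviewer afterwards.
            continue
        elif tag not in APPROVED_TAGS:
            errors.append((index, f"unapproved tag {tag!r} for {text!r}"))
    return errors
```

For example, `find_tagging_errors([("ni hao", "Chinese")])` reports one error, because “Chinese” is not an approved tag while “Mandarin” is.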

Vocal Sounds

  • Sounds such as dadadadadada are to be tagged as “Vocal Sounds”.
  • Codes such as :v:laughing and :v:gasp are also to be tagged as “Vocal Sounds” in the Language tier. See List of Special Codes for more details.

Non-Vocal Sounds

  • Sounds such as clapping are classed as Non-Vocal Sounds. Mark them with an :s: code, e.g. :s:clapping. See List of Special Codes for more details.
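
The link between special codes and language tags described in the two sections above can be sketched as a prefix lookup. This is an illustration only: the function name `tag_for_special_code` is hypothetical, and the hyphenated spelling “Non-Vocal Sounds” is an assumption (check the drop-down list for the exact form).

```python
def tag_for_special_code(code):
    """Suggest a language tag from a special code's prefix.

    Returns None when the code does not start with :v: or :s:.
    """
    if code.startswith(":v:"):
        return "Vocal Sounds"
    if code.startswith(":s:"):
        # Assumption: hyphenated spelling; the page uses both forms.
        return "Non-Vocal Sounds"
    return None
```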

Languageless

  • As of 13 September 2021, the “Interjection” language tag has been replaced with a new “Languageless” tag.
  • Below is a list of Languageless words (non-exhaustive).
  • They should be marked in the Language tier as Languageless because it is not clear what language they are from.
  • Things like “hmm” and “mm” will be recorded as such (i.e., with one or two ‘m’s). If the sound is especially long, it can be recorded with a tilde at the end, like “hmm~”.
  • Note that an “eh” from an adult is a Languageless word, but from a baby it is to be marked as :v:vocalizations in the “Baby (Language) tier”. This is because the functions of these two utterances are different.
  • ‘Oh/ooh’ on its own should be tagged as Languageless, but when paired with a word (e.g., oh no, oh dear) it should be tagged as English.
  • ‘Uh oh’ should be tagged as English.
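
The rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `tag_interjection` and the `speaker_is_baby` flag are hypothetical, the word set is copied from the non-exhaustive list on this page, and any word following “oh” or “ooh” is assumed to be English, as in the examples given.

```python
# Non-exhaustive, copied from the list of Languageless words on this page.
LANGUAGELESS_WORDS = {"hmm", "um", "oh", "ooh", "mm", "mhm", "uh huh", "ah", "eh"}


def tag_interjection(utterance, speaker_is_baby=False):
    """Suggest a tag, or return None when these rules do not decide."""
    # Ignore case and a trailing tilde used to mark a long sound ("hmm~").
    text = utterance.lower().strip().rstrip("~").strip()
    if text == "eh" and speaker_is_baby:
        return ":v:vocalizations"  # goes in the Baby (Language) tier
    if text == "uh oh":
        return "English"
    if text in LANGUAGELESS_WORDS:
        return "Languageless"
    first, _, rest = text.partition(" ")
    if first in {"oh", "ooh"} and rest:
        # Assumption: the paired word is English, as in "oh no" or "oh dear".
        return "English"
    return None
```

A return value of None means the utterance needs an ordinary language tag chosen by the transcriber.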

List of Languageless words (non-exhaustive)

The number of m’s in each spelling below is standardized.

  • Hmm
  • Um
  • Oh
  • Ooh
  • Mm
  • Mhm
  • Uh huh
  • Ah
  • Eh

Below is a decision chart (special thanks to Woon Fei Ting, Research Associate at BLIP Lab) that can help you decide whether a word should be tagged as “Languageless”.

languageless_decisiontree_20210914.png

If you are tagging a word as “Languageless”, it should look something like this:

languageless_example.png

languageless_example_differentchunk.png

Red Dot

Red Dot is a tag for words commonly used and understood by local speakers. These words may have originated from one of the local languages spoken in Singapore (e.g., kopi originated from Malay). They are now used widely by local speakers regardless of the speakers’ language backgrounds, so it is difficult to classify them as belonging to one particular language.

For example, when a non-Malay-speaking speaker uses the word kopi, it should be tagged as Red Dot because this speaker is not speaking in or codeswitching to Malay. This speaker is just selecting a token from their speech repertoire.

However, when a Malay-speaking speaker uses kopi in a Malay utterance, it should be tagged as Malay.

Red Dot words are tagged as ‘Red Dot’ in the language tier, but their Matrix language will depend on the language spoken in, before, or after the utterance with the Red Dot word. In other words, ‘Red Dot’ cannot be a Matrix tier language tag.
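
The kopi example and the Matrix tier restriction can be sketched as two small checks. This is a hedged illustration, not the lab's procedure: the function names and parameters are hypothetical, and generalizing the kopi example to any word with a known language of origin is an assumption. Cases the page does not cover return None so that a person decides.

```python
def tag_red_dot_candidate(origin_language, speaker_languages, utterance_language):
    """Suggest a Language tier tag for a word from the Red Dot list.

    Returns None for cases the guideline does not spell out.
    """
    if origin_language not in speaker_languages:
        # The speaker is not codeswitching, only selecting a familiar token.
        return "Red Dot"
    if utterance_language == origin_language:
        return origin_language
    return None  # e.g. a Malay speaker using "kopi" in an English utterance


def check_matrix_tag(matrix_tag):
    """Reject 'Red Dot' on the Matrix tier; return any other tag unchanged."""
    if matrix_tag == "Red Dot":
        raise ValueError("'Red Dot' cannot be a Matrix tier language tag")
    return matrix_tag
```

For instance, `tag_red_dot_candidate("Malay", {"English", "Mandarin"}, "English")` suggests “Red Dot”, while the same word from a Malay speaker in a Malay utterance is suggested as “Malay”.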

When you see a Red Dot word, make a cell above the word and select Red Dot from the drop-down list:

languagetaglist_reddot_example_2022.PNG

List of Noted Red Dot Words

Word | Meaning
Mammam | Eat
Sheeshee | Pee
Pompom | Bathe
Prata | Prata
Alamak | Oh no/oh dear
Aiya* | Exclamation to express surprise, displeasure, or frustration
Korkor, jiejie** | Big brother, big sister
Didi, meimei** | Younger brother, younger sister
Ah-gong, gonggong, yeye** | Grandfather, or an elderly man
Ah-ma, popo, nainai** | Grandmother, or an elderly woman
Auntie | A middle-aged or elderly woman who may or may not be a relative
Uncle | A middle-aged or elderly man who may or may not be a relative
Ah-boy | A general term for a young male child or a young guy
Ah-girl | A general term for a young female child or a young lady

*Another exclamation, ‘aiyo’, should be tagged as Mandarin if used in a Mandarin utterance, and tagged as Tamil if used in a Tamil utterance. Similarly, if a Mandarin speaker uses it in an English utterance, it should be tagged as Mandarin. If a speaker of an Indian language uses it in an English utterance, it should be marked as a Tamil loan word.

**These instances should only be tagged as Red Dot if they do not sound like Mandarin/Cantonese based on the tone. If you are unsure, please do not hesitate to check with a full-time staff member who speaks Mandarin or Cantonese.

Do refer to our BLIP Dictionary for transcribers for more Red Dot Words.

Baby noises

  • :v:laughter
  • :v:crying
  • :v:vocalizations (babbling and cooing; also includes screaming, whining, exclamations, or singing)
  • :v:airstream (burping, coughing etc.)

Unsure about what language you heard?

You may hear languages that you are completely unfamiliar with or only semi-familiar with. Use one of the following tags to indicate this. The tag also signals that an appropriate speaker of that language should check the transcription afterwards.

  • #!#Chinese (i.e., sounds like Mandarin)
  • #!#Bahasa (i.e., sounds like Bahasa)
  • #!#Dialect (i.e., sounds like a Chinese dialect)
  • #!#Indian (i.e., sounds like an Indian language)
  • #!#? (i.e., completely unsure what language you are hearing. Add a comment to mark the cell/timestamp where the instance occurs.)
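
Because every “unsure” tag starts with `#!#`, flagged cells can be collected for review by prefix. The sketch below assumes a hypothetical export in which each cell is a dict with `start_ms` and `tag` keys; neither the export format nor the function name comes from the lab's tools.

```python
UNSURE_PREFIX = "#!#"


def group_cells_for_review(cells):
    """Group flagged cells by the suspected language named after '#!#'.

    `cells` is a list of dicts with 'start_ms' and 'tag' keys (hypothetical).
    Cells tagged '#!#?' end up under the key '?'.
    """
    groups = {}
    for cell in cells:
        tag = cell["tag"]
        if tag.startswith(UNSURE_PREFIX):
            suspected = tag[len(UNSURE_PREFIX):]
            groups.setdefault(suspected, []).append(cell)
    return groups
```

Each group can then be passed to a staff member who speaks the suspected language, with the `?` group left for a general check.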
