New tools for working with the ORAL series corpora of spoken Czech: AchSynku and MluvKonk

David Lukeš, Prague
October 21st, 2015

Introduction

Two simple web-based tools for working with the ORAL series corpora of informal spoken Czech.

  1. AchSynku
    • compensates for lack of lemmatization in the corpora
  2. MluvKonk
    • provides intuitive multi-layer visualization of concordances from spoken corpora

→ Supplement the features of the standard KonText interface

Motivation (I)

  1. why still no lemmatization?
    • greater variation in spoken language → the morphological dictionary needs to be extended first
    • unruly syntax (false starts, aposiopeses, apo koinou constructions) → harder to disambiguate
  2. structure of spoken language
    • informal interactions are multi-layered: many speakers taking turns, overlaps, back-channelling
    • classic concordance format, based on the linear structure of written language, is ill-suited to this

Motivation (II)

A spoken corpus concordance inside KonText: