Swe-Clarin Catalogue View

Swedish UD Treebank

  • Swedish UD Treebank is a syntactically annotated corpus which, unlike the Swedish Treebank, is annotated using Universal Dependencies. It is based on Talbanken and contains approx. 97,000 tokens.


Uplug

  • Uplug is a collection of tools for text processing. The collection includes tools for word alignment etc., used to create parallel corpora.

Uppsala Persian Corpus (UPC)

  • UPC is a large Persian corpus. The annotation is based on 31 different word classes. UPC contains over 2.7 million tokens.

Uppsala Persian Dependency Treebank (UPDT)

  • UPDT is a syntactic corpus annotated for dependency structure. The corpus is part of the Uppsala Persian Corpus (UPC); it was parsed with MaltParser to obtain the dependency structure and then manually corrected. UPDT consists of 151,671 tokens.


WaveSurfer

  • WaveSurfer is an open-source tool for sound visualization and manipulation.

Lexin: Swedish-Turkish Dictionary

  • Swedish-Turkish dictionary. Approx. 28,500 entries. 2004.


SALDO

  • SALDO (Swedish Associative Thesaurus version 2) is an extensive lexical resource for modern written Swedish. Version 2.3.

Laws of 1734

  • The Swedish Laws of 1734.


  • News articles from 8 SIDOR. The material is sentence-scrambled.

Af Soomaali 1971-79

  • Af Soomaali 1971-79. The material is sentence-scrambled.