Welcome to UTT - UAM Text Tools

  1. What is UTT?
  2. Authors
  3. More on UTT
  4. License
  5. Getting the code
  6. Browsing the code online
  7. Contact

What is UTT?

UTT is a package of language processing tools developed at Adam Mickiewicz University. Its functionality includes:

  • tokenization,
  • dictionary-based morphological analysis,
  • heuristic morphological analysis of unknown words,
  • spelling correction,
  • pattern search,
  • sentence splitting,
  • generation of concordance tables,
  • syntactic parsing (currently undocumented).

The toolkit is intended for processing raw (unannotated), unrestricted text for any purpose.

The system is organized as a collection of command-line programs, each performing one operation, e.g. tokenization, lemmatization, or spelling correction. The components are independent of one another; the unifying element is the uniform I/O file format.

The components may be combined in various ways to provide different text processing services. New components supplied by the user can also be incorporated into the system easily, provided that they respect the I/O file format conventions.
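The idea of combining independent components can be sketched with ordinary shell pipes. The helper below is only an illustration: `run_pipeline` is a hypothetical name that is not part of UTT, and the `tr` stages are stand-ins for real components. The sketch assumes that each component reads its input from standard input and writes its output to standard output.

```shell
#!/bin/sh
# Hypothetical helper (not part of UTT): chain filter commands given as strings.
# Assumption: every stage reads stdin and writes stdout in a shared format.
run_pipeline() {
  if [ "$#" -eq 0 ]; then
    cat                # no stages left: pass the input through unchanged
    return
  fi
  stage=$1
  shift
  # Run the first stage and feed its output to the remaining stages.
  sh -c "$stage" | run_pipeline "$@"
}

# Stand-in stages: put each word on its own line, then lowercase it.
printf 'Hello World\n' | run_pipeline "tr ' ' '\n'" "tr 'A-Z' 'a-z'"
```

In a real setup the stand-in stages would be replaced by UTT components or by user-supplied programs. Because stages interact only through their input and output, a stage can be added, removed, or reordered without changing the others.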

UTT component programs do not depend on any specific tagset or morphological description format.

Authors

  • Tomasz Obrębski
  • Michał Stolarski
  • Justyna Walkowska
  • Pawel Konieczka
  • Paweł Wereński (kor)
  • Marcin Walas (mar)
  • Mateusz Hromada (build system)
  • Maciej Prill (build system)
  • Krzysztof Szarzyński (lem/UTF8, compdic/UTF8)
  • Mateusz Boryga (documentation)

More on UTT

Mateusz Boryga, UTT Based on Examples

License

UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Polex/PMDBF dictionary is licensed under the Creative Commons BY-NC-SA license, which prohibits commercial use.

Getting the code

UTT uses git for managing its code.

Assuming you have git installed, the following command in a terminal will fetch the most recent code for you:

git clone https://git.wmi.amu.edu.pl/obrebski/utt.git

(NOTE: The previous repository location was http://utt.wmi.amu.edu.pl/utt.git. It still exists, but it is no longer up to date.)

We are working on support for the native git protocol.

"Official" release tarballs are not available at the moment.

Browsing the code online

Contact

  • trac: create a ticket for suggestions, improvements, and bugs,
  • e-mail: obrebski at amu.edu.pl.

Last modified on 11/11/16 17:49:14