Last change on this file since 56c300b was 5f4d9c3, checked in by Maciej Prill <mprill@…>, 14 years ago:
Rewritten the build system, added lem UTF-8 version.

General information
*********************

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

Installation
**************

1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) run
       make install
   in the root directory of the installation
4) add the bin directory to the PATH variable

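Step 4 of the installation can be sketched in shell as follows. This is a minimal sketch, not part of the UTT distribution: the variable UTT_ROOT and its default value $HOME/utt are assumptions standing in for wherever the archive was actually unpacked.

```shell
# UTT_ROOT is a hypothetical variable naming the root directory of the
# installation; $HOME/utt is only an assumed default.
UTT_ROOT="${UTT_ROOT:-$HOME/utt}"

# Prepend the bin directory to PATH unless it is already listed.
case ":$PATH:" in
    *":$UTT_ROOT/bin:"*) ;;
    *) PATH="$UTT_ROOT/bin:$PATH" ;;
esac
export PATH
```

Placing these lines in a shell startup file such as ~/.profile keeps the setting across sessions. The case guard avoids adding the same entry twice when the file is read more than once.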

Requirements
*************

* File::HomeDir

  The Perl package File::HomeDir must be installed
  (to install the package, run 'perl -MCPAN -e shell' and type
  'install File::HomeDir' after the 'cpan>' prompt appears).

* flex

  To run the ser component, flex must be installed on your system.

* ruby

  To run the tre component, ruby must be installed on your system.

* locale pl_PL.iso-8859-2

  The locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
  and set when using UTT with the Polish module. The text you
  process with UTT must be encoded in iso-8859-2.
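
Text stored in another encoding has to be converted before it is processed. The following is a minimal sketch using the standard iconv utility, assuming the input is UTF-8. The function name to_latin2 and the sample word are illustrative only and are not part of UTT.

```shell
# Convert UTF-8 text on standard input to ISO-8859-2 on standard output.
# Assumes the input really is UTF-8; iconv exits with an error when it
# meets a character that ISO-8859-2 cannot represent.
to_latin2() {
    iconv -f UTF-8 -t ISO-8859-2
}

# The six letters of the Polish word "zażółć", written as UTF-8 octal
# escapes so the example does not depend on the encoding of this script.
# od shows the resulting bytes, one per letter.
printf 'za\305\274\303\263\305\202\304\207\n' | to_latin2 | od -An -c
```

For a whole file, the same function can be used as 'to_latin2 < input.txt > output.txt', where both file names are placeholders.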
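
The requirements listed above can be checked before installation. The following is a rough sketch, not a script shipped with UTT: the helper names have_command, have_perl_module and have_polish_locale are made up for illustration, and the locale test assumes that 'locale -a' lists the locale either as pl_PL or with an ISO-8859-2 suffix, which varies between systems.

```shell
# Succeeds when the named program is found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds when perl is available and can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Succeeds when `locale -a` lists pl_PL or a pl_PL ISO-8859-2 variant
# (spelled pl_PL.iso88592 on many systems). This is an assumption about
# how the system names its locales.
have_polish_locale() {
    locale -a 2>/dev/null | grep -Eqix 'pl_PL(\.iso-?8859-?2)?'
}

# Report every requirement that appears to be missing.
have_perl_module File::HomeDir || echo "missing: Perl package File::HomeDir"
have_command flex || echo "missing: flex (needed by the ser component)"
have_command ruby || echo "missing: ruby (needed by the tre component)"
have_polish_locale || echo "missing: locale pl_PL.iso-8859-2 (needed by the Polish module)"
```

The script prints nothing when all four checks pass. The checks only report what is missing; they do not install anything.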