Last change on this file since 555c7f8 was 5f4d9c3, checked in by Maciej Prill <mprill@…>, 13 years ago: Rewritten the build system, added lem UTF-8 version.
General information
*********************

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

Installation
**************

1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) run
       make install
   in the root directory of the installation
4) add the bin directory to the PATH variable

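The installation steps above can be sketched as a small shell script. This is only an illustration: the archive names, the target directory and the helper functions unpack_all and add_to_path are assumptions, not part of UTT, and the directory that the archive actually creates may differ.

```shell
#!/bin/sh
# Sketch of the installation steps. All file and directory names used
# in the usage example below are hypothetical.

# Steps 1-2: unpack the UTT archive and any dictionary module archives
# into the same directory.
unpack_all() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for archive in "$@"; do
        tar -xf "$archive" -C "$dest" || return 1
    done
}

# Step 4: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Example usage (hypothetical names, adjust to your download):
#   unpack_all "$HOME/utt" utt.tar.gz utt-dict-pl.tar.gz
#   (cd "$HOME/utt" && make install)     # step 3
#   add_to_path "$HOME/utt/bin"
```

To make the PATH change permanent, the add_to_path call (or an equivalent export line) would normally go into your shell start-up file.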
Requirements
**************

* File::HomeDir

the Perl package File::HomeDir must be installed
(to install the package, run 'perl -MCPAN -e shell' and type
'install File::HomeDir' after the 'cpan>' prompt appears)

* flex

to run the ser component, flex must be installed on your system

* ruby

to run the tre component, ruby must be installed on your system

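A quick way to see which of the requirements listed so far are missing is a check along the following lines. It is a minimal sketch; the function names have_cmd, have_perl_module and check_utt_requirements are made up for this example and are not part of UTT.

```shell
#!/bin/sh
# Sketch: report which UTT prerequisites appear to be missing.

# True if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# True if the named Perl module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print one line per missing prerequisite; return non-zero if any is missing.
check_utt_requirements() {
    missing=0
    have_perl_module File::HomeDir || { echo "missing: Perl module File::HomeDir"; missing=1; }
    have_cmd flex || { echo "missing: flex (needed by ser)"; missing=1; }
    have_cmd ruby || { echo "missing: ruby (needed by tre)"; missing=1; }
    return "$missing"
}

# Example usage:
#   check_utt_requirements && echo "all prerequisites found"
```

If File::HomeDir is reported missing, it can also be installed without the interactive shell by running perl -MCPAN -e 'install File::HomeDir'.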
* locale pl_PL.iso-8859-2

the locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
and set when using UTT with the Polish module. The text you
process with UTT must be encoded in iso-8859-2.

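The locale requirement can be checked from the shell as sketched below. The helper names normalize_locale and have_locale are assumptions made for this example. Systems spell the same locale differently in the output of 'locale -a' (for instance pl_PL.iso88592 or pl_PL.ISO-8859-2), so the names are normalised before they are compared.

```shell
#!/bin/sh
# Sketch: check whether a locale is installed, ignoring differences in
# letter case and hyphens between locale name spellings.

# Normalise a locale name: lower-case, hyphens removed.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# True if the wanted locale is listed by `locale -a`.
have_locale() {
    want=$(normalize_locale "$1")
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' \
        | grep -qxF -- "$want"
}

# Example usage (input.txt and input.latin2.txt are hypothetical names;
# use the locale spelling that `locale -a` prints on your system):
#   have_locale pl_PL.iso-8859-2 || echo "Polish locale not installed"
#   export LC_ALL=pl_PL.ISO-8859-2
#   iconv -f UTF-8 -t ISO-8859-2 input.txt > input.latin2.txt
```

The iconv line shows one way to convert UTF-8 text to iso-8859-2 before processing it with the Polish module.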