Last change on this file since b97a556 was e0cd003, checked in by Tomasz Obrebski <to@…>, 12 years ago:
Removed the shared -e parameter; polished the help texts.
Property mode set to 100644
File size: 813 bytes

package "tok"
version "0.1"
usage "tok [OPTIONS]"
purpose "tok transforms raw text into UTT format."

description "OPTIONS"

option "interactive" i "Interactive mode (no output buffering)." flag off

text "
DESCRIPTION

tok reads from standard input, identifies tokens on the basis of their orthographic form and writes a sequence of segments in UTT format to
the standard output.

OUTPUT FORMAT

UTT-file with four fields: START, LENGTH, TYPE, and FORM. In the TYPE field five types of tokens are distinguished:

W (word) - continuous sequence of letters
N (number) - continuous sequence of digits
S (space) - continuous sequence of space characters
P (punctuation) - single printable character other than W, N, S
B (unprintable character) - single unprintable character

USAGE EXAMPLE

tok
"
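
The five token classes named in the OUTPUT FORMAT help text can be illustrated with a minimal Python sketch. This is not tok's implementation: `classify` and `tokenize` are hypothetical names, Python's Unicode predicates stand in for whatever character classes tok really uses, offsets are counted in characters rather than bytes, and segments are returned as (START, LENGTH, TYPE, FORM) tuples because the exact UTT line layout is not specified in this file.

```python
def classify(ch):
    """Map one character to a token type code (W, N, S, P, or B).

    Assumption: Python's str predicates approximate tok's character classes.
    """
    if ch.isalpha():
        return "W"  # letter
    if ch.isdigit():
        return "N"  # digit
    if ch.isspace():
        return "S"  # whitespace (this sketch also treats tabs and newlines as space)
    if ch.isprintable():
        return "P"  # any other printable character
    return "B"      # unprintable character


def tokenize(text):
    """Split text into (start, length, type, form) segments.

    W, N, and S tokens extend over a run of same-class characters;
    P and B tokens are always exactly one character long.
    """
    segments = []
    i = 0
    while i < len(text):
        kind = classify(text[i])
        j = i + 1
        if kind in ("W", "N", "S"):
            while j < len(text) and classify(text[j]) == kind:
                j += 1
        segments.append((i, j - i, kind, text[i:j]))
        i = j
    return segments
```

Under these assumptions, `tokenize("Hi, 42")` yields the segments `(0, 2, "W", "Hi")`, `(2, 1, "P", ",")`, `(3, 1, "S", " ")`, and `(4, 2, "N", "42")`. How tok itself serializes each segment (field padding, escaping of spaces in FORM) is left open here.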