package "tok"
version "0.1"
usage "tok [OPTIONS]"
purpose "tok transforms raw text into UTT format."
description "OPTIONS"

option "interactive" i "Interactive mode (no output buffering)." flag off

text "
DESCRIPTION

tok reads from standard input, identifies tokens on the basis of their
orthographic form, and writes a sequence of segments in UTT format to
standard output.

OUTPUT FORMAT

UTT file with four fields: START, LENGTH, TYPE, and FORM.
The TYPE field distinguishes five types of tokens:

W (word) - continuous sequence of letters
N (number) - continuous sequence of digits
S (space) - continuous sequence of space characters
P (punctuation) - single printable character not belonging to W, N, or S
B (unprintable character) - single unprintable character

USAGE EXAMPLE

tok
"