- Timestamp: 12/16/08 23:40:47
- Branches: master, help
- Children: 2969c84
- Parents: 9ace5d2
- git-author: walas <walas@…> (12/16/08 23:40:47)
- git-committer: walas <walas@…> (12/16/08 23:40:47)
- Location: app
- Files: 2 edited
- app/doc/utt.texinfo (r9ace5d2 to r2d89d4b)

    @@ -1750,12 +1750,74 @@
     @end multitable
     
    -[TODO]
    -
    -(see mar's help 'mar -h' for some information)
    +@subsection Description
    +@code{mar} is a perl script, which matches given pattern on the utt-formated text
    +and tags matching parts with any number of user-defined tags.
    +
    +@subsection Command line options
    +@table @code
    +@parhelp
    +@parversion
    +
    +@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
    +The search pattern.
    +@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]}
    +Perform only indicated actions. Where:
    +@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
    +@item @code{p} @tab preprocess
    +@item @code{s} @tab search
    +@item @code{P} @tab postprocess
    +@end multitable
    +default: psP
    +
    +@item @b{@minus{}@minus{}command}
    +print generated sed command, then exit
    +
    +@item @b{@minus{}@minus{}help, @minus{}h}
    +print help, then exit
    +
    +@item @b{@minus{}@minus{}version, @minus{}v}
    +print version, then exit
    +@end table
    +@subsection Tokens in pattern
    +@code{mar} pattern is based on @code{ser} patterns(see @pxref{ser pattern}). @code{mar} pattern is a @code{ser} pattern,
    +in which you can add any number of matching tags, which will be printed in exacly the place, where
    +they were placed in the pattern. A valid token starts with @@ which follows any number of alphanumeric
    +characters. For example valid match tokens are: @@STARTMATCH @@ENDMATCH
    +
    +Matching tokens can be placed between, before or after any of @code{ser} pattern terms. They don't have
    +to be paritied. There can be any number of them in the pattern (zero or more). They don't have to be unique.
    +They can be placed one after another. For example:
    +
    +@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa}
    +@item @code{@@BOM lexeme(pomoc)} @tab place tag @b{BOM} before any form of the lexeme 'pomoc'
    +@item @code{@@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc'
    +@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' which is followef by adjective
    +@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM} @tab place tags @b{TAG} and @b{BOM} before any form of the lexeme 'pomoc' which is followed by adjective and tag @b{EOM} after it
    +@end multitable
    +
    +(see mar's help 'mar -h' for some more information)
    +
    +@subsection How mar works
    +@code{mar} translates given @code{ser} pattern with @code{m4} macroprocessor to regular expression. Then it changes it into @code{sed} command script, which is then executed.
    +
    +You can see translated sed script by using the @code{@minus{}@minus{}command} option.
    +@subsection Limitations
    +The complexity of computations performed by @code{mar} increases linearly with the number of placed tokens. So it is highly recommended not to place too much tokens.
    +@subsection Requirements
    +In order to run @code{mar}, the following programs must be installed in the system:
    +
    +@itemize
    +
    +@item @command{m4}
    +@item @command{grep}
    +@item @command{sed}
    +
    +@end itemize
    +
    +
     
     @c ---------------------------------------------------------------------
     @c KOT
     @c ---------------------------------------------------------------------
    -
     
     @page
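The "Tokens in pattern" rules above can be illustrated with a small Python sketch. Those rules say a token is @ followed by alphanumeric characters. Tokens may sit before, between or after ser terms. They need not be paired or unique, and each is printed where it stands in the pattern.

This sketch is not taken from mar, which is a Perl script built on m4 and sed. It assumes that ser terms are separated by whitespace and that a token has at least one alphanumeric character. The names `split_pattern` and `place_tags` are hypothetical. Each tag is recorded with the number of terms that precede it, and the tags are then interleaved with the text matched by each term.

```python
import re

# A match token: '@' followed by alphanumeric characters (assumed: at least one).
TOKEN_RE = re.compile(r"@[A-Za-z0-9]+")


def split_pattern(pattern: str) -> tuple[list[str], list[tuple[int, str]]]:
    """Separate a mar-style pattern into ser terms and match tokens.

    Assumption (not from the mar source): ser terms are whitespace-separated.
    Each token is stored with the number of terms before it, i.e. the slot
    where it has to be printed.
    """
    terms: list[str] = []
    tokens: list[tuple[int, str]] = []
    for word in pattern.split():
        if TOKEN_RE.fullmatch(word):
            tokens.append((len(terms), word))
        else:
            terms.append(word)
    return terms, tokens


def place_tags(segments: list[str], tokens: list[tuple[int, str]]) -> list[str]:
    """Interleave tokens with the text matched by each ser term.

    segments[i] is the text matched by the i-th term. Tokens in the same
    slot keep their pattern order; repeated tokens are allowed.
    """
    out: list[str] = []
    for slot in range(len(segments) + 1):
        out.extend(tok for pos, tok in tokens if pos == slot)
        if slot < len(segments):
            out.append(segments[slot])
    return out
```

For the pattern `cat(<ADJ>) @TAG @BOM lexeme(pomoc) @EOM` this yields the terms `cat(<ADJ>)` and `lexeme(pomoc)`. `@TAG` and `@BOM` go in slot 1 and `@EOM` in slot 2, so a match on "dobra pomoc" comes out as `dobra @TAG @BOM pomoc @EOM`.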
- app/src/mar/mar (radb4c8d to r2d89d4b)

    @@ -10,4 +10,6 @@
     #which is one of the parametres of the script
     #contact: d287572@atos.wmid.amu.edu.pl, walasiek@gmail.com
    +
    +my $version = '1.0';
     
     use lib "/usr/local/lib/utt";
    @@ -37,4 +39,5 @@
     my $morfield='lem';
     my $tags=0;
    +my $show_version = 0;
     
     #read configuration files###########################
    @@ -90,9 +93,14 @@
     "action=s" => \$action,
     "help|h" => \$help,
    -"space|s" => \$explicit_space
    +"space|s" => \$explicit_space,
    +"version|v" => \$show_version,
     );
     
     
     
    +if($show_version){
    +print "Version: $version\n";
    +exit 0;
    +}
     
     if($help)
    @@ -103,5 +111,5 @@
     Options:
     --pattern -e PATTERN Pattern.
    ---bos -E PATTERN Segment serving as sentence beginning marker. [TODO]
    +--eos -E PATTERN Segment serving as sentence beginning marker. [TODO]
     --macros=FILE Read macrodefinitions from FILE. [TODO]
     --define=FILE Add macrodefinitions from FILE. [TODO]
    @@ -110,7 +118,8 @@
     s - search
     P - postprocess
    -(default pgP)
    +(default psP)
     --command Print generated shell command and exit.
     --help -h Print help.
    +--version -v Script version
     
     In patern you can put any tag. Tags should begin with the @ character.
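The documentation says that mar translates the ser pattern into a regular expression with m4 and then into a sed script. The help text adds that tags begin with @. The idea behind that last step can be approximated with Python's `re` module. Each term, assumed to be already translated into a regular expression, becomes a capture group, and each tag is spliced into the replacement between the back-references.

This is an analogy only. The function name `build_substitution`, the `[TAG]` output form and the example term regexes are hypothetical. Python replacement syntax (`\g<1>`) stands in for sed's (`\1`), and the m4 translation step is not reproduced.

```python
import re


def build_substitution(items: list[str]) -> tuple[str, str]:
    """Turn a sequence of term regexes and @tags into (regex, replacement).

    Items starting with '@' are tags; every other item is a regex for one
    term and becomes one capture group. Terms are concatenated directly, so
    any whitespace between them must be part of a term regex. Term regexes
    must not contain capturing groups of their own ((?:...) is fine), or
    the group numbering below would be off.
    """
    regex_parts: list[str] = []
    repl_parts: list[str] = []
    for item in items:
        if item.startswith("@"):
            # Hypothetical output form for a tag: its name in square brackets.
            repl_parts.append("[" + item[1:] + "]")
        else:
            regex_parts.append("(" + item + ")")
            # Re-emit the text matched by this term via its group number.
            repl_parts.append("\\g<%d>" % len(regex_parts))
    return "".join(regex_parts), "".join(repl_parts)
```

For instance, `build_substitution([r"dobr\w* ", "@BOM", r"pomoc\w*", "@EOM"])` returns the regex `(dobr\w* )(pomoc\w*)` and the replacement `\g<1>[BOM]\g<2>[EOM]`. Passing that pair to `re.sub` turns "bardzo dobra pomoc!" into "bardzo dobra [BOM]pomoc[EOM]!". Using `\g<n>` rather than `\n` keeps group references unambiguous once there are ten or more terms.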