A Toolbox for Field Linguists: a package for processing electronic corpora of endangered languages

2003 (c) CHIBA Shoju, All rights reserved.

This is a package of .NET Framework applications for creating and using 
Unicode and XML texts of endangered languages, or for other field-linguistic purposes.

This package contains the following applications:

1. Unicode-based IPA String Editor Version 1.0
2. SIL to Unicode IPA Font Converter Version 1.0
3. UniGrep: Unicode-compliant Grep Version 1.1
4. XML Converter for Tab-separated Text Version 1.0.1
5. Simple XML Data Viewer for Field Linguists Version 0.1 (Note: this version is still unstable!)

All applications run on the .NET Framework, so you first need to install the 
Microsoft .NET Framework. The installation package is available at the 
following URIs:

English Version: http://msdn.microsoft.com/netframework/downloads/howtoget.asp
Japanese Version: http://www.microsoft.com/japan/msdn/netframework/prodinfo/getdotnet.asp

After installing the .NET Framework, download the self-extracting ZIP archive 
toolbox.exe from:

http://www.fl.reitaku-u.ac.jp/~schiba/tools/

and extract its contents to a directory of your choice. The applications are then 
ready to use. They do not write to the Windows registry, so you can uninstall 
them safely by simply deleting their files from your PC.

For details of each application, please see the short introductions later in 
this document.

Finally, please note that these applications are "heavily" Unicode-oriented software. 
I have tested them on Windows 2000 and XP, but not on Windows 98, 98 SE, or ME, 
which handle Unicode in a very different way.

Enjoy!

First Created 2003/03/03; Last Updated 2003/03/08
CHIBA Shoju, College of Foreign Languages, Reitaku University
e-mail: schiba@reitaku-u.ac.jp

======
Unicode-based IPA String Editor Version 1.0

2003 (c) CHIBA Shoju, All rights reserved.

This is a .NET Framework application for composing IPA strings in
Unicode through an interface that resembles the IPA chart.

Support URI is http://www.fl.reitaku-u.ac.jp/~tools/, where you can 
find updated information on this application and download the newest 
binary and documents.

When you install this software, you will find this file and a more 
detailed document, IPAStringEditor.txt, which is currently written in 
Japanese.

======
SIL to Unicode IPA Font Converter Version 1.0

2003 (c) CHIBA Shoju, All rights reserved.

This is a .NET Framework application that converts RTF files using 
SIL IPA fonts to Unicode files (UTF-8, UTF-16, or UTF-16 big endian).

Support URI is http://www.fl.reitaku-u.ac.jp/~tools/, where you can 
find updated information on this application and download the newest 
binary and documents.

When you install this software, you will find this file and a more 
detailed document, SIL2Unicode.txt, which is written in Japanese.

======
UniGrep: Unicode-compliant Grep Version 1.1

2003 (c) CHIBA Shoju, All rights reserved.

This is a .NET Framework application that searches Unicode texts and
shows the matching lines either with or without line numbers. UniGrep
understands regular expressions.
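
As a rough illustration of this kind of search (not UniGrep's own code, 
which is a .NET application), the following Python sketch matches a 
regular expression against Unicode lines and reports hits with or 
without line numbers. The function name grep_lines, its parameters, and 
the "number:text" output format are assumptions made for this example.

```python
import re


def grep_lines(lines, pattern, with_line_numbers=True):
    """Return the lines that match a regular expression.

    Hypothetical helper: when with_line_numbers is true, each hit is
    prefixed with its 1-based line number and a colon.
    """
    regex = re.compile(pattern)
    results = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if regex.search(text):
            results.append(f"{number}:{text}" if with_line_numbers else text)
    return results


# Three IPA strings; the pattern selects those beginning with ʃ or ʧ.
words = ["ʃip", "ʧip", "sip"]
for hit in grep_lines(words, r"^[ʃʧ]"):
    print(hit)
```

The same function can be applied to a file by passing a file object 
opened with the appropriate encoding (for example "utf-8" or "utf-16") 
as the lines argument.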

Support URI is http://www.fl.reitaku-u.ac.jp/~tools/, where you can 
find updated information on this application and download the newest 
binary and documents.

When you install this software, you will find this file and a more 
detailed document, UniGrep.txt, which is currently written in Japanese.

======
XML Converter for Tab-separated Text Version 1.0.1

2003 (c) CHIBA Shoju, All rights reserved.

This is a .NET Framework application that converts tab-separated 
Unicode texts to XML data, thereby creating a structured text corpus. XML
Converter for Tab-separated Text also produces an HTML version of the data,
containing a detailed data table and a sentence list in which each word 
and sentence is linked to the relevant part of the data table (the HTML 
output is encoded in UTF-8).

XML Converter for Tab-separated Text accepts source text with the 
following structure:

 -The source text should consist of lines of word-level information 
     (word, lemma, phonetic transcription, and so on), with the fields 
     separated by tabs.
 -Sentence boundaries should be marked by a blank line, and paragraph 
     boundaries by two consecutive blank lines.
 -The first line of the text should contain the column headers, again
     separated by tabs.

The following simple sample shows what this structure looks like:

====Beginning of sample.txt====
word	lemm	gramm
This	this	DD1
is	be	VBZ
a	a	AT1
sample	sample	NN1
.	.	C_YSTP

This	this	DD1
is	be	VBZ
another	another	DD1
sample	sample	NN1
.	.	C_YSTP
====End of text====

XML Converter for Tab-separated Text also accepts the (CSV-like) Unicode 
Text files generated by Excel, in which a column may be enclosed in 
double quotation marks if it contains a comma (",").
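
To illustrate what handling such quoted columns involves, here is a 
minimal Python sketch using the standard csv module with a tab 
delimiter. It is not the converter's own code; the function name 
read_rows and the "gloss" column in the example are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Split tab-separated text into rows, stripping enclosing quotes.

    Hypothetical helper: csv.reader removes the double quotation marks
    placed around a field and returns an empty list for a blank line.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quotechar='"')
    return list(reader)


# Two fields contain a comma, so they arrive wrapped in double quotes.
excel_text = 'word\tlemm\tgloss\r\n"well,"\twell\t"yes, indeed"\r\n'
rows = read_rows(excel_text)
print(rows)
```

Because blank lines come back as empty lists, the sentence and 
paragraph boundaries of the source text remain detectable after 
parsing.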

XML Converter for Tab-separated Text converts the sample above to the 
following structured data (see sample.xml):

<?xml version="1.0" encoding="utf-8"?>
<body>
  <div>
    <p>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="a" gramm="AT1">a</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="another" gramm="DD1">another</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
    </p>
  </div>
</body>
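
The mapping from the tab-separated source to this XML can be sketched in 
Python as follows. This is only an approximation written for 
illustration, not the converter's actual implementation: the function 
name tsv_to_xml and its content_column parameter are hypothetical, and 
the sketch assumes that every row has as many fields as the header line 
and that the header names are valid XML attribute names.

```python
import xml.etree.ElementTree as ET


def tsv_to_xml(text, content_column=0):
    """Build a body/div/p/s/w tree from tab-separated text (a sketch)."""
    lines = text.splitlines()
    headers = lines[0].split("\t")      # first line: column headers
    body = ET.Element("body")
    div = ET.SubElement(body, "div")
    paragraph = None
    sentence = None
    blank_run = 0                       # consecutive blank lines seen
    for line in lines[1:]:
        if not line.strip():
            blank_run += 1
            sentence = None             # one blank line ends the sentence
            if blank_run >= 2:
                paragraph = None        # two blank lines end the paragraph
            continue
        blank_run = 0
        if paragraph is None:
            paragraph = ET.SubElement(div, "p")
        if sentence is None:
            sentence = ET.SubElement(paragraph, "s")
        fields = line.split("\t")
        word = ET.SubElement(sentence, "w")
        word.text = fields[content_column]
        # Every other column becomes an attribute named after its header.
        for index, (name, value) in enumerate(zip(headers, fields)):
            if index != content_column:
                word.set(name, value)
    return body


SAMPLE = (
    "word\tlemm\tgramm\n"
    "This\tthis\tDD1\n"
    "is\tbe\tVBZ\n"
    "a\ta\tAT1\n"
    "sample\tsample\tNN1\n"
    ".\t.\tC_YSTP\n"
    "\n"
    "This\tthis\tDD1\n"
    "is\tbe\tVBZ\n"
    "another\tanother\tDD1\n"
    "sample\tsample\tNN1\n"
    ".\t.\tC_YSTP\n"
)

print(ET.tostring(tsv_to_xml(SAMPLE), encoding="unicode"))
```

Calling tsv_to_xml(SAMPLE, content_column=1) would instead place the 
lemma in the element content and the word form in a "word" attribute.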

You can choose the Unicode format (UTF-8, UTF-16, or UTF-16 big 
endian) in which the conversion result is saved, and which column 
supplies the content of the w element.

Note that XML Converter for Tab-separated Text is not a general-purpose 
application: you cannot change the element names shown above (body,
div, p, s, w).

The support URI of XML Converter for Tab-separated Text is:
    http://www.fl.reitaku-u.ac.jp/~tools/
where you can find updated information on this application and download 
the newest binary and documents.

When you install this software, you will find this file and a more 
detailed document, XMLConverter.txt, which is currently written in 
Japanese.

======
Simple XML Data Viewer for Field Linguists Version 0.1

2003 (c) CHIBA Shoju, All rights reserved.

IMPORTANT NOTICE: This is an unstable beta version (Version 0.1), so 
there may be unexpected bugs, and the data obtained through this software 
may contain incorrect results.

Simple XML Data Viewer for Field Linguists (henceforth XML Data Viewer)
is a .NET Framework application for viewing Unicode-based XML text 
corpora and searching for words using the structural information encoded 
in the XML. XML Data Viewer facilitates field linguists' data analysis: 
it lets them search for patterns by the attribute information (phonetic, 
grammatical, lemma, etc.) attached to each word, and view search 
results quickly by hiding miscellaneous information and showing the
whole sentence that contains the matched pattern.
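
The kind of search described here can be sketched in a few lines of 
Python. This is an illustration only, not XML Data Viewer's own code: 
the function name find_sentences is hypothetical, and the sketch assumes 
a corpus in the body/div/p/s/w layout with word information stored in 
attributes such as lemm and gramm.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<body>
  <div>
    <p>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="a" gramm="AT1">a</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="another" gramm="DD1">another</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
    </p>
  </div>
</body>"""


def find_sentences(root, attribute, value):
    """Return each sentence containing a word whose attribute equals value.

    Hypothetical helper: a sentence is rendered as its word forms joined
    by spaces, with the attribute information left out.
    """
    hits = []
    for sentence in root.iter("s"):
        words = sentence.findall("w")
        if any(word.get(attribute) == value for word in words):
            hits.append(" ".join(word.text or "" for word in words))
    return hits


root = ET.fromstring(SAMPLE_XML)
print(find_sentences(root, "lemm", "another"))
```

Searching on a different attribute, for example 
find_sentences(root, "gramm", "VBZ"), returns every sentence that 
contains a word carrying that grammatical tag.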

Support URI is http://www.fl.reitaku-u.ac.jp/~tools/, where you can 
find updated information on this application and download the newest 
binary and documents.

When you install this software, you will find this file and a more 
detailed document, XMLViewer01.txt, which is currently written in Japanese.
