XML Converter for Tab-separated Text Version 1.0.1

2003 (c) CHIBA Shoju, All rights reserved.

This is a .NET Framework application that converts tab-separated 
Unicode text to XML data, thereby creating a structured text corpus. XML
Converter for Tab-separated Text also produces an HTML version of the data,
which contains a detailed data table and a sentence list in which 
each word and sentence is linked to the relevant part of the data table
(HTML output is encoded in UTF-8).

The source text that XML Converter for Tab-separated Text accepts 
must have the following structure:

 -The source text should consist of lines of word-level information 
     (word, lemma, phonetic transcription, and so on), with the fields 
     separated by tabs.
 -Sentence boundaries should be marked by a blank line, and paragraph 
     boundaries by two consecutive blank lines.
 -The first line of the text should contain the column headers, again
     separated by tabs.

The following simple sample shows what this structure looks like:

====Beginning of sample.txt====
word	lemm	gramm
This	this	DD1
is	be	VBZ
a	a	AT1
sample	sample	NN1
.	.	C_YSTP

This	this	DD1
is	be	VBZ
another	another	DD1
sample	sample	NN1
.	.	C_YSTP
====End of text====
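
The structure rules above can be sketched in a few lines of Python. This
is only an illustration of the input format, not the application's own
code: the function name parse_corpus is hypothetical, and treating a run
of more than two blank lines as a single paragraph boundary is an
assumption.

```python
def parse_corpus(text):
    """Split tab-separated corpus text into paragraphs > sentences > words."""
    lines = text.splitlines()
    headers = lines[0].split("\t")        # first line holds the column headers
    paragraphs, sentences, words = [], [], []
    blank_run = 0                         # number of consecutive blank lines seen
    for line in lines[1:]:
        if line.strip() == "":
            blank_run += 1
            if words:                     # one blank line closes the sentence
                sentences.append(words)
                words = []
            if blank_run >= 2 and sentences:  # two blank lines close the paragraph
                paragraphs.append(sentences)
                sentences = []
            continue
        blank_run = 0
        # Each word becomes a dict keyed by the column headers.
        words.append(dict(zip(headers, line.split("\t"))))
    # Flush whatever is still open at the end of the text.
    if words:
        sentences.append(words)
    if sentences:
        paragraphs.append(sentences)
    return headers, paragraphs
```

Applied to sample.txt, this sketch yields one paragraph containing two
sentences of five words each.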

XML Converter for Tab-separated Text also accepts the (CSV-like) Unicode 
Text format generated by Excel, in which a field may be enclosed in 
double quotation marks if it contains a comma (",").
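
If you want to inspect such an Excel export yourself, Python's standard
csv module can read tab-separated text and strip the surrounding
quotation marks. The sketch below is a reading aid only: the function
name read_rows is hypothetical, and it makes no claim about how the
application itself handles quoting.

```python
import csv
import io

def read_rows(text):
    """Return the rows of tab-separated text, removing Excel-style quotes."""
    # delimiter="\t" selects tab-separated input; the default quotechar ('"')
    # removes the quotation marks around fields such as "Hello, world".
    # Blank lines come back as empty lists, so sentence and paragraph
    # boundaries remain visible in the result.
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

rows = read_rows('word\tlemm\n"Hello, world"\thello\n\nBye\tbye\n')
for row in rows:
    print(row)
```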

XML Converter for Tab-separated Text converts the sample above to the 
following structured data (see sample.xml):

<?xml version="1.0" encoding="utf-8"?>
<body>
  <div>
    <p>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="a" gramm="AT1">a</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
      <s>
        <w lemm="this" gramm="DD1">This</w>
        <w lemm="be" gramm="VBZ">is</w>
        <w lemm="another" gramm="DD1">another</w>
        <w lemm="sample" gramm="NN1">sample</w>
        <w lemm="." gramm="C_YSTP">.</w>
      </s>
    </p>
  </div>
</body>
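
The mapping shown above (one column becomes the content of w, the other
columns become attributes named after their headers) can be sketched with
Python's xml.etree.ElementTree. The function name build_xml and its
content_column parameter are hypothetical, the single div around all
paragraphs is an assumption taken from the sample output, and the
application's actual implementation may differ.

```python
import xml.etree.ElementTree as ET

def build_xml(paragraphs, content_column="word"):
    """Build the body/div/p/s/w tree from nested lists of word dicts."""
    body = ET.Element("body")
    div = ET.SubElement(body, "div")      # assumed: one div for the whole text
    for sentences in paragraphs:
        p = ET.SubElement(div, "p")
        for words in sentences:
            s = ET.SubElement(p, "s")
            for row in words:
                # Every column except the content column becomes an attribute.
                attrs = {k: v for k, v in row.items() if k != content_column}
                w = ET.SubElement(s, "w", attrs)
                w.text = row[content_column]
    if hasattr(ET, "indent"):             # pretty-printing needs Python 3.9+
        ET.indent(body, space="  ")
    return ET.tostring(body, encoding="unicode")

# One paragraph, one sentence, two words.
sample = [[[{"word": "This", "lemm": "this", "gramm": "DD1"},
            {"word": "is", "lemm": "be", "gramm": "VBZ"}]]]
print(build_xml(sample))
```

Passing content_column="lemm" instead would put the lemma inside each w
element and move the word form into a word attribute.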

You can select the Unicode format (UTF-8, UTF-16, or UTF-16 big 
endian) in which the conversion result is saved, and which column 
supplies the content of the w element.
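
For readers unfamiliar with the three formats, the sketch below shows how
the same XML text differs at the byte level. It rests on assumptions:
that "UTF-16" means little endian, that both UTF-16 variants begin with a
byte order mark, and that UTF-8 output has none. The names ENCODINGS and
encode_output are hypothetical and do not describe the application's
internals.

```python
import codecs

# Menu label -> (Python codec, byte order mark, name used in the XML declaration)
ENCODINGS = {
    "UTF-8": ("utf-8", b"", "utf-8"),
    "UTF-16": ("utf-16-le", codecs.BOM_UTF16_LE, "utf-16"),
    "UTF-16 big endian": ("utf-16-be", codecs.BOM_UTF16_BE, "utf-16"),
}

def encode_output(xml_body, choice):
    """Prefix an XML declaration and encode the text in the chosen format."""
    codec, bom, declared = ENCODINGS[choice]
    text = '<?xml version="1.0" encoding="%s"?>\n%s' % (declared, xml_body)
    return bom + text.encode(codec)

for label in ENCODINGS:
    data = encode_output("<body />", label)
    print(label, len(data), data[:4])
```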

Note that XML Converter for Tab-separated Text is not a general-purpose 
application: you cannot change the element names shown above (body,
div, p, s, w).

The support URI for XML Converter for Tab-separated Text is:
    http://www.fl.reitaku-u.ac.jp/~tools/
where you will find updated information on this application and can 
download the newest binary and documents.

When you install this software, you will find this file and a more 
detailed document, XMLConverter.txt, which is currently written in 
Japanese.

Enjoy!

====
First Created 2003/03/03; Last Updated 2003/03/08
CHIBA Shoju, College of Foreign Languages, Reitaku University
e-mail: schiba@reitaku-u.ac.jp
