Computers & Writing Systems
SILConverters: Obsolete version
(updated: 2005-04-21)

Obsolete version

Please note that this product has been replaced by SILConverters 4.0, and you are strongly encouraged to use that product. This page is retained for those who, for whatever reason, are unable to use the new version and require the older, unsupported version.

This package provides a system-wide repository for encoding converters and transliterators (TECkit, CC, or ICU based) and a simple COM interface to select and use a converter from the repository. It is easy to use from VBA or C++. An included VBA macro provides a simple interface to manage and use the repository, making it easy to convert any file (e.g. SFM texts, lexicons, and even Word documents) to a different encoding based on one or more TECkit maps and/or CC tables. The macro interface also lets you install user-developed converters into the repository and remove them from it.

Installation and use

Just extract the install set zip file to your hard disk (with the “Use folder names” switch selected) and run the Setup.exe program to install it. The package includes the TECkit, CC, and ICU runtime files, so it does not require a separate installation of these packages.

The user interface is relatively simple to master:

Data Conversion Macro main window

Notice three distinct areas:
Further information is in the zip file available below, together with the utility itself and a set of data files that it requires.

Download
Frequently Asked Questions

Note: The Bulk Word document converter in the newest versions of SILConverters 4.0 resolves most of these issues. There is also an Office converter in the newest versions of SILConverters 4.0 which resolves other issues.

Question: When I try to convert my file using the SILConverters 4.0 package, some of the characters are not being converted. If I select the text that was not converted correctly and run the Unicode Word Macro “Show Unicode”, I see that Word thinks every one of these is U+0028 LEFT PARENTHESIS. What is happening?

Answer: Our SIL legacy fonts have traditionally been encoded as symbol fonts. People who want to convert their data to Unicode using the Microsoft Word/COM support for TECkit, CC, and ICU package have run into problems when they entered their data using . The characters which were inserted in this manner do not convert correctly. They are converted to U+0028 LEFT PARENTHESIS.

Note: Characters that were inserted into the Word document using are encoded in a way that prevents the macro from identifying the character.

The way to fix this is to follow these steps:
Note: The exact sequence of commands may vary depending on your version of Word.

Bug: The WordPad solution above does not work as well as the OpenOffice solution. WordPad does not recognize autonumbering or footnotes, and these will be lost in WordPad.

Question: I have a lot of autonumbering in my document, do not have OpenOffice installed, and do not want to use the above WordPad solution. Is there some other way to convert my legacy data to Unicode?

Answer: You could still run the Microsoft Word/COM support for TECkit, CC, and ICU package macros on your data. First, though, you will want to convert all the characters you inserted into the text with to the proper legacy codepoint. Let's say that the schwa is not converting properly because it was inserted with . To fix this, copy that character into the field (you cannot just type it in), and in the field either type in the correct Symbol encoding or go ahead and replace it with the proper Unicode character. In this case it would be U+0259 LATIN SMALL LETTER SCHWA. This character will probably not show up properly until you format it with Doulos SIL. Finally, you should run the Microsoft Word/COM support for TECkit, CC, and ICU package macro on your data.

Note: This solution will not work if you are running Win98 (or versions of Word earlier than Word XP), because you cannot successfully search for any IPA93 characters, unless you feel up to some number crunching. Here is that solution: Take the decimal value of the character, add it to 61440 (= F000 hex), then use Alt-key typing (hold down Alt key,

Question: I noticed in conversion that the IPA in the footnotes did not convert. And when I tried to do the conversion piece by piece, funnier things happened. I selected one SILIPA93 character in a footnote, tried the Data Conversion macro, and the font fields came up blank. Any ideas what else I can do?
Answer: The most likely solution is to make sure that you check the “Include footnotes” checkbox under “Scope of Change” in the dialog box. Otherwise, it may be that you have inserted them as endnotes. If you use the Word command to convert endnotes to footnotes, the macro will convert SILDoulos IPA93 text to Unicode. (But it won't convert the characters you inserted using .) Then you can run the Word command to convert the footnotes back to endnotes. This Word command is then click . This will convert your endnotes to footnotes if you have endnotes, and footnotes to endnotes if you have footnotes.

Question: How do I use SILConverters to convert a Publisher document to Unicode?

Answer: The easiest way is to open your legacy Publisher document. Go to the first text box containing data you wish to convert. Click . Wait a minute for it to copy your data and open Word. Do your Data Conversion as you would normally do. Then click the small x to close the document. It will then take the converted data back to your Publisher document and display it there. Repeat for each text box.

Question: I have Standard Format Marker (sfm) text files and want to convert only one or two of the sf markers from an SIL IPA93 encoding to Unicode. Can I do that?

Answer: SFconv is a command-line utility which can convert sfm files to and from Unicode. Different sfms can be converted with different mapping files.
Open this file in a text editor. Because your data will have different sfms and inline markers, you will probably need to adapt it. To understand the file, look at this part of the code:

    <sfMarkers escape="\" chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ" mapping="ISO-8859-1">
      <marker name="ph" mapping="silipa93"/>
      <marker name="pm" mapping="silipa93"/>
    </sfMarkers>

The first line tells TECkit what characters form sfms and what mapping to use to convert the text of the markers to Unicode (i.e. I don’t want to change “\p” to an IPA93 character). Then there is an exception list. So <marker name="ph" mapping="silipa93"/> is saying that for the sfm ph the silipa93 mapping file should be used. (It should also be used for pm.)

You need to add a line for each of the sfms in your data that use the IPA93 encoding. If you have other sf markers that need a different mapping file, you can add another line, following this example, and use the appropriate sf marker (without the backslash) and mapping file name.

Note: SFconv converts all the data in the file at one sweep, not just the IPA fields that you define. The default mapping tells it what to do with the rest of the file.

Next we will look at inline markers. Take a look at the following lines in the xml file.

    <inlineMarkers escape="|" start="{" end="}" chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ" mapping="ISO-8859-1">
      <marker name="en" mapping="ISO-8859-1"/>
      <marker name="ip" mapping="silipa93"/>
    </inlineMarkers>

The first line tells TECkit what characters form inline markers and what mapping to use to convert the text of the markers to Unicode (i.e. I don’t want to change “|ip” to an IPA93 character). (You can change the “|” to another character, but do not try to use < or >.) Then there is an exception list. So <marker name="ip" mapping="silipa93"/> is saying that for text after the inline marker |ip the silipa93 mapping file should be used.
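The two exception lists described above amount to a table from marker name to mapping file. As a rough illustration only (this is not how SFconv itself reads the file), the Python sketch below parses the two fragments and prints that table, which can help when checking an adapted configuration before running a conversion. The function name `marker_mappings` is hypothetical, and the backslash in the `escape` attribute of `sfMarkers` is an assumption. Markers that are not listed are handled by the default mapping, which is not part of these fragments.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the configuration discussed above.
# Assumption: the sfMarkers escape character is a backslash.
SF_MARKERS = r'''
<sfMarkers escape="\" chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ" mapping="ISO-8859-1">
  <marker name="ph" mapping="silipa93"/>
  <marker name="pm" mapping="silipa93"/>
</sfMarkers>
'''

INLINE_MARKERS = r'''
<inlineMarkers escape="|" start="{" end="}" chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ" mapping="ISO-8859-1">
  <marker name="en" mapping="ISO-8859-1"/>
  <marker name="ip" mapping="silipa93"/>
</inlineMarkers>
'''


def marker_mappings(xml_text):
    """Return {marker name: mapping name} for the <marker> exception list."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): m.get("mapping") for m in root.findall("marker")}


if __name__ == "__main__":
    for label, fragment in (("sf", SF_MARKERS), ("inline", INLINE_MARKERS)):
        for name, mapping in marker_mappings(fragment).items():
            print(f"{label} marker {name!r} uses mapping {mapping!r}")
```

Adding a new `<marker>` line to either fragment adds one more entry to the corresponding table.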
You need to add a line for each of the inline markers in your data that use the IPA93 encoding. If you have other inline markers that need a different mapping file, you can add another line, following this example, and use the appropriate inline marker (without the “|”) and mapping file name.

Finally, to do the actual conversion from 8-bit to Unicode (UTF8), we need to go to the command prompt and type this (you will need to add the right path, depending on where you have put sfconv.exe):

    "Program Files\TECkit\sfconv" -8u -utf8 -c IPA93-map.xml -i filename.sfm -o filename-utf8.txt

filename-utf8.txt can be viewed in a Unicode text editor or in Word. Notice also that the actual markers should all be as expected, and the content of the ph, pm and |ip markers should be in Unicode IPA.

If you want to test whether the conversion was accurate, you can convert the Unicode (UTF8) file back to 8-bit text:

    "Program Files\TECkit\sfconv" -u8 -utf8 -c IPA93-map.xml -i filename-utf8.txt -o test.sf

If you do this, test.sf should be identical to filename.sfm.

Note: Further information for SFconv can be found in the TECkit documentation (..\TECkit\documentation\TECkit version 2.doc.pdf).
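The round-trip test ends with checking that test.sf is identical to filename.sfm. One way to do that check is sketched below in Python, assuming both files are available locally. The helper names `files_identical` and `first_difference` are hypothetical and not part of TECkit or SFconv.

```python
import filecmp
from pathlib import Path


def files_identical(original, round_tripped):
    """Return True when the two files contain exactly the same bytes."""
    # shallow=False forces a comparison of file contents rather than
    # relying on size and modification time alone.
    return filecmp.cmp(original, round_tripped, shallow=False)


def first_difference(original, round_tripped):
    """Return the byte offset of the first mismatch, or None if identical."""
    a = Path(original).read_bytes()
    b = Path(round_tripped).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where the
        # shorter one ends.
        return min(len(a), len(b))
    return None
```

Calling `files_identical("filename.sfm", "test.sf")` gives a yes/no answer. If the answer is no, `first_difference` reports where the files start to diverge, which is a reasonable place to begin looking for a marker that was assigned the wrong mapping.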
© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.