Short URL: https://scripts.sil.org/RTF2SFM
RTF2SFM — converting styled Word documents to SFM
Bob Hallissy, 2009-10-16
RTF2SFM converts a styled Word .RTF file to UTF-8-encoded SFM (Standard Format Markers). Unlike the old SF Converter package, RTF2SFM correctly handles Unicode characters.
RTF2SFM is part of the SIL::RTF Perl module that provides an event-driven parser for examining or processing RTF files. The program is supplied either as a Perl module (requiring Perl 5.8 or later) or as a standalone Windows .EXE file.
New version: 1.10 (2009-10-16)
See the change history for details.
Synopsis
RTF2SFM [-s] [-q] [-c ControlFile] [-o outFile] [-a annotationFile] inFile
RTF2SFM -p [-o outFile]
RTF2SFM [-h] [-v]
Converts a styled Word .RTF file to UTF-8-encoded SFM.
Options
- -c names a control (configuration) file
- -o names an output file (otherwise output is written to STDOUT)
- -a names an output file to hold annotations (comments, revision tracking)
- -s suppresses the extra processing needed to convert Insert Symbol fields
- -p outputs the built-in control file to outFile or STDOUT
- -q quiet mode (no percent-complete or 'done' message)
- -h outputs an extended help message
- -v outputs version information
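As a rough illustration of how these options combine, the following Python sketch assembles an argument list for the first synopsis form. The helper name build_rtf2sfm_command and its parameter names are hypothetical, and the sketch assumes an executable named RTF2SFM is reachable via PATH; only the flags themselves come from the list above.

```python
from typing import List, Optional


def build_rtf2sfm_command(
    in_file: str,
    out_file: Optional[str] = None,
    control_file: Optional[str] = None,
    annotation_file: Optional[str] = None,
    suppress_symbols: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Assemble: RTF2SFM [-s] [-q] [-c ControlFile] [-o outFile] [-a annotationFile] inFile."""
    cmd = ["RTF2SFM"]  # assumption: the program is on PATH under this name
    if suppress_symbols:
        cmd.append("-s")  # skip the Insert Symbol processing
    if quiet:
        cmd.append("-q")  # no progress or 'done' message
    if control_file:
        cmd += ["-c", control_file]
    if out_file:
        cmd += ["-o", out_file]
    if annotation_file:
        cmd += ["-a", annotation_file]
    cmd.append(in_file)  # the input file always comes last
    return cmd
```

The resulting list could be handed to subprocess.run(cmd, check=True). File names such as genesis.rtf used with this helper would be purely illustrative.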
If -c is not supplied, RTF2SFM looks for an RTF2SFM.INI file in the current directory. If that file isn't found, it uses a standard set of options (based on SFConverter). Note: .INI files are assumed to be UTF-8! The control file's syntax resembles that of a Windows .INI file. Use the -p option or see Downloads for working examples.
Any residue (e.g., text in a style for which there is no sf tag defined, or unknown RTF destinations) is written to a residue file (named after the output file if -o supplied, else named residue.res).
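The control-file lookup order described in this section can be sketched in Python as follows. This is only an approximation of the documented behaviour, not the program's actual code; the function names resolve_control_file and read_control_text are hypothetical.

```python
import os
from typing import Optional


def resolve_control_file(cli_control: Optional[str], cwd: str = ".") -> Optional[str]:
    """Return the control file to use, or None to mean the built-in defaults."""
    if cli_control:  # -c was supplied on the command line
        return cli_control
    candidate = os.path.join(cwd, "RTF2SFM.INI")
    if os.path.isfile(candidate):  # otherwise try RTF2SFM.INI in the current directory
        return candidate
    return None  # otherwise the standard options (based on SFConverter)


def read_control_text(path: str) -> str:
    """Read a control file as UTF-8, since .INI files are assumed to be UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Returning None for the built-in case keeps the three outcomes distinct, so a caller can tell whether a file on disk or the defaults ended up being used.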
Usage notes
Residue file
RTF2SFM always generates a residue file that contains information about things in the RTF that it didn't understand. Always review the residue file. Although some of the messages may not make much sense, most messages end with a [chapter:verse] reference (assuming such are present in the input) which will point you to the offending area of your RTF file.
There are two common kinds of residue: unidentified styles and unhandled destinations.
Unidentified styles
Messages such as:
Paragraph style 'Subtitle' has no associated SFM tag [31:30 ]
are likely to be important. The message means that the document contains (near chapter 31, verse 30) text in a style ('Subtitle') that RTF2SFM doesn't know about. If your document is correct, then you need to extend the control file to tell RTF2SFM what should happen with text in this style.
Unhandled destinations
Another message you may see in the residue looks like:
UNHANDLED dest: '*xyzzy', '', '4' [10: 1]
end dest: '*xyzzy', '4' [10: 1]
RTF is, even after all these years, an evolving standard. New features of programs like Microsoft Word often mean new features in the RTF. Sometimes these come as new "destinations" that RTF2SFM doesn't know about, which triggers this type of message.
Most likely these errors can be ignored. But check the verses indicated to see if any important text is missing from the SFM file.
Also, I would appreciate receiving copies of RTF files (and the control file you are using, if any) that generate such messages. I will attempt to eliminate the offending destinations in future releases.
If you can't figure out the residue, you can contact me.
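When a residue file is long, it can help to group the two kinds of messages by style or destination name together with their [chapter:verse] references. The Python sketch below assumes the message layouts are exactly those of the samples shown above; real residue files may contain other layouts, which this sketch silently skips. The function name summarize_residue is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Ref = Tuple[int, int]  # (chapter, verse)

# Trailing reference such as "[31:30 ]" or "[10: 1]" (spacing varies).
REF_RE = re.compile(r"\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*$")
# "Paragraph style 'Subtitle' has no associated SFM tag ..."
STYLE_RE = re.compile(r"style '([^']*)' has no associated SFM tag")
# "UNHANDLED dest: '*xyzzy', ..." (the matching "end dest:" line is ignored)
DEST_RE = re.compile(r"^UNHANDLED dest: '([^']*)'")


def summarize_residue(lines: Iterable[str]) -> Dict[str, Dict[str, List[Ref]]]:
    """Group residue messages by style or destination name."""
    summary: Dict[str, Dict[str, List[Ref]]] = {
        "styles": defaultdict(list),
        "destinations": defaultdict(list),
    }
    for raw in lines:
        line = raw.strip()
        ref_match = REF_RE.search(line)
        if not ref_match:
            continue  # no [chapter:verse] reference on this line
        ref = (int(ref_match.group(1)), int(ref_match.group(2)))
        style = STYLE_RE.search(line)
        dest = DEST_RE.search(line)
        if style:
            summary["styles"][style.group(1)].append(ref)
        elif dest:
            summary["destinations"][dest.group(1)].append(ref)
    return summary
```

To use it, open the residue file with encoding="utf-8" and pass the file object to summarize_residue; the "styles" entry then lists each style that needs a control-file entry and where it occurs.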
Processing of fields
If the font used for an inserted symbol has a known mapping to Unicode, then RTF2SFM will convert the symbol according to that mapping. Then, if the resultant Unicode character is one that has the mirrored property and the immediately preceding character was in a right-to-left run, RTF2SFM will surround the mirrored character with U+202D LEFT-TO-RIGHT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING so it will display correctly. Currently the only font with a known mapping is the Symbol font.
If the font is unknown, then the symbol is mapped to the PUA area, specifically to the range U+F000 to U+F0FF.
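The two rules above can be approximated in Python. This sketch is not the program's implementation. It assumes that an unknown-font symbol is mapped by adding its 8-bit character code to U+F000 (the text above states only the target range), and it relies on Python's unicodedata module for the mirrored property. The function names are hypothetical.

```python
import unicodedata

LRO = "\u202d"  # U+202D LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def pua_for_unknown_symbol(code: int) -> str:
    """Map an 8-bit symbol code into the PUA range U+F000..U+F0FF.

    Assumption: the mapping is a simple offset from U+F000.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("symbol code must fit in one byte")
    return chr(0xF000 + code)


def protect_mirrored(ch: str, previous_was_rtl: bool) -> str:
    """Wrap a mirrored character in LRO ... PDF when it follows right-to-left text."""
    if previous_was_rtl and unicodedata.mirrored(ch):
        return LRO + ch + PDF
    return ch
```

For example, a parenthesis has the mirrored property, so protect_mirrored("(", True) returns it wrapped in the two control characters, while an ordinary letter is returned unchanged.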
Known problems
When a file with tracked changes contains deleted text, the marker associated with that text may still be output (even though no data follows the marker).
Installation instructions
Please note that RTF2SFM is a command line utility. To use it you need to open up a Command window. You can set up shortcuts to the program if you like, but there is no pretty graphical user interface.
Standalone Windows Executable
If you want to use the standalone Windows executable, simply download it and put it in a folder on your PATH somewhere.
Note
The downloadable EXE files are not installers or setup programs — they are the actual program. Simply put the EXE in a directory (such as Windows) that is named on your PATH variable. To find out what directories are named on your path, start a command window and type PATH <return>. The directories will be delimited by semicolons.
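If the raw PATH output is hard to read, a small Python sketch such as the following lists the directories one per entry. The function name path_directories is hypothetical, and the separator defaults to whatever the current platform uses (a semicolon on Windows).

```python
import os
from typing import List, Optional


def path_directories(path_value: Optional[str] = None, sep: str = os.pathsep) -> List[str]:
    """Split a PATH-style string into its directories, dropping empty entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")  # use the live PATH by default
    return [entry for entry in path_value.split(sep) if entry]
```

Python's shutil.which("RTF2SFM") can then confirm whether the executable is actually found on the PATH; it returns the full path, or None if nothing matches.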
SIL-RTF perl module
If you want to use the Perl source code rather than the stand-alone Windows executable, then you must have Perl 5.8 or later installed. Download the archive and unpack it to a temporary directory. Start a command shell in the SIL-RTF-1.5 subfolder and execute the command sequence:
perl makefile.pl
make
make install
Now you should be able to execute RTF2SFM from any command prompt.
If you don't have a make program, you might see if Microsoft still offers their older NMAKE15.EXE.
Disclaimers
The RTF2SFM program and the SIL::RTF module on which it is based are unreleased software and carry no warranties of any kind. Use at your own risk.
If you find or fix bugs then the author would appreciate hearing from you. See support for contact information.
Change history
1.10 (2009-10-16)
- Changes to support converting typeset dictionaries (better handling of empty markers, adding space after SFMs due to character styles)
- Added -q option
1.9 (2009-10-05)
- Quiet unhandled destination message re: pntext
- Added -p option
- Added support for [destinations] in INI
1.8 (2009-09-28)
- Quiet unhandled destination messages re: upr, *ud
1.7 (2009-03-27)
Major update:
- Insert \v 1 when it is absent
- Changed default control file to match USFM ver 2.2
- Reorders things like \s and \r to be after \c
- Extended help
- Remove anchor from textbefore handling so it can be more useful: it can now be an expression like [^0-9] to remove anything other than digits in the chapter & verse nums. Since this would break the existing .INI files for footnotes, I've made this behaviour controlled by a new option, textbeforeIsUnanchored, from the [options] section of the .INI file.
- Using this capability, added regex to \c and \v textbefore to strip out all but digits
- Removed {} from char style SFMs that have no endmarker (e.g. \tr \tc)
- Quiet unhandled destinations: *defchp *defpap *themedata *colorschememapping *datastore *background
- Fix bug that caused "Can't find Unicode property definition "Mirrored"" message.
1.6 (2007-09-07)
Bug fixes:
- Now handles MacRoman character set
- Less residue from Word 2003 documents
1.5 (2006-02-16)
Bug fixes:
- Correction to footnote endings when not inline.
- Better handling of whitespace around chapter and verse numbers.
- Was completely omitting empty markers (e.g. \b)
1.4 (2005-01-24)
- Support for UBS usfm (ver 2.0). Control file for usfm available in ControlFileExamples.zip.
- Can identify PT6 generated hard formatted superscript footnote callers; can also put the footnote caller literal into the SFM.
- Normalizes style names to remove spaces after commas.
1.3 (2005-01-18)
- Allow embedded styles to be defined with an endtag, in which case the end tag is used to delimit text rather than enclosing the text in braces {...}
- Warn about inappropriate styles or missing style defs in footnotes
- Added [chapter:verse] to RESIDUE output to aid manual review
- Remove Old Properties destinations (*oldcprops, *oldpprops, *oldtprops, *oldsprops) from residue
- Provide version of EXE that supports CJK character sets
1.1 (2004-10-01)
- Support for a few archaic character sets added
1.0 (2004-09-22)
- Understands fields, but this can be suppressed (for slight speed improvement) by supplying -s
- Removes escapes that preceded some symbols in output
- Can supply Paratext footnote caller character via .INI
0.8 (2004-08-04)
- Rewritten to utilize Perl 5.8 Unicode facilities. Will no longer run on pre-5.8 Perl versions. RTF2SFM users should be unaffected, but if you are using the RTF parser for other programs, check the documentation for changes.
- Removed *pgptbl from residue (by skipping)
- Detect and warn about missing input file
- Detect and warn about paragraph styles used but not in configuration.
0.7 (2003-11-03)
- Paratext-compatible footnotes; fix ZWNJ problem
0.6 (2003-10-02)
- Did some cleanup on the residue file
0.5 (2003-07-04)
- First version posted here
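The 1.7 entry above mentions that an unanchored textbefore expression such as [^0-9] removes everything other than digits from chapter and verse numbers. The Python sketch below illustrates the difference using Python's re module as a stand-in for Perl regular expressions. How RTF2SFM applies the expression internally is an assumption here (anchored is taken to mean "tried only at the start of the text"), and the function name strip_textbefore is hypothetical.

```python
import re


def strip_textbefore(text: str, pattern: str, unanchored: bool) -> str:
    """Remove matches of `pattern` from `text`.

    Assumption: an anchored pattern is tried once, at the start of the text,
    while an unanchored pattern is removed wherever it occurs.
    """
    if unanchored:
        return re.sub(pattern, "", text)
    return re.sub(r"\A(?:" + pattern + ")", "", text, count=1)
```

With the unanchored form, a string like "Ch. 12:" reduces to "12"; with the anchored form only a single leading non-digit would be removed, which is why the option matters for existing footnote settings.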
Downloads
| Standalone Windows executable of RTF2SFM program (ver 1.10) Bob Hallissy, 2009-10-16 Download "RTF2SFM.exe", Windows application, 3MB [2643 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2009-10-16 Download "RTF2SFM-full.exe", Windows application, 4MB [2419 downloads] |
| Example RTF2SFM control files, including SFConverter.ini and several USFM versions (2.0 through 2.2) Bob Hallissy, 2009-03-27 Download "ControlFileExamples.zip", ZIP archive, 24KB [2181 downloads] |
To obtain the Perl source module, view the public Subversion repository or download the tarball.
Previous versions
| Standalone Windows executable of RTF2SFM program (ver 1.9) Bob Hallissy, 2009-10-05 Download "RTF2SFM.exe", Windows application, 3MB [2314 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2009-10-05 Download "RTF2SFM-full.exe", Windows application, 4MB [2335 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.8) Bob Hallissy, 2009-09-28 Download "RTF2SFM.exe", Windows application, 3MB [1943 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2009-09-28 Download "RTF2SFM-full.exe", Windows application, 4MB [2214 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.7 beta) Bob Hallissy, 2009-03-27 Download "RTF2SFM.exe", Windows application, 3MB [2391 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2009-03-27 Download "RTF2SFM-full.exe", Windows application, 4MB [2370 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.6) Bob Hallissy, 2007-09-07 Download "RTF2SFM.exe", Windows application, 2MB [2377 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2007-09-07 Download "RTF2SFM-full.exe", Windows application, 4MB [2416 downloads] |
| Example RTF2SFM control files, including DefaultControlFile.ini (matches RTF2SFM default) and USFM (versions 2.0, 2.05, and 2.1) Bob Hallissy, 2007-09-07 Download "ControlFileExamples.zip", ZIP archive, 11KB [2876 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.5) Bob Hallissy, 2006-02-16 Download "RTF2SFM.exe", Windows application, 2MB [2678 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.5) including CJK Bob Hallissy, 2006-02-16 Download "RTF2SFM-full.exe", Windows application, 3MB [2765 downloads] |
| Perl-based RTF parser SIL::RTF, including RTF2SFM program (ver 1.4) Bob Hallissy, 2005-01-24 Download "SIL-RTF-1.4.tar.gz", gzipped tar archive, 32KB [2544 downloads] |
| Standalone Windows executable of RTF2SFM program (ver 1.4) Bob Hallissy, 2005-01-24 Download "RTF2SFM.exe", Windows application, 2MB [2874 downloads] |
| Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download. Bob Hallissy, 2005-01-24 Download "RTF2SFM-full.exe", Windows application, 3MB [3105 downloads] |
Support
Randy Hasty has written a tutorial on how to use this tool to convert RTF documents to SFM. Though written in terms of version 0.7, it may still be helpful.
As this program is provided at no cost, I am unable to provide a commercial level of personal technical support. I am interested in hearing from you, however, and will try to resolve problems that are reported to me. You can send feedback to me via a webform here. Alternatively, my email address looks like Вob_Нallissy@ѕіl.org (but cutting & pasting from this window into your emailer won't result in a working address — you will need to type it into your email program.)
Other resources
Microsoft Rich Text Format (RTF) specifications, versions 1.6 for older versions of Word, 1.7 for Word 2002, and 1.8 for Word 2003.
© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI).