ChararacterSetEncodingsForFFS
Foreign file systems pose problems with character set encodings as well. There are two types of problems. One relates to filenames and the other relates to file contents.

ForeignFileSystems and file names

In BFS, filenames are UTF-8, so virtually any character can appear in a filename. This is great, especially for Asian languages. However, many file systems allow a much smaller character set or use a different encoding; NTFS, for example, stores names as Unicode (UTF-16). This is a special case of the problem of name preservation when copying between file systems. If you copy a file to a file system that uses a different encoding and nothing converts the name, the name becomes garbage. Sometimes it is impossible to extract the original name from the name of the file on the new system. If the other OS is not tolerant of such names, its file system can even become unbootable. (This has happened to me.) Even worse, just attempting the copy can sometimes escort you into KernelDebuggingLand.
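
To make the failure concrete, here is a minimal Python sketch of how a UTF-8 name turns into garbage when its bytes are read under the wrong encoding. The file name and the choice of Latin-1 and ASCII as the "wrong" encodings are assumptions picked for illustration, not something any particular file system is known to do.

```python
# A file name as BFS would store it: UTF-8 bytes.
name = "résumé.txt"
utf8_bytes = name.encode("utf-8")

# A reader that assumes Latin-1 misinterprets each UTF-8 byte as one character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # rÃ©sumÃ©.txt

# Recovery works here only because Latin-1 maps every byte to a character,
# so the wrong guess lost no information.
recovered = garbled.encode("latin-1").decode("utf-8")
assert recovered == name

# If the wrong guess is lossy, the original bytes are gone and the name
# cannot be extracted again.
lossy = utf8_bytes.decode("ascii", errors="replace")
assert lossy != name
```

The second case is the one that makes the original name unrecoverable: the replacement characters carry no trace of which bytes they replaced.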

Possible solution

In this arena the encodings and the file systems have to work together. Here's how it could work:

  1. Each FileSystem would know which encoding it used for filenames.

  2. All file systems could accept UTF-8 names and attempt to store them in the proper encoding. (On some ForeignFileSystems this may not be possible, e.g. ISO-9660.)

  3. All file systems could convert their encoding to UTF-8 names when asked for filenames.
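
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `FileSystemInfo`, `store_name`, and `read_name` are hypothetical names, not part of any BeOS API, and plain ASCII stands in for ISO-9660's restricted character set as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSystemInfo:
    """Hypothetical record of what a file system knows about itself (step 1)."""
    name: str
    filename_encoding: str  # a Python codec name, used for illustration only


def store_name(fs: FileSystemInfo, utf8_name: bytes) -> bytes:
    """Step 2: accept a UTF-8 name and store it in the native encoding.

    Raises UnicodeEncodeError when the native encoding cannot represent
    the name, rather than silently writing garbage.
    """
    return utf8_name.decode("utf-8").encode(fs.filename_encoding)


def read_name(fs: FileSystemInfo, native_name: bytes) -> bytes:
    """Step 3: convert a stored native name back to UTF-8."""
    return native_name.decode(fs.filename_encoding).encode("utf-8")


ntfs = FileSystemInfo("ntfs", "utf-16-le")
iso9660 = FileSystemInfo("iso9660", "ascii")  # simplified stand-in

original = "日本語.txt".encode("utf-8")
on_disk = store_name(ntfs, original)
assert read_name(ntfs, on_disk) == original  # the name survives the round trip

try:
    store_name(iso9660, original)
    representable = True
except UnicodeEncodeError:
    representable = False  # the caller must rename, escape, or refuse
```

Failing loudly in step 2 is a design choice of this sketch; a real file system might prefer to substitute or escape characters, at the cost of losing the original name.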

ForeignFileSystems and file contents

BFS has its own special problems with file contents (see ChararacterSetEncodingsForBFS). Once past these, one still faces a problem when moving to ForeignFileSystems, because another OS may expect a text file's contents to be in a particular encoding. This applies to text/plain, text/html, etc. It is usually less severe than the filename problem above: one can usually convert the file on the destination OS, or manually convert it in BeOS first. This is a hassle, though.

Possible solution

We'd like to make life easy for our users. It'd be great if we could automate this conversion. Here's how it could work:

  1. Each volume (contrast with FileSystem above) would have a preferred encoding for text file contents. This encoding could be determined by default from the FileSystem or perhaps we just use UTF-8 as a default.

  2. When copying from BFS to a ForeignFileSystem, the conversion utility would use the charset encoding specified in the file's charset property (see ChararacterSetEncodingsForBFS) and the preferred encoding of the destination volume to perform an automatic conversion.
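
A minimal Python sketch of that copy-time conversion might look like this. The function names and the idea of passing the charset property as an optional string are assumptions made for illustration; how the property is actually read from a file is described in ChararacterSetEncodingsForBFS and is not modelled here.

```python
import codecs
from typing import Optional


def same_encoding(a: str, b: str) -> bool:
    """Compare codec names after normalization (e.g. 'UTF8' vs 'utf-8')."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def convert_for_copy(data: bytes, source_charset: Optional[str],
                     volume_charset: str) -> bytes:
    """Convert text file contents from the file's charset to the volume's.

    source_charset stands in for the file's charset property. When it is
    missing we cannot know how to decode the bytes, so they are copied
    unchanged. The conversion is also skipped when both encodings match.
    """
    if source_charset is None or same_encoding(source_charset, volume_charset):
        return data
    return data.decode(source_charset).encode(volume_charset)


bfs_text = "naïve café\n".encode("utf-8")
converted = convert_for_copy(bfs_text, "utf-8", "iso-8859-1")
untouched = convert_for_copy(bfs_text, "UTF8", "utf-8")
```

With Python's default strict error handling, a character that the destination encoding cannot represent raises an exception instead of being silently replaced; whether a copy should stop, substitute, or skip conversion in that case is left open here.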

Problems with the proposed solution

  1. Where is the preferred encoding for the volume stored?

    The ideal place seems to be on the volume itself, but there is hardly a natural place for it there. A more practical but less pleasing alternative is to store it in some sort of mount settings file, or to pass it as an option to the "mount" command.

  2. It's slower.
    However, this could probably be done pretty quickly. People don't generally toss around 700MB text files. Copying is generally an I/O-bound process anyway, so the extra CPU overhead will probably not add significantly to the copy time, although it could impact other processes. The service is probably worth the cost, and the conversion could be turned off when the target uses the same encoding as the source.

  3. Sometimes these character set conversions are not reversible.
    There can be a many-to-one mapping between character encodings, for example. This is a problem common to all CharacterSetConversions. It could be addressed in the GE CharacterSetConversions software.

  4. This doesn't address strings in non-text files.
    While this is generally a hard problem, it still doesn't make anyone happy when they copy their MP3 over to a new OS and the ID3 turns into garbage. Note: this is possibly just a problem with the ID3 spec or ID3 editors/viewers.

  5. It also doesn't address strings in multiply-encoded files.
    See ChararacterSetEncodingsForBFS.

  6. Sometimes files are internally annotated with their character set type.
    I believe HTML and XML support this. If we "helpfully" convert the contents without updating the internal annotation, the file will look like garbage on the other end. Updating the internal annotation would require a parser that is knowledgeable about these standards. This may be reasonable, especially since we would already be performing a rather drastic file manipulation. Also, this annotation is usually near the front of the file. On the other hand, the file may be syntactically incorrect. What does one do in this situation? To stop a copying procedure for a syntax error seems ... odd. :-) (What to do in this case could be a general character set conversion preference.)
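
For item 6, here is a rough Python sketch of converting an XML file while keeping its internal annotation in step. It handles only an XML declaration at the very start of the document and uses a regular expression rather than a real parser; `convert_xml` is a hypothetical name, and HTML `meta` tags, byte order marks, and malformed declarations are deliberately not covered.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration at the
# very start of the document, e.g. <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(
    r'^(<\?xml[^>]*?encoding=["\'])([A-Za-z][A-Za-z0-9._-]*)(["\'])'
)


def convert_xml(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Re-encode an XML document and rewrite its declared encoding to match.

    If no declaration is found, the text is converted but left unannotated;
    what to do then is a policy question this sketch does not decide.
    """
    text = data.decode(source_charset)
    text = _XML_DECL.sub(
        lambda m: m.group(1) + target_charset + m.group(3), text, count=1
    )
    return text.encode(target_charset)


doc = '<?xml version="1.0" encoding="UTF-8"?>\n<p>café</p>\n'.encode("utf-8")
out = convert_xml(doc, "utf-8", "iso-8859-1")
```

Because the declaration sits at the front of the file, only the first match is rewritten and the rest of the document is passed through untouched apart from the re-encoding.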



Site content is in the public domain. Unless otherwise noted, everything else is copyright © 1999-2002 Fifth Ace Productions, LLC. All Rights Reserved.