This service provided by BeBits: The Best Source of BeOS Software!
CharacterSetEncodingsForBFS
It might seem that once a file is on BFS the encoding problem goes away, but this is not the case. I personally convert (nearly) all files that touch my BFS partition to UTF-8, but I have to do this manually every time I download, for example, a web page full of Chinese lyrics. Also, if I keep a web page on my desktop in the GB2312 encoding, I have to set my browser to that encoding by hand whenever I open it. Similarly, I have to be careful not to open it in Opera, because Opera 3.52 doesn't understand that encoding at all.

In order for applications to be smart about encodings, they need to be informed. Each application could try to examine the file's contents and determine the encoding from that, but every application would have to repeat the same work and the result would still only be a guess. This does not seem like an optimal solution by any means.
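
To make the per-application approach concrete, here is a minimal sketch in Python of what each program would have to do on its own: try a list of candidate encodings in order and keep the first one that decodes without error. The function name `guess_charset` and the candidate list are assumptions made up for this illustration, not part of any BeOS API.

```python
# Hypothetical candidate list; a real application would need many more.
CANDIDATES = ["ascii", "utf-8", "gb2312"]


def guess_charset(data):
    """Return the first candidate encoding that decodes `data` strictly.

    Returns None when no candidate fits. This is only a guess: some byte
    sequences are valid in several encodings, so the order of CANDIDATES
    decides the answer.
    """
    for name in CANDIDATES:
        try:
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "\u4f60\u597d"  # two Chinese characters
    print(guess_charset(sample.encode("utf-8")))
    print(guess_charset(sample.encode("gb2312")))
```

Every application that wants to be encoding-aware would need its own copy of logic like this, which is the duplication the paragraph above objects to.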

A possible solution

On BFS we have support for managing file content types via the mimetype property. Other parts of BeOS contribute to this support, namely the FileType Tracker add-on and the FileTypes application. My proposal is similar to others made on the GE list, although some of those proposals are more extensive than this one.

  1. Have a character set encoding property.
    charset is probably as good a name as any. I haven't seen any similarly used properties.
  2. Add support for this property in the "identify" mechanism for OpenTracker.
    This would mean that if you "identify" a file or otherwise invoke that mechanism, OpenTracker would run a set of sniffer rules to determine the file's charset. This is only necessary for files under the text/* mimetypes, if I am not mistaken. Such sniffer rules could possibly follow the same syntax as the mimetype sniffer rules. There are also some filename extensions that are used to indicate encodings, although I am not sure that they are in widespread use. (Example: foo.U8.txt for UTF-8)
  3. Add support for changing the charset property via a Tracker add-on.
    This could also parallel the FileType Tracker add-on.
  4. Add support for updating the sniffer rules or file extensions via a CharacterSetEncodings application, to parallel the FileTypes application.
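
As a rough sketch of how the "identify" step in the list above might decide on a charset, the Python below checks a small table of content rules first and then falls back to filename hints. Everything named here is an assumption for illustration only: `identify_charset`, the rule tuple layout (priority, offset, byte pattern, charset) and the `u8` hint table are invented, and the real mimetype sniffer rule syntax is not reproduced.

```python
import os

# Hypothetical filename hints: a dotted tag between the base name and the
# final extension, as in foo.U8.txt.
EXTENSION_HINTS = {"u8": "UTF-8"}

# Hypothetical content rules: (priority, byte offset, pattern, charset).
# Only byte-order marks are listed; real rules would cover far more.
SNIFFER_RULES = [
    (1.0, 0, b"\xef\xbb\xbf", "UTF-8"),
    (0.9, 0, b"\xff\xfe", "UTF-16LE"),
    (0.9, 0, b"\xfe\xff", "UTF-16BE"),
]


def identify_charset(filename, mimetype, head):
    """Return a charset name for a text file, or None if nothing matches.

    `head` is the first few bytes of the file's contents.
    """
    # Per the proposal, only text/* files carry a charset.
    if not mimetype.startswith("text/"):
        return None
    # Content rules win over filename hints; highest priority first.
    for _priority, offset, pattern, charset in sorted(
        SNIFFER_RULES, key=lambda rule: rule[0], reverse=True
    ):
        if head[offset:offset + len(pattern)] == pattern:
            return charset
    # Fall back to tags embedded in the file name.
    parts = os.path.basename(filename).lower().split(".")
    for tag in parts[1:-1]:
        if tag in EXTENSION_HINTS:
            return EXTENSION_HINTS[tag]
    return None


if __name__ == "__main__":
    print(identify_charset("foo.U8.txt", "text/plain", b"hello"))
```

The value returned would then be written to the file's charset property, and the two tables are what the proposed CharacterSetEncodings application would let a user edit. Letting content rules override filename hints is a choice made for this sketch; the proposal does not say which should win.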

Problems with the proposed solution

  1. It doesn't address files that may have multiple encodings. I believe that XML and even HTML support multiple encodings for a single file.
    Here are possible workarounds: (roughly in order of increasing stinkiness)
    1. Store a special "multi-encoding" charset type for files that use multiple encodings.
    2. Store the first specified encoding type.
    3. Store the encoding type of the first text encountered.
    4. Store the most frequently used encoding type.
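
The four workarounds above can be compared side by side with a small Python sketch. It assumes the file has already been analysed into two lists, the charsets in the order they are specified and the charset of each run of text in order of appearance; how to obtain those lists is left open. The function name, the strategy names and the `multi-encoding` marker value are all hypothetical.

```python
from collections import Counter

MULTI = "multi-encoding"  # hypothetical marker value for workaround 1


def charset_to_store(declared, text_runs, strategy):
    """Pick the single charset value to store for a file.

    declared  -- charsets in the order the file specifies them
    text_runs -- charset of each run of text, in order of appearance
    strategy  -- "marker", "first-specified", "first-text" or "most-frequent"
    """
    used = set(declared) | set(text_runs)
    if len(used) <= 1:
        # Zero or one encoding: every strategy agrees.
        return next(iter(used), None)
    if strategy == "marker":
        return MULTI
    if strategy == "first-specified":
        return declared[0] if declared else None
    if strategy == "first-text":
        return text_runs[0] if text_runs else None
    if strategy == "most-frequent":
        # Counts runs, not bytes; a real tool might weigh by length instead.
        return Counter(text_runs).most_common(1)[0][0] if text_runs else None
    raise ValueError("unknown strategy: " + strategy)


if __name__ == "__main__":
    declared = ["UTF-8", "GB2312"]
    runs = ["GB2312", "UTF-8", "UTF-8"]
    for name in ("marker", "first-specified", "first-text", "most-frequent"):
        print(name, charset_to_store(declared, runs, name))
```

Only the marker strategy tells an application that the stored value cannot be trusted for the whole file; the other three silently pick one encoding.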



Site content is in the public domain. Unless otherwise noted, everything else is copyright © 1999-2002 Fifth Ace Productions, LLC. All Rights Reserved.