BeBits
CharacterSetEncodingsForMarkup
A markup file can contain text in more than one encoding, with each section delimited by tags. When doing MarkupParsing, each section has to be decoded with the appropriate text decoder. A markup parser can do this by using the CharacterSetEncodings support directly and specifying a character decoder.
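
As a rough illustration of decoding with an explicitly chosen decoder, the sketch below decodes byte sections that have already been split out of a file, each paired with an encoding name. The helper name `decode_sections` and the (encoding name, bytes) pairing are assumptions made for this example only. Python's standard `codecs` module stands in here for the CharacterSetEncodings support, whose actual interface is not shown on this page.

```python
import codecs

def decode_sections(sections):
    """Decode (encoding_name, raw_bytes) pairs into one string.

    Hypothetical helper: assumes the parser has already split the file
    at its tags and knows which encoding each section declares.
    """
    parts = []
    for encoding_name, raw in sections:
        # codecs.lookup raises LookupError for an unknown encoding name.
        decoder = codecs.lookup(encoding_name)
        # CodecInfo.decode returns (text, number_of_bytes_consumed).
        text, _consumed = decoder.decode(raw, "strict")
        parts.append(text)
    return "".join(parts)

# Three sections of one file, each in a different encoding.
sections = [
    ("utf-8", "caf\u00e9 ".encode("utf-8")),
    ("iso-8859-1", "na\u00efve ".encode("iso-8859-1")),
    ("shift_jis", "\u65e5\u672c".encode("shift_jis")),
]
print(decode_sections(sections))
```

Using strict error handling means a section whose bytes do not match its declared encoding raises an error rather than producing silently corrupted text.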

However, if the parser is not aware of character set encodings, there are a few options:

  1. The text decoder used for the whole file could also be used to decode the contained text. Since markup files are text, the tags themselves are encoded as well, so the decoder that reads the tags is a reasonable default for the text they contain.
  2. Alternatively, the text encoding could be auto-detected for each subsection of text, with a priority scheme to break ties. For example, UTF-8 and ASCII are indistinguishable when every byte value is below 128. If the text is entirely in this range, priority could be given to UTF-8.
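
The two options above can be combined in a small sketch: try candidate encodings in priority order for each subsection, and fall back to the file-level decoder if none fits. The function name `detect_and_decode`, the candidate list, and the `file_encoding` parameter are assumptions for illustration. Trial decoding is only one simple way to auto-detect an encoding, not necessarily the scheme intended here.

```python
def detect_and_decode(raw, candidates=("utf-8", "iso-8859-1"),
                      file_encoding="utf-8"):
    """Return (encoding_name, text) for one subsection of bytes.

    Candidates are tried in priority order and the first one that
    decodes without error wins, which is how ties are broken.
    """
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Option 1 as a fallback: use the decoder for the whole file,
    # replacing any bytes it cannot decode.
    return file_encoding, raw.decode(file_encoding, errors="replace")

# All bytes are below 128, so every candidate would succeed;
# UTF-8 is listed first and therefore gets priority.
print(detect_and_decode(b"plain tag text"))

# 0xE9 on its own is not valid UTF-8, so the next candidate is used.
print(detect_and_decode("caf\u00e9".encode("iso-8859-1")))
```

Note that ISO-8859-1 accepts every byte sequence, so with the default candidate list the fallback is never reached. A stricter candidate list would exercise it.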

Site content is in the public domain. Unless otherwise noted, everything else is copyright © 1999-2010 Haikuware. All Rights Reserved.