Opened 12 years ago
Last modified 8 years ago
#125 assigned new-feature
parsers -- html prettyprint
Reported by: | Fred T. Hamster | Owned by: | bugdock |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | feistymeow-nucleus | Version: | |
Keywords: | | Cc: | |
Description
idea from web:
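python pretty printing using the lxml and BeautifulSoup libraries:
import lxml.html as lh #assumed origin of the "lh" name used below
from bs4 import BeautifulSoup as bs #BeautifulSoup 4; the old "BeautifulSoup" module is python 2 only
root=lh.tostring(sliderRoot) #convert the generated HTML (sliderRoot, an lxml element built elsewhere) to a string
soup=bs(root, "html.parser") #make BeautifulSoup
prettyHTML=soup.prettify() #prettify the html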
==========
how about a recursive descent parser,
which is actually kind of similar to a state machine, but
which allows returns rather than having to know the next state.
plain text state
  < seen, go to gather tag state.
  anything else seen, emit char.
  splitter on blocks of text to avoid too long lines?
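a rough sketch of the plain text state, including the line splitter idea. the function names (scan_plain_text, emit_text), the two-space indent unit and the default width are made up for illustration; the splitter leans on python's standard textwrap module, and collapsing runs of whitespace is an assumption, not something decided above.

```python
import textwrap


def scan_plain_text(html, pos):
    """Plain text state: gather chars until '<' or end of input.

    Returns (text, new_pos); new_pos points at the '<' or at len(html).
    """
    end = html.find("<", pos)
    if end == -1:
        end = len(html)
    return html[pos:end], end


def emit_text(text, indent, width=72):
    """Split a block of text into indented lines of at most `width` chars."""
    collapsed = " ".join(text.split())  # assumption: whitespace runs collapse
    if not collapsed:
        return []
    prefix = "  " * indent  # hypothetical two-space indent unit
    return textwrap.wrap(
        collapsed,
        width=width,
        initial_indent=prefix,
        subsequent_indent=prefix,
    )
```

scan_plain_text hands control back at the "<", so the caller decides what runs next instead of this state naming a successor.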
gather tag forking state
  / seen, go to open closure tag state
  all chars up to space or > go into tag name buffer
  space seen, go to gather attribs state
  > seen, go to completed opener tag state
gather opener attribs state
  space seen, ignore
  chars seen, go to gather tag name
  > seen, go to completed opener tag state
gather tag name
  (could be used by other states too)
  take all non-space chars into tag name accum
  space seen, ignore
  > seen, return
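gather tag name is where the "returns" idea pays off: as a plain function it can be called from any state and hands control back to whoever called it. a minimal sketch, where gather_name and its (name, position) return value are hypothetical; reading "space ignore" as "skip leading spaces, then stop at the next space" is an assumption borrowed from the tag forking notes.

```python
def gather_name(html, pos):
    """Gather a name starting at html[pos].

    Skips leading spaces, then accumulates non-space chars until a space,
    a '>' or the end of input. Returns (name, new_pos), where new_pos is
    the index of the char that stopped the scan.
    """
    # space seen before any name char: ignore it
    while pos < len(html) and html[pos].isspace():
        pos += 1
    start = pos
    # take all non-space chars into the accumulator, stopping at '>'
    while pos < len(html) and not html[pos].isspace() and html[pos] != ">":
        pos += 1
    return html[start:pos], pos


# usage: the caller, not gather_name, decides what happens next
name, pos = gather_name("<div class='x'>", 1)
```

because the function returns rather than jumping, the forking state and the attribs state can both reuse it without gather_name knowing about either of them.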
completed opener tag state
  record tag by push on stack
  emit gathered tag and attributes at appropriate indent level
  indent level ++
  go to plain text state
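the completed opener steps could look roughly like this. complete_opener, the out list and the two-space indent unit are invented names for the sketch; it also assumes the indent level is simply the stack depth, so the push doubles as "indent level ++". a separate counter would work just as well.

```python
def complete_opener(name, attribs, stack, out):
    """Completed opener tag state: emit the tag, then record it on the stack.

    name    -- gathered tag name, e.g. "div"
    attribs -- list of already-gathered attribute strings, e.g. ['class="a"']
    stack   -- list of currently open tag names
    out     -- list of output lines being built up
    """
    indent = len(stack)  # assumption: indent level == number of open tags
    attr_text = "".join(" " + a for a in attribs)
    out.append("  " * indent + "<" + name + attr_text + ">")
    stack.append(name)  # push; this is also the "indent level ++" step


stack, out = [], []
complete_opener("div", ['class="a"'], stack, out)
complete_opener("p", [], stack, out)
# after these two calls the caller would go back to the plain text state
```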
open closure tag state
completed closure tag state
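the two closure states have no steps written down yet. one guess, mirroring the opener side: pop the stack (which lowers the indent level) and emit the closing tag at the new level. everything here is an assumption, including the name complete_closer, the handling of unclosed inner tags and the treatment of stray closers.

```python
def complete_closer(name, stack, out):
    """Guess at the completed closure tag state: pop, then emit.

    stack -- list of currently open tag names
    out   -- list of output lines being built up
    """
    if name in stack:
        # assumption: pop down to the matching opener, silently dropping
        # any inner tags that were never closed
        while stack and stack.pop() != name:
            pass
    # a stray closer (no matching opener) leaves the stack untouched
    out.append("  " * len(stack) + "</" + name + ">")


stack, out = ["div", "p"], []
complete_closer("p", stack, out)
complete_closer("div", stack, out)
```

with this reading, a closing tag lands at the same indent as its opener, since the indent is taken after the pop.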