Opened 12 years ago

Last modified 8 years ago

#125 assigned new-feature

parsers -- html prettyprint

Reported by: Fred T. Hamster
Owned by: bugdock
Priority: minor
Milestone:
Component: feistymeow-nucleus
Version:
Keywords:
Cc:

Description

idea from web:

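python pretty printing using the lxml and BeautifulSoup libraries:

import lxml.html as lh  # assumed source of the "lh" alias, which the snippet never imports
from bs4 import BeautifulSoup as bs  # bs4 replaces the old "BeautifulSoup" module the snippet used
# sliderRoot is the lxml element built elsewhere in the original snippet
root = lh.tostring(sliderRoot)  # convert the generated HTML tree to a string
soup = bs(root, "html.parser")  # parse it with BeautifulSoup
prettyHTML = soup.prettify()  # prettify the html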

==========

how about a recursive descent parser?
it is similar to a state machine, but each state can
return to its caller rather than having to know the next state.

plain text state

< seen, go to gather tag state.
anything else seen, emit char.

add a splitter on blocks of text to avoid overly long lines?
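
A rough sketch of the plain text state with the line splitter folded in. The function name, its signature, the two-space indent and the 72-column default are all assumptions made for illustration; nothing here is settled by the ticket.

```python
import textwrap


def plain_text_state(html, pos, indent, width=72):
    """Consume text from pos up to the next '<' and return (lines, new_pos).

    Hypothetical helper: name, signature and the width default are assumptions.
    """
    end = html.find("<", pos)
    if end == -1:
        end = len(html)  # no further tag, so the rest is plain text
    # Collapse runs of whitespace so the text can be re-wrapped freely.
    text = " ".join(html[pos:end].split())
    prefix = "  " * indent
    # textwrap.wrap returns [] for empty text, so nothing is emitted then.
    lines = textwrap.wrap(
        text, width=width, initial_indent=prefix, subsequent_indent=prefix
    )
    return lines, end
```

Collapsing whitespace like this would be wrong inside elements such as pre, where spacing is significant, so a real version would need a way to switch the splitter off.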

gather tag forking state

/ seen, go to open closure tag state
all chars up to space or > go into tag name buffer
space seen, go to gather attribs state

> seen, go to completed opener tag state
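
The forking state might look roughly like this. The function name and the "close" / "attribs" / "done" labels are hypothetical stand-ins for the states named in this ticket, and only a literal space counts as a separator, as in the notes.

```python
def gather_tag(html, pos):
    """Decide what kind of tag starts at pos (just past the '<').

    Returns (kind, name, new_pos). Hypothetical helper; the kind labels
    are assumptions standing in for the ticket's states:
      "close"   -> '/' seen first (open closure tag state)
      "attribs" -> a space ended the name (gather attribs state)
      "done"    -> '>' ended the name (completed tag state)
    """
    if html.startswith("/", pos):
        return "close", "", pos + 1
    name = []
    # All chars up to a space or '>' go into the tag name buffer.
    while pos < len(html) and html[pos] not in " >":
        name.append(html[pos])
        pos += 1
    if pos >= len(html):
        raise ValueError("unterminated tag")
    kind = "attribs" if html[pos] == " " else "done"
    return kind, "".join(name), pos + 1
```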

gather opener attribs state

space seen, ignore
chars seen, go to gather tag name

> seen, go to completed opener tag state

gather tag name
(could be used by other states too)

take all non-space chars into tag name accum
space ignore

> seen, return
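
A small sketch of this shared routine, reading the notes literally: spaces are skipped rather than ending the name, and '>' hands control back to whichever state called it. That reading, the function name and the return shape are assumptions.

```python
def gather_name(html, pos):
    """Accumulate non-space chars until '>' and return (name, new_pos).

    Hypothetical helper. Because it simply returns, any state can call it
    without the routine knowing which state comes next.
    """
    accum = []
    while pos < len(html):
        ch = html[pos]
        if ch == ">":
            return "".join(accum), pos + 1  # '>' seen, return to the caller
        if ch != " ":
            accum.append(ch)  # non-space chars go into the accumulator
        pos += 1  # spaces are ignored
    raise ValueError("no closing '>' found")
```

Read this way, a name and the attributes after it would run together, so the attribs state would probably need the routine to stop at a space as well.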

completed opener tag state

record tag by push on stack
emit gathered tag and attributes at appropriate indent level
indent level ++
go to plain text state
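
The bookkeeping for this state could be sketched as below. The function name, the list-based stack and output, and the two-space indent are assumptions for illustration only.

```python
def completed_opener_tag(name, attribs, stack, out, indent):
    """Emit an opening tag at the current indent; return the new indent level.

    Hypothetical helper: 'attribs' is a list of already-formatted
    attribute strings, 'stack' and 'out' are plain lists owned by the caller.
    """
    tag = "<" + " ".join([name] + attribs) + ">"
    out.append("  " * indent + tag)  # emit at the current indent level
    stack.append(name)  # record the tag so a closure tag can be matched later
    return indent + 1  # indent level ++
```

Usage, under the same assumptions:

```python
stack, out = [], []
level = completed_opener_tag("div", ['class="x"'], stack, out, 0)
level = completed_opener_tag("p", [], stack, out, level)
print("\n".join(out))
```

Void elements such as br never get a closure tag, so a fuller version would have to skip the push and the indent bump for them.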

open closure tag state

completed closure tag state

Change History (1)

comment:1 by Fred T. Hamster, 8 years ago

Owner: changed from Fred T. Hamster to bugdock
Status: new → assigned