File: htmlpurifier-4.3.0/docs/proposal-plists.txt
THE UNIVERSAL DESIGN PATTERN: PROPERTIES
Steve Yegge

Implementation:
    get(name)
    put(name, value)
    has(name)
    remove(name)
    iteration, with filtering [this will be our namespaces]
    parent

Representations:
    - Keys are strings
    - It's nice to not need to quote keys (if we formulate our own
      language, consider this)
    - Property not present representation (key missing)
    - Frequent removal/re-adding may be helped by setting the value to null
      rather than removing the key. If null is a valid value, use another
      sentinel value. (PHP semantics are weird here: isset() returns false
      for a key whose value is null, while array_key_exists() returns true)

Data structures:
    - LinkedHashMap is wonderful (O(1) access and maintains order)
    - Using a special property that points to the parent is usual
    - Multiple inheritance is possible; you need rules for which parent to
      look up first
    - Iterative inheritance is best
    - Consider performance!

Deletion:
    - Tricky problem with inheritance
    - Distinguish between "not found" and "look in my parent for the
      property"
    [Maybe HTML Purifier won't allow deletion]

Read/write asymmetry (it's correct!): reads look up the parent chain,
writes always go to the local plist

Read-only plists:
    - Allow ability to freeze [this is what we have already]
    - Don't overuse it

Performance:
    - Intern strings (PHP does this already)
    - Don't be case-insensitive
    - If all properties in a plist are known a priori, you can use a
      "perfect" hash function. Often overkill.
    - Copy-on-read caching ("plundering") reduces lookup, but uses memory
      and can grow stale. Use as a last resort.
    - Refactoring to fields. Watch for API compatibility, system
      complexity, and lack of flexibility.
    - Refrigerator: external data structure to hold plists

Transient properties: [Don't need to worry about this]
    - Use a separate plist for transient properties
    - Non-numeric override; numeric should ADD
    - Deletion: removeTransientProperty() and transientlyRemoveProperty()

Persistence:
    - XML/JSON are good
    - Text-based is good for readability, maintainability and bootstrapping
    - Compressed binary format for network transport [not necessary]
    - RDBMS or XML database

Querying: [not relevant]
    - XML database is nice for XPath/XQuery
    - jQuery for JSON
    - Just load it all into a program

Backfills/Data integrity:
    - Use usual methods
    - Lazy backfill is a nice hack

Type systems:
    - Flags: ReadOnly, Permanent, DontEnum
    - Typed properties aren't that useful [they're also Not-PHP]
    - Separate meta-list of directive properties IS useful
    - Duck typing is useful for systems designed fully around the
      properties pattern

Trade-off:
    + Flexibility
    + Extensibility
    + Unit-testing/prototype-speed
    - Performance
    - Data integrity
    - Navigability/Query-ability
    - Reversibility (hard to go back)

HTML Purifier

We are not happy with our current system of defining configuration
directives, because it has become clear that things will get a lot nicer if
we allow multiple namespaces, and there are some features that naturally
lend themselves to inheritance, which we do not really support well.

One of the considered implementation changes would be to go from a
structure like:

    array(
        'Namespace' => array(
            'Directive' => 'val1',
            'Directive2' => 'val2',
        )
    )

to:

    array(
        'Namespace.Directive' => 'val1',
        'Namespace.Directive2' => 'val2',
    )

The latter (flattened) implementation takes more memory, however, and it
makes it a bit complicated to grab all values from a namespace.

The alternative implementation choice is to allow nested plists. This keeps
iteration easy, but is problematic for inheritance (it would be difficult to
distinguish a plist from an array) and retrieval (when specifying multiple
namespaces we would need multiple levels of de-referencing).

----

We can take the performance hit, and just do iteration with a filter (the
strncmp() call should be relatively cheap). Then, users should be able to
optimize by doing something like:

    $config = HTMLPurifier_Config::createDefault();
    if (!file_exists('config.php')) {
        // set up $config
        // (save() and load() are the methods being proposed here)
        $config->save('config.php');
    } else {
        $config->load('config.php');
    }

Or maybe memcache, or something similar. This means that "// set up $config"
must not have any dynamic parts, or the user has to invalidate the cache
when they do update it. We have to think about this a little more carefully;
the file access might be more expensive.

----

This might get expensive, however, when we actually care about iterating
over the configuration and want the actual values. So what about nesting the
lists?

    "ns.sub.directive" => values['ns']['sub']['directive']

Could we distinguish between plists and arrays by using ArrayObjects for the
plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
for the arrays, and regular arrays for the plists.

----

Implementation demands, and what has caused them:

1. DefinitionCache: the HTML, CSS and URI namespaces have caches attached
   to them
   Results:
    - getBatchSerial()
    - getBatch(): in general, the ability to traverse just a namespace

2. AutoFormat/Filter: this is a plugin architecture, directives not
   hard-coded
    - getBatch()

3. Configuration form
    - Namespaces used to organize directives

Other than that, we have a pure plist.
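All three demands come down to treating one namespace of an otherwise flat
plist as a unit. The following is a minimal PHP sketch under the assumption
that keys are flattened to "Namespace.Directive" strings, as in the second
array structure above. FlatPlist and its methods are hypothetical names that
echo, but are not, HTMLPurifier_Config::getBatch() and getBatchSerial():
getBatch() here is the "iteration with a filter" idea using a strncmp()
prefix test, and getBatchSerial() hashes one namespace with
md5(serialize()).

```php
<?php
// Hypothetical sketch, not HTML Purifier's API: a flat plist keyed by
// "Namespace.Directive" strings, with getBatch()/getBatchSerial()-style
// helpers for traversing a single namespace.
class FlatPlist
{
    // PHP arrays preserve insertion order, much like a LinkedHashMap.
    private $data = array();

    public function set($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function get($name)
    {
        // array_key_exists (not isset) so that a stored null still counts
        if (!array_key_exists($name, $this->data)) {
            throw new OutOfBoundsException("Unknown directive $name");
        }
        return $this->data[$name];
    }

    // Iteration with a filter: every "$namespace.X" entry, keyed by X.
    public function getBatch($namespace)
    {
        $prefix = $namespace . '.';
        $length = strlen($prefix);
        $batch = array();
        foreach ($this->data as $name => $value) {
            if (strncmp($name, $prefix, $length) === 0) {
                $batch[substr($name, $length)] = $value;
            }
        }
        return $batch;
    }

    // One namespace hashed, e.g. for use as a definition cache key.
    public function getBatchSerial($namespace)
    {
        $batch = $this->getBatch($namespace);
        ksort($batch); // same values in any insertion order, same hash
        return md5(serialize($batch));
    }
}

$plist = new FlatPlist();
$plist->set('HTML.Doctype', 'XHTML 1.0 Strict');
$plist->set('HTML.TidyLevel', 'medium');
$plist->set('URI.Host', 'example.com');
print_r($plist->getBatch('HTML'));         // Doctype and TidyLevel only
echo $plist->getBatchSerial('HTML'), "\n"; // 32-character hex digest
```

Sorting the batch before hashing is a choice made for this sketch so that
the serial does not depend on the order in which directives were set; the
price is a full scan of the plist on every getBatch() call.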
PERHAPS we should maintain separate things for these different demands.

Issue 2: Directives for configuring the plugins are regular plists, but
when enabling them, while it's "plist-ish", what you're really doing is
adding them to an array of "autoformatters"/"filters" to enable. We can
set up magic BC (backwards-compatibility) handling here as well as in the
new interface, but there should also be an add('AutoFormat', 'AutoParagraph');
call which does the right thing.

One thing to consider is whether or not inheritance rules will apply to
these. I'd say yes. That means that they're still plisty; in fact, the
underlying implementation will probably be a plist. However, they will get
their OWN plists, and will NOT support nesting.

Issue 1: Our current implementation is generally not efficient;
md5(serialize($foo)) is pretty expensive. So, I don't think there will be
any problems if it gets "less" efficient, as long as we give users a
properly fast alternative; DefinitionRev gives us a way to do this, by
simply telling users they must update it whenever they update
configuration directives as well. (There are obvious BC concerns here.)

In such a case, we simply iterate over our plist (performing full
retrievals for each value), grab the entries we care about, and then
serialize and hash. It's going to be slow either way, due to the ability of
plists to inherit. If we ksort(), we don't have to traverse the entire
array (the entries of a namespace become contiguous, so we can stop once
we pass them); however, the cost of a ksort() call may not be worth it.

At this point, last time, I started worrying about the performance
implications of allowing inheritance, and wondering whether or not I wanted
to squash the plist. At first blush, our code might seem to assume that
accessing properties is cheap; but actually we prefer to copy the value out
into a member variable if it's going to be used many times. With this in
mind, I don't think CPU consumption from a few nested function calls is
going to be a problem. We *are* going to enforce a function-only interface.

The next issue at hand is how we're going to manage the "special" plists,
which should still be able to be inherited. Basically, it means that
multiple plists would be attached to the configuration object, which is not
the best for memory performance. The alternative is to keep them all in one
big plist, and then eat the one-time cost of traversing the entire plist to
grab the appropriate values.

I think at this point we can write the generic interface, and then set up
separate plists if that ends up being necessary for performance (it
probably won't be). Now let's code our generic plist implementation.

----

Iterating over the plist presents some problems.
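To make the problem concrete, here is a minimal sketch assuming a plist
stored as a plain PHP array plus a parent pointer (PHP 7.1 or later for the
nullable type hint). The class and method names (InheritingPlist,
localValues(), squash()) are hypothetical and are not HTML Purifier's actual
API. Reads walk up the parent chain, so get() sees inherited values, but
iterating over the child's own array sees only what was set locally;
squash() merges the chain from the root down so that a single foreach sees
every visible property.

```php
<?php
// Hypothetical sketch, not HTML Purifier's API: a plist that stores its own
// properties in an array and falls back to a parent plist on reads.
class InheritingPlist
{
    private $data = array(); // properties set on this plist only
    private $parent;         // InheritingPlist or null

    public function __construct(?InheritingPlist $parent = null)
    {
        $this->parent = $parent;
    }

    // Writes always go to this plist, never to a parent.
    public function set($name, $value)
    {
        $this->data[$name] = $value;
    }

    // Reads walk up the parent chain iteratively until the key is found.
    public function get($name)
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            if (array_key_exists($name, $plist->data)) {
                return $plist->data[$name];
            }
        }
        throw new OutOfBoundsException("Property $name not found");
    }

    // The problem: iterating over this plist's own array misses
    // everything it inherits.
    public function localValues()
    {
        return $this->data;
    }

    // The fix: squash the chain into one array, root first, so that
    // values set closer to this plist override inherited ones.
    public function squash()
    {
        $chain = array();
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            $chain[] = $plist->data;
        }
        $squashed = array();
        foreach (array_reverse($chain) as $data) {
            $squashed = array_replace($squashed, $data);
        }
        return $squashed;
    }
}

$defaults = new InheritingPlist();
$defaults->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$defaults->set('Core.Encoding', 'UTF-8');

$config = new InheritingPlist($defaults);
$config->set('HTML.Doctype', 'HTML 4.01 Strict');

echo $config->get('Core.Encoding'), "\n"; // prints UTF-8 (inherited)
print_r($config->localValues());          // only HTML.Doctype
print_r($config->squash());               // both keys, child's Doctype wins
```

Squashing copies every inherited value, so it suits occasional whole-plist
traversals (building a cache key, rendering a configuration form) better
than ordinary lookups, which get() already handles by walking the chain.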
The way we've chosen to solve this is to squash all of the parents.

----

But I don't need iteration.

    vim: et sw=4 sts=4