Match snippets in an HTML document
Somebody has presented me with a very large list of copyedits to make to a
long HTML document. The edits are in the format:
"religious" should be "religions"
"their" should be "there"
"you must persistent" should be "you must be persistent"
The copyedits were typed by hand; in some cases, the "actual" value on the
left is not an exact match for the text in the document. The order of
edits is usually correct, but even that is not guaranteed.
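To make the input format concrete, here is a minimal sketch of how the list could be parsed, assuming every edit sits on its own line in exactly the quoted shape shown above. The names `EDIT_RE` and `parse_edits` are hypothetical, and since the list was typed by hand, lines that don't fit the pattern are set aside for manual review rather than guessed at.

```python
import re

# Assumed line shape: "old text" should be "new text"
EDIT_RE = re.compile(r'^\s*"(?P<old>.*)"\s+should be\s+"(?P<new>.*)"\s*$')

def parse_edits(lines):
    """Split lines into (old, new) pairs and a list of lines that did not parse."""
    edits, rejects = [], []
    for line in lines:
        m = EDIT_RE.match(line)
        if m:
            edits.append((m.group("old"), m.group("new")))
        elif line.strip():
            # Non-empty line that does not fit the assumed format.
            rejects.append(line)
    return edits, rejects
```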
It's a straightforward but very large task to apply these edits by hand to
the document. I'd like to automate the process as much as possible, e.g.
by automatically searching for the snippets.
In a long document like this, I can't just search for all instances of
"their" and replace them with "there." Sometimes "their" was used
correctly, just not in one particular instance.
In other words, I'm looking for a fuzzy text match, where the order of the
edits influences the search.
What's a good approach to a problem like this? I'm hoping that there's
some off-the-shelf open source project that can fuzzy-match the snippets
against the document, in roughly the order they are listed.
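To illustrate the kind of search I have in mind, here is a rough sketch using Python's standard `difflib`. It is not an existing tool. The function names, the `min_ratio` threshold, and the `back_penalty` weight are all my own assumptions. It treats the document as a plain string (HTML tags are not handled), slides a window the length of the snippet over the text, and scores each window by similarity, with a small penalty for matches that fall before the position of the previous edit.

```python
from difflib import SequenceMatcher

def find_snippet(text, snippet, cursor, min_ratio=0.8, back_penalty=0.1):
    """Return (start, end, score) of the best-scoring window, or None."""
    n = len(snippet)
    best = None
    for start in range(0, len(text) - n + 1):
        window = text[start:start + n]
        ratio = SequenceMatcher(None, snippet, window, autojunk=False).ratio()
        if ratio < min_ratio:
            continue
        # Prefer matches at or after the previous edit, but do not forbid
        # earlier ones, since the edit order is not guaranteed.
        score = ratio - (back_penalty if start < cursor else 0.0)
        if best is None or score > best[2]:
            best = (start, start + n, score)
    return best

def apply_edits(text, edits):
    """Apply (old, new) pairs in order; return the new text and unmatched edits."""
    cursor = 0
    unmatched = []
    for old, new in edits:
        hit = find_snippet(text, old, cursor)
        if hit is None:
            unmatched.append((old, new))
            continue
        start, end, _ = hit
        text = text[:start] + new + text[end:]
        cursor = start + len(new)
    return text, unmatched
```

For example, with the edits `[("religious", "religions"), ("their", "there")]` and the text `"They love their dog. Many religious coexist. Put it over their."`, only the second "their" is changed, because it comes after the "religious" edit. This brute-force scan is slow on a long document, and any hit scoring below 1.0 should probably be reviewed by hand rather than applied blindly.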