Package student.testingsupport
Class StringNormalizer
- All Implemented Interfaces:
Serializable, Cloneable, Iterable<StringNormalizer.NormalizerRule>, Collection<StringNormalizer.NormalizerRule>, List<StringNormalizer.NormalizerRule>, RandomAccess
This class represents a programmable string "normalizing" engine that
converts strings into a canonical form, for example before comparing
them for equality. A normalizer is a list of zero or more rules
(transformations). The normalize(String) method applies the entire
list of transformations to a given string.
For example, you can build a string normalizer that replaces each
sequence of one or more whitespace characters with a single space
character, trims any leading or trailing space, and converts the
string to lower case. This class provides a number of predefined
transformations in the StringNormalizer.StandardRule enumeration.
Some examples:
    // An "identity" transformation that does nothing:
    StringNormalizer norm1 = new StringNormalizer();
    // norm1.normalize(...) returns its argument unchanged

    // A "lower case" normalizer:
    StringNormalizer norm2 = new StringNormalizer(
        StringNormalizer.StandardRule.IGNORE_CAPITALIZATION);
    // norm2.normalize(...) returns a lower case version of its argument

    // self-explanatory:
    StringNormalizer norm3 = new StringNormalizer(
        StringNormalizer.StandardRule.IGNORE_CAPITALIZATION,
        StringNormalizer.StandardRule.IGNORE_PUNCTUATION);

    // A "standard" normalizer:
    StringNormalizer norm4 = new StringNormalizer(true);
    // norm4.normalize(...) returns its contents with all punctuation
    // characters removed, all letters converted to lower case, all
    // whitespace sequences replaced by single spaces, all MS-DOS or
    // Mac line terminators replaced by "\n"'s, and all leading and
    // trailing whitespace removed.
Note that a string normalizer containing multiple rules applies those
rules in order (i.e., in the order added, which is the List order of
this class). The same rules added in a different order may therefore
produce different results, so take care with the order in which you
add your rules.
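The effect of rule order can be illustrated with a minimal sketch that does not use StringNormalizer itself. The class RuleOrderSketch, its two rule constants, and applyInOrder are hypothetical names introduced only for this illustration; the sketch assumes each rule receives the output of the rule before it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an ordered rule list; not the StringNormalizer API.
class RuleOrderSketch {
    // Removes every ASCII punctuation character.
    static final UnaryOperator<String> STRIP_PUNCTUATION =
        s -> s.replaceAll("\\p{Punct}", "");

    // Collapses each run of whitespace into a single space.
    static final UnaryOperator<String> COLLAPSE_WHITESPACE =
        s -> s.replaceAll("\\s+", " ");

    // Applies the rules one after another, in list order.
    static String applyInOrder(List<UnaryOperator<String>> rules, String input) {
        String result = input;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a , b";
        // Punctuation first: the gap left by the comma is collapsed afterwards.
        System.out.println("[" + applyInOrder(
            Arrays.asList(STRIP_PUNCTUATION, COLLAPSE_WHITESPACE), input) + "]");
        // Whitespace first: removing the comma afterwards leaves two spaces.
        System.out.println("[" + applyInOrder(
            Arrays.asList(COLLAPSE_WHITESPACE, STRIP_PUNCTUATION), input) + "]");
    }
}
```

In this sketch the first ordering yields a single space between the letters and the second yields two, which is the kind of discrepancy the note above warns about.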
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | StringNormalizer.NormalizerRule | This interface defines what it means to be a normalizer rule: an object having an appropriate StringNormalizer.NormalizerRule.normalize(String) method.
static class | | A highly reusable concrete implementation of StringNormalizer.NormalizerRule that applies a series of regular expression substitutions.
static enum | StringNormalizer.StandardRule | This enumeration defines the set of predefined transformation rules.
-
Field Summary
Fields inherited from class java.util.AbstractList
modCount
-
Constructor Summary
Constructors
Constructor | Description
StringNormalizer() | Creates a new StringNormalizer object containing no rules (the "identity" normalizer).
StringNormalizer(boolean useStandardRules) | Creates a new StringNormalizer object, optionally containing the standard set of rules.
StringNormalizer(Collection<? extends StringNormalizer.NormalizerRule> rules) | Creates a new StringNormalizer object containing the given set of rules.
 | Creates a new StringNormalizer object containing the given set of rules.
 | Creates a new StringNormalizer object containing the given set of rules.
-
Method Summary
Modifier and Type | Method | Description
boolean | add(StringNormalizer.NormalizerRule rule) | Add the specified rule.
void | add | Add the specified standard rule, as defined in StringNormalizer.StandardRule.
void | addStandardRules() | Add the standard set of rules.
String | normalize(String content) | Normalize a string by applying a set of normalization rules (transformations).
String | normalize(String content, ArrayList<StringNormalizer.NormalizerRule> outParticipatingRules) | Normalize a string by applying a set of normalization rules (transformations).
void | remove | Remove the specified standard rule, as defined in StringNormalizer.StandardRule.
 | standardRule | Retrieve a standard rule by name.
Methods inherited from class java.util.ArrayList
add, addAll, addAll, clear, clone, contains, ensureCapacity, equals, forEach, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, removeIf, removeRange, replaceAll, retainAll, set, size, sort, spliterator, subList, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractCollection
containsAll, toString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.Collection
parallelStream, stream, toArray
Methods inherited from interface java.util.List
containsAll
-
Constructor Details
-
StringNormalizer
public StringNormalizer()
Creates a new StringNormalizer object containing no rules (the "identity" normalizer).
-
StringNormalizer
public StringNormalizer(boolean useStandardRules)
Creates a new StringNormalizer object, optionally containing the standard set of rules. The standard set is all those in StringNormalizer.StandardRule except the OPT_* rules.
- Parameters:
useStandardRules - If true, the set of standard (non-OPT_*) rules will be used. If false, an "identity" normalizer will be produced instead.
-
StringNormalizer
Creates a new StringNormalizer object containing the given set of rules.
- Parameters:
rules - a (variable-length) comma-separated sequence of rules to add
-
StringNormalizer
Creates a new StringNormalizer object containing the given set of rules.
- Parameters:
rules - a (variable-length) comma-separated sequence of rules to add
-
StringNormalizer
Creates a new StringNormalizer object containing the given set of rules.
- Parameters:
rules - a collection of rules to add (could be another StringNormalizer, or any other kind of collection)
-
-
Method Details
-
normalize
Normalize a string by applying a set of normalization rules (transformations).
- Parameters:
content - The string to transform
- Returns:
The result after all rules have been applied
-
normalize
public String normalize(String content, ArrayList<StringNormalizer.NormalizerRule> outParticipatingRules)
Normalize a string by applying a set of normalization rules (transformations). When using this version of normalize, all rules must implement equals and hashCode methods.
- Parameters:
content - The string to transform
outParticipatingRules - An output parameter that receives those rules that had an effect on the output. If the list is not empty on entry, new rules are added to it and no rule already in it is deleted.
- Returns:
The result after all rules have been applied
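One way to picture the output-parameter behavior is the following minimal sketch. It is not the StringNormalizer source. It assumes that a rule "had an effect" when its output differs from its input, and that rules already in the list are not added a second time, which would explain why equals and hashCode matter. ParticipatingRulesSketch, Rule, and DemoRule are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of collecting "participating" rules; not the library code.
class ParticipatingRulesSketch {
    // Hypothetical stand-in for a normalizer rule.
    interface Rule {
        String normalize(String content);
    }

    // Enum constants already have suitable equals and hashCode behavior.
    enum DemoRule implements Rule {
        LOWER_CASE {
            @Override
            public String normalize(String content) {
                return content.toLowerCase(Locale.ROOT);
            }
        },
        TRIM {
            @Override
            public String normalize(String content) {
                return content.trim();
            }
        }
    }

    // Applies the rules in order and records each rule that changed the string.
    // Entries already in outParticipatingRules are kept and never duplicated.
    static String normalize(String content, List<Rule> rules,
                            ArrayList<Rule> outParticipatingRules) {
        String result = content;
        for (Rule rule : rules) {
            String next = rule.normalize(result);
            if (!next.equals(result) && !outParticipatingRules.contains(rule)) {
                outParticipatingRules.add(rule);
            }
            result = next;
        }
        return result;
    }
}
```

Under these assumptions, normalizing "Hello" with LOWER_CASE followed by TRIM records only LOWER_CASE, because trimming leaves the already lower-cased string unchanged.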
-
addStandardRules
public void addStandardRules()
Add the standard set of rules. The standard set is all those in StringNormalizer.StandardRule except the OPT_* rules.
-
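The selection of the standard set can be pictured as a filter over the enumeration's constants by name prefix. The sketch below is an assumption about how such a filter might look, not the library's implementation. DemoStandardRule is a hypothetical enum: its first two constant names are borrowed from the examples on this page, and OPT_HYPOTHETICAL_RULE is invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of selecting every non-OPT_ constant from an enum.
class StandardSetSketch {
    enum DemoStandardRule {
        IGNORE_CAPITALIZATION,
        IGNORE_PUNCTUATION,
        OPT_HYPOTHETICAL_RULE // invented name; stands in for any OPT_* rule
    }

    // Returns all constants whose names do not start with "OPT_",
    // in declaration order.
    static List<DemoStandardRule> standardSet() {
        List<DemoStandardRule> result = new ArrayList<>();
        for (DemoStandardRule rule : DemoStandardRule.values()) {
            if (!rule.name().startsWith("OPT_")) {
                result.add(rule);
            }
        }
        return result;
    }
}
```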
add
Add the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.add(Object) method to add custom NormalizerRule objects.
- Parameters:
rule - The rule to add
-
add
Add the specified rule. For efficiency, only adds the rule if it is not already present in this normalizer.
- Specified by:
add in interface Collection<StringNormalizer.NormalizerRule>
- Specified by:
add in interface List<StringNormalizer.NormalizerRule>
- Overrides:
add in class ArrayList<StringNormalizer.NormalizerRule>
- Parameters:
rule - The rule to add
- Returns:
True if the rule was added, or false if it is already present
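The add-only-if-absent contract described here can be sketched with a small ArrayList subclass. UniqueRuleList is a hypothetical class written for this illustration and is not the StringNormalizer source; it assumes that "already present" is decided by contains, and therefore by equals.

```java
import java.util.ArrayList;

// Hypothetical list that refuses duplicates on add; not the library code.
class UniqueRuleList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Appends the element only when no equal element is already stored.
    // Returns true if the element was added, false if it was already present.
    @Override
    public boolean add(E rule) {
        if (contains(rule)) {
            return false;
        }
        return super.add(rule);
    }
}
```

With this sketch, adding the same element twice returns true and then false, and the list keeps a single copy. Note that only the single-argument add is overridden here, so other insertion methods such as addAll or add(int, E) would still admit duplicates.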
-
remove
Remove the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.remove(Object) method to remove other kinds of NormalizerRule objects.
- Parameters:
rule - The rule to remove
-
standardRule
Retrieve a standard rule by name.
- Parameters:
rule - the rule to retrieve
- Returns:
The corresponding StringNormalizer.NormalizerRule
-