SmartTrim (Movable Type plugin)

SmartTrim is a plugin for Movable Type that aims to provide a smarter trimming global modifier than words or trim_to.

The stock MT global modifier words cuts after a given number of words, with no control over the resulting length, and it also removes all HTML tags. trim_to cuts to a specific length, but it can cut a string in the middle of a word and can output invalid HTML in the form of unclosed tags.

SmartTrim:

  • trims a string to a target character length without cutting words
  • will close any unclosed tags if the source contains HTML and HTML::Tidy is present (otherwise any HTML will be removed before trimming occurs)
  • can append an optional ending string like an ellipsis to the result

Syntax

Once SmartTrim is installed in the MT plugins directory, the global modifier smarttrim accepts up to four arguments: a length (required), a suffix used as an ending string, an output type, and a list of options. The last three arguments are optional. The general form is:

<$mt:EntryBody smarttrim="[-|+]length"[,"suffix"[,"type"[,"options"]]]$>

length is an integer defining the target length. length can be prefixed with a + or - sign to influence the behavior of SmartTrim as follows:

  • "-length" will trim the string at the word ending immediately before or at the specified length, ensuring that the length of the result is less than or equal to length.
  • "+length" will include the word that would otherwise be cut at the specified length, if any.
  • If length is not prefixed by either - or + then the string is trimmed as closely as possible to length.
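The three length modes can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the plugin's Perl code: the function name smart_trim, the whitespace-only notion of a word boundary, and the tie-breaking rule (prefer the shorter result) are all assumptions, and HTML handling is left out entirely.

```python
def smart_trim(text, spec, suffix=""):
    """Trim `text` near a target length without cutting a word.

    `spec` mimics the documented argument: "40", "-40" or "+40".
    Hypothetical helper for illustration; not the plugin's implementation.
    """
    mode = spec[0] if spec and spec[0] in "+-" else ""
    target = int(spec.lstrip("+-"))
    if len(text) <= target:
        return text  # nothing to trim, so no suffix either

    # End of the last whole word that fits within `target` characters.
    before = target
    while before > 0 and not text[before].isspace():
        before -= 1

    # End of the word that straddles `target` (same as `before` if none does).
    after = target
    while after < len(text) and not text[after].isspace():
        after += 1

    if mode == "-":
        cut = before
    elif mode == "+":
        cut = after
    else:
        # Assumption: on a tie, prefer the shorter result.
        cut = before if target - before <= after - target else after

    result = text[:cut].rstrip()
    # The suffix is appended only when something was actually cut off.
    return result + suffix if cut < len(text) else result


text = "The quick brown fox jumps over the lazy dog"
print(smart_trim(text, "-13", "…"))
print(smart_trim(text, "+13", "…"))
print(smart_trim(text, "13", "…"))
```

In this sketch a first word longer than the target yields an empty result in "-" mode; how the plugin itself handles that case is not stated here.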

If a suffix string is provided and trimming occurs then it is appended to the result.

You can use a single character such as the ellipsis (…) or a string such as " (…)" (note the leading space), provided it does not start with a digit. The suffix is appended as is, so start it with a space if you want a space between the trimmed string and the suffix. The suffix length is not factored into the target length calculation.

type defines the type of output and can take two values: "html" or "xml" (lowercase). If type is omitted or different from those values, then it defaults to "xhtml".

options is a list of any number of HTML::Tidy arguments strung together as key:value pairs and separated by semicolons. Run tidyp -help-config for the complete list.
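As a rough illustration of the options format, the following Python sketch splits such a string into key/value pairs. The function name parse_tidy_options is hypothetical, and the handling of stray whitespace and of items without a colon is an assumption; the plugin's own parsing may differ.

```python
def parse_tidy_options(options):
    """Split "key:value;key:value" into a dict (illustrative sketch only)."""
    pairs = {}
    for item in options.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing semicolon
        key, sep, value = item.partition(":")
        if not sep:
            continue  # assumption: items without a colon are ignored
        pairs[key.strip()] = value.strip()
    return pairs


print(parse_tidy_options("numeric_entities:1;drop-empty-paras:1"))
```

Keys are kept verbatim here; whether option names should use hyphens or underscores is left to HTML::Tidy.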

Examples

  • <$mt:EntryBody smarttrim="40"$> will result in a string that has a length closest to 40 characters
  • <$mt:EntryBody smarttrim="-40"$> will result in a string that has a maximum length of 40 characters
  • <$mt:EntryBody smarttrim="+40"$> will result in a string of about 40 characters; if a word would be cut at position 40, that word is included in full in the result
  • <$mt:EntryBody smarttrim="-40","…"$> will trim the source at a maximum length of 40 characters, and add an ellipsis if a cut is performed
  • <$mt:EntryBody smarttrim="40","…", "html", "numeric_entities:1;drop-empty-paras:1"$> will result in a string that has a length closest to 40 characters; any HTML will be tidied up, with special characters replaced by numeric entities and empty paragraphs dropped

Caveats

If HTML::Tidy is not installed, any HTML in the source string will be removed before trimming occurs (this is a safeguard to prevent the output of malformed HTML). In effect, this is equivalent to <$mt:EntryBody remove_html="1" smarttrim="…"$>, which you can use to force removal of HTML.

If HTML::Tidy is present and the source contains HTML, the HTML code counts against the target length. This means that the rendered result may appear shorter than the target length. Conversely, the actual length of the resulting string may exceed the target length if unclosed tags are closed.
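To see why markup eats into the target length, the Python sketch below compares the raw length of a string with the length of the text a reader would actually see. It uses only the standard library; the helper visible_text and the sample string are made up for illustration and have nothing to do with how the plugin measures length internally.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text content of `html` with all tags dropped."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


source = 'Read <a href="http://example.com/">the full story</a> today'
print("raw length:     ", len(source))
print("rendered length:", len(visible_text(source)))
```

A target length chosen with the rendered text in mind will therefore cut earlier than expected when the source contains links or other tags.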

If SmartTrim cuts right after an opening HTML tag, before the content it encloses, it may output an empty element, such as <a href="…"></a>.

Installation

Release history

  • Version 2.0 - 2010/11/09 - Switch to HTML::Tidy in replacement of HTML::Defang.
  • Version 1.1.2 - 2010/01/05 - Relaxing the sanitization done by HTML::Defang.
  • Version 1.1.1 - 2009/11/02 - Added L10N files and French translation.
  • Version 1.1 - 2009/10/22 - Modified the arguments syntax. Added HTML::Defang to close HTML tags. Added POD.
  • Version 1.0 - 2009/10/19 - First public version.

NOTE

Once you have installed HTML::Tidy, you might also be interested in the Tidings plugin for MT.

Tribute

This plugin was inspired by the TrimWordsbyLen plugin from Crys Clouse.
Thanks to Su, Brice, Steeve, Michael and Byrne for the feedback and suggested improvements. Thanks to Andy Lester for HTML::Tidy and tidyp.

Copyright & License

Copyright (C) 2009-2010 François Nonnenmacher, Ubiquitic.

This free software is provided as-is WITHOUT ANY KIND OF GUARANTEE; you can redistribute it and/or modify it under the same terms as Perl itself.

9 Comments

Sometimes if the cut-off point occurs within certain tags (objects, embeds, some links) the filesystem path to the blog home page appears, as in the example below.

---example:
To assist you with this process, the Computer Training Center will be offering one more training session titled Archiving & Copying Blackboard Courses on Wednesday, 12/09 at 3 PM. Please register by selecting the course from the /var/www/html/blogs/shu_tech/index.html
----end example

/var/www/html/blogs/shu_tech/index.html shows as a hyperlink to the correct URL, http://blogs.shu.edu/shu_tech, but displays as shown here. It would be nice to "sanitize" this somehow. Any thoughts?

I am having the same issue - if the entry is trimmed in the middle of a link tag, it displays the filesystem path to the current page. Is there any way to avoid this?

I was able to reproduce the bug but I'm befuddled at why this is happening, as this path isn't in the string that is transformed by the plugin.
Anyway, I'm planning to replace the library I'm using for the HTML cleanup, I hope this will fix it.

Hi Francois,

I have a client who installed the Smart Trim plugin in June of 2010 and is just this last week having problems with the filesystem path showing up in the excerpt.

Should we update our perl lib or is there an update for the plugin?

Thanks!
Jen

Hi, Ms. Jen, I haven't updated the plugin yet. Is this the same problem reported by Tom above?

Hi François,

I've just noticed this problem, too, on our news index. The source code looks like this:

<em>Weight of the World</em>, Glenn Sutter (Saskatchewan)<brdefang_…>/home/theusername/website.com/www/media/news/index.php<!-- close mismatch --></p>

Would love to fix this asap as it makes me a little concerned.

Thanks!

Alas, this bug is still a complete mystery to me. I can reproduce it, but I haven't found where it comes from yet: the string that Defang receives contains a URL, not a path, and how the URL turns into the server path is the mystery.

I've just released a new version of this plugin that hopefully fixes the path bug in links and handles HTML better overall.

Would love to use this to do excerpted 'teasers' on my blog home page with full HTML. This way images and any hyperlinks I have get put in the teasers. However, it doesn't seem to work correctly. I don't get closed html tags.
Example (using brackets instead of less-than / greater-than signs):

I was looking up info on new iPad 2. I searched [a href="http://www.google.com">Google…

It should have [/a] at the end AND then the three dots. Similarly, [img HTML code doesn't get closed.

Maybe this plugin doesn't do what I'm looking for. I'm running HTML::Tidy v1.54 and your latest smarttrim plugin.

Here's what I had in my test template:
[$mt:EntryBody smarttrim="+542","…","html"$]

Is there something I'm doing wrong?
