Multimedia Learning

Multimedia learning is the use of words and visuals together to support learning. It is based on Richard Mayer’s research showing that people learn more effectively from combined text and images than from text alone, provided the design doesn’t overwhelm working memory.

Why it matters

Adding media to a course isn’t inherently better. Poorly chosen or excessive media increases cognitive load and gets in the way of learning. Mayer’s principles give you specific, evidence-based rules for when and how to use media so it actually helps.

Mayer’s core principles

Principle | What it means in practice
--- | ---
Multimedia | Use words and images together, not words alone
Coherence | Remove anything that doesn’t directly support the learning goal; decorative media hurts more than it helps
Signalling | Highlight the structure and key ideas (headings, arrows, emphasis) to guide attention
Redundancy | Don’t show the same content in both on-screen text and narration simultaneously; pick one
Spatial contiguity | Place related words and images near each other on screen
Temporal contiguity | Present corresponding words and images at the same time, not sequentially
Segmenting | Break content into learner-paced chunks rather than one continuous stream
Modality | Use spoken narration with visuals rather than on-screen text with visuals; it splits the cognitive load more effectively

Multimedia and video

Video training is one of the most demanding multimedia formats, both to produce and for the learner’s attention. Several of Mayer’s principles apply with particular force:

  • Coherence: a video with background music, animated lower thirds, and decorative b-roll that doesn’t support the narration violates this principle. Every element must earn its place.
  • Segmenting: a single long video is harder to process and navigate than a series of shorter, focused segments.
  • Modality: narration over visuals is the native format of video. Avoid on-screen text that duplicates the narration, which would also violate the redundancy principle.

Attention resets in video are a practical application of the signalling and temporal contiguity principles — they mark transitions, highlight key moments, and keep the learner’s visual and auditory channels engaged together.
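The video checks above can be turned into a simple review aid. The following is a minimal sketch, not an established tool: the `MediaElement` fields and the `audit` function are hypothetical names chosen for illustration, and the judgment of whether an element supports the learning goal still has to come from a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class MediaElement:
    """One element of a planned video (hypothetical structure for illustration)."""
    name: str
    supports_goal: bool                 # reviewer's call: does it directly support the learning goal?
    duplicates_narration: bool = False  # on-screen text that repeats the narration


def audit(elements):
    """Return (element name, principle) pairs for each suspected violation."""
    findings = []
    for element in elements:
        if not element.supports_goal:
            findings.append((element.name, "coherence"))
        if element.duplicates_narration:
            findings.append((element.name, "redundancy"))
    return findings


storyboard = [
    MediaElement("process diagram", supports_goal=True),
    MediaElement("background music", supports_goal=False),
    MediaElement("transcript slide", supports_goal=True, duplicates_narration=True),
]

for name, principle in audit(storyboard):
    print(f"{name}: check against the {principle} principle")
```

Anything the audit flags is a candidate for cutting or reworking, not an automatic removal.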

Key facts

  • Less media is often better. The coherence principle is the most commonly violated. Intro animations, background music, and decorative stock images all add cognitive load without adding meaning. Cut them.
  • Narration + visuals outperforms text + visuals. The eyes and ears process information through separate channels. Using both together — narration over a diagram — is more efficient than reading text next to a diagram.
  • Chunking is essential. Learners can only hold a limited amount in working memory at once. Break content into short, focused segments (5–10 minutes) and let learners control the pace where possible. This connects directly to cognitive load theory.
  • Media should serve the learning objective, not the production budget. An expensive video that could have been a diagram is a common mistake. Match the medium to what the learner needs to understand, not to what looks impressive.
  • Format should match your learner. Consider how and where they’ll access the course — mobile vs. desktop, quiet office vs. noisy floor — before choosing media types and file sizes.
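To make the chunking guidance concrete, here is a rough sketch of how a long running time could be divided into segments that stay within the 10-minute upper bound mentioned above. The function name `plan_segments` and its parameters are assumptions made for this example. It splits time evenly, whereas real segment boundaries should follow natural topic breaks.

```python
import math


def plan_segments(total_minutes, max_minutes=10.0):
    """Split a running time into equal segments no longer than max_minutes."""
    if total_minutes <= 0 or max_minutes <= 0:
        raise ValueError("durations must be positive")
    # Smallest number of segments that keeps each one within the limit.
    count = math.ceil(total_minutes / max_minutes)
    return [total_minutes / count] * count


segments = plan_segments(45)
print(len(segments), "segments of", segments[0], "minutes each")
# prints: 5 segments of 9.0 minutes each
```

Treat the result as a starting point for planning. If a topic naturally needs 12 minutes, look for a meaningful break inside it before forcing an even split.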

When to use it

  • When deciding what type of media to create for a module
  • When reviewing existing content for unnecessary complexity
  • When a course feels heavy or learners report feeling overwhelmed
  • When planning a training video — to validate that each media element is justified

Resources