Machine Teaching for Topic Model Refinement

1 Vector Space Model – 2 Visual Interactive Interfaces.

Avoiding tedious black-box model refinement through interactive semantic interfaces!

This visual analytics technique is designed to enable domain knowledge externalization.

It supports four tasks, enabling users to:

[T1] understand the relationships between the two views
[T2] diagnose potential conflicts
[T3] refine the concept space based on their domain understanding
[T4] update the topic modeling based on the refined concept space

Continuous quality monitoring and refinement recommendations support these tasks and enable targeted user guidance.

Two Independent Hierarchies

Supporting Two Views based on the same Vector Space.

The Concept Hierarchy represents the user’s semantics and is initialized by suggestions (top-down).
The Topic Hierarchy is based on the automatically computed results of a topic model (bottom-up).

Base Words are all words that belong to the higher levels of neither the concept hierarchy nor the topic hierarchy. They can be promoted to keywords and/or descriptors through user interaction. Conversely, demoted keywords and/or descriptors move down the hierarchy to become base words.
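
The promotion and demotion of words can be illustrated with a small bookkeeping sketch. It is not the authors' implementation: the level names, their ordering (descriptor above keyword), and the WordLevels class are assumptions made for illustration only.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical level names and ordering, assumed for this sketch."""
    BASE = 0
    KEYWORD = 1
    DESCRIPTOR = 2


class WordLevels:
    """Tracks each word's level in the concept and the topic hierarchy."""

    def __init__(self, words):
        # Every word starts at the bottom of both hierarchies.
        self.levels = {w: {"concept": Level.BASE, "topic": Level.BASE} for w in words}

    def promote(self, word, hierarchy, to=Level.KEYWORD):
        # Promotion only ever moves a word upwards.
        if to <= self.levels[word][hierarchy]:
            raise ValueError(f"{word!r} is already at or above {to.name}")
        self.levels[word][hierarchy] = to

    def demote(self, word, hierarchy):
        # Simplification: a demoted word drops straight to the base level.
        self.levels[word][hierarchy] = Level.BASE

    def base_words(self):
        # A base word sits in the higher levels of neither hierarchy.
        return sorted(
            w for w, lv in self.levels.items()
            if lv["concept"] == Level.BASE and lv["topic"] == Level.BASE
        )


levels = WordLevels(["oil", "solar", "wind", "embassy"])
levels.promote("solar", "concept")
levels.promote("wind", "concept", to=Level.DESCRIPTOR)
levels.promote("oil", "topic")
levels.demote("solar", "concept")
print(levels.base_words())
```

In this toy run, "solar" returns to the base words after its demotion, while "wind" and "oil" stay in the higher levels of the concept and topic hierarchy respectively.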

A Multilayered Canvas

Linked through Spatial Positioning.

The Semantic Concept Space is designed as two stacks of layered, interactive canvases comprising the two views of the visual interface:

The Concept View
The Topic View

Both views are separate, superimposed canvases. Each view is composed of three layers, representing its hierarchy levels above the base words.

Users can interact with one view at a time, while the other is toggled inactive. To facilitate comparison between views, the inactive view is shown with a low opacity in the background of the active one, making its elements shine through the canvas.

Processing Steps

Modeling the Semantic Concept Space.

Stage I

Interactive Concept Generation

Output: Weighted Concept Vectors

Processing Steps:

Seed Concept Extraction
Concept Vector Expansion
Interactive Editing and Enrichment
Scoring and Ranking
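
One plausible reading of the Stage I steps is sketched here: a concept vector is formed as the weighted average of its words' embeddings, and candidate words are ranked by cosine similarity to it. The averaging scheme, the function names, and the toy three-dimensional embeddings are all assumptions; the page does not specify how weights or scores are computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_vector(weighted_words, embeddings):
    """Weighted average of the embeddings of a concept's words (assumed scheme)."""
    total = sum(weighted_words.values())
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, weight in weighted_words.items():
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x / total
    return vec


def rank_candidates(concept, embeddings, exclude=()):
    """Words sorted by decreasing similarity to the concept vector."""
    scores = {w: cosine(concept, v) for w, v in embeddings.items() if w not in exclude}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, made up for illustration.
embeddings = {
    "solar": [0.9, 0.1, 0.0],
    "wind": [0.8, 0.2, 0.0],
    "turbine": [0.7, 0.3, 0.1],
    "attack": [0.0, 0.1, 0.9],
}
energy = concept_vector({"solar": 2.0, "wind": 1.0}, embeddings)
ranking = rank_candidates(energy, embeddings, exclude={"solar", "wind"})
print(ranking)
```

With these toy vectors, "turbine" ranks ahead of "attack" as an expansion candidate for the energy concept.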

Stage II

Concept Neighborhood Computation

Output: 2D Coordinates for each Word

Processing Steps:

Corpus and Topic Keyword Insertion
Initial Concept-Anchor Setting
t-SNE Reduction

Stage III

Concept Hierarchy Building

Output: Concept Hierarchy Relations

Processing Steps:

Parameter and Constraint Setting
Semantic Similarity Update
Quadtree Mesh Generation
Hierarchical Density-Based Clustering
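
To give an idea of what a quadtree over the projected words looks like, here is a generic point quadtree that splits a cell once it holds more than a fixed number of words. It is a textbook structure, not the mesh generation used in this work; the capacity, the minimum cell size, and the sample coordinates are assumptions.

```python
class QuadTree:
    """Point quadtree over the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=2, min_size=1e-3):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.min_size = min_size  # stops endless splitting on duplicate points
        self.points = []          # (px, py, word) tuples stored in a leaf
        self.children = None      # four sub-quadrants once the cell is split

    def insert(self, px, py, word):
        if self.children is None:
            self.points.append((px, py, word))
            if len(self.points) > self.capacity and self.size > self.min_size:
                self._split()
            return
        self._child_for(px, py).insert(px, py, word)

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.min_size)
            for dy in (0, 1) for dx in (0, 1)
        ]
        # Hand the stored points down to the new sub-quadrants.
        points, self.points = self.points, []
        for px, py, word in points:
            self._child_for(px, py).insert(px, py, word)

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


tree = QuadTree(0.0, 0.0, 1.0, capacity=2)
for px, py, word in [(0.10, 0.10, "solar"), (0.15, 0.12, "wind"),
                     (0.12, 0.20, "turbine"), (0.90, 0.90, "attack")]:
    tree.insert(px, py, word)

smallest = min(leaf.size for leaf in tree.leaves())
print(len(tree.leaves()), smallest)
```

Dense regions end up in small cells and sparse regions in large ones, so cell size can serve as a rough local density indicator before clustering.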

Stage IV

Layered Canvas Mapping

Output: Layered Mapping of all Elements on a Canvas

Processing Steps:

Transformation and Rescaling
Concept-Anchored Projection
Overlap Reduction
Color Mapping
Voronoi Tessellation
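
Two of the Stage IV steps lend themselves to a short illustration: rescaling projected coordinates to the canvas and deciding which Voronoi cell a word falls into. The sketch assumes min-max rescaling and Euclidean nearest-anchor assignment (the defining property of a Voronoi cell); the canvas size, coordinates, and anchor names are made up.

```python
import math


def rescale(coords, width, height):
    """Min-max rescale 2D coordinates into a width x height canvas."""
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    span_x = (max(xs) - min(xs)) or 1.0  # avoid division by zero
    span_y = (max(ys) - min(ys)) or 1.0
    return {
        word: ((x - min(xs)) / span_x * width, (y - min(ys)) / span_y * height)
        for word, (x, y) in coords.items()
    }


def voronoi_cell(point, anchors):
    """Name of the anchor whose Voronoi cell contains the point."""
    return min(anchors, key=lambda name: math.dist(point, anchors[name]))


# Hypothetical projection output, e.g. from a 2D embedding of the words.
projected = {
    "solar": (-10.0, -5.0),
    "wind": (-8.0, -4.0),
    "attack": (10.0, 5.0),
    "embassy": (8.0, 3.0),
}
canvas = rescale(projected, width=800, height=600)

# Use two words as stand-in concept anchors.
anchors = {"renewable energy": canvas["solar"], "terrorism": canvas["attack"]}
cells = {word: voronoi_cell(pos, anchors) for word, pos in canvas.items()}
print(cells)
```

Here "wind" falls into the cell anchored at renewable energy and "embassy" into the cell anchored at terrorism.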

Visual Encoding

Mapping the Concept and Topic Layers.

Our visual workspace is designed to support: (1) finding different elements on the canvas; (2) decoding the type of word object at hand; and (3) analyzing the spatial association of words.

For each hierarchy layer, we represent each word object with a label by default and let users toggle on a circle as an additional marker. Both the circle and the label sizes encode the object's level in the data hierarchy.

For the topic view, we designed a topic glyph that represents the association of a topic or document with different concept regions. This glyph can be used as an alternative representation for the object marks on the canvas layers.

Featured Interactions:

Navigation through Word Search
Lasso Selection
X-Ray Lens
Guided Tours

The Concept View

The Machine Teaching Window.

An example of an initial concept view, based on the 2012 US presidential debates between Romney and Obama. In this example, the left side of the concept view contains regions on renewable energy (bottom) and terrorism (top).

The Topic View

The Machine Learning Window.

In the corresponding topic view, the topic on oil production is placed near renewable energy, and the topic on a terror attack in Libya is placed between the two concepts, as it is related to both.

Topic Model Refinement

Interactive Learning from the User’s Semantics.

The iterative refinement of a topic modeling output is achieved indirectly through concept refinement. Users can trigger an update of the t-SNE projection, as well as of the topic modeling, at any time during this process.

Concept Refinement

Users have two options for Concept Refinement:

Direct manipulation enables exploratory refinement.
Guided relevance feedback is designed for targeted refinement. It offers refinement suggestions while touring the concept view.

Both options can be used at any time throughout the visual analytics process to adjust the concept hierarchy.

Topic Modeling Adaptation

The topic modeling view cannot be adjusted directly; it is used for inspecting and analyzing the topic modeling result. Its layers change to reflect the concept refinements only when the topic modeling algorithm is rerun on demand.

This duality of views enables users to teach the machine learning model their domain knowledge, and enables the model to respond by learning the new semantics.
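
The on-demand coupling between the two views can be sketched as a small session object: concept edits take effect immediately, while the topic result stays unchanged until an update is requested. The RefinementSession class and the toy_topic_model stand-in are hypothetical; a real system would call an actual topic modeling algorithm at that point.

```python
class RefinementSession:
    """Keeps the topic result fixed until the user requests an update."""

    def __init__(self, fit_topics, concepts):
        self._fit_topics = fit_topics  # hypothetical callable: concepts -> topics
        self.concepts = {name: set(words) for name, words in concepts.items()}
        self.topics = fit_topics(self.concepts)
        self.stale = False  # True while the topic view lags behind the concepts

    def add_word(self, concept, word):
        self.concepts.setdefault(concept, set()).add(word)
        self.stale = True

    def remove_word(self, concept, word):
        self.concepts[concept].discard(word)
        self.stale = True

    def update_topics(self):
        # Rerun the model on demand so the topic view adapts to the edits.
        self.topics = self._fit_topics(self.concepts)
        self.stale = False
        return self.topics


def toy_topic_model(concepts):
    # Stand-in for a real topic model: one "topic" per concept, words sorted.
    return {name: sorted(words) for name, words in concepts.items()}


session = RefinementSession(toy_topic_model, {"energy": ["solar"]})
session.add_word("energy", "wind")
before = dict(session.topics)     # still the result from before the edit
after = session.update_topics()   # now reflects the refined concept
print(before, after)
```

Separating the edit from the recomputation mirrors the description above: the concept view is the place where the user teaches, and the topic view only changes once the model has been rerun.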

Read more in the paper…

Semantic Concept Spaces: Guided Topic Model Refinement using Word-Embedding Projections