Dictionary Compression

The Idea

Dictionary compression looks at each word in the message and replaces it with a number that represents what the word is - this could point to the word in an actual dictionary, or be an index into a list of words used in the message.

Imagine the message was one man one vote, for example. The list of unique words in the message would be ["one", "man", "vote"], so you could code the message as [0, 1, 0, 2] (where the numbers represent the position of the letters in the list). For someone to read the message they would need the numbers and the word list.

Your Task

Write a program that:

  1. takes text as an input message
  2. produces a list of unique words in the message - the dictionary
  3. produces a list containing the position of each word from the input message (in order) in the dictionary
  4. displays the ouput

Extension

Could you reverse the process and turn the two lists back into the original message? Could you also calculate the total amount of storage required for the newly-encoded message and compare it with the original input to calculate the saving as a percentage?