One of my favorite passages from Lewis Carroll is the dialog in “Through the Looking Glass” between Alice and Humpty Dumpty:
“There’s glory for you!”
“I don’t know what you mean by ‘glory,’ ” Alice said.
Humpty Dumpty smiled contemptuously. “Of course you don’t — till I tell you. I meant ‘there’s a nice knock-down argument for you!’ ”
“But ‘glory’ doesn’t mean ‘a nice knock-down argument,’ ” Alice objected.
“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean — neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master — that’s all.”
It often seems exactly like that when we have technical discussions.
I first became aware of this when the areas of Identity and Security began to seriously overlap late last century. Around 2005 a group, the “Identity Gang,” coalesced around the idea of having informal discussions about identity issues before, during, or after conferences (Catalyst, Digital ID World, EIC, etc.). What we all quickly discovered was that we didn’t agree on the meanings of terms. So we launched the Lexicon project. We didn’t get very far.
When I joined the Identity Ecosystem Steering Group (IdESG) last year, the same issue came up almost immediately – we didn’t all agree on the meaning of terms. A project was started to create a taxonomy for the IdESG which, incredibly, ended up containing 785 terms! Some had only one definition listed, but others had 2, 3, 4 – up to 13 different definitions. Needless to say, this is still an ongoing project with no end in sight.
I bring this up because of a Twitter conversation I was having yesterday. While Twitter isn’t ideal for dialog among more than three people (there were, at times, five involved in this discussion), it does have an immediacy that other methods (chat forums, email, etc.) don’t. The drawback, of course, is the 140-character limit per tweet, which leads to lots of abbreviations, elided letters and texting shorthand – none of which is helpful for understanding, especially among people who don’t normally converse with each other.
What happened was that someone referred to “PII,” which I understand as Personally Identifiable Information. Others, though, consider it an abbreviation for Personal Identifying Information. There’s a subtle difference.
Personally Identifiable Information (I’ll call this PII1) is information, either a single attribute or a combination of attributes, which can uniquely identify an individual in a given context or namespace. Your date of birth does not uniquely identify you, but in combination with your mother’s maiden name and place of birth it certainly can. Other attributes such as a national ID number (Social Security, National Health, etc.) are PII all by themselves.
Personal Identifying Information (PII2), on the other hand, is defined as “Information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.” according to the IdESG taxonomy, quoting from the US government’s FICAM Trust Framework Provider Adoption Process (that’s the US Federal Identity, Credential and Access Management Program).
The difference between the two is subtle, but significant. Under PII1, date of birth – in and of itself – is not PII. Under PII2 it is, even though it doesn’t uniquely identify you. So when creating privacy law or examining privacy issues (such as those raised by the US Government’s PRISM program) it is very important to know which definition of PII is being used. Suppose, for example, it becomes illegal to knowingly distribute PII of others. Or, in the context of computer breach situations, it becomes necessary to inform entities when their PII has been leaked or stolen. How do we decide if it really is PII that’s escaped into the wild?
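To make the distinction concrete, here is a minimal Python sketch of how the two readings could diverge on the same attributes. Everything in it is an assumption for illustration: the toy records, the attribute names, and the functions `is_pii1` and `is_pii2` are hypothetical and come from neither taxonomy. PII1 is modeled as "this set of attributes uniquely identifies everyone in the given namespace," and PII2 as "at least one attribute is of a kind the quoted definition names as identifying or linkable."

```python
from collections import Counter

# Hypothetical namespace of three individuals (illustrative data only).
RECORDS = [
    {"date_of_birth": "1970-01-01", "birthplace": "Austin",
     "mothers_maiden_name": "Smith", "national_id": "A1"},
    {"date_of_birth": "1970-01-01", "birthplace": "Boston",
     "mothers_maiden_name": "Smith", "national_id": "B2"},
    {"date_of_birth": "1982-05-17", "birthplace": "Austin",
     "mothers_maiden_name": "Jones", "national_id": "C3"},
]

# Assumption: attribute kinds taken from the examples in the quoted PII2 text.
PII2_KINDS = {
    "name", "national_id", "biometric_record",
    "date_of_birth", "birthplace", "mothers_maiden_name",
}


def is_pii1(attributes, records):
    """One reading of PII1: the attribute combination is unique per record."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(n == 1 for n in counts.values())


def is_pii2(attributes):
    """One reading of PII2: any attribute is identifying or linkable."""
    return any(a in PII2_KINDS for a in attributes)


dob_only = ["date_of_birth"]
dob_combo = ["date_of_birth", "mothers_maiden_name", "birthplace"]

print(is_pii1(dob_only, RECORDS), is_pii2(dob_only))
print(is_pii1(dob_combo, RECORDS), is_pii2(dob_combo))
print(is_pii1(["national_id"], RECORDS), is_pii2(["national_id"]))
```

In this toy namespace, date of birth alone fails the PII1 test because two people share it, yet it passes the PII2 test. A breach-notification rule written against one definition would therefore cover a leaked list of birth dates, while a rule written against the other would not.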
The definition I use, PII1, is entirely in keeping with the work KuppingerCole has done on Information Stewardship, where we differentiate between data and information – “Data is nothing more than the symbols which are processed by the computer. Data, in itself, has no meaning and no value. Information is data with context or processing that makes it useful.” Some attributes (such as date of birth) are simply data, of little use without context or other qualifying data that creates information.
We recognize that the terms we use are not always understood by everyone. In fact, we at KuppingerCole have the added problem of bi-lingual (in our writing) and multi-lingual (in our discussions) use of terms. “Digital Identity” (which Google tells me is “digitale Identität” in German) may have numerous translations each with multiple meanings.
For this reason, almost all KuppingerCole published works include a Glossary section, in which we define the terms used in the paper. This doesn’t mean that the definitions we use are universally accepted, or that other definitions might not be better. It simply means that when the term is used in the publication this is what you should understand it to mean. That way, any subsequent discussion starts off with everyone on the same page, so to speak.
This works well for publications, not so well for impromptu discussions. Maybe we should each create our own personal lexicon/taxonomy/glossary on the ‘net so we could reference it when we Tweet.