Saturday, April 27, 2013

Form Input Validation and Defeasibility

A major subcategory within Philosophy is the study of Logic. Computer programmers are very familiar with some kinds of logic, Boolean logic for example, but they may not be aware of how many different kinds of logic philosophers have come up with over the past 2500 years. While the familiar systems of logic seek to ensure that their conclusions are absolutely true, there are other logics that do not require this, and they have lessons for programmers.

Aristotle described a system of logic so iron-clad that it was 2000 years before philosophers and mathematicians started seriously adding to it. It is famous for defining the syllogism, a form of deductive logic in which the conclusion is guaranteed to be true as long as the starting premises are true. The prototypical example of a syllogism is:

P1: All men are mortal.
P2: Socrates is a man.
C: Therefore, Socrates is mortal.

As long as the two premises P1 and P2 are true, C is guaranteed to be true.

However, in everyday situations we usually have to reach conclusions with less than the absolute provability of deductive reasoning. Jurors need to reach a verdict even though they have not been given iron-clad arguments. One of the several types of reasoning that is not deductive is called defeasible logic.

Many everyday arguments rely on assumptions that are true "all things being equal". For example, if told that birds were being transported in boxes with no lids, you might conclude that there was a large risk that the birds would fly away, and therefore that lids were needed. Given what you know, this is a reasonable conclusion. However, such arguments are defeasible, which means that additional statements, if added to the mix, could change the conclusion. If we add the statement "the birds being transported are penguins", the conclusion that they might fly away is defeated, since penguins can't fly. On the other hand, some arguments are indefeasible. For example, "John is a bachelor, therefore John is unmarried" cannot be made false by any additional statements, because being unmarried is part of the very definition of being a bachelor.
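The birds-and-penguins reasoning can be sketched as a default rule that fires unless one of its defeaters is among the known facts. This is a minimal sketch under assumptions of our own: facts are plain strings, and `DefaultRule` and `conclusions` are hypothetical names, not part of any formal defeasible logic system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DefaultRule:
    """A hypothetical "all things being equal" rule."""
    premise: str     # fact that triggers the rule
    conclusion: str  # what we conclude by default
    defeaters: frozenset[str] = field(default_factory=frozenset)  # facts that block it


def conclusions(facts: set[str], rules: list[DefaultRule]) -> set[str]:
    """Conclusions of every rule whose premise holds and none of whose
    defeaters is among the facts."""
    return {
        r.conclusion
        for r in rules
        if r.premise in facts and not (r.defeaters & facts)
    }


fly_away = DefaultRule(
    premise="birds in boxes with no lids",
    conclusion="the birds may fly away",
    defeaters=frozenset({"the birds are penguins"}),
)

print(conclusions({"birds in boxes with no lids"}, [fly_away]))
# {'the birds may fly away'}
print(conclusions({"birds in boxes with no lids", "the birds are penguins"}, [fly_away]))
# set()
```

Adding a fact removed a conclusion here, something a deductive syllogism never allows: once its premises hold, no extra premise can undo its conclusion.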

This distinction between defeasible and indefeasible conclusions turns out to be useful, as the following short case study shows.

Case Study: Form Input Validation
It is very common for electronic form systems (e.g. a web page order form) to validate the data entered by a user before allowing the form to be processed. A widget framework used for building a collection of online banking applications provided two levels of form input validation. Errors at one level were shown to the user immediately, while errors at the other level were displayed only after the user attempted to submit the form.

Knowing the distinction between defeasibility and indefeasibility is the key to deciding which level any particular validation rule belongs at. When an input string is invalid and no additional input can change that, its error status is indefeasible and the user can be notified immediately. For example, if a field requires a number and the user has entered the letter "A", no additional input will make it a legal number.

On the other hand, if a field requires an email address and the user has entered "foo@", the value is invalid because it is not a complete address. However, that status is defeasible, because more input can turn it into a valid value. Therefore the user should not be harassed about it until a submit is attempted.
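Here is a minimal sketch of how a framework might sort a field's current text into the two levels, assuming a simple prefix test: if the text is a prefix of some valid value, its error is defeasible ("incomplete", report on submit); otherwise it is indefeasible ("invalid", report at once). The `classify` function and both grammars are hypothetical and deliberately simplified (real email address syntax is far more permissive), and the test assumes the user only types at the end of the field.

```python
import re

# Hypothetical, deliberately simplified grammars for illustration only.
NUMBER_VALID = re.compile(r"[+-]?\d+(\.\d+)?")
NUMBER_PREFIX = re.compile(r"[+-]?(\d+(\.\d*)?)?")  # anything typing could still complete

EMAIL_VALID = re.compile(r"[\w.+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
EMAIL_PREFIX = re.compile(r"([\w.+-]+(@([A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.?)?)?)?")

RULES = {
    "number": (NUMBER_VALID, NUMBER_PREFIX),
    "email": (EMAIL_VALID, EMAIL_PREFIX),
}


def classify(field_type: str, text: str) -> str:
    """Return 'valid', 'incomplete' (defeasibly invalid: wait for submit)
    or 'invalid' (indefeasibly invalid: tell the user now)."""
    valid, prefix = RULES[field_type]
    if valid.fullmatch(text):
        return "valid"
    if prefix.fullmatch(text):
        # More typing could still turn this into a legal value.
        return "incomplete"
    # No amount of additional typing can repair this value.
    return "invalid"


print(classify("number", "A"))     # invalid    -> show the error immediately
print(classify("email", "foo@"))   # incomplete -> wait until submit
```

Keeping the "valid" and "could still become valid" grammars side by side makes it easy to check that every rule has been assigned to the right level.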

Saturday, February 2, 2013

What is Philosophy?

Because I am endeavoring to teach ideas from Philosophy to computer programmers, who typically have no background in it, a question I must answer right off the bat is "what IS philosophy?" Here is the answer I gave in a new MOOC, "Introduction to Philosophy":
Given that the single word "philosophy" is commonly used to refer to multiple different (albeit related) things, all answers to the question "Q: What is [Western] Philosophy?" actually depend on one of them, "doing philosophy; the act of philosophizing", which I explain even to children as:
A: Being able to answer the question, Why do you think what you think? Why do you believe what you believe?
And while you want to be able to answer to someone else's satisfaction, you foremost want to be able to answer to your own satisfaction because you want to know the truth. (BTW, the western bit is "we believe humans are capable of working it out ourselves")
All the other things the word philosophy can refer to build on, and depend on, the above. For example, philosophy meaning "the body of knowledge accumulated by those doing philosophy" (which entails the history of philosophy, individual ideas, individual philosophers, tools/techniques/criteria for doing it well, favorite topics for philosophizing about, etc.) all springs naturally from someone somewhere starting to "do philosophy".

Of course, for programmers who will say "so what does that have to do with me?", I have to quickly add that even thousands of years ago philosophers had already come up with some really good techniques for describing the world, and for justifying why those techniques are better than our intuition. Since a large portion of our work as programmers is to describe the world via our data models, object models, class hierarchies, classification schemes, etc., we would get better at it if we replaced our intuition with techniques philosophers know but we typically haven't been taught.

Thursday, May 3, 2012

Not All Properties Are Created Equal (Part II: Says Who!?)

It is said that Beauty is in the eye of the beholder. If one were developing a data model for Person, there are various properties that might be attached. Compared with some other properties, it is more self-evident that a property like isBeautiful would be problematic, because it raises the question “Says who?”. An isBeautiful value more appropriately belongs on the relationship between a particular beholder and a particular beheld. It may even be a many-to-many relationship, with multiple values of isBeautiful coming from multiple beholders, as with a beauty pageant's panel of judges. Or, as in the case study below, the credit grade of a banking customer is actually several grades, coming from different bankers and algorithms over the life of the business relationship, and international banking regulations now mandate their tracking and optimization.
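To make that relationship concrete, here is a minimal sketch using hypothetical class names, in which Person keeps only its own properties and each beholder's opinion is a separate record linking beholder and beheld. The many-to-many case falls out naturally: a pageant panel is just several records for the same beheld person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical entity holding only properties that belong to the person."""
    person_id: int
    height_cm: float


@dataclass(frozen=True)
class BeautyJudgment:
    """One beholder's opinion of one beheld person: the "says who?" record."""
    beholder_id: int
    beheld_id: int
    is_beautiful: bool


def verdicts_for(person: Person, judgments: list[BeautyJudgment]) -> dict[int, bool]:
    """Map each beholder's id to that beholder's verdict on `person`."""
    return {
        j.beholder_id: j.is_beautiful
        for j in judgments
        if j.beheld_id == person.person_id
    }


# Usage: a three-judge pageant panel (ids 10, 11, 12) judging contestant 1.
contestant = Person(person_id=1, height_cm=170.0)
panel = [
    BeautyJudgment(10, 1, True),
    BeautyJudgment(11, 1, True),
    BeautyJudgment(12, 1, False),
]
print(verdicts_for(contestant, panel))  # {10: True, 11: True, 12: False}
```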

Of the many subcategories of properties that philosophers have come up with, an important pair is what John Locke named Primary and Secondary Properties (aka Qualities). The distinction is that Primary properties are “objective” and “in the object”, while Secondary properties are “subjective” and “in the mind of the perceiver”. Primary properties of an apple would be its mass, shape, size, etc. Secondary properties would be its color, taste, smell, etc.

That apple isn't really red?
Color is a classic case in point because it seems as though color might be objective. Surely there is some collection of wavelengths of light reflected off an apple that could be classified as “red”. Alas, there are mountains of evidence that color perception depends on the person and the external conditions.

A famous mountain of evidence (sorry, pun intended) is Ayers Rock (aka Uluru) in Australia, which attracts thousands of tourists who come to watch it dramatically change color right before their eyes. Over the minutes of sunset or sunrise, “the color” ranges through black, red, pink, orange, brown, etc.

In the rain it even turns blue and purple...


So, the mountain can’t really have a simple “color” property with a simple single value.

Again, as with Essential properties, how is this primary/secondary distinction supposed to make me develop differently than I do now?
  • Model Clarity: Keeping the properties of an entity or class limited to primary qualities helps to ensure that your data model will match data models developed by others. You will be more likely to agree on what the properties are in the first place, and on what data type best represents each property (see Stronger Types below).

  • Better Normalization: Your entity tables are better normalized when limited to Primary properties, because those truly are properties of that entity. By recognizing and removing Secondary properties, you won’t be mixing in columns that are really a flattened relationship with some other beholder entity.

  • Better Keys: When deciding which properties of an entity are candidates for being part of its “key” or “identifier”, it definitely helps to verify that they are really objective properties of the entity. Otherwise, they depend on some external beholder and conditions that can change over time, even though the entity itself didn’t change! A driver’s license search will fail if a witness’s notion of a suspect’s hair color doesn’t match the DMV’s notion of that hair color.

  • Data Provenance: Realizing that each value of a secondary property raises the question “says who?”, you need to identify the authority that provided each value for that property. There could be anything from a one-to-one to a many-to-many relationship between the original class and the various authorities providing data values. If the answer is “all values of this property come from a single source X”, then that need only be noted in the documentation. At the other end of the spectrum, there may need to be a sophisticated sub-model just to keep track of the source and circumstances of each of the several values that property could take on for a single object! (see the Basel II banking case study below).

  • Stronger Types: Authorities providing data values for secondary properties usually define entire “types” rather than just values from some universal type. For example, authorities specifying colors usually limit them to a custom collection of colors (a palette), or even collections of palettes. Ralph Lauren defines many palettes of colors, most with one or more “reds”, but none of them is the same as the “red” of a 1964 Ford Mustang, which comes from Ford’s small palette of factory colors for that year. If you need the simplicity of a single universal “red” value, then you are looking at defining mappings from one palette to another. Car salesmen do this mapping intuitively by showing you a “cypress pearl” Infiniti when you ask to see either “black” cars or “green” cars. Do you need that sort of detailed modeling? You do if you are trying to make it easy for customers to find what they want (see the fabric.com case study below).
1964 Mustang Factory Colors

Ralph Lauren Palette
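The Stronger Types idea can be sketched with one enum per authority's palette plus an explicit mapping into universal color families. Every palette and color name below is a hypothetical stand-in, not taken from any real Ford or Ralph Lauren palette, and the mapping is data that someone still has to decide on.

```python
from enum import Enum


class Family(Enum):
    """A small, hypothetical set of universal color families."""
    RED = "red"
    BLACK = "black"
    GREEN = "green"


class CarPalette(Enum):
    """Hypothetical stand-in for one maker's factory palette for one year."""
    FACTORY_RED = "factory red"
    FACTORY_BLACK = "factory black"
    DARK_GREEN_PEARL = "dark green pearl"


class DesignerPalette(Enum):
    """Hypothetical stand-in for one designer palette."""
    BRICK = "brick"
    CRIMSON = "crimson"
    MOSS = "moss"


# A color may belong to more than one family, like a dark pearl that a
# salesman would show to someone asking for either "black" or "green".
TO_FAMILIES: dict[Enum, frozenset[Family]] = {
    CarPalette.FACTORY_RED: frozenset({Family.RED}),
    CarPalette.FACTORY_BLACK: frozenset({Family.BLACK}),
    CarPalette.DARK_GREEN_PEARL: frozenset({Family.BLACK, Family.GREEN}),
    DesignerPalette.BRICK: frozenset({Family.RED}),
    DesignerPalette.CRIMSON: frozenset({Family.RED}),
    DesignerPalette.MOSS: frozenset({Family.GREEN}),
}


def in_family(color: Enum, wanted: Family) -> bool:
    """True if a palette-specific color has been mapped into `wanted`."""
    return wanted in TO_FAMILIES.get(color, frozenset())


# Usage: find every palette color a customer asking for "green" should see.
greens = [c for c in TO_FAMILIES if in_family(c, Family.GREEN)]
print([c.value for c in greens])  # ['dark green pearl', 'moss']
```

Because `CarPalette.FACTORY_RED` and `DesignerPalette.CRIMSON` are different types, code cannot silently treat one authority's "red" as another's; they only meet through the explicit mapping.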

Case Study: Basel II Banking Accord

One of the fundamental practices of banks is to keep a certain amount of money in reserve. When a bank takes in customer deposits and lends them out to make a profit from the interest charged, there is a danger if all the deposits are handed out as loans. So a reserve must be kept, but there is a conflict between larger reserves for more safety and smaller reserves for more profit. Because banks have erred on the side of more profit, over the past several years international banking regulations have added requirements that the amount of reserve be calculated on a more scientific basis, and be optimized over time.

One of the major criteria in determining how much reserve is required is the quality of the customers the bank lends money to, as measured by their credit grades. A credit grade is really just a prediction of how likely a borrower is to pay back a loan, and how much they would leave unpaid if they did default.

The regulators recognized (though maybe not in these philosophical terms) that “credit grade” is not a primary property of a customer; it is a secondary property that depends on the grader and the procedure or algorithm used. The Basel II Banking Accord specified that simply keeping track of customer credit grades was not enough. Banks needed to start keeping track of “says who?” and “what did they base it on?”. With this data, it can be verified after the fact which of these predictions of future default panned out. This makes it possible to evolve better algorithms and to weed out graders and methods that were not very good.
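A minimal sketch of that after-the-fact check, assuming a hypothetical schema in which every grade carries its grader and method. It scores each (grader, method) pair with the Brier score (the mean squared gap between predicted default probability and what actually happened), one common way to score probability forecasts, chosen here for illustration; Basel II does not prescribe it. The sketch scores only the likelihood of default, not how much would be left unpaid, and all data in it is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CreditGrade:
    """One grade for one customer, with its provenance (hypothetical schema)."""
    customer_id: int
    grader: str                 # "says who?": a banker or a model name
    method: str                 # "what did they base it on?"
    graded_on: date
    default_probability: float  # the prediction itself, from 0.0 to 1.0


def brier_by_source(
    grades: list[CreditGrade], defaulted: set[int]
) -> dict[tuple[str, str], float]:
    """Mean squared gap between each (grader, method)'s predicted default
    probabilities and what happened (1.0 = defaulted). Lower is better."""
    errors: dict[tuple[str, str], list[float]] = defaultdict(list)
    for g in grades:
        outcome = 1.0 if g.customer_id in defaulted else 0.0
        errors[(g.grader, g.method)].append((g.default_probability - outcome) ** 2)
    return {source: sum(e) / len(e) for source, e in errors.items()}


# Usage with made-up grades: customer 2 later defaulted, customer 1 did not.
grades = [
    CreditGrade(1, "banker A", "interview", date(2010, 1, 5), 0.1),
    CreditGrade(2, "banker A", "interview", date(2010, 2, 1), 0.2),
    CreditGrade(1, "model M", "scorecard v1", date(2010, 1, 5), 0.5),
    CreditGrade(2, "model M", "scorecard v1", date(2010, 2, 1), 0.9),
]
scores = brier_by_source(grades, defaulted={2})
best = min(scores, key=scores.get)
print(best)  # ('model M', 'scorecard v1')
```

Without the grader and method fields there would be nothing to group by, which is the point of recording “says who?” alongside each grade.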

Case Study: fabric.com

In the early days of marketing on the web, the start-up fabric.com (since bought by Amazon) was building a web store to sell clearance fabrics and apparel. Like all online retailers, it faced the problem of making it easy for customers to find the products they want. A common approach is to build a web site with a left-column navigation bar containing filter-by-property controls. This works fine for Primary properties, but is more of a problem for Secondary properties.

Alas, if one doesn’t know that there is a difference, one builds all of the filters in the same manner. For fabrics, filtering by the dimensions of the piece being sold is straightforward and effective because dimensions are a primary property. For color, however, there are problems because it is a secondary property, and hence opinions differ on how to describe it.

Look, even though the “color” of a fabric may be in the eye of the beholder, it is an objective fact that the manufacturer described the color as X, right? How about we use that, since we have to pick something? Well, there are two big problems:
  • All those colorful color descriptions cause an overly large list of colors in the color filter, plus related colors are spread throughout the list (e.g. avocado, green, lime, olive, etc.).
  • When the customer searches for “green” she doesn’t find all those fabrics with the color described as “lime”.
Okay, fine, we are going to have to go to the trouble of mapping each item to a simple set of colors that we pick. So, Mr. Programmer, go set up a set of simple colors in the database so that we can pick the color when we enter these items into inventory. Well, one other big problem:
  • While everyone is entitled to their own opinion, some opinions are worth more than others. The programmers did not have the industry experience needed to pick an appropriate set of colors. It was the brick-and-mortar fabric sellers who knew things like “stripe is a color”! So it took someone with experience of what buyers actually ask for to know that, along with “green”, the color list also needed “green and white stripe”, but not the manufacturer’s description “lime and white stripe”. They also knew that, in addition to simple color families like yellow and brown, subtler categories like “gold”, “beige”, and “cream” were needed (and in fact the last two were combined into a single category). On the other hand, the green family did not need to be augmented by “lime” and “avocado” families.
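The fix described above can be sketched as keeping the manufacturer's description verbatim while filtering on a curated family chosen at data entry. This is a minimal sketch with hypothetical names; the family list is illustrative, loosely following the examples above, and in practice it should come from people who know what buyers ask for.

```python
from dataclasses import dataclass

# Hypothetical curated families, loosely following the case study.
COLOR_FAMILIES = frozenset(
    {"green", "green and white stripe", "yellow", "brown", "gold", "beige/cream"}
)


@dataclass(frozen=True)
class Fabric:
    sku: str
    manufacturer_color: str  # kept verbatim: the maker's own description
    color_family: str        # one of COLOR_FAMILIES, chosen at data entry


def add_item(sku: str, manufacturer_color: str, family: str) -> Fabric:
    """Create an inventory record, refusing families outside the curated list."""
    if family not in COLOR_FAMILIES:
        raise ValueError(f"unknown color family: {family!r}")
    return Fabric(sku, manufacturer_color, family)


def search(inventory: list[Fabric], family: str) -> list[str]:
    """SKUs whose curated family matches, whatever the maker called the color."""
    return [f.sku for f in inventory if f.color_family == family]


inventory = [
    add_item("F1", "lime", "green"),
    add_item("F2", "avocado", "green"),
    add_item("F3", "lime and white stripe", "green and white stripe"),
    add_item("F4", "cream", "beige/cream"),
]
print(search(inventory, "green"))  # ['F1', 'F2']
```

Keeping both fields means the objective fact (what the manufacturer said) is never lost, while the filter list stays short and a customer searching for “green” finds the “lime” fabric.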
So, just as with Essential properties, more important than any particular use case is knowing that there IS a distinction between objective, intrinsic Primary properties and Secondary properties, which are in the eye of the beholder.