Saturday, February 15, 2014

It's about Time, It's about Space

A 1960s TV series theme song began, "It's about time, it's about space...". Some, from Physics to Philosophy, say it's about both, claiming they are each aspects of a single space-time. Computer systems developers need to consider this as they build GIS applications.

Ontology, being the branch of philosophy concerned with describing "what exists", tackles the topics of Space and Time since they are often used to describe things. An Introduction to Ontology[1] devotes a chapter to each. As usual, things are more complicated than our initial intuition expects, and debate continues about different viewpoints. In a nutshell, the following are discussed:
  • Space is usually defined in terms of "regions".
  • Space is either absolute or relative.
  • Space is either something things "are in", or it is synonymous with the things themselves.
    That is, either regions only have properties like size and location, or a region itself has the property blue if the stuff in it is blue.
  • Space is either Euclidean or not (i.e. flat or curved).
  • Space and Time are either separate, or they are parts of the same thing: space-time.
Most programmers today, in the age of Map apps, Geographic Information Systems, and geocoding, take the view that an entity such as a business or address is located at some location. The location ideally could be defined as a collection of regions defined by GPS coordinates. Often, the location is (over)simplified to a single point on a map.

While it is recognized that many problems exist with actual databases of geocoded entities, it is usually assumed that they are in the realm of epistemology rather than ontology. In other words, it is assumed the problem is with "our knowledge", due to inaccuracies in the set of GPS coordinates, and not that locations lack a definite set of coordinates in the first place.

However, not every entity that takes up space has a well-defined and unchanging mapping to a set of GPS coordinates.  ZIP codes, for example, are not defined in terms of geography but rather as collections of delivery routes. Another example, as shown in the title insurance case study below, is in real estate legal descriptions. In addition to a knowledge problem caused by ambiguous language used in these descriptions, they can also refer to ephemeral landmarks.

While data model designs often make the naive assumption that space is separate from time, entities like ZIP codes and legal descriptions require a time dimension to be completely accurate. It turns out that the mapping of ZIP codes to postal routes changes several times a year. And landmarks referred to in property descriptions can change location and shape over time.
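To make the time dimension concrete, here is a minimal sketch of a ZIP-to-route mapping in which every assignment carries a validity period. The class name, field names, route ids, and dates are all made up for illustration; this is not how any postal database is actually laid out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RouteAssignment:
    """One delivery route's membership in a ZIP code over a validity period."""
    zip_code: str
    route_id: str                 # hypothetical route identifier
    valid_from: date              # first day the assignment applies
    valid_to: Optional[date]      # first day it no longer applies; None = still current

    def covers(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def routes_for(assignments, zip_code, day):
    """Return the sorted route ids that made up zip_code on the given day."""
    return sorted(a.route_id for a in assignments
                  if a.zip_code == zip_code and a.covers(day))


# Illustrative data only: route R2 moves from one ZIP code to another on 1 July 2013.
ASSIGNMENTS = [
    RouteAssignment("90001", "R1", date(2013, 1, 1), None),
    RouteAssignment("90001", "R2", date(2013, 1, 1), date(2013, 7, 1)),
    RouteAssignment("90002", "R2", date(2013, 7, 1), None),
]

print(routes_for(ASSIGNMENTS, "90001", date(2013, 3, 15)))  # ['R1', 'R2']
print(routes_for(ASSIGNMENTS, "90001", date(2013, 9, 15)))  # ['R1']
```

The point of the sketch is that "which routes make up this ZIP code?" has no answer until a date is supplied. The same goes for "where is this landmark?".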

Case Study: TICOR Title Insurance System
OMEX was a startup and an early pioneer in optical disk technology for data storage. It took on a contract to produce a computer system to support TICOR, the largest title insurance company in the U.S. TICOR itself had the contract to keep backup copies of all the real estate transactions filed with Los Angeles County. As a part of archiving copies of the documents, it was free to use the information in them, and hence to support its business of providing title insurance.

The computer system was to replace using microfilm photos of the documents with optical disk storage of the images.  It would link these images with a structured database of information related to each property. One of the goals of the database was to enable answering basic questions about property locations.

The programmers, having a naive notion of how property boundaries were defined, were surprised to see that a common method is “metes and bounds”, which relies on plain English descriptions built around landmarks. For example: "beginning with a corner at the intersection of two stone walls near an apple tree on the north side of Muddy Creek road one mile above the junction of Muddy and Indian Creeks, north for 150 rods to the end of the stone wall bordering the road, then northwest along a line to a large standing rock on the corner of the property now or formerly belonging to John Smith, thence west 150 rods to the corner of a barn near a large oak tree, thence south to Muddy Creek road, thence down the side of the creek road to the starting point."

As can be seen, it would be difficult to translate this into a collection of GPS coordinates. But even if you did, you would not be done with the problem. Like ZIP codes, the location and shape of creeks, rivers, etc. change over time. Lest you think this is a merely theoretical problem, for centuries States have sued each other over land ownership because border rivers migrated over time.[2]

Ultimately, the computer system wound up just using unstructured text fields to contain the legal description, rather than the more ambitious GIS database that had originally been promised.

[1] An Introduction to Ontology, Nikk Effingham, Polity Press, 2013
[2] A River Runs Thru It, How the States Got Their Shapes, History Channel, 2011

Wednesday, July 31, 2013

Recapping this blog for a Fan

An answer to some fan mail...

I've recently stumbled upon your blog "Existential Programming" and the post about "there is no component". I found the post to be very interesting, thank you for writing it.

I'm not sure which version of that post you read, but be sure to see the most recent version (and the other "which came first" post in the thread "Holes or Parts?").

Do you have any additional resources you can recommend for learning about component-based architectures?

I don't have any in particular; because I was around during the evolution of the ideas that moved software development into the "structured programming" and then "modular programming" and then "object oriented programming" eras, I picked up bits and pieces from many sources.

It is ironic that back when I first wrote about "there is no such thing...", I was making a design philosophy point to people who basically understood what a component was all about. I now find more and more developers who don't really understand the difference between a "component" and just a chunk of logic, or why there is an "interface" around an "implementation". I wrote a (newly revised) rant about this on my other blog, which has some links in it to important concepts.

Also are there any concrete examples of your philosophy? The philosophy that one should use a type-less data model. I'd love to see some actual code or applications built around that concept.

Alas, I spend most of my time/energy on my day job, which mostly consists (in another irony) of simulating in JavaScript the static nature of traditional OOP languages like Java/C++, instead of simulating in Java and SQL the dynamic nature I envisioned in Existential Programming. Imposing the discipline of strong types on JavaScript is a necessary but not sufficient piece of the puzzle.

I always advocated strong types, just not such static ones. Once you have a "type-less" platform like JavaScript, you can use mix-ins to implement "roles". Each entity (e.g. person) and each role (e.g. student, patient, employee) is implemented via a strong type, but they are kept separate, with roles being dynamically added/removed.
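A minimal sketch of that entity/role separation follows. It is written in Python rather than JavaScript, and it attaches role objects to the entity instead of using true mix-ins, so treat it as an approximation of the idea and not as the approach described above. Every class and method name here is hypothetical.

```python
class Person:
    """A strongly typed entity that carries no roles of its own."""

    def __init__(self, name):
        self.name = name
        self._roles = {}          # role class -> role instance

    def add_role(self, role):
        self._roles[type(role)] = role

    def remove_role(self, role_type):
        self._roles.pop(role_type, None)

    def has_role(self, role_type):
        return role_type in self._roles

    def as_role(self, role_type):
        """Return the role instance, or raise if the role is not attached."""
        if role_type not in self._roles:
            raise TypeError(f"{self.name} is not currently a {role_type.__name__}")
        return self._roles[role_type]


class Student:
    """A role: strongly typed, but separate from the entity that plays it."""

    def __init__(self, school):
        self.school = school


class Employee:
    def __init__(self, employer):
        self.employer = employer


alice = Person("Alice")
alice.add_role(Student("State University"))
alice.add_role(Employee("Acme Corp"))
print(alice.as_role(Student).school)

alice.remove_role(Student)        # Alice graduates; the Person itself is unchanged
print(alice.has_role(Student), alice.has_role(Employee))
```

Each role keeps its own fields and type, while the set of roles an entity holds can change over its lifetime.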

One of my higher-level musings in Existential Programming was to use a dynamic OOP foundation to allow multiple "ontologies" of the same entities/roles to exist simultaneously, in order to support ontology bridging.

While I can now envision how to implement that sort of thing via an object persistence framework between JavaScript and some NoSQL/tuple-store/column-store/etc. database, I haven't had a day-job project that needed/subsidized it (yet!).

I’m looking forward to hearing from you.

Thanks for writing and making me look back at all those old writings given what I've learned since.  After that initial burst in 2006, I've actually spent much more time reading/writing about basic Philosophy and how to do better analysis (i.e. Philosophical Programming) than Existential Programming.

Saturday, April 27, 2013

Form Input Validation and Defeasibility

A major subcategory within Philosophy is the study of Logic. Computer programmers are very familiar with some categories of logic, for example Boolean logic, but they may not be aware of how many different kinds of logic philosophers have come up with in the past 2500 years. While the familiar systems of logic seek to ensure their conclusions are absolutely true, there are other logics that do not require this, and they have lessons for programmers.

Aristotle described a system of logic that was so iron-clad, it was 2000 years before philosophers and mathematicians started seriously adding to it. It is famous for defining the syllogism which is a form of deductive logic, where certain conclusions can be mathematically proved to be true as long as the starting premises are true. The prototypical example of a syllogism is:

P1: All men are mortal.
P2: Socrates is a man.
C: Therefore, Socrates is mortal.

As long as the two premises P1 and P2 are true, C is guaranteed to be true.

However, everyday situations usually call for conclusions that fall short of the absolute provability of deductive reasoning. Jurors need to come up with a verdict even though they have not been given iron-clad arguments. One of the several types of reasoning that are not deductive is called defeasible logic.

Many everyday arguments rely on assumptions that are true "all things being equal". For example, if told that birds were being transported in boxes with no lids, you might conclude that there was a large risk that the birds would fly away, and therefore lids were needed. Given what you know, this is a reasonable conclusion. However, those arguments are defeasible, which means that there are other additional statements that could change the conclusion if they were added to the mix. If we add the statement "the birds being transported are penguins", the conclusion that they might fly away would be defeated since penguins can't fly. On the other hand, some arguments are indefeasible. For example, "John is a bachelor therefore John is unmarried" would not be false, no matter what additional statements are made because being unmarried is the very definition of being a bachelor.
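For programmers, the difference can be sketched as a default rule with a defeater versus a rule that holds by definition. The function names and the fact strings below are invented for illustration, and representing facts as a plain set of strings is a simplification, not a real defeasible logic engine.

```python
def birds_might_fly_away(facts):
    """Defeasible: a default conclusion that extra facts can overturn."""
    default_applies = {"birds are in boxes", "boxes have no lids"} <= facts
    defeated = "the birds are penguins" in facts
    return default_applies and not defeated


def john_is_unmarried(facts):
    """Indefeasible: follows from the definition of 'bachelor' alone."""
    return "John is a bachelor" in facts


known = {"birds are in boxes", "boxes have no lids"}
print(birds_might_fly_away(known))
print(birds_might_fly_away(known | {"the birds are penguins"}))
```

Adding a fact flips the first conclusion from true to false. No added fact can do that to the second one, because it only ever checks for the defining premise.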

It is helpful to know this distinction between defeasible and indefeasible conclusions, as the following short case study shows.

Case Study: Form Input Validation
It is very common for electronic form systems (e.g. a web page order form) to validate the data entered by a user before allowing the form to be processed. A widget framework developed for building a collection of online banking applications provided two levels of form input validation. One level gave error messages to the user immediately, and the other displayed them only after the user attempted to submit the form.

Knowing the distinction between defeasibility and indefeasibility is the key to knowing which level any particular validation rule belongs to. When an input string is invalid and no additional input will change that, its error status is indefeasible and the user can be notified immediately. For example, if a field requires a number and the user has entered a letter "A", no additional input will make it a legal number.

On the other hand, if a field requires an email address and the user has entered "foo@", that is invalid because it is not a complete address. However, that status is defeasible because more input can change it to a valid value. Therefore the user should not be harassed about it until a submit is attempted.
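One way to sketch this two-level scheme is to give each field two patterns: one matching complete valid values, and one matching every string that could still grow into a valid value. The names (Status, FieldRule, NUMBER, EMAIL) are hypothetical, the email pattern is deliberately simplified, and the sketch assumes the user only appends characters. It is not the banking framework's actual code.

```python
import re
from enum import Enum


class Status(Enum):
    VALID = "valid"
    INCOMPLETE = "incomplete"   # defeasible error: more input could fix it
    INVALID = "invalid"         # indefeasible error: no further input can fix it


class FieldRule:
    def __init__(self, full_pattern, prefix_pattern):
        self._full = re.compile(full_pattern)      # complete, valid values
        self._prefix = re.compile(prefix_pattern)  # any prefix of a valid value

    def check(self, text):
        if self._full.fullmatch(text):
            return Status.VALID
        if self._prefix.fullmatch(text):
            return Status.INCOMPLETE
        return Status.INVALID


NUMBER = FieldRule(r"[0-9]+", r"[0-9]*")
EMAIL = FieldRule(
    r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+",
    r"(?:[A-Za-z0-9._+-]+(?:@(?:[A-Za-z0-9-]+\.)*[A-Za-z0-9-]*)?)?",
)


def show_error_while_typing(rule, text):
    """Immediate level: complain only about indefeasible errors."""
    return rule.check(text) is Status.INVALID


def show_error_on_submit(rule, text):
    """Submit level: anything short of valid is an error."""
    return rule.check(text) is not Status.VALID


print(NUMBER.check("A"))           # Status.INVALID
print(EMAIL.check("foo@"))         # Status.INCOMPLETE
print(EMAIL.check("foo@bar.com"))  # Status.VALID
```

With these rules, "A" in a number field is flagged while typing, whereas "foo@" in an email field stays quiet until the user tries to submit.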