Why We Can't Have Nice Things: Floats, Dates, and Names

John Feminella • San Francisco, CA • Talk

Date: September 19, 2014
Published: unknown
Announced: unknown

By John Feminella
The real world is a messy place, and software reflects this to some extent. This messiness, however, doesn't mesh well with the general tendencies of software developers, who like to try to simplify the world with assumptions. When those assumptions later turn out to be wrong, bad things happen.

In this talk, we'll discuss three perennial sources of bad developer assumptions: floating point numbers, dates and times, and the names of people, places, and things.

We'll illustrate why each of several commonly-made assumptions is incorrect, show how to use Ruby to arrive at the correct answer, and empower you to make better decisions about your code in the process.

GoGaRuCo 2014

00:00:13.280 welcome to why we can't have nice things and why it's all our fault by me so as
00:00:19.960 Leah told you I'm a co-founder of a company called UpHex uh we do analytics for digital marketing agencies and I'd
00:00:26.960 love to talk to you about that afterwards if you happen to represent an agency or be friends with powerful people who own agencies you can read
00:00:33.920 more about the company at uphex.com and that's my website and that's my Twitter handle on the bottom I also want to give
00:00:39.640 a shout out to the organizers um that's Leah Josh Jim and the volunteers who are
00:00:47.160 Kate John Jonathan Matt Sarah Ryan Rachel and Emily and if I forgot your
00:00:52.640 name it's because you're not on the website so it's not my fault and thank you and thanks very
00:00:59.359 much to the sponsors you know without you guys this event wouldn't happen and I want you guys to know that
00:01:05.720 every time I took a bite of food at lunch I was thinking wow this tastes so much better because I didn't have to pay for it so this talk is
00:01:13.840 fundamentally about assumptions and the assumptions that we make as people and how that uh translates into what winds
00:01:21.200 up in our software so most of the time for most of the assumptions we make they
00:01:26.360 serve us well there's no problem so I think few people would disagree that eating a balanced diet is good for your
00:01:32.000 health or that you should be careful around open flames but sometimes the assumptions don't turn out to be true
00:01:37.399 like yesterday when I was walking to the party I uh thought it would be okay to cross in the middle of a street in front
00:01:43.320 of a police officer but it was not so uh but more specifically with reference to
00:01:50.159 software sometimes things that seem like they should be true and are perfectly reasonable statements uh don't turn out
00:01:56.240 to be true in every possible case we could imagine so here's an example from Python where if you do reference
00:02:02.360 equality comparisons uh two numbers that are small are actually cached by certain
00:02:08.679 implementations of Python so that all the numbers for example between -5 and 256 in CPython are given a
00:02:17.400 singleton identity but every time you make a new number that's outside of that range you're getting a new object so
00:02:23.120 reference equality doesn't work outside those ranges that's a very surprising result if you didn't know that that's
00:02:28.160 what was going on under the covers and for Ruby you know we have kind of uh similar things that might crop up so
00:02:35.000 here's one that's not necessarily specific to Ruby but we'll use Ruby code to demonstrate it you might think that for all x x is equal to itself but
00:02:42.760 that's not true for float NaNs or not-a-numbers in fact that's how they're
00:02:48.280 defined they're not equal to themselves they're the only thing that's not equal to itself sometimes you
00:02:54.519 might make assumptions about the way our dates and times work so you might think that this is a perfectly reasonable
00:03:00.640 statement that the day after October 4th is always October 5th could that ever
00:03:06.519 be an incorrect statement so we'll explore that later and sometimes
00:03:12.680 we make assumptions about uh social or cultural situations that wind up in our
00:03:18.040 software so here's a database schema that I actually once had to update in a
00:03:23.680 Massachusetts county court system so in 2003 the Massachusetts
00:03:28.760 Supreme Court decided a case called Goodridge versus Department of Public Health and that was the case that
00:03:33.840 legalized same-sex marriage for Massachusetts at least so lots of assumptions that people made at least
00:03:39.120 the original implementers of the system turned out to be wrong
00:03:44.200 not because of anything that was wrong about the relationships between the objects at the time but because of
00:03:49.319 changes in um how we assumed things would work so fundamentally this talk is about people and the assumptions that
00:03:56.319 people make and what happens when they go wrong and how we can do a better job
00:04:01.519 so let's start with time so time is hard uh why is it hard well because people
00:04:06.959 made it that way so before we can talk about that let's talk about what time is so let's forget about your software
00:04:13.879 constructions of what time is let's just talk about what time is generally speaking and to do that we have to go
00:04:19.560 back to what some philosophers thought about time uh so this is a open question
00:04:25.120 I would say in philosophy and it's one that's being debated all the time for centuries
00:04:30.280 uh Isaac Newton thought that time was sort of like a container uh that existed independently of whether or not we as
00:04:37.919 people existed so it's a sort of a universal property of uh some kind of
00:04:43.800 essence of the fabric of the world and the universe that we inhabit whereas Immanuel Kant thought sort of the
00:04:49.280 opposite that time was basically entirely a human construct a way that we perceive the world around us so for
00:04:56.759 purposes of this talk we'll treat time like an ordering of events so just like
00:05:01.960 space is what separates me from someone in the audience uh time is what separates two events from happening at
00:05:08.520 the same moment so we're doing pretty good so far right we're we're only a few slides into this and we've already
00:05:13.880 resolved a few major philosophical questions so I think we can give ourselves a pat on the back so everyone
00:05:21.280 could just breach conference etiquette for a moment here and take a look at your watch or your smartphone and
00:05:27.000 just note what time it is and just yell out what time you think it is right
00:05:32.840 now okay so most people yelled an hour and a number and I heard someone over here yell business time
00:05:38.600 which I thought was funny uh so most people said an hour and a
00:05:46.280 minute and we can get more specific than that of course we can be more granular
00:05:51.400 about the moment of time that we're referring to so we could specify the minutes and then the seconds on top of
00:05:57.400 that and the milliseconds and the microseconds and so on and we can be as granular as we'd like to be about that
00:06:02.680 moment in time so when we think about a moment in time what we're really talking about though is not a
00:06:09.520 dot so much as an interval on which a statement is true so when we say
00:06:14.720 2:30 what we mean is that there is a range of time a range of possible values
00:06:20.520 on the time continuum for which the statement it's 2:30 is true so that's not a point but rather an interval of
00:06:27.120 possible values and if we get more specific with what we mean uh all we're really doing is narrowing that interval
00:06:33.919 so we're not we're not talking about different times necessarily we're just focusing in on one specific smaller
00:06:39.680 subset of that larger interval now there's a problem with this of course which is that moments are ambiguous uh
00:06:45.840 if I told you that it's 2:30 there are many possible values of time for which that's true there's not
00:06:52.120 one specific continuous interval so here are two different moments for which it would be correct to say that
00:06:58.199 it's 2:30 and of course this pattern repeats on a daily basis uh there are many many
00:07:04.039 moments not just two where it's also correct to say that it's 2:30 and if we get more specific it doesn't really help
00:07:10.599 the situation so we just add on seconds and milliseconds and microseconds that doesn't get us
00:07:16.160 anywhere now the reason that I understand you when you yell at me that it's 2:30 is because I understand that
00:07:23.759 we're talking about a window of possible values that includes just this afternoon
00:07:29.720 probably sometime between two and four so I don't have to know the exact time but I know the possible range of values
00:07:35.960 you might be talking about and there's only one interval that falls in that context so I I can resolve the
00:07:42.840 ambiguity but we need specificity if we want to tell a computer uh that additional context right we can't just
00:07:50.080 say oh it's 2:30 and I'm talking about sometime this afternoon a computer's not going to know that so we have to be more
00:07:56.240 specific uh and tell basically a computer everything that would be necessary to resolve that ambiguity so
00:08:03.280 we can't just say it's 2:30 we have to say it's 2:30 p.m. on a Monday on this date in this year and so on so now that
00:08:11.240 we have this idea of what times are let's talk about how we can compare times right because it's one thing you
00:08:17.360 just tell someone what time it is but usually you want to know a fact about time like whether or not it's time for
00:08:23.599 an appointment or uh if it's your birthday yet or something like that so what kind of problems might we encounter
00:08:29.199 when comparing times the first problem is that we might not be using the same calendar so you can imagine that someone
00:08:35.959 in San Francisco California is probably using the Gregorian calendar the January February March Etc that we've all come
00:08:42.680 to know and love but someone in Beijing might not be using that they might be using the Han calendar um or a different
00:08:49.080 civil calendar okay so we can solve that problem pretty easily let's just you know get everyone on the same calendar
00:08:56.519 check but if we do that we're going to have to remember when we switched calendars so that old dates still make
00:09:03.360 sense so for example before we were on the Gregorian calendar we were on the Julian calendar and when we switched
00:09:09.040 calendars we had to jump 11 days in the future um and when that calendar was introduced four countries
00:09:16.760 did it right away basically all the Catholic countries uh moved over right away but a lot of other countries moved
00:09:22.600 over at different times the colonies of those Catholic countries didn't all move over at the same time and it was
00:09:28.120 important that you switched on that date because if you didn't you needed to add a different number of days depending on
00:09:33.399 when you actually did switch or otherwise your alignment would be wrong next problem your local time isn't
00:09:40.839 the same as my local time depending on where I am in the world so San Francisco California if it's 2:30 here it's going
00:09:47.320 to be dark in San Francisco in the Philippines that's an actual city name in the Philippines you'd be surprised
00:09:53.079 how many cities are named San Francisco so we can solve that problem
00:09:58.360 what if we all adopt a local time that's offset with reference to some Global
00:10:04.120 time and that's what UTC is so we can all establish a uh a global time we'll
00:10:09.640 just tell each other what our offsets are relative to that time so then we can all have the same daylight hours over uh
00:10:17.440 so the same hour on the clock corresponds to the same amount of daylight that we'll each get and that's what Charles Dowd a
00:10:25.600 US seminary teacher who proposed time zones to a bunch of railway operators did so they liked the idea of time zones
00:10:32.240 so much that it was legally adopted in the US in 1883 so when you got off a
00:10:37.440 train or a platform you would set your watch in a different time zone by whatever the clock was on the platform
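A minimal Ruby sketch of the idea just described: one instant on the global clock, displayed against two fixed offsets. The date and both offsets (-07:00 for San Francisco, California and +08:00 for San Francisco in the Philippines) are assumptions picked for illustration; they are not taken from the talk's slides.

```ruby
# One instant, expressed against the global reference clock (UTC).
instant = Time.utc(2014, 9, 19, 21, 30, 0)

# The same instant viewed through two fixed offsets from UTC.
# Both offsets are illustrative assumptions for this particular date.
san_francisco_us = instant.getlocal("-07:00")
san_francisco_ph = instant.getlocal("+08:00")

puts san_francisco_us.strftime("%Y-%m-%d %H:%M %:z") # 2014-09-19 14:30 -07:00
puts san_francisco_ph.strftime("%Y-%m-%d %H:%M %:z") # 2014-09-20 05:30 +08:00

# The wall clocks differ, but both values name the same moment.
puts san_francisco_us == san_francisco_ph             # true
```

Comparing the two values with `==` compares the underlying instant, not the wall-clock digits, which is why the last line prints `true` even though the dates differ.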
00:10:44.279 so all right great so now we've got these time offsets that will minimize the amount of difference between our
00:10:50.760 solar days but different problem the time offset that we have at one point in
00:10:56.079 the year isn't always the same as the time offset in a different part of the year so for example in the US we have
00:11:02.360 daylight savings time so in the summer we're at UTC minus 7 but in the winter we're at UTC minus 8 in Pacific time and
00:11:09.680 in San Francisco in the Philippines they have UTC plus 8 year round there is no daylight savings time okay fine we
00:11:16.760 can fix that we'll just tell each other what the offsets are at at different times in the year and that will that
00:11:22.480 will solve that but another problem the offsets weren't the same all the time
00:11:28.000 for all of historical values of time so if I want to go back and tell you what time it was or what what time an event
00:11:33.839 occurred in 1970 I would need to go look up what the time offsets were in 1970 in
00:11:39.720 that location and even in the US maybe you think that daylight savings time is pretty easy but there have been a lot of
00:11:46.079 changes to daylight savings time when it started when it ended and so on so San Francisco in the Philippines didn't
00:11:52.480 always observe the year-round stuff they had daylight savings time briefly to avoid an oil shortage uh they wanted to
00:11:59.240 conserve energy so they adopted daylight savings time for about 12 years and then they stopped doing it another problem
00:12:05.519 some of the local times you want to talk about don't actually exist at all in my time zone so here's 2:30 a.m. on the day
00:12:12.600 on the spring forward day of this year now this time doesn't exist there's no
00:12:17.760 UTC time that maps to 2:30 a.m. so if you try to resolve that because you go forward an hour right as soon as the
00:12:24.160 clock ticks 2:00 a.m. on that Sunday you jump ahead to 3:00 a.m. so there are no
00:12:29.320 moments in time for which that's a valid uh UTC time and sometimes the conversions are ambiguous so if we look
00:12:37.480 at 1:30 a.m. on the Sunday in the fall when we go back an hour there are two
00:12:43.199 possible moments right because you arrive at 1:01 a.m.
00:12:48.560 the first time through and then it's 1:59 a.m. and then it's 2:00 a.m. and you roll your clock back and you have a
00:12:53.880 second 1:01 a.m. so there are two possible UTC values for that same local
00:12:59.519 time when there's more than one local time you have to be able to resolve that ambiguity somehow okay so maybe we can
00:13:06.880 fix this by telling each other all of the possible offsets when they happened
00:13:13.360 when they changed historically and then we can store that somewhere and share it with each other then we won't have this
00:13:18.880 problem anymore so that's what tzdata is this is a time zone database that's
00:13:24.519 uh managed by IANA or the Internet Assigned Numbers Authority and it's their job to
00:13:29.880 keep track of this database so we create one time zone for every distinct list of
00:13:35.079 historical offsets we have to remember so here's the Pacific time time zone uh
00:13:40.880 set of rules so you can see here's all the times that Pacific time changed uh that first line under the Zone header
00:13:47.680 near the bottom that's when Pacific time was actually established so you can see that it started in 1883 with Charles
00:13:54.199 Dowd and sometimes time zones are way more complicated than you would think they are because remember you need a new
00:13:59.959 time zone every time you establish a new set of rules so Indiana has lots of
00:14:05.759 different rules within its cities anybody from Indiana just out of curiosity you guys probably know that
00:14:10.880 it's difficult when you go between two cities in Indiana you might change time zones two or three times just driving
00:14:17.240 through different areas and so each one of those is a separate time zone just like Pacific Time or Central time or
00:14:23.720 whatever and that's distinct from all the others because that specific City or local has decided that they want to
00:14:29.639 start daylight saving time earlier or later or not observe it at all or whatever so times in Ruby uh really
00:14:37.440 center around two big classes Time and Date there's also DateTime which is uh
00:14:43.040 sort of funny if you go look in the documentation the header the uh first and only sentence
00:14:49.040 of the documentation in the Ruby core docs for DateTime is the single word datetime so not very helpful but uh
00:14:57.720 Time and Date are the two that you probably should be most concerned about if we're not assuming something like ActiveSupport then these are the two uh
00:15:05.040 plain old Ruby objects so you use Time to make time-offset-aware objects so when
00:15:10.600 you make a new time you're encoding the offset uh and it's by default it's your
00:15:15.800 local time offset into that time value dates however don't know about a time
00:15:21.240 offset and they don't know about what time zone you are and neither dates nor times know about what time zone you're
00:15:27.399 in so that's what uh the Ruby TZInfo library does for you this uses a
00:15:33.639 tzdata library and what this will do is get you a time zone object which you can
00:15:39.279 then use to make conversions between different uh time or date instances
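A small sketch of the Time versus Date distinction described above, using only Ruby's standard library. The specific date, time of day, and `-07:00` offset are assumptions for illustration. TZInfo is deliberately left out because it is a separate gem; this only shows what the plain objects do and do not carry.

```ruby
require "date"

# A Time built with an explicit offset remembers that offset.
# (Without the last argument, Time.new uses the machine's local offset.)
talk_time = Time.new(2014, 9, 19, 14, 30, 0, "-07:00")
puts talk_time.utc_offset        # -25200 (seconds east of UTC)

# A Date is only a calendar day: no time of day, no offset.
talk_date = Date.new(2014, 9, 19)
puts talk_date.iso8601           # 2014-09-19

# Two Times with different offsets can still be the same instant.
same_instant = Time.new(2014, 9, 19, 21, 30, 0, "+00:00")
puts talk_time == same_instant   # true

# Dropping the time and offset leaves just the wall-calendar day.
puts talk_time.to_date == talk_date # true
```

Note that neither object records a named zone such as Pacific time; the Time only knows a numeric offset, which is the gap a time zone database fills.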
00:15:44.959 here's a problem though Ruby unfortunately does not have a native time zone aware concept of duration or
00:15:52.360 period so that means we can't do some things easily uh so for example here's
00:15:57.440 here's that daylight savings time changeover on November 1st November 2nd and November 3rd if we try
00:16:04.600 to measure what the difference between those two dates is we get a different answer for the distance between November
00:16:11.120 1st and November 2nd than we do between November 2nd and November 3rd and that's because you can probably see the offset
00:16:18.279 changes from -04 hours to -05 hours there was an extra hour in the
00:16:26.560 November 1st to November 2nd change so that distance changes so ActiveSupport uh
00:16:33.680 has an advance method that it puts onto Time and this will let you do the period
00:16:39.279 stuff sort of um so it can resolve the ambiguous time stuff we saw before but the problem
00:16:46.199 is that it may not always work the way you expect uh so for example if I want to advance the date January 30th
00:16:55.319 by two months I might expect that the correct value would be March 30th but if I do that two times in a row with one
00:17:02.440 month each I get a different answer because the first month advances to February 28th and the second month
00:17:08.839 advances a month from February 28th to March 28th which is a different value than March 30th so it's not an
00:17:14.160 associative operation so in conclusion with times they're really really hard to get right we want to make sure that we
00:17:19.799 always store and work in UTC you should probably let your library handle everything don't invent anything from
00:17:25.880 scratch and be aware of the limitations of whatever particular time library you like to use uh and just a special plea if
00:17:33.760 anybody's seen Joda-Time in Java that's really awesome and you should try to port that to Ruby because that would
00:17:39.679 make a lot of these time problems go away so next problem floats floating points why are these hard uh I think if
00:17:46.360 you look even cursorily on stack Overflow or just Google on the internet for problems that people are having with
00:17:52.520 floating points um they'll insist that something is wrong with their computer or that uh something is broken about the
00:17:59.280 language or a library they're using here's one guy complaining that JavaScript is or JavaScript may be
00:18:05.840 broken for other reasons but uh but it's it's doing the right thing at least for floating points here so is it actually
00:18:12.840 broken and the answer I think is no it's just not a very obvious mental model so we made a lot of assumptions about how
00:18:18.640 we thought people would use floats with these uh standards that were developed to use floating points and we've wound
00:18:24.120 up with answers that aren't great so we can see that 1 + 2 = 3 but with 0.1 + 0.2 we get
00:18:31.559 a result that looks like it's 0.3 but when we look a little bit deeper we notice that we don't wind up with
00:18:37.440 exactly 0.3 there's some extra digits at the end so it's not exactly representing
00:18:42.480 this value that we thought it should be exactly representing what's even weirder is that for some values we do get an
00:18:48.880 exact representation so if I add 0.25 plus 0.25 it is truly representing 0.5
00:18:55.799 exactly so why does that happen how can it be that these
00:19:00.840 will be so inconsistent or apparently inconsistent and I'm not going to have time to do it too much today but if you
00:19:06.320 want to go play around with the internals of floating points I just put up a quick little Library last night
00:19:11.559 that lets you open up the uh internals of Ruby and look at the bit strings that are corresponding to floating Point
00:19:17.799 numbers that you might generate so the first thing to understand here is something called the pigeonhole principle anyone heard of the pigeonhole
00:19:24.039 principle okay great so if you're a computer scientist maybe you've heard this term and the pigeonhole principle is
00:19:29.360 pretty simple all it says is that if you have n objects and you have M places to
00:19:35.400 put them then if you have more objects than places to put them you're going to have at least one place to put them with
00:19:41.840 more than one object right so here's an illustration of this if I have three pigeons and six slots to put the pigeons
00:19:47.760 in I can fit uh I can fit a pigeon into each slot without uh without a problem
00:19:53.440 but if I have more pigeons than slots some of the pigeons are going to have to share a slot right so these two
00:19:59.240 are going to have to share a slot so if I imagine the integers in a similar way
00:20:04.400 we can say okay the integers are basically like the pigeons there are a set of different uh uh objects that we
00:20:10.159 might want to store somewhere and let's say that we allocate one byte worth of space to store integers and that would
00:20:17.480 give us eight bits of space to play around with to store those integers somewhere and of course since a bit is
00:20:23.240 short for the word binary digit we've got one space to put all that stuff so
00:20:29.080 we can put a zero or a one in each of those slots and we have two choices there so a zero or one we have eight
00:20:36.200 choices to make so that's 2 to the 8th possible values or 256 possible values
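The counting argument above can be checked directly. This is a rough sketch, not anything shown in the talk: it enumerates every 8-bit pattern and then reads one pattern two different ways, using `pack`/`unpack1` from core Ruby. The constant name `BITS` and the choice of the all-ones pattern are illustrative assumptions.

```ruby
BITS = 8

# Every distinct way to fill eight slots with a 0 or a 1.
patterns = (0...2**BITS).map { |n| n.to_s(2).rjust(BITS, "0") }
puts patterns.size   # 256
puts patterns.first  # 00000000
puts patterns.last   # 11111111

# The slots carry no meaning of their own. The same byte can be
# read as an unsigned integer or as a signed one.
all_ones = [0b1111_1111].pack("C") # one byte with every bit set
unsigned = all_ones.unpack1("C")   # read as unsigned 8-bit
signed   = all_ones.unpack1("c")   # read as signed 8-bit
puts unsigned # 255
puts signed   # -1
```

The pattern never changes between the two reads; only the mapping from pattern to value does.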
00:20:41.400 and these values can map to anything right we could say that they are 0 to 255 or -128 to 127 or we could pick some
00:20:50.000 arbitrary numerical range it doesn't matter they don't even have to represent numbers right we could assign the values to colors or cat names or musical notes
00:20:56.840 or UTF-8 characters and if we give ourselves more space we can add more possible values so if we have
00:21:03.159 a 32-bit place to store things we can store about 4
00:21:08.720 billion possible values so if we assign the integers to each slot we can do it
00:21:14.720 and one natural way to do it is just to map each consecutive integer starting from zero to a slot but it can be
00:21:20.840 arbitrary like we said before we could decide that the first slot corresponds to 183 and then just randomly fill in
00:21:26.919 the other 255 values if we wanted to but it's easier if we stick with sequential values so we'll use that so floating
00:21:34.240 points we saw what integers look like but what if I told you that there are numbers other than integers right so
00:21:41.520 here's an example of this um physicists often care about very large and very small
00:21:48.360 numbers and uh those large and small numbers can't exactly be represented by
00:21:53.559 integers so one notation that scientists like to use is called scientific notation and this
00:22:00.000 breaks up each of those values into three components there's a sign that's either positive or negative there's a
00:22:06.799 fraction which is a number between zero and the base of the representation so in
00:22:12.720 this case we're representing things in base 10 so this is a number somewhere between zero and base 10 and then the
00:22:18.159 exponent which is the power that's on the power of 10 that you're raising it to so we can say that the sign or which
00:22:24.159 we'll call S here is a number that's either zero or one representing positive or negative values uh one means negative
00:22:30.799 Z means positive F which will be a value that's somewhere between zero and the base that we're working in minus one and
00:22:38.120 then an exponent which will be somewhere in between a range of possible exponents that we can store so all floating Point
00:22:44.400 values can be represented by a tuple of s e and f those three values
00:22:50.000 determine the choices you're making in terms of how you want to store those bits so IEEE 754 is a standard that
00:22:57.799 describes how that's going to work and the one in particular that we care about is called
00:23:02.960 binary64 this is the one that Ruby uses so every floating point value is allocated 64 bits and they're split up
00:23:10.600 in this way uh the IEEE standard calls what we call the fraction the mantissa and the reason it's not
00:23:17.840 the same as the fraction is the mantissa is actually 1 point the value there's an
00:23:23.440 implicit one at the front and remember since this is a binary choice uh since we're in base 2 all the values
00:23:30.360 would be somewhere between 0 and 1 so that's what the fraction is and the exponent for reasons we won't get into
00:23:36.679 the value that's actually stored is 1023 plus the exponent you actually care about so if the exponent is say two
00:23:43.840 you will store the value 1,025
00:23:49.000 so uh so that brings up another problem which is we have a fixed
00:23:56.559 amount of space to store this stuff but how many values are there in between let's say -2.7 and 3 well there are an
00:24:05.880 infinite number of possible numbers that we could store there and what happens if we try to store an infinite number of
00:24:12.559 pigeons in a finite number of slots it's going to get really crowded so some of
00:24:18.720 the values that we'd like to represent cannot be exactly represented that's the Crux of the floating Point problem we
00:24:25.559 would like to represent all possible values with floating points but what winds up happening is only a subset of
00:24:31.559 those values can be represented exactly and in fact since there's an infinite number of possible values we could
00:24:37.559 represent but only a finite number of values that we can represent that means
00:24:42.760 that virtually all values cannot be represented exactly there are it's the exception to the rule that a value can
00:24:49.440 be represented exactly with floating points right you have an infinite number of pigeons uh only only a very small
00:24:55.840 number of them will be able to fit into these slots without crowding so that's the Crux of the uh
00:25:02.120 floating Point problem we don't have time to do a demo but I encourage you to go check out the uh the library there
00:25:08.200 and I'll I'll I'll be happy to answer questions about that after the fact so the the problem here is that we have
00:25:15.799 this infinite number infinite set that we'd like to represent in a finite number of values and that's the Crux of
00:25:22.880 the floating point issue so if you are tempted to write your own floating point library or something that translates
00:25:28.840 stuff into strings and then back into decimals please don't do that um please avoid doing that use BigDecimal if
00:25:35.480 you'd like to represent values exactly um Ruby also has a Rational type that's very nice so if you're representing
00:25:41.960 fractions you can use uh 8/3 and then an r suffix will give you a Rational value
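A short sketch contrasting the three number types mentioned here. The particular values are the ones the talk uses (0.1 + 0.2, 0.25 + 0.25, 8/3); the variable names are illustrative. It assumes the `bigdecimal` library is available, which it is in a standard Ruby install (as a bundled gem in recent versions).

```ruby
require "bigdecimal"

# Binary floats: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly the float nearest to 0.3.
float_sum = 0.1 + 0.2
puts float_sum == 0.3                   # false

# 0.25 and 0.5 are powers of two, so this sum is exact.
puts 0.25 + 0.25 == 0.5                 # true

# BigDecimal does decimal arithmetic, so the same sum is exact.
decimal_sum = BigDecimal("0.1") + BigDecimal("0.2")
puts decimal_sum == BigDecimal("0.3")   # true

# Rational literals (the r suffix) keep an exact numerator/denominator.
rational_sum = 1/10r + 2/10r
puts rational_sum == 3/10r              # true

eight_thirds = 8/3r
puts eight_thirds * 3 == 8              # true
```

Building the BigDecimal values from strings matters: passing a float such as `0.1` would carry the float's rounding error into the decimal value.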
00:25:49.279 and never use floats for a precise calculation so if you care about the exact value you shouldn't be using a
00:25:54.440 floating point so finally names why are names hard because humans made them that
00:26:00.360 way everyone should be reading this article after the talk it's by a guy named Patrick McKenzie great article
00:26:06.039 called falsehoods programmers believe about names and I'll put the slides up afterwards so the fundamental problem is
00:26:11.279 that names are a possibly empty set of strings that is someone may not have a name that map to something a person a
00:26:18.799 place a location and there are uh so there may be many names for the same
00:26:24.559 person place or thing but we don't model things that way we just assume names fit into this very strict rule about or very
00:26:32.399 strict structure about how we think they should work so for example a lot of systems assume that someone in the west
00:26:38.480 has a first and last name then an optional middle name then like maybe a suffix and a title that's true
00:26:46.399 for very few people uh in the world and many systems treat names
00:26:52.279 differently depending on which part of the name they're talking about they may require that your first name be no
00:26:57.600 longer than 20 characters but your last name be no longer than 35 characters and the total uh string be no longer than 40
00:27:04.559 characters or things like that so this always leads to inevitable disaster and
00:27:10.679 that's formulated most precisely in something called the Scunthorpe problem so the Scunthorpe problem is
00:27:17.000 named after a town in England and this town contains an offensive word in the
00:27:22.919 characters 2 through 5 but that town also contains about 100,000 people who live there all of
00:27:30.320 whom at some point or another get caught by overzealous filters this woman named Linda Callahan
00:27:37.799 uh couldn't sign up for a Yahoo email account because Callahan contains the string Allah which at the time Yahoo was
00:27:45.559 Banning this is after September 11th and Yahoo is banning all accounts that uh
00:27:50.600 contain that name why did they do that because they thought it was a good idea even though it was clearly a terrible idea you couldn't register this domain
00:27:57.600 name for the first 10 years of the internet's life because I can the canonical registar of the time forbade
00:28:05.039 people from registering any of the seven dirty words that uh George Carlin came up with and this contains an offensive
00:28:11.399 or offensive to some people in the first four characters so you couldn't register it so InterNIC I'm sorry not ICANN but
00:28:18.919 InterNIC prohibited domains that had profane terms in them so you can probably guess based on the previous ones what's wrong with this name so he
00:28:25.960 got this guy couldn't register Coburn is how you pronounce that at
00:28:31.360 hotmail.com and see if you can guess what happened here his email type his not just his uh once he was able to
00:28:37.600 register it he also had trouble sending email because it kept getting caught by spam filters and his title was software
00:28:43.399 specialist can you guess why that's a problem well it's because it contains the string Cialis which is highly
00:28:49.360 associated with spam so Google+ banned
00:28:55.840 real people for having names that looked fake like me my name's John Feminella I think that's a pretty weird name uh
00:29:02.039 maybe it sounds fake to some people I don't know but there are people with way weirder names than me on Google Plus so
00:29:10.120 I feel like Dr if anyone here is from Google I would like you to answer for Dr Loki Skylizard
00:29:17.559 or how about this University of Alabama football player Ha Ha Clinton-Dix real football player actually has
00:29:25.240 stats has played in games and names are awesome we you know we should be proud of these names these are really sweet
00:29:31.840 names and I wish I had names that were as cool as some of these people but the problem is that we make a lot
00:29:38.360 of we make a lot of bad assumptions about how names work and some
00:29:44.279 assumptions that uh we make that aren't good are that names don't change so imagine if someone gets married and changes their last name terrible
00:29:51.039 terrible uh assumption to make another assumption we might make is that legal names don't change without going through a court but think about the case where
00:29:58.039 someone enters witness protection or that people have a single canonical name how many people have
00:30:03.919 nicknames right or that people have a single canonical name but for legal or financial purposes so forget about the
00:30:10.279 nickname thing but what about if you have credit reports that aren't in sync with each other what if you got married and changed your last name now you'll have
00:30:16.120 two different reports that match the same thing another problem is that some people think names are
00:30:21.679 unique not true 820,000 people have this Chinese name 46,000 people are named
00:30:27.200 John Smith 120 people are named John Feminella another problem is that some
00:30:32.720 people assume names will have capital letters in them not true bell hooks a prominent feminist e e cummings a
00:30:38.799 prominent poet or that names won't contain numbers you know surely nobody's name would
00:30:45.120 contain a numerical value this is an actual New Zealand child's name Number 16 Bus
00:30:51.080 Shelter or that names won't contain non-alphanumeric characters right how about
00:30:56.760 Jay-Z right there's a hyphen in uh there's hyphen in both his stage name and his actual legal name okay fine but
00:31:04.399 names are always Unicode characters surely no one could have a name that wasn't a Unicode character how about
00:31:12.760 Prince and finally what about the fact what about the assumption that people have names right like probably everyone
00:31:18.919 in this room has a name but is that required are people actually required to have names the answer is no you're not
00:31:25.399 legally required to have a name you're that might make your life really hard but there's no actual requisite uh value
00:31:32.240 that someone have a name so the conclusion I think that we should learn from this is that you shouldn't filter inputs with the name of anything that's
00:31:38.200 a real world person place or thing and you should probably just have a single unrestricted field for whatever the name
00:31:44.240 value is so thanks very much for having me and I appreciate all of your uh
00:31:49.519 awesome comments on Twitter from earlier today if you have any questions I'd love to take them afterwards thanks very much
00:31:59.639 me I'm going to duck in it on
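The filtering failures described in the talk can be reproduced in a few lines of Ruby. What follows is a minimal sketch, not code from the talk: the blocklist contents, the `STRICT_NAME` pattern, and the names `naive_filter_rejects?`, `strict_pattern_accepts?`, and `Person` are all assumptions made up for illustration. It shows a substring blocklist rejecting legitimate input, a "reasonable-looking" name pattern rejecting real names mentioned in the talk, and a name model that is simply a possibly empty list of unrestricted strings.

```ruby
# Hypothetical blocklist, for illustration only.
BLOCKED_SUBSTRINGS = %w[allah cialis].freeze

# Naive filter: rejects any input that contains a blocked substring,
# regardless of the word it appears in.
def naive_filter_rejects?(text)
  downcased = text.downcase
  BLOCKED_SUBSTRINGS.any? { |term| downcased.include?(term) }
end

# Hypothetical "strict" rule: every part of a name is one capital
# letter followed by one or more lowercase ASCII letters.
STRICT_NAME = /\A[A-Z][a-z]+\z/

def strict_pattern_accepts?(name)
  name.split.all? { |part| part.match?(STRICT_NAME) }
end

# Sketch of the model the talk argues for: a possibly empty list of
# names, each stored exactly as given, with no filtering.
Person = Struct.new(:names)

puts naive_filter_rejects?("Linda Callahan")      # legitimate surname, rejected
puts naive_filter_rejects?("Software Specialist") # legitimate job title, rejected

["bell hooks", "Jay-Z", "Number 16 Bus Shelter"].each do |name|
  puts "#{name}: accepted? #{strict_pattern_accepts?(name)}"
end

unnamed = Person.new([])
renamed = Person.new(["Number 16 Bus Shelter", "Bus"])
puts unnamed.names.empty?
puts renamed.names.length
```

Both filters fail on inputs taken from the talk, while the unrestricted `Person` model accepts all of them, including the case of someone with no name at all.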