All posts by jmooney

A Slight Rant about the College Football Playoff

I’ve gotten into it from time to time with other college football fans about the nature of championships in Division I-A (ahem, Bowl Championship Subdivision). The basic controversy is this: since teams are assigned to the four playoff berths by a committee, how should they evaluate the teams? Is it the best teams? Or the most deserving teams? To put it more concretely, should it be the teams who have the best chance of beating any other team at the end of the season? Or should it be the team that has put together the best resume over the course of the season?

Of course, the two characteristics are not mutually exclusive; Alabama this year is probably the best team, and has (so far) put together an undefeated record. Accordingly, they are ranked #1. However, they are about to play a team (Auburn) who, lately, have shown they are a threat to beat anybody at this point in the season, but earlier this year lost two games. Even after this game is played and there is a winner and a loser, who is to say the result didn’t hinge largely on luck? The committee may still find a way to put Alabama in the playoff even if they lose this game. And there are many other cases like this to consider.

Meanwhile, University of Central Florida is undefeated this season so far. If they continue, they will be their conference champions. However, for a combination of reasons they will never be considered for the playoff because

  1. They have no football pedigree or brand. Unlike Alabama or Ohio State, or even Pitt, they have no history of football success and have only recently begun playing at this level.
  2. Their opponents have been weaker so the team hasn’t yet been tested to the limits of their ability; however, a school of rising strength will consistently struggle to schedule stronger opponents because the stronger opponents don’t have any incentive to play them.
  3. Their opponents similarly have no pedigree, and thus Central Florida gets less credit for beating University of South Florida than they would for beating Tennessee, though Tennessee is undoubtedly weaker.
  4. When it gets right down to it, the College Football Playoff is about money, and Central Florida will never draw as much money as the blue-chip programs at Alabama, Texas, Notre Dame, and so on.

This is a fundamentally flawed system, because it relies on a “beauty contest”–a systematic beauty contest, but a beauty contest nonetheless. In other words, a team has to rely on the opinions of a handful of people to even get into the game–people who carry into the room their own preconceived notions and biases. This is anathema to American sports–a team can do everything right and still get left out.

I propose a simple system for an eight-team playoff that a) ensures that non-legacy teams (e.g. Utah, Central Florida, Boise State) have an avenue to compete for the championship, b) still allows for an advantage based on strength of schedule, and c) makes conference championships matter. Here is how it goes:

  • There are eight playoff teams.
  • Five of the teams will be the champions of the SEC, ACC, Big 10, Big 12, and Pac-12, however the individual conferences decide to determine their champions.
  • One of the teams will be the highest ranked undefeated conference champion from the group of all other conference (American, Mountain West, etc.) If no teams meet this criteria, this will be an at-large team (see next bullet).
  • The other two teams will be selected by the committee at-large based on the committee’s assessment of the teams’ strength/deservedness.
  • Finally, the seeding will be selected by the committee based on the committee’s assessment of team strength as well as considerations of matchups and bowl venues.

This year, what that might look like, assuming conference championships are won by the current frontrunners, would be (1) Alabama vs. (8) Central Florida, (2) Clemson vs. (7) Stanford, (3) Oklahoma vs. (6) Wisconsin, (4) Georgia vs. (5) Ohio State.

In this scenario, Alabama gets an “easy” first round game as reward for a great regular season and conference championship; UCF and Wisconsin get a shot at the title despite the apparent weakness in their schedules; highly talented teams that may have just had a hiccup get a second chance; and most importantly we get 6 or 7 high-stakes marquee matchups that college football fans would drool over to end the season.

Riddler, 9 November 2017

For once, it seems like the Riddler Express is actually more challenging than the main Riddler puzzle:

The regular hexagon below has an area of 1. What is the area of the shaded region?

If you draw lines from every vertex of the hexagon to every other, you can see that these lines form another, smaller regular hexagon which is composed of 6 of the shaded regions from above. This smaller hexagon has a minor diameter equal to the length of one of the larger hexagon’s sides.

The area of a regular hexagon with respect to one of its sides is

A = \frac{3 \sqrt{3}}{2} s^2

For a unit area hexagon, the side length is thus

s = \sqrt{\frac{2}{3 \sqrt{3}}}

And the minor radius is half that amount. The area of a hexagon with respect to its minor radius is

A_{small} = 2 \sqrt{3} r^2 = 2 \sqrt{3} \left(\frac{1}{4} \frac{2}{3 \sqrt{3}}\right) = \frac{1}{3}

And the area of the shaded area is one sixth of that, or 1/18.

On to the big Riddler:

The largest circle below has a radius of 10, the medium circle has a radius of 5 and the small, orange circle has a radius of 2. The orange circle crawls counterclockwise along the edge of the largest circle until it meets the medium circle, at which point it crawls up along the edge of the medium circle until it reaches the crest. What is the area of the shaded orange region in the right image?

This is pretty simple, if you notice that the shaded area can be split into two regions: the large radius and the small radius. Each region’s area is equal to the difference of two circles. Finally, it’s unclear whether the total area includes the two orange circles, or not, or if the area includes half of each circle.

A = \frac{\pi (12^2 - 8^2)}{2} + \frac{\pi (7^2 - 3^2)}{2} = 60 \pi

If we include the area of the small circles, we add 4 \pi, and if we don’t, we subtract it.

Why the American Health Care System is a Mess: Perspective from an Ordinary Schlub

When I was an undergraduate, USMA required an introductory course in microeconomics. I can’t say I remember all that much; yadda yadda supply and demand curves, marginal cost vs. marginal revenue, yadda yadda. But one thing I do remember this idea of an ideal market, where it was proven that resources are allocated optimally; I also remember that no real markets are ideal, but the general consensus is that small deviations from ideal result in small inefficiencies, and large deviations result in large inefficiencies.

This whole set of ideas rushed back to mind when I saw a video from Vox about how much having a baby costs (side note: we just had a baby, too!), and more importantly, how hard it is to find out how much it costs before you get the final bill. It also reminded me of health care coverage for me and my family while I was a graduate student; we had only one choice for insurance with a premium that cost more per year than our rent, with a significant deductible, and an insurance company that could not be reached by phone even after they mistakenly drained my bank account the day before our rent was due. Yet, even with all those problems, any other path would have been even more exorbitant and out of reach for a graduate student.

The Vox video is embedded at the bottom of this post, but I want to go a little bit beyond. I don’t know what the exact right policy changes are, the below named problems of the health insurance market that can be seen by a consumer in this market need to be fixed before we start getting rid of the large market inefficiencies. To that end, I’ll suggest a few general policy ideas.

Because I only focused enough in Econ 2001 to do well in class but not enough to remember everything, I leaned on Wikipedia to remind me all the pieces of a perfect market. Briefly, I’ll the parts of the health care/insurance market that SEEM aligned with a perfect market before pointing out the non-ideal aspects.

How the health care market is ideal:

  • A large number of buyers and sellers.
  • Every participant is a price taker (no participant has market power to set prices). You might argue this one, but I would claim that even the biggest of Big Pharma or biggest insurance company doesn’t have anything near monopoly power.
  • Perfect factor mobility, i.e. Doctors and medical equipment/facilities can move to where the demand for care is.
  • Non-increasing returns to scale and no network effects. In the health care context, there’s a limit to the number of patients a single doctor or other provider can see.
  • Homogeneous products. This is iffy, I guess, but fundamentally doctors provide care that is far more similar to each other than, say, cars or furniture.
  • Well defined property rights that determine what may be sold, as well as what rights are conferred on the buyer.

And how it is not:

  • Perfect information: All consumers and producers know all prices of products and utilities each person would get from owning each product. This is first and most obvious on the list, and the one that prompted the Vox video. It’s not just having a baby; I challenge you to find out how much an appendectomy, or even a simple exam or prescription costs without actually having to buy it. You can’t! And how can a consumer shop for the right price if you can’t find it out ahead of time. What’s worse is that even if the hospital could tell you the price, they’re probably not able to tell you the price the insurance company will actually pay for it, nor your responsibility in that bill.
  • No barriers to entry or exit. The American health care system has huge barriers to entry and exit: doctors and other providers pay huge sums for education and licensure. Health care consumers, since they typically rely on employer-based insurance coverage, face the obstacle of having to have a good enough job to get decent coverage. A critical and currently insurmountable barrier is that insurers are not legally allowed to sell insurance outside the borders of the state they are registered in. While there are national health corporations (Blue Cross/Blue Shield and Kaiser Permanente being two examples), if you read the fine print, there are individual corporations for each state, and you can only buy the coverage for the state where you live.
  • Zero transaction costs. Any provider, whether they see 1 patient a year or 10,000, pays malpractice insurance; the multi-million dollar claims on this insurance make the premiums borderline exorbitant.
  • Profit maximization of sellers. Many (maybe most?) health providers are non-profits to begin with; even if you asked doctors in for-profit hospitals whether they were trying to maximize profit, I’m sure they would blanch.
  • Rational buyers: Buyers make all trades that increase their economic utility and make no trades that do not increase their utility. This element could go in either category; patients by and large can make good decisions about the health care they consume if presented with the right incentives, but I also know how irrational people can be when they are presented with certain circumstances.
  • No externalities—Costs or benefits of an activity do not affect third parties. This criteria also excludes any government intervention. Obviously based on most of the previous points I’ve mentioned, there are externalities in this market.

These deficiencies naturally point to a handful of policies that would actually help:

  • Require providers to make an itemized cost list available to patients BEFORE they select a provider or request care; and insurers must make it easy to see any deductible or cost share for this list. This would allow people to shop for care and make informed decisions. Require providers to charge those who pay cash for medical services be charged at the lowest rate negotiated by insurance companies, so that the price charged truly reflects the price of service.
  • Create “Yelp for doctors” that makes it easy for patients to register feedback and other patients to see that feedback, as well as some kind of quality rating of various doctors. Granted, measuring doctor quality is an inexact science at best, but there IS useful information to be had.
  • Allow patients to own their medical data and easily port it from provider to provider, as well as read and reference it themselves. This allows people to, if they choose, make themselves better informed to make the right decisions.
  • Everybody pays something for care, maybe means-adjusted. Even a nominal fee of a few dollars would provide incentive for people to consider whether it’s worthwhile to consume healthcare.
  • Give patients the option to cap potential malpractice claims in exchange for reduced cost.
  • Allow national, or at least regional competition of healthcare providers.
  • Tax healthcare benefits to untie health insurance. While I understand that health insurance is not required to be employer based, it almost always is so, because of the tax benefits, which makes the non-employer-based market tiny. New taxes are never popular, so this could be possibly offset by relief in marginal rates or a higher standard deduction, for example.

Finally, and I realize these kind of go against my free-market tendencies, but

  • emergency medical services should be fully government-funded, if not government-run. I say this because services to save life, limb, or eyesight are time-critical, cannot be “shopped around” and a benefit afforded to all people through a basic moral obligation. This is a life-saving service, same as the police or firefighters, and the immense cost of an ambulance and ER stay needs to be out of the equation when somebody needs to call 911; to keep a cap on people abusing the free service, people who frivolously call an ambulance could be subject to the justice system, same as those who call in fake crimes or fires. Perhaps this government service could be a uniformed service, with those who serve accorded the honors given to policemen and firefighters and soldiers.
  • basic pre-natal care, childbirth, and pediatric care to say, age 12, should be free. I don’t think that there is a right to health care for kids, but a) we should minimize the damage done to a person’s whole life by their parents’ inability to provide for them, and b) the investment in childhood wellness is a productivity boost and cost savings as those people are healthier as they age.

Yet Another Matlab Vs. Python Post — But This Time, for Aerospace GNC Engineers

There exists a lively debate on the internet about the goodness of Matlab as compared to Python.

That battle has not passed by my workplace. I work at X, formerly Google[x], as an aerospace guidance, navigation, and control (GNC) engineers on Project Wing. On one hand, we have a stable of talented GNC engineers who have worked on big aerospace projects for Boeing, Rockwell Collins, Blue Origin, and NAVAIR. On the other hand, we have perhaps THE best software engineers in the world, who have worked on (for example) Google Search, Youtube, Android, and Google Maps.

We aerospace engineers learned and used Matlab from our university days, continuing through our professional careers. Airbus and Boeing use Matlab to design their autopilots and use the Matlab Coder to produce C code, which flies their commercial aircraft every day. Companies like Raytheon trust Matlab to generate similar code for their missile control systems. The Matlab Aerospace Toolbox, Control Design Toolbox, and Simulink are invaluable tools that speed up the controls design and analysis process.

The Google software engineers were kind of appalled at our heavy use of Matlab. In their eyes, it’s not a “real” programming language. “The indexing is wrong.” “You can’t scale it.” “You have to buy somebody else’s software just to run yours.” Most frustrating of all is that Google’s amazing array of software development, testing, and review tools are all but incompatible with Matlab, making the codebase that much more difficult to maintain.

First, a defense of our Matlab use: If you hired a carpenter to work on your house, and he brought a funky Japanese hammer, how would you react? Would you chide him that his hammer doesn’t look or work like you’re used to? Or would you assume that as a craftsman, he’s brought the tool that can serve him best for the job you’ve hired him to do? The same with Matlab: most of the control design and analysis tasks that we’ve been hired to do, we know how to do best with Matlab + toolboxes. Yes, we COULD learn Python and re-create the calculators, etc. that power our analyses, but we’re trying to get a product out the door, and the latter choice just doesn’t seem like a wise use of our limited time.

Second, a defense of our software engineers: Matlab isn’t a great programming language; it HAS grown to be a racket, and people in my line of work probably need to move on to a better toolset.

With that in mind, I wanted to outline, after having used both a good bit, some factors why and why not a student or an organization in the aerospace GNC world would switch from Matlab to Python.

Cons of switching to Python from Matlab:

  • Plotting functionality is much less convenient:
  • >> x = 0:0.01:pi;
    >> plot(x, sin(x))
    

    Becomes

    In[0]: import numpy as np
    In[1]: import matplotlib.pyplot as plt
    In[2]: x = np.arange(0, np.pi, 0.01)
    In[3]: plt.plot(x, np.sin(x))
    In[4]: plt.show()
    
  • Additionally, the interactive-ness of Matplotlib plots is far less; there’s no built-in datacursor, and you can’t select plot elements with the mouse.
  • Matrix manipulation is less convenient.
  • If you’re already got a code base in Matlab that you depend upon, a switch could be quite painful.
  • There is nothing even close to a replacement for Simulink.
  • The documentation for Python modules is not uniform (though the documentation for the biggest modules and the native pieces is very good).
  • Debugging is harder and takes a change of mentality to test-driven development.

Pros of switching to Python from Matlab:

  • Python is free and open-source, which means
    • The direct monetary cost of switching is (near) zero.
    • The number of libraries, modules, toolboxes, etc. is astoundingly high, also mostly free and open-source. This includes functionality found in many of the most popular Matlab toolboxes such as the controls toolbox and the image processing toolbox.
    • The ability of aerospace engineers who use Python to integrate their work with other teams, especially ones with software engineers, is drastically improved, especially those working outside Big Aerospace.
    • The quality of Python and its toolboxes aren’t dependent upon a single organization (Mathworks) and instead can draw from a very wide community.
  • Python has features of a “more serious” programming language that make it much more appropriate to scale up for production purposes–for example, namespaces and mature unit testing support.
  • Python is faster to load and typically runs programs faster than Matlab because it only loads modules when you need them.
  • Wide variety of development environments: command line, PyCharm, Eclipse, Spyder, IPython Notebook/Jupyter, just to name a few.

Recommendations:

  • University Aerospace Engineering departments adopt Python as their language of choice for Computer Science 101 courses, and as the language that supports all their undergraduate and graduate coursework.
    • Cost: while the student version of Matlab is cheap, university licenses are not. Python is free, full stop.
    • Transition ability: It’s much easier to learn to program in Python and switch to Matlab than the other way around, and with the rise of the drone economy, many many more aerospace jobs will switch to startup-like companies who are much more likely to use Python as their lingua franca.
    • The open source model much more closely mirrors the ideals of scholarship, academic freedom, and information sharing.
  • Everyone use Matlab for what it’s really good at: things like quick & dirty prototyping, concept development, etc. Avoid trying scale or production-ize Matlab code, especially if it’s not explicitly supported by the organization’s version control and testing systems.

Riddler, 14 July 2017

From FiveThirtyEight:

Congratulations! The Acme Axegrinders, which you own, are the regular season champions of the National Squishyball League (NSL). Your team will now play a championship series against the Boondocks Barbarians, which had the second-best regular season record. You feel good about Acme’s chances in the series because Acme won exactly 60 percent of the hundreds of games it played against Boondocks this season. (The NSL has an incredibly long regular season.) The NSL has two special rules for the playoffs:

The owner of the top-seeded team (i.e., you) gets to select the length of the championship series in advance of the first game, so you could decide to play a single game, a best two out of three series, a three out of five series, etc., all the way up to a 50 out of 99 series.
The owner of the winning team gets $1 million minus $10,000 for each of the victories required to win the series, regardless of how many games the series lasts in total. Thus, if the top-seeded team’s owner selects a single-game championship, the winning owner will collect $990,000. If he or she selects a 4 out of 7 series, the winning team’s owner will collect $960,000. The owner of the losing team gets nothing.
Since Acme has a 60 percent chance of winning any individual game against Boondocks, Rule 1 encourages you to opt for a very long series to improve Acme’s chances of winning the series. But Rule 2 means that a long series will mean less winnings for you if Acme does take the series.

How long a series should you select in order to maximize your expected winnings? And how much money do you expect to win?

This is a probability problem, and a good bit easier than I anticipated. Basically, the expected winnings is (one_million_dollars – price_per_win * num_wins_required) * probability_of_getting_required_wins. Calculating the probability of getting the required wins is simply

P(\text{winning n-game series}) = \sum_{i = \frac{n + 1}{2}}^{n} \begin{pmatrix} n \\ n - i \end{pmatrix} p^i (1 - p)^{n - i}

Where \begin{pmatrix} n \\ n - i \end{pmatrix} is defined by Pascal’s triangle.

I wrote this in Python to compute the expected winnings. The answer is: a 23-game series nets expected winnings of $736,222.

#!/usr/bin/ipython

import numpy as np
import matplotlib.pyplot as plt

def probwin(n, p):
    pascals_triangle = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
      pascals_triangle[i, 0] = 1
      pascals_triangle[i, i] = 1
      for j in range(1, i):
        pascals_triangle[i, j] = (
          pascals_triangle[i - 1,j-1] + pascals_triangle[i-1,j])
    P = 0.
    minwin = (n + 1)/2
    maxwin = n
    for i in range(minwin, maxwin + 1):
      P += pascals_triangle[n, n - i] *(p**i * (1 - p)**(n - i))
        
    return P

def expected_income(n, p):
    return (1 - 0.01*(n + 1)/2)*probwin(n, p)

if __name__ == "__main__":
    
    num_games = range(1, 100, 2)
    
    J = np.zeros(50)
    
    prob_individual_win = 0.6

    for i in num_games:
       j = (i + 1)/2
       J[j - 1] = expected_income(i, prob_individual_win)

    print np.argmax(J) * 2 - 1
    print J[np.argmax(J)]
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(np.array(range(1,100,2)), J)
    plt.ion()
    plt.show()

And below is the curve of expected winnings using a modified version of this script for win probabilities of 0.4 thru 0.9 (I assumed that if we won less than 40% of our regular season games, we’re probably not going to make the championship!) The takeaway? If you have a 50% chance of winning any particular game, take a shot at the 1-game series–anything could happen. If you have better than 50/50 odds, you probably want to have a long series–something is better than nothing, and you’re favored to win over the long haul. But as you become more dominant, you need to play less games to be assured of a win, so you can expect to win more with shorter series (though it’s not worth risking it all on a 1-game series!).

Expected winnings chart

Riddler Express, 14 July 2017

From FiveThirtyEight:

You and your two older siblings are sharing two extra-large pizzas and decide to cut them in an unusual way. You overlap the pizzas so that the crust of one touches the center of the other (and vice versa since they are the same size). You then slice both pizzas around the area of overlap. Two of you will each get one of the crescent-shaped pieces, and the third will get both of the football-shaped cutouts. Which should you choose to get more pizza: one crescent or two footballs?

The answer is, the two footballs. I enjoyed this problem, but I don’t enjoy typing out equations in LaTeX if I don’t have to, so below is my paper scratching showing how to do it. The process is basically this: the footballs are twice the area of a circle segment/secant, and the crescent is the area of a circle minus the area of a football. The problem is scale invarient, so it doesn’t matter how big the pizzas are, and the angle of the secant is \frac{2 \pi}{3} because the two centers and one point of intersection form an equilateral triangle.



A Model for a Modern-day Shop Teacher

When I was in high school, my shop teacher was this grizzled old blue-collar guy who seemed genuinely resentful to have to be a shop teacher. Moreover, as a college-bound student who was interested in physics and math, he didn’t have much interest in me, other than letting me check off the vocational credit requirement for graduation. He did brighten up a bit for the handful of students who showed interest in becoming a plumber or mechanic after high school.

For me, however, the only thing that came out of shop was the sense that I couldn’t seem to cut wood in a straight line or nail anything together square.

Fast forward 20 years. I finished college and went on get a masters and (hopefully soon) a PhD. At the same time, I’ve found a real joy in being able to make things that, from the perspective of the high school shop, would have seemed out of my range: a sweater chest, loft beds for my kids, a toy workbench, and a toy kitchen, among other things. I found that I could learn the skills on my own through experimentation. I’ve tried not only carpentry, but also sewing and electronics; I’ve learned to use a laser engraver and now hope to try my hand at 3D printing. I only wish that my experiences in shop class had sparked this interest sooner.

I imagine that most shop classes, if they still exist, are still heavily oriented on woodworking or other traditional trades, and continue to be avoided by the continually growing population of college-bound students. While I don’t want to dis vocational training, what those classes ought to be doing is showing all kids that making your own things is both possible and a worthwhile endeavor, for many reasons: that the way you learn how to do new things is to try them, make mistakes and try again. That iPhones and TVs and furniture and cars don’t grow on trees but somebody somewhere makes them. That we don’t have sit idly by and be content with what the world hands us but that we can solve problems by taking actions with our own two hands.

I want to introduce Bob Clagett from the Youtube channel I Like To Make Stuff. I only learned about Bob’s channel about 6 months ago, but when I look at him I see the man I wish was my shop teacher. He’s not afraid to admit his mistakes and that he’s still learning. He’s inspired people across the country to get up off their couches a try to make something. Give his channel a watch, you won’t regret it.

Riddler: The Babies and the Dwarves

Two relatively easy problems in this weekend’s Riddler. First,

Your baby is learning to walk. The baby begins by holding onto a couch. Whenever she is next to the couch, there is a 25 percent chance that she will take a step forward and a 75 percent chance that she will stay clutching the couch. If the baby is one or more steps away from the couch, there’s a 25 percent chance that she will take a step forward, a 25 percent chance she’ll stay in place and a 50 percent chance she’ll take one step back toward the couch.

In the long run, what percent of the time does the baby choose to clutch the couch?

We can model this as a Markov chain. The probability of baby being n-steps away from the couch can be represented by an N-by-1 vector, p_k . The initial value of p_k is a probability of 1 in the first element (representing starting at the couch) and 0 elsewhere. The transition of probabilities from one step to another is have represented by a matrix, M , like so:

p_{k + 1} = M p_{k}

\begin{Bmatrix} p_{0, k + 1} \\ p_{1, k + 1} \\ \vdots \end{Bmatrix} =  \begin{bmatrix} 0.75 & 0.25 & 0 & 0 & \ldots \\ 0.25 & 0.5 & 0.25 & 0 & \ldots \\  \vdots & \ddots & \ddots & \ddots & \ddots \end{bmatrix}  \begin{Bmatrix} p_{0, k} \\ p_{1, k} \\ \vdots \end{Bmatrix}

In the long term, we assume that there is a steady state probability; in mathematical terms, there is some value of p_k which is invariant under the transformation of M :

p_{\inf} = M p_{\inf}

M is known, so solve for p_{\inf} :

(I - M) p_{inf} = 0

(I - M) = \begin{bmatrix} 0.25 & -0.25 & 0 & 0 & \ldots \\ -0.25 & 0.5 & -0.25 & 0 & \ldots \\  \vdots & \ddots & \ddots & \ddots & \ddots \end{bmatrix}

Any vector p_{inf} in which all the elements have equal value satisfy this equation. Since p_{\inf} represents the sum of probabilities, the total of all elements of p_{\inf} = 1 . That means that each element of p_{\inf} = \lim_{N\to\inf} 1/N = 0 . There is effectively zero probability that the baby will stay at the couch forever.

Now, the dwarves:

A giant troll captures 10 dwarves and locks them up in his cave. That night, he tells them that in the morning he will decide their fate according to the following rules:

The 10 dwarves will be lined up from shortest to tallest so each dwarf can see all the shorter dwarves in front of him, but cannot see the taller dwarves behind him.

A white or black dot will be randomly put on top of each dwarf’s head so that no dwarf can see his own dot but they can all see the tops of the heads of all the shorter dwarves.

Starting with the tallest, each dwarf will be asked the color of his dot.

If the dwarf answers incorrectly, the troll will kill the dwarf.

If the dwarf answers correctly, he will be magically, instantly transported to his home far away.

Each dwarf present can hear the previous answers, but cannot hear whether a dwarf is killed or magically freed.

The dwarves have the night to plan how best to answer. What strategy should be used so the fewest dwarves die, and what is the maximum number of dwarves that can be saved with this strategy?

Extra credit: What if there are only five dwarves?

The answer is straightforward: You can guarantee to save all but one dwarf, no matter how many there are; and that one dwarf has a 50/50 chance of making it. He just picks black if there is an odd number of black dots and white if there is an even number of black dots (which picked color refers to odd or even of which color isn’t important, as long as they all agree). The next dwarf in line knows that if the first dwarf sees an even number of black dots and he sees an odd number, dwarf number 2 knows that he’s wearing a black dot; but if dwarf 2 sees an even number then he’s wearing a white dot. Every further dwarf can deduce their own dot color using the information given by every previous dwarf.

I wish I could claim I was smart enough to come up with this on my own, but the truth is it’s a well known Google interview question.

The Riddler and the Gold Coins: An Alternative Solution

I’m a pretty big fan of the Riddler. I read the puzzles every week and usually try them when I think I’ve got a shot at getting them right.

Last week, the puzzles both were oriented around sorting between fake and real gold coins by using a limited number of weighings on a balance scale. First, the easy one:

You have nine gold coins, but one isn’t pure. One has been minted with a cheap alloy, and is known to be heavier than the others. You have a simple balance scale. How do you determine the impure coin with only two weighings?

The solution to this one was pretty straightforward, and my solution was identical to the one presented this week. The second riddle was a little harder:

You have 12 gold coins — or so you think! One is fake and is known to have a different weight than the others. It could be heavier or lighter; you only know it’s wrong. Using the same simple balance scale, how do you determine the incorrect coin, and whether it is heavier or lighter, in only three weighings?

I got the solution to this one too. However, I was disappointed to see that the solution presented in this week’s column was different from mine, and in my opinion, not nearly as elegant. The official included a pair of if-then type statements—weigh 4 coins against 4 other coins, and leave 4 coins off. If the scale balances, then the fake coin is in the group not on the scale, and it’s easy to find the right coin in 2 more weighings. If the scale doesn’t, then it’s one of the 8 on the scale and there is a more complex weighing scheme.

I think my solution, below, is better because it prescribes the exact same experimental procedure regardless of the outcome of the weighings. I also conjecture that the FiveThirtyEight solution is mathematically equivalent but easier to picture.

Label the coins A thru L. Use the table below to put the coins on the correct spot–L for left balance, R for right balance, O for off the scale. W1 is the first weighing, W2 the second, and W3 the third.


--| A | B | C | D | E | F | G | H | I | J | K | L |
---------------------------------------------------
W1| O | O | O | O | L | R | L | R | R | L | R | L |
W2| O | R | R | R | O | O | O | L | L | L | R | L |
W3| R | O | L | R | O | R | R | O | L | O | L | L |

Find the pattern of tips left (L), tips right (R), or balances (B) in these two
tables. The column indicates which coin is fake. If the pattern is in the first
table, the fake coin is heavy. If it’s in the second, it’s light.

Heavy table

| A | B | C | D | E | F | G | H | I | J | K | L |
-------------------------------------------------
| B | B | B | B | L | R | L | R | R | L | R | L |
| B | R | R | R | B | B | B | L | L | L | R | L |
| R | B | L | R | B | R | R | B | L | B | L | L |

Light table

| A | B | C | D | E | F | G | H | I | J | K | L |
-------------------------------------------------
| B | B | B | B | R | L | R | L | L | R | L | R |
| B | L | L | L | B | B | B | R | R | R | L | R |
| L | B | R | L | B | L | L | B | R | B | R | R |

How We Published a Book!, Part 2: The Paperback

In the last post on this topic, I described the process and some of the code we used to self-publish a book in Kindle format that looks and feels much more professional than most self-published material out there. Since we knew a substantial portion of our likely reader base would prefer a physical book, we worked to publish a paperback book with the same level of quality.

The most important parts of the publishing process that we are utterly incapable of doing ourselves (printing and distribution) are taken care of by Createspace and Amazon. (Thank you, modern technology). Dealing with Createspace is pretty straightforward; I’ll talk about it at the end but really it doesn’t take much to figure out.

The parts that took the real work, beyond the simple writing of the book, were typesetting and cover design.

Typesetting

Typesetting is the process of what most people would call formatting. Createspace requires (sort of) uploading a PDF of the interior of the book. That file needs to have the correct page size and margins for going in a book. Further, to make the book look “real” and not like some dope printed it in his basement, there needs to be certain details that most people couldn’t explicitly name but definitely can sense: things like running headers, correct page numbering, widow/orphan protection, and correct justification. If you don’t believe it, type a page of text in Microsoft Word, then compare that text to a commercially published book. If you look carefully, you’ll be able to spot the subtle details that make a professionally-published book look polished.

So, it’s safe to say that I don’t recommend Word for typesetting your book. It will look like crap. There are a few pieces of software out that that CAN do basic typesetting in a point and click environment like work—well known, but relatively expensive Adobe Indesign, as well as the less-known Scrivener. However, we didn’t want to spend the money for a publishing suite that would be used for a single project. Instead, I fell back on the tool that served me so well in graduate school–LaTeX (pronounced “lay-tek”). LaTeX is not for the faint of heart–but if you’re willing to climb the steep learning curve, it can produce a book which is indistinguishable from a commercial publishing house product.

Like XHTML used to build Kindle files, LaTeX separates content from format. The content is written in plain text using tags to identify chapters and sections, saved in a file the the .tex extension. The formatting is done from a set of files called class and style file, or .sty. For our project, we used the memoir class with a template ready-made for Createspace named, not surprisingly, createspace.sty. These are many little tricks and options that it takes time to figure out, in particular if you want to have options other than defaults, but produces a really nice final product.

The trip from Kindle to PDF required a lot of search and replace to remove HTML tags and put LaTeX tags in their place. This ended up being quite time consuming and meant that there were two different versions of the bok floating around. In future projects, I plan to find a way to have single source of content that is translated into both HTML and LaTeX using an automatic script.

Cover design

Designing the cover of the book was in some ways more of a challenge because it is very easy to end up with an amateurish-looking product. We decided to use GIMP to create a simple cover graphic by layering a public-domain photo and globe graphic over a black background with title, subtitle, and author text. Again, GIMP is. freely available program with Photoshop-like capability. GIMP also has a pretty steep learning curve, but is quite capable if you’re willing to invest time to learn it. Please, oh please, though–DON’T use Powerpoint.

Createspace

Once we had a good inside and outside of the book, it was time to self-publish. We had used Createspace on one other occasion and it worked out really well. They are a print-on-demand publishing company, which means they only print the books when somebody orders them. Though I don’t think it’s required, Createspace makes your book available for sale on Amazon. They will send you a proof for free or a small fee, which I highly recommend as you will inevitably see a mistake, and it’s hard to judge the cover without holding the book in your hands. For example, we discovered a big problem with our margins, and that the shiny cover did not look good, both of which were fixed before the book went live on Amazon. Further, every time we had a problem or a question, their customer service was amazing, especially considering that we had zero up front cost.

Next time (when I get to it), I’m going to relate lessons learned from this project, and a few things to do different next time.