
Your post looks like a non-sequitur to me. But for the sake of math...

With an arbitrarily chosen variance ratio of 10 (std. dev. ratio sqrt(10)), there is no real solution for a high-pass threshold that gives a 60:40 split between two populations that have the same median.

You can get about a 58:42 split by putting your threshold a bit below the common median. But to achieve a 60:40 split, you need one population to have 25 times the variance of the other, or the populations need to have different medians. Above that variance ratio, there are two possible real solutions for the threshold value.

You can try it yourself, and hopefully my math is correct for this one:

  60/40 * erfc( x / sqrt(2))/2 = erfc( sqrt( variance_ratio ) * x / sqrt(2))/2
Since the threshold value sits at a different number of standard deviations for each population, you can say with some certainty that, given the knowledge of which population a person is in, there is a different probability that the person is above the threshold value, which works out to be 60% and 40%, respectively. But that's just begging the question, since those are the same numbers you used to work out the threshold value, and you still had to assume values for the medians and variances of both populations.


I'm super confused by this reply. The article says that the stddev of the distribution of male IQ is larger than the stddev of the distribution of female IQ, and also seems to state that the means and medians are equal. GGP threw out 60/40 as their believed split of men to women among good engineers. I took this to mean that GGP believed engineering talent to be independent of gender once IQ is accounted for, and that engineering requires a higher than average IQ. More formally:

  P(good engineer | IQ) * P(IQ | gender) == P(good engineer | gender)
In that model the 60/40 split comes from a conditional distribution of gender given an IQ above some threshold. So something like:

  julia> women = sum([randn() > 1 for i=1:10_000_000])
  julia> men = sum([(randn() * 2) > 1 for i=1:10_000_000])
  julia> women / (women + men)
  0.3395449535000696
Which isn't exactly 60/40, but is fairly close. It says that [iff the 60/40 split is true], the stddev difference needed is less than 2x.

If you prefer exact math, the ratio above is

  erfc(1/sqrt(2)) / (erfc(1 / (2 * sqrt(2))) + erfc(1 / sqrt(2))) =~ 0.339593
and the exact solution for a 60/40 split is

  1 / (sqrt(2) * inverseerfc((3/2) * erfc(1/sqrt(2)))) =~ 1.4
Though FWIW I find the simulation version more intuitive (likely because of a misspent youth programming rather than a misspent youth mathing :p).
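
For anyone who wants to check the two closed-form numbers above without a CAS, here is a minimal Python sketch using only the standard library. The helper name `tail` is made up for illustration, and since Python's `math` module has no inverse erfc, the sketch inverts the normal CDF with `statistics.NormalDist().inv_cdf` instead, which should be equivalent under the same zero-mean, threshold-of-1 assumptions as the simulation.

```python
from math import erfc, sqrt
from statistics import NormalDist


def tail(threshold, sd):
    """P(X > threshold) for X ~ Normal(0, sd). Hypothetical helper name."""
    return 0.5 * erfc(threshold / (sd * sqrt(2)))


# Same setup as the simulation: threshold of 1, stddev 1 vs. stddev 2.
women = tail(1.0, 1.0)
men = tail(1.0, 2.0)
share = women / (women + men)

# Stddev ratio s giving an exact 60/40 split at the same threshold:
# we need tail(1, s) == 1.5 * tail(1, 1), i.e. P(Z > 1/s) == 1.5 * P(Z > 1).
z = NormalDist().inv_cdf(1.0 - 1.5 * women)  # z equals 1/s
s = 1.0 / z

print(round(share, 6))  # share of women above the threshold
print(round(s, 3))      # stddev ratio for a 60/40 split
```

The first printed value should agree with the simulated ratio to about three decimal places, and the second with the roughly 1.4 figure quoted above.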

Can you explain the model you were using to motivate the mean vs median calculation? I feel like I'm missing something interesting.


Easy to explain. I made a mistake in my math. The equation I needed to solve was

  40/60 * erfc( x / sqrt(2))/2 = erfc( sqrt( variance_ratio ) * x / sqrt(2))/2
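
A rough way to try that corrected equation numerically is a plain bisection, sketched below in Python. This assumes x is the threshold measured in standard deviations of the wider population, and it reuses the variance ratio of 10 from earlier in the thread; the function names `split_gap` and `solve_threshold` and the bracket [0, 5] are illustrative choices, not anything from the original posts.

```python
from math import erfc, sqrt


def split_gap(x, variance_ratio):
    """Left side minus right side of the corrected equation."""
    wide_tail = erfc(x / sqrt(2)) / 2
    narrow_tail = erfc(sqrt(variance_ratio) * x / sqrt(2)) / 2
    return 40 / 60 * wide_tail - narrow_tail


def solve_threshold(variance_ratio, lo=0.0, hi=5.0, iters=100):
    """Bisection on [lo, hi]; assumes split_gap is negative at lo and positive at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if split_gap(mid, variance_ratio) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = solve_threshold(10)
wide = erfc(x / sqrt(2)) / 2
narrow = erfc(sqrt(10) * x / sqrt(2)) / 2
print(x, wide / (wide + narrow))  # threshold and the resulting share of the wider population
```

At x = 0 the gap is negative (both tails are 1/2), and far out in the tail the narrower population vanishes first, so the gap turns positive; that sign change is what the bisection relies on.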



