General Category => Everything Else => Topic started by: vh on March 22, 2015, 09:10:26 PM
Title: data analysis puzzle
Post by: vh on March 22, 2015, 09:10:26 PM
you have data on a million people -- their birthdate, deathdate, and how much money they made
you also have data on how famous/popular they were -- how many people knew them, how many people emailed them, how many friends they had.
what sort of plot or analysis would you make to establish a correlation between wealth and fame?
Title: Re: data analysis puzzle
Post by: FiahOwl on March 23, 2015, 03:02:34 AM
This message is only viewable with Universe Sandbox Galaxy Edition. Access it and much more with promo-code '148855'.
Title: Re: data analysis puzzle
Post by: vh on March 23, 2015, 03:44:44 AM
yes, but you have multiple metrics for fame, and you need to take into account when they were born and died, because globalization has increased average fame over time
Title: Re: data analysis puzzle
Post by: atomic7732 on March 23, 2015, 07:00:22 AM
integrate fame over their lifetime to get total fame, and then divide by the average total population on the planet during their lifetime?
Title: Re: data analysis puzzle
Post by: vh on March 23, 2015, 01:34:30 PM
but there are multiple metrics for fame (and each is a single number, not a function of time), and you don't know the population of the planet.
Title: Re: data analysis puzzle
Post by: Bla on March 23, 2015, 01:50:12 PM
I would decide which is the best measure of fame (although nothing prevents you from analyzing several, I guess). I'd probably pick how many people knew them as the best measure. Email counts depend very much on the person, location and time. And I associate being famous with being known by a lot of people, which doesn't have to mean being their friend.
I would negate globalization by finding, for each year, the average number of people who know a given person. Then, to see how much "normalized" fame someone has, divide how many people knew them in a given year by that year's average. If you only know how many people knew them at their death, I'd do something similar, maybe using the death year. This might introduce a bias depending on how long people live, which I'm not going to bother considering now. With a million people, I might just divide them into categories of people who lived for similar durations.
And finally, plot how much money they made as a function of their normalized fame.
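A rough sketch of the normalization described above, using only the Python standard library. The record layout (`death_year`, `known_by`, `income`) and the toy numbers are made up for illustration and are not from the actual dataset. The rank correlation at the end is my own addition as one possible way to put a number on the plot, not something proposed in the thread.

```python
from collections import defaultdict
from statistics import mean


def normalized_fame(people):
    """Divide each person's 'known_by' count by the mean for their death year."""
    by_year = defaultdict(list)
    for p in people:
        by_year[p["death_year"]].append(p["known_by"])
    year_mean = {year: mean(vals) for year, vals in by_year.items()}
    # Guard against a year where nobody was known by anyone.
    return [
        p["known_by"] / year_mean[p["death_year"]] if year_mean[p["death_year"]] else 0.0
        for p in people
    ]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: fame is 10x higher in the later cohort,
# but within each cohort the more famous person earned more.
people = [
    {"death_year": 1900, "known_by": 10, "income": 100},
    {"death_year": 1900, "known_by": 30, "income": 300},
    {"death_year": 2000, "known_by": 100, "income": 100},
    {"death_year": 2000, "known_by": 300, "income": 300},
]

income = [p["income"] for p in people]
raw = [p["known_by"] for p in people]
norm = normalized_fame(people)

rho_raw = spearman(raw, income)
rho_norm = spearman(norm, income)
print("raw fame vs income:", round(rho_raw, 3))
print("normalized fame vs income:", round(rho_norm, 3))
```

In this toy case the cohort effect hides most of the relationship in the raw counts, and dividing by the per-year average brings it back. For the real plot, `norm` and `income` would be the x and y of a scatter plot, probably on log axes, and you could repeat the whole thing separately for each lifespan category to check the bias mentioned above.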