The statistical theory (essentially the law of large numbers) is that if you take a sample from a population, and that sample is REPRESENTATIVE of the population with regard to some measure X, then as you increase the sample size (n), the sample measure X(n) approaches the population measure X(N).
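As a minimal sketch of that convergence, here is a toy simulation (the population size and the 42% opinion rate are invented for illustration): drawing uniform random samples of increasing size n, the sample proportion X(n) closes in on the population proportion X(N).

```python
import random

random.seed(0)

# Invented population: 1,000,000 people, X = 1 if a person holds some opinion.
N = 1_000_000
population = [1 if random.random() < 0.42 else 0 for _ in range(N)]
X_N = sum(population) / N  # X(N), the population value

# Representative (uniform random) samples of increasing size n:
# X(n) approaches X(N) as n grows.
for n in (100, 1_000, 10_000, 100_000):
    sample = random.sample(population, n)
    X_n = sum(sample) / n
    print(f"n={n:>7}: X(n)={X_n:.4f}  |X(n)-X(N)|={abs(X_n - X_N):.4f}")
```

The key assumption is in `random.sample`: every member of the population is equally likely to land in the sample. The rest of this discussion is about what happens when that assumption fails.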
Now, the first thing to consider is sampling bias. Let's say we wanted to ask Republicans how they feel about the Affordable Care Act. What initial factors would result in a sampling bias?
> Only people who are interested enough in this issue to respond to our question will be included.
> If our data source is public registration records, only registered Republicans, rather than all ideologically aligned Republicans, Independents, and nonpartisans, will be targeted for our questions.
> Only people who are willing to answer an unknown caller ID / respond to the stranger at the door will be represented in our data.
> People who are on vacation will not be included.
> If the questions are asked verbally, deaf people and those with speech impediments will be left out.
You see, this list can go on indefinitely, each item less important than the last, each representing a smaller fraction of the Population (big P) that is intrinsically left out of the data.
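To see why the first item on that list matters, here is a hedged toy simulation (all numbers invented): if the politically engaged minority approves of the ACA at a different rate than everyone else, a question that only reaches engaged people converges to the wrong answer, and no amount of additional sample size fixes it.

```python
import random

random.seed(1)

# Invented population of 100,000 Republicans; X = 1 means "approves of the ACA".
# Assume approval differs by engagement: highly engaged voters (30% of the
# population) approve at 20%, less engaged voters (70%) approve at 40%.
population = []
for _ in range(100_000):
    engaged = random.random() < 0.30
    p_approve = 0.20 if engaged else 0.40
    population.append((engaged, random.random() < p_approve))

true_rate = sum(x for _, x in population) / len(population)

# Biased sampling frame: only engaged people respond to our question.
respondents = [x for engaged, x in population if engaged]
biased_rate = sum(respondents) / len(respondents)

print(f"true approval:   {true_rate:.3f}")   # ~0.34 (0.3*0.20 + 0.7*0.40)
print(f"biased estimate: {biased_rate:.3f}") # ~0.20, regardless of sample size
```

The bias here is baked into the frame, not the sample size: larger samples of the same respondents just estimate the wrong quantity more precisely.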
The same logic applies to representation. Statisticians will run a multivariate analysis on a Population to determine which variables have the strongest coefficients with respect to X. It goes like this:
> Racial composition
> Income Level
> Education
> Geographic Distribution
> Age
> Health Status
> Religion
> etc.
Often, researchers find that the population is stratified along these lines, and for the sample to have any predictive value for the population, those strata have to be replicated in the sample.