Populations, Sampling Frames and Samples

[Image: bullseye target paper]

There are three (3) common mistakes in research of any kind: (1) solving the wrong problem; (2) asking the wrong questions; and (3) polling the wrong people (the sample). Each of the three WILL leave your results entirely useless. However, only one of the three can remain disguised even after you’ve completed your work. You guessed it! The culprit is #3: using the wrong sample.

The usual talk on choosing the perfect sample would probably cover the following topics: population parameters, sampling frames, sampling methods, samples and valid samples. We are, however, going to reduce that (chiefly because I need to leave content for the book) to a quick discussion of the key three: the population, the sampling frame and the sample.

The Population

This is the easiest one to explain, and the easiest one to get entirely wrong. The population is simply (1) “the most generalizable specification” of (2) “the people who can be affected by the subject of your study”. *whew*. Most people get this wrong because they focus on part 1 but not part 2. If an element/person CANNOT be affected by the subject of your study, he/she cannot be a part of your population. Here’s an example of how people get this wrong. Let us say we were conducting a study called “The effect of menstrual cramps on the study habits of students in Jamaica”. Who is the population? Jot down your answer and I’ll return to this.

The Sampling Frame

More often than not, the population turns out to be simply too large for a census (using every person/element within the population as a part of the end sample), leaving the researcher with only those elements/persons that can reasonably be reached. THAT is what constitutes the sampling frame: those members of the population who form a sub-population that is accessible to the researcher and his/her team. So, if we were to look at the relationship between the population and the sampling frame visually (take a look at the target paper in the top left of this post to follow the reference)… the population would be an “8” while the sampling frame would be a “9”, one ring closer to the centre.

The Sample

The sample is the easiest of the three because it is the most tangible. The sample is the people you were actually able to reach. The sampling frame tells you who it is possible to reach; the sample tells you who you actually collected information from. If you got to the point of collecting information from an element or person, then you have activated your sample. To return to the target paper, your generalized “sample” would be a “10”, while your “valid sample” per item/question would be the “bullseye”.
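To make the difference between the sample and the “valid sample” per question a little more concrete, here is a minimal sketch in Python. It assumes that a “valid” response simply means the respondent gave a usable answer to that particular question, which is my reading rather than a formal definition. The question labels (`q1` to `q3`) and the answers are invented purely for illustration.

```python
# Hypothetical survey responses, one dict per respondent.
# None marks a question that the respondent skipped.
responses = [
    {"q1": 3, "q2": 5, "q3": None},
    {"q1": 4, "q2": None, "q3": None},
    {"q1": 2, "q2": 1, "q3": 4},
]

questions = ["q1", "q2", "q3"]

# The sample: everyone you actually collected information from.
sample_size = len(responses)

# The valid sample per question: respondents with a usable answer
# to that specific question (assumed here to mean "not None").
valid_n = {
    q: sum(1 for r in responses if r.get(q) is not None)
    for q in questions
}

print("sample size:", sample_size)
for q in questions:
    print(q, "valid sample:", valid_n[q])
```

The valid sample for any one question can never be larger than the sample itself, which is why it sits at the very centre of the target.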

 

The “Why”

Now that we’re clear on all that – WHY?!? Why go through all this? The simple answer is PRECISION. We conduct research for one of two reasons: either to expand our knowledge of something; or to test (prove/disprove) our knowledge of something. If you ask the WRONG person the RIGHT question, there is no way you can be sure you received the RIGHT response.

Finally, I promised a return to the earlier mentioned hypothetical research on the effect of menstrual cramps.

If you guessed the population was all girls in Jamaica, then sorry, you just went and included a 9-year-old in your possible sample. Don’t think your research will attract funding or make it past the ethics board. If you guessed the population was all students in Jamaica then, yups, still incorrect. Now you’re including a 9-year-old boy in your possible sample. I can see your reputation tanking when his parents ask him what he did at school today.

In this example, the population would be female students in Jamaica above the age of 11 (I believe there are statistics that show menstruation starting around this age). Notice that only girls of a certain age may be affected directly by menstrual cramps, thus the population cannot extend beyond that limiting factor.

To take this further, imagine that you, the researcher, are a student and thus can only afford to access students in Kingston. Then your sampling frame becomes female students in KINGSTON, Jamaica above the age of 11. At this point you have one final consideration to make before moving ahead: are the female students in Kingston representative of the female students across Jamaica? If yes, then continue with this as a defined sampling frame. If no, then you ought to redefine your population (and therefore the title of your study) so that it claims to represent Kingston only. Now, having done all of that, based on your chosen sampling technique you may confidently draw a sample that SHOULD represent the “whole”.
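For readers who think better in code, the narrowing from population to sampling frame to sample in the cramps example can be sketched roughly as below. Everything here is an assumption made for illustration: the `Student` record, the tiny roster, the parish names and the choice of simple random sampling (the post leaves the sampling technique up to you).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str
    sex: str     # "F" or "M"
    age: int
    parish: str


# Hypothetical roster of students in Jamaica, invented for illustration.
roster = [
    Student("A", "F", 9, "Kingston"),
    Student("B", "M", 9, "Kingston"),
    Student("C", "F", 14, "Kingston"),
    Student("D", "F", 16, "St. James"),
    Student("E", "F", 12, "Kingston"),
    Student("F", "M", 15, "Manchester"),
    Student("G", "F", 17, "Kingston"),
]


def in_population(s: Student) -> bool:
    # Only those who can be affected by the subject of the study:
    # female students above the age of 11.
    return s.sex == "F" and s.age > 11


def in_sampling_frame(s: Student) -> bool:
    # The part of the population the researcher can actually reach.
    return in_population(s) and s.parish == "Kingston"


population = [s for s in roster if in_population(s)]
sampling_frame = [s for s in roster if in_sampling_frame(s)]

# Simple random sampling is assumed here; swap in your own technique.
rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(sampling_frame, k=2)

print(len(roster), "students on the roster")
print(len(population), "in the population")
print(len(sampling_frame), "in the sampling frame")
print(len(sample), "in the sample")
```

Note that the 9-year-olds never make it past the first filter, and that the sample is only ever drawn from the sampling frame, never from the roster directly.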