Guides &
Reference.

Practical explanations, worked examples, and ready-to-use syntax for everyday research tasks.

Weighting

How to correct for sample imbalance and interpret the results

What is weighting and why does it matter?

The fundamental concept explained

No survey sample is a perfect mirror of the population. Even with the best sampling design, some groups end up over- or under-represented in your final data — because they were harder to reach, more or less likely to respond, or simply because random variation in a finite sample created imbalances.

Weighting corrects for this by mathematically adjusting each respondent's contribution to the data, so that the final weighted dataset better reflects the true population structure.

A simple way to think about it: if 20% of your sample are aged 18–24 but they represent 30% of the real population, then each 18–24 respondent should count for more than one person. Weighting assigns them a weight greater than 1 to make up the shortfall.

Worked Example — Basic Concept

You survey 400 people. The population is 50% male / 50% female, but your sample came out 40% male / 60% female.

Gender | Sample % | Population % | Weight
Male   | 40%      | 50%          | 50/40 = 1.25
Female | 60%      | 50%          | 50/60 ≈ 0.83

Each male respondent now counts as 1.25 people, and each female respondent counts as 0.83 people. The weighted data reflects a 50/50 split.
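The arithmetic can be sketched in a few lines of Python (Python rather than SPSS syntax, purely for illustration; the figures mirror the example above):

```python
# Cell weight = population share / sample share (hypothetical 400-person sample)
sample_n = {"Male": 160, "Female": 240}            # 40% / 60% of n = 400
population_pct = {"Male": 0.50, "Female": 0.50}

total_n = sum(sample_n.values())
weights = {g: population_pct[g] / (n / total_n) for g, n in sample_n.items()}

# Weighted counts reproduce the population split
weighted_n = {g: sample_n[g] * weights[g] for g in sample_n}
print(weights)      # Male 1.25, Female ~0.83
print(weighted_n)   # both groups now count as 200 of 400
```

Multiplying each group's count by its weight brings the weighted sample back to the 50/50 target.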

When should you weight?

You should weight whenever your sample profile differs meaningfully from the population on variables that are likely to relate to your key measures. Common weighting targets include age, gender, region, social grade, and education level.

Weighting is not always necessary — if your sample already closely matches the population on all key dimensions, weighting may change very little and introduce unnecessary complexity.

Types of survey weights

Design weights, post-stratification, rim weighting, and more

Design weights

Design weights correct for known differences in the probability of selection rather than differences in response rates. If one stratum was sampled at twice the rate of another, respondents in that stratum receive a weight of 0.5 to compensate.

Design weights are applied before any post-stratification and are determined by the sample design, not the achieved sample profile.

Post-stratification weights (cell weighting)

The most common form in commercial market research. You define target cells (e.g. male 18–34, female 35–54) and calculate a weight for each cell based on the ratio of population proportion to sample proportion.

Limitation: Cell weighting requires enough respondents in every cell to produce stable weights. With many demographic variables, cells can become very small, leading to extreme weights.

Rim weighting (raking)

Rim weighting — also called iterative proportional fitting or raking — allows you to weight to multiple variables simultaneously without needing population data for every combination of those variables.

Instead of weighting to age × gender × region cells, you weight to the age margins, gender margins, and region margins separately, iterating back and forth until the sample converges on all targets at once.

How Rim Weighting Works

Step 1: Weight the data to match the age distribution. This will slightly disturb the gender balance.

Step 2: Re-weight to match the gender distribution. This slightly disturbs the age balance.

Step 3: Repeat until both distributions are within tolerance (typically <0.1% of target).

Most rim weighting converges in 5–20 iterations. The result is a single composite weight per respondent that simultaneously satisfies all margin targets.
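The three-step loop above can be sketched in Python (illustrative only: the 2×2 counts and margin targets are invented, and a real implementation would test for convergence rather than run a fixed number of iterations):

```python
import numpy as np

# Hypothetical 2x2 sample counts: rows = gender, columns = age band
counts = np.array([[40.0, 80.0],     # male 18-34, male 35+
                   [60.0, 220.0]])   # female 18-34, female 35+
n = counts.sum()
row_targets = np.array([0.5, 0.5]) * n   # gender margins (invented targets)
col_targets = np.array([0.3, 0.7]) * n   # age margins (invented targets)

w = np.ones_like(counts)                 # one weight per cell
for _ in range(50):                      # iterate until margins converge
    w *= (row_targets / (counts * w).sum(axis=1))[:, None]  # fix gender margin
    w *= (col_targets / (counts * w).sum(axis=0))[None, :]  # fix age margin

cell = counts * w
print(cell.sum(axis=1), cell.sum(axis=0))  # both margins now hit their targets
```

Each pass scales the weights to match one margin, slightly disturbing the other; the disturbance shrinks on every pass until both targets hold simultaneously.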

Calibration weights

A generalisation of rim weighting that uses regression-based methods to bring survey estimates into alignment with known population totals. More common in official statistics and large-scale government surveys than commercial research.

Applying weights — a worked example

Step-by-step from raw data to weighted results

You've completed a survey of 500 adults asking whether they support a proposed new policy. You want to weight by age and gender to match the national population.

Step 1 — Get your population targets

Source population data from official statistics (ONS in the UK, Census Bureau in the US). For this example:

Group        | Population % | Sample n | Sample %
Male 18–34   | 14%          | 52       | 10.4%
Male 35–54   | 18%          | 88       | 17.6%
Male 55+     | 18%          | 70       | 14.0%
Female 18–34 | 14%          | 78       | 15.6%
Female 35–54 | 18%          | 102      | 20.4%
Female 55+   | 18%          | 110      | 22.0%

Step 2 — Calculate cell weights

Weight = Population % ÷ Sample %. For Male 18–34: 14% ÷ 10.4% = 1.346. For Female 55+: 18% ÷ 22% = 0.818.

Step 3 — Apply weights to your analysis

Each respondent in Male 18–34 now counts as 1.346 people. When you calculate the % who support the policy, you multiply each response by that person's weight before summing.

Unweighted vs Weighted Result

Unweighted: 58% support the policy (290 out of 500 said yes)

Weighted: 54% support the policy

The 4pp difference exists because younger males — who were under-represented and support the policy less — receive higher weights in the weighted data, pulling the total figure down.
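The mechanics of the weighted calculation can be sketched in Python. The per-cell support rates below are hypothetical (the example above does not report them), so this illustrates the method rather than reproducing the exact 58%/54% figures:

```python
# group: (sample n, cell weight, HYPOTHETICAL % support in that cell)
cells = {
    "Male 18-34":   (52,  1.346, 0.40),
    "Male 35-54":   (88,  1.023, 0.55),
    "Male 55+":     (70,  1.286, 0.60),
    "Female 18-34": (78,  0.897, 0.50),
    "Female 35-54": (102, 0.882, 0.62),
    "Female 55+":   (110, 0.818, 0.65),
}

# Weighted % = sum of (weight x response) / sum of weights
num = sum(n * w * p for n, w, p in cells.values())
den = sum(n * w for n, w, _ in cells.values())
print(f"Weighted support: {num / den:.1%}")
```

Note that the denominator comes out at roughly 500: cell weighting preserves the overall base while shifting each cell to its population share.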

Capping weights

When and how to trim extreme weights

Extreme weights — where a single respondent is weighted up to represent 4, 5, or even 10 times their actual count — create a serious problem. A very heavily weighted respondent has a disproportionate effect on your results, and if their answers are in any way atypical, the weighted data can be substantially distorted.

What is capping?

Weight capping (also called weight trimming) sets a maximum value that any single weight can take. Respondents who would have received a weight above this cap are instead given the cap value, and the remaining weight is redistributed to others in the same marginal group.

Common capping thresholds

There is no universal rule, but common thresholds in commercial research are:

  • 3.0× — frequently used in online panel research with broad population targets
  • 4.0× — a more permissive cap sometimes used when sample sizes allow
  • 0.33×–3.0× — a symmetric cap that limits both high and low extremes

Some researchers express this as a ratio of the highest to lowest weight: a 4:1 max ratio means no weight should be more than four times any other weight in the same dataset.

The trade-off

Capping reduces variance (making your results more stable) but introduces bias (your weighted data no longer perfectly matches population targets). This is always a trade-off — there is no cap that eliminates both problems simultaneously.

The Capping Trade-off in Practice

Your sample has 8 men aged 18–24 but they represent 12% of the population (60 of 500). Each receives a weight of 7.5 — meaning each one counts as 7.5 people.

Problem: If one of these 8 respondents gave an unusual answer, it accounts for 7.5 votes. A single outlier can swing your national figure by several percentage points.

With a cap of 3.0: Each is capped at 3.0. You now only represent 24 of the 60 needed from this group. The remaining 36 are redistributed across other cells, introducing a small bias but dramatically reducing the influence of any single respondent.

Recommendation: If a large number of respondents hit the cap, reconsider whether your population targets are appropriate or whether you need to boost that subgroup in your sample design.

How to decide on a cap

Look at the distribution of your pre-cap weights before deciding. If only 2–3 respondents exceed 3.0 and most weights sit between 0.5 and 2.5, a cap of 3.0 will have very little impact on your population fit. If 15% of respondents would exceed 3.0, the cap is doing heavy lifting and your sample design may need review.
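A minimal Python sketch of trim-and-renormalise capping, with invented pre-cap weights. Re-capping inside the loop matters because rescaling the mean back to 1.0 can push already-capped weights slightly above the cap again:

```python
import numpy as np

raw = np.array([7.5] * 8 + [0.9] * 492)    # hypothetical pre-cap weights, n = 500
cap = 3.0

capped = raw.copy()
for _ in range(10):                        # alternate until stable
    capped = np.minimum(capped, cap)       # trim extremes down to the cap
    capped *= len(capped) / capped.sum()   # restore a mean weight of 1.0

print(round(capped.max(), 2), round(capped.mean(), 2))  # 3.0 1.0
```

After capping, the eight extreme respondents count for far less than their target share, which is precisely the bias the trade-off accepts in exchange for stability.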

Checking and diagnosing your weights

Weight efficiency, DEFF, and distribution diagnostics

Always check before reporting

Before using weighted data in analysis, it is good practice to run a set of diagnostics. Extreme or poorly distributed weights can silently distort your findings.

Check 1 — Weight distribution

Run a frequency table of your weight variable. Look at the minimum, maximum, mean, and standard deviation. A well-behaved weight variable in a 1,000-person survey typically has: mean ≈ 1.0, most weights between 0.4 and 2.5, and very few (if any) above 3.0.

Check 2 — Effective sample size

The effective sample size (ESS) tells you how much statistical power your weighted data actually has. It is calculated as (Σw)² / Σw². An ESS of 750 from a nominal sample of 1,000 means you have lost 25% of your precision through weighting.

Use our Effective Sample Size calculator to check this for any dataset.

Check 3 — Design effect (DEFF)

DEFF = n / ESS. A DEFF of 1.3 means your margins of error should be calculated using the effective n, not the nominal n. Always report margins of error based on effective sample size in weighted data.
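Both diagnostics follow directly from the weight vector. A Python sketch with hypothetical weights:

```python
import numpy as np

w = np.array([1.25] * 200 + [0.833] * 300)   # hypothetical weights, n = 500

ess = w.sum() ** 2 / (w ** 2).sum()          # effective sample size: (sum w)^2 / sum(w^2)
deff = len(w) / ess                          # design effect: n / ESS
efficiency = ess / len(w)                    # weight efficiency

print(round(ess, 1), round(deff, 3), f"{efficiency:.0%}")  # 480.0 1.042 96%
```

Margins of error should then be computed with ess in place of the nominal n.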

Check 4 — Profile after weighting

Run a crosstab of each weighting variable after weighting is applied. The weighted profile should match your targets. If it doesn't converge, your weighting algorithm may not have reached a solution, or there may be conflicting targets.

Quick Diagnostics Checklist

✓  Mean weight ≈ 1.0 (or ≈ population/sample size if not normalised)

✓  Max weight below your chosen cap

✓  Weight efficiency above 70% (ideally above 80%)

✓  Weighted profile matches population targets on all weighting dimensions

✓  Margins of error calculated using effective n, not nominal n

Common weighting mistakes

What to watch out for in practice

Using unweighted n in significance tests

Perhaps the most common error. Significance tests depend on sample size — using the nominal n (1,000) rather than the effective n (e.g. 780) will make differences appear more statistically significant than they actually are. Always base significance calculations on effective n for weighted data.
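A Python sketch of the effect, using a standard two-sample z-test for proportions; the 55%/48% split and the implied 76% weighting efficiency are invented for illustration:

```python
from math import sqrt

def z_stat(p1, p2, n1, n2):
    """Two-sample z-test for proportions with a pooled estimate."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# The same 7pp gap tested two ways (hypothetical figures)
print(round(z_stat(0.55, 0.48, 500, 500), 2))  # nominal n per group -> 2.21
print(round(z_stat(0.55, 0.48, 380, 380), 2))  # effective n per group -> 1.93
```

With the nominal n the gap clears the 1.96 threshold; with the effective n it does not, which is exactly the over-claiming risk described above.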

Weighting to out-of-date population data

If your population benchmarks are from a census five years ago and demographic patterns have shifted meaningfully, your weights may be correcting for a population that no longer exists. Check that your population data is current and appropriate for the population you are studying.

Weighting to the wrong population

If your survey is of homeowners, weighting to the general adult population profile may introduce bias rather than reduce it — homeowners have a different demographic profile. Always weight to the profile of the target population, not the general population.

Ignoring small cells

Cell weighting with small cells (fewer than ~15–20 respondents) produces unstable weights with high variance. Consider collapsing age or geographic categories, or switching to rim weighting which handles small cell sizes more gracefully.

Reporting weighted counts as unweighted

Always make clear in charts and tables whether counts shown are weighted or unweighted. Bases should typically show the unweighted n — so readers can assess reliability — while percentages are calculated from weighted data.

Sampling

Building samples that give you data you can trust

Sample frames

The foundation of any good sample design

A sample frame is the list or mechanism from which your sample is drawn. In an ideal world it is a complete and accurate list of every member of your target population. In practice, perfect frames rarely exist.

Common sample frames in market research

  • Online panels — large pre-recruited pools of respondents who have agreed to take surveys. The most commonly used frame in commercial research. Not probability-based.
  • Electoral registers — used for face-to-face probability surveys of adults in the UK. Covers ~90% of eligible adults.
  • Postcode address files (PAF) — a near-complete list of UK postal addresses, used for household surveys.
  • Customer databases — used when surveying existing customers of a company. Coverage depends on the completeness of the CRM.
  • Phone number databases — increasingly incomplete due to mobile phone adoption and declining landline usage.

Frame error

Coverage error occurs when the frame does not cover some part of your population. If your frame is an electoral register and you want to survey 16–17 year olds (who are not on it), your sample will have zero coverage of that group regardless of how carefully you sample from the frame.

Always ask: who is missing from my frame, and does that matter for this study?

Frame Coverage Example

You want to understand internet usage among UK adults. Your frame is an online panel.

Problem: By definition, online panels exclude people without internet access — the very group most likely to have low internet usage. Your frame has systematic undercoverage of the population segment most relevant to your research question.

Solution: Consider supplementing with face-to-face interviewing for offline respondents, or acknowledge the limitation explicitly in your methodology.

Probability vs non-probability sampling

The most fundamental distinction in sampling design

Probability sampling

Every member of the population has a known, non-zero probability of being selected. This is the gold standard because it allows you to make statistically valid inferences about the population and calculate unbiased margins of error.

Pro Theoretically unbiased estimates

Pro Valid, calculable margins of error

Con Expensive and slow to execute

Con Often requires a complete sampling frame

Con Non-response still introduces bias in practice

Non-probability sampling

Respondents are selected without a known probability of inclusion. The vast majority of online market research uses non-probability samples — most online panels are opt-in, and quota-based sampling does not give every population member an equal or known chance of inclusion.

Pro Fast and cost-effective

Pro Flexible and scalable

Con Margins of error are technically not valid

Con Self-selection bias is hard to detect or correct

Note MoE is still widely reported as a useful guide to precision

The Honest Position on Online Panels

Most commercial online surveys use non-probability panels and report margins of error as if they were probability samples. This is pragmatic but technically imprecise — the MoE captures only sampling variation, not the potential bias from systematic differences between panel members and the general population.

This doesn't mean the data is bad. It means that context matters, and that weighting and careful quota setting are particularly important for panel-based research.

Types of probability sampling

SRS, stratified, cluster, and systematic explained

Simple Random Sampling (SRS)

Every member of the population has an equal and independent chance of selection. You draw names from a hat (metaphorically). The simplest approach and the basis for standard MoE formulas.

Pro Simple and unbiased
Con May under-represent small subgroups by chance

Stratified sampling

The population is divided into meaningful subgroups (strata) — e.g. by region or age — and a random sample is drawn from each stratum. This guarantees adequate representation of each group and is more efficient than SRS when strata differ meaningfully on the outcome of interest.

Pro Ensures subgroup coverage
Pro Often more precise than SRS for same n

Cluster sampling

The population is divided into clusters (e.g. schools, neighbourhoods), a random sample of clusters is selected, and then all (or a random subset of) members within those clusters are surveyed. Often used when a complete list of individuals doesn't exist but a list of groups does.

Pro Practical when no complete frame exists
Con Less statistically efficient — respondents within clusters are similar

Systematic sampling

Select every k-th person from a list (e.g. every 10th name from a customer database). Simpler to execute than random sampling but can introduce periodicity bias if the list has a regular pattern.

Stratified vs SRS — Why It Matters

You want to survey 400 people across 4 UK regions, and the North East accounts for only 4% of the population (16 people under SRS). With only 16 North East respondents, your subgroup analysis for that region will have a margin of error of ±24% — essentially useless.

With stratified sampling: You deliberately oversample the North East to achieve 80 respondents there, then apply a design weight of 0.2 to correct for the oversampling in national totals. You now have meaningful regional data and valid national figures.

Non-probability approaches

Quota sampling, convenience, and panels

Quota sampling

Interviewers (or systems) are given quotas to fill — e.g. "recruit 50 men aged 18–34 and 50 women aged 18–34". Sampling continues until all quotas are filled. This controls the sample profile but does not give every population member an equal chance of selection.

Quota sampling is the dominant approach in commercial online research. It controls for known demographic variables but cannot control for unmeasured differences between those who choose to take surveys and those who don't.

Convenience sampling

Recruiting whoever is easiest to reach — website intercepts, social media links, employee surveys. Fast and cheap but highly susceptible to self-selection bias. Best suited to exploratory or qualitative-in-spirit quant work rather than population inference.

Online panels

Pre-recruited pools of respondents who receive survey invitations by email. Panel members self-selected into the panel and tend to be more survey-aware, digitally engaged, and demographically skewed than the general population. Weighting corrects for some of this but not all.

Quality of panel data varies significantly by provider. Key factors to consider: panel size, recruitment methodology, engagement rates, deduplication practices, and response time distribution.

Panel Quality Red Flags

Watch out for: very fast completion times (under 3 minutes for a 15-minute survey), straight-lining on grid questions, failing attention check questions, and open-text responses that are clearly nonsensical. These indicate low-quality respondents that should be screened out before analysis.

Sample size — rules of thumb

How big does my sample actually need to be?

It depends on what you need to do with the data

There is no universal answer, but here are the practical considerations:

  • National totals only: 1,000 is the industry standard in the UK and US. This gives ±3.1% MoE at 95% confidence.
  • Subgroup analysis: You need adequate n within every subgroup you plan to analyse. Rule of thumb: 100 per subgroup for reporting, 50 as an absolute minimum.
  • Tracking surveys: If you're measuring change over time, you need enough power to detect meaningful shifts. A 3pp change requires roughly 1,000–1,500 per wave to be reliably detectable at 95% confidence.
  • Experimental designs (A/B): Use a power calculation to determine sample size. This requires specifying the minimum detectable effect size and acceptable error rates.

The subgroup trap

The most common sample size mistake in commercial research is designing to a national total without thinking about subgroups. A sample of 500 gives ±4.4% MoE nationally. But if you want to compare ABC1 vs C2DE (roughly 55%/45% split), you're working with effective subgroup bases of ~275 and ~225 — with MoE of ±5.9% and ±6.5% respectively. A 5pp gap between these groups is not significant.
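The quoted margins of error all come from the standard worst-case formula, MoE = 1.96 × √(0.25 / n). A quick Python check:

```python
from math import sqrt

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a proportion; p = 0.5 is the worst case."""
    return z * sqrt(p * (1 - p) / n)

for n in (1000, 500, 275, 225):
    print(n, f"±{moe(n):.1%}")   # ±3.1%, ±4.4%, ±5.9%, ±6.5%
```

This reproduces the ±3.1%, ±4.4%, ±5.9% and ±6.5% figures quoted above.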

Planning for Subgroup Analysis

You need to report results for 6 UK regions and want at least 100 per region. London is 15% of the adult population — so a nationally representative 1,000-person sample would naturally give you about 150 London respondents. But the North East is only 4% — you'd get just 40. You need to boost the North East specifically and apply design weights.

Use our Target Sample Size calculator to plan your subgroup requirements before fieldwork.

Common pitfalls

  • Non-response bias: Low response rates don't just reduce sample size — they introduce bias if non-responders differ systematically from responders.
  • Attrition in trackers: In longitudinal studies, dropout is rarely random. Track who drops out and compare their earlier responses to completers.
  • Over-claiming from small bases: Reporting percentages on bases below 50 without a caveat is a common but serious error. Consider suppressing the data or flagging it with an asterisk as a warning.

SPSS Syntax

Ready-to-use syntax — click any block to copy

Why use syntax instead of menus? Syntax creates a permanent, reproducible record of every step in your analysis. When a client requests a re-run, or a colleague needs to replicate your work, syntax makes it instant. Get into the habit of running everything from syntax, not clicks.

Labelling variables and values

VARIABLE LABELS, VALUE LABELS

Good labelling is the difference between a data file that is self-explanatory and one that requires the questionnaire to be open at all times. Label everything before analysis begins.

Variable labels

VARIABLE LABELS
  Q1   'Overall satisfaction with the service'
  Q2   'Likelihood to recommend (NPS)'
  Q3_1 'Reason for visit: Browse'
  Q3_2 'Reason for visit: Purchase'
  Q3_3 'Reason for visit: Return item'
  WVAR 'Weighting variable'
.

Value labels

VALUE LABELS
  Q1
    1 'Very satisfied'
    2 'Fairly satisfied'
    3 'Neither satisfied nor dissatisfied'
    4 'Fairly dissatisfied'
    5 'Very dissatisfied'
    -1 'Don''t know'
.

VALUE LABELS
  Q3_1 Q3_2 Q3_3
    0 'Not selected'
    1 'Selected'
.

Defining missing values

/* Mark -1 (Don't know) and -2 (Not applicable) as user-missing */
MISSING VALUES Q1 Q2 (-1, -2).
MISSING VALUES Q3_1 TO Q3_10 (-1, -2).

Applying a weight

WEIGHT BY, turning weights on and off

The WEIGHT BY command tells SPSS to use a specified variable as a weight for all subsequent procedures. It remains active until you explicitly turn it off.

Apply a weight

/* Apply the weighting variable WVAR */
WEIGHT BY WVAR.

Turn weighting off

/* Turn weighting off — all subsequent procedures use unweighted data */
WEIGHT OFF.

Check weighting is applied correctly

/* Run frequencies on a known demographic variable to check weighted profile */
WEIGHT BY WVAR.

FREQUENCIES VARIABLES = AGE_GROUP GENDER REGION
  /ORDER = ANALYSIS.
Important Note

Always include WEIGHT BY WVAR. at the top of every syntax file that uses weighted data. If you open the file later and re-run only part of it, the weight should be the first thing that executes.

Applying a filter

FILTER, SELECT IF, USE

Filters restrict analysis to a subset of respondents. There are two main approaches: FILTER (which excludes but retains cases) and SELECT IF (which permanently deletes excluded cases). Use FILTER in almost all circumstances — SELECT IF is irreversible without re-reading the data.

Apply a simple filter

/* Filter to respondents who visited a store in the last month (Q1 = 1) */
/* FILTER BY takes a variable name, not an expression, so create a 0/1
   filter variable first and then apply it */
COMPUTE FILTER_VAR = (Q1 = 1).
VARIABLE LABELS FILTER_VAR 'Filter: Visited store last month'.
FILTER BY FILTER_VAR.

Filter using multiple conditions

/* Filter to ABC1 respondents aged 18-44 */
COMPUTE FILT_ABC1_1844 = (SOCIAL_GRADE <= 2 AND AGE >= 18 AND AGE <= 44).
FILTER BY FILT_ABC1_1844.

Turn off the filter

/* Return to all respondents */
FILTER OFF.
USE ALL.

Checking a routing from a multicoded question

Verify who was shown a routed question

A common task: Q5 was only shown to respondents who selected code 3 ("Online") at Q3, which is a multicoded question. You need to verify the routing was applied correctly and then produce Q5 results among the correct base.

Step 1 — Understand the data structure

Multicoded questions are typically stored as a series of binary variables: Q3_1, Q3_2, Q3_3... where 1 = selected and 0 = not selected. Q3_3 = 1 means the respondent selected code 3 at Q3.

Step 2 — Check the routing

/* Count Q5 responses by whether they were routed via Q3_3=1 */
/* Q5 should only have valid answers (non-missing) where Q3_3=1 */

CROSSTABS
  /TABLES = Q3_3 BY Q5
  /FORMAT = AVALUE TABLES
  /CELLS = COUNT ROW
  /COUNT ROUND CELL.

Step 3 — Produce Q5 results on the correct base

/* Filter to those who selected Online at Q3, then run Q5 */
/* Q3_3 is already coded 0/1, so it can serve directly as the filter variable */
WEIGHT BY WVAR.
FILTER BY Q3_3.

FREQUENCIES VARIABLES = Q5
  /ORDER = ANALYSIS.

FILTER OFF.
USE ALL.
Worked Example — Routing Verification

You run the crosstab and find 15 respondents have a valid answer at Q5 but Q3_3 = 0. This means either: (a) the routing was applied incorrectly in the survey, or (b) the data has been processed incorrectly. Investigate before continuing — these 15 cases should either be cleaned or excluded from Q5 analysis.

Creating sum and index variables

COMPUTE, SUM function, handling missing values

Common tasks: counting how many codes a respondent selected at a multicoded question, summing scores across a battery, or creating an index variable.

Count the number of codes selected (multicoded question)

/* Q3 is multicoded with 8 options (Q3_1 to Q3_8, coded 0/1) */
/* Create a variable that counts how many options each respondent selected */
COMPUTE Q3_COUNT = Q3_1 + Q3_2 + Q3_3 + Q3_4
                   + Q3_5 + Q3_6 + Q3_7 + Q3_8.
VARIABLE LABELS Q3_COUNT 'Q3: Number of reasons selected'.

Sum a battery (handling missing values correctly)

/* SUM() function ignores missing values; arithmetic (+) propagates them */
/* Use SUM() when some respondents may have missing values on individual items */

COMPUTE SCORE_TOTAL = SUM(Q10_1, Q10_2, Q10_3, Q10_4, Q10_5).
VARIABLE LABELS SCORE_TOTAL 'Q10: Total score (sum of 5 items)'.

/* Mean score across the battery */
COMPUTE SCORE_MEAN = MEAN(Q10_1, Q10_2, Q10_3, Q10_4, Q10_5).
VARIABLE LABELS SCORE_MEAN 'Q10: Mean score across 5 items'.

Create an index (subgroup vs total)

/* Index = (subgroup % / total %) x 100 */
/* Example: create an awareness index — score above 100 = above average awareness */
/* First get total awareness %, then compute index per subgroup in your reporting tool */
/* In SPSS this is typically done at the output/table stage, not as a computed variable */

/* Create a binary awareness variable (Q4=1 means aware) */
COMPUTE AWARE = (Q4 = 1).
VARIABLE LABELS AWARE 'Awareness (binary)'.
VALUE LABELS AWARE 0 'Not aware' 1 'Aware'.

Frequencies and crosstabs

FREQUENCIES, CROSSTABS with column %

Basic frequencies

WEIGHT BY WVAR.

/* Frequency table with a simple bar chart for each variable */
FREQUENCIES VARIABLES = Q1 Q2 Q3
  /BARCHART FREQ
  /ORDER = ANALYSIS.

Crosstab with column percentages (the standard format)

/* Q1 (rows) by gender (columns), showing column % */
WEIGHT BY WVAR.

CROSSTABS
  /TABLES = Q1 BY GENDER
  /FORMAT = AVALUE TABLES
  /STATISTICS = CHISQ
  /CELLS = COUNT COLUMN
  /COUNT ROUND CELL.

Multiple dependent variables, multiple breaks

/* Run Q1, Q2, Q3 by gender, age group, and region simultaneously */
WEIGHT BY WVAR.

CROSSTABS
  /TABLES = Q1 Q2 Q3 BY GENDER AGE_GROUP REGION
  /FORMAT = AVALUE TABLES
  /CELLS = COUNT COLUMN
  /COUNT ROUND CELL.

Recoding variables

RECODE, collapsing categories, creating age groups

Always recode INTO a new variable (RECODE ... INTO) rather than overwriting the original. This preserves your raw data.

Recode a satisfaction scale to top/bottom box

/* Q1 is a 5-point satisfaction scale (1=Very satisfied, 5=Very dissatisfied) */
/* Collapse to a 3-point variable: top 2 box / neither / bottom 2 box */

RECODE Q1 (1,2=1)(3=2)(4,5=3)(MISSING=SYSMIS) INTO Q1_RECODED.
VARIABLE LABELS Q1_RECODED 'Q1: Satisfaction (3-point)'.
VALUE LABELS Q1_RECODED
  1 'Top 2 box (satisfied)'
  2 'Neither'
  3 'Bottom 2 box (dissatisfied)'.

Create age groups from a continuous age variable

RECODE AGE
  (18 THRU 24 = 1)
  (25 THRU 34 = 2)
  (35 THRU 44 = 3)
  (45 THRU 54 = 4)
  (55 THRU 64 = 5)
  (65 THRU HIGHEST = 6)
  (MISSING = SYSMIS)
  INTO AGE_GROUP.
VARIABLE LABELS AGE_GROUP 'Age group (6-band)'.
VALUE LABELS AGE_GROUP
  1 '18-24'  2 '25-34'  3 '35-44'
  4 '45-54'  5 '55-64'  6 '65+'.

Net / combine codes

/* Combine awareness codes 1 (prompted) and 2 (spontaneous) into a single net */
COMPUTE AWARE_NET = (Q4 = 1 OR Q4 = 2).
VARIABLE LABELS AWARE_NET 'Awareness net (prompted + spontaneous)'.
VALUE LABELS AWARE_NET 0 'Not aware' 1 'Aware (net)'.

Syntax tips and best practices

Habits that will save you time and prevent errors

Always start your syntax file the same way

/* ================================================
   PROJECT: [Project name]
   WAVE:    [Wave number / date]
   ANALYST: [Your name]
   CREATED: [Date]
   NOTES:   [Any important context]
   ================================================ */

/* 1. Open the data file */
GET FILE = 'C:\Projects\MyProject\data\w3_data.sav'.

/* 2. Apply weight */
WEIGHT BY WVAR.

/* 3. Set output options */
SET TVARS = LABELS.
SET TNUMBERS = VALUES.

Key shortcuts and habits

  • Ctrl+A, Ctrl+R — select all, run all. Quick way to re-run your full syntax.
  • Ctrl+R on a selection — run just the highlighted block.
  • Comment liberally — use /* comment */ to explain every section. Your future self will thank you.
  • Save syntax, not just output — output is a snapshot; syntax is the source of truth.
  • Never overwrite raw variables — always RECODE INTO a new variable name.
  • Use EXECUTE sparingly — SPSS queues transformations and runs them automatically when the next procedure reads the data, so you rarely need EXECUTE. after a COMPUTE.
The EXECUTE Trap

A common myth holds that COMPUTE A = B + 1. followed by COMPUTE C = A * 2. leaves C system-missing unless EXECUTE. sits between them. In fact, queued transformations are applied in order for each case, so C is computed correctly. The real trap is the opposite: scattering EXECUTE. after every COMPUTE forces an extra pass over the data each time, slowing down large files. Use it only when you need pending values materialised immediately, for example before inspecting new variables in the Data Editor.