Objectives: Training in and assessment of consultation skills are high on the agenda of vocational training institutes for postgraduate training. There is a need to establish valid and reliable instruments to assess consultation skills in authentic settings. We investigated the number of assessors and observations needed to achieve reliable assessments of the consultation skills of general practice trainees (GPTs) using a communication instrument (MAAS-Global) and either standardised patient (SP) encounters or videotaped real patient (RP) encounters.

Methods: Eight teachers at the Vrije Universiteit (VU) University Medical Centre in Amsterdam attended a training course on the use of the MAAS-Global instrument, which they subsequently used to assess the consultation skills of 53 GPTs in 176 videotaped consultations (102 with SPs, 74 with RPs). All consultations were randomly allocated and assessed by two teachers independently. The reliability of the ratings was estimated using generalisability theory.

Results: It was easier to obtain acceptable reliability using RP consultations than SP consultations. Two assessors and five consultations were required to achieve minimal reliability (generalisability coefficient 0.7) with RPs, whereas three assessors and 30 consultations were needed to achieve minimal reliability with SPs.

Conclusions: Inter-observer and context variability in the assessment of the consultation skills of GPTs remains high. To achieve acceptable levels of reliability, large samples of observations are required in both formats, but, interestingly, RP encounters require a smaller sample than SP encounters.

© Blackwell Publishing Ltd 2011.
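
As a rough illustration of how a generalisability analysis can turn variance components into a required number of assessors and consultations, the sketch below projects the relative generalisability coefficient for different sample sizes. It assumes a fully crossed trainee × consultation × assessor design, which the abstract does not confirm, and the function names and variance components are hypothetical placeholders, not values from the study. It is not intended to reproduce the reported figures.

```python
# Sketch of a decision-study projection under an ASSUMED fully crossed
# trainee (p) x consultation (c) x assessor (r) design.
# All variance components passed in are hypothetical, not study data.

def g_coefficient(var_p, var_pc, var_pr, var_pcr_e, n_c, n_r):
    """Relative generalisability coefficient for n_c consultations and n_r assessors."""
    # Relative error variance: each interaction term shrinks as its facet is sampled more.
    rel_error = var_pc / n_c + var_pr / n_r + var_pcr_e / (n_c * n_r)
    return var_p / (var_p + rel_error)


def min_consultations(var_p, var_pc, var_pr, var_pcr_e, n_r, target=0.7, max_c=200):
    """Smallest number of consultations reaching `target`, or None if not reached by max_c."""
    for n_c in range(1, max_c + 1):
        if g_coefficient(var_p, var_pc, var_pr, var_pcr_e, n_c, n_r) >= target:
            return n_c
    return None


if __name__ == "__main__":
    # Hypothetical components: trainee, trainee x consultation,
    # trainee x assessor, and residual variance.
    components = dict(var_p=1.0, var_pc=1.5, var_pr=0.3, var_pcr_e=2.0)
    for n_r in (1, 2, 3):
        print(n_r, min_consultations(n_r=n_r, **components))
```

In this formulation, adding consultations reduces only the terms divided by `n_c`. When the trainee × assessor component is large, no number of consultations reaches the target with a single assessor, which is one way a design can end up needing more assessors as well as more observations.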