The ability to perceive facial expressions of emotion is essential for effective social communication. We investigated how the perception of facial expression emerges from the image properties that convey this important social signal, and how neural responses in face-selective brain regions might track these properties. To do this, we measured the perceptual similarity between expressions of basic emotions, and investigated how this is reflected in image measures and in the neural response of different face-selective regions. We show that the perceptual similarity of different facial expressions (fear, anger, disgust, sadness, happiness) can be predicted by both surface and feature shape information in the image. Using block design fMRI, we found that the perceptual similarity of expressions could also be predicted from the patterns of neural response in the face-selective posterior superior temporal sulcus (STS), but not in the fusiform face area (FFA). These results show that the perception of facial expression is dependent on the shape and surface properties of the image and on the activity of specific face-selective regions.