Bistable visual stimuli such as Rubin's vase/face or the Necker cube evoke spontaneously alternating percepts even though the visual image itself does not change. This uncoupling of stimulus and percept offers a means of understanding the neural basis of visual perception. Here we asked whether pooling the responses of a large population of MT neurons over time and space could improve the predictability of perceptual decisions during ambiguous visual stimulation. Two well-trained rhesus monkeys indicated the perceived direction of rotation of bistable structure-from-motion (SFM) stimuli by pushing one of two levers. During this task, multi-channel intracortical recordings, including single-unit activity (SUA), multi-unit activity (MUA), and local field potentials (LFP), were collected from area MT. We sorted the neural data according to the monkeys' behavioral choices and used statistical algorithms to classify brain states (i.e., the subjective interpretation of a bistable stimulus). Classification was performed using linear discriminant analysis with leave-one-out cross-validation. Considered in isolation, SUA, MUA, and LFP each predicted the monkeys' perceptual reports only modestly. We developed dynamic models for the spatio-temporal integration of distributed neural signals. We found that the discriminative information in neuronal population activity accumulates over time, and that combining simultaneously collected data greatly improved prediction accuracy relative to each signal considered alone. The accuracy and statistical power of determining the monkeys' perception increased with the number of channels and with the number of neural signal types used for analysis. Our results demonstrate that simultaneous collection of multiple neural responses in area MT can improve the determination of perceptual states.
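The classification step named above, linear discriminant analysis with leave-one-out cross-validation, can be illustrated with a minimal sketch. It is not the authors' analysis code. The trials are synthetic, the feature names merely stand in for SUA, MUA, and LFP, and the function names (`simulate_trials`, `lda_fit`, `loo_accuracy`), the effect size, the equal-prior threshold, and the small ridge term are all assumptions made for illustration.

```python
import random

def mean_vec(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_cov(class_rows, class_means, ridge=1e-3):
    """Pooled within-class covariance; the ridge term is an assumed regularizer."""
    d = len(class_means[0])
    cov = [[0.0] * d for _ in range(d)]
    n_total = 0
    for rows, mu in zip(class_rows, class_means):
        for r in rows:
            diff = [x - m for x, m in zip(r, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
        n_total += len(rows)
    dof = max(n_total - len(class_rows), 1)
    for i in range(d):
        for j in range(d):
            cov[i][j] /= dof
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def lda_fit(X, y):
    """Two-class Fisher LDA; returns (weights, threshold), assuming equal priors."""
    c0 = [x for x, lab in zip(X, y) if lab == 0]
    c1 = [x for x, lab in zip(X, y) if lab == 1]
    mu0, mu1 = mean_vec(c0), mean_vec(c1)
    cov = pooled_cov([c0, c1], [mu0, mu1])
    w = solve(cov, [a - b for a, b in zip(mu1, mu0)])
    mid = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    return w, sum(wi * mi for wi, mi in zip(w, mid))

def lda_predict(model, x):
    """Assign class 1 if the projection exceeds the midpoint threshold."""
    w, thresh = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > thresh else 0

def loo_accuracy(X, y):
    """Leave-one-out: train on all trials but one, test on the held-out trial."""
    hits = 0
    for i in range(len(X)):
        model = lda_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / len(X)

def simulate_trials(n_trials=60, n_features=3, effect=0.5, seed=0):
    """Synthetic trials: each feature is weakly shifted by the (hypothetical) percept."""
    rng = random.Random(seed)
    X, y = [], []
    for t in range(n_trials):
        label = t % 2  # alternate the two reported rotation directions
        X.append([rng.gauss(effect * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = simulate_trials()
    for j, name in enumerate(["SUA", "MUA", "LFP"]):
        print(name, "alone:", loo_accuracy([[x[j]] for x in X], y))
    print("combined:", loo_accuracy(X, y))
```

Running the script prints the leave-one-out accuracy for each synthetic feature alone and for all three combined. How much the combined accuracy exceeds the single-feature values depends entirely on the assumed effect size and noise, so the numbers say nothing about the recorded data.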