Automatic gender recognition and age estimation from a speaker's voice are desired by applications such as music recommendation and speaker profiling. However, their performance degrades greatly when the training data are class-imbalanced. This paper explores a novel multi-task learning system for gender recognition and age estimation based on speaker embeddings. We apply label distribution smoothing (LDS) and investigate a weighted mean squared error focal loss (w-MSE-FL) to reshape the weights assigned to centrally distributed samples during training. To cope with the limited amount of labeled data, we first pretrain a deep convolutional neural network stacked with an attentive statistics pooling layer on a speaker recognition task, using a speaker speech dataset, to extract robust speaker embedding features. We then fine-tune the multi-task network to perform gender recognition and age estimation simultaneously, using a classifier and a regressor, respectively, on a dataset labeled with gender and age. Experimental results on the TIMIT dataset show that the proposed system achieves better results, with an age estimation RMSE of 7.17 and 7.25 years for male and female speakers, respectively, and an overall gender recognition accuracy of 99.30%.
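The reweighting idea behind LDS can be illustrated with a minimal sketch. It assumes integer age labels, a Gaussian smoothing kernel, and inverse-density sample weights applied to a plain MSE. The function names (`lds_weights`, `weighted_mse`) and all parameter values are hypothetical, and the focal term of w-MSE-FL is not reproduced here because its exact form is not given above.

```python
import numpy as np

def lds_weights(ages, min_age=0, max_age=100, sigma=2.0, half_width=5):
    """Per-sample weights from a Gaussian-smoothed age histogram (illustrative)."""
    # Assumption: ages are integers within [min_age, max_age].
    idx = np.asarray(ages, dtype=int) - min_age
    n_bins = max_age - min_age + 1
    # Empirical label histogram, one bin per integer age.
    hist = np.bincount(idx, minlength=n_bins).astype(float)
    # Symmetric Gaussian kernel, normalized to sum to 1.
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Smoothed label density: neighboring ages share statistical strength.
    smoothed = np.convolve(hist, kernel, mode="same")
    # Inverse-density weights, rescaled so that their mean is 1.
    weights = 1.0 / smoothed[idx]
    return weights * len(weights) / weights.sum()

def weighted_mse(pred, target, weights):
    """Mean of per-sample squared errors scaled by the given weights."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean(weights * (pred - target) ** 2))

# Toy example: age 30 is common, age 60 is rare.
ages = [30] * 8 + [60] * 2
w = lds_weights(ages)
loss = weighted_mse(np.array(ages) + 5.0, ages, w)
```

In this toy case the rare age receives a larger weight than the common one, so errors on under-represented ages contribute more to the loss.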