Natural Language Understanding (NLU), which includes intent detection and slot tagging, plays an important role in any dialog system. This paper presents SmartNLU, the first conversational smart-home dataset for Vietnamese, together with an NLU model for Vietnamese. Raw data were collected with a web tool in a Wizard-of-Oz set-up: participants were asked to provide or confirm the intents and slot values of the user says (utterances) that they sent or received in a smart-home conversation, until all of them matched. The data were then cleaned and processed to build templates of user says with empty slots. The entity strategy, which fills slot values into the templates in round-robin order, was chosen empirically to generate user says from the collected templates, yielding 3,492/1,176/1,198 user says for the training/validation/test sets, respectively. The dataset was released for a challenge held on AIHub and has been published for the community. Several state-of-the-art joint NLU models were evaluated on the released dataset. The proposed NLU model, which adds PhoBERT to the DIET architecture of the Rasa framework, gave the best results. The sentence accuracy of DIET+PhoBERT was considerably higher than that of the other models, by 4.3% to 11.7%.
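The round-robin filling of slot values into templates can be illustrated with a minimal sketch. It is not the authors' implementation: the `{slot}` placeholder syntax, the function name `generate_user_says`, the stopping rule (every template used at least once and every slot value used at least once), and the example templates and values are all assumptions made for illustration.

```python
import re
from itertools import cycle

# Hypothetical placeholder syntax: slots appear in templates as {slot_name}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def generate_user_says(templates, slot_values):
    """Fill templates with slot values handed out in round-robin order.

    Assumed stopping rule: stop once every template has been used at least
    once and every value of every slot that occurs in a template has been
    used at least once. Each slot is assumed to have at least one value.
    """
    # Only slots that actually occur in some template need to be covered.
    needed = {s for t in templates for s in PLACEHOLDER.findall(t)}
    # One endless iterator per slot, so its values are taken in turn.
    cyclers = {s: cycle(slot_values[s]) for s in needed}
    used = {s: set() for s in needed}

    def fill(match):
        slot = match.group(1)
        value = next(cyclers[slot])
        used[slot].add(value)
        return value

    results = []
    for template in cycle(templates):
        covered = all(used[s] == set(slot_values[s]) for s in needed)
        if len(results) >= len(templates) and covered:
            break
        results.append(PLACEHOLDER.sub(fill, template))
    return results


# Hypothetical templates and slot values, not taken from SmartNLU.
example = generate_user_says(
    ["bật {device} ở {room}", "tắt {device}"],
    {"device": ["đèn", "quạt", "tivi"], "room": ["phòng khách", "phòng ngủ"]},
)
for user_say in example:
    print(user_say)
```

Under this stopping rule the number of generated user says grows with the largest slot vocabulary rather than with the product of all vocabularies, which keeps the generated set small while still covering every slot value.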