Key Laboratory Team Releases the Sign Language Corpus NationalCSL-DP

Time: 2025-02-14 09:14:30

The NationalCSL-DP dataset

The NationalCSL-DP dataset was developed by the Sichuan Province Key Laboratory of Philosophy and Social Science for Language Intelligence in Special Education and the Key Laboratory of Internet Natural Language Intelligent Processing of Sichuan Provincial Education Department.

The NationalCSL-DP dataset has the largest vocabulary among existing public ISLR (Isolated Sign Language Recognition) datasets. It contains 6707 glosses from the CNSL (Chinese National Sign Language) vocabulary and provides 134140 sign videos with two perpendicular views of the signer, i.e. the front and the left side. For the development of the NationalCSL-DP dataset, 10 participants were recruited, 2 male and 8 female, with a mean age of 19.82±0.28 years. Among them, 8 were deaf students and 2 were hearing students, all of whom were highly proficient in CNSL. The videos were recorded under supervision in two green-screen studios, each furnished with two high-definition RGB cameras. All cameras were configured to record at a resolution of 1920×1080 pixels and a frame rate of 50 frames per second. Each gloss in the vocabulary was signed by all ten signers. To our knowledge, this is the first ISLR dataset that provides dual-view RGB videos and covers all the glosses in the CNSL vocabulary.
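The reported video total follows from the figures in the paragraph above. The short sketch below checks that arithmetic, assuming each signer recorded each gloss exactly once per view; the constant and function names are illustrative and not part of the dataset.

```python
# Figures quoted in the dataset description.
NUM_GLOSSES = 6707        # glosses from the CNSL vocabulary
NUM_SIGNERS = 10          # each gloss was signed by ten signers
NUM_VIEWS = 2             # front view and left-side view
REPORTED_VIDEOS = 134140  # total number of sign videos reported


def expected_video_count(glosses: int, signers: int, views: int) -> int:
    """Videos expected if every signer records every gloss once per view."""
    return glosses * signers * views


total = expected_video_count(NUM_GLOSSES, NUM_SIGNERS, NUM_VIEWS)
print(total, total == REPORTED_VIDEOS)
```

Under that assumption the product of glosses, signers, and views matches the reported number of videos.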

Similar to some popular ISLR datasets (e.g. WLASL, MS-ASL), we created five subsets from the original dataset, each containing a different number of glosses. These subsets are named NationalCSL200, NationalCSL500, NationalCSL1000, NationalCSL2000, and NationalCSL6707. This division supports more targeted experimentation and analysis in sign language research, and lets us evaluate how vocabulary size affects the performance of sign recognition systems.
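As a rough illustration of the subset naming, the sketch below reads the trailing number of each subset name as its gloss count. This is an assumption, although it is consistent with NationalCSL6707 matching the full 6707-gloss vocabulary. The helper names are hypothetical, and the video figure is only an upper bound that assumes every subset keeps all ten signers and both views for each gloss.

```python
import re

SUBSET_NAMES = [
    "NationalCSL200",
    "NationalCSL500",
    "NationalCSL1000",
    "NationalCSL2000",
    "NationalCSL6707",
]


def gloss_count(subset_name: str) -> int:
    """Interpret the trailing number of a subset name as its gloss count (assumed)."""
    match = re.fullmatch(r"NationalCSL(\d+)", subset_name)
    if match is None:
        raise ValueError(f"unexpected subset name: {subset_name!r}")
    return int(match.group(1))


def max_video_count(subset_name: str, signers: int = 10, views: int = 2) -> int:
    """Upper bound on videos, assuming all signers and views are kept per gloss."""
    return gloss_count(subset_name) * signers * views


for name in SUBSET_NAMES:
    print(name, gloss_count(name), max_video_count(name))
```

The actual gloss selection and video counts for each subset are defined by the released partition files, not by this sketch.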

The dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license.

The dataset is released here:

The partitions of the NationalCSL-DP dataset include:

If you have any questions, please contact: syjing628@126.com

Ethical approval: All participants gave informed consent to the sharing of their identity information and signed agreements consenting to participate in the construction of the NationalCSL-DP dataset and to allow the dataset to be published, including but not limited to publication in academic journals and online databases. The Ethical Review Board (ERB) of Leshan Normal University (LSNU) reviewed our ethical review application, together with the informed consent forms and agreements of all participants regarding the sharing of identity information and the publication of the dataset. The ERB of LSNU then granted permission for the open publication of the dataset, including manuscript submission and dataset release (Ethical Review Number: LNU-KYLL2025-02-15).