National Center Biobank Network (NCBN) dataset
Summary
The NCBN dataset consists of allele and genotype frequencies obtained through whole-genome sequencing (WGS) of control individuals for cancer and rare disease studies, collected by the National Center Biobank Network (NCBN). To account for regional variation, frequencies are provided separately for the mainland Japanese population and the Ryukyu population, which were distinguished by principal component analysis (PCA). Variants that were specific to or not observed in the Japanese population were identified by joint calling with WGS data from the 1000 Genomes Project (1KGP). See the publication below for details.
- Version/Last update: 2024/6/28
- Sample size: 9,290 (Japanese), 2,504 (1000 Genomes Project)
- Number of variants: 215,729,032 (Total of Japanese and 1000 Genomes Project)
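As a rough illustration of how allele and genotype frequencies relate, the sketch below derives both from genotype counts at a biallelic autosomal site. The function name and the genotype counts are made up for this example (only their total, 9,290, matches the Japanese sample size above), and this is not the data provider's pipeline.

```python
def allele_frequencies(n_ref_hom: int, n_het: int, n_alt_hom: int) -> dict:
    """Derive genotype and allele frequencies from biallelic genotype counts."""
    n_samples = n_ref_hom + n_het + n_alt_hom
    if n_samples == 0:
        raise ValueError("no called genotypes")
    allele_number = 2 * n_samples        # two alleles per diploid individual
    alt_count = n_het + 2 * n_alt_hom    # heterozygotes carry one alt allele
    return {
        "genotype_freq": {
            "ref/ref": n_ref_hom / n_samples,
            "ref/alt": n_het / n_samples,
            "alt/alt": n_alt_hom / n_samples,
        },
        "alt_allele_count": alt_count,
        "allele_number": allele_number,
        "alt_allele_freq": alt_count / allele_number,
    }


# Invented counts for illustration; they sum to 9,290 individuals.
result = allele_frequencies(n_ref_hom=9000, n_het=280, n_alt_hom=10)
print(result["alt_allele_count"], result["allele_number"])
```

Sites on sex chromosomes or with missing genotypes would need a different allele number, which this sketch does not handle.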
Publication
Kawai Y, Watanabe Y, Omae Y, Miyahara R, Khor S-S, Noiri E, et al. (2023) Exploring the genetic diversity of the Japanese population: Insights from a large-scale whole genome sequencing analysis. PLoS Genet 19(12): e1010625. https://doi.org/10.1371/journal.pgen.1010625.
Terms of use
Rights of Data Users
The rights of data users shall conform to "5-2-1. Open Data" in "5-2. Rights of Data Users" listed in the NBDC Human Data Sharing Guidelines.
- The data user can freely present the result of the study for which data from the NBDC Human Database are used.
- The data user can freely acquire intellectual property rights based on the result of the study for which data from the NBDC Human Database are used.
Responsibilities of Data Users
Terms of "5-3-1. Open Data" in "5-3. Responsibilities of Data Users" listed in the NBDC Human Data Sharing Guidelines shall apply with modification to the responsibilities of data users. As for redistribution of data, terms for controlled-access data shall apply because this dataset was generated by processing controlled-access data.
- In using data, the user must take responsibility for and make judgments concerning the quality, content, and scientific validity of the data.
- The data user must comply with the following rules.
  - The use of data is limited to the study being undertaken.
  - Identification of individuals is prohibited.
  - Redistribution of data is prohibited.
- The data user must add the following citation while using the data in public (e.g. publishing an article).
Kawai Y, Watanabe Y, Omae Y, Miyahara R, Khor S-S, Noiri E, et al. (2023) Exploring the genetic diversity of the Japanese population: Insights from a large-scale whole genome sequencing analysis. PLoS Genet 19(12): e1010625. https://doi.org/10.1371/journal.pgen.1010625.
Download VCF file created by the data provider [Unrestricted access]
NBDC Human DB | Study title | Participants | Sample size | Data provider |
---|---|---|---|---|
hum0331.v1.freq.v1 | Construction of control data for the promotion of genomic medicine for cancers and rare diseases | healthy individuals (Japanese) | 9,290 | Katsushi Tokunaga |
Total | | | 9,290 | |
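Once downloaded, a frequency VCF can be read with ordinary text processing. The following is a minimal sketch, assuming an uncompressed, tab-delimited data line describing a biallelic variant. `AC` and `AN` are reserved INFO keys in the VCF specification, but whether this file uses them, and how its per-population keys are named, is an assumption; check the `##INFO` header lines of the actual file. The example line and the helper names are invented.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _vid, ref, alt, _qual, _filter, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "info": parse_info(info),
    }


# Invented data line; AC/AN usage is an assumption about the file layout.
example = "chr1\t12345\t.\tA\tG\t.\tPASS\tAC=300;AN=18580"
record = parse_vcf_line(example)
ac = int(record["info"]["AC"])  # valid for a single alt allele only
an = int(record["info"]["AN"])
alt_freq = ac / an
print(record["chrom"], record["pos"], alt_freq)
```

For multi-allelic sites `AC` holds one comma-separated value per alternative allele, so the `int(...)` conversion above would need to be replaced by a per-allele split.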
List of populations for which frequencies are available
By using the “Alternative allele frequency/count” in the Advanced search, you can search for variants based on the alternative allele frequencies aggregated by population.
- Mainland of Japan
- Ryukyu Island in Japan
- African Caribbean in Barbados [ACB]
- African Ancestry in SW USA [ASW]
- Bengali in Bangladesh [BEB]
- British From England and Scotland [GBR]
- Chinese Dai in Xishuangbanna, China [CDX]
- Colombian in Medellín, Colombia [CLM]
- Esan in Nigeria [ESN]
- Finnish in Finland [FIN]
- Gambian in Western Division – Mandinka [GWD]
- Gujarati Indians in Houston, Texas, USA [GIH]
- Han Chinese in Beijing, China [CHB]
- Han Chinese South [CHS]
- Iberian Populations in Spain [IBS]
- Indian Telugu in the UK [ITU]
- Japanese in Tokyo, Japan [JPT]
- Kinh in Ho Chi Minh City, Vietnam [KHV]
- Luhya in Webuye, Kenya [LWK]
- Mende in Sierra Leone [MSL]
- Mexican Ancestry in Los Angeles CA USA [MXL]
- Peruvian in Lima Peru [PEL]
- Puerto Rican in Puerto Rico [PUR]
- Punjabi in Lahore, Pakistan [PJL]
- Sri Lankan Tamil in the UK [STU]
- Toscani in Italia [TSI]
- Utah residents (CEPH) with Northern and Western European ancestry [CEU]
- Yoruba in Ibadan, Nigeria [YRI]
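A comparable filter can be applied offline to frequencies that have already been loaded. The sketch below assumes each variant is a dict holding per-population alternative allele frequencies; the function name, the population keys (`mainland`, `ryukyu`, `CHB`), and all values are hypothetical and do not reflect the Advanced search implementation or real data.

```python
def filter_by_frequency(variants, population, min_af=0.0, max_af=1.0):
    """Keep variants whose alt allele frequency in `population` is within range."""
    return [
        v for v in variants
        if population in v["af"] and min_af <= v["af"][population] <= max_af
    ]


# Hypothetical records; frequencies are invented for illustration.
variants = [
    {"id": "var1", "af": {"mainland": 0.12, "ryukyu": 0.30, "CHB": 0.0}},
    {"id": "var2", "af": {"mainland": 0.001, "ryukyu": 0.0, "CHB": 0.05}},
    {"id": "var3", "af": {"mainland": 0.0, "ryukyu": 0.0, "CHB": 0.20}},
]

common_in_mainland = filter_by_frequency(variants, "mainland", min_af=0.05)
absent_in_mainland = filter_by_frequency(variants, "mainland", max_af=0.0)
print([v["id"] for v in common_in_mainland])
print([v["id"] for v in absent_in_mainland])
```

Chaining two calls, for example one on a Japanese population and one on a 1KGP population, gives a simple way to look for variants that are common in one group and absent in another.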