YEAST_DB schema --------------- - Now supported in AnnotationDbi (except for the CHRLENGTHS and REJECTORF maps that are missing). - Am I right in assuming that the YEAST package is _not_ probe-based despite the fact that the "quality control" says: Mappings found for probe based rda files: ... Mappings found for non-probe based rda files: ... My understanding is that the YEAST maps are based on the "systematic gene names". Also misleading is the naming of the reverse maps: YEASTENZYME2PROBE, YEASTGO2PROBE, etc... Only 1 reverse map seems to be aknowledging the fact that the package data is based on the systematic gene names, the COMMON2SYSTEMATIC map. NL: Correct. YEAST is not probe-based, and mose maps are indexed by systematic name. - However, this map name "COMMON2SYSTEMATIC" is inconsistent. It seems to be the reverse map for the GENENAME map. In this case, why isn't it called GENENAME2SYSTEMATIC? (all other reverse maps use this naming convention of prefixing with the name of the direct map). NL: Yes, GENENAME2SYSTEMATIC gives a better discription of the map. Whoever names it "COMMON2SYSTEMATIC" must have a reason. Just to clarify, "COMMON2SYSTEMATIC" is NOT the reverse map for the GENENAME map. GENENAME map contains systematic-to-genename mapping of all systematic names that has a gene name. If a gene name has no systematic name, it is not included in the GENENAME map. COMMON2SYSTEMATIC map contains all existing gene names, and provides their systematic names if there exists. COMMON2SYSTEMATIC map also contains all alias (alias of gene names), and provides their systematic names if exists. - Where should I extract the GENENAME map from? o or from the sgd table? (has cols: id, systematic_name, gene_name, sgd_id) o or from the gene2systematic table? (has cols: gene_name, systematic_name) Sometime it gives the same, sometimes not. With YEAST.sqlite: sqlite> select gene_name from sgd where systematic_name='YAL012W'; CYS3 sqlite> select gene_name from gene2systematic where systematic_name='YAL012W'; CYS3 CYI1 FUN35 STR1 Using the former seems to give results much closer to the current YEAST package (1.15.2) so this is what I've choosen for the current version of AnnotationDbi (0.0.24). But then the gene2systematic is useless... NL: There are 2 naming systems for yeast: Systematic name and Gene name(or Standard Name). Detailed descriptions can be found at http://www.yeastgenome.org/help/yeastGeneNomenclature.shtml There are 4 possible scopes of data coverages: * Named Genes Only: contain information only about features which have been given a Gene Name, either a Standard Name or a Reserved Name. Thus these files will NOT include information on ORFs (protein coding genes) that have not been given Gene Names, and WILL include information about genetic loci that have never been mapped to a chromosomal position, but which have been given Gene Names. * ORFs: contain information about all ORFs (protein coding genes), regardless of whether or not they are also associated with a Gene Name (i.e. a Standard Name or a Reserved Name). * Gene products: contain information about chromosomal features which correspond to gene products, either protein or RNA products, including ORF (protein coding genes), Ty ORF, tRNA, rRNA, snRNA, snoRNA, and other RNA gene features. Other sequence features (LTR, ARS, Transposon, pseudogene, and CEN) will not be included. * All chromosomal features: contain information about all chromosomal sequence features including ORF (protein coding genes), LTR, tRNA, Ty ORF, snoRNA, ARS, Transposon, pseudogene, rRNA, CEN, RNA gene, and snRNA features. The scope of the following objects in the envir-based YEAST package are "ORFs": CHR, CHRLOC, DESCRIPTION, ENZYME, GO, PATH, PMID, INTERPRO, PFAM, SMART, ENZYME2PROBE, PATH2PROBE, PMID2PROBE, ALIAS, GENENAME The scope of COMMON2SYSTEMATIC in envir-based YEAST package is "Named Genes Only" plus all alias In the DB schema: table scope corresponding maps sgd: "Named Genes Only" UNION "All chromosomal features"; GENENAME chromosome_features: "All chromosomal features"; CHR, CHRLOC, DESCRIPTION gene2alias: "Named Genes Only"; ALIAS genes2systematic: "Named Genes Only" UNION "All chromosomal features" UNION alias; COMMON2SYSTEMATIC go_bp, go_mf, go_cc: "Gene Products"; GO pubmed: "All chromosomal features"; PMID interpro, pfam, smart: "ORFs"; INTERPRO, PFAM, SMART