AI4S Cup - LLM Challenge - Extracting a "Gene-Disease-Drug" Knowledge Graph with Large Language Models - Solution by team 不知道对不队 - Inference Code

bohrac44ed

Recommended image: Third-party software:ai4s-cup-metrics:0.3
Recommended machine type: c8_m32_1 * NVIDIA V100
Datasets
librarys(v10)
数据集(v10)
task3_resources(v6)
linkbert-task1(v3)
linkbert_task2(v3)
mistral_instruct(v2)
mistral_base(v2)
Qwen1.5-7Bchat-full(v1)
biomistral-full(v1)
[4]
import os

# Disable Weights & Biases logging during inference.
os.environ['WANDB_DISABLED'] = "true"

cur_dir = os.getcwd()

# For ease of evaluation, it is recommended to set this file path explicitly.
DATA_PATH = os.getenv('DATA_PATH')
# If DATA_PATH is not set, fall back to a default value and print a warning.
if not DATA_PATH:
    DATA_PATH = '/bohr/AGAC-GDA-0ifh/v8'
    print("Warning: DATA_PATH environment variable is not set. Using default path:", DATA_PATH)
! mkdir -p original_submission
! cp $DATA_PATH/submission.jsonl original_submission/submission.jsonl
import jsonlines
file1=f"{cur_dir}/original_submission/submission_task1.jsonl"
file2=f"{cur_dir}/original_submission/submission_task2.jsonl"
file3=f"{cur_dir}/original_submission/submission_task3.jsonl"
sub_file="original_submission/submission.jsonl"
# Split the combined submission file into one file per task.
with jsonlines.open(sub_file, 'r') as reader, jsonlines.open(file1, 'w') as writer1, \
     jsonlines.open(file2, 'w') as writer2, jsonlines.open(file3, 'w') as writer3:
    for task in reader:
        if task['task'] == 1:
            writer1.write(task)
        elif task['task'] == 2:
            writer2.write(task)
        elif task['task'] == 3:
            writer3.write(task)
! cp /bohr/libr-wd9d/v10/library/zty_final_submission.sh .
! cp -r /bohr/libr-wd9d/v10/library/llm_data llm_data
! cp -r /bohr/libr-wd9d/v10/library/llama_factory llama_factory
! cp -r /bohr/libr-wd9d/v10/library/linkbert linkbert
! bash zty_final_submission.sh
Warning: DATA_PATH environment variable is not set. Using default path: /bohr/AGAC-GDA-0ifh/v8
mkdir: cannot create directory ‘original_submission’: File exists
mkdir: cannot create directory ‘model1_generate’: File exists
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
04/16/2024 18:55:06 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-04-16 18:55:06,690 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2025] 2024-04-16 18:55:06,690 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-04-16 18:55:06,690 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-04-16 18:55:06,690 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-04-16 18:55:06,690 >> loading file tokenizer.json
04/16/2024 18:55:06 - INFO - llmtuner.data.loader - Loading dataset final_submission1.json...
04/16/2024 18:55:06 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Generating train split: 25 examples [00:00, 2506.46 examples/s]
Converting format of dataset: 100%|████| 25/25 [00:00<00:00, 3682.57 examples/s]
Running tokenizer on dataset: 100%|█████| 25/25 [00:00<00:00, 189.36 examples/s]
input_ids: [1136, 264, 23427, ...] (raw tokenizer-ID dump elided)
inputs: As a specialized biologist AI, you possess expertise in recognizing entities and extracting relations from text. Your role involves accurately identifying gene names, functional changes (LOF, GOF, REG, COM), and disease names, then structuring this information into specified relational formats.
Human: Your task is to analyze the provided text to extract gene-disease relations with an emphasis on the functional change between the gene and the disease. Extract relations in the format of (GENE, FUNCTION, DISEASE), where 'GENE' is the gene's name, 'FUNCTION' describes the gene's functional change impacting the disease, and 'DISEASE' is the name of the disease affected. The 'FUNCTION' should be classified into one of the following categories: LOF (loss of function), GOF (gain of function), REG (regulatory relationship), or COM (complex relationship where the functional change is not clearly LOF or GOF). Note that you should analysis the type of the regulation. If it is positive regulation like 'facilitates', 'enhanced', 'increased', you should output GOF.
If it is negetive regulation like 'suppressed', 'decreased', 'inhibited', you should output LOF. Only when the function is neutral like 'resulted in', 'regulated', you should output REG. Ensure each relation is accurately identified and presented in the specified format. If multiple relations are found, separate them with commas, like so: '(GENE1, FUNCTION1, DISEASE1),(GENE2, FUNCTION2, DISEASE2),...'. For clarity, here is an example: Given a gene 'XYZ' that loses function leading to 'DiseaseA', and a gene 'ABC' that gains function contributing to 'DiseaseB', your output should look like this: '(XYZ, LOF, DiseaseA),(ABC, GOF, DiseaseB)'. You need to output at least one answer and no duplicate answer. Following is the text: Sequence variants of the DFNB31 gene among Usher syndrome patients of diverse origin. PURPOSE: It has been demonstrated that mutations in deafness, autosomal recessive 31 (DFNB31), the gene encoding whirlin, is responsible for nonsyndromic hearing loss (NSHL; DFNB31) and Usher syndrome type II (USH2D). We screened DFNB31 in a large cohort of patients with different clinical subtypes of Usher syndrome (USH) to determine the prevalence of DFNB31 mutations among USH patients. METHODS: DFNB31 was screened in 149 USH2, 29 USH1, six atypical USH, and 11 unclassified USH patients from diverse ethnic backgrounds. Mutation detection was performed by direct sequencing of all coding exons. RESULTS: We identified 38 different variants among 195 patients. Most variants were clearly polymorphic, but at least two out of the 15 nonsynonymous variants (p.R350W and p.R882S) are predicted to impair whirlin structure and function, suggesting eventual pathogenicity. No putatively pathogenic mutation was found in the second allele of patients with these mutations. CONCLUSIONS: DFNB31 is not a major cause of USH. 
Assistant: [INFO|configuration_utils.py:727] 2024-04-16 18:55:07,846 >> loading configuration file /bohr/biof-ar2p/v1/config.json [INFO|configuration_utils.py:792] 2024-04-16 18:55:07,847 >> Model config MistralConfig { "_name_or_path": "/bohr/biof-ar2p/v1", "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 10000.0, "sliding_window": 4096, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.37.2", "use_cache": false, "vocab_size": 32000 } 04/16/2024 18:55:07 - INFO - llmtuner.model.patcher - Using KV cache for faster generation. [INFO|modeling_utils.py:3473] 2024-04-16 18:55:07,887 >> loading weights file /bohr/biof-ar2p/v1/model.safetensors.index.json [INFO|modeling_utils.py:1426] 2024-04-16 18:55:07,889 >> Instantiating MistralForCausalLM model under default dtype torch.float16. [INFO|configuration_utils.py:826] 2024-04-16 18:55:07,890 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } Loading checkpoint shards: 100%|██████████████████| 3/3 [00:09<00:00, 3.13s/it] [INFO|modeling_utils.py:4350] 2024-04-16 18:55:18,151 >> All model checkpoint weights were used when initializing MistralForCausalLM. [INFO|modeling_utils.py:4358] 2024-04-16 18:55:18,151 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /bohr/biof-ar2p/v1. If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training. 
[INFO|configuration_utils.py:779] 2024-04-16 18:55:18,156 >> loading configuration file /bohr/biof-ar2p/v1/generation_config.json [INFO|configuration_utils.py:826] 2024-04-16 18:55:18,157 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } 04/16/2024 18:55:18 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model. 04/16/2024 18:55:18 - INFO - llmtuner.model.loader - all params: 7241732096 Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. [INFO|trainer.py:571] 2024-04-16 18:55:18,336 >> Using auto half precision backend [INFO|trainer.py:3242] 2024-04-16 18:55:18,344 >> ***** Running Prediction ***** [INFO|trainer.py:3244] 2024-04-16 18:55:18,344 >> Num examples = 25 [INFO|trainer.py:3247] 2024-04-16 18:55:18,344 >> Batch size = 1 100%|███████████████████████████████████████████| 25/25 [00:34<00:00, 1.39s/it] ***** predict metrics ***** predict_runtime = 0:00:36.15 predict_samples_per_second = 0.691 predict_steps_per_second = 0.691 04/16/2024 18:55:54 - INFO - llmtuner.train.sft.trainer - Saving prediction results to /personal/model1_generate/generated_predictions.jsonl mkdir: cannot create directory ‘model2_generate’: File exists Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none). 
04/16/2024 18:56:03 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16 [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file vocab.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file merges.txt [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:03,712 >> loading file tokenizer.json [WARNING|logging.py:314] 2024-04-16 18:56:04,040 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 04/16/2024 18:56:04 - INFO - llmtuner.data.loader - Loading dataset final_submission1.json... 04/16/2024 18:56:04 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json. 
Running tokenizer on dataset: 100%|█████| 25/25 [00:00<00:00, 150.46 examples/s]
input_ids: [2121, 264, 27076, ...] (raw tokenizer-ID dump elided)
inputs: As a specialized biologist AI, you possess expertise in recognizing entities and
extracting relations from text. Your role involves accurately identifying gene names, functional changes (LOF, GOF, REG, COM), and disease names, then structuring this information into specified relational formats. Human: Your task is to analyze the provided text to extract gene-disease relations with an emphasis on the functional change between the gene and the disease. Extract relations in the format of (GENE, FUNCTION, DISEASE), where 'GENE' is the gene's name, 'FUNCTION' describes the gene's functional change impacting the disease, and 'DISEASE' is the name of the disease affected. The 'FUNCTION' should be classified into one of the following categories: LOF (loss of function), GOF (gain of function), REG (regulatory relationship), or COM (complex relationship where the functional change is not clearly LOF or GOF). Note that you should analysis the type of the regulation. If it is positive regulation like 'facilitates', 'enhanced', 'increased', you should output GOF. If it is negetive regulation like 'suppressed', 'decreased', 'inhibited', you should output LOF. Only when the function is neutral like 'resulted in', 'regulated', you should output REG. Ensure each relation is accurately identified and presented in the specified format. If multiple relations are found, separate them with commas, like so: '(GENE1, FUNCTION1, DISEASE1),(GENE2, FUNCTION2, DISEASE2),...'. For clarity, here is an example: Given a gene 'XYZ' that loses function leading to 'DiseaseA', and a gene 'ABC' that gains function contributing to 'DiseaseB', your output should look like this: '(XYZ, LOF, DiseaseA),(ABC, GOF, DiseaseB)'. You need to output at least one answer and no duplicate answer. Following is the text: Sequence variants of the DFNB31 gene among Usher syndrome patients of diverse origin. 
PURPOSE: It has been demonstrated that mutations in deafness, autosomal recessive 31 (DFNB31), the gene encoding whirlin, is responsible for nonsyndromic hearing loss (NSHL; DFNB31) and Usher syndrome type II (USH2D). We screened DFNB31 in a large cohort of patients with different clinical subtypes of Usher syndrome (USH) to determine the prevalence of DFNB31 mutations among USH patients. METHODS: DFNB31 was screened in 149 USH2, 29 USH1, six atypical USH, and 11 unclassified USH patients from diverse ethnic backgrounds. Mutation detection was performed by direct sequencing of all coding exons. RESULTS: We identified 38 different variants among 195 patients. Most variants were clearly polymorphic, but at least two out of the 15 nonsynonymous variants (p.R350W and p.R882S) are predicted to impair whirlin structure and function, suggesting eventual pathogenicity. No putatively pathogenic mutation was found in the second allele of patients with these mutations. CONCLUSIONS: DFNB31 is not a major cause of USH. Assistant: [INFO|configuration_utils.py:727] 2024-04-16 18:56:09,462 >> loading configuration file /bohr/qwen-ehvp/v1/config.json [INFO|configuration_utils.py:792] 2024-04-16 18:56:09,463 >> Model config Qwen2Config { "_name_or_path": "/bohr/qwen-ehvp/v1", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.37.2", "use_cache": false, "use_sliding_window": false, "vocab_size": 151936 } 04/16/2024 18:56:09 - INFO - llmtuner.model.patcher - Using KV cache for faster generation. 
[INFO|modeling_utils.py:3473] 2024-04-16 18:56:09,501 >> loading weights file /bohr/qwen-ehvp/v1/model.safetensors.index.json [INFO|modeling_utils.py:1426] 2024-04-16 18:56:09,503 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16. [INFO|configuration_utils.py:826] 2024-04-16 18:56:09,504 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 } Loading checkpoint shards: 100%|██████████████████| 4/4 [00:09<00:00, 2.27s/it] [INFO|modeling_utils.py:4350] 2024-04-16 18:56:19,259 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM. [INFO|modeling_utils.py:4358] 2024-04-16 18:56:19,260 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /bohr/qwen-ehvp/v1. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training. [INFO|configuration_utils.py:779] 2024-04-16 18:56:19,265 >> loading configuration file /bohr/qwen-ehvp/v1/generation_config.json [INFO|configuration_utils.py:826] 2024-04-16 18:56:19,266 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 0.7, "top_k": 20, "top_p": 0.8 } 04/16/2024 18:56:19 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model. 04/16/2024 18:56:19 - INFO - llmtuner.model.loader - all params: 7721324544 Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. 
[INFO|trainer.py:571] 2024-04-16 18:56:19,452 >> Using auto half precision backend [INFO|trainer.py:3242] 2024-04-16 18:56:19,462 >> ***** Running Prediction ***** [INFO|trainer.py:3244] 2024-04-16 18:56:19,462 >> Num examples = 25 [INFO|trainer.py:3247] 2024-04-16 18:56:19,462 >> Batch size = 1 100%|███████████████████████████████████████████| 25/25 [00:23<00:00, 1.05it/s] ***** predict metrics ***** predict_runtime = 0:00:25.30 predict_samples_per_second = 0.988 predict_steps_per_second = 0.988 04/16/2024 18:56:44 - INFO - llmtuner.train.sft.trainer - Saving prediction results to /personal/model2_generate/generated_predictions.jsonl mkdir: cannot create directory ‘model3_generate’: File exists Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none). 04/16/2024 18:56:54 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16 [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:54,018 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:54,018 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:54,018 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:54,018 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 18:56:54,018 >> loading file tokenizer.json 04/16/2024 18:56:54 - INFO - llmtuner.data.loader - Loading dataset final_submission2.json... 04/16/2024 18:56:54 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json. 
Generating train split: 224 examples [00:00, 19440.98 examples/s]
Converting format of dataset: 100%|█| 224/224 [00:00<00:00, 19834.57 examples/s]
Running tokenizer on dataset: 100%|███| 224/224 [00:00<00:00, 257.25 examples/s]
input_ids: [1136, 264, 1957, ...] (raw tokenizer-ID dump elided)
inputs: As a proficient biologist AI, you possess expertise in accurately recognizing entities and extracting their relationships from scientific texts. Utilize your advanced skills to identify (compound, disease) relations with high precision.
Human: Your task is to meticulously analyze the provided abstract from scientific literature and extract all relationships between compounds and diseases. It is crucial that you identify each compound-disease pair accurately. Present all the relationships you find in a strict format: (compound 1, disease 1), (compound 2, disease 2), (...,...) ... Please adhere to this format rigorously and do not include any supplementary comments or explanations. Your primary focus should be on precise entity recognition and the extraction of relevant relations. For instance, if the abstract mentions 'Aspirin reduces the risk of cardiovascular disease' and 'Ibuprofen may alleviate symptoms of Alzheimer's', your output should be: ('Aspirin', 'cardiovascular disease'), ('Ibuprofen', 'Alzheimer's'). You should output at least one relationship and no duplicate relationship. Following is the text: Famotidine is a histamine H2-receptor antagonist used in inpatient settings for prevention of stress ulcers and is showing increasing popularity because of its low cost.
Although all of the currently available H2-receptor antagonists have shown the propensity to cause delirium, only two previously reported cases have been associated with famotidine. The authors report on six cases of famotidine-associated delirium in hospitalized patients who cleared completely upon removal of famotidine. The pharmacokinetics of famotidine are reviewed, with no change in its metabolism in the elderly population seen. The implications of using famotidine in elderly persons are discussed. Assistant: [INFO|configuration_utils.py:727] 2024-04-16 18:56:55,785 >> loading configuration file /bohr/mist-opx6/v2/config.json [INFO|configuration_utils.py:792] 2024-04-16 18:56:55,787 >> Model config MistralConfig { "_name_or_path": "/bohr/mist-opx6/v2", "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 10000.0, "sliding_window": 4096, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.37.2", "use_cache": false, "vocab_size": 32000 } 04/16/2024 18:56:55 - INFO - llmtuner.model.patcher - Using KV cache for faster generation. [INFO|modeling_utils.py:3473] 2024-04-16 18:56:55,828 >> loading weights file /bohr/mist-opx6/v2/model.safetensors.index.json [INFO|modeling_utils.py:1426] 2024-04-16 18:56:55,829 >> Instantiating MistralForCausalLM model under default dtype torch.float16. 
[INFO|configuration_utils.py:826] 2024-04-16 18:56:55,830 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } Loading checkpoint shards: 100%|██████████████████| 3/3 [00:09<00:00, 3.01s/it] [INFO|modeling_utils.py:4350] 2024-04-16 18:57:05,577 >> All model checkpoint weights were used when initializing MistralForCausalLM. [INFO|modeling_utils.py:4358] 2024-04-16 18:57:05,577 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /bohr/mist-opx6/v2. If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training. [INFO|configuration_utils.py:779] 2024-04-16 18:57:05,583 >> loading configuration file /bohr/mist-opx6/v2/generation_config.json [INFO|configuration_utils.py:826] 2024-04-16 18:57:05,583 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } 04/16/2024 18:57:05 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model. 04/16/2024 18:57:05 - INFO - llmtuner.model.loader - all params: 7241732096 Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. 
[INFO|trainer.py:571] 2024-04-16 18:57:05,779 >> Using auto half precision backend [INFO|trainer.py:3242] 2024-04-16 18:57:05,788 >> ***** Running Prediction ***** [INFO|trainer.py:3244] 2024-04-16 18:57:05,788 >> Num examples = 224 [INFO|trainer.py:3247] 2024-04-16 18:57:05,788 >> Batch size = 1 100%|█████████████████████████████████████████| 224/224 [04:40<00:00, 1.25s/it] ***** predict metrics ***** predict_runtime = 0:04:42.14 predict_samples_per_second = 0.794 predict_steps_per_second = 0.794 04/16/2024 19:01:47 - INFO - llmtuner.train.sft.trainer - Saving prediction results to /personal/model3_generate/generated_predictions.jsonl mkdir: cannot create directory ‘model4_generate’: File exists Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none). 04/16/2024 19:01:59 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16 [INFO|tokenization_utils_base.py:2025] 2024-04-16 19:01:59,197 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:2025] 2024-04-16 19:01:59,198 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 19:01:59,198 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 19:01:59,198 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2025] 2024-04-16 19:01:59,198 >> loading file tokenizer.json 04/16/2024 19:01:59 - INFO - llmtuner.data.loader - Loading dataset final_submission2.json... 04/16/2024 19:01:59 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json. 
Running tokenizer on dataset: 100%|███| 224/224 [00:00<00:00, 254.24 examples/s] input_ids: [… token IDs omitted …] inputs: (same prompt and famotidine abstract as in the first run) Assistant: [INFO|configuration_utils.py:727] 2024-04-16 19:02:01,030 >> loading configuration file /bohr/misc-wj1z/v2/config.json [INFO|configuration_utils.py:792] 2024-04-16 19:02:01,031 >> Model config MistralConfig { "_name_or_path": "/bohr/misc-wj1z/v2", "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.37.2", "use_cache": false, "vocab_size": 32000 } 04/16/2024 19:02:01 - INFO - llmtuner.model.patcher - Using KV cache for faster generation. [INFO|modeling_utils.py:3473] 2024-04-16 19:02:01,072 >> loading weights file /bohr/misc-wj1z/v2/model.safetensors.index.json [INFO|modeling_utils.py:1426] 2024-04-16 19:02:01,073 >> Instantiating MistralForCausalLM model under default dtype torch.float16. [INFO|configuration_utils.py:826] 2024-04-16 19:02:01,074 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } Loading checkpoint shards: 100%|██████████████████| 3/3 [00:08<00:00, 2.80s/it] [INFO|modeling_utils.py:4350] 2024-04-16 19:02:10,201 >> All model checkpoint weights were used when initializing MistralForCausalLM. [INFO|modeling_utils.py:4358] 2024-04-16 19:02:10,201 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /bohr/misc-wj1z/v2. 
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training. [INFO|configuration_utils.py:779] 2024-04-16 19:02:10,207 >> loading configuration file /bohr/misc-wj1z/v2/generation_config.json [INFO|configuration_utils.py:826] 2024-04-16 19:02:10,207 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 } 04/16/2024 19:02:10 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model. 04/16/2024 19:02:10 - INFO - llmtuner.model.loader - all params: 7241732096 Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. [INFO|trainer.py:571] 2024-04-16 19:02:10,391 >> Using auto half precision backend [INFO|trainer.py:3242] 2024-04-16 19:02:10,400 >> ***** Running Prediction ***** [INFO|trainer.py:3244] 2024-04-16 19:02:10,400 >> Num examples = 224 [INFO|trainer.py:3247] 2024-04-16 19:02:10,400 >> Batch size = 1 100%|█████████████████████████████████████████| 224/224 [04:21<00:00, 1.17s/it] ***** predict metrics ***** predict_runtime = 0:04:22.94 predict_samples_per_second = 0.852 predict_steps_per_second = 0.852 04/16/2024 19:06:33 - INFO - llmtuner.train.sft.trainer - Saving prediction results to /personal/model4_generate/generated_predictions.jsonl mkdir: cannot create directory ‘task1_submission’: File exists Generating submission file... Submission file created at: /personal/task1_submission/golden_submission.jsonl Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none). 
04/16/2024 19:06:43 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True Generating train split: 578 examples [00:00, 67706.75 examples/s] Generating validation split: 24 examples [00:00, 10092.57 examples/s] Generating test split: 24 examples [00:00, 11791.41 examples/s] label_list ['0', 'COM', 'GOF', 'LOF', 'REG'] Running tokenizer on dataset: 100%|██| 578/578 [00:00<00:00, 2272.64 examples/s] Running tokenizer on dataset: 100%|████| 24/24 [00:00<00:00, 1521.90 examples/s] Running tokenizer on dataset: 100%|████| 24/24 [00:00<00:00, 1513.67 examples/s] 04/16/2024 19:06:45 - WARNING - accelerate.utils.other - Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. 100%|█████████████████████████████████████████████| 3/3 [00:00<00:00, 15.88it/s]seqcls/run_seqcls.py:482: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate metric = load_metric("accuracy") /opt/conda/lib/python3.8/site-packages/datasets/load.py:752: FutureWarning: The repository for accuracy contains custom code which must be executed to correctly load the metric. You can inspect the repository content at https://raw.githubusercontent.com/huggingface/datasets/2.16.1/metrics/accuracy/accuracy.py You can avoid this message in future by passing the argument `trust_remote_code=True`. Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`. 
warnings.warn( ['REG', 'REG', '0', '0', '0', '0', 'LOF', 'LOF', 'LOF', 'LOF', 'LOF', 'LOF', 'LOF', 'LOF', 'LOF', 'REG', 'REG', '0', '0', 'LOF', 'REG', 'REG', 'LOF', 'REG'] 100%|█████████████████████████████████████████████| 3/3 [00:01<00:00, 1.80it/s] Generating submission file... Submission file created at: /personal/task1_submission/submission.jsonl mkdir: cannot create directory ‘task2_submission’: File exists Generating submission file... Submission file created at: /personal/task2_submission/golden_submission.jsonl Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none). 04/16/2024 19:06:56 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True Generating train split: 14604 examples [00:00, 176981.91 examples/s] Generating validation split: 475 examples [00:00, 107755.66 examples/s] Generating test split: 475 examples [00:00, 118019.93 examples/s] label_list ['0', '1'] Running tokenizer on dataset: 100%|█| 14604/14604 [00:05<00:00, 2466.85 examples Running tokenizer on dataset: 100%|██| 475/475 [00:00<00:00, 2830.21 examples/s] Running tokenizer on dataset: 100%|██| 475/475 [00:00<00:00, 2820.92 examples/s] 04/16/2024 19:07:04 - WARNING - accelerate.utils.other - Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. 98%|██████████████████████████████████████████▎| 59/60 [00:05<00:00, 10.24it/s]seqcls/run_seqcls.py:482: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. 
Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate metric = load_metric("accuracy") /opt/conda/lib/python3.8/site-packages/datasets/load.py:752: FutureWarning: The repository for accuracy contains custom code which must be executed to correctly load the metric. You can inspect the repository content at https://raw.githubusercontent.com/huggingface/datasets/2.16.1/metrics/accuracy/accuracy.py You can avoid this message in future by passing the argument `trust_remote_code=True`. Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`. warnings.warn( ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '1', '0', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '0', '0', '1', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '1', '1', '0', '0', '1', '0', '1', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '0', '0', '0', '0', '1', '0', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '0', '0', '1', '0', '1', '1', '0', '0', '0', '0', '0', '1', '0', '1', '1', '1', '0', '1', '0', '1', '0', '0', '1', '1', '1', '1', '0', '1', '1', '1', '1', '0', '0', '1', '1', '1', '1', '0', '0', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', 
'1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '0', '1', '1', '0', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '0', '0', '1', '1', '0', '0', '0', '0', '1', '0', '0', '1', '1', '1', '0', '0', '1', '1', '0', '0', '1', '1', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1'] 100%|███████████████████████████████████████████| 60/60 [00:07<00:00, 7.96it/s] Generating submission file... Submission file created at: /personal/task2_submission/submission.jsonl
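The prompts above instruct the model to emit relations in a strict "(compound 1, disease 1), (compound 2, disease 2)" format, with no duplicates. A minimal, hypothetical sketch of post-processing such generations — the regex and the helper name are illustrative, not the team's actual parser:

```python
import re

# Matches "(compound, disease)" pairs, with or without single quotes.
PAIR_RE = re.compile(r"\(\s*'?([^,()']+?)'?\s*,\s*'?([^()']+?)'?\s*\)")

def parse_relations(generation: str):
    """Extract (compound, disease) tuples from a model generation,
    dropping duplicate pairs while preserving first-seen order."""
    seen, pairs = set(), []
    for compound, disease in PAIR_RE.findall(generation):
        pair = (compound.strip(), disease.strip())
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

Each line of LLaMA-Factory's generated_predictions.jsonl carries the raw generation (typically under a "predict" key), so a parser like this can turn those lines into the tuples the submission needs.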
[5]
import jsonlines

# Merge the three per-task prediction files back into one submission file.
file1 = "./task1_submission/submission.jsonl"
file2 = "./task2_submission/submission.jsonl"
file3 = "./task3_submission/submission.jsonl"
sub_file = "submission.jsonl"

with jsonlines.open(sub_file, 'w') as writer:
    with jsonlines.open(file1, 'r') as reader1:
        for task1 in reader1:
            if task1['task'] == 1:
                writer.write(task1)
    with jsonlines.open(file2, 'r') as reader2:
        for task2 in reader2:
            if task2['task'] == 2:
                writer.write(task2)
    with jsonlines.open(file3, 'r') as reader3:
        for task3 in reader3:
            if task3['task'] == 3:
                writer.write(task3)
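Before uploading, a quick sanity check of the merged submission.jsonl can catch missing or malformed records. A minimal sketch using only the standard library — the 'task' key matches the merge cell above; everything else (function name, error messages) is an assumption:

```python
import json
from collections import Counter

def check_submission(path: str) -> Counter:
    """Count records per task id and verify every line carries a 'task' key."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            record = json.loads(line)  # raises if a line is not valid JSON
            if "task" not in record:
                raise ValueError(f"line {lineno}: missing 'task' field")
            counts[record["task"]] += 1
    return counts
```

Comparing the returned counts against the known number of test examples per task (e.g. 224 abstracts for the relation-extraction runs above) confirms nothing was dropped in the merge.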