[FATE Federated Learning] Horizontal and Vertical Training with a Custom Dataset and a Custom Neural Network Model

Preface

Most of the code comes from:

  • https://fate.readthedocs.io/en/latest/tutorial/pipeline/nn_tutorial/Hetero-NN-Customize-Dataset/#example-implement-a-simple-image-dataset
  • https://fate.readthedocs.io/en/latest/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-your-Dataset/

However, the official documentation is incomplete, so I am writing down a complete walkthrough here.

I use the MNIST dataset: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist.zip.
The directory structure is shown below. For horizontal training, it is enough to load mnist twice; for vertical training, one party loads mnist_guest (with labels) and the other loads mnist_host (without labels). The mnist1 and mnist2 folders are not used and can be ignored.
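The original screenshot of the folder layout is omitted here. As a rough sketch, inferred from the dataset code below (which reads the images with torchvision's ImageFolder, so one sub-folder per class is required; the exact class and file names are illustrative):

mnist/                # root folder passed to load()
    0/                # one sub-folder per class, as ImageFolder requires
        xxx.jpg       # the file name (without .jpg) is used as the sample id
        ...
    1/
    ...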

Since the official demo requires Jupyter and is not well suited to an ordinary Python script, this post gives a plain-script version of the example. For the Python interpreter, note that you need to add the interpreter path from bin/init_env.sh in the FATE installation to your environment variables; otherwise the federatedml package will not be found.
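As a quick sanity check that the interpreter is configured correctly, something like the following can be run first (a minimal sketch; the exact paths depend on your FATE deployment):

import sys
print(sys.executable)        # should be the Python bundled with FATE, i.e. the interpreter referenced by bin/init_env.sh
import federatedml           # raises ImportError if the interpreter or PYTHONPATH is wrong
print(federatedml.__file__)  # confirms which federatedml installation is being used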

Horizontal (Homo)

Custom Dataset

Define the custom dataset, then test it locally first.

import os
from torchvision.datasets import ImageFolder
from torchvision import transforms
from federatedml.nn.dataset.base import Dataset


class MNISTDataset(Dataset):

    def __init__(self, flatten_feature=False):  # flatten feature or not
        super(MNISTDataset, self).__init__()
        self.image_folder = None
        self.ids = None
        self.flatten_feature = flatten_feature

    def load(self, path):  # read data from path, and set sample ids
        # read using ImageFolder
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        # filename as the image id
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self):  # implement the get sample id interface, simply return ids
        return self.ids

    def __len__(self):  # return the length of the dataset
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        if self.flatten_feature:
            img = ret[0][0].flatten()  # return flattened 784-dim tensor
            return img, ret[1]  # return tensor and label
        else:
            return ret


ds = MNISTDataset(flatten_feature=True)
ds.load('mnist/')
# print(len(ds))
# print(ds[0])
# print(ds.get_sample_ids()[0])

Once the local test prints its output successfully, you need to manually create a new dataset file under FATE's federatedml/nn/dataset directory and expand the code above into the form of a component class, as follows:

import torch
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms
import numpy as np
# add any missing imports here as needed


class MNISTDataset(Dataset):

    def __init__(self, flatten_feature=False):  # flatten feature or not
        super(MNISTDataset, self).__init__()
        self.image_folder = None
        self.ids = None
        self.flatten_feature = flatten_feature

    def load(self, path):  # read data from path, and set sample ids
        # read using ImageFolder
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        # filename as the image id
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self):  # implement the get sample id interface, simply return ids
        return self.ids

    def __len__(self):  # return the length of the dataset
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        if self.flatten_feature:
            img = ret[0][0].flatten()  # return flattened 784-dim tensor
            return img, ret[1]  # return tensor and label
        else:
            return ret


if __name__ == '__main__':
    pass

This completes the "manual addition" that the official docs refer to. After adding it, the federatedml directory should look like the screenshot (omitted here). The file name must match the dataset_name passed to DatasetParam further below; see the short sketch after this paragraph.
Once added, FATE "recognizes" our custom dataset.
The local test in the script below does not require this manual-addition step, but local mode is only for testing and is of little use in production.
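To make that correspondence concrete, here is a minimal sketch, assuming the class above was saved as mnist_dataset.py under federatedml/nn/dataset (the path and file name are illustrative; adjust them to your install):

# assumed location: {FATE_HOME}/python/federatedml/nn/dataset/mnist_dataset.py
from pipeline.component.nn import DatasetParam

# dataset_name refers to the module (file) name, not the class name;
# extra keyword arguments (here flatten_feature) are forwarded to MNISTDataset.__init__
dataset_param = DatasetParam(dataset_name='mnist_dataset', flatten_feature=True)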

Horizontal Training

import os
from torchvision.datasets import ImageFolder
from torchvision import transforms
from federatedml.nn.dataset.base import Dataset

# test local process
# from federatedml.nn.homo.trainer.fedavg_trainer import FedAVGTrainer
# trainer = FedAVGTrainer(epochs=3, batch_size=256, shuffle=True, data_loader_worker=8, pin_memory=False)  # set parameter
# trainer.local_mode()

# import torch as t
# from pipeline import fate_torch_hook
# fate_torch_hook(t)
# # our simple classification model:
# model = t.nn.Sequential(
#     t.nn.Linear(784, 32),
#     t.nn.ReLU(),
#     t.nn.Linear(32, 10),
#     t.nn.Softmax(dim=1)
# )
# trainer.set_model(model)  # set model
# optimizer = t.optim.Adam(model.parameters(), lr=0.01)  # optimizer
# loss = t.nn.CrossEntropyLoss()  # loss function
# trainer.train(train_set=ds, optimizer=optimizer, loss=loss)  # use the dataset we just developed

# The new dataset must be added manually under federatedml/nn/dataset!
# See https://blog.csdn.net/Yonggie/article/details/129404212

# real training
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

t = fate_torch_hook(t)

import os

# bind data path to name & namespace
fate_project_path = os.path.abspath('./')
host = 1
guest = 2
arbiter = 3
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host, arbiter=arbiter)

data_0 = {"name": "mnist_guest", "namespace": "experiment"}
data_1 = {"name": "mnist_host", "namespace": "experiment"}

data_path_0 = fate_project_path + '/mnist'
data_path_1 = fate_project_path + '/mnist'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)
pipeline.bind_table(name=data_1['name'], namespace=data_1['namespace'], path=data_path_1)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=data_1)

from pipeline.component.nn import DatasetParam

dataset_param = DatasetParam(dataset_name='mnist_dataset', flatten_feature=True)  # specify dataset, and its init parameters

from pipeline.component.homo_nn import TrainerParam  # Interface

# our simple classification model:
model = t.nn.Sequential(
    t.nn.Linear(784, 32),
    t.nn.ReLU(),
    t.nn.Linear(32, 10),
    t.nn.Softmax(dim=1)
)

nn_component = HomoNN(
    name='nn_0',
    model=model,  # model
    loss=t.nn.CrossEntropyLoss(),  # loss
    optimizer=t.optim.Adam(model.parameters(), lr=0.01),  # optimizer
    dataset=dataset_param,  # dataset
    trainer=TrainerParam(trainer_name='fedavg_trainer', epochs=2, batch_size=1024, validation_freqs=1),
    torch_seed=100  # random seed
)

pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.add_component(Evaluation(name='eval_0', eval_type='multi'), data=Data(data=nn_component.output.data))

pipeline.compile()
pipeline.fit()

# print result and summary
pipeline.get_component('nn_0').get_output_data()
pipeline.get_component('nn_0').get_summary()

Vertical (Hetero)

This part uses mnist_guest and mnist_host; download them from:

guest data: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist_guest.zip

host data: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist_host.zip

Take a look at the format of these datasets. In FATE's vertical setting, one party always holds the labels and the other does not, which is different from the "merge two feature sets" scenario I had in mind.

Vertical Dataset

The procedure is the same as in the horizontal section; here I only give the code of the new class, which differs slightly from the horizontal one.

import torch
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms
import numpy as np


class MNISTDataset(Dataset):

    def __init__(self, return_label=True):
        super(MNISTDataset, self).__init__()
        self.return_label = return_label
        self.image_folder = None
        self.ids = None

    def load(self, path):
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self):
        return self.ids

    def get_classes(self):
        return np.unique(self.image_folder.targets).tolist()

    def __len__(self):
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        img = ret[0][0].flatten()  # flatten tensor, 784 dims
        if self.return_label:
            return img, ret[1]  # img & label
        else:
            return img  # no label, for host


if __name__ == '__main__':
    pass


Vertical Training

Detailed comments are included in the code below.

import numpy as np
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms

# the locally defined dataset class (same code that was added under federatedml/nn/dataset)
# class MNISTHetero(Dataset):
#     def __init__(self, return_label=True):
#         super(MNISTHetero, self).__init__()
#         self.return_label = return_label
#         self.image_folder = None
#         self.ids = None
#
#     def load(self, path):
#         self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
#         ids = []
#         for image_name in self.image_folder.imgs:
#             ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
#         self.ids = ids
#         return self
#
#     def get_sample_ids(self, ):
#         return self.ids
#
#     def get_classes(self, ):
#         return np.unique(self.image_folder.targets).tolist()
#
#     def __len__(self,):
#         return len(self.image_folder)
#
#     def __getitem__(self, idx):  # get item
#         ret = self.image_folder[idx]
#         img = ret[0][0].flatten()  # flatten tensor 784 dims
#         if self.return_label:
#             return img, ret[1]  # img & label
#         else:
#             return img  # no label, for host

# test guest dataset
# ds = MNISTHetero().load('mnist_guest/')
# print(len(ds))
# print(ds[0][0])
# print(ds.get_classes())
# print(ds.get_sample_ids()[0: 10])

# test host dataset
# ds = MNISTHetero(return_label=False).load('mnist_host')
# print(len(ds))
# print(ds[0])  # no label

import os
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HeteroNN
from pipeline.component.hetero_nn import DatasetParam
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

fate_torch_hook(t)

# bind path to fate name & namespace
fate_project_path = os.path.abspath('./')
guest = 4
host = 3

pipeline_img = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)

guest_data = {"name": "mnist_guest", "namespace": "experiment"}
host_data = {"name": "mnist_host", "namespace": "experiment"}

guest_data_path = fate_project_path + '/mnist_guest'
host_data_path = fate_project_path + '/mnist_host'
pipeline_img.bind_table(name='mnist_guest', namespace='experiment', path=guest_data_path)
pipeline_img.bind_table(name='mnist_host', namespace='experiment', path=host_data_path)

guest_data = {"name": "mnist_guest", "namespace": "experiment"}
host_data = {"name": "mnist_host", "namespace": "experiment"}
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=guest_data)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=host_data)

# For why the component is defined this way, see the docs (they include a model diagram):
# https://fate.readthedocs.io/en/latest/federatedml_component/hetero_nn/
hetero_nn_0 = HeteroNN(name="hetero_nn_0", epochs=3,
                       interactive_layer_lr=0.01, batch_size=512, task_type='classification', seed=100)
guest_nn_0 = hetero_nn_0.get_party_instance(role='guest', party_id=guest)
host_nn_0 = hetero_nn_0.get_party_instance(role='host', party_id=host)

# define models
# image features 784, guest bottom model
guest_bottom = t.nn.Sequential(
    t.nn.Linear(784, 8),
    t.nn.ReLU()
)

# image features 784, host bottom model
host_bottom = t.nn.Sequential(
    t.nn.Linear(784, 8),
    t.nn.ReLU()
)

# top model, a classifier
guest_top = t.nn.Sequential(
    nn.Linear(8, 10),
    nn.Softmax(dim=1)
)

# interactive layer definition
interactive_layer = t.nn.InteractiveLayer(out_dim=8, guest_dim=8, host_dim=8)

# add models; per the docs, the guest adds two models (top + bottom), the host only one (bottom)
guest_nn_0.add_top_model(guest_top)
guest_nn_0.add_bottom_model(guest_bottom)
host_nn_0.add_bottom_model(host_bottom)

# optimizer, loss
optimizer = t.optim.Adam(lr=0.01)
loss = t.nn.CrossEntropyLoss()

# use DatasetParam to specify dataset and pass parameters
# NOTE: dataset_name must match the name of the dataset file you added manually
guest_nn_0.add_dataset(DatasetParam(dataset_name='mnist_hetero', return_label=True))
host_nn_0.add_dataset(DatasetParam(dataset_name='mnist_hetero', return_label=False))

hetero_nn_0.set_interactive_layer(interactive_layer)
hetero_nn_0.compile(optimizer=optimizer, loss=loss)

pipeline_img.add_component(reader_0)
pipeline_img.add_component(hetero_nn_0, data=Data(train_data=reader_0.output.data))
pipeline_img.add_component(Evaluation(name='eval_0', eval_type='multi'), data=Data(data=hetero_nn_0.output.data))
pipeline_img.compile()

pipeline_img.fit()

# get result
pipeline_img.get_component('hetero_nn_0').get_output_data()
