WO2017076184A1 - Data writing method and device in a distributed file system - Google Patents

Data writing method and device in a distributed file system

Info

Publication number
WO2017076184A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
load
score
probability
remaining capacity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/103139
Other languages
English (en)
French (fr)
Inventor
董乘宇
朱家稷
张海勇
曹锋
王勇
姚文辉
吴均平
吴洋
董元元
吴冬政
陆靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to EP16861452.7A (EP3373155A1)
Publication of WO2017076184A1
Priority to US15/970,820 (US11055360B2)
Anticipated expiration (legal status)
Ceased (legal status, current)


Classifications

    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/182 Distributed file systems
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G06F16/1844 Management specifically adapted to replicated file systems
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a data writing method in a distributed file system and a data writing device in a distributed file system.
  • The object of the present application is to provide a data writing method in a distributed file system that prevents disks from becoming full without causing hotspot access.
  • The embodiments of the present application further provide a data writing device in a distributed file system, to ensure the implementation and application of the foregoing method.
  • A data writing method in a distributed file system, including:
  • Write data is stored in the write disk by the target replica server.
  • The step of selecting the write disk according to the remaining capacity and the load of the disks managed by the target replica server includes:
  • determining the write disk according to the probability that each disk is selected and a preset condition.
  • Calculating, according to the remaining capacity and the load of the disks managed by the target replica server, the probability that each disk is selected includes:
  • normalizing the scores of the disks to obtain the probability that each disk is selected.
  • Calculating the score of a disk according to the ratio between its remaining capacity and total capacity and the weight of that ratio, and according to the disk's load and the weight of the load, includes:
  • calculating the score of the disk based on the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  • The load of a disk is the length of the disk's IO queue.
  • Determining the write disk according to the probability that each disk is selected and a preset condition includes:
  • The application also discloses a data writing device in a distributed file system, comprising:
  • a first selection unit configured to select, among the plurality of replica servers managed by the metadata server, at least one target replica server according to the remaining capacity of the replica servers;
  • a second selection unit configured to select, in the target replica server, the write disk according to the remaining capacity and the load of the disks managed by the target replica server;
  • a storage unit configured to store the write data in the write disk by the target replica server.
  • The second selection unit comprises:
  • a probability calculation subunit configured to calculate the probability that each disk is selected according to the remaining capacity and the load of the disks managed by the target replica server;
  • a determining subunit configured to determine the write disk according to the probability that each disk is selected and a preset condition.
  • The probability calculation subunit comprises:
  • a score calculation subunit configured to calculate the score of a disk according to the ratio between its remaining capacity and total capacity and the weight of that ratio, and the disk's load and the weight of the load;
  • a probability obtaining subunit configured to normalize the scores of the disks to obtain the probability that each disk is selected.
  • The score calculation subunit comprises:
  • a first computing subunit configured to separately calculate the score of the ratio between the disk's remaining capacity and total capacity and the score of the disk's load, where the ratio score and the ratio, as well as the load score and the load, each satisfy a monotonically increasing functional relationship;
  • a second computing subunit configured to calculate the score of the disk based on the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  • The load of a disk is the length of the disk's IO queue.
  • The determining subunit comprises:
  • an accumulating subunit configured to calculate the accumulated probability value of each disk according to the probability that each disk is selected;
  • a search subunit configured to perform a binary search on the accumulated probability values of the disks, and to use the disk corresponding to the found accumulated probability value that meets the preset condition as the write disk.
  • Compared with the prior art, the embodiments of the present application have the following advantages:
  • The embodiments of the present application determine the storage disk for write data through a two-level disk selection scheme (metadata server, then replica server) that jointly considers the remaining capacity and the load of the disks, thereby avoiding both full disks and hotspot access to disks.
  • When the disk loads are basically the same, a disk with a higher remaining capacity is preferentially selected as the write disk, which prevents disks from filling up; when the remaining capacities are basically the same, a disk with a lower load is preferentially selected as the write disk, which avoids hotspot access to the disk.
  • FIG. 1 is a flowchart of the steps of an embodiment of a data writing method in a distributed file system according to the present application;
  • FIG. 2 is a flowchart of the steps of an embodiment of a method for selecting the write disk according to the remaining capacity and the load of the disks managed by the target replica server according to the present application;
  • FIG. 3 is a flowchart of the steps of an embodiment of a method for calculating the probability that each disk is selected according to the present application;
  • FIG. 4 is a flowchart of the steps of an embodiment of a method for calculating a disk score according to the present application;
  • FIG. 5 is a flowchart of the steps of an embodiment of a method for determining the write disk according to the probability that each disk is selected and a preset condition according to the present application;
  • FIG. 6 is a structural block diagram of an embodiment of a data writing device in a distributed file system according to the present application;
  • FIG. 7 is a structural block diagram of an embodiment of a second selection unit according to the present application;
  • FIG. 8 is a structural block diagram of an embodiment of a probability calculation subunit according to the present application;
  • FIG. 9 is a structural block diagram of an embodiment of a score calculation subunit according to the present application;
  • FIG. 10 is a structural block diagram of an embodiment of a determining subunit according to the present application.
  • Referring to FIG. 1, a flowchart of the steps of an embodiment of a data writing method in a distributed file system of the present application is shown; the method may specifically include the following steps:
  • Step 101: among the plurality of replica servers managed by the metadata server, select at least one target replica server according to the remaining capacity of the replica servers.
  • A metadata server and replica servers are first set up in the distributed file system. The metadata server manages the namespace, the replica information of data blocks, and the address information of the replica servers; each replica server manages the local replicas of data blocks and provides read and write operations on the block replicas it manages.
  • The data writing device in the distributed file system (hereinafter, the device) first selects, among the replica servers managed by the metadata server, a replica server to write data to; specifically, it may select a replica server with a large remaining capacity, denoted as the target replica server.
  • Step 102: in the target replica server, select the write disk according to the remaining capacity and the load of the disks managed by the target replica server.
  • After determining the target replica server, the device further selects, in the target replica server, the disk on which the write data will be stored, denoted as the write disk.
  • The device considers both the remaining capacity and the load to determine the write disk. Specifically, the device can score the remaining capacity and the load of each disk, combine the scores with the respective weights of the remaining capacity and the load to obtain the probability that the disk is selected, and then determine the write disk according to that probability. For details, see the description of the subsequent embodiments.
  • Step 103: the write data is stored in the write disk by the target replica server.
  • The data written by the client can then be stored by the target replica server on the write disk determined in the previous step.
  • The embodiments of the present application determine the storage disk for write data through a two-level disk selection scheme (metadata server, then replica server) that jointly considers the remaining capacity and the load of the disks, thereby avoiding both full disks and hotspot access to disks.
  • When the disk loads are basically the same, a disk with a higher remaining capacity is preferentially selected as the write disk, which prevents disks from filling up; when the remaining capacities are basically the same, a disk with a lower load is preferentially selected as the write disk, which avoids hotspot access to the disk.
  • The process of selecting the write disk according to the remaining capacity and the load of the disks managed by the target replica server, as shown in FIG. 2, may include:
  • Step 201: calculate the probability that each disk is selected according to the remaining capacity and the load of the disks managed by the target replica server.
  • The remaining capacity and the load of each disk managed by the target replica server are obtained first, and the probability that each disk is selected is then calculated from the remaining capacity and the load.
  • One method for calculating the probability that each disk is selected, shown in FIG. 3, includes:
  • Step 301: calculate the score of the disk according to the ratio between the remaining capacity and the total capacity of the disk and its weight, and the load of the disk and its weight.
  • Every disk in the target replica server uses the same method to calculate its score.
  • The score of the ratio and the score of the load are calculated first, and the disk's score is then obtained according to their respective weights.
  • The disk score can also be calculated by introducing a specified functional relationship. For details, refer to the description of the subsequent embodiments.
  • The disk load can be measured by the length of the disk IO queue: the longer the disk IO queue, the higher the disk's load.
  • Step 302: normalize the scores of the disks to obtain the probability that each disk is selected.
  • Normalizing the scores yields the probability that each disk is selected.
  • Step 202: determine the write disk according to the probability that each disk is selected and a preset condition.
  • The disk whose probability satisfies the preset condition is determined as the write disk.
  • The preset condition may, for example, be set by the user.
  • A method for calculating a disk score (FIG. 4) may specifically include:
  • Step 401: separately calculate the score of the ratio between the remaining capacity and the total capacity of the disk, and the score of the load of the disk.
  • The ratio score and the ratio, as well as the load score and the load, each satisfy a monotonically increasing functional relationship. The monotonically increasing function is defined on the whole real line, takes values between 0 and 1, and has an S-shaped curve; such functions appear in classification evaluation models and logistic regression models and belong to multivariate analysis. An example is the arctangent function.
  • The load of the disk is measured by the length of the disk IO queue.
  • Suppose the target replica server contains N disks, and the ratios of remaining capacity to total capacity for the N disks are R1, R2, ..., RN.
  • The current disk IO queue lengths (that is, the disk loads) of the N disks are Q1, Q2, ..., QN.
  • In the scoring function, the argument x can be either R or Q; that is, the ratio score and the ratio, as well as the load score and the load, both satisfy the monotonically increasing functional relationship.
  • The monotonically increasing function can be the arctangent function.
  • Because the arctangent function is monotonically increasing, a higher score indicates a higher proportion of remaining capacity or a lower load, and a lower score indicates a lower proportion of remaining capacity or a higher load.
  • Step 402: calculate the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  • The score of each disk is calculated by combining the two scores with their weights: w is the weight corresponding to the remaining capacity of the disk, and 1 - w is the weight of the disk load.
  • The weight value can be set according to empirical values.
  • Step 302 above can then be performed to normalize the scores of the disks and obtain the probability that each disk is selected.
  • The probability Pi that the i-th disk is selected is obtained by normalizing its score over the scores of all N disks.
  • The method for determining the write disk according to the probability that each disk is selected and a preset condition, shown in FIG. 5, may include:
  • Step 501: calculate the accumulated probability value of each disk according to the probability that each disk is selected.
  • The selection probabilities of all disks sum to one.
  • Step 502: perform a binary search on the accumulated probability values of the disks, and take the disk whose accumulated probability value satisfies the preset condition as the write disk.
  • A random number generator over [0, 1] can be used to generate a random number r.
  • The preset condition is that disk i satisfies $A_{i-1} \le r \le A_i$, where $A_i$ is the accumulated probability value of disk i and $A_0 = 0$.
  • The device performs a binary search on the sorted accumulated probability values of the disks.
  • When disk i is found to satisfy $A_{i-1} \le r \le A_i$, disk i is determined as the write disk.
  • Binary search is also called half-interval search.
  • Its advantages are few comparisons, fast search speed, and good average performance.
  • Binary search can therefore quickly find, among the sorted accumulated probability values, the disk that satisfies the preset condition.
  • Step 103 above can then be performed to store the write data.
  • Referring to FIG. 6, a structural block diagram of an embodiment of a data writing device in a distributed file system of the present application is shown; the device may specifically include the following units:
  • a first selection unit 601, configured to select, among the plurality of replica servers managed by the metadata server, at least one target replica server according to the remaining capacity of the replica servers;
  • a second selection unit 602, configured to select, in the target replica server, the write disk according to the remaining capacity and the load of the disks managed by the target replica server;
  • a storage unit 603, configured to store the write data in the write disk by the target replica server.
  • Through the above units, the embodiments of the present application determine the storage disk for write data by considering both the remaining capacity and the load of the disks, thereby avoiding both full disks and hotspot access to disks.
  • When the disk loads are basically the same, a disk with a higher remaining capacity is preferentially selected as the write disk, which prevents disks from filling up; when the remaining capacities are basically the same, a disk with a lower load is preferentially selected as the write disk, which avoids hotspot access to the disk.
  • The second selection unit 602 may further include:
  • a probability calculation subunit 701, configured to calculate the probability that each disk is selected according to the remaining capacity and the load of the disks managed by the target replica server;
  • a determining subunit 702, configured to determine the write disk according to the probability that each disk is selected and a preset condition.
  • The probability calculation subunit 701 may further include:
  • a score calculation subunit 801, configured to calculate the score of the disk according to the ratio between the remaining capacity and the total capacity of the disk and its weight, and the load of the disk and its weight;
  • a probability obtaining subunit 802, configured to normalize the scores of the disks to obtain the probability that each disk is selected.
  • The score calculation subunit 801 may further include:
  • a first calculation subunit 901, configured to separately calculate the score of the ratio between the remaining capacity and the total capacity of the disk and the score of the load of the disk, wherein the ratio score and the ratio, as well as the load score and the load, each satisfy a monotonically increasing functional relationship;
  • a second calculation subunit 902, configured to calculate the score of the disk based on the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  • The load of the disk is the length of the disk IO queue.
  • The determining subunit 702 may include:
  • an accumulating subunit 1001, configured to calculate the accumulated probability value of each disk according to the probability that each disk is selected;
  • a search subunit 1002, configured to perform a binary search on the accumulated probability values of the disks, and to use the disk corresponding to the found accumulated probability value that meets the preset condition as the write disk.
  • The embodiments of the present application further provide an electronic device, including a memory and a processor.
  • The processor and the memory are connected to each other through a bus, which may be an ISA bus, a PCI bus, an EISA bus, or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on.
  • The memory is used to store a program; specifically, the program may include program code, and the program code includes computer operation instructions.
  • The memory may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
  • The processor is used to read the program code in the memory and perform the following steps:
  • selecting, in the target replica server, the write disk according to the remaining capacity and the load of the disks managed by the target replica server;
  • storing, by the target replica server, the write data in the write disk of the memory.
  • The description of the device embodiment is relatively simple; for relevant parts, refer to the description of the method embodiment.
  • The embodiments of the present application can be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory.
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

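A data writing method and device in a distributed file system. The data writing method in the distributed file system includes: among a plurality of replica servers managed by a metadata server, selecting a target replica server according to the remaining capacity of the plurality of replica servers (101); in the target replica server, selecting a write disk according to the remaining capacity and the disk load of the disks managed by the target replica server (102); and storing write data in the write disk by means of the target replica server (103). Through a two-level disk selection scheme involving the metadata server and the replica server, the method and device jointly consider the remaining disk capacity and the disk load to determine the disk on which written data is stored, which both avoids filling up disks and avoids hotspot access to disks.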

Description

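Data writing method and device in a distributed file system
This application claims priority to Chinese patent application No. 201510740419.5, filed on November 3, 2015 and entitled "Data writing method and device in a distributed file system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technologies, and in particular to a data writing method in a distributed file system and a data writing device in a distributed file system.
Background
In a large-scale online distributed storage system, a single server contains many disks. The disks may differ from one another, and heterogeneity can also occur within one server, i.e., disks of two different capacities are mixed. A distributed storage system solves the problem of storing data, and when data arrives, how to select a disk to write it to is a question that must be considered.
Traditional distributed file systems use a consistent hashing algorithm to select disks. This method selects the disk in advance by hashing the data partition, which makes it a fixed disk selection strategy. The hash algorithm itself guarantees balance and normally does not create hotspots. In abnormal cases, however, when data is concentrated in the same data partition, the selected disk often cannot avoid a large volume of writes: it may fill up and cause write failures, and the dense traffic produces hotspot access that lengthens write time.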
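To make the "fixed disk selection strategy" concrete, here is a minimal sketch of a generic consistent-hash ring with virtual nodes. The background does not name a particular consistent hashing variant, so the disk names, the MD5-based hash, and the `vnodes` count are illustrative assumptions. It shows that the target disk depends only on the partition key, never on how full or busy the disk is.

```python
import bisect
import hashlib
from typing import Iterable


def stable_hash(key: str) -> int:
    # 64-bit hash from MD5; Python's built-in hash() is salted per process, so it is not stable.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hash ring that maps data partitions to disks (illustrative)."""

    def __init__(self, disks: Iterable[str], vnodes: int = 100) -> None:
        # Place each disk on the ring `vnodes` times to even out the key distribution.
        self._ring = sorted(
            (stable_hash(f"{disk}#{v}"), disk) for disk in disks for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def disk_for(self, partition: str) -> str:
        # Walk clockwise to the first virtual node at or after the partition's hash,
        # wrapping around to the start of the ring past the last point.
        idx = bisect.bisect_left(self._points, stable_hash(partition)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["disk0", "disk1", "disk2", "disk3"])
# The choice depends only on the partition key, never on fullness or load.
targets = {ring.disk_for("partition-42") for _ in range(1000)}
print(len(targets))  # 1
```

Every write to a hot partition therefore lands on the same disk, which is the failure mode (full disk, hotspot) that the background describes.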
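Therefore, a technical problem that those skilled in the art urgently need to solve is how to avoid filling up disks without causing hotspot access.
Summary
The purpose of the embodiments of the present application is to provide a data writing method in a distributed file system that can avoid filling up disks without causing hotspot access.
Correspondingly, the embodiments of the present application further provide a data writing device in a distributed file system, to ensure the implementation and application of the above method.
To solve the above problems, the present application discloses a data writing method in a distributed file system, including:
among a plurality of replica servers managed by a metadata server, selecting at least one target replica server according to the remaining capacity of the plurality of replica servers;
in the target replica server, selecting a write disk according to the remaining capacity and the disk load of the disks managed by the target replica server;
storing, by the target replica server, the write data in the write disk.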
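Preferably, the step of selecting, in the target replica server, the write disk according to the remaining capacity and the disk load of the disks managed by the target replica server includes:
calculating, according to the remaining capacity and the disk load of the disks managed by the target replica server, the probability that each disk is selected;
determining the write disk according to the probability that each disk is selected and a preset condition.
Preferably, calculating the probability that each disk is selected according to the remaining capacity and the disk load of the disks managed by the target replica server includes:
calculating the score of the disk according to the ratio between the remaining capacity and the total capacity of the disk and its weight, and the load of the disk and its weight;
normalizing the scores of the disks to obtain the probability that each disk is selected.
Preferably, calculating the score of the disk according to the ratio between the remaining capacity and the total capacity of the disk and its weight, and the load of the disk and its weight, includes:
separately calculating a score of the ratio between the remaining capacity and the total capacity of the disk and a score of the load of the disk, wherein the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship;
calculating the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.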
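The scoring step can be sketched as follows, under several assumptions that are not in the text as reproduced here. The exact combining formula is not given, so this sketch uses a weighted sum with weight `w` for the capacity term and `1 - w` for the load term. The S-shaped function with range (0, 1) is taken to be a shifted and scaled arctangent, `0.5 + atan(x) / pi`, since `atan` alone ranges over (-pi/2, pi/2). Finally, because the description states that a higher score should mean a *lower* load, the load term is inverted as `1 - f(Q)`; that inversion is an assumption made to reconcile the two statements. The names `squash` and `disk_scores` are hypothetical.

```python
import math
from typing import List, Sequence


def squash(x: float) -> float:
    """Monotonically increasing S-shaped map from (-inf, inf) into (0, 1) (assumed form)."""
    # math.atan alone ranges over (-pi/2, pi/2); shifting and scaling gives (0, 1).
    return 0.5 + math.atan(x) / math.pi


def disk_scores(ratios: Sequence[float], queue_lengths: Sequence[float], w: float = 0.5) -> List[float]:
    """Hypothetical weighted score S_i = w * f(R_i) + (1 - w) * (1 - f(Q_i)).

    ratios[i]: R_i, remaining capacity / total capacity of disk i
    queue_lengths[i]: Q_i, current IO queue length of disk i
    w: weight of the capacity term; 1 - w weights the load term
    """
    if len(ratios) != len(queue_lengths):
        raise ValueError("ratios and queue_lengths must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    # Inverted load term (assumption): a shorter IO queue raises the disk's score.
    return [w * squash(r) + (1.0 - w) * (1.0 - squash(q)) for r, q in zip(ratios, queue_lengths)]


# Disk 0 vs disk 1: same load, more free space.  Disk 0 vs disk 2: same free space, shorter queue.
scores = disk_scores([0.8, 0.2, 0.8], [2, 2, 10], w=0.5)
print(scores[0] > scores[1], scores[0] > scores[2])  # True True
```

The usage line reproduces the two advantages stated in the summary: with equal loads the emptier disk scores higher, and with equal capacity the less loaded disk scores higher. One design caveat: because R lies in [0, 1], `squash(R)` only spans [0.5, 0.75], so in practice `w` (or a rescaling of R) would need tuning so that capacity is not dominated by the load term.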
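Preferably, the load of the disk is the length of the disk's IO queue.
Preferably, determining the write disk according to the probability that each disk is selected and a preset condition includes:
calculating an accumulated probability value for each disk according to the probability that each disk is selected;
performing a binary search on the accumulated probability values of the disks, and taking the disk corresponding to the found accumulated probability value that satisfies the preset condition as the write disk.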
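The accumulate-then-binary-search step is a form of roulette-wheel selection. A minimal sketch, assuming the preset condition is that a uniform draw r falls into disk i's slice of the cumulative distribution. The exact comparison operators are not recoverable from this text, so the sketch uses the half-open interval A_{i-1} <= r < A_i, which maps every r in [0, 1) to exactly one disk and never picks a zero-probability disk. `pick_write_disk` is a hypothetical name; `itertools.accumulate` and `bisect.bisect_right` are standard-library functions.

```python
import bisect
import itertools
import random
from typing import Optional, Sequence


def pick_write_disk(probabilities: Sequence[float], r: Optional[float] = None) -> int:
    """Roulette-wheel selection: return the index i with A_{i-1} <= r < A_i.

    probabilities: selection probability P_i of each disk (non-negative, summing to 1)
    r: uniform draw in [0, 1); drawn with random.random() when omitted
    """
    if not probabilities:
        raise ValueError("no disks to choose from")
    # Accumulated probability values A_1..A_N; A_0 = 0 is implicit.
    cumulative = list(itertools.accumulate(probabilities))
    if r is None:
        r = random.random()
    # Binary search: bisect_right returns the first i with cumulative[i] > r,
    # so disks with zero probability (A_{i-1} == A_i) can never be chosen.
    i = bisect.bisect_right(cumulative, r)
    # Floating-point round-off can leave A_N slightly below r; clamp to the last disk.
    return min(i, len(cumulative) - 1)


random.seed(7)
draws = [pick_write_disk([0.2, 0.5, 0.3]) for _ in range(10_000)]
print([round(draws.count(i) / len(draws), 2) for i in range(3)])  # close to [0.2, 0.5, 0.3]
```

The binary search makes each selection O(log N) in the number of disks once the accumulated values are built. Because the choice is randomized rather than argmax, writes spread across all reasonably good disks instead of piling onto the single best one.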
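The present application further discloses a data writing device in a distributed file system, including:
a first selection unit, configured to select, among a plurality of replica servers managed by a metadata server, at least one target replica server according to the remaining capacity of the plurality of replica servers;
a second selection unit, configured to select, in the target replica server, a write disk according to the remaining capacity and the disk load of the disks managed by the target replica server;
a storage unit, configured to store, by the target replica server, the write data in the write disk.
Preferably, the second selection unit includes:
a probability calculation subunit, configured to calculate the probability that each disk is selected according to the remaining capacity and the disk load of the disks managed by the target replica server;
a determining subunit, configured to determine the write disk according to the probability that each disk is selected and a preset condition.
Preferably, the probability calculation subunit includes:
a score calculation subunit, configured to calculate the score of the disk according to the ratio between the remaining capacity and the total capacity of the disk and its weight, and the load of the disk and its weight;
a probability obtaining subunit, configured to normalize the scores of the disks to obtain the probability that each disk is selected.
Preferably, the score calculation subunit includes:
a first calculation subunit, configured to separately calculate a score of the ratio between the remaining capacity and the total capacity of the disk and a score of the load of the disk, wherein the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship;
a second calculation subunit, configured to calculate the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
Preferably, the load of the disk is the length of the disk's IO queue.
Preferably, the determining subunit includes:
an accumulation subunit, configured to calculate the accumulated probability value of each disk according to the probability that each disk is selected;
a search subunit, configured to perform a binary search on the accumulated probability values of the disks and to take the disk corresponding to the found accumulated probability value that satisfies the preset condition as the write disk.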
与现有技术相比,本申请实施例包括以下优点:
本申请实施例通过元数据服务器和副本服务器两级选盘方式,综合考虑磁盘剩余容量以及磁盘负载来确定写入数据的存储磁盘,既避免了磁盘写满,也不会造成磁盘的热点访问。当各磁盘负载基本相同时,可以优先选择剩余容量较高的磁盘作为写入磁盘,从而可以避免磁盘写满;当剩余容量基本相同时,可以优先选择磁盘负载较低的磁盘作为写入磁盘,从而可以避免造成磁盘的热点访问。
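Brief Description of Drawings
FIG. 1 is a flowchart of the steps of an embodiment of a data writing method in a distributed file system according to this application;
FIG. 2 is a flowchart of the steps of an embodiment of a method for selecting a write disk according to the remaining capacities and loads of the disks managed by a target replica server according to this application;
FIG. 3 is a flowchart of the steps of an embodiment of a method for calculating the probability that each disk is selected according to this application;
FIG. 4 is a flowchart of the steps of an embodiment of a method for calculating a disk score according to this application;
FIG. 5 is a flowchart of the steps of an embodiment of a method for determining a write disk according to the probability that each disk is selected and a preset condition according to this application;
FIG. 6 is a structural block diagram of an embodiment of a data writing apparatus in a distributed file system according to this application;
FIG. 7 is a structural block diagram of an embodiment of a second selection unit according to this application;
FIG. 8 is a structural block diagram of an embodiment of a probability calculation subunit according to this application;
FIG. 9 is a structural block diagram of an embodiment of a score calculation subunit according to this application;
FIG. 10 is a structural block diagram of an embodiment of a determining subunit according to this application.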
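Detailed Description
To make the foregoing objectives, features, and advantages of this application clearer and easier to understand, the following describes this application in further detail with reference to the accompanying drawings and specific implementations.
Referring to FIG. 1, a flowchart of the steps of an embodiment of a data writing method in a distributed file system according to this application is shown. The method may specifically include the following steps:
Step 101: Select, from a plurality of replica servers managed by a metadata server, at least one target replica server according to the remaining capacities of the plurality of replica servers.
In this embodiment of this application, a metadata server and replica servers are first deployed in the distributed file system. The metadata server manages the namespace, the replica information of data blocks, and the address information of the replica servers; a replica server manages the local replicas of data blocks and provides read and write operations on the replicas it manages.
This embodiment uses a two-level disk-selection mechanism. The data writing apparatus in the distributed file system (hereinafter "the apparatus") first selects, from the replica servers managed by the metadata server, the replica server to which the data will be written. Specifically, it may select a replica server with large remaining capacity, referred to as the target replica server.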
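Step 101 only says that a replica server with large remaining capacity may be chosen; it does not fix how many servers are taken or how they are ranked. A minimal sketch, assuming the metadata server simply keeps the k replica servers with the most free bytes (`ReplicaServer`, `select_target_servers`, and `k` are hypothetical names, not taken from the source):

```python
import heapq
from dataclasses import dataclass


@dataclass
class ReplicaServer:
    # Hypothetical view of a replica server as tracked by the metadata server.
    address: str
    remaining_bytes: int


def select_target_servers(servers, k):
    """Return the k replica servers with the most remaining capacity.

    Assumption: "large remaining capacity" is read as a plain top-k by
    remaining bytes; the source leaves the ranking rule open.
    """
    if k < 1:
        raise ValueError("at least one target replica server is required")
    # nlargest returns the chosen servers in descending order of free space.
    return heapq.nlargest(k, servers, key=lambda s: s.remaining_bytes)
```

For example, with servers holding 500, 900, and 300 free bytes and k = 2, the 900 and 500 servers are returned, in that order.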
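Step 102: In the target replica server, select a write disk according to the remaining capacities and loads of the disks managed by the target replica server.
In this step, after determining the target replica server, the apparatus further selects within it the disk that will store the write data, referred to as the write disk. The apparatus considers both remaining capacity and load when determining the write disk. Specifically, it may score each disk's remaining capacity and load, combine these scores with the respective weights of remaining capacity and load to obtain the probability that the disk is selected, and then determine the write disk according to that probability. See the descriptions of the subsequent embodiments for details.
Step 103: Store the write data in the write disk by using the target replica server.
After the apparatus determines the write disk, the target replica server stores the data written by the client on the write disk determined in the previous step.
The embodiments of this application use two-level disk selection, first at the metadata server and then at the replica server, and jointly consider each disk's remaining capacity and load to determine the disk that stores the write data. This both prevents disks from filling up and avoids hot-spot access. When disk loads are roughly equal, a disk with more remaining capacity is preferentially selected as the write disk, which prevents disks from filling up; when remaining capacities are roughly equal, a disk with a lower load is preferentially selected as the write disk, which avoids hot-spot access.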
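In another embodiment of this application, as shown in FIG. 2, the process of selecting, in the target replica server, a write disk according to the remaining capacities and loads of the disks managed by the target replica server may include:
Step 201: Calculate, according to the remaining capacities and loads of the disks managed by the target replica server, the probability that each disk is selected.
In this step, the remaining capacity and the load of each disk managed by the target replica server are first obtained, and the probability that each disk is selected is then calculated from them. Many calculation methods are possible; the following is only an example.
One method for calculating the probability that each disk is selected, shown in FIG. 3, includes:
Step 301: Calculate the disk score according to the ratio of the disk's remaining capacity to its total capacity and the weight of the ratio, and the disk's load and the weight of the load.
All disks in the target replica server are scored with the same method. The ratio score may first be computed from the ratio and its weight and the load score from the load and its weight, and the two scores then combined into the disk score, either directly or through a specified functional relationship. Alternatively, the ratio score and the load score may be computed first and then combined according to their respective weights, again optionally through a specified functional relationship. See the subsequent embodiments for details.
The disk load may be measured by the length of the disk's IO queue: the longer the IO queue, the higher the disk's load.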
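The text does not say how the IO queue length is sampled. On Linux, one possible proxy (an assumption, not the source's method) is field 9 of `/proc/diskstats`, "I/Os currently in progress", which is the 12th whitespace-separated token after the major number, minor number, and device name; `/sys/block/<dev>/inflight` reports a similar count split into reads and writes. A minimal parser (function names hypothetical):

```python
def parse_inflight(diskstats_text):
    """Map device name -> number of I/Os currently in progress.

    Expects the /proc/diskstats layout: major, minor, device name, then the
    statistics fields; field 9 ("I/Os currently in progress") is token 11.
    """
    inflight = {}
    for line in diskstats_text.splitlines():
        tokens = line.split()
        if len(tokens) < 12:
            continue  # skip blank or truncated lines
        inflight[tokens[2]] = int(tokens[11])
    return inflight


def read_inflight(path="/proc/diskstats"):
    """Read a snapshot of per-device in-flight I/O counts (Linux only)."""
    with open(path) as f:
        return parse_inflight(f.read())
```

Each call returns an instantaneous snapshot, so the value can change between the moment it is read and the moment the write is issued.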
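Step 302: Normalize the scores of the disks to obtain the probability that each disk is selected.
After the score of each disk is obtained, the scores are normalized, which yields the probability that each disk is selected.
Step 202: Determine the write disk according to the probability that each disk is selected and a preset condition.
After the probability that each disk is selected is obtained, a disk whose probability satisfies the preset condition is determined as the write disk. The preset condition may, for example, be set by the user.
In another embodiment of this application, a method for calculating a disk score, shown in FIG. 4, may specifically include:
Step 401: Separately calculate the score of the ratio of the disk's remaining capacity to its total capacity and the score of the disk's load.
The score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship. The monotonically increasing function here is defined from negative infinity to positive infinity, has a bounded range (for example (0, 1) for the logistic function and $(-\pi/2, \pi/2)$ for arctan), and is S-shaped. Such functions are common in classification and logistic regression models and belong to the field of multivariate analysis; the arctan function is one example. In this embodiment, the disk load is measured by the length of the disk's IO queue.
Assume that the target replica server contains N disks, that the ratios of each disk's remaining capacity to the total capacity of the N disks are $R_1, R_2, \ldots, R_N$, and that the current IO queue lengths of the N disks (that is, the disk loads) are $Q_1, Q_2, \ldots, Q_N$.
The score $S_i(R_i)$ of the ratio of remaining capacity to total capacity and the score $S_i(Q_i)$ of the load, for $i = 1, \ldots, N$, can then be obtained with the following formulas: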
$$S_i(x_i) = \arctan(a x_i + b) + c$$
$$a = \frac{6}{\mathrm{high} - \mathrm{low}}$$
$$b = -3 - a \times \mathrm{low}$$
$$c = -\arctan(a \times \mathrm{bottom} + b)$$
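Here, x may be R or Q; that is, the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship. This monotonically increasing function may be the arctan function.
high, low, and bottom are experimentally determined values. $x_i$ must lie within (low, high), which guarantees that $a x_i + b$ lies within $(-3, 3)$. bottom ensures that a very low $x_i$ yields $P_i = 0$: by the definition of $c$, $S_i(\mathrm{bottom}) = 0$.
Because arctan is a monotonically increasing function, a higher score indicates a higher remaining-capacity ratio or a lower load, and conversely a lower score indicates a lower remaining-capacity ratio or a higher load.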
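The scoring formula above can be sketched as a single function. This is a minimal sketch, not the applicant's code: it assumes the constants as written (a and b map (low, high) linearly onto (−3, 3), and c makes S(bottom) = 0). Clamping x into [low, high] and clamping negative scores to 0 are this sketch's own choices, made so that any x at or below bottom scores exactly 0 and therefore receives probability 0 after normalization.

```python
import math


def sigmoid_score(x, low, high, bottom):
    """S(x) = arctan(a*x + b) + c, clamped at 0.

    a and b map (low, high) onto (-3, 3); c = -arctan(a*bottom + b) makes
    S(bottom) == 0. Clamping x and the result are assumptions of this sketch.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    a = 6.0 / (high - low)
    b = -3.0 - a * low
    c = -math.atan(a * bottom + b)
    x = min(max(x, low), high)  # keep x inside [low, high]
    return max(0.0, math.atan(a * x + b) + c)
```

With low = 0, high = 1, and bottom = 0.05 for the remaining-capacity ratio, a disk at 5% or less remaining capacity scores 0, and the score rises steeply through the middle of the range before flattening near high.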
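Step 402: Calculate the disk score according to the ratio score and the weight of the ratio, and the load score and the weight of the load.
After the score of each disk's ratio of remaining capacity to total capacity and the score of each disk's load are obtained, this step combines them with the weight of remaining capacity and the weight of load to calculate the score of each disk: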
$$S_i = S_i(R_i)^{w} \times S_i(Q_i)^{1-w}$$
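where $w$ is the weight of the disk's remaining capacity and $1 - w$ is the weight of the disk's load. The weight can be set independently, for example from empirical values.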
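The combination above is a weighted geometric mean of the two component scores. A minimal sketch (the function name and the input checks are assumptions of this example):

```python
def disk_score(ratio_score, load_score, w):
    """S_i = S_i(R_i)**w * S_i(Q_i)**(1 - w), a weighted geometric mean."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if ratio_score < 0 or load_score < 0:
        raise ValueError("component scores must be non-negative")
    return ratio_score ** w * load_score ** (1.0 - w)
```

Because the combination is multiplicative, a zero component with a positive weight zeroes the whole score: a disk whose remaining-capacity ratio is at or below bottom is never chosen, however idle it is. An additive form such as w·S(R) + (1 − w)·S(Q) would not have that property.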
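After the score of each disk is obtained through steps 401 and 402, step 302 described above can be performed: the scores of the disks are normalized to obtain the probability that each disk is selected.
The scores of the N disks are normalized. With $S_i$ denoting the score of the i-th disk, let: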
$$S = \sum_{i=1}^{N} S_i$$
The probability $P_i$ that the i-th disk is selected is therefore:
$$P_i = \frac{S_i}{S}$$
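Normalization is a single division by the total score. The sketch below also guards against every disk scoring zero, a case the text does not address; raising an error there is an assumption of this example (the function name is hypothetical):

```python
def selection_probabilities(scores):
    """P_i = S_i / S, where S is the sum of all disk scores."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one disk must have a positive score")
    return [s / total for s in scores]
```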
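In another embodiment of this application, as shown in FIG. 5, the method for determining the write disk according to the probability that each disk is selected and a preset condition may include:
Step 501: Calculate the cumulative probability value of each disk according to the probability that each disk is selected.
Let $P_i$ be the probability that the i-th disk is selected. Then: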
$$\sum_{i=1}^{N} P_i = 1$$
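That is, the selection probabilities of all disks sum to 1.
Define the cumulative probability values of the disks as $A_1, A_2, \ldots, A_N$, where: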
$$A_i = \sum_{j=1}^{i} P_j$$
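The values $A_i$ are therefore in non-decreasing order and lie in [0, 1]; define $A_0 = 0$.
Step 502: Perform a binary search over the cumulative probability values of the disks, and use, as the write disk, the disk whose cumulative probability value is found to satisfy the preset condition.
In this embodiment, a random number generator over [0, 1] may be used to produce a random number $r$; the preset condition is that disk i satisfies $A_{i-1} < r \le A_i$. Because $r$ is uniformly distributed, disk i is chosen with probability $A_i - A_{i-1} = P_i$.
The apparatus performs a binary search over the sorted cumulative probability values; when it finds a disk i that satisfies $A_{i-1} < r \le A_i$, disk i is determined as the write disk. Binary search, also called half-interval search, needs few comparisons (O(log N)), is fast, and has good average performance, so it quickly finds the disk that satisfies the preset condition among the sorted cumulative probability values.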
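Steps 501 and 502 together amount to inverse-CDF (roulette-wheel) sampling. A sketch using Python's `bisect` module: `bisect_left` returns the first index i with A_i ≥ r, which is exactly the disk satisfying A_{i−1} < r ≤ A_i. Drawing r as `1 - random()` (so r lies in (0, 1]) and scaling it by A_N are this sketch's own choices: they stop r = 0 from landing on a zero-probability disk and absorb floating-point round-off in A_N. Function names are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def cumulative_probabilities(probs):
    """Return [A_1, ..., A_N] with A_i = P_1 + ... + P_i (A_0 = 0 is implicit)."""
    return list(accumulate(probs))


def find_disk(cumulative, r):
    """Binary search for the 0-based index i with A_{i-1} < r <= A_i."""
    i = bisect.bisect_left(cumulative, r)
    if i == len(cumulative):
        raise ValueError("r is larger than the last cumulative value")
    return i


def pick_write_disk(probs, rng=random):
    """Draw r and return the index of the disk whose interval contains it."""
    cumulative = cumulative_probabilities(probs)
    # 1 - random() lies in (0, 1]; scaling by A_N keeps r <= A_N even if
    # floating-point round-off leaves A_N slightly below 1.
    r = (1.0 - rng.random()) * cumulative[-1]
    return find_disk(cumulative, r)
```

Building the cumulative array costs O(N) and each draw O(log N); if the probabilities are recomputed before every write, the O(N) rebuild dominates.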
After the write disk is determined, step 103 described above can be performed to store the write data.
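Putting steps 401 through 502 together, the following end-to-end sketch of disk selection within one target replica server returns the index of the write disk, and its checks mirror the two behaviours claimed for the method: with equal loads the emptier disk is favoured, and with equal remaining capacity the shorter queue is favoured. One deliberate deviation, labelled as an assumption: the text says the load score increases with the load, yet also says a higher score means a lower load. To obtain the latter, this sketch scores the queue headroom `q_cap - Q_i` instead of `Q_i`, where `q_cap` is a hypothetical saturation depth. The values of `w`, `q_cap`, low, high, and bottom are illustrative, not from the source.

```python
import bisect
import math
import random
from itertools import accumulate


def _score(x, low, high, bottom):
    # S(x) = arctan(a*x + b) + c with (low, high) mapped onto (-3, 3) and
    # S(bottom) = 0; x is clamped to [low, high] and negatives to 0 (assumptions).
    a = 6.0 / (high - low)
    b = -3.0 - a * low
    c = -math.atan(a * bottom + b)
    x = min(max(x, low), high)
    return max(0.0, math.atan(a * x + b) + c)


def disk_probabilities(ratios, queues, w=0.5, q_cap=64):
    """Selection probability P_i for each disk from R_i and Q_i.

    w, q_cap and the low/high/bottom bounds are hypothetical tuning values.
    """
    scores = []
    for ratio, queue in zip(ratios, queues):
        s_r = _score(ratio, low=0.0, high=1.0, bottom=0.05)
        # Assumption: score the queue headroom so a longer queue scores lower.
        s_q = _score(q_cap - queue, low=0.0, high=float(q_cap), bottom=0.0)
        scores.append(s_r ** w * s_q ** (1.0 - w))  # S_i = S(R_i)^w * S(Q_i)^(1-w)
    total = sum(scores)
    if total <= 0:
        raise ValueError("no disk is eligible for writing")
    return [s / total for s in scores]  # P_i = S_i / S


def choose_write_disk(ratios, queues, rng=random, **tuning):
    """Return the 0-based index of the write disk."""
    cumulative = list(accumulate(disk_probabilities(ratios, queues, **tuning)))
    r = (1.0 - rng.random()) * cumulative[-1]  # r in (0, A_N]
    return bisect.bisect_left(cumulative, r)  # first i with A_{i-1} < r <= A_i
```

For instance, `disk_probabilities([0.8, 0.2], [4, 4])` gives the first disk the larger share, and `disk_probabilities([0.5, 0.5], [2, 40])` favours the disk with the shorter queue.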
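It should be noted that, for brevity, the method embodiments are described as a series of combined actions. However, a person skilled in the art should understand that the embodiments of this application are not limited by the described order of actions, because according to the embodiments some steps may be performed in another order or simultaneously. In addition, a person skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of this application.
Referring to FIG. 6, a structural block diagram of an embodiment of a data writing apparatus in a distributed file system according to this application is shown. The apparatus may specifically include the following units:
First selection unit 601, configured to select, from a plurality of replica servers managed by a metadata server, at least one target replica server according to the remaining capacities of the plurality of replica servers.
Second selection unit 602, configured to select, in the target replica server, a write disk according to the remaining capacities and loads of the disks managed by the target replica server.
Storage unit 603, configured to store the write data in the write disk by using the target replica server.
Through the foregoing units, the embodiments of this application jointly consider each disk's remaining capacity and load to determine the disk that stores the write data, which both prevents disks from filling up and avoids hot-spot access. When disk loads are roughly equal, a disk with more remaining capacity is preferentially selected as the write disk, which prevents disks from filling up; when remaining capacities are roughly equal, a disk with a lower load is preferentially selected as the write disk, which avoids hot-spot access.
In another embodiment, as shown in FIG. 7, the second selection unit 602 may further include:
Probability calculation subunit 701, configured to calculate, according to the remaining capacities and loads of the disks managed by the target replica server, the probability that each disk is selected.
Determining subunit 702, configured to determine the write disk according to the probability that each disk is selected and a preset condition.
As shown in FIG. 8, the probability calculation subunit 701 may further include:
Score calculation subunit 801, configured to calculate the score of each disk according to the ratio of the disk's remaining capacity to its total capacity and the weight of the ratio, and the disk's load and the weight of the load.
Probability obtaining subunit 802, configured to normalize the scores of the disks to obtain the probability that each disk is selected.
As shown in FIG. 9, the score calculation subunit 801 may further include:
First calculation subunit 901, configured to separately calculate the score of the ratio of the disk's remaining capacity to its total capacity and the score of the disk's load, where the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship.
Second calculation subunit 902, configured to calculate the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
The disk load is the length of the disk's IO queue.
In another embodiment, as shown in FIG. 10, the determining subunit 702 may include:
Accumulation subunit 1001, configured to calculate the cumulative probability value of each disk according to the probability that each disk is selected.
Search subunit 1002, configured to perform a binary search over the cumulative probability values of the disks and use, as the write disk, the disk whose cumulative probability value is found to satisfy the preset condition.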
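An embodiment of this application further provides an electronic device, including a memory and a processor.
The processor and the memory are connected to each other through a bus, which may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.
The memory stores a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include a high-speed RAM and may further include a non-volatile memory, for example, at least one disk storage.
The processor reads the program code in the memory and performs the following steps:
selecting, from a plurality of replica servers managed by a metadata server, at least one target replica server according to the remaining capacities of the plurality of replica servers;
selecting, in the target replica server, a write disk according to the remaining capacities and loads of the disks managed by the target replica server; and
storing, by using the target replica server, the write data in the write disk of the memory.
The apparatus embodiments are basically similar to the method embodiments and are therefore described briefly; for related parts, refer to the descriptions of the method embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may refer to one another.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of this application may take the form of entirely hardware embodiments, entirely software embodiments, or embodiments combining software and hardware. Moreover, the embodiments of this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include a non-persistent memory, a random access memory (RAM), and/or a non-volatile memory among computer-readable media, such as a read-only memory (ROM) or a flash RAM. The memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media that can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The embodiments of this application are described with reference to the flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of this application. It should be understood that computer program instructions can implement each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing terminal device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, a person skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of this application.
Finally, it should be noted that relational terms such as "first" and "second" herein are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "includes a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The data writing method in a distributed file system and the data writing apparatus in a distributed file system provided by this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the descriptions of the foregoing embodiments are only intended to help understand the method and core idea of this application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.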

Claims (12)

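  1. A data writing method in a distributed file system, comprising:
    selecting, from a plurality of replica servers managed by a metadata server, at least one target replica server according to remaining capacities of the plurality of replica servers;
    selecting, in the target replica server, a write disk according to remaining capacities and loads of disks managed by the target replica server; and
    storing write data in the write disk by using the target replica server.
  2. The method according to claim 1, wherein the step of selecting, in the target replica server, a write disk according to remaining capacities and loads of disks managed by the target replica server comprises:
    calculating, according to the remaining capacities and loads of the disks managed by the target replica server, a probability that each disk is selected; and
    determining the write disk according to the probability that each disk is selected and a preset condition.
  3. The method according to claim 2, wherein the calculating, according to the remaining capacities and loads of the disks managed by the target replica server, a probability that each disk is selected comprises:
    calculating a score of the disk according to a ratio of the disk's remaining capacity to its total capacity and a weight of the ratio, and a load of the disk and a weight of the load; and
    normalizing the scores of the disks to obtain the probability that each disk is selected.
  4. The method according to claim 3, wherein the calculating a score of the disk according to a ratio of the disk's remaining capacity to its total capacity and a weight of the ratio, and a load of the disk and a weight of the load comprises:
    separately calculating a score of the ratio of the disk's remaining capacity to its total capacity and a score of the load of the disk, wherein the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship; and
    calculating the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  5. The method according to claim 3, wherein the load of the disk is a length of an IO queue of the disk.
  6. The method according to any one of claims 2 to 5, wherein the determining the write disk according to the probability that each disk is selected and a preset condition comprises:
    calculating a cumulative probability value of each disk according to the probability that each disk is selected; and
    performing a binary search over the cumulative probability values of the disks, and using, as the write disk, the disk whose cumulative probability value is found to satisfy the preset condition.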
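  7. A data writing apparatus in a distributed file system, comprising:
    a first selection unit, configured to select, from a plurality of replica servers managed by a metadata server, at least one target replica server according to remaining capacities of the plurality of replica servers;
    a second selection unit, configured to select, in the target replica server, a write disk according to remaining capacities and loads of disks managed by the target replica server; and
    a storage unit, configured to store write data in the write disk by using the target replica server.
  8. The apparatus according to claim 7, wherein the second selection unit comprises:
    a probability calculation subunit, configured to calculate, according to the remaining capacities and loads of the disks managed by the target replica server, a probability that each disk is selected; and
    a determining subunit, configured to determine the write disk according to the probability that each disk is selected and a preset condition.
  9. The apparatus according to claim 8, wherein the probability calculation subunit comprises:
    a score calculation subunit, configured to calculate a score of the disk according to a ratio of the disk's remaining capacity to its total capacity and a weight of the ratio, and a load of the disk and a weight of the load; and
    a probability obtaining subunit, configured to normalize the scores of the disks to obtain the probability that each disk is selected.
  10. The apparatus according to claim 9, wherein the score calculation subunit comprises:
    a first calculation subunit, configured to separately calculate a score of the ratio of the disk's remaining capacity to its total capacity and a score of the load of the disk, wherein the score of the ratio and the ratio, as well as the score of the load and the load, each satisfy a monotonically increasing functional relationship; and
    a second calculation subunit, configured to calculate the score of the disk according to the score of the ratio and the weight of the ratio, and the score of the load and the weight of the load.
  11. The apparatus according to claim 8, wherein the load of the disk is a length of an IO queue of the disk.
  12. The apparatus according to any one of claims 8 to 11, wherein the determining subunit comprises:
    an accumulation subunit, configured to calculate a cumulative probability value of each disk according to the probability that each disk is selected; and
    a search subunit, configured to perform a binary search over the cumulative probability values of the disks, and use, as the write disk, the disk whose cumulative probability value is found to satisfy the preset condition.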
PCT/CN2016/103139 2015-11-03 2016-10-25 Data writing method and apparatus in a distributed file system Ceased WO2017076184A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16861452.7A EP3373155A1 (en) 2015-11-03 2016-10-25 Data writing method and device in distributed file system
US15/970,820 US11055360B2 (en) 2015-11-03 2018-05-03 Data write-in method and apparatus in a distributed file system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510740419.5A CN106649401A (zh) 2015-11-03 Data writing method and apparatus in a distributed file system
CN201510740419.5 2015-11-03

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/970,820 Continuation US11055360B2 (en) 2015-11-03 2018-05-03 Data write-in method and apparatus in a distributed file system

Publications (1)

Publication Number Publication Date
WO2017076184A1 true WO2017076184A1 (zh) 2017-05-11

Family

ID=58661609

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103139 Ceased WO2017076184A1 (zh) 2015-11-03 2016-10-25 Data writing method and apparatus in a distributed file system

Country Status (4)

Country Link
US (1) US11055360B2 (zh)
EP (1) EP3373155A1 (zh)
CN (1) CN106649401A (zh)
WO (1) WO2017076184A1 (zh)

Also Published As

Publication number Publication date
US20180253506A1 (en) 2018-09-06
CN106649401A (zh) 2017-05-10
EP3373155A4 (en) 2018-09-12
US11055360B2 (en) 2021-07-06
EP3373155A1 (en) 2018-09-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16861452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016861452

Country of ref document: EP