DevObject/BasicFunctions
Allocation/Deallocation
function allocate_dv(chartype,nx,ny,nz)
! ny and nz are optional:
! depending on their presence, allocates a 1D, 2D, or 3D device variable
Input:
character chartype
integer nx,ny,nz
optional ny,nz
Output:
type(devVar) allocate_dv
Description:
Allocates an integer(4), real(4), or complex(4) device variable; the type is selected by chartype ('integer', 'real', or 'complex'). Creates a devVar structure that describes a 1D (vector), 2D (matrix), or 3D array. Calls the CUBLAS function CUBLAS_ALLOC.
Error handling:
If CUBLAS reports an allocation error, prints the message "allocate_dv error due to CUBLAS: CUBLAS error code: (error code)" and aborts the execution of the main program thread.
Examples of use:
dv_A=allocate_dv('real',325)
dv_B=allocate_dv('integer',N1,N2)
dv_C=allocate_dv('complex',14,220,N3)
Comments:
allocate_dv normally allocates more space on the device than the nominal size of the variable. Parts of vectors and matrices can be handled because the devVar structure stores the variable sizes (the allocation parameters nx, ny, nz are kept as its lengthX, lengthY, and lengthZ components) together with the necessary information on the leading dimensions.
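The role of the leading dimensions can be sketched on the host. The module below is a hypothetical model, not DevObject code: it assumes column-major storage (as used by CUBLAS) and calls the padded first and second dimensions ldx and ldy; the actual devVar component names for the leading dimensions and the padding rule used by allocate_dv are not given here. It computes the zero-based offset of element (i,j,k), i.e. the position an element-access routine such as get_i or set_i would address under these assumptions.

```fortran
! Hypothetical host-side model of devVar element indexing (not DevObject source).
! Assumes column-major storage with the first two dimensions padded to ldx, ldy.
module devvar_index_model
  implicit none
  private
  public :: elem_offset
contains
  ! Zero-based offset of element (i,j,k); use k=1 for 2D and j=k=1 for 1D data.
  pure integer function elem_offset(i, j, k, ldx, ldy) result(off)
    integer, intent(in) :: i, j, k, ldx, ldy
    off = (i - 1) + (j - 1)*ldx + (k - 1)*ldx*ldy
  end function elem_offset
end module devvar_index_model
```

For example, with lengthX=3 padded to ldx=4, element (2,3) of a 2D variable lies at offset 1+2*4=9, not at offset 7 as it would in an unpadded 3-row array.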
subroutine deallocate_dv(dv_A)
! deallocates device variable dv_A: similar to the standard Fortran statement deallocate(A)
Input:
type(devVar) dv_A
Output:
none
Description:
Deallocates any device variable created by allocate_dv. Internally calls the CUBLAS function CUBLAS_FREE.
Error handling:
If called for an already deallocated variable, prints the message "deallocate_dv error: this variable is already deallocated". This is only a warning: the main execution thread is not interrupted, and the program continues without changing the status or parameters of the dv_A structure.
Examples of use:
call deallocate_dv(dv_B)
subroutine deallocif_dv(dv_A)
! deallocates device variable dv_A if it is allocated: similar to the standard Fortran construction
! if(allocated(A)) deallocate(A)
Input:
type(devVar) dv_A
Output:
none
Description:
Deallocates any device variable created by allocate_dv, if it is allocated; does nothing for a variable that is already deallocated. Internally calls the CUBLAS function CUBLAS_FREE.
Error handling:
none
Examples of use:
call deallocif_dv(dv_B)
Comments:
It performs the same operation as deallocate_dv, but prints no message for variables that are already deallocated.
function allocated_dv(dv_A)
! similar to the standard Fortran intrinsic function allocated(A)
Input:
type(devVar) dv_A
Output:
logical allocated_dv
Description:
Returns .true. if dv_A is allocated and .false. otherwise.
Error handling:
none
Examples of use:
logical l
l=allocated_dv(dv_A)
if(allocated_dv(dv_A)) call deallocate_dv(dv_A)
Comments:
The last example shows that the operation
if(allocated_dv(dv_A)) call deallocate_dv(dv_A)
is equivalent to
call deallocif_dv(dv_A)
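allocated_dv, deallocate_dv, and deallocif_dv mirror the handling of allocatable arrays in standard Fortran. The sketch below shows the host-side construction they imitate, using an ordinary allocatable array instead of a devVar; the module and subroutine names are hypothetical and no DevObject routine is called.

```fortran
! Standard-Fortran analogue of deallocif_dv, using a host allocatable array.
module deallocif_analogue
  implicit none
  private
  public :: deallocif
contains
  ! Same effect as "if(allocated(a)) deallocate(a)": a no-op when a is
  ! already deallocated, so it is safe to call any number of times.
  subroutine deallocif(a)
    real, allocatable, intent(inout) :: a(:)
    if (allocated(a)) deallocate(a)
  end subroutine deallocif
end module deallocif_analogue
```

One difference is worth noting: in standard Fortran, deallocating an array that is not allocated (without a stat= specifier) is an error that terminates the program, whereas deallocate_dv only prints a warning.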
function cloneDeepWData(devVarA)
! deep clone with data
Input:
type(devVar) devVarA
Output:
type(devVar) cloneDeepWData
Description:
Creates a new devVar identical to devVarA: the structure components on the host are copied, new space is allocated on the device, and the data elements are copied from devVarA to the new space.
Error handling:
none
Examples of use:
integer,parameter::n=128
real(4) ar(n)
type(devVar) dv_A, dv_B
dv_A=allocate_dv('real',n)
call random_number(ar)
call transfer_r4(ar,dv_A,.true.)   ! copy the random values to dv_A on the device
dv_B=cloneDeepWData(dv_A)          ! dv_B holds its own copy of the same values
Comments:
Useful for creating a device variable parallel to an existing one when the parameters originally passed to allocate_dv are not readily available.
function cloneDeepWOData(devVarA)
! deep clone without data
Input:
type(devVar) devVarA
Output:
type(devVar) cloneDeepWOData
Description:
Creates a new devVar with the same structure components as devVarA. New space is allocated on the device, but the data elements are not copied from devVarA to the new space.
Error handling:
none
Examples of use:
integer,parameter::n=128
real(4) ar(n)
type(devVar) dv_A, dv_B
dv_A=allocate_dv('real',n)
call random_number(ar)
call transfer_r4(ar,dv_A,.true.)   ! copy the random values to dv_A on the device
dv_B=cloneDeepWOData(dv_A)         ! same type and size as dv_A; the values are not copied
Comments:
Useful for creating a device variable parallel to an existing one when the parameters originally passed to allocate_dv are not readily available.
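The difference between cloneDeepWData and cloneDeepWOData corresponds to the source= and mold= specifiers of the Fortran 2008 allocate statement. The host-only sketch below illustrates this with ordinary allocatable arrays; it does not call DevObject, and the module and subroutine names are hypothetical.

```fortran
! Host-side analogue of cloneDeepWData / cloneDeepWOData (Fortran 2008).
module clone_analogue
  implicit none
  private
  public :: clone_with_data, clone_without_data
contains
  ! Like cloneDeepWData: new storage with the shape of a, elements copied.
  subroutine clone_with_data(a, b)
    real, intent(in) :: a(:,:)
    real, allocatable, intent(out) :: b(:,:)
    allocate(b, source=a)
  end subroutine clone_with_data

  ! Like cloneDeepWOData: new storage with the shape of a, elements not copied.
  subroutine clone_without_data(a, b)
    real, intent(in) :: a(:,:)
    real, allocatable, intent(out) :: b(:,:)
    allocate(b, mold=a)
  end subroutine clone_without_data
end module clone_analogue
```

After allocate(b, mold=a) the elements of b are undefined, so they must be set before they are read, just as the device contents of a variable created by cloneDeepWOData must be filled before use.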
Data transfer between the CPU and GPU
subroutine transfer_i4(A,dv_A,cpu2gpu)
! transfers integer(4) data from CPU to GPU and back
Input:
integer(4) A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
Transfers vectors, matrices, and arrays of type integer(4) from the host (CPU) to the device (GPU) when cpu2gpu=.true., and from the device back to the host when cpu2gpu=.false.; dv_A must already be allocated. A is assumed to have the dimensions lengthX, (lengthX,lengthY), or (lengthX,lengthY,lengthZ) stored in dv_A. Depending on the dimensionality, calls the CUBLAS functions CUBLAS_SET_VECTOR, CUBLAS_GET_VECTOR, CUBLAS_SET_MATRIX, or CUBLAS_GET_MATRIX.
Error handling:
If any of these CUBLAS functions reports an error, prints the message "transfer_i4 error due to CUBLAS: CUBLAS error code: (error code)" and aborts the execution of the main program thread. If called for a non-integer device variable dv_A, prints the message "transfer_i4 error: the allocated device Variable is not integer" and aborts the execution of the main program thread.
Examples of use:
integer(4) ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3
dv_ArrayA=allocate_dv('integer',373,67)
call transfer_i4(ArrayA,dv_ArrayA,.true.)
…
call transfer_i4(ArrayA,dv_ArrayA,.false.)
Comments:
The dimensions stored in dv_ArrayA must be those of ArrayA. Otherwise the subroutine does not "know" the layout and size of ArrayA, and an error results. Transferring data of a different shape or size requires other DevObject functions, such as treatas and part.
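For a 2D variable, CUBLAS_SET_MATRIX (and, in the reverse direction, CUBLAS_GET_MATRIX) copies a rows x cols column-major tile between a source and a destination that may have different leading dimensions. The host-only sketch below models that copy for integer(4) data, to show why the dimensions stored in dv_A must describe A exactly: A is read as a lengthX x lengthY array, and the padding rows of the device buffer are skipped. The module is hypothetical, and that transfer_i4 passes lengthX as the host leading dimension is an assumption.

```fortran
! Hypothetical host-only model of the column-major tile copy done by
! CUBLAS_SET_MATRIX; no GPU or CUBLAS call is involved.
module set_matrix_model
  implicit none
  private
  public :: copy_tile
contains
  ! Copy a rows x cols tile of a (leading dimension lda) into b (leading
  ! dimension ldb); rows rows+1..ldb of b (the padding) are left untouched.
  subroutine copy_tile(rows, cols, a, lda, b, ldb)
    integer, intent(in) :: rows, cols, lda, ldb
    integer(4), intent(in)    :: a(lda, *)
    integer(4), intent(inout) :: b(ldb, *)
    integer :: j
    do j = 1, cols
      b(1:rows, j) = a(1:rows, j)
    end do
  end subroutine copy_tile
end module set_matrix_model
```

Copying a 3x2 host array with lda=3 into a buffer with ldb=5, for instance, writes elements 1 to 3 of each destination column and leaves elements 4 and 5 unchanged.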
subroutine transfer_i(A,dv_A,cpu2gpu)
! transfers integer data from CPU to GPU and back
Input:
integer A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
Transfers data of the default integer type from CPU to GPU and back. Has the same syntax as transfer_i4 and calls it internally. The difference is that the CPU variable A must have the size dv_A%dsize; this is a limitation of the pilot version that should be fixed. To transfer variables of a different size, the DevObject functions part and treatas currently have to be used.
Error handling:
The same error messages as transfer_i4.
Examples of use:
integer ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3
dv_ArrayA=allocate_dv('integer',373,67)
call transfer_i(ArrayA,treatas(dv_ArrayA,373,67),.true.)
…
call transfer_i(ArrayA,treatas(dv_ArrayA,373,67),.false.)
Comments:
Although the default integer on the CPU may have any size (e.g. 8 bytes), the device variable is always integer(4). Because it supports an arbitrary integer kind, this subroutine is also slower than transfer_i4.
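Since the device copy is always integer(4), an 8-byte default integer can be transferred without loss only if every value lies in the integer(4) range. The function below is a hypothetical host-side check, not part of DevObject, that could be applied to A before calling transfer_i.

```fortran
! Hypothetical pre-transfer check: can every 8-byte value be stored in the
! integer(4) device variable? (Not part of DevObject.)
module int_narrowing
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: fits_int32
contains
  ! .true. if all elements of a lie in [-2**31, 2**31-1].
  pure logical function fits_int32(a)
    integer(int64), intent(in) :: a(:)
    fits_int32 = all(a >= int(-huge(0_int32) - 1, int64) .and. &
                     a <= int(huge(0_int32), int64))
  end function fits_int32
end module int_narrowing
```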
subroutine transfer_r4(A,dv_A,cpu2gpu)
! transfers real(4) data from CPU to GPU and back
Input:
real(4) A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
The same as transfer_i4, but for real(4).
Error handling:
The same as transfer_i4, but for real(4).
Examples of use:
real(4) ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3.0
dv_ArrayA=allocate_dv('real',373,67)
call transfer_r4(ArrayA,dv_ArrayA,.true.)
…
call transfer_r4(ArrayA,dv_ArrayA,.false.)
Comments:
See comments for transfer_i4.
subroutine transfer_r(A,dv_A,cpu2gpu)
! transfers real data from CPU to GPU and back
Input:
real A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
The same as transfer_i, but for the default real type.
Error handling:
The same error messages as transfer_i and transfer_i4.
Examples of use:
real ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3.0
dv_ArrayA=allocate_dv('real',373,67)
call transfer_r(ArrayA,treatas(dv_ArrayA,373,67),.true.)
…
call transfer_r(ArrayA,treatas(dv_ArrayA,373,67),.false.)
Comments:
See comments for transfer_i.
subroutine transfer_c4(A,dv_A,cpu2gpu)
! transfers complex(4) data from CPU to GPU and back
Input:
complex(4) A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
The same as transfer_i4, but for complex(4).
Error handling:
The same as transfer_i4, but for complex(4).
Examples of use:
complex(4) ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=(3.0,-0.123)
dv_ArrayA=allocate_dv('complex',373,67)
call transfer_c4(ArrayA,dv_ArrayA,.true.)
…
call transfer_c4(ArrayA,dv_ArrayA,.false.)
Comments:
See comments for transfer_i4.
subroutine transfer_c(A,dv_A,cpu2gpu)
! transfers complex data from CPU to GPU and back
Input:
complex A(*)
type(devVar) dv_A
logical cpu2gpu
Output:
none
Description:
The same as transfer_i, but for the default complex type.
Error handling:
The same error messages as transfer_i.
Examples of use:
complex ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=(3.0,-0.123)
dv_ArrayA=allocate_dv('complex',373,67)
call transfer_c(ArrayA,treatas(dv_ArrayA,373,67),.true.)
…
call transfer_c(ArrayA,treatas(dv_ArrayA,373,67),.false.)
Comments:
See comments for transfer_i.
Access to the parts of the device variables
function part(dv_A,i1,i2,j1,j2,k1,k2)
! allows one to work with a part of an allocated device variable dv_A;
! parameters j1,j2,k1,k2 are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
integer i1,i2,j1,j2,k1,k2
optional j1,j2,k1,k2
Output:
type(devVar) part
Description:
Does not involve any CUBLAS function and operates only on the devVar structure, so it is applicable to any type (integer, real, complex). Designed to be used with other functions whose sizing attributes must be changed to access only a part of a variable (e.g. to transfer data to a part of a variable, to copy it, or to apply a CUBLAS function only to a part of it). Constructions involving part are similar to array sections in Fortran and MATLAB: although matrix A has size NxM, one can work with just its part A(i1:i2,j1:j2).
Error handling:
None.
Examples of use:
1)
complex(4) ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=(3.0,-0.123)
dv_ArrayA=allocate_dv('complex',900,100)
call transfer_c4(ArrayA,part(dv_ArrayA,3,375,18,84),.true.)
…
call transfer_c4(ArrayA,part(dv_ArrayA,2,374,20,86),.false.)
2)
dv_B=allocate_dv('real',1889)
call devf_zeros(part(dv_B,35,99))
3)
dv_C=allocate_dv('integer',12,56,77)
…
dv_D=allocate_dv('integer',5,7,130)
call copy3d(part(dv_C,2,3,6,8,3,20),part(dv_D,1,2,1,3,5,22))
Comments:
In the first example, the 2D complex CPU array is smaller than the GPU array, and data are transferred only to the part (3:375, 18:84) of the GPU array; the second call copies the part (2:374, 20:86) back to the CPU.
The second example calls the standard DevObject function devf_zeros, which sets the entire array passed as its argument to zero. Called with part, it sets only elements 35 to 99 of dv_B to zero, without touching any other entry. (This is equivalent to the Fortran statement B(35:99)=0.0.)
In the third example, the standard DevObject function copy3d (which copies one variable to another on the device, assuming that they have the same size) is called for parts of the objects; both parts have size 2x3x18.
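part changes only the sizing attributes of the devVar structure and copies nothing. The host-only sketch below models the resulting view for a 2D column-major variable with leading dimension ld (an assumption consistent with CUBLAS storage): the view starts at the offset of (i1,j1), spans (i2-i1+1) x (j2-j1+1) elements, and keeps the original leading dimension, so it touches exactly the elements of the Fortran section A(i1:i2,j1:j2). The names are hypothetical and are not DevObject internals.

```fortran
! Hypothetical host-only model of a part(...) view of a 2D column-major
! variable stored in a flat buffer with leading dimension ld.
module part_model
  implicit none
  private
  public :: zero_part
contains
  ! Zero the view (i1:i2, j1:j2) using only offset arithmetic on buf,
  ! the way a devf_zeros-like routine would see the part.
  subroutine zero_part(buf, ld, i1, i2, j1, j2)
    real, intent(inout) :: buf(0:)
    integer, intent(in) :: ld, i1, i2, j1, j2
    integer :: start, nrows, ncols, i, j
    start = (i1 - 1) + (j1 - 1)*ld   ! zero-based offset of element (i1,j1)
    nrows = i2 - i1 + 1
    ncols = j2 - j1 + 1
    do j = 0, ncols - 1
      do i = 0, nrows - 1
        buf(start + i + j*ld) = 0.0  ! leading dimension is unchanged
      end do
    end do
  end subroutine zero_part
end module part_model
```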
function get_i(dv_A,i,j,k)
! returns the value of element (i,j,k) of the integer device variable dv_A;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
integer i,j,k
optional j,k
Output:
integer(4) get_i
Description:
Equivalent to reading A(i,j,k) (or A(i), A(i,j)) on the right-hand side of a Fortran assignment. Uses the CUBLAS function CUBLAS_GET_VECTOR.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('integer',900,100)
call devf_zeros(dv_A)
write(*,*) get_i(dv_A,12,19)
2)
dv_B=allocate_dv('integer',1889)
…
do i=3,37
  write(*,*) i,get_i(dv_B,i)
enddo
…
3)
integer C1(3)
…
dv_C=allocate_dv('integer',12,56,77)
…
C1(1)=get_i(dv_C,2,5,7)
C1(2)=get_i(dv_C,3,3,3)
C1(3)=get_i(dv_C,11,15,70)
…
Comments:
In the first example, the value of element (12,19) of the 2D array dv_A allocated on the device is printed on the screen. The second example prints several consecutive elements of a 1D array. In the third example, three elements of the 3D array dv_C are assigned to the array C1 allocated on the host (CPU).
Although the function looks universal, it is used mainly for debugging, to access arbitrary elements of an array allocated on the device. It can also be used when just a few values need to be transferred from GPU to CPU. In production (release) code, bulk data should be transferred with the library subroutines transfer_i4 or transfer_i, since each call of get_i is as expensive as a call of transfer_i4.
function get_s(dv_A,i,j,k)
! returns the value of element (i,j,k) of the single precision device variable dv_A;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
integer i,j,k
optional j,k
Output:
real(4) get_s
Description:
See get_i; the same operation for single precision real (real(4)) device variables.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('real',900,100)
call devf_zeros(dv_A)
write(*,*) get_s(dv_A,12,19)
2)
dv_B=allocate_dv('real',1889)
…
do i=3,37
  write(*,*) i,get_s(dv_B,i)
enddo
…
3)
real C1(3)
…
dv_C=allocate_dv('real',12,56,77)
…
C1(1)=get_s(dv_C,2,5,7)
C1(2)=get_s(dv_C,3,3,3)
C1(3)=get_s(dv_C,11,15,70)
…
Comments:
Similar to get_i.
function get_c(dv_A,i,j,k)
! returns the value of element (i,j,k) of the single precision complex device variable dv_A;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
integer i,j,k
optional j,k
Output:
complex(4) get_c
Description:
See get_i; the same operation for single precision complex (complex(4)) device variables.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('complex',900,100)
call devf_zeros(dv_A)
write(*,*) get_c(dv_A,12,19)
2)
dv_B=allocate_dv('complex',1889)
…
do i=3,37
  write(*,*) i,get_c(dv_B,i)
enddo
…
3)
complex C1(3)
…
dv_C=allocate_dv('complex',12,56,77)
…
C1(1)=get_c(dv_C,2,5,7)
C1(2)=get_c(dv_C,3,3,3)
C1(3)=get_c(dv_C,11,15,70)
…
Comments:
Similar to get_i.
subroutine set_i(dv_A,val,i,j,k)
! sets the value of element (i,j,k) of the integer device variable dv_A to val;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
integer val
integer i,j,k
optional j,k
Output:
none
Description:
Equivalent to the Fortran assignment A(i,j,k)=val (or A(i)=val, A(i,j)=val). Uses the CUBLAS function CUBLAS_SET_VECTOR.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('integer',900,100)
call set_i(dv_A,35467,12,19)
2)
dv_B=allocate_dv('integer',1889)
…
do i=3,37
  call set_i(dv_B,i*i,i)
enddo
…
3)
integer C1(3)
…
dv_C=allocate_dv('integer',12,56,77)
…
call set_i(dv_C,C1(1),2,5,7)
call set_i(dv_C,C1(2),3,3,3)
call set_i(dv_C,C1(3),11,15,70)
…
Comments:
In the first example, element (12,19) of the 2D array dv_A allocated on the device is set to 35467. In the second example, the elements of the 1D device array dv_B with index i take the values i*i for i=3,…,37. In the third example, the elements (2,5,7), (3,3,3), and (11,15,70) of the 3D array dv_C take the values C1(1), C1(2), and C1(3), respectively, where C1 is an array allocated on the host.
Although the function looks universal, it is used mainly for debugging, to access arbitrary elements of an array allocated on the device. It can also be used when just a few values need to be transferred from CPU to GPU. In production (release) code, bulk data should be transferred with the library subroutines transfer_i4 or transfer_i, since each call of set_i is as expensive as a call of transfer_i4.
subroutine set_s(dv_A,val,i,j,k)
! sets the value of element (i,j,k) of the single precision device variable dv_A to val;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
real val
integer i,j,k
optional j,k
Output:
none
Description:
See set_i.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('real',900,100)
call set_s(dv_A,3.54e-5,12,19)
2)
dv_B=allocate_dv('real',1889)
…
do i=3,37
  call set_s(dv_B,0.5+i*i,i)
enddo
…
3)
real C1(3)
…
dv_C=allocate_dv('real',12,56,77)
…
call set_s(dv_C,C1(1),2,5,7)
call set_s(dv_C,C1(2),3,3,3)
call set_s(dv_C,C1(3),11,15,70)
…
Comments:
See set_i.
subroutine set_c(dv_A,val,i,j,k)
! sets the value of element (i,j,k) of the single precision complex device variable dv_A to val;
! j and k are optional and needed for objects of dimensionality 2 and 3.
Input:
type(devVar) dv_A
complex val
integer i,j,k
optional j,k
Output:
none
Description:
See set_i.
Error handling:
None.
Examples of use:
1)
type(devVar) dv_A
dv_A=allocate_dv('complex',900,100)
call set_c(dv_A,(3.54e-5,-0.099),12,19)
2)
dv_B=allocate_dv('complex',1889)
…
do i=3,37
  call set_c(dv_B,cmplx(0.5+i*i,0.1-i),i)
enddo
…
3)
complex C1(3)
…
dv_C=allocate_dv('complex',12,56,77)
…
call set_c(dv_C,C1(1),2,5,7)
call set_c(dv_C,C1(2),3,3,3)
call set_c(dv_C,C1(3),11,15,70)
…
Comments:
See set_i.
Change of the device variable dimensionality attributes
function treatas(dv_A,nx,ny,nz)
! changes the dimensionality attributes of device variable dv_A so that it can be used with subroutines of a different dimensionality;
! parameters ny and nz are optional and needed for 2D and 3D objects.
Input:
type(devVar) dv_A
integer nx,ny,nz
optional ny,nz
Output:
type(devVar) treatas
Description:
Does not involve any CUBLAS function and operates only on the devVar structure, so it is applicable to any type (integer, real, complex). Designed to be used with other functions for which the dimensionality attributes must be changed (e.g. to transfer data, to copy, or to apply some CUBLAS functions). treatas does not copy anything: the data stay at the same place on the device. However, one can treat, say, a 2D array allocated by a standard routine as a 1D array, with the corresponding (column-wise) element order. treatas can also be used to change the leading dimensions of arrays and to provide device variables whose length coincides with the actual length of the variables.
Error handling:
None.
Examples of use:
1)
integer ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3
dv_ArrayA=allocate_dv('integer',373,67)
call transfer_i(ArrayA,treatas(dv_ArrayA,373,67),.true.)
…
call transfer_i(ArrayA,treatas(dv_ArrayA,373,67),.false.)
2)
real ArrayA(373,67)
type(devVar) dv_ArrayA
ArrayA=3.0
dv_ArrayA=allocate_dv('real',373,67)
call transfer_r(ArrayA,treatas(dv_ArrayA,373,67),.true.)
…
call transfer_r(ArrayA,treatas(dv_ArrayA,373,67),.false.)
Comments:
In both examples, treatas passes to transfer_i and transfer_r a device variable whose dimensions (373x67) coincide with the actual size of the host array ArrayA, as these subroutines require the CPU variable to have the size dv_A%dsize.
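A host-side analogue of treatas is Fortran sequence association: an array passed to an explicit-shape dummy argument of a different rank is seen through the dummy's shape, in column-major element order. The hypothetical subroutine below treats a 2D array as a 1D array of length n, much as treatas can present a 2D device variable as a 1D one; it is an illustration only and does not call DevObject.

```fortran
! Host-side analogue of treatas (hypothetical illustration, no DevObject calls):
! a 2D actual argument is viewed through a 1D explicit-shape dummy.
module treatas_analogue
  implicit none
  private
  public :: fill_as_vector
contains
  ! a is declared as a vector of length n; when the actual argument is a
  ! 2D array with n elements, a(k) runs through it in column-major order.
  subroutine fill_as_vector(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: k
    do k = 1, n
      a(k) = real(k)
    end do
  end subroutine fill_as_vector
end module treatas_analogue
```

Called as call fill_as_vector(m, 6) for real m(2,3), it sets m(2,1)=2.0, m(1,2)=3.0, and m(2,3)=6.0.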
