Question

The program below computes the product of two matrices on the GPU. Add a new kernel, called sum, that computes the sum of the two matrices.

#include <stdio.h>
#include <stdlib.h>   // rand, srand
#include <time.h>     // time
#include <math.h>
#include <sys/time.h>

#define TILE_WIDTH 2
#define WIDTH 6

// Kernel function executed by the device (GPU)
__global__ void
product (float *d_a, float *d_b, float *d_c, const int n) {
   int col = blockIdx.x * blockDim.x + threadIdx.x ;
   int row = blockIdx.y * blockDim.y + threadIdx.y ;

   float sum = 0;
   if (row < n && col < n) {
      for (int i = 0 ; i<n ; ++i) {
         sum += d_a[row * n + i ] * d_b[i * n + col] ;
      }
      d_c[row * n + col] = sum;
   }
}


// Utility function to print the input matrix
void printMatrix (float m[][WIDTH]) {
   int i, j;
   for (i = 0; i<WIDTH; ++i) {
      for (j = 0; j< WIDTH; ++j) {
         printf ("%d\t", (int)m[i][j]);
      }
      printf ("\n");
   }
}

// Main function executed by the host (CPU)
int main () {
   // host matrices
   float host_a[WIDTH][WIDTH],
         host_b[WIDTH][WIDTH],
         host_c[WIDTH][WIDTH];

   // device arrays
   float *device_a, *device_b, *device_c;

   int i, j;

   // initialize host matrices using random numbers
   time_t t;
   srand ((unsigned) time(&t));

   for (i = 0; i<WIDTH; ++i) {
      for (j = 0; j<WIDTH; j++) {
         host_a[i][j] = (float) (rand() % 50);
         host_b[i][j] = (float) (rand() % 50);
      }
   }

   printf ("Matrix A:\n");
   printMatrix (host_a);
   printf ("\n");

   printf ("Matrix B:\n");
   printMatrix (host_b);
   printf ("\n");

   // allocate device memory for input matrices
   size_t deviceSize = WIDTH * WIDTH * sizeof (float);
   cudaMalloc ((void **) &device_a, deviceSize);
   cudaMalloc ((void **) &device_b, deviceSize);

   // copy host matrices to device
   cudaMemcpy (device_a, host_a, deviceSize, cudaMemcpyHostToDevice );
   cudaMemcpy (device_b, host_b, deviceSize, cudaMemcpyHostToDevice );

   // allocate device memory to store computed result
   cudaMalloc((void **) &device_c, deviceSize) ;

   // one thread per element: TILE_WIDTH x TILE_WIDTH threads per block
   dim3 dimBlock (TILE_WIDTH, TILE_WIDTH);
   dim3 dimGrid (WIDTH/TILE_WIDTH, WIDTH/TILE_WIDTH);
   product<<<dimGrid, dimBlock>>> (device_a, device_b, device_c, WIDTH);

   // copy result from device back to host
   cudaMemcpy (host_c, device_c, deviceSize, cudaMemcpyDeviceToHost);

   // output the computed result matrix
   printf ("A x B: \n");
   printMatrix (host_c);

   cudaFree (device_a);
   cudaFree (device_b);
   cudaFree (device_c);
   return 0;
}

Homework Answers

Answer #1

Sum function code:

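// Kernel function executed by the device (GPU)
// Each thread computes one element of the sum
__global__ void
sum (float *d_a, float *d_b, float *d_d, const int n) {
   int col = blockIdx.x * blockDim.x + threadIdx.x ;
   int row = blockIdx.y * blockDim.y + threadIdx.y ;

   // device arrays are flat, so index them as row * n + col
   if (row < n && col < n) {
      d_d[row * n + col] = d_a[row * n + col] + d_b[row * n + col];
   }
}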
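
The `row < n && col < n` guard used in the product kernel becomes essential once the matrix width is not a multiple of the tile width: the grid then has to be rounded up, and some threads fall outside the matrix. Below is a minimal host-side sketch of that rounding, written in plain C so that it also compiles as host code in a .cu file. The helper name blocks_per_dim is hypothetical and not part of the original program.

```c
/* Hypothetical helper (not in the original program): number of blocks
   needed along one dimension so that blocks * tile >= n.
   This is integer ceiling division and assumes n > 0 and tile > 0. */
int blocks_per_dim (int n, int tile) {
   return (n + tile - 1) / tile;
}
```

With WIDTH 6 and TILE_WIDTH 2 this gives 3, the same as WIDTH/TILE_WIDTH. With a width of 7 it gives 4, where plain integer division would give 3 and, with TILE_WIDTH x TILE_WIDTH blocks, leave the last row and column uncomputed. In the program it could be used as dim3 dimGrid (blocks_per_dim (WIDTH, TILE_WIDTH), blocks_per_dim (WIDTH, TILE_WIDTH)) together with dim3 dimBlock (TILE_WIDTH, TILE_WIDTH).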

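In the main function, add these lines after the product result has been printed and before the cudaFree calls:

   // host and device storage for the sum
   float host_d[WIDTH][WIDTH];
   float *device_d;
   cudaMalloc ((void **) &device_d, deviceSize);

   // reuse dimGrid and dimBlock from the product launch
   sum<<<dimGrid, dimBlock>>> (device_a, device_b, device_d, WIDTH);

   // copy result from device back to host
   cudaMemcpy (host_d, device_d, deviceSize, cudaMemcpyDeviceToHost);

   // output the computed result matrix
   printf ("A + B: \n");
   printMatrix (host_d);

   cudaFree (device_d);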
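
One way to check the printed result is to recompute the sum on the host and compare it with what came back from the device. The sketch below is plain C that also compiles as host code in a .cu file. The names sum_host and matrices_match are hypothetical helpers, not part of the original program, and the matrices are assumed to be flat row-major arrays of n * n floats.

```c
/* Hypothetical helper: element-wise sum on the CPU for row-major n x n data. */
void sum_host (const float *a, const float *b, float *out, int n) {
   for (int i = 0; i < n * n; ++i) {
      out[i] = a[i] + b[i];
   }
}

/* Hypothetical helper: returns 1 if every pair of elements differs by at
   most tol, and 0 as soon as one pair differs by more. */
int matrices_match (const float *x, const float *y, int n, float tol) {
   for (int i = 0; i < n * n; ++i) {
      float diff = x[i] - y[i];
      if (diff < 0) {
         diff = -diff;
      }
      if (diff > tol) {
         return 0;
      }
   }
   return 1;
}
```

In main this could be called as sum_host (&host_a[0][0], &host_b[0][0], &expected[0][0], WIDTH) followed by matrices_match (&host_d[0][0], &expected[0][0], WIDTH, 0.0f), where expected is an extra WIDTH x WIDTH host array. Because the inputs are small integers stored as floats, their sums are exactly representable, so a tolerance of 0 should be enough for the sum.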
