Question

The program below computes the product of two matrices. Add a new kernel, called sum, that computes the sum of the two matrices.

#include <stdio.h>
#include <stdlib.h>   // srand, rand
#include <math.h>
#include <time.h>     // time

#define TILE_WIDTH 2
#define WIDTH 6

// Kernel function executed by the device (GPU)
__global__ void
product (float *d_a, float *d_b, float *d_c, const int n) {
   int col = blockIdx.x * blockDim.x + threadIdx.x ;
   int row = blockIdx.y * blockDim.y + threadIdx.y ;

   float sum = 0;
   if (row < n && col < n) {
      for (int i = 0 ; i<n ; ++i) {
         sum += d_a[row * n + i ] * d_b[i * n + col] ;
      }
      d_c[row * n + col] = sum;
   }
}


// Utility function to print the input matrix
void printMatrix (float m[][WIDTH]) {
   int i, j;
   for (i = 0; i<WIDTH; ++i) {
      for (j = 0; j< WIDTH; ++j) {
         printf ("%d\t", (int)m[i][j]);
      }
      printf ("\n");
   }
}

// Main function executed by the host (CPU)
int main () {
   // host matrices
   float host_a[WIDTH][WIDTH],
         host_b[WIDTH][WIDTH],
         host_c[WIDTH][WIDTH];

   // device arrays
   float *device_a, *device_b, *device_c;

   int i, j;

   // initialize host matrices using random numbers
   time_t t;
   srand ((unsigned) time(&t));

   for (i = 0; i<WIDTH; ++i) {
      for (j = 0; j<WIDTH; j++) {
         host_a[i][j] = (float) (rand() % 50);
         host_b[i][j] = (float) (rand() % 50);
      }
   }

   printf ("Matrix A:\n");
   printMatrix (host_a);
   printf ("\n");

   printf ("Matrix B:\n");
   printMatrix (host_b);
   printf ("\n");

   // allocate device memory for input matrices
   size_t deviceSize = WIDTH * WIDTH * sizeof (float);
   cudaMalloc ((void **) &device_a, deviceSize);
   cudaMalloc ((void **) &device_b, deviceSize);

   // copy host matrices to device
   cudaMemcpy (device_a, host_a, deviceSize, cudaMemcpyHostToDevice );
   cudaMemcpy (device_b, host_b, deviceSize, cudaMemcpyHostToDevice );

   // allocate device memory to store computed result
   cudaMalloc((void **) &device_c, deviceSize) ;

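   // one thread per element: TILE_WIDTH x TILE_WIDTH threads per block,
   // and enough blocks to cover the WIDTH x WIDTH matrix
   dim3 dimBlock (TILE_WIDTH, TILE_WIDTH);
   dim3 dimGrid (WIDTH/TILE_WIDTH, WIDTH/TILE_WIDTH);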
   product<<<dimGrid, dimBlock>>> (device_a, device_b, device_c, WIDTH);

   // copy result from device back to host
   cudaMemcpy (host_c, device_c, deviceSize, cudaMemcpyDeviceToHost);

   // output the computed result matrix
   printf ("A x B: \n");
   printMatrix (host_c);

   cudaFree (device_a);
   cudaFree (device_b);
   cudaFree (device_c);
   return 0;
}

Homework Answers

Answer #1

Sum function code:

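// Kernel function executed by the device (GPU).
// Each thread adds one element; the matrices are flat arrays in row-major order.
__global__ void
sum (float *d_a, float *d_b, float *d_d, const int n) {
   int col = blockIdx.x * blockDim.x + threadIdx.x ;
   int row = blockIdx.y * blockDim.y + threadIdx.y ;

   if (row < n && col < n) {
      d_d[row * n + col] = d_a[row * n + col] + d_b[row * n + col];
   }
}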

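In the main function, add these lines after the product launch and before the cudaFree calls. dimGrid and dimBlock are reused from the product launch, so they are not declared again:

   // host and device storage for the sum
   float host_d[WIDTH][WIDTH];
   float *device_d;
   cudaMalloc ((void **) &device_d, deviceSize);

   sum<<<dimGrid, dimBlock>>> (device_a, device_b, device_d, WIDTH);

   // copy result from device back to host
   cudaMemcpy (host_d, device_d, deviceSize, cudaMemcpyDeviceToHost);

   // output the computed result matrix
   printf ("A + B: \n");
   printMatrix (host_d);

   cudaFree (device_d);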
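
For anyone who wants to try the sum kernel on its own, here is a minimal standalone sketch. It is not the original poster's program: the CHECK macro is a hypothetical helper added for error reporting, the inputs are fixed values instead of random numbers, and the block size of TILE_WIDTH x TILE_WIDTH is an assumption. The host recomputes the sum on the CPU and counts how many elements differ.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define TILE_WIDTH 2
#define WIDTH 6

// Hypothetical helper (not in the original post): stop on any CUDA error.
#define CHECK(call)                                                   \
   do {                                                               \
      cudaError_t err = (call);                                       \
      if (err != cudaSuccess) {                                       \
         fprintf (stderr, "CUDA error: %s (line %d)\n",               \
                  cudaGetErrorString (err), __LINE__);                \
         exit (EXIT_FAILURE);                                         \
      }                                                               \
   } while (0)

// Each thread adds one element of the two n x n row-major matrices.
__global__ void
sum (const float *d_a, const float *d_b, float *d_d, const int n) {
   int col = blockIdx.x * blockDim.x + threadIdx.x;
   int row = blockIdx.y * blockDim.y + threadIdx.y;

   if (row < n && col < n) {
      d_d[row * n + col] = d_a[row * n + col] + d_b[row * n + col];
   }
}

int main () {
   float host_a[WIDTH][WIDTH], host_b[WIDTH][WIDTH], host_d[WIDTH][WIDTH];
   float *device_a, *device_b, *device_d;
   size_t deviceSize = WIDTH * WIDTH * sizeof (float);
   int i, j, mismatches = 0;

   // Fixed inputs (an assumption made here) so the result is easy to check.
   for (i = 0; i < WIDTH; ++i) {
      for (j = 0; j < WIDTH; ++j) {
         host_a[i][j] = (float) (i * WIDTH + j);
         host_b[i][j] = (float) (2 * (i * WIDTH + j));
      }
   }

   CHECK (cudaMalloc ((void **) &device_a, deviceSize));
   CHECK (cudaMalloc ((void **) &device_b, deviceSize));
   CHECK (cudaMalloc ((void **) &device_d, deviceSize));
   CHECK (cudaMemcpy (device_a, host_a, deviceSize, cudaMemcpyHostToDevice));
   CHECK (cudaMemcpy (device_b, host_b, deviceSize, cudaMemcpyHostToDevice));

   // Round the grid size up so the matrix is covered even when WIDTH
   // is not a multiple of TILE_WIDTH.
   dim3 dimBlock (TILE_WIDTH, TILE_WIDTH);
   dim3 dimGrid ((WIDTH + TILE_WIDTH - 1) / TILE_WIDTH,
                 (WIDTH + TILE_WIDTH - 1) / TILE_WIDTH);
   sum<<<dimGrid, dimBlock>>> (device_a, device_b, device_d, WIDTH);
   CHECK (cudaGetLastError ());   // reports launch-configuration errors

   CHECK (cudaMemcpy (host_d, device_d, deviceSize, cudaMemcpyDeviceToHost));

   // Compare against a CPU reference; the values are small integers,
   // so exact float comparison is safe here.
   for (i = 0; i < WIDTH; ++i) {
      for (j = 0; j < WIDTH; ++j) {
         if (host_d[i][j] != host_a[i][j] + host_b[i][j]) {
            ++mismatches;
         }
      }
   }
   printf ("mismatches: %d\n", mismatches);

   CHECK (cudaFree (device_a));
   CHECK (cudaFree (device_b));
   CHECK (cudaFree (device_d));
   return mismatches == 0 ? 0 : 1;
}
```

Assuming the file is saved as sum.cu (a name chosen here), it should build with nvcc sum.cu -o sum. The bounds check inside the kernel is what makes the rounded-up grid safe: threads that fall outside the matrix simply do nothing.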
