Custom ethernet driver problem
I don't know if this question belongs here or on Super User, but I'll ask anyway.
I have the following setup: a Linux desktop PC with a custom FPGA development board connected to it. An Ethernet network card IP core is implemented and running in the FPGA. The PC is connected to the FPGA development board with a USB cable and a serial cable. Essentially, the whole setup is used to test the FPGA-based network card and the associated Ethernet drivers.
Many applications run on the Linux host PC and send data to the FPGA-based Ethernet network card. The card accepts the data, does the necessary processing and passes it to the physical layer implemented on the FPGA, which then sends it out over the Ethernet network to some other node or device.
This setup works fine, even when multiple applications on the host PC send data to the FPGA network card. As one of the applications, I use the Linux VLC player (a multimedia player) to play multimedia streams from the Linux host, and that data is sent to the FPGA network card. In VLC, I seek backward and forward in the stream using the player controls. When I do this seek operation continuously, the Linux host PC hangs. No I/O device responds, and only a reboot recovers it.
I checked the logs on the Linux host PC (/var/log and dmesg) for a clue about which process or application caused the freeze, but found nothing useful.
How do I go about isolating the different components (software, hardware) involved in this setup to narrow down the root cause?
Is there any way to communicate with the frozen Linux host (over a serial cable or some other connection) to get data out of it when it hangs?
What steps should I follow? How can I tell whether the problem is the VLC application, the FPGA network card driver, or something else?
Any pointers will be useful.
Thanks.
-AD.
You mention that the Linux host is frozen. I would first determine if it is actually locked up in the kernel or if there is some user space process(es) consuming too much CPU.
Can the host be pinged (preferably on an interface separate from your FPGA Ethernet card)? If it replies, the kernel is not locked up.
Hardware Problem?
If possible, can the setup be temporarily changed to remove the FPGA Ethernet card and then reproduce the problem? I would do this to help isolate issues specifically related to the hardware (FPGA Ethernet).
User Space (Software) Problem?
If you remove VLC from the equation, can you still get the lockup/hang to happen by using another method to generate Ethernet traffic?
You might try creating a shell that runs at a higher priority in order to retrieve data when the system seems to hang. Perhaps by running top in this high priority shell you can determine who, if anyone, is using all the CPU. You can run this shell over the network (telnet/ssh) or via a serial terminal.
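#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct sched_param scheduling_parameters;

    /* Real-time priority for SCHED_FIFO (valid range on Linux is 1 to 99). */
    scheduling_parameters.sched_priority = 10;

    /* Needs root (or CAP_SYS_NICE). On failure the shell still starts,
       but at normal priority. */
    if (sched_setscheduler(getpid(), SCHED_FIFO, &scheduling_parameters) < 0) {
        printf("error is %d\n", errno);
        perror("sched_setscheduler");
    }

    /* The execlp() argument list must end with a null pointer of type char *. */
    execlp("/bin/bash", "bash", (char *)NULL);

    /* Reached only if execlp() failed. */
    perror("execlp");
    return 1;
}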
Kernel (Software) Problem?
You can enable the magic sysrq key and examine the system state and go from there. Kernel developers use this interface to debug their software. The CONFIG_MAGIC_SYSRQ option has to be enabled at kernel compile time in order to use this functionality.
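As a rough sketch of turning SysRq on from a small program before you reproduce the hang (it does the same thing as writing 1 to /proc/sys/kernel/sysrq as root): the two /proc paths are the standard kernel interfaces, but the function names write_text, enable_sysrq and trigger_sysrq are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to path.
 * Returns 0 on success, -1 on any failure. */
int write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputs(text, f) >= 0);
    if (fclose(f) != 0)     /* fclose() flushes, so write errors show up here too */
        ok = 0;

    return ok ? 0 : -1;
}

/* Enables all SysRq functions. Needs root and a kernel built
 * with CONFIG_MAGIC_SYSRQ. */
int enable_sysrq(void)
{
    return write_text("/proc/sys/kernel/sysrq", "1");
}

/* Asks the kernel to run one SysRq command, for example
 * 't' (dump all tasks) or 'w' (dump blocked tasks).
 * The output goes to the kernel log / console. */
int trigger_sysrq(char command)
{
    char text[2] = { command, '\0' };
    return write_text("/proc/sysrq-trigger", text);
}
```

Calling enable_sysrq() once before the test run should be enough. If the machine then freezes hard, a user program like this can no longer run, so you would use the keyboard combination (Alt+SysRq+t, for instance) or, on a serial console, a BREAK followed by the command key, and read the dump on the console.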
After empirically narrowing a bug down to a specific module, printk() is still a good resource.
It may also be helpful to enable the kernel debugger (KDB) and connect to it via a serial cable.
@Jscheimer: thanks for the detailed pointers on my problem. After a lot of debugging and some discussions with other system developers at my workplace, I finally found the root cause. There is a DMA peripheral involved in this setup. The DMA was configured for aligned access, but in some data transfers it was receiving an unaligned address, because I was not checking the buffer alignment in my code. That is what caused the hang/freeze, and there was no pattern to when it happened.
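For anyone hitting something similar, the kind of check that was missing can be sketched as below. This is only an illustration, not the actual driver code: the name is_dma_aligned is hypothetical, and the alignment your DMA engine requires has to come from its documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when addr is a multiple of alignment.
 * alignment must be a nonzero power of two (for example 4, 8 or 64);
 * any other value is rejected by returning false. */
bool is_dma_aligned(const void *addr, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return false;   /* not a power of two: refuse rather than guess */

    /* For a power-of-two alignment, the low bits of an aligned
     * address are all zero. */
    return ((uintptr_t)addr & (alignment - 1)) == 0;
}
```

The idea is to call such a check on every buffer address before handing it to the DMA, and to reject or copy (bounce) the buffer when it fails, so that an unaligned address shows up as a logged error instead of a frozen machine.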