socket

Sockets are essentially standard Unix file descriptors.
You can use plain read() and write() on them, but send() and recv() offer better control. A file descriptor is simply an integer associated with an open file. But (and here's the catch), that file can be a network connection, a FIFO, a pipe, a terminal, a real on-the-disk file, or just about anything else. Everything in Unix is a file!

Two Types of Internet Sockets

Stream sockets use TCP (the Transmission Control Protocol). IP stands for Internet Protocol.
Datagram sockets use UDP (the User Datagram Protocol). Users of datagram sockets include tftp (trivial file transfer protocol, a little brother to FTP), dhcpcd (a DHCP client), multiplayer games, streaming audio, video conferencing, etc.

IP Addresses

struct sockaddr_in sa; // IPv4
struct sockaddr_in6 sa6; // IPv6
inet_pton(AF_INET, "10.12.110.57", &(sa.sin_addr)); // IPv4
inet_pton(AF_INET6, "2001:db8:63b3:1::3490", &(sa6.sin6_addr)); // IPv6

This function converts a printable IP address string into the binary address stored in a sockaddr_in (or sockaddr_in6) structure.
"pton" stands for "presentation to network"; you can call it "printable to network" if that's easier to remember.
The older functions that did this were inet_addr() and inet_aton(), but they don't work with IPv6.

inet_ntop() goes the other direction; "ntop" means "network to presentation".

char ip4[INET_ADDRSTRLEN]; // space to hold the IPv4 string
struct sockaddr_in sa; // pretend this is loaded with something
inet_ntop(AF_INET, &(sa.sin_addr), ip4, INET_ADDRSTRLEN);
printf("The IPv4 address is: %s\n", ip4);

getaddrinfo() can be used to look up IP addresses.

socket()

#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);

The parameters can be hard-coded (domain is PF_INET or PF_INET6, type is SOCK_STREAM or SOCK_DGRAM, and protocol can be set to 0 to choose the proper protocol for the given type).
Alternatively, you can call getprotobyname() with "tcp" or "udp" to look up the protocol yourself.

int s;
struct addrinfo hints, *res;
// do the lookup
// [pretend we already filled out the "hints" struct]
getaddrinfo("www.example.com", "http", &hints, &res);
// [again, you should do error-checking on getaddrinfo(), and walk
// the "res" linked list looking for valid entries instead of just
// assuming the first one is good (like many of these examples do.)
// See the section on client/server for real examples.]
s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

On success, the socket() call returns a socket descriptor; on failure it returns -1,
and the global variable errno is set.

Once you have a socket, you may want to associate it with a port on your local machine with bind().
A server then needs listen(); a client uses connect().

#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, int addrlen);

sockfd is a socket descriptor, my_addr points to a struct sockaddr containing your address information, and addrlen is the length of that address in bytes.

struct addrinfo hints, *res;
int sockfd;
// first, load up address structs with getaddrinfo():
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC; // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = AI_PASSIVE; // fill in my IP for me
getaddrinfo(NULL, "3490", &hints, &res);
// make a socket:
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
// bind it to the port we passed in to getaddrinfo():
bind(sockfd, res->ai_addr, res->ai_addrlen);

If the port is already taken (the pesky "Address already in use"), you can reuse it:

// lose the pesky "Address already in use" error message
if (setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) == -1) {
    perror("setsockopt");
    exit(1);
}

A client doesn't really need to bother with bind(): when you call connect(), it checks whether the socket is unbound and, if necessary, binds it to an unused local port itself.

connect()

#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);

sockfd is our friendly neighborhood socket file descriptor, as returned by the socket() call, serv_addr is a struct sockaddr containing the destination port and IP address, and addrlen is the length in bytes of the server address structure.

Establishing a connection on a socket:

struct addrinfo hints, *res;
int sockfd;
// first, load up address structs with getaddrinfo():
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
getaddrinfo("www.example.com", "3490", &hints, &res);
// make a socket:
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

// connect!
connect(sockfd, res->ai_addr, res->ai_addrlen);

The old way of doing this was to fill out a struct sockaddr_in by hand.
Make sure the value returned by connect() isn't -1; if it is, check errno. And don't worry about the local port number: there is no need to call bind(), the kernel takes care of it.

listen()

int listen(int sockfd, int backlog);

sockfd is the socket file descriptor. backlog is the number of pending connections allowed on the incoming queue: incoming connections wait there until you accept() them. Most systems default to a limit of about 20; you can set it yourself.
As always, when listen() returns -1, check errno.
Obviously, you need to call bind() before calling listen().

So the sequence of calls is basically:

getaddrinfo();
socket();
bind();
listen();
/* accept() goes here */

accept()

Incoming connections wait in a queue until they are accept()ed. When you call accept(), you get back a brand-new socket file descriptor for the ready connection. The original socket keeps listening for new connections, and the brand-new socket is the one you use with send() and recv().

#include <sys/types.h>
#include <sys/socket.h>

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

Error handling is as usual. addr points to a local struct sockaddr_storage, and addrlen should be initialized to sizeof(struct sockaddr_storage).

Let's put all of the above together:

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MYPORT "3490"  // the port users will be connecting to
#define BACKLOG 10     // how many pending connections queue will hold

int main(void)
{
    struct sockaddr_storage their_addr;
    socklen_t addr_size;
    struct addrinfo hints, *res;
    int sockfd, new_fd;

    // !! don't forget your error checking for these calls !!

    // first, load up address structs with getaddrinfo():

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      // use IPv4 or IPv6, whichever
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      // fill in my IP for me

    getaddrinfo(NULL, MYPORT, &hints, &res);

    // make a socket, bind it, and listen on it:

    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    bind(sockfd, res->ai_addr, res->ai_addrlen);
    listen(sockfd, BACKLOG);

    // now accept an incoming connection:

    addr_size = sizeof their_addr;
    new_fd = accept(sockfd, (struct sockaddr *)&their_addr, &addr_size);

    // ready to communicate on socket descriptor new_fd!
    .
    .

new_fd is the descriptor you communicate on. If you don't want to accept any further connections, you can simply close() the original listening socket.

send() and recv()

Stream (TCP) sockets use these two calls.

int send(int sockfd, const void *msg, int len, int flags);

For example:

char *msg = "Beej was here!";
int len, bytes_sent;
.
.
.
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);
.
.

send() returns the number of bytes actually sent out, which might be less than you asked it to send. If you try to send a large chunk at once, it's up to you to take the remainder and send it again. If the packet is small (less than about 1K), it will probably all get sent in one go, so you usually don't have to worry about it.
As usual, -1 is returned on error.

recv() is similar in many respects.

int recv(int sockfd, void *buf, int len, int flags);

recv() returns the number of bytes actually read. On error it returns -1; if it returns 0, the remote side has closed the connection on you.

sendto() 和 recvfrom() – datagram style

A datagram socket doesn't have to be connected to a remote host.

int sendto(int sockfd, const void *msg, int len, unsigned int flags,
const struct sockaddr *to, socklen_t tolen);

Compared with send(), this call takes two extra parameters: to points to a struct sockaddr containing the destination address and port, and tolen is simply the length of that sockaddr.

The return value of sendto() means the same as send()'s.

You can obtain the destination address from getaddrinfo(), from a previous recvfrom(), or by filling *to out by hand.

int recvfrom(int sockfd, void *buf, int len, unsigned int flags,
struct sockaddr *from, int *fromlen);

Again, two extra fields compared with recv(); see sendto() above. The return value means the same as recv()'s.

struct sockaddr_storage frees us from being locked into struct sockaddr_in: the generic sockaddr_storage structure is large enough to hold either an IPv4 or an IPv6 address. (Why not just use struct sockaddr instead of declaring a struct sockaddr_storage and casting? It looks redundant, but struct sockaddr isn't big enough to hold both, so they made a new one.)

close() and shutdown()

close(sockfd);

This prevents any further reads and writes on the socket file descriptor; anyone attempting to read or write it will get an error.

int shutdown(int sockfd, int how);

This function gives you finer-grained control:

  • 0 Further receives are disallowed
  • 1 Further sends are disallowed
  • 2 Further sends and receives are disallowed (like close())

It returns 0 on success and -1 on error.

shutdown() doesn't actually close the socket file descriptor; it only changes its usability. To free the descriptor you still need close(). (On Windows, use closesocket() instead.)

getpeername()

The function getpeername() will tell you who is at the other end of a connected stream socket.

#include <sys/socket.h>
int getpeername(int sockfd, struct sockaddr *addr, int *addrlen);

sockfd is a connected socket descriptor, addr is a pointer to a struct sockaddr that will hold the peer's information, and addrlen should be initialized to sizeof *addr.
Once you have the address, inet_ntop(), getnameinfo(), or gethostbyaddr() will get you more information.
The function returns -1 on error and sets errno accordingly.

gethostname()

It returns the name of the computer that your program is running on

#include <unistd.h>
int gethostname(char *hostname, size_t size);

hostname is a pointer to an array of chars that will contain the hostname upon the function’s return, and size is the length in bytes of the hostname array. The function returns 0 on successful completion, and -1 on error, setting errno as usual.

Client-Server Background

There will only be one server on a machine, and that server will handle multiple clients using fork(). The basic routine is: the server waits for a connection, accept()s it, and fork()s a child process to handle it. This is what our sample server does in the next section.

Each connection is handled by its own process.

/*
** server.c -- a stream socket server demo
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/wait.h>
#include <signal.h>

#define PORT "3490"  // the port users will be connecting to
#define BACKLOG 10   // how many pending connections queue will hold

void sigchld_handler(int s) {
    // waitpid() might overwrite errno, so we save and restore it:
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0);
    errno = saved_errno;
}

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa) {
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in *)sa)->sin_addr);
    }
    return &(((struct sockaddr_in6 *)sa)->sin6_addr);
}

int main(void) {
    int sockfd, new_fd;  // listen on sock_fd, new connection on new_fd
    struct addrinfo hints, *servinfo, *p;
    struct sockaddr_storage their_addr;  // connector's address information
    socklen_t sin_size;
    struct sigaction sa;
    int yes = 1;
    char s[INET6_ADDRSTRLEN];
    int rv;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;  // use my IP

    if ((rv = getaddrinfo(NULL, PORT, &hints, &servinfo)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return 1;
    }

    // loop through all the results and bind to the first we can
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((sockfd = socket(p->ai_family, p->ai_socktype,
                p->ai_protocol)) == -1) {
            perror("server: socket");
            continue;
        }
        if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes,
                sizeof(int)) == -1) {
            perror("setsockopt");
            exit(1);
        }
        if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
            close(sockfd);
            perror("server: bind");
            continue;
        }
        break;
    }

    freeaddrinfo(servinfo);  // all done with this structure

    if (p == NULL) {
        fprintf(stderr, "server: failed to bind\n");
        exit(1);
    }
    if (listen(sockfd, BACKLOG) == -1) {
        perror("listen");
        exit(1);
    }

    sa.sa_handler = sigchld_handler;  // reap all dead processes
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1) {
        perror("sigaction");
        exit(1);
    }

    printf("server: waiting for connections...\n");

    while (1) {  // main accept() loop
        sin_size = sizeof their_addr;
        new_fd = accept(sockfd, (struct sockaddr *)&their_addr, &sin_size);
        if (new_fd == -1) {
            perror("accept");
            continue;
        }
        inet_ntop(their_addr.ss_family,
            get_in_addr((struct sockaddr *)&their_addr),
            s, sizeof s);
        printf("server: got connection from %s\n", s);
        if (!fork()) {  // this is the child process
            close(sockfd);  // child doesn't need the listener
            if (send(new_fd, "Hello, world!", 13, 0) == -1)
                perror("send");
            close(new_fd);
            exit(0);
        }
        close(new_fd);  // parent doesn't need this
    }
    return 0;
}

The sigaction() call installs a handler that reaps the zombie processes left behind as the fork()ed child processes exit.

A Simple Stream Client

/*
** client.c -- a stream socket client demo
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define PORT "3490"      // the port client will be connecting to
#define MAXDATASIZE 100  // max number of bytes we can get at once

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa) {
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in *)sa)->sin_addr);
    }
    return &(((struct sockaddr_in6 *)sa)->sin6_addr);
}

int main(int argc, char *argv[]) {
    int sockfd, numbytes;
    char buf[MAXDATASIZE];
    struct addrinfo hints, *servinfo, *p;
    int rv;
    char s[INET6_ADDRSTRLEN];

    if (argc != 2) {
        fprintf(stderr, "usage: client hostname\n");
        exit(1);
    }

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if ((rv = getaddrinfo(argv[1], PORT, &hints, &servinfo)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return 1;
    }

    // loop through all the results and connect to the first we can
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((sockfd = socket(p->ai_family, p->ai_socktype,
                p->ai_protocol)) == -1) {
            perror("client: socket");
            continue;
        }
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
            close(sockfd);
            perror("client: connect");
            continue;
        }
        break;
    }

    if (p == NULL) {
        fprintf(stderr, "client: failed to connect\n");
        return 2;
    }

    inet_ntop(p->ai_family, get_in_addr((struct sockaddr *)p->ai_addr),
        s, sizeof s);
    printf("client: connecting to %s\n", s);

    freeaddrinfo(servinfo);  // all done with this structure

    if ((numbytes = recv(sockfd, buf, MAXDATASIZE - 1, 0)) == -1) {
        perror("recv");
        exit(1);
    }
    buf[numbytes] = '\0';
    printf("client: received '%s'\n", buf);

    close(sockfd);
    return 0;
}

If the server isn't running, connect() fails with "Connection refused".

Datagram Sockets

/*
** listener.c -- a datagram sockets "server" demo
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define MYPORT "4950"  // the port users will be connecting to

#define MAXBUFLEN 100

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }

    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    int sockfd;
    struct addrinfo hints, *servinfo, *p;
    int rv;
    int numbytes;
    struct sockaddr_storage their_addr;
    char buf[MAXBUFLEN];
    socklen_t addr_len;
    char s[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;  // set to AF_INET to force IPv4
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE;  // use my IP

    if ((rv = getaddrinfo(NULL, MYPORT, &hints, &servinfo)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return 1;
    }

    // loop through all the results and bind to the first we can
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((sockfd = socket(p->ai_family, p->ai_socktype,
                p->ai_protocol)) == -1) {
            perror("listener: socket");
            continue;
        }

        if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
            close(sockfd);
            perror("listener: bind");
            continue;
        }

        break;
    }

    if (p == NULL) {
        fprintf(stderr, "listener: failed to bind socket\n");
        return 2;
    }

    freeaddrinfo(servinfo);

    printf("listener: waiting to recvfrom...\n");

    addr_len = sizeof their_addr;
    if ((numbytes = recvfrom(sockfd, buf, MAXBUFLEN-1, 0,
            (struct sockaddr *)&their_addr, &addr_len)) == -1) {
        perror("recvfrom");
        exit(1);
    }

    printf("listener: got packet from %s\n",
        inet_ntop(their_addr.ss_family,
            get_in_addr((struct sockaddr *)&their_addr),
            s, sizeof s));
    printf("listener: packet is %d bytes long\n", numbytes);
    buf[numbytes] = '\0';
    printf("listener: packet contains \"%s\"\n", buf);

    close(sockfd);

    return 0;
}
/*
** talker.c -- a datagram "client" demo
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define SERVERPORT "4950"  // the port users will be connecting to

int main(int argc, char *argv[])
{
    int sockfd;
    struct addrinfo hints, *servinfo, *p;
    int rv;
    int numbytes;

    if (argc != 3) {
        fprintf(stderr, "usage: talker hostname message\n");
        exit(1);
    }

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    if ((rv = getaddrinfo(argv[1], SERVERPORT, &hints, &servinfo)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return 1;
    }

    // loop through all the results and make a socket
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((sockfd = socket(p->ai_family, p->ai_socktype,
                p->ai_protocol)) == -1) {
            perror("talker: socket");
            continue;
        }

        break;
    }

    if (p == NULL) {
        fprintf(stderr, "talker: failed to create socket\n");
        return 2;
    }

    if ((numbytes = sendto(sockfd, argv[2], strlen(argv[2]), 0,
            p->ai_addr, p->ai_addrlen)) == -1) {
        perror("talker: sendto");
        exit(1);
    }

    freeaddrinfo(servinfo);

    printf("talker: sent %d bytes to %s\n", numbytes, argv[1]);
    close(sockfd);

    return 0;
}

Slightly Advanced Techniques

Blocking

recvfrom() blocks while waiting for data, and many other calls behave the same way: accept() blocks, recv() blocks. When you create a socket, the kernel makes it blocking by default. To make it non-blocking, use fcntl():

#include <unistd.h>
#include <fcntl.h>

sockfd = socket(PF_INET, SOCK_STREAM, 0);
fcntl(sockfd, F_SETFL, O_NONBLOCK);

If you call a receive function on a non-blocking socket and there is no data, it returns -1 and errno is set to EAGAIN or EWOULDBLOCK.

select(): Synchronous I/O Multiplexing

Busy-polling non-blocking sockets, calling accept() and recv() in a tight loop, wastes CPU time. select() lets you monitor several sockets at once. It's convenient and portable, but it's also the slowest of the options; consider a replacement such as libevent, which wraps this kind of socket system-call machinery.

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int numfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);

This function monitors sets of file descriptors: readfds, writefds, and exceptfds. If you want to read from standard input and some socket sockfd, add file descriptors 0 and sockfd to the readfds set. The numfds parameter should be set to the value of the highest file descriptor plus one. When select() returns, readfds contains the sockets that are ready to read; you can test for membership with the FD_ISSET() macro.

The helper macros:

FD_SET(int fd, fd_set *set); Add fd to the set.
FD_CLR(int fd, fd_set *set); Remove fd from the set.
FD_ISSET(int fd, fd_set *set); Return true if fd is in the set.
FD_ZERO(fd_set *set); Clear all entries from the set.

The struct timeval sets the timeout.

struct timeval {
    int tv_sec;   // seconds
    int tv_usec;  // microseconds
};

If you set the fields of the struct timeval to 0, select() times out immediately, effectively polling all the file descriptors in the sets. If you set the timeout parameter to
NULL, it will wait until the first file descriptor is ready. Finally, if you don't care about waiting on one of the sets, you can pass NULL for it in the call to select().

/*
** select.c -- a select() demo
*/
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STDIN 0  // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    select(STDIN+1, &readfds, NULL, NULL, &tv);

    if (FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}

TypeScript and Babel 7

Source

For the past year, setting up TypeScript has been too difficult for Babel users.

TypeScript compiles down to browser-compatible output while letting you use the latest language features, but it would be even better to get the benefits of its static checking without having to switch away from Babel.

Babel 7 now ships with this support.

How to use it

npm install --save-dev @babel/preset-typescript
npm install --save-dev @babel/preset-typescript @babel/preset-env @babel/plugin-proposal-class-properties @babel/plugin-proposal-object-rest-spread

Make sure your .babelrc is right:

{
  "presets": [
    "@babel/env",
    "@babel/preset-typescript"
  ],
  "plugins": [
    "@babel/proposal-class-properties",
    "@babel/proposal-object-rest-spread"
  ]
}

If you use @babel/cli, that's all there is to it:

babel ./src --out-dir lib --extensions ".ts,.tsx"

Don't forget your tsconfig.json.

What this means

Babel can erase the types and rewrite new features so the code runs in old browsers, but it does no static checking, so a build can succeed and still throw errors at runtime. The tsc tool is what gives you the benefit of catching type errors.

If you're already using TypeScript, this is admittedly not a big deal. If you're using Babel, it brings you TypeScript's benefits.

Caveats

(Skipped in these notes.)

Next?

Acknowledgements… improvements to the parser's speed…
Feedback is welcome.

JavaScript engine fundamentals: optimizing prototypes

Source.

Based on that AST, the interpreter can start to do its thing and produce bytecode.

To go faster, the bytecode is sent to an optimizing compiler, which generates highly-optimized machine code based on profiling data. If a speculative assumption turns out to be wrong at some point, the optimizing compiler deoptimizes and falls back to the interpreter.

The interpreter/compiler pipeline in a JavaScript engine:

interpreter->bytecode->optimizing compiler (with profiling data)->optimized code

Chrome

This is basically how the V8 engine works. V8's interpreter, called Ignition, produces and executes bytecode, and while executing it collects profiling data that can speed things up later. When a function becomes hot, its bytecode and profiling data are passed to TurboFan, which produces highly-optimized machine code.

Firefox

interpreter->bytecode
Baseline->somewhat optimized code
IonMonkey->optimized code

Firefox's SpiderMonkey engine (also behind SpiderNode) has two optimizing compilers. The Baseline compiler generates somewhat-optimized code and collects profiling data as it runs; IonMonkey then generates highly-optimized code. If a speculative optimization fails, execution falls back to the Baseline code.

Edge

interpreter->bytecode
SimpleJIT->somewhat optimized code
FullJIT->optimized code

Microsoft Edge uses Chakra (also behind node-chakracore), with a similar two-optimizing-compiler pipeline: after the interpreter, code goes to SimpleJIT and finally to FullJIT.

Safari

LLInt->bytecode
Baseline->somewhat optimized code
DFG (Data Flow Graph)->mostly optimized code
FTL (Faster Than Light)->optimized code

Apple's Safari uses JavaScriptCore, which has three different optimizing compilers: LLInt, the Low-Level Interpreter, feeds the Baseline compiler, then the DFG (Data Flow Graph) compiler, and finally the FTL (Faster Than Light) compiler.

Summary

Why do the engines differ in their optimizing compilers? It's a trade-off: generate runnable but less efficient bytecode quickly, or spend more time up front producing highly-efficient machine code. Some engines add complexity to control this trade-off at a finer granularity. But they all share the same structure: a parser and an interpreter/compiler pipeline.

The JavaScript object model

The spec

Object.getOwnPropertyDescriptor retrieves an object's property descriptors. Arrays are limited to 2³²−1 items in JavaScript.

Shapes

With that in mind, JavaScript engines can optimize object property access based on the object’s shape.

const object1 = { x: 1, y: 2 };
const object2 = { x: 3, y: 4 };
// `object1` and `object2` have the same shape.

To save memory, an object is represented as a Shape plus a JSObject. The Shape holds the property names and their offsets; the JSObject holds the property values. When many objects have the same shape, their JSObjects all point to a single Shape. The benefit is obvious: no matter how many objects there are, as long as their shape is the same, the shape only needs to be stored once.

What they're called in different engines:

  • Academic papers call them Hidden Classes (confusing w.r.t. JavaScript classes)
  • V8 calls them Maps (confusing w.r.t. JavaScript Maps)
  • Chakra calls them Types (confusing w.r.t. JavaScript’s dynamic types and typeof)
  • JavaScriptCore calls them Structures
  • SpiderMonkey calls them Shapes

Transition chains and trees

Every shape descends from the empty shape. When you keep adding properties to objects from the same parent shape in the same order, the shapes form a transition chain; when the modifications diverge, the shapes form a transition tree.
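
A minimal sketch of how the transitions arise (the variable names are mine):

// adding x then y walks the same transition chain for both objects
const a = {};  // empty shape
a.x = 1;       // shape {x}
a.y = 2;       // shape {x, y}: chain {} -> {x} -> {x, y}

// adding a different property after x forks the chain into a tree
const b = {};
b.x = 3;       // reuses shape {x}
b.z = 4;       // shape {x, z}: the tree branches at {x}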

This optimization shortens the transition chains and makes it more efficient to construct objects from literals.
Keep transition chains as short as possible. To speed up property search, JavaScript engines add a ShapeTable data structure: a dictionary mapping property keys to the shape that introduces each property.
If we're back to dictionary lookups anyway, why bother with shapes at all? Because shapes enable Inline Caches.

Inline Caches (ICs)

Shapes are what make Inline Caches possible. ICs remove the need for many expensive lookups, making repeated property access fast.
Essentially, an IC memorizes the shape and property offset from one lookup; if the next object has the same shape, the cache hits, and the engine just loads the value at the memorized offset and returns it.
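
A minimal sketch of what an IC caches (the function is my own example):

function getX(o) { return o.x; }

getX({x: 1, y: 2}); // first call: full lookup; the IC memorizes the shape and x's offset
getX({x: 3, y: 4}); // same shape: cache hit, value loaded straight from the offset
getX({y: 5, x: 6}); // different shape (different property order): cache miss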

Storing arrays efficiently

Array indices and the values stored at them are writable and enumerable by default, so the engine doesn't need to keep a separate property descriptor per element.
If you use Object.defineProperty on an array element and change its property attributes, the array falls into an inefficient mode.
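
For example (my own illustration of the point above):

const arr = ['a', 'b', 'c']; // stored efficiently: plain elements

// Redefining an element with non-default attributes pushes the
// whole array into the slow dictionary-like representation:
Object.defineProperty(arr, 1, {
  value: 'B',
  writable: false,
  enumerable: false,
});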

Takeaways

  • Always initialize your objects the same way, so they don't end up with different shapes.
  • Don't mess with the property attributes of array elements, so they can be stored and operated on efficiently.

Optimization tiers vs. execution efficiency

Based on cost considerations (memory and time), hot code is promoted to the optimizing compiler.

perf_notes

Source: the perf-notes in Angular Core Render3.

General

Each Array costs about 70 bytes, composed of the array data and an (array) VM object.

  • The explicitly defined Array is 32 bytes.
  • The (array) VM object is 38 bytes.

Each Object costs 24 bytes, plus 8 bytes per property.

Storing data directly and treating it as a list is more efficient than using small arrays.
Speed comparison results

Monomorphic vs Megamorphic code

Great reads:

1) Monomorphic property access is 100 times faster than megamorphic.
2) A monomorphic call is 4 times faster than a megamorphic call.

See benchmark here.

Takeaway: statically-shaped definitions beat flexible ones by a factor of several times.
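
A minimal sketch of the difference at a single call site (the names are mine):

function area(rect) { return rect.w * rect.h; }

// Monomorphic: the call site only ever sees one object shape.
area({w: 2, h: 3});
area({w: 4, h: 5});

// Megamorphic: the same call site sees many different shapes,
// which defeats the inline cache.
area({w: 2, h: 3, color: 'red'});
area({h: 3, w: 2});              // different property order = different shape
area({w: 2, h: 3, id: 1, z: 0});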

Exporting top level variables

Avoid exporting top-level variables where possible; code size matters too:

// Typescript
export let exported = 0;
let notExported = 0;

notExported = exported;

// Would be compiled to
exports.exported = 0;
var notExported = 0;

notExported = exports.exported;

Most minifiers don't rename properties (Closure Compiler being the exception),
so you can write this instead:

let exported = 0;

export function getExported() { return exported; }
export function setExported(v) { exported = v; }

Also, writing to a property of exports might change its hidden class, resulting in megamorphic access and slower code.

Iterating over Keys of an Object.

https://jsperf.com/object-keys-vs-for-in-with-closure/3 clearly shows that Object.keys is the fastest way to iterate over an object:

for (var i = 0, keys = Object.keys(obj); i < keys.length; i++) {
  const key = keys[i];
}

Recursive functions

Avoid recursive functions where possible, because they cannot be inlined.

Takeaway: recursion hurts performance.

Loops

Don't use forEach: it can cause megamorphic function calls (depending on the browser) and function allocations. It is a lot slower than regular for loops.

Takeaway: forEach is slower than for.

Redux vs MobX

Both work with React. What do you gain by choosing one over the other?

Single store vs. multiple stores

Your data lives in a store.
Redux keeps the state in one big store (multiple stores are possible but unidiomatic), while MobX uses multiple stores, so you can separate concerns store by store.
Redux data is kept normalized; MobX is fine with denormalized data.

Plain data vs. observable data

Redux stores data as plain structures, while MobX wraps data in observables.
This means that with Redux you have to track all updates yourself, whereas MobX reacts automatically to just the data that changed.
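
A minimal sketch of the contrast (the counter store is my own example; createStore and observable are the libraries' real entry points):

// Redux: plain state, replaced by pure reducers
const { createStore } = require('redux');
const counter = (state = 0, action) =>
  action.type === 'INCREMENT' ? state + 1 : state;
const store = createStore(counter);
store.dispatch({ type: 'INCREMENT' });

// MobX: observable state, mutated in place
const { observable, autorun } = require('mobx');
const state = observable({ count: 0 });
autorun(() => console.log(state.count)); // re-runs only when count changes
state.count++;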

Immutable vs. mutable (pure vs. impure)

Redux state is immutable: it's read-only and never overwritten directly; a reducer replaces the previous state with a new one. As a result, Redux is built on pure functions, which makes stepping back to a previous state straightforward.
MobX state can be overwritten: you simply update it with new values. MobX is, in this sense, impure, and going back to an earlier state is harder.

Learning curve

MobX is easier to learn and master, especially for JavaScript developers used to traditional OOP; its abstractions spare you from worrying about many things. Redux follows the functional programming paradigm, which is hard to fully master for JavaScript developers without FP experience, and you may also need to learn middleware such as Redux Thunk, raising the learning curve further.

Debugging

MobX's abstractions make debugging a bit harder; the available MobX debugging tools are just average, and results can sometimes be unpredictable.
Redux's developer tools, on the other hand, include time-travel debugging; pure functions plus fewer abstractions make debugging a better experience than in MobX, and the strict unidirectional flow makes Redux more dependable.

Scalability and maintainability

For the reasons above, Redux is the more dependable choice.

Community support

Redux has the stronger community.

How to use IndexedDB to build Progressive Web Apps

Notes on the key points after reading the original article.

You'll need a working knowledge of Service Workers, since this IndexedDB implementation runs inside a Service Worker.

Getting started

A single-page app typically loads data from a web service and injects it into the DOM via a controller; most front-end frameworks work this way.

You can cache static pages and asset files, and, especially, the data being browsed, so that when there is no network the user can still read it from the local database cache.

IndexedDB is the replacement for the deprecated Web SQL Database: a key-value NoSQL database with very large storage quotas (roughly 20-50% of free disk space) that supports data types such as number, string, JSON, and blob.

IndexedDB obeys the same-origin policy, which means different applications cannot access each other's databases. It is event-driven and transactional, with atomic operations and an asynchronous API. It is supported by all major browsers, and the rest will follow. It's a great fit for storing offline data.

IndexedDB 2.0 is worth a look, although not all browsers support it yet.

I'll cover a few small APIs here; see the documentation for the rest.

Overview

An IndexedDB database consists of object stores, like tables in MySQL or collections in MongoDB. Each store holds multiple objects, like the rows of a MySQL table.

An object store holds many records as key-value pairs, sorted in ascending key order. Object store names must be unique within a database.
When a database is created, its version is 1. A database cannot exist at multiple versions at once.

A database can be connected to by multiple clients, and read/write operations are transactional.

Key types: string, date, float, a binary blob, or an array.

Values can be anything JavaScript can express: boolean, number, string, date, object, array, regexp, undefined, and null.

The basic pattern for using IndexedDB:

  • Open a database.
  • Create an object store.
  • Perform database operations, such as adding or retrieving data, inside a transaction.
  • Wait for the operation to complete by listening to events.
  • Do something with the results.

It's available in both the window and the service worker global scope.
Check for support via window.indexedDB or self.indexedDB:

if (self.indexedDB) {
  console.log('IndexedDB is supported');
}

.open(dbName, versionInt) opens a database, given its name and version number.

If the database doesn't exist, it is created. If you pass a higher version number, the database is upgraded to the new version.
It's all event-driven: the open request fires success, error, and upgradeneeded events. Like so:

var request = self.indexedDB.open('EXAMPLE_DB', 1);
var db;
request.onsuccess = function(event) {
  console.log('[onsuccess]', request.result);
  db = event.target.result; // === request.result
};
request.onerror = function(event) {
  console.log('[onerror]', request.error);
};

In the callbacks, event.target === request, and request.result is the result.

Remember to handle errors in onerror.

In DevTools, you can inspect the database under Application > IndexedDB.

When opening a new database, or an existing one with a newer version, onupgradeneeded fires:

request.onupgradeneeded = function(event) {
  // create object store from db or event.target.result
};

createObjectStore() creates an object store:

db.createObjectStore(storeName, options);

storeName must also be unique. options has two properties; options.keyPath names the field to use as the key (similar to a primary key).

request.onupgradeneeded = function(event) {
  var db = event.target.result;
  var store = db.createObjectStore('products', {keyPath: 'id'});
};

(The original article includes a figure showing how keyPath maps to a field of the stored objects.)

If you need auto-incrementing keys, use options.autoIncrement = true.
(The original article includes a table of how the option combinations behave.)
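
A minimal sketch of an auto-incrementing store (the 'logs' store name is my own example):

request.onupgradeneeded = function(event) {
  var db = event.target.result;
  // keys 1, 2, 3, ... are generated automatically on add()
  var logStore = db.createObjectStore('logs', {autoIncrement: true});
};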

IndexedDB also has indexes, which can carry a uniqueness constraint:

store.createIndex(indexName, keyPath, options);

indexName is the index's name, keyPath is the key to build it on, and options is optional, e.g., {unique: true}.
With that configuration, entries with a duplicate key value cannot be added.

request.onupgradeneeded = function(event) {
  var db = event.target.result;
  var store = db.createObjectStore('products', {keyPath: 'id'});
  // create unique index on keyPath === 'id'
  store.createIndex('products_id_unique', 'id', {unique: true});
};

Once the onupgradeneeded handler finishes, the success event fires.

var transaction = db.transaction(storeName, mode);
var transaction = db.transaction(storeNamesArray, mode);

db.transaction() returns a transaction scoped to the given store (or array of stores) with the requested access rights.

The optional mode parameter is readonly, readwrite, or versionchange.

var objectStore = transaction.objectStore(storeName);

When the operations succeed, the transaction's complete event fires.
abort() rolls the transaction back.

About transactions

objectStore is an instance of the IDBObjectStore interface, which provides operations such as get, add, clear, count, put, and delete:

var request = self.indexedDB.open('EXAMPLE_DB', 1);
request.onsuccess = function(event) {
  // some sample products data
  var products = [
    {id: 1, name: 'Red Men T-Shirt', price: '$3.99'},
    {id: 2, name: 'Pink Women Shorts', price: '$5.99'},
    {id: 3, name: 'Nike white Shoes', price: '$300'}
  ];
  // get database from event
  var db = event.target.result;
  // create transaction from database
  var transaction = db.transaction('products', 'readwrite');
  // add a complete event handler for the transaction
  // you should also add onerror, onabort event handlers
  transaction.oncomplete = function(event) {
    console.log('[Transaction] ALL DONE!');
  };

  // get store from transaction
  // returns IDBObjectStore instance
  var productsStore = transaction.objectStore('products');
  // put products data in productsStore
  products.forEach(function(product){
    var db_op_req = productsStore.add(product); // IDBRequest
  });
};

For the sake of simplicity, the snippet above skips per-request handlers; each request supports them like this:

var db_op_req = productsStore.add(product);
db_op_req.onsuccess = function(event) {
  console.log(event.target.result == product.id); // true
};
db_op_req.onerror = function(event) {
  // handle error
};

CRUD:

  • objectStore.add(data)
  • objectStore.get(key)
  • objectStore.getAll()
  • objectStore.count(key?)
  • objectStore.getAllKeys()
  • objectStore.put(data, key?)
  • objectStore.delete(key)
  • objectStore.clear()

See IDBObjectStore for details.

close() closes the database connection. The database only actually closes once all transactions are complete, but after the call no new transactions can be created.

var request = self.indexedDB.open('EXAMPLE_DB', 1);

request.onsuccess = function(event) {
  // some sample products data
  var products = [
    {id: 1, name: 'Red Men T-Shirt', price: '$3.99'},
    {id: 2, name: 'Pink Women Shorts', price: '$5.99'},
    {id: 3, name: 'Nike white Shoes', price: '$300'}
  ];

  // get database from event
  var db = event.target.result;

  // create transaction from database
  var transaction = db.transaction('products', 'readwrite');

  // add a complete event handler for the transaction
  // you should also add onerror, onabort event handlers
  transaction.oncomplete = function(event) {
    console.log('[Transaction] ALL DONE!');
  };

  // get store from transaction
  var productsStore = transaction.objectStore('products');

  /*************************************/

  // put products data in productsStore
  products.forEach(function(product){
    var db_op_req = productsStore.add(product);

    db_op_req.onsuccess = function(event) {
      console.log(event.target.result == product.id); // true
    }
  });

  // count number of objects in store
  productsStore.count().onsuccess = function(event) {
    console.log('[Transaction - COUNT] number of products in store', event.target.result);
  };

  // get product with id 1
  productsStore.get(1).onsuccess = function(event) {
    console.log('[Transaction - GET] product with id 1', event.target.result);
  };

  // update product with id 1
  products[0].name = 'Blue Men T-shirt';
  productsStore.put(products[0]).onsuccess = function(event) {
    console.log('[Transaction - PUT] product with id 1', event.target.result);
  };

  // delete product with id 2
  productsStore.delete(2).onsuccess = function(event) {
    console.log('[Transaction - DELETE] deleted with id 2');
  };
};

request.onerror = function(event) {
  console.log('[onerror]', request.error);
};

request.onupgradeneeded = function(event) {
  var db = event.target.result;
  var productsStore = db.createObjectStore('products', {keyPath: 'id'});
};

(The original article shows the console output here, and the stored records as seen in DevTools under Application > IndexedDB.)

To understand more IndexedDB concepts, look into cursors, database migrations, and database versioning.

Learning to use it properly is the bigger challenge. Follow the advice below to make better use of the offline database:

  • Cache static files in the service worker's install event.
  • Initialize the database in the service worker's activate event rather than the install event; a new database mixed with an old service worker could conflict. Use the request URL as the keyPath when storing data.
  • Whether online or offline, your app serves the cached files, but requests for data still go out to the network.
  • When an online request fails, have the service worker fall back to an offline-api endpoint that reads from the database.
  • For requests starting with /offline-api, extract the data from the database using the request URL as the keyPath, set appropriate headers such as an application/json content type on the response, and return the response to the browser. You can use the Response constructor.

With the approach above you can build an app that is fully usable offline. The author plans a series of articles covering Progressive Web Apps in more detail.

These notes may skip over some things; review the documentation carefully.

Unlike the Cache API, the IndexedDB API is event-driven rather than Promise-based. Consider one of the IndexedDB wrapper libraries, which let you write promise-based code:

  • localForage (~8KB, promises, good legacy browser support)
  • IDB-keyval (500 byte alternative to localForage, for modern browsers)
  • IDB-promised (~2k, same IndexedDB API, but with promises)
  • Dexie (~16KB, promises, complex queries, secondary indices)
  • PouchDB (~45KB (supports custom builds), synchronization)
  • Lovefield (relational)
  • LokiJS (in-memory)
  • ydn-db (dexie-like, works with WebSQL)
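
As a taste of how much these wrappers simplify things, here is a minimal sketch with IDB-keyval (its get/set functions return promises):

import { set, get } from 'idb-keyval';

async function demo() {
  await set('products', [{id: 1, name: 'Red Men T-Shirt'}]);
  const products = await get('products'); // no event handlers needed
}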

Understanding TypeScript’s type notation

This blog post is a quick introduction to TypeScript's notation for static types.

What you'll learn

After reading this post, you should be able to understand what the following code means:

interface Array<T> {
  concat(...items: Array<T[] | T>): T[];
  reduce<U>(
    callback: (state: U, element: T, index: number, array: T[]) => U,
    firstState?: U): U;
  ···
}

If you think that looks cryptic, I agree with you. But (as I hope to demonstrate) the notation is relatively easy to learn. And once you understand it, it gives you an immediate, precise, and comprehensive summary of how the code behaves, with no need to read a long English description.

Trying out the code examples

TypeScript has an online playground. To get the most comprehensive checking, turn on everything in the "Options" menu. This corresponds to running the TypeScript compiler in --strict mode.

Turning on all the type checks

I always use TypeScript with the most comprehensive setting, --strict. Without it, programs are slightly easier to write, but you also lose many of the benefits of static type checking. Currently, this setting enables the following sub-settings:

  • --noImplicitAny: if TypeScript can't infer a type, you must specify it. This mainly applies to parameters of functions and methods: with this setting, you must annotate their types.
  • --noImplicitThis: complain if the type of this isn't clear.
  • --alwaysStrict: use JavaScript's strict mode whenever possible.
  • --strictNullChecks: null is not part of any type (other than its own type, null) and must be explicitly allowed if it's an acceptable value.
  • --strictFunctionTypes: stronger checks for function types.
  • --strictPropertyInitialization: if a property can't have the value undefined, it must be initialized in the constructor.

More information: the chapter "Compiler Options" in the TypeScript Handbook.

Types

In this blog post, a type is a set of values. The JavaScript language (not TypeScript!) has 7 types:

  • Undefined: the set with the only element undefined.
  • Null: the set with the only element null.
  • Boolean: the set with the two elements false and true.
  • Number: the set of all numbers.
  • String: the set of all strings.
  • Symbol: the set of all symbols.
  • Object: the set of all objects (including functions and arrays).

All of these types are dynamic: you can use them at runtime.

TypeScript adds a layer on top of JavaScript: static types. These exist only when compiling or type-checking source code. Each storage location (variable or property) with a static type predicts what values it can hold, and the type checker verifies that prediction without running the code. For example, if the parameter x of a function f(x) has the static type number, then the call f('abc') is illegal, because the argument 'abc' has the wrong static type.

Type annotations

A type annotation follows a colon after a variable name. The static type after the colon describes what values the variable can hold. The following example declares a variable that can only store numbers:

let x: number;

You may wonder whether x passes static type checking when it hasn't been initialized. TypeScript handles this by not letting you read x before you assign a value to it.

Type inference

Even though every variable has a static type in TypeScript, you don't always have to specify it explicitly; TypeScript can often infer it. For example, if you write:

let x = 123;

then TypeScript infers that x has the static type number.

Describing types

A type expression is what comes after the colon of a type annotation. Such expressions range from simple to complex and are created as follows.

Basic type expressions include:

  • Static versions of JavaScript's dynamic types: undefined, null, boolean, number, string, symbol, object.
  • TypeScript-specific types such as any (the type of all values).

Note that "undefined as a value" and "undefined as a type" are both written undefined. Depending on where you use it, it is interpreted as a value or as a type. The same is true of null.

More type expressions can be created by combining basic types via type operators, much as the set operators union (∪) and intersection (∩) combine sets.

The next sections explain some of the type operators TypeScript provides.

Arrays as lists

There are two ways to express the type of an Array arr whose elements are all numbers:

let arr: number[] = [];
let arr: Array<number> = [];

Normally, TypeScript can infer the type of a variable if there is an assignment. In this case, you actually have to annotate the type, because for an empty array it cannot determine the type of the elements.

We'll get back to the angle-bracket notation (Array<number>) later.

Arrays as tuples

If you store a two-dimensional point in an array, you are using that array as a tuple. That looks as follows:

let point: [number, number] = [7, 5];

This time, a type annotation is needed; without it, TypeScript would infer number[] rather than a pair.
Another example of tuples is the result of Object.entries(obj): an array with one [key, value] pair for each property of obj.

> Object.entries({a:1, b:2})
[ [ 'a', 1 ], [ 'b', 2 ] ]

The type of Object.entries()'s result is:

Array<[string, any]>

Function types

This is an example of a function type:

(num: number) => string

This type describes functions that accept a single parameter of type number and return a string. Let's use it in a type annotation (the built-in String function serves as the value here):

const func: (num: number) => string = String;

Again, we normally wouldn't use a type annotation here, because TypeScript knows the type of String and can therefore infer the type of func.
The following code is a more practical example:

function stringify123(callback: (num: number) => string) {
  return callback(123);
}

We use a function type to describe the callback parameter of stringify123(). Due to this type annotation, TypeScript rejects the following function call:

stringify123(Number);

But it accepts the following function call:

stringify123(String);

Declaring a function's result type

Annotating the types of all of a function's parameters is good practice. You can also specify the result type (though TypeScript is quite good at inferring it):

function stringify123(callback: (num: number) => string): string {
  const num = 123;
  return callback(num);
}

The special result type void

void is a special result type for functions: it tells TypeScript that the function always returns undefined (explicitly or implicitly):

function f1(): void { return undefined } // OK
function f2(): void { } // OK
function f3(): void { return 'abc' } // error

Optional parameters

A question mark after an identifier means the parameter is optional. For example:

function stringify123(callback?: (num: number) => string) {
  const num = 123;
  if (callback) {
    return callback(num); // (A)
  }
  return String(num);
}

If you run TypeScript in --strict mode, it only lets you make the function call in line A after you've checked that callback wasn't omitted.

Parameter default values

TypeScript supports ES6 parameter default values:

function createPoint(x=0, y=0) {
  return [x, y];
}

Default values make parameters optional. You can usually omit the type annotation, because TypeScript can infer the types; for example, it infers that x and y both have the type number.

If you wanted to add type annotations, it would look as follows.

function createPoint(x: number = 0, y: number = 0) {
  return [x, y];
}

Rest parameters

You can also use ES6 rest operators in TypeScript parameter definitions. The type of the corresponding parameter must be an Array:

function joinNumbers(...nums: number[]): string {
  return nums.join('-');
}
joinNumbers(1, 2, 3); // '1-2-3'

Union types

In JavaScript, variables often hold values of one of several types. To describe such variables, you use union types.
For example, in the following code, x is either of type null or of type number:

let x = null;
x = 123;

The type of x can be described as null | number:

let x: null|number = null;
x = 123;

The result of the type expression s | t is the set-theoretic union of the types s and t (interpreted as sets, as we saw earlier).

Let's rewrite the function stringify123(). This time, we don't want the parameter callback to be optional: it should always be passed. If callers don't want to provide a function, they have to pass null explicitly. The implementation looks like this:

function stringify123(
  callback: null | ((num: number) => string)) {
  const num = 123;
  if (callback) { // (A)
    return callback(123); // (B)
  }
  return String(num);
}

Note that we actually have to check that callback is a function (line A) before we can make the function call in line B. Without the check, TypeScript would report an error.

Optional (?) vs. undefined | T

An optional parameter of type T (marked with ?) is very similar to a parameter of type undefined | T. (The same is true of optional properties, by the way.)

The main difference is that you can omit an optional parameter:

function f1(x?: number) { }
f1(); // OK
f1(undefined); // OK
f1(123); // OK

But you can't omit a parameter whose type is undefined | T:

function f2(x: undefined | number) { }
f2(); // error
f2(undefined); // OK
f2(123); // OK

Types usually don't include the values null and undefined

In many programming languages, null is part of all types.
For example, in Java, whenever a parameter's type is String, you can pass null and Java won't complain.

In contrast, in TypeScript, undefined and null are handled by separate types. You need a type union such as undefined | number or null | number if you want to allow them.

Object types

Similar to arrays, objects play two roles in JavaScript (which are occasionally mixed and/or more dynamic):

  • Records: a fixed number of properties that are known at development time. Each property can have a different type.
  • Dictionaries: an arbitrary number of properties whose names are not known at development time. All property keys (strings and/or symbols) have the same type, as do the property values.

We'll ignore objects-as-dictionaries in this post. As an aside, Maps are usually a better choice for dictionaries anyway.

Describing object types as records via interfaces

Interfaces describe object types. For example:

interface Point {
  x: number;
  y: number;
}

One big advantage of TypeScript's type system is that it is structural, not nominal. That is, an interface matches all objects that have the appropriate structure:

function pointToString(p: Point) {
  return `(${p.x}, ${p.y})`;
}
pointToString({x: 5, y: 7}); // '(5, 7)'

In contrast, Java's nominal type system requires classes to implement interfaces.

Optional properties

If a property can be omitted, you put a question mark after its name:

interface Person {
  name: string;
  company?: string;
}

Methods

Interfaces can also contain methods:

interface Point {
  x: number;
  y: number;
  distance(other: Point): number;
}

Type variables and generic types

With static typing, there are two levels:

  • the object level, where values live;
  • the meta level, where types live.

Correspondingly:

  • normal variables hold values at the object level;
  • type variables live at the meta level: they are variables whose values are types.

Normal variables are introduced via const, let, etc.
Type variables are introduced inside angle brackets (<>). For example, the following code contains the type variable T, introduced via <T>:

interface Stack<T> {
  push(x: T): void;
  pop(): T;
}

You can see that the type parameter T appears twice inside Stack, so this interface can be understood intuitively as follows:

  • Stack is a stack of values that all have the given type T. You must fill in T whenever you mention Stack; we'll see how next.
  • The method push accepts a parameter of type T.
  • The method pop returns a value of type T.

If you use Stack, you must provide a value for T.
The following code shows a dummy stack whose sole purpose is to match the interface:

const dummyStack: Stack<number> = {
  push(x: number) {},
  pop() { return 123 },
};

Example: Maps

Maps are typed generically in TypeScript. For example:

const myMap: Map<boolean,string> = new Map([
  [false, 'no'],
  [true, 'yes'],
]);

Generic functions

Functions (and methods) can introduce type variables, too:

function id<T>(x: T): T {
  return x;
}

It is used as follows:

id<number>(123);

Due to type inference, the type parameter can be omitted:

id(123);

Passing on type parameters

Functions can pass their type parameters on to other generic constructs:

function fillArray<T>(len: number, elem: T) {
  return new Array<T>(len).fill(elem);
}

You don't have to specify the type T of Array explicitly; it is inferred from the parameter elem:

const arr = fillArray(3, '*');
// Inferred type: string[]

Conclusion

Let's use what we've learned to understand the piece of code we saw earlier:

interface Array<T> {
  concat(...items: Array<T[] | T>): T[];
  reduce<U>(
    callback: (state: U, element: T, index: number, array: T[]) => U,
    firstState?: U): U;
  ···
}

This is an interface for an array whose elements are of type T, which we must fill in whenever we use this interface:

The method .concat() has zero or more parameters (defined via the rest operator), each of which has the type T[] | T. That is, each one is either an array of T values
or a single T value.

The method .reduce() introduces its own type variable, U. U expresses that the following entities all have the same type (which you don't need to specify; it is inferred automatically):

  • the parameter state of callback()
  • the result of callback()
  • the optional parameter firstState of .reduce()
  • the result of .reduce()

callback also gets a parameter element whose type T is the same as the array elements' type, a parameter index that is a number, and a parameter array with T values.

Further reading

  • Book (free to read online): "Exploring ES6"
  • "ECMAScript Language Types" in the ECMAScript specification.
  • "TypeScript Handbook": a well-written explanation of the types and type operations TypeScript supports.
  • The TypeScript repository contains type definitions for the complete ECMAScript standard library. Reading them is an easy way to practice the type notation.

Translated from the original post - Understanding TypeScript's type notation.

Cancelling Requests with Abortable Fetch

In a web app you often fire off frequent requests in response to user interaction or fresh input: autocomplete as the user types, or zooming and panning on a map. Let's take a moment to think these examples through. First, autocomplete: every time we type (or less often, when debouncing) we issue a request. When the input changes, the old request becomes irrelevant, for instance as we keep typing from "java" to "javascript", so by the time we have what we need there may be a pile of redundant requests.

Then the map example. As we zoom around the map, we're no longer interested in the tiles for previous zoom levels, yet many requests for that redundant data are still pending.

Taking our first example, here's a naive implementation of an autocomplete scenario. For the purposes of this article, we'll use the more modern fetch rather than XMLHttpRequest to make the network requests. The code:

autocompleteInput.addEventListener('keydown', function() {

  const url = "https://api.example.com/autocomplete"

  fetch(url)
    .then((response) => {
      // Do something with the response
      updateAutocompleteMenu()
    })
    .catch((error) => {
      // Something went wrong
      handleAutocompleteError(error);
    })

});

The problem with this example is that every request runs to completion, even once it's no longer relevant. We could add extra logic in updateAutocompleteMenu to guard against stale responses, but that doesn't actually stop the request. Note, too, that browsers limit the number of concurrent requests, which means that once the limit is hit, further requests are queued (and each browser's limit is different).

Abortable Fetch

The new browser technology we can use to solve this is Abortable Fetch. Abortable Fetch relies on the AbortController browser spec. The controller has a signal property, which we pass along so that, at some later point, the controller's abort method can cancel the request.

An example:

const url = "https://api.example.com/autocomplete"
let controller;
let signal;

autocompleteInput.addEventListener('keyup', () => {

  if (controller !== undefined) {
    // Cancel the previous request
    controller.abort();
  }

  // Feature detect
  if ("AbortController" in window) {
    controller = new AbortController;
    signal = controller.signal;
  }

  // Pass the signal to the fetch request
  fetch(url, {signal})
    .then((response) => {
      // Do something with the response
      updateAutocompleteMenu()
    })
    .catch((error) => {
      // Something went wrong
      handleAutocompleteError(error);
    })
});

We feature-detect to determine whether we can use AbortController (it's supported in Edge, Firefox, Opera, and Chrome 66!). We check whether a controller has already been created, and if so, we call controller.abort() to cancel the previous request. You can also cancel multiple fetches at once with the same signal.
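
For example (the URLs here are my own placeholders):

const controller = new AbortController();
const { signal } = controller;

fetch('/api/results/page1', { signal });
fetch('/api/results/page2', { signal });

// Later: a single call aborts both in-flight requests.
controller.abort();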

A small demo

I created a small demo showing how to use Abortable Fetch, based on the autocomplete idea (without any of the implementation details!), to make clear what happens to the network requests on each input. If you enter new input before the old request completes, the previous fetch is aborted. In practice it looks a bit like this:

(Animated demo of the aborted fetches in the original post.)

Source code

Going deeper

Perhaps the coolest part of AbortController is that it's designed as a generic mechanism for aborting asynchronous tasks. It's part of the WHATWG specification, which means it's a DOM spec rather than a language (ECMAScript) spec, but for front-end development it's a useful capability all the same. You can use it as a cleaner control-flow mechanism whenever you implement your own asynchronous tasks (i.e., when using Promises). See Bram Van Damme's super article for a more detailed example of what I'm describing.

Translated from the original post - Cancelling Requests with Abortable Fetch. I post technical write-ups from time to time; stars are welcome. My abilities are limited and mistakes are inevitable, so corrections are appreciated. If you repost, please credit the source and keep the original link and author information.

ECMAScript Goodies I: Check out the ES2017 Major Features

What are ES2017's major features, and what will ES2018 bring? This two-part series explores the latest and greatest ECMAScript features: part one covers the major features (async functions, shared memory, and atomics), and part two covers the rest.

Looking at ECMA International, Technical Committee 39, it turns out the 6 in ES6 wasn't the number of years a release takes. Because ES6/ES2015 took so long to ship (six years), the committee decided to release a little every year. I think this momentum will keep things changing and improving in JavaScript. So what does ES2017 bring, and what's on the list for ES2018?

You can follow TC39's agenda and progress via Axel Rauschmayer on 2ality: The TC39 Process for ECMAScript Features.

ES2017

At the January TC39 meeting, the group settled on the ECMAScript proposals that would be treated as ES2017 features (also known as ES8, a name best avoided to prevent confusion). The list includes:

Major features

Minor features

This first article covers the major features listed above; the second article covers the minor ones.

Async Functions

Async Functions on GitHub (Proposed by Brian Terlson)

In ES2015, we got promises to help us out of callback hell.

The async/await syntax, inspired by TJ Holowaychuk's Co package, reads like synchronous code. In short, the async/await keywords plus a try/catch block make a function execute asynchronously. They work like generators, but are not translated into generator functions. It looks like this:

// Old Promise Town
function fetchThePuppies(puppy) {
  return fetch(puppy)
    .then(puppyInfo => puppyInfo.text())
    .then(text => {
      return JSON.parse(text)
    })
    .catch(err =>
      console.log(`Error: ${err.message}`)
    )
}

// New Async/Await City
async function fetchThePuppies(puppy) {
  try {
    let puppyInfo = await fetch(puppy)
    let text = await puppyInfo.text()
    return JSON.parse(text)
  }
  catch (err) {
    console.log(`Error: ${err.message}`)
  }
}

This doesn't mean you should replace all your promise code with async/await. Just as you (hopefully) don't turn every function into an arrow function, use this syntax only where it fits best. I won't go into much detail here, since plenty of posts already cover async/await; go check them out. Over the coming year, we'll see how people use async/await to make their code more readable and efficient.

Shared memory and Atomics

Shared Memory and Atomics on GitHub (Proposed by Lars T. Hansen)

This ECMAScript proposal introduces SharedArrayBuffer and a namespace object Atomics with helper functions into ES2017.

We keep doing more in the browser with JavaScript, leaning on JIT compilers and fast CPUs. Unfortunately, as Lars T. Hansen says in his great March 2016 article A Taste of JavaScript's New Parallel Primitives:

JS JIT performance gains are now slowing down, and improvements in CPU performance have mostly stalled. Instead of faster CPUs, virtually all consumer devices, from desktop systems to smartphones, now have multiple CPUs (real CPU cores), and outside the low end they usually have more than two. A programmer who wants better performance for a program has to start using multiple cores in parallel. That's not a problem for "native" applications written in multi-threaded languages (Java, Swift, C#, and C++), but it is a problem for JS, which has very limited facilities for running on multiple CPUs (web workers, with only slow message passing and few ways to avoid copying data).

SharedArrayBuffer

This proposal gives us the building blocks for multicore computation, so we can experiment with different ways of implementing high-level parallel constructs in JS. What are those building blocks? Meet SharedArrayBuffer. MDN has a concise definition, which I'll paste here:

The SharedArrayBuffer object is used to represent a generic, fixed-length raw binary data buffer, similar to the ArrayBuffer object, but in a way that it can be used to create views on shared memory. Unlike an ArrayBuffer, a SharedArrayBuffer cannot become detached.

I don't know about you, but the first time I read that, my reaction was "???".

Basically, web workers were one of the first ways we got to run tasks in parallel. But workers run in their own global environments and can't share data, except by messaging other workers or the main thread. Better: the SharedArrayBuffer object lets you share bytes of data between multiple workers and the main thread. Moreover, unlike its predecessor ArrayBuffer, a SharedArrayBuffer can be referenced by multiple parties (e.g., web workers or the web page's main program) at the same time; you use postMessage to transfer a SharedArrayBuffer from one party to another. Putting it all together, you can use a SharedArrayBuffer to move data between multiple workers and the main thread, so that several tasks can be performed at once, in parallel, in JavaScript.
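
A minimal sketch of the hand-off (the file names are my own):

// main.js: share one buffer with a worker
const worker = new Worker('worker.js');
const sab = new SharedArrayBuffer(1024); // 1 KiB of shared memory
new Int32Array(sab)[0] = 42;
worker.postMessage(sab); // the memory is shared, not copied

// worker.js
onmessage = function(event) {
  const shared = new Int32Array(event.data);
  console.log(shared[0]); // 42: the same memory the main thread wrote
};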

SharedArrayBuffer news

Important! Note that SharedArrayBuffer has been temporarily disabled. If you've been following the news, you may have learned that design flaws in processor chips led to two vulnerabilities: Meltdown and Spectre. Read up on them, and know that browsers have disabled SharedArrayBuffer until the issue is resolved.

Atomics

Next stop on the parallelism tour: Atomics, a global with two methods (for now). First, the problem the Atomics methods solve: when parties (such as web workers or the web page's main program) share a SharedArrayBuffer, any of them can read and write its memory at any time. So how do you keep accesses consistent and sequenced, and make sure every party knows to wait for others to finish writing their data?

Atomics has wake and load methods. Parties "sleep" in a wait queue while waiting for another party to finish writing data, so Atomics.wake is the method that wakes them up. When you need to read data, Atomics.load loads it from a location, given two arguments: a TypedArray (the array-like mechanism for accessing raw binary data, here backed by the SharedArrayBuffer) and an index into that TypedArray. There's more to it than what we just covered, but that's the gist.
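
A minimal sketch of the idea (the slot layout is my own; note that current engines renamed Atomics.wake to Atomics.notify, and Atomics.store already exists):

// worker.js: sleep until slot 0 stops being 0, then read slot 1
const shared = new Int32Array(sab); // sab received via postMessage
Atomics.wait(shared, 0, 0);         // blocks while shared[0] === 0
console.log(Atomics.load(shared, 1));

// main.js: publish a value, then wake one sleeping agent
Atomics.store(shared, 1, 123);
Atomics.store(shared, 0, 1);
Atomics.wake(shared, 0, 1);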

For now, those are the only methods Atomics has. In the end, Hansen (the proposal's author and our explainer of things parallel) says there should eventually be more, such as store and compareExchange, to really implement synchronization. Relatively speaking, we're at the very beginning of parallelism in JavaScript, and this proposal provides the building blocks for getting there.

It's a lot to chew on, and this is still a high-level overview. Most developers probably won't use this update in the next year, but it helps move JavaScript forward to everyone's benefit. So thanks for reading, and browse these wonderful resources to dig deeper!

More reading:

Translated from the original post - ECMAScript Goodies I: Check out the ES2017 Major Features.

Active learning of neuron morphology for accurate automated tracing of neurites


Note: reading the original requires a proxy; the figures depend on the imgur image host.

Reference:

http://journal.frontiersin.org/article/10.3389/fnana.2014.00037/full

Abstract

Automating the process of neurite tracing from light microscopy stacks of images is essential for large-scale or high-throughput quantitative studies of neural circuits. While the general layout of labeled neurites can be captured by many automated tracing algorithms, it is often not possible to differentiate reliably between the processes belonging to different cells. The reason is that some neurites in the stack may appear broken due to imperfect labeling, while others may appear fused due to the limited resolution of optical microscopy. Trained neuroanatomists routinely resolve such topological ambiguities during manual tracing tasks by combining information about distances between branches, branch orientations, intensities, calibers, tortuosities, colors, as well as the presence of spines or boutons. Likewise, to evaluate different topological scenarios automatically, we developed a machine learning approach that combines many of the above mentioned features. A specifically designed confidence measure was used to actively train the algorithm during user-assisted tracing procedure. Active learning significantly reduces the training time and makes it possible to obtain less than 1% generalization error rates by providing few training examples. To evaluate the overall performance of the algorithm a number of image stacks were reconstructed automatically, as well as manually by several trained users, making it possible to compare the automated traces to the baseline inter-user variability. Several geometrical and topological features of the traces were selected for the comparisons. These features include the total trace length, the total numbers of branch and terminal points, the affinity of corresponding traces, and the distances between corresponding branch and terminal points. Our results show that when the density of labeled neurites is sufficiently low, automated traces are not significantly different from manual reconstructions obtained by trained users.


Introduction

Our understanding of brain functions is hindered by the lack of detailed knowledge of synaptic connectivity in the underlying neural network. With current technology it is possible to sparsely label specific populations of neurons in vivo and image their processes with high-throughput optical microscopy (see e.g., Stettler et al., 2002; Trachtenberg et al., 2002; De Paola et al., 2006; Wickersham et al., 2007; Lichtman et al., 2008; Wilt et al., 2009; Ragan et al., 2012). Imaging can be done in vivo for circuit development or plasticity studies (Trachtenberg et al., 2002), or ex vivo for circuit mapping projects (Lu et al., 2009). In the latter case, an unprecedented resolution can be achieved by first clarifying the tissue (Hama et al., 2011; Chung et al., 2013), and then imaging the entire brain from thousands of optical sections (Ragan et al., 2012). The overwhelming obstacle remaining on the way to brain mapping is accurate, high-throughput tracing of neurons (Sporns et al., 2005; Lichtman et al., 2008; Miller, 2010; Gillette et al., 2011b; Kozloski, 2011; Lichtman and Denk, 2011; Liu, 2011; Svoboda, 2011; Helmstaedter and Mitra, 2012; Van Essen and Ugurbil, 2012; Perkel, 2013). Presently, accurate traces of complex neuron morphologies can only be obtained manually, which is extremely time consuming (Stepanyants et al., 2004, 2008; Shepherd et al., 2005), and thus impractical for large reconstruction projects.


Many automated tracing algorithms have been developed in recent years [see e.g., Al-Kofahi et al., 2002; Schmitt et al., 2004; Zhang et al., 2007; Al-Kofahi et al., 2008; Losavio et al., 2008; Peng et al., 2010; Srinivasan et al., 2010; Bas and Erdogmus, 2011; Peng et al., 2011; Turetken et al., 2011; Wang et al., 2011; Xie et al., 2011; Bas et al., 2012; Choromanska et al., 2012; Turetken et al., 2012 and Meijering, 2010; Donohue and Ascoli, 2011; Parekh and Ascoli, 2013 for review]. In general, existing algorithms can accurately capture the geometrical layout of neurites but are not guaranteed to recover their correct branching topology (Figure 1). Topological errors are inevitably present in traces obtained from low signal-to-noise images, images of non-uniformly labeled neurites, or images with high density of labeled structures. Close examination of such traces often reveals topological errors such as broken branches, missing branches, and incorrectly resolved branch crossover regions (stolen branches). This is a particular concern for high-throughput projects where topological errors can accumulate over multiple stacks. For example, while tracing a long-range axon from one optical section to the next, even a very low error-rate, say 5% per section, will almost certainly lead to erroneous connectivity after about 20 sections (typically about 10 mm), rendering the trace unusable for brain mapping projects. Clearly, the rate of topological errors in automated reconstruction projects must be carefully controlled (Chothani et al., 2011).


In this study we describe an active machine learning approach (Settles, 2012) that has the potential to significantly reduce the number of topological errors in automated traces. Our algorithm first detects a geometrically accurate trace with the Fast Marching method (Cohen et al., 1994; Cohen and Kimmel, 1997; Sethian, 1999; Mukherjee and Stepanyants, 2012), which was extended to incorporate multiple seed points. Next, the initial trace is dismantled to the level of individual branches, and active learning is applied to reconnect this trace based on knowledge of neuron morphology. We show that active learning does not require large sets of training examples, and the results generalize well on image stacks acquired under similar experimental conditions. What is more, when the density of labeled neurites is sufficiently low, automated traces are not significantly different from reconstructions produced manually by trained users.

Methods

Results of this study are based on the analyses of two datasets featured at the DIADEM challenge (Brown et al., 2011). The OP dataset includes 9 image stacks containing axons of single olfactory projection neurons from Drosophila (Jefferis et al., 2007), and the L6 dataset consists of 6 image stacks containing axons of multiple layer 6 neurons imaged in layer 1 of mouse visual cortex (De Paola et al., 2006). The NCTracer software (www.neurogeometry.net) was used to trace each image stack automatically, as well as manually. The manual traces were generated independently for each stack by three trained users.



The Initial Trace of Neurites
We refer to any trace providing geometrically accurate information about the layout of neurites within an image stack as an initial trace. Numerous segmentation and tracking-based methods (see e.g., Al-Kofahi et al., 2002; Schmitt et al., 2004; Zhang et al., 2007; Al-Kofahi et al., 2008; Losavio et al., 2008; Peng et al., 2010; Srinivasan et al., 2010; Bas and Erdogmus, 2011; Peng et al., 2011; Turetken et al., 2011; Wang et al., 2011; Xie et al., 2011; Bas et al., 2012; Choromanska et al., 2012; Turetken et al., 2012) can be used to produce initial traces. In this study we adapt the Fast Marching method (Cohen et al., 1994; Cohen and Kimmel, 1997; Sethian, 1999; Mukherjee and Stepanyants, 2012) to grow the initial trace from multiple seed points (Figure 2), analogous to the way light from multiple point sources spreads through a non-uniform medium. This process is described by the Eikonal boundary value problem (Sethian, 1999):
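
(The equation itself, an image in the original, was lost in extraction; from the definitions in the next paragraph, the Eikonal boundary value problem has the standard form:)

$$ \lvert \nabla T(\mathbf{r}) \rvert \, I(\mathbf{r}) = 1, \qquad T(\mathbf{r}) = 0 \ \ \text{for } \mathbf{r} \in \partial S \tag{1} $$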


In this expression, vector r represents a position in the image stack (or non-uniform medium), I is the image intensity normalized to the 0–1 range (analog of the speed of light in the medium), and ∇ denotes the gradient operator. Light rays originate from the boundary, ∂S, at time zero, and the time map, T(r), provides information about the shortest time of arrival of these rays to various locations in the image. Because higher image intensities correspond to faster speeds of light propagation, the arrival time front in the image will preferentially spread along the high intensity structures of neurites (see Figure 2C).


The Fast Marching algorithm of Sethian (Sethian, 1999) is an efficient numerical scheme for solving the Eikonal boundary value problem, Equation (1). Since the speed function in our problem is defined by the image intensity, it is always positive. For positive speed functions it is known that the Eikonal boundary value problem can be solved more efficiently than the commonly used alternative—the Hamilton-Jacobi problem of the Level Set method (Sethian, 1999). One reason is that the stability condition required for a numerical solution of the time-dependent Level Set equation is more stringent than that used to solve the Eikonal problem. Specifically, this condition requires very small time steps and thus the Level Set method is expected to be more time consuming. The second advantage of Fast Marching has to do with the outward-only propagation of the fronts, which can be used to find new front points very efficiently (Sethian, 1999).


We implement the Fast Marching algorithm (Sethian, 1999) on a discrete lattice defined by the centers of image voxels, l = (i, j, k)^T. Here the time map is evolved from the boundary at T = 0 by taking the upwind solution of a discretized version of Equation (1):
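
A standard first-order upwind discretization consistent with this description (the scheme actually used in the paper may differ in detail) is

$$\max\!\left(\frac{T_{i,j,k}-\min\left(T_{i-1,j,k},\,T_{i+1,j,k}\right)}{s_x},\,0\right)^{2}+\max\!\left(\frac{T_{i,j,k}-\min\left(T_{i,j-1,k},\,T_{i,j+1,k}\right)}{s_y},\,0\right)^{2}+\max\!\left(\frac{T_{i,j,k}-\min\left(T_{i,j,k-1},\,T_{i,j,k+1}\right)}{s_z},\,0\right)^{2}=\frac{1}{I_{i,j,k}^{2}} \tag{2}$$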


Parameters (sx, sy, sz) in this expression denote the voxel dimensions, which may not be the same due to a typically lower z-resolution in confocal and two-photon microscopy images.

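To make the propagation concrete, here is a minimal, illustrative Fast Marching sketch in Python (our own simplification, not NCTracer's implementation): it uses a heap-ordered front and, for brevity, a Dijkstra-like single-axis update in place of the full quadratic upwind solve of Equation (2).

```python
import heapq
import numpy as np

def fast_marching(intensity, seeds, spacing=(1.0, 1.0, 1.0)):
    """Evolve arrival times T from seed voxels through a medium whose
    speed equals the normalized image intensity (Equation 1).
    seeds: iterable of (i, j, k) voxel index tuples."""
    T = np.full(intensity.shape, np.inf)
    frozen = np.zeros(intensity.shape, dtype=bool)
    heap = []
    for s in seeds:                          # the T = 0 boundary
        T[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        t, v = heapq.heappop(heap)
        if frozen[v]:
            continue
        frozen[v] = True                     # fronts only move outward
        for axis in range(3):                # visit the 6 grid neighbors
            for step in (-1, 1):
                n = list(v)
                n[axis] += step
                n = tuple(n)
                if not all(0 <= n[i] < intensity.shape[i] for i in range(3)):
                    continue
                if frozen[n] or intensity[n] <= 0:
                    continue
                cand = t + spacing[axis] / intensity[n]  # slower in dim voxels
                if cand < T[n]:
                    T[n] = cand
                    heapq.heappush(heap, (cand, n))
    return T
```

Gradient descent on the resulting time map T then recovers the connecting paths, as described next.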

The arrival time front is initialized with T = 0 at multiple seed points, which are automatically generated along the structure of neurites based on image intensity (Figure 2A). As was previously described (Mukherjee and Stepanyants, 2012), the arrival time front is allowed to travel a specified distance, Dmax, to establish a local time map. The value of Dmax has to be chosen based on two considerations: Dmax has to be larger than the caliber of neurites (3–5sx for OP and L6) not to produce short spurious branches and, at the same time, not much larger than the shortest branch that needs to be resolved by the algorithm (10sx in this study). Dmax = 15sx was used throughout this study. The path connecting the farthest point of the front to the T = 0 boundary is then found by performing gradient descent on T(i, j, k) (see Figures 2C–E). Next, the gradient descent path is added to the boundary ∂S and the Fast Marching algorithm is re-initialized from the new boundary. This process continues until a stopping condition is reached, at which point the final ∂S defines the initial trace. The stopping condition used in this study is based on the average intensity of the last added branch. When this intensity falls below a set threshold (typically 20% of the average intensity of the existing trace), Fast Marching is paused and can then be continued or terminated by the user.


As long as the seed points used to initialize Fast Marching are located in the foreground and are connected by higher than background intensity paths, their Fast Marching fronts are guaranteed to collide. The gradient descent algorithm is invoked in this case as well. Here, gradient descent paths originating from the collision voxel back-propagate into every colliding region, thus connecting their T = 0 boundaries. If there is a break in intensity along a neurite linking two seed points, the Fast Marching algorithm may terminate before the fronts have a chance to collide. In addition, high levels of background intensity may lead to erroneous front collisions. These and other topological errors in the initial trace will be corrected as described in the following sections.


Optimization of the Initial Trace


We represent the initial trace as a graph structure consisting of nodes linked by straight line segments. Each node, k, is described by its position in the stack, r_k = (x_k, y_k, z_k)^T, and the caliber, R_k, of the neurite at that location. Information about connectivity among the nodes is stored in the adjacency matrix, A. We find this representation to be more convenient than the traditional SWC format of neuron morphology (Cannon et al., 1998) because the latter cannot be used to describe structures containing loops.

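As a small illustration of this representation (the class and field names are ours, not NCTracer's), a trace graph only needs node positions, calibers, and a symmetric adjacency matrix; unlike SWC's parent pointers, the adjacency matrix can encode loops:

```python
import numpy as np

class TraceGraph:
    """Initial trace as nodes linked by straight segments."""
    def __init__(self, n_nodes):
        self.r = np.zeros((n_nodes, 3))      # node positions r_k = (x, y, z)
        self.R = np.zeros(n_nodes)           # neurite caliber R_k at each node
        self.A = np.zeros((n_nodes, n_nodes), dtype=bool)  # adjacency matrix

    def link(self, i, j):
        self.A[i, j] = self.A[j, i] = True   # undirected segment between i and j
```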

Because the initial trace lies sufficiently close to the centerline of neurites, this trace can be optimized by monitoring its fitness in response to small changes in the position and caliber of every node. The fitness function used in this study consists of the intensity integrated along the trace and regularizing constraints on the positions and calibers of the connected nodes:
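
A schematic reconstruction of this fitness function, based on the description of its terms in the next paragraph (the exact normalization of the original expression may differ), is

$$\mathcal{F} = \sum_{k}\sum_{m} I(\mathbf{l}_m)\,\mathrm{LoG}_{R_k}\!\left(\mathbf{l}_m-\mathbf{r}_k\right) \;-\; \alpha_r\,\lambda\sum_{k}\sum_{k'}\left|\mathbf{r}_k-\mathbf{r}_{k'}\right|^{2} \;-\; \alpha_R\,\lambda\sum_{k}\sum_{k'}\left(R_k-R_{k'}\right)^{2} \tag{3}$$

where LoG denotes the Laplacian of Gaussian kernel at scale R_k.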


Vectors r_k in this expression specify the positions of the trace vertices, while vectors l_m denote the positions of voxel centers in the image stack. Index k' enumerates the neighbors of vertex k. Parameter λ denotes the average density of nodes in the trace, i.e., the number of nodes per voxel. Lagrange multipliers αr > 0 and αR > 0 control the stiffness of the regularizing constraints. The first term in this expression is the convolution of the image with the Laplacian of Gaussian. This convolution can be performed by using the Fast Fourier Transform (Press, 2007) or, in the case of a relatively small density of trace nodes, it may be faster to perform explicit summation over the index m. In this case, due to the fast decay of the Gaussian factor, the summation can be restricted to a small number of voxels in the vicinity of the trace (see Chothani et al., 2011 for details).


Maximization of the fitness function is performed with Newton's method (Press, 2007):
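
In its standard damped Newton form, consistent with the symbol definitions that follow,

$$\left\{\mathbf{r}_k(n+1),\,R_k(n+1)\right\} \;=\; \left\{\mathbf{r}_k(n),\,R_k(n)\right\} \;-\; \beta\,\hat{H}^{-1}\,\nabla\mathcal{F} \tag{4}$$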


Variable n in this expression enumerates the iteration steps of the algorithm, parameter β > 0 controls the step size, Ĥ denotes the Hessian operator acting on all the node variables {r_k(n), R_k(n)}, and −1 in the exponent denotes matrix inversion. The positions and calibers of all nodes of the trace, including branch and terminal points, are synchronously updated at every iteration step. The values of all three terms in the fitness function are monitored during optimization. Optimization is terminated once the relative changes in all three quantities fall below 10^−8. For the OP and L6 datasets considered in this study, the optimization procedure typically converges to the optimum solution in less than 50 steps. Optimization improves the layout of branches as well as the placement of branch and terminal points in the initial trace (Vasilkoski and Stepanyants, 2009; Chothani et al., 2011). The values of parameters αr, αR, and β are constrained by the considerations of algorithm stability, speed of convergence, and accurate representation of neurites' curvature and caliber. Some of these issues were discussed in Vasilkoski and Stepanyants (2009) and Chothani et al. (2011).

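A minimal sketch of the iteration of Equation (4) (our own illustration; `grad` and `hess` stand in for the actual fitness derivatives, and the stopping test is simplified to the change in the node variables rather than the three fitness terms):

```python
import numpy as np

def newton_maximize(grad, hess, z0, beta=0.5, tol=1e-8, max_steps=50):
    """Damped Newton ascent over the stacked node variables z = {r_k, R_k}."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_steps):
        step = np.linalg.solve(hess(z), grad(z))  # H^{-1} grad, no explicit inverse
        z_new = z - beta * step                   # beta > 0 controls the step size
        if np.max(np.abs(z_new - z)) < tol * max(1.0, np.max(np.abs(z))):
            return z_new
        z = z_new
    return z
```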

Learning Branching Morphology of Neurites


As shown in Figure 1, even when the initial trace accurately describes the geometry of neurites, it often fails to capture the correct branching topology. To address this problem, we disconnect branches of the initial trace from one another and then assemble them into tree-like structures based on prior knowledge of neuron morphology. In order to discriminate between correct and erroneous ways to assemble branches, different branch merging scenarios are evaluated in a machine learning approach by combining information about various features of the trace. Such features may include distances between branches, branch orientations, average intensities, intensity variations, branch thicknesses, curvatures, tortuosities, colors, and presence of spines or boutons. Features 1–9 of Figure 3 were used to produce the results of this study. These features were selected based on our knowledge of neuroanatomy and intuition gained from manual neuron tracing. We carefully examined the decisions we make when faced with branch merging tasks and initially created a list of 17 features that are shown in Figure 3. Features 15 and 16 are not applicable for the OP and L6 datasets as these datasets include grayscale images of axons only. Features 10–14 and 17 were tested but did not improve the performance of the classifiers. This is why the above features (10–17) were left out of the analysis. This is not to say that features 10–17 are not important; they may be useful for other dataset types.


To evaluate different branch merging patterns in the disconnected initial trace, we cluster branch terminal points on the basis of their relative distances. For this, we first create an all-to-all connected graph in which nodes represent the branch terminal points. Next, the links between distant nodes (>10sx) are removed, exposing the clusters of nearby branch points. The threshold distance of 10sx was chosen based on two considerations. First, this distance has to be larger than the voxel size (sx) and the size of a typical gap in intensity resulting from imperfect labeling of branches (0 for OP and ~5sx for L6). Second, the threshold distance has to be smaller than the typical branch length (20sx–50sx for OP and L6). Results of branch merging are not sensitive to the precise value of this parameter in the 5sx–15sx range. Branch merging is examined within each cluster of branch terminal points independently.
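
A sketch of this clustering step (our illustration, using SciPy's connected components on the thresholded distance graph):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist

def cluster_terminal_points(points, threshold=10.0):
    """Group branch terminal points whose pairwise distance is below the
    threshold (10 s_x in this study) into clusters."""
    d = cdist(points, points)             # all-to-all distance graph
    adjacency = d <= threshold            # drop links between distant nodes
    n_clusters, labels = connected_components(adjacency, directed=False)
    return [np.flatnonzero(labels == c) for c in range(n_clusters)]
```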

Within a given cluster, all possible branch merging scenarios are considered (Figure 4A), and the correct merging pattern is determined in a classification framework. Clusters containing 2 terminal points lead to two scenarios, i.e., to connect or not to connect the terminal points. Three terminal point clusters result in 5 scenarios, 4 terminal point clusters lead to 15 (Figure 4A), and the number of scenarios increases exponentially with the complexity of clusters (Figure 4B). This exponential increase gives a unique advantage to our classification approach to branch merging.
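
The quoted counts (2, 5, and 15 scenarios for clusters of 2, 3, and 4 terminal points) match the Bell numbers, which count the ways of partitioning a set; this is consistent with reading each merging scenario as a partition of the cluster's terminal points into groups to be joined. A quick check:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bell(n):
    """Bell number B_n: the number of partitions of an n-element set."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

print([bell(n) for n in range(2, 7)])  # [2, 5, 15, 52, 203]
```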

Generally, machine learning applications require large sets of labeled data. Creating such sets can be very time-consuming and, in many cases, impractical. Our training strategy circumvents this problem by exploiting the large numbers of branch merging scenarios. Labeling the correct branch merging scenario in a single cluster can provide thousands of training examples. Hence, it becomes possible to train the classifier in real time and obtain accurate results by labeling only 10–100 clusters of branch terminal points. All possible branch merging scenarios are evaluated within a given cluster of branch terminal points. Each scenario, i, is characterized by a feature vector x_i (Figure 4A) whose components consist of features of the trace that may be important for selecting the correct branch merging scenario (Figure 3). The problem is thus reduced to learning the best set of weights, w, for discriminating between the correct and erroneous scenarios within every cluster:

$$\mathbf{w}^{T}\mathbf{x}_{\text{correct merger}} \;<\; \mathbf{w}^{T}\mathbf{x}_{i} \qquad \text{for all erroneous mergers } i \text{ within the cluster} \tag{5}$$

This formulation leads to another important advantage for the implementation of the classification strategy. Due to the linearity of the problem, Equation (5) can be rewritten as
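
$$\mathbf{w}^{T}\Delta\mathbf{x}^{\mu} \;>\; 0, \qquad \Delta\mathbf{x}^{\mu} \;=\; \mathbf{x}_{\mu} \;-\; \mathbf{x}_{\text{correct merger}}, \qquad \mu = 1,\dots,m \tag{6}$$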

resulting in a subtractive normalization of the feature vectors within individual clusters. Because branch merging scenarios are only compared within clusters, Equation (6) effectively normalizes for the variations in image intensity and density of neurites across clusters. The classification problem of Equation (6) is solved with a sign-constrained perceptron (Engel and Broeck, 2001) or an SVM classifier (Wang, 2005), both modified to account for the relative importance of some training examples. The sign-constrained perceptron algorithm was previously described in Chapeton et al. (2012):
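
In the notation defined below, the problem can be written (our reconstruction, normalized to be consistent with the learning rule of Equation 8) as

$$\frac{1}{N}\,\mathbf{w}^{T}\Delta\mathbf{x}^{\mu} \;\ge\; \kappa \quad \text{for all} \ \mu = 1,\dots,m, \qquad \text{subject to} \quad w_k\,g_k \;\ge\; 0, \quad k = 1,\dots,N \tag{7}$$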

where w is the weight vector of the perceptron classifier, Δx^μ is the difference between the feature vectors for the erroneous merger μ and the correct merger from the same cluster, N is the number of features (9 features were used in this study), and m is the number of comparisons made (total number of scenarios minus number of clusters). The value of the parameter g_k can be set to −1 or 1, constraining the corresponding weight, w_k, to be negative or positive, or set to 0, in which case the weight is unconstrained. Because larger distances, overruns, and offsets of terminal points (see Figure 3) decrease the likelihood that branches should be merged, the weights of these features were constrained to be positive. In addition, the weight associated with the number of free terminal points was constrained to be positive to promote branch merging. All other weights were left unconstrained as we did not have clear motivation for doing otherwise. Hence, g = (1, 1, 1, 0, 0, 0, 0, 0, 1)^T was used in this study. Parameter κ is referred to as the perceptron robustness (analogous to the SVM margin). Increasing κ should initially improve the generalization ability of the perceptron, but as the perceptron fails to correctly classify a progressively increasing number of training examples, this generalization ability should decrease. We used the leave-one-out cross-validation scheme to examine this trend. In this scheme, training is done on all but one labeled example, and the remaining example is used for validation. In Figure 5 each branch merging cluster was used once for validation and the results were averaged. Figure 5A shows that there is a large range of κ for which the perceptron performs reasonably well for both the L6 and OP datasets. The value of κ was set to 1 throughout this study.


Figure 5. How to choose the best classification parameters. (A) Leave-one-out cross-validation error-rate as a function of the perceptron robustness parameter, κ [see Equation (7)]. (B) The same error-rate as a function of the SVM parameter, C [see Equation (9)]. The inset shows how the SVM margin, M, depends on C. Solid and dotted lines show the results for the OP and L6 datasets respectively. Large empty circles indicate the parameter values that were used throughout this study, κ = 2^0 and C = 2^20.

The sign-constrained perceptron problem of Equation (7) was solved by using a modified perceptron learning rule (Engel and Broeck, 2001):

$$\Delta\mathbf{w} \;=\; \theta\!\left(\kappa\sqrt{N} \;-\; \frac{1}{\sqrt{N}}\,\mathbf{w}^{T}\Delta\mathbf{x}^{\mu}\right)\frac{1}{\sqrt{N}}\,\Delta\mathbf{x}^{\mu}; \qquad w_k \;=\; w_k\,\theta\!\left(w_k\,g_k\right), \quad k = 1, 2, \dots, N \tag{8}$$

In this expression, Δw denotes the change in the perceptron weight vector in response to presentation of the training example μ; θ is the Heaviside step function, which is defined to be 1 for non-negative arguments and zero otherwise. The step functions in Equation (8) ensure that training is not done on learned examples, and that the perceptron weights violating the sign-constraints are set to zero at every step of the algorithm. Perceptron weights are updated asynchronously by training on examples, μ, that are drawn from the set of all examples with probabilities proportional to user-defined cluster weights, Qμ. All cluster weights are initially set to 1 and can be modified by the user to increase the probabilities with which examples from some clusters come up for training. This makes it possible to enforce learning of certain rare branch merging topologies. Though user-defined cluster weights may be used to improve the outcome of training, this feature was not examined in the present study to avoid subjectivity associated with different choices of Qμ.
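
A compact sketch of this training loop (our illustration; the normalization follows our reconstruction of Equations 7–8 above and may differ from the original):

```python
import numpy as np

def train_sign_constrained_perceptron(dx, g, kappa=1.0, q=None,
                                      rng=None, max_iter=100_000):
    """dx: (m, N) feature differences (erroneous minus correct merger);
    g: length-N sign constraints in {-1, 0, +1};
    q: optional cluster weights Q_mu used as sampling probabilities."""
    rng = rng or np.random.default_rng()
    g = np.asarray(g)
    m, n = dx.shape
    p = np.full(m, 1.0 / m) if q is None else np.asarray(q, float) / np.sum(q)
    w = np.zeros(n)
    for _ in range(max_iter):
        margins = dx @ w / n                 # (1/N) w^T dx^mu, Equation (7)
        if np.all(margins >= kappa):
            break                            # every comparison is robust
        mu = rng.choice(m, p=p)              # asynchronous, weighted draw
        if margins[mu] < kappa:              # skip already-learned examples
            w += dx[mu] / np.sqrt(n)         # Equation (8) update
            w[w * g < 0] = 0.0               # zero weights violating signs
    return w
```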
An SVM classifier can also be used to solve the system of inequalities in Equation (6). To incorporate the user-defined cluster weights, Qμ, we modified the standard formulation of the SVM problem (Wang, 2005), and in this study maximize the following dual Lagrangian function in order to obtain the SVM weight vector w:
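
A schematic form of this weighted dual (written here as the standard soft-margin dual of the one-class system of Equation 6, with the box constraint scaled by the cluster weights; the paper's exact formulation may differ):

$$W(\boldsymbol{\alpha}) \;=\; \sum_{\mu=1}^{l}\alpha_{\mu} \;-\; \frac{1}{2}\sum_{\mu,\nu=1}^{l}\alpha_{\mu}\,\alpha_{\nu}\left(\Delta\mathbf{x}^{\mu}\right)^{T}\Delta\mathbf{x}^{\nu}, \qquad 0 \;\le\; \alpha_{\mu} \;\le\; C\,Q_{\mu}, \qquad \mathbf{w} \;=\; \sum_{\mu=1}^{l}\alpha_{\mu}\,\Delta\mathbf{x}^{\mu} \tag{9}$$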

In these expressions, l is the number of SVM support vectors and C is the SVM margin parameter (see the inset in Figure 5B). Similar to the perceptron robustness, there is a large range of values of C for which the SVM produces reasonably good generalization results for both datasets. C = 2^20 was used to produce the results of this study. Again, all user-defined cluster weights, Qμ, were set to 1 during training.

Active Learning Strategy


In this section we describe a pool-based sampling approach (Lewis and Gale, 1994) that can be used to actively train the Perceptron and SVM classifiers on branch merging examples. In this approach the user selectively draws queries from the pool of all branch merging clusters based on the value of the confidence measure:
$$\text{Confidence} \;=\; \frac{e^{-\mathbf{w}^{T}\mathbf{x}_{\text{correct merger}}/T}}{\displaystyle\sum_{i \,\in\, \text{all mergers}} e^{-\mathbf{w}^{T}\mathbf{x}_{i}/T}} \tag{10}$$

This measure assigns low confidence values (in the 0–1 range) to clusters in which the erroneous merging scenarios are located close to the decision boundary defined by w. Parameter T controls the spread of confidence values but does not affect their order. This parameter was set to 1 throughout the study. Training can be performed after labeling a single or multiple low confidence clusters, and the confidence measure is updated after each training step. It is absolutely essential that clusters in which the correct merging scenario cannot be identified with high certainty should not be used for training, as a small number of errors in the labeled set may significantly worsen the performance of classifiers.
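
A sketch of how this confidence can be computed per cluster (our illustration; the lowest-cost scenario is taken as the proposed correct merger):

```python
import numpy as np

def merger_confidence(w, X, T=1.0):
    """X: (scenarios x features) matrix for one cluster of terminal points.
    Returns the Equation (10) confidence of the lowest-cost scenario."""
    costs = X @ w                            # w^T x_i for every scenario i
    z = np.exp(-(costs - costs.min()) / T)   # shift costs for numerical stability
    return z.max() / z.sum()                 # softmax weight of the best scenario
```

Clusters with the lowest confidence values are the ones queried for labels first.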

Results


The methodology described in this study is implemented in the NCTracer software for automated tracing of neurites. This methodology consists of two major parts—initial tracing and branch merging. In the first part, an initial trace is created by using the Voxel Coding (Zhou et al., 1998; Zhou and Toga, 1999; Vasilkoski and Stepanyants, 2009) or the Fast Marching (Cohen et al., 1994; Cohen and Kimmel, 1997; Sethian, 1999; Mukherjee and Stepanyants, 2012) algorithm, and optimized to ensure that the trace, including its branch and terminal points, conforms well to the intensity in the underlying image (see the Methods section for details). Below we examine the initial traces from two very different dataset types: axons of single olfactory projection neurons from Drosophila (OP dataset, n = 9 image stacks) (Jefferis et al., 2007) and axons of multiple layer 6 neurons imaged in layer 1 of mouse visual cortex (L6 dataset, n = 6 image stacks) (De Paola et al., 2006). These datasets were featured at the DIADEM challenge (Brown et al., 2011) and serve as benchmarks for automated reconstruction algorithms.

Figures 1A,B show representative image stacks from the OP and L6 datasets. The initial traces are superimposed on the maximum intensity projections of the image stacks, and are slightly shifted for better visibility. As can be seen, these initial traces accurately represent the geometry of neurites contained in the image stacks. However, a closer examination of the L6 trace topology reveals numerous erroneously merged (stolen) branches. Such errors in the initial trace often occur when the neurites belonging to different trees appear to be in contact due to poor z-resolution or due to high density of labeled structures. Presence of these topological errors becomes evident after labeling distinct tree structures with different colors (Figure 1C). The second part of our automated tracing algorithm uses a machine learning approach that actively learns the morphology of neurites in an attempt to resolve the errors present in the initial trace (see Figure 1D).

Comparison of Automated Initial Traces and Manual User Traces


Below we evaluate how well automated and manual traces capture the layout (geometry) of the neurites in the image stack, as well as how well they represent the morphology of branching tree structures (topology). Similar comparisons have been carried out in other studies (Gillette et al., 2011a; Choromanska et al., 2012; Mayerich et al., 2012). Each OP and L6 image stack was traced automatically using the Fast Marching algorithm as well as manually by three trained users. Figure 6A shows an example of the resulting four traces of a single OP stack. Inevitably, imperfect labeling and limited resolution of optical microscopy lead to uncertainties in tracing. Trained users often resolve such uncertainties differently from one another, and hence no single trace can be viewed as a gold standard. Thus, we had to first establish quantitative measures describing the baseline inter-user variability, and only then evaluate the performance of the automated tracing algorithm in comparison to this baseline. To this end, each manual trace was chosen to be the gold standard and compared to the automated trace and the remaining two manual traces. This led to 6 inter-user and 3 automated-to-user trace comparisons for each stack.


Figure 6. Assessing the quality of automated traces. (A) Three manual traces (green, blue, and yellow) and one automated trace (red) are superimposed on a maximum intensity projection of an OP neuron. The traces are staggered upward for better visibility. The inset shows a zoomed view of the boxed region. Scale bar is 20 μm. (B) Several geometrical and topological features are used to compare traces. Gold standard trace (yellow) and test trace (blue) are shown. Both traces are composed of nodes connected by edges of length d. Nodes on these traces are referred to as corresponding nodes if they are located within distance h of each other (d << h). Circles highlight false negative and false positive branch and terminal points. (C–E) Automated traces reliably capture the geometry of neurites. Nine OP axons were reconstructed with NCTracer, first automatically and then manually by three trained users. The probability densities for distances between the corresponding trace nodes (C), terminal points (D), and branch points (E) were used as metrics for geometrical comparisons. Red lines show the results of automated-to-user trace comparisons. Here, all user traces for every stack were used one by one as the gold standard, leading to 27 automated-to-gold standard trace comparisons. The results were pooled. Blue lines show similar results based on 54 user-to-user trace comparisons. (F–H) Automated traces accurately represent the topology of OP neurons. Three topological measures were compared: false positive/negative trace lengths (F), numbers of false positive/negative terminal (G) and branch (H) points. Red and blue bars show the fractions of automated-to-user (n = 27) and user-to-user (n = 54) comparisons for different error types. The fractions for false positive and false negative errors are indicated with the bars above and below the x-axes.

To ensure the uniformity of the reconstructed dataset, all traces were subdivided into segments of equal length (d = 0.25 voxels). To compare a pair of traces (a test trace and a gold standard trace) we perform a bi-directional nearest neighbor search to find corresponding nodes, i.e., nodes on the two traces separated by less than h = 10 voxels (see Figure 6B). A node in the test trace which has (does not have) a corresponding node in the gold standard trace is referred to as a true (false) positive node. A node in the gold standard trace for which there is no corresponding node in the test trace is referred to as a false negative node. Short terminal branches (less than 12 voxels) and dim branches (average intensity less than 0.12) were excluded from the comparisons. Results of the geometrical comparisons between automated initial traces and manual traces for the OP image stacks are shown in Figures 6C–E. The plots show probability densities of distances between corresponding nodes, corresponding branch points, and corresponding terminal points for both inter-user (blue lines) and automated-to-user comparisons (red lines). The geometrical precision of the automated and manual traces is evidenced by the fact that 95% of distance values lie below 2.3 voxels in Figure 6C, 7.3 voxels in Figure 6D, and 6.6 voxels in Figure 6E. More importantly, the difference between mean distances for the inter-user and automated-to-user comparisons (0.19, 0.51, and 0.65 voxels respectively) is smaller than the resolution of the image, and thus should have little bearing on trace dependent measurements. Similar conclusions were drawn from the geometrical comparisons of automated and manual traces of the L6 dataset (Figures 7A–C).
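
The bi-directional comparison can be sketched with a pair of nearest-neighbor queries (our illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

def trace_node_errors(test, gold, h=10.0):
    """test, gold: (n, 3) arrays of trace node coordinates in voxels.
    Nodes farther than h from the other trace are false positives (test)
    or false negatives (gold), per the definitions above."""
    d_test, _ = cKDTree(gold).query(test)    # each test node's distance to gold
    d_gold, _ = cKDTree(test).query(gold)    # each gold node's distance to test
    return {"false_positive": int(np.sum(d_test > h)),
            "false_negative": int(np.sum(d_gold > h)),
            "corresponding_distances": d_test[d_test <= h]}
```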


Figure 7. Assessment of initial traces of multiple neuron axons. Six L6 image stacks were reconstructed manually and automatically with NCTracer (see Figure 6 legend for details). (A–C) Automated traces reliably capture the geometry of neurites. The probability densities for distances between the corresponding trace nodes (A), terminal points (B), and branch points (C) were used as metrics for geometrical comparisons. Red lines (n = 18) and blue lines (n = 36) show the results of automated-to-user and user-to-user trace comparisons. (D–F) While the automated traces capture the geometry of the neurites well, they contain a markedly large number of false positive branch points (F). These topological errors result from erroneous mergers of distinct axons that pass in close proximity of one another.

Topological errors that occur due to incorrectly merged branches are more difficult to detect and can be detrimental to circuit reconstruction projects. Three measures were selected to quantify the extent of such errors: false positive/negative trace lengths, numbers of false positive/negative terminal points, and numbers of false positive/negative branch points. The results of comparisons for the OP dataset (Figures 6F–H) show that similar numbers of topological errors were made by the algorithm and the users, and these numbers were generally small (less than one false positive/negative branch or terminal point per stack). For the L6 image stacks, the mismatches in length for the automated and manual traces were similar (Figure 7D), indicating that the automated algorithm performed as well as trained users in tracing the majority (in terms of length) of labeled structures. However, in contrast to manual traces, automated traces contained more false positive/negative terminal points (Figure 7E) and a markedly larger number of false positive branch points (Figure 7F). The former errors result from branches that are broken due to imperfect labeling, while the latter arise from a specific artifact of the Fast Marching algorithm, i.e., the merging of nearby but distinct branches. In particular, lower z-resolution of an image stack makes such mergers more prevalent, leading to larger numbers of false positive branch points.

Active Learning of Branching Morphology of Neurites


To resolve the above mentioned topological errors, branches of the initial trace were disconnected from one another and merged into tree-like structures in an active learning approach described in the Methods section. Briefly, the positions of branch and terminal points were clustered based on distance, and branch merging was performed within every cluster independently (see Figure 4). Perceptron and SVM classifiers were designed and trained online to accomplish the branch merging task, generating the final traces of the OP and L6 image stacks. To assess the performance of the classifiers, their generalization error rates were monitored as functions of the number of training examples. Figures 8A,B compare the performance of the classifiers trained on randomly selected branch merging examples with that of classifiers trained in an active learning approach. The plots show that the active learning approach provides a clear advantage in terms of the number of training examples required to reach a given error rate. For each dataset, error rates of less than 5% were achieved by both classifiers with fewer than 40 actively chosen training examples. The rapid decline of the generalization error rate validates our choice of features used for the branch merging task (see Figure 3). As expected, trained SVM and perceptron classifiers established nearly identical decision boundaries, as judged by the distance between their normalized weight-vectors (0.29 for OP and 0.10 for L6). In contrast, between-dataset distances were larger (0.63 for perceptron and 0.63 for SVM), indicative of the fact that classifiers were able to capture dataset specific morphological information.


Figure 8. Online training can be used to reduce the numbers of topological errors present in initial traces. (A) Generalization error-rate as a function of the number of training clusters for the perceptron classifier. Black lines (mean) surrounded by gray margins (standard errors) show the results of random training. For each number of training clusters, training sets were generated at random 1000 times. Training was performed on each set of clusters, while testing was done on the remaining clusters. Results for all 1000 experiments were averaged. Solid and dotted lines show the results for the OP and L6 datasets respectively. Red lines show the corresponding results for active training experiments. (B) Same for the SVM classifier. (C) The number of false positive branch points present in the initial trace of the L6 dataset (Figure 7F) is greatly reduced by the branch merging algorithm. (D) The sum of false positive and false negative branch point errors is an increasing function of the length density of labeled neurites. Length density is calculated as the average length of neurites traced by the users divided by the image stack volume. Error-bars indicate the range of branch point errors. The automated tracing algorithm performs as well as trained users when the density of labeled neurites is low (<0.003 μm^−2).

Geometry and topology of the final automated traces produced by the branch merging algorithm were compared to the user traces in the manner described in Figures 6, 7. No significant geometrical changes resulted from automated branch merging. This was expected, as trace modifications that accompany branch merging are confined to very local regions in the vicinity of branch or terminal points. In addition, automated branch merging did not alter the topology of initial traces of OP neurites. The reason is that the initial traces of these morphologically simple structures did not contain significant topological errors in the first place (Figures 6F–H). As for the topology of L6 traces, no significant changes were observed in false positive/negative lengths (Figure 7D) and terminal point numbers (Figure 7E).

As was intended, automated branch merging greatly reduced the number of false positive branch points present in the initial traces (Figure 8C vs. Figure 7F). Though the reduction in the number of false positive branch points was large (about two-fold), the branch merging algorithm failed to achieve the level of user performance (Figure 8C). To examine the reason behind this disparity, we plotted the sum of false positive and false negative branch point errors for every L6 stack as a function of the length density of neurites contained in the stack (Figure 8D). The length density is defined as the total length of traced neurites (in μm) divided by the stack volume (in μm^3) and was calculated for each image stack by averaging over all user traces. These comparisons show that in every stack the branch merging algorithm substantially reduced the total number of errors present in the initial trace. What is more, when the density of labeled neurites was small (less than 0.003 μm^−2, e.g., Figure 1D), the resulting final automated traces were on par with user reconstructions.

Comparisons with Other Automated Tracing Tools


The geometrical and topological measures used to evaluate the quality of automated traces were also used to compare the performance of NCTracer, Vaa3D (Xiao and Peng, 2013), and NeuronStudio (Rodriguez et al., 2009). To this end, automated traces of OP and L6 image stacks were obtained with Vaa3D and NeuronStudio. We visually inspected these traces and varied the software parameters to achieve good coverage and performance. Inter-user and automated-to-user comparisons were performed as previously described. To evaluate the geometry of automated traces we calculated the mean distances between corresponding nodes and corresponding terminal and branch points. To assess the topology of automated traces, we obtained the Miss-Extra-Score (MES) for trace length and for the numbers of terminal and branch points (Xie et al., 2011). Trace MES is defined as the ratio of the gold standard length reduced by the false negative length to the gold standard length increased by the false positive length. Terminal and branch point MES are defined in a similar manner. The results of these comparisons are shown in Table 1.
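
In formula form, the trace MES defined above is

$$\mathrm{MES} \;=\; \frac{L_{\text{gold}} - L_{\text{FN}}}{L_{\text{gold}} + L_{\text{FP}}},$$

with the terminal and branch point scores obtained by substituting point counts for lengths.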


Smaller distance and higher MES indicate greater affinity between the test and gold standard traces. Table 1 shows that all automated tracing tools were able to capture the trace geometry and topology of single OP axons reasonably well. The advantage of the branch merging strategy proposed in this study becomes evident from examining the values of topological measures for the L6 stacks, which contain multiple axons. According to these measures, NCTracer significantly outperforms the other software. In general, all geometrical and topological measures of NCTracer are closest to the inter-user measures. Table 1 also shows a trade-off between the quality of automated traces and tracing time. Vaa3D and NeuronStudio are 15–20-fold faster than NCTracer. We do not view this as a major drawback because tracing of single stacks with the current version of NCTracer can easily be performed on modern-day workstations, while high-throughput projects could still be carried out on computer clusters.

Discussion


Much of our understanding of brain structure and function is derived from quantitative analyses of neuron shapes. Researchers routinely utilize partial or complete single cell reconstructions, as well as reconstructions of multiple cells often spanning several stacks of images, in order to address various questions. Single cell reconstructions are often used in cell classification and comparative neuroanatomy studies, theoretical studies of neuron shapes, and detailed computational models of intracellular activity. Single cell reconstructions are frequently pooled in silico to simulate structural connectivity of local neural circuits. Reconstructions of multiple labeled cells are used for the analyses of synaptic connectivity in local circuits, in vivo studies of circuit plasticity, and large-scale brain mapping projects. There is no doubt that automating the tracing process will advance these studies, significantly increasing their throughput and eliminating the biases and variability associated with manual tracing.

It is important to understand that it is usually not sufficient to obtain the basic layout of all labeled neurites. In particular, projects aimed at the analyses of synaptic connectivity require accurate knowledge of the branching morphology of individual cells (Figure 1). In this study, we use machine learning to evaluate topologically different scenarios of constructing automated traces (Figure 4A) and then determine the correct branching pattern based on previously learned morphological features. A machine learning approach to image processing typically requires a large labeled set of examples, and creating such a set can be very time-consuming. Our active learning strategy circumvents this problem by taking advantage of the combinatorial nature of the numbers of branch merging scenarios (Figure 4B). Another advantage of this strategy is the subtractive normalization of Equation (6). Branch merging scenarios are only compared within clusters, normalizing for the variations in local intensity and density of labeled neurites.

The results of this study show that the quality of automated traces is strongly dependent on the length density of labeled neurites. When this density is lower than 0.003 μm^−2 the automated tracing algorithm performs on par with trained users (Figure 8D); the reliability of automated traces diminishes rapidly with increases in density beyond this point. Hence, proofreading and error-correction may be required for some automated traces. Proofreading must be done in a computer guided manner, which is particularly important for high-throughput reconstruction projects. The confidence measure described in Equation (10) can be used to convey information about the certainty in the outcome of automated tracing. This measure can be calculated for every vertex in the trace and can be used to direct the user's attention to the most uncertain parts of the trace. Only the lowest confidence mergers will need to be examined by the user, leading to a substantial reduction in proofreading time. Such low confidence regions can be highlighted automatically and the user would choose from an ordered set of best alternative scenarios (based on decreasing confidence).

With the automation of tracing and proofreading it should be possible to map intact, sparsely labeled circuits on the scale of a whole brain, e.g., in the fly or the mouse. Consider a hypothetical experiment of mapping structural connectivity in the mouse brain. The adult mouse brain is roughly 500 mm^3 in volume (Ma et al., 2005). Subsets of mouse neurons can be labeled in vivo to reveal the layout of their axonal and dendritic arbors. The brain can then be divided into 0.5 × 0.5 × 0.1 mm^3 optical sections, and imaged in 3D with two-photon or confocal microscopy at 0.5 × 0.5 × 1.0 μm^3 spatial resolution. This procedure would result in 20,000 stacks of images, each composed of 1000 × 1000 × 100 voxels, totaling 2 TB of raw imaging data. A dataset of this size would have to be reconstructed on a high-performance computer cluster, and the results could be viewed and proofread on modern-day workstations. Depending on the density of labeling, reconstruction of a single stack may take on the order of 1 core-hour, or 20,000 core-hours for the entire brain. Thus, whole mouse brain mapping is no longer an unfeasible goal.
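
As a quick check of these numbers (assuming one byte of raw data per voxel): 500 mm^3 divided into 0.5 × 0.5 × 0.1 = 0.025 mm^3 sections gives 20,000 stacks; each stack holds 1000 × 1000 × 100 = 10^8 voxels, so 20,000 × 10^8 = 2 × 10^12 voxels, i.e., about 2 TB.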

Brain mapping at a much lower spatial resolution has long been performed with diffusion tensor imaging (DTI). This non-invasive technique measures the diffusion tensor associated with the anisotropic movement of water molecules along white matter fiber bundles. Numerous algorithms have been developed to construct tracts from such information, establishing coarse-grain connectivity between brain regions (for review see Le Bihan, 2003; Hasan et al., 2011; Soares et al., 2013). Such algorithms typically use streamline tractography to connect voxels of similar tensor-field orientations into a trace. The reconstruction problem encountered in DTI is somewhat related to the problem of neurite tracing described in this study; however, there is an important difference. Due to the relatively low resolution of DTI (typically 1 mm^3 per voxel) there is no anatomical basis for trying to detect branching patterns in DTI tracts. Hence, unlike neurite tracing, where topological errors may have a catastrophic effect on the overall connectivity map, the results of DTI tracing are expected to be less sensitive to such errors. It remains to be seen to what extent brain mapping at single neuron resolution correlates with the connectivity maps established with DTI.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Vivek Mehta and Paarth Chothani for their significant contributions to the design and development of the NCTracer software, and Soham Mody for testing and debugging the software.
Active training, automated tracing, and manual tracing of neurites in this study were performed with NCTracer.
Information on NCTracer can be found at www.neurogeometry.net.
This work was supported by the NIH grant NS063494.